Search | arXiv e-print repository
Showing 1–50 of 277 results for author: Bai, J

Searching in archive cs.
  1. arXiv:2407.20564  [pdf, other]

    cs.CL

    CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

    Authors: Tianshi Zheng, Jiaxin Bai, Yicheng Wang, Tianqing Fang, Yue Guo, Yauwai Yim, Yangqiu Song

    Abstract: While large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks by acquiring rich factual knowledge from their broad training data, their ability to synthesize and logically reason with this knowledge in complex ways remains underexplored. In this work, we present a systematic evaluation of state-of-the-art LLMs' complex logical reasoni…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 9 pages

  2. arXiv:2407.12023  [pdf, other]

    cs.CL cs.AI

    CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models

    Authors: Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Zhi-Long Ji, Jin-Feng Bai, Zhen-Ru Pan, Fan-Hu Zeng, Jian Xu, Jia-Xin Zhang, Cheng-Lin Liu

    Abstract: Due to the rapid advancements in multimodal large language models, evaluating their multimodal mathematical capabilities continues to receive wide attention. Despite the datasets like MathVista proposed benchmarks for assessing mathematical capabilities in multimodal scenarios, there is still a lack of corresponding evaluation tools and datasets for fine-grained assessment in the context of K12 ed…

    Submitted 27 June, 2024; originally announced July 2024.

  3. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  4. arXiv:2407.09550  [pdf]

    cs.CV cs.AI cs.LG

    CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network

    Authors: Jia-Hau Bai, Chi-Ting Liu, Yu Wang, Fu-Chieh Chang, Pei-Yuan Wu

    Abstract: This study uses CAPM (Convex Adversarial Polytope for Maxpool-based CNN) to improve the verified bound for general purpose maxpool-based convolutional neural networks (CNNs) under bounded norm adversarial perturbations. The maxpool function is decomposed as a series of ReLU functions to extend the convex relaxation technique to maxpool functions, by which the verified bound can be efficiently comp…

    Submitted 27 June, 2024; originally announced July 2024.

  5. arXiv:2407.06964  [pdf, other]

    cs.CV

    Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach

    Authors: Taolin Zhang, Jiawang Bai, Zhihe Lu, Dongze Lian, Genping Wang, Xinchao Wang, Shu-Tao Xia

    Abstract: Recent works on parameter-efficient transfer learning (PETL) show the potential to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters. However, since they usually insert new structures into the pre-trained model, entire intermediate features of that model are changed and thus need to be stored to be involved in back-propagation, resulting in…

    Submitted 14 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  6. arXiv:2407.00291  [pdf, other]

    eess.AS cs.SD

    FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels

    Authors: Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

    Abstract: This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset with two sources, making it challenging…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Technical report for DCASE 2024 Challenge Task 4

  7. arXiv:2406.16855  [pdf, other]

    cs.CV

    DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

    Authors: Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, Shu-Tao Xia

    Abstract: Personalized image generation holds great promise in assisting humans in everyday work and life due to its impressive function in creatively generating personalized content. However, current evaluations either are automated but misalign with humans or require human evaluations that are time-consuming and expensive. In this work, we present DreamBench++, a human-aligned benchmark automated by advan…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Project page: https://dreambenchplus.github.io/

  8. arXiv:2406.11045  [pdf, other]

    cs.LG math.NA

    Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov Arnold Networks

    Authors: Yizheng Wang, Jia Sun, Jinshuai Bai, Cosmin Anitescu, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu

    Abstract: AI for partial differential equations (PDEs) has garnered significant attention, particularly with the emergence of Physics-informed neural networks (PINNs). The recent advent of Kolmogorov-Arnold Network (KAN) indicates that there is potential to revisit and enhance the previously MLP-based PINNs. Compared to MLPs, KANs offer interpretability and require fewer parameters. PDEs can be described in…

    Submitted 4 August, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  9. arXiv:2406.10885  [pdf, other]

    cs.CL

    On the Role of Entity and Event Level Conceptualization in Generalizable Reasoning: A Survey of Tasks, Methods, Applications, and Future Directions

    Authors: Weiqi Wang, Tianqing Fang, Haochen Shi, Baixuan Xu, Wenxuan Ding, Liyu Zhang, Wei Fan, Jiaxin Bai, Haoran Li, Xin Liu, Yangqiu Song

    Abstract: Entity- and event-level conceptualization, as fundamental elements of human cognition, plays a pivotal role in generalizable reasoning. This process involves abstracting specific instances into higher-level concepts and forming abstract knowledge that can be applied in unfamiliar or novel situations, which can enhance models' inferential capabilities and support the effective transfer of knowledge…

    Submitted 16 June, 2024; originally announced June 2024.

  10. arXiv:2406.10701  [pdf, other]

    cs.CL

    MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding

    Authors: Baixuan Xu, Weiqi Wang, Haochen Shi, Wenxuan Ding, Huihao Jing, Tianqing Fang, Jiaxin Bai, Long Chen, Yangqiu Song

    Abstract: Improving user experience and providing personalized search results in E-commerce platforms heavily rely on understanding purchase intention. However, existing methods for acquiring large-scale intentions bank on distilling large language models with human annotation for verification. Such an approach tends to generate product-centric intentions, overlook valuable visual information from product i…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  11. arXiv:2406.10173  [pdf, other]

    cs.CL

    IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce

    Authors: Wenxuan Ding, Weiqi Wang, Sze Heng Douglas Kwok, Minghao Liu, Tianqing Fang, Jiaxin Bai, Junxian He, Yangqiu Song

    Abstract: Enhancing Language Models' (LMs) ability to understand purchase intentions in E-commerce scenarios is crucial for their effective assistance in various downstream tasks. However, previous approaches that distill intentions from LMs often fail to generate meaningful and human-centric intentions applicable in real-world E-commerce contexts. This raises concerns about the true comprehension and utili…

    Submitted 14 June, 2024; originally announced June 2024.

  12. arXiv:2406.07880  [pdf, other]

    cs.CV eess.IV

    A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects

    Authors: Jun Bai, Di Wu, Tristan Shelley, Peter Schubel, David Twine, John Russell, Xuesen Zeng, Ji Zhang

    Abstract: Material defects (MD) represent a primary challenge affecting product performance and giving rise to safety issues in related products. The rapid and accurate identification and localization of MD constitute crucial research endeavours in addressing contemporary challenges associated with MD. Although conventional non-destructive testing methods such as ultrasonic and X-ray approaches have mitigat…

    Submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.03127  [pdf, other]

    cs.CL

    Towards Real-world Scenario: Imbalanced New Intent Discovery

    Authors: Shun Zhang, Chaoran Yan, Jian Yang, Jiaheng Liu, Ying Mo, Jiaqi Bai, Tongliang Li, Zhoujun Li

    Abstract: New Intent Discovery (NID) aims at detecting known and previously undefined categories of user intent by utilizing limited labeled and massive unlabeled data. Most prior works often operate under the unrealistic assumption that the distribution of both familiar and new intent classes is uniform, overlooking the skewed and long-tailed distributions frequently encountered in real-world scenarios. To…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  14. arXiv:2405.19732  [pdf, other]

    cs.CV cs.CL cs.LG

    Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning

    Authors: Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo

    Abstract: Learning a skill generally relies on both practical experience by doer and insightful high-level guidance by instructor. Will this strategy also work well for solving complex non-convex optimization problems? Here, a common gradient-based optimizer acts like a disciplined doer, making locally optimal update at each step. Recent methods utilize large language models (LLMs) to optimize solutions for…

    Submitted 6 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  15. arXiv:2405.15758  [pdf, other]

    cs.CV cs.AI

    InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

    Authors: Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu Sun, Jiang Bian

    Abstract: Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable. In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars, offering f…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Project page: https://wangyuchi369.github.io/InstructAvatar/

  16. arXiv:2405.10612  [pdf, other]

    cs.CV cs.CR cs.LG

    Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers

    Authors: Sheng Yang, Jiawang Bai, Kuofeng Gao, Yong Yang, Yiming Li, Shu-tao Xia

    Abstract: Given the power of vision transformers, a new learning paradigm, pre-training and then prompting, makes it more efficient and effective to address downstream visual recognition tasks. In this paper, we identify a novel security threat towards such a paradigm from the perspective of backdoor attacks. Specifically, an extra prompt token, called the switch token in this work, can turn the backdoor mo…

    Submitted 17 May, 2024; originally announced May 2024.

  17. arXiv:2405.09981  [pdf, other]

    cs.CV

    Adversarial Robustness for Visual Grounding of Multimodal Large Language Models

    Authors: Kuofeng Gao, Yang Bai, Jiawang Bai, Yong Yang, Shu-Tao Xia

    Abstract: Multi-modal Large Language Models (MLLMs) have recently achieved enhanced performance across various vision-language tasks including visual grounding capabilities. However, the adversarial robustness of visual grounding remains unexplored in MLLMs. To fill this gap, we use referring expression comprehension (REC) as an example task in visual grounding and propose three adversarial attack paradigms…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models

  18. arXiv:2405.09556  [pdf, other]

    eess.SP cs.AI cs.IT

    Co-learning-aided Multi-modal-deep-learning Framework of Passive DOA Estimators for a Heterogeneous Hybrid Massive MIMO Receiver

    Authors: Jiatong Bai, Feng Shu, Qinghe Zheng, Bo Xu, Baihua Shi, Yiwen Chen, Weibin Zhang, Xianpeng Wang

    Abstract: Due to its excellent performance in rate and resolution, fully-digital (FD) massive multiple-input multiple-output (MIMO) antenna arrays has been widely applied in data transmission and direction of arrival (DOA) measurements, etc. But it confronts with two main challenges: high computational complexity and circuit cost. The two problems may be addressed well by hybrid analog-digital (HAD) structu…

    Submitted 12 June, 2024; v1 submitted 27 April, 2024; originally announced May 2024.

  19. Robust Covariance-Based Activity Detection for Massive Access

    Authors: Jianan Bai, Erik G. Larsson

    Abstract: The wireless channel is undergoing continuous changes, and the block-fading assumption, despite its popularity in theoretical contexts, never holds true in practical scenarios. This discrepancy is particularly critical for user activity detection in grant-free random access, where joint processing across multiple resource blocks is usually undesirable. In this paper, we propose employing a low-dim…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 5 pages, 11 figures. Asilomar SSC 2023 Conference

  20. arXiv:2405.07551  [pdf, other]

    cs.CL cs.AI

    MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

    Authors: Shuo Yin, Weihao You, Zhilong Ji, Guoqiang Zhong, Jinfeng Bai

    Abstract: The tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities for open-source LLMs, while tool-free methods chose another track: augmenting math reasoning data. However, a great method to integrate the above two research paths and combine their advantages remains to be explored. In this work, we firstly in…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: The state-of-the-art open-source tool-use LLMs for mathematical reasoning

  21. arXiv:2405.07518  [pdf, other]

    cs.AR cs.AI

    SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

    Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain, et al. (5 additional authors not shown)

    Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators have created a memory wall, necessitating new methods to deploy AI. Composition of Expert…

    Submitted 13 May, 2024; originally announced May 2024.

  22. arXiv:2405.07497  [pdf, other]

    cs.LG

    Towards Subgraph Isomorphism Counting with Graph Kernels

    Authors: Xin Liu, Weiqi Wang, Jiaxin Bai, Yangqiu Song

    Abstract: Subgraph isomorphism counting is known as #P-complete and requires exponential time to find the accurate solution. Utilizing representation learning has been shown as a promising direction to represent substructures and approximate the solution. Graph kernels that implicitly capture the correlations among substructures in diverse graphs have exhibited great discriminative power in graph classifica…

    Submitted 13 May, 2024; originally announced May 2024.

  23. arXiv:2405.05806  [pdf, other]

    cs.CV

    MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation

    Authors: Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang, Wangmeng Zuo

    Abstract: Text-to-image (T2I) diffusion models have shown significant success in personalized text-to-image generation, which aims to generate novel images with human identities indicated by the reference images. Despite promising identity fidelity has been achieved by several tuning-free methods, they usually suffer from overfitting issues. The learned identity tends to entangle with irrelevant information…

    Submitted 28 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: ECCV 2024. Our code can be found at https://github.com/csyxwei/MasterWeaver

  24. Optimizing E-commerce Search: Toward a Generalizable and Rank-Consistent Pre-Ranking Model

    Authors: Enqiang Xu, Yiming Qiu, Junyang Bai, Ping Zhang, Dadong Miao, Songlin Wang, Guoyu Tang, Lin Liu, Mingming Li

    Abstract: In large e-commerce platforms, search systems are typically composed of a series of modules, including recall, pre-ranking, and ranking phases. The pre-ranking phase, serving as a lightweight module, is crucial for filtering out the bulk of products in advance for the downstream ranking module. Industrial efforts on optimizing the pre-ranking model have predominantly focused on enhancing ranking c…

    Submitted 13 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    ACM Class: H.3.3

  25. arXiv:2405.03349  [pdf, other]

    cs.CV

    Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement

    Authors: Jiesong Bai, Yuhao Yin, Qiyuan He, Yuanxian Li, Xiaofeng Zhang

    Abstract: In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's perception of brightness and color, decompose images into illumination and reflection components but struggle with noise management and detail preserva…

    Submitted 19 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  26. arXiv:2405.02942  [pdf, other]

    physics.optics cs.CV cs.RO eess.IV

    Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

    Authors: Shaohua Gao, Qi Jiang, Yiqi Liao, Yi Qiu, Wanglei Ying, Kailun Yang, Kaiwei Wang, Benhao Zhang, Jian Bai

    Abstract: We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°x(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 len…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to Optics & Laser Technology

  27. arXiv:2405.01074  [pdf, other]

    cs.IT eess.SY

    Stability Analysis of Interacting Wireless Repeaters

    Authors: Erik G. Larsson, Jianan Bai

    Abstract: We consider a wireless network with multiple single-antenna repeaters that amplify and instantaneously re-transmit the signals they receive to improve the channel rank and system coverage. Due to the positive feedback formed by inter-repeater interference, stability could become a critical issue. We investigate the problem of determining the maximum amplification gain that the repeaters can use wi…

    Submitted 7 July, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted to SPAWC 2024. 5 pages, 7 figures

  28. arXiv:2404.18356  [pdf, other]

    cs.DC

    FEDQ-Trust: Efficient Data-Driven Trust Prediction for Mobile Edge-Based IoT Systems

    Authors: Jiahui Bai, Hai Dong, Athman Bouguettaya

    Abstract: We introduce FEDQ-Trust, an innovative data-driven trust prediction approach designed for mobile edge-based Internet of Things (IoT) environments. The decentralized nature of mobile edge environments introduces challenges due to variations in data distribution, impacting the accuracy and training efficiency of existing distributed data-driven trust prediction models. FEDQ-Trust effectively tackles…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 14 pages, 6 figures, submitted to IEEE Transactions on Services Computing

  29. arXiv:2404.10763  [pdf, other]

    cs.AI cs.CL cs.CV

    LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

    Authors: Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun

    Abstract: Diffusion models have exhibited remarkable capabilities in text-to-image generation. However, their performance in image-to-text generation, specifically image captioning, has lagged behind Auto-Regressive (AR) models, casting doubt on their applicability for such tasks. In this work, we revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding. With…

    Submitted 16 April, 2024; originally announced April 2024.

  30. arXiv:2404.08977  [pdf, other]

    cs.CL cs.LG

    RoNID: New Intent Discovery with Generated-Reliable Labels and Cluster-friendly Representations

    Authors: Shun Zhang, Chaoran Yan, Jian Yang, Changyu Ren, Jiaqi Bai, Tongliang Li, Zhoujun Li

    Abstract: New Intent Discovery (NID) strives to identify known and reasonably deduce novel intent groups in the open-world scenario. But current methods face issues with inaccurate pseudo-labels and poor representation learning, creating a negative feedback loop that degrades overall model performance, including accuracy and the adjusted rand index. To address the aforementioned challenges, we propose a Rob…

    Submitted 18 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

    Comments: DASFAA 2024

  31. arXiv:2404.07943  [pdf, other]

    cs.CE cs.LG

    HomoGenius: a Foundation Model of Homogenization for Rapid Prediction of Effective Mechanical Properties using Neural Operators

    Authors: Yizheng Wang, Xiang Li, Ziming Yan, Yuqing Du, Jinshuai Bai, Bokai Liu, Timon Rabczuk, Yinghua Liu

    Abstract: Homogenization is an essential tool for studying multiscale physical phenomena. However, traditional numerical homogenization, heavily reliant on finite element analysis, requires extensive computation costs, particularly in handling complex geometries, materials, and high-resolution problems. To address these limitations, we propose a numerical homogenization model based on operator learning: Hom…

    Submitted 18 March, 2024; originally announced April 2024.

  32. arXiv:2404.06351  [pdf, other]

    cs.CV

    HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

    Authors: Xiaolong Tang, Meina Kan, Shiguang Shan, Zhilong Ji, Jinfeng Bai, Xilin Chen

    Abstract: Predicting the trajectories of road agents is essential for autonomous driving systems. The recent mainstream methods follow a static paradigm, which predicts the future trajectory by using a fixed duration of historical frames. These methods make the predictions independently even at adjacent time steps, which leads to potential instability and temporal inconsistency. As successive time steps hav…

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  33. arXiv:2404.05983  [pdf, ps, other]

    cs.IT

    On Achievable Covert Communication Performance under CSI Estimation Error and Feedback Delay

    Authors: Jiaqing Bai, Ji He, Yanping Chen, Yulong Shen, Xiaohong Jiang

    Abstract: Covert communication's effectiveness critically depends on precise channel state information (CSI). This paper investigates the impact of imperfect CSI on achievable covert communication performance in a two-hop relay system. Firstly, we introduce a two-hop covert transmission scheme utilizing channel inversion power control (CIPC) to manage opportunistic interference, eliminating the receiver's s…

    Submitted 8 April, 2024; originally announced April 2024.

  34. arXiv:2403.20130  [pdf, other]

    cs.SD cs.LG eess.AS

    Sound event localization and classification using WASN in Outdoor Environment

    Authors: Dongzhe Zhang, Jianfeng Chen, Jisheng Bai, Mou Wang

    Abstract: Deep learning-based sound event localization and classification is an emerging research area within wireless acoustic sensor networks. However, current methods for sound event localization and classification typically rely on a single microphone array, making them susceptible to signal attenuation and environmental noise, which limits their monitoring range. Moreover, methods using multiple microp…

    Submitted 29 March, 2024; originally announced March 2024.

  35. arXiv:2403.17556  [pdf, other]

    cs.CL cs.AI

    m3P: Towards Multimodal Multilingual Translation with Multimodal Prompt

    Authors: Jian Yang, Hongcheng Guo, Yuwei Yin, Jiaqi Bai, Bing Wang, Jiaheng Liu, Xinnian Liang, Linzheng Cahi, Liqun Yang, Zhoujun Li

    Abstract: Multilingual translation supports multiple translation directions by projecting all languages in a shared space, but the translation quality is undermined by the difference between languages in the text-only modality, especially when the number of languages is large. To bridge this gap, we introduce visual context as the universal language-independent representation to facilitate multilingual tran…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: COLING 2024

  36. arXiv:2403.16913  [pdf, other]

    cs.CL

    New Intent Discovery with Attracting and Dispersing Prototype

    Authors: Shun Zhang, Jian Yang, Jiaqi Bai, Chaoran Yan, Tongliang Li, Zhao Yan, Zhoujun Li

    Abstract: New Intent Discovery (NID) aims to recognize known and infer new intent categories with the help of limited labeled and large-scale unlabeled data. The task is addressed as a feature-clustering problem and recent studies augment instance representation. However, existing methods fail to capture cluster-friendly representations, since they show less capability to effectively control and coordinate…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: COLING 2024

  37. arXiv:2403.16638  [pdf, other]

    cs.CV cs.CR

    AI-Generated Video Detection via Spatio-Temporal Anomaly Learning

    Authors: Jianfa Bai, Man Lin, Gang Cao

    Abstract: The advancement of generation models has led to the emergence of highly realistic artificial intelligence (AI)-generated videos. Malicious users can easily create non-existent videos to spread false information. This letter proposes an effective AI-generated video detection (AIGVDet) scheme by capturing the forensic traces with a two-branch spatio-temporal convolutional neural network (CNN). Speci…

    Submitted 25 March, 2024; originally announced March 2024.

  38. MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection

    Authors: Yupeng Li, Haorui He, Jin Bai, Dacheng Wen

    Abstract: The prevalence of fake news across various online sources has had a significant influence on the public. Existing Chinese fake news detection datasets are limited to news sourced solely from Weibo. However, fake news originating from multiple sources exhibits diversity in various aspects, including its content and social context. Methods trained on purely one single news source can hardly be appli…

    Submitted 24 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by the ACM Web Conference 2024 (WWW 2024) oral, dataset available: https://github.com/TrustworthyComp

  39. arXiv:2403.06382  [pdf, other]

    cs.CV cs.AI cs.LG

    Pre-Trained Model Recommendation for Downstream Fine-tuning

    Authors: Jiameng Bai, Sai Wu, Jie Song, Junbo Zhao, Gang Chen

    Abstract: As a fundamental problem in transfer learning, model selection aims to rank off-the-shelf pre-trained models and select the most suitable one for the new target task. Existing model selection techniques are often constrained in their scope and tend to overlook the nuanced relationships between models and tasks. In this paper, we present a pragmatic framework Fennec, delving into a diverse…

    Submitted 10 March, 2024; originally announced March 2024.

  40. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  41. arXiv:2403.04880  [pdf, other]

    cs.CV

    An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control

    Authors: Aosong Feng, Weikang Qiu, Jinbin Bai, Xiao Zhang, Zhen Dong, Kaicheng Zhou, Rex Ying, Leandros Tassiulas

    Abstract: Building on the success of text-to-image diffusion models (DPMs), image editing is an important application to enable human interaction with AI-generated content. Among various editing methods, editing within the prompt space gains more attention due to its capacity and simplicity of controlling semantics. However, since diffusion models are commonly pretrained on descriptive text captions, direct…

    Submitted 28 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  42. arXiv:2402.19481  [pdf, other]

    cs.CV

    DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

    Authors: Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

    Abstract: Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method split…

    Submitted 14 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR 2024 Highlight Code: https://github.com/mit-han-lab/distrifuser Website: https://hanlab.mit.edu/projects/distrifusion Blog: https://hanlab.mit.edu/blog/distrifusion

  43. arXiv:2402.14609  [pdf, other]

    cs.LG cs.AI cs.CR cs.DB

    FedCQA: Answering Complex Queries on Multi-Source Knowledge Graphs via Federated Learning

    Authors: Qi Hu, Weifeng Jiang, Haoran Li, Zihao Wang, Jiaxin Bai, Qianren Mao, Yangqiu Song, Lixin Fan, Jianxin Li

    Abstract: Complex logical query answering is a challenging task in knowledge graphs (KGs) that has been widely studied. The ability to perform complex logical reasoning is essential and supports various graph reasoning-based downstream tasks, such as search engines. Recent approaches are proposed to represent KG entities and logical queries into embedding vectors and find answers to logical queries from the…

    Submitted 25 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  44. arXiv:2402.13185  [pdf, other]

    cs.CV

    UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing

    Authors: Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian

    Abstract: Recent advances in text-guided video editing have showcased promising results in appearance editing (e.g., stylization). However, video motion editing in the temporal dimension (e.g., from eating to waving), which distinguishes video editing from image editing, is underexplored. In this work, we present UniEdit, a tuning-free framework that supports both video motion and appearance editing by harn…

    Submitted 7 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Project page: https://jianhongbai.github.io/UniEdit/

  45. arXiv:2402.12862  [pdf, other]

    cs.CL

    Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation

    Authors: Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N. Sainath, Philip C. Woodland

    Abstract: The subjective perception of emotion leads to inconsistent labels from human annotators. Typically, utterances lacking majority-agreed labels are excluded when training an emotion classifier, which causes problems when encountering ambiguous emotional expressions during testing. This paper investigates three methods to handle ambiguous emotion. First, we show that incorporating utterances without m…

    Submitted 20 February, 2024; originally announced February 2024.

  46. arXiv:2402.11505  [pdf, other]

    cs.CL cs.AI

    Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources

    Authors: Jiamu Bai, Daoyuan Chen, Bingchen Qian, Liuyi Yao, Yaliang Li

    Abstract: Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs). While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the "bucket effect" in traditional FL that re…

    Submitted 30 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 19 pages, 13 tables, 9 figures

  47. arXiv:2402.09773  [pdf, other]

    cs.CL

    NutePrune: Efficient Progressive Pruning with Numerous Teachers for Large Language Models

    Authors: Shengrui Li, Junzhe Chen, Xueting Han, Jing Bai

    Abstract: The considerable size of Large Language Models (LLMs) presents notable deployment challenges, particularly on resource-constrained hardware. Structured pruning offers an effective means to compress LLMs, thereby reducing storage costs and enhancing inference speed for more efficient utilization. In this work, we study data-efficient and resource-efficient structure pruning methods to obtain small…

    Submitted 27 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  48. arXiv:2402.02694  [pdf, other]

    eess.AS cs.LG cs.SD

    Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

    Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

    Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Althoug…

    Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  49. arXiv:2402.00763  [pdf, other]

    cs.CV cs.GR

    360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming

    Authors: Jiayang Bai, Letian Huang, Jie Guo, Wen Gong, Yuanqi Li, Yanwen Guo

    Abstract: 3D Gaussian Splatting (3D-GS) has recently attracted great attention with real-time and photo-realistic renderings. This technique typically takes perspective images as input and optimizes a set of 3D elliptical Gaussians by splatting them onto the image planes, resulting in 2D Gaussians. However, applying 3D-GS to panoramic inputs presents challenges in effectively modeling the projection onto th…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 11 pages, 10 figures

  50. arXiv:2402.00752  [pdf, other]

    cs.CV cs.GR

    On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy

    Authors: Letian Huang, Jiayang Bai, Jie Guo, Yuanqi Li, Yanwen Guo

    Abstract: 3D Gaussian Splatting has garnered extensive attention and application in real-time neural rendering. Concurrently, concerns have been raised about the limitations of this technology in aspects such as point cloud storage, performance, and robustness in sparse viewpoints, leading to various improvements. However, there has been a notable lack of attention to the fundamental problem of projection e…

    Submitted 29 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.