Search | arXiv e-print repository

Showing 1–50 of 177 results for author: Qu, X

Searching in archive cs. Search in all archives.
  1. Enhancing Eye-Tracking Performance through Multi-Task Learning Transformer

    Authors: Weigeng Li, Neng Zhou, Xiaodong Qu

    Abstract: In this study, we introduce an innovative EEG signal reconstruction sub-module designed to enhance the performance of deep learning models on EEG eye-tracking tasks. This sub-module can integrate with all Encoder-Classifier-based deep learning models and achieve end-to-end training within a multi-task learning framework. Additionally, as the module operates under unsupervised learning, it is versa…

    Submitted 11 August, 2024; originally announced August 2024.

    Journal ref: In: Schmorrow, D.D., Fidopiastis, C.M. (eds) Augmented Cognition. HCII 2024 vol 14695 (2024)

  2. arXiv:2408.04378  [pdf, other]

    cs.CL

    Overview of the NLPCC 2024 Shared Task on Chinese Metaphor Generation

    Authors: Xingwei Qu, Ge Zhang, Siwei Wu, Yizhi Li, Chenghua Lin

    Abstract: This paper presents the results of the shared task on Chinese metaphor generation, hosted at the 13th CCF Conference on Natural Language Processing and Chinese Computing (NLPCC 2024). The goal of this shared task is to generate Chinese metaphors using machine learning techniques and to effectively identify basic components of metaphorical sentences. It is divided into two subtasks: 1) Metaphor Gen…

    Submitted 8 August, 2024; originally announced August 2024.

  3. arXiv:2408.03480  [pdf, other]

    cs.LG

    Advancing EEG-Based Gaze Prediction Using Depthwise Separable Convolution and Enhanced Pre-Processing

    Authors: Matthew L Key, Tural Mehtiyev, Xiaodong Qu

    Abstract: In the field of EEG-based gaze prediction, the application of deep learning to interpret complex neural data poses significant challenges. This study evaluates the effectiveness of pre-processing techniques and the effect of additional depthwise separable convolution on EEG vision transformers (ViTs) in a pretrained model architecture. We introduce a novel method, the EEG Deeper Clustered Vision T…

    Submitted 6 August, 2024; originally announced August 2024.

    Journal ref: International Conference on Human-Computer Interaction (HCII 2024)

  4. arXiv:2408.03472  [pdf, other]

    cs.LG cs.CY cs.HC

    Integrating HCI Datasets in Project-Based Machine Learning Courses: A College-Level Review and Case Study

    Authors: Xiaodong Qu, Matthew Key, Eric Luo, Chuhui Qiu

    Abstract: This study explores the integration of real-world machine learning (ML) projects using human-computer interfaces (HCI) datasets in college-level courses to enhance both teaching and learning experiences. Employing a comprehensive literature review, course websites analysis, and a detailed case study, the research identifies best practices for incorporating HCI datasets into project-based ML educat…

    Submitted 6 August, 2024; originally announced August 2024.

    Journal ref: International Conference on Human-Computer Interaction (HCII 2024)

  5. arXiv:2408.00555  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation

    Authors: Xiaoye Qu, Qiyuan Chen, Wei Wei, Jishuo Sun, Jianfeng Dong

    Abstract: Despite the remarkable ability of large vision-language models (LVLMs) in image comprehension, these models frequently generate plausible yet factually incorrect responses, a phenomenon known as hallucination. Recently, in large language models (LLMs), augmenting LLMs by retrieving information from external knowledge resources has been proven as a promising solution to mitigate hallucinations. Howev…

    Submitted 1 August, 2024; originally announced August 2024.

  6. arXiv:2408.00550  [pdf, other]

    cs.CV cs.AI cs.CL

    Mitigating Multilingual Hallucination in Large Vision-Language Models

    Authors: Xiaoye Qu, Mingyang Song, Wei Wei, Jianfeng Dong, Yu Cheng

    Abstract: While Large Vision-Language Models (LVLMs) have exhibited remarkable capabilities across a wide range of tasks, they suffer from hallucination problems, where models generate plausible yet incorrect answers given the input image-query pair. This hallucination phenomenon is even more severe when querying the image in non-English languages, while existing methods for mitigating hallucinations in LVL…

    Submitted 1 August, 2024; originally announced August 2024.

  7. arXiv:2407.17379  [pdf, other]

    cs.CV cs.CL

    MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models

    Authors: Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, Yizhi Li, Haoning Wu, J. H. Liu, Ruibo Liu, Xingwei Qu, Xuxin Cheng, Ge Zhang, Wenhao Huang, Chenghua Lin

    Abstract: Given the remarkable success that large visual language models (LVLMs) have achieved in image perception tasks, the endeavor to make LVLMs perceive the world like humans is drawing increasing attention. Current multi-modal benchmarks primarily focus on facts or specific topic-related knowledge contained within individual images. However, they often overlook the associative relations between multip…

    Submitted 5 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: VLMs, Multi-Image Association

  8. arXiv:2407.15613  [pdf, other]

    cs.CV

    Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

    Authors: Xiangyan Qu, Jing Yu, Keke Gai, Jiamin Zhuang, Yuanmin Tang, Gang Xiong, Gaopeng Gou, Qi Wu

    Abstract: Recent work shows that documents from encyclopedias serve as helpful auxiliary information for zero-shot learning. Existing methods align the entire semantics of a document with corresponding images to transfer knowledge. However, they disregard that semantic information is not equivalent between them, resulting in a suboptimal alignment. In this work, we propose a novel network to extract multi-v…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM International Conference on Multimedia (MM) 2024

  9. arXiv:2407.07403  [pdf, other]

    cs.CV

    A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

    Authors: Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

    Abstract: With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to the multi-resource real-world applications and the compl…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  10. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  11. arXiv:2406.16554  [pdf, other]

    cs.CL

    LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

    Authors: Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng

    Abstract: Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, training MoE from scratch in a large-scale setting still suffers from data-hungry and instability problems. Motivated by this limit, we investigate building MoE models from existing dense large language models. Specifically, based on the well-known LLaMA-2 7B mod…

    Submitted 24 June, 2024; originally announced June 2024.

  12. arXiv:2406.15480  [pdf, other]

    cs.CL cs.AI cs.LG

    On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

    Authors: Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng

    Abstract: Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: submit under review

  13. arXiv:2406.15479  [pdf, other]

    cs.CL cs.AI cs.LG

    Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

    Authors: Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng

    Abstract: In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these i…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: submit in review

  14. arXiv:2406.14550  [pdf, other]

    cs.CL cs.AI

    GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

    Authors: Shilong Li, Yancheng He, Hangyu Guo, Xingyuan Bu, Ge Bai, Jie Liu, Jiaheng Liu, Xingwei Qu, Yangguang Li, Wanli Ouyang, Wenbo Su, Bo Zheng

    Abstract: Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore t…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The first four authors contributed equally, 27 pages

  15. arXiv:2406.14192  [pdf, other]

    cs.CL cs.AI

    Timo: Towards Better Temporal Reasoning for Language Models

    Authors: Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng

    Abstract: Reasoning about time is essential for Large Language Models (LLMs) to understand the world. Previous works focus on solving specific tasks, primarily on time-sensitive question answering. While these methods have proven effective, they cannot generalize to a wider spectrum of temporal reasoning tasks. Therefore, we propose a crucial question: Can we build a universal framework to handle a variety…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review

  16. arXiv:2406.11256  [pdf, other]

    cs.CL

    Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

    Authors: Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng

    Abstract: Mixture-of-Experts (MoE) models have shown remarkable capability in instruction tuning, especially when the number of tasks scales. However, previous methods simply merge all training tasks (e.g. creative writing, coding, and mathematics) and apply fixed sampling weights, without considering the importance of different tasks as the model training state changes. In this way, the most helpful data c…

    Submitted 17 June, 2024; originally announced June 2024.

  17. arXiv:2406.09072  [pdf, other]

    cs.CL

    Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?

    Authors: Zhaochen Su, Juntao Li, Jun Zhang, Tong Zhu, Xiaoye Qu, Pan Zhou, Yan Bowen, Yu Cheng, Min Zhang

    Abstract: Temporal reasoning is fundamental for large language models (LLMs) to comprehend the world. Current temporal reasoning datasets are limited to questions about single or isolated events, falling short in mirroring the realistic temporal characteristics involving concurrent nature and intricate temporal interconnections. In this paper, we introduce CoTempQA, a comprehensive co-temporal Question Answ…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted to the ACL 2024 main conference

  18. arXiv:2406.07001  [pdf, other]

    cs.CL cs.AI

    Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models

    Authors: Zhenyi Lu, Jie Tian, Wei Wei, Xiaoye Qu, Yu Cheng, Wenfeng Xie, Dangyang Chen

    Abstract: Text classification is a crucial task encountered frequently in practical scenarios, yet it is still under-explored in the era of large language models (LLMs). This study shows that LLMs are vulnerable to changes in the number and arrangement of options in text classification. Our extensive empirical analyses reveal that the key bottleneck arises from ambiguous decision boundaries and inherent bia…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL2024 findings

  19. arXiv:2406.01375  [pdf, other]

    cs.CL

    D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

    Authors: Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually…

    Submitted 3 June, 2024; originally announced June 2024.

  20. arXiv:2406.01213  [pdf, other]

    cs.CL cs.AI

    Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition

    Authors: Zhuojun Ding, Wei Wei, Xiaoye Qu, Dangyang Chen

    Abstract: Cross-lingual named entity recognition (NER) aims to train an NER model for the target language leveraging only labeled source language data and unlabeled target language data. Prior approaches either perform label projection on translated source language data or employ a source model to assign pseudo labels for target language data and train a target model on these pseudo-labeled data to generali…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI 2024

  21. arXiv:2406.00009  [pdf, other]

    cs.RO

    ULTra-AV: A Unified Longitudinal Trajectory Dataset for Automated Vehicle

    Authors: Hang Zhou, Ke Ma, Shixiao Liang, Xiaopeng Li, Xiaobo Qu

    Abstract: Automated Vehicles (AVs) promise significant advances in transportation. Critical to these improvements is understanding AVs' longitudinal behavior, relying heavily on real-world trajectory data. Existing open-source trajectory datasets of AV, however, often fall short in refinement, reliability, and completeness, hindering effective performance metrics analysis and model development. This study a…

    Submitted 16 May, 2024; originally announced June 2024.

    Comments: NA

  22. arXiv:2405.13445  [pdf, other]

    cs.LG cs.AI

    Task-agnostic Decision Transformer for Multi-type Agent Control with Federated Split Training

    Authors: Zhiyuan Wang, Bokui Chen, Xiaoyang Qu, Zhenhou Hong, Jing Xiao, Jianzong Wang

    Abstract: With the rapid advancements in artificial intelligence, the development of knowledgeable and personalized agents has become increasingly prevalent. However, the inherent variability in state variables and action spaces among personalized agents poses significant aggregation challenges for traditional federated learning algorithms. To tackle these challenges, we introduce the Federated Split Decisi…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  23. arXiv:2405.10570  [pdf]

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features…

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  24. arXiv:2405.09185  [pdf, other]

    cs.SI cs.NE

    Influence Maximization in Hypergraphs Using A Genetic Algorithm with New Initialization and Evaluation Methods

    Authors: Xilong Qu, Wenbin Pei, Yingchao Yang, Xirong Xu, Renquan Zhang, Qiang Zhang

    Abstract: Influence maximization (IM) is a crucial optimization task related to analyzing complex networks in the real world, such as social networks, disease propagation networks, and marketing networks. Publications to date about the IM problem focus mainly on graphs, which fail to capture high-order interaction relationships from the real world. Therefore, the use of hypergraphs for addressing the IM pro…

    Submitted 15 May, 2024; originally announced May 2024.

  25. arXiv:2405.06929  [pdf, other]

    cs.CV

    PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition

    Authors: Shenglin He, Xiaoyang Qu, Jiguang Wan, Guokuan Li, Changsheng Xie, Jianzong Wang

    Abstract: Recognizing human actions from point cloud sequence has attracted tremendous attention from both academia and industry due to its wide applications. However, most previous studies on point cloud action recognition typically require complex networks to extract intra-frame spatial features and inter-frame temporal features, resulting in an excessive number of redundant computations. This leads to hi…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  26. arXiv:2404.13892  [pdf, other]

    cs.SD cs.AI eess.AS

    Retrieval-Augmented Audio Deepfake Detection

    Authors: Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang

    Abstract: With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Conference on Multimedia Retrieval (ICMR 2024)

  27. arXiv:2404.06393  [pdf, other]

    cs.SD cs.AI eess.AS

    MuPT: A Generative Symbolic Music Pretrained Transformer

    Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

    Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the chal…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  28. arXiv:2404.04167  [pdf, other]

    cs.CL cs.AI

    Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

    Authors: Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Wenhu Chen, Ge Zhang

    Abstract: In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion…

    Submitted 10 July, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  29. arXiv:2404.03543  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    CodeEditorBench: Evaluating Code Editing Capability of Large Language Models

    Authors: Jiawei Guo, Ziming Li, Xueling Liu, Kaijing Ma, Tianyu Zheng, Zhouliang Yu, Ding Pan, Yizhi Li, Ruibo Liu, Yue Wang, Shuyue Guo, Xingwei Qu, Xiang Yue, Ge Zhang, Wenhu Chen, Jie Fu

    Abstract: Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks, including debugging, translating, polishing, and requirement switching. Unlike existing benchmarks focusing solely on code generation, CodeEditorBench empha…

    Submitted 6 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  30. arXiv:2404.01204  [pdf, other]

    cs.CL

    The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis

    Authors: Chen Yang, Junzhuo Li, Xinyao Niu, Xinrun Du, Songyang Gao, Haoran Zhang, Zhaoliang Chen, Xingwei Qu, Ruibin Yuan, Yizhi Li, Jiaheng Liu, Stephen W. Huang, Shawn Yue, Wenhu Chen, Jie Fu, Ge Zhang

    Abstract: Uncovering early-stage metrics that reflect final model performance is one core principle for large-scale pretraining. The existing scaling law demonstrates the power-law correlation between pretraining loss and training flops, which serves as an important indicator of the current training state for large language models. However, this principle only focuses on the model's compression properties o…

    Submitted 1 April, 2024; originally announced April 2024.

  31. arXiv:2404.00899  [pdf, other]

    cs.CL

    TM-TREK at SemEval-2024 Task 8: Towards LLM-Based Automatic Boundary Detection for Human-Machine Mixed Text

    Authors: Xiaoyan Qu, Xiangfeng Meng

    Abstract: With the increasing prevalence of text generated by large language models (LLMs), there is a growing concern about distinguishing between LLM-generated and human-written texts in order to prevent the misuse of LLMs, such as the dissemination of misleading information and academic dishonesty. Previous research has primarily focused on classifying text as either entirely human-written or LLM-generat…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 1st place at SemEval-2024 Task 8, Subtask C, to appear in SemEval-2024 proceedings

  32. arXiv:2403.12931  [pdf, other]

    cs.CV

    You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

    Authors: Yihong Luo, Xiaolong Chen, Xinghua Qu, Jing Tang

    Abstract: We introduce YOSO, a novel generative model designed for rapid, scalable, and high-fidelity one-step image synthesis. YOSO integrates the diffusion process with GANs to achieve the best of two worlds. Specifically, we smooth the distribution by the denoising generator itself, performing self-cooperative learning. We show that our method can serve as a one-step generation model training from scratc…

    Submitted 15 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Early version

  33. arXiv:2403.04273  [pdf, other]

    cs.MS cond-mat.stat-mech physics.comp-ph

    GenML: A Python Library to Generate the Mittag-Leffler Correlated Noise

    Authors: Xiang Qu, Hui Zhao, Wenjie Cai, Gongyi Wang, Zihan Huang

    Abstract: Mittag-Leffler correlated noise (M-L noise) plays a crucial role in the dynamics of complex systems, yet the scientific community has lacked tools for its direct generation. Addressing this gap, our work introduces GenML, a Python library specifically designed for generating M-L noise. We detail the architecture and functionalities of GenML and its underlying algorithmic approach, which enables th…

    Submitted 28 July, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: 7 pages, 4 figures

  34. arXiv:2403.04233  [pdf, other]

    cs.CL cs.AI

    DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning

    Authors: Xingwei Qu, Yiming Liang, Yucheng Wang, Tianyu Zheng, Tommy Yue, Lei Ma, Stephen W. Huang, Jiajun Zhang, Yinan Shi, Chenghua Lin, Jie Fu, Ge Zhang

    Abstract: It has long been assumed that the sheer number of parameters in large language models (LLMs) drives in-context learning (ICL) capabilities, enabling remarkable performance improvements by leveraging task-specific demonstrations. Challenging this hypothesis, we introduce DEEP-ICL, a novel task Definition Enriched ExPert Ensembling methodology for ICL. DEEP-ICL explicitly extracts task definitions f…

    Submitted 16 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  35. arXiv:2402.15939  [pdf]

    eess.IV cs.LG

    Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI

    Authors: Zi Wang, Min Xiao, Yirong Zhou, Chengyan Wang, Naiming Wu, Yi Li, Yiwen Gong, Shufu Chang, Yinyin Chen, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Di Guo, Guang Yang, Xiaobo Qu

    Abstract: Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled but the image reconstruction poses a great challenge of high-dimensional processing. This challenge necessitates extensive training data in many deep learning reconstruction methods. This work proposes a novel and efficient approach, levera…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 10 pages, 11 figures, 3 tables

  36. arXiv:2402.13145  [pdf, other]

    cs.CL cs.AI

    CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation

    Authors: Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu

    Abstract: Metaphor is a prominent linguistic device in human language and literature, as it adds color, imagery, and emphasis to enhance effective communication. This paper introduces a large-scale high quality annotated Chinese Metaphor Corpus, which comprises around 28K sentences drawn from a diverse range of Chinese literary sources, such as poems, prose, song lyrics, etc. To ensure the accuracy and con…

    Submitted 20 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  37. arXiv:2402.13109  [pdf, other]

    cs.CL cs.AI

    CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models

    Authors: Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W. Huang, Chenghua Lin, Jie Fu

    Abstract: The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following. Yet, their effectiveness often diminishes in low-resource languages like Chinese, exacerbated by biased evaluations from data leakage, casting doubt on their true generalizability to new linguistic territories. I…

    Submitted 4 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Camera-ready version for ACL 2024. Project page at https://yizhilll.github.io/CIF-Bench/

  38. arXiv:2402.12845  [pdf, other]

    cs.AI cs.GT

    MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces

    Authors: Tianyu Zheng, Ge Zhang, Xingwei Qu, Ming Kuang, Stephen W. Huang, Zhaofeng He

    Abstract: Drawing upon the intuition that aligning different modalities to the same semantic embedding space would allow models to understand states and actions more easily, we propose a new perspective to the offline reinforcement learning (RL) challenge. More concretely, we transform it into a supervised learning task by integrating multimodal and pre-trained language models. Our approach incorporates sta…

    Submitted 20 February, 2024; originally announced February 2024.

  39. arXiv:2402.11816  [pdf, other]

    cs.CV cs.LG

    Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

    Authors: Jihai Zhang, Xiang Lan, Xiaoye Qu, Yu Cheng, Mengling Feng, Bryan Hooi

    Abstract: Self-Supervised Contrastive Learning has proven effective in deriving high-quality representations from unlabeled data. However, a major challenge that hinders both unimodal and multimodal contrastive learning is feature suppression, a phenomenon where the trained model captures only a limited portion of the information from the input data while overlooking other potentially valuable content. This…

    Submitted 15 July, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: ECCV 2024 Camera-Ready

  40. arXiv:2402.05966  [pdf, other]

    cs.LG cs.AI

    Rethinking Model Re-Basin and Linear Mode Connectivity

    Authors: Xingyu Qu, Samuel Horvath

    Abstract: Recent studies suggest that with sufficiently wide models, most SGD solutions can, up to permutation, converge into the same basin. This phenomenon, known as the model re-basin regime, has significant implications for model averaging by ensuring the linear mode connectivity. However, current re-basin strategies are ineffective in many scenarios due to a lack of comprehensive understanding of under…

    Submitted 9 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 39 pages

  41. arXiv:2401.13714  [pdf, other]

    cs.CV cs.LG

    Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers

    Authors: Wei Tao, Shenglin He, Kai Lu, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang, Jing Xiao

    Abstract: Deploying neural networks on microcontroller units (MCUs) presents substantial challenges due to their constrained computation and memory resources. Previous research has explored patch-based inference as a strategy to conserve memory without sacrificing model accuracy. However, this technique suffers from severe redundant computation overhead, leading to a substantial increase in execution lat…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted by the 27th Design, Automation and Test in Europe Conference (DATE 2024)

  42. arXiv:2401.12435  [pdf, ps, other]

    cs.AI cs.LG math.AP

    Quantitative Analysis of Molecular Transport in the Extracellular Space Using Physics-Informed Neural Network

    Authors: Jiayi Xie, Hongfeng Li, Jin Cheng, Qingrui Cai, Hanbo Tan, Lingyun Zu, Xiaobo Qu, Hongbin Han

    Abstract: The brain extracellular space (ECS), an irregular, extremely tortuous nanoscale space located between cells or between cells and blood vessels, is crucial for nerve cell survival. It plays a pivotal role in high-level brain functions such as memory, emotion, and sensation. However, the specific form of molecular transport within the ECS remains elusive. To address this challenge, this paper propose…

    Submitted 23 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  43. arXiv:2401.11944  [pdf, other]

    cs.CL cs.AI cs.CV

    CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

    Authors: Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, Haoran Zhang, Xingwei Qu, Junjie Wang, Ruibin Yuan, Yizhi Li, Zekun Wang, Yudong Liu, Yu-Hsuan Tsai, Fengji Zhang, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu

    Abstract: As the capabilities of large multimodal models (LMMs) continue to advance, evaluating the performance of LMMs emerges as an increasing need. Additionally, there is an even larger gap in evaluating the advanced knowledge and reasoning abilities of LMMs in non-English contexts such as Chinese. We introduce CMMMU, a new Chinese Massive Multi-discipline Multimodal Understanding benchmark designed to e…

    Submitted 18 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  44. arXiv:2401.11667  [pdf, other]

    cs.LG

    INCPrompt: Task-Aware incremental Prompting for Rehearsal-Free Class-incremental Learning

    Authors: Zhiyuan Wang, Xiaoyang Qu, Jing Xiao, Bokui Chen, Jianzong Wang

    Abstract: This paper introduces INCPrompt, an innovative continual learning solution that effectively addresses catastrophic forgetting. INCPrompt's key innovation lies in its use of adaptive key-learner and task-aware prompts that capture task-relevant information. This unique combination encapsulates general knowledge across tasks and encodes task-specific knowledge. Our comprehensive evaluation across mu…

    Submitted 7 April, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted by the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  45. arXiv:2401.11666  [pdf, other]

    cs.LG cs.AI

    P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer

    Authors: Zhiyuan Wang, Xiaoyang Qu, Jing Xiao, Bokui Chen, Jianzong Wang

    Abstract: Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks. In our work, we propose a novel solution - the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus foste…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted by the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  46. arXiv:2401.11449  [pdf, other]

    eess.SP cs.NI

    Energy Consumption Analysis for Continuous Phase Modulation in Smart-Grid Internet of Things of beyond 5G

    Authors: Hongjian Gao, Yang Lu, Shaoshi Yang, Jingsheng Tan, Longlong Nie, Xinyi Qu

    Abstract: Wireless sensor network (WSN) underpinning the smart-grid Internet of Things (SG-IoT) has been a popular research topic in recent years due to its great potential for enabling a wide range of important applications. However, the energy consumption (EC) characteristic of sensor nodes is a key factor that affects the operational performance (e.g., lifetime of sensors) and the total cost of ownership…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 7 figures, 2 tables

    Journal ref: Sensors, vol. 24, no. 2, pp. 1-14, article number 533, Jan. 2024

  47. arXiv:2401.10475  [pdf, other]

    cs.CV cs.MM

    CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios

    Authors: Xiangshuo Qiao, Xianxin Li, Xiaozhe Qu, Jie Zhang, Yang Liu, Yu Luo, Cihang Jin, Jin Ma

    Abstract: Vision-Language Models pre-trained on large-scale image-text datasets have shown superior performance in downstream tasks such as image retrieval. Most of the images for pre-training are presented in the form of open domain common-sense visual elements. Differently, video covers in short video search scenarios are presented as user-originated contents that provide important visual summaries of vid…

    Submitted 25 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  48. arXiv:2401.06477  [pdf, other]

    cs.CL cs.AI

    Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation

    Authors: Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Weixu Zhang, Xinrun Du, Qi Jia, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu, Ge Zhang

    Abstract: In this paper, we introduce Kun, a novel approach for creating high-quality instruction-tuning datasets for large language models (LLMs) without relying on manual annotations. Adapting a self-training algorithm based on instruction back-translation and answer polishment, Kun leverages unlabelled data from diverse sources such as Wudao, Wanjuan, and SkyPile to generate a substantial dataset of over…

    Submitted 23 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 12 pages, 12 figures

  49. arXiv:2401.02212  [pdf, other]

    cs.CL cs.AI

    Joint Multi-Facts Reasoning Network For Complex Temporal Question Answering Over Knowledge Graph

    Authors: Rikui Huang, Wei Wei, Xiaoye Qu, Wenfeng Xie, Xianling Mao, Dangyang Chen

    Abstract: Temporal Knowledge Graph (TKG) is an extension of regular knowledge graph by attaching the time scope. Existing temporal knowledge graph question answering (TKGQA) models solely approach simple questions, owing to the prior assumption that each question only contains a single temporal fact with explicit/implicit temporal constraints. Hence, they perform poorly on questions which own multiple tempo…

    Submitted 4 January, 2024; originally announced January 2024.

  50. Enhancing Low-Resource Relation Representations through Multi-View Decoupling

    Authors: Chenghao Fan, Wei Wei, Xiaoye Qu, Zhenyi Lu, Wenfeng Xie, Yu Cheng, Dangyang Chen

    Abstract: Recently, prompt-tuning with pre-trained language models (PLMs) has significantly enhanced performance on relation extraction (RE) tasks. However, in low-resource scenarios, where the available training data is scarce, previous prompt-based methods may still perform poorly for prompt-based representation learning due to a superficial understanding of the relation. To this end, we hig…

    Submitted 29 May, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024