(Translated by https://www.hiragana.jp/)
Search | arXiv e-print repository
Skip to main content

Showing 1–50 of 4,465 results for author: Zhang, X

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.04214  [pdf, other

    cs.CL

    ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

    Authors: Yuanyi Ren, Haoran Ye, Hanjun Fang, Xin Zhang, Guojie Song

    Abstract: Large Language Models (LLMs) are transforming diverse fields and gaining increasing influence as human proxies. This development underscores the urgent need for evaluating value orientations and understanding of LLMs to ensure their responsible integration into public-facing applications. This work introduces ValueBench, the first comprehensive psychometric benchmark for evaluating value orientati… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024

  2. arXiv:2406.04146  [pdf, other

    cs.CL

    Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness

    Authors: Guangliang Liu, Milad Afshari, Xitong Zhang, Zhiyu Xue, Avrajit Ghosh, Bidhan Bashyal, Rongrong Wang, Kristen Johnson

    Abstract: While task-agnostic debiasing provides notable generalizability and reduced reliance on downstream data, its impact on language modeling ability and the risk of relearning social biases from downstream task-specific data remain as the two most significant challenges when debiasing Pretrained Language Models (PLMs). The impact on language modeling ability can be alleviated given a high-quality and… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2406.03835  [pdf, other

    cs.CV cs.RO

    Monocular Localization with Semantics Map for Autonomous Vehicles

    Authors: Jixiang Wan, Xudong Zhang, Shuzhou Dong, Yuwei Zhang, Yuchen Yang, Ruoxi Wu, Ye Jiang, Jijunnan Li, Jinquan Lin, Ming Yang

    Abstract: Accurate and robust localization remains a significant challenge for autonomous vehicles. The cost of sensors and limitations in local computational efficiency make it difficult to scale to large commercial applications. Traditional vision-based approaches focus on texture features that are susceptible to changes in lighting, season, perspective, and appearance. Additionally, the large storage siz… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2406.03807  [pdf, other

    cs.AI cs.CL cs.RO

    Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering

    Authors: Yanming Liu, Xinyue Peng, Yuwei Zhang, Jiannan Cao, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

    Abstract: Large language models (LLMs) have demonstrated exceptional reasoning capabilities, enabling them to solve various complex problems. Recently, this ability has been applied to the paradigm of tool learning. Tool learning involves providing examples of tool usage and their corresponding functions, allowing LLMs to formulate plans and demonstrate the process of invoking and executing each tool. LLMs… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 46pages first version

  5. arXiv:2406.03799  [pdf

    cs.CV cs.AI

    Enhanced Semantic Segmentation Pipeline for WeatherProof Dataset Challenge

    Authors: Nan Zhang, Xidan Zhang, Jianing Wei, Fangjun Wang, Zhiming Tan

    Abstract: This report describes the winning solution to the WeatherProof Dataset Challenge (CVPR 2024 UG2+ Track 3). Details regarding the challenge are available at https://cvpr2024ug2challenge.github.io/track3.html. We propose an enhanced semantic segmentation pipeline for this challenge. Firstly, we improve semantic segmentation models, using backbone pretrained with Depth Anything to improve UperNet mod… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  6. arXiv:2406.03711  [pdf, other

    physics.flu-dyn cs.AI

    Pi-fusion: Physics-informed diffusion model for learning fluid dynamics

    Authors: Jing Qiu, Jiancheng Huang, Xiangdong Zhang, Zeng Lin, Minglei Pan, Zengding Liu, Fen Miao

    Abstract: Physics-informed deep learning has been developed as a novel paradigm for learning physical dynamics recently. While general physics-informed deep learning methods have shown early promise in learning fluid dynamics, they are difficult to generalize in arbitrary time instants in real-world scenario, where the fluid motion can be considered as a time-variant trajectory involved large-scale particle… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  7. arXiv:2406.03508  [pdf, other

    cs.LG cs.AI cs.CR

    Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders

    Authors: Tingxu Han, Weisong Sun, Ziqi Ding, Chunrong Fang, Hanwei Qian, Jiaxun Li, Zhenyu Chen, Xiangyu Zhang

    Abstract: Self-supervised learning (SSL) is increasingly attractive for pre-training encoders without requiring labeled data. Downstream tasks built on top of those pre-trained encoders can achieve nearly state-of-the-art performance. The pre-trained encoders by SSL, however, are vulnerable to backdoor attacks as demonstrated by existing studies. Numerous backdoor mitigation techniques are designed for down… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  8. arXiv:2406.03505  [pdf, other

    cs.LG cs.AI

    Dynamic and Adaptive Feature Generation with LLM

    Authors: Xinhao Zhang, Jinghan Zhang, Banafsheh Rekabdar, Yuanchun Zhou, Pengfei Wang, Kunpeng Liu

    Abstract: The representation of feature space is a crucial environment where data points get vectorized and embedded for upcoming modeling. Thus the efficacy of machine learning (ML) algorithms is closely related to the quality of feature engineering. As one of the most important techniques, feature generation transforms raw data into an optimized feature space conducive to model training and further refine… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  9. arXiv:2406.03459  [pdf, other

    cs.CV

    LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

    Authors: Qiang Chen, Xiangbo Su, Xinyu Zhang, Jian Wang, Jiahui Chen, Yunpeng Shen, Chuchu Han, Ziliang Chen, Weixiang Xu, Fanrong Li, Shan Zhang, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang

    Abstract: In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach leverages recent advanced techniques, such as training-effective techniques, e.g., improved loss and pretraining, and interleaved window and global attentions for r… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  10. arXiv:2406.03248  [pdf, other

    cs.IR cs.CL

    Large Language Models as Evaluators for Recommendation Explanations

    Authors: Xiaoyu Zhang, Yishan Li, Jiayin Wang, Bowen Sun, Weizhi Ma, Peijie Sun, Min Zhang

    Abstract: The explainability of recommender systems has attracted significant attention in academia and industry. Many efforts have been made for explainable recommendations, yet evaluating the quality of the explanations remains a challenging and unresolved issue. In recent years, leveraging LLMs as evaluators presents a promising avenue in Natural Language Processing tasks (e.g., sentiment classification,… ▽ More

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  11. arXiv:2406.03243  [pdf, other

    cs.AR cs.DC cs.LG

    Llumnix: Dynamic Scheduling for Large Language Model Serving

    Authors: Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin

    Abstract: Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are inherently heterogeneous and unpredictable in terms of resource and latency requirements, as a result of the diverse applications and the dynamic execution nature of LLMs. Existing systems are fundamen… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: To appear at OSDI '24; open-source repo will be available in June 2024

  12. arXiv:2406.02610  [pdf, other

    q-bio.QM cs.AI cs.LG

    MoFormer: Multi-objective Antimicrobial Peptide Generation Based on Conditional Transformer Joint Multi-modal Fusion Descriptor

    Authors: Li Wang, Xiangzheng Fu, Jiahao Yang, Xinyi Zhang, Xiucai Ye, Yiping Liu, Tetsuya Sakurai, Xiangxiang Zeng

    Abstract: Deep learning holds a big promise for optimizing existing peptides with more desirable properties, a critical step towards accelerating new drug discovery. Despite the recent emergence of several optimized Antimicrobial peptides(AMP) generation methods, multi-objective optimizations remain still quite challenging for the idealism-realism tradeoff. Here, we establish a multi-objective AMP synthesis… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  13. arXiv:2406.02599  [pdf, other

    cs.CR cs.AI

    Privacy-Aware Randomized Quantization via Linear Programming

    Authors: Zhongteng Cai, Xueru Zhang, Mohammad Mahdi Khalili

    Abstract: Differential privacy mechanisms such as the Gaussian or Laplace mechanism have been widely used in data analytics for preserving individual privacy. However, they are mostly designed for continuous outputs and are unsuitable for scenarios where discrete values are necessary. Although various quantization mechanisms were proposed recently to generate discrete outputs under differential privacy, the… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  14. arXiv:2406.02560  [pdf, other

    eess.AS cs.AI cs.CL cs.LG

    Less Peaky and More Accurate CTC Forced Alignment by Label Priors

    Authors: Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur

    Abstract: Connectionist temporal classification (CTC) models are known to have peaky output distributions. Such behavior is not a problem for automatic speech recognition (ASR), but it can cause inaccurate forced alignments (FA), especially at finer granularity, e.g., phoneme level. This paper aims at alleviating the peaky behavior for CTC and improve its suitability for forced alignment generation, by leve… ▽ More

    Submitted 22 April, 2024; originally announced June 2024.

    Comments: Accepted by ICASSP 2024. Github repo: https://github.com/huangruizhe/audio/tree/aligner_label_priors

  15. arXiv:2406.02039  [pdf, other

    cs.AR

    LMB: Augmenting PCIe Devices with CXL-Linked Memory Buffer

    Authors: Jiapin Wang, Xiangping Zhang, Chenlei Tang, Xiang Chen, Tao Lu

    Abstract: PCIe devices, such as SSDs and GPUs, are pivotal in modern data centers, and their value is set to grow amidst the emergence of AI and large models. However, these devices face onboard DRAM shortage issue due to internal space limitation, preventing accommodation of sufficient DRAM modules alongside flash or GPU processing chips. Current solutions either curb device-internal memory usage or supple… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  16. arXiv:2406.01976  [pdf, other

    cs.CL

    Conditional Language Learning with Context

    Authors: Xiao Zhang, Miao Li, Ji Wu

    Abstract: Language models can learn sophisticated language understanding skills from fitting raw text. They also unselectively learn useless corpus statistics and biases, especially during finetuning on domain-specific corpora. In this paper, we propose a simple modification to causal language modeling called conditional finetuning, which performs language modeling conditioned on a context. We show that a c… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: To appear at the 41st International Conference on Machine Learning (ICML 2024)

  17. arXiv:2406.01929  [pdf, other

    eess.SY cs.AI

    Fast networked data selection via distributed smoothed quantile estimation

    Authors: Xu Zhang, Marcos M. Vasconcelos

    Abstract: Collecting the most informative data from a large dataset distributed over a network is a fundamental problem in many fields, including control, signal processing and machine learning. In this paper, we establish a connection between selecting the most informative data and finding the top-$k$ elements of a multiset. The top-$k$ selection in a network can be formulated as a distributed nonsmooth co… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Submitted to the IEEE Transactions on Automatic Control

  18. arXiv:2406.01928  [pdf, other

    cs.RO

    History-Aware Planning for Risk-free Autonomous Navigation on Unknown Uneven Terrain

    Authors: Yinchuan Wang, Nianfei Du, Yongsen Qin, Xiang Zhang, Rui Song, Chaoqun Wang

    Abstract: It is challenging for the mobile robot to achieve autonomous and mapless navigation in the unknown environment with uneven terrain. In this study, we present a layered and systematic pipeline. At the local level, we maintain a tree structure that is dynamically extended with the navigation. This structure unifies the planning with the terrain identification. Besides, it contributes to explicitly i… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  19. arXiv:2406.01363  [pdf, other

    cs.CL cs.IR

    Privacy in LLM-based Recommendation: Recent Advances and Future Directions

    Authors: Sichun Luo, Wei Shao, Yuxuan Yao, Jian Xu, Mingyang Liu, Qintong Li, Bowei He, Maolin Wang, Guanzhi Deng, Hanxu Hou, Xinyi Zhang, Linqi Song

    Abstract: Nowadays, large language models (LLMs) have been integrated with conventional recommendation models to improve recommendation performance. However, while most of the existing works have focused on improving the model performance, the privacy issue has only received comparatively less attention. In this paper, we review recent advancements in privacy within LLM-based recommendation, categorizing th… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  20. arXiv:2406.01284  [pdf

    physics.med-ph cs.HC

    Extraction of Weak Surface Diaphragmatic Electromyogram Using Modified Progressive FastICA Peel-Off

    Authors: Yao Li, Dongsheng Zhao, Haowen Zhao, Xu Zhang, Min Shao

    Abstract: Diaphragmatic electromyogram (EMGdi) is a crucial electrophysiological signal that contains information about human respiration. Although it is practical to record surface EMGdi (sEMGdi) noninvasively and conveniently by placing electrodes over chest skin, extraction of such weak sEMGdi from noise condition is a challenging task, limiting its clinical use compared with esophageal EMGdi. In this pa… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2406.01281  [pdf

    physics.med-ph cs.HC

    Extraction of Maternal and fetal ECG in a non-invasive way from abdominal ECG recordings using modified Progressive FastICA Peel-off

    Authors: Yao Li, Xuanyu Luo, Haowen Zhao, Jiawen Cui, Yangfan She, Dongfang Li, Lai Jiang, Xu Zhang

    Abstract: The non-invasive abdominal electrocardiogram (AECG) gives a non-invasive way to monitor fetal well-being during pregnancy. Due to the overlap with maternal ECG (MECG) as well as potential noises from other sources, it is challenging to extract weak fetal ECG (FECG) using surface electrodes. Taking advantage of precise source separation capability of the FastICA approach combined with its constrain… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  22. arXiv:2406.01032  [pdf, other

    cs.LG cs.AI

    LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

    Authors: Junjie Xu, Zongyu Wu, Minhua Lin, Xiang Zhang, Suhang Wang

    Abstract: Recent progress in Graph Neural Networks (GNNs) has greatly enhanced the ability to model complex molecular structures for predicting properties. Nevertheless, molecular data encompasses more than just graph structures, including textual and visual information that GNNs do not handle well. To bridge this gap, we present an innovative framework that utilizes multimodal molecular data to extract ins… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  23. arXiv:2406.01028  [pdf, other

    cs.CV

    LLEMamba: Low-Light Enhancement via Relighting-Guided Mamba with Deep Unfolding Network

    Authors: Xuanqi Zhang, Haijin Zeng, Jinwang Pan, Qiangqiang Shen, Yongyong Chen

    Abstract: Transformer-based low-light enhancement methods have yielded promising performance by effectively capturing long-range dependencies in a global context. However, their elevated computational demand limits the scalability of multiple iterations in deep unfolding networks, and hence they have difficulty in flexibly balancing interpretability and distortion. To address this issue, we propose a novel… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 9pages, 7 figures

  24. arXiv:2406.01014  [pdf, other

    cs.CL cs.CV

    Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

    Authors: Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

    Abstract: Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the tw… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 11 figures, 10 Tables

  25. arXiv:2406.00983  [pdf, other

    cs.CL cs.AI

    Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

    Authors: Junyu Lu, Bo Xu, Xiaokun Zhang, Kaiyuan Liu, Dongyu Zhang, Liang Yang, Hongfei Lin

    Abstract: Current methods of toxic language detection (TLD) typically rely on specific tokens to conduct decisions, which makes them suffer from lexical bias, leading to inferior performance and generalization. Lexical bias has both "useful" and "misleading" impacts on understanding toxicity. Unfortunately, instead of distinguishing between these impacts, current debiasing methods typically eliminate them i… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  26. arXiv:2406.00939  [pdf, ps, other

    cs.IT

    Bounds on f-Divergences between Distributions within Generalized Quasi-$\varepsilon$-Neighborhood

    Authors: Xinchun Yu, Shuangqing Wei, Shao-Lun Huang, Xiao-Ping Zhang

    Abstract: A general reverse Pinsker's inequality is derived to give an upper bound on f-divergences in terms of total variational distance when two distributions are close measured under our proposed generalized local information geometry framework. In addition, relationships between two f-divergences equipped with functions that are third order differentiable are established in terms of the lower and upper… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  27. arXiv:2406.00602  [pdf, other

    cs.SE cs.PL

    From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions

    Authors: Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

    Abstract: Large Code Generation Models (LCGMs) have garnered significant attention and achieved promising results across various programming tasks. However, concerns arise regarding performance when using non-English prompts, as these models are primarily trained on English-centric corpora, and most programming language tokens resemble English. Existing benchmarks often rely on English programming questions… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 10 and a quarter pages, 6 figures

  28. arXiv:2406.00523  [pdf, other

    cs.CR

    Stealing Trust: Unraveling Blind Message Attacks in Web3 Authentication

    Authors: Kailun Yan, Xiaokuan Zhang, Wenrui Diao

    Abstract: As the field of Web3 continues its rapid expansion, the security of Web3 authentication, often the gateway to various Web3 applications, becomes increasingly crucial. Despite its widespread use as a login method by numerous Web3 applications, the security risks of Web3 authentication have not received much attention. This paper investigates the vulnerabilities in the Web3 authentication process an… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  29. arXiv:2406.00456  [pdf, other

    cs.LG cs.AI cs.CL

    Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation

    Authors: Zijie Zhong, Hanwen Liu, Xiaoya Cui, Xiaofan Zhang, Zengchang Qin

    Abstract: Integrating information from different reference data sources is a major challenge for Retrieval-Augmented Generation (RAG) systems because each knowledge source adopts a unique data structure and follows different conventions. Retrieving from multiple knowledge sources with one fixed strategy usually leads to under-exploitation of information. To mitigate this drawback, inspired by Mix-of-Expert,… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 17 pages, 6 figures and 8 tables

  30. arXiv:2406.00380  [pdf, other

    cs.CL cs.AI

    The Best of Both Worlds: Toward an Honest and Helpful Large Language Model

    Authors: Chujie Gao, Qihui Zhang, Dongping Chen, Yue Huang, Siyuan Wu, Zhengyan Fu, Yao Wan, Xiangliang Zhang, Lichao Sun

    Abstract: Large Language Models (LLMs) have achieved remarkable success across various industries due to their exceptional generative capabilities. However, for safe and effective real-world deployments, ensuring honesty and helpfulness is critical. This paper addresses the question: Can we prioritize the helpfulness of LLMs while preserving their honesty? To begin with, we establish exhaustive principles a… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  31. arXiv:2406.00333  [pdf, other

    cs.IR

    A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation

    Authors: Dugang Liu, Shenxian Xian, Xiaolin Lin, Xiaolian Zhang, Hong Zhu, Yuan Fang, Zhen Chen, Zhong Ming

    Abstract: The training paradigm integrating large language models (LLM) is gradually reshaping sequential recommender systems (SRS) and has shown promising results. However, most existing LLM-enhanced methods rely on rich textual information on the item side and instance-level supervised fine-tuning (SFT) to inject collaborative information into LLM, which is inefficient and limited in many applications. To… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  32. arXiv:2406.00276  [pdf

    cs.LG cs.AI cs.CE physics.data-an

    Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

    Authors: Shengyu Tao, Mengtian Zhang, Zixi Zhao, Haoyang Li, Ruifei Ma, Yunhong Che, Xin Sun, Lin Su, Xiangyu Chen, Zihao Zhou, Heng Chang, Tingwei Cao, Xiao Xiao, Yaojun Liu, Wenjun Yu, Zhongling Xu, Yang Li, Han Hao, Xuan Zhang, Xiaosong Hu, Guangmin ZHou

    Abstract: Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed mac… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

    ACM Class: J.2; G.3

  33. arXiv:2405.21074  [pdf, other

    cs.CV

    Latent Intrinsics Emerge from Training to Relight

    Authors: Xiao Zhang, William Gao, Seemandhar Jain, Michael Maire, David. A. Forsyth, Anand Bhattad

    Abstract: Image relighting is the task of showing what a scene from a source image would look like if illuminated differently. Inverse graphics schemes recover an explicit representation of geometry and a set of chosen intrinsics, then relight with some form of renderer. However error control for inverse graphics is difficult, and inverse graphics methods can represent only the effects of the chosen intrins… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  34. arXiv:2405.21050  [pdf, other

    cs.CV cs.LG

    Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models

    Authors: Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Dimitris N. Metaxas

    Abstract: Adapting large-scale pre-trained generative models in a parameter-efficient manner is gaining traction. Traditional methods like low rank adaptation achieve parameter efficiency by imposing constraints but may not be optimal for tasks requiring high representation capacity. We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and the… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  35. arXiv:2405.20681  [pdf, other

    cs.CR cs.AI

    No Free Lunch Theorem for Privacy-Preserving LLM Inference

    Authors: Xiaojin Zhang, Yulin Fei, Yan Kang, Wei Chen, Lixin Fan, Hai Jin, Qiang Yang

    Abstract: Individuals and businesses have been significantly benefited by Large Language Models (LLMs) including PaLM, Gemini and ChatGPT in various ways. For example, LLMs enhance productivity, reduce costs, and enable us to focus on more valuable tasks. Furthermore, LLMs possess the capacity to sift through extensive datasets, uncover underlying patterns, and furnish critical insights that propel the fron… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  36. arXiv:2405.20535  [pdf, other

    cs.AI cs.CL

    Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

    Authors: Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye, Xianjun Yang, Lichang Chen, William Yang Wang, Linda Ruth Petzold

    Abstract: Instruction Fine-Tuning (IFT) significantly enhances the zero-shot capabilities of pretrained Large Language Models (LLMs). While coding data is known to boost reasoning abilities during LLM pretraining, its role in activating internal reasoning capacities during IFT remains understudied. This paper investigates a key question: How does coding data impact LLMs' reasoning capacities during the IFT… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  37. arXiv:2405.20072  [pdf, other

    cs.CV

    Faces of the Mind: Unveiling Mental Health States Through Facial Expressions in 11,427 Adolescents

    Authors: Xiao Xu, Keyin Zhou, Yan Zhang, Yang Wang, Fei Wang, Xizhe Zhang

    Abstract: Mood disorders, including depression and anxiety, often manifest through facial expressions. While previous research has explored the connection between facial features and emotions, machine learning algorithms for estimating mood disorder severity have been hindered by small datasets and limited real-world application. To address this gap, we analyzed facial videos of 11,427 participants, a datas… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  38. arXiv:2405.20032  [pdf, other

    cs.NI cs.AI cs.MM

    Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion

    Authors: Jiangkai Wu, Liming Liu, Yunpeng Tan, Junlin Hao, Xinggong Zhang

    Abstract: With the exponential growth of video traffic, traditional video streaming systems are approaching their limits in compression efficiency and communication capacity. To further reduce bitrate while maintaining quality, we propose Promptus, a disruptive novel system that streaming prompts instead of video content with Stable Diffusion, which converts video frames into a series of "prompts" for deliv… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  39. arXiv:2405.19885  [pdf, other

    cs.LG cs.RO

    Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning

    Authors: Hengkai Tan, Songming Liu, Kai Ma, Chengyang Ying, Xingxing Zhang, Hang Su, Jun Zhu

    Abstract: Transformer has shown promise in reinforcement learning to model time-varying features for obtaining generalized low-level robot policies on diverse robotics datasets in embodied learning. However, it still suffers from the issues of low data efficiency and high inference latency. In this paper, we propose to investigate the task from a new perspective of the frequency domain. We first observe tha… ▽ More

    Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  40. arXiv:2405.19856  [pdf, other

    cs.CL cs.SE

    DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories

    Authors: Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Huanyu Liu, Hao Zhu, Lecheng Wang, Kaibo Liu, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yuqi Zhu, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li

    Abstract: How to evaluate the coding abilities of Large Language Models (LLMs) remains an open question. We find that existing benchmarks are poorly aligned with real-world code repositories and are insufficient to evaluate the coding abilities of LLMs. To address the knowledge gap, we propose a new benchmark named DevEval, which has three advances. (1) DevEval aligns with real-world repositories in multi… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). arXiv admin note: substantial text overlap with arXiv:2404.00599, arXiv:2401.06401

  41. arXiv:2405.19730  [pdf

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun , et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  42. arXiv:2405.19677  [pdf, other

    cs.CR cs.AI

    Large Language Model Watermark Stealing With Mixed Integer Programming

    Authors: Zhaoxi Zhang, Xiaomei Zhang, Yanjun Zhang, Leo Yu Zhang, Chao Chen, Shengshan Hu, Asif Gill, Shirui Pan

    Abstract: The Large Language Model (LLM) watermark is a newly emerging technique that shows promise in addressing concerns surrounding LLM copyright, monitoring AI-generated text, and preventing its misuse. The LLM watermark scheme commonly includes generating secret keys to partition the vocabulary into green and red lists, applying a perturbation to the logits of tokens in the green list to increase their… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 12 pages

  43. arXiv:2405.19650  [pdf, other

    cs.LG cs.AI cs.NE math.OC

    Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization

    Authors: Xi Lin, Yilu Liu, Xiaoyuan Zhang, Fei Liu, Zhenkun Wang, Qingfu Zhang

    Abstract: Multi-objective optimization can be found in many real-world applications where some conflicting objectives can not be optimized by a single solution. Existing optimization methods often focus on finding a set of Pareto solutions with different optimal trade-offs among the objectives. However, the required number of solutions to well approximate the whole Pareto optimal set could be exponentially… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  44. arXiv:2405.19649  [pdf, ps, other

    cs.LG cs.SI stat.ML

    Towards Deeper Understanding of PPR-based Embedding Approaches: A Topological Perspective

    Authors: Xingyi Zhang, Zixuan Weng, Sibo Wang

    Abstract: Node embedding learns low-dimensional vectors for nodes in the graph. Recent state-of-the-art embedding approaches take Personalized PageRank (PPR) as the proximity measure and factorize the PPR matrix or its adaptation to generate embeddings. However, little previous work analyzes what information is encoded by these approaches, and how the information correlates with their superb performance in… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  45. arXiv:2405.19581  [pdf, other

    cs.SE cs.AI

    Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases

    Authors: Zian Su, Xiangzhe Xu, Ziyang Huang, Kaiyuan Zhang, Xiangyu Zhang

    Abstract: Human-Oriented Binary Reverse Engineering (HOBRE) lies at the intersection of binary and source code, aiming to lift binary code to human-readable content relevant to source code, thereby bridging the binary-source semantic gap. Recent advancements in uni-modal code model pre-training, particularly in generative Source Code Foundation Models (SCFMs) and binary understanding models, have laid the g… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  46. arXiv:2405.19444  [pdf, other

    cs.AI

    MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

    Authors: Zhenwen Liang, Dian Yu, Wenhao Yu, Wenlin Yao, Zhihan Zhang, Xiangliang Zhang, Dong Yu

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities in mathematical problem solving, particularly in single turn question answering formats. However, real world scenarios often involve mathematical question answering that requires multi turn or interactive information exchanges, and the performance of LLMs on these tasks is still underexplored. This paper introduces MathChat, a… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  47. arXiv:2405.19363  [pdf, other

    eess.SP cs.AI cs.LG

    Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification

    Authors: Yihe Wang, Nan Huang, Taida Li, Yujun Yan, Xiang Zhang

    Abstract: Medical time series data, such as Electroencephalography (EEG) and Electrocardiography (ECG), play a crucial role in healthcare, such as diagnosing brain and heart diseases. Existing methods for medical time series classification primarily rely on handcrafted biomarkers extraction and CNN-based models, with limited exploration of transformers tailored for medical time series. In this paper, we int… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 20pages (14 pages main paper + 6 pages supplementary materials)

  48. arXiv:2405.19327  [pdf, other

    cs.CL cs.AI cs.LG

    MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

    Authors: Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu , et al. (20 additional authors not shown)

    Abstract: Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparabl… ▽ More

    Submitted 2 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: https://map-neo.github.io/

  49. arXiv:2405.18955  [pdf, other

    cs.CV

    RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision

    Authors: Jinzhong Wang, Xuetao Tian, Shun Dai, Tao Zhuo, Haorui Zeng, Hongjuan Liu, Jiaqi Liu, Xiuwei Zhang, Yanning Zhang

    Abstract: Multispectral object detection, utilizing both visible (RGB) and thermal infrared (T) modals, has garnered significant attention for its robust performance across diverse weather and lighting conditions. However, effectively exploiting the complementarity between RGB-T modals while maintaining efficiency remains a critical challenge. In this paper, a very simple Group Shuffled Multi-receptive Atte… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  50. arXiv:2405.18524  [pdf, other

    cs.CV

    Aligning in a Compact Space: Contrastive Knowledge Distillation between Heterogeneous Architectures

    Authors: Hongjun Wu, Li Xiao, Xingkuo Zhang, Yining Miao

    Abstract: Knowledge distillation is commonly employed to compress neural networks, reducing the inference costs and memory footprint. In the scenario of homogenous architecture, feature-based methods have been widely validated for their effectiveness. However, in scenarios where the teacher and student models are of heterogeneous architectures, the inherent differences in feature representation significantl… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, conference paper