Search | arXiv e-print repository

Showing 1–50 of 313 results for author: Cai, X

Searching in archive cs.
  1. arXiv:2409.01524  [pdf, other]

    cs.CL cs.AI

    S$^3$c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners

    Authors: Yuchen Yan, Jin Jiang, Yang Liu, Yixin Cao, Xin Xu, Mengdi Zhang, Xunliang Cai, Jian Shao

    Abstract: Self-correction is a novel method that can stimulate the potential reasoning abilities of large language models (LLMs). It involves detecting and correcting errors during the inference process when LLMs solve reasoning problems. However, recent works do not regard self-correction as a spontaneous and intrinsic capability of LLMs. Instead, such correction is achieved through post-hoc generation, ex… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  2. arXiv:2408.17073  [pdf, other]

    eess.IV cs.CV

    Approximately Invertible Neural Network for Learned Image Compression

    Authors: Yanbo Gao, Meng Fu, Shuai Li, Chong Lv, Xun Cai, Hui Yuan, Mao Ye

    Abstract: Learned image compression has attracted considerable interest in recent years. It typically comprises an analysis transform, a synthesis transform, quantization and an entropy coding model. The analysis transform and synthesis transform are used to encode an image to latent feature and decode the quantized feature to reconstruct the image, and can be regarded as coupled transforms. However, the… ▽ More

    Submitted 30 August, 2024; originally announced August 2024.

  3. arXiv:2408.16219  [pdf, other]

    cs.CV

    Training-free Video Temporal Grounding using Large-scale Pre-trained Models

    Authors: Minghang Zheng, Xinhao Cai, Qingchao Chen, Yuxin Peng, Yang Liu

    Abstract: Video temporal grounding aims to identify video segments within untrimmed videos that are most relevant to a given natural language query. Existing video temporal localization models rely on specific datasets for training and have high data collection costs, but they exhibit poor generalization capability under the across-dataset and out-of-distribution (OOD) settings. In this paper, we propose a… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  4. arXiv:2408.15545  [pdf, other]

    cs.LG cs.CL

    SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

    Authors: Sihang Li, Jin Huang, Jiaxi Zhuang, Yaorui Shi, Xiaochen Cai, Mingjun Xu, Xiang Wang, Linfeng Zhang, Guolin Ke, Hengxing Cai

    Abstract: Scientific literature understanding is crucial for extracting targeted information and garnering insights, thereby significantly advancing scientific discovery. Despite the remarkable success of Large Language Models (LLMs), they face challenges in scientific literature understanding, primarily due to (1) a lack of scientific knowledge and (2) unfamiliarity with specialized scientific tasks. To… ▽ More

    Submitted 30 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  5. arXiv:2408.15496  [pdf, other]

    cs.CL

    ReMamba: Equip Mamba with Effective Long-Sequence Modeling

    Authors: Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao

    Abstract: While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models. In this study, we investigate the long-context efficiency issues of the Mamba models and propose ReMamba, which enhances Mam… ▽ More

    Submitted 1 September, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

  6. arXiv:2408.12708  [pdf, other]

    cs.CV

    Revisiting Cross-Domain Problem for LiDAR-based 3D Object Detection

    Authors: Ruixiao Zhang, Juheon Lee, Xiaohao Cai, Adam Prugel-Bennett

    Abstract: Deep learning models such as convolutional neural networks and transformers have been widely applied to solve 3D object detection problems in the domain of autonomous driving. While existing models have achieved outstanding performance on most open benchmarks, the generalization ability of these deep networks is still in doubt. To adapt models to other domains including different cities, countries… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by ICONIP 2024

  7. arXiv:2408.11397  [pdf, other]

    cs.CV

    EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning

    Authors: Zhihao Li, Yao Du, Yang Liu, Yan Zhang, Yufang Liu, Mengdi Zhang, Xunliang Cai

    Abstract: Multi-modal Large Language Models have recently experienced rapid developments and excel in various multi-modal tasks. However, they still struggle with mathematical geometric problem solving, which requires exceptional visual perception proficiency. Existing MLLMs mostly optimize the LLM backbone to acquire geometric reasoning capabilities, while rarely emphasizing improvements in visual comprehe… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

  8. arXiv:2408.10520  [pdf, other]

    cs.IR

    Efficient and Deployable Knowledge Infusion for Open-World Recommendations via Large Language Models

    Authors: Yunjia Xi, Weiwen Liu, Jianghao Lin, Muyan Weng, Xiaoling Cai, Hong Zhu, Jieming Zhu, Bo Chen, Ruiming Tang, Yong Yu, Weinan Zhang

    Abstract: Recommender systems (RSs) play a pervasive role in today's online services, yet their closed-loop nature constrains their access to open-world knowledge. Recently, large language models (LLMs) have shown promise in bridging this gap. However, previous attempts to directly implement LLMs as recommenders fall short in meeting the requirements of industrial RSs, particularly in terms of online infere… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.10933

  9. arXiv:2408.09739  [pdf, other]

    cs.CV

    TraDiffusion: Trajectory-Based Training-Free Image Generation

    Authors: Mingrui Wu, Oucheng Huang, Jiayi Ji, Jiale Li, Xinyue Cai, Huafeng Kuang, Jianzhuang Liu, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this work, we propose a training-free, trajectory-based controllable T2I approach, termed TraDiffusion. This novel method allows users to effortlessly guide image generation via mouse trajectories. To achieve precise control, we design a distance awareness energy function to effectively guide latent variables, ensuring that the focus of generation is within the areas defined by the trajectory.… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: The code: https://github.com/och-mac/TraDiffusion

  10. arXiv:2408.09106  [pdf, other]

    q-bio.BM cs.AI

    Fragment-Masked Molecular Optimization

    Authors: Kun Li, Xiantao Cai, Jia Wu, Bo Du, Wenbin Hu

    Abstract: Molecular optimization is a crucial aspect of drug discovery, aimed at refining molecular structures to enhance drug efficacy and minimize side effects, ultimately accelerating the overall drug development process. Many target-based molecular optimization methods have been proposed, significantly advancing drug discovery. These methods primarily focus on understanding the specific drug target structures… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 11 pages, 5 figures, 2 tables

  11. arXiv:2408.08913  [pdf, other]

    cs.IR

    MLoRA: Multi-Domain Low-Rank Adaptive Network for CTR Prediction

    Authors: Zhiming Yang, Haining Gao, Dehong Gao, Luwei Yang, Libin Yang, Xiaoyan Cai, Wei Ning, Guannan Zhang

    Abstract: Click-through rate (CTR) prediction is one of the fundamental tasks in the industry, especially in e-commerce, social media, and streaming media. It directly impacts website revenues, user satisfaction, and user retention. However, real-world production platforms often encompass various domains to cater for diverse customer needs. Traditional CTR prediction models struggle in multi-domain recommen… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 11 pages. Accepted by RecSys'2024, full paper

  12. arXiv:2408.07536  [pdf, other]

    cs.NI

    Context-aware Container Orchestration in Serverless Edge Computing

    Authors: Peiyuan Guan, Chen Chen, Ziru Chen, Lin X. Cai, Xing Hao, Amir Taherkordi

    Abstract: Adopting serverless computing to edge networks benefits end-users from the pay-as-you-use billing model and flexible scaling of applications. This paradigm extends the boundaries of edge computing and remarkably improves the quality of services. However, due to the heterogeneous nature of computing and bandwidth resources in edge networks, it is challenging to dynamically allocate different resour… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by the IEEE GLOBECOM 2024 Conference

  13. arXiv:2408.05671  [pdf]

    cs.CE

    Research on Heterogeneous Computation Resource Allocation based on Data-driven Method

    Authors: Xirui Tang, Zeyu Wang, Xiaowei Cai, Honghua Su, Changsong Wei

    Abstract: The rapid development of the mobile Internet and the Internet of Things is leading to a diversification of user devices and the emergence of new mobile applications on a regular basis. Such applications include those that are computationally intensive, such as pattern recognition, interactive gaming, virtual reality, and augmented reality. However, the computing and energy resources available on t… ▽ More

    Submitted 10 August, 2024; originally announced August 2024.

  14. PRISM: PRogressive dependency maxImization for Scale-invariant image Matching

    Authors: Xudong Cai, Yongcai Wang, Lun Luo, Minhang Wang, Deying Li, Jintao Xu, Weihao Gu, Rui Ai

    Abstract: Image matching aims at identifying corresponding points between a pair of images. Currently, detector-free methods have shown impressive performance in challenging scenarios, thanks to their capability of generating dense matches and global receptive field. However, performing feature interaction and proposing matches across the entire image is unnecessary, because not all image regions contribute… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 15 pages, 8 figures, ACM MM 2024. Supplementary materials are included

  15. arXiv:2408.03446  [pdf, other]

    cs.NI eess.SP

    Optimizing NOMA Transmissions to Advance Federated Learning in Vehicular Networks

    Authors: Ziru Chen, Zhou Ni, Peiyuan Guan, Lu Wang, Lin X. Cai, Morteza Hashemi, Zongzhi Li

    Abstract: Diverse critical data, such as location information and driving patterns, can be collected by IoT devices in vehicular networks to improve driving experiences and road safety. However, drivers are often reluctant to share their data due to privacy concerns. The Federated Vehicular Network (FVN) is a promising technology that tackles these concerns by transmitting model parameters instead of raw da… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: The paper is accepted by IEEE Globecom 2024

  16. arXiv:2408.03302  [pdf, other]

    cs.CV

    TextIM: Part-aware Interactive Motion Synthesis from Text

    Authors: Siyuan Fan, Bo Du, Xiantao Cai, Bo Peng, Longling Sun

    Abstract: In this work, we propose TextIM, a novel framework for synthesizing TEXT-driven human Interactive Motions, with a focus on the precise alignment of part-level semantics. Existing methods often overlook the critical roles of interactive body parts and fail to adequately capture and align part-level semantics, resulting in inaccuracies and even erroneous movement outcomes. To address these issues, T… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

  17. arXiv:2408.02632  [pdf, other]

    cs.CL cs.AI

    SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models

    Authors: Muxi Diao, Rumei Li, Shiyang Liu, Guogang Liao, Jingang Wang, Xunliang Cai, Weiran Xu

    Abstract: As large language models (LLMs) continue to advance in capability and influence, ensuring their security and preventing harmful outputs has become crucial. A promising approach to address these concerns involves training models to automatically generate adversarial prompts for red teaming. However, the evolving subtlety of vulnerabilities in LLMs challenges the effectiveness of current adversarial… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

  18. arXiv:2408.01355  [pdf, other]

    cs.CV cs.MM

    Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs

    Authors: Peng Ding, Jingyu Wu, Jun Kuang, Dan Ma, Xuezhi Cao, Xunliang Cai, Shi Chen, Jiajun Chen, Shujian Huang

    Abstract: Multi-modal Large Language Models (MLLMs) have demonstrated remarkable performance on various visual-language understanding and generation tasks. However, MLLMs occasionally generate content inconsistent with the given images, which is known as "hallucination". Prior works primarily center on evaluating hallucination using standard, unperturbed benchmarks, which overlook the prevalent occurrence o… ▽ More

    Submitted 4 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024, 14 pages, 11 figures, 9 tables

  19. arXiv:2408.01107  [pdf, other]

    cs.CL cs.AI cs.IR

    BioRAG: A RAG-LLM Framework for Biological Question Reasoning

    Authors: Chengrui Wang, Qingqing Long, Meng Xiao, Xunxin Cai, Chengjun Wu, Zhen Meng, Xuezhi Wang, Yuanchun Zhou

    Abstract: The question-answering system for Life science research, which is characterized by the rapid pace of discovery, evolving insights, and complex interactions among knowledge entities, presents unique challenges in maintaining a comprehensive knowledge warehouse and accurate information retrieval. To address these issues, we introduce BioRAG, a novel Retrieval-Augmented Generation (RAG) with the Larg… ▽ More

    Submitted 14 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures

  20. arXiv:2407.21534  [pdf, other]

    cs.CV

    ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models

    Authors: Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this work, we propose a training-free method to inject visual referring into Multimodal Large Language Models (MLLMs) through learnable visual token optimization. We observe the relationship between text prompt tokens and visual tokens in MLLMs, where attention layers model the connection between them. Our approach involves adjusting visual tokens from the MLP output during inference, controlli… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/mrwu-mac/ControlMLLM

  21. arXiv:2407.19147  [pdf, other]

    quant-ph cs.CR

    Reexamination of the realtime protection for user privacy in practical quantum private query

    Authors: Chun-Yan Wei, Xiao-Qiu Cai, Tian-Yin Wang

    Abstract: Quantum private query (QPQ) is the quantum version for symmetrically private retrieval. However, the user privacy in QPQ is generally guarded in the non-realtime and cheat sensitive way. That is, the dishonest database holder's cheating to elicit user privacy can only be discovered after the protocol is finished (when the user finds some errors in the retrieved database item). Such delayed detecti… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

  22. arXiv:2407.16207  [pdf, other]

    cs.CL

    Graph-Structured Speculative Decoding

    Authors: Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness of this approach heavily relies on the balance between performance and efficiency of the draft model. In our research, we focus on enhancing the proportion of d… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  23. arXiv:2407.15355  [pdf, other]

    cs.CV

    Attention Beats Linear for Fast Implicit Neural Representation Generation

    Authors: Shuyi Zhang, Ke Liu, Jingjun Gu, Xiaoxu Cai, Zhihua Wang, Jiajun Bu, Haishuai Wang

    Abstract: Implicit Neural Representation (INR) has gained increasing popularity as a data representation method, serving as a prerequisite for innovative generation models. Unlike gradient-based methods, which exhibit lower efficiency in inference, the adoption of hyper-network for generating parameters in Multi-Layer Perceptrons (MLP), responsible for executing INR functions, has surfaced as a promising an… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  24. arXiv:2407.14239  [pdf, other]

    cs.AI

    KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models

    Authors: Kemou Jiang, Xuan Cai, Zhiyong Cui, Aoyong Li, Yilong Ren, Haiyang Yu, Hao Yang, Daocheng Fu, Licheng Wen, Pinlong Cai

    Abstract: Large language models (LLMs) as autonomous agents offer a novel avenue for tackling real-world challenges through a knowledge-driven manner. These LLM-enhanced methodologies excel in generalization and interpretability. However, the complexity of driving tasks often necessitates the collaboration of multiple, heterogeneous agents, underscoring the need for such LLM-driven agents to engage in coope… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages, 18 figures

  25. arXiv:2407.09360  [pdf, other]

    cs.LG math.OC

    Novel clustered federated learning based on local loss

    Authors: Endong Gu, Yongxin Chen, Hao Wen, Xingju Cai, Deren Han

    Abstract: This paper proposes LCFL, a novel clustering metric for evaluating clients' data distributions in federated learning. LCFL aligns with federated learning requirements, accurately assessing client-to-client variations in data distribution. It offers advantages over existing clustered federated learning methods, addressing privacy concerns, improving applicability to non-convex models, and providing… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  26. arXiv:2407.06162  [pdf, other]

    cs.CV cs.AI cs.LG

    RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model

    Authors: Khaled Alomar, Halil Ibrahim Aysel, Xiaohao Cai

    Abstract: Human Action Recognition (HAR) encompasses the task of monitoring human activities across various domains, including but not limited to medical, educational, entertainment, visual surveillance, video retrieval, and the identification of anomalous activities. Over the past decade, the field of HAR has witnessed substantial progress by leveraging Convolutional Neural Networks (CNNs) to effectively e… ▽ More

    Submitted 15 August, 2024; v1 submitted 2 June, 2024; originally announced July 2024.

  27. arXiv:2407.06153  [pdf, other]

    cs.SE cs.CL

    What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

    Authors: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

    Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundar… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 17 pages, 7 figures

  28. arXiv:2407.05765  [pdf, other]

    cs.CV

    Enlarging Feature Support Overlap for Domain Generalization

    Authors: Yaoyao Zhu, Xiuding Cai, Dong Miao, Yu Yao, Zhongliang Fu

    Abstract: Deep models often struggle with out-of-distribution (OOD) generalization, limiting their real-world applicability beyond controlled laboratory settings. Invariant risk minimization (IRM) addresses this issue by learning invariant features and minimizing the risk across different domains. Thus, it avoids the pitfalls of pseudo-invariant features and spurious causality associated with empirical risk… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  29. arXiv:2407.04844  [pdf, other]

    cs.CV cs.AI

    Neural varifolds: an aggregate representation for quantifying the geometry of point clouds

    Authors: Juheon Lee, Xiaohao Cai, Carola-Bibian Schönlieb, Simon Masnou

    Abstract: Point clouds are popular 3D representations for real-life objects (such as in LiDAR and Kinect) due to their detailed and compact representation of surface-based geometry. Recent approaches characterise the geometry of point clouds by bringing deep learning based techniques together with geometric fidelity metrics such as optimal transportation costs (e.g., Chamfer and Wasserstein metrics). In thi… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: The first author, Juheon Lee, is an unaffiliated, independent researcher. This work is a personal endeavor, unrelated to his current job

  30. arXiv:2407.04061  [pdf, other]

    cs.CV

    Detect Closer Surfaces that can be Seen: New Modeling and Evaluation in Cross-domain 3D Object Detection

    Authors: Ruixiao Zhang, Yihong Wu, Juheon Lee, Adam Prugel-Bennett, Xiaohao Cai

    Abstract: The performance of domain adaptation technologies has not yet reached an ideal level in the current 3D object detection field for autonomous driving, which is mainly due to significant differences in the size of vehicles, as well as the environments they operate in when applied across domains. These factors together hinder the effective transfer and application of knowledge learned from specific d… ▽ More

    Submitted 12 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by the 27th European Conference on Artificial Intelligence (ECAI 2024)

  31. arXiv:2407.03663  [pdf, other]

    cs.CV

    Limited-View Photoacoustic Imaging Reconstruction Via High-quality Self-supervised Neural Representation

    Authors: Youshen Xiao, Yuting Shen, Bowei Yao, Xiran Cai, Yuyao Zhang, Fei Gao

    Abstract: In practical applications within the human body, it is often challenging to fully encompass the target tissue or organ, necessitating the use of limited-view arrays, which can lead to the loss of crucial information. Addressing the reconstruction of photoacoustic sensor signals in limited-view detection spaces has become a focal point of current research. In this study, we introduce a self-supervi… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  32. arXiv:2407.02867  [pdf, other]

    cs.MM cs.CL

    Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion

    Authors: Yu Zhao, Ying Zhang, Baohang Zhou, Xinying Qian, Kehui Song, Xiangrui Cai

    Abstract: A large number of studies have emerged for Multimodal Knowledge Graph Completion (MKGC) to predict the missing links in MKGs. However, fewer studies have been proposed to study the inductive MKGC (IMKGC) involving emerging entities unseen during training. Existing inductive approaches focus on learning textual entity representations, which neglect rich semantic information in visual modality. More… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by SIGIR 2024

  33. arXiv:2407.00072  [pdf, other]

    cs.IR cs.CL

    Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

    Authors: Yu Bai, Yukai Miao, Li Chen, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

    Abstract: In Greek mythology, Pistis symbolized good faith, trust, and reliability. Drawing inspiration from these principles, Pistis-RAG is a scalable multi-stage framework designed to address the challenges of large-scale retrieval-augmented generation (RAG) systems. This framework consists of distinct stages: matching, pre-ranking, ranking, reasoning, and aggregating. Each stage contributes to narrowing… ▽ More

    Submitted 1 August, 2024; v1 submitted 21 June, 2024; originally announced July 2024.

  34. arXiv:2406.18938  [pdf, other]

    cs.IR

    Towards Personalized Federated Multi-Scenario Multi-Task Recommendation

    Authors: Yue Ding, Yanbiao Ji, Xun Cai, Xin Xin, Yuxiang Lu, Suizhi Huang, Chang Liu, Xiaofeng Gao, Tsuyoshi Murata, Hongtao Lu

    Abstract: In modern recommender systems, especially in e-commerce, predicting multiple targets such as click-through rate (CTR) and post-view conversion rate (CTCVR) is common. Multi-task recommender systems are increasingly popular in both research and practice, as they leverage shared knowledge across diverse business scenarios to enhance performance. However, emerging real-world scenarios and data privac… ▽ More

    Submitted 19 August, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  35. arXiv:2406.18017  [pdf, other]

    cs.IT cs.ET

    Dependence Analysis and Structured Construction for Batched Sparse Code

    Authors: Jiaxin Qing, Xiaohong Cai, Yijun Fan, Mingyang Zhu, Raymond W. Yeung

    Abstract: In coding theory, codes are usually designed with a certain level of randomness to facilitate analysis and accommodate different channel conditions. However, the resulting random code constructed can be suboptimal in practical implementations. Represented by a bipartite graph, the Batched Sparse Code (BATS Code) is a randomly constructed erasure code that utilizes network coding to achieve near-op… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  36. arXiv:2406.16872  [pdf, other]

    eess.SP cs.AI

    Multi-channel Time Series Decomposition Network For Generalizable Sensor-Based Activity Recognition

    Authors: Jianguo Pan, Zhengxin Hu, Lingdun Zhang, Xia Cai

    Abstract: Sensor-based human activity recognition is important in daily scenarios such as smart healthcare and homes due to its non-intrusive privacy and low cost advantages, but the problem of out-of-domain generalization caused by differences in focusing individuals and operating environments can lead to significant accuracy degradation on cross-person behavior recognition due to the inconsistent distribu… ▽ More

    Submitted 28 March, 2024; originally announced June 2024.

  37. arXiv:2406.12195  [pdf, other]

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  38. arXiv:2406.10664  [pdf, other]

    cs.NI eess.SP

    A Novel Joint DRL-Based Utility Optimization for UAV Data Services

    Authors: Xuli Cai, Poonam Lohan, Burak Kantarci

    Abstract: In this paper, we propose a novel joint deep reinforcement learning (DRL)-based solution to optimize the utility of an uncrewed aerial vehicle (UAV)-assisted communication network. To maximize the number of users served within the constraints of the UAV's limited bandwidth and power resources, we employ deep Q-Networks (DQN) and deep deterministic policy gradient (DDPG) algorithms for optimal reso… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 6 pages, 9 figures

  39. arXiv:2406.04129  [pdf, other]

    cs.CV

    LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification

    Authors: Xin Cai, Hailong Zhang, Chenchen Wang, Wentao Liu, Jinwei Gu, Tianfan Xue

    Abstract: Lensless cameras, innovatively replacing traditional lenses for ultra-thin, flat optics, encode light directly onto sensors, producing images that are not immediately recognizable. This compact, lightweight, and cost-effective imaging solution offers inherent privacy advantages, making it attractive for privacy-sensitive applications like face verification. Typical lensless face verification adopt… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: under review

  40. arXiv:2406.03853  [pdf, other]

    cs.CL

    Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

    Authors: Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai

    Abstract: The recent advancements in large language models (LLMs) have been extraordinary, yet the escalating inference costs associated with them present challenges in real-world applications. To address these challenges, we propose a novel approach called Early-exiting Speculative Decoding (EESD) with lossless acceleration. Specifically, EESD utilizes a segment of the LLM to generate draft tokens, incorpo… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 (Findings)

  41. arXiv:2406.00403  [pdf, other]

    cs.LG cs.AI

    Dual-perspective Cross Contrastive Learning in Graph Transformers

    Authors: Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

    Abstract: Graph contrastive learning (GCL) is a popular method for learning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective (i.e., data- or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollabl… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE TKDE

  42. arXiv:2406.00247  [pdf, other]

    cs.IR cs.AI

    Large Language Models for Relevance Judgment in Product Search

    Authors: Navid Mehrdad, Hrushikesh Mohapatra, Mossaab Bagdouri, Prijith Chandran, Alessandro Magnani, Xunfan Cai, Ajit Puthenputhussery, Sachin Yadav, Tony Lee, ChengXiang Zhai, Ciya Liao

    Abstract: High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search, yet measuring relevance of items to queries is one of the most challenging tasks in product information retrieval, and quality of product search is highly influenced by the precision and scale of available relevance-labelled data. In this paper, we present an array of techniques for… ▽ More

    Submitted 16 July, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure, 11 tables - SIGIR 2024, LLM4Eval

    ACM Class: H.3.3; I.2.7

  43. arXiv:2405.18842  [pdf, other]

    cs.CV

    Descriptive Image Quality Assessment in the Wild

    Authors: Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue

    Abstract: With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-wor…

    Submitted 12 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  44. arXiv:2405.16440  [pdf, other]

    cs.LG cs.AI

    MambaTS: Improved Selective State Space Models for Long-term Time Series Forecasting

    Authors: Xiuding Cai, Yaoyao Zhu, Xueyao Wang, Yu Yao

    Abstract: In recent years, Transformers have become the de-facto architecture for long-term sequence forecasting (LTSF), but face challenges such as quadratic complexity and permutation invariant bias. A recent model, Mamba, based on selective state space models (SSMs), has emerged as a competitive alternative to Transformer, offering comparable performance with higher throughput and linear complexity rela…

    Submitted 26 May, 2024; originally announced May 2024.

  45. arXiv:2405.15324  [pdf, other]

    cs.RO cs.AI cs.CV

    Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

    Authors: Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: Autonomous driving has advanced significantly due to improvements in sensors, machine learning, and artificial intelligence. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitiv…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 23 pages, 16 figures

  46. arXiv:2405.14878  [pdf, other]

    eess.IV cs.CV cs.LG stat.AP

    Improving and Evaluating Machine Learning Methods for Forensic Shoeprint Matching

    Authors: Divij Jain, Saatvik Kher, Lena Liang, Yufeng Wu, Ashley Zheng, Xizhen Cai, Anna Plantinga, Elizabeth Upton

    Abstract: We propose a machine learning pipeline for forensic shoeprint pattern matching that improves on the accuracy and generalisability of existing methods. We extract 2D coordinates from shoeprint scans using edge detection and align the two shoeprints with iterative closest point (ICP). We then extract similarity metrics to quantify how well the two prints match and use these metrics to train a random…

    Submitted 2 April, 2024; originally announced May 2024.

  47. arXiv:2405.12821  [pdf, other]

    cs.RO cs.CV

    Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension

    Authors: Runwei Guan, Ruixiao Zhang, Ningwei Ouyang, Jianan Liu, Ka Lok Man, Xiaohao Cai, Ming Xu, Jeremy Smith, Eng Gee Lim, Yutao Yue, Hui Xiong

    Abstract: Embodied perception is essential for intelligent vehicles and robots in interactive environmental understanding. However, these advancements primarily focus on vision, with limited attention given to using 3D modeling sensors, restricting a comprehensive understanding of objects in response to prompts containing qualitative and quantitative queries. Recently, as a promising automotive sensor with…

    Submitted 18 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures

  48. arXiv:2405.12806  [pdf, other]

    cs.CV

    MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

    Authors: Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu

    Abstract: Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcom…

    Submitted 21 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:1710.03746 by other authors

  49. arXiv:2405.11742  [pdf, other]

    cs.MM

    Universal Organizer of SAM for Unsupervised Semantic Segmentation

    Authors: Tingting Li, Gensheng Pei, Xinhao Cai, Huafeng Liu, Qiong Wang, Yazhou Yao

    Abstract: Unsupervised semantic segmentation (USS) aims to achieve high-quality segmentation without manual pixel-level annotations. Existing USS models provide coarse category classification for regions, but the results often have blurry and imprecise edges. Recently, a robust framework called the segment anything model (SAM) has been proven to deliver precise boundary object masks. Therefore, this paper p…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: accepted by IEEE International Conference on Multimedia & Expo

  50. arXiv:2405.10691  [pdf, other]

    eess.IV cs.CV

    LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

    Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

    Abstract: The infant brain undergoes rapid development in the first few years after birth. Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infant brain development with higher accuracy, statistical power and flexibility. However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets wit…

    Submitted 17 May, 2024; originally announced May 2024.