(Translated by https://www.hiragana.jp/)
Search | arXiv e-print repository
Skip to main content

Showing 1–50 of 616 results for author: Xiong, Y

.
  1. arXiv:2407.21059  [pdf, other

    cs.CL cs.AI cs.IR

    Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks

    Authors: Yunfan Gao, Yun Xiong, Meng Wang, Haofen Wang

    Abstract: Retrieval-augmented Generation (RAG) has markedly enhanced the capabilities of Large Language Models (LLMs) in tackling knowledge-intensive tasks. The increasing demands of application scenarios have driven the evolution of RAG, leading to the integration of advanced retrievers, LLMs and other complementary technologies, which in turn has amplified the intricacy of RAG systems. However, the rapid… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2407.19826  [pdf, other

    cs.RO

    Design and Control of a Novel Six-Degree-of-Freedom Hybrid Robotic Arm

    Authors: Yang Chen, Zhonghua Miao, Yuanyue Ge, Sen lin, Liping Chen, Ya Xiong

    Abstract: Robotic arms are key components in fruit-harvesting robots. In agricultural settings, conventional serial or parallel robotic arms often fall short in meeting the demands for a large workspace, rapid movement, enhanced capability of obstacle avoidance and affordability. This study proposes a novel hybrid six-degree-of-freedom (DoF) robotic arm that combines the advantages of parallel and serial me… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by IROS 2024

  3. arXiv:2407.19254  [pdf, ps, other

    math.CV

    Convexity of the Bergman Kernels on Convex Domains

    Authors: Yuanpu Xiong

    Abstract: Let $Ωおめが$ be a convex domain in $\mathbb{C}^n$ and $\varphi$ a convex function on $Ωおめが$. We prove that $\log{K_{Ωおめが,\varphi}(z)}$ is a convex function (might be identically $-\infty$) on $Ωおめが$, where $K_{Ωおめが,\varphi}$ is the weighted Bergman kernel. In particular, $\log{K_Ωおめが(z)}$ is a convex function if $Ωおめが$ is convex. We further show that $\log{K_Ωおめが(z)}$ is strictly convex if and only if $Ωおめが$ does not contain… ▽ More

    Submitted 3 August, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

    Comments: 9 pages. Comments welcome!

  4. arXiv:2407.18877  [pdf, other

    cs.SE

    Code Structure-Aware through Line-level Semantic Learning for Code Vulnerability Detection

    Authors: Ziliang Wang, Ge Li, Jia Li, Yihong Dong, Yingfei Xiong, Zhi Jin

    Abstract: Different from the flow semantics of natural languages, programming languages are inherently rigid in structure and grammar. Existing fine-tuning methodologies for code vulnerability detection generally treat code as long text sequences, stripping away structural elements such as newlines ('/n') and whitespace. However, this approach inadvertently results in the loss of crucial structural informat… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

  5. arXiv:2407.18523  [pdf, other

    cs.LG

    DTFormer: A Transformer-Based Method for Discrete-Time Dynamic Graph Representation Learning

    Authors: Xi Chen, Yun Xiong, Siwei Zhang, Jiawei Zhang, Yao Zhang, Shiyang Zhou, Xixi Wu, Mingyang Zhang, Tengfei Liu, Weiqiang Wang

    Abstract: Discrete-Time Dynamic Graphs (DTDGs), which are prevalent in real-world implementations and notable for their ease of data acquisition, have garnered considerable attention from both academic researchers and industry practitioners. The representation learning of DTDGs has been extensively applied to model the dynamics of temporally changing entities and their evolving connections. Currently, DTDG… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  6. arXiv:2407.17337  [pdf, ps, other

    cond-mat.supr-con

    Raman Spectroscopic Study on Bi2Rh3Se2: Two-dimensional-Ising Charge Density Wave and Quantum Fluctuations

    Authors: Fei Jiao, Yonghui Zhou, Shuyang Wang, Chao An, Xuliang Chen, Ying Zhou, Min Zhang, Liang Cao, Xigang Luo, Yimin Xiong, Zhaorong Yang

    Abstract: The ternary chalcogenide Bi2Rh3Se2 was found to be a charge density wave (CDW) superconductor with a 2*2 periodicity. The key questions regarding the underlying mechanism of CDW state and its interplay with lattice and electronic properties remains to be explored. Here, based on the systematic Raman scattering investigations on single crystalline Bi2Rh3Se2, we observed the fingerprinting feature o… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  7. arXiv:2407.16308  [pdf, other

    cs.CV eess.IV

    SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging

    Authors: Lingtong Kong, Bo Li, Yike Xiong, Hao Zhang, Hong Gu, Jinwei Chen

    Abstract: Multi-exposure High Dynamic Range (HDR) imaging is a challenging task when facing truncated texture and complex motion. Existing deep learning-based methods have achieved great success by either following the alignment and fusion pipeline or utilizing attention mechanism. However, the large computation cost and inference delay hinder them from deploying on resource limited devices. In this paper,… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  8. arXiv:2407.15530  [pdf, ps, other

    eess.SP

    Pulse Shaping for Random ISAC Signals: The Ambiguity Function Between Symbols Matters

    Authors: Zihan Liao, Fan Liu, Shuangyang Li, Yifeng Xiong, Weijie Yuan, Christos Masouros, Marco Lops

    Abstract: Integrated sensing and communications (ISAC) has emerged as a pivotal enabling technology for next-generation wireless networks. Despite the distinct signal design requirements of sensing and communication (S&C) systems, shifting the symbol-wise pulse shaping (SWiPS) framework from communication-only systems to ISAC poses significant challenges in signal design and processing This paper addresses… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  9. arXiv:2407.14053  [pdf, other

    cs.GR cs.CV

    DirectL: Efficient Radiance Fields Rendering for 3D Light Field Displays

    Authors: Zongyuan Yang, Baolin Liu, Yingde Song, Yongping Xiong, Lan Yi, Zhaohe Zhang, Xunbo Yu

    Abstract: Autostereoscopic display, despite decades of development, has not achieved extensive application, primarily due to the daunting challenge of 3D content creation for non-specialists. The emergence of Radiance Field as an innovative 3D representation has markedly revolutionized the domains of 3D reconstruction and generation. This technology greatly simplifies 3D content creation for common users, b… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  10. arXiv:2407.13976  [pdf, other

    cs.CV

    PlacidDreamer: Advancing Harmony in Text-to-3D Generation

    Authors: Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia

    Abstract: Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations.… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

    ACM Class: I.4.0

  11. arXiv:2407.13193  [pdf, other

    cs.CL

    Retrieval-Augmented Generation for Natural Language Processing: A Survey

    Authors: Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Large language models (LLMs) have demonstrated great success in various fields, benefiting from their huge amount of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and lacking domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge database… ▽ More

    Submitted 18 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  12. arXiv:2407.13168  [pdf, other

    cs.AI cs.CL

    SciCode: A Research Coding Benchmark Curated by Scientists

    Authors: Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du , et al. (5 additional authors not shown)

    Abstract: Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields,… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 25 pages, 9 figures, 7 tables

  13. arXiv:2407.12532  [pdf, other

    cs.CL cs.AI

    Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models

    Authors: Xihe Qiu, Haoyu Wang, Xiaoyu Tan, Chao Qu, Yujie Xiong, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Effective collaboration in multi-agent systems requires communicating goals and intentions between agents. Current agent frameworks often suffer from dependencies on single-agent execution and lack robust inter-module communication, frequently leading to suboptimal multi-agent reinforcement learning (MARL) policies and inadequate task coordination. To address these challenges, we present a framewo… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  14. arXiv:2407.12096  [pdf, other

    cond-mat.mes-hall

    Skew-scattering Pockels effect and metallic electro-optics in gapped bilayer graphene

    Authors: Da Ma, Ying Xiong, Justin C. W. Song

    Abstract: We argue that a range of strong metallic electro-optic (EO) effects can be naturally realized from non-Drude dynamics of free carriers in metals. In particular, in clean metals we identify skew-scattering and a "Snap" (third-order derivative of velocity) dominating the Pockels and Kerr EO behavior of metals in the clean limit. Strikingly, we find that both Pockels and Kerr EO in metals play critic… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  15. arXiv:2407.10195  [pdf, other

    cs.CV

    V2I-Calib: A Novel Calibration Approach for Collaborative Vehicle and Infrastructure LiDAR Systems

    Authors: Qianxin Qu, Yijin Xiong, Xin Wu, Hanyu Li, Shichun Guo

    Abstract: Cooperative vehicle and infrastructure LiDAR systems hold great potential, yet their implementation faces numerous challenges. Calibration of LiDAR systems across heterogeneous vehicle and infrastructure endpoints is a critical step to ensure the accuracy and consistency of perception system data, necessitating calibration methods that are real-time and stable. To this end, this paper introduces a… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: to be published in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS2024)

  16. arXiv:2407.09932  [pdf, other

    quant-ph

    Quantum Clock Synchronization Network with Silicon-chip Dual-Pumped Entangled Photon Source

    Authors: J. A. Li, H. Han, X. P. Huang, B. Y. Tang, K. Guo, J. Q. Huang, S. Y. Xiong, W. R. Yu, Z. J. Zhang, J. B. Yang, B. Liu, H. Chen, Z. K. Lu

    Abstract: In this paper, we propose a quantum clock synchronization (QCS) network scheme with silicon-chip dual-pumped entangled photon source. This scheme couples two pump beams into the silicon-based waveguide, where degenerate and non-degenerate spontaneous four-wave mixing (SFWM) occurs, generating entanglement between one signal channel and three idler channels. The entangled photons are distributed to… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  17. arXiv:2407.09816  [pdf, other

    cs.CL

    MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

    Authors: Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, Haoran Lian, Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu

    Abstract: Scaling the size of a model enhances its capabilities but significantly increases computation complexity. Mixture-of-Experts models (MoE) address the issue by allowing model size to scale up without substantially increasing training or inference costs. Despite their promising results, MoE models encounter several challenges. Primarily, for dynamic routing methods, the dispersion of training tokens… ▽ More

    Submitted 28 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: Work in progress

  18. arXiv:2407.09761  [pdf, other

    stat.ME

    Exploring Differences between Two Decades of Mental Health Related Emergency Department Visits by Youth via Recurrent Events Analyses

    Authors: Yi Xiong, Joan Hu, Rhonda Rosychuk

    Abstract: We aim to develop a tool for understanding how the mental health of youth aged less than 18 years evolve over time through administrative records of mental health related emergency department (MHED) visits in two decades. Administrative health data usually contain rich information for investigating public health issues; however, many restrictions and regulations apply to their use. Moreover, the d… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  19. arXiv:2407.06691  [pdf, other

    cs.IT eess.SP

    OFDM Achieves the Lowest Ranging Sidelobe Under Random ISAC Signaling

    Authors: Fan Liu, Ying Zhang, Yifeng Xiong, Shuangyang Li, Weijie Yuan, Feifei Gao, Shi Jin, Giuseppe Caire

    Abstract: This paper aims to answer a fundamental question in the area of Integrated Sensing and Communications (ISAC): What is the optimal communication-centric ISAC waveform for ranging? Towards that end, we first established a generic framework to analyze the sensing performance of communication-centric ISAC waveforms built upon orthonormal signaling bases and random data symbols. Then, we evaluated thei… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures, submitted to IEEE for possible publication

  20. arXiv:2407.06358  [pdf, other

    cs.CV

    MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

    Authors: Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan

    Abstract: Sora's high-motion intensity and long consistent videos have significantly impacted the field of video generation, attracting unprecedented attention. However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video datase… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  21. arXiv:2407.04954  [pdf, other

    eess.SP

    Extremely Large-Scale Dynamic Metasurface Antennas (XL-DMAs): Near-Field Modeling and Channel Estimation

    Authors: Songjie Yang, Wanting Lyu, Boyu Ning, Yue Xiu, Youzhi Xiong, Hua Chen, Chadi Assi, Chau Yuen

    Abstract: Dynamic metasurface antennas (DMAs) represent a novel transceiver array architecture for extremely large-scale (XL) communications, offering the advantages of reduced power consumption and lower hardware costs compared to conventional arrays. This paper focuses on near-field channel estimation for XL-DMAs. We begin by analyzing the near-field characteristics of uniform planar arrays (UPAs) and i… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  22. arXiv:2407.01296  [pdf, other

    cond-mat.mes-hall cond-mat.quant-gas math-ph physics.optics quant-ph

    Non-Hermitian skin effect in arbitrary dimensions: non-Bloch band theory and classification

    Authors: Yuncheng Xiong, Ze-Yu Xing, Haiping Hu

    Abstract: Non-Hermitian skin effect (NHSE) is a distinctive phenomenon in non-Hermitian systems, characterized by a significant accumulation of eigenstates at system boundaries. While well-understood in one dimension via non-Bloch band theory, unraveling the NHSE in higher dimensions faces formidable challenges due to the diversity of open boundary conditions or lattice geometries and inevitable numerical e… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 24 pages, 17 figures

  23. arXiv:2406.19444  [pdf, other

    cond-mat.mes-hall quant-ph

    Kramers Nonlinearity in PT Symmetric Magnets

    Authors: Oles Matsyshyn, Ying Xiong, Justin C. W. Song

    Abstract: Kramers degeneracies play an essential role in the spectrum of electronic materials. Here we argue that beyond spectral properties, Kramers degeneracy plays a critical role in the nonlinear response of PT symmetric magnets. In particular, we uncover a class of second-order Kramers nonlinearities that only arise in the presence of Kramers degeneracy, vanishing in non-degenerate PT symmetric materia… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures

  24. arXiv:2406.18485  [pdf, other

    cs.DC

    LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

    Authors: Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, Yingtong Xiong, Guoteng Wang, Qiaoling Chen, Shangchun Zhao, Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu

    Abstract: Efficiently training LLMs with long sequences is important yet challenged by the massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency issues. We propose LoongTrain, a novel system to efficiently train LLMs with long sequences at scale. The core of LoongTrain is the 2D-Attention mecha… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  25. arXiv:2406.12187  [pdf, other

    cond-mat.mtrl-sci

    Diverse Responses in Lattice Thermal Conductivity of $n$-type/$p$-type Semiconductors Driven by Asymmetric Electron-Phonon Interactions

    Authors: Jianshi Sun, Shouhang Li, Zhen Tong, Cheng Shao, Han Xie, Meng An, Chuang Zhang, Xiongfei Zhu, Chen Huang, Yucheng Xiong, Xiangjun Liu

    Abstract: Accurately assessing the impact of electron-phonon interaction (EPI) on the lattice thermal conductivity of semiconductors is crucial for the thermal management of electronic devices and a unified physical understanding of this issue is highly desired. In this work, we predict the lattice thermal conductivities of typical direct and indirect bandgap semiconductors accounting for EPI based on mode-… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages,5 figures

  26. arXiv:2406.11891  [pdf, other

    cs.SI cs.AI cs.LG

    Towards Adaptive Neighborhood for Advancing Temporal Interaction Graph Modeling

    Authors: Siwei Zhang, Xi Chen, Yun Xiong, Xixi Wu, Yao Zhang, Yongrui Fu, Yinglong Zhao, Jiawei Zhang

    Abstract: Temporal Graph Networks (TGNs) have demonstrated their remarkable performance in modeling temporal interaction graphs. These works can generate temporal node representations by encoding the surrounding neighborhoods for the target node. However, an inherent limitation of existing TGNs is their reliance on fixed, hand-crafted rules for neighborhood encoding, overlooking the necessity for an adaptiv… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: KDD'2024 Research Track Paper

  27. arXiv:2406.11836  [pdf, other

    cs.CV cs.GR

    RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians

    Authors: Bingling Li, Shengyi Chen, Luchao Wang, Kaimin Liao, Sijie Yan, Yuanjun Xiong

    Abstract: In this work, we explore the possibility of training high-parameter 3D Gaussian splatting (3DGS) models on large-scale, high-resolution datasets. We design a general model parallel training method for 3DGS, named RetinaGS, which uses a proper rendering equation and can be applied to any scene and arbitrary distribution of Gaussian primitives. It enables us to explore the scaling behavior of 3DGS i… ▽ More

    Submitted 22 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  28. arXiv:2406.11833  [pdf, other

    cs.CV cs.AI cs.LG

    MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

    Authors: Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: Generating natural and meaningful responses to communicate with multi-modal human inputs is a fundamental capability of Large Vision-Language Models(LVLMs). While current open-source LVLMs demonstrate promising performance in simplified scenarios such as single-turn single-image input, they fall short in real-world conversation scenarios such as following instructions in a long context history wit… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This project is available at https://github.com/Liuziyu77/MMDU

  29. arXiv:2406.08717  [pdf, other

    cond-mat.str-el

    Comparison of superconducting pairing in doped cuprates and nickelates within an extended Hubbard model

    Authors: Yicheng Xiong, Hang Ma, Hongxing Liu, Runyu Ma, Tianxing Ma

    Abstract: Within an extended Hubbard model, we investigate the superconducting pairing behavior of infinite-layer nickelate $\mathrm{NdNiO_2}$ and cuprates superconductors by using the determinant quantum Monte Carlo method. Our focus is on comparing their dominant pairing symmetries. The results indicate that the $d_{x^2-y^2}$ pairing interaction is significantly enhanced at low temperatures in both doped… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 6 pages and 8 figures

  30. arXiv:2406.07548  [pdf, other

    cs.CV cs.IT cs.LG eess.IV

    Image and Video Tokenization with Binary Spherical Quantization

    Authors: Yue Zhao, Yuanjun Xiong, Philipp Krähenbühl

    Abstract: We propose a new transformer-based image and video tokenizer with Binary Spherical Quantization (BSQ). BSQ projects the high-dimensional visual embedding to a lower-dimensional hypersphere and then applies binary quantization. BSQ is (1) parameter-efficient without an explicit codebook, (2) scalable to arbitrary token dimensions, and (3) compact: compressing visual data by up to 100$\times$ with m… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Tech report

  31. arXiv:2406.06609  [pdf, other

    cs.LG cs.AI cs.CV

    Mitigating Bias in Dataset Distillation

    Authors: Justin Cui, Ruochen Wang, Yuanhao Xiong, Cho-Jui Hsieh

    Abstract: Dataset Distillation has emerged as a technique for compressing large datasets into smaller synthetic counterparts, facilitating downstream training tasks. In this paper, we study the impact of bias inside the original dataset on the performance of dataset distillation. With a comprehensive empirical evaluation on canonical datasets with color, corruption and background biases, we found that color… ▽ More

    Submitted 10 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: ICML

  32. arXiv:2406.06069  [pdf, other

    cs.CV

    PointABM:Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis

    Authors: Jia-wei Chen, Yu-jie Xiong, Yong-bin Gao

    Abstract: Mamba, based on state space model (SSM) with its linear complexity and great success in classification provide its superiority in 3D point cloud analysis. Prior to that, Transformer has emerged as one of the most prominent and successful architectures for point cloud analysis. We present PointABM, a hybrid model that integrates the Mamba and Transformer architectures for enhancing local feature to… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  33. arXiv:2406.05940  [pdf, other

    cs.SE

    M2CVD: Enhancing Vulnerability Semantic through Multi-Model Collaboration for Code Vulnerability Detection

    Authors: Ziliang Wang, Ge Li, Jia Li, Yingfei Xiong, Jia Li, Meng Yan, Zhi Jin

    Abstract: Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization; conversely, code models such CodeBERT are easy to fine-tune, but it is often difficult to learn vulnerability semantics from complex code languages. To address these challenges, this paper introduces the Multi-Model Collaborativ… ▽ More

    Submitted 19 July, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  34. arXiv:2406.04292  [pdf, other

    cs.IR cs.CL cs.CV

    VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval

    Authors: Junjie Zhou, Zheng Liu, Shitao Xiao, Bo Zhao, Yongping Xiong

    Abstract: Multi-modal retrieval becomes increasingly popular in practice. However, the existing retrievers are mostly text-oriented, which lack the capability to process visual information. Despite the presence of vision-language models like CLIP, the current methods are severely limited in representing the text-only and image-only data. In this work, we present a new embedding model VISTA for universal mul… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference

  35. arXiv:2406.04264  [pdf, other

    cs.CV cs.AI cs.CL

    MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding

    Authors: Junjie Zhou, Yan Shu, Bo Zhao, Boya Wu, Shitao Xiao, Xi Yang, Yongping Xiong, Bo Zhang, Tiejun Huang, Zheng Liu

    Abstract: The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insufficient lengths of videos, a lack of diversity in video types and evaluation tasks, and the inappropriateness for evaluating LVU performances. To addres… ▽ More

    Submitted 19 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  36. arXiv:2406.02874  [pdf, other

    cond-mat.mtrl-sci physics.comp-ph

    Giant enhancement of hole mobility for 4H-silicon carbide through suppressing interband electron-phonon scattering

    Authors: Jianshi Sun, Shouhang Li, Zhen Tong, Cheng Shao, Meng An, Xiongfei Zhu, Chuang Zhang, Xiangchuan Chen, Yucheng Xiong, Thomas Frauenheim, Xiangjun Liu

    Abstract: 4H-Silicon Carbide (4H-SiC) possesses a high Baliga figure of merit, making it a promising material for power electronics. However, its applications are limited by its low hole mobility. Herein, we found that the hole mobility of 4H-SiC is mainly limited by the strong interband electron-phonon scattering using mode-level first-principles calculations. Our research indicates that applying compressi… ▽ More

    Submitted 20 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 22 pages, 4 figures

  37. arXiv:2406.00093  [pdf, other

    cs.CV cs.AI cs.GR cs.LG cs.MM

    Bootstrap3D: Improving 3D Content Creation with Synthetic Data

    Authors: Zeyi Sun, Tong Wu, Pan Zhang, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

    Abstract: Recent years have witnessed remarkable progress in multi-view diffusion models for 3D content creation. However, there remains a significant gap in image quality and prompt-following ability compared to 2D diffusion models. A critical bottleneck is the scarcity of high-quality 3D assets with detailed captions. To address this challenge, we propose Bootstrap3D, a novel framework that automatically… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: Project Page: https://sunzey.github.io/Bootstrap3D/

  38. arXiv:2405.19731  [pdf, other

    cs.DC

    Some New Approaches to MPI Implementations

    Authors: Yuqing Xiong

    Abstract: This paper provides some new approaches to MPI implementations to improve MPI performance. These approaches include dynamically composable libraries, reducing average layer numbers of MPI libraries, and a single entity of MPI-network, MPI-protocol, and MPI.

    Submitted 30 May, 2024; originally announced May 2024.

  39. arXiv:2405.19487  [pdf, other

    cs.CL

    A Full-duplex Speech Dialogue Scheme Based On Large Language Models

    Authors: Peng Wang, Songshuo Lu, Yaohua Tang, Sijie Yan, Yuanjun Xiong, Wei Xia

    Abstract: We present a generative dialogue system capable of operating in a full-duplex manner, allowing for seamless interaction. It is based on a large language model (LLM) carefully aligned to be aware of a perception module, a motor function module, and the concept of a simple finite state machine (called neural FSM) with two states. The perception and motor function modules operate simultaneously, allo… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  40. arXiv:2405.19119  [pdf, other

    cs.LG

    Can Graph Learning Improve Task Planning?

    Authors: Xixi Wu, Yifei Shen, Caihua Shan, Kaitao Song, Siwei Wang, Bohang Zhang, Jiarui Feng, Hong Cheng, Wei Chen, Yun Xiong, Dongsheng Li

    Abstract: Task planning is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the dependencies among them. Consequently, t… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  41. arXiv:2405.17463  [pdf, other

    cs.GT cs.LG

    No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling

    Authors: Ningyuan Chen, Xuefeng Gao, Yi Xiong

    Abstract: When two players are engaged in a repeated game with unknown payoff matrices, they may be completely unaware of the existence of each other and use multi-armed bandit algorithms to choose the actions, which is referred to as the ``blindfolded game'' in this paper. We show that when the players use Thompson sampling, the game dynamics converges to the Nash equilibrium under a mild assumption on the… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  42. arXiv:2405.17247  [pdf, other

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie , et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  43. arXiv:2405.16127  [pdf, other

    cs.IR

    Finetuning Large Language Model for Personalized Ranking

    Authors: Zhuoxi Bai, Ning Wu, Fengyu Cai, Xinyi Zhu, Yun Xiong

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various domains, motivating researchers to investigate their potential use in recommendation systems. However, directly applying LLMs to recommendation tasks has proven challenging due to the significant disparity between the data used for pre-training LLMs and the specific requirements of recommendation tasks. In this st… ▽ More

    Submitted 20 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  44. arXiv:2405.15198  [pdf, other

    cs.CL

    RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference

    Authors: Lianming Huang, Shangyu Wu, Yufei Cui, Ying Xiong, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Deploying large language model inference remains challenging due to their high computational overhead. Early exiting accelerates model inference by adaptively reducing the number of inference layers. Existing methods require training internal classifiers to determine whether to exit at each intermediate layer. However, such classifier-based early exiting frameworks require significant effort to de… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  45. arXiv:2405.11914  [pdf, other

    cs.CV

    PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images

    Authors: Yiheng Xiong, Angela Dai

    Abstract: Generating 3D shapes from single RGB images is essential in various applications such as robotics. Current approaches typically target images containing clear and complete visual descriptions of the object, without considering common realistic cases where observations of objects that are largely occluded or truncated. We thus propose a transformer-based autoregressive model to generate the probabi… ▽ More

    Submitted 6 August, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures. Accepted to BMVC 2024

  46. arXiv:2405.11535  [pdf, ps, other

    cs.PL

    Proving Functional Program Equivalence via Directed Lemma Synthesis

    Authors: Yican Sun, Ruyi Ji, Jian Fang, Xuanlin Jiang, Mingshuai Chen, Yingfei Xiong

    Abstract: Proving equivalence between functional programs is a fundamental problem in program verification, which often amounts to reasoning about algebraic data types (ADTs) and compositions of structural recursions. Modern theorem provers address this problem by applying structural induction, which is insufficient for proving many equivalence theorems. In such cases, one has to invent a set of lemmas, pro… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 21 pages

  47. arXiv:2405.10300  [pdf, other

    cs.CV

    Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

    Authors: Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

    Abstract: This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model o… ▽ More

    Submitted 31 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: homepage: https://deepdataspace.com/home

  48. arXiv:2405.10132  [pdf, other

    cs.CV

    Cooperative Visual-LiDAR Extrinsic Calibration Technology for Intersection Vehicle-Infrastructure: A review

    Authors: Xinyu Zhang, Yijin Xiong, Qianxin Qu, Renjie Wang, Xin Gao, Jing Liu, Shichun Guo, Jun Li

    Abstract: In the typical urban intersection scenario, both vehicles and infrastructures are equipped with visual and LiDAR sensors. By successfully integrating the data from vehicle-side and road monitoring devices, a more comprehensive and accurate environmental perception and information acquisition can be achieved. The Calibration of sensors, as an essential component of autonomous driving technology, ha… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  49. arXiv:2405.07144  [pdf, other

    quant-ph

    Optical transition parameters of the silicon T centre

    Authors: Chloe Clear, Sara Hosseini, Amirhossein AlizadehKhaledi, Nicholas Brunelle, Austin Woolverton, Joshua Kanaganayagam, Moein Kazemi, Camille Chartrand, Mehdi Keshavarz, Yihuang Xiong, Oney O. Soykal, Geoffroy Hautier, Valentin Karassiouk, Mike Thewalt, Daniel Higginbottom, Stephanie Simmons

    Abstract: The silicon T centre's narrow, telecommunications-band optical emission, long spin coherence, and direct photonic integration have spurred interest in this emitter as a spin-photon interface for distributed quantum computing and networking. However, key parameters of the T centre's spin-selective optical transitions remain undetermined or ambiguous in literature. In this paper we present a Hamilto… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 9 pages and 6 figures in the main manuscript. 10 pages and 6 figures in the supplementary information

  50. arXiv:2405.05165  [pdf, other

    cond-mat.mtrl-sci quant-ph

    Discovery of T center-like quantum defects in silicon

    Authors: Yihuang Xiong, Jiongzhi Zheng, Shay McBride, Xueyue Zhang, Sinéad M. Griffin, Geoffroy Hautier

    Abstract: Quantum technologies would benefit from the development of high performance quantum defects acting as single-photon emitters or spin-photon interface. Finding such a quantum defect in silicon is especially appealing in view of its favorable spin bath and high processability. While some color centers in silicon have been emerging in quantum applications, there is still a need to search and develop… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.