Search | arXiv e-print repository

Showing 1–50 of 168 results for author: Ye, W

Searching in archive cs. Search in all archives.
  1. arXiv:2408.00118 [pdf, other]

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, et al. (172 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  2. arXiv:2407.21416 [pdf, other]

    cs.CV cs.RO

    VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning

    Authors: Yuhang Ming, Minyang Xu, Xingrui Yang, Weicai Ye, Weihan Wang, Yong Peng, Weichen Dai, Wanzeng Kong

    Abstract: Visual place recognition (VPR) is an essential component of many autonomous and augmented/virtual reality systems. It enables the systems to robustly localize themselves in large-scale environments. Existing VPR methods demonstrate attractive performance at the cost of heavy pre-training and limited generalizability. When deployed in unseen environments, these methods exhibit significant performan…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  3. arXiv:2407.15498 [pdf, other]

    cs.CL

    Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction

    Authors: Dingyao Yu, Yang An, Wei Ye, Xiongfeng Xiao, Shaoguang Mao, Tao Ge, Shikun Zhang

    Abstract: Chinese Spelling Correction (CSC) commonly lacks large-scale high-quality corpora, due to the labor-intensive labeling of spelling errors in real-life human writing or typing scenarios. Two data augmentation methods are widely adopted: (1) Random Replacement with the guidance of confusion sets and (2) OCR/ASR-based Generation that simulates character misusing. However, both metho…

    Submitted 22 July, 2024; originally announced July 2024.

  4. arXiv:2407.12851 [pdf]

    cs.CL

    ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data

    Authors: Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Junqiu Ye, Chu Liao, Qi Hao, Wen Ye, Cheng Luo, Xinyan Wang, Chuang Cheng, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou

    Abstract: Symptom phenotypes are one of the key types of manifestations for diagnosis and treatment of various disease conditions. However, the diversity of symptom terminologies is one of the major obstacles hindering the analysis and knowledge sharing of various types of symptom-related medical data particularly in the fields of Traditional Chinese Medicine (TCM). Objective: This study aimed to construct…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 39 pages, 6 figures, 6 tables

  5. arXiv:2407.00100 [pdf, other]

    cs.LG cs.AI cs.CL

    Enhancing In-Context Learning via Implicit Demonstration Augmentation

    Authors: Xiaoling Zhou, Wei Ye, Yidong Wang, Chaoya Jiang, Zhemg Lee, Rui Xie, Shikun Zhang

    Abstract: The emergence of in-context learning (ICL) enables large pre-trained language models (PLMs) to make predictions for unseen inputs without updating parameters. Despite its potential, ICL's effectiveness heavily relies on the quality, quantity, and permutation of demonstrations, commonly leading to suboptimal and unstable performance. In this paper, we tackle this challenge for the first time from t…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 Main; 19 pages, 10 figures

    ACM Class: I.2.7

  6. arXiv:2406.17126 [pdf, other]

    cs.CV cs.LG

    MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

    Authors: Wenqian Ye, Guangtao Zheng, Yunsheng Ma, Xu Cao, Bolin Lai, James M. Rehg, Aidong Zhang

    Abstract: Spurious bias, a tendency to use spurious correlations between non-essential input attributes and target variables for predictions, has revealed a severe robustness pitfall in deep learning models trained on single modality data. Multimodal Large Language Models (MLLMs), which integrate both vision and language models, have demonstrated strong capability in joint vision-language understanding. How…

    Submitted 24 June, 2024; originally announced June 2024.

  7. arXiv:2406.10742 [pdf, other]

    cs.CV

    Spuriousness-Aware Meta-Learning for Learning Robust Classifiers

    Authors: Guangtao Zheng, Wenqian Ye, Aidong Zhang

    Abstract: Spurious correlations are brittle associations between certain attributes of inputs and target variables, such as the correlation between an image background and an object class. Deep image classifiers often leverage them for predictions, leading to poor generalization on the data where the correlations do not hold. Mitigating the impact of spurious correlations is crucial towards robust model gen…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD 2024

  8. arXiv:2406.10424 [pdf, other]

    cs.CV cs.AI

    What is the Visual Cognition Gap between Humans and Multimodal LLMs?

    Authors: Xu Cao, Bolin Lai, Wenqian Ye, Yunsheng Ma, Joerg Heintz, Jintai Chen, Jianguo Cao, James M. Rehg

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have shown great promise in language-guided perceptual tasks such as recognition, segmentation, and object detection. However, their effectiveness in addressing visual cognition problems that require high-level reasoning is not well-established. One such challenge is abstract visual reasoning (AVR) -- the cognitive ability to discern relationships…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures, the appendix will be updated soon

    MSC Class: 68T01

  9. arXiv:2406.10252 [pdf, other]

    cs.IR cs.AI cs.CL

    AutoSurvey: Large Language Models Can Automatically Write Surveys

    Authors: Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang

    Abstract: This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due to the vast volume and complexity of information, prompting the need for efficient survey methods. While large language models (LLMs) offer promise in…

    Submitted 17 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  10. arXiv:2406.10163 [pdf, other]

    cs.CV cs.AI

    MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

    Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

    Abstract: Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://buaacyw.github.io/mesh-anything/ Code: https://github.com/buaacyw/MeshAnything

  11. arXiv:2406.09739 [pdf, other]

    cs.CV

    Decoupling Forgery Semantics for Generalizable Deepfake Detection

    Authors: Wei Ye, Xinan He, Feng Ding

    Abstract: In this paper, we propose a novel method for detecting DeepFakes, enhancing the generalization of detection through semantic decoupling. There are now multiple DeepFake forgery technologies that not only possess unique forgery semantics but may also share common forgery semantics. The unique forgery semantics and irrelevant content semantics may promote over-fitting and hamper generalization for D…

    Submitted 14 June, 2024; originally announced June 2024.

  12. arXiv:2406.08116 [pdf, other]

    cs.CL cs.AI

    Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling

    Authors: Zile Qiao, Wei Ye, Yong Jiang, Tong Mo, Pengjun Xie, Weiping Li, Fei Huang, Shikun Zhang

    Abstract: Retrieval-augmented language models (RALMs) have recently shown great potential in mitigating the limitations of implicit knowledge in LLMs, such as untimely updating of the latest expertise and unreliable retention of long-tail knowledge. However, since the external knowledge base, as well as the retriever, can not guarantee reliability, potentially leading to the knowledge retrieved not being he…

    Submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.06521 [pdf, other]

    cs.CV

    PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction

    Authors: Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, Guofeng Zhang

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has attracted widespread attention due to its high-quality rendering, and ultra-fast training and rendering speed. However, due to the unstructured and irregular nature of Gaussian point clouds, it is difficult to guarantee geometric reconstruction accuracy and multi-view consistency simply by relying on image reconstruction loss. Although many studies on sur…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: project page: https://zju3dv.github.io/pgsr/

  14. arXiv:2405.15173 [pdf, other]

    cs.CV

    A3:Ambiguous Aberrations Captured via Astray-Learning for Facial Forgery Semantic Sublimation

    Authors: Xinan He, Yue Zhou, Wei Ye, Feng Ding

    Abstract: Prior DeepFake detection methods have faced a core challenge in preserving generalizability and fairness effectively. In this paper, we proposed an approach akin to decoupling and sublimating forgery semantics, named astray-learning. The primary objective of the proposed method is to blend hybrid forgery semantics derived from high-frequency components into authentic imagery, named aberrations. Th…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 19 pages, 9 figures

  15. arXiv:2405.13573 [pdf, other]

    cs.RO

    Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance

    Authors: Kaifeng Zhang, Zhao-Heng Yin, Weirui Ye, Yang Gao

    Abstract: Defining reward functions for skill learning has been a long-standing challenge in robotics. Recently, vision-language models (VLMs) have shown promise in defining reward signals for teaching robots manipulation skills. However, existing works often provide reward guidance that is too coarse, leading to inefficient learning processes. In this paper, we address this issue by implementing more fine-…

    Submitted 1 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  16. arXiv:2405.11930 [pdf, other]

    cs.LG

    Data Contamination Calibration for Black-box LLMs

    Authors: Wentao Ye, Jiaqi Hu, Liyao Li, Haobo Wang, Gang Chen, Junbo Zhao

    Abstract: The rapid advancements of Large Language Models (LLMs) tightly associate with the expansion of the training data size. However, the unchecked ultra-large-scale training sets introduce a series of potential risks like data contamination, i.e. the benchmark data is used for training. In this work, we propose a holistic method named Polarized Augment Calibration (PAC) along with a new to-be-released…

    Submitted 3 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  17. arXiv:2405.06422 [pdf, other]

    cs.RO cs.AI

    Contextual Affordances for Safe Exploration in Robotic Scenarios

    Authors: William Z. Ye, Eduardo B. Sandoval, Pamela Carreno-Medrano, Francisco Cru

    Abstract: Robotics has been a popular field of research in the past few decades, with much success in industrial applications such as manufacturing and logistics. This success is led by clearly defined use cases and controlled operating environments. However, robotics has yet to make a large impact in domestic settings. This is due in part to the difficulty and complexity of designing mass-manufactured robo…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 5 pages, 2 figures. Accepted at the 2nd Workshop on Human-aligned Reinforcement Learning for Autonomous Agents and Robots HARL, at the IEEE International Conference on Robotics and Automation ICRA, Yokohama, Japan, 2024

  18. arXiv:2405.05945 [pdf, other]

    cs.CV

    Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

    Authors: Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

    Abstract: Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified f…

    Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  19. arXiv:2405.05545 [pdf, other]

    cs.LG stat.ML

    Deep Hierarchical Graph Alignment Kernels

    Authors: Shuhao Tang, Hao Tian, Xiaofeng Cao, Wei Ye

    Abstract: Typical R-convolution graph kernels invoke the kernel functions that decompose graphs into non-isomorphic substructures and compare them. However, overlooking implicit similarities and topological position information between those substructures limits their performances. In this paper, we introduce Deep Hierarchical Graph Alignment Kernels (DHGAK) to resolve this problem. Specifically, the relati…

    Submitted 9 May, 2024; originally announced May 2024.

  20. arXiv:2405.03649 [pdf, other]

    cs.LG cs.CV

    Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation

    Authors: Guangtao Zheng, Wenqian Ye, Aidong Zhang

    Abstract: Deep neural classifiers tend to rely on spurious correlations between spurious attributes of inputs and targets to make predictions, which could jeopardize their generalization capability. Training classifiers robust to spurious correlations typically relies on annotations of spurious correlations in data, which are often expensive to get. In this paper, we tackle an annotation-free setting and pr…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to IJCAI 2024

  21. arXiv:2405.00958 [pdf, other]

    cs.LG cs.AI cs.HC eess.SY

    Generative manufacturing systems using diffusion models and ChatGPT

    Authors: Xingyu Li, Fei Tao, Wei Ye, Aydin Nassehi, John W. Sutherland

    Abstract: In this study, we introduce Generative Manufacturing Systems (GMS) as a novel approach to effectively manage and coordinate autonomous manufacturing assets, thereby enhancing their responsiveness and flexibility to address a wide array of production objectives and human preferences. Deviating from traditional explicit modeling, GMS employs generative AI, including diffusion models and ChatGPT, for…

    Submitted 1 May, 2024; originally announced May 2024.

  22. arXiv:2405.00353 [pdf, ps, other]

    cs.GT

    Dual-Role AoI-based Incentive Mechanism for HD map Crowdsourcing

    Authors: Wentao Ye, Bo Liu, Yuan Luo, Jianwei Huang

    Abstract: A high-quality fresh high-definition (HD) map is vital in enhancing transportation efficiency and safety in autonomous driving. Vehicle-based crowdsourcing offers a promising approach for updating HD maps. However, recruiting crowdsourcing vehicles involves making the challenging tradeoff between the HD map freshness and recruitment costs. Existing studies on HD map crowdsourcing often (1) priorit…

    Submitted 1 May, 2024; originally announced May 2024.

  23. arXiv:2404.18961 [pdf, other]

    cs.LG cs.AI cs.CV

    Unleashing the Power of Multi-Task Learning: A Comprehensive Survey Spanning Traditional, Deep, and Pretrained Foundation Model Eras

    Authors: Jun Yu, Yutong Dai, Xiaokang Liu, Jin Huang, Yishan Shen, Ke Zhang, Rong Zhou, Eashan Adhikarla, Wenxuan Ye, Yixin Liu, Zhaoming Kong, Kai Zhang, Yilong Yin, Vinod Namboodiri, Brian D. Davison, Jason H. Moore, Yong Chen

    Abstract: MTL is a learning paradigm that effectively leverages both task-specific and shared information to address multiple related tasks simultaneously. In contrast to STL, MTL offers a suite of benefits that enhance both the training process and the inference efficiency. MTL's key advantages encompass streamlined model architecture, performance enhancement, and cross-domain generalizability. Over the pa…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 60 figures, 116 pages, 500+ references

  24. arXiv:2404.16307 [pdf, other]

    cs.LG cs.CV

    Boosting Model Resilience via Implicit Adversarial Data Augmentation

    Authors: Xiaoling Zhou, Wei Ye, Zhemg Lee, Rui Xie, Shikun Zhang

    Abstract: Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To address this, we propose to augment the deep features of samples by incorporating their adversarial and anti-adversarial perturbation distributions, enabling adapti…

    Submitted 1 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures, accepted by IJCAI 2024

    ACM Class: I.2.6; I.4.3

  25. arXiv:2404.10719 [pdf, other]

    cs.CL

    Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

    Authors: Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is currently the most widely used method to align large language models (LLMs) with human preferences. Existing RLHF methods can be roughly categorized as either reward-based or reward-free. Novel applications such as ChatGPT and Claude leverage reward-based methods that first learn a reward model and apply actor-critic algorithms, such as Proximal…

    Submitted 21 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 16 pages, 2 figures, 14 tables

  26. arXiv:2404.06003 [pdf, other]

    cs.CL cs.AI

    FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models

    Authors: Zhuohao Yu, Chang Gao, Wenjin Yao, Yidong Wang, Zhengran Zeng, Wei Ye, Jindong Wang, Yue Zhang, Shikun Zhang

    Abstract: The rapid development of large language model (LLM) evaluation methodologies and datasets has led to a profound challenge: integrating state-of-the-art evaluation techniques cost-effectively while ensuring reliability, reproducibility, and efficiency. Currently, there is a notable absence of a unified and adaptable framework that seamlessly integrates various evaluation approaches. Moreover, the r…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: We open-source all our code at: https://github.com/WisdomShell/FreeEval

  27. arXiv:2404.04681 [pdf, other]

    cs.IT

    Computation and Critical Transitions of Rate-Distortion-Perception Functions With Wasserstein Barycenter

    Authors: Chunhui Chen, Xueyan Niu, Wenhao Ye, Hao Wu, Bo Bai

    Abstract: The information rate-distortion-perception (RDP) function characterizes the three-way trade-off between description rate, average distortion, and perceptual quality measured by discrepancy between probability distributions. We study several variants of the RDP functions through the lens of optimal transport. By transforming the information RDP function into a Wasserstein Barycenter problem, we ide…

    Submitted 9 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.14611. This paper was presented in part at the 2023 IEEE International Symposium on Information Theory

  28. arXiv:2403.19495 [pdf, other]

    cs.CV cs.GR

    CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians

    Authors: Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari

    Abstract: The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured p…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Project page: https://people.engr.tamu.edu/nimak/Papers/CoherentGS

  29. arXiv:2403.19287 [pdf, other]

    cs.SE

    CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios

    Authors: Zhengran Zeng, Yidong Wang, Rui Xie, Wei Ye, Shikun Zhang

    Abstract: In the evolving landscape of large language models (LLMs) tailored for software engineering, the need for benchmarks that accurately reflect real-world development scenarios is paramount. Current benchmarks are either too simplistic or fail to capture the multi-tasking nature of software development. To address this, we introduce CoderUJB, a new benchmark designed to evaluate LLMs across diverse J…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures, accepted at ISSTA 2024

    MSC Class: 68N30 (Primary); 68T20 (Secondary)

    ACM Class: D.2.0

  30. arXiv:2403.15747 [pdf, other]

    cs.SE cs.AI

    CodeShell Technical Report

    Authors: Rui Xie, Zhengran Zeng, Zhuohao Yu, Chang Gao, Shikun Zhang, Wei Ye

    Abstract: Code large language models mark a pivotal breakthrough in artificial intelligence. They are specifically crafted to understand and generate programming languages, significantly boosting the efficiency of coding development workflows. In this technical report, we present CodeShell-Base, a seven billion-parameter foundation model with 8K context length, showcasing exceptional proficiency in code com…

    Submitted 23 March, 2024; originally announced March 2024.

  31. arXiv:2403.15226 [pdf, other]

    cs.MM cs.CL

    Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models

    Authors: Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this paper, we propose a novel parameter and computation efficient tuning method for Multi-modal Large Language Models (MLLMs), termed Efficient Attention Skipping (EAS). Concretely, we first reveal that multi-head attentions (MHAs), the main computational overhead of MLLMs, are often redundant to downstream tasks. Based on this observation, EAS evaluates the attention redundancy and skips the…

    Submitted 5 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  32. arXiv:2403.07883 [pdf, other]

    cs.CV cs.AI

    Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection

    Authors: Wei Ye, Chaoya Jiang, Haiyang Xu, Chenhao Ye, Chenliang Li, Ming Yan, Shikun Zhang, Songhang Huang, Fei Huang

    Abstract: Vision Transformers (ViTs) have become increasingly popular in large-scale Vision and Language Pre-training (VLP) models. Although previous VLP research has demonstrated the efficacy of ViTs, these efforts still struggle with computational inefficiencies caused by lengthy visual sequences. To address this challenge, we introduce an efficient VLP approach called TRIPS, which stands for Text-Relevan…

    Submitted 11 January, 2024; originally announced March 2024.

  33. arXiv:2403.07408 [pdf, other]

    cs.CV

    NightHaze: Nighttime Image Dehazing via Self-Prior Learning

    Authors: Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Robby T. Tan

    Abstract: Masked autoencoder (MAE) shows that severe augmentation during training produces robust representations for high-level tasks. This paper brings the MAE-like framework to nighttime image enhancement, demonstrating that severe augmentation during training produces strong network priors that are resilient to real-world night haze degradations. We propose a novel nighttime image dehazing method with s…

    Submitted 12 March, 2024; originally announced March 2024.

  34. arXiv:2403.06467 [pdf, other]

    cs.CV

    Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy

    Authors: Jiuming Liu, Ruiji Yu, Yian Wang, Yu Zheng, Tianchen Deng, Weicai Ye, Hesheng Wang

    Abstract: Recently, state space model (SSM) has gained great attention due to its promising performance, linear complexity, and long sequence modeling ability in both language and image domains. However, it is non-trivial to extend SSM to the point cloud field, because of the causality requirement of SSM and the disorder and irregularity nature of point clouds. In this paper, we propose a novel SSM-based po…

    Submitted 17 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  35. arXiv:2403.03100 [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

    Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

    Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing di…

    Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

  36. arXiv:2403.00564 [pdf, other]

    cs.LG cs.AI cs.RO

    EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

    Authors: Shengjie Wang, Shaohuai Liu, Weirui Ye, Jiacheng You, Yang Gao

    Abstract: Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework designed for sample-efficient RL algorithms. We have expanded th…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 21 pages, 10 figures

  37. arXiv:2402.17010 [pdf, other]

    cs.CL cs.AI

    Can Large Language Models Recall Reference Location Like Humans?

    Authors: Ye Wang, Xinrun Xu, Rui Xie, Wenxin Hu, Wei Ye

    Abstract: When completing knowledge-intensive tasks, humans sometimes need not just an answer but also a corresponding reference passage for auxiliary reading. Previous methods required obtaining pre-segmented article chunks through additional retrieval models. This paper explores leveraging the parameterized knowledge stored during the pre-training phase of large language models (LLMs) to independently rec…

    Submitted 26 February, 2024; originally announced February 2024.

  38. arXiv:2402.15721 [pdf, other]

    cs.AI cs.CL

    Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

    Authors: Chaoya Jiang, Wei Ye, Mengfan Dong, Hongrui Jia, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang

    Abstract: Large Vision Language Models exhibit remarkable capabilities but struggle with hallucinations, that is, inconsistencies between images and their descriptions. Previous hallucination evaluation studies on LVLMs have identified hallucinations in terms of objects, attributes, and relations but overlooked complex hallucinations that create an entire narrative around a fictional entity. In this paper, we introdu…

    Submitted 24 February, 2024; originally announced February 2024.

  39. arXiv:2402.15043 [pdf, other]

    cs.CL cs.AI cs.LG

    KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

    Authors: Zhuohao Yu, Chang Gao, Wenjin Yao, Yidong Wang, Wei Ye, Jindong Wang, Xing Xie, Yue Zhang, Shikun Zhang

    Abstract: Automatic evaluation methods for large language models (LLMs) are hindered by data contamination, leading to inflated assessments of their effectiveness. Existing strategies, which aim to detect contaminated texts, focus on quantifying contamination status instead of accurately gauging model performance. In this paper, we introduce KIEval, a Knowledge-grounded Interactive Evaluation framework, whi…

    Submitted 3 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 (main conference); 19 pages, 5 figures, 19 tables, code is available at: https://github.com/zhuohaoyu/KIEval

  40. arXiv:2402.14464 [pdf, other]

    cs.CV

    NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection

    Authors: Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang

    Abstract: NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by innovatively utilizing NeRF to enhance representation learning. Despite its notable performance, we uncover three decisive shortcomings in its current design, including semantic ambiguity, inappropriate sampling, and insufficient utilization of depth supervision. To combat the aforementioned problems, we present thre…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 7 pages, 2 figures

  41. arXiv:2402.12715  [pdf, other]

    cs.LG

    Spurious Correlations in Machine Learning: A Survey

    Authors: Wenqian Ye, Guangtao Zheng, Xu Cao, Yunsheng Ma, Aidong Zhang

    Abstract: Machine learning systems are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. These features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions, which can negatively impact the model's genera…

    Submitted 16 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Version 2; GitHub link: https://github.com/wenqian-ye/Awesome-Spurious-Correlations

  42. arXiv:2402.02500  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

    Authors: Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He

    Abstract: In robot learning, the observation space is crucial due to the distinct characteristics of different modalities, which can potentially become a bottleneck alongside policy design. In this study, we explore the influence of various observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud. We introduce OBSBench, a benchmark comprising two simulators…

    Submitted 6 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  43. arXiv:2401.00729  [pdf, other]

    cs.CV

    NightRain: Nighttime Video Deraining via Adaptive-Rain-Removal and Adaptive-Correction

    Authors: Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Shunli Zhang, Robby Tan

    Abstract: Existing deep-learning-based methods for nighttime video deraining rely on synthetic data due to the absence of real-world paired data. However, the intricacies of the real world, particularly with the presence of light effects and low-light regions affected by noise, create significant domain gaps, hampering synthetic-trained models in removing rain streaks properly and leading to over-saturation…

    Submitted 10 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  44. arXiv:2312.15918  [pdf, other]

    cs.CL cs.AI

    Supervised Knowledge Makes Large Language Models Better In-context Learners

    Authors: Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, Yue Zhang

    Abstract: Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The recent progress in large-scale generative models has further expanded their use in real-world language applications. However, the critical challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. While…

    Submitted 11 April, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: Accepted to ICLR 2024

  45. arXiv:2312.12068  [pdf, other]

    cs.CV cs.AI cs.LG

    PICNN: A Pathway towards Interpretable Convolutional Neural Networks

    Authors: Wengang Guo, Jiayi Yang, Huilin Yin, Qijun Chen, Wei Ye

    Abstract: Convolutional Neural Networks (CNNs) have exhibited great performance in discriminative feature learning for complex visual tasks. Besides discrimination power, interpretability is another important yet under-explored property for CNNs. One difficulty in CNN interpretability is that filters and image classes are entangled. In this paper, we introduce a novel pathway to alleviate the entangleme…

    Submitted 19 December, 2023; originally announced December 2023.

  46. arXiv:2312.09086  [pdf, other]

    cs.LG cs.NE

    COMBHelper: A Neural Approach to Reduce Search Space for Graph Combinatorial Problems

    Authors: Hao Tian, Sourav Medya, Wei Ye

    Abstract: Combinatorial Optimization (CO) problems over graphs appear routinely in many applications such as in optimizing traffic, viral marketing in social networks, and matching for job allocation. Due to their combinatorial nature, these problems are often NP-hard. Existing approximation algorithms and heuristics rely on the search space to find the solutions and become time-consuming when this space is…

    Submitted 1 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  47. arXiv:2312.08846  [pdf, other]

    cs.LG cs.CL cs.CV

    TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training

    Authors: Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang

    Abstract: Self-supervised Multi-modal Contrastive Learning (SMCL) remarkably advances modern Vision-Language Pre-training (VLP) models by aligning visual and linguistic modalities. Due to noise in web-harvested text-image pairs, however, scaling up training data volume in SMCL presents considerable obstacles in terms of computational cost and data inefficiency. To improve data efficiency in VLP, we propose…

    Submitted 23 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024

  48. arXiv:2312.08726  [pdf, other]

    cs.CL cs.AI

    Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks

    Authors: Bo Li, Wei Ye, Quansen Wang, Wen Zhao, Shikun Zhang

    Abstract: Textual label names (descriptions) are typically semantically rich in many natural language understanding (NLU) tasks. In this paper, we incorporate the prompting methodology, which is widely used to enrich model input, into the label side for the first time. Specifically, we propose a Mask Matching method, which equips an input with a prompt and its label with another, and then makes predictions…

    Submitted 15 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: AAAI 2024, Regular Paper

  49. arXiv:2312.06968  [pdf, other]

    cs.CV

    Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

    Authors: Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang

    Abstract: Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks. However, MLLMs still face a fundamental limitation of hallucinations, where they tend to generate erroneous or fabricated information. In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning. We first…

    Submitted 23 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  50. arXiv:2312.04372  [pdf, other]

    cs.CL cs.AI

    LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs

    Authors: Yunsheng Ma, Can Cui, Xu Cao, Wenqian Ye, Peiran Liu, Juanwu Lu, Amr Abdelraouf, Rohit Gupta, Kyungtae Han, Aniket Bera, James M. Rehg, Ziran Wang

    Abstract: Autonomous driving (AD) has made significant strides in recent years. However, existing frameworks struggle to interpret and execute spontaneous user instructions, such as "overtake the car ahead." Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, showing potential to bridge this gap. In this paper, we present LaMPilot, a novel framework that integrates LLMs into AD…

    Submitted 4 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: CVPR 2024