Search | arXiv e-print repository

Showing 1–50 of 413 results for author: Tang, C

Searching in archive cs.
  1. arXiv:2407.07406  [pdf, other]

    cs.CV cs.AI

    Weakly-supervised Medical Image Segmentation with Gaze Annotations

    Authors: Yuan Zhong, Chenhui Tang, Yumeng Yang, Ruoxi Qi, Kang Zhou, Yuqi Gong, Pheng Ann Heng, Janet H. Hsiao, Qi Dou

    Abstract: Eye gaze that reveals human observational patterns has increasingly been incorporated into solutions for vision tasks. Despite recent explorations on leveraging gaze to aid deep networks, few studies exploit gaze as an efficient annotation approach for medical image segmentation which typically entails heavy annotating costs. In this paper, we propose to collect dense weak supervision for medical…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  2. arXiv:2407.05784  [pdf, other]

    cs.AR

    Hecaton: Training and Finetuning Large Language Models with Scalable Chiplet Systems

    Authors: Zongle Huang, Shupei Fan, Chen Tang, Xinyuan Lin, Shuwen Deng, Yongpan Liu

    Abstract: Large Language Models (LLMs) have achieved remarkable success in various fields, but their training and finetuning require massive computation and memory, necessitating parallelism which introduces heavy communication overheads. Driven by advances in packaging, the chiplet architecture emerges as a potential solution, as it can integrate computing power, as well as utilize on-package links with be…

    Submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2407.05010  [pdf, other]

    cs.CV

    PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference

    Authors: Ye Li, Chen Tang, Yuan Meng, Jiajun Fan, Zenghao Chai, Xinzhu Ma, Zhi Wang, Wenwu Zhu

    Abstract: We introduce PRANCE, a Vision Transformer compression framework that jointly optimizes the activated channels and reduces tokens, based on the characteristics of inputs. Specifically, PRANCE leverages adaptive token optimization strategies for a certain computational budget, aiming to accelerate ViTs' inference from a unified data and architectural perspective. However, the joint framework poses…

    Submitted 6 July, 2024; originally announced July 2024.

  4. arXiv:2407.04281  [pdf, other]

    cs.RO

    WOMD-Reasoning: A Large-Scale Language Dataset for Interaction and Driving Intentions Reasoning

    Authors: Yiheng Li, Chongjian Ge, Chenran Li, Chenfeng Xu, Masayoshi Tomizuka, Chen Tang, Mingyu Ding, Wei Zhan

    Abstract: We propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a language annotation dataset built on WOMD, with a focus on describing and reasoning interactions and intentions in driving scenarios. Previous language datasets primarily captured interactions caused by close distances. However, interactions induced by traffic rules and human intentions, which can occur over long distances, are yet…

    Submitted 5 July, 2024; originally announced July 2024.

  5. arXiv:2407.00898  [pdf, other]

    cs.RO

    Residual-MPPI: Online Policy Customization for Continuous Control

    Authors: Pengcheng Wang, Chenran Li, Catherine Weaver, Kenta Kawamoto, Masayoshi Tomizuka, Chen Tang, Wei Zhan

    Abstract: Policies learned through Reinforcement Learning (RL) and Imitation Learning (IL) have demonstrated significant potential in achieving advanced performance in continuous control tasks. However, in real-world environments, it is often necessary to further customize a trained policy when there are additional requirements that were unforeseen during the original training phase. It is possible to fine-…

    Submitted 11 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  6. arXiv:2407.00614  [pdf, other]

    cs.RO cs.CV eess.IV

    Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

    Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

    Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we pr…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

  7. arXiv:2407.00020  [pdf, other]

    cs.CV cs.AI cs.CL cs.IT cs.LG

    Visual Language Model based Cross-modal Semantic Communication Systems

    Authors: Feibo Jiang, Chuanguo Tang, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan

    Abstract: Semantic Communication (SC) has emerged as a novel communication paradigm in recent years, successfully transcending the Shannon physical capacity limits through innovative semantic transmission concepts. Nevertheless, extant Image Semantic Communication (ISC) systems face several challenges in dynamic environments, including low semantic density, catastrophic forgetting, and uncertain Signal-to-N…

    Submitted 6 May, 2024; originally announced July 2024.

    Comments: 12 pages, 10 figures

  8. arXiv:2406.20038  [pdf, other]

    cs.CL

    BioMNER: A Dataset for Biomedical Method Entity Recognition

    Authors: Chen Tang, Bohao Yang, Kun Zhao, Bo Lv, Chenghao Xiao, Frank Guerin, Chenghua Lin

    Abstract: Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources…

    Submitted 28 June, 2024; originally announced June 2024.

  9. arXiv:2406.17962  [pdf, other]

    cs.CL

    SimsChat: A Customisable Persona-Driven Role-Playing Agent

    Authors: Bohao Yang, Dong Liu, Chen Tang, Chenghao Xiao, Kun Zhao, Chao Li, Lin Yuan, Guang Yang, Lanxiao Huang, Chenghua Lin

    Abstract: Large Language Models (LLMs) possess the remarkable capability to understand human instructions and generate high-quality text, enabling them to act as agents that simulate human behaviours. This capability allows LLMs to emulate human beings in a more advanced manner, beyond merely replicating simple human behaviours. However, there is a lack of exploring into leveraging LLMs to craft characters…

    Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  10. arXiv:2406.17911  [pdf, other]

    cs.CL

    X-ray Made Simple: Radiology Report Generation and Evaluation with Layman's Terms

    Authors: Kun Zhao, Chenghao Xiao, Chen Tang, Bohao Yang, Kai Ye, Noura Al Moubayed, Liang Zhan, Chenghua Lin

    Abstract: Radiology Report Generation (RRG) has achieved significant progress with the advancements of multimodal generative models. However, the evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that high performance on RRG with existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by learning the template of reports. This…

    Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  11. arXiv:2406.17681  [pdf, other]

    cs.CL

    VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation

    Authors: Kun Qian, Shunji Wan, Claudia Tang, Youzhi Wang, Xuanming Zhang, Maximillian Chen, Zhou Yu

    Abstract: As large language models achieve impressive scores on traditional benchmarks, an increasing number of researchers are becoming concerned about benchmark data leakage during pre-training, commonly known as the data contamination problem. To ensure fair evaluation, recent benchmarks release only the training and validation sets, keeping the test set labels closed-source. They require anyone wishing…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  12. arXiv:2406.17343  [pdf, other]

    cs.CV cs.AI

    Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

    Authors: Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu

    Abstract: Recent advancements in diffusion models, particularly the trend of architectural transformation from UNet-based Diffusion to Diffusion Transformer (DiT), have significantly improved the quality and scalability of image synthesis. Despite the incredible generative quality, the large computational requirements of these large-scale models significantly hinder the deployments in real-world scenarios.…

    Submitted 25 June, 2024; originally announced June 2024.

  13. arXiv:2406.16258  [pdf, other]

    cs.RO cs.AI cs.LG

    MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention

    Authors: Yuxin Chen, Chen Tang, Chenran Li, Ran Tian, Peter Stone, Masayoshi Tomizuka, Wei Zhan

    Abstract: Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thu…

    Submitted 23 June, 2024; originally announced June 2024.

    ACM Class: I.2.6; I.2.9

  14. arXiv:2406.15704  [pdf, other]

    cs.CV

    video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

    Authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain fine-grained temporal information required b…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024. arXiv admin note: substantial text overlap with arXiv:2310.05863

  15. arXiv:2406.12928  [pdf, other]

    cs.LG cs.AI cs.CL

    Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox

    Authors: Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu

    Abstract: Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while the huge computational demands hinder their deployments in lots of real-world applications. As an effective means to reduce memory footprint and inference cost, quantization also faces challenges in performance degradation at low bit-widths. Understanding the impact of quantization on LLM capabilities, espec…

    Submitted 15 June, 2024; originally announced June 2024.

  16. arXiv:2406.07914  [pdf, other]

    cs.SD eess.AS

    Can Large Language Models Understand Spatial Audio?

    Authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: This paper explores enabling large language models (LLMs) to understand spatial information from multichannel audio, a skill currently lacking in auditory LLMs. By leveraging LLMs' advanced cognitive and inferential abilities, the aim is to enhance understanding of 3D environments via audio. We study 3 spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and lo…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  17. arXiv:2406.07028  [pdf, other]

    cs.LG cs.AI

    Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets

    Authors: Chenxia Tang

    Abstract: In this paper, we attempt to address the challenge of applying Neural Architecture Search (NAS) algorithms, specifically the Differentiable Architecture Search (DARTS), to long-tailed datasets where class distribution is highly imbalanced. We observe that traditional re-sampling and re-weighting techniques, which are effective in standard classification tasks, lead to performance degradation when…

    Submitted 11 June, 2024; originally announced June 2024.

  18. arXiv:2406.05962  [pdf, other]

    cs.DC cs.DB

    Data Caching for Enterprise-Grade Petabyte-Scale OLAP

    Authors: Chunxu Tang, Bin Fan, Jing Zhao, Chen Liang, Yi Wang, Beinan Wang, Ziyue Qiu, Lu Qiu, Bowen Ding, Shouzhuo Sun, Saiguang Che, Jiaming Mai, Shouwei Chen, Yu Zhu, Jianjian Xie, Yutian Sun, Yao Li, Yangjun Zhang, Ke Wang, Mingmin Chen

    Abstract: With the exponential growth of data and evolving use cases, petabyte-scale OLAP data platforms are increasingly adopting a model that decouples compute from storage. This shift, evident in organizations like Uber and Meta, introduces operational challenges including massive, read-heavy I/O traffic with potential throttling, as well as skewed and fragmented data access patterns. Addressing these ch…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to the USENIX Annual Technical Conference (USENIX ATC) 2024

  19. arXiv:2406.05543  [pdf, other]

    cs.CV cs.AI

    VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification

    Authors: Jianmeng Liu, Yichen Liu, Yuyao Zhang, Zeyuan Meng, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Recent conditional 3D completion works have mainly relied on CLIP or BERT to encode textual information, which cannot support complex instruction. Meanwhile, large language models (LLMs) have shown great potential in multi-modal understanding and generation tasks. Inspired by the recent advancements of LLM, we present Volume Patch LLM (VP-LLM), which leverages LLMs to perform conditional 3D comple…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 27 pages, 16 figures

  20. arXiv:2406.04629  [pdf, other]

    cs.CV cs.GR cs.MM

    STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting

    Authors: Zenghao Chai, Chen Tang, Yongkang Wong, Mohan Kankanhalli

    Abstract: The creation of 4D avatars (i.e., animated 3D avatars) from text description typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in the canonical space and subsequently applies animation with target motions. However, such an optimization-by-animation paradigm has several drawbacks. (1) For pose-agnostic optimization, the rendered images in canonical pose for naive Score Di…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Tech report

  21. arXiv:2406.03723  [pdf, other]

    cs.CV cs.GR cs.MM

    Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling

    Authors: Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang, Pedro Miraldo, Suhas Lohit, Moitreya Chatterjee

    Abstract: Extensions of Neural Radiance Fields (NeRFs) to model dynamic scenes have enabled their near photo-realistic, free-viewpoint rendering. Although these methods have shown some potential in creating immersive experiences, two drawbacks limit their ubiquity: (i) a significant reduction in reconstruction quality when the computing budget is limited, and (ii) a lack of semantic understanding of the und…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Paper accepted to IEEE/CVF CVPR 2024 (Spotlight). Work done when XL was an intern at MERL. Project Page Link: https://merl.com/research/highlights/gear-nerf

    ACM Class: I.2.10

  22. arXiv:2406.02039  [pdf, other]

    cs.AR

    LMB: Augmenting PCIe Devices with CXL-Linked Memory Buffer

    Authors: Jiapin Wang, Xiangping Zhang, Chenlei Tang, Xiang Chen, Tao Lu

    Abstract: PCIe devices, such as SSDs and GPUs, are pivotal in modern data centers, and their value is set to grow amidst the emergence of AI and large models. However, these devices face onboard DRAM shortage issue due to internal space limitation, preventing accommodation of sufficient DRAM modules alongside flash or GPU processing chips. Current solutions either curb device-internal memory usage or supple…

    Submitted 4 June, 2024; originally announced June 2024.

  23. arXiv:2406.01987  [pdf, other]

    cs.CV

    Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization

    Authors: Yunpeng Zhao, Cheng Chen, Qing You Pang, Quanzheng Li, Carol Tang, Beng-Ti Ang, Yueming Jin

    Abstract: Addressing missing modalities presents a critical challenge in multimodal learning. Current approaches focus on developing models that can handle modality-incomplete inputs during inference, assuming that the full set of modalities are available for all the data during training. This reliance on full-modality data for training limits the use of abundant modality-incomplete samples that are often e…

    Submitted 4 June, 2024; originally announced June 2024.

  24. arXiv:2406.01065  [pdf, other]

    cs.LG cs.AI

    Causal prompting model-based offline reinforcement learning

    Authors: Xuehui Yu, Yi Guan, Rujia Shen, Xin Li, Chen Tang, Jingchi Jiang

    Abstract: Model-based offline Reinforcement Learning (RL) allows agents to fully utilise pre-collected datasets without requiring additional or unethical explorations. However, applying model-based offline RL to online systems presents challenges, primarily due to the highly suboptimal (noise-filled) and diverse nature of datasets generated by online systems. To tackle these issues, we introduce the Causal…

    Submitted 3 June, 2024; originally announced June 2024.

  25. arXiv:2405.20215  [pdf, other]

    cs.CL

    TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models

    Authors: Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li

    Abstract: Mainstream approaches to aligning large language models (LLMs) heavily rely on human preference data, particularly when models require periodic updates. The standard process for iterative alignment of LLMs involves collecting new human feedback for each update. However, the data collection process is costly and challenging to scale. To address this issue, we introduce the "TS-Align" framework, whi…

    Submitted 14 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  26. arXiv:2405.20202  [pdf, other]

    cs.AI

    One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments

    Authors: Ke Yi, Yuhui Xu, Heng Chang, Chen Tang, Yuan Meng, Tong Zhang, Jia Li

    Abstract: Large Language Models (LLMs) have advanced rapidly but face significant memory demands. While quantization has shown promise for LLMs, current methods typically require lengthy training to alleviate the performance degradation from quantization loss. However, deploying LLMs across diverse scenarios with different resource constraints, e.g., servers and personal computers, requires repeated trainin…

    Submitted 30 May, 2024; originally announced May 2024.

  27. arXiv:2405.19789  [pdf, other]

    cs.LG cs.DC

    Estimating before Debiasing: A Bayesian Approach to Detaching Prior Bias in Federated Semi-Supervised Learning

    Authors: Guogang Zhu, Xuefeng Liu, Xinghao Wu, Shaojie Tang, Chao Tang, Jianwei Niu, Hao Su

    Abstract: Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model. In FSSL, the heterogeneous data can introduce prediction bias into the model, causing the model's prediction to skew towards some certain classes. Existing FSSL methods primarily tackle this issue by enhancing consistency in model parameters or outputs. However, as the mo…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  28. arXiv:2405.19226  [pdf, other]

    cs.CV cs.MM

    ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions

    Authors: Honglin Lin, Siyu Li, Guoshun Nan, Chaoyue Tang, Xueting Wang, Jingxin Xu, Rong Yankai, Zhili Zhou, Yutong Gao, Qimei Cui, Xiaofeng Tao

    Abstract: Image retrieval from contextual descriptions (IRCD) aims to identify an image within a set of minimally contrastive candidates based on linguistically complex text. Despite the success of VLMs, they still significantly lag behind human performance in IRCD. The main challenges lie in aligning key contextual cues in two modalities, where these subtle cues are concealed in tiny areas of multiple cont…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted in ACL 2024 Findings

  29. arXiv:2405.17838  [pdf, other]

    cs.LG cs.AI

    Trust and Terror: Hazards in Text Reveal Negatively Biased Credulity and Partisan Negativity Bias

    Authors: Keith Burghardt, Daniel M. T. Fessler, Chyna Tang, Anne Pisor, Kristina Lerman

    Abstract: Socio-linguistic indicators of text, such as emotion or sentiment, are often extracted using neural networks in order to better understand features of social media. One indicator that is often overlooked, however, is the presence of hazards within text. Recent psychological research suggests that statements about hazards are more believable than statements about benefits (a property known as negat…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages, 16 figures

  30. arXiv:2405.17013  [pdf, other]

    cs.CV

    MotionLLM: Multimodal Motion-Language Learning with Large Language Models

    Authors: Qi Wu, Yubo Zhao, Yifan Wang, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Recent advancements in Multimodal Large Language Models (MM-LLMs) have demonstrated promising potential in terms of generalization and robustness when applied to different modalities. While previous works have already achieved 3D human motion generation using various approaches including language modeling, they mostly use specialized architecture and are restricted…

    Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Project page: https://knoxzhao.github.io/MotionLLM

  31. arXiv:2405.16136  [pdf, other]

    cs.AI cs.CL cs.LG cs.SD eess.AS

    C3LLM: Conditional Multimodal Content Generation Using Large Language Models

    Authors: Zixuan Wang, Qinkai Duan, Yu-Wing Tai, Chi-Keung Tang

    Abstract: We introduce C3LLM (Conditioned-on-Three-Modalities Large Language Models), a novel framework combining three tasks of video-to-audio, audio-to-text, and text-to-audio together. C3LLM adapts the Large Language Model (LLM) structure as a bridge for aligning different modalities, synthesizing the given conditional information, and making multimodal generation in a discrete manner. Our contributions…

    Submitted 25 May, 2024; originally announced May 2024.

  32. arXiv:2405.15924  [pdf, other]

    cs.CL

    SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation

    Authors: Kun Zhao, Bohao Yang, Chen Tang, Chenghua Lin, Liang Zhan

    Abstract: The long-standing one-to-many problem of gold standard responses in open-domain dialogue systems presents challenges for automatic evaluation metrics. Though prior works have demonstrated some success by applying powerful Large Language Models (LLMs), existing approaches still struggle with the one-to-many problem, and exhibit subpar performance in domain-specific scenarios. We assume the commonse…

    Submitted 29 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024 Findings

  33. arXiv:2405.15092  [pdf, other]

    cs.AI cs.CL

    Dissociation of Faithful and Unfaithful Reasoning in LLMs

    Authors: Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen

    Abstract: Large language models (LLMs) improve their performance in downstream tasks when they generate Chain of Thought reasoning text before producing an answer. Our research investigates how LLMs recover from errors in Chain of Thought, reaching the correct final answer despite mistakes in the reasoning text. Through analysis of these error recovery behaviors, we find evidence for unfaithfulness in Chain…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: code published at https://github.com/CoTErrorRecovery/CoTErrorRecovery

  34. arXiv:2405.13480  [pdf, other]

    physics.soc-ph cs.CY

    What is a typical signalized intersection in a city? A pipeline for intersection data imputation from OpenStreetMap

    Authors: Ao Qu, Anirudh Valiveru, Catherine Tang, Vindula Jayawardana, Baptiste Freydt, Cathy Wu

    Abstract: Signalized intersections, arguably the most complicated type of traffic scenario, are essential to urban mobility systems. With recent advancements in intelligent transportation technologies, signalized intersections have great prospects for making transportation greener, safer, and faster. Several studies have been conducted focusing on intersection-level control and optimization. However, arbitr…

    Submitted 22 May, 2024; originally announced May 2024.

  35. arXiv:2405.10508  [pdf, other]

    cs.CV

    ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

    Authors: Pengzhi Li, Chengshuai Tang, Qinxuan Huang, Zhiheng Li

    Abstract: In this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques. Our method effectively bridges the gap between artistic and realistic images through an innovative image semantic transfer algorithm. By leveraging depth information and an initial artistic image, we generate…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPR 2024 Workshop on AI3DG

  36. arXiv:2405.02068  [pdf, other]

    cs.CV

    Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection

    Authors: Canhui Tang, Sanping Zhou, Yizhe Li, Yonghao Dong, Le Wang

    Abstract: With the wide application of knowledge distillation between an ImageNet pre-trained teacher model and a learnable student model, industrial anomaly detection has witnessed a significant achievement in the past few years. The success of knowledge distillation mainly relies on how to keep the feature discrepancy between the teacher and student model, in which it assumes that: (1) the teacher model c…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: The paper is under review

  37. arXiv:2405.01496  [pdf, other]

    cs.CV

    LocInv: Localization-aware Inversion for Text-Guided Image Editing

    Authors: Chuanming Tang, Kai Wang, Fei Yang, Joost van de Weijer

    Abstract: Large-scale Text-to-Image (T2I) diffusion models demonstrate significant generation capabilities based on textual prompts. Based on the T2I diffusion models, text-guided image editing research aims to empower users to manipulate generated images by altering the text prompts. However, existing image editing techniques are prone to editing over unintentional regions that are beyond the intended targ…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024 Workshop AI4CC

  38. arXiv:2404.13388  [pdf]

    eess.IV cs.CV cs.LG

    Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning

    Authors: Yong Liu, Mengtian Kang, Shuo Gao, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Arokia Nathan, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Luigi Occhipinti

    Abstract: Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To addres…

    Submitted 23 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  39. arXiv:2404.13386  [pdf]

    eess.IV cs.CV cs.LG

    SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images

    Authors: Jiaqi Wang, Mengtian Kang, Yong Liu, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Shuo Gao, Luigi G. Occhipinti

    Abstract: Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this artic…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: ISBI 2024

  40. arXiv:2404.10399  [pdf, other]

    cs.RO

    FoundationGrasp: Generalizable Task-Oriented Grasping with Foundation Models

    Authors: Chao Tang, Dehao Huang, Wenlong Dong, Ruinian Xu, Hong Zhang

    Abstract: Task-oriented grasping (TOG), which refers to the problem of synthesizing grasps on an object that are configurationally compatible with the downstream manipulation task, is the first milestone towards tool manipulation. Analogous to the activation of two brain regions responsible for semantic and geometric reasoning during cognitive processes, modeling the complex relationship between objects, ta…

    Submitted 16 April, 2024; originally announced April 2024.

  41. arXiv:2404.09532  [pdf, other]

    cs.CV cs.LG

    TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models

    Authors: Haojun Sun, Chen Tang, Zhi Wang, Yuan Meng, Jingyan Jiang, Xinzhu Ma, Wenwu Zhu

    Abstract: Diffusion models have emerged as preeminent contenders in the realm of generative models. Distinguished by their distinctive sequential generative processes, characterized by hundreds or even thousands of timesteps, diffusion models progressively reconstruct images from pure Gaussian noise, with each timestep necessitating full inference of the entire model. However, the substantial computational…

    Submitted 15 April, 2024; originally announced April 2024.

  42. arXiv:2404.05639  [pdf, other]

    cs.LG cs.AI cs.CR

    Investigating the Impact of Quantization on Adversarial Robustness

    Authors: Qun Li, Yuan Meng, Chen Tang, Jiacheng Jiang, Zhi Wang

    Abstract: Quantization is a promising technique for reducing the bit-width of deep models to improve their runtime performance and storage efficiency, and thus becomes a fundamental step for deployment. In real-world scenarios, quantized models are often faced with adversarial attacks which cause the model to make incorrect inferences by introducing slight perturbations. However, recent studies have paid le…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted to ICLR 2024 Workshop PML4LRS

  43. arXiv:2404.05211  [pdf, other]

    cs.CV

    Multi-level Graph Subspace Contrastive Learning for Hyperspectral Image Clustering

    Authors: Jingxin Wang, Renxiang Guan, Kainan Gao, Zihao Li, Hao Li, Xianju Li, Chang Tang

    Abstract: Hyperspectral image (HSI) clustering is a challenging task due to its high complexity. Although subspace clustering shows impressive performance for HSI, traditional methods tend to ignore the global-local interaction in HSI data. In this study, we propose a multi-level graph subspace contrastive learning (MLGSC) method for HSI clustering. The model is divided into the following main parts. Graph convolu…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: IJCNN 2024

  44. arXiv:2404.04665  [pdf, other]

    cs.CV cs.AI

    Adaptive Intra-Class Variation Contrastive Learning for Unsupervised Person Re-Identification

    Authors: Lingzhi Liu, Haiyang Zhang, Chengwei Tang, Tiantian Zhang

    Abstract: The memory dictionary-based contrastive learning method has achieved remarkable results in the field of unsupervised person Re-ID. However, the method of updating memory based on all samples does not fully utilize the hardest sample to improve the generalization ability of the model, and the method based on hardest sample mining will inevitably introduce false-positive samples that are incorrectly…

    Submitted 6 April, 2024; originally announced April 2024.

  45. arXiv:2404.01129  [pdf, other]

    cs.CL

    Structured Information Matters: Incorporating Abstract Meaning Representation into LLMs for Improved Open-Domain Dialogue Evaluation

    Authors: Bohao Yang, Kun Zhao, Chen Tang, Liang Zhan, Chenghua Lin

    Abstract: Automatic open-domain dialogue evaluation has attracted increasing attention. Trainable evaluation metrics are commonly trained with true positive and randomly selected negative responses, resulting in a tendency for them to assign a higher score to the responses that share higher content similarity with a given context. However, adversarial negative responses possess high content similarity with…

    Submitted 6 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  46. arXiv:2404.00373  [pdf, other]

    cs.CV

    The Devil is in the Edges: Monocular Depth Estimation with Edge-aware Consistency Fusion

    Authors: Pengzhi Li, Yikang Ding, Haohan Wang, Chengshuai Tang, Zhiheng Li

    Abstract: This paper presents a novel monocular depth estimation method, named ECFNet, for estimating high-quality monocular depth with clear edges and valid overall structure from a single RGB image. We thoroughly investigate the key factor that affects the edge depth estimation of MDE networks, and conclude that the edge information itself plays a critical role in predicting dept…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 17 pages, 19 figures

  47. arXiv:2404.00343  [pdf, other]

    cs.RO

    Commonsense Scene Graph-based Target Localization for Object Search

    Authors: Wenqi Ge, Chao Tang, Hong Zhang

    Abstract: Object search is a fundamental skill for household robots, yet the core problem lies in the robot's ability to locate the target object accurately. The dynamic nature of household environments, characterized by the arbitrary placement of daily objects by users, makes it challenging to perform target localization. To efficiently locate the target object, the robot needs to be equipped with knowledg…

    Submitted 30 March, 2024; originally announced April 2024.

  48. arXiv:2403.19285   

    cs.CL

    Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation

    Authors: Chenming Tang, Zhixiang Wang, Yunfang Wu

    Abstract: In-context learning (ICL) is the trending prompting strategy in the era of large language models (LLMs), where a few examples are demonstrated to evoke LLMs' power for a given task. How to select informative examples remains an open issue. Previous works on in-context example selection for machine translation (MT) focus on superficial word-level features while ignoring deep syntax-level knowledge.…

    Submitted 28 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Some reported baselines are not comparable in Section 5 and could be misleading and confusing

  49. arXiv:2403.19283  [pdf, other]

    cs.CL

    Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction

    Authors: Chenming Tang, Fanyi Qu, Yunfang Wu

    Abstract: In the era of large language models (LLMs), in-context learning (ICL) stands out as an effective prompting strategy that explores LLMs' potency across various tasks. However, applying LLMs to grammatical error correction (GEC) is still a challenging task. In this paper, we propose a novel ungrammatical-syntax-based in-context example selection strategy for GEC. Specifically, we measure similarity…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024 Main Conference

  50. arXiv:2403.17905  [pdf, other]

    eess.IV cs.CV cs.LG eess.SP

    Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2

    Authors: Yiwei Chen, Chao Tang, Amir Aghabiglou, Chung San Chu, Yves Wiaux

    Abstract: We propose a new approach for non-Cartesian magnetic resonance image reconstruction. While unrolled architectures provide robustness via data-consistency layers, embedding measurement operators in a Deep Neural Network (DNN) can become impractical at large scale. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, are not affected by this limita…

    Submitted 28 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to IEEE EUSIPCO 2024