Search | arXiv e-print repository

Showing 1–50 of 865 results for author: Liu, K

Searching in archive cs.
  1. arXiv:2409.05404  [pdf, other]

    cs.DC cs.AR cs.ET

    DFabric: Scaling Out Data Parallel Applications with CXL-Ethernet Hybrid Interconnects

    Authors: Xu Zhang, Ke Liu, Yisong Chang, Hui Yuan, Xiaolong Zheng, Ke Zhang, Mingyu Chen

    Abstract: Emerging interconnects, such as CXL and NVLink, have been integrated into the intra-host topology to scale more accelerators and facilitate efficient communication between them, such as GPUs. To keep pace with the accelerator's growing computing throughput, the interconnect has seen substantial enhancement in link bandwidth, e.g., 256GBps for CXL 3.0 links, which surpasses Ethernet and InfiniBand…

    Submitted 9 September, 2024; originally announced September 2024.

  2. arXiv:2409.04963  [pdf, other]

    cs.CV

    GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning

    Authors: Keyi Liu, Yeqi Luo, Weidong Yang, Jingyi Xu, Zhijun Li, Wen-Ming Chen, Ben Fei

    Abstract: Self-supervised learning of point cloud aims to leverage unlabeled 3D data to learn meaningful representations without reliance on manual annotations. However, current approaches face challenges such as limited data diversity and inadequate augmentation for effective feature learning. To address these challenges, we propose GS-PT, which integrates 3D Gaussian Splatting (3DGS) into point cloud self…

    Submitted 7 September, 2024; originally announced September 2024.

  3. arXiv:2409.03685  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    View-Invariant Policy Learning via Zero-Shot Novel View Synthesis

    Authors: Stephen Tian, Blake Wulfe, Kyle Sargent, Katherine Liu, Sergey Zakharov, Vitor Guizilini, Jiajun Wu

    Abstract: Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: obs…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted to CoRL 2024

  4. arXiv:2409.03283  [pdf, other]

    cs.SD eess.AS

    FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications

    Authors: Hao-Han Guo, Kun Liu, Fei-Yu Shen, Yi-Chen Wu, Feng-Long Xie, Kun Xie, Kai-Tuo Xu

    Abstract: This work proposes FireRedTTS, a foundation text-to-speech framework, to meet the growing demands for personalized and diverse generative speech applications. The framework comprises three parts: data processing, foundation system, and downstream applications. First, we comprehensively present our data processing pipeline, which transforms massive raw audio into a large-scale high-quality TTS data…

    Submitted 5 September, 2024; originally announced September 2024.

  5. arXiv:2409.03228  [pdf, other]

    cs.CV

    Labeled-to-Unlabeled Distribution Alignment for Partially-Supervised Multi-Organ Medical Image Segmentation

    Authors: Xixi Jiang, Dong Zhang, Xiang Li, Kangyi Liu, Kwang-Ting Cheng, Xin Yang

    Abstract: Partially-supervised multi-organ medical image segmentation aims to develop a unified semantic segmentation model by utilizing multiple partially-labeled datasets, with each dataset providing labels for a single class of organs. However, the limited availability of labeled foreground organs and the absence of supervision to distinguish unlabeled foreground organs from the background pose a signifi…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted by Medical Image Analysis

  6. arXiv:2409.02119  [pdf, other]

    cs.LG cs.AI cs.CL

    CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models

    Authors: Xiaojun Xiao, Sen Shen, Qiming Bao, Hongfei Rong, Kairui Liu, Zhongsheng Wang, Jiamou Liu

    Abstract: In fine-tuning large language models (LLMs), conserving computational resources while maintaining effectiveness and improving outcomes within the same computational constraints is crucial. The Low-Rank Adaptation (LoRA) strategy balances efficiency and performance in fine-tuning large models by reducing the number of trainable parameters and computational costs. However, current advancements in Lo…

    Submitted 31 August, 2024; originally announced September 2024.

  7. arXiv:2409.01236  [pdf, other]

    cs.CV

    Spatial-Aware Conformal Prediction for Trustworthy Hyperspectral Image Classification

    Authors: Kangdao Liu, Tianhao Sun, Hao Zeng, Yongshan Zhang, Chi-Man Pun, Chi-Man Vong

    Abstract: Hyperspectral image (HSI) classification involves assigning specific labels to each pixel to identify various land cover categories. Although deep classifiers have shown high predictive accuracy in this field, quantifying their uncertainty remains a significant challenge, which hinders their application in critical contexts. This study first theoretically evaluates the applicability of \textit{Con…

    Submitted 2 September, 2024; originally announced September 2024.

  8. arXiv:2409.00617  [pdf, other]

    cs.CL cs.AI

    Does Knowledge Localization Hold True? Surprising Differences Between Entity and Relation Perspectives in Language Models

    Authors: Yifan Wei, Xiaoyan Yu, Yixuan Weng, Huanhuan Ma, Yuanzhe Zhang, Jun Zhao, Kang Liu

    Abstract: Large language models encapsulate knowledge and have demonstrated superior performance on various natural language processing tasks. Recent studies have localized this knowledge to specific model parameters, such as the MLP weights in intermediate layers. This study investigates the differences between entity and relational knowledge through knowledge editing. Our findings reveal that entity and r…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: CIKM 2024

  9. arXiv:2409.00329  [pdf, other]

    cs.CE

    Convolutional Hierarchical Deep Learning Neural Networks-Tensor Decomposition (C-HiDeNN-TD): a scalable surrogate modeling approach for large-scale physical systems

    Authors: Jiachen Guo, Chanwook Park, Xiaoyu Xie, Zhongsheng Sang, Gregory J. Wagner, Wing Kam Liu

    Abstract: A common trend in simulation-driven engineering applications is the ever-increasing size and complexity of the problem, where classical numerical methods typically suffer from significant computational time and huge memory cost. Methods based on artificial intelligence have been extensively investigated to accelerate partial differential equations (PDE) solvers using data-driven surrogates. Howeve…

    Submitted 30 August, 2024; originally announced September 2024.

  10. arXiv:2408.15252  [pdf, other]

    eess.SP cs.AI

    Generative AI on SpectrumNet: An Open Benchmark of Multiband 3D Radio Maps

    Authors: Shuhang Zhang, Shuai Jiang, Wanjie Lin, Zheng Fang, Kangjun Liu, Hongliang Zhang, Ke Chen

    Abstract: Radio map is an efficient demonstration for visually displaying the wireless signal coverage within a certain region. It has been considered to be increasingly helpful for the future sixth generation (6G) of wireless networks, as wireless nodes are becoming more crowded and complicated. However, the construction of high resolution radio map is very challenging due to the sparse sampling in practic…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 30 pages, 15 figures

  11. arXiv:2408.13358  [pdf, other]

    cs.CV

    Shape-Preserving Generation of Food Images for Automatic Dietary Assessment

    Authors: Guangzong Chen, Zhi-Hong Mao, Mingui Sun, Kangni Liu, Wenyan Jia

    Abstract: Traditional dietary assessment methods heavily rely on self-reporting, which is time-consuming and prone to bias. Recent advancements in Artificial Intelligence (AI) have revealed new possibilities for dietary assessment, particularly through analysis of food images. Recognizing foods and estimating food volumes from images are known as the key procedures for automatic dietary assessment. However,…

    Submitted 23 August, 2024; originally announced August 2024.

  12. arXiv:2408.13078  [pdf, other]

    cs.LG cs.AI

    AEMLO: AutoEncoder-Guided Multi-Label Oversampling

    Authors: Ao Zhou, Bin Liu, Jin Wang, Kaiwei Sun, Kelin Liu

    Abstract: Class imbalance significantly impacts the performance of multi-label classifiers. Oversampling is one of the most popular approaches, as it augments instances associated with less frequent labels to balance the class distribution. Existing oversampling methods generate feature vectors of synthetic samples through replication or linear interpolation and assign labels through neighborhood informatio…

    Submitted 23 August, 2024; originally announced August 2024.

  13. arXiv:2408.12674  [pdf, other]

    cs.RO cs.CV

    One-shot Video Imitation via Parameterized Symbolic Abstraction Graphs

    Authors: Jianren Wang, Kangni Liu, Dingkun Guo, Xian Zhou, Christopher G Atkeson

    Abstract: Learning to manipulate dynamic and deformable objects from a single demonstration video holds great promise in terms of scalability. Previous approaches have predominantly focused on either replaying object relationships or actor trajectories. The former often struggles to generalize across diverse tasks, while the latter suffers from data inefficiency. Moreover, both methodologies encounter chall…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Robot Learning, Computer Vision, Learning from Videos

  14. arXiv:2408.12194  [pdf, other]

    cs.CL

    Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment

    Authors: Kun Luo, Minghao Qin, Zheng Liu, Shitao Xiao, Jun Zhao, Kang Liu

    Abstract: Pretrained language models like BERT and T5 serve as crucial backbone encoders for dense retrieval. However, these models often exhibit limited generalization capabilities and face challenges in improving in domain accuracy. Recent research has explored using large language models (LLMs) as retrievers, achieving SOTA performance across various tasks. Despite these advancements, the specific benefi…

    Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Submitted to EMNLP24

  15. arXiv:2408.11850  [pdf, other]

    cs.CL

    Parallel Speculative Decoding with Adaptive Draft Length

    Authors: Tianyu Liu, Yun Li, Qitan Lv, Kai Liu, Jianchen Zhu, Winston Hu

    Abstract: Speculative decoding (SD), where an extra draft model is employed to provide multiple draft tokens first and then the original target model verifies these tokens in parallel, has shown great power for LLM inference acceleration. However, existing SD methods suffer from the mutual waiting problem, i.e., the target model gets stuck when the draft model is guessing tokens, and vice…

    Submitted 4 September, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  16. arXiv:2408.11324  [pdf, other]

    cs.SE

    HITS: High-coverage LLM-based Unit Test Generation via Method Slicing

    Authors: Zejun Wang, Kaibo Liu, Ge Li, Zhi Jin

    Abstract: Large language models (LLMs) have behaved well in generating unit tests for Java projects. However, the performance for covering the complex focal methods within the projects is poor. Complex methods comprise many conditions and loops, requiring the test cases to be various enough to cover all lines and branches. However, existing test generation methods with LLMs provide the whole method-to-test…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: To be published in ASE '24 Research Track

  17. arXiv:2408.10682  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models

    Authors: Hongbang Yuan, Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: LLM have achieved success in many fields but still troubled by problematic content in the training corpora. LLM unlearning aims at reducing their influence and avoid undesirable behaviours. However, existing unlearning methods remain vulnerable to adversarial queries and the unlearned knowledge resurfaces after the manually designed attack queries. As part of a red-team effort to proactively asses…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 13 pages

  18. arXiv:2408.10543  [pdf, other]

    cs.CV cs.AI eess.IV

    Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds

    Authors: Kai Liu, Kang You, Pan Gao

    Abstract: Stable diffusion networks have emerged as a groundbreaking development for their ability to produce realistic and detailed visual content. This characteristic renders them ideal decoders, capable of producing high-quality and aesthetically pleasing reconstructions. In this paper, we introduce the first diffusion-based point cloud compression method, dubbed Diff-PCC, to leverage the expressive powe…

    Submitted 20 August, 2024; originally announced August 2024.

  19. arXiv:2408.10537  [pdf, other]

    cs.CV

    Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation

    Authors: Jiawei Han, Kaiqi Liu, Wei Li, Guangzhi Chen

    Abstract: Point cloud semantic segmentation can significantly enhance the perception of an intelligent agent. Nevertheless, the discriminative capability of the segmentation network is influenced by the quantity of samples available for different categories. To mitigate the cognitive bias induced by class imbalance, this paper introduces a novel method, namely subspace prototype guidance (SPG), to…

    Submitted 20 August, 2024; originally announced August 2024.

  20. arXiv:2408.10524  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition

    Authors: Xucheng Wan, Naijun Zheng, Kai Liu, Huan Zhou

    Abstract: Contextualized ASR models have been demonstrated to effectively improve the recognition accuracy of uncommon phrases when a predefined phrase list is available. However, these models often struggle with bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make the initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing(…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted to NCMMSC 2024

  21. arXiv:2408.07840  [pdf, other]

    cs.CL cs.AI cs.SC

    ONSEP: A Novel Online Neural-Symbolic Framework for Event Prediction Based on Large Language Model

    Authors: Xuanqing Yu, Wangtao Sun, Jingwei Li, Kang Liu, Chengbao Liu, Jie Tan

    Abstract: In the realm of event prediction, temporal knowledge graph forecasting (TKGF) stands as a pivotal technique. Previous approaches face the challenges of not utilizing experience during testing and relying on a single short-term history, which limits adaptation to evolving data. In this paper, we introduce the Online Neural-Symbolic Event Prediction (ONSEP) framework, which innovates by integrating…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 16 pages, ACL 2024 Findings

  22. arXiv:2408.07413  [pdf, other]

    cs.CL

    Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models

    Authors: Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Knowledge editing aims to update outdated or incorrect knowledge in large language models (LLMs). However, current knowledge editing methods have limited scalability for lifelong editing. This study explores the fundamental reason why knowledge editing fails in lifelong editing. We begin with the closed-form solution derived from linear associative memory, which underpins state-of-the-art knowledg…

    Submitted 14 August, 2024; originally announced August 2024.

  23. arXiv:2408.06152  [pdf, other]

    cs.MM cs.AI cs.CV cs.NI

    Palantir: Towards Efficient Super Resolution for Ultra-high-definition Live Streaming

    Authors: Xinqi Jin, Zhui Zhu, Xikai Sun, Fan Dang, Jiangchuan Liu, Jingao Xu, Kebin Liu, Xinlei Chen, Yunhao Liu

    Abstract: Neural enhancement through super-resolution (SR) deep neural networks (DNNs) opens up new possibilities for ultra-high-definition (UHD) live streaming over existing encoding and networking infrastructure. Yet, the heavy SR DNN inference overhead leads to severe deployment challenges. To reduce the overhead, existing systems propose to apply DNN-based SR only on carefully selected anchor frames whi…

    Submitted 31 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  24. arXiv:2408.05996  [pdf, ps, other]

    cs.NI

    Value-based Proactive Caching for Sensing Data in Internet of Vehicles

    Authors: Yantong Wang, Ke Liu, Hui Ji, Jiande Sun

    Abstract: Sensing data (SD) plays an important role in safe-related applications for Internet of Vehicles. Proactively caching required sensing data (SD) is a pivotal strategy for alleviating network congestion and improving data accessibility. Despite merits, existing studies predominantly address SD caching within a single time slot, which may not be scalable to scenarios involving multi-slots. Furthermor…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 14 pages, 10 figures

  25. arXiv:2408.04974  [pdf, other]

    cs.CR cs.CV

    XNN: Paradigm Shift in Mitigating Identity Leakage within Cloud-Enabled Deep Learning

    Authors: Kaixin Liu, Huixin Xiong, Bingyu Duan, Zexuan Cheng, Xinyu Zhou, Wanqian Zhang, Xiangyu Zhang

    Abstract: In the domain of cloud-based deep learning, the imperative for external computational resources coexists with acute privacy concerns, particularly identity leakage. To address this challenge, we introduce XNN and XNN-d, pioneering methodologies that infuse neural network features with randomized perturbations, striking a harmonious balance between utility and privacy. XNN, designed for the trainin…

    Submitted 9 August, 2024; originally announced August 2024.

  26. arXiv:2408.04662  [pdf, other]

    cs.CL cs.AI

    Citekit: A Modular Toolkit for Large Language Model Citation Generation

    Authors: Jiajun Shen, Tong Zhou, Suifeng Zhao, Yubo Chen, Kang Liu

    Abstract: Enabling Large Language Models (LLMs) to generate citations in Question-Answering (QA) tasks is an emerging paradigm aimed at enhancing the verifiability of their responses when LLMs are utilizing external references to generate an answer. However, there is currently no unified framework to standardize and fairly compare different citation generation methods, leading to difficulties in reproducing…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 7 pages, 13 figures

  27. arXiv:2408.04276  [pdf, other]

    cs.LG

    Early Risk Assessment Model for ICA Timing Strategy in Unstable Angina Patients Using Multi-Modal Machine Learning

    Authors: Candi Zheng, Kun Liu, Yang Wang, Shiyi Chen, Hongli Li

    Abstract: Background: Invasive coronary arteriography (ICA) is recognized as the gold standard for diagnosing cardiovascular diseases, including unstable angina (UA). The challenge lies in determining the optimal timing for ICA in UA patients, balancing the need for revascularization in high-risk patients against the potential complications in low-risk ones. Unlike myocardial infarction, UA does not have sp…

    Submitted 8 August, 2024; originally announced August 2024.

  28. arXiv:2408.03152  [pdf, other]

    cs.LG

    TSC: A Simple Two-Sided Constraint against Over-Smoothing

    Authors: Furong Peng, Kang Liu, Xuan Lu, Yuhua Qian, Hongren Yan, Chao Ma

    Abstract: Graph Convolutional Neural Network (GCN), a widely adopted method for analyzing relational data, enhances node discriminability through the aggregation of neighboring information. Usually, stacking multiple layers can improve the performance of GCN by leveraging information from high-order neighbors. However, the increase of the network depth will induce the over-smoothing problem, which can be at…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by KDD 2024

  29. arXiv:2408.00254  [pdf, other]

    cs.CV

    LoopSparseGS: Loop Based Sparse-View Friendly Gaussian Splatting

    Authors: Zhenyu Bao, Guibiao Liao, Kaichen Zhou, Kanglin Liu, Qing Li, Guoping Qiu

    Abstract: Despite the photorealistic novel view synthesis (NVS) performance achieved by the original 3D Gaussian splatting (3DGS), its rendering quality significantly degrades with sparse input views. This performance drop is mainly caused by the limited number of initial points generated from the sparse input, insufficient supervision during the training process, and inadequate regularization of the oversi…

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: 13 pages, 10 figures

  30. arXiv:2407.20183  [pdf, other]

    cs.CL cs.AI

    MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

    Authors: Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao

    Abstract: Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this task by combining LLMs and search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests often cannot be accurately and completely retriev…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Technical Report. Project Page: https://mindsearch.netlify.app Code: https://github.com/InternLM/MindSearch

  31. arXiv:2407.17211  [pdf, other]

    cs.AI cs.NI cs.RO

    Testing Large Language Models on Driving Theory Knowledge and Skills for Connected Autonomous Vehicles

    Authors: Zuoyin Tang, Jianhua He, Dashuai Pei, Kezhong Liu, Tao Gao

    Abstract: Handling long tail corner cases is a major challenge faced by autonomous vehicles (AVs). While large language models (LLMs) hold great potentials to handle the corner cases with excellent generalization and explanation capabilities and received increasing research interest on application to autonomous driving, there are still technical barriers to be tackled, such as strict model performance and h…

    Submitted 24 July, 2024; originally announced July 2024.

  32. arXiv:2407.16725  [pdf, other]

    cs.CV

    Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions

    Authors: Kai Liu, Zhihang Fu, Chao Chen, Sheng Jin, Ze Chen, Mingyuan Tao, Rongxin Jiang, Jieping Ye

    Abstract: The key to OOD detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP provide significant advances in both two issues, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spur…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  33. arXiv:2407.16724  [pdf, other]

    cs.CL

    Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge

    Authors: Kai Liu, Ze Chen, Zhihang Fu, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, Jieping Ye

    Abstract: This paper presents a pioneering methodology, termed StructTuning, to efficiently transform foundation Large Language Models (LLMs) into domain specialists. It significantly minimizes the training corpus requirement to a mere 0.3% while achieving an impressive 50% of traditional knowledge injection performance. Our method is inspired by the educational processes for human students, particularly ho…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: N/A

  34. arXiv:2407.16434  [pdf, other]

    cs.CL

    Enhancing LLM's Cognition via Structurization

    Authors: Kai Liu, Zhihang Fu, Chao Chen, Wei Zhang, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, Jieping Ye

    Abstract: When reading long-form text, human cognition is complex and structurized. While large language models (LLMs) process input contexts through a causal and sequential perspective, this approach can potentially limit their ability to handle intricate and complex inputs effectively. To enhance LLM's cognition capability, this paper presents a novel concept of context structurization. Specifically, we t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: N/A

  35. arXiv:2407.16430  [pdf, other]

    cs.CV

    Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution

    Authors: Kai Liu, Zhihang Fu, Sheng Jin, Chao Chen, Ze Chen, Rongxin Jiang, Fan Zhou, Yaowu Chen, Jieping Ye

    Abstract: Detecting and rejecting unknown out-of-distribution (OOD) samples is critical for deployed neural networks to void unreliable predictions. In real-world scenarios, however, the efficacy of existing OOD detection methods is often impeded by the inherent imbalance of in-distribution (ID) data, which causes significant performance decline. Through statistical observations, we have identified two comm…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: N/A

  36. arXiv:2407.16424  [pdf, other]

    cs.CV

    ESOD: Efficient Small Object Detection on High-Resolution Images

    Authors: Kai Liu, Zhihang Fu, Sheng Jin, Ze Chen, Fan Zhou, Rongxin Jiang, Yaowu Chen, Jieping Ye

    Abstract: Enlarging input images is a straightforward and effective approach to promote small object detection. However, simple image enlargement is significantly expensive on both computations and GPU memory. In fact, small objects are usually sparsely distributed and locally clustered. Therefore, massive feature extraction computations are wasted on the non-target background area of images. Recent works h…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: N/A

  37. arXiv:2407.15355  [pdf, other]

    cs.CV

    Attention Beats Linear for Fast Implicit Neural Representation Generation

    Authors: Shuyi Zhang, Ke Liu, Jingjun Gu, Xiaoxu Cai, Zhihua Wang, Jiajun Bu, Haishuai Wang

    Abstract: Implicit Neural Representation (INR) has gained increasing popularity as a data representation method, serving as a prerequisite for innovative generation models. Unlike gradient-based methods, which exhibit lower efficiency in inference, the adoption of hyper-network for generating parameters in Multi-Layer Perceptrons (MLP), responsible for executing INR functions, has surfaced as a promising an…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  38. arXiv:2407.13158  [pdf, other]

    cs.LG cs.DB

    HHGT: Hierarchical Heterogeneous Graph Transformer for Heterogeneous Graph Representation Learning

    Authors: Qiuyu Zhu, Liang Zhang, Qianxiong Xu, Kaijun Liu, Cheng Long, Xiaoyang Wang

    Abstract: Despite the success of Heterogeneous Graph Neural Networks (HGNNs) in modeling real-world Heterogeneous Information Networks (HINs), challenges such as expressiveness limitations and over-smoothing have prompted researchers to explore Graph Transformers (GTs) for enhanced HIN representation learning. However, research on GT in HINs remains limited, with two key shortcomings in existing work: (1) A…

    Submitted 18 July, 2024; originally announced July 2024.

  39. arXiv:2407.12952  [pdf, other]

    cs.CV

    Denoising Diffusions in Latent Space for Medical Image Segmentation

    Authors: Fahim Ahmed Zaman, Mathews Jacob, Amanda Chang, Kan Liu, Milan Sonka, Xiaodong Wu

    Abstract: Diffusion models (DPMs) have demonstrated remarkable performance in image generation, often times outperforming other generative models. Since their introduction, the powerful noise-to-image denoising pipeline has been extended to various discriminative tasks, including image segmentation. In case of medical imaging, often times the images are large 3D scans, where segmenting one image using DPMs…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures

  40. arXiv:2407.12823  [pdf, other]

    cs.CL cs.AI

    WTU-EVAL: A Whether-or-Not Tool Usage Evaluation Benchmark for Large Language Models

    Authors: Kangyun Ning, Yisong Su, Xueqiang Lv, Yuanzhe Zhang, Jian Liu, Kang Liu, Jinan Xu

    Abstract: Although Large Language Models (LLMs) excel in NLP tasks, they still need external tools to extend their ability. Current research on tool learning with LLMs often assumes mandatory tool use, which does not always align with real-world situations, where the necessity for tools is uncertain, and incorrect or unnecessary use of tools can damage the general abilities of LLMs. Therefore, we propose to…

    Submitted 2 July, 2024; originally announced July 2024.

  41. arXiv:2407.11008  [pdf, other]

    cs.CL cs.AI cs.CV

    Figuring out Figures: Using Textual References to Caption Scientific Figures

    Authors: Stanley Cao, Kevin Liu

    Abstract: Figures are essential channels for densely communicating complex ideas in scientific papers. Previous work in automatically generating figure captions has been largely unsuccessful and has defaulted to using single-layer LSTMs, which no longer achieve state-of-the-art performance. In our work, we use the SciCap datasets curated by Hsu et al. and use a variant of a CLIP+GPT-2 encoder-decoder model…

    Submitted 25 June, 2024; originally announced July 2024.

  42. arXiv:2407.10737  [pdf, other]

    cs.CV cs.AI

    Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models

    Authors: Rining Wu, Feixiang Zhou, Ziwei Yin, Jian K. Liu

    Abstract: Our brains represent the ever-changing environment with neurons in a highly dynamic fashion. The temporal features of visual pixels in dynamic natural scenes are entrapped in the neuronal responses of the retina. It is crucial to establish the intrinsic temporal relationship between visual pixels and neuronal responses. Recent foundation vision models have paved an advanced way of understanding im…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: This article was accepted by ECCV 2024 (paper ID 12149). Accepted paper IDs can be found at: https://eccv2024.ecva.net/Conferences/2024/AcceptedPapers

  43. arXiv:2407.10499  [pdf, other]

    cs.CL

    CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

    Authors: Songyang Zhang, Chuyu Zhang, Yingfan Hu, Haowen Shen, Kuikun Liu, Zerun Ma, Fengzhe Zhou, Wenwei Zhang, Xuming He, Dahua Lin, Kai Chen

    Abstract: While LLM-Based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their ability is challenging, thereby hindering a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation f…

    Submitted 25 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review. The first three authors contribute equally, and Songyang Zhang is the project leader

  44. arXiv:2407.09088  [pdf, other]

    eess.IV cs.AI cs.CV

    FD-SOS: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images

    Authors: Marawan Elbatel, Keyuan Liu, Yanqi Yang, Xiaomeng Li

    Abstract: Accurate detection of bone fenestration and dehiscence (FD) is crucial for effective treatment planning in dentistry. While cone-beam computed tomography (CBCT) is the gold standard for evaluating FD, it comes with limitations such as radiation exposure, limited accessibility, and higher cost compared to intraoral images. In intraoral images, dentists face challenges in the differential diagnosis…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  45. Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models

    Authors: Peter Ince, Xiapu Luo, Jiangshan Yu, Joseph K. Liu, Xiaoning Du

    Abstract: In this paper, we test the hypothesis that although OpenAI's GPT-4 performs well generally, we can fine-tune open-source models to outperform GPT-4 in smart contract vulnerability detection. We fine-tune two models from Meta's Code Llama and a dataset of 17k prompts, Detect Llama - Foundation and Detect Llama - Instruct, and we also fine-tune OpenAI's GPT-3.5 Turbo model (GPT-3.5FT). We then evalu…

    Submitted 11 July, 2024; originally announced July 2024.

  46. arXiv:2407.08440  [pdf, other]

    cs.CL cs.AI

    Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models

    Authors: Wangtao Sun, Chenxiang Zhang, Xueyou Zhang, Ziyang Huang, Haotian Xu, Pei Chen, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Although Large Language Models (LLMs) have demonstrated strong instruction-following ability, they are further supposed to be controlled and guided by rules in real-world scenarios to be safe, accurate, and intelligent. This demands the possession of inferential rule-following capability of LLMs. However, few works have made a clear evaluation of the inferential rule-following capability of LLMs.…

    Submitted 18 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  47. arXiv:2407.07495  [pdf, other]

    cs.CL

    Bucket Pre-training is All You Need

    Authors: Hongtao Liu, Qiyao Peng, Qing Yang, Kai Liu, Hongyan Xu

    Abstract: Large language models (LLMs) have demonstrated exceptional performance across various natural language processing tasks. However, the conventional fixed-length data composition strategy for pretraining, which involves concatenating and splitting documents, can introduce noise and limit the model's ability to capture long-range dependencies. To address this, we first introduce three metrics for eva…

    Submitted 10 July, 2024; originally announced July 2024.

  48. arXiv:2407.05577  [pdf, other]

    cs.CV

    Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN

    Authors: Jiacheng Su, Kunhong Liu, Liyan Chen, Junfeng Yao, Qingsong Liu, Dongdong Lv

    Abstract: The existing methods for audio-driven talking head video editing have the limitations of poor visual effects. This paper tries to tackle this problem through editing talking face images seamless with different emotions based on two modules: (1) an audio-to-landmark module, consisting of the CrossReconstructed Emotion Disentanglement and an alignment network module. It bridges the gap between speec…

    Submitted 7 July, 2024; originally announced July 2024.

  49. arXiv:2407.02031  [pdf, other]

    cs.DC cs.AI cs.LG

    SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules

    Authors: Suyi Li, Lingyun Yang, Xiaoxiao Jiang, Hanfeng Lu, Zhipeng Di, Weiyi Lu, Jiawei Chen, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang

    Abstract: This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We first comprehensively analyze inference request traces for commercial text-to-image applications. It commences with our observation that add-on modules, i.e., ControlNets and LoRAs, that augment the base stable diffusion models, are ubiquitous in generatin…

    Submitted 2 July, 2024; originally announced July 2024.

  50. arXiv:2407.00187  [pdf, other]

    cs.RO cs.CV cs.GR

    SMPLOlympics: Sports Environments for Physically Simulated Humanoids

    Authors: Zhengyi Luo, Jiashun Wang, Kangni Liu, Haotian Zhang, Chen Tessler, Jingbo Wang, Ye Yuan, Jinkun Cao, Zihui Lin, Fengyi Wang, Jessica Hodgins, Kris Kitani

    Abstract: We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for m…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Project page: https://smplolympics.github.io/SMPLOlympics