Search | arXiv e-print repository

Showing 1–50 of 525 results for author: Pan, Y

Searching in archive cs.
  1. Adversarial Safety-Critical Scenario Generation using Naturalistic Human Driving Priors

    Authors: Kunkun Hao, Yonggang Luo, Wen Cui, Yuqiao Bai, Jucheng Yang, Songyang Yan, Yuxi Pan, Zijiang Yang

    Abstract: Evaluating the decision-making system is indispensable in developing autonomous vehicles, while realistic and challenging safety-critical test scenarios play a crucial role. Obtaining these scenarios is non-trivial due to the long-tailed distribution, sparsity, and rarity of such scenarios in real-world data sets. To tackle this problem, in this paper, we introduce a natural adversarial scenario generation solu…

    Submitted 6 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: Published in IEEE Transactions on Intelligent Vehicles, 2023

    Journal ref: IEEE Transactions on Intelligent Vehicles (2023)

  2. arXiv:2408.02906 [pdf, other]

    cs.CV

    Dual-View Pyramid Pooling in Deep Neural Networks for Improved Medical Image Classification and Confidence Calibration

    Authors: Xiaoqing Zhang, Qiushi Nie, Zunjie Xiao, Jilu Zhao, Xiao Wu, Pengxin Guo, Runzhi Li, Jin Liu, Yanjie Wei, Yi Pan

    Abstract: Spatial pooling (SP) and cross-channel pooling (CCP) operators have been applied to aggregate spatial features and pixel-wise features from feature maps in deep neural networks (DNNs), respectively. Their main goal is to reduce computation and memory overhead without visibly weakening the performance of DNNs. However, SP often faces the problem of losing the subtle feature representations, while C…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 27

  3. arXiv:2408.02394 [pdf, other]

    cs.CV cs.RO

    CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration

    Authors: Gongxin Yao, Yixin Xuan, Xinyang Li, Yu Pan

    Abstract: Image-to-point cloud registration aims to determine the relative camera pose of an RGB image with respect to a point cloud. It plays an important role in camera localization within pre-built LiDAR maps. Despite the modality gaps, most learning-based methods establish 2D-3D point correspondences in feature space without any feedback mechanism for iterative optimization, resulting in poor accuracy a…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

  4. arXiv:2408.02392 [pdf, other]

    cs.CV eess.IV

    MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval

    Authors: Gongxin Yao, Xinyang Li, Yixin Xuan, Yu Pan

    Abstract: Image-to-point cloud registration seeks to estimate their relative camera pose, which remains an open question due to the data modality gaps. The recent matching-based methods tend to tackle this by building 2D-3D correspondences. In this paper, we reveal the information loss inherent in these methods and propose a matching-free paradigm, named MaFreeI2P. Our key insight is to actively retrieve th…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted to IEEE Conference on Multimedia and Expo 2024

  5. arXiv:2408.01319 [pdf, other]

    cs.AI

    A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

    Authors: Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zhengliang Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang

    Abstract: In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of…

    Submitted 2 August, 2024; originally announced August 2024.

  6. arXiv:2408.01191 [pdf, other]

    cs.CV

    A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

    Authors: Ruitao Xie, Limai Jiang, Xiaoxi He, Yi Pan, Yunpeng Cai

    Abstract: Machine-based brain tumor segmentation can help doctors make better diagnoses. However, the complex structure of brain tumors and expensive pixel-level annotations present challenges for automatic tumor segmentation. In this paper, we propose a counterfactual generation framework that not only achieves exceptional brain tumor segmentation performance without the need for pixel-level annotations, b…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo

  7. arXiv:2407.21298 [pdf, other]

    cs.LG cs.AI q-bio.BM

    A Vectorization Method Induced By Maximal Margin Classification For Persistent Diagrams

    Authors: An Wu, Yu Pan, Fuqi Zhou, Jinghui Yan, Chuanlu Liu

    Abstract: Persistent homology is an effective method for extracting topological information, represented as persistent diagrams, of spatial structure data. Hence it is well-suited for the study of protein structures. Attempts to incorporate persistent homology in machine learning methods of protein function prediction have resulted in several techniques for vectorizing persistent diagrams. However, current…

    Submitted 30 July, 2024; originally announced July 2024.

  8. arXiv:2407.20619 [pdf, other]

    cs.RO

    ATI-CTLO: Adaptive Temporal Interval-based Continuous-Time LiDAR-Only Odometry

    Authors: Bo Zhou, Jiajie Wu, Yan Pan, Chuanzhao Lu

    Abstract: The motion distortion in LiDAR scans caused by aggressive robot motion and varying terrain features significantly impacts the positioning and mapping performance of 3D LiDAR odometry. Existing distortion correction solutions often struggle to balance computational complexity and accuracy. In this work, we propose an Adaptive Temporal Interval-based Continuous-Time LiDAR-only Odometry, utilizing st…

    Submitted 30 July, 2024; originally announced July 2024.

  9. arXiv:2407.20519 [pdf, other]

    cs.HC cs.AI

    DuA: Dual Attentive Transformer in Long-Term Continuous EEG Emotion Analysis

    Authors: Yue Pan, Qile Liu, Qing Liu, Li Zhang, Gan Huang, Xin Chen, Fali Li, Peng Xu, Zhen Liang

    Abstract: Affective brain-computer interfaces (aBCIs) are increasingly recognized for their potential in monitoring and interpreting emotional states through electroencephalography (EEG) signals. Current EEG-based emotion recognition methods perform well with short segments of EEG data. However, these methods encounter significant challenges in real-life scenarios where emotional states evolve over extended…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  10. arXiv:2407.19807 [pdf, other]

    cs.CL

    Cool-Fusion: Fuse Large Language Models without Training

    Authors: Cong Liu, Xiaojun Quan, Yan Pan, Liang Lin, Weigang Wu, Xu Chen

    Abstract: We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to facilitate their complementary strengths. One of the challenges in model fusion is the high computational load of fine-tuning or of aligning vocabularies via combinatorial optimization. To this end, we propose Cool-Fusion, a simple yet effective approach that fuses the knowledge of heterogeneous source…

    Submitted 29 July, 2024; originally announced July 2024.

  11. arXiv:2407.16732 [pdf, other]

    cs.SE cs.AI

    PyBench: Evaluating LLM Agent on various real-world coding tasks

    Authors: Yaolun Zhang, Yinxu Pan, Yudong Wang, Jie Cai

    Abstract: The LLM Agent, equipped with a code interpreter, is capable of automatically solving real-world coding tasks, such as data analysis and image editing. However, existing benchmarks primarily focus on either simplistic tasks, such as completing a few lines of code, or on extremely complex and specific tasks at the repository level, neither of which are representative of various daily coding tasks.…

    Submitted 2 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: 16 pages

  12. arXiv:2407.13304 [pdf, other]

    cs.CV cs.RO

    A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics

    Authors: Federico Magistri, Thomas Läbe, Elias Marks, Sumanth Nagulavancha, Yue Pan, Claus Smitt, Lasse Klingbeil, Michael Halstead, Heiner Kuhlmann, Chris McCool, Jens Behley, Cyrill Stachniss

    Abstract: As the population is expected to reach 10 billion by 2050, our agricultural production system needs to double its productivity despite a decline of human workforce in the agricultural sector. Autonomous robotic systems are one promising pathway to increase productivity by taking over labor-intensive manual tasks like fruit picking. To be effective, such systems need to monitor and interact with pl…

    Submitted 18 July, 2024; originally announced July 2024.

  13. EmoFace: Audio-driven Emotional 3D Face Animation

    Authors: Chang Liu, Qunfen Lin, Zijiao Zeng, Ye Pan

    Abstract: Audio-driven emotional 3D face animation aims to generate emotionally expressive talking heads with synchronized lip movements. However, previous research has often overlooked the influence of diverse emotions on facial expressions or proved unsuitable for driving MetaHuman models. In response to this deficiency, we introduce EmoFace, a novel audio-driven methodology for creating facial animations…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR). IEEE, 2024

    Journal ref: 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR). IEEE, 2024: 387-397

  14. arXiv:2407.10424 [pdf, other]

    cs.PL cs.AI

    CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization

    Authors: Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Ziyuan Nan, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: The increasing complexity and high costs associated with modern processor design have led to a surge in demand for processor design automation. Instruction-tuned large language models (LLMs) have demonstrated remarkable performance in automatically generating code for general-purpose programming languages like Python. However, these methods fail on hardware description languages (HDLs) like Verilo…

    Submitted 20 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 16 pages, 8 figures, conference

  15. arXiv:2407.10205 [pdf, other]

    quant-ph cs.ET math.CO

    Parallel Ising Annealer via Gradient-based Hamiltonian Monte Carlo

    Authors: Hao Wang, Zixuan Liu, Zhixin Xie, Langyu Li, Zibo Miao, Wei Cui, Yu Pan

    Abstract: Ising annealer is a promising quantum-inspired computing architecture for combinatorial optimization problems. In this paper, we introduce an Ising annealer based on the Hamiltonian Monte Carlo, which updates the variables of all dimensions in parallel. The main innovation is the fusion of an approximate gradient-based approach into the Ising annealer which introduces significant acceleration and…

    Submitted 14 July, 2024; originally announced July 2024.

  16. arXiv:2407.09808 [pdf, other]

    cs.NI

    SeqBalance: Congestion-Aware Load Balancing with no Reordering for RoCE

    Authors: Huimin Luo, Jiao Zhang, Mingxuan Yu, Yongchen Pan, Tian Pan, Tao Huang

    Abstract: Remote Direct Memory Access (RDMA) is widely used in data center networks because of its high performance. However, due to the characteristics of RDMA's retransmission strategy and the traffic mode of AI training, current load balancing schemes for data center networks are unsuitable for RDMA. In this paper, we propose SeqBalance, a load balancing framework designed for RDMA. SeqBalance implements…

    Submitted 13 July, 2024; originally announced July 2024.

  17. arXiv:2407.09783 [pdf, ps, other]

    cs.IT

    Infinite families of optimal and minimal codes over rings using simplicial complexes

    Authors: Yanan Wu, Tingting Pang, Nian Li, Yanbin Pan, Xiangyong Zeng

    Abstract: In this paper, several infinite families of codes over the extension of non-unital non-commutative rings are constructed utilizing general simplicial complexes. Thanks to the special structure of the defining sets, the principal parameters of these codes are characterized. In particular, when the employed simplicial complexes are generated by a single maximal element, we determine their Lee weight dis…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 26 pages

  18. arXiv:2407.05758 [pdf, other]

    eess.IV cs.AI cs.CV

    Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

    Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

    Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti…

    Submitted 8 July, 2024; originally announced July 2024.

  19. arXiv:2407.05643 [pdf, other]

    cs.IT eess.SP

    Spatial Non-Stationary Dual-Wideband Channel Estimation for XL-MIMO Systems

    Authors: Anzheng Tang, Jun-Bo Wang, Yijin Pan, Tuo Wu, Chuanwen Chang, Yijian Chen, Hongkang Yu, Maged Elkashlan

    Abstract: In this paper, we investigate the channel estimation problem for extremely large-scale multi-input and multi-output (XL-MIMO) systems, considering the spherical wavefront effect, spatially non-stationary (SnS) property, and dual-wideband effects. To accurately characterize the XL-MIMO channel, we first derive a novel spatial-and-frequency-domain channel model for XL-MIMO systems and carefully exam…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE journal for possible publication

  20. PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation

    Authors: Yinghua Yao, Yuangang Pan, Jing Li, Ivor Tsang, Xin Yao

    Abstract: Recent advancements in the realm of deep generative models focus on generating samples that satisfy multiple desired properties. However, prevalent approaches optimize these property functions independently, thus omitting the trade-offs among them. In addition, the property optimization is often improperly integrated into the generative models, resulting in an unnecessary compromise on generation…

    Submitted 5 July, 2024; originally announced July 2024.

    Journal ref: Machine Learning 2024

  21. arXiv:2407.02483 [pdf, other]

    cs.CL cs.AI

    MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

    Authors: Binxu Li, Tiankai Yan, Yuanting Pan, Zhe Xu, Jie Luo, Ruiyang Ji, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang

    Abstract: Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To brid…

    Submitted 2 July, 2024; originally announced July 2024.

  22. arXiv:2407.01250 [pdf, other]

    cs.LG cs.IT math.DS

    Metric-Entropy Limits on Nonlinear Dynamical System Learning

    Authors: Yang Pan, Clemens Hutter, Helmut Bölcskei

    Abstract: This paper is concerned with the fundamental limits of nonlinear dynamical system learning from input-output traces. Specifically, we show that recurrent neural networks (RNNs) are capable of learning nonlinear systems that satisfy a Lipschitz property and forget past inputs fast enough in a metric-entropy optimal manner. As the sets of sequence-to-sequence maps realized by the dynamical systems w…

    Submitted 1 July, 2024; originally announced July 2024.

  23. arXiv:2407.00247 [pdf, other]

    cs.CV

    Prompt Refinement with Image Pivot for Text-to-Image Generation

    Authors: Jingtao Zhan, Qingyao Ai, Yiqun Liu, Yingwei Pan, Ting Yao, Jiaxin Mao, Shaoping Ma, Tao Mei

    Abstract: For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement mod…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024

  24. arXiv:2406.19070 [pdf, other]

    cs.CV

    FAGhead: Fully Animate Gaussian Head from Monocular Videos

    Authors: Yixin Xuan, Xinyang Li, Gongxin Yao, Shiwei Zhou, Donghui Sun, Xiaoxin Chen, Yu Pan

    Abstract: High-fidelity reconstruction of 3D human avatars has wide applications in virtual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We explicit the traditional 3D morphable meshes (3DMM) and optimize the neutral 3D Gaussians to reconstruct with complex expressions. Furthermore, we employ a novel Point-based Learnable Repre…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  25. arXiv:2406.15501 [pdf]

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), seriously deteriorate the performance of time synchronization systems. A multiple-path scheme is considered an effective security countermeasure to decrease the influence of TDA. However, an effective secure combination algorithm is still…

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.13568 [pdf, other]

    cs.AI

    Trapezoidal Gradient Descent for Effective Reinforcement Learning in Spiking Networks

    Authors: Yuhao Pan, Xiucheng Wang, Nan Cheng, Qi Qiu

    Abstract: With the rapid development of artificial intelligence technology, the field of reinforcement learning has continuously achieved breakthroughs in both theory and practice. However, traditional reinforcement learning algorithms often entail high energy consumption during interactions with the environment. Spiking Neural Networks (SNNs), with their low energy consumption characteristics and performance…

    Submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.12373 [pdf, other]

    cs.CL cs.AI cs.LG

    WebCanvas: Benchmarking Web Agents in Online Environments

    Authors: Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu

    Abstract: For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web…

    Submitted 16 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Our platform, tool and dataset are publicly available at https://www.imean.ai/web-canvas/ and https://huggingface.co/datasets/iMeanAI/Mind2Web-Live/

    MSC Class: 68T50

    ACM Class: I.2.7

  28. arXiv:2406.07961 [pdf, other]

    cs.CV cs.AI

    Accurate Explanation Model for Image Classifiers using Class Association Embedding

    Authors: Ruitao Xie, Jingbang Chen, Limai Jiang, Rui Xiao, Yi Pan, Yunpeng Cai

    Abstract: Image classification is a primary task in data analysis where explainable models are crucially demanded in various applications. Although many methods have been proposed to obtain explainable knowledge from black-box classifiers, these approaches cannot efficiently extract global knowledge regarding the classification task; they are thus vulnerable to local traps and often lead to poor…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 40th IEEE International Conference on Data Engineering

  29. arXiv:2406.07294 [pdf, other]

    cs.RO cs.CV

    OTO Planner: An Efficient Only Travelling Once Exploration Planner for Complex and Unknown Environments

    Authors: Bo Zhou, Chuanzhao Lu, Yan Pan, Fu Chen

    Abstract: Autonomous exploration in complex and cluttered environments is essential for various applications. However, there are many challenges due to the lack of global heuristic information. Existing exploration methods suffer from the repeated paths and considerable computational resource requirement in large-scale environments. To address the above issues, this letter proposes an efficient exploration…

    Submitted 11 June, 2024; originally announced June 2024.

  30. arXiv:2406.01894 [pdf, other]

    cs.CV

    SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks

    Authors: Yi Pan, Jun-Jie Huang, Zihan Chen, Wentao Zhao, Ziyue Wang

    Abstract: Robust and imperceptible adversarial video attack is challenging due to the spatial and temporal characteristics of videos. The existing video adversarial attack methods mainly take a gradient-based approach and generate adversarial videos with noticeable perturbations. In this paper, we propose a novel Sparse Adversarial Video Attack via Spatio-Temporal Invertible Neural Networks (SVASTIN) to gen…

    Submitted 3 June, 2024; originally announced June 2024.

  31. arXiv:2405.18610 [pdf, other]

    cs.LG cs.AI

    DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime

    Authors: Zhiyao Luo, Mingcheng Zhu, Fenglin Liu, Jiali Li, Yangchen Pan, Jiandong Zhou, Tingting Zhu

    Abstract: Reinforcement learning (RL) has garnered increasing recognition for its potential to optimise dynamic treatment regimes (DTRs) in personalised medicine, particularly for drug dosage prescriptions and medication recommendations. However, a significant challenge persists: the absence of a unified framework for simulating diverse healthcare scenarios and a comprehensive analysis to benchmark the effe…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 13 pages for main content

  32. arXiv:2405.18556 [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination

    Authors: Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu

    Abstract: In the rapidly changing healthcare landscape, the implementation of offline reinforcement learning (RL) in dynamic treatment regimes (DTRs) presents a mix of unprecedented opportunities and challenges. This position paper offers a critical examination of the current status of offline RL in the context of DTRs. We argue for a reassessment of applying RL in DTRs, citing concerns such as inconsistent…

    Submitted 3 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024. 9 pages for main content, 34 pages in total

  33. arXiv:2405.17370 [pdf, other]

    eess.SY cs.LG

    Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators

    Authors: Yunian Pan, Quanyan Zhu

    Abstract: Meta-learning has been proposed as a promising machine learning topic in recent years, with important applications to image classification, robotics, computer games, and control systems. In this paper, we study the problem of using meta-learning to deal with uncertainty and heterogeneity in ergodic linear quadratic regulators. We integrate the zeroth-order optimization technique with a typical met…

    Submitted 27 May, 2024; originally announced May 2024.

  34. arXiv:2405.16754 [pdf, other]

    cs.RO

    Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning

    Authors: Youqi Pan, Wugen Zhou, Yingdian Cao, Hongbin Zha

    Abstract: Visual-inertial odometry (VIO) has demonstrated remarkable success due to its low-cost and complementary sensors. However, existing VIO methods lack the generalization ability to adjust to different environments and sensor attributes. In this paper, we propose Adaptive VIO, a new monocular visual-inertial odometry that combines online continual learning with traditional nonlinear optimization. Ada…

    Submitted 26 May, 2024; originally announced May 2024.

  35. arXiv:2405.14219 [pdf, other]

    cs.LG cs.AI

    Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making

    Authors: Hanzhao Wang, Yu Pan, Fupeng Sun, Shang Liu, Kalyan Talluri, Guanting Chen, Xiaocheng Li

    Abstract: In this paper, we consider the supervised pretrained transformer for a class of sequential decision-making problems. The class of considered problems is a subset of the general formulation of reinforcement learning in that there is no transition probability matrix, and the class of problems covers bandits, dynamic pricing, and newsvendor problems as special cases. Such a structure enables the use…

    Submitted 23 May, 2024; originally announced May 2024.

  36. arXiv:2405.12891 [pdf, other]

    cs.CV

    DARK: Denoising, Amplification, Restoration Kit

    Authors: Zhuoheng Li, Yuheng Pan, Houcheng Yu, Zhiheng Zhang

    Abstract: This paper introduces a novel lightweight computational framework for enhancing images under low-light conditions, utilizing advanced machine learning and convolutional neural networks (CNNs). Traditional enhancement techniques often fail to adequately address issues like noise, color distortion, and detail loss in challenging lighting environments. Our approach leverages insights from the Retinex…

    Submitted 21 May, 2024; originally announced May 2024.

  37. arXiv:2405.11993 [pdf, other]

    cs.CV

    GGAvatar: Geometric Adjustment of Gaussian Head Avatar

    Authors: Xinyang Li, Jiaxin Wang, Yixin Xuan, Gongxin Yao, Yu Pan

    Abstract: We propose GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations. GGAvatar employs a coarse-to-fine structure, featuring two core modules: Neutral Gaussian Initialization Module and Geometry Morph Adjuster. Neutral Gaussian Initialization Module pairs Gaussian primitives with deformable triangular meshes, employing an ad…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 9 pages, 5 figures

  38. arXiv:2405.03388 [pdf, other]

    cs.CV cs.RO

    3D LiDAR Mapping in Dynamic Environments Using a 4D Implicit Neural Representation

    Authors: Xingguang Zhong, Yue Pan, Cyrill Stachniss, Jens Behley

    Abstract: Building accurate maps is a key building block to enable reliable localization, planning, and navigation of autonomous vehicles. We propose a novel approach for building accurate maps of dynamic environments utilizing a sequence of LiDAR scans. To this end, we propose encoding the 4D scene into a novel spatio-temporal implicit neural map representation by fitting a time-dependent truncated signed…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 10 pages, CVPR 2024

  39. arXiv:2405.02151 [pdf, other]

    cs.SD cs.AI eess.AS

    GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition

    Authors: Yu Pan, Yuguang Yang, Heng Lu, Lei Ma, Jianjun Zhao

    Abstract: The continuous evolution of pre-trained speech models has greatly advanced Speech Emotion Recognition (SER). However, current research typically relies on utterance-level emotion labels, inadequately capturing the complexity of emotions within a single utterance. In this paper, we introduce GMP-TL, a novel SER framework that employs gender-augmented multi-scale pseudo-label (GMP) based transfer le…

    Submitted 16 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  40. arXiv:2404.17484 [pdf, other]

    cs.CV eess.IV

    Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model

    Authors: Zhenghong Li, Jiaxiang Ren, Wensheng Cheng, Congwu Du, Yingtian Pan, Haibin Ling

    Abstract: Optical Doppler Tomography (ODT) is a blood flow imaging technique popularly used in bioengineering applications. The fundamental unit of ODT is the 1D frequency response along the A-line (depth), named raw A-scan. A 2D ODT image (B-scan) is obtained by first sensing raw A-scans along the B-line (width), and then constructing the B-scan from these raw A-scans via magnitude-phase analysis and post-…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 19 pages, 5 figures

  41. arXiv:2404.17164 [pdf, other]

    cs.LG

    DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs

    Authors: Xindi Zheng, Yuwei Wu, Yu Pan, Wanyu Lin, Lei Ma, Jianjun Zhao

    Abstract: Missing data imputation poses a paramount challenge when dealing with graph data. Prior works are typically based on feature propagation or graph autoencoders to address this issue. However, these methods usually encounter the over-smoothing issue when dealing with missing data, as the graph neural network (GNN) modules are not explicitly designed for handling missing data. This paper proposes a n…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 9 pages

  42. arXiv:2404.15993 [pdf, other]

    cs.LG cs.CL

    Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach

    Authors: Linyu Liu, Yu Pan, Xiaocheng Li, Guanting Chen

    Abstract: In this paper, we study the problem of uncertainty estimation and calibration for LLMs. We first formulate the uncertainty estimation problem for LLMs and then propose a supervised approach that takes advantage of the labeled datasets and estimates the uncertainty of the LLMs' responses. Based on the formulation, we illustrate the difference between the uncertainty estimation for LLMs and that for…

    Submitted 28 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: 29 pages, 14 figures

    MSC Class: 68T07; 68T50

  43. arXiv:2404.15518

    cs.LG cs.AI

    An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

    Authors: Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr

    Abstract: In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) following an unknown probability distribution. This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling. We reformulate the typical supervised learning as an on-policy policy evaluati…

    Submitted 16 July, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  44. arXiv:2404.12154

    cs.CV

    StyleBooth: Image Style Editing with Multimodal Instruction

    Authors: Zhen Han, Chaojie Mao, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang

    Abstract: Given an original image, image editing aims to generate an image that aligns with the provided instruction. The challenges are accepting multimodal inputs as instructions and the scarcity of high-quality training data, including crucial triplets of source/target image pairs and multimodal (text and image) instructions. In this paper, we focus on image style editing and present StyleBooth, a method th…

    Submitted 18 April, 2024; originally announced April 2024.

  45. arXiv:2404.10311

    eess.SY cs.AI

    Learning and Optimization for Price-based Demand Response of Electric Vehicle Charging

    Authors: Chengyang Gu, Yuxin Pan, Ruohong Liu, Yize Chen

    Abstract: In the context of charging electric vehicles (EVs), price-based demand response (PBDR) is becoming increasingly significant for charging load management. Such response usually encourages cost-sensitive customers to adjust their energy demand in response to changes in price for financial incentives. Thus, to model and optimize EV charging, it is important for charging station operators to model…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by American Control Conference (ACC) 2024

  46. arXiv:2404.08939

    cs.RO cs.AI cs.HC

    NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT

    Authors: Xinzhe Zheng, Sijie Ji, Yipeng Pan, Kaiwen Zhang, Chenshu Wu

    Abstract: Inertial tracking is vital for robotic IoT and has gained popularity thanks to the ubiquity of low-cost Inertial Measurement Units (IMUs) and deep learning-powered tracking algorithms. Existing works, however, have not fully utilized IMU measurements, particularly magnetometers, nor maximized the potential of deep learning to achieve the desired accuracy. To enhance the tracking accuracy for indoo…

    Submitted 13 April, 2024; originally announced April 2024.

  47. arXiv:2404.08201

    eess.IV cs.CV

    A Mutual Inclusion Mechanism for Precise Boundary Segmentation in Medical Images

    Authors: Yizhi Pan, Junyi Xin, Tianhua Yang, Teeradaj Racharak, Le-Minh Nguyen, Guanqun Sun

    Abstract: In medical imaging, accurate image segmentation is crucial for quantifying diseases, assessing prognosis, and evaluating treatment outcomes. However, existing methods lack an in-depth integration of global and local features, failing to pay special attention to abnormal regions and boundary details in medical images. To this end, we present a novel deep learning-based approach, MIPC-Net, for preci…

    Submitted 11 April, 2024; originally announced April 2024.

  48. arXiv:2404.04308

    cs.AI cs.CV cs.LG

    Visual Knowledge in the Big Model Era: Retrospect and Prospect

    Authors: Wenguan Wang, Yi Yang, Yunhe Pan

    Abstract: Visual knowledge is a new form of knowledge representation that can encapsulate visual concepts and their relations in a succinct, comprehensive, and interpretable manner, with a deep root in cognitive psychology. As the knowledge about the visual world has been identified as an indispensable component of human cognition and intelligence, visual knowledge is poised to have a pivotal role in establ…

    Submitted 5 April, 2024; originally announced April 2024.

  49. arXiv:2404.02702

    cs.SD cs.AI

    PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders

    Authors: Yu Pan, Lei Ma, Jianjun Zhao

    Abstract: Neural speech codecs have recently gained widespread attention in generative speech modeling domains, such as voice conversion and text-to-speech synthesis. However, ensuring high-fidelity audio reconstruction by speech codecs at low bitrates remains an open and challenging issue. In this paper, we propose PromptCodec, a novel end-to-end neural speech codec using feature-aware prompt encoders based…

    Submitted 13 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 7

  50. arXiv:2404.01647

    cs.CV

    EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

    Authors: Shuai Tan, Bin Ji, Mengxiao Bi, Ye Pan

    Abstract: Achieving disentangled control over multiple facial motions and accommodating diverse input modalities greatly enhances the applicability and entertainment value of talking head generation. This necessitates a deep exploration of the decoupling space for facial features, ensuring that they a) operate independently without mutual interference and b) can be preserved to share with different modal input,…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 22 pages, 15 figures