Search | arXiv e-print repository

Showing 1–50 of 251 results for author: Ji, X

Searching in archive cs.
  1. Self-Supervised Contrastive Graph Clustering Network via Structural Information Fusion

    Authors: Xiaoyang Ji, Yuchen Zhou, Haofu Yang, Shiyue Xu, Jiahao Li

    Abstract: Graph clustering, a classical task in graph learning, involves partitioning the nodes of a graph into distinct clusters. This task has applications in various real-world scenarios, such as anomaly detection, social network analysis, and community discovery. Current graph clustering methods commonly rely on module pre-training to obtain a reliable prior distribution for the model, which is then use…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

    Journal ref: 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Tianjin, China, 2024, pp. 254-259

  2. arXiv:2408.03703 [pdf, other]

    cs.CV

    CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications

    Authors: Tianfang Zhang, Lei Li, Yang Zhou, Wentao Liu, Chen Qian, Xiangyang Ji

    Abstract: Vision Transformers (ViTs) mark a revolutionary advance in neural networks with their token mixer's powerful global context capability. However, the pairwise token affinity and complex matrix operations limit its deployment on resource-constrained scenarios and real-time applications, such as mobile devices, although considerable efforts have been made in previous works. In this paper, we introduc…

    Submitted 7 August, 2024; originally announced August 2024.

  3. arXiv:2408.02045 [pdf, other]

    stat.ML cs.LG

    DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation

    Authors: Qinshuo Liu, Zixin Wang, Xi-An Li, Xinyao Ji, Lei Zhang, Lin Liu, Zhonghua Liu

    Abstract: Semiparametric statistics play a pivotal role in a wide range of domains, including but not limited to missing data, causal inference, and transfer learning, to name a few. In many settings, semiparametric theory leads to (nearly) statistically optimal procedures that yet involve numerically solving Fredholm integral equations of the second kind. Traditional numerical methods, such as polynomial o…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: semiparametric statistics, missing data, causal inference, Fredholm integral equations of the second kind, bi-level optimization, deep learning, AI for science

  4. arXiv:2408.00297 [pdf, other]

    cs.CV

    EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

    Authors: Qianyun He, Xinya Ji, Yicheng Gong, Yuanxun Lu, Zhengyu Diao, Linjia Huang, Yao Yao, Siyu Zhu, Zhan Ma, Songcen Xu, Xiaofei Wu, Zixiao Zhang, Xun Cao, Hao Zhu

    Abstract: We present a novel approach for synthesizing 3D talking heads with controllable emotion, featuring enhanced lip synchronization and rendering quality. Despite significant progress in the field, prior methods still suffer from multi-view consistency and a lack of emotional expressiveness. To address these issues, we collect EmoTalk3D dataset with calibrated multi-view videos, emotional annotations,…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  5. arXiv:2407.19523 [pdf, other]

    cs.LG

    Robust Fast Adaptation from Adversarially Explicit Task Distribution Generation

    Authors: Cheems Wang, Yiqin Lv, Yixiu Mao, Yun Qu, Yi Xu, Xiangyang Ji

    Abstract: Meta-learning is a practical learning paradigm to transfer skills across tasks from a few examples. Nevertheless, the existence of task distribution shifts tends to weaken meta-learners' generalization capability, particularly when the task distribution is naively hand-crafted or based on simple priors that fail to cover typical scenarios sufficiently. Here, we consider explicitly generative model…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: The project is available at https://sites.google.com/view/ar-metalearn

  6. arXiv:2407.13237 [pdf, other]

    cs.AI

    LLM-Empowered State Representation for Reinforcement Learning

    Authors: Boyuan Wang, Yun Qu, Yuhang Jiang, Jianzhun Shao, Chang Liu, Wenming Yang, Xiangyang Ji

    Abstract: Conventional state representations in reinforcement learning often omit critical task-related details, presenting a significant challenge for value networks in establishing accurate mappings from states to task rewards. Traditional methods typically depend on extensive sample learning to enrich state representations with task-specific information, which leads to low sample efficiency and high time…

    Submitted 18 July, 2024; originally announced July 2024.

  7. arXiv:2407.12820 [pdf, other]

    cs.CL cs.AI cs.LG

    PQCache: Product Quantization-based KVCache for Long Context LLM Inference

    Authors: Hailin Zhang, Xiaodong Ji, Yilin Chen, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Weipeng Chen, Bin Cui

    Abstract: As the field of Large Language Models (LLMs) continues to evolve, the context length in inference is steadily growing. Key-Value Cache (KVCache), a crucial component in LLM inference, has now become the primary memory bottleneck due to limited GPU memory. Current methods selectively determine suitable keys and values for self-attention computation in LLMs to address the issue. However, they either…

    Submitted 1 July, 2024; originally announced July 2024.

  8. On Diversity in Discriminative Neural Networks

    Authors: Brahim Oubaha, Claude Berrou, Xueyao Ji, Yehya Nasser, Raphaël Le Bidan

    Abstract: Diversity is a concept of prime importance in almost all disciplines based on information processing. In telecommunications, for example, spatial, temporal, and frequency diversity, as well as redundant coding, are fundamental concepts that have enabled the design of extremely efficient systems. In machine learning, in particular with neural networks, diversity is not always a concept that is emph…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Published in: 2024 IEEE 12th International Symposium on Signal, Image, Video and Communications (ISIVC)

  9. arXiv:2407.10528 [pdf, other]

    cs.CV

    Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

    Authors: Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Runyi Yu, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen

    Abstract: Text-to-motion generation requires not only grounding local actions in language but also seamlessly blending these individual actions to synthesize diverse and realistic global motions. However, existing motion generation methods primarily focus on the direct synthesis of global motions while neglecting the importance of generating and controlling local actions. In this paper, we propose the local…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  10. arXiv:2407.10267 [pdf, other]

    cs.CV

    RS-NeRF: Neural Radiance Fields from Rolling Shutter Images

    Authors: Muyao Niu, Tong Chen, Yifan Zhan, Zhuoxiao Li, Xiang Ji, Yinqiang Zheng

    Abstract: Neural Radiance Fields (NeRFs) have become increasingly popular because of their impressive ability for novel view synthesis. However, their effectiveness is hindered by the Rolling Shutter (RS) effects commonly found in most camera systems. To solve this, we present RS-NeRF, a method designed to synthesize normal images from novel views using input with RS distortions. This involves a physical mo…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 ; Codes and data: https://github.com/MyNiuuu/RS-NeRF

  11. arXiv:2407.09642 [pdf, other]

    cs.LG

    Seq-to-Final: A Benchmark for Tuning from Sequential Distributions to a Final Time Point

    Authors: Christina X Ji, Ahmed M Alaa, David Sontag

    Abstract: Distribution shift over time occurs in many settings. Leveraging historical data is necessary to learn a model for the last time point when limited data is available in the final period, yet few methods have been developed specifically for this purpose. In this work, we construct a benchmark with different sequences of synthetic shifts to evaluate the effectiveness of 3 classes of methods that 1)…

    Submitted 12 July, 2024; originally announced July 2024.

  12. arXiv:2407.00608 [pdf, other]

    cs.AI cs.CL cs.CV

    Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace

    Authors: Shian Du, Xiaotian Cheng, Qi Qian, Henglu Wei, Yi Xu, Xiangyang Ji

    Abstract: Personalized text-to-image generation has attracted unprecedented attention in the recent few years due to its unique capability of generating highly-personalized images via using the input concept dataset and novel textual prompt. However, previous methods solely focus on the performance of the reconstruction task, degrading its ability to combine with different textual prompt. Besides, optimizin…

    Submitted 30 June, 2024; originally announced July 2024.

  13. arXiv:2406.18284 [pdf, other]

    cs.CV

    RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

    Authors: Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Junwei Zhu, Xiaobin Hu, Donghao Luo, Yanhao Ge, Chengjie Wang

    Abstract: Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical applications. The challenges are two-fold: 1) Preserving unique individual traits for achieving high-precision lip synchronization. 2) Generating high-qual…

    Submitted 8 August, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  14. arXiv:2406.14969 [pdf, other]

    cs.LG cs.AI

    Uni-Mol2: Exploring Molecular Pretraining Model at Scale

    Authors: Xiaohong Ji, Zhen Wang, Zhifeng Gao, Hang Zheng, Linfeng Zhang, Guolin Ke, Weinan E

    Abstract: In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences. The significant advancements in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as the scaling laws. However, research exploring scaling law in molecular pretraining mo…

    Submitted 1 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  15. arXiv:2406.10744 [pdf, other]

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  16. arXiv:2406.07828 [pdf, other]

    cs.CV

    Spatial Annealing Smoothing for Efficient Few-shot Neural Rendering

    Authors: Yuru Xiao, Xianming Liu, Deming Zhai, Kui Jiang, Junjun Jiang, Xiangyang Ji

    Abstract: Neural Radiance Fields (NeRF) with hybrid representations have shown impressive capabilities in reconstructing scenes for view synthesis, delivering high efficiency. Nonetheless, their performance significantly drops with sparse view inputs, due to the issue of overfitting. While various regularization strategies have been devised to address these challenges, they often depend on inefficient assum…

    Submitted 11 June, 2024; originally announced June 2024.

  17. arXiv:2406.04274 [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models

    Authors: Xiang Ji, Sanjeev Kulkarni, Mengdi Wang, Tengyang Xie

    Abstract: This work studies the challenge of aligning large language models (LLMs) with offline preference data. We focus on alignment by Reinforcement Learning from Human Feedback (RLHF) in particular. While popular preference optimization methods exhibit good empirical performance in practice, they are not theoretically guaranteed to converge to the optimal policy and can provably fail when the data cover…

    Submitted 6 June, 2024; originally announced June 2024.

  18. arXiv:2405.18300 [pdf, other]

    cs.AI

    CompetEvo: Towards Morphological Evolution from Competition

    Authors: Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu

    Abstract: Training an agent to adapt to specific tasks through co-optimization of morphology and control has widely attracted attention. However, whether there exists an optimal configuration and tactics for agents in a multiagent competition scenario is still an issue that is challenging to definitively conclude. In this context, we propose competitive evolution (CompetEvo), which co-evolves agents' design…

    Submitted 28 May, 2024; originally announced May 2024.

  19. arXiv:2405.08886 [pdf, other]

    cs.LG stat.ML

    The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

    Authors: Ziquan Liu, Yufei Cui, Yan Yan, Yi Xu, Xiangyang Ji, Xue Liu, Antoni B. Chan

    Abstract: In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness th…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: ICML2024

  20. arXiv:2405.07626 [pdf, other]

    cs.LG cs.AI

    AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models

    Authors: Shuo Liu, Di Yao, Lanting Fang, Zhetao Li, Wenbin Li, Kaiyu Feng, XiaoWen Ji, Jingping Bi

    Abstract: Detecting anomaly edges for dynamic graphs aims to identify edges significantly deviating from the normal pattern and can be applied in various domains, such as cybersecurity, financial transactions and AIOps. With the evolving of time, the types of anomaly edges are emerging and the labeled anomaly samples are few for each type. Current methods are either designed to detect randomly inserted edge…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 13pages

  21. arXiv:2405.06536 [pdf, other]

    cs.CV

    Mesh Denoising Transformer

    Authors: Wenbo Zhao, Xianming Liu, Deming Zhai, Junjun Jiang, Xiangyang Ji

    Abstract: Mesh denoising, aimed at removing noise from input meshes while preserving their feature structures, is a practical yet challenging task. Despite the remarkable progress in learning-based mesh denoising methodologies in recent years, their network designs often encounter two principal drawbacks: a dependence on single-modal geometric representations, which fall short in capturing the multifaceted…

    Submitted 10 May, 2024; originally announced May 2024.

  22. arXiv:2405.00718 [pdf, other]

    cs.CL cs.AI

    Can't say cant? Measuring and Reasoning of Dark Jargons in Large Language Models

    Authors: Xu Ji, Jianyi Zhang, Ziyin Zhou, Zhangchi Zhao, Qianqian Qiao, Kaiying Han, Md Imran Hossen, Xiali Hei

    Abstract: Ensuring the resilience of Large Language Models (LLMs) against malicious exploitation is paramount, with recent focus on mitigating offensive responses. Yet, the understanding of cant or dark jargon remains unexplored. This paper introduces a domain-specific Cant dataset and CantCounter evaluation framework, employing Fine-Tuning, Co-Tuning, Data-Diffusion, and Data-Analysis stages. Experiments r…

    Submitted 25 April, 2024; originally announced May 2024.

  23. arXiv:2404.12768 [pdf, other]

    cs.CV cs.AI cs.GR

    MixLight: Borrowing the Best of both Spherical Harmonics and Gaussian Models

    Authors: Xinlong Ji, Fangneng Zhan, Shijian Lu, Shi-Sheng Huang, Hua Huang

    Abstract: Accurately estimating scene lighting is critical for applications such as mixed reality. Existing works estimate illumination by generating illumination maps or regressing illumination parameters. However, the method of generating illumination maps has poor generalization performance and parametric models such as Spherical Harmonic (SH) and Spherical Gaussian (SG) fall short in capturing high-freq…

    Submitted 19 April, 2024; originally announced April 2024.

  24. arXiv:2404.06666 [pdf, other]

    cs.CV cs.AI cs.CL cs.CR

    SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models

    Authors: Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, Wenyuan Xu

    Abstract: Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable performance in generating high-quality images from text descriptions in recent years. However, text-to-image models may be tricked into generating not-safe-for-work (NSFW) content, particularly in sexual scenarios. Existing countermeasures mostly focus on filtering inappropriate inputs and outputs, or suppressing impro…

    Submitted 9 April, 2024; originally announced April 2024.

    Journal ref: ACM Conference on Computer and Communications Security (CCS 2024)

  25. arXiv:2404.03962 [pdf, other]

    cs.CV

    RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications

    Authors: Xingyu Liu, Chenyangguang Zhang, Gu Wang, Ruida Zhang, Xiangyang Ji

    Abstract: In robotic vision, a de-facto paradigm is to learn in simulated environments and then transfer to real-world applications, which poses an essential challenge in bridging the sim-to-real domain gap. While mainstream works tackle this problem in the RGB domain, we focus on depth data synthesis and develop a range-aware RGB-D data simulation pipeline (RaSim). In particular, high-fidelity depth data i…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: accepted by ICRA'24

  26. arXiv:2404.01120 [pdf, other]

    cs.CV

    Motion Blur Decomposition with Cross-shutter Guidance

    Authors: Xiang Ji, Haiyang Jiang, Yinqiang Zheng

    Abstract: Motion blur is a frequently observed image artifact, especially under insufficient illumination where exposure time has to be prolonged so as to collect more photons for a bright enough image. Rather than simply removing such blurring effects, recent researches have aimed at decomposing a blurry image into multiple sharp images with spatial and temporal coherence. Since motion blur decomposition i…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  27. arXiv:2404.00992 [pdf, other]

    cs.CV

    SGCNeRF: Few-Shot Neural Rendering via Sparse Geometric Consistency Guidance

    Authors: Yuru Xiao, Xianming Liu, Deming Zhai, Kui Jiang, Junjun Jiang, Xiangyang Ji

    Abstract: Neural Radiance Field (NeRF) technology has made significant strides in creating novel viewpoints. However, its effectiveness is hampered when working with sparsely available views, often leading to performance dips due to overfitting. FreeNeRF attempts to overcome this limitation by integrating implicit geometry regularization, which incrementally improves both geometry and textures. Nonetheless,…

    Submitted 17 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  28. arXiv:2403.18512 [pdf, other]

    cs.CV

    ParCo: Part-Coordinating Text-to-Motion Synthesis

    Authors: Qiran Zou, Shangyuan Yuan, Shian Du, Yu Wang, Chang Liu, Yi Xu, Jie Chen, Xiangyang Ji

    Abstract: We study a challenging task: text-to-motion synthesis, aiming to generate motions that align with textual descriptions and exhibit coordinated movements. Currently, the part-based methods introduce part partition into the motion synthesis process to achieve finer-grained generation. However, these methods encounter challenges such as the lack of coordination between different part motions and diff…

    Submitted 23 July, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV 2024. Code: https://github.com/qrzou/ParCo

  29. arXiv:2403.17994 [pdf, other]

    cs.CV cs.LG

    Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023

    Authors: Hongpeng Pan, Yang Yang, Zhongtian Fu, Yuxuan Zhang, Shian Du, Yi Xu, Xiangyang Ji

    Abstract: This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any physical surface through a video. Several existing approaches have explored the TAP by considering the temporal relationships to obtain smooth point motion trajectories, however, they still suffer from the cumulative error caused by temporal prediction. To address this issue, we propose a simple yet eff…

    Submitted 26 March, 2024; originally announced March 2024.

  30. arXiv:2403.17094 [pdf, other]

    cs.CV cs.LG

    SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving

    Authors: Yiming Xie, Henglu Wei, Zhenyi Liu, Xiaoyu Wang, Xiangyang Ji

    Abstract: To advance research in learning-based defogging algorithms, various synthetic fog datasets have been developed. However, existing datasets created using the Atmospheric Scattering Model (ASM) or real-time rendering engines often struggle to produce photo-realistic foggy images that accurately mimic the actual imaging process. This limitation hinders the effective generalization of models from synt…

    Submitted 25 March, 2024; originally announced March 2024.

  31. arXiv:2403.16561 [pdf, other]

    cs.LG cs.AI

    FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

    Authors: Xinyuan Ji, Zhaowei Zhu, Wei Xi, Olga Gadyatskaya, Zilong Song, Yong Cai, Yang Liu

    Abstract: Federated Learning (FL) heavily depends on label quality for its performance. However, the label distribution among individual clients is always both noisy and heterogeneous. The high loss incurred by client-specific samples in heterogeneous label noise poses challenges for distinguishing between client-specific and noisy label samples, impacting the effectiveness of existing label noise learning…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: accepted by AAAI24

  32. arXiv:2403.16034 [pdf, other]

    cs.CV

    V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception

    Authors: Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, Jiaqi Ma

    Abstract: Recent advancements in Vehicle-to-Everything (V2X) technologies have enabled autonomous vehicles to share sensing information to see through occlusions, greatly boosting the perception capability. However, there are no real-world datasets to facilitate the real V2X cooperative perception research -- existing datasets either only support Vehicle-to-Infrastructure cooperation or Vehicle-to-Vehicle c…

    Submitted 24 March, 2024; originally announced March 2024.

  33. arXiv:2403.12316 [pdf, other]

    cs.CL

    OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety

    Authors: Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, Ling Shi, Junhui Zhang, Xinmeng Ji, Tingting Cui, Tao Liu, Jinwang Song, Hongying Zan, Sun Li, Deyi Xiong

    Abstract: The rapid development of Chinese large language models (LLMs) poses big challenges for efficient LLM evaluation. While current initiatives have introduced new benchmarks or evaluation platforms for assessing Chinese LLMs, many of these focus primarily on capabilities, usually overlooking potential alignment and safety issues. To address this gap, we introduce OpenEval, an evaluation testbed that b…

    Submitted 18 March, 2024; originally announced March 2024.

  34. arXiv:2403.10487 [pdf, other]

    cs.RO cs.AI

    Stimulate the Potential of Robots via Competition

    Authors: Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu

    Abstract: It is common for us to feel pressure in a competition environment, which arises from the desire to obtain success comparing with other individuals or opponents. Although we might get anxious under the pressure, it could also be a drive for us to stimulate our potentials to the best in order to keep up with others. Inspired by this, we propose a competitive learning framework which is able to help…

    Submitted 15 March, 2024; originally announced March 2024.

  35. arXiv:2403.10099 [pdf, other]

    cs.CV

    KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation

    Authors: Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji

    Abstract: In this paper, we present KP-RED, a unified KeyPoint-driven REtrieval and Deformation framework that takes object scans as input and jointly retrieves and deforms the most geometrically similar CAD models from a pre-processed database to tightly match the target. Unlike existing dense matching based methods that typically struggle with noisy partial scans, we propose to leverage category-consisten…

    Submitted 20 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  36. arXiv:2403.06775 [pdf, other]

    cs.CV

    FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

    Authors: Pengchong Qiao, Lei Shang, Chang Liu, Baigui Sun, Xiangyang Ji, Jie Chen

    Abstract: Subject-driven generation has garnered significant interest recently due to its ability to personalize text-to-image generation. Typical works focus on learning the new subject's private attributes. However, an important fact has not been taken seriously that a subject is not an isolated new concept but should be a specialization of a certain category in the pre-trained model. This results in the…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: accepted by CVPR2024

  37. arXiv:2403.06461 [pdf, other]

    cs.CV

    Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation

    Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Xingyu Ji, Shenghai Yuan, Lihua Xie

    Abstract: Multi-modal test-time adaptation (MM-TTA) is proposed to adapt models to an unlabeled target domain by leveraging the complementary multi-modal inputs in an online manner. Previous MM-TTA methods for 3D segmentation rely on predictions of cross-modal information in each input frame, while they ignore the fact that predictions of geometric neighborhoods within consecutive frames are highly correlat…

    Submitted 25 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  38. arXiv:2403.06202 [pdf, other]

    eess.SY cs.GT

    Pursuit Winning Strategies for Reach-Avoid Games with Polygonal Obstacles

    Authors: Rui Yan, Shuai Mi, Xiaoming Duan, Jintao Chen, Xiangyang Ji

    Abstract: This paper studies a multiplayer reach-avoid differential game in the presence of general polygonal obstacles that block the players' motions. The pursuers cooperate to protect a convex region from the evaders who try to reach the region. We propose a multiplayer onsite and close-to-goal (MOCG) pursuit strategy that can tell and achieve an increasing lower bound on the number of guaranteed defeate…

    Submitted 22 May, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: 16 pages, 10 figures

  39. arXiv:2403.06168 [pdf, other]

    cs.CV cs.AI

    DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

    Authors: Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji

    Abstract: Due to the difficulty and labor-consuming nature of getting highly accurate or matting annotations, there only exists a limited amount of highly accurate labels available to the public. To tackle this challenge, we propose a DiffuMatting which inherits the strong Everything generation ability of diffusion and endows the power of "matting anything". Our DiffuMatting can 1). act as an anything matti…

    Submitted 10 March, 2024; originally announced March 2024.

  40. arXiv:2403.05438 [pdf, other]

    cs.CV

    VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

    Authors: Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, Wangmeng Zuo

    Abstract: Text-to-image diffusion models (T2I) have demonstrated unprecedented capabilities in creating realistic and aesthetic images. On the contrary, text-to-video diffusion models (T2V) still lag far behind in frame quality and text alignment, owing to insufficient quality and quantity of training videos. In this paper, we introduce VideoElevator, a training-free and plug-and-play method, which elevates…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Project page: https://videoelevator.github.io Code: https://github.com/YBYBZhang/VideoElevator

  41. arXiv:2403.05160 [pdf, other]

    cs.CV

    MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models

    Authors: Zijie Fang, Yifeng Wang, Zhi Wang, Jian Zhang, Xiangyang Ji, Yongbing Zhang

    Abstract: Recently, pathological diagnosis, the gold standard for cancer diagnosis, has achieved superior performance by combining the Transformer with the multiple instance learning (MIL) framework using whole slide images (WSIs). However, the giga-pixel nature of WSIs poses a great challenge for the quadratic-complexity self-attention mechanism in Transformer to be applied in MIL. Existing studies usually…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 11 pages, 2 figures

  42. arXiv:2403.02905 [pdf, other]

    cs.MM

    MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model

    Authors: Sen Wang, Jiangning Zhang, Weijian Cao, Xiaobin Hu, Moran Li, Xiaozhong Ji, Xin Tan, Mengtian Li, Zhifeng Xie, Chengjie Wang, Lizhuang Ma

    Abstract: The body movements accompanying speech aid speakers in expressing their ideas. Co-speech motion generation is one of the important approaches for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generating realistic and diverse motion is a challenging task. In this paper, we propose MMoFusion, a Multi-modal co-speech Motion generation framework based o…

    Submitted 17 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  43. arXiv:2402.11826  [pdf, other]

    cs.CV

    Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

    Authors: Jialei Xu, Xianming Liu, Junjun Jiang, Kui Jiang, Rui Li, Kai Cheng, Xiangyang Ji

    Abstract: Monocular depth estimation from RGB images plays a pivotal role in 3D vision. However, its accuracy can deteriorate in challenging environments such as nighttime or adverse weather conditions. While long-wave infrared cameras offer stable imaging in such challenging conditions, they are inherently low-resolution, lacking rich texture and semantics as delivered by the RGB image. Current methods foc…

    Submitted 18 February, 2024; originally announced February 2024.

  44. arXiv:2402.06131  [pdf, other]

    cs.RO cs.CV

    PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes

    Authors: Xinggang Hu, Yanmin Wu, Mingyuan Zhao, Linghao Yang, Xiangkui Zhang, Xiangyang Ji

    Abstract: Visual SLAM (Simultaneous Localization and Mapping) based on planar features has found widespread applications in fields such as environmental structure perception and augmented reality. However, current research faces challenges in accurately localizing and mapping in planar ambiguous scenes, primarily due to the poor accuracy of the employed planar features and data association methods. In this…

    Submitted 8 February, 2024; originally announced February 2024.

  45. arXiv:2402.00002  [pdf, other]

    cs.NI

    Raptor Encoding for Low-Latency Concurrent Multi-PDU Session Transmission with Security Consideration in B5G Edge Network

    Authors: Zhongfu Guo, Xinsheng Ji, Wei You, Mingyan Xu, Yu Zhao, Zhimo Cheng, Deqiang Zhou

    Abstract: In B5G edge networks, end-to-end low-latency and high-reliability transmissions between edge computing nodes and terminal devices are essential. This paper investigates the queue-aware coding scheduling transmission of randomly arriving data packets, taking into account potential eavesdroppers in edge networks. To address these concerns, we introduce SCLER, a Protocol Data Units (PDU) Raptor-encod…

    Submitted 4 October, 2023; originally announced February 2024.

  46. arXiv:2401.11395  [pdf, other]

    cs.CV

    UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation

    Authors: Qingdong He, Jinlong Peng, Zhengkai Jiang, Kai Wu, Xiaozhong Ji, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Mingang Chen, Yunsheng Wu

    Abstract: 3D open-vocabulary scene understanding aims to recognize arbitrary novel categories beyond the base label space. However, existing works not only fail to fully utilize all the available modal information in the 3D domain but also lack sufficient granularity in representing the features of each modality. In this paper, we propose a unified multimodal 3D open-vocabulary scene understanding network,…

    Submitted 20 April, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: Accepted by IJCAI 2024

  47. arXiv:2401.06548  [pdf, other]

    cs.CV

    Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning

    Authors: Chenyang Wang, Junjun Jiang, Xingyu Hu, Xianming Liu, Xiangyang Ji

    Abstract: Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks, where old data from experienced tasks is unavailable when learning from a new task. To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks. These methods usually adopt an extra memory to store the data for replay. However, it is not expected…

    Submitted 12 January, 2024; originally announced January 2024.

  48. arXiv:2401.02673  [pdf, other]

    eess.AS cs.AI cs.SD

    A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

    Authors: Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang

    Abstract: Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen…

    Submitted 5 January, 2024; originally announced January 2024.

  49. arXiv:2401.00151  [pdf, other]

    cs.CV cs.CR

    CamPro: Camera-based Anti-Facial Recognition

    Authors: Wenjun Zhu, Yuan Sun, Jiani Liu, Yushi Cheng, Xiaoyu Ji, Wenyuan Xu

    Abstract: The proliferation of images captured from millions of cameras and the advancement of facial recognition (FR) technology have made the abuse of FR a severe privacy threat. Existing works typically rely on obfuscation, synthesis, or adversarial examples to modify faces in images to achieve anti-facial recognition (AFR). However, the unmodified images captured by camera modules that contain sensitive…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Accepted by NDSS Symposium 2024

  50. arXiv:2401.00148  [pdf, other]

    cs.CR cs.CV

    TPatch: A Triggered Physical Adversarial Patch

    Authors: Wenjun Zhu, Xiaoyu Ji, Yushi Cheng, Shibo Zhang, Wenyuan Xu

    Abstract: Autonomous vehicles increasingly utilize the vision-based perception module to acquire information about driving environments and detect obstacles. Correct detection and classification are important to ensure safe driving decisions. Existing works have demonstrated the feasibility of fooling the perception models such as object detectors and image classifiers with printed adversarial patches. Howe…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Appeared in 32nd USENIX Security Symposium (USENIX Security 23)