(Translated by https://www.hiragana.jp/)
Search | arXiv e-print repository
Skip to main content

Showing 1–50 of 777 results for author: Zhang, F

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.08883  [pdf

    cs.CV

    TractGraphFormer: Anatomically Informed Hybrid Graph CNN-Transformer Network for Classification from Diffusion MRI Tractography

    Authors: Yuqian Chen, Fan Zhang, Meng Wang, Leo R. Zekelman, Suheyla Cetin-Karayumak, Tengfei Xue, Chaoyi Zhang, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Lauren J. O'Donnell

    Abstract: The relationship between brain connections and non-imaging phenotypes is increasingly studied using deep neural networks. However, the local and global properties of the brain's white matter networks are often overlooked in convolutional network design. We introduce TractGraphFormer, a hybrid Graph CNN-Transformer deep learning framework tailored for diffusion MRI tractography. This model leverage… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 4 figures

  2. arXiv:2407.08303  [pdf, other

    cs.CV cs.AI

    DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

    Authors: Xiaotong Li, Fan Zhang, Haiwen Diao, Yueze Wang, Xinlong Wang, Ling-Yu Duan

    Abstract: Existing Multimodal Large Language Models (MLLMs) increasingly emphasize complex understanding of various visual elements, including multiple objects, text information, and spatial relations. Their development for comprehensive visual perception hinges on the availability of high-quality image-text datasets that offer diverse visual elements and throughout image descriptions. However, the scarcity… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  3. arXiv:2407.05749  [pdf, other

    eess.SP cs.HC cs.LG

    LDGCN: An Edge-End Lightweight Dual GCN Based on Single-Channel EEG for Driver Drowsiness Monitoring

    Authors: Jingwei Huang, Chuansheng Wang, Jiayan Huang, Haoyi Fan, Antoni Grau, Fuquan Zhang

    Abstract: Driver drowsiness electroencephalography (EEG) signal monitoring can timely alert drivers of their drowsiness status, thereby reducing the probability of traffic accidents. Graph convolutional networks (GCNs) have shown significant advancements in processing the non-stationary, time-varying, and non-Euclidean nature of EEG signals. However, the existing single-channel EEG adjacency graph construct… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  4. arXiv:2407.04396  [pdf, other

    cs.CV cs.AI

    Graph-Guided Test-Time Adaptation for Glaucoma Diagnosis using Fundus Photography

    Authors: Qian Zeng, Le Zhang, Yipeng Liu, Ce Zhu, Fan Zhang

    Abstract: Glaucoma is a leading cause of irreversible blindness worldwide. While deep learning approaches using fundus images have largely improved early diagnosis of glaucoma, variations in images from different devices and locations (known as domain shifts) challenge the use of pre-trained models in real-world settings. To address this, we propose a novel Graph-guided Test-Time Adaptation (GTTA) framework… ▽ More

    Submitted 9 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures, 3 tables, submitted to MICCAI

  5. arXiv:2407.03964  [pdf, other

    cs.CL cs.LG

    Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

    Authors: Fuxiang Zhang, Junyou Li, Yi-Chen Li, Zongzhang Zhang, Yang Yu, Deheng Ye

    Abstract: Low sample efficiency is an enduring challenge of reinforcement learning (RL). With the advent of versatile large language models (LLMs), recent works impart common-sense knowledge to accelerate policy learning for RL processes. However, we note that such guidance is often tailored for one specific task but loses generalizability. In this paper, we introduce a framework that harnesses LLMs to extr… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  6. arXiv:2407.03856  [pdf, other

    cs.LG

    Q-Adapter: Training Your LLM Adapter as a Residual Q-Function

    Authors: Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu

    Abstract: We consider the problem of adapting Large Language Models (LLMs) pre-trained with Reinforcement Learning from Human Feedback (RLHF) to downstream preference data. Naive approaches to achieve this could be supervised fine-tuning on preferred responses or reinforcement learning with a learned reward model. However, the LLM runs the risk of forgetting its initial knowledge as the fine-tuning progress… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  7. arXiv:2407.02880  [pdf, other

    cs.LG cs.AI cs.CV

    Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

    Authors: Frederic Z. Zhang, Paul Albert, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad

    Abstract: Pre-trained models produce strong generic representations that can be adapted via fine-tuning. The learned weight difference relative to the pre-trained model, known as a task vector, characterises the direction and stride of fine-tuning. The significance of task vectors is such that simple arithmetic operations on them can be used to combine diverse representations from different domains. This pa… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  8. arXiv:2407.02675  [pdf, other

    eess.IV cs.CV

    Depth-Aware Endoscopic Video Inpainting

    Authors: Francis Xiatian Zhang, Shuang Chen, Xianghua Xie, Hubert P. H. Shum

    Abstract: Video inpainting fills in corrupted video content with plausible replacements. While recent advances in endoscopic video inpainting have shown potential for enhancing the quality of endoscopic videos, they mainly repair 2D visual information without effectively preserving crucial 3D spatial details for clinical reference. Depth-aware inpainting methods attempt to preserve these details by incorpor… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  9. arXiv:2407.01461  [pdf, other

    cs.CL

    Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement

    Authors: Zisu Huang, Xiaohua Wang, Feiran Zhang, Zhibo Xu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: The capacity of large language models (LLMs) to generate honest, harmless, and helpful responses heavily relies on the quality of user prompts. However, these prompts often tend to be brief and vague, thereby significantly limiting the full potential of LLMs. Moreover, harmful prompts can be meticulously crafted and manipulated by adversaries to jailbreak LLMs, inducing them to produce potentially… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  10. arXiv:2407.01230  [pdf, other

    cs.CV

    DaBiT: Depth and Blur informed Transformer for Joint Refocusing and Super-Resolution

    Authors: Crispian Morris, Nantheera Anantrasirichai, Fan Zhang, David Bull

    Abstract: In many real-world scenarios, recorded videos suffer from accidental focus blur, and while video deblurring methods exist, most specifically target motion blur. This paper introduces a framework optimised for the joint task of focal deblurring (refocusing) and video super-resolution (VSR). The proposed method employs novel map guided transformers, in addition to image propagation, to effectively l… ▽ More

    Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.01219  [pdf, other

    cs.CL

    Searching for Best Practices in Retrieval-Augmented Generation

    Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolong… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  12. arXiv:2406.19240  [pdf, other

    cs.SE

    Data Preparation for Deep Learning based Code Smell Detection: A Systematic Literature Review

    Authors: Fengji Zhang, Zexian Zhang, Jacky Wai Keung, Xiangru Tang, Zhen Yang, Xiao Yu, Wenhua Hu

    Abstract: Code Smell Detection (CSD) plays a crucial role in improving software quality and maintainability. And Deep Learning (DL) techniques have emerged as a promising approach for CSD due to their superior performance. However, the effectiveness of DL-based CSD methods heavily relies on the quality of the training data. Despite its importance, little attention has been paid to analyzing the data prepara… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.17694  [pdf, other

    cs.CR

    Protecting the 'Stop Using My Data' Right through Blockchain-assisted Evidence Generation

    Authors: Fan Zhang, Peng Liu

    Abstract: In order to provide personalized services to users, Internet-based platforms collect and utilize user-generated behavioral data. Although the 'stop using my data' right should be a fundamental data right, which allows individuals to request their personal data to be no longer utilized by online platforms, the existing preventive data protection measures (e.g., cryptographic data elimination, diffe… ▽ More

    Submitted 29 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  14. arXiv:2406.16486  [pdf, other

    cs.AI

    Towards Comprehensive Preference Data Collection for Reward Modeling

    Authors: Yulan Hu, Qingyang Li, Sheng Ouyang, Ge Chen, Kaihui Chen, Lijun Mei, Xucheng Ye, Fuzheng Zhang, Yong Liu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models (LLMs) with human preferences, thereby enhancing the quality of responses generated. A critical component of RLHF is the reward model, which is trained on preference data and outputs a scalar reward during the inference stage. However, the collection of preference data still lacks thorough investig… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  15. arXiv:2406.16062  [pdf, other

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  16. arXiv:2406.13597  [pdf, other

    cs.LG cs.AI

    GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks

    Authors: Fan Zhang, Xin Zhang

    Abstract: Massive number of applications involve data with underlying relationships embedded in non-Euclidean space. Graph neural networks (GNNs) are utilized to extract features by capturing the dependencies within graphs. Despite groundbreaking performances, we argue that Multi-layer perceptrons (MLPs) and fixed activation functions impede the feature extraction due to information loss. Inspired by Kolmog… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  17. arXiv:2406.13268  [pdf, other

    eess.AS cs.SD

    CEC: A Noisy Label Detection Method for Speaker Recognition

    Authors: Yao Shen, Yingying Gao, Yaqian Hao, Chenguang Hu, Fulin Zhang, Junlan Feng, Shilei Zhang

    Abstract: Noisy labels are inevitable, even in well-annotated datasets. The detection of noisy labels is of significant importance to enhance the robustness of speaker recognition models. In this paper, we propose a novel noisy label detection approach based on two new statistical metrics: Continuous Inconsistent Counting (CIC) and Total Inconsistent Counting (TIC). These metrics are calculated through Cros… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: interspeech 2024

  18. arXiv:2406.13007  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao , et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  19. arXiv:2406.11277  [pdf, other

    cs.CL

    Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector

    Authors: Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Kun Gai, Ji-Rong Wen

    Abstract: Hallucination detection is a challenging task for large language models (LLMs), and existing studies heavily rely on powerful closed-source LLMs such as GPT-4. In this paper, we propose an autonomous LLM-based agent framework, called HaluAgent, which enables relatively smaller LLMs (e.g. Baichuan2-Chat 7B) to actively select suitable tools for detecting multiple hallucination types such as text, c… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  20. arXiv:2406.10810  [pdf, other

    cs.RO

    RGBlimp-Q: Robotic Gliding Blimp With Moving Mass Control Based on a Bird-Inspired Continuum Arm

    Authors: Hao Cheng, Feitian Zhang

    Abstract: Robotic blimps, as lighter-than-air aerial systems, offer prolonged duration and enhanced safety in human-robot interactions due to their buoyant lift. However, robust flight against environmental airflow disturbances remains a significant challenge, limiting the broader application of these robots. Drawing inspiration from the flight mechanics of birds and their ability to perch against natural w… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  21. arXiv:2406.10558  [pdf, other

    cs.RO

    A Hybrid Controller Design for Human-Assistive Piloting of an Underactuated Blimp

    Authors: Wugang Meng, Tianfu Wu, Qiuyang Tao, Fumin Zhang

    Abstract: This paper introduces a novel solution to the manual control challenge for indoor blimps. The problem's complexity arises from the conflicting demands of executing human commands while maintaining stability through automatic control for underactuated robots. To tackle this challenge, we introduced an assisted piloting hybrid controller with a preemptive mechanism, that seamlessly switches between… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  22. arXiv:2406.09598  [pdf, other

    cs.CV

    Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking

    Authors: Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan

    Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground truth annotations including 3D poses of object… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  23. arXiv:2406.08997  [pdf, ps, other

    cs.CV

    Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

    Authors: Fengyuan Zhang, Zhaopei Huang, Xinjie Zhang, Qin Jin

    Abstract: Micro-expressions serve as essential cues for understanding individuals' genuine emotional states. Recognizing micro-expressions attracts increasing research attention due to its various applications in fields such as business negotiation and psychotherapy. However, the intricate and transient nature of micro-expressions poses a significant challenge to their accurate recognition. Most existing wo… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by ICME 2024

  24. arXiv:2406.08759  [pdf, other

    cs.CV cs.MM

    Gaussian-Forest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling

    Authors: Fengyi Zhang, Tianjun Zhang, Lin Zhang, Helen Huang, Yadan Luo

    Abstract: The field of novel-view synthesis has recently witnessed the emergence of 3D Gaussian Splatting, which represents scenes in a point-based manner and renders through rasterization. This methodology, in contrast to Radiance Fields that rely on ray tracing, demonstrates superior rendering quality and speed. However, the explicit and unstructured nature of 3D Gaussians poses a significant storage chal… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  25. arXiv:2406.08090  [pdf, other

    cs.CV

    From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization

    Authors: Ziran Zhang, Yongrui Ma, Yueting Chen, Feng Zhang, Jinwei Gu, Tianfan Xue, Shi Guo

    Abstract: Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly enhanced VFI capabilities, particularly for high-speed, nonlinear motions. However, these event-based methods encounter challenges in low-light conditions, notably tr… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  26. arXiv:2406.03394  [pdf, other

    cs.CV

    Gaussian Representation for Deformable Image Registration

    Authors: Jihe Li, Fabian Zhang, Xia Li, Tianhao Zhang, Ye Zhang, Joachim Buhmann

    Abstract: Deformable image registration (DIR) is a fundamental task in radiotherapy, with existing methods often struggling to balance computational efficiency, registration accuracy, and speed effectively. We introduce a novel DIR approach employing parametric 3D Gaussian control points achieving a better tradeoff. It provides an explicit and flexible representation for spatial deformation fields between 3… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  27. arXiv:2406.00947  [pdf, other

    cs.CV

    Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation

    Authors: Fei Gao, Siwen Wang, Fandong Zhang, Hong-Yu Zhou, Yizhou Wang, Churan Wang, Gang Yu, Yizhou Yu

    Abstract: Medical image analysis suffers from a shortage of data, whether annotated or not. This becomes even more pronounced when it comes to 3D medical images. Self-Supervised Learning (SSL) can partially ease this situation by using unlabeled data. However, most existing SSL methods can only make use of data in a single dimensionality (e.g. 2D or 3D), and are incapable of enlarging the training dataset b… ▽ More

    Submitted 4 July, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 accept

  28. arXiv:2406.00707  [pdf, other

    cs.RO

    QUADFormer: Learning-based Detection of Cyber Attacks in Quadrotor UAVs

    Authors: Pengyu Wang, Zhaohua Yang, Nachuan Yang, Zikai Wang, Jialu Li, Fan Zhang, Chaoqun Wang, Jiankun Wang, Max Q. -H. Meng, Ling Shi

    Abstract: Safety-critical intelligent cyber-physical systems, such as quadrotor unmanned aerial vehicles (UAVs), are vulnerable to different types of cyber attacks, and the absence of timely and accurate attack detection can lead to severe consequences. When UAVs are engaged in large outdoor maneuvering flights, their system constitutes highly nonlinear dynamics that include non-Gaussian noises. Therefore,… ▽ More

    Submitted 14 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  29. arXiv:2406.00706  [pdf, other

    cs.RO

    MINER-RRT*: A Hierarchical and Fast Trajectory Planning Framework in 3D Cluttered Environments

    Authors: Pengyu Wang, Jiawei Tang, Hin Wang Lin, Fan Zhang, Chaoqun Wang, Jiankun Wang, Ling Shi, Max Q. -H. Meng

    Abstract: Trajectory planning for quadrotors in cluttered environments has been challenging in recent years. While many trajectory planning frameworks have been successful, there still exists potential for improvements, particularly in enhancing the speed of generating efficient trajectories. In this paper, we present a novel hierarchical trajectory planning framework to reduce computational time and memory… ▽ More

    Submitted 14 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  30. arXiv:2406.00312  [pdf, other

    cs.RO

    NuRF: Nudging the Particle Filter in Radiance Fields for Robot Visual Localization

    Authors: Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang

    Abstract: Can we localize a robot in radiance fields only using monocular vision? This study presents NuRF, a nudged particle filter framework for 6-DoF robot visual localization in radiance fields. NuRF sets anchors in SE(3) to leverage visual place recognition, which provides image comparisons to guide the sampling process. This guidance could improve the convergence and robustness of particle filters for… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 11 pages, 14 figures

  31. arXiv:2406.00212  [pdf, other

    eess.IV cs.CV

    MVAD: A Multiple Visual Artifact Detector for Video Streaming

    Authors: Chen Feng, Duolikun Danier, Fan Zhang, David Bull

    Abstract: Visual artifacts are often introduced into streamed video content, due to prevailing conditions during content production and/or delivery. Since these can degrade the quality of the user's experience, it is important to automatically and accurately detect them in order to enable effective quality measurement and enhancement. Existing detection methods often focus on a single type of artifact and/o… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 9 pages

  32. arXiv:2405.19883  [pdf, other

    cs.LG cs.AI cs.CL

    From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

    Authors: Jianliang He, Siyu Chen, Fengzhuo Zhang, Zhuoran Yang

    Abstract: In this work, from a theoretical lens, we aim to understand why large language model (LLM) empowered agents are able to solve decision-making problems in the physical world. To this end, consider a hierarchical reinforcement learning (RL) model where the LLM Planner and the Actor perform high-level task planning and low-level execution, respectively. Under this model, the LLM Planner navigates a p… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  33. arXiv:2405.19730  [pdf

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun , et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat… ▽ More

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  34. arXiv:2405.19614  [pdf, other

    cs.RO

    TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM

    Authors: Peifeng Jiang, Hong Liu, Xia Li, Ti Wang, Fabian Zhang, Joachim M. Buhmann

    Abstract: The limited robustness of 3D Gaussian Splatting (3DGS) to motion blur and camera noise, along with its poor real-time performance, restricts its application in robotic SLAM tasks. Upon analysis, the primary causes of these issues are the density of views with motion blur and the cumulative errors in dense pose estimation from calculating losses based on noisy original images and rendering results,… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  35. arXiv:2405.19000  [pdf, other

    cs.LG

    FedMAP: Unlocking Potential in Personalized Federated Learning through Bi-Level MAP Optimization

    Authors: Fan Zhang, Carlos Esteve-Yagüe, Sören Dittmer, Carola-Bibiane Schönlieb, Michael Roberts

    Abstract: Federated Learning (FL) enables collaborative training of machine learning models on decentralized data while preserving data privacy. However, data across clients often differs significantly due to class imbalance, feature distribution skew, sample size imbalance, and other phenomena. Leveraging information from these not identically distributed (non-IID) datasets poses substantial challenges. FL… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  36. arXiv:2405.18745  [pdf, other

    cs.CV

    PanoNormal: Monocular Indoor 360° Surface Normal Estimation

    Authors: Kun Huang, Fanglue Zhang, Neil Dodgson

    Abstract: The presence of spherical distortion on the Equirectangular image is an acknowledged challenge in dense regression computer vision tasks, such as surface normal estimation. Recent advances in convolutional neural networks (CNNs) strive to mitigate spherical distortion but often fall short in capturing holistic structures effectively, primarily due to their fixed receptive field. On the other hand,… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  37. arXiv:2405.18119  [pdf, ps, other

    cs.CV cs.AI cs.LG

    Low-Resource Crop Classification from Multi-Spectral Time Series Using Lossless Compressors

    Authors: Wei Cheng, Hongrui Ye, Xiao Wen, Jiachen Zhang, Jiping Xu, Feifan Zhang

    Abstract: Deep learning has significantly improved the accuracy of crop classification using multispectral temporal data. However, these models have complex structures with numerous parameters, requiring large amounts of data and costly training. In low-resource situations with fewer labeled samples, deep learning models perform poorly due to insufficient data. Conversely, compressors are data-type agnostic… ▽ More

    Submitted 5 July, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 8 pages, 10 figures

  38. arXiv:2405.18001  [pdf, other

    cs.NI

    Network-Aware Reliability Modeling and Optimization for Microservice Placement

    Authors: Fangyu Zhang, Yuang Chen, Hancheng Lu, Yongsheng Huang

    Abstract: Optimizing microservice placement to enhance the reliability of services is crucial for improving the service level of microservice architecture-based mobile networks and Internet of Things (IoT) networks. Despite extensive research on service reliability, the impact of network load and routing on service reliability remains understudied, leading to suboptimal models and unsatisfactory performance… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 13 pages, 8 figures, submitted to IEEE transaction for potential publication

  39. arXiv:2405.17264  [pdf, other

    cs.CL cs.LG

    On the Noise Robustness of In-Context Learning for Text Generation

    Authors: Hongfu Gao, Feipeng Zhang, Wenyu Jiang, Jun Shu, Feng Zheng, Hongxin Wei

    Abstract: Large language models (LLMs) have shown impressive performance on downstream tasks by in-context learning (ICL), which heavily relies on the quality of demonstrations selected from a large set of annotated examples. Recent works claim that in-context learning is robust to noisy demonstrations in text classification. In this work, we show that, on text generation tasks, noisy annotations significan… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  40. arXiv:2405.16114  [pdf, other

    cs.AI cs.CV cs.LG

    Multi-scale Quaternion CNN and BiGRU with Cross Self-attention Feature Fusion for Fault Diagnosis of Bearing

    Authors: Huanbai Liu, Fanlong Zhang, Yin Tan, Lian Huang, Yan Li, Guoheng Huang, Shenghong Luo, An Zeng

    Abstract: In recent years, deep learning has led to significant advances in bearing fault diagnosis (FD). Most techniques aim to achieve greater accuracy. However, they are sensitive to noise and lack robustness, resulting in insufficient domain adaptation and anti-noise ability. The comparison of studies reveals that giving equal attention to all features does not differentiate their significance. In this… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  41. arXiv:2405.15208  [pdf, other

    cs.CL cs.AI

    Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs

    Authors: Chenxi Sun, Hongzhi Zhang, Zijia Lin, Jingyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong

    Abstract: Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, accelerating the d… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at LREC-COLING 2024

  42. arXiv:2405.14832  [pdf, other

    cs.CV

    Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer

    Authors: Shuang Wu, Youtian Lin, Feihu Zhang, Yifei Zeng, Jingxi Xu, Philip Torr, Xun Cao, Yao Yao

    Abstract: Generating high-quality 3D assets from text and images has long been challenging, primarily due to the absence of scalable 3D representations capable of capturing intricate geometry distributions. In this work, we introduce Direct3D, a native 3D generative model scalable to in-the-wild input images, without requiring a multiview diffusion model or SDS optimization. Our approach comprises two prima… ▽ More

    Submitted 1 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  43. arXiv:2405.14268  [pdf, other

    cs.NE cs.AI

    Multi-Representation Genetic Programming: A Case Study on Tree-based and Linear Representations

    Authors: Zhixing Huang, Yi Mei, Fangfang Zhang, Mengjie Zhang, Wolfgang Banzhaf

    Abstract: Existing genetic programming (GP) methods are typically designed based on a certain representation, such as tree-based or linear representations. These representations show various pros and cons in different domains. However, due to the complicated relationships among representation and fitness landscapes of GP, it is hard to intuitively determine which GP representation is the most suitable for s… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  44. Survey on Visual Signal Coding and Processing with Generative Models: Technologies, Standards and Optimization

    Authors: Zhibo Chen, Heming Sun, Li Zhang, Fan Zhang

    Abstract: This paper provides a survey of the latest developments in visual signal coding and processing with generative models. Specifically, our focus is on presenting the advancement of generative models and their influence on research in the domain of visual signal coding and processing. This survey study begins with a brief introduction of well-established generative models, including the Variational A… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  45. arXiv:2405.13952  [pdf, other

    cs.LG cs.AI

    Spectral Adapter: Fine-Tuning in Spectral Space

    Authors: Fangzhao Zhang, Mert Pilanci

    Abstract: Recent developments in Parameter-Efficient Fine-Tuning (PEFT) methods for pretrained deep neural networks have captured widespread interest. In this work, we study the enhancement of current PEFT methods by incorporating the spectral information of pretrained weight matrices into the fine-tuning procedure. We investigate two spectral adaptation mechanisms, namely additive tuning and orthogonal rot… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  46. EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy

    Authors: Yihong Huang, Yuang Zhang, Liping Wang, Fan Zhang, Xuemin Lin

    Abstract: Unsupervised Outlier Detection (UOD) is an important data mining task. With the advance of deep learning, deep Outlier Detection (OD) has received broad interest. Most deep UOD models are trained exclusively on clean datasets to learn the distribution of the normal data, which requires huge manual efforts to clean the real-world data if possible. Instead of relying on clean datasets, some approach… ▽ More

    Submitted 28 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  47. arXiv:2405.11616  [pdf, other

    cs.CV

    Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

    Authors: Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo

    Abstract: In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image. Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficacy, and low resolution, resulting in poor-quality multiview images. Specifically, these methods assume that the input images should… ▽ More

    Submitted 29 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  48. arXiv:2405.11496  [pdf, other

    cs.CV cs.IR

    DEMO: A Statistical Perspective for Efficient Image-Text Matching

    Authors: Fan Zhang, Xian-Sheng Hua, Chong Chen, Xiao Luo

    Abstract: Image-text matching has been a long-standing problem, which seeks to connect vision and language through semantic understanding. Due to the capability to manage large-scale raw data, unsupervised hashing-based approaches have gained prominence recently. They typically construct a semantic similarity structure using the natural distance, which subsequently provides guidance to the model optimizatio… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  49. arXiv:2405.11344  [pdf

    cs.LG cs.AI

    Improved Content Understanding With Effective Use of Multi-task Contrastive Learning

    Authors: Akanksha Bindal, Sudarshan Ramanujam, Dave Golland, TJ Hazen, Tina Jiang, Fengyu Zhang, Peng Yan

    Abstract: In enhancing LinkedIn core content recommendation models, a significant challenge lies in improving their semantic understanding capabilities. This paper addresses the problem by leveraging multi-task learning, a method that has shown promise in various domains. We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling… ▽ More

    Submitted 21 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  50. arXiv:2405.10874  [pdf, other

    cs.RO

    Square-Root Inverse Filter-based GNSS-Visual-Inertial Navigation

    Authors: Jun Hu, Xiaoming Lang, Feng Zhang, Yinian Mao, Guoquan Huang

    Abstract: While Global Navigation Satellite System (GNSS) is often used to provide global positioning if available, its intermittency and/or inaccuracy calls for fusion with other sensors. In this paper, we develop a novel GNSS-Visual-Inertial Navigation System (GVINS) that fuses visual, inertial, and raw GNSS measurements within the square-root inverse sliding window filtering (SRI-SWF) framework in a tigh… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.