Search | arXiv e-print repository

Showing 1–50 of 276 results for author: Feng, R

  1. Achieving the Safety and Security of the End-to-End AV Pipeline

    Authors: Noah T. Curran, Minkyoung Cho, Ryan Feng, Liangkai Liu, Brian Jay Tang, Pedram MohajerAnsari, Alkim Domeke, Mert D. Pesé, Kang G. Shin

    Abstract: In the current landscape of autonomous vehicle (AV) safety and security research, there are multiple isolated problems being tackled by the community at large. Due to the lack of common evaluation criteria, several important research questions are at odds with one another. For instance, while much research has been conducted on physical attacks deceiving AV perception systems, there is often inade… ▽ More

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted to 1st Cyber Security in Cars Workshop (CSCS) at CCS

  2. arXiv:2409.03324  [pdf, ps, other]

    math.PR

    Small gaps of GSE

    Authors: Renjie Feng, Jiaming Li, Dong Yao

    Abstract: In this paper, we study the smallest gaps for the Gaussian symplectic ensemble (GSE). We prove that the rescaled smallest gaps and their locations converge to a Poisson point process with an explicit rate. The approach provides an alternative proof for the GOE case and complements the results in \cite{FTW}. By combining the main results from \cite{BB, FTW, FW2}, the study of the smallest gaps for… ▽ More

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.02608  [pdf, other]

    cs.CV

    A Medical Multimodal Large Language Model for Pediatric Pneumonia

    Authors: Weiwei Tian, Xinyu Huang, Tianhao Cheng, Wen He, Jinwu Fang, Rui Feng, Daoying Geng, Xiaobo Zhang

    Abstract: Pediatric pneumonia is the leading cause of death among children under five years worldwide, imposing a substantial burden on affected families. Currently, there are three significant hurdles in diagnosing and treating pediatric pneumonia. Firstly, pediatric pneumonia shares similar symptoms with other respiratory diseases, making rapid and accurate differential diagnosis challenging. Secondly, pr… ▽ More

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 18 pages, 10 figures

  4. arXiv:2408.05205  [pdf, other]

    cs.CV

    Kalman-Inspired Feature Propagation for Video Face Super-Resolution

    Authors: Ruicheng Feng, Chongyi Li, Chen Change Loy

    Abstract: Despite the promising progress of face image super-resolution, video face super-resolution remains relatively under-explored. Existing approaches either adapt general video super-resolution networks to face datasets or apply established face image super-resolution models independently on individual video frames. These paradigms encounter challenges either in reconstructing facial details or mainta… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024. Project page: https://jnjaby.github.io/projects/KEEP/

  5. arXiv:2408.03124  [pdf, other]

    eess.SY cs.LG

    Closed-loop Diffusion Control of Complex Physical Systems

    Authors: Long Wei, Haodong Feng, Peiyan Hu, Tao Zhang, Yuchen Yang, Xiang Zheng, Ruiqi Feng, Dixia Fan, Tailin Wu

    Abstract: The control problems of complex physical systems have wide applications in science and engineering. Several previous works have demonstrated that generative control methods based on diffusion models have significant advantages for solving these problems. However, existing generative control methods face challenges in handling closed-loop control, which is an inherent constraint for effective contr… ▽ More

    Submitted 31 July, 2024; originally announced August 2024.

  6. arXiv:2408.02285  [pdf, other]

    cs.CV

    Joint-Motion Mutual Learning for Pose Estimation in Videos

    Authors: Sifan Wu, Haipeng Chen, Yifang Yin, Sihao Hu, Runyang Feng, Yingying Jiao, Ziqi Yang, Zhenguang Liu

    Abstract: Human pose estimation in videos has long been a compelling yet challenging task within the realm of computer vision. Nevertheless, this task remains difficult because of the complex video scenes, such as video defocus and self-occlusion. Recent methods strive to integrate multi-frame visual features generated by a backbone network for pose estimation. However, they often ignore the useful joint in… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  7. arXiv:2408.01366  [pdf, other]

    cs.RO cs.CV

    Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation

    Authors: Ruoxuan Feng, Di Hu, Wenke Ma, Xuelong Li

    Abstract: Humans possess a remarkable talent for flexibly alternating to different senses when interacting with the environment. Picture a chef skillfully gauging the timing of ingredient additions and controlling the heat according to the colors, sounds, and aromas, seamlessly navigating through every stage of the complex cooking process. This ability is founded upon a thorough comprehension of task stages… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

  8. arXiv:2407.15173  [pdf, other]

    cs.CV

    Rethinking Domain Adaptation and Generalization in the Era of CLIP

    Authors: Ruoyu Feng, Tao Yu, Xin Jin, Xiaoyuan Yu, Lei Xiao, Zhibo Chen

    Abstract: In recent studies on domain adaptation, significant emphasis has been placed on the advancement of learning shared knowledge from a source domain to a target domain. Recently, the large vision-language pre-trained model, i.e., CLIP has shown strong ability on zero-shot recognition, and parameter efficient tuning can further improve its performance on specific tasks. This work demonstrates that a s… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  9. arXiv:2407.11700  [pdf, other]

    cs.CV eess.IV

    Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

    Authors: Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin

    Abstract: Recently, the field of Image Coding for Machines (ICM) has garnered heightened interest and significant advances thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support various bitrate levels, machine tasks, and networks, thus lacking both flexibility and practicality. To address these challeng… ▽ More

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  10. arXiv:2407.09705  [pdf, other]

    cs.CV cs.AI cs.MM

    Diagnosing and Re-learning for Balanced Multimodal Learning

    Authors: Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu

    Abstract: To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal performance discrepancy as the basis. However, the intrinsic limitation of modality capacity is ignored. The scarcely informative modalities can be recognized as ``… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  11. arXiv:2407.06494  [pdf, other]

    cs.LG cs.AI

    A Generative Approach to Control Complex Physical Systems

    Authors: Long Wei, Peiyan Hu, Ruiqi Feng, Haodong Feng, Yixuan Du, Tao Zhang, Rui Wang, Yue Wang, Zhi-Ming Ma, Tailin Wu

    Abstract: Controlling the evolution of complex physical systems is a fundamental task across science and engineering. Classical techniques suffer from limited applicability or huge computational costs. On the other hand, recent deep learning and reinforcement learning-based approaches often struggle to optimize long-term control sequences under the constraints of system dynamics. In this work, we introduce… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  12. arXiv:2407.03314  [pdf, other]

    cs.CV cs.CL cs.DB

    BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

    Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

    Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimu… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  13. arXiv:2406.10517  [pdf, other]

    cs.IR cs.AI cs.LG

    ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising

    Authors: Ruize Wang, Hui Xu, Ying Cheng, Qi He, Xing Zhou, Rui Feng, Wei Xu, Lei Huang, Jie Jiang

    Abstract: Advertising platforms have evolved in estimating Lifetime Value (LTV) to better align with advertisers' true performance metric. However, the sparsity of real-world LTV data presents a significant challenge to LTV predictive models (i.e., pLTV), severely limiting their capabilities. Therefore, we propose to utilize external data, in addition to the internal data of the advertising platform, to expan… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD 2024

  14. arXiv:2406.07410  [pdf, other]

    eess.AS

    Clever Hans Effect Found in Automatic Detection of Alzheimer's Disease through Speech

    Authors: Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling

    Abstract: We uncover an underlying bias present in the audio recordings produced from the picture description task of the Pitt corpus, the largest publicly accessible database for Alzheimer's Disease (AD) detection research. Even by solely utilizing the silent segments of these audio recordings, we achieve nearly 100% accuracy in AD detection. However, employing the same methods to other datasets and prepro… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  15. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan , et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  16. arXiv:2406.01597  [pdf, other]

    cs.CV cs.GR

    End-to-End Rate-Distortion Optimized 3D Gaussian Representation

    Authors: Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen

    Abstract: 3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible… ▽ More

    Submitted 9 April, 2024; originally announced June 2024.

  17. arXiv:2405.16980  [pdf, other]

    cs.CV eess.IV

    DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking

    Authors: Hongtao Wang, Rongyu Feng, Liangyi Wu, Mutian Liu, Yinuo Cui, Chunxia Zhang, Zhenbo Guo

    Abstract: In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class is using semantic segmentation networks to pick on a shot gather called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based pi… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  18. arXiv:2405.14735  [pdf]

    physics.optics

    Generalized all-optical complex exponential operator

    Authors: Baiqiao Chen, Qi Jia, Rui Feng, Fangkui Sun, Yongyin Cao, Jian Wang, Weiqiang Ding

    Abstract: Euler's formula, an extraordinary mathematical formula, establishes a vital link between complex-valued operations and trigonometric functions, finding widespread application in various fields. With the end of Moore's Law, electronic computing methods are encountering developmental bottlenecks. With its enviable potential, optical computing has successfully achieved high-speed operation of designe… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 17 pages, 4 figures, 1 table

  19. arXiv:2405.09786  [pdf, other]

    cs.LG cs.CR

    IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency

    Authors: Linshan Hou, Ruili Feng, Zhongyun Hua, Wei Luo, Leo Yu Zhang, Yiming Li

    Abstract: Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries can maliciously trigger model misclassifications by implanting a hidden backdoor during model training. This paper proposes a simple yet effective input-level backdoor detection (dubbed IBD-PSC) as a `firewall' to filter out malicious testing images. Our method is motivated by an intriguing phenomenon, i.e., paramete… ▽ More

    Submitted 2 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024, 31 pages

  20. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  21. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang , et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  22. arXiv:2404.17433  [pdf, other]

    cs.CV

    PromptCIR: Blind Compressed Image Restoration with Prompt Learning

    Authors: Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

    Abstract: Blind Compressed Image Restoration (CIR) has garnered significant attention due to its practical applications. It aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing works on blind CIR often seek assistance from a quality factor prediction network to facilitate their network to restore compressed images. However, the predicted numerical… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Winner of NTIRE 2024 Blind Compressed Image Enhancement Challenge

  23. arXiv:2404.09599  [pdf, other]

    cs.CR

    Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation

    Authors: Shangqing Liu, Wei Ma, Jian Wang, Xiaofei Xie, Ruitao Feng, Yang Liu

    Abstract: Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task, for example, determining whether it is vulnerable or not. This poses a challenge for a single deep learning-based model to effectively lea… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  24. arXiv:2404.05169  [pdf, other]

    cs.CV

    QMix: Quality-aware Learning with Mixed Noise for Robust Retinal Disease Diagnosis

    Authors: Junlin Hou, Jilan Xu, Rui Feng, Hao Chen

    Abstract: Due to the complexity of medical image acquisition and the difficulty of annotation, medical image datasets inevitably contain noise. Noisy data with wrong labels affects the robustness and generalization ability of deep neural networks. Previous noise learning methods mainly considered noise arising from images being mislabeled, i.e. label noise, assuming that all mislabeled images are of high im… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

  25. arXiv:2404.02710  [pdf, other]

    cs.CL eess.AS

    ART: The Alternating Reading Task Corpus for Speech Entrainment and Imitation

    Authors: Zheng Yuan, Dorina de Jong, Štefan Beňuš, Noël Nguyen, Ruitao Feng, Róbert Sabo, Luciano Fadiga, Alessandro D'Ausilio

    Abstract: We introduce the Alternating Reading Task (ART) Corpus, a collection of dyadic sentence reading for studying the entrainment and imitation behaviour in speech communication. The ART corpus features three experimental conditions - solo reading, alternating reading, and deliberate imitation - as well as three sub-corpora encompassing French-, Italian-, and Slovak-accented English. This design allows… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages, 2 figures, 7 tables, accepted at LREC-COLING 2024 conference

  26. FT2Ra: A Fine-Tuning-Inspired Approach to Retrieval-Augmented Code Completion

    Authors: Qi Guo, Xiaohong Li, Xiaofei Xie, Shangqing Liu, Ze Tang, Ruitao Feng, Junjie Wang, Jidong Ge, Lei Bu

    Abstract: The rise of code pre-trained models has significantly enhanced various coding tasks, such as code completion, and tools like GitHub Copilot. However, the substantial size of these models, especially large models, poses a significant challenge when it comes to fine-tuning them for specific downstream tasks. As an alternative approach, retrieval-based methods have emerged as a promising solution, au… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: ISSTA 2024

  27. arXiv:2404.00964  [pdf, other]

    cs.CV

    S2RC-GCN: A Spatial-Spectral Reliable Contrastive Graph Convolutional Network for Complex Land Cover Classification Using Hyperspectral Images

    Authors: Renxiang Guan, Zihao Li, Chujia Song, Guo Yu, Xianju Li, Ruyi Feng

    Abstract: Spatial correlations between different ground objects are an important feature of mining land cover research. Graph Convolutional Networks (GCNs) can effectively capture such spatial feature representations and have demonstrated promising results in performing hyperspectral imagery (HSI) classification tasks of complex land. However, the existing GCN-based HSI classification methods are prone to i… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to IJCNN 2024 (International Joint Conference on Neural Networks)

  28. arXiv:2403.13228  [pdf, ps, other]

    math.RA math.CA

    Hilbert's Irreducibility Theorem for Linear Differential Operators

    Authors: Ruyong Feng, Zewang Guo, Wei Lu

    Abstract: We prove a differential analogue of Hilbert's irreducibility theorem. Let $\mathcal{L}$ be a linear differential operator with coefficients in $C(\mathbb{X})(x)$ that is irreducible over $\overline{C(\mathbb{X})}(x)$, where $\mathbb{X}$ is an irreducible affine algebraic variety over an algebraically closed field $C$ of characteristic zero. We show that the set of $c\in \mathbb{X}(C)$ such that th… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    MSC Class: 16S32; 68W30

  29. arXiv:2403.11953  [pdf, other]

    eess.IV cs.CV

    Advancing COVID-19 Detection in 3D CT Scans

    Authors: Qingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

    Abstract: To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior kno… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  30. arXiv:2403.11498  [pdf, other]

    eess.IV cs.CV

    Domain Adaptation Using Pseudo Labels for COVID-19 Detection

    Authors: Runtian Yuan, Qingqiu Li, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

    Abstract: In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent he… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  31. arXiv:2403.09294  [pdf, other]

    cs.CV cs.CL

    Anatomical Structure-Guided Medical Vision-Language Pre-training

    Authors: Qingqiu Li, Xiaohan Yan, Jilan Xu, Runtian Yuan, Yuejie Zhang, Rui Feng, Quanli Shen, Xiaobo Zhang, Shujun Wang

    Abstract: Learning medical visual representations through vision-language pre-training has reached remarkable progress. Despite the promising performance, it still faces challenges, i.e., local alignment lacks interpretability and clinical relevance, and the insufficient internal and external representation learning of image-report pairs. To address these issues, we propose an Anatomical Structure-Guided (A… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  32. arXiv:2402.19387  [pdf, other]

    eess.IV cs.CV

    SeD: Semantic-Aware Discriminator for Image Super-Resolution

    Authors: Bingchen Li, Xin Li, Hanxin Zhu, Yeying Jin, Ruoyu Feng, Zhizheng Zhang, Zhibo Chen

    Abstract: Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner. However, the distribution learning is overly coarse-grained, which is susceptible to virtual textures and caus… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR2024

  33. arXiv:2402.18180  [pdf, other]

    cs.CY

    Human Simulacra: Benchmarking the Personification of Large Language Models

    Authors: Qiuejie Xie, Qiming Feng, Tianqi Zhang, Qingqiu Li, Linyi Yang, Yuejie Zhang, Rui Feng, Liang He, Shang Gao, Yue Zhang

    Abstract: Large language models (LLMs) are recognized as systems that closely mimic aspects of human intelligence. This capability has attracted attention from the social science community, who see the potential in leveraging LLMs to replace human participants in experiments, thereby reducing research costs and complexity. In this paper, we introduce a framework for large language models personification, in… ▽ More

    Submitted 9 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  34. arXiv:2402.14983  [pdf, other]

    cs.LG cs.CR q-fin.RM

    Privacy-Enhancing Collaborative Information Sharing through Federated Learning -- A Case of the Insurance Industry

    Authors: Panyi Dong, Zhiyu Quan, Brandon Edwards, Shih-han Wang, Runhuan Feng, Tianyang Wang, Patrick Foley, Prashant Shah

    Abstract: The report demonstrates the benefits (in terms of improved claims loss modeling) of harnessing the value of Federated Learning (FL) to learn a single model across multiple insurance industry datasets without requiring the datasets themselves to be shared from one company to another. The application of FL addresses two of the most pressing concerns: limited data volume and data variety, which are c… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

  35. arXiv:2402.04684  [pdf, ps, other]

    math.CO cs.SC

    Parallel Summation in P-Recursive Extensions

    Authors: Shaoshi Chen, Ruyong Feng, Manuel Kauers, Xiuyun Li

    Abstract: We propose investigating a summation analog of the paradigm for parallel integration. We make some first steps towards an indefinite summation method applicable to summands that rationally depend on the summation index and a P-recursive sequence and its shifts. There is a distinction between so-called normal and so-called special polynomials. Under the assumption that the corresponding difference… ▽ More

    Submitted 7 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  36. arXiv:2401.13959  [pdf, other]

    eess.IV cs.CV

    Conditional Neural Video Coding with Spatial-Temporal Super-Resolution

    Authors: Henan Wang, Xiaohan Pan, Runsen Feng, Zongyu Guo, Zhibo Chen

    Abstract: This document is an expanded version of a one-page abstract originally presented at the 2024 Data Compression Conference. It describes our proposed method for the video track of the Challenge on Learned Image Compression (CLIC) 2024. Our scheme follows the typical hybrid coding framework with some novel techniques. Firstly, we adopt Spynet network to produce accurate motion vectors for motion esti… ▽ More

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted by the 2024 Data Compression Conference (DCC) for presentation as a poster

  37. arXiv:2401.06166  [pdf]

    q-bio.BM cs.AI cs.LG

    AdaMR: Adaptable Molecular Representation for Unified Pre-training Strategy

    Authors: Yan Ding, Hao Cheng, Ziliang Ye, Ruyi Feng, Wei Tian, Peng Xie, Juan Zhang, Zhongze Gu

    Abstract: We propose Adjustable Molecular Representation (AdaMR), a new large-scale uniform pre-training strategy for small-molecule drugs, as a novel unified pre-training strategy. AdaMR utilizes a granularity-adjustable molecular encoding strategy, which is accomplished through a pre-training job termed molecular canonicalization, setting it apart from recent large-scale molecular models. This adaptabilit… ▽ More

    Submitted 27 April, 2024; v1 submitted 28 December, 2023; originally announced January 2024.

  38. arXiv:2401.02686  [pdf, other]

    cs.CR cs.LG cs.SE

    Beyond Fidelity: Explaining Vulnerability Localization of Learning-based Detectors

    Authors: Baijun Cheng, Shengming Zhao, Kailong Wang, Meizhen Wang, Guangdong Bai, Ruitao Feng, Yao Guo, Lei Ma, Haoyu Wang

    Abstract: Vulnerability detectors based on deep learning (DL) models have proven their effectiveness in recent years. However, the shroud of opacity surrounding the decision-making process of these detectors makes it difficult for security analysts to comprehend. To address this, various explanation approaches have been proposed to explain the predictions by highlighting important features, which have been… ▽ More

    Submitted 21 February, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted by TOSEM

  39. arXiv:2401.00789  [pdf, other]

    cs.CV

    Retrieval-Augmented Egocentric Video Captioning

    Authors: Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie

    Abstract: Understanding human actions from videos of first-person view poses significant challenges. Most prior approaches explore representation learning on egocentric videos only, while overlooking the potential benefit of exploiting existing large-scale third-person videos. In this paper, (1) we develop EgoInstructor, a retrieval-augmented multimodal captioning model that automatically retrieves semantic… ▽ More

    Submitted 19 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: CVPR 2024. Project page is available at: https://jazzcharles.github.io/Egoinstructor/

  40. arXiv:2312.15674  [pdf, other]

    cs.MA

    Multi-Task Multi-Agent Shared Layers are Universal Cognition of Multi-Agent Coordination

    Authors: Jiawei Wang, Jian Zhao, Zhengtao Cao, Ruili Feng, Rongjun Qin, Yang Yu

    Abstract: Multi-agent reinforcement learning shines as the pinnacle of multi-agent systems, conquering intricate real-world challenges, fostering collaboration and coordination among agents, and unleashing the potential for intelligent decision-making across domains. However, training a multi-agent reinforcement learning network is a formidable endeavor, demanding substantial computational resources to inte… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

  41. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  42. arXiv:2312.11521  [pdf, other]

    cs.CL cs.AI

    Large Language Models are Complex Table Parsers

    Authors: Bowen Zhao, Changkai Ji, Yuejie Zhang, Wen He, Yingwen Wang, Qing Wang, Rui Feng, Xiaobo Zhang

    Abstract: With the Generative Pre-trained Transformer 3.5 (GPT-3.5) exhibiting remarkable reasoning and comprehension abilities in Natural Language Processing (NLP), most Question Answering (QA) research has primarily centered around general QA tasks based on GPT, neglecting the specific challenges posed by Complex Table QA. In this paper, we propose to incorporate GPT-3.5 to address such challenges, in whi…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: EMNLP 2023 Main

  43. arXiv:2312.06068  [pdf, other]

    cs.CV cs.AI

    Contrastive Multi-view Subspace Clustering of Hyperspectral Images based on Graph Convolutional Networks

    Authors: Renxiang Guan, Zihao Li, Xianju Li, Chang Tang, Ruyi Feng

    Abstract: High-dimensional and complex spectral structures make the clustering of hyperspectral images (HSI) a challenging task. Subspace clustering is an effective approach for addressing this problem. However, current subspace clustering algorithms are primarily designed for a single view and do not fully exploit the spatial or textural feature information in HSI. In this study, contrastive multi-view sub…

    Submitted 10 December, 2023; originally announced December 2023.

  44. arXiv:2312.02684  [pdf, other]

    cs.CV cs.LG cs.RO

    DeepPointMap: Advancing LiDAR SLAM with Unified Neural Descriptors

    Authors: Xiaze Zhang, Ziheng Ding, Qi Jing, Yuejie Zhang, Wenchao Ding, Rui Feng

    Abstract: Point clouds have shown significant potential in various domains, including Simultaneous Localization and Mapping (SLAM). However, existing approaches either rely on dense point clouds to achieve high localization accuracy or use generalized descriptors to reduce map size. Unfortunately, these two aspects seem to conflict with each other. To address this limitation, we propose a unified architectu…

    Submitted 5 December, 2023; originally announced December 2023.

  45. arXiv:2312.01454  [pdf, other]

    cs.DB cs.AI cs.CL cs.LG

    D-Bot: Database Diagnosis System using Large Language Models

    Authors: Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, Jianming Wu, Jiesi Liu, Ruohang Feng, Guoyang Zeng

    Abstract: Database administrators (DBAs) play an important role in managing, maintaining and optimizing database systems. However, it is hard and tedious for DBAs to manage a large number of databases and give timely response (waiting for hours is intolerable in many online cases). In addition, existing empirical methods only support limited diagnosis scenarios, which are also labor-intensive to update the…

    Submitted 5 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  46. arXiv:2312.00568  [pdf, ps, other]

    eess.SP

    A WINNER+ Based 3-D Non-Stationary Wideband MIMO Channel Model

    Authors: Ji Bian, Jian Sun, Cheng-Xiang Wang, Rui Feng, Jie Huang, Yang Yang, Minggao Zhang

    Abstract: In this paper, a three-dimensional (3-D) non-stationary wideband multiple-input multiple-output (MIMO) channel model based on the WINNER+ channel model is proposed. The angular distributions of clusters in both the horizontal and vertical planes are jointly considered. The receiver and clusters can be moving, which makes the model more general. Parameters including number of clusters, powers, dela…

    Submitted 1 December, 2023; originally announced December 2023.

  47. arXiv:2311.18834  [pdf, other]

    cs.CV

    ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

    Authors: Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

    Abstract: We present ART$\boldsymbol{\cdot}$V, an efficient framework for auto-regressive video generation with diffusion models. Unlike existing methods that generate entire videos in one-shot, ART$\boldsymbol{\cdot}$V generates a single frame at a time, conditioned on the previous ones. The framework offers three distinct advantages. First, it only learns simple continual motions between adjacent frames,…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 24 pages, 21 figures. Project page at https://warranweng.github.io/art.v

  48. arXiv:2311.18829  [pdf, other]

    cs.CV

    MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

    Authors: Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

    Abstract: We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy which divides the text-to-video into a two-stage process: text-to-image generation and image&text-to-video generation. This strategy offers two signific…

    Submitted 29 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Project page: https://wangyanhui666.github.io/MicroCinema.github.io/

  49. arXiv:2311.14934  [pdf, other]

    cs.LG

    Robust Graph Neural Networks via Unbiased Aggregation

    Authors: Ruiqi Feng, Zhichao Hou, Tyler Derr, Xiaorui Liu

    Abstract: The adversarial robustness of Graph Neural Networks (GNNs) has been questioned due to the false sense of security uncovered by strong adaptive attacks despite the existence of numerous defenses. In this work, we delve into the robustness analysis of representative robust GNNs and provide a unified robust estimation point of view to understand their robustness and limitations. Our novel analysis of…

    Submitted 25 November, 2023; originally announced November 2023.

  50. arXiv:2311.12892  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    IMJENSE: Scan-specific Implicit Representation for Joint Coil Sensitivity and Image Estimation in Parallel MRI

    Authors: Ruimin Feng, Qing Wu, Jie Feng, Huajun She, Chunlei Liu, Yuyao Zhang, Hongjiang Wei

    Abstract: Parallel imaging is a commonly used technique to accelerate magnetic resonance imaging (MRI) data acquisition. Mathematically, parallel MRI reconstruction can be formulated as an inverse problem relating the sparsely sampled k-space measurements to the desired MRI image. Despite the success of many existing reconstruction algorithms, it remains a challenge to reliably reconstruct a high-quality im…

    Submitted 21 November, 2023; originally announced November 2023.