(Translated by https://www.hiragana.jp/)
Search | arXiv e-print repository
Skip to main content

Showing 1–50 of 446 results for author: Yan, X

Searching in archive cs. Search in all archives.
.
  1. arXiv:2409.05477  [pdf, other

    cs.LG

    Retrofitting Temporal Graph Neural Networks with Transformer

    Authors: Qiang Huang, Xiao Yan, Xin Wang, Susie Xi Rao, Zhichao Han, Fangcheng Fu, Wentao Zhang, Jiawei Jiang

    Abstract: Temporal graph neural networks (TGNNs) outperform regular GNNs by incorporating time information into graph-based operations. However, TGNNs adopt specialized models (e.g., TGN, TGAT, and APAN ) and require tailored training frameworks (e.g., TGL and ETC). In this paper, we propose TF-TGN, which uses Transformer decoder as the backbone model for TGNN to enjoy Transformer's codebase for efficient t… ▽ More

    Submitted 10 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: conference Under review

  2. arXiv:2409.03512  [pdf, other

    cs.CY cs.CL

    From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

    Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang , et al. (8 additional authors not shown)

    Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integ… ▽ More

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.00727  [pdf, other

    cs.AI cs.CL cs.IR

    Hound: Hunting Supervision Signals for Few and Zero Shot Node Classification on Text-attributed Graph

    Authors: Yuxiang Wang, Xiao Yan, Shiyu Jin, Quanqing Xu, Chuanhui Yang, Yuanyuan Zhu, Chuang Hu, Bo Du, Jiawei Jiang

    Abstract: Text-attributed graph (TAG) is an important type of graph structured data with text descriptions for each node. Few- and zero-shot node classification on TAGs have many applications in fields such as academia and social networks. However, the two tasks are challenging due to the lack of supervision signals, and existing methods only use the contrastive loss to align graph-based node embedding and… ▽ More

    Submitted 1 September, 2024; originally announced September 2024.

  4. arXiv:2409.00618  [pdf, other

    cs.CV

    YOLOO: You Only Learn from Others Once

    Authors: Lipeng Gu, Mingqiang Wei, Xuefeng Yan, Dingkun Zhu, Wei Zhao, Haoran Xie, Yong-Jin Liu

    Abstract: Multi-modal 3D multi-object tracking (MOT) typically necessitates extensive computational costs of deep neural networks (DNNs) to extract multi-modal representations. In this paper, we propose an intriguing question: May we learn from multiple modalities only during training to avoid multi-modal input in the inference phase? To answer it, we propose \textbf{YOLOO}, a novel multi-modal 3D MOT parad… ▽ More

    Submitted 1 September, 2024; originally announced September 2024.

  5. arXiv:2408.17009  [pdf, other

    cs.SD eess.AS

    Utilizing Speaker Profiles for Impersonation Audio Detection

    Authors: Hao Gu, JiangYan Yi, Chenglong Wang, Yong Ren, Jianhua Tao, Xinrui Yan, Yujie Chen, Xiaohui Zhang

    Abstract: Fake audio detection is an emerging active topic. A growing number of literatures have aimed to detect fake utterance, which are mostly generated by Text-to-speech (TTS) or voice conversion (VC). However, countermeasures against impersonation remain an underexplored area. Impersonation is a fake type that involves an imitator replicating specific traits and speech style of a target speaker. Unlike… ▽ More

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM2024

  6. arXiv:2408.16094  [pdf, ps, other

    cs.DC

    Monadring: A lightweight consensus protocol to offer Validation-as-a-Service to AVS nodes

    Authors: Yu Zhang, Xiao Yan, Gang Tang, Helena Wang

    Abstract: Existing blockchain networks are often large-scale, requiring transactions to be synchronized across the entire network to reach consensus. On-chain computations can be prohibitively expensive, making many CPU-intensive computations infeasible. Inspired by the structure of IBM's token ring networks, we propose a lightweight consensus protocol called Monadring to address these issues. Monadring all… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 23 pages, 3 figures

  7. arXiv:2408.11323  [pdf, other

    cs.CV

    Optimizing Transmit Field Inhomogeneity of Parallel RF Transmit Design in 7T MRI using Deep Learning

    Authors: Zhengyi Lu, Hao Liang, Xiao Wang, Xinqiang Yan, Yuankai Huo

    Abstract: Ultrahigh field (UHF) Magnetic Resonance Imaging (MRI) provides a higher signal-to-noise ratio and, thereby, higher spatial resolution. However, UHF MRI introduces challenges such as transmit radiofrequency (RF) field (B1+) inhomogeneities, leading to uneven flip angles and image intensity anomalies. These issues can significantly degrade imaging quality and its medical applications. This study ad… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

  8. arXiv:2408.09762  [pdf, other

    cs.LG

    Sequential Federated Learning in Hierarchical Architecture on Non-IID Datasets

    Authors: Xingrun Yan, Shiyuan Zuo, Rongfei Fan, Han Hu, Li Shen, Puning Zhao, Yong Luo

    Abstract: In a real federated learning (FL) system, communication overhead for passing model parameters between the clients and the parameter server (PS) is often a bottleneck. Hierarchical federated learning (HFL) that poses multiple edge servers (ESs) between clients and the PS can partially alleviate communication pressure but still needs the aggregation of model parameters from multiple ESs at the PS. T… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  9. arXiv:2408.09539  [pdf, other

    cs.LG cs.DC

    Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets

    Authors: Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Li Shen, Puning Zhao, Jie Xu, Han Hu

    Abstract: In practical federated learning (FL) systems, the presence of malicious Byzantine attacks and data heterogeneity often introduces biases into the learning process. However, existing Byzantine-robust methods typically only achieve a compromise between adaptability to different loss function types (including both strongly convex and non-convex) and robustness to heterogeneous datasets, but with non-… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  10. arXiv:2408.08191  [pdf, other

    cs.CV

    Beyond Full Label: Single-Point Prompt for Infrared Small Target Label Generation

    Authors: Shuai Yuan, Hanlin Qin, Renke Kou, Xiang Yan, Zechuan Li, Chenxu Peng, Abd-Krim Seghouane

    Abstract: In this work, we make the first attempt to construct a learning-based single-point annotation paradigm for infrared small target label generation (IRSTLG). Our intuition is that label generation requires just one more point prompt than target detection: IRSTLG can be regarded as an infrared small target detection (IRSTD) task with the target location hint. Based on this insight, we introduce an en… ▽ More

    Submitted 20 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  11. arXiv:2408.04967  [pdf, other

    eess.AS cs.SD

    ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild

    Authors: Jiangyan Yi, Chu Yuan Zhang, Jianhua Tao, Chenglong Wang, Xinrui Yan, Yong Ren, Hao Gu, Junzuo Zhou

    Abstract: The growing prominence of the field of audio deepfake detection is driven by its wide range of applications, notably in protecting the public from potential fraud and other malicious activities, prompting the need for greater attention and research in this area. The ADD 2023 challenge goes beyond binary real/fake classification by emulating real-world scenarios, such as the identification of manip… ▽ More

    Submitted 12 September, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  12. arXiv:2408.03178  [pdf, other

    cs.CV cs.GR cs.LG

    An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion

    Authors: Xingguang Yan, Han-Hung Lee, Ziyu Wan, Angel X. Chang

    Abstract: We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Project Page: https://omages.github.io/

  13. arXiv:2408.01691  [pdf, other

    cs.LG cs.AI

    TreeCSS: An Efficient Framework for Vertical Federated Learning

    Authors: Qinbo Zhang, Xiao Yan, Yukai Ding, Quanqing Xu, Chuang Hu, Xiaokai Zhou, Jiawei Jiang

    Abstract: Vertical federated learning (VFL) considers the case that the features of data samples are partitioned over different participants. VFL consists of two main steps, i.e., identify the common data samples for all participants (alignment) and train model using the aligned data samples (training). However, when there are many participants and data samples, both alignment and training become slow. As s… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 16 pages, 7 figures

  14. arXiv:2407.20224  [pdf, other

    cs.CL

    Can Editing LLMs Inject Harm?

    Authors: Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu

    Abstract: Knowledge editing has been increasingly adopted to correct the false or outdated knowledge in Large Language Models (LLMs). Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation wi… ▽ More

    Submitted 16 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. 9 pages for main paper, 36 pages including appendix. The code, results, dataset for this paper and more resources are on the project website: https://llm-editing.github.io

  15. arXiv:2407.16943  [pdf

    cs.CV

    McGAN: Generating Manufacturable Designs by Embedding Manufacturing Rules into Conditional Generative Adversarial Network

    Authors: Zhichao Wang, Xiaoliang Yan, Shreyes Melkote, David Rosen

    Abstract: Generative design (GD) methods aim to automatically generate a wide variety of designs that satisfy functional or aesthetic design requirements. However, research to date generally lacks considerations of manufacturability of the generated designs. To this end, we propose a novel GD approach by using deep neural networks to encode design for manufacturing (DFM) rules, thereby modifying part design… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  16. arXiv:2407.16715  [pdf

    q-bio.QM cs.AI cs.LG

    Research on Adverse Drug Reaction Prediction Model Combining Knowledge Graph Embedding and Deep Learning

    Authors: Yufeng Li, Wenchao Zhao, Bo Dang, Xu Yan, Weimin Wang, Min Gao, Mingxuan Xiao

    Abstract: In clinical treatment, identifying potential adverse reactions of drugs can help assist doctors in making medication decisions. In response to the problems in previous studies that features are high-dimensional and sparse, independent prediction models need to be constructed for each adverse reaction of drugs, and the prediction accuracy is low, this paper develops an adverse drug reaction predict… ▽ More

    Submitted 27 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 12 pages, 4 figures, 9 tables

  17. arXiv:2407.14146  [pdf, other

    cs.MM

    Fine-grained Knowledge Graph-driven Video-Language Learning for Action Recognition

    Authors: Rui Zhang, Yafen Lu, Pengli Ji, Junxiao Xue, Xiaoran Yan

    Abstract: Recent work has explored video action recognition as a video-text matching problem and several effective methods have been proposed based on large-scale pre-trained vision-language models. However, these approaches primarily operate at a coarse-grained level without the detailed and semantic understanding of action concepts by exploiting fine-grained semantic connections between actions and body m… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  18. arXiv:2407.13417  [pdf, other

    cs.CV

    GDDS: A Single Domain Generalized Defect Detection Frame of Open World Scenario using Gather and Distribute Domain-shift Suppression Network

    Authors: Haiyong Chen, Yaxiu Zhang, Yan Zhang, Xin Zhang, Xingwei Yan

    Abstract: Efficient and intelligent surface defect detection of photovoltaic modules is crucial for improving the quality of photovoltaic modules and ensuring the reliable operation of large-scale infrastructure. However, the scenario characteristics of data distribution deviation make the construction of defect detection models for open world scenarios such as photovoltaic manufacturing and power plant ins… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 13 images

    ACM Class: I.4.9; I.5.1

  19. arXiv:2407.09731  [pdf, other

    cs.NE

    Sliding Window Bi-Objective Evolutionary Algorithms for Optimizing Chance-Constrained Monotone Submodular Functions

    Authors: Xiankun Yan, Aneta Neumann, Frank Neumann

    Abstract: Variants of the GSEMO algorithm using multi-objective formulations have been successfully analyzed and applied to optimize chance-constrained submodular functions. However, due to the effect of the increasing population size of the GSEMO algorithm considered in these studies from the algorithms, the approach becomes ineffective if the number of trade-offs obtained grows quickly during the optimiza… ▽ More

    Submitted 7 August, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  20. arXiv:2407.04903  [pdf, other

    cs.CL cs.AI cs.CV

    MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

    Authors: Zekun Li, Xianjun Yang, Kyuri Choi, Wanrong Zhu, Ryan Hsieh, HyeonJung Kim, Jin Hyuk Lim, Sungyoung Ji, Byungju Lee, Xifeng Yan, Linda Ruth Petzold, Stephen D. Wilson, Woosang Lim, William Yang Wang

    Abstract: The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures. Despite progress, there remains a significant gap in evaluating models' comprehension of professional, graduate-level, and even PhD-level scientific content. Current datasets and benchmarks pr… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Code and data are available at https://github.com/Leezekun/MMSci

  21. arXiv:2406.18032  [pdf, other

    cs.CR cs.DC cs.NI

    A Communication Satellite Servises Based Decentralized Network Protocol

    Authors: Xiao Yan, Bernie Gao

    Abstract: In this paper, we present a decentralized network protocol, Space Network Protocol, based on Communication Satellite Services. The protocol outlines a method for distributing information about the status of satellite communication services across the entire blockchain network, facilitating fairness and transparency in all communication services. Our primary objective is to standardize the services… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  22. arXiv:2406.13125  [pdf, other

    cs.AI

    A Unified Framework for Combinatorial Optimization Based on Graph Neural Networks

    Authors: Yaochu Jin, Xueming Yan, Shiqing Liu, Xiangyu Wang

    Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for solving combinatorial optimization problems (COPs), exhibiting state-of-the-art performance in both graph-structured and non-graph-structured domains. However, existing approaches lack a unified framework capable of addressing a wide range of COPs. After presenting a summary of representative COPs and a brief review of recent advance… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  23. arXiv:2406.11633  [pdf, other

    cs.CV

    DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

    Authors: Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao

    Abstract: Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extract… ▽ More

    Submitted 11 September, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Homepage of DocGenome: https://unimodal4reasoning.github.io/DocGenome_page 22 pages, 11 figures

  24. arXiv:2406.08524  [pdf, other

    cs.LG cs.DC

    Federated Incomplete Multi-View Clustering with Heterogeneous Graph Neural Networks

    Authors: Xueming Yan, Ziqi Wang, Yaochu Jin

    Abstract: Federated multi-view clustering offers the potential to develop a global clustering model using data distributed across multiple devices. However, current methods face challenges due to the absence of label information and the paramount importance of data privacy. A significant issue is the feature heterogeneity across multi-view data, which complicates the effective mining of complementary cluste… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  25. arXiv:2406.08418  [pdf, other

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang , et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an… ▽ More

    Submitted 12 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  26. arXiv:2406.07498  [pdf, other

    cs.SD eess.AS

    RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, failure to use future information and constraint receptive field of convolution layers limit the system's perfor… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  27. arXiv:2406.05422  [pdf, other

    cs.AI cs.RO

    Diffusion-based Reinforcement Learning for Dynamic UAV-assisted Vehicle Twins Migration in Vehicular Metaverses

    Authors: Yongju Tong, Jiawen Kang, Junlong Chen, Minrui Xu, Gaolei Li, Weiting Zhang, Xincheng Yan

    Abstract: Air-ground integrated networks can relieve communication pressure on ground transportation networks and provide 6G-enabled vehicular Metaverses services offloading in remote areas with sparse RoadSide Units (RSUs) coverage and downtown areas where users have a high demand for vehicular services. Vehicle Twins (VTs) are the digital twins of physical vehicles to enable more immersive and realistic v… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  28. arXiv:2405.20858  [pdf, other

    cs.RO

    CSDO: Enhancing Efficiency and Success in Large-Scale Multi-Vehicle Trajectory Planning

    Authors: Yibin Yang, Shaobing Xu, Xintao Yan, Junkai Jiang, Jianqiang Wang, Heye Huang

    Abstract: This paper presents an efficient algorithm, naming Centralized Searching and Decentralized Optimization (CSDO), to find feasible solution for large-scale Multi-Vehicle Trajectory Planning (MVTP) problem. Due to the intractable growth of non-convex constraints with the number of agents, exploring various homotopy classes that imply different convex domains, is crucial for finding a feasible solutio… ▽ More

    Submitted 30 July, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: 8 pages, 7 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  29. arXiv:2405.20336  [pdf, other

    cs.CV cs.SD eess.AS

    RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

    Authors: Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan

    Abstract: In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works that typically address these two modalities in isolation. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rapping vocals, lyrics, and high-quality 3D holistic bo… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project website: https://vis-www.cs.umass.edu/RapVerse

  30. arXiv:2405.17729  [pdf, other

    cs.CV cs.MM

    Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

    Authors: Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan

    Abstract: Video recognition remains an open challenge, requiring the identification of diverse content categories within videos. Mainstream approaches often perform flat classification, overlooking the intrinsic hierarchical structure relating categories. To address this, we formalize the novel task of hierarchical video recognition, and propose a video-language learning framework tailored for hierarchical… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  31. arXiv:2405.13646  [pdf

    cs.LG

    A Transformer variant for multi-step forecasting of water level and hydrometeorological sensitivity analysis based on explainable artificial intelligence technology

    Authors: Mingyu Liu, Nana Bao, Xingting Yan, Chenyang Li, Kai Peng

    Abstract: Understanding the combined influences of meteorological and hydrological factors on water level and flood events is essential, particularly in today's changing climate environments. Transformer, as one kind of the cutting-edge deep learning methods, offers an effective approach to model intricate nonlinear processes, enables the extraction of key features and water level predictions. EXplainable A… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  32. arXiv:2405.11884  [pdf, other

    cs.LG cs.DC

    Vertical Federated Learning Hybrid Local Pre-training

    Authors: Wenguo Li, Xinling Guo, Xu Jiao, Tiancheng Huang, Xiaoran Yan, Yao Yang

    Abstract: Vertical Federated Learning (VFL), which has a broad range of real-world applications, has received much attention in both academia and industry. Enterprises aspire to exploit more valuable features of the same users from diverse departments to boost their model prediction skills. VFL addresses this demand and concurrently secures individual parties from exposing their raw data. However, conventio… ▽ More

    Submitted 21 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  33. arXiv:2405.09083  [pdf, other

    cs.CV

    RSHazeDiff: A Unified Fourier-aware Diffusion Model for Remote Sensing Image Dehazing

    Authors: Jiamei Xiong, Xuefeng Yan, Yongzhen Wang, Wei Zhao, Xiao-Ping Zhang, Mingqiang Wei

    Abstract: Haze severely degrades the visual quality of remote sensing images and hampers the performance of automotive navigation, intelligent monitoring, and urban management. The emerging denoising diffusion probabilistic model (DDPM) exhibits the significant potential for dense haze removal with its strong generation ability. Since remote sensing images contain extensive small-scale texture structures, i… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  34. arXiv:2405.05231  [pdf, other

    cs.LG

    DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training

    Authors: Renjie Liu, Yichuan Wang, Xiao Yan, Zhenkun Cai, Minjie Wang, Haitian Jiang, Bo Tang, Jinyang Li

    Abstract: Graph neural networks (GNNs) are machine learning models specialized for graph data and widely used in many applications. To train GNNs on large graphs that exceed CPU memory, several systems store data on disk and conduct out-of-core processing. However, these systems suffer from either read amplification when reading node features that are usually smaller than a disk page or degraded model accur… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  35. arXiv:2405.00989  [pdf

    cs.CV cs.LG

    Estimate the building height at a 10-meter resolution based on Sentinel data

    Authors: Xin Yan

    Abstract: Building height is an important indicator for scientific research and practical application. However, building height products with a high spatial resolution (10m) are still very scarce. To meet the needs of high-resolution building height estimation models, this study established a set of spatial-spectral-temporal feature databases, combining SAR data provided by Sentinel-1, optical data provided… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  36. Misaka: Interactive Swarm Testbed for Smart Grid Distributed Algorithm Test and Evaluation

    Authors: Tingliang Zhang, Haiwang Zhong, Zhenfei Tan, Xinfei Yan

    Abstract: In this paper, we present Misaka, a visualized swarm testbed for smart grid algorithm evaluation, also an extendable open-source open-hardware platform for developing tabletop tangible swarm interfaces. The platform consists of a collection of custom-designed 3 omni-directional wheels robots each 10 cm in diameter, high accuracy localization through a microdot pattern overlaid on top of the activi… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Journal ref: 2020 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia)

  37. arXiv:2404.16821  [pdf, other

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai , et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual… ▽ More

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  38. arXiv:2404.14248  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin , et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  39. arXiv:2404.13496  [pdf, other

    math.NA cs.AI

    ODE-DPS: ODE-based Diffusion Posterior Sampling for Inverse Problems in Partial Differential Equation

    Authors: Enze Jiang, Jishen Peng, Zheng Ma, Xiong-Bin Yan

    Abstract: In recent years we have witnessed a growth in mathematics for deep learning, which has been used to solve inverse problems of partial differential equations (PDEs). However, most deep learning-based inversion methods either require paired data or necessitate retraining neural networks for modifications in the conditions of the inverse problem, significantly reducing the efficiency of inversion and… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

  40. arXiv:2404.11945  [pdf, other

    cs.RO

    Terrain-Aware Stride-Level Trajectory Forecasting for a Powered Hip Exoskeleton via Vision and Kinematics Fusion

    Authors: Ruoqi Zhao, Xingbang Yan, Yubo Fan

    Abstract: Powered hip exoskeletons have shown the ability for locomotion assistance during treadmill walking. However, providing suitable assistance in real-world walking scenarios which involve changing terrain remains challenging. Recent research suggests that forecasting the lower limb joint's angles could provide target trajectories for exoskeletons and prostheses, and the performance could be improved… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 6 pages, submitted to IEEE RA-L, under review. This work has been submitted to the IEEE Robotics and Automation Letters (RA-L) for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  41. arXiv:2404.11907  [pdf, other

    cs.AI

    Sampling-based Pareto Optimization for Chance-constrained Monotone Submodular Problems

    Authors: Xiankun Yan, Aneta Neumann, Frank Neumann

    Abstract: Recently surrogate functions based on the tail inequalities were developed to evaluate the chance constraints in the context of evolutionary computation and several Pareto optimization algorithms using these surrogates were successfully applied in optimizing chance-constrained monotone submodular problems. However, the difference in performance between algorithms using the surrogates and those emp… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  42. arXiv:2404.10441  [pdf, other

    cs.CV

    1st Place Solution for ICCV 2023 OmniObject3D Challenge: Sparse-View Reconstruction

    Authors: Hang Du, Yaping Xue, Weidong Dai, Xuejun Yan, Jingjing Wang

    Abstract: In this report, we present the 1st place solution for ICCV 2023 OmniObject3D Challenge: Sparse-View Reconstruction. The challenge aims to evaluate approaches for novel view synthesis and surface reconstruction using only a few posed images of each object. We utilize Pixel-NeRF as the basic model, and apply depth supervision as well as coarse-to-fine positional encoding. The experiments demonstrate… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

  43. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  44. arXiv:2404.09790  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou , et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  45. arXiv:2404.09324  [pdf, other

    cs.MA

    Correlated Mean Field Imitation Learning

    Authors: Zhiyu Zhao, Ning Yang, Xue Yan, Haifeng Zhang, Jun Wang, Yaodong Yang

    Abstract: We investigate multi-agent imitation learning (IL) within the framework of mean field games (MFGs), considering the presence of time-varying correlated signals. Existing MFG IL algorithms assume demonstrations are sampled from Mean Field Nash Equilibria (MFNE), limiting their adaptability to real-world scenarios. For example, in the traffic network equilibrium influenced by public routing recommen… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 23 pages

  46. arXiv:2404.09226  [pdf

    eess.IV cs.CV cs.LG

    Breast Cancer Image Classification Method Based on Deep Transfer Learning

    Authors: Weimin Wang, Yufeng Li, Xu Yan, Mingxuan Xiao, Min Gao

    Abstract: To address the issues of limited samples, time-consuming feature design, and low accuracy in detection and classification of breast cancer pathological images, a breast cancer image classification model algorithm combining deep learning and transfer learning is proposed. This algorithm is based on the DenseNet structure of deep neural networks, and constructs a network model by introducing attenti… ▽ More

    Submitted 11 September, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: 12 pages, 8 figures, 2024 International Conference on Image Processing, Machine Learning and Pattern Recognition

  47. arXiv:2404.08979  [pdf, other

    cs.CV cs.LG

    BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

    Authors: Jian Zhang, Ruiteng Zhang, Xinyue Yan, Xiting Zhuang, Ruicheng Cao

    Abstract: Degraded underwater images decrease the accuracy of underwater object detection. However, existing methods for underwater image enhancement mainly focus on improving the indicators in visual aspects, which may not benefit the tasks of underwater image detection, and may lead to serious degradation in performance. To alleviate this problem, we proposed a bidirectional-guided method for underwater o… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures, 4 tables

    MSC Class: 68T07; 68T45 ACM Class: I.4.3; I.4.8; I.4.9; I.4.10; I.2.10

  48. arXiv:2404.08713  [pdf, other

    eess.IV cs.LG q-bio.QM

    Survival Prediction Across Diverse Cancer Types Using Neural Networks

    Authors: Xu Yan, Weimin Wang, MingXuan Xiao, Yufeng Li, Min Gao

    Abstract: Gastric cancer and Colon adenocarcinoma represent widespread and challenging malignancies with high mortality rates and complex treatment landscapes. In response to the critical need for accurate prognosis in cancer patients, the medical community has embraced the 5-year survival rate as a vital metric for estimating patient outcomes. This study introduces a pioneering approach to enhance survival… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  49. arXiv:2404.08279  [pdf, other

    eess.IV cs.CV cs.LG

    Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example

    Authors: MingXuan Xiao, Yufeng Li, Xu Yan, Min Gao, Weimin Wang

    Abstract: Breast cancer is a relatively common cancer among gynecological cancers. Its diagnosis often relies on the pathology of cells in the lesion. The pathological diagnosis of breast cancer not only requires professionals and time, but also sometimes involves subjective judgment. To address the challenges of dependence on pathologists expertise and the time-consuming nature of achieving accurate breast… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

  50. arXiv:2404.07962  [pdf, other

    cs.CV cs.LG

    Live and Learn: Continual Action Clustering with Incremental Views

    Authors: Xiaoqiang Yan, Yingtao Gan, Yiqiao Mao, Yangdong Ye, Hui Yu

    Abstract: Multi-view action clustering leverages the complementary information from different camera views to enhance the clustering performance. Although existing approaches have achieved significant progress, they assume all camera views are available in advance, which is impractical when the camera view is incremental over time. Besides, learning the invariant information among multiple camera views is s… ▽ More

    Submitted 22 March, 2024; originally announced April 2024.