Search | arXiv e-print repository

Showing 1–50 of 202 results for author: Gong, S

Searching in archive cs.
  1. arXiv:2408.02586  [pdf, other]

    cs.IT eess.SP

    Massive MIMO-OTFS-Based Random Access for Cooperative LEO Satellite Constellations

    Authors: Boxiao Shen, Yongpeng Wu, Shiqi Gong, Heng Liu, Björn Ottersten, Wenjun Zhang

    Abstract: This paper investigates joint device identification, channel estimation, and symbol detection for cooperative multi-satellite-enhanced random access, where orthogonal time-frequency space modulation with the large antenna array is utilized to combat the dynamics of the terrestrial-satellite links (TSLs). We introduce the generalized complex exponential basis expansion model to parameterize TSLs, t…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by IEEE Journal on Selected Areas in Communications

  2. arXiv:2407.16131  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr…

    Submitted 22 July, 2024; originally announced July 2024.

  3. arXiv:2407.14544  [pdf, other]

    cs.DC

    Fast Iterative Graph Computing with Updated Neighbor States

    Authors: Yijie Zhou, Shufeng Gong, Feng Yao, Hanzhang Chen, Song Yu, Pengxi Liu, Yanfeng Zhang, Ge Yu, Jeffrey Xu Yu

    Abstract: Enhancing the efficiency of iterative computation on graphs has garnered considerable attention in both industry and academia. Nonetheless, the majority of efforts focus on expediting iterative computation by minimizing the running time per iteration step, ignoring the optimization of the number of iteration rounds, which is a crucial aspect of iterative computation. We experimentally verified the…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 14 pages, 13 figures, 2 tables; accepted for publication in ICDE 2024

  4. arXiv:2407.13076  [pdf, other]

    cs.MA cs.NI eess.SP

    Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks

    Authors: Ziqi Lin, Xu Zhang, Shimin Gong, Lanhua Li, Zhou Su, Bo Gu

    Abstract: Long-range (LoRa) communication technology, distinguished by its low power consumption and long communication range, is widely used in the Internet of Things. Nevertheless, the LoRa MAC layer adopts pure ALOHA for medium access control, which may suffer from severe packet collisions as the network scale expands, consequently reducing the system energy efficiency (EE). To address this issue, it is…

    Submitted 17 July, 2024; originally announced July 2024.

  5. arXiv:2407.12014  [pdf, other]

    cs.HC cs.CY

    Surprising Performances of Students with Autism in Classroom with NAO Robot

    Authors: Qin Yang, Huan Lu, Dandan Liang, Shengrong Gong, Huanghao Feng

    Abstract: Autism is a developmental disorder that manifests in early childhood and persists throughout life, profoundly affecting social behavior and hindering the acquisition of learning and social skills in those diagnosed. As technological advancements progress, an increasing array of technologies is being utilized to support the education of students with Autism Spectrum Disorder (ASD), aiming to improv…

    Submitted 26 June, 2024; originally announced July 2024.

  6. arXiv:2407.10753  [pdf, other]

    cs.CV

    OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

    Authors: Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

    Abstract: Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise depth supervision, they overlook two significant phenomena: 1) the depth supervision obtained from LiDAR points is usually distributed on the surface of the object, which is not so friendly to existing DETR-based 3D…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  7. arXiv:2407.07249  [pdf, other]

    cs.CV

    Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion

    Authors: Yu Cao, Shaogang Gong

    Abstract: In the field of Few-Shot Image Generation (FSIG) using Deep Generative Models (DGMs), accurately estimating the distribution of target domain with minimal samples poses a significant challenge. This requires a method that can both capture the broad diversity and the true characteristics of the target domain distribution. We present Conditional Relaxing Diffusion Inversion (CRDI), an innovative `tr…

    Submitted 9 July, 2024; originally announced July 2024.

  8. arXiv:2407.05679  [pdf, other]

    cs.CV cs.AI

    BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space

    Authors: Yumeng Zhang, Shi Gong, Kaixin Xiong, Xiaoqing Ye, Xiao Tan, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

    Abstract: World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence…

    Submitted 18 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: 10 pages

  9. arXiv:2407.05118  [pdf, other]

    cs.CV

    SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding

    Authors: Zixu Cheng, Yujiang Pu, Shaogang Gong, Parisa Kordjamshidi, Yu Kong

    Abstract: Temporal grounding, also known as video moment retrieval, aims at locating video segments corresponding to a given query sentence. The compositional nature of natural language enables the localization beyond predefined events, posing a certain challenge to the compositional generalizability of existing methods. Recent studies establish the correspondence between videos and queries through a decomp…

    Submitted 15 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  10. arXiv:2407.03804  [pdf, other]

    cs.LG cs.NI

    Multi-Time Scale Service Caching and Pricing in MEC Systems with Dynamic Program Popularity

    Authors: Yiming Chen, Xingyuan Hu, Bo Gu, Shimin Gong, Zhou Su

    Abstract: In mobile edge computing systems, base stations (BSs) equipped with edge servers can provide computing services to users to reduce their task execution time. However, there is always a conflict of interest between the BS and users. The BS prices the service programs based on user demand to maximize its own profit, while the users determine their offloading strategies based on the prices to minimiz…

    Submitted 4 July, 2024; originally announced July 2024.

  11. arXiv:2406.17880  [pdf, other]

    cs.CV

    MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval

    Authors: Weitong Cai, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu

    Abstract: Video Moment Retrieval (VMR) aims to localize a specific temporal segment within an untrimmed long video given a natural language query. Existing methods often suffer from inadequate training annotations, i.e., the sentence typically matches with a fraction of the prominent video content in the foreground with limited wording diversity. This intrinsic modality imbalance leaves a considerable porti…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Under review

  12. arXiv:2406.16715  [pdf, other]

    cs.LG

    GC-Bench: A Benchmark Framework for Graph Condensation with New Insights

    Authors: Shengbo Gong, Juntong Ni, Noveen Sachdeva, Carl Yang, Wei Jin

    Abstract: Graph condensation (GC) is an emerging technique designed to learn a significantly smaller graph that retains the essential information of the original graph. This condensed graph has shown promise in accelerating graph neural networks while preserving performance comparable to those achieved with the original, larger graphs. Additionally, this technique facilitates downstream applications such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 9 pages

  13. arXiv:2406.08835  [pdf, other]

    cs.SD eess.AS

    A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, ca…

    Submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2406.01791  [pdf, other]

    cs.CV

    Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels

    Authors: Weitong Cai, Jiabo Huang, Shaogang Gong

    Abstract: Video moment retrieval (VMR) is to search for a visual temporal moment in an untrimmed raw video by a given text query description (sentence). Existing studies either start from collecting exhaustive frame-wise annotations on the temporal boundary of target moments (fully-supervised), or learn with only the video-level video-text pairing labels (weakly-supervised). The former is poor in generalisa…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by BMVC2022

  15. arXiv:2405.19100  [pdf, other]

    cs.CV

    Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer

    Authors: Zengqun Zhao, Yu Cao, Shaogang Gong, Ioannis Patras

    Abstract: Current facial expression recognition (FER) models are often designed in a supervised learning manner and thus are constrained by the lack of large-scale facial expression images with high-quality annotations. Consequently, these models often fail to generalize well, performing poorly on unseen images in inference. Vision-language-based zero-shot models demonstrate a promising potential for addres…

    Submitted 18 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: The code and pre-trained models are available at https://github.com/zengqunzhao/Exp-CLIP

  16. arXiv:2405.18725  [pdf, other]

    cs.LG cs.MA

    Can We Enhance the Quality of Mobile Crowdsensing Data Without Ground Truth?

    Authors: Jiajie Li, Bo Gu, Shimin Gong, Zhou Su, Mohsen Guizani

    Abstract: Mobile crowdsensing (MCS) has emerged as a prominent trend across various domains. However, ensuring the quality of the sensing data submitted by mobile users (MUs) remains a complex and challenging problem. To address this challenge, an advanced method is required to detect low-quality sensing data and identify malicious MUs that may disrupt the normal operations of an MCS system. Therefore, this…

    Submitted 28 May, 2024; originally announced May 2024.

  17. arXiv:2405.08278  [pdf, other]

    cs.CR cs.SI

    Facilitating Feature and Topology Lightweighting: An Ethereum Transaction Graph Compression Method for Malicious Account Detection

    Authors: Jiajun Zhou, Xuanze Chen, Shengbo Gong, Chenkai Hu, Chengxiang Jin, Shanqing Yu, Qi Xuan

    Abstract: Ethereum has become one of the primary global platforms for cryptocurrency, playing an important role in promoting the diversification of the financial ecosystem. However, the relative lag in regulation has led to a proliferation of malicious activities in Ethereum, posing a serious threat to fund security. Existing regulatory methods usually detect malicious accounts through feature engineering o…

    Submitted 1 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by International Conference on Blockchain and Trustworthy Systems 2024

  18. arXiv:2404.19449  [pdf, other]

    cs.IT

    AoI-aware Sensing Scheduling and Trajectory Optimization for Multi-UAV-assisted Wireless Backscatter Networks

    Authors: Yusi Long, Songhan Zhao, Shimin Gong, Bo Gu, Dusit Niyato, Xuemin Shen

    Abstract: This paper considers multiple unmanned aerial vehicles (UAVs) to assist sensing data transmissions from the ground users (GUs) to a remote base station (BS). Each UAV collects sensing data from the GUs and then forwards the sensing data to the remote BS. The GUs first backscatter their data to the UAVs and then all UAVs forward data to the BS by the nonorthogonal multiple access (NOMA) transmissio…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by IEEE TVT

  19. arXiv:2404.07181  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    BAMBOO: a predictive and transferable machine learning force field framework for liquid electrolyte development

    Authors: Sheng Gong, Yumin Zhang, Zhenliang Mu, Zhichen Pu, Hongyi Wang, Zhiao Yu, Mengyi Chen, Tianze Zheng, Zhi Wang, Lifei Chen, Xiaojie Wu, Shaochen Shi, Weihao Gao, Wen Yan, Liang Xiang

    Abstract: Despite the widespread applications of machine learning force field (MLFF) on solids and small molecules, there is a notable gap in applying MLFF to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for l…

    Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  20. arXiv:2404.05192  [pdf, other]

    cs.LG

    ATFNet: Adaptive Time-Frequency Ensembled Network for Long-term Time Series Forecasting

    Authors: Hengyu Ye, Jiadong Chen, Shijin Gong, Fuxin Jiang, Tieying Zhang, Jianjun Chen, Xiaofeng Gao

    Abstract: The intricate nature of time series data analysis benefits greatly from the distinct advantages offered by time and frequency domain representations. While the time domain is superior in representing local dependencies, particularly in non-periodic series, the frequency domain excels in capturing global dependencies, making it ideal for series with evident periodic patterns. To capitalize on both…

    Submitted 8 April, 2024; originally announced April 2024.

  21. arXiv:2404.04647  [pdf, other]

    cs.CV

    Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training

    Authors: Shizhan Gong, Qi Dou, Farzan Farnia

    Abstract: Gradient-based saliency maps have been widely used to explain the decisions of deep neural network classifiers. However, standard gradient-based interpretation maps, including the simple gradient and integrated gradient algorithms, often lack desired structures such as sparsity and connectedness in their application to real-world computer vision models. A frequently used approach to inducing spars…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  22. arXiv:2404.00598  [pdf, other]

    cs.IT eess.SP

    Robust Beamforming Design and Antenna Selection for Dynamic HRIS-aided Massive MIMO Systems

    Authors: Jintao Wang, Binggui Zhou, Chengzhi Ma, Shiqi Gong, Guanghua Yang, Shaodan Ma

    Abstract: In this paper, a dynamic hybrid active-passive reconfigurable intelligent surface (HRIS) is proposed to further enhance the massive multiple-input-multiple-output (MIMO) system, since it supports the dynamic placement of active and passive elements. Specifically, considering the impact of the hardware impairments (HWIs), we investigate the channel-aware configuration of the receive antennas at the…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 5 pages, 2 figures

  23. arXiv:2403.14943  [pdf, ps, other]

    cs.IT eess.SP

    Primary Rate Maximization in Movable Antennas Empowered Symbiotic Radio Communications

    Authors: Bin Lyu, Hao Liu, Wenqing Hong, Shimin Gong, Feng Tian

    Abstract: In this paper, we propose a movable antenna (MA) empowered scheme for symbiotic radio (SR) communication systems. Specifically, multiple antennas at the primary transmitter (PT) can be flexibly moved to favorable locations to boost the channel conditions of the primary and secondary transmissions. The primary transmission is achieved by the active transmission from the PT to the primary user (PU),…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: To appear in IEEE VTC-Spring 2024. 6 Pages,5 figures

  24. arXiv:2403.04326  [pdf, other]

    eess.SY cs.AI cs.LG

    Edge-based Parametric Digital Twins for Intelligent Building Indoor Climate Modeling

    Authors: Zhongjun Ni, Chi Zhang, Magnus Karlsson, Shaofang Gong

    Abstract: Digital transformation in the built environment generates vast data for developing data-driven models to optimize building operations. This study presents an integrated solution utilizing edge computing, digital twins, and deep learning to enhance the understanding of climate in buildings. Parametric digital twins, created using an ontology, ensure consistent data representation across diverse ser…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 8 pages, 8 figures, accepted in the 20th IEEE International Conference on Factory Communication Systems

    MSC Class: 68T07 ACM Class: I.5.4

  25. arXiv:2402.17463  [pdf, other]

    cs.CL

    Training-Free Long-Context Scaling of Large Language Models

    Authors: Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong

    Abstract: The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of more than 100k tokens without continual training. By…

    Submitted 29 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  26. arXiv:2402.15095  [pdf, ps, other]

    math.ST cs.DS cs.LG math.PR

    The Umeyama algorithm for matching correlated Gaussian geometric models in the low-dimensional regime

    Authors: Shuyang Gong, Zhangsong Li

    Abstract: Motivated by the problem of matching two correlated random geometric graphs, we study the problem of matching two Gaussian geometric models correlated through a latent node permutation. Specifically, given an unknown permutation $π^*$ on $\{1,\ldots,n\}$ and given $n$ i.i.d. pairs of correlated Gaussian vectors $\{X_{π^*(i)},Y_i\}$ in $\mathbb{R}^d$ with noise parameter $σ$, we consider two types…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 31 pages

    MSC Class: 68Q87 (Primary); 62M15 (Secondary)

  27. arXiv:2402.13577  [pdf, other]

    cs.CL

    BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

    Authors: Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong

    Abstract: Multimodal reasoning stands as a pivotal capability for large vision-language models (LVLMs). The integration with Domain-Specific Languages (DSL), offering precise visual representations, equips these models with the opportunity to execute more accurate reasoning in complex and professional domains. However, the vanilla Chain-of-Thought (CoT) prompting method faces challenges in effectively lever…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Preprint

  28. arXiv:2402.07754  [pdf, other]

    cs.CL cs.AI cs.LG

    Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

    Authors: Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong

    Abstract: Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language m…

    Submitted 15 July, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Multiple updates (add boolean logic dataset, add DoT based on SEDD model and add detailed mathematical formulation in Appendix)

  29. arXiv:2402.03358  [pdf, other]

    cs.SI cs.AI cs.DS cs.LG

    A Comprehensive Survey on Graph Reduction: Sparsification, Coarsening, and Condensation

    Authors: Mohammad Hashemi, Shengbo Gong, Juntong Ni, Wenqi Fan, B. Aditya Prakash, Wei Jin

    Abstract: Many real-world datasets can be naturally represented as graphs, spanning a wide range of domains. However, the increasing complexity and size of graph datasets present significant challenges for analysis and computation. In response, graph reduction, or graph summarization, has gained prominence for simplifying large graphs while preserving essential properties. In this survey, we aim to provide…

    Submitted 29 June, 2024; v1 submitted 28 January, 2024; originally announced February 2024.

    Comments: Accepted by IJCAI 2024 (This ArXiv version is a long version of our IJCAI paper)

  30. arXiv:2402.02950  [pdf, other]

    cs.CR eess.SP

    Semantic Entropy Can Simultaneously Benefit Transmission Efficiency and Channel Security of Wireless Semantic Communications

    Authors: Yankai Rong, Guoshun Nan, Minwei Zhang, Sihan Chen, Songtao Wang, Xuefei Zhang, Nan Ma, Shixun Gong, Zhaohui Yang, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek

    Abstract: Recently proliferated deep learning-based semantic communications (DLSC) focus on how transmitted symbols efficiently convey a desired meaning to the destination. However, the sensitivity of neural models and the openness of wireless channels cause the DLSC system to be extremely fragile to various malicious attacks. This inspires us to ask a question: "Can we further exploit the advantages of tra…

    Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 13 pages, 12 figures

  31. arXiv:2402.02673  [pdf, other]

    cs.GT

    A Unified Framework of Multi-Stage Multi-Winner Voting: An Axiomatic Exploration

    Authors: Shengjie Gong, Lingxiao Huang, Shuangping Huang, Yuyi Wang, Zhiqi Wang, Tao Xiao, Xiang Yan, Chunxue Yang

    Abstract: Multi-winner voting plays a crucial role in selecting representative committees based on voter preferences. Previous research has predominantly focused on single-stage voting rules, which are susceptible to manipulation during preference collection. In order to mitigate manipulation and increase the cost associated with it, we propose the introduction of multiple stages in the voting procedure, le…

    Submitted 4 February, 2024; originally announced February 2024.

  32. arXiv:2402.02430  [pdf, other]

    cs.CV cs.LG

    Exploiting Low-level Representations for Ultra-Fast Road Segmentation

    Authors: Huan Zhou, Feng Xue, Yucong Li, Shi Gong, Yiqun Li, Yu Zhou

    Abstract: Achieving real-time and accuracy on embedded platforms has always been the pursuit of road segmentation methods. To this end, they have proposed many lightweight networks. However, they ignore the fact that roads are "stuff" (background or environmental elements) rather than "things" (specific identifiable objects), which inspires us to explore the feasibility of representing roads with low-level…

    Submitted 6 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 11 pages, 7 figures, IEEE TITS

  33. arXiv:2401.13329  [pdf, other]

    cs.CV

    Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval

    Authors: Dezhao Luo, Shaogang Gong, Jiabo Huang, Hailin Jin, Yang Liu

    Abstract: Video Moment Retrieval (VMR) requires precise modelling of fine-grained moment-text associations to capture intricate visual-language relationships. Due to the lack of a diverse and generalisable VMR dataset to facilitate learning scalable moment-text associations, existing methods resort to joint training on both source and target domain videos for cross-domain applications. Meanwhile, recent dev…

    Submitted 29 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  34. arXiv:2401.11205  [pdf, other]

    cs.IT eess.SP

    Joint Beamforming Optimization and Mode Selection for RDARS-aided MIMO Systems

    Authors: Jintao Wang, Chengzhi Ma, Shiqi Gong, Xi Yang, Shaodan Ma

    Abstract: Considering the appealing distribution gains of distributed antenna systems (DAS) and passive gains of reconfigurable intelligent surface (RIS), a flexible reconfigurable architecture called reconfigurable distributed antenna and reflecting surface (RDARS) is proposed. RDARS encompasses DAS and RIS as two special cases and maintains the advantages of distributed antennas while reducing the hardwar…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: 13 pages, 9 figures. This paper has been submitted to IEEE journal for possible publication

  35. arXiv:2312.08924  [pdf, other]

    cs.CV

    Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking

    Authors: Shitong Sun, Fanghua Ye, Shaogang Gong

    Abstract: Composed image retrieval attempts to retrieve an image of interest from gallery images through a composed query of a reference image and its corresponding modified text. It has recently attracted attention due to the collaboration of information-rich images and concise language to precisely express the requirements of target images. Most current composed image retrieval methods follow a supervised…

    Submitted 24 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Under Review

  36. arXiv:2312.07374  [pdf, other]

    cs.CV

    Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects

    Authors: Jian Hu, Jiayi Lin, Weitong Cai, Shaogang Gong

    Abstract: Camouflaged object detection (COD) approaches heavily rely on pixel-level annotated datasets. Weakly-supervised COD (WSCOD) approaches use sparse annotations like scribbles or points to reduce annotation effort, but this can lead to decreased accuracy. The Segment Anything Model (SAM) shows remarkable segmentation ability with sparse prompts like points. However, manual prompt is not always feasib…

    Submitted 18 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  37. arXiv:2311.14837  [pdf, other]

    cs.CV cs.IR

    Benchmarking Robustness of Text-Image Composed Retrieval

    Authors: Shitong Sun, Jindong Gu, Shaogang Gong

    Abstract: Text-image composed retrieval aims to retrieve the target image through the composed query, which is specified in the form of an image plus some text that describes desired modifications to the input image. It has recently attracted attention due to its ability to leverage both information-rich images and concise language to precisely express the requirements for target images. However, the robust…

    Submitted 30 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted by R0-FoMo: Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models at NeurIPS 2023

  38. arXiv:2311.11574  [pdf, other]

    cs.IT

    A Framework on Complex Matrix Derivatives with Special Structure Constraints for Wireless Systems

    Authors: Xin Ju, Shiqi Gong, Nan Zhao, Chengwen Xing, Arumugam Nallanathan, Dusit Niyato

    Abstract: Matrix-variate optimization plays a central role in advanced wireless system designs. In this paper, we aim to explore optimal solutions of matrix variables under two special structure constraints using complex matrix derivatives, including diagonal structure constraints and constant modulus constraints, both of which are closely related to the state-of-the-art wireless applications. Specifically,…

    Submitted 20 November, 2023; originally announced November 2023.

  39. arXiv:2310.15913  [pdf, other]

    cs.CV

    Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID

    Authors: Qilei Li, Shaogang Gong

    Abstract: While deep learning has significantly improved ReID model accuracy under the independent and identical distribution (IID) assumption, it has also become clear that such models degrade notably when applied to an unseen novel domain due to unpredictable/unknown domain shift. Contemporary domain generalization (DG) ReID models struggle in learning domain-invariant representation solely through traini…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to WACV2024

  40. arXiv:2310.05793  [pdf, other]

    cs.LG cs.CL

    DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models

    Authors: Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

    Abstract: Diffusion models have gained prominence in generating high-quality sequences of text. Nevertheless, current approaches predominantly represent discrete text within a continuous diffusion space, which incurs substantial computational overhead during training and results in slower sampling speeds. In this paper, we introduce a soft absorbing state that facilitates the diffusion model in learning to…

    Submitted 16 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings Camera-ready

  41. arXiv:2310.00856  [pdf, other]

    cs.SI

    Multi-triplet Feature Augmentation for Ponzi Scheme Detection in Ethereum

    Authors: Chengxiang Jin, Jiajun Zhou, Shengbo Gong, Chenxuan Xie, Qi Xuan

    Abstract: Blockchain technology revolutionizes the Internet, but also poses increasing risks, particularly in cryptocurrency finance. On the Ethereum platform, Ponzi schemes, phishing scams, and a variety of other frauds emerge. Existing Ponzi scheme detection approaches based on heterogeneous transaction graph modeling leverages semantic information between node (account) pairs to establish connections, ov…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: Accepted by 2023 IEEE International Conference on Data Mining Workshops (ICDMW)

  42. arXiv:2309.08965  [pdf, other]

    cs.AI cs.LG cs.MA

    Multiagent Reinforcement Learning with an Attention Mechanism for Improving Energy Efficiency in LoRa Networks

    Authors: Xu Zhang, Ziqi Lin, Shimin Gong, Bo Gu, Dusit Niyato

    Abstract: Long Range (LoRa) wireless technology, characterized by low power consumption and a long communication range, is regarded as one of the enabling technologies for the Industrial Internet of Things (IIoT). However, as the network scale increases, the energy efficiency (EE) of LoRa networks decreases sharply due to severe packet collisions. To address this issue, it is essential to appropriately assi…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 6 pages, 3 figures, This paper has been accepted for publication in IEEE Global Communications Conference (GLOBECOM) 2023

  43. arXiv:2309.04478  [pdf]

    cond-mat.mtrl-sci cs.AI cs.LG

    Multimodal machine learning for materials science: composition-structure bimodal learning for experimentally measured properties

    Authors: Sheng Gong, Shuo Wang, Taishan Zhu, Yang Shao-Horn, Jeffrey C. Grossman

    Abstract: The widespread application of multimodal machine learning models like GPT-4 has revolutionized various research fields including computer vision and natural language processing. However, its implementation in materials informatics remains underexplored, despite the presence of materials data across diverse modalities, such as composition and structure. The effectiveness of machine learning models…

    Submitted 3 August, 2023; originally announced September 2023.

  44. arXiv:2309.00661  [pdf, other]

    cs.CV

    Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models

    Authors: Dezhao Luo, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu

    Abstract: Accurate video moment retrieval (VMR) requires universal visual-textual correlations that can handle unknown vocabulary and unseen scenes. However, the learned correlations are likely either biased when derived from a limited amount of moment-text data which is hard to scale up because of the prohibitive annotation cost (fully-supervised), or unreliable when only the video-text pairwise relationsh…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: Accepted by WACV 2024

  45. arXiv:2308.10427  [pdf, other]

    cs.LG cs.CR cs.DC

    Federated Learning Robust to Byzantine Attacks: Achieving Zero Optimality Gap

    Authors: Shiyuan Zuo, Rongfei Fan, Han Hu, Ning Zhang, Shimin Gong

    Abstract: In this paper, we propose a robust aggregation method for federated learning (FL) that can effectively tackle malicious Byzantine attacks. At each user, model parameter is firstly updated by multiple steps, which is adjustable over iterations, and then pushed to the aggregation center directly. This decreases the number of interactions between the aggregation center and users, allows each user to…

    Submitted 20 August, 2023; originally announced August 2023.

  46. arXiv:2308.02242  [pdf, ps, other]

    cs.NI

    Countering Eavesdroppers with Meta-learning-based Cooperative Ambient Backscatter Communications

    Authors: Nam H. Chu, Nguyen Van Huynh, Diep N. Nguyen, Dinh Thai Hoang, Shimin Gong, Tao Shu, Eryk Dutkiewicz, Khoa T. Phan

    Abstract: This article introduces a novel lightweight framework using ambient backscattering communications to counter eavesdroppers. In particular, our framework divides an original message into two parts: (i) the active-transmit message transmitted by the transmitter using conventional RF signals and (ii) the backscatter message transmitted by an ambient backscatter tag that backscatters upon the active s…

    Submitted 4 August, 2023; originally announced August 2023.

  47. arXiv:2307.12639  [pdf, other]

    cs.SI cs.CL cs.GR cs.LG

    Fake News Detection Through Graph-based Neural Networks: A Survey

    Authors: Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris

    Abstract: The popularity of online social networks has enabled rapid dissemination of information. People now can share and consume information much more rapidly than ever before. However, low-quality and/or accidentally/deliberately fake information can also spread rapidly. This can lead to considerable and negative impacts on society. Identifying, labelling and debunking online misinformation as early as…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 18 pages, 3 tables, 7 figures

  48. arXiv:2307.11088  [pdf, other]

    cs.CL

    L-Eval: Instituting Standardized Evaluation for Long Context Language Models

    Authors: Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu

    Abstract: Recently, there has been growing interest in extending the context length of large language models (LLMs), aiming to effectively process long inputs of one turn or conversations with more extensive histories. While proprietary models such as GPT-4 and Claude can largely preserve the reasoning ability in an extended context, open-source models are still progressing through the early stages of devel…

    Submitted 4 October, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

  49. arXiv:2307.08631  [pdf, ps, other]

    cs.IT eess.SP

    Dual-Functional MIMO Beamforming Optimization for RIS-Aided Integrated Sensing and Communication

    Authors: Xin Zhao, Heng Liu, Shiqi Gong, Xin Ju, Chengwen Xing, Nan Zhao

    Abstract: Aiming at providing wireless communication systems with environment-perceptive capacity, emerging integrated sensing and communication (ISAC) technologies face multiple difficulties, especially in balancing the performance trade-off between the communication and radar functions. In this paper, we introduce a reconfigurable intelligent surface (RIS) to assist both data transmission and target detec…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 30 pages, 8 figures, manuscript submitted to IEEE TCOM

  50. arXiv:2307.07726  [pdf, ps, other]

    stat.ML cs.LG

    Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection

    Authors: Shijin Gong, Xinyu Zhang

    Abstract: When artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have concurrently made significant strides. In this paper, we construct a novel theory for understanding the effectiveness of neural networks, which…

    Submitted 5 October, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: 32 pages, 6 figures