Search | arXiv e-print repository

Showing 1–50 of 640 results for author: Ma, H

Searching in archive cs.
  1. arXiv:2408.03029  [pdf, other]

    cs.LG cs.AI

    Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning

    Authors: Haozhe Ma, Zhengding Luo, Thanh Vinh Vo, Kuankuan Sima, Tze-Yun Leong

    Abstract: Reward shaping addresses the challenge of sparse rewards in reinforcement learning by constructing denser and more informative reward signals. To achieve self-adaptive and highly efficient reward shaping, we propose a novel method that incorporates success rates derived from historical experiences into shaped rewards. Our approach utilizes success rates sampled from Beta distributions, which dynam…

    Submitted 7 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  2. arXiv:2408.02976  [pdf, ps, other]

    cs.CL cs.AI

    Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation

    Authors: Hui Ma, Bo Zhang, Bo Xu, Jian Wang, Hongfei Lin, Xiao Sun

    Abstract: Empathetic response generation, aiming at understanding the user's situation and feelings and responding empathically, is crucial in building human-like dialogue systems. Previous methods mainly focus on using maximum likelihood estimation as the optimization objective for training response generation models, without taking into account the empathy level alignment between generated responses and targ…

    Submitted 6 August, 2024; originally announced August 2024.

  3. arXiv:2407.19696  [pdf, other]

    cs.CV

    Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images

    Authors: Zewen Du, Zhenjiang Hu, Guiyu Zhao, Ying Jin, Hongbin Ma

    Abstract: Object detection in aerial images has always been a challenging task due to the generally small size of the objects. Most current detectors prioritize novel detection frameworks, often overlooking research on fundamental components such as feature pyramid networks. In this paper, we introduce the Cross-Layer Feature Pyramid Transformer (CFPT), a novel upsampler-free feature pyramid network designe…

    Submitted 29 July, 2024; originally announced July 2024.

  4. arXiv:2407.19675  [pdf, other]

    cs.CV

    Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment

    Authors: Wulian Yun, Mengshi Qi, Fei Peng, Huadong Ma

    Abstract: Existing action quality assessment (AQA) methods often require a large number of label annotations for fully supervised learning, which are laborious and expensive. In practice, the labeled data are difficult to obtain because the AQA annotation process requires domain-specific expertise. In this paper, we propose a novel semi-supervised method, which can be utilized for better assessment of the A…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: To be published in ECCV2024

  5. arXiv:2407.16626  [pdf, other]

    cs.SE

    A Tale of Two DL Cities: When Library Tests Meet Compiler

    Authors: Qingchao Shen, Yongqiang Tian, Haoyang Ma, Junjie Chen, Lili Huang, Ruifeng Fu, Shing-Chi Cheung, Zan Wang

    Abstract: Deep Learning (DL) compilers typically load a DL model and optimize it with intermediate representation. Existing DL compiler testing techniques mainly focus on model optimization stages, but rarely explore bug detection at the model loading stage. Effectively testing the model loading stage requires covering diverse usages of each DL operator from various DL libraries, which shares a common object…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by ICSE'2025

  6. arXiv:2407.16477  [pdf, other]

    cs.CV

    qMRI Diffusor: Quantitative T1 Mapping of the Brain using a Denoising Diffusion Probabilistic Model

    Authors: Shishuai Wang, Hua Ma, Juan A. Hernandez-Tamames, Stefan Klein, Dirk H. J. Poot

    Abstract: Quantitative MRI (qMRI) offers significant advantages over weighted images by providing objective parameters related to tissue properties. Deep learning-based methods have demonstrated effectiveness in estimating quantitative maps from series of weighted images. In this study, we present qMRI Diffusor, a novel approach to qMRI utilising deep generative models. Specifically, we implemented denoisin…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by Deep Generative Models workshop at MICCAI 2024

  7. arXiv:2407.15286  [pdf, other]

    cs.CL

    Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis

    Authors: Guangliang Liu, Haitao Mao, Jiliang Tang, Kristen Marie Johnson

    Abstract: Large Language Models (LLMs) are capable of producing content that perpetuates stereotypes, discrimination, and toxicity. The recently proposed moral self-correction is a computationally efficient method for reducing harmful content in the responses of LLMs. However, the process of how injecting self-correction instructions can modify the behavior of LLMs remains under-explored. In this paper, we…

    Submitted 21 July, 2024; originally announced July 2024.

  8. arXiv:2407.14811  [pdf, other]

    cs.CV cs.AI

    Decoupled Prompt-Adapter Tuning for Continual Activity Recognition

    Authors: Di Fu, Thanh Vinh Vo, Haozhe Ma, Tze-Yun Leong

    Abstract: Action recognition technology plays a vital role in enhancing security through surveillance systems, enabling better patient monitoring in healthcare, providing in-depth performance analysis in sports, and facilitating seamless human-AI collaboration in domains such as manufacturing and assistive technologies. The dynamic nature of data in these areas underscores the need for models that can conti…

    Submitted 20 July, 2024; originally announced July 2024.

  9. arXiv:2407.14502  [pdf, other]

    cs.CV

    M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

    Authors: Seunggeun Chi, Hyung-gun Chi, Hengbo Ma, Nakul Agarwal, Faizan Siddiqui, Karthik Ramani, Kwonjoon Lee

    Abstract: We introduce the Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for human motion generation from textual descriptions of multiple actions, utilizing the strengths of discrete diffusion models. This approach adeptly addresses the challenge of generating multi-motion sequences, ensuring seamless transitions of motions and coherence across a series of actions. The strength of M2D2M…

    Submitted 19 July, 2024; originally announced July 2024.

  10. arXiv:2407.14062  [pdf, other]

    cs.CV

    Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation

    Authors: Zhe Zhao, Mengshi Qi, Huadong Ma

    Abstract: Generating realistic human grasps is a crucial yet challenging task for applications involving object manipulation in computer graphics and robotics. Existing methods often struggle with generating fine-grained realistic human grasps that ensure all fingers effectively interact with objects, as they focus on encoding hand with the whole representation and then estimating both hand posture and posi…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: To be published in The 18th European Conference on Computer Vision ECCV 2024

  11. arXiv:2407.12425  [pdf, other]

    cs.CL

    Navigating the Noisy Crowd: Finding Key Information for Claim Verification

    Authors: Haisong Gong, Huanhuan Ma, Qiang Liu, Shu Wu, Liang Wang

    Abstract: Claim verification is a task that involves assessing the truthfulness of a given claim based on multiple evidence pieces. Using large language models (LLMs) for claim verification is a promising way. However, simply feeding all the evidence pieces to an LLM and asking if the claim is factual does not yield good results. The challenge lies in the noisy nature of both the evidence and the claim: evi…

    Submitted 17 July, 2024; originally announced July 2024.

  12. arXiv:2407.10811  [pdf, other]

    cs.MA cs.AI cs.LG

    GuideLight: "Industrial Solution" Guidance for More Practical Traffic Signal Control Agents

    Authors: Haoyuan Jiang, Xuantang Xiong, Ziyue Li, Hangyu Mao, Guanghu Sui, Jingqing Ruan, Yuheng Cheng, Hua Wei, Wolfgang Ketter, Rui Zhao

    Abstract: Currently, traffic signal control (TSC) methods based on reinforcement learning (RL) have proven superior to traditional methods. However, most RL methods face difficulties when applied in the real world due to three factors: input, output, and the cycle-flow relation. The industry's observable input is much more limited than simulation-based RL methods. For real-world solutions, only flow can be…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under Review of IEEE Transactions on Intelligent Transportation Systems

  13. arXiv:2407.09451  [pdf, other]

    cs.RO

    Benchmarking Large Neighborhood Search for Multi-Agent Path Finding

    Authors: Jiaqi Tan, Yudong Luo, Jiaoyang Li, Hang Ma

    Abstract: Multi-Agent Path Finding (MAPF) aims to arrange collision-free goal-reaching paths for a group of agents. Anytime MAPF solvers based on large neighborhood search (LNS) have gained prominence recently due to their flexibility and scalability. Neighborhood selection strategy is crucial to the success of MAPF-LNS and a flurry of methods have been proposed. However, several pitfalls exist and hinder a…

    Submitted 12 July, 2024; originally announced July 2024.

  14. arXiv:2407.06297  [pdf, other]

    cs.CV

    SGOR: Outlier Removal by Leveraging Semantic and Geometric Information for Robust Point Cloud Registration

    Authors: Guiyu Zhao, Zhentao Guo, Hongbin Ma

    Abstract: In this paper, we introduce a new outlier removal method that fully leverages geometric and semantic information, to achieve robust registration. Current semantic-based registration methods only use semantics for point-to-point or instance semantic correspondence generation, which has two problems. First, these methods are highly dependent on the correctness of semantics. They perform poorly in sc…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by IROS 2024

  15. Towards Understanding the Bugs in Solidity Compiler

    Authors: Haoyang Ma, Wuqi Zhang, Qingchao Shen, Yongqiang Tian, Junjie Chen, Shing-Chi Cheung

    Abstract: Solidity compiler plays a key role in enabling the development of smart contract applications on Ethereum by governing the syntax of a domain-specific language called Solidity and performing compilation and optimization of Solidity code. The correctness of Solidity compiler is critical in fostering transparency, efficiency, and trust in industries reliant on smart contracts. However, like other so…

    Submitted 9 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Journal ref: ISSTA 2024

  16. arXiv:2407.05721  [pdf, other]

    cs.CL

    PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation

    Authors: Jinpeng Hu, Tengteng Dong, Luo Gang, Hui Ma, Peng Zou, Xiao Sun, Dan Guo, Meng Wang

    Abstract: Mental health has attracted substantial attention in recent years and LLM can be an effective technology for alleviating this problem owing to its capability in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets lacking crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In t…

    Submitted 7 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: work in progress

  17. arXiv:2407.02933  [pdf, other]

    cs.RO

    Online Time-Informed Kinodynamic Motion Planning of Nonlinear Systems

    Authors: Fei Meng, Jianbang Liu, Haojie Shi, Han Ma, Hongliang Ren, Max Q. -H. Meng

    Abstract: Sampling-based kinodynamic motion planners (SKMPs) are powerful in finding collision-free trajectories for high-dimensional systems under differential constraints. Time-informed set (TIS) can provide the heuristic search domain to accelerate their convergence to the time-optimal solution. However, existing TIS approximation methods suffer from the curse of dimensionality, computational burden, and…

    Submitted 3 July, 2024; originally announced July 2024.

  18. arXiv:2407.02616  [pdf]

    eess.IV cs.CV

    Deep Learning Based Apparent Diffusion Coefficient Map Generation from Multi-parametric MR Images for Patients with Diffuse Gliomas

    Authors: Zach Eidex, Mojtaba Safari, Jacob Wynne, Richard L. J. Qiu, Tonghe Wang, David Viar Hernandez, Hui-Kuo Shu, Hui Mao, Xiaofeng Yang

    Abstract: Purpose: Apparent diffusion coefficient (ADC) maps derived from diffusion weighted (DWI) MRI provide functional measurements about the water molecules in tissues. However, DWI is time consuming and very susceptible to image artifacts, leading to inaccurate ADC measurements. This study aims to develop a deep learning framework to synthesize ADC maps from multi-parametric MR images. Methods: We pro…

    Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.15044

  19. arXiv:2406.19113  [pdf, other]

    cs.AR cs.DC q-bio.GN

    MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storag…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in ISCA 2024. arXiv admin note: substantial text overlap with arXiv:2311.12527

  20. arXiv:2406.16896  [pdf, other]

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due…

    Submitted 15 May, 2024; originally announced June 2024.

  21. arXiv:2406.16005  [pdf, other]

    cs.DC

    A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

    Authors: Lei Chen, Shi Liu, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, Harry Xu

    Abstract: With rapid advances in network hardware, far memory has gained a great deal of traction due to its ability to break the memory capacity wall. Existing far memory systems fall into one of two data paths: one that uses the kernel's paging system to transparently access far memory at the page granularity, and a second that bypasses the kernel, fetching data at the object granularity. While it is gene…

    Submitted 23 June, 2024; originally announced June 2024.

  22. arXiv:2406.15658  [pdf, other]

    cs.CV cs.AI

    TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning

    Authors: Nemin Wu, Qian Cao, Zhangyu Wang, Zeping Liu, Yanlin Qi, Jielu Zhang, Joshua Ni, Xiaobai Yao, Hongxu Ma, Lan Mu, Stefano Ermon, Tanuja Ganu, Akshay Nambi, Ni Lao, Gengchen Mai

    Abstract: Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generati…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures. Submitted to NeurIPS 2024 Datasets and Benchmarks Track. Under review

  23. arXiv:2406.14842  [pdf, other]

    q-bio.GN cs.HC

    Online t-SNE for single-cell RNA-seq

    Authors: Hui Ma, Kai Chen

    Abstract: Due to the sequential sample arrival, changing experiment conditions, and evolution of knowledge, the demand to continually visualize evolving structures of sequential and diverse single-cell RNA-sequencing (scRNA-seq) data becomes indispensable. However, as one of the state-of-the-art visualization and analysis methods for scRNA-seq, t-distributed stochastic neighbor embedding (t-SNE) merely visu…

    Submitted 20 June, 2024; originally announced June 2024.

  24. arXiv:2406.14585  [pdf]

    physics.comp-ph cs.LG physics.data-an physics.optics

    Deep-learning-assisted reconfigurable metasurface antenna for real-time holographic beam steering

    Authors: Hyunjun Ma, Jin-soo Kim, Jong-Ho Choe, Q-Han Park

    Abstract: We propose a metasurface antenna capable of real time holographic beam steering. An array of reconfigurable dipoles can generate on demand far field patterns of radiation through the specific encoding of meta atomic states, i.e., the configuration of each dipole. Suitable states for the generation of the desired patterns can be identified using iteration, but this is very slow and needs to be don…

    Submitted 19 June, 2024; originally announced June 2024.

    Journal ref: Nanophotonics 12.13 (2023): 2415-2423

  25. arXiv:2406.13873  [pdf, other]

    cs.AI

    A Pure Transformer Pretraining Framework on Text-attributed Graphs

    Authors: Yu Song, Haitao Mao, Jiachen Xiao, Jingzhe Liu, Zhikai Chen, Wei Jin, Carl Yang, Jiliang Tang, Hui Liu

    Abstract: Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges such as feature heterogeneity and structural heterogeneity. Recently, increasing efforts have been made to enhance node feature quality with Large Lan…

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.13281  [pdf, other]

    cs.CV

    ECAFormer: Low-light Image Enhancement using Cross Attention

    Authors: Yudi Ruan, Hao Ma, Weikai Li, Xiao Wang

    Abstract: Low-light image enhancement (LLIE) is vital for autonomous driving. Despite the importance, existing LLIE methods often prioritize robustness in overall brightness adjustment, which can come at the expense of detail preservation. To overcome this limitation, we propose the Hierarchical Mutual Enhancement via Cross-Attention transformer (ECAFormer), a novel network that utilizes Dual Multi-head Self…

    Submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.12465  [pdf, other]

    cs.CY cs.AI cs.IR

    RIGL: A Unified Reciprocal Approach for Tracing the Independent and Group Learning Processes

    Authors: Xiaoshan Yu, Chuan Qin, Dazhong Shen, Shangshang Yang, Haiping Ma, Hengshu Zhu, Xingyi Zhang

    Abstract: In the realm of education, both independent learning and group learning are esteemed as the most classic paradigms. The former allows learners to self-direct their studies, while the latter is typically characterized by teacher-directed scenarios. Recent studies in the field of intelligent education have leveraged deep temporal models to trace the learning process, capturing the dynamics of studen…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024. 12 pages

  28. arXiv:2406.10727  [pdf, other]

    cs.LG

    Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights

    Authors: Zhikai Chen, Haitao Mao, Jingzhe Liu, Yu Song, Bingheng Li, Wei Jin, Bahare Fatemi, Anton Tsitsulin, Bryan Perozzi, Hui Liu, Jiliang Tang

    Abstract: Given the ubiquity of graph data and its applications in diverse domains, building a Graph Foundation Model (GFM) that can work well across different graphs and tasks with a unified backbone has recently garnered significant interest. A major obstacle to achieving this goal stems from the fact that graphs from different domains often exhibit diverse node features. Inspired by multi-modal models t…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Preliminary version: if you find any mistakes regarding the evaluation, feel free to contact the first author

  29. arXiv:2406.08961  [pdf, other]

    q-bio.BM cs.LG

    SIU: A Million-Scale Structural Small Molecule-Protein Interaction Dataset for Unbiased Bioactivity Prediction

    Authors: Yanwen Huang, Bowen Gao, Yinjun Jia, Hongbo Ma, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

    Abstract: Small molecules play a pivotal role in modern medicine, and scrutinizing their interactions with protein targets is essential for the discovery and development of novel, life-saving therapeutics. The term "bioactivity" encompasses various biological effects resulting from these interactions, including both binding and functional responses. The magnitude of bioactivity dictates the therapeutic or t…

    Submitted 13 June, 2024; originally announced June 2024.

  30. arXiv:2406.05086  [pdf, other]

    math.OC cs.AI cs.GT

    Robust Reward Design for Markov Decision Processes

    Authors: Shuo Wu, Haoxiang Ma, Jie Fu, Shuo Han

    Abstract: The problem of reward design examines the interaction between a leader and a follower, where the leader aims to shape the follower's behavior to maximize the leader's payoff by modifying the follower's reward function. Current approaches to reward design rely on an accurate model of how the follower responds to reward modifications, which can be sensitive to modeling inaccuracies. To address this…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 50 pages, 8 figures

  31. Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization

    Authors: Huanhuan Ma, Jinghao Zhang, Qiang Liu, Shu Wu, Liang Wang

    Abstract: The rapid spread of information through mobile devices and media has led to the wide spread of false or deceptive news, causing significant concerns in society. Among different types of misinformation, image repurposing, also known as out-of-context misinformation, remains highly prevalent and effective. However, current approaches for detecting out-of-context misinformation often lack interpretabi…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ICASSP 2024 lecture paper

  32. arXiv:2406.04689  [pdf, other]

    cs.CV

    CDeFuse: Continuous Decomposition for Infrared and Visible Image Fusion

    Authors: Haolong Ma, Hui Li, Chunyang Cheng, Xiaoning Song, Zhongwei Shen

    Abstract: As a common image processing technique, image decomposition is often used to extract complementary information between modalities. In current decomposition-based image fusion methods, typically, source images are decomposed into three parts at single scale (i.e., visible-exclusive part, infrared-exclusive part, and common part) and lacking interaction between modalities during the decomposition pr…

    Submitted 7 June, 2024; originally announced June 2024.

  33. arXiv:2406.03999  [pdf, other]

    cs.LG cs.CV

    Unveiling the Dynamics of Information Interplay in Supervised Learning

    Authors: Kun Song, Zhiquan Tan, Bochao Zou, Huimin Ma, Weiran Huang

    Abstract: In this paper, we use matrix information theory as an analytical tool to analyze the dynamics of the information interplay between data representations and classification head vectors in the supervised learning process. Specifically, inspired by the theory of Neural Collapse, we introduce matrix mutual information ratio (MIR) and matrix entropy difference ratio (HDR) to assess the interactions of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  34. arXiv:2406.03917  [pdf, other]

    cs.CV

    Frequency-based Matcher for Long-tailed Semantic Segmentation

    Authors: Shan Li, Lu Yang, Pu Cao, Liulei Li, Huadong Ma

    Abstract: The successful application of semantic segmentation technology in the real world has been among the most exciting achievements in the computer vision community over the past decade. Although the long-tailed phenomenon has been investigated in many fields, e.g., classification and object detection, it has not received enough attention in semantic segmentation and has become a non-negligible obstacl…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted for publication as a Regular paper in the IEEE Transactions on Multimedia

  35. arXiv:2406.02803  [pdf, other]

    cs.DC

    DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

    Authors: Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu

    Abstract: Despite being a powerful concept, distributed shared memory (DSM) has not been made practical due to the extensive synchronization needed between servers to implement memory coherence. This paper shows a practical DSM implementation based on the insight that the ownership model embedded in programming languages such as Rust automatically constrains the order of read and write, providing opportunit…

    Submitted 27 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  36. arXiv:2406.02378  [pdf, other]

    cs.CL

    On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

    Authors: Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, Kristen Johnson, Jiliang Tang, Rongrong Wang

    Abstract: Large Language Models (LLMs) can improve their responses when instructed to do so, a capability known as self-correction. When these instructions lack specific details about the issues in the response, this is referred to as leveraging the intrinsic self-correction capability. The empirical success of self-correction can be found in various applications, e.g., text detoxification and social bias m…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 22 pages, 7 figures

  37. arXiv:2406.01908  [pdf, other]

    cs.LG math.OC

    PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

    Authors: Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun

    Abstract: Solving large-scale linear programming (LP) problems is an important task in various areas such as communication networks, power systems, finance and logistics. Recently, two distinct approaches have emerged to expedite LP solving: (i) First-order methods (FOMs); (ii) Learning to optimize (L2O). In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, and propose a two-stage L…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  38. arXiv:2406.01899  [pdf, other]

    cs.LG

    Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models

    Authors: Wenzhuo Tang, Haitao Mao, Danial Dervovic, Ivan Brugere, Saumitra Mishra, Yuying Xie, Jiliang Tang

    Abstract: Models for natural language and images benefit from data scaling behavior: the more data fed into the model, the better they perform. This 'better with more' phenomenon enables the effectiveness of large-scale pre-training on vast amounts of data. However, current graph pre-training methods struggle to scale up data due to heterogeneity across graphs. To achieve effective data scaling, we aim to d…

    Submitted 3 June, 2024; originally announced June 2024.

  39. arXiv:2405.20881  [pdf, other]

    cs.CV

    S4Fusion: Saliency-aware Selective State Space Model for Infrared Visible Image Fusion

    Authors: Haolong Ma, Hui Li, Chunyang Cheng, Gaoang Wang, Xiaoning Song, Xiaojun Wu

    Abstract: As one of the tasks in Image Fusion, Infrared and Visible Image Fusion aims to integrate complementary information captured by sensors of different modalities into a single image. The Selective State Space Model (SSSM), known for its ability to capture long-range dependencies, has demonstrated its potential in the field of computer vision. However, in image fusion, current methods underestimate th…

    Submitted 3 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  40. arXiv:2405.18731  [pdf, other]

    eess.SP cs.AI physics.comp-ph

    VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

    Authors: Ziqing Xing, Zhaoyang Zhang, Zirui Chen, Yusong Wang, Haoran Ma, Zhun Wei, Gang Bao

    Abstract: Recently, studies have shown the potential of integrating field-type iterative methods with deep learning (DL) techniques in solving inverse scattering problems (ISPs). In this article, we propose a novel Variational Born Iterative Network, namely, VBIM-Net, to solve the full-wave ISPs with significantly improved flexibility and inversion quality. The proposed VBIM-Net emulates the alternating upd…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages, 21 figures

  41. arXiv:2405.17152  [pdf, other]

    cs.MA cs.AI

    CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control

    Authors: Jingqing Ruan, Ziyue Li, Hua Wei, Haoyuan Jiang, Jiaming Lu, Xuantang Xiong, Hangyu Mao, Rui Zhao

    Abstract: Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion. Existing work mainly chooses neighboring intersections as collaborators. However, quite an amount of congestion, even some wide-range congestion, is caused by non-neighbors failing to collaborate. To address these issues, we propose to separate the collaborator sel…

    Submitted 19 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  42. arXiv:2405.16363  [pdf, other]

    cs.IR cs.AI

    LLMs for User Interest Exploration in Large-scale Recommendation Systems

    Authors: Jianling Wang, Haokai Lu, Yifan Liu, He Ma, Yueqi Wang, Yang Gu, Shuzhou Zhang, Ningren Han, Shuchao Bi, Lexi Baugher, Ed Chi, Minmin Chen

    Abstract: Traditional recommendation systems are subject to a strong feedback loop by learning from and reinforcing past user-item interactions, which in turn limits the discovery of novel user interests. To address this, we introduce a hybrid hierarchical framework combining Large Language Models (LLMs) and classic recommendation models for user interest exploration. The framework controls the interfacing…

    Submitted 7 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  43. arXiv:2405.15199  [pdf, other]

    cs.CV

    ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

    Authors: Jingyuan Zhu, Shiyu Li, Yuxuan Liu, Ping Huang, Jiulong Shan, Huimin Ma, Jian Yuan

    Abstract: Modern diffusion-based image generative models have made significant progress and become promising to enrich training data for the object detection task. However, the generation quality and the controllability for complex scenes containing multi-class objects and dense objects with occlusions remain limited. This paper presents ODGEN, a novel method to generate high-quality images conditioned on b…

    Submitted 24 May, 2024; originally announced May 2024.

  44. arXiv:2405.15124  [pdf, other]

    cs.LG cs.AI

    Scaling Law for Time Series Forecasting

    Authors: Jingzhe Shi, Qinwei Ma, Huan Ma, Lei Li

    Abstract: Scaling law that rewards large datasets, complex models and enhanced data granularity has been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on scaling behaviors of deep learning methods for time series forecasting: while more training data improves performance, more capable models do not always outperform less capable models, and longer input…

    Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 20 pages

  45. arXiv:2405.13049  [pdf, other]

    cs.CL cs.AI cs.MM

    SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations

    Authors: Fanfan Wang, Heqing Ma, Jianfei Yu, Rui Xia, Erik Cambria

    Abstract: The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual's emotional state in conversations is of great importance in many application scenarios. We…

    Submitted 8 July, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted to the 18th International Workshop on Semantic Evaluation (SemEval-2024). 12 pages, 3 figures, 4 Tables

    Journal ref: https://aclanthology.org/2024.semeval-1.277/

  46. arXiv:2405.12850  [pdf, other]

    cs.CV

    Weakly supervised alignment and registration of MR-CT for cervical cancer radiotherapy

    Authors: Jjahao Zhang, Yin Gu, Deyu Sun, Yuhua Gao, Ming Gao, Ming Cui, Teng Zhang, He Ma

    Abstract: Cervical cancer is one of the leading causes of death in women, and brachytherapy is currently the primary treatment method. However, it is important to precisely define the extent of paracervical tissue invasion to improve cancer diagnosis and treatment options. The fusion of the information characteristics of both computed tomography (CT) and magnetic resonance imaging (MRI) modalities may be use…

    Submitted 21 May, 2024; originally announced May 2024.

  47. arXiv:2405.11715  [pdf, other]

    cs.AI cs.LG

    Semantic Trajectory Data Mining with LLM-Informed POI Classification

    Authors: Yifan Liu, Chenchen Kuai, Haoxuan Ma, Xishun Liao, Brian Yueshuai He, Jiaqi Ma

    Abstract: Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly en…

    Submitted 19 May, 2024; originally announced May 2024.

  48. arXiv:2405.10121  [pdf, other]

    cs.CL cs.MM

    Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation

    Authors: Bo Zhang, Hui Ma, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin

    Abstract: Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Under Review

  49. arXiv:2405.09593  [pdf, other]

    cs.DB cs.AI

    SQL-to-Schema Enhances Schema Linking in Text-to-SQL

    Authors: Sun Yang, Qiong Su, Zhishuai Li, Ziyue Li, Hangyu Mao, Chenxi Liu, Rui Zhao

    Abstract: Even sophisticated existing Text-to-SQL methods exhibit errors in various proportions, including schema-linking errors (incorrect columns, tables, or extra columns), join errors, nested errors, and group-by errors. Consequently, there is a critical need to filter out unnecessary tables and columns, directing the language model's attention to relevant tables and columns with schema-linking, to reduce…

    Submitted 15 May, 2024; originally announced May 2024.

  50. arXiv:2405.09148  [pdf, ps, other]

    cs.CV

    A Hierarchically Feature Reconstructed Autoencoder for Unsupervised Anomaly Detection

    Authors: Honghui Chen, Pingping Chen, Huan Mao, Mengxi Jiang

    Abstract: Anomaly detection and localization without any manual annotations or prior knowledge is a challenging task under the setting of unsupervised learning. Existing works achieve excellent performance in anomaly detection, but with complex networks or cumbersome pipelines. To address this issue, this paper explores a simple but effective architecture for anomaly detection. It consists of a…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures

    MSC Class: 68T01; ACM Class: I.2.10