Search | arXiv e-print repository

Showing 1–50 of 1,329 results for author: Liu, Y

Searching in archive eess.
  1. arXiv:2408.06027  [pdf, other]

    eess.SP cs.LG

    A Comprehensive Survey on EEG-Based Emotion Recognition: A Graph-Based Perspective

    Authors: Chenyu Liu, Xinliang Zhou, Yihao Wu, Yi Ding, Liming Zhai, Kun Wang, Ziyu Jia, Yang Liu

    Abstract: Compared to other modalities, electroencephalogram (EEG) based emotion recognition can intuitively respond to emotional patterns in the human brain and, therefore, has become one of the most focused tasks in affective computing. The nature of emotions is a physiological and psychological state change in response to brain region connectivity, making emotion recognition focus more on the dependency…

    Submitted 12 August, 2024; originally announced August 2024.

  2. arXiv:2408.05117  [pdf, other]

    eess.IV cs.AI cs.CV

    Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images

    Authors: Shouyue Liu, Jinkui Hao, Yonghuai Liu, Huazhu Fu, Xinyu Guo, Shuting Zhang, Yitian Zhao

    Abstract: Early detection of dementia, such as Alzheimer's disease (AD) or mild cognitive impairment (MCI), is essential to enable timely intervention and potential treatment. Accurate detection of AD/MCI is challenging due to the high complexity, cost, and often invasive nature of current diagnostic techniques, which limit their suitability for large-scale population screening. Given the shared embryologic…

    Submitted 9 August, 2024; originally announced August 2024.

  3. arXiv:2408.04324  [pdf, ps, other]

    eess.SP

    Secure Transmission for Movable Antennas Empowered Cell-Free Symbiotic Radio Communications

    Authors: Jiayu Guan, Bin Lyu, Yan Liu, Feng Tian

    Abstract: In this paper, a novel movable antenna (MA) empowered secure transmission scheme is designed for cell-free symbiotic radio (SR) systems in the presence of an eavesdropper (Eve). Specifically, multiple distributed access points (APs) equipped with MAs collaboratively transmit confidential information to the primary user (PU), while the backscatter device (BD) transmits its own informatio…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 7 pages, 6 figures

  4. arXiv:2408.02027  [pdf, other]

    eess.SP

    Near-Field Sensing Enabled Predictive Beamforming: From Estimation to Tracking

    Authors: Hao Jiang, Zhaolin Wang, Yuanwei Liu

    Abstract: A near-field sensing (NISE) enabled predictive beamforming framework is proposed to facilitate wireless communications with high-mobility channels. Unlike conventional far-field sensing, which only captures the angle and the radial velocity of the user, NISE enables the estimation of the full motion state, including additional distance and transverse velocity information. Two full-motion state sen…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  5. Multibeam Hybrid Transmitarray Based on Polarization Rotating Metasurface With Reconfigurable Bidirectional Radiation

    Authors: Fan Qin, Yifei Liu, Chao Gu, Linfeng Zeng, Wenchi Cheng, Hailin Zhang, Steven Gao

    Abstract: This paper proposes a bidirectional multibeam hybrid transmitarray (HTA) employing a transmission polarization-rotating metasurface (TPRM). A novel configuration is introduced to facilitate bidirectional beam scanning by combining the transmitarray (TA) and folded-transmitarray (FTA). To accomplish the reconfiguration of both unidirectional and bidirectional radiation states in the +z, -z, and +/-…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 12 pages, 26 figures, published in TAP

  6. arXiv:2408.00429  [pdf, other]

    eess.SP cs.AI

    Augmenting Channel Simulator and Semi-Supervised Learning for Efficient Indoor Positioning

    Authors: Yupeng Li, Xinyu Ning, Shijian Gao, Yitong Liu, Zhi Sun, Qixing Wang, Jiangzhou Wang

    Abstract: This work aims to tackle the labor-intensive and resource-consuming task of indoor positioning by proposing an efficient approach. The proposed approach involves the introduction of a semi-supervised learning (SSL) with a biased teacher (SSLB) algorithm, which effectively utilizes both labeled and unlabeled channel data. To reduce measurement expenses, unlabeled data is generated using an updated…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ACCEPTED for presentation at 2024 IEEE Global Communications Conference

  7. Joint Vehicle Connection and Beamforming Optimization in Digital Twin Assisted Integrated Sensing and Communication Vehicular Networks

    Authors: Weihang Ding, Zhaohui Yang, Mingzhe Chen, Yuchen Liu, Mohammad Shikh-Bahaei

    Abstract: This paper introduces an approach to harness digital twin (DT) technology in the realm of integrated sensing and communications (ISAC) in the sixth-generation (6G) Internet-of-everything (IoE) applications. We consider moving targets in a vehicular network and use DT to track and predict the motion of the vehicles. After predicting the location of the vehicle at the next time slot, the DT designs…

    Submitted 31 July, 2024; originally announced August 2024.

    Journal ref: IEEE Internet of Things Journal (2024)

  8. An Efficient Convex-Hull Relaxation Based Algorithm for Multi-User Discrete Passive Beamforming

    Authors: Wenhai Lai, Zheyu Wu, Yi Feng, Kaiming Shen, Ya-Feng Liu

    Abstract: Intelligent reflecting surface (IRS) is an emerging technology to enhance spatial multiplexing in wireless networks. This letter considers the discrete passive beamforming design for IRS in order to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among multiple users in an IRS-assisted downlink network. The main design difficulty lies in the discrete phase-shift constraint. Dif…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 5 pages

    Journal ref: IEEE Signal Processing Letters 2024

  9. arXiv:2407.19511  [pdf, ps, other]

    cs.IT eess.SP

    Suppressing Beam Squint Effect For Near-Field Wideband Communication Through Movable Antennas

    Authors: Yanze Zhu, Qingqing Wu, Yang Liu, Qingjiang Shi, Wen Chen

    Abstract: In this correspondence, we study deploying movable antenna (MA) array in a wideband multiple-input-single-output (MISO) communication system, where near-field (NF) channel model is considered. To alleviate beam squint effect, we propose to maximize the minimum analog beamforming gain across the entire wideband spectrum by appropriately adjusting MAs' positions, which is a highly challenging task.…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures, submitted to IEEE journal

  10. arXiv:2407.18516  [pdf]

    eess.AS eess.SY

    Integrating Posture Control in Speech Motor Models: A Parallel-Structured Simulation Approach

    Authors: Yadong Liu, Sidney Fels, Arian Shamei, Najeeb Khan, Bryan Gick

    Abstract: Posture is an essential aspect of motor behavior, necessitating continuous muscle activation to counteract gravity. It remains stable under perturbation, aiding in maintaining bodily balance and enabling movement execution. Similarities have been observed between gross body postures and speech postures, such as those involving the jaw, tongue, and lips, which also exhibit resilience to perturbatio…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  11. arXiv:2407.17172  [pdf, other]

    cs.SD cs.CL eess.AS

    Speech Editing -- a Summary

    Authors: Tobias Kässmann, Yining Liu, Danni Liu

    Abstract: With the rise of video production and social media, speech editing has become crucial for creators to address issues like mispronunciations, missing words, or stuttering in audio recordings. This paper explores text-based speech editing methods that modify audio via text transcripts without manual waveform editing. These approaches ensure edited audio is indistinguishable from the original by alte…

    Submitted 24 July, 2024; originally announced July 2024.

  12. arXiv:2407.14564  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.LG

    APS-USCT: Ultrasound Computed Tomography on Sparse Data via AI-Physic Synergy

    Authors: Yi Sheng, Hanchen Wang, Yipei Liu, Junhuan Yang, Weiwen Jiang, Youzuo Lin, Lei Yang

    Abstract: Ultrasound computed tomography (USCT) is a promising technique that achieves superior medical imaging reconstruction resolution by fully leveraging waveform information, outperforming conventional ultrasound methods. Despite its advantages, high-quality USCT reconstruction relies on extensive data acquisition by a large number of transducers, leading to increased costs, computational demands, exte…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: MICCAI

  13. arXiv:2407.13292  [pdf, other]

    cs.SD cs.CL eess.AS

    Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training

    Authors: Lukuan Dong, Donghong Qin, Fengbo Bai, Fanhua Song, Yan Liu, Chen Xu, Zhijian Ou

    Abstract: The mainstream automatic speech recognition (ASR) technology usually requires hundreds to thousands of hours of annotated speech data. Three approaches to low-resourced ASR are phoneme or subword based supervised pre-training, and self-supervised pre-training over multilingual data. The Iu Mien language is the main ethnic language of the Yao ethnic group in China and is low-resourced in the sense…

    Submitted 18 July, 2024; originally announced July 2024.

  14. arXiv:2407.13229  [pdf, other]

    cs.RO eess.SY

    Disturbance Observer for Estimating Coupled Disturbances

    Authors: Jindou Jia, Yuhang Liu, Kexin Guo, Xiang Yu, Lihua Xie, Lei Guo

    Abstract: High-precision control for nonlinear systems is impeded by the low-fidelity dynamical model and external disturbance. Especially, the intricate coupling between internal uncertainty and external disturbance is usually difficult to model explicitly. Here we show an effective and convergent algorithm enabling accurate estimation of the coupled disturbance via combining control and learning phil…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures

  15. arXiv:2407.13220  [pdf, other]

    eess.AS cs.SD

    MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

    Authors: Huadai Liu, Jialei Wang, Rongjie Huang, Yang Liu, Jiayang Xu, Zhou Zhao

    Abstract: Text-guided diffusion models catalyze a paradigm shift in audio generation, facilitating the adaptability of source audio to conform to specific textual prompts. Recent advancements introduce inversion techniques, like DDIM inversion, to zero-shot editing, exploiting pre-trained diffusion models for audio modification. Nonetheless, our investigation exposes that DDIM inversion suffers from an accu…

    Submitted 18 July, 2024; originally announced July 2024.

  16. arXiv:2407.12038  [pdf, ps, other]

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Shuchen Shi, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept…

    Submitted 31 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description and results

  17. arXiv:2407.11079  [pdf, ps, other]

    eess.SP cs.IT

    One-Bit MIMO Detection: From Global Maximum-Likelihood Detector to Amplitude Retrieval Approach

    Authors: Mingjie Shao, Wei-Kun Chen, Cheng-Yang Yu, Ya-Feng Liu, Wing-Kin Ma

    Abstract: As communication systems advance towards the future 6G era, the incorporation of large-scale antenna arrays in base stations (BSs) presents challenges such as increased hardware costs and energy consumption. To address these issues, the use of one-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs) has gained significant attention. This paper focuses on one-bit multiple-in…

    Submitted 16 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  18. arXiv:2407.10628  [pdf]

    cond-mat.mtrl-sci eess.IV

    Automated high-resolution backscattered-electron imaging at macroscopic scale

    Authors: Zhiyuan Lang, Zunshuai Zhang, Lei Wang, Yuhan Liu, Weixiong Qian, Shenghua Zhou, Ying Jiang, Tongyi Zhang, Jiong Yang

    Abstract: Scanning electron microscopy (SEM) has been widely utilized in the field of materials science due to its significant advantages, such as large depth of field, wide field of view, and excellent stereoscopic imaging. However, at high magnification, the limited imaging range in SEM cannot cover all the possible inhomogeneous microstructures. In this research, we propose a novel approach for generatin…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 22 pages, 12 figures

  19. arXiv:2407.09782  [pdf]

    eess.SY

    Gravity Balanced Arm Exoskeleton for Basketball Shooting Training

    Authors: Yunfei Liu, Zhanghao Yang

    Abstract: This paper proposes a gravity balanced arm exoskeleton design for basketball shooting training. The potential energy equation of the mechanism is derived. A simulation of the arm going through the basketball shooting motion is done on the mechanism. Throughout the motion the total potential energy is constant. Thus, the proposed arm exoskeleton is indeed gravity balanced with the use of two spring…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 3 pages, 5 figures, 1 table

  20. arXiv:2407.09780  [pdf]

    eess.SY

    Human Leg Training Machine Based on The Multi-linkage System

    Authors: Yunfei Liu, Zhanghao Yang

    Abstract: In real life, many people have leg defects. The goal of our work is to design a mechanism which could help them walk based on a specific trajectory and realize flexible walking finally. In this paper, we use a motor to drive a multi-link leg mechanism. The major issues addressed in this paper are as follows: (i) design human leg training mechanism based on the multi-link mechanism (ii) Simulate le…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 4 pages, 7 figures, 1 table

  21. arXiv:2407.09727  [pdf]

    eess.SY

    Temperature Secret in Bathtub: A Model of Temperature Distribution of Bathtub Based on Heat Conduction Equation

    Authors: Yunfei Liu

    Abstract: We use the multidimensional heat conduction and heat transfer equations to model the temperature distribution of water in a bathtub by solving partial differential equations. We address optimal water addition and bathtub design. First, we establish a water surface cooling model using Newton's law of cooling to simulate heat exchange between air and water. Without new heat sources, the water temper…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 21 pages, 8 figures, 3 tables

  22. arXiv:2407.09084  [pdf, other]

    eess.SY

    Perceived Time To Collision as Public Space Users' Discomfort Metric

    Authors: Alireza Jafari, Yen-Chen Liu

    Abstract: Micro-mobility transport vehicles such as e-scooters are joining current sidewalk users and affect the safety and comfort of pedestrians as primary sidewalk users. The lack of agreed-upon metrics to quantify people's discomfort hinders shared public space safety research. We introduce perceived Time To Collision (TTC) as a potential metric of user discomfort performing controlled experiments using…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 6 pages, 7 figures, 1 table, IFAC 2023

  23. arXiv:2407.09078  [pdf, other]

    eess.SY

    Dynamic Modeling and Stability Analysis of Balancing in Riderless Electric Scooters

    Authors: Yun-Hao Lin, Alireza Jafari, Yen-Chen Liu

    Abstract: Today, the electric scooter is a trendy personal mobility vehicle. The rising demand and opportunities attract ride-share services. A common problem of such services is abandoned e-scooters. An autonomous e-scooter capable of moving to the charging station is a solution. This paper focuses on maintaining balance for these riderless e-scooters. The paper presents a nonlinear model for an e-scooter movi…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 7 pages, 4 figures, 1 table, In ACC2024

  24. arXiv:2407.09066  [pdf]

    physics.optics eess.SP

    Physical encryption and decryption for secure data transmission in optical networks leveraging the temporal Talbot effect and microwave photonics

    Authors: Chulun Lin, Taixia Shi, Yiqing Liu, Yang Chen

    Abstract: A novel microwave photonic scheme for secure data transmission in optical networks is proposed. The security of the scheme is guaranteed by physical encryption and decryption via the temporal Talbot effect in dispersive media. First, the original data is randomized in the digital domain by performing an exclusive OR operation using a random matrix. Subsequently, a time-varying multi-tone electri…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 19 pages, 15 figures, 1 table

  25. arXiv:2407.09022  [pdf]

    eess.SP

    Improvement of Sensitivity of Capacitive Micromachined Ultrasound Transducer

    Authors: Yifan Wang, Yunfei Liu

    Abstract: Capacitive Micromachined Ultrasonic Transducer (CMUT) has a wide range of applications in medical detecting and imaging fields. However, operating under self-generating-self-receiving (SGSR) method usually results in poor sensitivity. But the sensitivity cannot be improved simply by increasing the resonant frequency since the frequency of a specific kind of CMUT is designed for specific usage. In…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 6 pages, 18 figures

  26. arXiv:2407.08914  [pdf, other]

    cs.NI eess.SP

    Multi-objective Aerial Collaborative Secure Communication Optimization via Generative Diffusion Model-enabled Deep Reinforcement Learning

    Authors: Chuang Zhang, Geng Sun, Jiahui Li, Qingqing Wu, Jiacheng Wang, Dusit Niyato, Yuanwei Liu

    Abstract: Due to their flexibility and low cost, unmanned aerial vehicles (UAVs) are increasingly crucial for enhancing coverage and functionality of wireless networks. However, incorporating UAVs into next-generation wireless communication systems poses significant challenges, particularly in sustaining high-rate and long-range secure communications against eavesdropping attacks. In this work, we consider a UAV…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Transactions on Mobile Computing

  27. arXiv:2407.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which was originally designed for audio compression and sacrifices fidelity compared to mel-spectrograms. Specifically, (i) instead of cross…

    Submitted 11 July, 2024; originally announced July 2024.

  28. arXiv:2407.07728  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis

    Authors: Zihao Wang, Le Ma, Yan Liu, Kejun Zhang

    Abstract: Singing voice conversion (SVC) aims to convert a singer's voice in a given music piece to another singer while keeping the original content. We propose an end-to-end feature disentanglement-based model, which we named SaMoye, to enable zero-shot many-to-many singing voice conversion. SaMoye disentangles the features of the singing voice into content features, timbre features, and pitch features re…

    Submitted 10 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 7 pages, 4 figures

    MSC Class: 68Txx (Primary) 14F05; 91Fxx (Secondary)

    ACM Class: I.2.7; J.5

  29. arXiv:2407.07245  [pdf, other]

    eess.SY cs.NI eess.SP

    Accelerating Mobile Edge Generation (MEG) by Constrained Learning

    Authors: Xiaoxia Xu, Yuanwei Liu, Xidong Mu, Hong Xing, Arumugam Nallanathan

    Abstract: A novel accelerated mobile edge generation (MEG) framework is proposed for generating high-resolution images on mobile devices. Exploiting a large-scale latent diffusion model (LDM) distributed across edge server (ES) and user equipment (UE), cost-efficient artificial intelligence generated content (AIGC) is achieved by transmitting low-dimensional features between ES and UE. To reduce overheads o…

    Submitted 6 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: 30 pages, 7 figures

  30. arXiv:2407.06901  [pdf, other]

    cs.HC cs.SD eess.AS

    RespEar: Earable-Based Robust Respiratory Rate Monitoring

    Authors: Yang Liu, Kayla-Jade Butkow, Jake Stuchbury-Wass, Adam Pullin, Dong Ma, Cecilia Mascolo

    Abstract: Respiratory rate (RR) monitoring is integral to understanding physical and mental health and tracking fitness. Existing studies have demonstrated the feasibility of RR monitoring under specific user conditions (e.g., while remaining still, or while breathing heavily). Yet, performing accurate, continuous and non-obtrusive RR monitoring across diverse daily routines and activities remains challengi…

    Submitted 9 July, 2024; originally announced July 2024.

  31. arXiv:2407.06614  [pdf, other]

    eess.IV cs.CV

    Implicit Regression in Subspace for High-Sensitivity CEST Imaging

    Authors: Chu Chen, Yang Liu, Se Weon Park, Jizhou Li, Kannie W. Y. Chan, Raymond H. F. Chan

    Abstract: Chemical Exchange Saturation Transfer (CEST) MRI demonstrates its capability in significantly enhancing the detection of proteins and metabolites with low concentrations through exchangeable protons. The clinical application of CEST, however, is constrained by its low contrast and low signal-to-noise ratio (SNR) in the acquired data. Denoising, as one of the post-processing stages for CEST data, c…

    Submitted 9 July, 2024; originally announced July 2024.

  32. arXiv:2407.05758  [pdf, other]

    eess.IV cs.AI cs.CV

    Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

    Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

    Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti…

    Submitted 8 July, 2024; originally announced July 2024.

  33. arXiv:2407.05619  [pdf, other]

    cs.RO eess.SY

    AIRA: A Low-cost IR-based Approach Towards Autonomous Precision Drone Landing and NLOS Indoor Navigation

    Authors: Yanchen Liu, Minghui Zhao, Kaiyuan Hou, Junxi Xia, Charlie Carver, Stephen Xia, Xia Zhou, Xiaofan Jiang

    Abstract: Automatic drone landing is an important step for achieving fully autonomous drones. Although there are many works that leverage GPS, video, wireless signals, and active acoustic sensing to perform precise landing, autonomous drone landing remains an unsolved challenge for palm-sized microdrones that may not be able to support the high computational requirements of vision, wireless, or active audio…

    Submitted 8 July, 2024; originally announced July 2024.

  34. arXiv:2407.05617  [pdf, other]

    eess.IV

    LINEAR: Learning Implicit Neural Representation With Explicit Physical Priors for Accelerated Quantitative T1rho Mapping

    Authors: Yuanyuan Liu, Jinwen Xie, Zhuo-Xu Cui, Qingyong Zhu, Jing Cheng, Dong Liang, Yanjie Zhu

    Abstract: Quantitative T1rho mapping has shown promise in clinical and research studies. However, it suffers from long scan times. Deep learning-based techniques have been successfully applied in accelerated quantitative MR parameter mapping. However, most methods require fully-sampled training datasets, which is impractical in the clinic. In this study, a novel subject-specific unsupervised method based on…

    Submitted 23 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Yuanyuan Liu and Jinwen Xie contributed equally to this work

  35. arXiv:2407.05421  [pdf, other]

    eess.AS cs.SD

    ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

    Authors: Ruibo Fu, Xin Qi, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Zhiyong Wang, Yi Lu, Xiaopeng Wang, Shuchen Shi, Yukun Liu, Xuefei Liu, Shuai Zhang

    Abstract: Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle with inadequate speaker representation accuracy and overfitting, particularly in scenarios with limited reference speech. To address these challenges, we…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: The audio demo is available at https://7xin.github.io/ASRRL/

  36. arXiv:2407.03188  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation

    Authors: Zihao Wang, Haoxuan Liu, Jiaxing Yu, Tao Zhang, Yan Liu, Kejun Zhang

    Abstract: Amid the rising intersection of generative AI and human artistic processes, this study probes the critical yet less-explored terrain of alignment in human-centric automatic song composition. We propose a novel task of Colloquial Description-to-Song Generation, which focuses on aligning the generated content with colloquial human expressions. This task is aimed at bridging the gap between colloquia…

    Submitted 10 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 figures

    MSC Class: 68Txx (Primary) 14F05; 91Fxx (Secondary)

    ACM Class: I.2.7; J.5

  37. arXiv:2407.02918  [pdf, other]

    cs.CV eess.IV

    Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction

    Authors: Jiaxin Guo, Jiangliu Wang, Di Kang, Wenzhen Dong, Wenting Wang, Yun-hui Liu

    Abstract: Real-time 3D reconstruction of surgical scenes plays a vital role in computer-assisted surgery, holding a promise to enhance surgeons' visibility. Recent advancements in 3D Gaussian Splatting (3DGS) have shown great potential for real-time novel view synthesis of general scenes, which relies on accurate poses and point clouds generated by Structure-from-Motion (SfM) for initialization. However, 3D…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  38. arXiv:2407.02804  [pdf, other]

    eess.SP eess.SY

    Mobile Edge Generation-Enabled Digital Twin: Architecture Design and Research Opportunities

    Authors: Xiaoxia Xu, Ruikang Zhong, Xidong Mu, Yuanwei Liu, Kaibin Huang

    Abstract: A novel paradigm of mobile edge generation (MEG)-enabled digital twin (DT) is proposed, which enables distributed on-device generation at mobile edge networks for real-time DT applications. First, an MEG-DT architecture is put forward to decentralize generative artificial intelligence (GAI) models onto edge servers (ESs) and user equipments (UEs), which has the advantages of low latency, privacy p…

    Submitted 6 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 7 pages, 6 figures

  39. arXiv:2406.19649  [pdf]

    eess.IV cs.CV

    AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

    Authors: Guanghao Zhu, Jing Zhang, Juanxiu Liu, Xiaohui Du, Ruqian Hao, Yong Liu, Lin Liu

    Abstract: Semi-supervised learning (SSL) has shown considerable potential in medical image segmentation, primarily leveraging consistency regularization and pseudo-labeling. However, many SSL approaches only pay attention to low-level consistency and overlook the significance of pseudo-label reliability. Therefore, in this work, we propose an adversarial self-training consistency framework (AstMatch). First…

    Submitted 28 June, 2024; originally announced June 2024.

  40. arXiv:2406.19627  [pdf]

    eess.SY

    Practical Power System Inertia Monitoring Based on Pumped Storage Hydropower Operation Signature

    Authors: Hongyu Li, Chang Chen, Mark Baldwin, Shutang You, Wenpeng Yu, Lin Zhu, Yilu Liu

    Abstract: This paper proposes a practical method to monitor power system inertia using Pumped Storage Hydropower (PSH) switching-off events. This approach offers real-time system-level inertia estimation with minimal expenses, no disruption, and the inclusion of behind-the-meter inertia. First, accurate inertia estimation is achieved through improved RoCoF calculation that accounts for pre-event RoCoF, redu…

    Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 8 pages, 15 figures

  41. arXiv:2406.18009  [pdf, other]

    eess.AS cs.SD

    E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

    Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the…

    Submitted 25 June, 2024; originally announced June 2024.

  42. arXiv:2406.17666  [pdf]

    eess.IV

    Transformer-based segmentation of adnexal lesions and ovarian implants in CT images

    Authors: Aneesh Rangnekar, Kevin M. Boehm, Emily A. Aherne, Ines Nikolovski, Natalie Gangai, Ying Liu, Dimitry Zamarin, Kara L. Roche, Sohrab P. Shah, Yulia Lakhman, Harini Veeraraghavan

    Abstract: Two self-supervised pretrained transformer-based segmentation models (SMIT and Swin UNETR) fine-tuned on a dataset of ovarian cancer CT images provided reasonably accurate delineations of the tumors in an independent test dataset. Tumors in the adnexa were segmented more accurately by both transformers (SMIT and Swin UNETR) than the omental implants. AI-assisted labeling performed on 72 out of 245…

    Submitted 25 June, 2024; originally announced June 2024.

  43. arXiv:2406.16148  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

    Authors: Yuwei Zhang, Tong Xia, Jing Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo

    Abstract: Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing…

    Submitted 23 June, 2024; originally announced June 2024.

  44. arXiv:2406.15056  [pdf, ps, other]

    cs.IT eess.SP

    Continuous Aperture Array (CAPA)-Based Wireless Communications: Capacity Characterization

    Authors: Boqun Zhao, Chongjun Ouyang, Xingqi Zhang, Yuanwei Liu

    Abstract: The capacity limits of continuous-aperture array (CAPA)-based wireless communications are characterized. To this end, an analytically tractable transmission framework is established for both uplink and downlink CAPA systems. Based on this framework, closed-form expressions for the single-user channel capacity are derived. The results are further extended to a multiuser case by characterizing the c…

    Submitted 21 June, 2024; originally announced June 2024.

  45. arXiv:2406.12323  [pdf, other]

    eess.SP

    Hybrid Beamforming Design for Near-Field ISAC with Modular XL-MIMO

    Authors: Chunwei Meng, Dingyou Ma, Zhaolin Wang, Yuanwei Liu, Zhiqing Wei, Zhiyong Feng

    Abstract: A novel modular extremely large-scale multiple-input-multiple-output (XL-MIMO) integrated sensing and communication (ISAC) framework is proposed in this paper. We consider a downlink ISAC scenario and exploit the modular array architecture to enhance the communication spectral efficiency and sensing resolution while reducing the channel modeling complexity by employing the hybrid spherical and pla…

    Submitted 18 June, 2024; originally announced June 2024.

  46. arXiv:2406.12300  [pdf]

    eess.IV cs.CV q-bio.NC

    IR2QSM: Quantitative Susceptibility Mapping via Deep Neural Networks with Iterative Reverse Concatenations and Recurrent Modules

    Authors: Min Li, Chen Chen, Zhuang Xiong, Ying Liu, Pengfei Rong, Shanshan Shan, Feng Liu, Hongfu Sun, Yang Gao

    Abstract: Quantitative susceptibility mapping (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-bas…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 9 figures

  47. arXiv:2406.11265  [pdf, ps, other]

    eess.SY

    Balancing Performance and Cost for Two-Hop Cooperative Communications: Stackelberg Game and Distributed Multi-Agent Reinforcement Learning

    Authors: Yuanzhe Geng, Erwu Liu, Wei Ni, Rui Wang, Yan Liu, Hao Xu, Chen Cai, Abbas Jamalipour

    Abstract: This paper aims to balance performance and cost in a two-hop wireless cooperative communication network where the source and relays have contradictory optimization goals and make decisions in a distributed manner. This differs from most existing works that have typically assumed that source and relay nodes follow a schedule created implicitly by a central controller. We propose that the relays for…

    Submitted 17 June, 2024; originally announced June 2024.

  48. arXiv:2406.10941  [pdf, other]

    eess.SP

    Near-Field Localization and Sensing with Large-Aperture Arrays: From Signal Modeling to Processing

    Authors: Zhaolin Wang, Parisa Ramezani, Yuanwei Liu, Emil Björnson

    Abstract: The signal processing community is currently witnessing a growing interest in near-field signal processing, driven by the trend towards the use of large aperture arrays with high spatial resolution in the fields of communication, localization, sensing, imaging, etc. From the perspective of localization and sensing, this trend breaks the basic far-field assumptions that have dominated the array sig…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 20 pages, 5 figures

  49. arXiv:2406.10591  [pdf, other]

    eess.AS cs.AI cs.CV cs.MM cs.SD

    MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

    Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

    Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on…

    Submitted 15 June, 2024; originally announced June 2024.

  50. arXiv:2406.10272  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Connected Speech-Based Cognitive Assessment in Chinese and English

    Authors: Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu

    Abstract: We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age…

    Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: To appear in Proceedings of Interspeech 2024

    ACM Class: J.3; I.5.4