(Translated by https://www.hiragana.jp/)
Search | arXiv e-print repository
Skip to main content

Showing 1–50 of 190 results for author: Lin, J

Searching in archive eess. Search in all archives.
.
  1. arXiv:2407.07397  [pdf, other

    cs.SD eess.AS

    SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness

    Authors: Jie Lin, Xiuping Yang, Li Xiao, Xinhong Li, Weiyan Yi, Yuhong Yang, Weiping Tu, Xiong Chen

    Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a prevalent chronic breathing disorder caused by upper airway obstruction. Previous studies advanced OSAHS evaluation through machine learning-based systems trained on sleep snoring or speech signal datasets. However, constructing datasets for training a precise and rapid OSAHS evaluation system poses a challenge, since 1) it is time-consuming t… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2407.02826  [pdf, other

    eess.AS

    SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

    Authors: Jingru Lin, Meng Ge, Junyi Ao, Liqun Deng, Haizhou Li

    Abstract: It was shown that pre-trained models with self-supervised learning (SSL) techniques are effective in various downstream speech tasks. However, most such models are trained on single-speaker speech data, limiting their effectiveness in mixture speech. This motivates us to explore pre-training on mixture speech. This work presents SA-WavLM, a novel pre-trained model for mixture speech. Specifically,… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: InterSpeech 2024

  3. arXiv:2407.01536  [pdf, other

    eess.SY

    Beyond Profit: A Multi-Objective Framework for Electric Vehicle Charging Station Operations

    Authors: Shuoyao Wang, Jiawei Lin

    Abstract: This paper explores the pricing and scheduling strategies of the electric vehicle charging stations in response to the rising demand for cleaner transportation. Most of the existing methods focus on maximizing the energy efficiency or the charging station profit, however, the reputation of EVs is also a key factor for the long-term charging station operations. To address these gaps, we propose a n… ▽ More

    Submitted 12 March, 2024; originally announced July 2024.

    Comments: Accepted By VTC24-Spring

  4. arXiv:2406.14869  [pdf, other

    eess.SP

    Cost-Effective RF Fingerprinting Based on Hybrid CVNN-RF Classifier with Automated Multi-Dimensional Early-Exit Strategy

    Authors: Jiayan Gan, Zhixing Du, Qiang Li, Huaizong Shao, Jingran Lin, Ye Pan, Zhongyi Wen, Shafei Wang

    Abstract: While the Internet of Things (IoT) technology is booming and offers huge opportunities for information exchange, it also faces unprecedented security challenges. As an important complement to the physical layer security technologies for IoT, radio frequency fingerprinting (RFF) is of great interest due to its difficulty in counterfeiting. Recently, many machine learning (ML)-based RFF algorithms h… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  5. arXiv:2406.09989  [pdf, other

    q-bio.NC eess.SY

    Suppressing seizure via optimal electrical stimulation to the hub of epileptic brain network

    Authors: Zhichao Liang, Guanyi Zhao, Yinuo Zhang, Weiting Sun, Jingzhe Lin, Jialin Wang, Quanying Liu

    Abstract: The electrical stimulation to the seizure onset zone (SOZ) serves as an efficient approach to seizure suppression. Recently, seizure dynamics have gained widespread attendance in its network propagation mechanisms. Compared with the direct stimulation to SOZ, other brain network-level approaches that can effectively suppress epileptic seizures remain under-explored. In this study, we introduce a p… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  6. arXiv:2406.07437  [pdf, other

    cs.SD eess.AS

    Graph-based multi-Feature fusion method for speech emotion recognition

    Authors: Xueyu Liu, Jie Lin, Chao Wang

    Abstract: Exploring proper way to conduct multi-speech feature fusion for cross-corpus speech emotion recognition is crucial as different speech features could provide complementary cues reflecting human emotion status. While most previous approaches only extract a single speech feature for emotion recognition, existing fusion methods such as concatenation, parallel connection, and splicing ignore heterogen… ▽ More

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 25 pages,4 figures

  7. arXiv:2406.01245  [pdf, other

    eess.IV

    Sparse Focus Network for Multi-Source Remote Sensing Data Classification

    Authors: Xuepeng Jin, Junyan Lin, Feng Gao, Lin Qi, Yang Zhou

    Abstract: Multi-source remote sensing data classification has emerged as a prominent research topic with the advancement of various sensors. Existing multi-source data classification methods are susceptible to irrelevant information interference during multi-source feature extraction and fusion. To solve this issue, we propose a sparse focus network for multi-source data classification. Sparse attention is… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE IGARSS 2024

  8. arXiv:2406.01235  [pdf, other

    eess.IV

    Boosting Spatial-Spectral Masked Auto-Encoder Through Mining Redundant Spectra for HSI-SAR/LiDAR Classification

    Authors: Junyan Lin, Xuepeng Jin, Feng Gao, Junyu Dong, Hui Yu

    Abstract: Although recent masked image modeling (MIM)-based HSI-LiDAR/SAR classification methods have gradually recognized the importance of the spectral information, they have not adequately addressed the redundancy among different spectra, resulting in information leakage during the pretraining stage. This issue directly impairs the representation ability of the model. To tackle the problem, we propose a… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IGARSS 2024

  9. arXiv:2406.00421  [pdf

    eess.SY

    Modal Analysis of Power System with High CIG Penetration Based on Impedance Models

    Authors: Le Zheng, Jiajie Zheng, Jiajian Lin, Chongru Liu

    Abstract: This paper explores the modal analysis of power systems with high Converter-Interfaced Generation (CIG) penetration utilizing an impedance-based modeling approach. Traditional modal analysis based on the state-space model (MASS) requires comprehensive control structures and parameters of each system element, a challenging prerequisite as converters increasingly integrate into power systems and the… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  10. arXiv:2405.20693  [pdf, other

    eess.IV cs.CV

    R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

    Authors: Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li

    Abstract: 3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R2-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a p… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  11. arXiv:2405.13661  [pdf, ps, other

    cs.SD eess.AS

    Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review

    Authors: Hong Zhang, Jie Lin, Shengxuan Chen

    Abstract: Timbre, the sound's unique "color", is fundamental to how we perceive and appreciate music. This review explores the multifaceted world of timbre perception and representation. It begins by tracing the word's origin, offering an intuitive grasp of the concept. Building upon this foundation, the article delves into the complexities of defining and measuring timbre. It then explores the concept and… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  12. arXiv:2405.13636  [pdf, other

    cs.SD cs.AI eess.AS

    Audio Mamba: Pretrained Audio State Space Model For Audio Tagging

    Authors: Jiaju Lin, Haoxuan Hu

    Abstract: Audio tagging is an important task of mapping audio samples to their corresponding categories. Recently endeavours that exploit transformer models in this field have achieved great success. However, the quadratic self-attention cost limits the scaling of audio transformer models and further constrains the development of more universal audio models. In this paper, we attempt to solve this problem b… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  13. arXiv:2405.10570  [pdf

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features… ▽ More

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  14. arXiv:2405.09561  [pdf

    eess.SP cs.AI cs.LG

    GAD: A Real-time Gait Anomaly Detection System with Online Adaptive Learning

    Authors: Ming-Chang Lee, Jia-Chun Lin, Sokratis Katsikas

    Abstract: Gait anomaly detection is a task that involves detecting deviations from a person's normal gait pattern. These deviations can indicate health issues and medical conditions in the healthcare domain, or fraudulent impersonation and unauthorized identity access in the security domain. A number of gait anomaly detection approaches have been introduced, but many of them require offline data preprocessi… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 14 pages, 8 figures, 3 tables, ICT Systems Security and Privacy Protection 39th IFIP TC 11 International Conference, SEC 2024, Edinburgh, UK, June 12-14, 2024, Proceedings (IFIP SEC2024)

  15. arXiv:2405.05518  [pdf, other

    cs.CV cs.RO eess.IV

    DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction

    Authors: Siyu Li, Jiacheng Lin, Hao Shi, Jiaming Zhang, Song Wang, You Yao, Zhiyong Li, Kailun Yang

    Abstract: Temporal information plays a pivotal role in Bird's-Eye-View (BEV) driving scene understanding, which can alleviate the visual information sparsity. However, the indiscriminate temporal fusion method will cause the barrier of feature redundancy when constructing vectorized High-Definition (HD) maps. In this paper, we revisit the temporal fusion of vectorized HD maps, focusing on temporal instance… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: The source code will be made publicly available at https://github.com/lynn-yu/DTCLMapper

  16. arXiv:2405.04867  [pdf, other

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng , et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  17. arXiv:2405.03692  [pdf, other

    eess.IV cs.NI eess.SY

    Imitation Learning for Adaptive Video Streaming with Future Adversarial Information Bottleneck Principle

    Authors: Shuoyao Wang, Jiawei Lin, Fangwei Ye

    Abstract: Adaptive video streaming plays a crucial role in ensuring high-quality video streaming services. Despite extensive research efforts devoted to Adaptive BitRate (ABR) techniques, the current reinforcement learning (RL)-based ABR algorithms may benefit the average Quality of Experience (QoE) but suffers from fluctuating performance in individual video sessions. In this paper, we present a novel appr… ▽ More

    Submitted 12 March, 2024; originally announced May 2024.

    Comments: submitted to IEEE Journal

  18. arXiv:2404.19615  [pdf, other

    cs.CV cs.MM cs.SD eess.AS

    SemiPL: A Semi-supervised Method for Event Sound Source Localization

    Authors: Yue Li, Baiqiao Yin, Jinfu Liu, Jiajun Wen, Jiaying Lin, Mengyuan Liu

    Abstract: In recent years, Event Sound Source Localization has been widely applied in various fields. Recent works typically relying on the contrastive learning framework show impressive performance. However, all work is based on large relatively simple datasets. It's also crucial to understand and analyze human behaviors (actions and interactions of people), voices, and sounds in chaotic events in many app… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  19. arXiv:2404.17554  [pdf

    cs.HC eess.SP eess.SY stat.AP

    A Novel Context driven Critical Integrative Levels (CIL) Approach: Advancing Human-Centric and Integrative Lighting Asset Management in Public Libraries with Practical Thresholds

    Authors: Jing Lin, Nina Mylly, Per Olof Hedekvist, Jingchun Shen

    Abstract: This paper proposes the context driven Critical Integrative Levels (CIL), a novel approach to lighting asset management in public libraries that aligns with the transformative vision of human-centric and integrative lighting. This approach encompasses not only the visual aspects of lighting performance but also prioritizes the physiological and psychological well-being of library users. Incorporat… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  20. arXiv:2404.16223  [pdf, other

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu , et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  21. arXiv:2404.15279  [pdf, other

    eess.SP cs.AI

    Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification

    Authors: Jimmy Lin, Junkai Li, Jiasi Gao, Weizhi Ma, Yang Liu

    Abstract: Tactile signals collected by wearable electronics are essential in modeling and understanding human behavior. One of the main applications of tactile signals is action classification, especially in healthcare and robotics. However, existing tactile classification methods fail to capture the spatial and temporal features of tactile signals simultaneously, which results in sub-optimal performances.… ▽ More

    Submitted 20 January, 2024; originally announced April 2024.

    Comments: Accepted by AAAI 2024

  22. arXiv:2404.12794  [pdf, other

    cs.CV cs.MM cs.RO eess.IV

    MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

    Authors: Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang

    Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in point clouds of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Ob… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/Terminal-K/MambaMOS

  23. arXiv:2404.09729  [pdf

    eess.SP cs.IT cs.LG stat.ME

    Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis

    Authors: Shuaicong Hu, Yanan Wang, Jian Liu, Jingyu Lin, Shengmei Qin, Zhenning Nie, Zhifeng Yao, Wenjie Cai, Cuiwei Yang

    Abstract: Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and lack integration. To address this gap, this paper proposes a novel fusion entropy metric, morphological ECG entropy (MEE) for the first time, specifically designed for ECG mor… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 16 pages, 12 figures

    ACM Class: I.5.2

  24. arXiv:2404.01929  [pdf

    eess.IV cs.CV

    Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method

    Authors: Jyun-An Lin, Yun-Chien Cheng, Ching-Kai Lin

    Abstract: This study aims to establish a computer-aided diagnostic system for lung lesions using endobronchial ultrasound (EBUS) to assist physicians in identifying lesion areas. During EBUS-transbronchial needle aspiration (EBUS-TBNA) procedures, hysicians rely on grayscale ultrasound images to determine the location of lesions. However, these images often contain significant noise and can be influenced by… ▽ More

    Submitted 20 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  25. arXiv:2403.05808  [pdf, other

    cs.CV eess.IV

    Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

    Authors: Junxiong Lin, Yan Wang, Zeng Tao, Boyang Wang, Qing Zhao, Haorang Wang, Xuan Tong, Xinji Mai, Yuxuan Lin, Wei Song, Jiawen Yu, Shaoqi Yan, Wenqiang Zhang

    Abstract: Pre-trained diffusion models utilized for image generation encapsulate a substantial reservoir of a priori knowledge pertaining to intricate textures. Harnessing the potential of leveraging this a priori knowledge in the context of image super-resolution presents a compelling avenue. Nonetheless, prevailing diffusion-based methodologies presently overlook the constraints imposed by degradation inf… ▽ More

    Submitted 9 July, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  26. arXiv:2402.18302  [pdf, other

    cs.CV cs.RO eess.AS eess.IV

    EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

    Authors: Jiacheng Lin, Jiajun Chen, Kunyu Peng, Xuan He, Zhiyong Li, Rainer Stiefelhagen, Kailun Yang

    Abstract: This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cos… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: The source code and datasets will be made publicly available at https://github.com/lab206/EchoTrack

  27. arXiv:2402.17785  [pdf, other

    cs.SD cs.AI eess.AS

    ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

    Authors: Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu

    Abstract: Large Language Models (LLM) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system is still under-explored. To solve this problem, we propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps : "Conception Analysis - Draft Composition - Self-Eval… ▽ More

    Submitted 6 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  28. arXiv:2402.00859  [pdf, other

    eess.AS

    Deep Room Impulse Response Completion

    Authors: Jackie Lin, Georg Götz, Sebastian J. Schlecht

    Abstract: Rendering immersive spatial audio in virtual reality (VR) and video games demands a fast and accurate generation of room impulse responses (RIRs) to recreate auditory environments plausibly. However, the conventional methods for simulating or measuring long RIRs are either computationally intensive or challenged by low signal-to-noise ratios. This study is propelled by the insight that direct soun… ▽ More

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: The following article has been submitted to the EURASIP Journal on Audio, Speech, and Music Processing

  29. arXiv:2401.10411  [pdf, other

    eess.AS cs.SD

    AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

    Authors: Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide

    Abstract: Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart glasses that have microphone arrays, which fuses multi-channel ASR with serialized output training, for wearer/conversation-partner disambiguation as well as s… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  30. arXiv:2312.16002  [pdf, other

    eess.AS cs.AI

    The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

    Authors: Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

    Abstract: This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition. Our submitted systems for ICMC-ASR Challenge include the multi-channel front-end enhancement and diarization, training data augmentation, speech recognition modeling with multi-channel branches. Tested on the offical Eval1 and Eval2 set, our best system achieves… ▽ More

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Technical Report. 2 pages. For ICMC-ASR-2023 Challenge

  31. arXiv:2312.14473  [pdf, other

    math.OC eess.SY

    Coordinated Active-Reactive Power Management of ReP2H Systems with Multiple Electrolyzers

    Authors: Yangjun Zeng, Buxiang Zhou, Jie Zhu, Jiarong Li, Bosen Yang, Jin Lin, Yiwei Qiu

    Abstract: Utility-scale renewable power-to-hydrogen (ReP2H) production typically uses thyristor rectifiers (TRs) to supply power to multiple electrolyzers (ELZs). They exhibit a nonlinear and non-decouplable relation between active and reactive power. The on-off scheduling and load allocation of multiple ELZs simultaneously impact energy conversion efficiency and AC-side active and reactive power flow. Impr… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

  32. arXiv:2312.10687  [pdf, other

    eess.AS cs.SD

    MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis

    Authors: Wenhao Guan, Yishuang Li, Tao Li, Hukai Huang, Feng Wang, Jiayan Lin, Lingyan Huang, Lin Li, Qingyang Hong

    Abstract: The style transfer task in Text-to-Speech refers to the process of transferring style information into text content to generate corresponding speech with a specific style. However, most existing style transfer approaches are either based on fixed emotional labels or reference speech clips, which cannot achieve flexible style transfer. Recently, some methods have adopted text descriptions to guide… ▽ More

    Submitted 31 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI2024

  33. arXiv:2311.15556  [pdf, other

    cs.CV eess.IV

    PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images

    Authors: Jiquan Yuan, Xinyan Cao, Changjin Li, Fanyi Yang, Jinlong Lin, Xixin Cao

    Abstract: As image generation technology advances, AI-based image generation has been applied in various fields and Artificial Intelligence Generated Content (AIGC) has garnered widespread attention. However, the development of AI-based image generative models also brings new problems and challenges. A significant challenge is that AI-generated images (AIGI) may exhibit unique distortions compared to natura… ▽ More

    Submitted 29 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 18 pages

  34. arXiv:2311.05653  [pdf, ps, other

    eess.SY

    Minimal Input Structural Modifications for Strongly Structural Controllability

    Authors: Geethu Joseph, Shana Moothedath, Jiabin Lin

    Abstract: This paper studies the problem of modifying the input matrix of a structured system to make the system strongly structurally controllable. We focus on the generalized structured systems that rely on zero/nonzero/arbitrary structure, i.e., some entries of system matrices are zeros, some are nonzero, and the remaining entries can be zero or nonzero (arbitrary). We analyze the feasibility of the prob… ▽ More

    Submitted 17 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

  35. arXiv:2311.04526  [pdf, other

    eess.AS

    Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech

    Authors: Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng

    Abstract: Self-supervised pre-trained speech models were shown effective for various downstream speech processing tasks. Since they are mainly pre-trained to map input speech to pseudo-labels, the resulting representations are only effective for the type of pre-train data used, either clean or mixture speech. With the idea of selective auditory attention, we propose a novel pre-training solution called Sele… ▽ More

    Submitted 8 November, 2023; originally announced November 2023.

  36. arXiv:2311.04442  [pdf, other

    eess.IV cs.CV

    SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification

    Authors: Junyan Lin, Feng Gao, Xiaocheng Shi, Junyu Dong, Qian Du

    Abstract: Masked image modeling (MIM) is a highly popular and effective self-supervised learning method for image understanding. Existing MIM-based methods mostly focus on spatial feature modeling, neglecting spectral feature modeling. Meanwhile, existing MIM-based methods use Transformer for feature extraction, some local or high-frequency information may get lost. To this end, we propose a spatial-spectra… ▽ More

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: IEEE TGRS 2023

  37. arXiv:2310.10159  [pdf, other

    cs.SD cs.CL eess.AS

    Joint Music and Language Attention Models for Zero-shot Music Tagging

    Authors: Xingjian Du, Zhesong Yu, Jiaju Lin, Bilei Zhu, Qiuqiang Kong

    Abstract: Music tagging is a task to predict the tags of music recordings. However, previous music tagging research primarily focuses on close-set music tagging tasks which can not be generalized to new tags. In this work, we propose a zero-shot music tagging system modeled by a joint music and language attention (JMLA) model to address the open-set music tagging problem. The JMLA model consists of an audio… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: \begin{keywords} Music tagging, joint music and language attention models, Music Foundation Model. \end{keywords}

  38. arXiv:2310.03750  [pdf

    eess.SP cond-mat.mtrl-sci cs.LG physics.app-ph

    Health diagnosis and recuperation of aged Li-ion batteries with data analytics and equivalent circuit modeling

    Authors: Riko I Made, Jing Lin, Jintao Zhang, Yu Zhang, Lionel C. H. Moh, Zhaolin Liu, Ning Ding, Sing Yang Chiam, Edwin Khoo, Xuesong Yin, Guangyuan Wesley Zheng

    Abstract: Battery health assessment and recuperation play a crucial role in the utilization of second-life Li-ion batteries. However, due to ambiguous aging mechanisms and lack of correlations between the recovery effects and operational states, it is challenging to accurately estimate battery health and devise a clear strategy for cell rejuvenation. This paper presents aging and reconditioning experiments… ▽ More

    Submitted 21 September, 2023; originally announced October 2023.

    Comments: 20 pages, 5 figures, 1 table

    Journal ref: iScience (2024)

  39. arXiv:2310.01861  [pdf, other

    eess.IV cs.CV cs.GR

    Shifting More Attention to Breast Lesion Segmentation in Ultrasound Videos

    Authors: Junhao Lin, Qian Dai, Lei Zhu, Huazhu Fu, Qiong Wang, Weibin Li, Wenhao Rao, Xiaoyang Huang, Liansheng Wang

    Abstract: Breast lesion segmentation in ultrasound (US) videos is essential for diagnosing and treating axillary lymph node metastasis. However, the lack of a well-established and large-scale ultrasound video dataset with high-quality annotations has posed a persistent challenge for the research community. To overcome this issue, we meticulously curated a US video breast lesion segmentation dataset comprisi… ▽ More

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 10 pages

  40. One-Bit Channel Estimation for IRS-aided Millimeter-Wave Massive MU-MISO System

    Authors: Silei Wang, Qiang Li, Jingran Lin

    Abstract: Recently, intelligent reflecting surface (IRS)-assisted communication has gained considerable attention due to its advantage in extending the coverage and compensating the path loss with low-cost passive metasurface. This paper considers the uplink channel estimation for IRS-aided multiuser massive MISO communications with one-bit ADCs at the base station (BS). The use of one-bit ADC is impelled b… ▽ More

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: Accepted by IEEE Trans. Signal Process

  41. Driving behavior-guided battery health monitoring for electric vehicles using machine learning

    Authors: Nanhua Jiang, Jiawei Zhang, Weiran Jiang, Yao Ren, Jing Lin, Edwin Khoo, Ziyou Song

    Abstract: An accurate estimation of the state of health (SOH) of batteries is critical to ensuring the safe and reliable operation of electric vehicles (EVs). Feature-based machine learning methods have exhibited enormous potential for rapidly and precisely monitoring battery health status. However, simultaneously using various health indicators (HIs) may weaken estimation performance due to feature redunda… ▽ More

    Submitted 25 September, 2023; originally announced September 2023.

    Journal ref: Applied Energy (2024)

  42. arXiv:2309.10993  [pdf, other

    cs.SD cs.HC eess.AS

    Directional Source Separation for Robust Speech Recognition on Smart Glasses

    Authors: Tiantian Feng, Ju Lin, Yiteng Huang, Weipeng He, Kaustubh Kalgaonkar, Niko Moritz, Li Wan, Xin Lei, Ming Sun, Frank Seide

    Abstract: Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcribing and captioning services, considerably enriching human experiences in daily communications. However, such systems frequently encounter challenges related to environmental noises, resulting in degradation to speech recognition and speaker change detection. To improve voice quality,… ▽ More

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  43. arXiv:2308.16476  [pdf, other

    eess.SY

    Multi-Stage Expansion Planning for Decarbonizing Thermal Generation Supported Renewable Power Systems Using Hydrogen and Ammonia Storage

    Authors: Zhipeng Yu, Jin Lin, Feng Liu, Jiarong Li, Yingtian Chi, Yonghua Song, Zhengwei Ren

    Abstract: Large-scale centralized development of wind and solar energy and peer-to-grid transmission of renewable energy source (RES) via high voltage direct current (HVDC) has been regarded as one of the most promising ways to achieve goals of peak carbon and carbon neutrality in China. Traditionally, large-scale thermal generation is needed to economically support the load demand of HVDC with a given prof… ▽ More

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 10 pages, 8 figures

  44. arXiv:2308.04162  [pdf, other

    cs.CV eess.AS eess.IV

    EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation

    Authors: Jiajun Chen, Jiacheng Lin, Zhiqiang Xiao, Haolong Fu, Ke Nai, Kailun Yang, Zhiyong Li

    Abstract: Audio-guided Video Object Segmentation (A-VOS) and Referring Video Object Segmentation (R-VOS) are two highly-related tasks, which both aim to segment specific objects from video sequences according to user-provided expression prompts. However, due to the challenges in modeling representations for different modalities, contemporary methods struggle to strike a balance between interaction flexibili… ▽ More

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: The source code will be made publicly available at https://github.com/lab206/EPCFormer

  45. arXiv:2308.01173  [pdf, other

    eess.IV

    FlexDTI: Flexible diffusion gradient encoding scheme-based highly efficient diffusion tensor imaging using deep learning

    Authors: Zejun Wu, Jiechao Wang, Zunquan Chen, Qinqin Yang, Zhen Xing, Dairong Cao, Jianfeng Bao, Taishan Kang, Jianzhong Lin, Shuhui Cai, Zhong Chen, Congbo Cai

    Abstract: Objective: Most deep neural network-based diffusion tensor imaging methods require the diffusion gradients' number and directions in the data to be reconstructed to match those in the training data. This work aims to develop and evaluate a novel dynamic-convolution-based method called FlexDTI for highly efficient diffusion tensor reconstruction with flexible diffusion encoding gradient scheme. App… ▽ More

    Submitted 21 December, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 24 pages,9 figures,3 tables

  46. arXiv:2307.13346  [pdf, other

    cs.SD cs.MM eess.AS

    A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis

    Authors: Li Xiao, Xiuping Yang, Xinhong Li, Weiping Tu, Xiong Chen, Weiyan Yi, Jie Lin, Yuhong Yang, Yanzhen Ren

    Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a chronic breathing disorder caused by a blockage in the upper airways. Snoring is a prominent symptom of OSAHS, and previous studies have attempted to identify the obstruction site of the upper airways by snoring sounds. Despite some progress, the classification of the obstruction site remains challenging in real-world clinical settings due to… ▽ More

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Accepted to INTERSPEECH 2023

  47. arXiv:2307.13220  [pdf

    eess.IV cs.AI physics.med-ph

    One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

    Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen , et al. (3 additional authors not shown)

    Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep… ▽ More

    Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 38 pages, 19 figures, 5 tables

  48. arXiv:2307.12134  [pdf, other

    cs.CL cs.SD eess.AS

    Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

    Authors: Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer

    Abstract: End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse from speech have become more promising recently. This approach uses a single model that utilizes audio and text representations from pre-trained speech recognition models (ASR), and outperforms traditional pipeline SLU systems in on-device streaming scenarios. However, E2E SLU systems still show weakness wh… ▽ More

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: INTERSPEECH 2023

  49. arXiv:2307.09871  [pdf, other

    eess.AS

    Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder

    Authors: Jingru Lin, Xianghu Yue, Junyi Ao, Haizhou Li

    Abstract: Acoustic word embeddings (AWEs) aims to map a variable-length speech segment into a fixed-dimensional representation. High-quality AWEs should be invariant to variations, such as duration, pitch and speaker. In this paper, we introduce a novel self-supervised method to learn robust AWEs from a large-scale unlabelled speech corpus. Our model, named Correspondence Transformer Encoder (CTE), employs… ▽ More

    Submitted 19 July, 2023; originally announced July 2023.

  50. arXiv:2306.16250  [pdf, other

    cs.SD eess.AS

    MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation

    Authors: Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu

    Abstract: The previous SpEx+ has yielded outstanding performance in speaker extraction and attracted much attention. However, it still encounters inadequate utilization of multi-scale information and speaker embedding. To this end, this paper proposes a new effective speaker extraction system with multi-scale interfusion and conditional speaker modulation (ConSM), which is called MC-SpEx. First of all, we d… ▽ More

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted by InterSpeech 2023