Search | arXiv e-print repository

Showing 1–50 of 246 results for author: Xiao, Y

Searching in archive eess.
  1. arXiv:2407.08550  [pdf]

    cs.AI cs.ET cs.MA cs.RO eess.SY

    Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility

    Authors: Yuchen Xia, Jize Zhang, Nasser Jazdi, Michael Weyrich

    Abstract: This paper introduces a novel approach to integrating large language model (LLM) agents into automated production systems, aimed at enhancing task automation and flexibility. We organize production operations within a hierarchical framework based on the automation pyramid. Atomic operation functionalities are modeled as microservices, which are executed through interface invocation within a dedica…

    Submitted 11 July, 2024; originally announced July 2024.

    Report number: VDI-Berichte Nr. 2437, 2024

  2. arXiv:2407.05928  [pdf, other]

    eess.SP

    CA-FedRC: Codebook Adaptation via Federated Reservoir Computing in 5G NR

    Authors: Ziqiang Ye, Sikai Liao, Yulan Gao, Shu Fang, Yue Xiao, Ming Xiao, Saviour Zammit

    Abstract: With the burgeoning deployment of the fifth-generation new radio (5G NR) networks, the codebook plays a crucial role in enabling the base station (BS) to acquire the channel state information (CSI). Different 5G NR codebooks incur varying overheads and exhibit performance disparities under diverse channel conditions, necessitating codebook adaptation based on channel conditions to reduce feedback ove…

    Submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2407.05310  [pdf, other]

    eess.SP cs.NE cs.SD eess.AS

    Ternary Spike-based Neuromorphic Signal Processing System

    Authors: Shuai Wang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Hongyu Qing, Wenjie We, Malu Zhang, Yang Yang

    Abstract: Deep Neural Networks (DNNs) have been successfully implemented across various signal processing fields, resulting in significant enhancements in performance. However, DNNs generally require substantial computational resources, leading to significant economic costs and posing challenges for their deployment on resource-constrained edge devices. In this study, we take advantage of spiking neural net…

    Submitted 7 July, 2024; originally announced July 2024.

  4. arXiv:2407.03661  [pdf, other]

    eess.AS cs.SD

    Configurable DOA Estimation using Incremental Learning

    Authors: Yang Xiao, Rohan Kumar Das

    Abstract: This study introduces a progressive neural network (PNN) model for direction of arrival (DOA) estimation, DOA-PNN, addressing the challenge of catastrophic forgetting when adapting to dynamic acoustic environments. While traditional methods such as GCC, MUSIC, and SRP-PHAT are effective in static settings, they perform worse in noisy, reverberant conditions. Deep learning models, particularly CNNs,…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE WS 2024

  5. arXiv:2407.03657  [pdf, other]

    eess.AS cs.SD

    UCIL: An Unsupervised Class Incremental Learning Approach for Sound Event Detection

    Authors: Yang Xiao, Rohan Kumar Das

    Abstract: This work explores class-incremental learning (CIL) for sound event detection (SED), advancing adaptability towards real-world scenarios. CIL's success in domains like computer vision inspired our SED-tailored method, addressing the unique challenges of diverse and complex audio environments. Our approach employs an independent unsupervised learning framework with a distillation loss function to i…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE WS 2024

  6. arXiv:2407.03656  [pdf, other]

    eess.AS cs.SD

    WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System

    Authors: Yang Xiao, Rohan Kumar Das

    Abstract: This work aims to advance sound event detection (SED) research by presenting a new large language model (LLM)-powered dataset namely wild domestic environment sound event detection (WildDESED). It is crafted as an extension to the original DESED dataset to reflect diverse acoustic variability and complex noises in home settings. We leveraged LLMs to generate eight different domestic scenarios base…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE WS 2024

  7. arXiv:2407.03654  [pdf, other]

    eess.AS

    Mixstyle based Domain Generalization for Sound Event Detection with Heterogeneous Training Data

    Authors: Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

    Abstract: This work explores domain generalization (DG) for sound event detection (SED), advancing adaptability towards real-world scenarios. Our approach employs a mean-teacher framework with domain generalization to integrate heterogeneous training data, while preserving the SED model performance across the datasets. Specifically, we first apply mixstyle to the frequency dimension to adapt the mel-spectro…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE WS 2024. 5 pages. arXiv admin note: text overlap with arXiv:2407.00291

  8. arXiv:2407.00291  [pdf, other]

    eess.AS cs.SD

    FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels

    Authors: Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

    Abstract: This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset with two sources, making it challenging…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Technical report for DCASE 2024 Challenge Task 4

  9. arXiv:2406.18313  [pdf, other]

    cs.SD cs.CL eess.AS

    Advancing Airport Tower Command Recognition: Integrating Squeeze-and-Excitation and Broadcasted Residual Learning

    Authors: Yuanxi Lin, Tonglin Zhou, Yang Xiao

    Abstract: Accurate recognition of aviation commands is vital for flight safety and efficiency, as pilots must follow air traffic control instructions precisely. This paper addresses challenges in speech command recognition, such as noisy environments and limited computational resources, by advancing keyword spotting technology. We create a dataset of standardized airport tower commands, including routine an…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by IALP 2024

  10. arXiv:2406.17578  [pdf, other]

    eess.IV

    Sparse-view Signal-domain Photoacoustic Tomography Reconstruction Method Based on Neural Representation

    Authors: Bowei Yao, Yi Zeng, Haizhao Dai, Qing Wu, Youshen Xiao, Fei Gao, Yuyao Zhang, Jingyi Yu, Xiran Cai

    Abstract: Photoacoustic tomography is a hybrid biomedical technology that combines the advantages of acoustic and optical imaging. However, with the conventional image reconstruction method, image quality is visibly degraded by artifacts under sparse sampling. In this paper, a novel model-based sparse reconstruction method via implicit neural representation was proposed for improving…

    Submitted 25 June, 2024; originally announced June 2024.

  11. arXiv:2406.16102  [pdf, other]

    eess.SP

    Federated Transfer Learning Aided Interference Classification in GNSS Signals

    Authors: Min Jiang, Ziqiang Ye, Yue Xiao, Xiaogang Gou

    Abstract: This study delves into the classification of interference signals to global navigation satellite systems (GNSS) stemming from mobile jammers such as unmanned aerial vehicles (UAVs) across diverse wireless communication zones, employing federated learning (FL) and transfer learning (TL). Specifically, we employ a neural network classifier, enhanced with FL to decentralize data processing and TL to…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures, conference accepted

  12. arXiv:2406.07498  [pdf, other]

    cs.SD eess.AS

    RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, failure to use future information and the constrained receptive field of convolution layers limit the system's perfor…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  13. arXiv:2406.05961  [pdf, other]

    eess.AS

    BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation

    Authors: Zihan Zhang, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it performs superiorly in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, to reduce comput…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  14. arXiv:2406.05699  [pdf, ps, other]

    eess.AS cs.AI eess.SP

    An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

    Authors: Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue. In this paper, we explored various strategies to enhance the quality of audi…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH2024

  15. arXiv:2406.03038  [pdf]

    eess.SY

    Study on layout of double rotated serpentine springs for vertical-comb-driven torsional micromirror

    Authors: Biyun Ling, Yuhu Xia, Minli Cai, Xiaoyue Wang, Yaming Wu

    Abstract: The combination of double rotated serpentine springs (RSS) and vertical comb-drive is a suitable solution for the development of torsional micromirrors with high fill factor, low fabrication difficulty and good performance. However, the alignment error between the upper and lower comb sets caused by fabrication can induce force in an unexpected direction. And the cross-axis coupled spring constants in do…

    Submitted 5 June, 2024; originally announced June 2024.

  16. arXiv:2406.02190  [pdf, ps, other]

    eess.SY

    Age of Trust (AoT): A Continuous Verification Framework for Wireless Networks

    Authors: Yuquan Xiao, Qinghe Du, Wenchi Cheng, Panagiotis D. Diamantoulakis, George K. Karagiannidis

    Abstract: Zero Trust is a new security vision for 6G networks that emphasises the philosophy of never trust and always verify. However, there is a fundamental trade-off between the wireless transmission efficiency and the trust level, which is reflected by the verification interval and its adaptation strategy. More importantly, the mathematical framework to characterise the trust level of the adaptive verif…

    Submitted 4 June, 2024; originally announced June 2024.

  17. arXiv:2406.02139  [pdf, other]

    eess.SY

    Statistical Age of Information: A Risk-Aware Metric and Its Applications in Status Updates

    Authors: Yuquan Xiao, Qinghe Du, George K. Karagiannidis

    Abstract: Age of information (AoI) is an effective measure to quantify the information freshness in wireless status update systems. It has been further validated that the peak AoI has the potential to capture the core characteristics of the aging process, and thus the average peak AoI is widely used to evaluate the long-term performance of information freshness. However, the average peak AoI is a risk-insen…

    Submitted 4 June, 2024; originally announced June 2024.

  18. arXiv:2406.02014  [pdf, other]

    q-bio.NC cs.LG cs.SD eess.AS

    Understanding Auditory Evoked Brain Signal via Physics-informed Embedding Network with Multi-Task Transformer

    Authors: Wanli Ma, Xuegang Tang, Jin Gu, Ying Wang, Yuling Xia

    Abstract: In the fields of brain-computer interaction and cognitive neuroscience, effective decoding of auditory signals from task-based functional magnetic resonance imaging (fMRI) is key to understanding how the brain processes complex auditory information. Although existing methods have enhanced decoding capabilities, limitations remain in information utilization and model representation. To overcome the…

    Submitted 4 June, 2024; originally announced June 2024.

  19. arXiv:2405.18267  [pdf, other]

    eess.IV cs.CV cs.LG

    CT-based brain ventricle segmentation via diffusion Schrödinger Bridge without target domain ground truths

    Authors: Reihaneh Teimouri, Marta Kersten-Oertel, Yiming Xiao

    Abstract: Efficient and accurate brain ventricle segmentation from clinical CT scans is critical for emergency surgeries like ventriculostomy. With the challenges in poor soft tissue contrast and a scarcity of well-annotated databases for clinical brain CTs, we introduce a novel uncertainty-aware ventricle segmentation technique without the need of CT segmentation ground truths by leveraging diffusion-model…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Early acceptance at MICCAI2024

  20. arXiv:2405.18092  [pdf]

    cs.AI cs.ET cs.MA cs.RO eess.SY

    LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins

    Authors: Yuchen Xia, Daniel Dittler, Nasser Jazdi, Haonan Chen, Michael Weyrich

    Abstract: This paper presents a novel design of a multi-agent system framework that applies a large language model (LLM) to automate the parametrization of process simulations in digital twins. We propose a multi-agent framework that includes four types of agents: observation, reasoning, decision and summarization. By enabling dynamic interaction between LLM agents and simulation model, the developed system…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Submitted to IEEE-ETFA2024, under peer-review

  21. arXiv:2405.17270  [pdf, other]

    eess.SP

    Towards Accurate Ego-lane Identification with Early Time Series Classification

    Authors: Yuchuan Jin, Theodor Stenhammar, David Bejmer, Axel Beauvisage, Yuxuan Xia, Junsheng Fu

    Abstract: Accurate and timely determination of a vehicle's current lane within a map is a critical task in autonomous driving systems. This paper utilizes an Early Time Series Classification (ETSC) method to achieve precise and rapid ego-lane identification in real-world driving data. The method begins by assessing the similarities between map and lane markings perceived by the vehicle's camera using measur…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures

  22. arXiv:2405.16905  [pdf, other]

    eess.SY

    Privacy and Security Trade-off in Interconnected Systems with Known or Unknown Privacy Noise Covariance

    Authors: Haojun Wang, Kun Liu, Baojia Li, Emilia Fridman, Yuanqing Xia

    Abstract: This paper is concerned with the security problem for interconnected systems, where each subsystem is required to detect local attacks using locally available information and the information received from its neighboring subsystems. Moreover, we consider that there exists an additional eavesdropper being able to infer the private information by eavesdropping transmitted data between subsystems. Th…

    Submitted 1 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  23. arXiv:2405.04290  [pdf, other]

    cs.RO eess.SP

    Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map

    Authors: Yuxuan Xia, Erik Stenborg, Junsheng Fu, Gustaf Hendeby

    Abstract: High-definition map with accurate lane-level information is crucial for autonomous driving, but the creation of these maps is a resource-intensive process. To this end, we present a cost-effective solution to create lane-level roadmaps using only the global navigation satellite system (GNSS) and a camera on customer vehicles. Our proposed solution utilizes a prior standard-definition (SD) map, GNS…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 27th International Conference on Information Fusion

  24. arXiv:2404.17903  [pdf, other]

    eess.SP

    3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints

    Authors: Jiayin Deng, Zhiqun Hu, Yuxuan Xia, Zhaoming Lu, Xiangming Wen

    Abstract: Roadside perception is a key component in intelligent transportation systems. In this paper, we present a novel three-dimensional (3D) extended object tracking (EOT) method, which simultaneously estimates the object kinematics and extent state, in roadside perception using both the radar and camera data. Because of the influence of sensor viewing angle and limited angle resolution, radar measureme…

    Submitted 27 April, 2024; originally announced April 2024.

  25. arXiv:2404.05257  [pdf, other]

    eess.SP

    Sensing-Resistance-Oriented Beamforming for Privacy Protection from ISAC Devices

    Authors: Teng Ma, Yue Xiao, Xia Lei, Ming Xiao

    Abstract: With the evolution of integrated sensing and communication (ISAC) technology, a growing number of devices go beyond conventional communication functions with sensing abilities. Therefore, future networks can be expected to encounter new privacy concerns on sensing, such as the exposure of position information to unintended receivers. In contrast to traditional privacy preserving schemes aiming to pr…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted for presentation at WS29 ICC 2024 Workshop - ISAC6G

  26. arXiv:2403.16970  [pdf, other]

    eess.IV cs.CV cs.LG

    Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability

    Authors: Zirui Qiu, Hassan Rivaz, Yiming Xiao

    Abstract: As deep learning has become the state-of-the-art for computer-assisted diagnosis, interpretability of the automatic decisions is crucial for clinical deployment. While various methods were proposed in this domain, visual attention maps of clinicians during radiological screening offer a unique asset to provide important insights and can potentially enhance the quality of computer-assisted diagnosi…

    Submitted 29 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  27. arXiv:2403.06756  [pdf, other]

    eess.SP

    One-Bit Target Detection in Collocated MIMO Radar with Colored Background Noise

    Authors: Yu-Hang Xiao, David Ramírez, Lei Huang, Xiao Peng Li, Hing Cheung So

    Abstract: One-bit sampling has emerged as a promising technique in multiple-input multiple-output (MIMO) radar systems due to its ability to significantly reduce data volume and processing requirements. Nevertheless, current detection methods have not adequately addressed the impact of colored noise, which is frequently encountered in real scenarios. In this paper, we present a novel detection method that a…

    Submitted 26 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  28. arXiv:2403.06423  [pdf, other]

    eess.SP cs.RO

    LiDAR Point Cloud-based Multiple Vehicle Tracking with Probabilistic Measurement-Region Association

    Authors: Guanhua Ding, Jianan Liu, Yuxuan Xia, Tao Huang, Bing Zhu, Jinping Sun

    Abstract: Multiple extended target tracking (ETT) has gained increasing attention due to the development of high-precision LiDAR and radar sensors in automotive applications. For LiDAR point cloud-based vehicle tracking, this paper presents a probabilistic measurement-region association (PMRA) ETT model, which can describe the complex measurement distribution by partitioning the target extent into different…

    Submitted 18 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures, accepted by the 27th International Conference on Information Fusion (FUSION 2024)

  29. arXiv:2402.16865  [pdf, other]

    eess.IV cs.CV cs.LG

    Improve Robustness of Eye Disease Detection by including Learnable Probabilistic Discrete Latent Variables into Machine Learning Models

    Authors: Anirudh Prabhakaran, YeKun Xiao, Ching-Yu Cheng, Dianbo Liu

    Abstract: Ocular diseases, ranging from diabetic retinopathy to glaucoma, present a significant public health challenge due to their prevalence and potential for causing vision impairment. Early and accurate diagnosis is crucial for effective treatment and management. In recent years, deep learning models have emerged as powerful tools for analysing medical images, including ocular imaging. However, challen…

    Submitted 20 January, 2024; originally announced February 2024.

    Comments: This is a work in progress

  30. arXiv:2402.09679  [pdf, other]

    cs.RO eess.SY

    Design and Visual Servoing Control of a Hybrid Dual-Segment Flexible Neurosurgical Robot for Intraventricular Biopsy

    Authors: Jian Chen, Mingcong Chen, Qingxiang Zhao, Shuai Wang, Yihe Wang, Ying Xiao, Jian Hu, Danny Tat Ming Chan, Kam Tong Leo Yeung, David Yuen Chung Chan, Hongbin Liu

    Abstract: Traditional rigid endoscopes have challenges in flexibly treating tumors located deep in the brain, and low operability and fixed viewing angles limit its development. This study introduces a novel dual-segment flexible robotic endoscope MicroNeuro, designed to perform biopsies with dexterous surgical manipulation deep in the brain. Taking into account the uncertainty of the control model, an imag…

    Submitted 23 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by IEEE International Conference on Robotics and Automation (ICRA) 2024, 7 pages, 9 figures

  31. arXiv:2402.09372  [pdf, other]

    eess.IV cs.AI cs.CV

    Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

    Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

    Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmar…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

  32. arXiv:2402.07383  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

    Authors: Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

    Abstract: Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their applications and user experience. While there have been prior works to generate natural laughter, they fell short in terms of controlling the timing an…

    Submitted 4 March, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: See https://aka.ms/elate/ for demo samples, v2: subjective evaluation has been added

  33. arXiv:2402.03230  [pdf, other]

    eess.IV cs.CV cs.LG

    Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation

    Authors: Arash Harirpoush, Amirhossein Rasoulian, Marta Kersten-Oertel, Yiming Xiao

    Abstract: Recent rising interests in patient-specific thoracic surgical planning and simulation require efficient and robust creation of digital anatomical models from automatic medical image segmentation algorithms. Deep learning (DL) is now state-of-the-art in various radiological tasks, and U-shaped DL models have particularly excelled in medical image segmentation since the inception of the 2D UNet. To…

    Submitted 14 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  34. arXiv:2402.02781  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Dual Knowledge Distillation for Efficient Sound Event Detection

    Authors: Yang Xiao, Rohan Kumar Das

    Abstract: Sound event detection (SED) is essential for recognizing specific sounds and their temporal locations within acoustic signals. This becomes challenging particularly for on-device applications, where computational resources are limited. To address this issue, we introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems in this work. Our proposed dua…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024 (Deep Neural Network Model Compression Workshop)

  35. arXiv:2402.01271  [pdf, other]

    eess.AS cs.SD

    An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

    Authors: Linping Xu, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin, Ferdous Sohel

    Abstract: Recently, neural networks have proven to be effective in performing speech coding task at low bitrates. However, under-utilization of intra-frame correlations and the error of quantizer specifically degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleave…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: INTERSPEECH 2023

  36. arXiv:2401.07139  [pdf, other]

    cs.CV cs.AI eess.IV

    Deep Blind Super-Resolution for Satellite Video

    Authors: Yi Xiao, Qiangqiang Yuan, Qiang Zhang, Liangpei Zhang

    Abstract: Recent efforts have witnessed remarkable progress in Satellite Video Super-Resolution (SVSR). However, most SVSR methods usually assume the degradation is fixed and known, e.g., bicubic downsampling, which makes them vulnerable in real-world scenes with multiple and unknown degradations. To alleviate this issue, blind SR has thus become a research hotspot. Nevertheless, existing approaches are mai…

    Submitted 13 January, 2024; originally announced January 2024.

    Comments: Published in IEEE TGRS

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-16, 2023, Art no. 5516316

  37. arXiv:2401.05606  [pdf]

    eess.SP

    Weiss-Weinstein bound of frequency estimation error for very weak GNSS signals

    Authors: Xin Zhang, Xingqun Zhan, Jihong Huang, Jiahui Liu, Yingchao Xiao

    Abstract: Tightness remains the central quest in all modern estimation bounds. For very weak signals, this is made possible with judicious choices of prior probability distribution and bound family. While current bounds in GNSS assess performance of carrier frequency estimators under Gaussian or uniform assumptions, the circular nature of frequency is overlooked. In addition, of all bounds in Bayesian framewo…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 35 pages, 13 figures, submitted to NAVIGATION, Journal of the Institute of Navigation

  38. arXiv:2401.04389  [pdf, other]

    cs.SD eess.AS

    RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous framework based on a two-stage network and propose an upgraded model. Specifically, we replace the repairing network with COM-Net from TEA-PSE. In addition, multi-resolution discriminators and multi-band discriminators are adopted in the training…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: submitted to ICASSP 2024

  39. arXiv:2401.03687  [pdf, other]

    eess.AS cs.SD

    BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators

    Authors: Zihan Zhang, Jiayao Sun, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: Packet loss is a common and unavoidable problem in voice over internet phone (VoIP) systems. To deal with the problem, we propose a band-split packet loss concealment network (BS-PLCNet). Specifically, we split the full-band signal into wide-band (0-8kHz) and high-band (8-24kHz). The wide-band signals are processed by a gated convolutional recurrent network (GCRN), while the high-band counterpart…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: submitted to ICASSP 2024

  40. arXiv:2401.02961  [pdf, other]

    cs.LG cs.CV eess.IV physics.optics

    A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design

    Authors: Manna Dai, Yang Jiang, Feng Yang, Joyjit Chattoraj, Yingzhi Xia, Xinxing Xu, Weijiang Zhao, My Ha Dao, Yong Liu

    Abstract: Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel in achieving intricate spectral responses compared to regular-shape counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that…

    Submitted 18 October, 2023; originally announced January 2024.

  41. arXiv:2401.02831  [pdf, other]

    cs.CV eess.IV

    Two-stage Progressive Residual Dense Attention Network for Image Denoising

    Authors: Wencong Wu, An Ge, Guannan Lv, Yuelong Xia, Yungang Zhang, Wen Xiong

    Abstract: Deep convolutional neural networks (CNNs) for image denoising can effectively exploit rich hierarchical features and have achieved great success. However, many deep CNN-based denoising models equally utilize the hierarchical features of noisy images without paying attention to the more important and useful features, leading to relatively low performance. To address the issue, we design a new Two-s…

    Submitted 5 January, 2024; originally announced January 2024.

  42. arXiv:2401.01912  [pdf, other]

    cs.CV cs.LG eess.IV

    Shrinking Your TimeStep: Towards Low-Latency Neuromorphic Object Recognition with Spiking Neural Network

    Authors: Yongqi Ding, Lin Zuo, Mengmeng Jing, Pei He, Yongjun Xiao

    Abstract: Neuromorphic object recognition with spiking neural networks (SNNs) is the cornerstone of low-power neuromorphic computing. However, existing SNNs suffer from significant latency, utilizing 10 to 40 timesteps or more, to recognize neuromorphic objects. At low latencies, the performance of existing SNNs is drastically degraded. In this work, we propose the Shrinking SNN (SSNN) to achieve low-latenc…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  43. arXiv:2312.17527  [pdf, ps, other]

    cs.PL eess.SY

    Data-Driven Template-Free Invariant Generation

    Authors: Yuan Xia, Jyotirmoy V. Deshmukh, Mukund Raghothaman, Srivatsan Ravi

    Abstract: Automatic verification of concurrent programs faces state explosion due to the exponentially many possible interleavings of their sequential components coupled with large or infinite state spaces. An alternative is deductive verification, where given a candidate invariant, we establish inductive invariance and show that any state satisfying the invariant is also safe. However, learning (inductive) program…

    Submitted 29 December, 2023; originally announced December 2023.

  44. arXiv:2312.16963  [pdf, other]

    eess.IV cs.CV

    FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information

    Authors: Yichong Xia, Yujun Huang, Bin Chen, Haoqian Wang, Yaowei Wang

    Abstract: Multi-view compression technology, especially Stereo Image Compression (SIC), plays a crucial role in car-mounted cameras and 3D-related applications. Interestingly, the Distributed Source Coding (DSC) theory suggests that efficient data compression of correlated sources can be achieved through independent encoding and joint decoding. This motivates the rapidly developed deep-distributed SIC metho…

    Submitted 29 December, 2023; v1 submitted 28 December, 2023; originally announced December 2023.

  45. arXiv:2312.10741  [pdf, other]

    eess.AS cs.CL cs.SD

    StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

    Authors: Yu Zhang, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

    Abstract: Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expr…

    Submitted 2 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  46. arXiv:2312.09576  [pdf, other]

    eess.IV cs.CV

    SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

    Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

    Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)

  47. arXiv:2312.03423  [pdf, other]

    eess.SP

    Markov Chain Monte Carlo Multi-Scan Data Association for Sets of Trajectories

    Authors: Yuxuan Xia, Ángel F. García-Fernández, Lennart Svensson

    Abstract: This paper considers a batch solution to the multi-object tracking problem based on sets of trajectories. Specifically, we present two offline implementations of the trajectory Poisson multi-Bernoulli mixture (TPMBM) filter for batch data based on Markov chain Monte Carlo (MCMC) sampling of the data association hypotheses. In contrast to online TPMBM implementations, the proposed offline implement…

    Submitted 23 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Accepted for publication in IEEE Transactions on Aerospace and Electronic Systems. MATLAB implementation available at https://github.com/yuhsuansia/Batch-TPMBM-using-MCMC-sampling

  48. arXiv:2311.18333  [pdf, other]

    math.NA eess.SP

    Spherical Designs for Function Approximation and Beyond

    Authors: Yuchen Xiao, Xiaosheng Zhuang

    Abstract: In this paper, we compare two optimization algorithms, one using the full Hessian and one using an approximate Hessian, to obtain numerical spherical designs through their variational characterization. Based on the obtained spherical design point sets, we investigate the approximation of smooth and non-smooth functions by spherical harmonics with spherical designs. Finally, we use spherical framelets for denoising Wen…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 29 pages, 9 figures, 7 tables

    MSC Class: 42C05; 58C35; 65K10; 65D15; 65D32

  49. arXiv:2311.16771  [pdf, other]

    stat.ML cs.LG eess.SP

    The HR-Calculus: Enabling Information Processing with Quaternion Algebra

    Authors: Danilo P. Mandic, Sayed Pouria Talebi, Clive Cheong Took, Yili Xia, Dongpo Xu, Min Xiang, Pauline Bourigault

    Abstract: From their inception, quaternions and their division algebra have proven to be advantageous in modelling rotation/orientation in three-dimensional spaces and have seen use from the initial formulation of electromagnetic field theory through to forming the basis of quantum field theory. Despite their impressive versatility in modelling real-world phenomena, adaptive information processing technique…

    Submitted 28 November, 2023; originally announced November 2023.

  50. arXiv:2311.13706  [pdf, other]

    eess.IV cs.CV

    Multi-view Hybrid Graph Convolutional Network for Volume-to-mesh Reconstruction in Cardiovascular MRI

    Authors: Nicolás Gaggion, Benjamin A. Matheson, Yan Xia, Rodrigo Bonazzola, Nishant Ravikumar, Zeike A. Taylor, Diego H. Milone, Alejandro F. Frangi, Enzo Ferrante

    Abstract: Cardiovascular magnetic resonance imaging is emerging as a crucial tool to examine cardiac morphology and function. Essential to this endeavour are anatomical 3D surface and volumetric meshes derived from CMR images, which facilitate computational anatomy studies, biomarker discovery, and in-silico simulations. However, conventional surface mesh generation methods, such as active shape models and…

    Submitted 22 November, 2023; originally announced November 2023.