Search | arXiv e-print repository

Showing 1–50 of 99 results for author: Zhu, W

Searching in archive eess.
  1. arXiv:2409.07862  [pdf, other]

    eess.IV cs.CV

    Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement

    Authors: Vamsi Krishna Vasa, Peijie Qiu, Wenhui Zhu, Yujian Xiong, Oana Dumitrascu, Yalin Wang

    Abstract: Retinal fundus photography offers a non-invasive way to diagnose and monitor a variety of retinal diseases, but is prone to inherent quality glitches arising from systemic imperfections or operator/patient-related factors. However, high-quality retinal images are crucial for carrying out accurate diagnoses and automated analyses. The fundus image enhancement is typically formulated as a distributi…

    Submitted 12 September, 2024; originally announced September 2024.

  2. arXiv:2408.15667  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers

    Authors: Qian Wang, Zhaoyang Bu, Jiaxuan Mao, Wenyu Zhu, Jingya Zhao, Wei Du, Guochao Shi, Min Zhou, Si Chen, Jieming Qu

    Abstract: Recent advancements in deep learning techniques have sparked performance boosts in various real-world applications including disease diagnosis based on multi-modal medical data. Cough sound data-based respiratory disease (e.g., COVID-19 and Chronic Obstructive Pulmonary Disease) diagnosis has also attracted much attention. However, existing works usually utilise traditional machine learning or dee…

    Submitted 2 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  3. arXiv:2408.03174  [pdf, ps, other]

    eess.SP cs.IT

    Joint Transmission and Compression Optimization for Networked Sensing with Limited-Capacity Fronthaul Links

    Authors: Weifeng Zhu, Shuowen Zhang, Liang Liu

    Abstract: This paper considers networked sensing in cellular network, where multiple base stations (BSs) first compress their received echo signals from multiple targets and then forward the quantized signals to the central unit (CU) via limited-capacity fronthaul links, such that the CU can leverage all useful echo signals to perform high-resolution localization. Under this setup, we manage to characterize…

    Submitted 6 September, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: submitted to IEEE TWC; conference paper accepted by IEEE Globecom 2024

  4. arXiv:2407.13092  [pdf, other]

    eess.IV cs.CV

    CC-DCNet: Dynamic Convolutional Neural Network with Contrastive Constraints for Identifying Lung Cancer Subtypes on Multi-modality Images

    Authors: Yuan Jin, Gege Ma, Geng Chen, Tianling Lyu, Jan Egger, Junhui Lyu, Shaoting Zhang, Wentao Zhu

    Abstract: The accurate diagnosis of pathological subtypes of lung cancer is of paramount importance for follow-up treatments and prognosis managements. Assessment methods utilizing deep learning technologies have introduced novel approaches for clinical diagnosis. However, the majority of existing models rely solely on single-modality image input, leading to limited diagnostic accuracy. To this end, we prop…

    Submitted 17 July, 2024; originally announced July 2024.

  5. arXiv:2407.12271  [pdf, other]

    cs.CV eess.IV

    RBAD: A Dataset and Benchmark for Retinal Vessels Branching Angle Detection

    Authors: Hao Wang, Wenhui Zhu, Jiayou Qin, Xin Li, Oana Dumitrascu, Xiwen Chen, Peijie Qiu, Abolfazl Razi

    Abstract: Detecting retinal image analysis, particularly the geometrical features of branching points, plays an essential role in diagnosing eye diseases. However, existing methods used for this purpose often are coarse-level and lack fine-grained analysis for efficient annotation. To mitigate these issues, this paper proposes a novel method for detecting retinal branching angles using a self-configured ima…

    Submitted 16 July, 2024; originally announced July 2024.

  6. arXiv:2407.06612  [pdf]

    eess.IV cs.CV cs.LG

    AI-based Automatic Segmentation of Prostate on Multi-modality Images: A Review

    Authors: Rui Jin, Derun Li, Dehui Xiang, Lei Zhang, Hailing Zhou, Fei Shi, Weifang Zhu, Jing Cai, Tao Peng, Xinjian Chen

    Abstract: Prostate cancer represents a major threat to health. Early detection is vital in reducing the mortality rate among prostate cancer patients. One approach involves using multi-modality (CT, MRI, US, etc.) computer-aided diagnosis (CAD) systems for the prostate region. However, prostate segmentation is challenging due to imperfections in the images and the prostate's complex tissue structure. The ad…

    Submitted 9 July, 2024; originally announced July 2024.

  7. arXiv:2407.03575  [pdf, other]

    eess.IV cs.CV

    DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification

    Authors: Wenhui Zhu, Xiwen Chen, Peijie Qiu, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

    Abstract: Multiple instance learning (MIL) stands as a powerful approach in weakly supervised learning, regularly employed in histological whole slide image (WSI) classification for detecting tumorous lesions. However, existing mainstream MIL methods focus on modeling correlation between instances while overlooking the inherent diversity among instances. However, few MIL methods have aimed at diversity mode…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  8. arXiv:2406.14896  [pdf, other]

    eess.IV cs.CV

    SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation

    Authors: Wenhui Zhu, Xiwen Chen, Peijie Qiu, Mohammad Farazi, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

    Abstract: Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have also been dedicated to improving the performance of standard UNet, few have conducted in-depth analyses of the underlying interest pattern of UNet in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important facto…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted as a conference paper to 2024 MICCAI

  9. arXiv:2406.10856  [pdf, other]

    cs.NI eess.SY

    LEO Satellite Networks Assisted Geo-distributed Data Processing

    Authors: Zhiyuan Zhao, Zhe Chen, Zheng Lin, Wenjun Zhu, Kun Qiu, Chaoqun You, Yue Gao

    Abstract: Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective soluti…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures

  10. arXiv:2405.19665  [pdf]

    eess.SY cs.AI cs.LG

    A novel fault localization with data refinement for hydroelectric units

    Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

    Abstract: Due to the scarcity of fault samples and the complexity of non-linear and non-smooth characteristics data in hydroelectric units, most of the traditional hydroelectric unit fault localization methods are difficult to carry out accurate localization. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)- manifold-boosted deep learni…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)

  11. arXiv:2403.12425  [pdf, other]

    cs.CV cs.SD eess.AS

    Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation

    Authors: Jun Yu, Gongpeng Zhao, Yongqi Wang, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu

    Abstract: This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Through the utilization of Temporal Convolutional Network (TCN) modules, we effectively captured the temporal and spatial correlations between these features. Subsequently, we…

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  12. arXiv:2403.11757  [pdf, other]

    cs.MM cs.LG cs.SD eess.AS

    Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation

    Authors: Jun Yu, Wangyuan Zhu, Jichao Zhu

    Abstract: In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of 6th Affective Behavior Analysis in-the-wild (ABAW) Competition. The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration", "Amusement", "Determination", "Empathic Pain"…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  13. arXiv:2402.11834  [pdf, ps, other]

    eess.SY eess.SP

    Terahertz User-Centric Clustering in the Presence of Beam Misalignment

    Authors: Khaled Humadi, Imene Trigui, Wei-Ping Zhu, Wessam Ajib

    Abstract: Beam misalignment is one of the main challenges for the design of reliable wireless systems in terahertz (THz) bands. This paper investigates how to apply user-centric base station (BS) clustering as a valuable add-on in THz networks. In particular, to reduce the impact of beam misalignment, a user-centric BS clustering design that provides multi-connectivity via BS cooperation is investigated. Th…

    Submitted 18 February, 2024; originally announced February 2024.

  14. arXiv:2402.10388  [pdf]

    cs.CY eess.SP

    Improvising Age Verification Technologies in Canada: Technical, Regulatory and Social Dynamics

    Authors: Azfar Adib, Wei-Ping Zhu, M. Omair Ahmad

    Abstract: Age verification, which is a mandatory legal requirement for delivering certain age-appropriate services or products, has recently been emphasized around the globe to ensure online safety for children. The rapid advancement of artificial intelligence has facilitated the recent development of some cutting-edge age-verification technologies, particularly using biometrics. However, successful deploym…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Presented and accepted for publication in the 2023 IEEE International Humanitarian Technologies Conference (IEEE IHTC 2023), November 1 to 3, 2023, Cartagena, Colombia

  15. arXiv:2401.04154  [pdf]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

    Authors: Wentao Zhu

    Abstract: Audio and video are two most common modalities in the mainstream media platforms, e.g., YouTube. To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer, AVT, leveraging the effective spatio-temporal representation by the video Transformer to improve action recognition accuracy. For multimodal fusion, simply conc…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by WACV 2024; well-formatted PDF is in https://drive.google.com/file/d/1qvW52lamsvNGMCqPS7q8g8L4NaR_LlbR/view?usp=sharing. arXiv admin note: text overlap with arXiv:2401.04023

  16. arXiv:2401.04023  [pdf]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification

    Authors: Wentao Zhu

    Abstract: In recent years, researchers combine both audio and video signals to deal with challenges where actions are not well represented or captured by visual cues. However, how to effectively leverage the two modalities is still under development. In this work, we develop a multiscale multimodal Transformer (MMT) that leverages hierarchical representation learning. Particularly, MMT is composed of a nove…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by WACV 2024; well-formatted PDF is in https://drive.google.com/file/d/10Zo_ydJZFAm7YsxHDgTjhyc4dEJbW_dk/view?usp=sharing

  17. arXiv:2312.16228  [pdf, other]

    cs.SD cs.LG cs.MM cs.NE eess.AS

    Deformable Audio Transformer for Audio Event Detection

    Authors: Wentao Zhu

    Abstract: Transformers have achieved promising results on a variety of tasks. However, the quadratic complexity in self-attention computation has limited the applications, especially in low-resource settings and mobile or edge devices. Existing works have proposed to exploit hand-crafted attention patterns to reduce computation complexity. However, such hand-crafted patterns are data-agnostic and may not be…

    Submitted 7 January, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

    Comments: ICASSP 2024. arXiv admin note: substantial text overlap with arXiv:2201.00520 by other authors

  18. arXiv:2312.05786  [pdf, other]

    eess.SP cs.IT

    Deep Learning for Joint Design of Pilot, Channel Feedback, and Hybrid Beamforming in FDD Massive MIMO-OFDM Systems

    Authors: Junyi Yang, Weifeng Zhu, Shu Sun, Xiaofeng Li, Xingqin Lin, Meixia Tao

    Abstract: This letter considers the transceiver design in frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems for high-quality data transmission. We propose a novel deep learning based framework where the procedures of pilot design, channel feedback, and hybrid beamforming are realized by carefully crafted deep neural networ…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures, accepted by IEEE Communication Letters

  19. arXiv:2312.05557  [pdf, ps, other]

    cs.IT eess.SP

    Long-Term Rate-Fairness-Aware Beamforming Based Massive MIMO Systems

    Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, Y. Fang, H. V. Poor, L. Hanzo

    Abstract: This is the first treatise on multi-user (MU) beamforming designed for achieving long-term rate-fairness in full-dimensional MU massive multi-input multi-output (m-MIMO) systems. Explicitly, based on the channel covariances, which can be assumed to be known beforehand, we address this problem by optimizing the following objective functions: the users' signal-to-leakage-noise ratios (SLNRs) using SLN…

    Submitted 9 December, 2023; originally announced December 2023.

  20. arXiv:2311.14264  [pdf, ps, other]

    eess.SP

    An ADMM-Based Geometric Configuration Optimization in RSSD-Based Source Localization By UAVs with Spread Angle Constraint

    Authors: Xin Cheng, Guangjie Han, Jinlin Peng, Jinfang Jiang, Yu He, Weiqiang Zhu, Feng Shu, Jiangzhou Wang

    Abstract: Deploying multiple unmanned aerial vehicles (UAVs) to locate a signal-emitting source covers a wide range of military and civilian applications like rescue and target tracking. It is well known that the UAVs-source (sensors-target) geometry, namely geometric configuration, significantly affects the final localization accuracy. This paper focuses on the geometric configuration optimization for rece…

    Submitted 17 July, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  21. arXiv:2310.17155  [pdf, ps, other]

    cs.IT eess.SP

    Max-min Rate Optimization of Low-Complexity Hybrid Multi-User Beamforming Maintaining Rate-Fairness

    Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, H. V. Poor, L. Hanzo

    Abstract: A wireless network serving multiple users in the millimeter-wave or the sub-terahertz band by a base station is considered. High-throughput multi-user hybrid-transmit beamforming is conceived by maximizing the minimum rate of the users. For the sake of energy-efficient signal transmission, the array-of-subarrays structure is used for analog beamforming relying on low-resolution phase shifters. We…

    Submitted 26 October, 2023; originally announced October 2023.

  22. arXiv:2310.10095  [pdf, other]

    eess.IV cs.CV cs.LG

    A Multi-Scale Spatial Transformer U-Net for Simultaneously Automatic Reorientation and Segmentation of 3D Nuclear Cardiac Images

    Authors: Yangfan Ni, Duo Zhang, Gege Ma, Lijun Lu, Zhongke Huang, Wentao Zhu

    Abstract: Accurate reorientation and segmentation of the left ventricular (LV) is essential for the quantitative analysis of myocardial perfusion imaging (MPI), in which one critical step is to reorient the reconstructed transaxial nuclear cardiac images into standard short-axis slices for subsequent image processing. Small-scale LV myocardium (LV-MY) region detection and the diverse cardiac structures of i…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 17 pages, 7 figures

  23. arXiv:2308.12198  [pdf, other]

    eess.SP cs.IT

    Hierarchical Beam Alignment for Millimeter-Wave Communication Systems: A Deep Learning Approach

    Authors: Junyi Yang, Weifeng Zhu, Meixia Tao, Shu Sun

    Abstract: Fast and precise beam alignment is crucial for high-quality data transmission in millimeter-wave (mmWave) communication systems, where large-scale antenna arrays are utilized to overcome the severe propagation loss. To tackle the challenging problem, we propose a novel deep learning-based hierarchical beam alignment method for both multiple-input single-output (MISO) and multiple-input multiple-ou…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 15 pages, 16 figures, to appear in Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2209.03643

  24. arXiv:2308.04663  [pdf, other]

    eess.IV cs.CV cs.LG

    Classification of lung cancer subtypes on CT images with synthetic pathological priors

    Authors: Wentao Zhu, Yuan Jin, Gege Ma, Geng Chen, Jan Egger, Shaoting Zhang, Dimitris N. Metaxas

    Abstract: The accurate diagnosis on pathological subtypes for lung cancer is of significant importance for the follow-up treatments and prognosis managements. In this paper, we propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies stating that cross-scale associations exist in the image patterns betwe…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 16 pages, 7 figures

    Journal ref: Medical Image Analysis 95, July 2024, 103199

  25. arXiv:2306.15942  [pdf, other]

    cs.SD cs.AI eess.AS

    Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

    Authors: Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao, Yujun Wang

    Abstract: Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input fea…

    Submitted 28 June, 2023; originally announced June 2023.

  26. arXiv:2306.11958  [pdf, other]

    physics.med-ph eess.IV

    PDS-MAR: a fine-grained Projection-Domain Segmentation-based Metal Artifact Reduction method for intraoperative CBCT images with guidewires

    Authors: Tianling Lyu, Zhan Wu, Gege Ma, Chen Jiang, Xinyun Zhong, Yan Xi, Yang Chen, Wentao Zhu

    Abstract: Since the invention of modern CT systems, metal artifacts have been a persistent problem. Due to increased scattering, amplified noise, and insufficient data collection, it is more difficult to suppress metal artifacts in cone-beam CT, limiting its use in human- and robot-assisted spine surgeries where metallic guidewires and screws are commonly used. In this paper, we demonstrate that conventiona…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: 19 Pages

    Journal ref: Phys. Med. Biol. 68 215007 (2023)

  27. arXiv:2306.01289  [pdf, other]

    eess.IV cs.CV

    nnMobileNet: Rethinking CNN for Retinopathy Research

    Authors: Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang

    Abstract: Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability-their ability…

    Submitted 15 April, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted as a conference paper to 2024 CVPRW

  28. arXiv:2305.08014  [pdf]

    cs.CV cs.AI cs.LG eess.AS

    Surface EMG-Based Inter-Session/Inter-Subject Gesture Recognition by Leveraging Lightweight All-ConvNet and Transfer Learning

    Authors: Md. Rabiul Islam, Daniel Massicotte, Philippe Y. Massicotte, Wei-Ping Zhu

    Abstract: Gesture recognition using low-resolution instantaneous HD-sEMG images opens up new avenues for the development of more fluid and natural muscle-computer interfaces. However, the data variability between inter-session and inter-subject scenarios presents a great challenge. The existing approaches employed very large and complex deep ConvNet or 2SRNN-based domain adaptation methods to approximate th…

    Submitted 19 February, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

  29. arXiv:2304.09727  [pdf, other]

    eess.SP cs.IT

    Cooperative Multi-Cell Massive Access with Temporally Correlated Activity

    Authors: Weifeng Zhu, Meixia Tao, Xiaojun Yuan, Fan Xu, Yunfeng Guan

    Abstract: This paper investigates the problem of activity detection and channel estimation in cooperative multi-cell massive access systems with temporally correlated activity, where all access points (APs) are connected to a central unit via fronthaul links. We propose to perform user-centric AP cooperation for computation burden alleviation and introduce a generalized sliding-window detection strategy for…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: 16 pages, 17 figures, minor revision

  30. arXiv:2303.10757  [pdf, other]

    cs.SD cs.AI cs.CV cs.LG eess.AS

    Multiscale Audio Spectrogram Transformer for Efficient Audio Classification

    Authors: Wentao Zhu, Mohamed Omar

    Abstract: Audio event has a hierarchical architecture in both time and frequency and can be grouped together to construct more abstract semantic audio classes. In this work, we develop a multiscale audio spectrogram Transformer (MAST) that employs hierarchical representation learning for efficient audio classification. Specifically, MAST employs one-dimensional (and two-dimensional) pooling operators along…

    Submitted 19 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  31. arXiv:2303.07704  [pdf, other]

    eess.AS cs.SD

    TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge

    Authors: Yukai Ju, Jun Chen, Shimin Zhang, Shulin He, Wei Rao, Weixin Zhu, Yannan Wang, Tao Yu, Shidong Shang

    Abstract: This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. We expand our previous work, TEA-PSE, to its upgraded version -- TEA-PSE 3.0. Specifically, TEA-PSE 3.0 incorporates a residual LSTM after squeezed temporal convolution network (S-TCN) to enhance sequence modeling capabilities. Additionally, the local-global representation (LGR) struct…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  32. arXiv:2303.03737  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning

    Authors: Zhaoxi Mu, Xinyu Yang, Wenjing Zhu

    Abstract: Transformer has shown advanced performance in speech separation, benefiting from its ability to capture global features. However, capturing local features and channel information of audio sequences in speech separation is equally important. In this paper, we present a novel approach named Intra-SE-Conformer and Inter-Transformer (ISCIT) for speech separation. Specifically, we design a new network…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  33. arXiv:2303.03732  [pdf, other]

    cs.SD cs.LG eess.AS

    A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments

    Authors: Zhaoxi Mu, Xinyu Yang, Xiangyuan Yang, Wenjing Zhu

    Abstract: In noisy and reverberant environments, the performance of deep learning-based speech separation methods drops dramatically because previous methods are not designed and optimized for such situations. To address this issue, we propose a multi-stage end-to-end learning method that decouples the difficult speech separation problem in noisy and reverberant environments into three sub-problems: speech…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  34. Low-Complexity Pareto-Optimal 3D Beamforming for the Full-Dimensional Multi-User Massive MIMO Downlink

    Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, Y. Fang, L. Hanzo

    Abstract: Full-dimensional (FD) multi-user massive multiple input multiple output (m-MIMO) systems employ large two-dimensional (2D) rectangular antenna arrays to control both the azimuth and elevation angles of signal transmission. We introduce the sum of two outer products of the azimuth and elevation beamforming vectors having moderate dimensions as a new class of FD beamforming. We show that this low-co…

    Submitted 18 February, 2023; originally announced February 2023.

  35. arXiv:2302.03003  [pdf, other]

    eess.IV cs.CV stat.ML

    OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing

    Authors: Wenhui Zhu, Peijie Qiu, Oana M. Dumitrascu, Jacob M. Sobczak, Mohammad Farazi, Zhangsihao Yang, Keshav Nandakumar, Yalin Wang

    Abstract: Non-mydriatic retinal color fundus photography (CFP) is widely available due to the advantage of not requiring pupillary dilation, however, is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged the Optimal Transport (OT) theory to propose an…

    Submitted 8 April, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted as a conference paper to The 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)

  36. arXiv:2302.02991  [pdf, other]

    eess.IV cs.CV stat.ML

    Optimal Transport Guided Unsupervised Learning for Enhancing low-quality Retinal Images

    Authors: Wenhui Zhu, Peijie Qiu, Mohammad Farazi, Keshav Nandakumar, Oana M. Dumitrascu, Yalin Wang

    Abstract: Real-world non-mydriatic retinal fundus photography is prone to artifacts, imperfections and low-quality when certain ocular or systemic co-morbidities exist. Artifacts may result in inaccuracy or ambiguity in clinical diagnoses. In this paper, we proposed a simple but effective end-to-end framework for enhancing poor-quality retinal fundus images. Leveraging the optimal transport theory, we propo…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted as a conference paper to 20th IEEE International Symposium on Biomedical Imaging (ISBI 2023)

  37. arXiv:2301.00554  [pdf]

    eess.IV

    In-situ monitoring additive manufacturing process with AI edge computing

    Authors: Wenkang Zhu, Hui Li, Yikai Zhang, Yuqing Hou, Liwei Chen

    Abstract: In-situ monitoring system can be used to monitor the quality of additive manufacturing (AM) processes. In the case of digital image correlation (DIC) based in-situ monitoring systems, high-speed cameras were used to capture images of high resolutions. This paper proposed a novel in-situ monitoring system to accelerate the process of digital images using artificial intelligence (AI) edge computing…

    Submitted 2 January, 2023; originally announced January 2023.

  38. arXiv:2211.06041  [pdf, other]

    eess.AS

    An Adapter based Multi-label Pre-training for Speech Separation and Enhancement

    Authors: Tianrui Wang, Xie Chen, Zhuo Chen, Shu Yu, Weibin Zhu

    Abstract: In recent years, self-supervised learning (SSL) has achieved tremendous success in various speech tasks due to its power to extract representations from massive unlabeled data. However, compared with tasks such as speech recognition (ASR), the improvements from SSL representation in speech separation (SS) and enhancement (SE) are considerably smaller. Based on HuBERT, this work investigates improv…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: 5 pages

  39. arXiv:2211.00002  [pdf, other]

    cs.CV eess.IV physics.data-an

    A Self-Supervised Approach to Reconstruction in Sparse X-Ray Computed Tomography

    Authors: Rey Mendoza, Minh Nguyen, Judith Weng Zhu, Vincent Dumont, Talita Perciano, Juliane Mueller, Vidya Ganapati

    Abstract: Computed tomography has propelled scientific advances in fields from biology to materials science. This technology allows for the elucidation of 3-dimensional internal structure by the attenuation of x-rays through an object at different rotations relative to the beam. By imaging 2-dimensional projections, a 3-dimensional object can be reconstructed through a computational algorithm. Imaging at a…

    Submitted 29 October, 2022; originally announced November 2022.

    Comments: NeurIPS 2022 Machine Learning and the Physical Sciences Workshop. arXiv admin note: text overlap with arXiv:2210.16709

  40. arXiv:2210.12954  [pdf, other]

    cs.IT eess.SP

    Message Passing-Based Joint User Activity Detection and Channel Estimation for Temporally-Correlated Massive Access

    Authors: Weifeng Zhu, Meixia Tao, Xiaojun Yuan, Yunfeng Guan

    Abstract: This paper studies the user activity detection and channel estimation problem in a temporally-correlated massive access system where a very large number of users communicate with a base station sporadically and each user once activated can transmit with a large probability over multiple consecutive frames. We formulate the problem as a dynamic compressed sensing (DCS) problem to exploit both the s…

    Submitted 26 January, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: 31 pages, 14 figures, minor revision

  41. arXiv:2210.11089  [pdf, other]

    eess.AS cs.SD

    Speech Dereverberation with a Reverberation Time Shortening Target

    Authors: Rui Zhou, Wenye Zhu, Xiaofei Li

    Abstract: This work proposes a new learning target based on reverberation time shortening (RTS) for speech dereverberation. The learning target for dereverberation is usually set as the direct-path speech or optionally with some early reflections. This type of target suddenly truncates the reverberation, and thus it may not be suitable for network training. The proposed RTS target suppresses reverberation a…

    Submitted 5 June, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2204.08765

  42. arXiv:2210.08802  [pdf, other]

    eess.AS cs.SD

    Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement

    Authors: Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang

    Abstract: Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signals. To make full use of spatial information and neural network based masking estimation, we propose a multi-channel denoising neural network -- Spatial DCCRN. Firstly, we extend S-DCCRN to multi-channel scenario, aiming at performing cascaded su…

    Submitted 17 October, 2022; originally announced October 2022.

  43. arXiv:2210.05946  [pdf, other]

    eess.IV cs.CV stat.ML

    Self-Supervised Equivariant Regularization Reconciles Multiple Instance Learning: Joint Referable Diabetic Retinopathy Classification and Lesion Segmentation

    Authors: Wenhui Zhu, Peijie Qiu, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang

    Abstract: Lesion appearance is a crucial clue for medical providers to distinguish referable diabetic retinopathy (rDR) from non-referable DR. Most existing large-scale DR datasets contain only image-level labels rather than pixel-based annotations. This motivates us to develop algorithms to classify rDR and segment lesions via image-level labels. This paper leverages self-supervised equivariant learning an…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 7 pages, 2 tables, 3 figures. 18th International Symposium on Medical Information Processing and Analysis

  44. arXiv:2209.03643  [pdf, ps, other]

    eess.SP

    Deep Learning for Hierarchical Beam Alignment in mmWave Communication Systems

    Authors: Junyi Yang, Weifeng Zhu, Meixia Tao

    Abstract: Fast and precise beam alignment is crucial to support high-quality data transmission in millimeter wave (mmWave) communication systems. In this work, we propose a novel deep learning based hierarchical beam alignment method that learns two tiers of probing codebooks (PCs) and uses their measurements to predict the optimal beam in a coarse-to-fine searching manner. Specifically, the proposed method…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: 6 pages, 6 figures, accepted by GLOBECOM 2022

  45. arXiv:2206.04289  [pdf, other]

    eess.IV cs.CV

    A No-Reference Deep Learning Quality Assessment Method for Super-resolution Images Based on Frequency Maps

    Authors: Zicheng Zhang, Wei Sun, Xiongkuo Min, Wenhan Zhu, Tao Wang, Wei Lu, Guangtao Zhai

    Abstract: To support the application scenarios where high-resolution (HR) images are urgently needed, various single image super-resolution (SISR) algorithms have been developed. However, SISR is an ill-posed inverse problem, which may introduce artifacts such as texture shift and blur into the reconstructed images; thus, it is necessary to evaluate the quality of super-resolution images (SRIs). Note that most existing…

    Submitted 9 June, 2022; originally announced June 2022.

  46. arXiv:2205.07494  [pdf, other]

    eess.SP cs.IT

    Double-Sided Information Aided Temporal-Correlated Massive Access

    Authors: Weifeng Zhu, Meixia Tao, Yunfeng Guan

    Abstract: This letter considers temporal-correlated massive access, where each device, once activated, is likely to transmit continuously over several consecutive frames. Motivated by the fact that the device activity at each frame is correlated not only with its previous frame but also with its next frame, we propose a double-sided information (DSI) aided joint activity detection and channel estimation algorithm based on…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: 6 pages, 5 figures

  47. arXiv:2204.08765  [pdf, other]

    eess.AS cs.SD eess.SP

    Speech Dereverberation with a Reverberation Time Shortening Target

    Authors: Rui Zhou, Wenye Zhu, Xiaofei Li

    Abstract: This work proposes a new learning target based on reverberation time shortening (RTS) for speech dereverberation. The learning target for dereverberation is usually set as the direct-path speech or optionally with some early reflections. This type of target suddenly truncates the reverberation, and thus it may not be suitable for network training. The proposed RTS target suppresses reverberation a…

    Submitted 20 November, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2023

  48. arXiv:2204.05571  [pdf, other]

    cs.SD cs.LG eess.AS

    Speech Emotion Recognition with Global-Aware Fusion on Multi-scale Feature Representation

    Authors: Wenjing Zhu, Xiang Li

    Abstract: Speech Emotion Recognition (SER) is a fundamental task to predict the emotion label from speech data. Recent works mostly focus on using convolutional neural networks (CNNs) to learn local attention maps on fixed-scale feature representations by viewing time-varied spectral features as images. However, rich emotional features at different scales and important global information are not able to be wel…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: 6 pages, 3 figures, ICASSP 2022

  49. arXiv:2204.00226  [pdf, other]

    eess.AS

    Multiple Confidence Gates For Joint Training Of SE And ASR

    Authors: Tianrui Wang, Weibin Zhu, Yingying Gao, Junlan Feng, Shilei Zhang

    Abstract: Joint training of a speech enhancement (SE) model and a speech recognition (ASR) model is a common solution for robust ASR in noisy environments. SE focuses on improving the auditory quality of speech, but it changes the enhanced feature distribution in uncertain ways that are detrimental to ASR. To tackle this challenge, an approach with multiple confidence gates for joint training of SE and ASR i…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: 5 pages

  50. arXiv:2203.04780  [pdf]

    eess.SP

    Intelligent resonance tracking of a microwave plasmonic resonator for compact wireless sensors

    Authors: Xuanru Zhang, Jia Wen Zhu, Tie Jun Cui

    Abstract: Plasmonic sensing has been in the spotlight for decades, and its concept and applications have been generalized to spoof surface plasmons (SSPs) in the microwave band. Here, we report a compact wireless sensor on a printed circuit board measuring 18 mm × 12 mm, which tracks the resonance frequency shift of a microwave plasmonic resonator via a software-defined scheme. The microwave plasmoni…

    Submitted 2 March, 2022; originally announced March 2022.