Spatially-Variant Degradation Model for Dataset-free Super-resolution

Authors: Shaojie Guo, Haofei Song, Qingli Li, Yan Wang

Abstract: This paper focuses on the dataset-free Blind Image Super-Resolution (BISR). Unlike existing dataset-free BISR methods that focus on obtaining a degradation kernel for the entire image, we are the first to explicitly design a spatially-variant degradation model for each pixel. Our method also benefits from having a significantly smaller number of learnable parameters compared to data-driven spatially-variant BISR methods. Concretely, each pixel's degradation kernel is expressed as a linear combination of a learnable dictionary composed of a small number of spatially-variant atom kernels. The coefficient matrices of the atom degradation kernels are derived using membership functions of fuzzy set theory. We construct a novel Probabilistic BISR model with tailored likelihood function and prior terms. Subsequently, we employ the Monte Carlo EM algorithm to infer the degradation kernels for each pixel. Our method achieves a significant improvement over other state-of-the-art BISR methods, with an average improvement of 1 dB (2x). Code will be released at https://github.com/shaojieguoECNU/SVDSR.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.01086 [pdf, ps, other]

Terahertz Communication Multi-UAV-Assisted Mobile Edge Computing System

Authors: Heekang Song, Hyowoon Seo, Wan Choi

Abstract: Mobile edge computing (MEC) and terahertz (THz)-enabled unmanned aerial vehicle (UAV) communication systems are gaining significant attention for improving user service delays in future mobile networks. This article introduces a novel multi-UAV-aided MEC system operating at THz frequencies to minimize expected user service delays, including communication and computation latency. We address this challenge by jointly optimizing UAV relay selection, power control, positioning, and user-resource association for task offloading and resource allocation. To tackle the problem's complexities, we decompose it into four subproblems, each solved optimally with our proposed algorithm. An iterative penalty dual decomposition (PDD) algorithm approximates the original problem's solution. Numerical results demonstrate that our PDD-based approach outperforms baseline algorithms in terms of expected user service delay.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.07061 [pdf, other]

Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments

Authors: Gan Gao, Andrew H. Song, Fiona Wang, David Brenes, Rui Wang, Sarah S. L. Chow, Kevin W. Bishop, Lawrence D. True, Faisal Mahmood, Jonathan T. C. Liu

Abstract: Accurate patient diagnoses based on human tissue biopsies are hindered by current clinical practice, where pathologists assess only a limited number of thin 2D tissue slices sectioned from 3D volumetric tissue. Recent advances in non-destructive 3D pathology, such as open-top light-sheet microscopy, enable comprehensive imaging of spatially heterogeneous tissue morphologies, offering the feasibility to improve diagnostic determinations. A potential early route towards clinical adoption for 3D pathology is to rely on pathologists for final diagnosis based on viewing familiar 2D H&E-like image sections from the 3D datasets. However, manual examination of the massive 3D pathology datasets is infeasible. To address this, we present CARP3D, a deep learning triage approach that automatically identifies the highest-risk 2D slices within 3D volumetric biopsy, enabling time-efficient review by pathologists. For a given slice in the biopsy, we estimate its risk by performing attention-based aggregation of 2D patches within each slice, followed by pooling of the neighboring slices to compute a context-aware 2.5D risk score. For prostate cancer risk stratification, CARP3D achieves an area under the curve (AUC) of 90.4% for triaging slices, outperforming methods relying on independent analysis of 2D sections (AUC=81.3%). These results suggest that integrating additional depth context enhances the model's discriminative capabilities. In conclusion, CARP3D has the potential to improve pathologist diagnosis via accurate triage of high-risk slices within large-volume 3D pathology datasets.

Submitted 11 June, 2024; originally announced June 2024.

Comments: CVPR CVMI 2024

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 6955-6965

arXiv:2405.02857 [pdf, other]

doi 10.1109/TMI.2024.3394033

I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis

Authors: Haofei Song, Xintian Mao, Jing Yu, Qingli Li, Yan Wang

Abstract: Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes, reconstructed with thicker slices, are anisotropic with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon that due to the mentioned nature of data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution from other views. Based on this observation, we propose an Inter-Intra-slice Interpolation Network (I$^3$Net), which fully explores information from high in-plane resolution and compensates for low through-plane resolution. The through-plane branch supplements the limited information contained in low through-plane resolution from high in-plane resolution and enables continual and diverse feature learning. The in-plane branch transforms features to the frequency domain and enforces an equal learning opportunity for all frequency bands in a global context learning paradigm. We further propose a cross-view block to take advantage of the information from all three views online. Extensive experiments on two public datasets demonstrate the effectiveness of I$^3$Net, which outperforms state-of-the-art super-resolution, video frame interpolation and slice interpolation methods by a large margin. We achieve 43.90 dB in PSNR, with at least 1.14 dB improvement under the upscale factor of $\times$2 on the MSD dataset with faster inference. Code is available at https://github.com/DeepMed-Lab-ECNU/Medical-Image-Reconstruction.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2403.12726 [pdf]

Small Distance Increment Method for Measuring Complex Permittivity With mmWave Radar

Authors: Hang Song, Hyun Joon Kim, Mingxia Wan, Bo Wei, Takamaro Kikkawa, Jun-ichi Takada

Abstract: Measuring the complex permittivity of material is essential in many scenarios such as quality check and component analysis. Generally, measurement methods for characterizing the material are based on the use of a vector network analyzer, which is large and not easy to use for on-site measurement, especially in the high frequency range such as millimeter wave (mmWave). In addition, some measurement methods require the destruction of samples, which is not suitable for non-destructive inspection. In this work, a small distance increment (SDI) method is proposed to non-destructively measure the complex permittivity of material. In SDI, the transmitter and receiver are configured as a monostatic radar, which faces the material under test (MUT). During the measurement, the distance between radar and MUT changes with small increments and the signals are recorded at each position. A mathematical model is formulated to depict the relationship among the complex permittivity, distance increment, and measured signals. By fitting the model, the complex permittivity of the MUT is estimated. To implement and evaluate the proposed SDI method, a commercial off-the-shelf mmWave radar is utilized and the measurement system is developed. Then, the evaluation is carried out on an acrylic plate. With the proposed method, the estimated complex permittivity of the acrylic plate shows good agreement with the literature values, demonstrating the efficacy of the SDI method for characterizing the complex permittivity of material.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2402.19470 [pdf, other]

Towards Generalizable Tumor Synthesis

Authors: Qi Chen, Xiaoxi Chen, Haorui Song, Zhiwei Xiong, Alan Yuille, Chen Wei, Zongwei Zhou

Abstract: Tumor synthesis enables the creation of artificial tumors in medical images, facilitating the training of AI models for tumor detection and segmentation. However, success in tumor synthesis hinges on creating visually realistic tumors that are generalizable across multiple organs and, furthermore, the resulting AI models being capable of detecting real tumors in images sourced from different domains (e.g., hospitals). This paper made a progressive stride toward generalizable tumor synthesis by leveraging a critical observation: early-stage tumors (< 2 cm) tend to have similar imaging characteristics in computed tomography (CT), whether they originate in the liver, pancreas, or kidneys. We have ascertained that generative AI models, e.g., Diffusion Models, can create realistic tumors generalized to a range of organs even when trained on a limited number of tumor examples from only one organ. Moreover, we have shown that AI models trained on these synthetic tumors can be generalized to detect and segment real tumors from CT volumes, encompassing a broad spectrum of patient demographics, imaging protocols, and healthcare facilities.

Submitted 28 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR 2024)

arXiv:2401.06148 [pdf, other]

doi 10.1038/s44222-023-00096-8

Artificial Intelligence for Digital and Computational Pathology

Authors: Andrew H. Song, Guillaume Jaume, Drew F. K. Williamson, Ming Y. Lu, Anurag Vaidya, Tiffany R. Miller, Faisal Mahmood

Abstract: Advances in digitizing tissue slides and the fast-paced progress in artificial intelligence, including deep learning, have boosted the field of computational pathology. This field holds tremendous potential to automate clinical diagnosis, predict patient prognosis and response to therapy, and discover new morphological biomarkers from tissue images. Some of these artificial intelligence-based systems are now getting approved to assist clinical diagnosis; however, technical barriers remain for their widespread clinical adoption and integration as a research tool. This Review consolidates recent methodological advances in computational pathology for predicting clinical end points in whole-slide images and highlights how these developments enable the automation of clinical practice and the discovery of new biomarkers. We then provide future perspectives as the field expands into a broader range of clinical and research tasks with increasingly diverse modalities of clinical data.

Submitted 12 December, 2023; originally announced January 2024.

Journal ref: Nature Reviews Bioengineering 2023

arXiv:2401.05850 [pdf, other]

Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection

Authors: Yadong Guan, Jiqing Han, Hongwei Song, Wenjie Song, Guibin Zheng, Tieran Zheng, Yongjun He

Abstract: Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared and entangled frame-wise features, which degrades the feature discrimination. To solve the problem, we propose a disentangled feature learning framework to learn a category-specific representation. Specifically, we employ different projectors to learn the frame-wise features for each category. To ensure that these features do not contain information of other categories, we maximize the common information between frame-wise features within the same category and propose a frame-wise contrastive loss. In addition, considering that the labeled data used by the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that can leverage large amounts of unlabeled data to achieve feature disentanglement. The experimental results demonstrate the effectiveness of our method.

Submitted 11 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2310.07974 [pdf, other]

Causality-based Cost Allocation for Peer-to-Peer Energy Trading in Distribution System

Authors: Hyun Joong Kim, Yong Hyun Song, Jip Kim

Abstract: While peer-to-peer energy trading has the potential to harness the capabilities of small-scale energy resources, a peer-matching process often overlooks power grid conditions, yielding increased losses, line congestion, and voltage problems. This imposes a great challenge on the distribution system operator (DSO), which can eventually limit peer-to-peer energy trading. To align the peer-matching process with the physical grid conditions, this paper proposes a cost causality-based network cost allocation method and the grid-aware peer-matching process. Building on the cost causality principle, the proposed model utilizes the network cost (loss, congestion, and voltage) as a signal to encourage peers to adjust their preferences, ensuring that matches are more in line with grid conditions and leading to enhanced social welfare. Additionally, this paper presents mathematical proof showing the superiority of the causality-based cost allocation over existing methods.

Submitted 20 February, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 7 pages, 7 figures

arXiv:2308.16551 [pdf]

Object Detection for Caries or Pit and Fissure Sealing Requirement in Children's First Permanent Molars

Authors: Chenyao Jiang, Shiyao Zhai, Hengrui Song, Yuqing Ma, Yachen Fan, Yancheng Fang, Dongmei Yu, Canyang Zhang, Sanyang Han, Runming Wang, Yong Liu, Jianbo Li, Peiwu Qin

Abstract: Dental caries is one of the most common oral diseases that, if left untreated, can lead to a variety of oral problems. It mainly occurs inside the pits and fissures on the occlusal/buccal/palatal surfaces of molars, and children are a high-risk group for pit and fissure caries in permanent molars. Pit and fissure sealing is one of the most effective and widely used methods for preventing pit and fissure caries. However, current detection of pits and fissures or caries depends primarily on the experience of dentists, which ordinary parents do not have, and children may miss remedial treatment without timely detection. To address this issue, we present a method to autodetect caries and pit and fissure sealing requirements using oral photos taken by smartphones. We use the YOLOv5 and YOLOX models and adopt a tiling strategy to reduce information loss during image pre-processing. The best result for the YOLOXs model with the tiling strategy is 72.3 mAP.5, while the best result without the tiling strategy is 71.2. The YOLOv5s6 model with/without tiling attains 70.9/67.9 mAP.5, respectively. We deploy the pre-trained network to mobile devices as a WeChat applet, allowing in-home detection by parents or children's guardians.

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.07618 [pdf, other]

Vision-based Semantic Communications for Metaverse Services: A Contest Theoretic Approach

Authors: Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Boon Hee Soong

Abstract: The popularity of Metaverse as an entertainment, social, and work platform has led to a great need for seamless avatar integration in the virtual world. In Metaverse, avatars must be updated and rendered to reflect users' behaviour. Achieving real-time synchronization between the virtual bilocation and the user is complex, placing high demands on the Metaverse Service Provider (MSP)'s rendering resource allocation scheme. To tackle this issue, we propose a semantic communication framework that leverages contest theory to model the interactions between users and MSPs and determine optimal resource allocation for each user. To reduce the consumption of network resources in wireless transmission, we use the semantic communication technique to reduce the amount of data to be transmitted. Under our simulation settings, the encoded semantic data only contains 51 bytes of skeleton coordinates instead of the image size of 8.243 megabytes. Moreover, we implement Deep Q-Network to optimize reward settings for maximum performance and efficient resource allocation. With the optimal reward setting, users are incentivized to select their respective suitable uploading frequency, reducing down-sampling loss due to rendering resource constraints by 66.076% compared with the traditional average distribution method. The framework provides a novel solution to resource allocation for avatar association in VR environments, ensuring a smooth and immersive experience for all users.

Submitted 15 August, 2023; originally announced August 2023.

Comments: 6 pages, 7 figures

arXiv:2308.02416 [pdf, other]

Local-Global Temporal Fusion Network with an Attention Mechanism for Multiple and Multiclass Arrhythmia Classification

Authors: Yun Kwan Kim, Minji Lee, Kunwook Jo, Hee Seok Song, Seong-Whan Lee

Abstract: Clinical decision support systems (CDSSs) have been widely utilized to support the decisions made by cardiologists when detecting and classifying arrhythmia from electrocardiograms (ECGs). However, forming a CDSS for the arrhythmia classification task is challenging due to the varying lengths of arrhythmias. Although the onset time of arrhythmia varies, previously developed methods have not considered such conditions. Thus, we propose a framework that consists of (i) local temporal information extraction, (ii) global pattern extraction, and (iii) local-global information fusion with attention to perform arrhythmia detection and classification with a constrained input length. The 10-class and 4-class performances of our approach were assessed by detecting the onset and offset of arrhythmia as an episode and the duration of arrhythmia based on the MIT-BIH arrhythmia database (MITDB) and MIT-BIH atrial fibrillation database (AFDB), respectively. The results were statistically superior to those achieved by the comparison models. To check the generalization ability of the proposed method, an AFDB-trained model was tested on the MITDB, and superior performance was attained compared with that of a state-of-the-art model. The proposed method can capture local-global information and dynamics without incurring information losses. Therefore, arrhythmias can be recognized more accurately, and their occurrence times can be calculated; thus, the clinical field can create more accurate treatment plans by using the proposed method.

Submitted 13 October, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

Comments: 14 pages, 6 figures

MSC Class: 68T07; 92C55

arXiv:2307.14907 [pdf, other]

Weakly Supervised AI for Efficient Analysis of 3D Pathology Samples

Authors: Andrew H. Song, Mane Williams, Drew F. K. Williamson, Guillaume Jaume, Andrew Zhang, Bowen Chen, Robert Serafin, Jonathan T. C. Liu, Alex Baras, Anil V. Parwani, Faisal Mahmood

Abstract: Human tissue and its constituent cells form a microenvironment that is fundamentally three-dimensional (3D). However, the standard-of-care in pathologic diagnosis involves selecting a few two-dimensional (2D) sections for microscopic evaluation, risking sampling bias and misdiagnosis. Diverse methods for capturing 3D tissue morphologies have been developed, but they have so far had little translation to clinical practice; manual and computational evaluations of such large 3D data have so far been impractical and/or unable to provide patient-level clinical insights. Here we present Modality-Agnostic Multiple instance learning for volumetric Block Analysis (MAMBA), a deep-learning-based platform for processing 3D tissue images from diverse imaging modalities and predicting patient outcomes. Archived prostate cancer specimens were imaged with open-top light-sheet microscopy or microcomputed tomography and the resulting 3D datasets were used to train risk-stratification networks based on 5-year biochemical recurrence outcomes via MAMBA. With the 3D block-based approach, MAMBA achieves an area under the receiver operating characteristic curve (AUC) of 0.86 and 0.74, superior to 2D traditional single-slice-based prognostication (AUC of 0.79 and 0.57), suggesting superior prognostication with 3D morphological features. Further analyses reveal that the incorporation of greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, suggesting the value of capturing larger extents of heterogeneous 3D morphology. With the rapid growth and adoption of 3D spatial biology and pathology techniques by researchers and clinicians, MAMBA provides a general and efficient framework for 3D weakly supervised learning for clinical decision support and can help to reveal novel 3D morphological biomarkers for prognosis and therapeutic response.

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.10186 [pdf, other]

doi 10.1109/LGRS.2022.3141547

Multi-Scale U-Shape MLP for Hyperspectral Image Classification

Authors: Moule Lin, Weipeng Jing, Donglin Di, Guangsheng Chen, Houbing Song

Abstract: Hyperspectral images have significant applications in various domains, since they register abundant semantic and spatial information in the spectral band with spatial variability of spectral signatures. Two critical challenges in identifying pixels of the hyperspectral image are, respectively, representing the correlated information among the local and global, and the abundant parameters of the model. To tackle these challenges, we propose a Multi-Scale U-shape Multi-Layer Perceptron (MUMLP), a model consisting of the designed MSC (Multi-Scale Channel) block and the UMLP (U-shape Multi-Layer Perceptron) structure. MSC transforms the channel dimension and mixes spectral band features to embed the deep-level representation adequately. UMLP is designed with an encoder-decoder structure of multi-layer perceptron layers, which is capable of compressing the large-scale parameters. Extensive experiments demonstrate that our model can outperform state-of-the-art methods across the board on three widely adopted public datasets, namely Pavia University, Houston 2013 and Houston 2018.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 5 pages

Journal ref: IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1-5, 2022, Art no. 6006105

arXiv:2306.06461 [pdf]

Semi-supervised Learning-based Sound Event Detection using Frequency Dynamic Convolution with Large Kernel Attention for DCASE Challenge 2023 Task 4

Authors: Ji Won Kim, Sang Won Son, Yoonah Song, Hong Kook Kim, Il Hoon Song, Jeong Eun Lim

Abstract: This report proposes a frequency dynamic convolution (FDY) with a large kernel attention (LKA)-convolutional recurrent neural network (CRNN) with a pre-trained bidirectional encoder representation from audio transformers (BEATs) embedding-based sound event detection (SED) model that employs a mean-teacher and pseudo-label approach to address the challenge of limited labeled data for DCASE 2023 Task 4. The proposed FDY with LKA integrates the FDY and LKA module to effectively capture time-frequency patterns, long-term dependencies, and high-level semantic information in audio signals. The proposed FDY with LKA-CRNN with a BEATs embedding network is initially trained on the entire DCASE 2023 Task 4 dataset using the mean-teacher approach, generating pseudo-labels for weakly labeled, unlabeled, and the AudioSet. Subsequently, the proposed SED model is retrained using the same pseudo-label approach. A subset of these models is selected for submission, demonstrating superior F1-scores and polyphonic SED score performance on the DCASE 2023 Challenge Task 4 validation dataset.

Submitted 10 June, 2023; originally announced June 2023.

Comments: DCASE 2023 Challenge Task 4A, 5 pages

arXiv:2304.00471 [pdf, other]

A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

Authors: Bo-Kyeong Kim, Jaemin Kang, Daeun Seo, Hancheol Park, Shinkook Choi, Hyoung-Kyu Song, Hyungshin Kim, Sungsu Lim

Abstract: Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite remarkable results of modern talking-face generation models, they often entail high computational burdens, which limit their efficient deployment. This study aims to develop a lightweight model for speech-driven talking-face synthesis. We build a compact generator by removing the residual blocks and reducing the channel width from Wav2Lip, a popular talking-face generator. We also present a knowledge distillation scheme to stably yet effectively train the small-capacity generator without adversarial learning. We reduce the number of parameters and MACs by 28$\times$ while retaining the performance of the original model. Moreover, to alleviate a severe performance drop when converting the whole generator to INT8 precision, we adopt a selective quantization method that uses FP16 for the quantization-sensitive layers and INT8 for the other layers. Using this mixed precision, we achieve up to a 19$\times$ speedup on edge GPUs without noticeably compromising the generation quality.

Submitted 28 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

Comments: MLSys Workshop on On-Device Intelligence, 2023; Demo: https://huggingface.co/spaces/nota-ai/compressed_wav2lip

arXiv:2303.06032 [pdf, other]

Exploring Adversarial Attacks on Neural Networks: An Explainable Approach

Authors: Justus Renkhoff, Wenkai Tan, Alvaro Velasquez, William Yichen Wang, Yongxin Liu, Jian Wang, Shuteng Niu, Lejla Begic Fazlic, Guido Dartmann, Houbing Song

Abstract: Deep Learning (DL) is being applied in various domains, especially in safety-critical applications such as autonomous driving. Consequently, it is of great significance to ensure the robustness of these methods and thus counteract uncertain behaviors caused by adversarial attacks. In this paper, we use gradient heatmaps to analyze the response characteristics of the VGG-16 model when the input images are mixed with adversarial noise and statistically similar Gaussian random noise. In particular, we compare the network response layer by layer to determine where errors occurred. Several interesting findings are derived. First, compared to Gaussian random noise, intentionally generated adversarial noise causes severe behavior deviation by distracting the area of concentration in the networks. Second, in many cases, adversarial examples only need to compromise a few intermediate blocks to mislead the final decision. Third, our experiments revealed that specific blocks are more vulnerable and easier to exploit by adversarial examples. Finally, we demonstrate that the layers $Block4\_conv1$ and $Block5\_conv1$ of the VGG-16 model are more susceptible to adversarial attacks. Our work could provide valuable insights into developing more reliable Deep Neural Network (DNN) models.

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2302.12175 [pdf, other]

doi 10.1109/TITS.2023.3248841

Communication and Control in Collaborative UAVs: Recent Advances and Future Trends

Authors: Shumaila Javaid, Nasir Saeed, Zakria Qadir, Hamza Fahim, Bin He, Houbing Song, Muhammad Bilal

Abstract: The recent progress in unmanned aerial vehicle (UAV) technology has significantly advanced UAV-based applications for military, civil, and commercial domains. Nevertheless, the challenges of establishing high-speed communication links, flexible control strategies, and developing efficient collaborative decision-making algorithms for a swarm of UAVs limit their autonomy, robustness, and reliability. Thus, a growing focus has been witnessed on collaborative communication to allow a swarm of UAVs to coordinate and communicate autonomously for the cooperative completion of tasks in a short time with improved efficiency and reliability. This work presents a comprehensive review of collaborative communication in a multi-UAV system. We thoroughly discuss the characteristics of intelligent UAVs and their communication and control requirements for autonomous collaboration and coordination. Moreover, we review various UAV collaboration tasks, summarize the applications of UAV swarm networks for dense urban environments, and present use case scenarios to highlight the current developments of UAV-based applications in various domains. Finally, we identify several exciting future research directions that need attention for advancing the research in collaborative UAVs.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2301.06567 [pdf, other]

Scalable Surface Water Mapping up to Fine-scale using Geometric Features of Water from Topographic Airborne LiDAR Data

Authors: Hunsoo Song, Jinha Jung

Abstract: Despite substantial technological advancements, the comprehensive mapping of surface water, particularly smaller bodies (<1ha), continues to be a challenge due to a lack of robust, scalable methods. Standard methods require either training labels or site-specific parameter tuning, which complicates automated mapping and introduces biases related to training data and parameters. The reliance on water's reflectance properties, including LiDAR intensity, further complicates the matter, as higher-resolution images inherently produce more noise. To mitigate these difficulties, we propose a unique method that focuses on the geometric characteristics of water instead of its variable reflectance properties. Unlike preceding approaches, our approach relies entirely on 3D coordinate observations from airborne LiDAR data, taking advantage of the principle that connected surface water remains flat due to gravity. By harnessing this natural law in conjunction with connectivity, our method can accurately and scalably identify small water bodies, eliminating the need for training labels or repetitive parameter tuning. Consequently, our approach enables the creation of comprehensive 3D topographic maps that include both water and terrain, all performed in an unsupervised manner using only airborne laser scanning data, potentially enhancing the process of generating reliable 3D topographic maps.
We validated our method across extensive and diverse landscapes, while comparing it to highly competitive Normalized Difference Water Index (NDWI)-based methods and assessing it using a reference surface water map. In conclusion, our method offers a new approach to address persistent difficulties in robust, scalable surface water mapping and 3D topographic mapping, using solely airborne LiDAR data.

Submitted 15 August, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

arXiv:2212.13654 [pdf]

Large-scale single-photon imaging

Authors: Liheng Bian, Haoze Song, Lintao Peng, Xuyang Chang, Xi Yang, Roarke Horstmeyer, Lin Ye, Tong Qin, Dezhi Zheng, Jun Zhang

Abstract: Benefiting from its single-photon sensitivity, single-photon avalanche diode (SPAD) array has been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge, due to the complex hardware manufacture craft and heavy noise disturbance of SPAD arrays. In this work, we introduce deep learning into SPAD, enabling super-resolution single-photon imaging over an order of magnitude, with significant enhancement of bit depth and imaging quality. We first studied the complex photon flow model of SPAD electronics to accurately characterize multiple physical noise sources, and collected a real SPAD image dataset (64 $\times$ 32 pixels, 90 scenes, 10 different bit depth, 3 different illumination flux, 2790 images in total) to calibrate noise model parameters. With this real-world physical noise model, we for the first time synthesized a large-scale realistic single-photon image dataset (image pairs of 5 different resolutions with maximum megapixels, 17250 scenes, 10 different bit depth, 3 different illumination flux, 2.6 million images in total) for subsequent network training. To tackle the severe super-resolution challenge of SPAD inputs with low bit depth, low resolution, and heavy noise, we further built a deep transformer network with a content-adaptive self-attention mechanism and gated fusion modules, which can dig global contextual features to remove multi-source noise and extract full-frequency details.
We applied the technique on a series of experiments including macroscopic and microscopic imaging, microfluidic inspection, and Fourier ptychography. The experiments validate the technique's state-of-the-art super-resolution SPAD imaging performance, with more than 5 dB superiority on PSNR compared to the existing methods.

Submitted 27 December, 2022; originally announced December 2022.

arXiv:2211.14771 [pdf, other]

Performance Analysis of Free-Space Information Sharing in Full-Duplex Semantic Communications

Authors: Hongyang Du, Jiacheng Wang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Boon Hee Soong

Abstract: In next-generation Internet services, such as Metaverse, the mixed reality (MR) technique plays a vital role. Yet the limited computing capacity of the user-side MR headset-mounted device (HMD) prevents its further application, especially in scenarios that require a lot of computation. One way out of this dilemma is to design an efficient information sharing scheme among users to replace the heavy and repetitive computation. In this paper, we propose a free-space information sharing mechanism based on full-duplex device-to-device (D2D) semantic communications. Specifically, the view images of MR users in the same real-world scenario may be analogous. Therefore, when one user (i.e., a device) completes some computation tasks, the user can send his own calculation results and the semantic features extracted from the user's own view image to nearby users (i.e., other devices). On this basis, other users can use the received semantic features to obtain the spatial matching of the computational results under their own view images without repeating the computation. Using generalized small-scale fading models, we analyze the key performance indicators of full-duplex D2D communications, including channel capacity and bit error probability, which directly affect the transmission of semantic information. Finally, the numerical analysis experiment proves the effectiveness of our proposed methods.

Submitted 27 November, 2022; originally announced November 2022.

arXiv:2211.09988 [pdf, ps, other]

Exploring WavLM on Speech Enhancement

Authors: Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

Abstract: There has been a surge of interest in self-supervised learning approaches for end-to-end speech encoding in recent years, as they have achieved great success. In particular, WavLM showed state-of-the-art performance on various speech processing tasks. To better understand the efficacy of self-supervised learning models for speech enhancement, in this work, we design and conduct a series of experiments with three resource conditions by combining WavLM and two high-quality speech enhancement systems. Also, we propose a regression-based WavLM training objective and a noise-mixing data configuration to further boost the downstream enhancement performance. The experiments on the DNS challenge dataset and a simulation dataset show that the WavLM benefits the speech enhancement task in terms of both speech quality and speech recognition accuracy, especially for low fine-tuning resources. For the high fine-tuning resource condition, only the word error rate is substantially improved.

Submitted 17 November, 2022; originally announced November 2022.

Comments: Accepted by IEEE SLT 2022

arXiv:2211.02419 [pdf, other]

High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

Authors: Yucong Lin, Jinhua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai, Hong Song, Jingfan Fan, Tianyu Fu, Deqiang Xiao, Feifei Wang, Jue Hou, Jian Yang

Abstract: Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function that is tailored to reflect the boundary information to enhance the boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2210.14103 [pdf, other]

Bit Error and Block Error Rate Training for ML-Assisted Communication

Authors: Reinhard Wiesmayr, Gian Marti, Chris Dick, Haochuan Song, Christoph Studer

Abstract: Even though machine learning (ML) techniques are being widely used in communications, the question of how to train communication systems has received surprisingly little attention. In this paper, we show that the commonly used binary cross-entropy (BCE) loss is a sensible choice in uncoded systems, e.g., for training ML-assisted data detectors, but may not be optimal in coded systems. We propose new loss functions targeted at minimizing the block error rate and SNR deweighting, a novel method that trains communication systems for optimal performance over a range of signal-to-noise ratios. The utility of the proposed loss functions as well as of SNR deweighting is shown through simulations in NVIDIA Sionna.

Submitted 6 March, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: A shorter version of this paper will be presented at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

arXiv:2209.01436 [pdf, other]

Augmented Deep Unfolding for Downlink Beamforming in Multi-cell Massive MIMO With Limited Feedback

Authors: Yifan Ma, Xianghao Yu, Jun Zhang, S. H. Song, Khaled B. Letaief

Abstract: In limited feedback multi-user multiple-input multiple-output (MU-MIMO) cellular networks, users send quantized information about the channel conditions to the associated base station (BS) for downlink beamforming. However, channel quantization and beamforming have been treated as two separate tasks conventionally, which makes it difficult to achieve global system optimality. In this paper, we propose an augmented deep unfolding (ADU) approach that jointly optimizes the beamforming scheme at the BSs and the channel quantization scheme at the users. In particular, the classic WMMSE beamformer is unrolled and a deep neural network (DNN) is leveraged to pre-process its input to enhance the performance. The variational information bottleneck technique is adopted to further improve the performance when the feedback capacity is strictly restricted. Simulation results demonstrate that the proposed ADU method outperforms all the benchmark schemes in terms of the system average rate.

Submitted 3 September, 2022; originally announced September 2022.

Comments: This paper has been accepted by IEEE GLOBECOM, Rio de Janeiro, Brazil, Dec. 2022

arXiv:2208.11243 [pdf, other]

doi 10.3390/rs15164105

A new explainable DTM generation algorithm with airborne LIDAR data: grounds are smoothly connected eventually

Authors: Hunsoo Song, Jinha Jung

Abstract: The digital terrain model (DTM) is fundamental geospatial data for various studies in urban, environmental, and Earth science. The reliability of the results obtained from such studies can be considerably affected by the errors and uncertainties of the underlying DTM. Numerous algorithms have been developed to mitigate the errors and uncertainties of DTM. However, most algorithms involve tricky parameter selection and complicated procedures that make the algorithm's decision rule obscure, so it is often difficult to explain and predict the errors and uncertainties of the resulting DTM. Also, previous algorithms often consider the local neighborhood of each point for distinguishing non-ground objects, which limits both search radius and contextual understanding and can be susceptible to errors particularly if point density varies. This study presents an open-source DTM generation algorithm for airborne LiDAR data that can consider beyond the local neighborhood and whose results are easily explainable, predictable, and reliable. The key assumption of the algorithm is that grounds are smoothly connected while non-grounds are surrounded by areas having sharp elevation changes. The robustness and uniqueness of the proposed algorithm were evaluated in geographically complex environments through tiling evaluation compared to other state-of-the-art algorithms.

Submitted 23 August, 2022; originally announced August 2022.

Journal ref: Remote Sensing. 2023, 15(16), 4105

arXiv:2208.00198 [pdf, other]

Intelligent Reflecting Surface-Aided Maneuvering Target Sensing: True Velocity Estimation

Authors: Lei Xie, Xianghao Yu, S. H. Song

Abstract: Maneuvering target sensing will be an important service of future vehicular networks, where precise velocity estimation is one of the core tasks. To this end, the recently proposed integrated sensing and communications (ISAC) provides a promising platform for achieving accurate velocity estimation. However, with one mono-static ISAC base station (BS), only the radial projection of the true velocity can be estimated, which causes serious estimation error. In this paper, we investigate the estimation of the true velocity of a maneuvering target with the assistance of an intelligent reflecting surface (IRS). We propose an efficient velocity estimation algorithm by exploiting the two perspectives from the BS and IRS to the target. We propose a two-stage scheme where the true velocity can be recovered based on the Doppler frequency of the BS-target link and BS-IRS-target link. Experimental results validate that the true velocity can be precisely recovered and demonstrate the advantage of adding the IRS.

Submitted 31 October, 2022; v1 submitted 30 July, 2022; originally announced August 2022.
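
The abstract of arXiv:2208.00198 notes that a single mono-static BS observes only the radial projection of the velocity, and that a second perspective makes the true velocity recoverable. A minimal 2D sketch of that geometric idea follows. It assumes each perspective already yields a clean radial speed along a known unit direction; in the paper the second measurement comes from the Doppler frequency of the BS-IRS-target link, which this sketch does not model. The function name `recover_velocity` and all numbers are hypothetical.

```python
import math

def recover_velocity(u1, r1, u2, r2):
    """Solve u1 . v = r1 and u2 . v = r2 for the 2D velocity v (Cramer's rule).

    u1, u2: unit look directions (hypothetical, assumed known).
    r1, r2: radial speeds measured along u1 and u2.
    """
    det = u1[0] * u2[1] - u1[1] * u2[0]
    if abs(det) < 1e-12:
        # Parallel look directions carry the same projection twice.
        raise ValueError("look directions must not be parallel")
    vx = (r1 * u2[1] - r2 * u1[1]) / det
    vy = (u1[0] * r2 - u2[0] * r1) / det
    return vx, vy

# Illustrative values only (not taken from the paper).
true_v = (3.0, 4.0)
u1 = (1.0, 0.0)
u2 = (math.cos(math.radians(60.0)), math.sin(math.radians(60.0)))
r1 = u1[0] * true_v[0] + u1[1] * true_v[1]
r2 = u2[0] * true_v[0] + u2[1] * true_v[1]
v_hat = recover_velocity(u1, r1, u2, r2)
```

With one direction alone, only `r1` is known and any velocity on the line `u1 . v = r1` fits, which is the ambiguity the abstract describes.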

arXiv:2207.02399 [pdf]

doi 10.1002/mrm.29833

Learning Apparent Diffusion Coefficient Maps from Accelerated Radial k-Space Diffusion-Weighted MRI in Mice using a Deep CNN-Transformer Model

Authors: Yuemeng Li, Miguel Romanello Joaquim, Stephen Pickup, Hee Kwon Song, Rong Zhou, Yong Fan

Abstract: Purpose: To accelerate radially sampled diffusion weighted spin-echo (Rad-DW-SE) acquisition method for generating high quality apparent diffusion coefficient (ADC) maps. Methods: A deep learning method was developed to generate accurate ADC maps from accelerated DWI data acquired with the Rad-DW-SE method. The deep learning method integrates convolutional neural networks (CNNs) with vision transformers to generate high quality ADC maps from accelerated DWI data, regularized by a monoexponential ADC model fitting term. A model was trained on DWI data of 147 mice and evaluated on DWI data of 36 mice, with acceleration factors of 4x and 8x compared to the original acquisition parameters. We have made our code publicly available at GitHub: https://github.com/ymli39/DeepADC-Net-Learning-Apparent-Diffusion-Coefficient-Maps, and our dataset can be downloaded at https://pennpancreaticcancerimagingresource.github.io/data.html. Results: Ablation studies and experimental results have demonstrated that the proposed deep learning model generates higher quality ADC maps from accelerated DWI data than alternative deep learning methods under comparison when their performance is quantified in whole images as well as in regions of interest, including tumors, kidneys, and muscles. Conclusions: The deep learning method with integrated CNNs and transformers provides an effective means to accurately compute ADC maps from accelerated DWI data acquired with the Rad-DW-SE method.

Submitted 1 August, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted by Magnetic Resonance in Medicine

Journal ref: Magn Reson Med 2023
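
The abstract of arXiv:2207.02399 mentions a monoexponential ADC model fitting term. For orientation, the standard monoexponential diffusion model is S(b) = S0 * exp(-b * ADC), which becomes linear after taking logarithms. The sketch below fits that model to one voxel by ordinary least squares. It illustrates the model only and is not the paper's network or loss; the function name `fit_adc` and the synthetic values are assumptions.

```python
import math

def fit_adc(b_values, signals):
    """Least-squares fit of ln S = ln S0 - b * ADC; returns (S0, ADC)."""
    n = len(b_values)
    logs = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_l = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (l - mean_l) for b, l in zip(b_values, logs))
    slope = sxy / sxx                      # slope of ln S versus b equals -ADC
    s0 = math.exp(mean_l - slope * mean_b)
    return s0, -slope

# Synthetic, noise-free voxel (illustrative values, not from the paper).
b_values = [0.0, 500.0, 1000.0]            # s/mm^2
true_s0, true_adc = 100.0, 1.0e-3          # arbitrary units, mm^2/s
signals = [true_s0 * math.exp(-b * true_adc) for b in b_values]
s0_hat, adc_hat = fit_adc(b_values, signals)
```

A regularization term built on this model would penalize the mismatch between measured signals and `s0_hat * exp(-b * adc_hat)`; how the paper weights or formulates that term is not stated in the abstract.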

arXiv:2206.09427 [pdf]

doi 10.1109/ACCESS.2023.3326326

QuDASH: Quantum-inspired rate adaptation approach for DASH video streaming

Authors: Bo Wei, Hang Song, Makoto Nakamura, Koichi Kimura, Nozomu Togawa, Jiro Katto

Abstract: Internet traffic is dramatically increasing with the development of network technologies, and video streaming accounts for a large share of the total traffic, which reveals the importance of guaranteeing the quality of content delivery service. Based on the network conditions, adaptive bitrate (ABR) control is utilized as a common technique which can choose the proper bitrate to ensure the video streaming quality. In this paper, a new bitrate control method, QuDASH, is proposed by taking advantage of the emerging quantum technology. In QuDASH, the adaptive control model is developed using quadratic unconstrained binary optimization (QUBO), which aims at increasing the average bitrate and decreasing the video rebuffering events to maximize the user quality of experience (QoE). In order to formulate the video control model, first the QUBO terms of different factors are defined regarding video quality, bitrate change, and buffer condition. Then, all the individual QUBO terms are merged to generate an objective function. By minimizing the QUBO objective function, the bitrate choice is determined from the solution. The control model is solved by Digital Annealer, which is a quantum-inspired computing technology. The evaluation of the proposed method is carried out by simulation with the throughput traces obtained in the real world under different scenarios, and a comparison with other methods is conducted. Experimental results demonstrate that the proposed QuDASH method has better performance in terms of QoE compared with other advanced ABR methods.
In 68.2% of the examined cases, QuDASH achieves the highest QoE results, which shows the superiority of QuDASH over conventional methods.

Submitted 21 October, 2023; v1 submitted 19 June, 2022; originally announced June 2022.

Comments: Accepted Version

Journal ref: IEEE Access, 2023
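
The abstract of arXiv:2206.09427 describes merging several QUBO terms into one objective and reading the bitrate choice from the minimizer. The toy sketch below shows the general shape of such a formulation: one binary variable per candidate bitrate, a utility term, and a one-hot penalty that forces exactly one choice. The utilities, the penalty weight, and the function names are hypothetical; the paper's actual terms (video quality, bitrate change, buffer condition) are not reproduced, and exhaustive search stands in for the Digital Annealer.

```python
from itertools import product

def build_toy_qubo(utilities, penalty):
    """QUBO matrix for: minimize -sum(u_i * x_i) + penalty * (sum(x_i) - 1)^2.

    Expanding the square with x_i^2 = x_i gives -penalty on the diagonal and
    2 * penalty on each pair i < j (the constant term is dropped).
    """
    n = len(utilities)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = -utilities[i] - penalty
        for j in range(i + 1, n):
            Q[i][j] = 2.0 * penalty
    return Q

def qubo_energy(x, Q):
    """Evaluate x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def solve_qubo_brute_force(Q):
    """Exhaustive search over all binary vectors; only viable for tiny n."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(x, Q))

# Three candidate bitrates with made-up utilities.
utilities = [1.0, 2.5, 2.0]
Q = build_toy_qubo(utilities, penalty=10.0)
best = solve_qubo_brute_force(Q)
```

The penalty weight must exceed the largest utility so that selecting two bitrates at once is never cheaper than selecting one.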

arXiv:2206.08885 [pdf, other]

Incorporating intratumoral heterogeneity into weakly-supervised deep learning models via variance pooling

Authors: Iain Carmichael, Andrew H. Song, Richard J. Chen, Drew F. K. Williamson, Tiffany Y. Chen, Faisal Mahmood

Abstract: Supervised learning tasks such as cancer survival prediction from gigapixel whole slide images (WSIs) are a critical challenge in computational pathology that requires modeling complex features of the tumor microenvironment. These learning tasks are often solved with deep multi-instance learning (MIL) models that do not explicitly capture intratumoral heterogeneity. We develop a novel variance pooling architecture that enables a MIL model to incorporate intratumoral heterogeneity into its predictions. Two interpretability tools based on representative patches are illustrated to probe the biological signals captured by these models. An empirical study with 4,479 gigapixel WSIs from the Cancer Genome Atlas shows that adding variance pooling onto MIL frameworks improves survival prediction performance for five cancer types.

Submitted 19 November, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: MICCAI 2022
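
The abstract of arXiv:2206.08885 argues that MIL models built on mean-style pooling miss intratumoral heterogeneity. The simplified sketch below shows why a variance statistic helps: two bags of patch features with the same mean become distinguishable once per-feature variance is appended. This is an assumption-laden illustration and not the paper's architecture; the function name `mean_variance_pool` and the toy bags are hypothetical.

```python
def mean_variance_pool(bag):
    """Pool a bag of equal-length feature vectors into [means..., variances...]."""
    n = len(bag)
    d = len(bag[0])
    means = [sum(inst[k] for inst in bag) / n for k in range(d)]
    variances = [
        sum((inst[k] - means[k]) ** 2 for inst in bag) / n for k in range(d)
    ]
    return means + variances

# Two toy bags with identical means but different spread (made-up features).
homogeneous_bag = [[1.0, 0.0], [1.0, 0.0]]
heterogeneous_bag = [[0.0, 0.0], [2.0, 0.0]]
pooled_a = mean_variance_pool(homogeneous_bag)
pooled_b = mean_variance_pool(heterogeneous_bag)
```

Mean pooling alone maps both bags to `[1.0, 0.0]`; the appended variance entries are what separate them.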

arXiv:2205.15805 [pdf, other]

Collaborative Sensing in Perceptive Mobile Networks: Opportunities and Challenges

Authors: Lei Xie, S. H. Song, Yonina C. Eldar, Khaled B. Letaief

Abstract: With the development of innovative applications that demand accurate environment information, e.g., autonomous driving, sensing becomes an important requirement for future wireless networks. To this end, integrated sensing and communication (ISAC) provides a promising platform to exploit the synergy between sensing and communication, where perceptive mobile networks (PMNs) were proposed to add accurate sensing capability to existing wireless networks. The well-developed cellular networks offer exciting opportunities for sensing, including large coverage, strong computation and communication power, and most importantly networked sensing, where the perspectives from multiple sensing nodes can be collaboratively utilized for sensing the same target. However, PMNs also face big challenges such as the inherent interference between sensing and communication, the complex sensing environment, and the tracking of high-speed targets by cellular networks. This paper provides a comprehensive review on the design of PMNs, covering the popular network architectures, sensing protocols, standing research problems, and available solutions. Several future research directions that are critical for the development of PMNs are also discussed.

Submitted 31 May, 2022; originally announced May 2022.

arXiv:2205.14585 [pdf, other]

An unsupervised, open-source workflow for 2D and 3D building mapping from airborne LiDAR data

Authors: Hunsoo Song, Jinha Jung

Abstract: Despite the substantial demand for high-quality, large-area building maps, no established open-source workflow for generating 2D and 3D maps currently exists. This study introduces an automated, open-source workflow for large-scale 2D and 3D building mapping utilizing airborne LiDAR data. Uniquely, our workflow operates entirely unsupervised, eliminating the need for any training procedures. We have integrated a specifically tailored DTM generation algorithm into our workflow to prevent errors in complex urban landscapes, especially around highways and overpasses. Through fine rasterization of LiDAR point clouds, we've enhanced building-tree differentiation, reduced errors near water bodies, and augmented computational efficiency by introducing a new planarity calculation. Our workflow offers a practical and scalable solution for the mass production of rasterized 2D and 3D building maps from raw airborne LiDAR data. Also, we elaborate on the influence of parameters and potential error sources to provide users with practical guidance. Our method's robustness has been rigorously optimized and tested using an extensive dataset (> 550 km$^2$), and further validated through comparison with deep learning-based and hand-digitized products. Notably, through these unparalleled, large-scale comparisons, we offer a valuable analysis of large-scale building maps generated via different methodologies, providing insightful evaluations of the effectiveness of each approach.
We anticipate that our highly scalable building mapping workflow will facilitate the production of reliable 2D and 3D building maps, fostering advances in large-scale urban analysis. The code will be released upon publication.

Submitted 15 August, 2023; v1 submitted 29 May, 2022; originally announced May 2022.

arXiv:2205.10620 [pdf, other]

GNN-Enhanced Approximate Message Passing for Massive/Ultra-Massive MIMO Detection

Authors: Hengtao He, Alva Kosasih, Xianghao Yu, Jun Zhang, S. H. Song, Wibowo Hardjawana, Khaled B. Letaief

Abstract: Efficient massive/ultra-massive multiple-input multiple-output (MIMO) detection algorithms with satisfactory performance and low complexity are critical to meet the high throughput and ultra-low latency requirements in 5G and beyond communications, given the extremely large number of antennas. In this paper, we propose a low-complexity graph neural network (GNN) enhanced approximate message passing (AMP) algorithm, AMP-GNN, for massive/ultra-massive MIMO detection. The structure of the neural network is customized by unfolding the AMP algorithm and introducing the GNN module for multiuser interference cancellation. Numerical results will show that the proposed AMP-GNN significantly improves the performance of the AMP detector and achieves performance comparable to that of the state-of-the-art deep learning-based MIMO detectors but with reduced computational complexity. Furthermore, it presents strong robustness to changes in the number of users.

Submitted 9 January, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

Comments: 6 pages, 5 figures, Accepted by IEEE WCNC 2023

arXiv:2203.15388 [pdf, ps, other]

Federated Learning-Based Localization with Heterogeneous Fingerprint Database

Authors: Xin Cheng, Chuan Ma, Jun Li, Haiwei Song, Feng Shu, Jiangzhou Wang

Abstract: Fingerprint-based localization plays an important role in indoor location-based services, where the position information is usually collected in distributed clients and gathered in a centralized server. However, the overloaded transmission as well as the potential risk of divulging private information burdens the application. Owing to its ability to address these challenges, federated learning (FL)-based fingerprinting localization has attracted attention; it aims to train a global model while keeping raw data local. However, in distributed machine learning (ML) scenarios, the unavoidable database heterogeneity usually degrades the performance of the existing FL-based localization algorithm (FedLoc). In this paper, we first characterize the database heterogeneity with a computable metric, i.e., the area of convex hull, and verify it by experimental results. Then, a novel heterogeneous FL-based localization algorithm with the area of convex hull-based aggregation (FedLoc-AC) is proposed. Extensive experiments, including real-world cases, are conducted. We can conclude that the proposed FedLoc-AC can achieve an obvious prediction gain compared to FedLoc in heterogeneous scenarios and has almost the same prediction error as FedLoc in homogeneous scenarios. Moreover, the extension of FedLoc-AC to multi-floor cases is proposed and verified.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.12888 [pdf]

RSSI-CSI Measurement and Variation Mitigation with Commodity WiFi Device

Authors: Bo Wei, Hang Song, Jiro Katto, Takamaro Kikkawa

Abstract: Owing to the plentiful information released by the commodity devices, WiFi signals have been widely studied for various wireless sensing applications. In many works, both received signal strength indicator (RSSI) and the channel state information (CSI) are utilized as the key factors for precise sensing. However, the calculation of RSSI and CSI and the relationship between them are not explained in detail. Furthermore, there are few works focusing on the measurement variation of the WiFi signal which impacts the sensing results. In this paper, the relationship between RSSI and CSI is studied in detail and the measurement variation of amplitude and phase information is investigated by extensive experiments. In the experiments, transmitter and receiver are directly connected by power divider and RF cables and the signal transmission is quantitatively controlled by RF attenuators. By changing the intensity of attenuation, the measurement of RSSI and CSI is carried out under different conditions. From the results, it is found that in order to get a reliable measurement of the signal amplitude and phase by commodity WiFi, the attenuation of the channels should not exceed 60 dB. Meanwhile, the difference between two channels should be lower than 10 dB. An active control mechanism is suggested to ensure the measurement stability. The findings and criteria of this work are promising to facilitate more precise sensing technologies with WiFi signal.

Submitted 24 March, 2022; originally announced March 2022.

arXiv:2203.10800 [pdf, other]

Graph Neural Networks for Wireless Communications: From Theory to Practice

Authors: Yifei Shen, Jun Zhang, S. H. Song, Khaled B. Letaief

Abstract: Deep learning-based approaches have been developed to solve challenging problems in wireless communications, leading to promising results. Early attempts adopted neural network architectures inherited from applications such as computer vision. They often yield poor performance in large scale networks (i.e., poor scalability) and unseen network settings (i.e., poor generalization). To resolve these issues, graph neural networks (GNNs) have been recently adopted, as they can effectively exploit the domain knowledge, i.e., the graph topology in wireless communications problems. GNN-based methods can achieve near-optimal performance in large-scale networks and generalize well under different system settings, but the theoretical underpinnings and design guidelines remain elusive, which may hinder their practical implementations. This paper endeavors to fill both the theoretical and practical gaps. For theoretical guarantees, we prove that GNNs achieve near-optimal performance in wireless networks with much fewer training samples than traditional neural architectures. Specifically, to solve an optimization problem on an $n$-node graph (where the nodes may represent users, base stations, or antennas), GNNs' generalization error and required number of training samples are $\mathcal{O}(n)$ and $\mathcal{O}(n^2)$ times lower than the unstructured multi-layer perceptrons. For design guidelines, we propose a unified framework that is applicable to general design problems in wireless networks, which includes graph modeling, neural architecture design, and theory-guided performance enhancement. Extensive simulations, which cover a variety of important problems and network settings, verify our theory and the effectiveness of the proposed design framework.

Submitted 4 November, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.00917 [pdf, other]

Machine Learning Methods for Inferring the Number of UAV Emitters via Massive MIMO Receive Array

Authors: Yifan Li, Feng Shu, Jinsong Hu, Shihao Yan, Haiwei Song, Weiqiang Zhu, Da Tian, Yaoliang Song, Jiangzhou Wang

Abstract: To provide important prior knowledge for the DOA estimation of UAV emitters in future wireless networks, we present a complete DOA preprocessing system for inferring the number of emitters via massive MIMO receive array. Firstly, in order to eliminate the noise signals, two high-precision signal detectors, square root of maximum eigenvalue times minimum eigenvalue (SR-MME) and geometric mean (GM), are proposed. Compared to other detectors, SR-MME and GM can achieve a high detection probability while maintaining extremely low false alarm probability. Secondly, if the existence of emitters is determined by detectors, we need to further confirm their number. Therefore, we perform feature extraction on the eigenvalue sequence of the sample covariance matrix to construct a feature vector and innovatively propose a multi-layer neural network (ML-NN). Additionally, the support vector machine (SVM) and naive Bayesian classifier (NBC) are also designed. The simulation results show that the machine learning-based methods can achieve good results in signal classification, especially neural networks, which can always maintain the classification accuracy above 70\% with massive MIMO receive array. Finally, we analyze the classical signal classification methods, Akaike (AIC) and Minimum description length (MDL). It is concluded that the two methods are not suitable for scenarios with massive MIMO arrays, and they also have much worse performance than machine learning-based classifiers.

Submitted 10 March, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

arXiv:2202.12808 [pdf, other]

High-Dimensional Sparse Bayesian Learning without Covariance Matrices

Authors: Alexander Lin, Andrew H. Song, Berkin Bilgic, Demba Ba

Abstract: Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem. However, the most popular inference algorithms for SBL become too expensive for high-dimensional settings, due to the need to store and compute a large covariance matrix. We introduce a new inference scheme that avoids explicit construction of the covariance matrix by solving multiple linear systems in parallel to obtain the posterior moments for SBL. Our approach couples a little-known diagonal estimation result from numerical linear algebra with the conjugate gradient algorithm. On several simulations, our method scales better than existing approaches in computation time and memory, especially for structured dictionaries capable of fast matrix-vector multiplication.

Submitted 25 February, 2022; originally announced February 2022.

Comments: 5 pages

Journal ref: IEEE ICASSP 2022

arXiv:2112.14391 [pdf, other]

Perceptive Mobile Network with Distributed Target Monitoring Terminals: Leaking Communication Energy for Sensing

Authors: Lei Xie, Peilan Wang, S. H. Song, Khaled B. Letaief

Abstract: Integrated sensing and communication (ISAC) creates a platform to exploit the synergy between two powerful functionalities that have been developing separately. However, the interference management and resource allocation between sensing and communication have not been fully studied. In this paper, we consider the design of perceptive mobile networks (PMNs) by adding sensing capability to current cellular networks. To avoid the full-duplex operation, we propose the PMN with distributed target monitoring terminals (TMTs) where passive TMTs are deployed over wireless networks to locate the sensing target (ST). We jointly optimize the transmit and receive beamformers towards the communication user terminals (UEs) and the ST by alternating-optimization (AO) and prove its convergence. To reduce computation complexity and obtain physical insights, we further investigate the use of linear transceivers, including zero forcing and beam synthesis (B-syn). Our analysis revealed interesting physical insights regarding interference management and resource allocation between sensing and communication: 1) instead of forming dedicated sensing signals, it is more efficient to redesign the communication signals for both communication and sensing purposes and "leak" communication energy for sensing; 2) the amount of energy leakage from one UE to the ST depends on their relative locations.

Submitted 19 April, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

Comments: This paper has been submitted to the IEEE for possible publication

arXiv:2112.13293 [pdf, other]

Deep-learned speckle pattern and its application to ghost imaging

Authors: Xiaoyu Nie, Haotian Song, Wenhan Ren, Xingchen Zhao, Zhedong Zhang, Tao Peng, Marlan O. Scully

Abstract: In this paper, we present a method for speckle pattern design using deep learning. The speckle patterns possess unique features after experiencing convolutions in Speckle-Net, our well-designed framework for speckle pattern generation. We then apply our method to the computational ghost imaging system. The standard deep learning-assisted ghost imaging methods use the network to recognize the reconstructed objects or imaging algorithms. In contrast, this innovative application optimizes the illuminating speckle patterns via Speckle-Net with specific sampling ratios. Our method, therefore, outperforms the other techniques for ghost imaging, particularly its ability to retrieve high-quality images with extremely low sampling ratios. It opens a new route towards nontrivial speckle generation by referring to a standard loss function on specified objectives with the modified deep neural network. It also has great potential for applications in the fields of dynamic speckle illumination microscopy, structured illumination microscopy, x-ray imaging, photo-acoustic imaging, and optical lattices.

Submitted 27 December, 2021; v1 submitted 25 December, 2021; originally announced December 2021.

Comments: 12 pages, 12 figures

arXiv:2112.00330 [pdf, other]

Soft-Output Joint Channel Estimation and Data Detection using Deep Unfolding

Authors: Haochuan Song, Xiaohu You, Chuan Zhang, Christoph Studer

Abstract: We propose a novel soft-output joint channel estimation and data detection (JED) algorithm for multiuser (MU) multiple-input multiple-output (MIMO) wireless communication systems. Our algorithm approximately solves a maximum a-posteriori JED optimization problem using deep unfolding and generates soft-output information for the transmitted bits in every iteration. The parameters of the unfolded algorithm are computed by a hyper-network that is trained with a binary cross entropy (BCE) loss. We evaluate the performance of our algorithm in a coded MU-MIMO system with 8 basestation antennas and 4 user equipments and compare it to state-of-the-art algorithms that separate channel estimation from soft-output data detection. Our results demonstrate that our JED algorithm outperforms such data detectors with as few as 10 iterations.

Submitted 1 December, 2021; originally announced December 2021.

Comments: Presented at the 2021 IEEE Information Theory Workshop (ITW)

arXiv:2111.14022 [pdf, other]

Cell-Free Massive MIMO Detection: A Distributed Expectation Propagation Approach

Authors: Hengtao He, Xianghao Yu, Jun Zhang, S. H. Song, Khaled B. Letaief

Abstract: Cell-free massive MIMO is one of the core technologies for future wireless networks. It is expected to bring enormous benefits, including ultra-high reliability, data throughput, energy efficiency, and uniform coverage. As a radically distributed system, the performance of cell-free massive MIMO critically relies on efficient distributed processing algorithms. In this paper, we propose a distributed expectation propagation (EP) detector for cell-free massive MIMO, which consists of two modules: a nonlinear module at the central processing unit (CPU) and a linear module at each access point (AP). The turbo principle in iterative channel decoding is utilized to compute and pass the extrinsic information between the two modules. An analytical framework is provided to characterize the asymptotic performance of the proposed EP detector with a large number of antennas. Furthermore, a distributed iterative channel estimation and data detection (ICD) algorithm is developed to handle the practical setting with imperfect channel state information (CSI). Simulation results will show that the proposed method outperforms existing detectors for cell-free massive MIMO systems in terms of the bit-error rate and demonstrate that the developed theoretical analysis accurately predicts system performance. Finally, it is shown that with imperfect CSI, the proposed ICD algorithm improves the system performance significantly and enables non-orthogonal pilots to reduce the pilot overhead.

Submitted 7 March, 2023; v1 submitted 27 November, 2021; originally announced November 2021.

Comments: 31 Pages, 8 Figures, 2 Tables. This paper has been submitted to the IEEE for possible publication. arXiv admin note: substantial text overlap with arXiv:2108.07498

arXiv:2111.02509 [pdf, ps, other]

Clustering-based Multicast Scheme for UAV Networks

Authors: Hao Song, Lingjia Liu, Bodong Shang, Scott Pudlewski, Elizabeth Serena Bentley

Abstract: When an unmanned aerial vehicle (UAV) network is utilized as an aerial small base station (BS), like a relay deployed far away from macro BSs, existing multicast methods based on acknowledgement (ACK) feedback and retransmissions may encounter severe delay and signaling overhead due to hostile wireless environments caused by a long-distance propagation and numerous UAVs. In this paper, a novel multicast scheme is designed for UAV networks serving as an aerial small BS, where a UAV experiencing a packet loss will request the packet from other UAVs in the same cluster rather than relying on retransmissions of BSs. The technical details of the introduced multicast scheme are designed with the carrier sense multiple access with collision avoidance (CSMA/CA) protocol for practicability and without loss of generality. Then, the Poisson cluster process is employed to model UAV networks to capture their dynamic network topology, based on which distance distributions are derived using tools of stochastic geometry for analytical tractability. Additionally, critical performance indicators of the designed multicast scheme are analyzed. Through extensive simulation studies, the superiority of the designed multicast scheme is demonstrated and the system design insight related to the proper number of clusters is revealed.

Submitted 3 November, 2021; originally announced November 2021.

Comments: 11 pages, 15 figures

arXiv:2110.15928 [pdf, other]

Joint Channel Estimation and Data Detection in Cell-Free Massive MU-MIMO Systems

Authors: Haochuan Song, Tom Goldstein, Xiaohu You, Chuan Zhang, Olav Tirkkonen, Christoph Studer

Abstract: We propose a joint channel estimation and data detection (JED) algorithm for densely-populated cell-free massive multiuser (MU) multiple-input multiple-output (MIMO) systems, which reduces the channel training overhead caused by the presence of hundreds of simultaneously transmitting user equipments (UEs). Our algorithm iteratively solves a relaxed version of a maximum a-posteriori JED problem and simultaneously exploits the sparsity of cell-free massive MU-MIMO channels as well as the boundedness of QAM constellations. In order to improve the performance and convergence of the algorithm, we propose methods that permute the access point and UE indices to form so-called virtual cells, which leads to better initial solutions. We assess the performance of our algorithm in terms of root-mean-squared-symbol error, bit error rate, and mutual information, and we demonstrate that JED significantly reduces the pilot overhead compared to orthogonal training, which enables reliable communication with short packets to a large number of UEs.

Submitted 29 October, 2021; originally announced October 2021.

Comments: To appear in the IEEE Transactions on Wireless Communications

arXiv:2110.07309 [pdf, other]

Cell-Free Massive MIMO for 6G Wireless Communication Networks

Authors: Hengtao He, Xianghao Yu, Jun Zhang, S. H. Song, Khaled B. Letaief

Abstract: The recently commercialized fifth-generation (5G) wireless communication networks achieved many improvements, including air interface enhancement, spectrum expansion, and network intensification by several key technologies, such as massive multiple-input multiple-output (MIMO), millimeter-wave communications, and ultra-dense networking. Despite the deployment of 5G commercial systems, wireless communications is still facing many challenges to enable connected intelligence and a myriad of applications such as industrial Internet-of-things, autonomous systems, brain-computer interfaces, digital twin, tactile Internet, etc. Therefore, it is urgent to start research on the sixth-generation (6G) wireless communication systems. Among the candidate technologies for such systems, cell-free massive MIMO which combines the advantages of distributed systems and massive MIMO is considered as a key solution to enhance the wireless transmission efficiency and becomes the international frontier. In this paper, we present a comprehensive study on cell-free massive MIMO for 6G wireless communication networks, especially from the signal processing perspective. We focus on enabling physical layer technologies for cell-free massive MIMO, such as user association, pilot assignment, transmitter and receiver design, as well as power control and allocation. Furthermore, some current and future research problems are highlighted.

Submitted 29 November, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: 28 pages, 4 figures, 4 tables, Accepted by Journal of Communications and Information Networks

arXiv:2110.05797 [pdf, other]

Zero-bias Deep Neural Network for Quickest RF Signal Surveillance

Authors: Yongxin Liu, Yingjie Chen, Jian Wang, Shuteng Niu, Dahai Liu, Houbing Song

Abstract: The Internet of Things (IoT) is reshaping modern society by allowing a decent number of RF devices to connect and share information through RF channels. However, such an open nature also brings obstacles to surveillance. For alleviation, a surveillance oracle, or a cognitive communication entity needs to identify and confirm the appearance of known or unknown signal sources in real-time. In this paper, we provide a deep learning framework for RF signal surveillance. Specifically, we jointly integrate the Deep Neural Networks (DNNs) and Quickest Detection (QD) to form a sequential signal surveillance scheme. We first analyze the latent space characteristic of neural network classification models, and then we leverage the response characteristics of DNN classifiers and propose a novel method to transform existing DNN classifiers into performance-assured binary abnormality detectors. In this way, we seamlessly integrate the DNNs with the parametric quickest detection. Finally, we propose an enhanced Elastic Weight Consolidation (EWC) algorithm with better numerical stability for DNNs in signal surveillance systems to evolve incrementally. We demonstrate that the zero-bias DNN is superior to regular DNN models considering incremental learning and decision fairness. We evaluated the proposed framework using real signal datasets and we believe this framework is helpful in developing a trustworthy IoT ecosystem.

Submitted 12 October, 2021; originally announced October 2021.

Comments: This paper has been accepted for publication in IEEE IPCCC 2021. arXiv admin note: text overlap with arXiv:2105.15098

arXiv:2110.04683 [pdf, other]

Mixture Model Auto-Encoders: Deep Clustering through Dictionary Learning

Authors: Alexander Lin, Andrew H. Song, Demba Ba

Abstract: State-of-the-art approaches for clustering high-dimensional data utilize deep auto-encoder architectures. Many of these networks require a large number of parameters and suffer from a lack of interpretability, due to the black-box nature of the auto-encoders. We introduce Mixture Model Auto-Encoders (MixMate), a novel architecture that clusters data by performing inference on a generative model. Derived from the perspective of sparse dictionary learning and mixture models, MixMate comprises several auto-encoders, each tasked with reconstructing data in a distinct cluster, while enforcing sparsity in the latent space. Through experiments on various image datasets, we show that MixMate achieves competitive performance compared to state-of-the-art deep clustering algorithms, while using orders of magnitude fewer parameters.

Submitted 25 February, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

Comments: 5 pages, 3 figures

Journal ref: IEEE ICASSP 2022

arXiv:2110.00272 [pdf, other]

Learn to Communicate with Neural Calibration: Scalability and Generalization

Authors: Yifan Ma, Yifei Shen, Xianghao Yu, Jun Zhang, S. H. Song, Khaled B. Letaief

Abstract: The conventional design of wireless communication systems typically relies on established mathematical models that capture the characteristics of different communication modules. Unfortunately, such design cannot be easily and directly applied to future wireless networks, which will be characterized by large-scale ultra-dense networks whose design complexity scales exponentially with the network size. Furthermore, such networks will vary dynamically in a significant way, which makes it intractable to develop comprehensive analytical models. Recently, deep learning-based approaches have emerged as potential alternatives for designing complex and dynamic wireless systems. However, existing learning-based methods have limited capabilities to scale with the problem size and to generalize with varying network settings. In this paper, we propose a scalable and generalizable neural calibration framework for future wireless system design, where a neural network is adopted to calibrate the input of conventional model-based algorithms. Specifically, the backbone of a traditional time-efficient algorithm is integrated with deep neural networks to achieve a high computational efficiency, while enjoying enhanced performance. The permutation equivariance property, carried out by the topological structure of wireless systems, is furthermore utilized to develop a generalizable neural network architecture. The proposed neural calibration framework is applied to solve challenging resource management problems in massive multiple-input multiple-output (MIMO) systems. Simulation results will show that the proposed neural calibration approach enjoys significantly improved scalability and generalization compared with the existing learning-based methods.

Submitted 1 October, 2021; originally announced October 2021.

Comments: submitted to IEEE Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2108.01529

arXiv:2109.12293 [pdf]

Adaptive video transmission using QUBO method and Digital Annealer based on Ising machine

Authors: Bo Wei, Hang Song, Jiro Katto

Abstract: With video streaming accounting for a dramatically increasing share of total network traffic, it is critical to develop effective algorithms for high-quality content delivery. Adaptive bitrate (ABR) control is the essential technique that determines the proper bitrate to choose based on network conditions, thereby realizing high-quality video streaming. In this paper, a novel ABR strategy based on an Ising machine is proposed, using the quadratic unconstrained binary optimization (QUBO) method and the Digital Annealer (DA) for the first time. The proposed method is evaluated by simulation with real-world measured throughput and compared with other state-of-the-art methods. Experimental results show that the proposed QUBO-based method outperforms the existing methods, demonstrating its superiority.

Submitted 25 September, 2021; originally announced September 2021.

arXiv:2108.07673 [pdf, other]

doi 10.1016/j.optcom.2022.128450

0.8% Nyquist computational ghost imaging via non-experimental deep learning

Authors: Haotian Song, Xiaoyu Nie, Hairong Su, Hui Chen, Yu Zhou, Xingchen Zhao, Tao Peng, Marlan O. Scully

Abstract: We present a framework for computational ghost imaging based on deep learning and customized pink noise speckle patterns. The deep neural network in this work, which can learn the sensing model and enhance image reconstruction quality, is trained merely by simulation. To demonstrate the sub-Nyquist level in our work, the conventional computational ghost imaging results and the imaging results reconstructed via deep learning using white noise and pink noise are compared under multiple sampling rates at different noise conditions. We show that the proposed scheme can provide high-quality images with a sampling rate of 0.8% even when the object is outside the training dataset, and it is robust to noisy environments. This method is excellent for various applications, particularly those that require a low sampling rate, fast reconstruction efficiency, or experience strong noise interference.

Submitted 17 August, 2021; originally announced August 2021.

Comments: 10 pages, 6 figures

Showing 1–50 of 90 results for author: Song, H