Search | arXiv e-print repository

arXiv:2408.04805 [pdf]

Improved Robustness for Deep Learning-based Segmentation of Multi-Center Myocardial Perfusion MRI Datasets Using Data Adaptive Uncertainty-guided Space-time Analysis

Authors: Dilek M. Yalcinkaya, Khalid Youssef, Bobak Heydari, Janet Wei, Noel Bairey Merz, Robert Judd, Rohan Dharmakumar, Orlando P. Simonetti, Jonathan W. Weinsaft, Subha V. Raman, Behzad Sharif

Abstract: Background. Fully automatic analysis of myocardial perfusion MRI datasets enables rapid and objective reporting of stress/rest studies in patients with suspected ischemic heart disease. Developing deep learning techniques that can analyze multi-center datasets despite limited training data and variations in software and hardware is an ongoing challenge. Methods. Datasets from 3 medical centers acquired at 3T (n = 150 subjects) were included: an internal dataset (inD; n = 95) and two external datasets (exDs; n = 55) used for evaluating the robustness of the trained deep neural network (DNN) models against differences in pulse sequence (exD-1) and scanner vendor (exD-2). A subset of inD (n = 85) was used for training/validation of a pool of DNNs for segmentation, all using the same spatiotemporal U-Net architecture and hyperparameters but with different parameter initializations. We employed a space-time sliding-patch analysis approach that automatically yields a pixel-wise "uncertainty map" as a byproduct of the segmentation process. In our approach, a given test case is segmented by all members of the DNN pool and the resulting uncertainty maps are leveraged to automatically select the "best" one among the pool of solutions. Results. The proposed DAUGS analysis approach performed similarly to the established approach on the internal dataset (p = n.s.) whereas it significantly outperformed on the external datasets (p < 0.005 for exD-1 and exD-2). Moreover, the number of image series with "failed" segmentation was significantly lower for the proposed vs. the established approach (4.3% vs. 17.1%, p < 0.0005). Conclusions. The proposed DAUGS analysis approach has the potential to improve the robustness of deep learning methods for segmentation of multi-center stress perfusion datasets with variations in the choice of pulse sequence, site location or scanner vendor.

Submitted 8 August, 2024; originally announced August 2024.

Comments: Accepted for publication in JCMR, 2024

arXiv:2407.17758 [pdf, other]

Speed-enhanced Subdomain Adaptation Regression for Long-term Stable Neural Decoding in Brain-computer Interfaces

Authors: Jiyu Wei, Dazhong Rong, Xinyun Zhu, Qinming He, Yueming Wang

Abstract: Brain-computer interfaces (BCIs) offer a means to convert neural signals into control signals, providing a potential restoration of movement for people with paralysis. Despite their promise, BCIs face a significant challenge in maintaining decoding accuracy over time due to neural nonstationarities. However, the decoding accuracy of BCI drops severely across days due to the neural data drift. While current recalibration techniques address this issue to a degree, they often fail to leverage the limited labeled data, to consider the signal correlation between two days, or to perform conditional alignment in regression tasks. This paper introduces a novel approach to enhance recalibration performance. We begin with preliminary experiments that reveal the temporal patterns of neural signal changes and identify three critical elements for effective recalibration: global alignment, conditional speed alignment, and feature-label consistency. Building on these insights, we propose the Speed-enhanced Subdomain Adaptation Regression (SSAR) framework, integrating semi-supervised learning with domain adaptation techniques in regression neural decoding. SSAR employs Speed-enhanced Subdomain Alignment (SeSA) for global and speed conditional alignment of similarly labeled data, with Contrastive Consistency Constraint (CCC) to enhance the alignment of SeSA by reinforcing feature-label consistency through contrastive learning. Our comprehensive set of experiments, both qualitative and quantitative, substantiate the superior recalibration performance and robustness of SSAR.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2406.10956 [pdf, other]

Robust Channel Learning for Large-Scale Radio Speaker Verification

Authors: Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li, Xugang Lu

Abstract: Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learning (CRSL) framework that enhances the robustness of the current speaker verification pipeline, considering data source, data augmentation, and the efficiency of model transfer processes. Our framework introduces an augmentation module that mitigates bandwidth variations in radio speech datasets by manipulating the bandwidth of training inputs. It also addresses unknown noise by introducing noise within the manifold space. Additionally, we propose an efficient fine-tuning method that reduces the need for extensive additional training time and large amounts of data. Moreover, we develop a toolkit for assembling a large-scale radio speech corpus and establish a benchmark specifically tailored for radio scenario speaker verification studies. Experimental results demonstrate that our proposed methodology effectively enhances performance and mitigates degradation caused by radio transmission in speaker verification tasks. The code will be available on GitHub.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 12 pages, 11 figures

arXiv:2406.00993 [pdf]

Detection of Acetone as a Gas Biomarker for Diabetes Based on Gas Sensor Technology

Authors: Jiaming Wei, Tong Liu, Jipeng Huang, Xiaowei Li, Yurui Qi, Gangyin Luo

Abstract: With the continuous development and improvement of medical services, there is a growing demand for improving diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that the acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for diabetes breath analysis. This provides a more readily accepted method for early diabetes prevention and monitoring. Addressing issues such as the invasive nature, disease transmission risks, and complexity of diabetes testing, this study aims to design a diabetes gas biomarker acetone detection system centered around a sensor array using gas sensors and pattern recognition algorithms. The research covers sensor selection, sensor preparation, circuit design, data acquisition and processing, and detection model establishment to accurately identify acetone. Titanium dioxide was chosen as the nano gas-sensitive material to prepare the acetone gas sensor, with data collection conducted using STM32. Filtering was applied to process the raw sensor data, followed by feature extraction using principal component analysis. A recognition model based on support vector machine algorithm was used for qualitative identification of gas samples, while a recognition model based on backpropagation neural network was employed for quantitative detection of gas sample concentrations. Experimental results demonstrated recognition accuracies of 96% and 97.5% for acetone-ethanol and acetone-methanol mixed gases, and 90% for ternary acetone, ethanol, and methanol mixed gases.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 9 pages, 14 figures

arXiv:2405.12031 [pdf, other]

Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification

Authors: Nian Li, Jianguo Wei

Abstract: Transformer-based architectures for speaker verification typically require more training data than ECAPA-TDNN. Therefore, recent work has generally been trained on VoxCeleb1&2. We propose a backbone network based on self-attention, which can achieve competitive results when trained on VoxCeleb2 alone. The network alternates between neighborhood attention and global attention to capture local and global features, then aggregates features of different hierarchical levels, and finally performs attentive statistics pooling. Additionally, we employ a progressive channel fusion strategy to expand the receptive field in the channel dimension as the network deepens. We trained the proposed PCF-NAT model on VoxCeleb2 and evaluated it on VoxCeleb1 and the validation sets of VoxSRC. The EER and minDCF of the shallow PCF-NAT are on average more than 20% lower than those of similarly sized ECAPA-TDNN. Deep PCF-NAT achieves an EER lower than 0.5% on VoxCeleb1-O. The code and models are publicly available at https://github.com/ChenNan1996/PCF-NAT.

Submitted 29 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures, 3 tables; added github link

arXiv:2404.19108 [pdf, other]

Real-Time Convolutional Neural Network-Based Star Detection and Centroiding Method for CubeSat Star Tracker

Authors: Hongrui Zhao, Michael F. Lembeck, Adrian Zhuang, Riya Shah, Jesse Wei

Abstract: Star trackers are one of the most accurate celestial sensors used for absolute attitude determination. The devices detect stars in captured images and accurately compute their projected centroids on an imaging focal plane with subpixel precision. Traditional algorithms for star detection and centroiding often rely on threshold adjustments for star pixel detection and pixel brightness weighting for centroid computation. However, challenges like high sensor noise and stray light can compromise algorithm performance. This article introduces a Convolutional Neural Network (CNN)-based approach for star detection and centroiding, tailored to address the issues posed by noisy star tracker images in the presence of stray light and other artifacts. Trained using simulated star images overlaid with real sensor noise and stray light, the CNN produces both a binary segmentation map distinguishing star pixels from the background and a distance map indicating each pixel's proximity to the nearest star centroid. Leveraging this distance information alongside pixel coordinates transforms centroid calculations into a set of trilateration problems solvable via the least squares method. Our method employs efficient UNet variants for the underlying CNN architectures, and the variants' performances are evaluated. Comprehensive testing has been undertaken with synthetic image evaluations, hardware-in-the-loop assessments, and night sky tests. The tests consistently demonstrated that our method outperforms several existing algorithms in centroiding accuracy and exhibits superior resilience to high sensor noise and stray light interference. An additional benefit of our algorithms is that they can be executed in real-time on low-power edge AI processors.
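
The trilateration step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `centroid_from_distances`, the sample pixel grid, and the choice of linearizing against the first pixel are all assumptions made for illustration. Each pixel at (x, y) with predicted distance d constrains the centroid (cx, cy) by (x - cx)^2 + (y - cy)^2 = d^2; subtracting the first pixel's equation removes the quadratic terms and leaves a linear least-squares problem.

```python
import math


def centroid_from_distances(pixels, distances):
    """Estimate a 2D centroid from pixel coordinates and per-pixel distances.

    Hypothetical helper: linearizes the circle equations by subtracting the
    first one, then solves the 2x2 normal equations in closed form.
    """
    if len(pixels) < 3 or len(pixels) != len(distances):
        raise ValueError("need at least 3 pixels with matching distances")
    (x0, y0), d0 = pixels[0], distances[0]
    ref = x0 * x0 + y0 * y0 - d0 * d0
    s_aa = s_ab = s_bb = s_ar = s_br = 0.0
    for (x, y), d in zip(pixels[1:], distances[1:]):
        # Row of the linear system: a * cx + b * cy = r
        a = 2.0 * (x - x0)
        b = 2.0 * (y - y0)
        r = (x * x + y * y - d * d) - ref
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ar += a * r
        s_br += b * r
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("pixels are collinear; centroid is not determined")
    cx = (s_bb * s_ar - s_ab * s_br) / det
    cy = (s_aa * s_br - s_ab * s_ar) / det
    return cx, cy


# Synthetic check: exact distances from four pixels to a known sub-pixel centroid.
true_centroid = (2.3, 3.7)
sample_pixels = [(2, 3), (3, 3), (2, 4), (3, 4)]
sample_distances = [math.hypot(x - true_centroid[0], y - true_centroid[1])
                    for x, y in sample_pixels]
estimate = centroid_from_distances(sample_pixels, sample_distances)
```

With noisy distance predictions the same normal equations return the least-squares compromise across all star pixels, which is presumably why the abstract frames the problem this way.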

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2403.20168 [pdf, other]

Unsupervised Tumor-Aware Distillation for Multi-Modal Brain Image Translation

Authors: Chuan Huang, Jia Wei, Rui Li

Abstract: Multi-modal brain images from MRI scans are widely used in clinical diagnosis to provide complementary information from different modalities. However, obtaining fully paired multi-modal images in practice is challenging due to various factors, such as time, cost, and artifacts, resulting in modality-missing brain images. To address this problem, unsupervised multi-modal brain image translation has been extensively studied. Existing methods suffer from the problem of brain tumor deformation during translation, as they fail to focus on the tumor areas when translating the whole images. In this paper, we propose an unsupervised tumor-aware distillation teacher-student network called UTAD-Net, which is capable of perceiving and translating tumor areas precisely. Specifically, our model consists of two parts: a teacher network and a student network. The teacher network learns an end-to-end mapping from source to target modality using unpaired images and corresponding tumor masks first. Then, the translation knowledge is distilled into the student network, enabling it to generate more realistic tumor areas and whole images without masks. Experiments show that our model achieves competitive performance on both quantitative and qualitative evaluations of image quality compared with state-of-the-art methods. Furthermore, we demonstrate the effectiveness of the generated images on downstream segmentation tasks. Our code is available at https://github.com/scut-HC/UTAD-Net.

Submitted 24 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

Comments: 8 pages, 5 figures. It has been provisionally accepted for IJCNN 2024

arXiv:2307.16597 [pdf, other]

Errors Dynamics in Affine Group Systems

Authors: Xinghan Li, Jianqi Chen, Han Zhang, Jieqiang Wei, Junfeng Wu

Abstract: Error dynamics capture the evolution of the state errors between two distinct trajectories that are governed by the same system rule but initiated or perturbed differently. In particular, the analysis of state observer error dynamics on matrix Lie groups is fundamental in practice. In this paper, we focus on the error dynamics analysis for an affine group system under external disturbances or random noises. To this end, we first discuss the connections between the notions of affine group systems and linear group systems. We provide two equivalent characterizations of a linear group system. Such characterizations are based on the homeomorphism of its transition flow and linearity of its Lie algebra counterpart, respectively. Next, we investigate the evolution of a linear group system and we assume it is diffused by a Brownian motion in tangent spaces. We further show that the dynamics projected in the Lie algebra is governed by a stochastic differential equation with a linear drift term. We apply these findings in analyzing the error dynamics. Under differentiable disturbance, we derive an ordinary differential equation characterizing the evolution of the projected errors in the Lie algebra. In addition, the counterpart with stochastic disturbances is derived for the projected errors in terms of a stochastic differential equation. Explicit and accurate derivation of error dynamics is provided for matrix group $SE_N(3)$, which plays a vital role especially in robotic applications.

Submitted 18 December, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: 8 pages, 1 figure

arXiv:2307.05087 [pdf, other]

SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation

Authors: Zhengxin Lei, Feng Xu, Jiangtao Wei, Feng Cai, Feng Wang, Ya-Qiu Jin

Abstract: SAR images are highly sensitive to observation configurations, and they exhibit significant variations across different viewing angles, making it challenging to represent and learn their anisotropic features. As a result, deep learning methods often generalize poorly across different view angles. Inspired by the concept of neural radiance fields (NeRF), this study combines SAR imaging mechanisms with neural networks to propose a novel NeRF model for SAR image generation. Following the mapping and projection principles, a set of SAR images is modeled implicitly as a function of attenuation coefficients and scattering intensities in the 3D imaging space through a differentiable rendering equation. SAR-NeRF is then constructed to learn the distribution of attenuation coefficients and scattering intensities of voxels, where the vectorized form of the 3D voxel SAR rendering equation and the sampling relationship between the 3D space voxels and the 2D view ray grids are analytically derived. Through quantitative experiments on various datasets, we thoroughly assess the multi-view representation and generalization capabilities of SAR-NeRF. Additionally, it is found that a SAR-NeRF-augmented dataset can significantly improve SAR target classification performance under a few-shot learning setup, where a 10-type classification accuracy of 91.6% can be achieved by using only 12 images per class.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2306.06467 [pdf, other]

A Chance-Constrained Optimal Design of Volt/VAR Control Rules for Distributed Energy Resources

Authors: Jinlei Wei, Sarthak Gupta, Dionysios C. Aliprantis, Vassilis Kekatos

Abstract: Deciding setpoints for distributed energy resources (DERs) via local control rules rather than centralized optimization offers significant autonomy. The IEEE Standard 1547 recommends deciding DER setpoints using Volt/VAR rules. Although such rules are specified as non-increasing piecewise-affine, their exact shape is left for the utility operators to decide and possibly customize per bus and grid conditions. To address this need, this work optimally designs Volt/VAR rules to minimize ohmic losses on lines while maintaining voltages within allowable limits. This is practically relevant as excessive reactive injections could reduce equipment's lifetime due to overloading. We consider a linearized single-phase grid model. Even under this setting, optimal rule design (ORD) is technically challenging as Volt/VAR rules entail mixed-integer models, stability implications, and uncertainties in grid loading. Uncertainty is handled by minimizing the average losses under voltage chance constraints. To cope with the piecewise-affine shape of the rules, we build upon our previous reformulation of ORD as a deep learning task. A recursive neural network (RNN) surrogates Volt/VAR dynamics and thanks to back-propagation, we expedite this chance-constrained ORD. RNN weights coincide with rule parameters, and are trained using primal-dual decomposition. Numerical tests corroborate the efficacy of this novel ORD formulation and solution methodology.

Submitted 29 July, 2023; v1 submitted 10 June, 2023; originally announced June 2023.

arXiv:2210.10998 [pdf, other]

Semi-supervised object detection based on single-stage detector for thighbone fracture localization

Authors: Jinman Wei, Jinkun Yao, Guoshan Zhanga, Bin Guan, Yueming Zhang, Shaoquan Wang

Abstract: The thighbone is the largest bone supporting the lower body. If a thighbone fracture is not treated in time, it can lead to lifelong inability to walk. Correct diagnosis of thighbone disease is very important in orthopedic medicine. Deep learning is promoting the development of fracture detection technology. However, the existing computer aided diagnosis (CAD) methods based on deep learning rely on a large amount of manually labeled data, and labeling these data costs a lot of time and energy. Therefore, we develop an object detection method that works with a limited quantity of labeled images and apply it to thighbone fracture localization. In this work, we build a semi-supervised object detection (SSOD) framework based on a single-stage detector, which includes three modules: an adaptive difficult sample oriented (ADSO) module, Fusion Box, and a deformable expand encoder (Dex encoder). The ADSO module takes the classification score as the label reliability evaluation criterion by weighting, Fusion Box is designed to merge similar pseudo boxes into a reliable box for box regression, and the Dex encoder is proposed to enhance the adaptability of image augmentation. The experiment is conducted on the thighbone fracture dataset, which includes 3484 training thigh fracture images and 358 testing thigh fracture images. The experimental results show that the proposed method achieves the state-of-the-art AP in thighbone fracture detection at different labeled data rates, i.e. 1%, 5% and 10%. In addition, when the full data are used for knowledge distillation, our method achieves 86.2% AP50 and 52.6% AP75.

Submitted 19 October, 2022; originally announced October 2022.

Comments: Preprint submitted to Applied Soft Computing

arXiv:2210.07818 [pdf, other]

ISTA-Inspired Network for Image Super-Resolution

Authors: Yuqing Liu, Wei Zhang, Weifeng Sun, Zhikai Yu, Jianfeng Wei, Shengquan Li

Abstract: Deep learning for image super-resolution (SR) has been investigated by numerous researchers in recent years. Most of the works concentrate on effective block designs and improve the network representation but lack interpretation. There are also iterative optimization-inspired networks for image SR, which take the solution step as a whole without giving an explicit optimization step. This paper proposes an unfolding iterative shrinkage thresholding algorithm (ISTA) inspired network for interpretable image SR. Specifically, we analyze the problem of image SR and propose a solution based on the ISTA method. Inspired by the mathematical analysis, the ISTA block is developed to conduct the optimization in an end-to-end manner. To make the exploration more effective, a multi-scale exploitation block and multi-scale attention mechanism are devised to build the ISTA block. Experimental results show the proposed ISTA-inspired restoration network (ISTAR) achieves competitive or better performances than other optimization-inspired works with fewer parameters and lower computation complexity.
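
For readers unfamiliar with the algorithm this abstract builds on, the following is a minimal sketch of classical ISTA for the problem min over x of 0.5 * ||Ax - y||^2 + lam * ||x||_1. It is not the ISTAR network, which replaces these fixed steps with learned blocks; the function names `soft_threshold` and `ista`, the list-of-lists matrix format, and the toy inputs are assumptions for illustration. Each iteration takes a gradient step on the quadratic term and then applies the shrinkage (soft-thresholding) operator.

```python
def soft_threshold(value, tau):
    """Shrinkage operator: the proximal map of tau * |value|."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def ista(A, y, lam, step, n_iter=100):
    """Classical ISTA on a dense matrix A given as a list of rows.

    `step` should not exceed 1 / L, where L is the largest eigenvalue
    of A^T A, for the iteration to converge.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual A x - y and gradient A^T (A x - y) of the quadratic term.
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by shrinkage.
        x = [soft_threshold(x[j] - step * grad[j], lam * step) for j in range(n)]
    return x


# Toy case: with A equal to the identity, the minimizer is soft_threshold(y, lam).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
solution = ista(identity, [3.0, -0.5, 1.5], lam=1.0, step=1.0, n_iter=10)
```

Unfolded networks of the kind the abstract describes typically treat a fixed number of such iterations as layers, which is where the interpretability claim comes from.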

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2209.11233 [pdf, other]

Evaluating Latent Space Robustness and Uncertainty of EEG-ML Models under Realistic Distribution Shifts

Authors: Neeraj Wagh, Jionghao Wei, Samarth Rawal, Brent M. Berry, Yogatheesan Varatharajah

Abstract: The recent availability of large datasets in bio-medicine has inspired the development of representation learning methods for multiple healthcare applications. Despite advances in predictive performance, the clinical utility of such methods is limited when exposed to real-world data. This study develops model diagnostic measures to detect potential pitfalls before deployment without assuming access to external data. Specifically, we focus on modeling realistic data shifts in electrophysiological signals (EEGs) via data transforms and extend the conventional task-based evaluations with analyses of a) the model's latent space and b) predictive uncertainty under these transforms. We conduct experiments on multiple EEG feature encoders and two clinically relevant downstream tasks using publicly available large-scale clinical EEGs. Within this experimental setting, our results suggest that measures of latent space integrity and model uncertainty under the proposed data shifts may help anticipate performance degradation during deployment.

Submitted 14 October, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: NeurIPS 2022 camera ready version. Code available at https://github.com/neerajwagh/evaluating-eeg-representations. tl;dr - We develop model diagnostic measures to identify failure modes of EEG-ML models before deployment without access to out-of-distribution data. Keywords - dataset shift, EEG, representation learning, robustness, latent space, uncertainty quantification, distribution shift

arXiv:2208.12251 [pdf, other]

A Gis Aided Approach for Geolocalizing an Unmanned Aerial System Using Deep Learning

Authors: Jianli Wei, Deniz Karakay, Alper Yilmaz

Abstract: The Global Positioning System (GPS) has become a part of our daily life with the primary goal of providing geopositioning service. For an unmanned aerial system (UAS), geolocalization ability is an extremely important necessity which is achieved using an Inertial Navigation System (INS) with the GPS at its heart. Without geopositioning service, a UAS is unable to fly to its destination or come back home. Unfortunately, GPS signals can be jammed and suffer from a multipath problem in urban canyons. Our goal is to propose an alternative approach to geolocalize a UAS when the GPS signal is degraded or denied. Considering that a UAS has a downward-looking camera on its platform that can acquire real-time images as the platform flies, we apply modern deep learning techniques to achieve geolocalization. In particular, we perform image matching to establish latent feature conjugates between UAS-acquired imagery and satellite orthophotos. A typical application of feature matching suffers from high-rise buildings and new constructions in the field that introduce uncertainties into homography estimation, and hence results in poor geolocalization performance. Instead, we extract GIS information from OpenStreetMap (OSM) to semantically segment matched features into building and terrain classes. The GIS mask works as a filter in selecting semantically matched features that enhance coplanarity conditions and the UAS geolocalization accuracy. Once the paper is published our code will be publicly available at https://github.com/OSUPCVLab/UbihereDrone2021.

Submitted 25 August, 2022; originally announced August 2022.

Comments: Paper published at SENSORS 2022 Conference

arXiv:2206.07142 [pdf]

Experimental Comparison of PAM-8 Probabilistic Shaping with Different Gaussian Orders at 200 Gb/s Net Rate in IM/DD System with O-Band TOSA

Authors: Md Sabbir-Bin Hossain, Georg Böcherer, Youxi Lin, Shuangxu Li, Stefano Calabrò, Andrei Nedelcu, Talha Rahman, Tom Wettlin, Jinlong Wei, Nebojša Stojanović, Changsong Xie, Maxim Kuschnerov, Stephan Pachnicke

Abstract: For 200Gb/s net rates, cap probabilistic shaped PAM-8 with different Gaussian orders are experimentally compared against uniform PAM-8. In back-to-back and 5km measurements, cap-shaped 85-GBd PAM-8 with Gaussian order of 5 outperforms 71-GBd uniform PAM-8 by up to 2.90dB and 3.80dB in receiver sensitivity, respectively.

Submitted 14 June, 2022; originally announced June 2022.

Comments: submitted to 2022 European Conference on Optical Communication (ECOC)

arXiv:2205.08805 [pdf]

doi 10.1109/ECOC52684.2021.9605995

Experimental Comparison of Cap and Cup Probabilistically Shaped PAM for O-Band IM/DD Transmission System

Authors: Md Sabbir-Bin Hossain, Georg Boecherer, Talha Rahman, Nebojsa Stojanovic, Patrick Schulte, Stefano Calabrò, Jinlong Wei, Christian Bluemm, Tom Wettlin, Changsong Xie, Maxim Kuschnerov, Stephan Pachnicke

Abstract: For 200Gbit/s net rates, uniform PAM-4, 6 and 8 are experimentally compared against probabilistic shaped PAM-8 cap and cup variants. In back-to-back and 20km measurements, cap shaped 80GBd PAM-8 outperforms 72GBd PAM-8 and 83GBd PAM-6 by up to 3.50dB and 0.8dB in receiver sensitivity, respectively.

Submitted 18 May, 2022; originally announced May 2022.

Comments: Originally published in ECOC-2021. We have updated Figure 3. The change also affects the overall outcome. In contrast to the published version, compared to uniform PAM-8 72 GBd, PS-PAM-8 80 GBd performance is updated to 3.50 dB instead of 5.17 dB, while for PAM-6 83 GBd the gain becomes 0.8 dB instead of 2.17 dB. The changes are adapted in all sections except the experimental setup and DSP section

Journal ref: 2021 European Conference on Optical Communication (ECOC)

arXiv:2203.10773 [pdf, other]

Slice Imputation: Intermediate Slice Interpolation for Anisotropic 3D Medical Image Segmentation

Authors: Zhaotao Wu, Jia Wei, Jiabing Wang, Rui Li

Abstract: We introduce a novel frame-interpolation-based method for slice imputation to improve segmentation accuracy for anisotropic 3D medical images, in which the number of slices and their corresponding segmentation labels can be increased between two consecutive slices in anisotropic 3D medical volumes. Unlike previous inter-slice imputation methods, which only focus on the smoothness in the axial direction, this study aims to improve the smoothness of the interpolated 3D medical volumes in all three directions: axial, sagittal, and coronal. The proposed multitask inter-slice imputation method, in particular, incorporates a smoothness loss function to evaluate the smoothness of the interpolated 3D medical volumes in the through-plane direction (sagittal and coronal). It not only improves the resolution of the interpolated 3D medical volumes in the through-plane direction but also transforms them into isotropic representations, which leads to better segmentation performances. Experiments on whole tumor segmentation in the brain, liver tumor segmentation, and prostate segmentation indicate that our method outperforms the competing slice imputation methods on both computed tomography and magnetic resonance images volumes in most cases.

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.09098 [pdf, other]

TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

Authors: Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin, Junhai Xu, Lin Zhang, Yantao Ji, Jianwu Dang

Abstract: Speaker embedding is an important front-end module to explore discriminative speaker features for many speech applications where speaker information is needed. Current SOTA backbone networks for speaker embedding are designed to aggregate multi-scale features from an utterance with multi-branch network architectures for speaker representation. However, naively adding many branches of multi-scale features with the simple fully convolutional operation could not efficiently improve the performance due to the rapid increase of model parameters and computational complexity. Therefore, in the most current state-of-the-art network architectures, only a few branches corresponding to a limited number of temporal scales could be designed for speaker embeddings. To address this problem, in this paper, we propose an effective temporal multi-scale (TMS) model where multi-scale branches could be efficiently designed in a speaker embedding network almost without increasing computational costs. The new model is based on the conventional TDNN, where the network architecture is smartly separated into two modeling operators: a channel-modeling operator and a temporal multi-branch modeling operator. Adding temporal multi-scale in the temporal multi-branch operator needs only a little bit increase of the number of parameters, and thus save more computational budget for adding more branches with large temporal scales. Moreover, in the inference stage, we further developed a systemic re-parameterization method to convert the TMS-based model into a single-path-based topology in order to increase inference speed. We investigated the performance of the new TMS method for automatic speaker verification (ASV) on in-domain and out-of-domain conditions. Results show that the TMS-based model obtained a significant increase in the performance over the SOTA ASV models, meanwhile, had a faster inference speed.

Submitted 17 March, 2022; originally announced March 2022.

Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

arXiv:2202.09954 [pdf, other]

doi 10.1109/TCOMM.2022.3201931

Theoretical Analysis of Deep Neural Networks in Physical Layer Communication

Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

Abstract: Recently, deep neural network (DNN)-based physical layer communication techniques have attracted considerable interest. Although their potential to enhance communication systems and superb performance have been validated by simulation experiments, little attention has been paid to the theoretical analysis. Specifically, most studies in the physical layer have tended to focus on the application of DNN models to wireless communication problems but not to theoretically understand how does a DNN work in a communication system. In this paper, we aim to quantitatively analyze why DNNs can achieve comparable performance in the physical layer comparing with traditional techniques, and also drive their cost in terms of computational complexity. To achieve this goal, we first analyze the encoding performance of a DNN-based transmitter and compare it to a traditional one. And then, we theoretically analyze the performance of DNN-based estimator and compare it with traditional estimators. Third, we investigate and validate how information is flown in a DNN-based communication system under the information theoretic concepts. Our analysis develops a concise way to open the "black box" of DNNs in physical layer communication, which can be applied to support the design of DNN-based intelligent communication techniques and help to provide explainable performance assessment.

Submitted 26 August, 2022; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: 15 pages, 13 figures, has been accepted for publication in IEEE Transactions on Communications. arXiv admin note: substantial text overlap with arXiv:2106.01124

Journal ref: IEEE Transactions on Communications, 2022

arXiv:2202.00550 [pdf, other]

doi 10.36227/techrxiv.17707565

Record Capacity-Reach of C band IM/DD Optical Systems over Dispersion-Uncompensated Links

Authors: Haide Wang, Ji Zhou, Jinlong Wei, Wenxuan Mo, Yuanhua Feng, Weiping Liu, Changyuan Yu, Zhaohui Li

Abstract: We experimentally demonstrate a C band 100Gbit/s intensity modulation and direct detection entropy-loaded multi-rate Nyquist-subcarrier modulation signal over 100km dispersion-uncompensated link. A record capacity-reach of 10Tbit/s$\times$km is achieved.

Submitted 12 December, 2021; originally announced February 2022.

Comments: This paper is submitted to Conference on Lasers and Electro-Optics 2022

Journal ref: TechRxiv 2022

arXiv:2201.11866 [pdf, other]

Calibrating Histopathology Image Classifiers using Label Smoothing

Authors: Jerry Wei, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

Abstract: The classification of histopathology images fundamentally differs from traditional image classification tasks because histopathology images naturally exhibit a range of diagnostic features, resulting in a diverse range of annotator agreement levels. However, examples with high annotator disagreement are often either assigned the majority label or discarded entirely when training histopathology image classifiers. This widespread practice often yields classifiers that do not account for example difficulty and exhibit poor model calibration. In this paper, we ask: can we improve model calibration by endowing histopathology image classifiers with inductive biases about example difficulty? We propose several label smoothing methods that utilize per-image annotator agreement. Though our methods are simple, we find that they substantially improve model calibration, while maintaining (or even improving) accuracy. For colorectal polyp classification, a common yet challenging task in gastrointestinal pathology, we find that our proposed agreement-aware label smoothing methods reduce calibration error by almost 70%. Moreover, we find that using model confidence as a proxy for annotator agreement also improves calibration and accuracy, suggesting that datasets without multiple annotators can still benefit from our proposed label smoothing methods via our proposed confidence-aware label smoothing methods. Given the importance of calibration (especially in histopathology image analysis), the improvements from our proposed techniques merit further exploration and potential implementation in other histopathology image classification tasks.

Submitted 27 January, 2022; originally announced January 2022.

arXiv:2201.04809 [pdf, other]

Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks

Authors: Yuchong Yao, Xiaohui Wangr, Yuanbang Ma, Han Fang, Jiaying Wei, Liyuan Chen, Ali Anaissi, Ali Braytee

Abstract: Class imbalance occurs in many real-world applications, including image classification, where the number of images in each class differs significantly. With imbalanced data, the generative adversarial networks (GANs) leans to majority class samples. The two recent methods, Balancing GAN (BAGAN) and improved BAGAN (BAGAN-GP), are proposed as an augmentation tool to handle this problem and restore the balance to the data. The former pre-trains the autoencoder weights in an unsupervised manner. However, it is unstable when the images from different categories have similar features. The latter is improved based on BAGAN by facilitating supervised autoencoder training, but the pre-training is biased towards the majority classes. In this work, we propose a novel Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks (CAPGAN) as an augmentation tool to generate realistic synthetic images. In particular, we utilize a conditional convolutional variational autoencoder with supervised and balanced pre-training for the GAN initialization and training with gradient penalty. Our proposed method presents a superior performance of other state-of-the-art methods on the highly imbalanced version of MNIST, Fashion-MNIST, CIFAR-10, and two medical imaging datasets. Our method can synthesize high-quality minority samples in terms of Fréchet inception distance, structural similarity index measure and perceptual quality.

Submitted 13 January, 2022; originally announced January 2022.

arXiv:2110.13465 [pdf, other]

CS-Rep: Making Speaker Verification Networks Embracing Re-parameterization

Authors: Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Lin Zhang, Yantao Ji, Junhai Xu, Xugang Lu

Abstract: Automatic speaker verification (ASV) systems, which determine whether two speeches are from the same speaker, mainly focus on verification accuracy while ignoring inference speed. However, in real applications, both inference speed and verification accuracy are essential. This study proposes cross-sequential re-parameterization (CS-Rep), a novel topology re-parameterization strategy for multi-type networks, to increase the inference speed and verification accuracy of models. CS-Rep solves the problem that existing re-parameterization methods are unsuitable for typical ASV backbones. When a model applies CS-Rep, the training-period network utilizes a multi-branch topology to capture speaker information, whereas the inference-period model converts to a time-delay neural network (TDNN)-like plain backbone with stacked TDNN layers to achieve the fast inference speed. Based on CS-Rep, an improved TDNN with friendly test and deployment called Rep-TDNN is proposed. Compared with the state-of-the-art model ECAPA-TDNN, which is highly recognized in the industry, Rep-TDNN increases the actual inference speed by about 50% and reduces the EER by 10%. The code will be released.

Submitted 3 April, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted by ICASSP 2022

arXiv:2109.01235 [pdf, other]

DeepTracks: Geopositioning Maritime Vehicles in Video Acquired from a Moving Platform

Authors: Jianli Wei, Guanyu Xu, Alper Yilmaz

Abstract: Geopositioning and tracking a moving boat at sea is a very challenging problem, requiring boat detection, matching and estimating its GPS location from imagery with no common features. The problem can be stated as follows: given imagery from a camera mounted on a moving platform with known GPS location as the only valid sensor, we predict the geoposition of a target boat visible in images. Our solution uses recent ML algorithms, the camera-scene geometry and Bayesian filtering. The proposed pipeline first detects and tracks the target boat's location in the image with the strategy of tracking by detection. This image location is then converted to geoposition to the local sea coordinates referenced to the camera GPS location using plane projective geometry. Finally, target boat local coordinates are transformed to global GPS coordinates to estimate the geoposition. To achieve a smooth geotrajectory, we apply unscented Kalman filter (UKF) which implicitly overcomes small detection errors in the early stages of the pipeline. We tested the performance of our approach using GPS ground truth and show the accuracy and speed of the estimated geopositions. Our code is publicly available at https://github.com/JianliWei1995/AI-Track-at-Sea.

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2108.06464 [pdf]

doi 10.1109/TCSVT.2021.3104575

4-D Epanechnikov Mixture Regression in Light Field Image Compression

Authors: Boning Liu, Yan Zhao, Xiaomeng Jiang, Shigang Wang, Jian Wei

Abstract: With the emergence of light field imaging in recent years, the compression of its elementary image array (EIA) has become a significant problem. Our coding framework includes modeling and reconstruction. For the modeling, the covariance-matrix form of the 4-D Epanechnikov kernel (4-D EK) and its correlated statistics were deduced to obtain the 4-D Epanechnikov mixture models (4-D EMMs). A 4-D Epanechnikov mixture regression (4-D EMR) was proposed based on this 4-D EK, and a 4-D adaptive model selection (4-D AMLS) algorithm was designed to realize the optimal modeling for a pseudo video sequence (PVS) of the extracted key-EIA. A linear function based reconstruction (LFBR) was proposed based on the correlation between adjacent elementary images (EIs). The decoded images realized a clear outline reconstruction and superior coding efficiency compared to high-efficiency video coding (HEVC) and JPEG 2000 below approximately 0.05 bpp. This work realized an unprecedented theoretical application by (1) proposing the 4-D Epanechnikov kernel theory, (2) exploiting the 4-D Epanechnikov mixture regression and its application in the modeling of the pseudo video sequence of light field images, (3) using 4-D adaptive model selection for the optimal number of models, and (4) employing a linear function-based reconstruction according to the content similarity.

Submitted 14 August, 2021; originally announced August 2021.

Comments: 16 pages, 17 figures, IEEE Transactions on Circuits and Systems for Video Technology ( Early Access )

arXiv:2107.11792 [pdf, other]

doi 10.1109/JLT.2021.3131603

Multi-Rate Nyquist-SCM for C-Band 100Gbit/s Signal over 50km Dispersion-Uncompensated Link

Authors: Haide Wang, Ji Zhou, Jinlong Wei, Dong Guo, Yuanhua Feng, Weiping Liu, Changyuan Yu, Dawei Wang, Zhaohui Li

Abstract: In this paper, to the best of our knowledge, we propose the first multi-rate Nyquist-subcarriers modulation (SCM) for C-band 100Gbit/s signal transmission over 50km dispersion-uncompensated link. Chromatic dispersion (CD) introduces severe spectral nulls on optical double-sideband signal, which greatly degrades the performance of intensity-modulation and direct-detection systems. Based on the prior knowledge of the dispersive channel, Nyquist-SCM with multi-rate subcarriers is proposed to keep away from the CD-caused spectral nulls flexibly. Signal on each subcarrier can be individually recovered by a digital signal processing, including the feed-forward equalizer with no more than 31 taps, a two-tap post filter, and maximum likelihood sequence estimation with one memory length. Combining with entropy loading based on probabilistic constellation shaping to maximize the capacity-reach, the C-band 100Gbit/s multi-rate Nyquist-SCM signal over 50km dispersion-uncompensated link can achieve 7% hard-decision forward error correction limit and average normalized generalized mutual information of 0.967 at received optical power of -4dBm and optical signal-to-noise ratio of 47.67dB. In conclusion, the multi-rate Nyquist-SCM shows great potentials in solving the CD-caused spectral distortions.

Submitted 28 November, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

Comments: This paper has been accepted by Journal of Lightwave Technology

arXiv:2107.06712 [pdf, other]

doi 10.1109/TCOMM.2021.3095198

A Low Complexity Learning-based Channel Estimation for OFDM Systems with Online Training

Authors: Kai Mei, Jun Liu, Xiaoying Zhang, Kuo Cao, Nandana Rajatheva, Jibo Wei

Abstract: In this paper, we devise a highly efficient machine learning-based channel estimation for orthogonal frequency division multiplexing (OFDM) systems, in which the training of the estimator is performed online. A simple learning module is employed for the proposed learning-based estimator. The training process is thus much faster and the required training data is reduced significantly. Besides, a training data construction approach utilizing least square (LS) estimation results is proposed so that the training data can be collected during the data transmission. The feasibility of this novel construction approach is verified by theoretical analysis and simulations. Based on this construction approach, two alternative training data generation schemes are proposed. One scheme transmits additional block pilot symbols to create training data, while the other scheme adopts a decision-directed method and does not require extra pilot overhead. Simulation results show the robustness of the proposed channel estimation method. Furthermore, the proposed method shows better adaptation to practical imperfections compared with the conventional minimum mean-square error (MMSE) channel estimation. It outperforms the existing machine learning-based channel estimation techniques under varying channel conditions.

Submitted 14 July, 2021; originally announced July 2021.

Comments: 12 pages, 12 figures. To appear in IEEE Transactions on Communications

arXiv:2106.12850 [pdf]

Transform-Based Feature Map Compression for CNN Inference

Authors: Yubo Shi, Meiqi Wang, Siyi Chen, Jinghe Wei, Zhongfeng Wang

Abstract: To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) are designed recently. However, the large memory access of deep CNNs will lead to high power consumption. A variety of hardware-friendly compression methods have been proposed to reduce the data transfer bandwidth by exploiting the sparsity of feature maps. Most of them focus on designing a specialized encoding format to increase the compression ratio. Differently, we observe and exploit the sparsity distinction between activations in earlier and later layers to improve the compression ratio. We propose a novel hardware-friendly transform-based method named 1D-Discrete Cosine Transform on Channel dimension with Masks (DCT-CM), which intelligently combines DCT, masks, and a coding format to compress activations. The proposed algorithm achieves an average compression ratio of 2.9x (53% higher than the state-of-the-art transform-based feature map compression works) during inference on ResNet-50 with an 8-bit quantization scheme.

Submitted 24 June, 2021; originally announced June 2021.

Comments: Accepted by IEEE International Symposium on Circuits and Systems(ISCAS) 2021

arXiv:2106.01124 [pdf, other]

Opening the Black Box of Deep Neural Networks in Physical Layer Communication

Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

Abstract: Deep Neural Network (DNN)-based physical layer techniques are attracting considerable interest due to their potential to enhance communication systems. However, most studies in the physical layer have tended to focus on the application of DNN models to wireless communication problems but not to theoretically understand how does a DNN work in a communication system. In this paper, we aim to quantitatively analyze why DNNs can achieve comparable performance in the physical layer comparing with traditional techniques and their cost in terms of computational complexity. We further investigate and also experimentally validate how information is flown in a DNN-based communication system under the information theoretic concepts.

Submitted 18 February, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: 6 pages, 5 figures, to be presented in the IEEE Wireless Communications and Networking Conference (WCNC) 2022 Workshop on Machine Learning for Communications: Future Large Scale MIMO and AI-Native Air-Interface

arXiv:2105.14758 [pdf, other]

Low-Dose CT Denoising Using a Structure-Preserving Kernel Prediction Network

Authors: Lu Xu, Yuwei Zhang, Ying Liu, Daoye Wang, Mu Zhou, Jimmy Ren, Jingwei Wei, Zhaoxiang Ye

Abstract: Low-dose CT has been a key diagnostic imaging modality to reduce the potential risk of radiation overdose to patient health. Despite recent advances, CNN-based approaches typically apply filters in a spatially invariant way and adopt similar pixel-level losses, which treat all regions of the CT image equally and can be inefficient when fine-grained structures coexist with non-uniformly distributed noises. To address this issue, we propose a Structure-preserving Kernel Prediction Network (StructKPN) that combines the kernel prediction network with a structure-aware loss function that utilizes the pixel gradient statistics and guides the model towards spatially-variant filters that enhance noise removal, prevent over-smoothing and preserve detailed structures for different regions in CT imaging. Extensive experiments demonstrated that our approach achieved superior performance on both synthetic and non-synthetic datasets, and better preserves structures that are highly desired in clinical screening and low-dose protocol optimization.

Submitted 23 July, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: ICIP2021

arXiv:2105.08993 [pdf, other]

TarGAN: Target-Aware Generative Adversarial Networks for Multi-modality Medical Image Translation

Authors: Junxiao Chen, Jia Wei, Rui Li

Abstract: Paired multi-modality medical images, can provide complementary information to help physicians make more reasonable decisions than single modality medical images. But they are difficult to generate due to multiple factors in practice (e.g., time, cost, radiation dose). To address these problems, multi-modality medical image translation has aroused increasing research interest recently. However, the existing works mainly focus on translation effect of a whole image instead of a critical target area or Region of Interest (ROI), e.g., organ and so on. This leads to poor-quality translation of the localized target area which becomes blurry, deformed or even with extra unreasonable textures. In this paper, we propose a novel target-aware generative adversarial network called TarGAN, which is a generic multi-modality medical image translation model capable of (1) learning multi-modality medical image translation without relying on paired data, (2) enhancing quality of target area generation with the help of target area labels. The generator of TarGAN jointly learns mapping at two levels simultaneously - whole image translation mapping and target area translation mapping. These two mappings are interrelated through a proposed crossing loss. The experiments on both quantitative measures and qualitative evaluations demonstrate that TarGAN outperforms the state-of-the-art methods in all cases. Subsequent segmentation task is conducted to demonstrate effectiveness of synthetic images generated by TarGAN in a real-world application. Our code is available at https://github.com/2165998/TarGAN.

Submitted 19 May, 2021; originally announced May 2021.

Comments: 10 pages, 3 figures. It has been provisionally accepted for MICCAI 2021

arXiv:2105.04335 [pdf, other]

Geometrical Characterization of Sensor Placement for Cone-Invariant and Multi-Agent Systems against Undetectable Zero-Dynamics Attacks

Authors: Jianqi Chen, Jieqiang Wei, Wei Chen, Henrik Sandberg, Karl H. Johansson, Jie Chen

Abstract: Undetectable attacks are an important class of malicious attacks threatening the security of cyber-physical systems, which can modify a system's state but leave the system output measurements unaffected, and hence cannot be detected from the output. This paper studies undetectable attacks on cone-invariant systems and multi-agent systems. We first provide a general characterization of zero-dynamics attacks, which characterizes fully undetectable attacks targeting the non-minimum phase zeros of a system. This geometrical characterization makes it possible to develop a defense strategy seeking to place a minimal number of sensors to detect and counter the zero-dynamics attacks on the system's actuators. The detect and defense scheme amounts to computing a set containing potentially vulnerable actuator locations and nodes, and a defense union for feasible placement of sensors based on the geometrical properties of the cones under consideration.

Submitted 10 May, 2021; originally announced May 2021.

Comments: 8 figures

arXiv:2104.06588 [pdf, other]

OneVision: Centralized to Distributed Controller Synthesis with Delay Compensation

Authors: Jiayi Wei, Tongrui Li, Swarat Chaudhuri, Isil Dillig, Joydeep Biswas

Abstract: We propose a new algorithm to simplify the controller development for distributed robotic systems subject to external observations, disturbances, and communication delays. Unlike prior approaches that propose specialized solutions to handling communication latency for specific robotic applications, our algorithm uses an arbitrary centralized controller as the specification and automatically generates distributed controllers with communication management and delay compensation. We formulate our goal as nonlinear optimal control -- using a regret minimizing objective that measures how much the distributed agents behave differently from the delay-free centralized response -- and solve for optimal actions w.r.t. local estimations of this objective using gradient-based optimization. We analyze our proposed algorithm's behavior under a linear time-invariant special case and prove that the closed-loop dynamics satisfy a form of input-to-state stability w.r.t. unexpected disturbances and observations. Our experimental results on both simulated and real-world robotic tasks demonstrate the practical usefulness of our approach and show significant improvement over several baseline approaches.

Submitted 13 April, 2021; originally announced April 2021.

Comments: 8 pages, 4 figures

arXiv:2104.05463 [pdf, other]

Scalable Power Control/Beamforming in Heterogeneous Wireless Networks with Graph Neural Networks

Authors: Xiaochen Zhang, Haitao Zhao, Jun Xiong, Li Zhou, Jibo Wei

Abstract: Machine learning (ML) has been widely used for efficient resource allocation (RA) in wireless networks. Although superb performance is achieved on small and simple networks, most existing ML-based approaches are confronted with difficulties when heterogeneity occurs and network size expands. In this paper, specifically focusing on power control/beamforming (PC/BF) in heterogeneous device-to-device (D2D) networks, we propose a novel unsupervised learning-based framework named heterogeneous interference graph neural network (HIGNN) to handle these challenges. First, we characterize diversified link features and interference relations with heterogeneous graphs. Then, HIGNN is proposed to empower each link to obtain its individual transmission scheme after limited information exchange with neighboring links. It is noteworthy that HIGNN is scalable to wireless networks of growing sizes with robust performance after being trained on small-sized networks. Numerical results show that compared with state-of-the-art benchmarks, HIGNN achieves much higher execution efficiency while providing strong performance.

Submitted 8 December, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: 6 pages, 6 figures, accepted by IEEE GLOBECOM 2021. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2101.12355 [pdf, other]

A Petri Dish for Histopathology Image Analysis

Authors: Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

Abstract: With the rise of deep learning, there has been increased interest in using neural networks for histopathology image analysis, a field that investigates the properties of biopsy or resected specimens traditionally manually examined under a microscope by pathologists. However, challenges such as limited data, costly annotation, and processing high-resolution and variable-size images make it difficult to quickly iterate over model designs. Throughout scientific history, many significant research directions have leveraged small-scale experimental setups as petri dishes to efficiently evaluate exploratory ideas. In this paper, we introduce a minimalist histopathology image analysis dataset (MHIST), an analogous petri dish for histopathology image analysis. MHIST is a binary classification dataset of 3,152 fixed-size images of colorectal polyps, each with a gold-standard label determined by the majority vote of seven board-certified gastrointestinal pathologists and annotator agreement level. MHIST occupies less than 400 MB of disk space, and a ResNet-18 baseline can be trained to convergence on MHIST in just 6 minutes using 3.5 GB of memory on an NVIDIA RTX 3090. As example use cases, we use MHIST to study natural questions such as how dataset size, network depth, transfer learning, and high-disagreement examples affect model performance. By introducing MHIST, we hope to not only help facilitate the work of current histopathology imaging researchers, but also make the field more accessible to the general community. Our dataset is available at https://bmirds.github.io/MHIST.

Submitted 27 March, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

Comments: In proceedings of Artificial Intelligence in Medicine (AIME) 2021

arXiv:2101.03548 [pdf, ps, other]

Channel Modeling and Signal Processing for Array-based Visible Light Communication System in Misalignment

Authors: Jiaqi Wei, Chen Gong, Nuo Huang, Zhengyuan Xu

Abstract: This paper proposes an indoor visible light communication (VLC) system with multiple transmitters and receivers. Due to the diffusivity of LED light beams, photodiodes receive signals from many directions. We use one concave and one convex lens as an optical antenna, and obtain the optimal lens structure by optimization, which corresponds to the minimum condition number of the channel gain matrix. In this way, the light emitted by different LEDs can be separated well from each other, minimizing signal interference. However, interference increases in the case of system deviation, so we explore the system mobility. Then subsequent signal processing is carried out, including signal combining and successive interference cancellation (SIC). We combine the same signal received by different receivers to improve the signal-to-interference-plus-noise ratio (SINR). And SIC can effectively restore interference and eliminate its impact. The simulation results show that channel capacity can be increased by more than 5 times and up to 20 times under the condition of receiver and transmitter alignment. In the case of movement, channel capacity can also be increased by about 4 times on average. Moreover, the mobile range of the system is also significantly expanded.

Submitted 10 January, 2021; originally announced January 2021.

arXiv:2011.04247 [pdf, other]

Numerology Selection for OFDM Systems Based on Deep Neural Networks

Authors: Xiaoran Liu, Jiao Zhang, Jibo Wei

Abstract: In order to support diverse scenarios and deployments, the numerology of orthogonal frequency division multiplexing (OFDM) is defined for the parametrization of subcarrier spacing and cyclic prefix (CP). The time-frequency dispersion of mobile radio channels and the channel noise result in different performance deterioration in different numerologies. In this letter, we propose a deep neural network (DNN) approach for numerology selection of OFDM systems. Considering the inter-symbol interference (ISI), inter-carrier interference (ICI) and noise level, the SNR loss is established as the objective to be minimized. We extract the power delay profile, mobile velocity and noise power as the input features to the DNN. The proposed DNN learns from the channel characteristics to obtain the optimal numerology selection. Simulation results show that the proposed DNN achieves better performance than the existing methods. The decision boundaries of different numerologies are also illustrated to show the application range according to the channel characteristics.

Submitted 9 November, 2020; originally announced November 2020.

Comments: 4 pages, 3 figures

arXiv:2010.15560 [pdf, other]

Genetic U-Net: Automatically Designed Deep Networks for Retinal Vessel Segmentation Using a Genetic Algorithm

Authors: Jiahong Wei, Zhun Fan

Abstract: Recently, many methods based on hand-designed convolutional neural networks (CNNs) have achieved promising results in automatic retinal vessel segmentation. However, these CNNs remain constrained in capturing retinal vessels in complex fundus images. To improve their segmentation performance, these CNNs tend to have many parameters, which may lead to overfitting and high computational complexity. Moreover, the manual design of competitive CNNs is time-consuming and requires extensive empirical knowledge. Herein, a novel automated design method, called Genetic U-Net, is proposed to generate a U-shaped CNN that can achieve better retinal vessel segmentation but with fewer architecture-based parameters, thereby addressing the above issues. First, we devised a condensed but flexible search space based on a U-shaped encoder-decoder. Then, we used an improved genetic algorithm to identify better-performing architectures in the search space and investigated the possibility of finding a superior network architecture with fewer parameters. The experimental results show that the architecture obtained using the proposed method offered a superior performance with less than 1% of the number of the original U-Net parameters in particular and with significantly fewer parameters than other state-of-the-art models. Furthermore, through in-depth investigation of the experimental results, several effective operations and patterns of networks to generate superior retinal vessel segmentations were identified.

Submitted 11 June, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

arXiv:2009.13913 [pdf]

Denoising convolutional neural networks for photoacoustic microscopy

Authors: Xianlin Song, Kanggao Tang, Jianshuang Wei, Lingfang Song

Abstract: Photoacoustic imaging is a new imaging technology in recent years, which combines the advantages of high resolution and rich contrast of optical imaging with the advantages of high penetration depth of acoustic imaging. Photoacoustic imaging has been widely used in biomedical fields, such as brain imaging, tumor detection and so on. The signal-to-noise ratio (SNR) of image signals in photoacoustic imaging is generally low due to the limitation of laser pulse energy, electromagnetic interference in the external environment and system noise. In order to solve the problem of low SNR of photoacoustic images, we use a feedforward denoising convolutional neural network to further process the obtained images, so as to obtain higher SNR images and improve image quality. We use the Python language, manage the referenced external Python libraries through Anaconda, and build a feedforward denoising convolutional neural network on the PyCharm platform. We first processed and segmented a training set containing 400 images, and then used it for network training. Finally, we tested it with a series of cerebrovascular photoacoustic microscopy images. The results show that the peak signal-to-noise ratio (PSNR) of the image increases significantly after denoising. The experimental results verify that the feedforward denoising convolutional neural network can effectively improve the quality of photoacoustic microscopic images, which provides a good foundation for subsequent biomedical research.

Submitted 29 September, 2020; originally announced September 2020.

arXiv:2009.10585 [pdf]

doi 10.1364/ACPC.2016.AS1B.3

Practical Solutions for 400 Gbit/s Data Center Transmission

Authors: Annika Dochhan, Nicklas Eiselt, Jinlong Wei, Helmut Griesser, Michael Eiselt, Juan José Vegas Olmos, Idelfonso Tafur Monroy, Jörg-Peter Elbers

Abstract: We review three solutions for low-cost data center interconnects with a target reach of up to 80 km. Directly detected DMT, PAM-4 and multi-band CAP are promising modulation schemes, enabling 400 Gbit/s by combining eight channels of 56 Gbit/s.

Submitted 22 September, 2020; originally announced September 2020.

Comments: The work has been partially funded by the European Union Marie Curie project ABACUS and CEEOLAN and by the German ministry of education and research (BMBF) in project SpeeD under contract 13N1374

Journal ref: Asia Communications and Photonics Conference (ACP) 2016

arXiv:2009.10070 [pdf]

Photoacoustic microscopical simulation platform for large volumetric imaging using Bessel beam

Authors: Xianlin Song, Jianshuang Wei, Lingfang Song

Abstract: We developed a Bessel-beam photoacoustic microscopical simulation platform by using the k-Wave: MATLAB toolbox. The simulation platform uses the ring slit method to generate a Bessel beam. By controlling the inner and outer radius of the ring slit, the depth-of-field (DoF) of the Bessel beam can be controlled. And the large volumetric image is obtained by point scanning. Simulation experiments on blood vessels were carried out to demonstrate the feasibility of the simulation platform. This simulation work can be used as an auxiliary tool for the research of Bessel-beam photoacoustic microscopy.

Submitted 19 September, 2020; originally announced September 2020.

arXiv:2009.03689 [pdf]

Synthetic multi-focus optical-resolution photoacoustic microscope for large volumetric imaging

Authors: Xianlin Song, Jianshuang Wei, Lingfang Song

Abstract: Photoacoustic microscopy is becoming an important tool for biomedical research. It has been widely used in biological research, such as structural imaging of vasculature, brain structural and functional imaging, and tumor detection. The conventional optical-resolution photoacoustic microscopy (OR-PAM) employs a focused Gaussian beam to achieve high lateral resolution by a microscope objective with high numerical aperture. Since the focused Gaussian beam only has a narrow depth range in focus, little detail in the depth direction can be revealed. Here, we developed a synthetic multi-focus optical-resolution photoacoustic microscope using multi-scale weighted gradient-based fusion. Based on the saliency of the image structure, a gradient-based multi-focus image fusion method is used, and a multi-scale method is used to determine the gradient weights. We pay special attention to a dual-scale scheme, which effectively solves the fusion problem caused by anisotropic blur and registration error. First, the structure-based large-scale focus measurement method is used to reduce the effect of anisotropic blur and registration error on the detection of the focus area, and then the gradient weights near the edge wave are used by applying the small-scale focus measure. Simulation was performed to test the performance of our method, and different focused images were used to verify the feasibility of the method. Performance of our method was analyzed by calculating Entropy, Mean Square Error (MSE) and Edge strength. The result of simulation showed that this method can extend the depth of field of PAM two times without the sacrifice of lateral resolution. And the in vivo imaging of the zebra fish further demonstrates the feasibility of our method.

Submitted 7 September, 2020; originally announced September 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2009.01840

arXiv:2009.01840 [pdf]

Computed extended depth of field optical-resolution photoacoustic microscope

Authors: Xianlin Song, Jianshuang Wei, Lingfang Song

Abstract: Photoacoustic microscopy with large depth of focus is significant to biomedical research. The conventional optical-resolution photoacoustic microscope (OR-PAM) suffers from limited depth of field (DoF): since the employed focused Gaussian beam only has a narrow depth range in focus, little detail in the depth direction can be revealed. Here, we developed a computed extended depth of field method for the photoacoustic microscope by using wavelet transform image fusion rules. Wavelet transform is performed on the max amplitude projection (MAP) images acquired at different axial positions by OR-PAM to separate the low and high frequencies. The fused low-frequency coefficients are obtained by averaging the low-frequency coefficients of the images. A maximum selection rule is used for the high-frequency coefficients: the wavelet coefficients of the MAP images are compared, and the maximum-value coefficient is taken as the fused high-frequency coefficient. Finally, the inverse wavelet transform is performed to achieve large DoF. Simulation was performed to demonstrate that this method can extend the depth of field of PAM two times without the sacrifice of lateral resolution. And the in vivo imaging of the mouse cerebral vasculature with intact skull further demonstrates the feasibility of our method.

Submitted 3 September, 2020; originally announced September 2020.

arXiv:2007.09248 [pdf, other]

doi 10.1109/TCCN.2021.3118465

Fine Timing and Frequency Synchronization for MIMO-OFDM: An Extreme Learning Approach

Authors: Jun Liu, Kai Mei, Xiaochen Zhang, Des McLernon, Dongtang Ma, Jibo Wei, Syed Ali Raza Zaidi

Abstract: Multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) is a key technology component in the evolution towards cognitive radio (CR) in next-generation communication in which the accuracy of timing and frequency synchronization significantly impacts the overall system performance. In this paper, we propose a novel scheme leveraging extreme learning machine (ELM) to achieve high-precision synchronization. Specifically, exploiting the preamble signals with synchronization offsets, two ELMs are incorporated into a traditional MIMO-OFDM system to estimate both the residual symbol timing offset (RSTO) and the residual carrier frequency offset (RCFO). The simulation results show that the performance of the proposed ELM-based synchronization scheme is superior to the traditional method under both additive white Gaussian noise (AWGN) and frequency selective fading channels. Furthermore, compared with the existing machine learning based techniques, the proposed method shows outstanding performance without the requirement of perfect channel state information (CSI) and prohibitive computational complexity. Finally, the proposed method is robust in terms of the choice of channel parameters (e.g., number of paths) and also in terms of "generalization ability" from a machine learning standpoint.

Submitted 1 June, 2022; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: 13 pages, 12 figures, has been accepted for publication in IEEE Transactions on Cognitive Communications and Networking

Journal ref: IEEE Transactions on Cognitive Communications and Networking, 2021

arXiv:2007.00477 [pdf]

doi 10.3390/ma13132960

Automatic Crack Detection on Road Pavements Using Encoder Decoder Architecture

Authors: Zhun Fan, Chong Li, Ying Chen, Jiahong Wei, Giuseppe Loprencipe, Xiaopeng Chen, Paola Di Mascio

Abstract: Inspired by the development of deep learning in computer vision and object detection, the proposed algorithm considers an encoder-decoder architecture with hierarchical feature learning and dilated convolution, named U-Hierarchical Dilated Network (U-HDN), to perform crack detection in an end-to-end method. Crack characteristics with multiple context information are learned automatically to perform end-to-end crack detection. Then, a multi-dilation module embedded in an encoder-decoder architecture is proposed. The crack features of multiple context sizes can be integrated into the multi-dilation module by dilated convolution with different dilation rates, which can obtain much more crack information. Finally, the hierarchical feature learning module is designed to obtain multi-scale features from the high to low-level convolutional layers, which are integrated to predict pixel-wise crack detection. Some experiments on public crack databases using 118 images were performed and the results were compared with those obtained with other methods on the same images. The results show that the proposed U-HDN method achieves high performance because, compared with other algorithms, it can extract and fuse different context sizes and different levels of feature maps.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2005.14453 [pdf]

Complexity Reduction of Volterra Nonlinear Equalization for Optical Short-Reach IM/DD Systems

Authors: Tom Wettlin, Talha Rahman, Jinlong Wei, Stefano Calabrò, Nebojsa Stojanovic, Stephan Pachnicke

Abstract: We investigate approaches to reduce the computational complexity of Volterra nonlinear equalizers (VNLEs) for short-reach optical transmission systems using intensity modulation and direct detection (IM/DD). In this contribution we focus on a structural reduction of the number of kernels, i.e. we define rules to decide which terms need to be implemented and which can be neglected before the kernels are calculated. This static complexity reduction is to be distinguished from other approaches like pruning or L1 regularization, which are applied after the adaptation of the full Volterra equalizer, e.g. by thresholding. We investigate the impact of the complexity reduction on 90 GBd PAM6 IM/DD experimental data acquired in a back-to-back setup as well as in case of transmission over 1 km SSMF. First, we show that the third-order VNLE terms have a significant impact on the overall performance of the system and that a high number of coefficients is necessary for optimal performance. Afterwards, we show that restrictions, for example on the tap spacing among samples participating in the same kernel, can lead to an improved tradeoff between performance and complexity compared to a full third-order VNLE. We show an example in which the number of third-order kernels is halved without any appreciable performance degradation.

Submitted 29 May, 2020; originally announced May 2020.

Journal ref: in Proc. 21th ITG-Symposium on Photonic Networks, pp. 65-70, Nov. 2020

arXiv:2005.06101 [pdf]

A Cyber Physical System Framework for UAV Communications

Authors: Haijun Wang, Haitao Zhao, Dongtang Ma, Jibo Wei

Abstract: Diverse applications have witnessed the prevalence of unmanned aerial vehicles (UAVs) due to their agility and versatility. Compared with computation and control, the communication tends to be the bottleneck of the whole UAV system. Cyber physical system (CPS), which achieves the integration of the cyber and physical domains, can inspire us to deal with the communication problems through a cross-disciplinary method. To this end, we first expound the coupling effects of computation and control to communication. Then, we propose a novel CPS framework for UAV communications. By extending the dimension of communication decisions to computation and control, the framework can precisely orient and settle the communication issues. Further, a quantitative energy optimization model is established to guide the protocol and algorithm design for UAV communications. Case simulation results validate the CPS framework in terms of the energy consumption and communication delay.

Submitted 12 May, 2020; originally announced May 2020.

Comments: 7 pages, 5 figures, 1 table, 15 references

arXiv:2004.08555 [pdf, other]

doi 10.1109/TITS.2021.3054840

Deep Learning on Traffic Prediction: Methods, Analysis and Future Directions

Authors: Xueyan Yin, Genze Wu, Jinze Wei, Yanming Shen, Heng Qi, Baocai Yin

Abstract: Traffic prediction plays an essential role in intelligent transportation systems. Accurate traffic prediction can assist route planning, guide vehicle dispatching, and mitigate traffic congestion. This problem is challenging due to the complicated and dynamic spatio-temporal dependencies between different regions in the road network. Recently, a significant amount of research effort has been devoted to this area, especially to deep learning methods, greatly advancing traffic prediction abilities. The purpose of this paper is to provide a comprehensive survey on deep learning-based approaches in traffic prediction from multiple perspectives. Specifically, we first summarize the existing traffic prediction methods, and give a taxonomy. Second, we list the state-of-the-art approaches in different traffic prediction applications. Third, we comprehensively collect and organize widely used public datasets in the existing literature to facilitate other researchers. Furthermore, we give an evaluation and analysis by conducting extensive experiments to compare the performance of different methods on a real-world public dataset. Finally, we discuss open challenges in this field.

Submitted 18 March, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

Comments: to be published in IEEE Transactions on Intelligent Transportation Systems

arXiv:2001.11698 [pdf, other]

doi 10.3233/FAIA200314

Inter-slice image augmentation based on frame interpolation for boosting medical image segmentation accuracy

Authors: Zhaotao Wu, Jia Wei, Wenguang Yuan, Jiabing Wang, Tolga Tasdizen

Abstract: We introduce the idea of inter-slice image augmentation whereby the numbers of the medical images and the corresponding segmentation labels are increased between two consecutive images in order to boost medical image segmentation accuracy. Unlike conventional data augmentation methods in medical imaging, which only increase the number of training samples directly by adding new virtual samples using simple parameterized transformations such as rotation, flipping, scaling, etc., we aim to augment data based on the relationship between two consecutive images, which increases not only the number but also the information of training samples. For this purpose, we propose a frame-interpolation-based data augmentation method to generate intermediate medical images and the corresponding segmentation labels between two consecutive images. We train and test a supervised U-Net liver segmentation network on SLIVER07 and CHAOS2019, respectively, with the augmented training samples, and obtain segmentation scores exhibiting significant improvement compared to the conventional augmentation methods.

Submitted 31 January, 2020; originally announced January 2020.

arXiv:2001.06678 [pdf]

Evolutionary Neural Architecture Search for Retinal Vessel Segmentation

Authors: Zhun Fan, Jiahong Wei, Guijie Zhu, Jiajie Mo, Wenji Li

Abstract: The accurate retinal vessel segmentation (RVS) is of great significance to assist doctors in the diagnosis of ophthalmology diseases and other systemic diseases. Manually designing a valid neural network architecture for retinal vessel segmentation requires high expertise and a large workload. In order to improve the performance of vessel segmentation and reduce the workload of manually designing neural network, we propose a novel approach which applies neural architecture search (NAS) to optimize an encoder-decoder architecture for retinal vessel segmentation. A modified evolutionary algorithm is used to evolve the architectures of encoder-decoder framework with limited computing resources. The evolved model obtained by the proposed approach achieves top performance among all compared methods on the three datasets, namely DRIVE, STARE and CHASE_DB1, but with much fewer parameters. Moreover, the results of cross-training show that the evolved model is with considerable scalability, which indicates a great potential for clinical disease diagnosis.

Submitted 18 March, 2020; v1 submitted 18 January, 2020; originally announced January 2020.

Showing 1–50 of 68 results for author: Wei, J