-
Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation
Authors:
Hongxu Jiang,
Muhammad Imran,
Linhai Ma,
Teng Zhang,
Yuyin Zhou,
Muxuan Liang,
Kuang Gong,
Wei Shao
Abstract:
Denoising diffusion probabilistic models (DDPMs) have achieved unprecedented success in computer vision. However, they remain underutilized in medical imaging, a field crucial for disease diagnosis and treatment planning. This is primarily due to the high computational cost associated with (1) the use of a large number of time steps (e.g., 1,000) in diffusion processes and (2) the increased dimensionality of medical images, which are often 3D or 4D. Training a diffusion model on medical images typically takes days to weeks, while sampling each image volume takes minutes to hours. To address this challenge, we introduce Fast-DDPM, a simple yet effective approach capable of improving training speed, sampling speed, and generation quality simultaneously. Unlike DDPM, which trains the image denoiser across 1,000 time steps, Fast-DDPM trains and samples using only 10 time steps. The key to our method lies in aligning the training and sampling procedures to optimize time-step utilization. Specifically, we introduced two efficient noise schedulers with 10 time steps: one with uniform time step sampling and another with non-uniform sampling. We evaluated Fast-DDPM across three medical image-to-image generation tasks: multi-image super-resolution, image denoising, and image-to-image translation. Fast-DDPM outperformed DDPM and current state-of-the-art methods based on convolutional networks and generative adversarial networks in all tasks. Additionally, Fast-DDPM reduced the training time to 0.2x and the sampling time to 0.01x compared to DDPM. Our code is publicly available at: https://github.com/mirthAI/Fast-DDPM.
Submitted 23 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
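As an illustration of the uniform time-step sampling mentioned in the Fast-DDPM abstract above, the following minimal Python sketch picks 10 evenly spaced time steps out of 1,000 and reads off the cumulative noise level at each one. The linear beta schedule, its endpoints, and all function names are assumptions made for this sketch; they are not taken from the paper or its repository, whose schedulers may be defined differently.

```python
def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption of this sketch, not the paper's scheduler)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t), one value per time step."""
    values, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        values.append(product)
    return values


def uniform_time_steps(total_steps=1000, num_selected=10):
    """Evenly spaced 1-based time steps, ending at total_steps."""
    stride = total_steps // num_selected
    return [stride * (i + 1) for i in range(num_selected)]


time_steps = uniform_time_steps()
full_alpha_bars = alpha_bars(linear_betas())
# Noise levels at the 10 selected steps; the same subset would be used
# for both training and sampling so that the two procedures stay aligned.
selected_alpha_bars = [full_alpha_bars[t - 1] for t in time_steps]
```

A non-uniform variant would only change `uniform_time_steps`, for example by placing more of the 10 steps near one end of the range.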
-
Electromagnetic Information Theory for Holographic MIMO Communications
Authors:
Li Wei,
Tierui Gong,
Chongwen Huang,
Zhaoyang Zhang,
Wei E. I. Sha,
Zhi Ning Chen,
Linglong Dai,
Merouane Debbah,
Chau Yuen
Abstract:
Holographic multiple-input multiple-output (HMIMO) utilizes a compact antenna array to form a nearly continuous aperture, thereby enabling higher capacity and more flexible configurations compared with conventional MIMO systems, making it attractive in current scientific research. Key questions naturally arise regarding the potential of HMIMO to surpass Shannon's theoretical limits and how far its capabilities can be extended. However, traditional Shannon information theory falls short in addressing these inquiries because it only focuses on the information itself while neglecting the underlying carrier, electromagnetic (EM) waves, and environmental interactions. To bridge the gap between theoretical analysis and practical application for HMIMO systems, we introduce electromagnetic information theory (EIT) in this paper. This paper begins by laying the foundation for HMIMO-oriented EIT, encompassing EM wave equations and communication regions. In the context of HMIMO systems, the resultant physical limitations are presented, involving Chu's limit, Harrington's limit, Hannan's limit, and the evaluation of coupling effects. Field sampling and HMIMO-assisted oversampling are also discussed to guide the optimal HMIMO design within the EIT framework. To comprehensively depict the EM-compliant propagation process, we present the approximate and exact channel modeling approaches in near-/far-field zones. Furthermore, we discuss both traditional Shannon information theory, which employs the probabilistic method, and Kolmogorov information theory, which utilizes functional analysis, for HMIMO-oriented EIT systems.
Submitted 25 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
A Flexible 2.5D Medical Image Segmentation Approach with In-Slice and Cross-Slice Attention
Authors:
Amarjeet Kumar,
Hongxu Jiang,
Muhammad Imran,
Cyndi Valdes,
Gabriela Leon,
Dahyun Kang,
Parvathi Nataraj,
Yuyin Zhou,
Michael D. Weiss,
Wei Shao
Abstract:
Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, which have high in-plane but low through-plane resolution, is a relatively unexplored challenge. While applying 2D models to individual slices of a 2.5D image is feasible, it fails to capture the spatial relationships between slices. On the other hand, 3D models face challenges such as resolution inconsistencies in 2.5D images, along with computational complexity and susceptibility to overfitting when trained with limited data. In this context, 2.5D models, which capture inter-slice correlations using only 2D neural networks, emerge as a promising solution due to their reduced computational demand and simplicity in implementation. In this paper, we introduce CSA-Net, a flexible 2.5D segmentation model capable of processing 2.5D images with an arbitrary number of slices through an innovative Cross-Slice Attention (CSA) module. This module uses the cross-slice attention mechanism to effectively capture 3D spatial information by learning long-range dependencies between the center slice (for segmentation) and its neighboring slices. Moreover, CSA-Net utilizes the self-attention mechanism to understand correlations among pixels within the center slice. We evaluated CSA-Net on three 2.5D segmentation tasks: (1) multi-class brain MRI segmentation, (2) binary prostate MRI segmentation, and (3) multi-class prostate MRI segmentation. CSA-Net outperformed leading 2D and 2.5D segmentation methods across all three tasks, demonstrating its efficacy and superiority. Our code is publicly available at https://github.com/mirthAI/CSA-Net.
Submitted 30 April, 2024;
originally announced May 2024.
-
Electromagnetic Hybrid Beamforming for Holographic Communications
Authors:
Ran Ji,
Chongwen Huang,
Xiaoming Chen,
Wei E. I. Sha,
Linglong Dai,
Jiguang He,
Zhaoyang Zhang,
Chau Yuen,
Mérouane Debbah
Abstract:
It is well known that there is inherent radiation pattern distortion for the commercial base station antenna array, which usually needs three antenna sectors to cover the whole space. To eliminate pattern distortion and further enhance beamforming performance, we propose an electromagnetic hybrid beamforming (EHB) scheme based on a three-dimensional (3D) superdirective holographic antenna array. Specifically, EHB consists of antenna excitation current vectors (analog beamforming) and digital precoding matrices, where the implementation of analog beamforming involves the real-time adjustment of the radiation pattern to adapt it to the dynamic wireless environment. Meanwhile, the digital beamforming is optimized based on the channel characteristics of analog beamforming to further improve the achievable rate of communication systems. An electromagnetic channel model incorporating array radiation patterns and the mutual coupling effect is also developed to evaluate the benefits of our proposed scheme. Simulation results demonstrate that our proposed EHB scheme with a 3D holographic array achieves a relatively flat superdirective beamforming gain and allows for programmable focusing directions throughout the entire spatial domain. Furthermore, they also verify that the proposed scheme achieves a sum rate gain of over 150% compared to traditional beamforming algorithms.
Submitted 9 March, 2024;
originally announced March 2024.
-
RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation
Authors:
Jie Zhang,
Xubing Yang,
Rui Jiang,
Wei Shao,
Li Zhang
Abstract:
The development of high-resolution remote sensing satellites has provided great convenience for research work related to remote sensing. Segmentation and extraction of specific targets are essential tasks when facing vast and complex remote sensing images. Recently, the introduction of the Segment Anything Model (SAM) has provided a universal pre-trained model for image segmentation tasks. Because the direct application of SAM to remote sensing image segmentation tasks does not yield satisfactory results, we propose RSAM-Seg, which stands for Remote Sensing SAM with Semantic Segmentation, a tailored modification of SAM for the remote sensing field that eliminates the need for manual intervention to provide prompts. Adapter-Scale, a set of supplementary scaling modules, is proposed in the multi-head attention blocks of the encoder part of SAM. Furthermore, Adapter-Feature modules are inserted between the Vision Transformer (ViT) blocks. These modules aim to incorporate high-frequency image information and image embedding features to generate image-informed prompts. Experiments are conducted on four distinct remote sensing scenarios, encompassing cloud detection, field monitoring, building detection, and road mapping tasks. The experimental results not only showcase the improvement over the original SAM and U-Net across the cloud, building, field, and road scenarios, but also highlight the capacity of RSAM-Seg to discern areas that are absent from the ground truth of certain datasets, affirming its potential as an auxiliary annotation method. In addition, the performance in few-shot scenarios is commendable, underscoring its potential in dealing with limited datasets.
Submitted 29 February, 2024;
originally announced February 2024.
-
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Authors:
Yutao Hu,
Tianbin Li,
Quanfeng Lu,
Wenqi Shao,
Junjun He,
Yu Qiao,
Ping Luo
Abstract:
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of diverse medical images spanning various modalities and anatomical regions, which are essential in real-world medical applications. To solve this problem, in this paper, we introduce OmniMedVQA, a novel comprehensive medical Visual Question Answering (VQA) benchmark. This benchmark is collected from 73 different medical datasets, including 12 different modalities and covering more than 20 distinct anatomical regions. Importantly, all images in this benchmark are sourced from authentic medical scenarios, ensuring alignment with the requirements of the medical field and suitability for evaluating LVLMs. Through our extensive experiments, we have found that existing LVLMs struggle to address these medical VQA problems effectively. Moreover, what surprises us is that medical-specialized LVLMs even exhibit inferior performance to general-domain models, calling for a more versatile and robust LVLM in the biomedical field. The evaluation results not only reveal the current limitations of LVLMs in understanding real medical images but also highlight our dataset's significance. Our code and dataset are available at https://github.com/OpenGVLab/Multi-Modality-Arena.
Submitted 21 April, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
CIS-UNet: Multi-Class Segmentation of the Aorta in Computed Tomography Angiography via Context-Aware Shifted Window Self-Attention
Authors:
Muhammad Imran,
Jonathan R Krebs,
Veera Rajasekhar Reddy Gopu,
Brian Fazzone,
Vishal Balaji Sivaraman,
Amarjeet Kumar,
Chelsea Viscardi,
Robert Evans Heithaus,
Benjamin Shickel,
Yuyin Zhou,
Michol A Cooper,
Wei Shao
Abstract:
Advancements in medical imaging and endovascular grafting have facilitated minimally invasive treatments for aortic diseases. Accurate 3D segmentation of the aorta and its branches is crucial for interventions, as inaccurate segmentation can lead to erroneous surgical planning and endograft construction. Previous methods simplified aortic segmentation as a binary image segmentation problem, overlooking the necessity of distinguishing between individual aortic branches. In this paper, we introduce Context Infused Swin-UNet (CIS-UNet), a deep learning model designed for multi-class segmentation of the aorta and thirteen aortic branches. Combining the strengths of Convolutional Neural Networks (CNNs) and Swin transformers, CIS-UNet adopts a hierarchical encoder-decoder structure comprising a CNN encoder, symmetric decoder, skip connections, and a novel Context-aware Shifted Window Self-Attention (CSW-SA) as the bottleneck block. Notably, CSW-SA introduces a unique utilization of the patch merging layer, distinct from conventional Swin transformers. It efficiently condenses the feature map, providing a global spatial context and enhancing performance when applied at the bottleneck layer, offering superior computational efficiency and segmentation accuracy compared to the Swin transformers. We trained our model on computed tomography (CT) scans from 44 patients and tested it on 15 patients. CIS-UNet outperformed the state-of-the-art SwinUNetR segmentation model, which is solely based on Swin transformers, by achieving a superior mean Dice coefficient of 0.713 compared to 0.697, and a mean surface distance of 2.78 mm compared to 3.39 mm. CIS-UNet's superior 3D aortic segmentation offers improved precision and optimization for planning endovascular treatments. Our dataset and code will be publicly available.
Submitted 23 January, 2024;
originally announced January 2024.
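The CIS-UNet abstract above reports segmentation quality as a mean Dice coefficient. As a reminder of what that metric measures, here is a minimal Python sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), for one binary mask. The function name, the flat-list mask representation, and the convention for two empty masks are assumptions of this sketch, not details from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flat binary masks (lists of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

For a multi-class task, one common approach is to compute this score per class on one-vs-rest masks and average the results; whether the paper averages in exactly this way is not stated in the abstract.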
-
ImageBind-LLM: Multi-modality Instruction Tuning
Authors:
Jiaming Han,
Renrui Zhang,
Wenqi Shao,
Peng Gao,
Peng Xu,
Han Xiao,
Kaipeng Zhang,
Chris Liu,
Song Wen,
Ziyu Guo,
Xudong Lu,
Shuai Ren,
Yafei Wen,
Xiaoxin Chen,
Xiangyu Yue,
Hongsheng Li,
Yu Qiao
Abstract:
We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder. Then, the image features transformed by the bind network are added to word tokens of all layers in LLaMA, which progressively injects visual instructions via an attention-free and zero-initialized gating mechanism. Aided by the joint embedding of ImageBind, the simple image-text training enables our model to exhibit superior multi-modality instruction-following capabilities. During inference, the multi-modality inputs are fed into the corresponding ImageBind encoders, and processed by a proposed visual cache model for further cross-modal embedding enhancement. The training-free cache model retrieves from three million image features extracted by ImageBind, which effectively mitigates the training-inference modality discrepancy. Notably, with our approach, ImageBind-LLM can respond to instructions of diverse modalities and demonstrate significant language generation quality. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.
Submitted 11 September, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Align, Adapt and Inject: Sound-guided Unified Image Generation
Authors:
Yue Yang,
Kaipeng Zhang,
Yuying Ge,
Wenqi Shao,
Zeyue Xue,
Yu Qiao,
Ping Luo
Abstract:
Text-guided image generation has witnessed unprecedented progress due to the development of diffusion models. Beyond text and image, sound is a vital element within the sphere of human perception, offering vivid representations and naturally coinciding with corresponding scenes. Taking advantage of sound therefore presents a promising avenue for exploration within image generation research. However, the relationship between audio and image supervision remains significantly underdeveloped, and the scarcity of related, high-quality datasets brings further obstacles. In this paper, we propose a unified framework 'Align, Adapt, and Inject' (AAI) for sound-guided image generation, editing, and stylization. In particular, our method adapts input sound into a sound token, like an ordinary word, which can plug and play with existing powerful diffusion-based Text-to-Image (T2I) models. Specifically, we first train a multi-modal encoder to align audio representation with the pre-trained textual manifold and visual manifold, respectively. Then, we propose the audio adapter to adapt audio representation into an audio token enriched with specific semantics, which can be injected into a frozen T2I model flexibly. In this way, we are able to extract the dynamic information of varied sounds, while utilizing the formidable capability of existing T2I models to facilitate sound-guided image generation, editing, and stylization in a convenient and cost-effective manner. The experiment results confirm that our proposed AAI outperforms other text and sound-guided state-of-the-art methods. And our aligned multi-modal encoder is also competitive with other approaches in the audio-visual retrieval and audio-text retrieval tasks.
Submitted 20 June, 2023;
originally announced June 2023.
-
MicroSegNet: A Deep Learning Approach for Prostate Segmentation on Micro-Ultrasound Images
Authors:
Hongxu Jiang,
Muhammad Imran,
Preethika Muralidharan,
Anjali Patel,
Jake Pensa,
Muxuan Liang,
Tarik Benidir,
Joseph R. Grajo,
Jason P. Joseph,
Russell Terry,
John Michael DiBianco,
Li-Ming Su,
Yuyin Zhou,
Wayne G. Brisbane,
Wei Shao
Abstract:
Micro-ultrasound (micro-US) is a novel 29-MHz ultrasound technique that provides 3-4 times higher resolution than traditional ultrasound, potentially enabling low-cost, accurate diagnosis of prostate cancer. Accurate prostate segmentation is crucial for prostate volume measurement, cancer diagnosis, prostate biopsy, and treatment planning. However, prostate segmentation on micro-US is challenging due to artifacts and indistinct borders between the prostate, bladder, and urethra in the midline. This paper presents MicroSegNet, a multi-scale annotation-guided transformer UNet model designed specifically to tackle these challenges. During the training process, MicroSegNet focuses more on regions that are hard to segment (hard regions), characterized by discrepancies between expert and non-expert annotations. We achieve this by proposing an annotation-guided binary cross entropy (AG-BCE) loss that assigns a larger weight to prediction errors in hard regions and a lower weight to prediction errors in easy regions. The AG-BCE loss was seamlessly integrated into the training process through the utilization of multi-scale deep supervision, enabling MicroSegNet to capture global contextual dependencies and local information at various scales. We trained our model using micro-US images from 55 patients, followed by evaluation on 20 patients. Our MicroSegNet model achieved a Dice coefficient of 0.939 and a Hausdorff distance of 2.02 mm, outperforming several state-of-the-art segmentation methods, as well as three human annotators with different experience levels. Our code is publicly available at https://github.com/mirthAI/MicroSegNet and our dataset is publicly available at https://zenodo.org/records/10475293.
Submitted 25 January, 2024; v1 submitted 31 May, 2023;
originally announced May 2023.
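The MicroSegNet abstract above describes an annotation-guided binary cross entropy (AG-BCE) loss that weights errors more heavily in hard regions, where expert and non-expert annotations disagree. The Python sketch below shows one way such a weighting could look for a flat list of pixels. The weight values, the function name, and the choice to supervise against the expert labels are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def ag_bce_loss(probs, expert, non_expert, hard_weight=4.0, easy_weight=1.0):
    """Weighted BCE against expert labels (hypothetical weights).

    A pixel is treated as hard when the expert and non-expert labels differ.
    """
    eps = 1e-7
    total = 0.0
    for p, y, y_ne in zip(probs, expert, non_expert):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight = hard_weight if y != y_ne else easy_weight
        total += weight * bce
    return total / len(probs)


probs = [0.9, 0.2, 0.6]      # predicted foreground probabilities
expert = [1, 0, 1]           # expert annotation
non_expert = [1, 0, 0]       # non-expert annotation; last pixel is "hard"
weighted = ag_bce_loss(probs, expert, non_expert)
plain = ag_bce_loss(probs, expert, expert)  # no disagreement: ordinary BCE
```

With these inputs the weighted loss exceeds the plain one, because the error on the disagreeing pixel is counted with the larger weight.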
-
Image Registration of In Vivo Micro-Ultrasound and Ex Vivo Pseudo-Whole Mount Histopathology Images of the Prostate: A Proof-of-Concept Study
Authors:
Muhammad Imran,
Brianna Nguyen,
Jake Pensa,
Sara M. Falzarano,
Anthony E. Sisk,
Muxuan Liang,
John Michael DiBianco,
Li-Ming Su,
Yuyin Zhou,
Wayne G. Brisbane,
Wei Shao
Abstract:
Early diagnosis of prostate cancer significantly improves a patient's 5-year survival rate. Biopsy of small prostate cancers is improved with image-guided biopsy. MRI-ultrasound fusion-guided biopsy is sensitive to smaller tumors but is underutilized due to the high cost of MRI and fusion equipment. Micro-ultrasound (micro-US), a novel high-resolution ultrasound technology, provides a cost-effective alternative to MRI while delivering comparable diagnostic accuracy. However, the interpretation of micro-US is challenging due to subtle gray scale changes indicating cancer vs normal tissue. This challenge can be addressed by training urologists with a large dataset of micro-US images containing the ground truth cancer outlines. Such a dataset can be mapped from surgical specimens (histopathology) onto micro-US images via image registration. In this paper, we present a semi-automated pipeline for registering in vivo micro-US images with ex vivo whole-mount histopathology images. Our pipeline begins with the reconstruction of pseudo-whole-mount histopathology images and a 3-dimensional (3D) micro-US volume. Each pseudo-whole-mount histopathology image is then registered with the corresponding axial micro-US slice using a two-stage approach that estimates an affine transformation followed by a deformable transformation. We evaluated our registration pipeline using micro-US and histopathology images from 18 patients who underwent radical prostatectomy. The results showed a Dice coefficient of 0.94 and a landmark error of 2.7 mm, indicating the accuracy of our registration pipeline. This proof-of-concept study demonstrates the feasibility of accurately aligning micro-US and histopathology images. To promote transparency and collaboration in research, we will make our code and dataset publicly available.
Submitted 16 June, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Channel Modeling and Multi-User Precoding for Tri-Polarized Holographic MIMO Communications
Authors:
Li Wei,
Chongwen Huang,
George C. Alexandropoulos,
Zhaohui Yang,
Jun Yang,
Wei E. I. Sha,
Merouane Debbah,
Chau Yuen
Abstract:
This paper studies the exploitation of triple polarization (TP) for multi-user (MU) holographic multiple-input multiple-output surface (HMIMOS) wireless communication systems, aiming at capacity boosting without enlarging the antenna array size. We specifically consider that both the transmitter and receiver are equipped with an HMIMOS comprising compact sub-wavelength TP patch antennas. To characterize TP MU-HMIMOS systems, a TP near-field channel model is proposed using the dyadic Green's function, whose characteristics are leveraged to design a user-cluster-based precoding scheme for mitigating the cross-polarization and inter-user interference contributions. A theoretical correlation analysis for HMIMOS with infinitely small patch antennas is also presented. According to the proposed scheme, each user is assigned to one of the three polarizations, which is easy to implement but comes at the cost of reducing the system's diversity. Our numerical results show that the cross-polarization channel components have a non-negligible impact on the system performance, which is efficiently eliminated with the proposed MU precoding scheme.
Submitted 10 February, 2023;
originally announced February 2023.
-
Channel Measurement for Holographic MIMO: Benefits and Challenges of Spatial Oversampling
Authors:
Tengjiao Wang,
Yongxi Liu,
Ming Zhang,
Wei E. I. Sha,
Cen Ling,
Chao Li,
Shaobo Wang
Abstract:
In this paper, the channel of an indoor holographic multiple-input multiple-output (MIMO) system is measured. It is demonstrated through experiments for the first time that the spatial oversampling of holographic MIMO systems is able to increase the capacity of a wireless communication system significantly. However, the antenna efficiency is the most crucial challenge preventing us from getting the capacity improvement. An extended EM-compliant channel model is also proposed for holographic MIMO systems, which is able to take the non-isotropic characteristics of the propagation environment, the antenna pattern distortion, the antenna efficiency, and the polarization characteristics into consideration.
Submitted 13 January, 2023;
originally announced January 2023.
-
An Electromagnetic-Information-Theory Based Model for Efficient Characterization of MIMO Systems in Complex Space
Authors:
Ruifeng Li,
Da Li,
Jinyan Ma,
Zhaoyang Feng,
Ling Zhang,
Shurun Tan,
Wei E. I. Sha,
Hongsheng Chen,
Er-Ping Li
Abstract:
It is the pursuit of a multiple-input-multiple-output (MIMO) system to approach and even break the limit of channel capacity. However, it is always a big challenge to efficiently characterize MIMO systems in complex space and get better propagation performance than conventional MIMO systems that consider only free space, which is important for guiding the power and phase allocation of antenna units. In this manuscript, an Electromagnetic-Information-Theory (EMIT) based model is developed for efficient characterization of MIMO systems in complex space. The group-T-matrix-based multiple scattering fast algorithm, the mode-decomposition-based characterization method, and their joint theoretical framework in complex space are discussed. Firstly, key informatics parameters in free electromagnetic space based on a dyadic Green's function are derived. Next, a novel group-T-matrix-based multiple scattering fast algorithm is developed to describe a representative inhomogeneous electromagnetic space. All the analytical results are validated by simulations. In addition, the complete form of the EMIT-based model is proposed to derive the informatics parameters frequently used in electromagnetic propagation, through integrating the mode analysis method with the dyadic Green's function matrix. Finally, as a proof-of-concept, microwave anechoic chamber measurements of a cylindrical array are performed, demonstrating the effectiveness of the EMIT-based model. Meanwhile, a case of image transmission with limited power is presented to illustrate how to use this EMIT-based model to guide the power and phase allocation of antenna units for real MIMO applications.
Submitted 13 January, 2023;
originally announced January 2023.
-
Tri-Polarized Holographic MIMO Surface in Near-Field: Channel Modeling and Precoding Design
Authors:
Li Wei,
Chongwen Huang,
George C. Alexandropoulos,
Zhaohui Yang,
Jun Yang,
Wei E. I. Sha,
Zhaoyang Zhang,
Merouane Debbah,
Chau Yuen
Abstract:
This paper investigates the utilization of triple polarization (TP) for multi-user (MU) holographic multiple-input multiple-output surface (HMIMOS) wireless communication systems, targeting capacity boosting and diversity exploitation without enlarging the antenna array sizes. We specifically consider that the transmitter and receiver are both equipped with an HMIMOS consisting of compact sub-wavelength TP patch antennas within the near-field (NF) regime. To characterize TP MU-HMIMOS systems, a TP NF channel model is constructed using the dyadic Green's function, whose characteristics are leveraged to design two precoding schemes for mitigating the cross-polarization and inter-user interference contributions. Specifically, a user-cluster-based precoding scheme assigns different users to one of three polarizations at the expense of the system's diversity, and a two-layer precoding scheme removes interference using the Gaussian elimination method at a high computational cost. The theoretical correlation analysis for HMIMOS in the NF region is also investigated, revealing that both the spacing of transmit patch antennas and user distance impact transmit correlation factors. Our numerical results show that the users far from transmitting HMIMOS experience higher correlation than those closer within the NF regime, resulting in a lower channel capacity. Meanwhile, in terms of channel capacity, TP HMIMOS can almost achieve 1.25 times gain compared with dual-polarized HMIMOS, and 3 times compared with conventional HMIMOS. In addition, the proposed two-layer precoding scheme combined with two-layer power allocation realizes a higher spectral efficiency than other schemes without sacrificing diversity.
Submitted 7 November, 2022;
originally announced November 2022.
-
Electromagnetic Effective-Degree-of-Freedom Limit of a MIMO System in 2-D Inhomogeneous Environment
Authors:
Shuai S. A. Yuan,
Zi He,
Sheng Sun,
Xiaoming Chen,
Chongwen Huang,
Wei E. I. Sha
Abstract:
Compared with a single-input-single-output (SISO) wireless communication system, the benefit of multiple-input-multiple-output (MIMO) technology originates from its extra degree of freedom (DOF), also referred to as scattering channels or spatial electromagnetic (EM) modes, brought by spatial multiplexing. When the physical sizes of transmitting and receiving arrays are fixed, and there are sufficient antennas (typically with half-wavelength spacings), the DOF limit is only dependent on the propagating environment. Analytical methods can be used to estimate this limit in free space, and some approximate models are adopted in stochastic environments, such as Clarke's model and Ray-tracing methods. However, this DOF limit in a certain inhomogeneous environment has not been well discussed with rigorous full-wave numerical methods. In this work, volume integral equation (VIE) is implemented for investigating the limit of MIMO effective degree of freedom (EDOF) in three representative two-dimensional (2-D) inhomogeneous environments. Moreover, we clarify the relation between the performance of a MIMO system and the scattering characteristics of its propagating environment.
Submitted 18 October, 2022;
originally announced October 2022.
-
Extremely Large-Scale MIMO: Fundamentals, Challenges, Solutions, and Future Directions
Authors:
Zhe Wang,
Jiayi Zhang,
Hongyang Du,
Wei E. I. Sha,
Bo Ai,
Dusit Niyato,
Mérouane Debbah
Abstract:
Extremely large-scale multiple-input-multiple-output (XL-MIMO) is a promising technology to empower next-generation communications. However, XL-MIMO, which is still in its early stage of research, has been designed with a variety of hardware and performance analysis schemes. To illustrate the differences and similarities among these schemes, we comprehensively review existing XL-MIMO hardware designs and characteristics in this article. Then, we thoroughly discuss the research status of XL-MIMO from the perspectives of "channel modeling", "performance analysis", and "signal processing". Several existing challenges are introduced and respective solutions are provided. We then propose two case studies for the hybrid propagation channel modeling and the effective degrees of freedom (EDoF) computations for practical scenarios. Using our proposed solutions, we present numerical results to investigate the EDoF performance for the scenarios with unparallel XL-MIMO surfaces and multiple user equipment, respectively. Finally, we discuss several future research directions.
Submitted 6 April, 2023; v1 submitted 24 September, 2022;
originally announced September 2022.
-
Infrared and Visible Image Fusion via Interactive Compensatory Attention Adversarial Learning
Authors:
Zhishe Wang,
Wenyu Shao,
Yanlin Chen,
Jiawei Xu,
Xiaoqin Zhang
Abstract:
The existing generative adversarial fusion methods generally concatenate source images and extract local features through convolution operations, without considering their global characteristics, which tends to produce an unbalanced result that is biased towards the infrared image or the visible image. To this end, we propose a novel end-to-end model based on generative adversarial training to achieve better fusion balance, termed the interactive compensatory attention fusion network (ICAFusion). In particular, in the generator, we construct a multi-level encoder-decoder network with a triple path, and adopt infrared and visible paths to provide additional intensity and gradient information. Moreover, we develop interactive and compensatory attention modules to communicate their pathwise information, and model their long-range dependencies to generate attention maps, which focus more on infrared target perception and visible detail characterization, and further increase the representation power for feature extraction and feature reconstruction. In addition, dual discriminators are designed to identify the similar distribution between the fused result and the source images, and the generator is optimized to produce a more balanced result. Extensive experiments illustrate that our ICAFusion obtains superior fusion performance and better generalization ability, surpassing other advanced methods in subjective visual description and objective metric evaluation. Our codes will be public at https://github.com/Zhishe-Wang/ICAFusion
Submitted 29 March, 2022;
originally announced March 2022.
-
Electromagnetic Effective Degree of Freedom of a MIMO System in Free Space
Authors:
Shuai S. A. Yuan,
Zi He,
Xiaoming Chen,
Chongwen Huang,
Wei E. I. Sha
Abstract:
Effective degree of freedom (EDOF) of a multiple-input-multiple-output (MIMO) system represents its equivalent number of independent single-input-single-output (SISO) systems, which directly characterizes the communication performance. Traditional EDOF only considers single polarization, where the full polarized components degrade into two independent transverse components under the far-field approximation. However, the traditional model is not applicable to complex scenarios especially for the near-field region. Based on an electromagnetic (EM) channel model built from the dyadic Green's function, we first calculate the EM EDOF to estimate the performance of an arbitrary MIMO system with full polarizations in free space. Then, we clarify the relations between the limit of EDOF and the optimal number of sources/receivers. Finally, potential benefits of near-field MIMO communications are demonstrated with the EM EDOF, in which the contribution of the longitudinally polarized source is taken into account. This work establishes a fundamental EM framework for MIMO wireless communications.
Submitted 1 January, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
Learn2Reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning
Authors:
Alessa Hering,
Lasse Hansen,
Tony C. W. Mok,
Albert C. S. Chung,
Hanna Siebert,
Stephanie Häger,
Annkristin Lange,
Sven Kuckertz,
Stefan Heldmann,
Wei Shao,
Sulaiman Vesal,
Mirabela Rusu,
Geoffrey Sonn,
Théo Estienne,
Maria Vakalopoulou,
Luyi Han,
Yunzhi Huang,
Pew-Thian Yap,
Mikael Brudfors,
Yaël Balbastre,
Samuel Joutard,
Marc Modat,
Gal Lifshitz,
Dan Raviv,
Jinxin Lv
, et al. (28 additional authors not shown)
Abstract:
Image registration is a fundamental medical image analysis task, and a wide variety of approaches have been proposed. However, only a few studies have comprehensively compared medical image registration approaches on a wide range of clinically relevant tasks. This limits the development of registration methods, the adoption of research advances into practice, and a fair benchmark across competing approaches. The Learn2Reg challenge addresses these limitations by providing a multi-task medical image registration data set for comprehensive characterisation of deformable registration algorithms. A continuous evaluation will be possible at https://learn2reg.grand-challenge.org. Learn2Reg covers a wide range of anatomies (brain, abdomen, and thorax), modalities (ultrasound, CT, MR), availability of annotations, as well as intra- and inter-patient registration evaluation. We established an easily accessible framework for training and validation of 3D registration methods, which enabled the compilation of results of over 65 individual method submissions from more than 20 unique teams. We used a complementary set of metrics, including robustness, accuracy, plausibility, and runtime, enabling unique insight into the current state-of-the-art of medical image registration. This paper describes datasets, tasks, evaluation methods and results of the challenge, as well as results of further analysis of transferability to new datasets, the importance of label supervision, and resulting bias. While no single approach worked best across all tasks, many methodological aspects could be identified that push the performance of medical image registration to new state-of-the-art performance. Furthermore, we demystified the common belief that conventional registration methods have to be much slower than deep-learning-based methods.
Submitted 7 October, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Multi-User Holographic MIMO Surfaces: Channel Modeling and Spectral Efficiency Analysis
Authors:
Li Wei,
Chongwen Huang,
George C. Alexandropoulos,
Wei E. I. Sha,
Zhaoyang Zhang,
Merouane Debbah,
Chau Yuen
Abstract:
The multi-user Holographic Multiple-Input and Multiple-Output Surface (MU-HMIMOS) paradigm, which is capable of realizing large continuous apertures with minimal power consumption, has been recently considered as an energy-efficient solution for future wireless networks, offering increased flexibility in impacting electromagnetic (EM) wave propagation according to the desired communication, localization, and sensing objectives. The tractable channel modeling in MU-HMIMOS wireless systems is one of the most critical research challenges, mainly due to the coupling effect induced by the excessively large number of closely spaced patch antennas. In this paper, we focus on this challenge for the downlink of multi-user MIMO communications and extend an EM-compliant channel model to the multi-user case, which is expressed in the wavenumber domain using the Fourier plane wave approximation. Based on the presented channel model, we investigate the spectral efficiency of maximum-ratio transmission and Zero-Forcing (ZF) precoding schemes. We also introduce a novel hardware-efficient ZF precoder, leveraging Neumann series (NS) expansion to replace the required matrix inversion operation, which is very hard to compute in the conventional way due to the extremely large number of patch antennas in the envisioned MU-HMIMOS communication systems. In comparison with the conventional independent and identical Rayleigh fading channels that ignore antenna coupling effects, the proposed EM-compliant channel model captures the mutual couplings induced by the very small antenna spacing. Our extensive performance evaluation results demonstrate that our theoretical performance expressions approximate sufficiently well ...
Submitted 3 July, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
Bridging the gap between prostate radiology and pathology through machine learning
Authors:
Indrani Bhattacharya,
David S. Lim,
Han Lin Aung,
Xingchen Liu,
Arun Seetharaman,
Christian A. Kunder,
Wei Shao,
Simon J. C. Soerensen,
Richard E. Fan,
Pejman Ghanouni,
Katherine J. To'o,
James D. Brooks,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Prostate cancer is the second deadliest cancer for American men. While Magnetic Resonance Imaging (MRI) is increasingly used to guide targeted biopsies for prostate cancer diagnosis, its utility remains limited due to high rates of false positives and false negatives as well as low inter-reader agreements. Machine learning methods to detect and localize cancer on prostate MRI can help standardize radiologist interpretations. However, existing machine learning methods vary not only in model architecture, but also in the ground truth labeling strategies used for model training. In this study, we compare different labeling strategies, namely, pathology-confirmed radiologist labels, pathologist labels on whole-mount histopathology images, and lesion-level and pixel-level digital pathologist labels (previously validated deep learning algorithm on histopathology images to predict pixel-level Gleason patterns) on whole-mount histopathology images. We analyse the effects these labels have on the performance of the trained machine learning models. Our experiments show that (1) radiologist labels and models trained with them can miss cancers, or underestimate cancer extent, (2) digital pathologist labels and models trained with them have high concordance with pathologist labels, and (3) models trained with digital pathologist labels achieve the best performance in prostate cancer detection in two different cohorts with different disease distributions, irrespective of the model architecture used. Digital pathologist labels can reduce challenges associated with human annotations, including labor, time, inter- and intra-reader variability, and can help bridge the gap between prostate radiology and pathology by enabling the training of reliable machine learning models to detect and localize prostate cancer on MRI.
Submitted 3 December, 2021;
originally announced December 2021.
-
Weakly Supervised Registration of Prostate MRI and Histopathology Images
Authors:
Wei Shao,
Indrani Bhattacharya,
Simon J. C. Soerensen,
Christian A. Kunder,
Jeffrey B. Wang,
Richard E. Fan,
Pejman Ghanouni,
James D. Brooks,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
The interpretation of prostate MRI suffers from low agreement across radiologists due to the subtle differences between cancer and normal tissue. Image registration addresses this issue by accurately mapping the ground-truth cancer labels from surgical histopathology images onto MRI. Cancer labels achieved by image registration can be used to improve radiologists' interpretation of MRI by training deep learning models for early detection of prostate cancer. A major limitation of current automated registration approaches is that they require manual prostate segmentations, which is a time-consuming task, prone to errors. This paper presents a weakly supervised approach for affine and deformable registration of MRI and histopathology images without requiring prostate segmentations. We used manual prostate segmentations and mono-modal synthetic image pairs to train our registration networks to align prostate boundaries and local prostate features. Although prostate segmentations were used during the training of the network, such segmentations were not needed when registering unseen images at inference time. We trained and validated our registration network with 135 and 10 patients from an internal cohort, respectively. We tested the performance of our method using 16 patients from the internal cohort and 22 patients from an external cohort. The results show that our weakly supervised method has achieved significantly higher registration accuracy than a state-of-the-art method run without prostate segmentations. Our deep learning framework will ease the registration of MRI and histopathology images by obviating the need for prostate segmentations.
Submitted 23 June, 2021;
originally announced June 2021.
-
Geodesic Density Regression for Correcting 4DCT Pulmonary Respiratory Motion Artifacts
Authors:
Wei Shao,
Yue Pan,
Oguz C. Durumeric,
Joseph M. Reinhardt,
John E. Bayouth,
Mirabela Rusu,
Gary E. Christensen
Abstract:
Pulmonary respiratory motion artifacts are common in four-dimensional computed tomography (4DCT) of lungs and are caused by missing, duplicated, and misaligned image data. This paper presents a geodesic density regression (GDR) algorithm to correct motion artifacts in 4DCT by correcting artifacts in one breathing phase with artifact-free data from corresponding regions of other breathing phases. The GDR algorithm estimates an artifact-free lung template image and a smooth, dense, 4D (space plus time) vector field that deforms the template image to each breathing phase to produce an artifact-free 4DCT scan. Correspondences are estimated by accounting for the local tissue density change associated with air entering and leaving the lungs, and using binary artifact masks to exclude regions with artifacts from image regression. The artifact-free lung template image is generated by mapping the artifact-free regions of each phase volume to a common reference coordinate system using the estimated correspondences and then averaging. This procedure generates a fixed view of the lung with an improved signal-to-noise ratio. The GDR algorithm was evaluated and compared to a state-of-the-art geodesic intensity regression (GIR) algorithm using simulated CT time-series and 4DCT scans with clinically observed motion artifacts. The simulation shows that the GDR algorithm has achieved significantly more accurate Jacobian images and sharper template images, and is less sensitive to data dropout than the GIR algorithm. We also demonstrate that the GDR algorithm is more effective than the GIR algorithm for removing clinically observed motion artifacts in treatment planning 4DCT scans. Our code is freely available at https://github.com/Wei-Shao-Reg/GDR.
Submitted 12 June, 2021;
originally announced June 2021.
-
Plane Spiral OAM Mode-Group Based MIMO Communications: An Experimental Study
Authors:
Xiaowen Xiong,
Shilie Zheng,
Zelin Zhu,
Yuqi Chen,
Hongzhe Shi,
Bingchen Pan,
Cheng Ren,
Xianbin Yu,
Xiaofeng Jin,
Wei E. I. Sha,
Xianmin Zhang
Abstract:
Spatial division multiplexing using conventional orbital angular momentum (OAM) has become a well-known physical layer transmission method over the past decade. The mode-group (MG) superposed by specific single-mode plane spiral OAM (PSOAM) waves has been proved to be a flexible beamforming method to achieve azimuthal pattern diversity, which inherits the spiral phase distribution of the conventional OAM wave. Thus, it possesses both beam directionality and vorticity. In this paper, a novel PSOAM MG based multiple-input-multiple-output (MIMO) communication link (MG-MIMO) is shown and verified experimentally for the first time in a line-of-sight (LoS) scenario. A compact multi-mode PSOAM antenna is demonstrated experimentally to generate multiple independent controllable PSOAM waves, which can be used for constructing MGs. After several proof-of-principle tests, it has been verified that the beam directionality gain of the MG can improve the received signal-to-noise ratio (SNR) level in an actual system, while the vorticity can provide another degree of freedom (DoF) to reduce the spatial correlation of the MIMO system. Furthermore, a tentative long-distance transmission experiment operated at 10.2 GHz has been performed successfully at a distance of 50 m with a single-way spectrum efficiency of 3.7 bits/s/Hz/stream. The proposed MG-MIMO may have potential in the long-distance LoS back-haul scenario.
Submitted 11 March, 2021;
originally announced March 2021.
-
ProsRegNet: A Deep Learning Framework for Registration of MRI and Histopathology Images of the Prostate
Authors:
Wei Shao,
Linda Banh,
Christian A. Kunder,
Richard E. Fan,
Simon J. C. Soerensen,
Jeffrey B. Wang,
Nikola C. Teslovich,
Nikhil Madhuripan,
Anugayathri Jawahar,
Pejman Ghanouni,
James D. Brooks,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Magnetic resonance imaging (MRI) is an increasingly important tool for the diagnosis and treatment of prostate cancer. However, interpretation of MRI suffers from high inter-observer variability across radiologists, thereby contributing to missed clinically significant cancers, overdiagnosed low-risk cancers, and frequent false positives. Interpretation of MRI could be greatly improved by providing radiologists with an answer key that clearly shows cancer locations on MRI. Registration of histopathology images from patients who had radical prostatectomy to pre-operative MRI allows such mapping of ground truth cancer labels onto MRI. However, traditional MRI-histopathology registration approaches are computationally expensive and require careful choices of the cost function and registration hyperparameters. This paper presents ProsRegNet, a deep learning-based pipeline to accelerate and simplify MRI-histopathology image registration in prostate cancer. Our pipeline consists of image preprocessing, estimation of affine and deformable transformations by deep neural networks, and mapping cancer labels from histopathology images onto MRI using estimated transformations. We trained our neural network using MR and histopathology images of 99 patients from our internal cohort (Cohort 1) and evaluated its performance using 53 patients from three different cohorts (an additional 12 from Cohort 1 and 41 from two public cohorts). Results show that our deep learning pipeline has achieved more accurate registration results and is at least 20 times faster than a state-of-the-art registration algorithm. This important advance will provide radiologists with highly accurate prostate MRI answer keys, thereby facilitating improvements in the detection of prostate cancer on MRI. Our code is freely available at https://github.com/pimed//ProsRegNet.
Submitted 2 December, 2020;
originally announced December 2020.
-
Generative Adversarial Networks for Spatio-temporal Data: A Survey
Authors:
Nan Gao,
Hao Xue,
Wei Shao,
Sichen Zhao,
Kyle Kai Qin,
Arian Prabowo,
Mohammad Saiedur Rahaman,
Flora D. Salim
Abstract:
Generative Adversarial Networks (GANs) have shown remarkable success in producing realistic-looking images in the computer vision area. Recently, GAN-based techniques are shown to be promising for spatio-temporal-based applications such as trajectory prediction, events generation and time-series data imputation. While several reviews for GANs in computer vision have been presented, no one has considered addressing the practical applications and challenges relevant to spatio-temporal data. In this paper, we have conducted a comprehensive review of the recent developments of GANs for spatio-temporal data. We summarise the application of popular GAN architectures for spatio-temporal data and the common practices for evaluating the performance of spatio-temporal applications with GANs. Finally, we point out future research directions to benefit researchers in this area.
Submitted 29 July, 2021; v1 submitted 18 August, 2020;
originally announced August 2020.
-
CorrSigNet: Learning CORRelated Prostate Cancer SIGnatures from Radiology and Pathology Images for Improved Computer Aided Diagnosis
Authors:
Indrani Bhattacharya,
Arun Seetharaman,
Wei Shao,
Rewa Sood,
Christian A. Kunder,
Richard E. Fan,
Simon John Christoph Soerensen,
Jeffrey B. Wang,
Pejman Ghanouni,
Nikola C. Teslovich,
James D. Brooks,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Magnetic Resonance Imaging (MRI) is widely used for screening and staging prostate cancer. However, many prostate cancers have subtle features which are not easily identifiable on MRI, resulting in missed diagnoses and alarming variability in radiologist interpretation. Machine learning models have been developed in an effort to improve cancer identification, but current models localize cancer using MRI-derived features, while failing to consider the disease pathology characteristics observed on resected tissue. In this paper, we propose CorrSigNet, an automated two-step model that localizes prostate cancer on MRI by capturing the pathology features of cancer. First, the model learns MRI signatures of cancer that are correlated with corresponding histopathology features using Common Representation Learning. Second, the model uses the learned correlated MRI features to train a Convolutional Neural Network to localize prostate cancer. The histopathology images are used only in the first step to learn the correlated features. Once learned, these correlated features can be extracted from MRI of new patients (without histopathology or surgery) to localize cancer. We trained and validated our framework on a unique dataset of 75 patients with 806 slices who underwent MRI followed by prostatectomy surgery. We tested our method on an independent test set of 20 prostatectomy patients (139 slices, 24 cancerous lesions, 1.12M pixels) and achieved a per-pixel sensitivity of 0.81, specificity of 0.71, AUC of 0.86 and a per-lesion AUC of $0.96 \pm 0.07$, outperforming the current state-of-the-art accuracy in predicting prostate cancer using MRI.
Submitted 31 July, 2020;
originally announced August 2020.
-
FADACS: A Few-shot Adversarial Domain Adaptation Architecture for Context-Aware Parking Availability Sensing
Authors:
Wei Shao,
Sichen Zhao,
Zhen Zhang,
Shiyu Wang,
Mohammad Saiedur Rahaman,
Andy Song,
Flora Dilys Salim
Abstract:
Existing research on parking availability sensing mainly relies on extensive contextual and historical information. In practice, the availability of such information is a challenge as it requires continuous collection of sensory signals. In this study, we design an end-to-end transfer learning framework for parking availability sensing to predict parking occupancy in areas in which the parking data is insufficient to feed into data-hungry models. This framework overcomes two main challenges: 1) many real-world cases cannot provide enough data for most existing data-driven models, and 2) it is difficult to merge sensor data and heterogeneous contextual information due to the differing urban fabric and spatial characteristics. Our work adopts a widely-used concept, adversarial domain adaptation, to predict the parking occupancy in an area without abundant sensor data by leveraging data from other areas with similar features. In this paper, we utilise more than 35 million parking data records from sensors placed in two different cities, one a city centre and the other a coastal tourist town. We also utilise heterogeneous spatio-temporal contextual information from external resources, including weather and points of interest. We quantify the strength of our proposed framework in different cases and compare it to the existing data-driven approaches. The results show that the proposed framework is comparable to existing state-of-the-art methods and also provide some valuable insights on parking availability prediction.
Submitted 27 January, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Adaptive Feature Selection Guided Deep Forest for COVID-19 Classification with Chest CT
Authors:
Liang Sun,
Zhanhao Mo,
Fuhua Yan,
Liming Xia,
Fei Shan,
Zhongxiang Ding,
Wei Shao,
Feng Shi,
Huan Yuan,
Huiting Jiang,
Dijia Wu,
Ying Wei,
Yaozong Gao,
Wanchun Gao,
He Sui,
Daoqiang Zhang,
Dinggang Shen
Abstract:
Chest computed tomography (CT) has become an effective tool to assist the diagnosis of coronavirus disease-19 (COVID-19). Due to the outbreak of COVID-19 worldwide, using computer-aided diagnosis techniques for COVID-19 classification based on CT images could largely alleviate the burden on clinicians. In this paper, we propose an Adaptive Feature Selection guided Deep Forest (AFS-DF) for COVID-19 classification based on chest CT images. Specifically, we first extract location-specific features from CT images. Then, in order to capture the high-level representation of these features with relatively small-scale data, we leverage a deep forest model to learn a high-level representation of the features. Moreover, we propose a feature selection method based on the trained deep forest model to reduce the redundancy of features, where the feature selection can be adaptively incorporated into the COVID-19 classification model. We evaluated our proposed AFS-DF on a COVID-19 dataset with 1495 COVID-19 patients and 1027 patients with community-acquired pneumonia (CAP). The accuracy (ACC), sensitivity (SEN), specificity (SPE) and AUC achieved by our method are 91.79%, 93.05%, 89.95% and 96.35%, respectively. Experimental results on the COVID-19 dataset suggest that the proposed AFS-DF achieves superior performance in COVID-19 vs. CAP classification, compared with 4 widely used machine learning methods.
Submitted 7 May, 2020;
originally announced May 2020.
-
Orbital Angular Momentum Multiplexing in Highly Reverberant Environments
Authors:
Xiaoming Chen,
Wei Xue,
Hongyu Shi,
Jianjia Yi,
Wei E. I. Sha
Abstract:
Previous studies on orbital angular momentum (OAM) communication mainly considered line-of-sight environments. In this letter, however, it is found that OAM communication with high-order modulation can be achieved in highly reverberant environments by combining OAM multiplexing with a spatial equalizer. OAM multiplexing exhibits performance comparable to that of a conventional multiple-input multiple-output (MIMO) system.
Submitted 14 December, 2019;
originally announced January 2020.
-
Advances in Microwave Near-Field Imaging: Prototypes, Systems, and Applications
Authors:
Wenyi Shao
Abstract:
A near-field microwave imaging system attempts to reveal the presence of an object and/or an electrical property distribution by measuring the scattered field from many positions surrounding the object. Over the past few decades, both the hardware and software components of near-field microwave imaging systems have attracted interest throughout the world. Due to limitations of hardware technology (unavailability of data acquisition apparatus), experimental microwave imaging was very challenging for the pioneers. However, probably due to the hardware cost, most of the studies (operating at a few GHz) were still focused on software only. The feasibility of using microwave approaches to image different types of objects has been tested and verified by simulations in a variety of applications. Further, work has been conducted on improving both quantitative and qualitative algorithms to improve simulated reconstruction results. Nowadays, benefitting from hardware progress and reduced hardware cost, researchers are eager to pursue real experimental validations instead of simulations, and more unique prototypes and commercial systems have been built for various applications. These prototypes and systems are a result of years of dedicated work, and it is important to review the advancements in developed prototype systems. The article will provide an overview of many of the systems designed by different research groups throughout the world for applications of near-field microwave imaging. The article further outlines challenges faced in current microwave near-field imaging, developmental tendencies of engineers and scientists, and the future outlook.
Submitted 6 August, 2019;
originally announced September 2019.
-
A Phase Shift and Sum Method for UWB Radar Imaging in Dispersive Media
Authors:
Wenyi Shao
Abstract:
A phase shift and sum (PSAS) algorithm to image objects in dispersive media is presented. The algorithm compensates the phase shift of the scattered field from the receiver to the source for each frequency component in an ultrawideband (UWB) signal and then integrates all the frequency responses. This method resolves the multispeed and multipath issue when UWB signals propagate in dispersive media. In addition, a multipath effect due to refraction on a curved boundary is also explored. By collecting data using a customized microwave measurement system of two different objects placed in a plastic graduated cylinder filled with glycerin, along with the measured dielectric parameters of glycerin (a dispersive medium), high-quality reconstructed images are formed using PSAS. Quantitative and qualitative comparisons with two other traditional time-shift radar-based microwave imaging algorithms for the same objects under test demonstrate the advantages of PSAS.
Submitted 6 August, 2019;
originally announced August 2019.
-
Registration of pre-surgical MRI and whole-mount histopathology images in prostate cancer patients with radical prostatectomy via RAPSODI
Authors:
Mirabela Rusu,
Christian A. Kunder,
Nikola C. Teslovich,
Jeffrey B Wang,
Rewa R. Sood,
Wei Shao,
Leo C. Chan,
Robert West,
Richard Fan,
Pejman Ghanouni,
James B. Brooks,
Geoffrey A. Sonn
Abstract:
Magnetic resonance imaging (MRI) has great potential to improve prostate cancer diagnosis. It can spare men with a normal exam from undergoing invasive biopsy while making biopsies more accurate in men with lesions suspicious for cancer. Yet, the subtle differences between cancer and confounding conditions render the interpretation of MRI challenging. The tissue collected from patients who undergo pre-surgical MRI and radical prostatectomy provides a unique opportunity to correlate histopathology images of the entire prostate with MRI in order to accurately map the extent of prostate cancer onto MRI. Here, we introduce the RAPSODI framework for the registration of radiology and pathology images. RAPSODI relies on a three-step procedure that first reconstructs in 3D the resected tissue using the serial whole-mount histopathology slices, then registers corresponding histopathology and MRI slices, and finally maps the cancer outlines from the histopathology slices onto MRI. We tested RAPSODI in a phantom study where we simulated various conditions, e.g., tissue specimen rotation upon mounting on glass slides, tissue shrinkage during fixation, or imperfect slice-to-slice correspondences between histology and MRI. Our experiments showed that RAPSODI can reliably correct for rotations within $\pm15^{\circ}$ and shrinkage up to 10%. We also evaluated RAPSODI in 89 patients from two institutions who underwent radical prostatectomy, yielding 543 histopathology slices that were registered to corresponding T2-weighted MRI slices. We found a Dice coefficient of $0.98 \pm 0.01$ for the prostate, a prostate boundary Hausdorff distance of $1.71 \pm 0.48$ mm, a urethra deviation of $2.91 \pm 1.25$ mm, and a landmark deviation of $2.88 \pm 0.70$ mm between registered histopathology images and MRI. Our robust framework successfully mapped the extent of disease from histopathology slices onto MRI.
Submitted 21 September, 2019; v1 submitted 30 June, 2019;
originally announced July 2019.
-
Towards Understanding Regularization in Batch Normalization
Authors:
Ping Luo,
Xinjiang Wang,
Wenqi Shao,
Zhanglin Peng
Abstract:
Batch Normalization (BN) improves both convergence and generalization in training neural networks. This work seeks to understand these phenomena theoretically. We analyze BN by using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. This basic network helps us understand the impacts of BN in three aspects. First, by viewing BN as an implicit regularizer, BN can be decomposed into population normalization (PN) and gamma decay as an explicit regularization. Second, the learning dynamics of BN and the regularization show that training converges with large maximum and effective learning rates. Third, generalization of BN is explored by using statistical mechanics. Experiments demonstrate that BN in convolutional neural networks shares the same traits of regularization as the above analyses.
Submitted 24 April, 2019; v1 submitted 4 September, 2018;
originally announced September 2018.