Search | arXiv e-print repository

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered. △ Less

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.03141 [pdf, other]

Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation

Authors: Yihao Zhou, Timothy Tin-Yan Lee, Kelly Ka-Lee Lai, Chonglin Wu, Hin Ting Lau, De Yang, Chui-Yi Chan, Winnie Chiu-Wing Chu, Jack Chun-Yiu Cheng, Tsz-Ping Lam, Yong-Ping Zheng

Abstract: The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of mea… ▽ More The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of measuring spinal curvature is still carried out manually. Consequently, there is a considerable demand for a fully automatic system that can locate bony landmarks and perform angle measurements. To this end, we introduce an estimation model for automatic ultrasound curve angle (UCA) measurement. The model employs a dual-branch network to detect candidate landmarks and perform vertebra segmentation on ultrasound coronal images. An affinity clustering strategy is utilized within the vertebral segmentation area to illustrate the affinity relationship between candidate landmarks. Subsequently, we can efficiently perform line delineation from a clustered affinity map for UCA measurement. As our method is specifically designed for UCA calculation, this method outperforms other state-of-the-art methods for landmark and line detection tasks. The high correlation between the automatic UCA and Cobb angle (R$^2$=0.858) suggests that our proposed method can potentially replace manual UCA measurement in ultrasound scoliosis assessment. △ Less

Submitted 6 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

arXiv:2402.16865 [pdf, other]

Improve Robustness of Eye Disease Detection by including Learnable Probabilistic Discrete Latent Variables into Machine Learning Models

Authors: Anirudh Prabhakaran, YeKun Xiao, Ching-Yu Cheng, Dianbo Liu

Abstract: Ocular diseases, ranging from diabetic retinopathy to glaucoma, present a significant public health challenge due to their prevalence and potential for causing vision impairment. Early and accurate diagnosis is crucial for effective treatment and management.In recent years, deep learning models have emerged as powerful tools for analysing medical images, including ocular imaging . However, challen… ▽ More Ocular diseases, ranging from diabetic retinopathy to glaucoma, present a significant public health challenge due to their prevalence and potential for causing vision impairment. Early and accurate diagnosis is crucial for effective treatment and management.In recent years, deep learning models have emerged as powerful tools for analysing medical images, including ocular imaging . However, challenges persist in model interpretability and uncertainty estimation, which are critical for clinical decision-making. This study introduces a novel application of GFlowOut, leveraging the probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks, for the classification and analysis of ocular diseases using eye fundus images. We develop a robust and generalizable method that utilizes GFlowOut integrated with ResNet18 and ViT models as backbone in identifying various ocular conditions. This study employs a unique set of dropout masks - none, random, bottomup, and topdown - to enhance model performance in analyzing ocular images. Our results demonstrate that the bottomup GFlowOut mask significantly improves accuracy, outperforming the traditional dropout approach. △ Less

Submitted 20 January, 2024; originally announced February 2024.

Comments: This is a work in progress

arXiv:2402.16144 [pdf, ps, other]

100 Gbps Indoor Access and 4.8 Gbps Outdoor Point-to-Point LiFi Transmission Systems using Laser-based Light Sources

Authors: Cheng Cheng, Sovan Das, Stefan Videv, Adrian Spark, Sina Babadi, Aravindh Krishnamoorthy, Changmin Lee, Daniel Grieder, Kathleen Hartnett, Paul Rudy, James Raring, Marzieh Najafi, Vasilis K. Papanikolaou, Robert Schober, Harald Haas

Abstract: In this paper, we demonstrate the communication capabilities of light-fidelity (LiFi) systems based on highbrightness and high-bandwidth integrated laser-based sources in a surface mount device (SMD) packaging platform. The laserbased source is able to deliver 450 lumens of white light illumination and the resultant light brightness is over 1000 cd mm2. It is demonstrated that a wavelength divisio… ▽ More In this paper, we demonstrate the communication capabilities of light-fidelity (LiFi) systems based on highbrightness and high-bandwidth integrated laser-based sources in a surface mount device (SMD) packaging platform. The laserbased source is able to deliver 450 lumens of white light illumination and the resultant light brightness is over 1000 cd mm2. It is demonstrated that a wavelength division multiplexing (WDM) LiFi system with ten parallel channels is able to deliver over 100 Gbps data rate with the assistance of Volterra filter-based nonlinear equalisers. In addition, an aggregated transmission data rate of 4.8 Gbps has been achieved over a link distance of 500 m with the same type of SMD light source. This work demonstrates the scalability of LiFi systems that employ laserbased light sources, particularly in their capacity to enable highspeed short range, as well as long-range data transmission. △ Less

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2310.11692 [pdf, other]

Random Sampling of Bandlimited Graph Signals from Local Measurements

Authors: Lili Shen, Jun Xian, Cheng Cheng

Abstract: The random sampling on graph signals is one of the fundamental topics in graph signal processing. In this letter, we consider the random sampling of k-bandlimited signals from the local measurements and show that no more than O(klogk) measurements with replacement are sufficient for the accurate and stable recovery of any k-bandlimited graph signals. We propose two random sampling strategies based… ▽ More The random sampling on graph signals is one of the fundamental topics in graph signal processing. In this letter, we consider the random sampling of k-bandlimited signals from the local measurements and show that no more than O(klogk) measurements with replacement are sufficient for the accurate and stable recovery of any k-bandlimited graph signals. We propose two random sampling strategies based on the minimum measurements, i.e., the optimal sampling and the estimated sampling. The geodesic distance between vertices is introduced to design the sampling probability distribution. Numerical experiments are included to show the effectiveness of the proposed methods. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2308.06069 [pdf, other]

Safeguarding Learning-based Control for Smart Energy Systems with Sampling Specifications

Authors: Chih-Hong Cheng, Venkatesh Prasad Venkataramanan, Pragya Kirti Gupta, Yun-Fei Hsu, Simon Burton

Abstract: We study challenges using reinforcement learning in controlling energy systems, where apart from performance requirements, one has additional safety requirements such as avoiding blackouts. We detail how these safety requirements in real-time temporal logic can be strengthened via discretization into linear temporal logic (LTL), such that the satisfaction of the LTL formulae implies the satisfacti… ▽ More We study challenges using reinforcement learning in controlling energy systems, where apart from performance requirements, one has additional safety requirements such as avoiding blackouts. We detail how these safety requirements in real-time temporal logic can be strengthened via discretization into linear temporal logic (LTL), such that the satisfaction of the LTL formulae implies the satisfaction of the original safety requirements. The discretization enables advanced engineering methods such as synthesizing shields for safe reinforcement learning as well as formal verification, where for statistical model checking, the probabilistic guarantee acquired by LTL model checking forms a lower bound for the satisfaction of the original real-time safety requirements. △ Less

Submitted 11 August, 2023; originally announced August 2023.

arXiv:2307.11784 [pdf, other]

What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems

Authors: Saddek Bensalem, Chih-Hong Cheng, Wei Huang, Xiaowei Huang, Changshun Wu, Xingyu Zhao

Abstract: Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verificati… ▽ More Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees. △ Less

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2306.16036 [pdf, other]

A Cascaded Approach for ultraly High Performance Lesion Detection and False Positive Removal in Liver CT Scans

Authors: Fakai Wang, Chi-Tung Cheng, Chien-Wei Peng, Ke Yan, Min Wu, Le Lu, Chien-Hung Liao, Ling Zhang

Abstract: Liver cancer has high morbidity and mortality rates in the world. Multi-phase CT is a main medical imaging modality for detecting/identifying and diagnosing liver tumors. Automatically detecting and classifying liver lesions in CT images have the potential to improve the clinical workflow. This task remains challenging due to liver lesions' large variations in size, appearance, image contrast, and… ▽ More Liver cancer has high morbidity and mortality rates in the world. Multi-phase CT is a main medical imaging modality for detecting/identifying and diagnosing liver tumors. Automatically detecting and classifying liver lesions in CT images have the potential to improve the clinical workflow. This task remains challenging due to liver lesions' large variations in size, appearance, image contrast, and the complexities of tumor types or subtypes. In this work, we customize a multi-object labeling tool for multi-phase CT images, which is used to curate a large-scale dataset containing 1,631 patients with four-phase CT images, multi-organ masks, and multi-lesion (six major types of liver lesions confirmed by pathology) masks. We develop a two-stage liver lesion detection pipeline, where the high-sensitivity detecting algorithms in the first stage discover as many lesion proposals as possible, and the lesion-reclassification algorithms in the second stage remove as many false alarms as possible. The multi-sensitivity lesion detection algorithm maximizes the information utilization of the individual probability maps of segmentation, and the lesion-shuffle augmentation effectively explores the texture contrast between lesions and the liver. Independently tested on 331 patient cases, the proposed model achieves high sensitivity and specificity for malignancy classification in the multi-phase contrast-enhanced CT (99.2%, 97.1%, diagnosis setting) and in the noncontrast CT (97.3%, 95.7%, screening setting). △ Less

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2305.11959 [pdf, other]

CIAMA: A Multiple Access Scheme with High Diversity and Multiplexing Gains for Next-gen Wireless Networks

Authors: Jianjian Wu, Chi-Tsun Cheng, Qingfeng Zhou

Abstract: This paper studies advanced multi-access techniques to support high volumes of concurrent access in wireless networks. Sparse code multiple access (SCMA), as a code-domain Non-Orthogonal Multiple Access (NOMA), serves multiple users simultaneously by adopting frequency-domain coding. Blind Interference Alignment, in contrast, applies time-domain coding to accommodate multiple users. Unlike beamfor… ▽ More This paper studies advanced multi-access techniques to support high volumes of concurrent access in wireless networks. Sparse code multiple access (SCMA), as a code-domain Non-Orthogonal Multiple Access (NOMA), serves multiple users simultaneously by adopting frequency-domain coding. Blind Interference Alignment, in contrast, applies time-domain coding to accommodate multiple users. Unlike beamforming, both of them need no Channel State Information at the Transmitter (CSIT), which saves control overheads on channel information feedback. To further increase multiplexing gain and diversity order, we propose a new multiple access framework, which utilizes both time and frequency coding by combining SCMA and BIA, which is CIAMA (sparseCode-and-bIA-based multiple access). Two decoding schemes, namely the two-stage decoding scheme consisting of zero-forcing and Message Passing Algorithm (MPA), and the Joint Message Passing Algorithm (JMPA) enhanced by constructing a virtual factor graph, have been analyzed. Simulation results indicate that although the performance of the two-stage decoding scheme is inferior to both BIA and SCMA, it has a relatively low decoding complexity. Nonetheless, the JMPA decoding scheme achieves the same diversity gain as an STBC-based SCMA and with an even higher multiplexing gain, which makes the CIAMA with JMPA decoding scheme a promising MA scheme for next-gen wireless networks. △ Less

Submitted 25 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: The second version. It is currently submitted to a potential journal. A new co-author added: thanks for the significant suggestions on the paper editing from C.T. Cheng

arXiv:2304.03895 [pdf, other]

MCDIP-ADMM: Overcoming Overfitting in DIP-based CT reconstruction

Authors: Chen Cheng, Qingping Zhou

Abstract: This paper investigates the application of unsupervised learning methods for computed tomography (CT) reconstruction. To motivate our work, we review several existing priors, namely the truncated Gaussian prior, the $l_1$ prior, the total variation prior, and the deep image prior (DIP). We find that DIP outperforms the other three priors in terms of representational capability and visual performan… ▽ More This paper investigates the application of unsupervised learning methods for computed tomography (CT) reconstruction. To motivate our work, we review several existing priors, namely the truncated Gaussian prior, the $l_1$ prior, the total variation prior, and the deep image prior (DIP). We find that DIP outperforms the other three priors in terms of representational capability and visual performance. However, the performance of DIP deteriorates when the number of iterations exceeds a certain threshold due to overfitting. To address this issue, we propose a novel method (MCDIP-ADMM) based on Multi-Code Deep Image Prior and plug-and-play Alternative Direction Method of Multipliers. Specifically, MCDIP utilizes multiple latent codes to generate a series of feature maps at an intermediate layer within a generator model. These maps are then composed with trainable weights, representing the complete image prior. Experimental results demonstrate the superior performance of the proposed MCDIP-ADMM compared to three existing competitors. In the case of parallel beam projection with Gaussian noise, MCDIP-ADMM achieves an average improvement of 4.3 dBでしべる over DIP, 1.7 dBでしべる over ADMM DIP-WTV, and 1.2 dBでしべる over PnP-DIP in terms of PSNR. Similarly, for fan-beam projection with Poisson noise, MCDIP-ADMM achieves an average improvement of 3.09 dBでしべる over DIP, 1.86 dBでしべる over ADMM DIP-WTV, and 0.84 dBでしべる over PnP-DIP in terms of PSNR. △ Less

Submitted 1 June, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: 25 pages

arXiv:2302.00626 [pdf, other]

Continuous U-Net: Faster, Greater and Noiseless

Authors: Chun-Wun Cheng, Christina Runkel, Lihao Liu, Raymond H Chan, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: Image segmentation is a fundamental task in image analysis and clinical practice. The current state-of-the-art techniques are based on U-shape type encoder-decoder networks with skip connections, called U-Net. Despite the powerful performance reported by existing U-Net type networks, they suffer from several major limitations. Issues include the hard coding of the receptive field size, compromisin… ▽ More Image segmentation is a fundamental task in image analysis and clinical practice. The current state-of-the-art techniques are based on U-shape type encoder-decoder networks with skip connections, called U-Net. Despite the powerful performance reported by existing U-Net type networks, they suffer from several major limitations. Issues include the hard coding of the receptive field size, compromising the performance and computational cost, as well as the fact that they do not account for inherent noise in the data. They have problems associated with discrete layers, and do not offer any theoretical underpinning. In this work we introduce continuous U-Net, a novel family of networks for image segmentation. Firstly, continuous U-Net is a continuous deep neural network that introduces new dynamic blocks modelled by second order ordinary differential equations. Secondly, we provide theoretical guarantees for our network demonstrating faster convergence, higher robustness and less sensitivity to noise. Thirdly, we derive qualitative measures to tailor-made segmentation tasks. We demonstrate, through extensive numerical and visual results, that our model outperforms existing U-Net blocks for several medical image segmentation benchmarking datasets. △ Less

Submitted 1 February, 2023; originally announced February 2023.

arXiv:2211.06770 [pdf, other]

MicroISP: Processing 32MP Photos on Mobile Devices with Deep Learning

Authors: Andrey Ignatov, Anastasia Sycheva, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc Van Gool

Abstract: While neural networks-based photo processing solutions can provide a better image quality compared to the traditional ISP systems, their application to mobile devices is still very limited due to their very high computational complexity. In this paper, we present a novel MicroISP model designed specifically for edge devices, taking into account their computational and memory limitations. The propo… ▽ More While neural networks-based photo processing solutions can provide a better image quality compared to the traditional ISP systems, their application to mobile devices is still very limited due to their very high computational complexity. In this paper, we present a novel MicroISP model designed specifically for edge devices, taking into account their computational and memory limitations. The proposed solution is capable of processing up to 32MP photos on recent smartphones using the standard mobile ML libraries and requiring less than 1 second to perform the inference, while for FullHD images it achieves real-time performance. The architecture of the model is flexible, allowing to adjust its complexity to devices of different computational power. To evaluate the performance of the model, we collected a novel Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The experiments demonstrated that, despite its compact size, the MicroISP model is able to provide comparable or better visual results than the traditional mobile ISP systems, while outperforming the previously proposed efficient deep learning based solutions. Finally, this model is also compatible with the latest mobile AI accelerators, achieving good runtime and low power consumption on smartphone NPUs and APUs. The code, dataset and pre-trained models are available on the project website: https://people.ee.ethz.ch/~ihnatova/microisp.html △ Less

Submitted 8 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2211.06263

arXiv:2211.06263 [pdf, other]

PyNet-V2 Mobile: Efficient On-Device Photo Processing With Neural Networks

Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc Van Gool

Abstract: The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address th… ▽ More The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address this limitation, we propose a novel PyNET-V2 Mobile CNN architecture designed specifically for edge devices, being able to process RAW 12MP photos directly on mobile phones under 1.5 second and producing high perceptual photo quality. To train and to evaluate the performance of the proposed solution, we use the real-world Fujifilm UltraISP dataset consisting on thousands of RAW-RGB image pairs captured with a professional medium-format 102MP Fujifilm camera and a popular Sony mobile camera sensor. The results demonstrate that the PyNET-V2 Mobile model can substantially surpass the quality of tradition ISP pipelines, while outperforming the previously introduced neural network-based solutions designed for fast image processing. Furthermore, we show that the proposed architecture is also compatible with the latest mobile AI accelerators such as NPUs or APUs that can be used to further reduce the latency of the model to as little as 0.5 second. The dataset, code and pre-trained models used in this paper are available on the project website: https://github.com/gmalivenko/PyNET-v2 △ Less

Submitted 8 November, 2022; originally announced November 2022.

arXiv:2211.05256 [pdf, other]

Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang , et al. (29 additional authors not shown)

Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this prob… ▽ More Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and propose the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

arXiv:2210.02445 [pdf, other]

Localizing Anatomical Landmarks in Ocular Images using Zoom-In Attentive Networks

Authors: Xiaofeng Lei, Shaohua Li, Xinxing Xu, Huazhu Fu, Yong Liu, Yih-Chung Tham, Yangqin Feng, Mingrui Tan, Yanyu Xu, Jocelyn Hui Lin Goh, Rick Siow Mong Goh, Ching-Yu Cheng

Abstract: Localizing anatomical landmarks are important tasks in medical image analysis. However, the landmarks to be localized often lack prominent visual features. Their locations are elusive and easily confused with the background, and thus precise localization highly depends on the context formed by their surrounding areas. In addition, the required precision is usually higher than segmentation and obje… ▽ More Localizing anatomical landmarks are important tasks in medical image analysis. However, the landmarks to be localized often lack prominent visual features. Their locations are elusive and easily confused with the background, and thus precise localization highly depends on the context formed by their surrounding areas. In addition, the required precision is usually higher than segmentation and object detection tasks. Therefore, localization has its unique challenges different from segmentation or detection. In this paper, we propose a zoom-in attentive network (ZIAN) for anatomical landmark localization in ocular images. First, a coarse-to-fine, or "zoom-in" strategy is utilized to learn the contextualized features in different scales. Then, an attentive fusion module is adopted to aggregate multi-scale features, which consists of 1) a co-attention network with a multiple regions-of-interest (ROIs) scheme that learns complementary features from the multiple ROIs, 2) an attention-based fusion module which integrates the multi-ROIs features and non-ROI features. We evaluated ZIAN on two open challenge tasks, i.e., the fovea localization in fundus images and scleral spur localization in AS-OCT images. Experiments show that ZIAN achieves promising performances and outperforms state-of-the-art localization methods. The source code and trained models of ZIAN are available at https://github.com/leixiaofeng-astar/OMIA9-ZIAN. △ Less

Submitted 22 December, 2022; v1 submitted 25 September, 2022; originally announced October 2022.

arXiv:2210.01918 [pdf, other]

doi 10.1109/TSP.2023.330361

Nonparametric and Regularized Dynamical Wasserstein Barycenters for Sequential Observations

Authors: Kevin C. Cheng, Shuchin Aeron, Michael C. Hughes, Eric L. Miller

Abstract: We consider probabilistic models for sequential observations which exhibit gradual transitions among a finite number of states. We are particularly motivated by applications such as human activity analysis where observed accelerometer time series contains segments representing distinct activities, which we call pure states, as well as periods characterized by continuous transition among these pure… ▽ More We consider probabilistic models for sequential observations which exhibit gradual transitions among a finite number of states. We are particularly motivated by applications such as human activity analysis where observed accelerometer time series contains segments representing distinct activities, which we call pure states, as well as periods characterized by continuous transition among these pure states. To capture this transitory behavior, the dynamical Wasserstein barycenter (DWB) model of Cheng et al. in 2021 [1] associates with each pure state a data-generating distribution and models the continuous transitions among these states as a Wasserstein barycenter of these distributions with dynamically evolving weights. Focusing on the univariate case where Wasserstein distances and barycenters can be computed in closed form, we extend [1] specifically relaxing the parameterization of the pure states as Gaussian distributions. We highlight issues related to the uniqueness in identifying the model parameters as well as uncertainties induced when estimating a dynamically evolving distribution from a limited number of samples. To ameliorate non-uniqueness, we introduce regularization that imposes temporal smoothness on the dynamics of the barycentric weights. A quantile-based approximation of the pure state distributions yields a finite dimensional estimation problem which we numerically solve using cyclic descent alternating between updates to the pure-state quantile functions and the barycentric weights. We demonstrate the utility of the proposed algorithm in segmenting both simulated and real world human activity time series. △ Less

Submitted 21 September, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Journal ref: IEEE Transactions on Signal Processing (2023), volume 71, pages 3164 - 3178

arXiv:2209.02912 [pdf]

Optimal Sensor Placement in Body Surface Networks using Gaussian Processes

Authors: Emad Alenany, Changqing Cheng

Abstract: This paper explores a new sequential selection framework for the optimal sensor placement (OSP) in Electrocardiography imaging networks (ECGI). The proposed methodology incorporates the use a recent experimental design method for the sequential selection of landmarkings on biological objects, namely, Gaussian process landmarking (GPLMK) for better exploration of the candidate sensors. The two expe… ▽ More This paper explores a new sequential selection framework for the optimal sensor placement (OSP) in Electrocardiography imaging networks (ECGI). The proposed methodology incorporates the use a recent experimental design method for the sequential selection of landmarkings on biological objects, namely, Gaussian process landmarking (GPLMK) for better exploration of the candidate sensors. The two experimental design methods work as a source of the training and the validation locations which is fitted using a spatiotemporal Gaussian process (STGP). The STGP is fitted using the training set to predict for the current validation set generated using GPLMK, and the sensor with the largest prediction absolute error is selected from the current validation set and added to the selected sensors. Next, a new validation set is generated and predicted using the current training set. The process continues until selecting a specific number of sensor locations. The study is conducted on a dataset of body surface potential mapping (BSPM) of 352 electrodes of four human subjects. A number of 30 sensor locations is selected using the proposed algorithm. The selected sensor locations achieved average $R^2 = 94.40 \%$ for estimating the whole-body QRS segment. The proposed method adds to design efforts for a more clinically practical ECGI system by improving its wearability and reduce the design cost as well. △ Less

Submitted 6 September, 2022; originally announced September 2022.

Comments: 14, 10 figures, 1 table

arXiv:2209.01336 [pdf, other]

Graph Fourier transforms on directed product graphs

Authors: Cheng Cheng, Yang Chen, Jeon Yu Lee, Qiyu Sun

Abstract: Graph Fourier transform (GFT) is one of the fundamental tools in graph signal processing to decompose graph signals into different frequency components and to represent graph signals with strong correlation by different modes of variation effectively. The GFT on undirected graphs has been well studied and several approaches have been proposed to define GFTs on directed graphs. In this paper, based… ▽ More Graph Fourier transform (GFT) is one of the fundamental tools in graph signal processing to decompose graph signals into different frequency components and to represent graph signals with strong correlation by different modes of variation effectively. The GFT on undirected graphs has been well studied and several approaches have been proposed to define GFTs on directed graphs. In this paper, based on the singular value decompositions of some graph Laplacians, we propose two GFTs on the Cartesian product graph of two directed graphs. We show that the proposed GFTs could represent spatial-temporal data sets on directed networks with strong correlation efficiently, and in the undirected graph setting they are essentially the joint GFT in the literature. In this paper, we also consider the bandlimiting procedure in the spectral domain of the proposed GFTs, and demonstrate its performance to denoise the temperature data set in the region of Brest (France) on January 2014. △ Less

Submitted 7 September, 2022; v1 submitted 3 September, 2022; originally announced September 2022.

arXiv:2208.07363 [pdf, other]

MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control

Authors: Nolan Wagener, Andrey Kolobov, Felipe Vieira Frujeri, Ricky Loynd, Ching-An Cheng, Matthew Hausknecht

Abstract: Simulated humanoids are an appealing research domain due to their physical capabilities. Nonetheless, they are also challenging to control, as a policy must drive an unstable, discontinuous, and high-dimensional physical system. One widely studied approach is to utilize motion capture (MoCap) data to teach the humanoid agent low-level skills (e.g., standing, walking, and running) that can then be… ▽ More Simulated humanoids are an appealing research domain due to their physical capabilities. Nonetheless, they are also challenging to control, as a policy must drive an unstable, discontinuous, and high-dimensional physical system. One widely studied approach is to utilize motion capture (MoCap) data to teach the humanoid agent low-level skills (e.g., standing, walking, and running) that can then be re-used to synthesize high-level behaviors. However, even with MoCap data, controlling simulated humanoids remains very hard, as MoCap data offers only kinematic information. Finding physical control inputs to realize the demonstrated motions requires computationally intensive methods like reinforcement learning. Thus, despite the publicly available MoCap data, its utility has been limited to institutions with large-scale compute. In this work, we dramatically lower the barrier for productive research on this topic by training and releasing high-quality agents that can track over three hours of MoCap data for a simulated humanoid in the dm_control physics-based environment. We release MoCapAct (Motion Capture with Actions), a dataset of these expert agents and their rollouts, which contain proprioceptive observations and actions. We demonstrate the utility of MoCapAct by using it to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control and show the learned low-level component can be re-used to efficiently learn downstream high-level tasks. Finally, we use MoCapAct to train an autoregressive GPT model and show that it can control a simulated humanoid to perform natural motion completion given a motion prompt. Videos of the results and links to the code and dataset are available at https://microsoft.github.io/MoCapAct. △ Less

Submitted 13 January, 2023; v1 submitted 15 August, 2022; originally announced August 2022.

Comments: Appearing in NeurIPS 2022 Datasets and Benchmarks Track

arXiv:2205.08833 [pdf, other]

Speckle Image Restoration without Clean Data

Authors: Tsung-Ming Tai, Yun-Jie Jhang, Wen-Jyi Hwang, Chau-Jern Cheng

Abstract: Speckle noise is an inherent disturbance in coherent imaging systems such as digital holography, synthetic aperture radar, optical coherence tomography, or ultrasound systems. These systems usually produce only single observation per view angle of the same interest object, imposing the difficulty to leverage the statistic among observations. We propose a novel image restoration algorithm that can… ▽ More Speckle noise is an inherent disturbance in coherent imaging systems such as digital holography, synthetic aperture radar, optical coherence tomography, or ultrasound systems. These systems usually produce only single observation per view angle of the same interest object, imposing the difficulty to leverage the statistic among observations. We propose a novel image restoration algorithm that can perform speckle noise removal without clean data and does not require multiple noisy observations in the same view angle. Our proposed method can also be applied to the situation without knowing the noise distribution as prior. We demonstrate our method is especially well-suited for spectral images by first validating on the synthetic dataset, and also applied on real-world digital holography samples. The results are superior in both quantitative measurement and visual inspection compared to several widely applied baselines. Our method even shows promising results across different speckle noise strengths, without the clean data needed. △ Less

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2205.06242 [pdf, other]

Graph Fourier transform based on singular value decomposition of directed Laplacian

Authors: Yang Chen, Cheng Cheng, Qiyu Sun

Abstract: Graph Fourier transform (GFT) is a fundamental concept in graph signal processing. In this paper, based on singular value decomposition of Laplacian, we introduce a novel definition of GFT on directed graphs, and use singular values of Laplacian to carry the notion of graph frequencies. % of the proposed GFT. The proposed GFT is consistent with the conventional GFT in the undirected graph setting,… ▽ More Graph Fourier transform (GFT) is a fundamental concept in graph signal processing. In this paper, based on singular value decomposition of Laplacian, we introduce a novel definition of GFT on directed graphs, and use singular values of Laplacian to carry the notion of graph frequencies. % of the proposed GFT. The proposed GFT is consistent with the conventional GFT in the undirected graph setting, and on directed circulant graphs, the proposed GFT is the classical discrete Fourier transform, up to some rotation, permutation and phase adjustment. We show that frequencies and frequency components of the proposed GFT can be evaluated by solving some constrained minimization problems with low computational cost. Numerical demonstrations indicate that the proposed GFT could represent graph signals with different modes of variation efficiently. △ Less

Submitted 12 May, 2022; originally announced May 2022.

arXiv:2205.04019 [pdf, other]

Wiener filters on graphs and distributed polynomial approximation algorithms

Authors: Cong Zheng, Cheng Cheng, Qiyu Sun

Abstract: In this paper, we consider Wiener filters to reconstruct deterministic and (wide-band) stationary graph signals from their observations corrupted by random noises, and we propose distributed algorithms to implement Wiener filters and inverse filters on networks in which agents are equipped with a data processing subsystem for limited data storage and computation power, and with a one-hop communica… ▽ More In this paper, we consider Wiener filters to reconstruct deterministic and (wide-band) stationary graph signals from their observations corrupted by random noises, and we propose distributed algorithms to implement Wiener filters and inverse filters on networks in which agents are equipped with a data processing subsystem for limited data storage and computation power, and with a one-hop communication subsystem for direct data exchange only with their adjacent agents. The proposed distributed polynomial approximation algorithm is an exponential convergent quasi-Newton method based on Jacobi polynomial approximation and Chebyshev interpolation polynomial approximation to analytic functions on a cube. Our numerical simulations show that Wiener filtering procedure performs better on denoising (wide-band) stationary signals than the Tikhonov regularization approach does, and that the proposed polynomial approximation algorithms converge faster than the Chebyshev polynomial approximation algorithm and gradient decent algorithm do in the implementation of an inverse filtering procedure associated with a polynomial filter of commutative graph shifts. △ Less

Submitted 8 May, 2022; originally announced May 2022.

arXiv:2205.03238 [pdf]

Ultra-sensitive Flexible Sponge-Sensor Array for Muscle Activities Detection and Human Limb Motion Recognition

Authors: Jiao Suo, Yifan Liu, Clio Cheng, Keer Wang, Meng Chen, Ho-yin Chan, Roy Vellaisamy, Ning Xi, Vivian W. Q. Lou, Wen Jung Li

Abstract: Human limb motion tracking and recognition plays an important role in medical rehabilitation training, lower limb assistance, prosthetics design for amputees, feedback control for assistive robots, etc. Lightweight wearable sensors, including inertial sensors, surface electromyography sensors, and flexible strain/pressure, are promising to become the next-generation human motion capture devices. H… ▽ More Human limb motion tracking and recognition plays an important role in medical rehabilitation training, lower limb assistance, prosthetics design for amputees, feedback control for assistive robots, etc. Lightweight wearable sensors, including inertial sensors, surface electromyography sensors, and flexible strain/pressure, are promising to become the next-generation human motion capture devices. Herein, we present a wireless wearable device consisting of a sixteen-channel flexible sponge-based pressure sensor array to recognize various human lower limb motions by detecting contours on the human skin caused by calf gastrocnemius muscle actions. Each sensing element is a round porous structure of thin carbon nanotube/polydimethylsiloxane nanocomposites with a diameter of 4 mm and thickness of about 400 μみゅーm. Ten human subjects were recruited to perform ten different lower limb motions while wearing the developed device. The motion classification result with the support vector machine method shows a macro-recall of about 97.3% for all ten motions tested. This work demonstrates a portable wearable muscle activity detection device with a lower limb motion recognition application, which can be potentially used in assistive robot control, healthcare, sports monitoring, etc. △ Less

Submitted 29 June, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

Comments: 17 pages, 6 figures

arXiv:2204.08467 [pdf, other]

IOP-FL: Inside-Outside Personalization for Federated Medical Image Segmentation

Authors: Meirui Jiang, Hongzheng Yang, Chen Cheng, Qi Dou

Abstract: Federated learning (FL) allows multiple medical institutions to collaboratively learn a global model without centralizing client data. It is difficult, if possible at all, for such a global model to commonly achieve optimal performance for each individual client, due to the heterogeneity of medical images from various scanners and patient demographics. This problem becomes even more significant wh… ▽ More Federated learning (FL) allows multiple medical institutions to collaboratively learn a global model without centralizing client data. It is difficult, if possible at all, for such a global model to commonly achieve optimal performance for each individual client, due to the heterogeneity of medical images from various scanners and patient demographics. This problem becomes even more significant when deploying the global model to unseen clients outside the FL with unseen distributions not presented during federated training. To optimize the prediction accuracy of each individual client for medical imaging tasks, we propose a novel unified framework for both \textit{Inside and Outside model Personalization in FL} (IOP-FL). Our inside personalization uses a lightweight gradient-based approach that exploits the local adapted model for each client, by accumulating both the global gradients for common knowledge and the local gradients for client-specific optimization. Moreover, and importantly, the obtained local personalized models and the global model can form a diverse and informative routing space to personalize an adapted model for outside FL clients. Hence, we design a new test-time routing scheme using the consistency loss with a shape constraint to dynamically incorporate the models, given the distribution information conveyed by the test data. Our extensive experimental results on two medical image segmentation tasks present significant improvements over SOTA methods on both inside and outside personalization, demonstrating the potential of our IOP-FL scheme for clinical practice. △ Less

Submitted 29 March, 2023; v1 submitted 16 April, 2022; originally announced April 2022.

Comments: Accepted by IEEE TMI special issue on federated learning for medical imaging

arXiv:2203.16007 [pdf, other]

doi 10.1109/LSP.2023.3279781

Multi-target Extractor and Detector for Unknown-number Speaker Diarization

Authors: Chin-Yi Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Abstract: Strong representations of target speakers can help extract important information about speakers and detect corresponding temporal regions in multi-speaker conversations. In this study, we propose a neural architecture that simultaneously extracts speaker representations consistent with the speaker diarization objective and detects the presence of each speaker on a frame-by-frame basis regardless o… ▽ More Strong representations of target speakers can help extract important information about speakers and detect corresponding temporal regions in multi-speaker conversations. In this study, we propose a neural architecture that simultaneously extracts speaker representations consistent with the speaker diarization objective and detects the presence of each speaker on a frame-by-frame basis regardless of the number of speakers in a conversation. A speaker representation (called z-vector) extractor and a time-speaker contextualizer, implemented by a residual network and processing data in both temporal and speaker dimensions, are integrated into a unified framework. Tests on the CALLHOME corpus show that our model outperforms most of the methods proposed so far. Evaluations in a more challenging case with simultaneous speakers ranging from 2 to 7 show that our model achieves 6.4% to 30.9% relative diarization error rate reductions over several typical baselines. △ Less

Submitted 22 May, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted by IEEE Signal Processing Letters

Journal ref: IEEE Signal Processing Letters, vol. 30, pp. 638-642, 2023

arXiv:2112.05372 [pdf]

doi 10.1016/j.measurement.2019.06.004

A recurrent neural network approach for remaining useful life prediction utilizing a novel trend features construction method

Authors: Sen Zhao, Yong Zhang, Shang Wang, Beitong Zhou, Cheng Cheng

Abstract: Data-driven methods for remaining useful life (RUL) prediction normally learn features from a fixed window size of a priori of degradation, which may lead to less accurate prediction results on different datasets because of the variance of local features. This paper proposes a method for RUL prediction which depends on a trend feature representing the overall time sequence of degradation. Complete… ▽ More Data-driven methods for remaining useful life (RUL) prediction normally learn features from a fixed window size of a priori of degradation, which may lead to less accurate prediction results on different datasets because of the variance of local features. This paper proposes a method for RUL prediction which depends on a trend feature representing the overall time sequence of degradation. Complete ensemble empirical mode decomposition, followed by a reconstruction procedure, is created to build the trend features. The probability distribution of sensors' measurement learned by conditional neural processes is used to evaluate the trend features. With the best trend feature, a data-driven model using long short-term memory is developed to predict the RUL. To prove the effectiveness of the proposed method, experiments on a benchmark C-MAPSS dataset are carried out and compared with other state-of-the-art methods. Comparison results show that the proposed method achieves the smallest root mean square values in prediction of all RUL. △ Less

Submitted 10 December, 2021; originally announced December 2021.

Journal ref: Measurement 2019

arXiv:2112.02802 [pdf, ps, other]

Identification of Switched Linear Systems: Persistence of Excitation and Numerical Algorithms

Authors: Biqiang Mu, Tianshi Chen, Changming Cheng, Er-Wei Bai

Abstract: This paper investigates two issues on identification of switched linear systems: persistence of excitation and numerical algorithms. The main contribution is a much weaker condition on the regressor to be persistently exciting that guarantees the uniqueness of the parameter sets and also provides new insights in understanding the relation among different subsystems. It is found that for uniquely d… ▽ More This paper investigates two issues on identification of switched linear systems: persistence of excitation and numerical algorithms. The main contribution is a much weaker condition on the regressor to be persistently exciting that guarantees the uniqueness of the parameter sets and also provides new insights in understanding the relation among different subsystems. It is found that for uniquely determining the parameters of switched linear systems, the minimum number of samples needed derived from our condition is much smaller than that reported in the literature. The secondary contribution of the paper concerns the numerical algorithm. Though the algorithm is not new, we show that our surrogate problem, relaxed from an integer optimization to a continuous minimization, has exactly the same solution as the original integer optimization, which is effectively solved by a block-coordinate descent algorithm. Moreover, an algorithm for handling unknown number of subsystems is proposed. Several numerical examples are illustrated to support theoretical analysis. △ Less

Submitted 6 December, 2021; originally announced December 2021.

arXiv:2111.08400 [pdf, other]

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Authors: Yi-Chang Chen, Chun-Yen Cheng, Chien-An Chen, Ming-Chieh Sung, Yi-Ren Yeh

Abstract: Due to the recent advances of natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models only consider the semantic correction while the phonetic features of words is neglected. The semantic-only post-correction will consequently decrease the performance since homopho… ▽ More Due to the recent advances of natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models only consider the semantic correction while the phonetic features of words is neglected. The semantic-only post-correction will consequently decrease the performance since homophonic errors are fairly common in Chinese ASR. In this paper, we proposed a novel approach to collectively exploit the contextualized representation and the phonetic information between the error and its replacing candidates to alleviate the error rate of Chinese ASR. Our experiment results on real world speech recognition datasets showed that our proposed method has evidently lower CER than the baseline model, which utilized a pre-trained BERT MLM as the corrector. △ Less

Submitted 16 November, 2021; originally announced November 2021.

arXiv:2111.03997 [pdf]

The Three-Dimensional Structural Configuration of the Central Retinal Vessel Trunk and Branches as a Glaucoma Biomarker

Authors: Satish K. Panda, Haris Cheong, Tin A. Tun, Thanadet Chuangsuwanich, Aiste Kadziauskiene, Vijayalakshmi Senthil, Ramaswami Krishnadas, Martin L. Buist, Shamira Perera, Ching-Yu Cheng, Tin Aung, Alexandre H. Thiery, Michael J. A. Girard

Abstract: Purpose: To assess whether the three-dimensional (3D) structural configuration of the central retinal vessel trunk and its branches (CRVT&B) could be used as a diagnostic marker for glaucoma. Method: We trained a deep learning network to automatically segment the CRVT&B from the B-scans of the optical coherence tomography (OCT) volume of the optic nerve head (ONH). Subsequently, two different appr… ▽ More Purpose: To assess whether the three-dimensional (3D) structural configuration of the central retinal vessel trunk and its branches (CRVT&B) could be used as a diagnostic marker for glaucoma. Method: We trained a deep learning network to automatically segment the CRVT&B from the B-scans of the optical coherence tomography (OCT) volume of the optic nerve head (ONH). Subsequently, two different approaches were used for glaucoma diagnosis using the structural configuration of the CRVT&B as extracted from the OCT volumes. In the first approach, we aimed to provide a diagnosis using only 3D CNN and the 3D structure of the CRVT&B. For the second approach, we projected the 3D structure of the CRVT&B orthographically onto three planes to obtain 2D images, and then a 2D CNN was used for diagnosis. The segmentation accuracy was evaluated using the Dice coefficient, whereas the diagnostic accuracy was assessed using the area under the receiver operating characteristic curves (AUえーゆーC). The diagnostic performance of the CRVT&B was also compared with that of retinal nerve fiber layer (RNFL) thickness. Results: Our segmentation network was able to efficiently segment retinal blood vessels from OCT scans. On a test set, we achieved a Dice coefficient of 0.81\pm0.07. The 3D and 2D diagnostic networks were able to differentiate glaucoma from non-glaucoma subjects with accuracies of 82.7% and 83.3%, respectively. The corresponding AUCs for CRVT&B were 0.89 and 0.90, higher than those obtained with RNFL thickness alone. Conclusions: Our work demonstrated that the diagnostic power of the CRVT&B is superior to that of a gold-standard glaucoma parameter, i.e., RNFL thickness. Our work also suggested that the major retinal blood vessels form a skeleton -- the configuration of which may be representative of major ONH structural changes as typically observed with the development and progression of glaucoma. △ Less

Submitted 8 November, 2021; v1 submitted 7 November, 2021; originally announced November 2021.

arXiv:2110.12786 [pdf, ps, other]

Dictionary Learning Using Rank-One Atomic Decomposition (ROAD)

Authors: Cheng Cheng, Wei Dai

Abstract: Dictionary learning aims at seeking a dictionary under which the training data can be sparsely represented. Methods in the literature typically formulate the dictionary learning problem as an optimization w.r.t. two variables, i.e., dictionary and sparse coefficients, and solve it by alternating between two stages: sparse coding and dictionary update. The key contribution of this work is a Rank-On… ▽ More Dictionary learning aims at seeking a dictionary under which the training data can be sparsely represented. Methods in the literature typically formulate the dictionary learning problem as an optimization w.r.t. two variables, i.e., dictionary and sparse coefficients, and solve it by alternating between two stages: sparse coding and dictionary update. The key contribution of this work is a Rank-One Atomic Decomposition (ROAD) formulation where dictionary learning is cast as an optimization w.r.t. a single variable which is a set of rank one matrices. The resulting algorithm is hence single-stage. Compared with two-stage algorithms, ROAD minimizes the sparsity of the coefficients whilst keeping the data consistency constraint throughout the whole learning process. An alternating direction method of multipliers (ADMM) is derived to solve the optimization problem and the lower bound of the penalty parameter is computed to guarantees a global convergence despite non-convexity of the optimization formulation. From practical point of view, ROAD reduces the number of tuning parameters required in other benchmark algorithms. Numerical tests demonstrate that ROAD outperforms other benchmark algorithms for both synthetic data and real data, especially when the number of training samples is small. △ Less

Submitted 26 October, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: arXiv admin note: text overlap with arXiv:1911.08975

arXiv:2110.06641 [pdf, ps, other]

Dictionary Learning with Convex Update (ROMD)

Authors: Cheng Cheng, Wei Dai

Abstract: Dictionary learning aims to find a dictionary under which the training data can be sparsely represented, and it is usually achieved by iteratively applying two stages: sparse coding and dictionary update. Typical methods for dictionary update focuses on refining both dictionary atoms and their corresponding sparse coefficients by using the sparsity patterns obtained from sparse coding stage, and h… ▽ More Dictionary learning aims to find a dictionary under which the training data can be sparsely represented, and it is usually achieved by iteratively applying two stages: sparse coding and dictionary update. Typical methods for dictionary update focuses on refining both dictionary atoms and their corresponding sparse coefficients by using the sparsity patterns obtained from sparse coding stage, and hence it is a non-convex bilinear inverse problem. In this paper, we propose a Rank-One Matrix Decomposition (ROMD) algorithm to recast this challenge into a convex problem by resolving these two variables into a set of rank-one matrices. Different from methods in the literature, ROMD updates the whole dictionary at a time using convex programming. The advantages hence include both convergence guarantees for dictionary update and faster convergence of the whole dictionary learning. The performance of ROMD is compared with other benchmark dictionary learning algorithms. The results show the improvement of ROMD in recovery accuracy, especially in the cases of high sparsity level and fewer observation data. △ Less

Submitted 25 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

arXiv:2110.02511 [pdf, other]

A Survey on Recent Deep Learning-driven Singing Voice Synthesis Systems

Authors: Yin-Ping Cho, Fu-Rong Yang, Yung-Chuan Chang, Ching-Ting Cheng, Xiao-Han Wang, Yi-Wen Liu

Abstract: Singing voice synthesis (SVS) is a task that aims to generate audio signals according to musical scores and lyrics. With its multifaceted nature concerning music and language, producing singing voices indistinguishable from that of human singers has always remained an unfulfilled pursuit. Nonetheless, the advancements of deep learning techniques have brought about a substantial leap in the quality… ▽ More Singing voice synthesis (SVS) is a task that aims to generate audio signals according to musical scores and lyrics. With its multifaceted nature concerning music and language, producing singing voices indistinguishable from that of human singers has always remained an unfulfilled pursuit. Nonetheless, the advancements of deep learning techniques have brought about a substantial leap in the quality and naturalness of synthesized singing voice. This paper aims to review some of the state-of-the-art deep learning-driven SVS systems. We intend to summarize their deployed model architectures and identify the strengths and limitations for each of the introduced systems. Thereby, we picture the recent advancement trajectory of this field and conclude the challenges left to be resolved both in commercial applications and academic research. △ Less

Submitted 6 October, 2021; originally announced October 2021.

arXiv:2110.02141 [pdf, ps, other]

Short-and-Sparse Deconvolution Via Rank-One Constrained Optimization (ROCO)

Authors: Cheng Cheng, Wei Dai

Abstract: Short-and-sparse deconvolution (SaSD) aims to recover a short kernel and a long and sparse signal from their convolution. In the literature, formulations of blind deconvolution is either a convex programming via a matrix lifting of convolution, or a bilinear Lasso. Optimization solvers are typically based on bilinear factorizations. In this paper, we formulate SaSD as a non-convex optimization wit… ▽ More Short-and-sparse deconvolution (SaSD) aims to recover a short kernel and a long and sparse signal from their convolution. In the literature, formulations of blind deconvolution is either a convex programming via a matrix lifting of convolution, or a bilinear Lasso. Optimization solvers are typically based on bilinear factorizations. In this paper, we formulate SaSD as a non-convex optimization with a rank-one matrix constraint, hence referred to as Rank-One Constrained Optimization (ROCO). The solver is based on alternating direction method of multipliers (ADMM). It operates on the full rank-one matrix rather than bilinear factorizations. Closed form updates are derived for the efficiency of ADMM. Simulations include both synthetic data and real images. Results show substantial improvements in recovery accuracy (at least 19dBでしべる in PSNR for real images) and comparable runtime compared with benchmark algorithms based on bilinear factorization. △ Less

Submitted 22 November, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

arXiv:2110.01809 [pdf, other]

DA-DRN: Degradation-Aware Deep Retinex Network for Low-Light Image Enhancement

Authors: Xinxu Wei, Xianshi Zhang, Shisen Wang, Cheng Cheng, Yanlin Huang, Kaifu Yang, Yongjie Li

Abstract: Images obtained in real-world low-light conditions are not only low in brightness, but they also suffer from many other types of degradation, such as color distortion, unknown noise, detail loss and halo artifacts. In this paper, we propose a Degradation-Aware Deep Retinex Network (denoted as DA-DRN) for low-light image enhancement and tackle the above degradation. Based on Retinex Theory, the dec… ▽ More Images obtained in real-world low-light conditions are not only low in brightness, but they also suffer from many other types of degradation, such as color distortion, unknown noise, detail loss and halo artifacts. In this paper, we propose a Degradation-Aware Deep Retinex Network (denoted as DA-DRN) for low-light image enhancement and tackle the above degradation. Based on Retinex Theory, the decomposition net in our model can decompose low-light images into reflectance and illumination maps and deal with the degradation in the reflectance during the decomposition phase directly. We propose a Degradation-Aware Module (DA Module) which can guide the training process of the decomposer and enable the decomposer to be a restorer during the training phase without additional computational cost in the test phase. DA Module can achieve the purpose of noise removal while preserving detail information into the illumination map as well as tackle color distortion and halo artifacts. We introduce Perceptual Loss to train the enhancement network to generate the brightness-improved illumination maps which are more consistent with human visual perception. We train and evaluate the performance of our proposed model over the LOL real-world and LOL synthetic datasets, and we also test our model over several other frequently used datasets without Ground-Truth (LIME, DICM, MEF and NPE datasets). We conduct extensive experiments to demonstrate that our approach achieves a promising effect with good rubustness and generalization and outperforms many other state-of-the-art methods qualitatively and quantitatively. Our method only takes 7 ms to process an image with 600x400 resolution on a TITAN Xp GPU. △ Less

Submitted 4 October, 2021; originally announced October 2021.

Comments: 16 pages, 17 figures

ACM Class: I.2; I.4

arXiv:2109.08363 [pdf]

doi 10.1016/j.enconman.2022.115299

100% renewable electricity in Japan

Authors: Cheng Cheng, Andrew Blakers, Matthew Stocks, Bin Lu

Abstract: Japan has committed to carbon neutrality by 2050. Emissions from the electricity sector amount to 42% of the total. Solar photovoltaics (PV) and wind comprise three quarters of global net capacity additions because of low and falling prices. This provides an opportunity for Japan to make large reductions in emissions while also reducing its dependence on energy imports. This study shows that Japan… ▽ More Japan has committed to carbon neutrality by 2050. Emissions from the electricity sector amount to 42% of the total. Solar photovoltaics (PV) and wind comprise three quarters of global net capacity additions because of low and falling prices. This provides an opportunity for Japan to make large reductions in emissions while also reducing its dependence on energy imports. This study shows that Japan has 14 times more solar and offshore wind resources than needed to supply 100% renewable electricity. A 40 year hourly energy balance model is presented of Japan's electricity system using historical data. Pumped hydro energy storage, high voltage interconnection and dispatchable capacity (hydro, biomass and hydrogen energy) are included to balance variable generation and demand. Differential evolution is used to find the least-cost solution under various constraints. The levelized cost of electricity is found to be USD 86 per MWh for a PV-dominated system, and USD 110 per MWh for a wind-dominated system. These costs can be compared with the average system prices on the spot market in Japan of USD 102 per MWh. In summary, Japan can be self-sufficient for electricity supply at competitive costs. △ Less

Submitted 17 September, 2021; originally announced September 2021.

arXiv:2109.06094 [pdf, other]

doi 10.1109/TGRS.2022.3169163

Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data

Authors: Yi Yang, Daoye Zhu, Tengteng Qu, Qiangyu Wang, Fuhu Ren, Chengqi Cheng

Abstract: In this paper, we propose an efficient and generalizable framework based on deep convolutional neural network (CNN) for multi-source remote sensing data joint classification. While recent methods are mostly based on multi-stream architectures, we use group convolution to construct equivalent network architectures efficiently within a single-stream network. We further adopt and improve dynamic grou… ▽ More In this paper, we propose an efficient and generalizable framework based on deep convolutional neural network (CNN) for multi-source remote sensing data joint classification. While recent methods are mostly based on multi-stream architectures, we use group convolution to construct equivalent network architectures efficiently within a single-stream network. We further adopt and improve dynamic grouping convolution (DGConv) to make group convolution hyperparameters, and thus the overall network architecture, learnable during network training. The proposed method therefore can theoretically adjust any modern CNN models to any multi-source remote sensing data set, and can potentially avoid sub-optimal solutions caused by manually decided architecture hyperparameters. In the experiments, the proposed method is applied to ResNet and UNet, and the adjusted networks are verified on three very diverse benchmark data sets (i.e., Houston2018 data, Berlin data, and MUUFL data). Experimental results demonstrate the effectiveness of the proposed single-stream CNNs, and in particular ResNet18-DGConv improves the state-of-the-art classification overall accuracy (OA) on HS-SAR Berlin data set from $62.23\%$ to $68.21\%$. In the experiments we have two interesting findings. First, using DGConv generally reduces test OA variance. Second, multi-stream is harmful to model performance if imposed to the first few layers, but becomes beneficial if applied to deeper layers. Altogether, the findings imply that multi-stream architecture, instead of being a strictly necessary component in deep learning models for multi-source remote sensing data, essentially plays the role of model regularizer. Our code is publicly available at https://github.com/yyyyangyi/Multi-source-RS-DGConv. We hope our work can inspire novel research in the future. △ Less

Submitted 6 February, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

arXiv:2106.15953 [pdf, other]

BLNet: A Fast Deep Learning Framework for Low-Light Image Enhancement with Noise Removal and Color Restoration

Authors: Xinxu Wei, Xianshi Zhang, Shisen Wang, Cheng Cheng, Yanlin Huang, Kaifu Yang, Yongjie Li

Abstract: Images obtained in real-world low-light conditions are not only low in brightness, but they also suffer from many other types of degradation, such as color bias, unknown noise, detail loss and halo artifacts. In this paper, we propose a very fast deep learning framework called Bringing the Lightness (denoted as BLNet) that consists of two U-Nets with a series of well-designed loss functions to tac… ▽ More Images obtained in real-world low-light conditions are not only low in brightness, but they also suffer from many other types of degradation, such as color bias, unknown noise, detail loss and halo artifacts. In this paper, we propose a very fast deep learning framework called Bringing the Lightness (denoted as BLNet) that consists of two U-Nets with a series of well-designed loss functions to tackle all of the above degradations. Based on Retinex Theory, the decomposition net in our model can decompose low-light images into reflectance and illumination and remove noise in the reflectance during the decomposition phase. We propose a Noise and Color Bias Control module (NCBC Module) that contains a convolutional neural network and two loss functions (noise loss and color loss). This module is only used to calculate the loss functions during the training phase, so our method is very fast during the test phase. This module can smooth the reflectance to achieve the purpose of noise removal while preserving details and edge information and controlling color bias. We propose a network that can be trained to learn the mapping between low-light and normal-light illumination and enhance the brightness of images taken in low-light illumination. We train and evaluate the performance of our proposed model over the real-world Low-Light (LOL) dataset), and we also test our model over several other frequently used datasets (LIME, DICM and MEF datasets). We conduct extensive experiments to demonstrate that our approach achieves a promising effect with good rubustness and generalization and outperforms many other state-of-the-art methods qualitatively and quantitatively. Our method achieves high speed because we use loss functions instead of introducing additional denoisers for noise removal and color correction. The code and model are available at https://github.com/weixinxu666/BLNet. △ Less

Submitted 30 June, 2021; originally announced June 2021.

Comments: 13 pages, 12 figures, journal

ACM Class: I.2; I.4

arXiv:2106.09110 [pdf, other]

Safe Reinforcement Learning Using Advantage-Based Intervention

Authors: Nolan Wagener, Byron Boots, Ching-An Cheng

Abstract: Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still s… ▽ More Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at https://github.com/nolanwagener/safe_rl. △ Less

Submitted 19 July, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Appearing in ICML 2021. 29 pages, 8 figures

arXiv:2106.07953 [pdf, other]

Learning to Compensate: A Deep Neural Network Framework for 5G Power Amplifier Compensation

Authors: Po-Yu Chen, Hao Chen, Yi-Min Tsai, Hsien-Kai Kuo, Hantao Huang, Hsin-Hung Chen, Sheng-Hong Yan, Wei-Lun Ou, Chia-Ming Cheng

Abstract: Owing to the complicated characteristics of 5G communication system, designing RF components through mathematical modeling becomes a challenging obstacle. Moreover, such mathematical models need numerous manual adjustments for various specification requirements. In this paper, we present a learning-based framework to model and compensate Power Amplifiers (PAs) in 5G communication. In the proposed… ▽ More Owing to the complicated characteristics of 5G communication system, designing RF components through mathematical modeling becomes a challenging obstacle. Moreover, such mathematical models need numerous manual adjustments for various specification requirements. In this paper, we present a learning-based framework to model and compensate Power Amplifiers (PAs) in 5G communication. In the proposed framework, Deep Neural Networks (DNNs) are used to learn the characteristics of the PAs, while, correspondent Digital Pre-Distortions (DPDs) are also learned to compensate for the nonlinear and memory effects of PAs. On top of the framework, we further propose two frequency domain losses to guide the learning process to better optimize the target, compared to naive time domain Mean Square Error (MSE). The proposed framework serves as a drop-in replacement for the conventional approach. The proposed approach achieves an average of 56.7% reduction of nonlinear and memory effects, which converts to an average of 16.3% improvement over a carefully-designed mathematical model, and even reaches 34% enhancement in severe distortion scenarios. △ Less

Submitted 15 June, 2021; originally announced June 2021.

Comments: IEEE International Conference on Communications (ICC) 2021

arXiv:2103.05922 [pdf, other]

RMP2: A Structured Composable Policy Class for Robot Learning

Authors: Anqi Li, Ching-An Cheng, M. Asif Rana, Man Xie, Karl Van Wyk, Nathan Ratliff, Byron Boots

Abstract: We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different lev… ▽ More We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge as well as the ability to transfer policies between robots. However, implementing a system for end-to-end learning RMPflow policies faces several computational challenges. In this work, we re-examine the message passing algorithm of RMPflow and propose a more efficient alternate algorithm, called RMP2, that uses modern automatic differentiation tools (such as TensorFlow and PyTorch) to compute RMPflow policies. Our new design retains the strengths of RMPflow while bringing in advantages from automatic differentiation, including 1) easy programming interfaces to designing complex transformations; 2) support of general directed acyclic graph (DAG) transformation structures; 3) end-to-end differentiability for policy learning; 4) improved computational efficiency. Because of these features, RMP2 can be treated as a structured policy class for efficient robot learning which is suitable encoding domain knowledge. Our experiments show that using structured policy class given by RMP2 can improve policy performance and safety in reinforcement learning tasks for goal reaching in cluttered space. △ Less

Submitted 10 March, 2021; originally announced March 2021.

arXiv:2012.09755 [pdf, other]

Describing the Structural Phenotype of the Glaucomatous Optic Nerve Head Using Artificial Intelligence

Authors: Satish K. Panda, Haris Cheong, Tin A. Tun, Sripad K. Devella, Ramaswami Krishnadas, Martin L. Buist, Shamira Perera, Ching-Yu Cheng, Tin Aung, Alexandre H. Thiéry, Michaël J. A. Girard

Abstract: The optic nerve head (ONH) typically experiences complex neural- and connective-tissue structural changes with the development and progression of glaucoma, and monitoring these changes could be critical for improved diagnosis and prognosis in the glaucoma clinic. The gold-standard technique to assess structural changes of the ONH clinically is optical coherence tomography (OCT). However, OCT is li… ▽ More The optic nerve head (ONH) typically experiences complex neural- and connective-tissue structural changes with the development and progression of glaucoma, and monitoring these changes could be critical for improved diagnosis and prognosis in the glaucoma clinic. The gold-standard technique to assess structural changes of the ONH clinically is optical coherence tomography (OCT). However, OCT is limited to the measurement of a few hand-engineered parameters, such as the thickness of the retinal nerve fiber layer (RNFL), and has not yet been qualified as a stand-alone device for glaucoma diagnosis and prognosis applications. We argue this is because the vast amount of information available in a 3D OCT scan of the ONH has not been fully exploited. In this study we propose a deep learning approach that can: \textbf{(1)} fully exploit information from an OCT scan of the ONH; \textbf{(2)} describe the structural phenotype of the glaucomatous ONH; and that can \textbf{(3)} be used as a robust glaucoma diagnosis tool. Specifically, the structural features identified by our algorithm were found to be related to clinical observations of glaucoma. The diagnostic accuracy from these structural features was $92.0 \pm 2.3 \%$ with a sensitivity of $90.0 \pm 2.4 \% $ (at $95 \%$ specificity). By changing their magnitudes in steps, we were able to reveal how the morphology of the ONH changes as one transitions from a `non-glaucoma' to a `glaucoma' condition. We believe our work may have strong clinical implication for our understanding of glaucoma pathogenesis, and could be improved in the future to also predict future loss of vision. △ Less

Submitted 17 December, 2020; originally announced December 2020.

arXiv:2012.06275 [pdf]

doi 10.1109/JBHI.2020.3016831

Blind Monaural Source Separation on Heart and Lung Sounds Based on Periodic-Coded Deep Autoencoder

Authors: Kun-Hsi Tsai, Wei-Chien Wang, Chui-Hsuan Cheng, Chan-Yen Tsai, Jou-Kou Wang, Tzu-Hao Lin, Shih-Hau Fang, Li-Chin Chen, Yu Tsao

Abstract: Auscultation is the most efficient way to diagnose cardiovascular and respiratory diseases. To reach accurate diagnoses, a device must be able to recognize heart and lung sounds from various clinical situations. However, the recorded chest sounds are mixed by heart and lung sounds. Thus, effectively separating these two sounds is critical in the pre-processing stage. Recent advances in machine lea… ▽ More Auscultation is the most efficient way to diagnose cardiovascular and respiratory diseases. To reach accurate diagnoses, a device must be able to recognize heart and lung sounds from various clinical situations. However, the recorded chest sounds are mixed by heart and lung sounds. Thus, effectively separating these two sounds is critical in the pre-processing stage. Recent advances in machine learning have progressed on monaural source separations, but most of the well-known techniques require paired mixed sounds and individual pure sounds for model training. As the preparation of pure heart and lung sounds is difficult, special designs must be considered to derive effective heart and lung sound separation techniques. In this study, we proposed a novel periodicity-coded deep auto-encoder (PC-DAE) approach to separate mixed heart-lung sounds in an unsupervised manner via the assumption of different periodicities between heart rate and respiration rate. The PC-DAE benefits from deep-learning-based models by extracting representative features and considers the periodicity of heart and lung sounds to carry out the separation. We evaluated PC-DAE on two datasets. The first one includes sounds from the Student Auscultation Manikin (SAM), and the second is prepared by recording chest sounds in real-world conditions. Experimental results indicate that PC-DAE outperforms several well-known separations works in terms of standardized evaluation metrics. Moreover, waveforms and spectrograms demonstrate the effectiveness of PC-DAE compared to existing approaches. It is also confirmed that by using the proposed PC-DAE as a pre-processing stage, the heart sound recognition accuracies can be notably boosted. The experimental results confirmed the effectiveness of PC-DAE and its potential to be used in clinical applications. △ Less

Submitted 11 December, 2020; originally announced December 2020.

Comments: 13 pages, 11 figures, Accepted by IEEE Journal of Biomedical and Health Informatics

arXiv:2010.11585 [pdf]

doi 10.3390/futuretransp1030034

A simulation-based evaluation of a Cargo-Hitching service for E-commerce using mobility-on-demand vehicles

Authors: Andre Alho, Takanori Sakai, Simon Oh, Cheng Cheng, Ravi Seshadri, Wen Han Chong, Yusuke Hara, Julia Caravias, Lynette Cheah, Moshe Ben-Akiva

Abstract: Time-sensitive parcel deliveries, shipments requested for delivery in a day or less, are an increasingly important research subject. It is challenging to deal with these deliveries from a carrier perspective since it entails additional planning constraints, preventing an efficient consolidation of deliveries which is possible when demand is well known in advance. Furthermore, such time-sensitive d… ▽ More Time-sensitive parcel deliveries, shipments requested for delivery in a day or less, are an increasingly important research subject. It is challenging to deal with these deliveries from a carrier perspective since it entails additional planning constraints, preventing an efficient consolidation of deliveries which is possible when demand is well known in advance. Furthermore, such time-sensitive deliveries are requested to a wider spatial scope than retail centers, including homes and offices. Therefore, an increase in such deliveries is considered to exacerbate negative externalities such as congestion and emissions. One of the solutions is to leverage spare capacity in passenger transport modes. This concept is often denominated as cargo-hitching. While there are various possible system designs, it is crucial that such solution does not deteriorate the quality of service of passenger trips. This research aims to evaluate the use of Mobility-On-Demand services to perform same-day parcel deliveries. For this purpose, we use SimMobility, a high-resolution agent-based simulation platform of passenger and freight flows, applied in Singapore. E-commerce demand carrier data are used to characterize simulated parcel delivery demand. Operational scenarios that aim to minimize the adverse effect of fulfilling deliveries with Mobility-On-Demand vehicles on Mobility-On-Demand passenger flows (fulfillment, wait and travel times) are explored. Results indicate that the Mobility-On-Demand services have potential to fulfill a considerable amount of parcel deliveries and decrease freight vehicle traffic and total vehicle-kilometers-travelled without compromising the quality of Mobility On-Demand for passenger travel. △ Less

Submitted 22 October, 2020; originally announced October 2020.

Comments: 19 pages, 4 tables, 7 figures. Submitted to Transportation (Springer)

Journal ref: Future Transp. 2021, 1, 639-656

arXiv:2008.03877 [pdf, other]

Adaptive support driven Bayesian reweighted algorithm for sparse signal recovery

Authors: Junlin Li, Wei Zhou, Cheng Cheng

Abstract: Sparse learning has been widely studied to capture critical information from enormous data sources in the filed of system identification. Often, it is essential to understand internal working mechanisms of unknown systems (e.g. biological networks) in addition to input-output relationships. For this purpose, various feature selection techniques have been developed. For example, sparse Bayesian lea… ▽ More Sparse learning has been widely studied to capture critical information from enormous data sources in the filed of system identification. Often, it is essential to understand internal working mechanisms of unknown systems (e.g. biological networks) in addition to input-output relationships. For this purpose, various feature selection techniques have been developed. For example, sparse Bayesian learning (SBL) was proposed to learn major features from a dictionary of basis functions, which makes identified models interpretable. Reweighted L1-regularization algorithms are often applied in SBL to solve optimization problems. However, they are expensive in both computation and memory aspects, thus not suitable for large-scale problems. This paper proposes an adaptive support driven Bayesian reweighted (ASDBR) algorithm for sparse signal recovery. A restart strategy based on shrinkage-thresholding is developed to conduct adaptive support estimate, which can effectively reduce computation burden and memory demands. Moreover, ASDBR accurately extracts major features and excludes redundant information from large datasets. Numerical experiments demonstrate the proposed algorithm outperforms state-of-the-art methods. △ Less

Submitted 9 August, 2020; originally announced August 2020.

arXiv:2007.09586 [pdf]

A zero-carbon, reliable and affordable energy future in Australia

Authors: Bin Lu, Andrew Blakers, Matthew Stocks, Cheng Cheng, Anna Nadolny

Abstract: Australia has one of the highest per capita consumption of energy and emissions of greenhouse gases in the world. It is also the global leader in rapid per capita annual deployment of new solar and wind energy, which is causing the country's emissions to decline. Australia is located at low-moderate latitudes along with three quarters of the global population. These factors make the Australian exp… ▽ More Australia has one of the highest per capita consumption of energy and emissions of greenhouse gases in the world. It is also the global leader in rapid per capita annual deployment of new solar and wind energy, which is causing the country's emissions to decline. Australia is located at low-moderate latitudes along with three quarters of the global population. These factors make the Australian experience globally significant. In this study, we model a fully decarbonised electricity system together with complete electrification of heating, transport and industry in Australia leading to an 80% reduction in greenhouse gas emissions. An energy supply-demand balance is simulated based on long-term (10 years), high-resolution (half-hourly) meteorological and energy demand data. A significant feature of this model is that short-term off-river energy storage and distributed energy storage are utilised to support the large-scale integration of variable solar and wind energy. The results show that high levels of energy reliability and affordability can be effectively achieved through a synergy of flexible energy sources; interconnection of electricity grids over large areas; response from demand-side participation; and mass energy storage. This strategy represents a rapid and generic pathway towards zero-carbon energy futures within the Sunbelt. △ Less

Submitted 19 July, 2020; originally announced July 2020.

Comments: Here is a summary of the study: https://www.dropbox.com/s/uvd90goh80y9eda/Zero-carbon%20Australia.pdf?dl=0

arXiv:2006.05539 [pdf, other]

On Matched Filtering for Statistical Change Point Detection

Authors: Kevin C. Cheng, Eric L. Miller, Michael C. Hughes, Shuchin Aeron

Abstract: Non-parametric and distribution-free two-sample tests have been the foundation of many change point detection algorithms. However, randomness in the test statistic as a function of time makes them susceptible to false positives and localization ambiguity. We address these issues by deriving and applying filters matched to the expected temporal signatures of a change for various sliding window, two… ▽ More Non-parametric and distribution-free two-sample tests have been the foundation of many change point detection algorithms. However, randomness in the test statistic as a function of time makes them susceptible to false positives and localization ambiguity. We address these issues by deriving and applying filters matched to the expected temporal signatures of a change for various sliding window, two-sample tests under IID assumptions on the data. These filters are derived asymptotically with respect to the window size for the Wasserstein quantile test, the Wasserstein-1 distance test, Maximum Mean Discrepancy squared (MMD^2), and the Kolmogorov-Smirnov (KS) test. The matched filters are shown to have two important properties. First, they are distribution-free, and thus can be applied without prior knowledge of the underlying data distributions. Second, they are peak-preserving, which allows the filtered signal produced by our methods to maintain expected statistical significance. Through experiments on synthetic data as well as activity recognition benchmarks, we demonstrate the utility of this approach for mitigating false positives and improving the test precision. Our method allows for the localization of change points without the use of ad-hoc post-processing to remove redundant detections common to current methods. We further highlight the performance of statistical tests based on the Quantile-Quantile (Q-Q) function and show how the invariance property of the Q-Q function to order-preserving transformations allows these tests to detect change points of different scales with a single threshold within the same dataset. △ Less

Submitted 27 October, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

arXiv:2005.13201 [pdf, other]

Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation

Authors: Ashwin Raju, Chi-Tung Cheng, Yunakai Huo, Jinzheng Cai, Junzhou Huang, Jing Xiao, Le Lu, ChienHuang Liao, Adam P Harrison

Abstract: In medical imaging, organ/pathology segmentation models trained on current publicly available and fully-annotated datasets usually do not well-represent the heterogeneous modalities, phases, pathologies, and clinical scenarios encountered in real environments. On the other hand, there are tremendous amounts of unlabelled patient imaging scans stored by many modern clinical centers. In this work, w… ▽ More In medical imaging, organ/pathology segmentation models trained on current publicly available and fully-annotated datasets usually do not well-represent the heterogeneous modalities, phases, pathologies, and clinical scenarios encountered in real environments. On the other hand, there are tremendous amounts of unlabelled patient imaging scans stored by many modern clinical centers. In this work, we present a novel segmentation strategy, co-heterogenous and adaptive segmentation (CHASe), which only requires a small labeled cohort of single phase imaging data to adapt to any unlabeled cohort of heterogenous multi-phase data with possibly new clinical scenarios and pathologies. To do this, we propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling. We also introduce co-heterogeneous training, which is a novel integration of co-training and hetero modality learning. We have evaluated CHASe using a clinically comprehensive and challenging dataset of multi-phase computed tomography (CT) imaging studies (1147 patients and 4577 3D volumes). Compared to previous state-of-the-art baselines, CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$, depending on the phase combinations: e.g., from $84.6\%$ to $94.0\%$ on non-contrast CTs. △ Less

Submitted 19 July, 2021; v1 submitted 27 May, 2020; originally announced May 2020.

Comments: 23 pages, 8 figures

arXiv:2005.12209 [pdf, other]

JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans

Authors: Fengze Liu, Jinzheng Cai, Yuankai Huo, Chi-Tung Cheng, Ashwin Raju, Dakai Jin, Jing Xiao, Alan Yuille, Le Lu, ChienHung Liao, Adam P Harrison

Abstract: Multi-modal image registration is a challenging problem that is also an important clinical task for many real applications and scenarios. As a first step in analysis, deformable registration among different image modalities is often required in order to provide complementary visual information. During registration, semantic information is key to match homologous points and pixels. Nevertheless, ma… ▽ More Multi-modal image registration is a challenging problem that is also an important clinical task for many real applications and scenarios. As a first step in analysis, deformable registration among different image modalities is often required in order to provide complementary visual information. During registration, semantic information is key to match homologous points and pixels. Nevertheless, many conventional registration methods are incapable in capturing high-level semantic anatomical dense correspondences. In this work, we propose a novel multi-task learning system, JSSR, based on an end-to-end 3D convolutional neural network that is composed of a generator, a registration and a segmentation component. The system is optimized to satisfy the implicit constraints between different tasks in an unsupervised manner. It first synthesizes the source domain images into the target domain, then an intra-modal registration is applied on the synthesized images and target images. The segmentation module are then applied on the synthesized and target images, providing additional cues based on semantic correspondences. The supervision from another fully-annotated dataset is used to regularize the segmentation. We extensively evaluate JSSR on a large-scale medical image dataset containing 1,485 patient CT imaging studies of four different contrast phases (i.e., 5,940 3D CT scans with pathological livers) on the registration, segmentation and synthesis tasks. The performance is improved after joint training on the registration and segmentation tasks by 0.9% and 1.9% respectively compared to a highly competitive and accurate deep learning baseline. The registration also consistently outperforms conventional state-of-the-art multi-modal registration methods. △ Less

Submitted 17 July, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

Comments: accepted to ECCV 2020

arXiv:2004.12599 [pdf, other]

Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency

Authors: Cheng-Ming Chiang, Yu Tseng, Yu-Syuan Xu, Hsien-Kai Kuo, Yi-Min Tsai, Guan-Yu Chen, Koan-Sin Tan, Wei-Ting Wang, Yu-Chieh Lin, Shou-Yao Roy Tseng, Wei-Shiang Lin, Chia-Lin Yu, BY Shen, Kloze Kao, Chia-Ming Cheng, Hung-Jen Chen

Abstract: Recently, image enhancement and restoration have become important applications on mobile devices, such as super-resolution and image deblurring. However, most state-of-the-art networks present extremely high computational complexity. This makes them difficult to be deployed on mobile devices with acceptable latency. Moreover, when deploying to different mobile devices, there is a large latency var… ▽ More Recently, image enhancement and restoration have become important applications on mobile devices, such as super-resolution and image deblurring. However, most state-of-the-art networks present extremely high computational complexity. This makes them difficult to be deployed on mobile devices with acceptable latency. Moreover, when deploying to different mobile devices, there is a large latency variation due to the difference and limitation of deep learning accelerators on mobile devices. In this paper, we conduct a search of portable network architectures for better quality-latency trade-off across mobile devices. We further present the effectiveness of widely used network optimizations for image deblurring task. This paper provides comprehensive experiments and comparisons to uncover the in-depth analysis for both latency and image quality. Through all the above works, we demonstrate the successful deployment of image deblurring application on mobile devices with the acceleration of deep learning accelerators. To the best of our knowledge, this is the first paper that addresses all the deployment issues of image deblurring task across mobile devices. This paper provides practical deployment-guidelines, and is adopted by the championship-winning team in NTIRE 2020 Image Deblurring Challenge on Smartphone Track. △ Less

Submitted 27 April, 2020; originally announced April 2020.

Comments: CVPR 2020 Workshop on New Trends in Image Restoration and Enhancement (NTIRE)

arXiv:2003.11152 [pdf, other]

Polynomial graph filter of multiple shifts and distributed implementation of inverse filtering

Authors: Nazar Emirov, Cheng Cheng, Junzheng Jiang, Qiyu Sun

Abstract: Polynomial graph filters and their inverses play important roles in graph signal processing. An advantage of polynomial graph filters is that they can be implemented in a distributed manner, which involves data transmission between adjacent vertices only. The challenge arisen in the inverse filtering is that a direct implementation may suffer from high computational burden, as the inverse graph fi… ▽ More Polynomial graph filters and their inverses play important roles in graph signal processing. An advantage of polynomial graph filters is that they can be implemented in a distributed manner, which involves data transmission between adjacent vertices only. The challenge arisen in the inverse filtering is that a direct implementation may suffer from high computational burden, as the inverse graph filter usually has full bandwidth even if the original filter has small bandwidth. In this paper, we consider distributed implementation of the inverse filtering procedure for a polynomial graph filter of multiple shifts, and we propose two iterative approximation algorithms that can be implemented in a distributed network, where each vertex is equipped with systems for limited data storage, computation power and data exchanging facility to its adjacent vertices. We also demonstrate the effectiveness of the proposed iterative approximation algorithms to implement the inverse filtering procedure and their satisfactory performance to denoise time-varying graph signals and a data set of US hourly temperature at 218 locations. △ Less

Submitted 4 November, 2021; v1 submitted 24 March, 2020; originally announced March 2020.

Showing 1–50 of 77 results for author: Cheng, C