Search | arXiv e-print repository

arXiv:2407.08498 [pdf, other]

ERD: Exponential Retinex decomposition based on weak space and hybrid nonconvex regularization and its denoising application

Authors: Wenjing Lu, Liang Wu, Liming Tang, Zhuang Fang

Abstract: The Retinex theory models the image as a product of illumination and reflection components, which has received extensive attention and is widely used in image enhancement, segmentation and color restoration. However, it has been rarely used in additive noise removal due to the inclusion of both multiplication and addition operations in the Retinex noisy image modeling. In this paper, we propose an exponential Retinex decomposition model based on hybrid non-convex regularization and weak space oscillation-modeling for image denoising. The proposed model utilizes non-convex first-order total variation (TV) and non-convex second-order TV to regularize the reflection component and the illumination component, respectively, and employs the weak $H^{-1}$ norm to measure the residual component. By utilizing different regularizers, the proposed model effectively decomposes the image into reflection, illumination, and noise components. An alternating direction method of multipliers (ADMM) combined with the Majorize-Minimization (MM) algorithm is developed to solve the proposed model. Furthermore, we provide a detailed proof of the convergence property of the algorithm. Numerical experiments validate both the proposed model and algorithm. Compared with several state-of-the-art denoising models, the proposed model exhibits superior performance in terms of peak signal-to-noise ratio (PSNR) and mean structural similarity (MSSIM).

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.05368 [pdf, other]

Music Era Recognition Using Supervised Contrastive Learning and Artist Information

Authors: Qiqi He, Xuchen Song, Weituo Hao, Ju-Chiang Wang, Wei-Tsung Lu, Wei Li

Abstract: Does popular music from the 60s sound different from that of the 90s? Prior studies have shown that there are variations in patterns and regularities related to instrumentation changes and growing loudness across multi-decadal trends. This indicates that perceiving the era of a song from musical features such as audio and artist information is possible. Music era information can be an important feature for playlist generation and recommendation. However, the release year of a song can be inaccessible in many circumstances. This paper addresses a novel task of music era recognition. We formulate the task as a music classification problem and propose solutions based on supervised contrastive learning. An audio-based model is developed to predict the era from audio. For the case where the artist information is available, we extend the audio-based model to take multimodal inputs and develop a framework, called MultiModal Contrastive (MMC) learning, to enhance the training. Experimental results on the Million Song Dataset demonstrate that the audio-based model achieves 54% accuracy with a tolerance of a 3-year range; incorporating the artist information with the MMC framework for training leads to a further 9% improvement.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2406.15222 [pdf]

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8%, and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests.

Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: under peer review

arXiv:2406.10956 [pdf, other]

Robust Channel Learning for Large-Scale Radio Speaker Verification

Authors: Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li, Xugang Lu

Abstract: Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learning (CRSL) framework that enhances the robustness of the current speaker verification pipeline, considering data source, data augmentation, and the efficiency of model transfer processes. Our framework introduces an augmentation module that mitigates bandwidth variations in radio speech datasets by manipulating the bandwidth of training inputs. It also addresses unknown noise by introducing noise within the manifold space. Additionally, we propose an efficient fine-tuning method that reduces the need for extensive additional training time and large amounts of data. Moreover, we develop a toolkit for assembling a large-scale radio speech corpus and establish a benchmark specifically tailored for radio scenario speaker verification studies. Experimental results demonstrate that our proposed methodology effectively enhances performance and mitigates degradation caused by radio transmission in speaker verification tasks. The code will be available on GitHub.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 12 pages, 11 figures

arXiv:2404.13298 [pdf, other]

MARec: Metadata Alignment for cold-start Recommendation

Authors: Julien Monteil, Volodymyr Vaskovych, Wentao Lu, Anirban Majumder, Anton van den Hengel

Abstract: For many recommender systems the primary data source is a historical record of user clicks. The associated click matrix is often very sparse, however, as the number of users × products can be far larger than the number of clicks, and such sparsity is accentuated in cold-start settings. The sparsity of the click matrix is the reason matrix factorization and autoencoder techniques remain highly competitive across collaborative filtering datasets. In this work, we propose a simple approach to address cold-start recommendations by leveraging content metadata, Metadata Alignment for cold-start Recommendation. We show that this approach can readily augment existing matrix factorization and autoencoder approaches, enabling a smooth transition to top performing algorithms in warmer set-ups. Our experimental results indicate three separate contributions: first, we show that our proposed framework largely beats SOTA results on 4 cold-start datasets with different sparsity and scale characteristics, with gains ranging from +8.4% to +53.8% on reported ranking metrics; second, we provide an ablation study on the utility of semantic features, and show that the additional gain obtained by leveraging such features ranges between +46.8% and +105.5%; and third, our approach is by construction highly competitive in warm set-ups, and we propose a closed-form solution outperformed by SOTA results by only 0.8% on average.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.02291 [pdf, other]

Towards a New Configurable and Practical Remote Automotive Security Testing Platform

Authors: Sekar Kulandaivel, Wenjuan Lu, Brandon Barry, Jorge Guajardo

Abstract: In the automotive security sector, the absence of a testing platform that is configurable, practical, and user-friendly presents considerable challenges. These difficulties are compounded by the intricate design of vehicle systems, the rapid evolution of attack vectors, and the absence of standardized testing methodologies. We propose a next-generation testing platform that addresses several challenges in vehicle cybersecurity testing and research domains. In this paper, we detail how the Vehicle Security Engineering Cloud (VSEC) Test platform enables easier access to test beds for efficient vehicle cybersecurity testing and advanced (e.g., penetration, fuzz) testing and how we extend such test beds to benefit automotive security research. We highlight methodology on how to use this platform for a variety of users and use cases with real implemented examples.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 7 pages, 2 figures

arXiv:2402.16371 [pdf, other]

Adaptive Online Learning of Separable Path Graph Transforms for Intra-prediction

Authors: Wen-Yang Lu, Eduardo Pavez, Antonio Ortega, Xin Zhao, Shan Liu

Abstract: Current video coding standards, including H.264/AVC, HEVC, and VVC, employ the discrete cosine transform (DCT), the discrete sine transform (DST), and secondary Karhunen-Loeve transforms (KLTs) to decorrelate the intra-prediction residuals. However, the efficiency of these transforms in decorrelation can be limited when the signal has a non-smooth and non-periodic structure, such as those occurring in textures with intricate patterns. This paper introduces a novel adaptive separable path graph-based transform (GBT) that can provide better decorrelation than the DCT for intra-predicted texture data. The proposed GBT is learned in an online scenario with sequential K-means clustering, which groups similar blocks during encoding and decoding to adaptively learn the GBT for the current block from previously reconstructed areas with similar characteristics. A signaling overhead is added to the bitstream of each coding block to indicate the usage of the proposed graph-based transform. We assess the performance of this method combined with H.264/AVC intra-coding tools and demonstrate that it can significantly outperform H.264/AVC DCT for intra-predicted texture data.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 5 pages, 4 figures

arXiv:2402.09463 [pdf]

Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results

Authors: Kelly Payette, Céline Steger, Roxane Licandro, Priscille de Dumast, Hongwei Bran Li, Matthew Barkovich, Liu Li, Maik Dannecker, Chen Chen, Cheng Ouyang, Niccolò McConnell, Alina Miron, Yongmin Li, Alena Uus, Irina Grigorescu, Paula Ramirez Gilliland, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Haoyu Wang, Ziyan Huang, Jin Ye, Mireia Alenyà, Valentin Comte, Oscar Camara, et al. (42 additional authors not shown)

Abstract: Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single-center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including scanners used, imaging parameters, and fetal brain super-resolution algorithms applied. 16 teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in and out of domain, the white matter and ventricles were segmented with the highest accuracy, while the most challenging structure remains the cerebral cortex due to anatomical complexity. The FeTA Challenge 2022 was able to successfully evaluate and advance generalizability of multi-class fetal brain tissue segmentation algorithms for MRI and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Results from FeTA Challenge 2022, held at MICCAI; Manuscript submitted. Supplementary Info (including submission methods descriptions) available here: https://zenodo.org/records/10628648

arXiv:2402.03585 [pdf, other]

Decoder-Only Image Registration

Authors: Xi Jia, Wenqi Lu, Xinxing Cheng, Jinming Duan

Abstract: In unsupervised medical image registration, the predominant approaches involve the utilization of an encoder-decoder network architecture, allowing for precise prediction of dense, full-resolution displacement fields from given paired images. Despite its widespread use in the literature, we question the necessity of making both the encoder and decoder learnable in such an architecture. For this, we propose a novel network architecture, termed LessNet in this paper, which contains only a learnable decoder, while entirely omitting the utilization of a learnable encoder. LessNet substitutes the learnable encoder with simple, handcrafted features, eliminating the need to learn (optimize) network parameters in the encoder altogether. Consequently, this leads to a compact, efficient, and decoder-only architecture for 3D medical image registration. Evaluated on two publicly available brain MRI datasets, we demonstrate that our decoder-only LessNet can effectively and efficiently learn both dense displacement and diffeomorphic deformation fields in 3D. Furthermore, our decoder-only LessNet can achieve comparable registration performance to state-of-the-art methods such as VoxelMorph and TransMorph, while requiring significantly fewer computational resources. Our code and pre-trained models are available at https://github.com/xi-jia/LessNet.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2310.01809 [pdf, other]

Mel-Band RoFormer for Music Source Separation

Authors: Ju-Chiang Wang, Wei-Tsung Lu, Minz Won

Abstract: Recently, multi-band spectrogram-based approaches such as Band-Split RNN (BSRNN) have demonstrated promising results for music source separation. In our recent work, we introduce the BS-RoFormer model which inherits the idea of the band-split scheme in BSRNN at the front-end, and then uses the hierarchical Transformer with Rotary Position Embedding (RoPE) to model the inner-band and inter-band sequences for multi-band mask estimation. This model has achieved state-of-the-art performance, but the band-split scheme is defined empirically, without analytic support from the literature. In this paper, we propose Mel-RoFormer, which adopts the Mel-band scheme that maps the frequency bins into overlapped subbands according to the mel scale. In contrast, the band-split mapping in BSRNN and BS-RoFormer is non-overlapping and designed based on heuristics. Using the MUSDB18HQ dataset for experiments, we demonstrate that Mel-RoFormer outperforms BS-RoFormer in the separation tasks of vocals, drums, and other stems.

Submitted 3 October, 2023; originally announced October 2023.

Comments: submitted as an ISMIR 2023 late-breaking and demo paper

arXiv:2309.02612 [pdf, other]

Music Source Separation with Band-Split RoPE Transformer

Authors: Wei-Tsung Lu, Ju-Chiang Wang, Qiuqiang Kong, Yun-Ning Hung

Abstract: Music source separation (MSS) aims to separate a music recording into multiple musically distinct stems, such as vocals, bass, drums, and more. Recently, deep learning approaches such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used, but the improvement is still limited. In this paper, we propose a novel frequency-domain approach based on a Band-Split RoPE Transformer (called BS-RoFormer). BS-RoFormer relies on a band-split module to project the input complex spectrogram into subband-level representations, and then arranges a stack of hierarchical Transformers to model the inner-band as well as inter-band sequences for multi-band mask estimation. To facilitate training the model for MSS, we propose to use the Rotary Position Embedding (RoPE). The BS-RoFormer system trained on MUSDB18HQ and 500 extra songs ranked first in the MSS track of the Sound Demixing Challenge (SDX23). Benchmarking a smaller version of BS-RoFormer on MUSDB18HQ, we achieve state-of-the-art results without extra training data, with 9.80 dB of average SDR.

Submitted 9 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: This paper explains the SAMI-ByteDance MSS system submitted to Sound Demixing Challenge (SDX23) Music Separation Track. Version 2 of paper fixed some typos

arXiv:2308.04922 [pdf]

HSD-PAM: High Speed Super Resolution Deep Penetration Photoacoustic Microscopy Imaging Boosted by Dual Branch Fusion Network

Authors: Zhengyuan Zhang, Haoran Jin, Zesheng Zheng, Wenwen Zhang, Wenhao Lu, Feng Qin, Arunima Sharma, Manojit Pramanik, Yuanjin Zheng

Abstract: Photoacoustic microscopy (PAM) is a novel implementation of photoacoustic imaging (PAI) for visualizing 3D bio-structure, which is realized by raster scanning of the tissue. However, the three critical imaging parameters involved (imaging speed, lateral resolution, and penetration depth) mutually affect one another. The improvement of one parameter results in the degradation of the other two, which constrains the overall performance of the PAM system. Here, we propose to break these limitations by hardware and software co-design. Starting with low lateral resolution, low sampling rate AR-PAM imaging, which possesses the deep penetration capability, we aim to enhance the lateral resolution and upsample the images, so that high speed, super resolution, and deep penetration for the PAM system (HSD-PAM) can be achieved. A data-driven algorithm is a promising approach to solve this issue, so a dedicated novel dual branch fusion network is proposed, which includes a high resolution branch and a high speed branch. Given the availability of a switchable AR-OR-PAM imaging system, the corresponding low resolution, undersampled AR-PAM and high resolution, fully sampled OR-PAM image pairs are utilized for training the network. Extensive simulation and in vivo experiments have been conducted to validate the trained model; the enhancement results show that the proposed algorithm achieved the best perceptual and quantitative image quality. As a result, the imaging speed is increased 16 times and the imaging lateral resolution is improved 5 times, while the deep penetration merit of the AR-PAM modality is still preserved.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.02282 [pdf, other]

DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization

Authors: Wang Lu, Jindong Wang, Xinwei Sun, Yiqiang Chen, Xiangyang Ji, Qiang Yang, Xing Xie

Abstract: Time series remains one of the most challenging modalities in machine learning research. The out-of-distribution (OOD) detection and generalization on time series tend to suffer due to its non-stationary property, i.e., the distribution changes over time. The dynamic distributions inside time series pose great challenges to existing algorithms to identify invariant distributions since they mainly focus on the scenario where the domain information is given as prior knowledge. In this paper, we attempt to exploit subdomains within a whole dataset to counteract issues induced by non-stationarity for generalized representation learning. We propose DIVERSIFY, a general framework for OOD detection and generalization on dynamic distributions of time series. DIVERSIFY takes an iterative process: it first obtains the "worst-case" latent distribution scenario via adversarial training, then reduces the gap between these latent distributions. We implement DIVERSIFY via combining existing OOD detection methods according to either extracted features or outputs of models for detection while we also directly utilize outputs for classification. In addition, theoretical insights illustrate that DIVERSIFY is theoretically supported. Extensive experiments are conducted on seven datasets with different OOD settings across gesture recognition, speech commands recognition, wearable stress and affect detection, and sensor-based human activity recognition. Qualitative and quantitative results demonstrate that DIVERSIFY learns more generalized features and significantly outperforms other baselines.

Submitted 4 August, 2023; originally announced August 2023.

Comments: Journal version of arXiv:2209.07027; 17 pages

arXiv:2307.02997 [pdf, other]

Fourier-Net+: Leveraging Band-Limited Representation for Efficient 3D Medical Image Registration

Authors: Xi Jia, Alexander Thorley, Alberto Gomez, Wenqi Lu, Dipak Kotecha, Jinming Duan

Abstract: U-Net style networks are commonly utilized in unsupervised image registration to predict dense displacement fields, which for high-resolution volumetric image data is a resource-intensive and time-consuming task. To tackle this challenge, we first propose Fourier-Net, which replaces the costly U-Net style expansive path with a parameter-free model-driven decoder. Instead of directly predicting a full-resolution displacement field, our Fourier-Net learns a low-dimensional representation of the displacement field in the band-limited Fourier domain which our model-driven decoder converts to a full-resolution displacement field in the spatial domain. Expanding upon Fourier-Net, we then introduce Fourier-Net+, which additionally takes the band-limited spatial representation of the images as input and further reduces the number of convolutional layers in the U-Net style network's contracting path. Finally, to enhance the registration performance, we propose a cascaded version of Fourier-Net+. We evaluate our proposed methods on three datasets, on which our proposed Fourier-Net and its variants achieve comparable results with current state-of-the-art methods, while exhibiting faster inference speeds, lower memory footprint, and fewer multiply-add operations. With such small computational cost, our Fourier-Net+ enables the efficient training of large-scale 3D registration on low-VRAM GPUs. Our code is publicly available at https://github.com/xi-jia/Fourier-Net.

Submitted 6 July, 2023; originally announced July 2023.

Comments: Under review. arXiv admin note: text overlap with arXiv:2211.16342

arXiv:2306.10785 [pdf, other]

Multitrack Music Transcription with a Time-Frequency Perceiver

Authors: Wei-Tsung Lu, Ju-Chiang Wang, Yun-Ning Hung

Abstract: Multitrack music transcription aims to transcribe a music audio input into the musical notes of multiple instruments simultaneously. It is a very challenging task that typically requires a more complex model to achieve satisfactory results. In addition, prior works mostly focus on transcriptions of regular instruments while neglecting vocals, which are usually the most important signal source if present in a piece of music. In this paper, we propose a novel deep neural network architecture, Perceiver TF, to model the time-frequency representation of audio input for multitrack transcription. Perceiver TF augments the Perceiver architecture by introducing a hierarchical expansion with an additional Transformer layer to model temporal coherence. Accordingly, our model inherits the benefits of Perceiver, which possesses better scalability, allowing it to handle transcriptions of many instruments well in a single model. In experiments, we train a Perceiver TF to model 12 instrument classes as well as vocals in a multi-task learning manner. Our results demonstrate that the proposed system outperforms the state-of-the-art counterparts (e.g., MT3 and SpecTNT) on various public datasets.

Submitted 19 June, 2023; originally announced June 2023.

Comments: ICASSP 2023

arXiv:2305.08712 [pdf, ps, other]

Model Predictive Control with Reach-avoid Analysis

Authors: Dejin Ren, Wanli Lu, Jidong Lv, Lijun Zhang, Bai Xue

Abstract: In this paper we investigate the optimal controller synthesis problem, so that the system under the controller can reach a specified target set while satisfying given constraints. Existing model predictive control (MPC) methods learn from a set of discrete states visited by previous (sub-)optimized trajectories and thus result in computationally expensive mixed-integer nonlinear optimization. In this paper a novel MPC method is proposed based on reach-avoid analysis to solve the controller synthesis problem iteratively. The reach-avoid analysis is concerned with computing a reach-avoid set which is a set of initial states such that the system can reach the target set successfully. It not only provides terminal constraints, which ensure feasibility of MPC, but also expands discrete states in existing methods into a continuous set (i.e., reach-avoid sets) and thus leads to nonlinear optimization which is more computationally tractable online due to the absence of integer variables. Finally, we evaluate the proposed method and make comparisons with state-of-the-art ones based on several examples.

Submitted 21 June, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.05548 [pdf, ps, other]

CIT-EmotionNet: CNN Interactive Transformer Network for EEG Emotion Recognition

Authors: Wei Lu, Hua Ma, Tien-Ping Tan

Abstract: Emotion recognition using Electroencephalogram (EEG) signals has emerged as a significant research challenge in affective computing and intelligent interaction. However, effectively combining global and local features of EEG signals to improve performance in emotion recognition is still a difficult task. In this study, we propose a novel CNN Interactive Transformer Network for EEG Emotion Recognition, known as CIT-EmotionNet, which efficiently integrates global and local features of EEG signals. Initially, we convert raw EEG signals into spatial-frequency representations, which serve as inputs. Then, we integrate Convolutional Neural Network (CNN) and Transformer within a single framework in a parallel manner. Finally, we design a CNN interactive Transformer module, which facilitates the interaction and fusion of local and global features, thereby enhancing the model's ability to extract both types of features from EEG spatial-frequency representations. The proposed CIT-EmotionNet outperforms state-of-the-art methods, achieving an average recognition accuracy of 98.57% and 92.09% on two publicly available datasets, SEED and SEED-IV, respectively.

Submitted 7 May, 2023; originally announced May 2023.

Comments: 10 pages, 3 tables

arXiv:2303.10770 [pdf, other]

doi 10.1002/aisy.202400265

RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network

Authors: Sangmin Yoo, Eric Yeu-Jer Lee, Ziyu Wang, Xinxin Wang, Wei D. Lu

Abstract: Event-based cameras are inspired by the sparse and asynchronous spike representation of the biological visual system. However, processing the event data requires either using expensive feature descriptors to transform spikes into frames, or using spiking neural networks that are expensive to train. In this work, we propose a neural network architecture, Reservoir Nodes-enabled neuromorphic vision sensing Network (RN-Net), based on simple convolution layers integrated with dynamic temporal encoding reservoirs for local and global spatiotemporal feature detection with low hardware and training costs. The RN-Net allows efficient processing of asynchronous temporal features, and achieves the highest accuracy reported to date, 99.2%, on DVS128 Gesture, and one of the highest accuracies, 67.5%, on the DVS Lip dataset at a much smaller network size. By leveraging the internal device and circuit dynamics, asynchronous temporal feature encoding can be implemented at very low hardware cost without preprocessing and dedicated memory and arithmetic units. The use of simple DNN blocks and standard backpropagation-based training rules further reduces implementation costs.

Submitted 24 May, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

Comments: 12 pages, 5 figures, 4 tables

arXiv:2302.08715 [pdf, other]

EEP-3DQA: Efficient and Effective Projection-based 3D Model Quality Assessment

Authors: Zicheng Zhang, Wei Sun, Yingjie Zhou, Wei Lu, Yucheng Zhu, Xiongkuo Min, Guangtao Zhai

Abstract: Currently, considerable effort has been put into improving the effectiveness of 3D model quality assessment (3DQA) methods. However, little attention has been paid to the computational costs and inference time, which are also important for practical applications. Unlike 2D media, 3D models are represented by more complicated and irregular digital formats, such as point cloud and mesh. Thus it is normally difficult to design an efficient module to extract quality-aware features of 3D models. In this paper, we address this problem from the aspect of projection-based 3DQA and develop a no-reference (NR) Efficient and Effective Projection-based 3D Model Quality Assessment (EEP-3DQA) method. The input projection images of EEP-3DQA are randomly sampled from the six perpendicular viewpoints of the 3D model and are further spatially downsampled by the grid-mini patch sampling strategy. Further, the lightweight Swin-Transformer tiny is utilized as the backbone to extract the quality-aware features. Finally, the proposed EEP-3DQA and EEP-3DQA-t (tiny version) achieve better performance than the existing state-of-the-art NR-3DQA methods and even outperform most full-reference (FR) 3DQA methods on the point cloud and mesh quality assessment databases while consuming less inference time than the compared 3DQA methods.

Submitted 27 August, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

arXiv:2212.10901 [pdf, other]

ALCAP: Alignment-Augmented Music Captioner

Authors: Zihao He, Weituo Hao, Wei-Tsung Lu, Changyou Chen, Kristina Lerman, Xuchen Song

Abstract: Music captioning has gained significant attention in the wake of the rising prominence of streaming media platforms. Traditional approaches often prioritize either the audio or lyrics aspect of the music, inadvertently ignoring the intricate interplay between the two. However, a comprehensive understanding of music necessitates the integration of both these elements. In this study, we delve into this overlooked realm by introducing a method to systematically learn multimodal alignment between audio and lyrics through contrastive learning. This not only recognizes and emphasizes the synergy between audio and lyrics but also paves the way for models to achieve deeper cross-modal coherence, thereby producing high-quality captions. We provide both theoretical and empirical results demonstrating the advantage of the proposed method, which achieves new state-of-the-art on two music captioning datasets.

Submitted 21 October, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

arXiv:2210.08868 [pdf, other]

Cerebrovascular Segmentation via Vessel Oriented Filtering Network

Authors: Zhanqiang Guo, Yao Luan, Jianjiang Feng, Wangsheng Lu, Yin Yin, Guangming Yang, Jie Zhou

Abstract: Accurate cerebrovascular segmentation from Magnetic Resonance Angiography (MRA) and Computed Tomography Angiography (CTA) is of great significance in the diagnosis and treatment of cerebrovascular pathology. Due to the complexity and topological variability of blood vessels, complete and accurate segmentation of the vascular network is still a challenge. In this paper, we propose a Vessel Oriented Filtering Network (VOF-Net) which embeds domain knowledge into the convolutional neural network. We design oriented filters for blood vessels according to the vessel orientation field, which is obtained by an orientation estimation network. Features extracted by oriented filtering are injected into the segmentation network, so as to make use of the prior information that blood vessels are slender and curved tubular structures. Experimental results on datasets of CTA and MRA show that the proposed method is effective for vessel segmentation, and that embedding the specific vascular filter improves the segmentation performance.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2208.04939 [pdf, other]

U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?

Authors: Xi Jia, Joseph Bartlett, Tianyang Zhang, Wenqi Lu, Zhaowen Qiu, Jinming Duan

Abstract: Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block to a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph by using only 1.12% of its parameters and 10.8% of its multiply-add operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration; our LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at https://github.com/xi-jia/LKU-Net.

Submitted 13 August, 2022; v1 submitted 7 August, 2022; originally announced August 2022.

Comments: Accepted to MICCAI-MLMI 2022

arXiv:2207.12027 [pdf, other]

Non-cascaded Control Barrier Functions for the Safe Control of Quadrotors

Authors: Weifeng Zeng, Huanhui Cao, Wenjie Lu, Hao Xiong

Abstract: Researchers have developed various cascaded controllers and non-cascaded controllers for the navigation and control of quadrotors in recent years. It is vital to ensure the safety of a quadrotor both in normal state and in abnormal state if a controller tends to make the quadrotor unsafe. To this end, this paper proposes a non-cascaded Control Barrier Function (CBF) for a quadrotor controlled by either cascaded controllers or a non-cascaded controller. Combined with Quadratic Programming (QP), the non-cascaded CBF can simultaneously regulate the magnitude of the total thrust and the torque of the quadrotor determined by a controller, so as to ensure the safety of the quadrotor both in normal state and in abnormal state. The non-cascaded CBF establishes a non-conservative forward invariant safe region, in which the controller of a quadrotor is fully or partially effective in the navigation or the pose control of the quadrotor. The non-cascaded CBF is applied in simulations to a quadrotor performing trajectory tracking and a quadrotor performing aggressive roll maneuvers to evaluate its effectiveness.

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2207.00842 [pdf, other]

Safe Reinforcement Learning for a Robot Being Pursued but with Objectives Covering More Than Capture-avoidance

Authors: Huanhui Cao, Zhiyuan Cai, Hairuo Wei, Wenjie Lu, Lin Zhang, Hao Xiong

Abstract: Reinforcement Learning (RL) algorithms have shown impressive performance in recent years, but deploying RL in real-world applications such as self-driven vehicles may raise safety problems. A self-driven vehicle moving to a target position following a learned policy may encounter a vehicle with unpredictable aggressive behaviors or even be pursued by a vehicle following a Nash strategy. To address the safety issue of the self-driven vehicle in this scenario, this paper conducts a preliminary study based on a system of robots. A safe RL framework with safety guarantees is developed for a robot being pursued but with objectives covering more than capture-avoidance. Simulations and experiments are conducted based on the system of robots to evaluate the effectiveness of the developed safe RL framework.

Submitted 2 July, 2022; originally announced July 2022.

arXiv:2206.05054 [pdf, other]

A No-reference Quality Assessment Metric for Point Cloud Based on Captured Video Sequences

Authors: Yu Fan, Zicheng Zhang, Wei Sun, Xiongkuo Min, Wei Lu, Tao Wang, Ning Liu, Guangtao Zhai

Abstract: Point cloud is one of the most widely used digital formats of 3D models, the visual quality of which is quite sensitive to distortions such as downsampling, noise, and compression. To tackle the challenge of point cloud quality assessment (PCQA) in scenarios where reference is not available, we propose a no-reference quality assessment metric for colored point cloud based on captured video sequences. Specifically, three video sequences are obtained by rotating the camera around the point cloud through three specific orbits. The video sequences not only contain the static views but also include the multi-frame temporal information, which greatly helps understand the human perception of the point clouds. Then we modify the ResNet3D as the feature extraction model to learn the correlation between the captured videos and corresponding subjective quality scores. The experimental results show that our method outperforms most of the state-of-the-art full-reference and no-reference PCQA metrics, which validates the effectiveness of the proposed method.

Submitted 20 September, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: Accepted to IEEE 24th International Workshop on Multimedia Signal Processing, 2022

arXiv:2206.04289 [pdf, other]

A No-Reference Deep Learning Quality Assessment Method for Super-resolution Images Based on Frequency Maps

Authors: Zicheng Zhang, Wei Sun, Xiongkuo Min, Wenhan Zhu, Tao Wang, Wei Lu, Guangtao Zhai

Abstract: To support the application scenarios where high-resolution (HR) images are urgently needed, various single image super-resolution (SISR) algorithms have been developed. However, SISR is an ill-posed inverse problem, which may bring artifacts like texture shift, blur, etc. to the reconstructed images; thus it is necessary to evaluate the quality of super-resolution images (SRIs). Note that most existing image quality assessment (IQA) methods were developed for synthetically distorted images, which may not work for SRIs since their distortions are more diverse and complicated. Therefore, in this paper, we propose a no-reference deep-learning image quality assessment method based on frequency maps because the artifacts caused by SISR algorithms are quite sensitive to frequency information. Specifically, we first obtain the high-frequency map (HM) and low-frequency map (LM) of the SRI by using the Sobel operator and piecewise smooth image approximation. Then, a two-stream network is employed to extract the quality-aware features of both frequency maps. Finally, the features are regressed into a single quality value using fully connected layers. The experimental results show that our method outperforms all compared IQA models on the selected three super-resolution quality assessment (SRQA) databases.

Submitted 9 June, 2022; originally announced June 2022.

arXiv:2205.14701 [pdf, other]

Modeling Beats and Downbeats with a Time-Frequency Transformer

Authors: Yun-Ning Hung, Ju-Chiang Wang, Xuchen Song, Wei-Tsung Lu, Minz Won

Abstract: Transformer is a successful deep neural network (DNN) architecture that has shown its versatility not only in natural language processing but also in music information retrieval (MIR). In this paper, we present a novel Transformer-based approach to tackle beat and downbeat tracking. This approach employs SpecTNT (Spectral-Temporal Transformer in Transformer), a variant of Transformer that models both spectral and temporal dimensions of a time-frequency input of music audio. A SpecTNT model uses a stack of blocks, where each consists of two levels of Transformer encoders. The lower-level (or spectral) encoder handles the spectral features and enables the model to pay attention to harmonic components of each frame. Since downbeats indicate bar boundaries and are often accompanied by harmonic changes, this step may help downbeat modeling. The upper-level (or temporal) encoder aggregates useful local spectral information to pay attention to beat/downbeat positions. We also propose an architecture that combines SpecTNT with a state-of-the-art model, Temporal Convolutional Networks (TCN), to further improve the performance. Extensive experiments demonstrate that our approach can significantly outperform TCN in downbeat tracking while maintaining comparable results in beat tracking.

Submitted 29 May, 2022; originally announced May 2022.

Comments: This paper is accepted for publication at ICASSP 2022

arXiv:2205.09107 [pdf]

doi 10.1088/1361-6560/acf2e2

Leveraging Global Binary Masks for Structure Segmentation in Medical Images

Authors: Mahdieh Kazemimoghadam, Zi Yang, Lin Ma, Mingli Chen, Weiguo Lu, Xuejun Gu

Abstract: Deep learning (DL) models for medical image segmentation are highly influenced by intensity variations of input images and lack generalization due to primarily utilizing pixels' intensity information for inference. Acquiring sufficient training data is another challenge limiting models' applications. We proposed to leverage the consistency of organs' anatomical shape and position information in medical images. We introduced a framework leveraging recurring anatomical patterns through global binary masks for organ segmentation. Two scenarios were studied. 1) Global binary masks were the model's (i.e., U-Net's) only input, forcing it to encode exclusively organs' position and shape information for segmentation/localization. 2) Global binary masks were incorporated as an additional channel functioning as position/shape clues to mitigate training data scarcity. Two datasets of brain and heart CT images with their ground truth were split into (26:10:10) and (12:3:5) for training, validation, and test respectively. Training exclusively on global binary masks led to Dice scores of 0.77(0.06) and 0.85(0.04), with an average Euclidean distance of 3.12(1.43)mm and 2.5(0.93)mm relative to the center of mass of the ground truth for the brain and heart structures respectively. The outcomes indicate that a surprising degree of position and shape information is encoded through global binary masks.
Incorporating global binary masks led to significantly higher accuracy relative to the model trained on only CT images in small subsets of training data; the performance improved by 4.3-125.3% and 1.3-48.1% for 1-8 training cases of the brain and heart datasets respectively. The findings imply the advantages of utilizing global binary masks for building generalizable models and compensating for training data scarcity.

Submitted 24 August, 2023; v1 submitted 13 May, 2022; originally announced May 2022.

arXiv:2205.07021 [pdf, other]

doi 10.1109/EMBC48229.2022.9871734

Self-supervised Assisted Active Learning for Skin Lesion Segmentation

Authors: Ziyuan Zhao, Wenjing Lu, Zeng Zeng, Kaixin Xu, Bharadwaj Veeravalli, Cuntai Guan

Abstract: Label scarcity has been a long-standing issue for biomedical image segmentation, due to high annotation costs and professional requirements. Recently, active learning (AL) strategies strive to reduce annotation costs by querying a small portion of data for annotation, receiving much traction in the field of medical imaging. However, most of the existing AL methods have to initialize models with some randomly selected samples followed by active selection based on various criteria, such as uncertainty and diversity. Such random-start initialization methods inevitably introduce low-value, redundant samples and unnecessary annotation costs. To address this issue, we propose a novel self-supervised assisted active learning framework in the cold-start setting, in which the segmentation model is first warmed up with self-supervised learning (SSL), and then SSL features are used for sample selection via latent feature clustering without accessing labels. We assess our proposed methodology on a skin lesion segmentation task. Extensive experiments demonstrate that our approach is capable of achieving promising performance with substantial improvements over existing baselines.

Submitted 14 May, 2022; originally announced May 2022.

Comments: Accepted by the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2022)

Journal ref: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

arXiv:2204.14047 [pdf, other]

doi 10.1145/3503161.3548329

A Deep Learning based No-reference Quality Assessment Model for UGC Videos

Authors: Wei Sun, Xiongkuo Min, Wei Lu, Guangtao Zhai

Abstract: Quality assessment for User Generated Content (UGC) videos plays an important role in ensuring the viewing experience of end-users. Previous UGC video quality assessment (VQA) studies either use the image recognition model or the image quality assessment (IQA) models to extract frame-level features of UGC videos for quality regression, which are regarded as sub-optimal solutions because of the domain shifts between these tasks and the UGC VQA task. In this paper, we propose a very simple but effective UGC VQA model, which tries to address this problem by training an end-to-end spatial feature extraction network to directly learn the quality-aware spatial feature representation from raw pixels of the video frames. We also extract the motion features to measure the temporal-related distortions that the spatial features cannot model. The proposed model utilizes very sparse frames to extract spatial features and dense frames (i.e. the video chunk) with a very low spatial resolution to extract motion features, and thereby has low computational complexity. With the better quality-aware features, we only use a simple multilayer perceptron (MLP) network to regress them into the chunk-level quality scores, and then the temporal average pooling strategy is adopted to obtain the video-level quality score. We further introduce a multi-scale quality fusion strategy to solve the problem of VQA across different spatial resolutions, where the multi-scale weights are obtained from the contrast sensitivity function of the human visual system.
The experimental results show that the proposed model achieves the best performance on five popular UGC VQA databases, which demonstrates the effectiveness of the proposed model. The code will be publicly available.

Submitted 20 October, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

Comments: Accepted by ACM MM 2022

Journal ref: Proceedings of the 30th ACM International Conference on Multimedia (2022) 856-865

arXiv:2203.09098 [pdf, other]

TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

Authors: Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin, Junhai Xu, Lin Zhang, Yantao Ji, Jianwu Dang

Abstract: Speaker embedding is an important front-end module to explore discriminative speaker features for many speech applications where speaker information is needed. Current SOTA backbone networks for speaker embedding are designed to aggregate multi-scale features from an utterance with multi-branch network architectures for speaker representation. However, naively adding many branches of multi-scale features with the simple fully convolutional operation could not efficiently improve the performance due to the rapid increase of model parameters and computational complexity. Therefore, in most current state-of-the-art network architectures, only a few branches corresponding to a limited number of temporal scales could be designed for speaker embeddings. To address this problem, in this paper, we propose an effective temporal multi-scale (TMS) model where multi-scale branches could be efficiently designed in a speaker embedding network almost without increasing computational costs. The new model is based on the conventional TDNN, where the network architecture is smartly separated into two modeling operators: a channel-modeling operator and a temporal multi-branch modeling operator. Adding temporal multi-scale in the temporal multi-branch operator requires only a slight increase in the number of parameters, and thus saves computational budget for adding more branches with large temporal scales.
Moreover, in the inference stage, we further developed a systemic re-parameterization method to convert the TMS-based model into a single-path-based topology in order to increase inference speed. We investigated the performance of the new TMS method for automatic speaker verification (ASV) on in-domain and out-of-domain conditions. Results show that the TMS-based model obtained a significant increase in performance over the SOTA ASV models while also achieving a faster inference speed.

Submitted 17 March, 2022; originally announced March 2022.

Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

arXiv:2203.05208 [pdf, other]

Transferring Dual Stochastic Graph Convolutional Network for Facial Micro-expression Recognition

Authors: Hui Tang, Li Chai, Wanli Lu

Abstract: Micro-expression recognition has drawn increasing attention due to its wide application in lie detection, criminal detection and psychological consultation. To improve recognition performance on small micro-expression datasets, this paper presents a transferring dual stochastic Graph Convolutional Network (TDSGCN) model. We propose a stochastic graph construction method and a dual graph convolutional network to extract more discriminative features from the micro-expression images. We use transfer learning to pre-train SGCNs on macro-expression data. An optical flow algorithm is also integrated to extract temporal features. We fuse both spatial and temporal features to improve the recognition performance. To the best of our knowledge, this is the first attempt to utilize transfer learning and graph convolutional networks in the micro-expression recognition task. In addition, to handle the class imbalance problem of the datasets, we focus on the design of the focal loss function. Through extensive evaluation, our proposed method achieves state-of-the-art performance on the SAMM and recently released MMEW benchmarks. Our code will be publicly available accompanying this paper.

Submitted 10 March, 2022; originally announced March 2022.

arXiv:2201.12571 [pdf]

doi 10.1016/j.ijepes.2022.107998

Probabilistic load flow calculation of AC/DC hybrid system based on cumulant method

Authors: Yinfeng Sun, Dapeng Xia, Zichun Gao, Zhenhao Wang, Guoqing Li, Weihua Lu, Xueguang Wu, Yang Li

Abstract: The operating conditions of the power system have become more complex and changeable. This paper proposes a probabilistic load flow based on the cumulant method (PLF-CM) for the voltage sourced converter high voltage direct current (VSC-HVDC) hybrid system containing photovoltaic grid-connected systems. Firstly, the corresponding control mode is set for the converter, including droop control and master-slave control. The unified iterative method is used to calculate the conventional AC/DC flow. Secondly, based on the probability models of load and photovoltaic output and on the aforementioned flow results, a correlation coefficient matrix is used to transform the correlated samples into independent samples, and the cumulants of the load and photovoltaic output are obtained; then, the probability density function (PDF) and cumulative distribution function (CDF) of the state variables are obtained by using the Gram-Charlier series expansion method. Finally, the mean value and standard deviation of node voltage and line power are calculated on the modified IEEE 34-bus and IEEE 57-bus transmission systems. The algorithm can reflect the inherent uncertainty of new energy sources and replaces the complex convolution operation, greatly improving the calculation speed and the convergence.

Submitted 15 February, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

Journal ref: International Journal of Electrical Power & Energy Systems 139 (2022) 107998

arXiv:2110.13465 [pdf, other]

CS-Rep: Making Speaker Verification Networks Embracing Re-parameterization

Authors: Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Lin Zhang, Yantao Ji, Junhai Xu, Xugang Lu

Abstract: Automatic speaker verification (ASV) systems, which determine whether two utterances are from the same speaker, mainly focus on verification accuracy while ignoring inference speed. However, in real applications, both inference speed and verification accuracy are essential. This study proposes cross-sequential re-parameterization (CS-Rep), a novel topology re-parameterization strategy for multi-type networks, to increase the inference speed and verification accuracy of models. CS-Rep solves the problem that existing re-parameterization methods are unsuitable for typical ASV backbones. When a model applies CS-Rep, the training-period network utilizes a multi-branch topology to capture speaker information, whereas the inference-period model converts to a time-delay neural network (TDNN)-like plain backbone with stacked TDNN layers to achieve fast inference. Based on CS-Rep, an improved TDNN that is friendly to testing and deployment, called Rep-TDNN, is proposed. Compared with the state-of-the-art model ECAPA-TDNN, which is highly recognized in the industry, Rep-TDNN increases the actual inference speed by about 50% and reduces the EER by 10%. The code will be released.

Submitted 3 April, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted by ICASSP 2022

arXiv:2110.12855 [pdf, other]

doi 10.1145/3474085.3475529

Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience

Authors: Wei-Tsung Lu, Meng-Hsuan Wu, Yuh-Ming Chiu, Li Su

Abstract: The subjective evaluation of music generation techniques has been mostly done with questionnaire-based listening tests while ignoring the perspectives from music composition, arrangement, and soundtrack editing. In this paper, we propose an editing test to evaluate users' editing experience of music generation models in a systematic way. To do this, we design a new music style transfer model combining the non-chronological inference architecture, autoregressive models and the Transformer, which serves as an improvement from the baseline model on the same style transfer task. Then, we compare the performance of the two models with a conventional listening test and the proposed editing test, in which the quality of generated samples is assessed by the amount of effort (e.g., the number of required keyboard and mouse actions) spent by users to polish a music clip. Results on two target styles indicate that the improvement over the baseline model can be reflected by the editing test quantitatively. Also, the editing test provides profound insights which are not accessible from usual listening tests. The major contribution of this paper is the systematic presentation of the editing test and the corresponding insights, while the proposed music style transfer model based on state-of-the-art neural networks represents another contribution.

Submitted 25 October, 2021; originally announced October 2021.

Comments: 9 pages, Proceedings of the 29th ACM International Conference on Multimedia

arXiv:2110.09127 [pdf, other]

SpecTNT: a Time-Frequency Transformer for Music Audio

Authors: Wei-Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song

Abstract: Transformers have drawn attention in the MIR field for their remarkable performance shown in natural language processing and computer vision. However, prior works in the audio processing domain mostly use the Transformer as a temporal feature aggregator that acts similarly to RNNs. In this paper, we propose SpecTNT, a Transformer-based architecture to model both spectral and temporal sequences of an input time-frequency representation. Specifically, we introduce a novel variant of the Transformer-in-Transformer (TNT) architecture. In each SpecTNT block, a spectral Transformer extracts frequency-related features into the frequency class token (FCT) for each frame. Later, the FCTs are linearly projected and added to the temporal embeddings (TEs), which aggregate useful information from the FCTs. Then, a temporal Transformer processes the TEs to exchange information across the time axis. By stacking the SpecTNT blocks, we build the SpecTNT model to learn the representation for music signals. In experiments, SpecTNT demonstrates state-of-the-art performance in music tagging and vocal melody extraction, and shows competitive performance for chord recognition. The effectiveness of SpecTNT and other design choices is further examined through ablation studies.

Submitted 18 October, 2021; originally announced October 2021.

Comments: 6 pages

Journal ref: International Society for Music Information Retrieval (ISMIR) 2021

arXiv:2110.09000 [pdf, other]

Supervised Metric Learning for Music Structure Features

Authors: Ju-Chiang Wang, Jordan B. L. Smith, Wei-Tsung Lu, Xuchen Song

Abstract: Music structure analysis (MSA) methods traditionally search for musically meaningful patterns in audio: homogeneity, repetition, novelty, and segment-length regularity. Hand-crafted audio features such as MFCCs or chromagrams are often used to elicit these patterns. However, with more annotations of section labels (e.g., verse, chorus, and bridge) becoming available, one can use supervised feature learning to make these patterns even clearer and improve MSA performance. To this end, we take a supervised metric learning approach: we train a deep neural network to output embeddings that are near each other for two spectrogram inputs if both have the same section type (according to an annotation), and otherwise far apart. We propose a batch sampling scheme to ensure the labels in a training pair are interpreted meaningfully. The trained model extracts features that can be used in existing MSA algorithms. In evaluations with three datasets (HarmonixSet, SALAMI, and RWC), we demonstrate that using the proposed features can improve a traditional MSA algorithm significantly in both intra- and cross-dataset scenarios.

Submitted 29 April, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

Comments: This paper was accepted and presented at ISMIR 2021

arXiv:2109.10863 [pdf]

A Transportation Digital-Twin Approach for Adaptive Traffic Control Systems

Authors: Sagar Dasgupta, Mizanur Rahman, Abhay D. Lidbe, Weike Lu, Steven Jones

Abstract: A transportation digital twin represents a digital version of a transportation physical object or process, such as a traffic signal controller, and thereby enables a two-way real-time data exchange between the physical twin and digital twin. This paper introduces a digital twin approach for adaptive traffic signal control (ATSC) to improve a traveler's driving experience by reducing and redistributing waiting time at an intersection. While an ATSC combined with a connected vehicle concept can reduce waiting time at an intersection and improve travel time in a signalized corridor, it is nearly impossible to reduce traffic delay for congested traffic conditions. To remedy this defect of the traditional ATCS with connected vehicle data, we have developed a digital twin-based ATSC (DT-based ATSC) that considers the waiting time of approaching vehicles towards a subject intersection along with the waiting time of those vehicles at the immediate upstream intersection. We conducted a case study using a microscopic traffic simulation, Simulation of Urban Mobility (SUMO), by developing a digital replica of a roadway network with signalized intersections in an urban setting where vehicle and traffic signal data were collected in real-time. Our analyses reveal that the DT-based ATSC outperforms the connected vehicle-based baseline ATSC in terms of average cumulative waiting time, distribution of drivers' waiting time, and level of service for each approach under different traffic demands, and therefore demonstrates our method's superior efficacy.

Submitted 1 July, 2023; v1 submitted 19 August, 2021; originally announced September 2021.

arXiv:2108.08731 [pdf]

doi 10.1002/mp.15677

Registration-Guided Deep Learning Image Segmentation for Cone Beam CT-based Online Adaptive Radiotherapy

Authors: Lin Ma, Weicheng Chi, Howard E. Morgan, Mu-Han Lin, Mingli Chen, David Sher, Dominic Moon, Dat T. Vo, Vladimir Avkshtol, Weiguo Lu, Xuejun Gu

Abstract: Adaptive radiotherapy (ART), especially online ART, effectively accounts for positioning errors and anatomical changes. One key component of online ART is accurately and efficiently delineating organs at risk (OARs) and targets on online images, such as CBCT, to meet the online demands of plan evaluation and adaptation. Deep learning (DL)-based automatic segmentation has gained great success in segmenting planning CT, but its applications to CBCT yielded inferior results due to the low image quality and limited available contour labels for training. To overcome these obstacles to online CBCT segmentation, we propose a registration-guided DL (RgDL) segmentation framework that integrates image registration algorithms and DL segmentation models. The registration algorithm generates initial contours, which are used as guidance by the DL model to obtain accurate final segmentations. We implemented the proposed framework in two ways, Rig-RgDL (Rig for rigid body) and Def-RgDL (Def for deformable), with rigid body (RB) registration or deformable image registration (DIR) as the registration algorithm, respectively, and U-Net as the DL model architecture. The two implementations of the RgDL framework were trained and evaluated on seven OARs in an institutional clinical Head and Neck (HN) dataset. Compared to the baseline approaches using the registration or the DL alone, RgDL achieved more accurate segmentation, as measured by higher mean Dice similarity coefficients (DSC) and other distance-based metrics. Rig-RgDL achieved a DSC of 84.5% on seven OARs on average, higher than RB or DL alone by 4.5% and 4.7%. The DSC of Def-RgDL is 86.5%, higher than DIR or DL alone by 2.4% and 6.7%. The inference time taken by the DL model to generate final segmentations of seven OARs is less than one second in RgDL. The resulting segmentation accuracy and efficiency show the promise of applying the RgDL framework for online ART.

Submitted 19 August, 2021; originally announced August 2021.

Comments: 16 pages, 6 figures

arXiv:2107.02041 [pdf, other]

doi 10.1109/TCSVT.2022.3186894

No-Reference Quality Assessment for 3D Colored Point Cloud and Mesh Models

Authors: Zicheng Zhang, Wei Sun, Xiongkuo Min, Tao Wang, Wei Lu, Guangtao Zhai

Abstract: To improve the viewer's Quality of Experience (QoE) and optimize computer graphics applications, 3D model quality assessment (3D-QA) has become an important task in the multimedia area. Point cloud and mesh are the two most widely used digital representation formats of 3D models, the visual quality of which is quite sensitive to lossy operations like simplification and compression. Therefore, many related studies such as point cloud quality assessment (PCQA) and mesh quality assessment (MQA) have been carried out to measure the visual quality degradations of 3D models. However, a large part of previous studies utilize full-reference (FR) metrics, which means they cannot predict the quality level in the absence of the reference 3D model. Furthermore, few 3D-QA metrics consider color information, which significantly restricts their effectiveness and scope of application. In this paper, we propose a no-reference (NR) quality assessment metric for colored 3D models represented by both point cloud and mesh. First, we project the 3D models from 3D space into quality-related geometry and color feature domains. Then, the 3D natural scene statistics (3D-NSS) and entropy are utilized to extract quality-aware features. Finally, machine learning is employed to regress the quality-aware features into visual quality scores. Our method is validated on the colored point cloud quality assessment database (SJTU-PCQA), the Waterloo point cloud assessment database (WPC), and the colored mesh quality assessment database (CMDM). The experimental results show that the proposed method outperforms most compared NR 3D-QA metrics with competitive computational resources and greatly reduces the performance gap with the state-of-the-art FR 3D-QA metrics. The code of the proposed model is publicly available now to facilitate further research.

Submitted 2 May, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

arXiv:2106.13689 [pdf]

Semantic annotation for computational pathology: Multidisciplinary experience and best practice recommendations

Authors: Noorul Wahab, Islam M Miligy, Katherine Dodd, Harvir Sahota, Michael Toss, Wenqi Lu, Mostafa Jahanifar, Mohsin Bilal, Simon Graham, Young Park, Giorgos Hadjigeorghiou, Abhir Bhalerao, Ayat Lashen, Asmaa Ibrahim, Ayaka Katayama, Henry O Ebili, Matthew Parkin, Tom Sorell, Shan E Ahmed Raza, Emily Hero, Hesham Eldaly, Yee Wah Tsang, Kishore Gopalakrishnan, David Snead, Emad Rakha, et al. (2 additional authors not shown)

Abstract: Recent advances in whole slide imaging (WSI) technology have led to the development of a myriad of computer vision and artificial intelligence (AI) based diagnostic, prognostic, and predictive algorithms. Computational Pathology (CPath) offers an integrated solution to utilize information embedded in pathology WSIs beyond what we obtain through visual assessment. For automated analysis of WSIs and validation of machine learning (ML) models, annotations at the slide, tissue and cellular levels are required. The annotation of important visual constructs in pathology images is an important component of CPath projects. Improper annotations can result in algorithms which are hard to interpret and can potentially produce inaccurate and inconsistent results. Despite the crucial role of annotations in CPath projects, there are no well-defined guidelines or best practices on how annotations should be carried out. In this paper, we address this shortcoming by presenting the experience and best practices acquired during the execution of a large-scale annotation exercise involving a multidisciplinary team of pathologists, ML experts and researchers as part of the Pathology image data Lake for Analytics, Knowledge and Education (PathLAKE) consortium. We present a real-world case study along with examples of different types of annotations, diagnostic algorithm, annotation data dictionary and annotation constructs. The analyses reported in this work highlight best practice recommendations that can be used as annotation guidelines over the lifecycle of a CPath project.

Submitted 25 June, 2021; originally announced June 2021.

arXiv:2105.09558 [pdf]

An Accelerated Stackelberg Game Approach for Distributed Energy Resource Aggregator participating in Energy and Reserve Markets Considering Security Check

Authors: Zhijun Shen, Mingbo Liu, Lixin Xu, Wentian Lu

Abstract: With increasing distributed energy resources (DERs) integration, the strategic behavior of a DER aggregator in electricity markets will significantly affect the secure operation of the distribution system. In this paper, the interactions among the DER aggregator, energy and reserve markets, and distribution system are investigated through a single-leader-multi-follower Stackelberg game model with the DER aggregator as the leader and the independent system operator and distribution system operator as the followers. To guarantee the operation security of the distribution system, security check problems under three different scenarios are involved in the follower level, which are linearized using a mixed-integer linearized power flow model. Then, using the strong duality theorem, the proposed model is converted into a bi-level mixed-integer linear programming (BMILP) model with only mixed-integer linear follower-level problems. Next, an accelerated relaxation-based bi-level reformulation and decomposition algorithm is proposed to solve the BMILP problem. Finally, case studies are carried out on a constructed integrated transmission and distribution (T&D) system and a practical integrated T&D system to verify the effectiveness of the proposed model and algorithm. The simulation results indicate that the available downward reserve of the DER aggregator will decrease with the security limitation of the distribution system.

Submitted 26 July, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: 20 pages, 14 figures. This work has been submitted to Renewable Energy for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2105.03877 [pdf]

Non-iterative Optimization Algorithm for Active Distribution Grids Considering Uncertainty of Feeder Parameters

Authors: J. Wu, M. Liu, W. Lu, K. Xie, M. Xie

Abstract: To cope with fast-fluctuating distributed energy resources (DERs) and uncontrolled loads, this paper formulates a time-varying optimization problem for distribution grids with DERs and develops a novel non-iterative algorithm to track the optimal solutions. Different from existing methods, the proposed approach does not require iterations during the sampling interval. It only needs to perform a single one-step calculation at each interval to obtain the evolution of the optimal trajectory, which demonstrates fast calculation and online-tracking capability with an asymptotically vanishing error. Specifically, the designed approach contains two terms: a prediction term tracking the change in the optimal solution based on the time-varying nature of system power, and a correction term pushing the solution toward the optimum based on Newton's method. Moreover, the proposed algorithm can be applied in the absence of an accurate network model by leveraging voltage measurements to identify the true voltage sensitivity parameters. Simulations for an illustrative distribution network are provided to validate the approach.

Submitted 9 May, 2021; originally announced May 2021.

Comments: 9 pages, 10 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2105.02771 [pdf]

doi 10.1088/1361-6560/ac176d

Saliency-Guided Deep Learning Network for Automatic Tumor Bed Volume Delineation in Post-operative Breast Irradiation

Authors: Mahdieh Kazemimoghadam, Weicheng Chi, Asal Rahimi, Nathan Kim, Prasanna Alluri, Chika Nwachukwu, Weiguo Lu, Xuejun Gu

Abstract: Efficient, reliable and reproducible target volume delineation is a key step in the effective planning of breast radiotherapy. However, post-operative breast target delineation is challenging as the contrast between the tumor bed volume (TBV) and normal breast tissue is relatively low in CT images. In this study, we propose to mimic the marker-guidance procedure in manual target delineation. We developed a saliency-based deep learning segmentation (SDL-Seg) algorithm for accurate TBV segmentation in post-operative breast irradiation. The SDL-Seg algorithm incorporates saliency information in the form of markers' location cues into a U-Net model. The design forces the model to encode the location-related features, which underscores regions with high saliency levels and suppresses low saliency regions. The saliency maps were generated by identifying markers on CT images. Markers' locations were then converted to probability maps using a distance transformation coupled with a Gaussian filter. Subsequently, the CT images and the corresponding saliency maps formed a multi-channel input for the SDL-Seg network. Our in-house dataset comprised 145 prone CT images from 29 post-operative breast cancer patients, who received a 5-fraction partial breast irradiation (PBI) regimen on GammaPod. The performance of the proposed method was compared against basic U-Net. Our model achieved mean (standard deviation) of 76.4%, 6.76 mm, and 1.9 mm for DSC, HD95, and ASD respectively on the test set, with a computation time below 11 seconds per CT volume. SDL-Seg showed superior performance relative to basic U-Net for all the evaluation metrics while preserving low computation cost. The findings demonstrate that SDL-Seg is a promising approach for improving the efficiency and accuracy of the on-line treatment planning procedure of PBI, such as GammaPod based PBI.

Submitted 26 July, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: https://iopscience.iop.org/article/10.1088/1361-6560/ac176d

Journal ref: Physics in Medicine & Biology 2021

arXiv:2104.13339 [pdf, other]

An Event-based Parameter Switching Method for Controlling Cybersecurity Dynamics

Authors: Zhaofeng Liu, Wenlian Lu, Yingying Lang

Abstract: This paper proposes a new event-based parameter switching method for the control tasks of cybersecurity in the context of preventive and reactive cyber defense dynamics. Our parameter switching method helps avoid excessive control costs and guarantees that the dynamics converge at the desired speed. Meanwhile, it can be proved that this approach is Zeno-free. A new estimation method with adaptive time windows is used to bridge the gap between the probability state and the sampling state. Several practical experiments using the new estimation method are then presented.

Submitted 27 April, 2021; originally announced April 2021.

Comments: 21 pages, 10 figures, 1 algorithm. may be submitted to SciSec Conference

MSC Class: 37N35 ACM Class: C.2.0

arXiv:2104.01445 [pdf]

A Dynamics Perspective of Pursuit-Evasion Games of Intelligent Agents with the Ability to Learn

Authors: Hao Xiong, Huanhui Cao, Lin Zhang, Wenjie Lu

Abstract: Pursuit-evasion games are ubiquitous in nature and in the artificial world. In nature, pursuer(s) and evader(s) are intelligent agents that can learn from experience, and dynamics (i.e., Newtonian or Lagrangian) is vital for the pursuer and the evader in some scenarios. To this end, this paper addresses the pursuit-evasion game of intelligent agents from the perspective of dynamics. A bio-inspired dynamics formulation of a pursuit-evasion game and baseline pursuit and evasion strategies are introduced first. Then, reinforcement learning techniques are used to mimic the ability of intelligent agents to learn from experience. Based on the dynamics formulation and reinforcement learning techniques, the effects on pursuit-evasion games of improving both pursuit and evasion strategies through experience are investigated at two levels: 1) individual runs and 2) ranges of the parameters of pursuit-evasion games. Results of the investigation are consistent with observations in nature and the natural law of survival of the fittest. More importantly, with respect to the result of a pursuit-evasion game of agents with baseline strategies, this study achieves a different result. It is shown that, in a pursuit-evasion game with a dynamics formulation, an evader is not able to escape from a slightly faster pursuer with an effective learned pursuit strategy, based on agile maneuvers and an effective learned evasion strategy.

Submitted 3 April, 2021; originally announced April 2021.

arXiv:2103.04026 [pdf]

Morphological Operation Residual Blocks: Enhancing 3D Morphological Feature Representation in Convolutional Neural Networks for Semantic Segmentation of Medical Images

Authors: Chentian Li, Chi Ma, William W. Lu

Abstract: The shapes and morphology of the organs and tissues are important prior knowledge in medical imaging recognition and segmentation. The morphological operation is a well-known method for morphological feature extraction. As the morphological operation performs well in hand-crafted image segmentation techniques, it is also promising to design an approach that approximates the morphological operation in convolutional networks. However, when a traditional convolutional neural network is used as a black box, it is usually hard to specify the action of the morphological operation. Here, we introduce a 3D morphological operation residual block to extract morphological features in end-to-end deep learning models for semantic segmentation. This study proposes a novel network block architecture that embeds the morphological operation as an infinitely strong prior in the convolutional neural network. Several 3D deep learning models with the proposed morphological operation block were built and compared in different medical imaging segmentation tasks. Experimental results showed that the proposed network achieved relatively higher performance in the segmentation tasks compared with the conventional approach. In conclusion, the novel network block can be easily embedded in traditional networks and efficiently reinforces deep learning models for medical imaging segmentation.

Submitted 5 March, 2021; originally announced March 2021.

arXiv:2102.03353 [pdf, other]

doi 10.1016/j.neucom.2021.04.124

Cross-domain Activity Recognition via Substructural Optimal Transport

Authors: Wang Lu, Yiqiang Chen, Jindong Wang, Xin Qin

Abstract: It is expensive and time-consuming to collect sufficient labeled data for human activity recognition (HAR). Domain adaptation is a promising approach for cross-domain activity recognition. Existing methods mainly focus on adapting cross-domain representations via domain-level, class-level, or sample-level distribution matching. However, they might fail to capture the fine-grained locality information in activity data. Domain- and class-level matching are so coarse that they may result in under-adaptation, while sample-level matching may be seriously affected by noise and eventually cause over-adaptation. In this paper, we propose substructure-level matching for domain adaptation (SSDA) to better utilize the locality information of activity data for accurate and efficient knowledge transfer. Based on SSDA, we propose an optimal transport-based implementation, Substructural Optimal Transport (SOT), for cross-domain HAR. We obtain the substructures of activities via clustering methods and seek the coupling of the weighted substructures between different domains. We conduct comprehensive experiments on four public activity recognition datasets (i.e., UCI-DSADS, UCI-HAR, USC-HAD, PAMAP2), which demonstrate that SOT significantly outperforms other state-of-the-art methods w.r.t. classification accuracy (9%+ improvement). In addition, our method is 5x faster than traditional OT-based DA methods with the same hyper-parameters.

Submitted 16 September, 2021; v1 submitted 29 January, 2021; originally announced February 2021.

Comments: Accepted by Neurocomputing; 17 pages; Code: https://github.com/jindongwang/transferlearning/tree/master/code/traditional/sot

Journal ref: Neurocomputing, Volume 454, 2021

arXiv:2012.13539 [pdf, ps, other]

A GCICA Grant-Free Random Access Scheme for M2M Communications in Crowded Massive MIMO Systems

Authors: Huimei Han, Lushun Fang, Weidang Lu, Wenchao Zhai, Ying Li, Jun Zhao

Abstract: A grant-free random access scheme with a high success rate is proposed to support massive access for machine-to-machine communications in massive multiple-input multiple-output systems. This scheme allows active user equipments (UEs) to transmit their modulated uplink messages along with super pilots consisting of multiple sub-pilots to a base station (BS). Then, the BS performs channel state information (CSI) estimation and uplink message decoding by utilizing a proposed graph combined clustering independent component analysis (GCICA) decoding algorithm, and then employs the estimated CSIs to detect active UEs by utilizing the asymptotic favorable propagation characteristic of the massive MIMO channel. We call the proposed scheme the GCICA-based random access (GCICA-RA) scheme. We analyze the successful access probability, missed detection probability, and uplink throughput of the GCICA-RA scheme. Numerical results show that the GCICA-RA scheme significantly improves the successful access probability and uplink throughput, decreases the missed detection probability, and provides low CSI estimation error at the same time.

Submitted 25 December, 2020; originally announced December 2020.

arXiv:2012.10239 [pdf]

doi 10.1063/5.0041901

Computational interference microscopy enabled by deep learning

Authors: Yuheng Jiao, Yuchen R. He, Mikhail E. Kandel, Xiaojun Liu, Wenlong Lu, Gabriel Popescu

Abstract: Quantitative phase imaging (QPI) has been widely applied in characterizing cells and tissues. Spatial light interference microscopy (SLIM) is a highly sensitive QPI method, due to its partially coherent illumination and common path interferometry geometry. However, its acquisition rate is limited because of the four-frame phase-shifting scheme. On the other hand, off-axis methods like diffraction phase microscopy (DPM) allow for single-shot QPI. However, the laser-based DPM system is plagued by spatial noise due to speckles and multiple reflections. In a parallel development, deep learning has proven valuable in the field of bioimaging, especially due to its ability to translate one form of contrast into another. Here, we propose using deep learning to produce synthetic, SLIM-quality, high-sensitivity phase maps from single-shot DPM images as input. We used an inverted microscope with its two ports connected to the DPM and SLIM modules, such that we have access to the two types of images on the same field of view. We constructed a deep learning model based on U-net and trained it on over 1,000 pairs of DPM and SLIM images. The model learned to remove the speckles in laser DPM and overcame the background phase noise in both the test set and new data. Furthermore, we implemented the neural network inference in the live acquisition software, which now allows a DPM user to observe an extremely low-noise phase image in real time. We demonstrated this principle of computational interference microscopy (CIM) imaging using blood smears, as they contain both erythrocytes and leukocytes, in static and dynamic conditions.

Submitted 17 December, 2020; originally announced December 2020.

Showing 1–50 of 80 results for author: Lu, W