-
VEnhancer: Generative Space-Time Enhancement for Video Generation
Authors:
Jingwen He,
Tianfan Xue,
Dongyang Liu,
Xinqi Lin,
Peng Gao,
Dahua Lin,
Yu Qiao,
Wanli Ouyang,
Ziwei Liu
Abstract:
We present VEnhancer, a generative space-time enhancement framework that improves existing text-to-video results by adding more detail in the spatial domain and synthesizing detailed motion in the temporal domain. Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously, with arbitrary spatial and temporal up-sampling scales, through a unified video diffusion model. Furthermore, VEnhancer effectively removes spatial artifacts and temporal flickering from generated videos. To achieve this, building on a pretrained video diffusion model, we train a video ControlNet and inject it into the diffusion model as a condition on low-frame-rate, low-resolution videos. To train this video ControlNet effectively, we design space-time data augmentation as well as video-aware conditioning. Benefiting from these designs, VEnhancer is stable during training and can be trained in an elegant end-to-end manner. Extensive experiments show that VEnhancer surpasses existing state-of-the-art video super-resolution and space-time super-resolution methods in enhancing AI-generated videos. Moreover, with VEnhancer, the existing open-source state-of-the-art text-to-video method VideoCrafter-2 ranks first on the video generation benchmark VBench.
Submitted 10 July, 2024;
originally announced July 2024.
-
Highly Accelerated MRI via Implicit Neural Representation Guided Posterior Sampling of Diffusion Models
Authors:
Jiayue Chu,
Chenhe Du,
Xiyue Lin,
Yuyao Zhang,
Hongjiang Wei
Abstract:
Reconstructing high-fidelity magnetic resonance (MR) images from under-sampled k-space is a commonly used strategy to reduce scan time. Posterior sampling of diffusion models based on real measurement data holds significant promise for improving reconstruction accuracy. However, traditional posterior sampling methods often lack effective data-consistency guidance, leading to inaccurate and unstable reconstructions. Implicit neural representation (INR) has emerged as a powerful paradigm for solving inverse problems by modeling a signal's attributes as a continuous function of spatial coordinates. In this study, we present a novel posterior sampler for diffusion models using INR, named DiffINR. The INR-based component incorporates both the diffusion prior distribution and the MRI physical model to ensure high data fidelity. DiffINR demonstrates superior performance on experimental datasets with remarkable accuracy, even under high acceleration factors (up to R=12 in single-channel reconstruction). Notably, the proposed framework can be generalized to solve inverse problems in other medical imaging tasks.
Submitted 2 July, 2024;
originally announced July 2024.
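DiffINR's exact data-consistency mechanism is not described beyond the abstract. As a hedged illustration of the generic idea it builds on, the Python sketch below applies a standard hard data-consistency projection to a toy 1D signal: acquired k-space samples are overwritten with the measurements and unacquired ones are kept from the current estimate. It also computes the acceleration factor R as total over acquired samples, so R=12 corresponds to acquiring one sample in twelve. The function names, the naive O(n^2) DFT, and the 1D setting are assumptions for illustration, not part of DiffINR.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def acceleration_factor(mask):
    """R = number of k-space locations / number of acquired locations."""
    return len(mask) / sum(mask)


def data_consistency(x_est, y_meas, mask):
    """Hard data consistency: replace acquired k-space entries with measurements."""
    k_est = dft(x_est)
    k_dc = [y if m else k for k, y, m in zip(k_est, y_meas, mask)]
    return idft(k_dc)


# Toy example: 24 k-space locations, 2 acquired -> R = 12.
x_true = [float(i % 5) for i in range(24)]
mask = [1 if k in (0, 1) else 0 for k in range(24)]
y_meas = [v if m else 0 for v, m in zip(dft(x_true), mask)]
x_dc = data_consistency([0.0] * 24, y_meas, mask)
print(acceleration_factor(mask))  # 12.0
```

In a diffusion-based sampler, a projection like `data_consistency` is typically interleaved with denoising steps; DiffINR instead enforces fidelity through an INR, so this is only a reference point.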
-
Zero-Shot Image Denoising for High-Resolution Electron Microscopy
Authors:
Xuanyu Tian,
Zhuoya Dong,
Xiyue Lin,
Yue Gao,
Hongjiang Wei,
Yanhang Ma,
Jingyi Yu,
Yuyao Zhang
Abstract:
High-resolution electron microscopy (HREM) is a powerful imaging technique for directly visualizing a broad range of materials in real space. However, denoising HREM images is challenging due to the ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we propose a super-resolution (SR) based self-supervised training strategy that incorporates a Random Sub-sampler module. The Random Sub-sampler is designed to generate an approximately infinite number of noisy pairs from a single noisy image, serving as an effective data augmentation for zero-shot denoising. Noise2SR trains the network with paired noisy images of different resolutions, following an SR strategy. The SR-based training lets the network use more pixels for supervision, and the random sub-sampling compels the network to learn continuous signals, enhancing robustness. Meanwhile, we mitigate the uncertainty caused by random sampling by adopting minimum mean squared error (MMSE) estimation for the denoised results. With this distinctive integration of training strategy and proposed designs, Noise2SR achieves superior denoising performance using a single noisy HREM image. We evaluate Noise2SR on both simulated and real HREM denoising tasks. It outperforms state-of-the-art ZS-SSL methods and achieves denoising performance comparable to supervised methods. The success of Noise2SR suggests its potential for improving the SNR of images in material imaging domains.
Submitted 20 June, 2024;
originally announced June 2024.
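The abstract does not specify how the Random Sub-sampler picks pixels. A minimal sketch, assuming one pixel is drawn at random from each non-overlapping s x s block, shows how a single noisy image yields many (low-resolution input, full-resolution target) pairs, and how averaging predictions over several random draws approximates an MMSE estimate. `random_subsample`, `mmse_average`, and the nearest-neighbour `nearest_up` predictor (a stand-in for the trained SR network) are hypothetical names.

```python
import random


def random_subsample(img, s=2, rng=random):
    """Draw one random pixel from each non-overlapping s x s block (assumed design).

    Returns an (H//s) x (W//s) noisy sub-image; repeated calls give different
    inputs derived from the same single noisy image.
    """
    h, w = len(img), len(img[0])
    return [[img[bi * s + rng.randrange(s)][bj * s + rng.randrange(s)]
             for bj in range(w // s)]
            for bi in range(h // s)]


def nearest_up(sub, s=2):
    """Toy stand-in for the trained SR network: nearest-neighbour upsampling."""
    return [[sub[i // s][j // s] for j in range(len(sub[0]) * s)]
            for i in range(len(sub) * s)]


def mmse_average(predict, img, n_draws=8, s=2, seed=0):
    """Average predictions over several random sub-samplings (Monte Carlo MMSE).

    Assumes H and W are divisible by s so predictions match the input size.
    """
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    for _ in range(n_draws):
        pred = predict(random_subsample(img, s, rng))
        for i in range(h):
            for j in range(w):
                acc[i][j] += pred[i][j] / n_draws
    return acc


noisy = [[random.gauss(1.0, 0.3) for _ in range(8)] for _ in range(8)]
denoised = mmse_average(nearest_up, noisy, n_draws=16)
```

Averaging over draws removes the dependence of the output on any single random sub-sampling, which is the uncertainty the abstract says MMSE estimation is meant to mitigate.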
-
Accelerate Hybrid Model Predictive Control using Generalized Benders Decomposition
Authors:
Xuan Lin
Abstract:
Hybrid model predictive control (MPC) with both continuous and discrete variables is widely applicable to robotics tasks. Due to the combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we propose to accelerate hybrid MPC using Generalized Benders Decomposition (GBD). GBD enumerates cuts online and stores them in a finite buffer to provide warm starts for new problem instances. Leveraging the sparsity of feasibility cuts, we design a fast algorithm for the Benders master problems. We also propose to construct initial optimality cuts from heuristic solutions, allowing GBD to plan over longer time horizons. The proposed algorithm successfully controls a cart-pole system with randomly moving soft-contact walls at speeds 2-3 times faster than Gurobi, oftentimes exceeding 1000 Hz. It also guides a free-flying robot through a maze with a time horizon of 50, re-planning at 20 Hz. The code is available at https://github.com/XuanLin/Benders-MPC.
Submitted 2 June, 2024;
originally announced June 2024.
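To make the cut-buffer idea concrete, here is a hedged Python sketch of a finite buffer of Benders optimality cuts of the form theta >= alpha + beta^T y, where y is the vector of discrete decisions. The lower bound at a candidate y is the maximum over the stored cuts, and a brute-force search over a few binary candidates stands in for the master MIP. The class and function names, the FIFO eviction policy, and the omission of feasibility cuts are assumptions; the authors' released code is the authoritative implementation.

```python
from collections import deque


class CutBuffer:
    """Finite FIFO store of Benders optimality cuts: theta >= alpha + beta^T y."""

    def __init__(self, capacity):
        self.cuts = deque(maxlen=capacity)  # oldest cut is evicted when full

    def add(self, alpha, beta):
        self.cuts.append((float(alpha), tuple(beta)))

    def lower_bound(self, y):
        """Underestimate of the continuous subproblem cost at discrete decision y."""
        if not self.cuts:
            return float("-inf")
        return max(a + sum(b_i * y_i for b_i, y_i in zip(b, y)) for a, b in self.cuts)


def solve_master(buffer, candidates, discrete_cost):
    """Toy master problem: brute force over a few binary candidates instead of a MIP."""
    return min(candidates, key=lambda y: discrete_cost(y) + buffer.lower_bound(y))


# Cuts collected while solving earlier MPC instances warm-start the next one.
buf = CutBuffer(capacity=2)
buf.add(1.0, [2.0, 0.0])
buf.add(0.5, [0.0, 3.0])
buf.add(2.0, [0.0, 0.0])  # evicts the first cut
print(solve_master(buf, [(0, 0), (1, 1), (1, 0)], sum))  # (0, 0)
```

Keeping the buffer bounded caps the size of the master problem, at the cost of forgetting old cuts; the abstract does not say which eviction rule the authors use.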
-
System Identification for Lithium-Ion Batteries with Nonlinear Coupled Electro-Thermal Dynamics via Bayesian Optimization
Authors:
Hao Tu,
Xinfan Lin,
Yebin Wang,
Huazhen Fang
Abstract:
Accurate equivalent circuit models are essential to many practical applications of lithium-ion batteries. This paper presents a new coupled electro-thermal model for batteries and studies how to identify it from data. We consider the problem of maximum likelihood parameter estimation, which, however, is nontrivial to solve because the model is nonlinear in both its dynamics and its measurement model. We propose to leverage Bayesian optimization, owing to its machine-learning-driven capability to handle complex optimization problems and search for global optima. To improve parameter search efficiency, we dynamically narrow and refine the search space during Bayesian optimization. The proposed system identification approach can efficiently determine the parameters of the coupled electro-thermal model. It is amenable to practical implementation, with few requirements on the experiment, data types, and optimization setup, and is readily applicable to many other battery models.
Submitted 30 May, 2024;
originally announced May 2024.
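The search-space narrowing step can be sketched as follows, under stated assumptions: each parameter interval is shrunk by a fixed factor around the best point found so far and clipped to the current bounds. The Gaussian-process surrogate and acquisition function of real Bayesian optimization are deliberately replaced by uniform sampling inside the current box, so this shows only the refinement loop, not the authors' implementation. `narrow_bounds`, `refine_search`, and the `shrink` factor are hypothetical.

```python
import random


def narrow_bounds(bounds, best, shrink=0.5):
    """Shrink each [lo, hi] around the incumbent best point, clipped to the current box."""
    new_bounds = []
    for (lo, hi), x in zip(bounds, best):
        half = 0.5 * shrink * (hi - lo)
        new_bounds.append((max(lo, x - half), min(hi, x + half)))
    return new_bounds


def refine_search(objective, bounds, rounds=6, samples_per_round=30, shrink=0.5, seed=0):
    """Search loop with progressive narrowing of the box.

    Uniform sampling stands in for the surrogate + acquisition step of real
    Bayesian optimization; only the search-space refinement is illustrated.
    """
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(rounds):
        for _ in range(samples_per_round):
            x = [rng.uniform(lo, hi) for lo, hi in bounds]
            f = objective(x)
            if f < best_f:
                best_x, best_f = x, f
        bounds = narrow_bounds(bounds, best_x, shrink)
    return best_x, best_f


# Toy negative log-likelihood with its optimum at 3 (not a battery model).
x_hat, f_hat = refine_search(lambda p: (p[0] - 3.0) ** 2, bounds=[(0.0, 10.0)])
```

Narrowing trades global coverage for local resolution: if the incumbent is far from the true optimum when the box shrinks, the optimum can be cut out, so the shrink factor should stay conservative.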
-
Confocal structured illumination microscopy
Authors:
Weishuai Zhou,
Manhong Yao,
Xi Lin,
Quan Yu,
Junzheng Peng,
Jingang Zhong
Abstract:
Confocal microscopy, a critical advancement in optical imaging, is widely applied because of its excellent noise rejection. However, it has low imaging efficiency and can cause phototoxicity. Optical-sectioning structured illumination microscopy (OS-SIM) can overcome these limitations of confocal microscopy but still faces challenges in imaging depth and signal-to-noise ratio (SNR). We introduce the concept of confocal imaging into OS-SIM and propose confocal structured illumination microscopy (CSIM) to enhance the imaging performance of OS-SIM. CSIM exploits the principle of dual photography to reconstruct a dual image from each pixel of the camera. The reconstructed dual image is equivalent to the image obtained by using the spatial light modulator (SLM) as a virtual camera, enabling the separation of the conjugate and non-conjugate signals recorded by the camera pixel. Once the conjugate relationship between the camera and the SLM is established, we can reject the non-conjugate signals by extracting the conjugate signal from each dual image to reconstruct a confocal image. We have constructed the theoretical framework of CSIM. Optical-sectioning experiments demonstrate that CSIM can reconstruct images with higher SNR and greater imaging depth than existing OS-SIM. CSIM is expected to expand the application scope of OS-SIM.
Submitted 24 May, 2024;
originally announced May 2024.
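In dual photography, a light transport matrix T relates SLM pixels to camera pixels (camera image = T times SLM pattern); by Helmholtz reciprocity, the row of T belonging to one camera pixel is that pixel's dual image, i.e. what the SLM would record as a virtual camera. The hedged sketch below assumes T has already been measured, uses flattened pixel indices, and takes the camera-to-SLM conjugate mapping `conj` as given; it shows how keeping only the conjugate entry of each dual image rejects non-conjugate light, compared with a widefield image that sums the whole row. All names are illustrative, not CSIM's code.

```python
def dual_image(T, i):
    """Row i of the transport matrix T: how much light each SLM pixel sends to
    camera pixel i. By reciprocity this is the image an 'SLM camera' would
    record if camera pixel i were the light source (the dual image)."""
    return list(T[i])


def confocal_image(T, conj):
    """Keep only the conjugate entry of each dual image (non-conjugate light rejected)."""
    return [T[i][conj[i]] for i in range(len(T))]


def widefield_image(T):
    """All SLM pixels on: each camera pixel integrates conjugate + non-conjugate light."""
    return [sum(row) for row in T]


# 3 camera pixels x 3 SLM pixels; diagonal = in-focus (conjugate) light,
# off-diagonal = out-of-focus / scattered (non-conjugate) light.
T = [[5.0, 1.0, 1.0],
     [1.0, 4.0, 2.0],
     [0.0, 1.0, 6.0]]
conj = [0, 1, 2]  # assumed camera-to-SLM conjugate mapping
print(confocal_image(T, conj))  # [5.0, 4.0, 6.0]
print(widefield_image(T))       # [7.0, 7.0, 7.0]
```

The widefield image is flat because the background fills in the contrast, while the conjugate entries recover it; this is the separation the abstract attributes to CSIM.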
-
Perception- and Fidelity-aware Reduced-Reference Super-Resolution Image Quality Assessment
Authors:
Xinying Lin,
Xuyang Liu,
Hong Yang,
Xiaohai He,
Honggang Chen
Abstract:
With the advent of image super-resolution (SR) algorithms, evaluating the quality of generated SR images has become an urgent task. Although full-reference methods perform well in SR image quality assessment (SR-IQA), their reliance on high-resolution (HR) images limits their practical applicability. Making the most of the available reconstruction information, such as low-resolution (LR) images and scale factors, is a promising way to improve SR-IQA performance without an HR reference. In this letter, we evaluate the perceptual quality and reconstruction fidelity of SR images by taking LR images and scale factors into account. Specifically, we propose a novel dual-branch reduced-reference SR-IQA network, i.e., Perception- and Fidelity-aware SR-IQA (PFIQA). The perception-aware branch evaluates the perceptual quality of SR images by combining the global modeling of the Vision Transformer (ViT) with the local relation modeling of ResNet, and incorporates the scale factor to enable comprehensive visual perception. Meanwhile, the fidelity-aware branch assesses the reconstruction fidelity between LR and SR images through their visual perception. The combination of the two branches closely aligns with the human visual system, enabling comprehensive SR image evaluation. Experimental results indicate that PFIQA outperforms current state-of-the-art models on three widely used SR-IQA benchmarks. Notably, PFIQA excels in assessing the quality of real-world SR images.
Submitted 15 May, 2024;
originally announced May 2024.
-
Towards Real-world Video Face Restoration: A New Benchmark
Authors:
Ziyan Chen,
Jingwen He,
Xinqi Lin,
Yu Qiao,
Chao Dong
Abstract:
Blind face restoration (BFR) on images has progressed significantly over the last several years, while real-world video face restoration (VFR), which is more challenging because it involves more complex face motions such as changing gaze directions and facial orientations, remains unsolved. Typical BFR methods are evaluated on privately synthesized datasets or self-collected real-world low-quality face images, which offer limited coverage of real-world video frames. In this work, we introduce new real-world datasets named FOS, with a taxonomy of "Full, Occluded, and Side" faces drawn mainly from video frames, to study the applicability of current methods to videos. Compared with existing test datasets, the FOS datasets cover more diverse degradations and involve face samples from more complex scenarios, which helps to revisit current face restoration approaches more comprehensively. Given the established datasets, we benchmark both state-of-the-art BFR methods and video super-resolution (VSR) methods to comprehensively study current approaches, identifying their potential and limitations in VFR tasks. In addition, we study the effectiveness of commonly used image quality assessment (IQA) metrics and face IQA (FIQA) metrics by means of a subjective user study. With extensive experimental results and detailed analysis, we gain insights from the successes and failures of both current BFR and VSR methods. These results also pose challenges to current face restoration approaches, which we hope will stimulate future advances in VFR research.
Submitted 4 May, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction
Authors:
Chenhe Du,
Xiyue Lin,
Qing Wu,
Xuanyu Tian,
Ying Su,
Zhe Luo,
Hongjiang Wei,
S. Kevin Zhou,
Jingyi Yu,
Yuyao Zhang
Abstract:
Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, which results in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT reconstruction tasks. However, the unsupervised nature of the INR architecture imposes limited constraints on the solution space, particularly for the highly ill-posed reconstruction tasks posed by LACT and ultra-SVCT. In this study, we introduce the Diffusion Prior Driven Neural Representation (DPER), an advanced unsupervised framework designed to address exceptionally ill-posed CT reconstruction inverse problems. DPER adopts the Half Quadratic Splitting (HQS) algorithm to decompose the inverse problem into data fidelity and distribution prior sub-problems. The two sub-problems are addressed by an INR reconstruction scheme and a pre-trained score-based diffusion model, respectively. This combination first preserves the implicit image local consistency prior from INR. Additionally, it effectively augments the feasibility of the solution space for the inverse problem through the generative diffusion model, resulting in more stable and precise solutions. We conduct comprehensive experiments to evaluate DPER on LACT and ultra-SVCT reconstruction with two public datasets (AAPM and LIDC). The results show that our method outperforms state-of-the-art reconstruction methods on in-domain datasets, while achieving significant performance improvements on out-of-domain datasets.
Submitted 27 April, 2024;
originally announced April 2024.
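HQS introduces an auxiliary variable z and alternates between a data-fidelity sub-problem, argmin_x 0.5||Ax - y||^2 + (mu/2)||x - z||^2, and a prior sub-problem that updates z from x. In DPER these are handled by an INR and a pretrained score-based diffusion model; the minimal Python sketch below only illustrates the alternation, under two loudly stated simplifications: A is a diagonal sampling mask (so the x-update has a closed form per entry) and a 3-tap moving average stands in for the learned prior. `hqs`, `smooth3`, and the fixed `mu` are hypothetical choices, not DPER's settings.

```python
def smooth3(x):
    """3-tap moving average with edge replication (toy stand-in for a learned prior)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def hqs(y, a, denoise, mu=1.0, iters=200):
    """Half Quadratic Splitting for min_x 0.5*||A x - y||^2 + prior(x), with A = diag(a).

    x-step: argmin_x 0.5*||A x - y||^2 + (mu/2)*||x - z||^2  (closed form, A diagonal)
    z-step: z = denoise(x)                                   (prior sub-problem)
    """
    z = list(y)  # initialise with the zero-filled measurement
    x = list(y)
    for _ in range(iters):
        x = [(ai * yi + mu * zi) / (ai * ai + mu) for ai, yi, zi in zip(a, y, z)]
        z = denoise(x)
    return x


# Toy "sparse-view" problem: a ramp observed at every third sample only.
n = 30
x_true = [i / (n - 1) for i in range(n)]
a = [1.0 if i % 3 == 0 else 0.0 for i in range(n)]
y = [ai * xi for ai, xi in zip(a, x_true)]
x_rec = hqs(y, a, smooth3)
```

At unmeasured entries (a_i = 0) the x-step simply copies z, so all information there comes from the prior step; this is why a stronger prior, such as the diffusion model in DPER, matters most for severely under-sampled data.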
-
HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression
Authors:
Lei Lu,
Yanyue Xie,
Wei Jiang,
Wei Wang,
Xue Lin,
Yanzhi Wang
Abstract:
This paper investigates the challenging problem of learned image compression (LIC) at extremely low bitrates. Previous LIC methods based on transmitting quantized continuous features often yield blurry and noisy reconstructions due to severe quantization loss, while previous LIC methods based on learned codebooks that discretize the visual space usually give poor-fidelity reconstructions because a limited set of codewords lacks the representation power to capture faithful details. We propose a novel dual-stream framework, HybridFlow, which combines continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity at extremely low bitrates. The codebook-based stream benefits from high-quality learned codebook priors to provide high quality and clarity in reconstructed images. The continuous-feature stream aims to maintain fidelity details. To achieve ultra-low bitrates, we further propose a masked token-based transformer, with which we transmit only a masked portion of the codeword indices and recover the missing indices through token generation guided by information from the continuous-feature stream. We also develop a bridging correction network to merge the two streams in pixel decoding for final image reconstruction, where the continuous-stream features rectify biases of the codebook-based pixel decoder to restore faithful details. Experimental results demonstrate superior performance across several datasets at extremely low bitrates compared with existing single-stream codebook-based or continuous-feature-based LIC methods.
Submitted 20 April, 2024;
originally announced April 2024.
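A back-of-the-envelope calculation shows why transmitting only part of the codeword indices matters at extreme low bitrates. The sketch assumes each sent index costs log2(codebook size) bits without entropy coding, that mask positions can be regenerated at the decoder (e.g., from a shared seed) at no cost, and it ignores the bits spent on the continuous-feature stream; the token grid, codebook size, and keep ratio are hypothetical values, not HybridFlow's configuration.

```python
import math


def index_bpp(h_tokens, w_tokens, codebook_size, keep_ratio, height, width):
    """Bits per pixel spent on codeword indices when only a fraction is transmitted.

    Assumes fixed-length codes (log2 of the codebook size per index), no entropy
    coding, and mask positions the decoder can regenerate at no cost.
    """
    n_sent = math.ceil(keep_ratio * h_tokens * w_tokens)
    return n_sent * math.log2(codebook_size) / (height * width)


full = index_bpp(32, 32, 1024, 1.0, 512, 512)    # every index sent
masked = index_bpp(32, 32, 1024, 0.25, 512, 512)  # only 25% of indices sent
print(full, masked)  # 0.0390625 0.009765625
```

For a 512x512 image with a 32x32 token grid and a 1024-entry codebook, keeping 25% of the indices costs 256 x 10 bits over 262144 pixels, about 0.0098 bpp; the transformer's job is to regenerate the other 75% at the decoder.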
-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low-resolution and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks: the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered, and the ranking was calculated as a weighted sum of the scores of all the sub-tracks. In sub-track 1, the practical runtime of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered, and the score calculated from it was used to determine the ranking. In sub-track 3, the number of parameters was considered, and the score calculated from it was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. These submissions gauge the state of the art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained models of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
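Since validity in this challenge hinges on a PSNR target, here is a hedged reference implementation of PSNR, 10 log10(MAX^2 / MSE), over flattened pixel lists, plus a threshold check against the 26.90 dB validation figure. The challenge's exact evaluation protocol (color space, border cropping, rounding) is not given in the abstract, so those details are omitted, and the `tolerance_db` parameter is a hypothetical knob rather than a rule of the challenge.

```python
import math


def psnr(ref, out, max_val=255.0):
    """PSNR in dB between two equally sized, flattened images."""
    mse = sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)


def meets_fidelity(psnr_db, target_db=26.90, tolerance_db=0.0):
    """Check a submission's average PSNR against the validation target.

    tolerance_db is a hypothetical knob, not a documented challenge rule.
    """
    return psnr_db >= target_db - tolerance_db


print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))  # 48.13
```

Because PSNR is only a gate here, the ranking itself is driven by runtime, FLOPs, and parameter count relative to the RLFN baseline.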
-
Improving Cancer Imaging Diagnosis with Bayesian Networks and Deep Learning: A Bayesian Deep Learning Approach
Authors:
Pei Xi Lin
Abstract:
With recent advances in artificial intelligence applications built on machine learning theories and algorithms, many accurate models can be created to train on and make predictions for given datasets. Recognizing the importance of image interpretation in cancer diagnosis, this article investigates the theory behind Deep Learning and Bayesian Network prediction models. Based on the advantages and drawbacks of each model, different approaches will be used to construct a Bayesian Deep Learning Model that combines their strengths while minimizing their weaknesses. Finally, the applications and accuracy of the resulting Bayesian Deep Learning approach for image classification in the health industry will be analyzed.
Submitted 27 March, 2024;
originally announced March 2024.
-
Spikewhisper: Temporal Spike Backdoor Attacks on Federated Neuromorphic Learning over Low-power Devices
Authors:
Hanqing Fu,
Gaolei Li,
Jun Wu,
Jianhua Li,
Xi Lin,
Kai Zhou,
Yuchen Liu
Abstract:
Federated neuromorphic learning (FedNL) leverages event-driven spiking neural networks and federated learning frameworks to effectively execute intelligent analysis tasks across large numbers of distributed low-power devices, but it is also vulnerable to poisoning attacks. The threat of backdoor attacks on traditional deep neural networks typically comes from time-invariant data. However, in FedNL, unknown threats may be hidden in time-varying spike signals. In this paper, we explore a novel vulnerability of FedNL-based systems based on the concept of time-division multiplexing, termed Spikewhisper, which allows attackers to evade detection as much as possible, since multiple malicious clients can imperceptibly poison the model with different triggers at different timeslices. In particular, the stealthiness of Spikewhisper derives from the time-domain divisibility of global triggers: each malicious client pastes only one local trigger into a certain timeslice of the neuromorphic sample, and attackers can also configure the polarity and motion of each local trigger. Extensive experiments on two different neuromorphic datasets demonstrate that the attack success rate of Spikewhisper is higher than that of temporally centralized attacks. Moreover, the results show that the effect of Spikewhisper is sensitive to the trigger duration.
Submitted 27 March, 2024;
originally announced March 2024.
-
Towards a MATLAB Toolbox to compute backstepping kernels using the power series method
Authors:
Xin Lin,
Rafael Vazquez,
Miroslav Krstic
Abstract:
In this paper, we extend our previous work on the power series method for computing backstepping kernels. Our first contribution is the development of initial steps towards a MATLAB toolbox dedicated to backstepping kernel computation. This toolbox would exploit MATLAB's linear algebra and sparse matrix manipulation features for enhanced efficiency; our initial findings show considerable improvements in computational speed compared with symbolic software, without loss of precision at high orders. Additionally, we tackle limitations observed in our earlier work, such as slow convergence (due to oscillatory behavior) and non-converging series (due to loss of analyticity at some singular points). To overcome these challenges, we introduce a technique, denoted localized power series, that mitigates this behavior by computing the expansion at different points. This approach effectively navigates around singularities and can also accelerate convergence by using more local approximations. Basic examples are provided to demonstrate these enhancements. Although this research is still ongoing, the significant potential and simplicity of the method already establish the power series approach as a viable and versatile solution for solving backstepping kernel equations, benefiting both novice and experienced practitioners in the field. We anticipate that these developments will be particularly beneficial in training the recently introduced neural operators that approximate backstepping kernels and gains.
Submitted 24 March, 2024;
originally announced March 2024.
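The localized power series idea can be illustrated with a scalar toy function rather than the kernel equations themselves. The function f(x) = 1/(1 - x) has a singularity at x = 1, so its expansion about a center c, the sum over n of (x - c)^n / (1 - c)^(n+1), converges only for |x - c| < |1 - c|. Re-centering closer to the evaluation point both speeds convergence and reaches points that the expansion about 0 cannot. This is a hedged sketch of the mechanism only; `series_about` is a hypothetical name and nothing here reflects the toolbox's API.

```python
def series_about(x, c, order):
    """Truncated power series of f(x) = 1 / (1 - x) expanded about center c (c != 1).

    f(x) = sum_{n>=0} (x - c)^n / (1 - c)^(n + 1), valid only for |x - c| < |1 - c|,
    because f is singular (not analytic) at x = 1.
    """
    return sum((x - c) ** n / (1 - c) ** (n + 1) for n in range(order + 1))


exact = 1 / (1 - 0.9)                                 # approximately 10
err_global = abs(series_about(0.9, 0.0, 10) - exact)  # about 3.1: slow convergence
err_local = abs(series_about(0.9, 0.8, 10) - exact)   # about 0.005: local center
# x = -1.5 lies outside the radius about 0 (which is 1) but inside the radius about -1 (2).
val = series_about(-1.5, -1.0, 30)                    # close to 1 / 2.5 = 0.4
```

With the same truncation order, the local expansion is roughly three orders of magnitude more accurate, because the ratio |x - c| / |1 - c| drops from 0.9 to 0.5.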
-
Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations
Authors:
Xun Lin,
Yi Yu,
Song Xia,
Jue Jiang,
Haoran Wang,
Zitong Yu,
Yizhong Liu,
Ying Fu,
Shuai Wang,
Wenzhong Tang,
Alex Kot
Abstract:
The widespread availability of publicly accessible medical images has significantly propelled advances in various research and clinical fields. Nonetheless, concerns regarding unauthorized training of AI systems for commercial purposes and the duty to protect patient privacy have made numerous institutions hesitant to share their images. This is particularly true for medical image segmentation (MIS) datasets, whose collection and fine-grained annotation are time-intensive and laborious. Recently, Unlearnable Examples (UEs) methods have shown the potential to protect images by adding invisible shortcuts. These shortcuts can prevent unauthorized deep neural networks from generalizing. However, existing UEs are designed for natural image classification and fail to protect MIS datasets imperceptibly, as their protective perturbations are less learnable than important prior knowledge in MIS, e.g., contour and texture features. To this end, we propose an Unlearnable Medical image generation method, termed UMed. UMed integrates the prior knowledge of MIS by injecting contour- and texture-aware perturbations to protect images. Because our target is to poison only the features critical to MIS, UMed requires only minimal perturbations within the region of interest (ROI) and its contour to achieve greater imperceptibility (average PSNR of 50.03 dB) and protective performance (clean average DSC degrades from 82.18% to 6.80%).
Submitted 21 March, 2024;
originally announced March 2024.
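The Dice similarity coefficient (DSC) quoted in the UMed abstract measures the overlap between a predicted and a reference segmentation, 2|P ∩ G| / (|P| + |G|), so a fall from 82.18% to 6.80% indicates that segmentation quality collapses. A minimal sketch over flattened binary masks; the function name and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice(pred, target):
    """Dice similarity coefficient 2|P ∩ G| / (|P| + |G|) for flat 0/1 masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total  # two empty masks: perfect match


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```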
-
Towards Energy Efficient RAN: From Industry Standards to Trending Practice
Authors:
Lopamudra Kundu,
Xingqin Lin,
Rajesh Gadiyar
Abstract:
As 5G deployments continue throughout the world, concerns regarding its energy consumption have gained significant traction. This article focuses on radio access networks (RANs), which account for a major portion of network energy use. First, we introduce the state-of-the-art 3GPP and O-RAN standardization work on enhancing RAN energy efficiency. Then we highlight three unique ways of enabling energy optimization in telecommunication networks: full-stack acceleration, network function consolidation, and shared infrastructure between communication and artificial intelligence. These network design strategies not only allow for a considerable overall reduction in energy footprint but also deliver several added benefits, including improved throughput, reduced cost of ownership, and increased revenue opportunities for telcos.
Submitted 19 February, 2024;
originally announced February 2024.
-
Optimal Control of a Stochastic Power System -- Algorithms and Mathematical Analysis
Authors:
Zhen Wang,
Kaihua Xi,
Aijie Cheng,
Hai Xiang Lin,
Jan H. van Schuppen
Abstract:
The considered optimal control problem of a stochastic power system is to select the set of power supply vectors that infimizes the probability that the phase-angle differences of any power flow of the network endanger the transient stability of the power system by leaving a critical subset. The set of control laws is restricted to a periodically recomputed set of fixed power supply vectors based on predictions of power demand for the next short horizon. Neither state feedback nor output feedback is used. The associated control objective function is Lipschitz continuous, nondifferentiable, and nonconvex. The results of the paper show that a minimum exists in the value range of the control objective function. Furthermore, the paper provides a two-step procedure to compute an approximate minimizer based on two key methods: (1) a projected generalized subgradient method for computing an initial vector, and (2) a steepest descent method for approximating a local minimizer. Finally, it includes two convergence theorems stating that an approximation sequence converges to a local minimum.
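The first step of the two-step procedure can be illustrated with a generic projected subgradient loop. This is a toy sketch under stated assumptions, not the authors' algorithm. It replaces the paper's nonconvex probability objective with a hypothetical convex "worst absolute residual" function. It uses a box as the feasible set, so projection reduces to clipping. It also omits the second (steepest descent) step. What it shows is the mechanics: a subgradient step, a projection, and tracking the best iterate.

```python
def residuals(p, rows, b):
    return [sum(a * x for a, x in zip(row, p)) - bi for row, bi in zip(rows, b)]


def objective(p, rows, b):
    # Hypothetical nonsmooth toy objective: worst absolute residual (convex,
    # unlike the paper's nonconvex objective).
    return max(abs(r) for r in residuals(p, rows, b))


def subgradient(p, rows, b):
    r = residuals(p, rows, b)
    i = max(range(len(r)), key=lambda k: abs(r[k]))  # an active (worst) term
    sign = 1.0 if r[i] >= 0 else -1.0
    return [sign * a for a in rows[i]]


def project_box(p, lo, hi):
    # Euclidean projection onto the box lo <= p <= hi is componentwise clipping.
    return [min(max(x, l), h) for x, l, h in zip(p, lo, hi)]


def projected_subgradient(p0, rows, b, lo, hi, iters=500):
    p = project_box(p0, lo, hi)
    best, best_val = p, objective(p, rows, b)
    for k in range(1, iters + 1):
        g = subgradient(p, rows, b)
        step = 1.0 / k  # diminishing, non-summable step sizes
        p = project_box([x - step * gi for x, gi in zip(p, g)], lo, hi)
        val = objective(p, rows, b)
        if val < best_val:  # subgradient steps are not monotone, so keep the best
            best, best_val = p, val
    return best, best_val


# Hypothetical data: the minimizer is p = (1, 1), with objective value 0.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
b = [1.0, 1.0, 0.0]
p_best, f_best = projected_subgradient([0.0, 0.0], rows, b, lo=[0.0, 0.0], hi=[2.0, 2.0])
print(p_best, f_best)
```

Because a subgradient step can increase the objective, the loop returns the best point seen rather than the last one.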
Submitted 30 January, 2024;
originally announced January 2024.
-
The Bridge Toward 6G: 5G-Advanced Evolution in 3GPP Release 19
Authors:
Xingqin Lin
Abstract:
The 3rd generation partnership project (3GPP) initiated 5G-Advanced in Release 18, laying a solid foundation for the further evolution of 5G-Advanced. Release 19, the next wave of 5G-Advanced, will primarily focus on commercial deployment needs while serving as a bridge toward 6G. In this article, we provide an in-depth overview of the 5G-Advanced evolution in 3GPP Release 19. We not only delve into the key technology components and their corresponding use cases in 5G-Advanced evolution but also shed light on initial 3GPP studies toward 6G.
Submitted 23 December, 2023;
originally announced December 2023.
-
Deep Learning for Joint Design of Pilot, Channel Feedback, and Hybrid Beamforming in FDD Massive MIMO-OFDM Systems
Authors:
Junyi Yang,
Weifeng Zhu,
Shu Sun,
Xiaofeng Li,
Xingqin Lin,
Meixia Tao
Abstract:
This letter considers the transceiver design in frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems for high-quality data transmission. We propose a novel deep learning based framework where the procedures of pilot design, channel feedback, and hybrid beamforming are realized by carefully crafted deep neural networks. All the considered modules are jointly learned in an end-to-end manner, and a graph neural network is adopted to effectively capture interactions between beamformers based on the built graphical representation. Numerical results validate the effectiveness of our method.
Submitted 10 December, 2023;
originally announced December 2023.
-
Estimation Sample Complexity of a Class of Nonlinear Continuous-time Systems
Authors:
Simon Kuang,
Xinfan Lin
Abstract:
We present a method of parameter estimation for a large class of nonlinear systems, namely those in which the state consists of output derivatives and the flow is linear in the parameter. The method, which solves for the unknown parameter by directly inverting the dynamics using regularized linear regression, is based on new design and analysis ideas for differentiation filtering and regularized least squares. Combined in series, they yield a novel finite-sample bound on the mean absolute error of estimation.
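Here is a minimal sketch of inverting dynamics that are linear in the parameter. It assumes the hypothetical scalar toy system dy/dt = θ·φ(y) with φ(y) = -y³, chosen because it has a closed-form solution. The sketch swaps the paper's differentiation filter for a plain central difference and then applies scalar ridge (regularized least-squares) regression. It illustrates only the structure of the estimator and says nothing about the finite-sample bound.

```python
import math


def central_difference(y, dt):
    """Derivative estimate at interior samples (stand-in for a differentiation filter)."""
    return [(y[k + 1] - y[k - 1]) / (2.0 * dt) for k in range(1, len(y) - 1)]


def ridge_scalar(features, targets, lam):
    """Minimize sum (target - theta * feature)^2 + lam * theta^2 over theta."""
    num = sum(f * d for f, d in zip(features, targets))
    den = sum(f * f for f in features) + lam
    return num / den


# Hypothetical toy system dy/dt = theta * phi(y), phi(y) = -y**3; from y(0) = 1
# its solution is y(t) = 1 / sqrt(1 + 2 * theta * t).
theta_true = 2.0
dt = 1e-3
times = [k * dt for k in range(2001)]
y = [1.0 / math.sqrt(1.0 + 2.0 * theta_true * t) for t in times]

dy = central_difference(y, dt)
phi = [-(yk ** 3) for yk in y[1:-1]]  # features aligned with the interior samples
theta_hat = ridge_scalar(phi, dy, lam=1e-6)
print(theta_hat)
```

With noise-free samples the estimate is close to θ = 2. With noisy outputs, the derivative step is where a proper filter matters, and the regularization weight trades bias against variance.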
Submitted 22 April, 2024; v1 submitted 8 December, 2023;
originally announced December 2023.
-
Human Body Model based ID using Shape and Pose Parameters
Authors:
Aravind Sundaresan,
Brian Burns,
Indranil Sur,
Yi Yao,
Xiao Lin,
Sujeong Kim
Abstract:
We present a Human Body model based IDentification (HMID) system that is jointly trained for shape, pose and biometric identification. HMID is based on the Human Mesh Recovery (HMR) network, and we propose additional losses to improve and stabilize shape estimation and biometric identification while maintaining the pose and shape output. We show that when our HMID network is trained using additional shape and pose losses, it achieves a significant improvement in biometric identification performance compared to an identical model that does not use such losses. The HMID model uses raw images instead of silhouettes and is able to perform robust recognition on images collected at range and altitude, because many anthropometric properties are reasonably invariant to clothing, view and range. We show results on the USF dataset as well as the BRIAR dataset, which includes probes with both clothing and view changes. Our approach (using body model losses) shows a significant improvement in Rank20 accuracy and True Accuracy Rate on the BRIAR evaluation dataset.
Submitted 5 December, 2023;
originally announced December 2023.
-
Speech Understanding on Tiny Devices with A Learning Cache
Authors:
Afsara Benazir,
Zhiming Xu,
Felix Xiaozhu Lin
Abstract:
This paper addresses spoken language understanding (SLU) on microcontroller-like embedded devices, integrating on-device execution with cloud offloading in a novel fashion. We leverage temporal locality in the speech inputs to a device and reuse recent SLU inferences accordingly. Our idea is simple: let the device match incoming inputs against cached results, and offload to the cloud for full inference only those inputs that match no cached entry. Realizing this idea, however, is non-trivial: the device needs to compare acoustic features in a robust yet low-cost way. To this end, we present SpeechCache (or SC), a speech cache for tiny devices. It matches speech inputs at two levels of representation: first as sequences of clustered raw sound units, then as sequences of phonemes. Working in tandem, the two representations offer complementary tradeoffs between cost and efficiency. To boost accuracy even further, our cache learns to personalize: using the mismatched and then offloaded inputs, it continuously finetunes the device's feature extractors with the assistance of the cloud. We implement SC on an off-the-shelf STM32 microcontroller. The complete implementation has a small memory footprint of 2 MB. Evaluated on challenging speech benchmarks, our system resolves 45%-90% of inputs on device, reducing the average latency by up to 80% compared to offloading to popular cloud speech recognition services. The benefit brought by SC is notable even in adversarial settings: noisy environments, a cold cache, or one device shared by a number of users.
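The match-then-offload idea can be sketched as a two-level lookup. Everything concrete here is an assumption, not SC's design. Inputs arrive as already-extracted sequences of clustered sound units and phonemes. Similarity is normalized Levenshtein distance, and the thresholds are hypothetical. The personalization (finetuning) step is omitted. The class name `SpeechCacheSketch` and the `offload` callback are illustrative only.

```python
from dataclasses import dataclass, field


def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


@dataclass
class SpeechCacheSketch:
    unit_threshold: float = 0.25     # hypothetical threshold for sound-unit matches
    phoneme_threshold: float = 0.3   # hypothetical threshold for phoneme matches
    entries: list = field(default_factory=list)  # (units, phonemes, result)

    def lookup(self, units, phonemes):
        # Level 1: cheaper comparison on clustered sound-unit sequences.
        for cached_units, _, result in self.entries:
            if normalized_distance(units, cached_units) <= self.unit_threshold:
                return result
        # Level 2: fall back to comparing phoneme sequences.
        for _, cached_phonemes, result in self.entries:
            if normalized_distance(phonemes, cached_phonemes) <= self.phoneme_threshold:
                return result
        return None

    def query(self, units, phonemes, offload):
        """Return (result, where): resolved on 'device' or via 'cloud' offload."""
        result = self.lookup(units, phonemes)
        if result is not None:
            return result, "device"
        result = offload(units, phonemes)  # cache miss: full inference in the cloud
        self.entries.append((units, phonemes, result))
        return result, "cloud"
```

A miss pays the cloud round trip once. Later inputs that are close enough at either level are then answered locally.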
Submitted 8 May, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Efficient Deep Speech Understanding at the Edge
Authors:
Rongxiang Wang,
Felix Xiaozhu Lin
Abstract:
In contemporary speech understanding (SU), a sophisticated pipeline ingests streaming voice input. The pipeline executes beam search iteratively, invoking a deep neural network to generate tentative outputs (referred to as hypotheses) in an autoregressive manner. Periodically, the pipeline assesses attention and Connectionist Temporal Classification (CTC) scores.
This paper aims to enhance SU performance on edge devices with limited resources. Adopting a hybrid strategy, our approach focuses on accelerating on-device execution and offloading inputs that surpass the device's capacity. While this hybrid approach is well established, we tackle SU's distinctive challenges through three techniques: (1) Late Contextualization, which executes a model's attentive encoder in parallel with input ingestion. (2) Pilot Inference, which mitigates temporal load imbalances in the SU pipeline. (3) Autoregression Offramps, which make offloading decisions based solely on hypotheses.
These techniques are designed to seamlessly integrate with existing speech models, pipelines, and frameworks, offering flexibility for independent or combined application. Collectively, they form a hybrid solution for edge SU. Our prototype, named XYZ, has undergone testing on Arm platforms featuring 6 to 8 cores, demonstrating state-of-the-art accuracy. Notably, it achieves a 2x reduction in end-to-end latency and a corresponding 2x decrease in offloading requirements.
Submitted 4 December, 2023; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Control of the Power Flows of a Stochastic Power System
Authors:
Zhen Wang,
Kaihua Xi,
Aijie Cheng,
Hai Xiang Lin,
Jan H. van Schuppen
Abstract:
How can the power supply of a power system be determined so as to guarantee that the state remains in a critical subset of the state set during a short horizon? The critical subset is related to the power flows of all power lines of a power system and to transient stability. The control objective is to minimize a cost function, defined as the maximal power flow over all power lines, including a multiple of its standard deviation, as a function of the power supply vector. Numerical results on three academic examples (an eight-node academic network, a twelve-node ring network, and a Manhattan-grid network) show that the controlled system has improved performance.
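As a hedged illustration of the cost function described above, the sketch below assumes a linear (DC-flow-like) model. In it, line flows equal a hypothetical sensitivity matrix `H` times the net injection (fixed supply minus Gaussian demand). Under this linear assumption the standard deviation of each flow does not depend on the supply vector; that need not hold in the paper's model. The names and numbers are illustrative only.

```python
import math


def flow_cost(supply, demand_mean, demand_cov, H, k=2.0):
    """max over lines of |mean flow| + k * std(flow), assuming flows = H (supply - demand)."""
    net_mean = [s - d for s, d in zip(supply, demand_mean)]
    n = len(net_mean)
    worst = -math.inf
    for row in H:
        mean = sum(h * x for h, x in zip(row, net_mean))
        # Supply is deterministic, so Var(flow) = row^T Cov(demand) row.
        var = sum(row[i] * demand_cov[i][j] * row[j] for i in range(n) for j in range(n))
        worst = max(worst, abs(mean) + k * math.sqrt(var))
    return worst


H = [[1.0, -1.0]]                       # hypothetical: one line, two buses
demand_mean = [1.0, 1.0]
demand_cov = [[0.04, 0.0], [0.0, 0.04]]
cost_unbalanced = flow_cost([2.0, 0.0], demand_mean, demand_cov, H)
cost_balanced = flow_cost([1.0, 1.0], demand_mean, demand_cov, H)
print(cost_unbalanced, cost_balanced)
```

Matching supply to expected demand at each bus drives the mean flow to zero. Only the k-sigma term then remains, which is the part a supply choice cannot remove in this linear toy model.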
Submitted 27 November, 2023;
originally announced November 2023.
-
Performance Analysis of Integrated Data and Energy Transfer Assisted by Fluid Antenna Systems
Authors:
Xiao Lin,
Halvin Yang,
Yizhe Zhao,
Jie Hu,
Kai-Kit Wong
Abstract:
Fluid antenna multiple access (FAMA) is capable of exploiting the high spatial diversity of wireless channels to mitigate multi-user interference via flexible port switching, which achieves better performance than traditional multiple-input multiple-output (MIMO) systems. Moreover, integrated data and energy transfer (IDET) is able to provide both wireless data transfer (WDT) and wireless energy transfer (WET) services to low-power devices. In this paper, a FAMA-assisted IDET system is studied, where $N$ access points (APs) provide dedicated IDET services to $N$ user equipments (UEs). Each UE is equipped with a single fluid antenna. The performance of WDT and WET, i.e., the WDT outage probability, the WET outage probability, the reliable throughput, and the average energy harvesting amount, is analysed theoretically by using time switching (TS) between WDT and WET. Numerical results validate our theoretical analysis, which reveals that the number of UEs and the TS ratio should be optimized to achieve a trade-off between WDT and WET performance. Moreover, FAMA-assisted IDET achieves better performance in terms of both WDT and WET than traditional MIMO with the same antenna size.
Submitted 7 February, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Online Relocating and Matching of Ride-Hailing Services: A Model-Based Modular Approach
Authors:
Chang Gao,
Xi Lin,
Fang He,
Xindi Tang
Abstract:
This study proposes an innovative model-based modular approach (MMA) to dynamically optimize order matching and vehicle relocation in a ride-hailing platform. MMA utilizes a two-layer, modular modeling structure. The upper layer determines the spatial transfer patterns of vehicle flow within the system to maximize the total revenue of the current and future stages. With the guidance provided by the upper layer, the lower layer performs rapid vehicle-to-order matching and vehicle relocation. MMA is interpretable and is equipped with a customized polynomial-time algorithm, which, as an online order-matching and vehicle-relocation algorithm, can scale past thousands of vehicles. We theoretically prove that the proposed algorithm can achieve the global optimum in stylized networks, while numerical experiments based on both a toy network and a realistic dataset demonstrate that MMA achieves superior systematic performance compared to batch matching and reinforcement-learning-based methods. Moreover, its modular and lightweight modeling structure enables it to achieve a high level of robustness against demand variation while maintaining a relatively low computational cost.
Submitted 13 October, 2023;
originally announced October 2023.
-
Generalized Benders Decomposition with Continual Learning for Hybrid Model Predictive Control in Dynamic Environment
Authors:
Xuan Lin
Abstract:
Hybrid model predictive control (MPC) with both continuous and discrete variables is widely applicable to robotic control tasks, especially those involving contact with the environment. Due to the combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we propose a hybrid MPC solver based on Generalized Benders Decomposition (GBD) with continual learning. The algorithm accumulates cutting planes from the invariant dual space of the subproblems. After a short cold-start phase, the accumulated cuts provide warm-starts for new problem instances to increase the solving speed. The solving speed is maintained even when the environment changes randomly in ways the controller is unprepared for. We verify our solver on controlling a cart-pole system with randomly moving soft contact walls and show that it solves 2-3 times faster than the off-the-shelf solver Gurobi.
Submitted 10 October, 2023; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Leveraging Untrustworthy Commands for Multi-Robot Coordination in Unpredictable Environments: A Bandit Submodular Maximization Approach
Authors:
Zirui Xu,
Xiaofeng Lin,
Vasileios Tzoumas
Abstract:
We study the problem of multi-agent coordination in unpredictable and partially-observable environments with untrustworthy external commands. The commands are actions suggested to the robots, and they are untrustworthy in that their performance guarantees, if any, are unknown. Such commands may be generated by human operators or machine learning algorithms and, although untrustworthy, can often increase the robots' performance in complex multi-robot tasks. We are motivated by complex multi-robot tasks such as target tracking, environmental mapping, and area monitoring. Such tasks are often modeled as submodular maximization problems due to the information overlap among the robots. We provide an algorithm, Meta Bandit Sequential Greedy (MetaBSG), which enjoys performance guarantees even when the external commands are arbitrarily bad. MetaBSG leverages a meta-algorithm to learn whether the robots should follow the commands or a recently developed submodular coordination algorithm, Bandit Sequential Greedy (BSG) [1], which has performance guarantees even in unpredictable and partially-observable environments. In particular, MetaBSG can asymptotically achieve the better of the performances of the commands and of the BSG algorithm, with its suboptimality quantified against the optimal time-varying multi-robot actions in hindsight. Thus, MetaBSG can be interpreted as robustifying the untrustworthy commands. We validate our algorithm in simulated scenarios of multi-target tracking.
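The abstract does not name the meta-algorithm MetaBSG uses to decide between the commands and BSG. EXP3 is a standard choice for learning which of several options to follow under bandit feedback. The sketch below assumes EXP3 over two arms (arm 0: follow the commands; arm 1: follow BSG), with rewards in [0, 1] and hypothetical constant rewards. It illustrates that assumed building block, not MetaBSG itself.

```python
import math
import random


def exp3(reward_fn, n_arms, horizon, gamma=0.1, seed=0):
    """EXP3 with exploration rate gamma; reward_fn(arm, t) must lie in [0, 1].

    Returns how many times each arm was played.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    counts = [0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        probs = [(1.0 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        counts[arm] += 1
        reward = reward_fn(arm, t)
        # Importance weighting makes the reward estimate unbiased under bandit feedback.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale to avoid overflow; the probabilities depend only on weight ratios.
        top = max(weights)
        weights = [w / top for w in weights]
    return counts


# Hypothetical rewards: following the commands (arm 0) scores 0.2 per round,
# following BSG (arm 1) scores 0.8 per round.
FOLLOW_COMMANDS, FOLLOW_BSG = 0, 1
counts = exp3(lambda arm, t: 0.8 if arm == FOLLOW_BSG else 0.2, n_arms=2, horizon=2000)
print(counts)
```

The `gamma / n_arms` floor keeps sampling the worse option. That is what lets such a scheme notice if the commands later become better.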
Submitted 28 September, 2023;
originally announced September 2023.
-
A Critical Escape Probability Formulation for Enhancing the Transient Stability of Power Systems with System Parameter Design
Authors:
Xian Wu,
Kaihua Xi,
Aijie Cheng,
Chenghui Zhang,
Hai Xiang Lin
Abstract:
For the enhancement of the transient stability of power systems, the key is to define a quantitative optimization formulation with system parameters as decision variables. In this paper, we model the disturbances by Gaussian noise and define a metric named Critical Escape Probability (CREP) based on the invariant probability measure of a linearised stochastic process. CREP characterizes the probability of the state escaping from a critical set. CREP involves all the system parameters and reflects the size of the basin of attraction of the nonlinear system. An optimization framework that minimizes CREP with the system parameters as decision variables is presented. Simulations show that the mean first hitting time of the boundary of the critical set, which is often used to describe the stability of nonlinear systems, is dramatically increased by minimizing CREP. This indicates that the transient stability of the system is effectively enhanced. It is also shown that suppressing the state fluctuations alone is insufficient for enhancing the transient stability. In addition, the famous Braess' paradox, which also exists in power systems, is revisited. Surprisingly, it turns out that the paradoxes identified by the traditional metric may not exist according to CREP. This new metric opens a new avenue for the transient stability analysis of future power systems integrated with large amounts of renewable energy.
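The sketch below shows one way an escape probability can be computed from the invariant measure of a linearized stochastic system dx = A x dt + B dW, assuming A is Hurwitz. The stationary covariance S solves the Lyapunov equation A S + S Aᵀ + B Bᵀ = 0. The probability that a linear output c·x leaves [-thr, thr] is then erfc(thr / (σ√2)), where σ² = cᵀ S c. Taking the worst output as the metric is an assumption for illustration; the paper's exact CREP definition may differ. The example is a single linearized swing equation with hypothetical parameters `k` (synchronizing coefficient) and `d` (damping).

```python
import math

import numpy as np


def stationary_covariance(A, B):
    """Stationary covariance S of dx = A x dt + B dW (A must be Hurwitz).

    Solves A S + S A^T + B B^T = 0 via vectorization:
    vec(A S + S A^T) = (I kron A + A kron I) vec(S), with column-major vec.
    """
    n = A.shape[0]
    eye = np.eye(n)
    K = np.kron(eye, A) + np.kron(A, eye)
    rhs = -(B @ B.T).flatten(order="F")
    return np.linalg.solve(K, rhs).reshape((n, n), order="F")


def escape_probability(A, B, outputs, thresholds):
    """Worst-case probability, under the stationary Gaussian measure, that an
    output c . x leaves [-thr, thr] (taking the max is an assumption)."""
    S = stationary_covariance(A, B)
    worst = 0.0
    for c, thr in zip(outputs, thresholds):
        c = np.asarray(c, dtype=float)
        sigma = math.sqrt(float(c @ S @ c))
        # P(|y| > thr) for y ~ N(0, sigma^2) equals erfc(thr / (sigma * sqrt(2))).
        worst = max(worst, math.erfc(thr / (sigma * math.sqrt(2.0))))
    return worst


def swing(k, d):
    # Hypothetical linearized single-machine swing dynamics, state (angle, frequency).
    return np.array([[0.0, 1.0], [-k, -d]])


B = np.array([[0.0], [1.0]])  # unit-intensity noise enters the frequency equation
angle = [[1.0, 0.0]]          # output: the angle deviation
p_low_damping = escape_probability(swing(1.0, 1.0), B, angle, [1.0])
p_high_damping = escape_probability(swing(1.0, 2.0), B, angle, [1.0])
print(p_low_damping, p_high_damping)
```

For this toy system the stationary angle variance is 1/(2kd). Doubling the damping from 1 to 2 therefore halves the variance, and the escape probability drops from erfc(1) ≈ 0.157 to erfc(√2) ≈ 0.046. This matches the abstract's point that system parameters enter the metric directly.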
Submitted 13 September, 2023;
originally announced September 2023.
-
ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation
Authors:
Xian Lin,
Zengqiang Yan,
Xianbo Deng,
Chuansheng Zheng,
Li Yu
Abstract:
Transformers have been extensively studied in medical image segmentation to build pairwise long-range dependence. Yet, relatively limited well-annotated medical image data makes transformers struggle to extract diverse global features, resulting in attention collapse where attention maps become similar or even identical. Comparatively, convolutional neural networks (CNNs) have better convergence properties on small-scale training data but suffer from limited receptive fields. Existing works are dedicated to exploring the combinations of CNN and transformers while ignoring attention collapse, leaving the potential of transformers under-explored. In this paper, we propose to build CNN-style Transformers (ConvFormer) to promote better attention convergence and thus better segmentation performance. Specifically, ConvFormer consists of pooling, CNN-style self-attention (CSA), and convolutional feed-forward network (CFFN) corresponding to tokenization, self-attention, and feed-forward network in vanilla vision transformers. In contrast to positional embedding and tokenization, ConvFormer adopts 2D convolution and max-pooling for both position information preservation and feature size reduction. In this way, CSA takes 2D feature maps as inputs and establishes long-range dependency by constructing self-attention matrices as convolution kernels with adaptive sizes. Following CSA, 2D convolution is utilized for feature refinement through CFFN. Experimental results on multiple datasets demonstrate the effectiveness of ConvFormer working as a plug-and-play module for consistent performance improvement of transformer-based frameworks. Code is available at https://github.com/xianlin7/ConvFormer.
Submitted 8 September, 2023;
originally announced September 2023.
-
Slimmed optical neural networks with multiplexed neuron sets and a corresponding backpropagation training algorithm
Authors:
Yi-Feng Liu,
Rui-Yao Ren,
Dai-Bao Hou,
Hai-Zhong Weng,
Bo-Wen Wang,
Ke-Jie Huang,
Xing Lin,
Feng Liu,
Chen-Hui Li,
Chao-Yuan Jin
Abstract:
Due to their intrinsic capability for parallel signal processing, optical neural networks (ONNs) have recently attracted extensive interest as a potential alternative to electronic artificial neural networks (ANNs) with reduced power consumption and low latency. Preliminary confirmation of the parallelism in optical computing has been widely obtained by applying wavelength division multiplexing (WDM) in the linear transformation part of neural networks. However, inter-channel crosstalk has obstructed the deployment of WDM technologies in the nonlinear activation of ONNs. Here, we propose a universal WDM structure called multiplexed neuron sets (MNS), which applies WDM technologies to optical neurons and enables ONNs to be further compressed. A corresponding back-propagation (BP) training algorithm is proposed to alleviate or even cancel the influence of inter-channel crosstalk on MNS-based WDM-ONNs. For simplicity, semiconductor optical amplifiers (SOAs) are employed as an example of MNS to construct a WDM-ONN trained with the new algorithm. The results show that the combination of MNS and the corresponding BP training algorithm significantly downsizes the system and improves the energy efficiency by tens of times while giving performance similar to that of traditional ONNs.
Submitted 13 December, 2023; v1 submitted 27 August, 2023;
originally announced August 2023.
-
Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches
Authors:
Xin Lin,
Chao Ren,
Xiao Liu,
Jie Huang,
Yinjie Lei
Abstract:
Deep learning methods have shown remarkable performance in image denoising, particularly when trained on large-scale paired datasets. However, acquiring such paired datasets for real-world scenarios poses a significant challenge. Although unsupervised approaches based on generative adversarial networks (GANs) offer a promising solution for denoising without paired datasets, they struggle to surpass the performance limitations of conventional GAN-based unsupervised frameworks without significantly modifying existing structures or increasing the computational complexity of denoisers. To address this problem, we propose a self-collaboration (SC) strategy for multiple denoisers. This strategy can achieve significant performance improvement without increasing the inference complexity of the GAN-based denoising framework. Its basic idea is to iteratively replace the previous, less powerful denoiser in the filter-guided noise extraction module with the current, more powerful denoiser. This process generates better synthetic clean-noisy image pairs, leading to a more powerful denoiser for the next iteration. This baseline ensures the stability and effectiveness of the training network. The experimental results demonstrate the superiority of our method over state-of-the-art unsupervised methods.
Submitted 13 August, 2023;
originally announced August 2023.
-
An Overview of the 3GPP Study on Artificial Intelligence for 5G New Radio
Authors:
Xingqin Lin
Abstract:
Air interface is a fundamental component within any wireless communication system. In Release 18, the 3rd Generation Partnership Project (3GPP) delves into the possibilities of leveraging artificial intelligence (AI)/machine learning (ML) to improve the performance of the fifth-generation (5G) New Radio (NR) air interface. This endeavor marks a pioneering stride within 3GPP's journey in shaping wireless communication standards. This article offers a comprehensive overview of the pivotal themes explored by 3GPP in this domain. Encompassing a general framework for AI/ML and specific use cases such as channel state information feedback, beam management, and positioning, it provides a holistic perspective. Moreover, we highlight the potential trajectory of AI/ML for the NR air interface in 3GPP Release 19, a pathway toward sixth-generation (6G) wireless communication systems that will feature integrated AI and communication as a key usage scenario.
Submitted 9 August, 2023;
originally announced August 2023.
-
Vehicle Dispatching and Routing of On-Demand Intercity Ride-Pooling Services: A Multi-Agent Hierarchical Reinforcement Learning Approach
Authors:
Jinhua Si,
Fang He,
Xi Lin,
Xindi Tang
Abstract:
The integrated development of city clusters has given rise to an increasing demand for intercity travel. Intercity ride-pooling service exhibits considerable potential in upgrading traditional intercity bus services by implementing demand-responsive enhancements. Nevertheless, its online operations suffer from inherent complexities due to the coupling of vehicle resource allocation among cities and pooled-ride vehicle routing. To tackle these challenges, this study proposes a two-level framework designed to facilitate online fleet management. Specifically, a novel multi-agent feudal reinforcement learning model is proposed at the upper level of the framework to cooperatively assign idle vehicles to different intercity lines, while the lower level updates the routes of vehicles using an adaptive large neighborhood search heuristic. Numerical studies based on the realistic dataset of Xiamen and its surrounding cities in China show that the proposed framework effectively mitigates the supply and demand imbalances and achieves significant improvement in both the average daily system profit and the order fulfillment ratio.
Submitted 20 March, 2024; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Unsupervised speech enhancement with deep dynamical generative speech and noise models
Authors:
Xiaoyu Lin,
Simon Leglaive,
Laurent Girin,
Xavier Alameda-Pineda
Abstract:
This work builds on previous work on unsupervised speech enhancement that uses a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) that depends on the DVAE latent variables, on the noisy observations, or on both. This DDGM can be trained in three configurations: noise-agnostic, noise-dependent, and noise adaptation after noise-dependent training. Experimental results show that the proposed method achieves performance competitive with state-of-the-art unsupervised speech enhancement methods, while the noise-dependent training configuration yields a much more time-efficient inference process.
Submitted 13 June, 2023;
originally announced June 2023.
-
Bandit Submodular Maximization for Multi-Robot Coordination in Unpredictable and Partially Observable Environments
Authors:
Zirui Xu,
Xiaofeng Lin,
Vasileios Tzoumas
Abstract:
We study the problem of multi-agent coordination in unpredictable and partially observable environments, that is, environments whose future evolution is unknown a priori and that can only be partially observed. We are motivated by the future of autonomy that involves multiple robots coordinating actions in dynamic, unstructured, and partially observable environments to complete complex tasks such as target tracking, environmental mapping, and area monitoring. Such tasks are often modeled as submodular maximization coordination problems due to the information overlap among the robots. We introduce the first submodular coordination algorithm with bandit feedback and bounded tracking regret. Bandit feedback is the robots' ability to compute in hindsight only the effect of their chosen actions, rather than of all the alternative actions they could have chosen, due to partial observability; tracking regret is the algorithm's suboptimality with respect to the optimal time-varying actions that fully know the future a priori. The bound degrades gracefully with the environments' capacity to change adversarially, quantifying how often the robots should re-select actions to learn to coordinate as if they fully knew the future a priori. The algorithm generalizes the seminal Sequential Greedy algorithm by Fisher et al. to the bandit setting by leveraging submodularity and algorithms for the problem of tracking the best action. We validate our algorithm in simulated scenarios of multi-target tracking.
Submitted 26 May, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
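The algorithm above generalizes Fisher et al.'s Sequential Greedy to the bandit setting. For orientation only, here is a minimal Python sketch of the classic full-information Sequential Greedy baseline. It assumes a hypothetical coverage objective in which each action covers a set of grid cells and the team's utility is the number of distinct cells covered (a monotone submodular function); the paper's bandit feedback and tracking-the-best-action components are not modeled.

```python
from typing import Dict, List, Set

def sequential_greedy(action_sets: List[Dict[str, Set[int]]]) -> List[str]:
    """Full-information Sequential Greedy on a hypothetical coverage objective.

    action_sets[i] maps robot i's action names to the cells that action covers.
    Robots decide one after another; robot i picks the action with the largest
    marginal gain given the actions already fixed by robots 0..i-1.
    """
    covered: Set[int] = set()
    chosen: List[str] = []
    for actions in action_sets:
        # marginal gain of an action = number of cells not yet covered by predecessors
        best = max(actions, key=lambda a: len(actions[a] - covered))
        chosen.append(best)
        covered |= actions[best]
    return chosen

robots = [
    {"north": {1, 2, 3}, "east": {3, 4}},   # robot 0
    {"north": {1, 2}, "east": {4, 5}},      # robot 1
]
plan = sequential_greedy(robots)  # robot 0 takes "north"; robot 1 then prefers "east"
```

Each robot only needs its predecessors' choices, which is what makes the scheme decentralizable. For monotone submodular objectives with one action per robot, this greedy rule is known to be within a factor of 1/2 of the optimum; the bandit variant must instead estimate marginal gains from feedback on the chosen actions only.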
-
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
Authors:
Huadai Liu,
Rongjie Huang,
Xuan Lin,
Wenqiang Xu,
Maozong Zheng,
Hong Chen,
Jinzheng He,
Zhou Zhao
Abstract:
Text-to-speech (TTS) has undergone remarkable improvements in performance, particularly with the advent of Denoising Diffusion Probabilistic Models (DDPMs). However, the perceived quality of audio depends not solely on its content, pitch, rhythm, and energy, but also on the physical environment. In this work, we propose ViT-TTS, the first visual TTS model with scalable diffusion transformers. ViT-TTS complements the phoneme sequence with visual information to generate audio of high perceived quality, opening up new avenues for practical AR and VR applications that offer a more immersive and realistic audio experience. To mitigate the data scarcity in learning visual acoustic information, we 1) introduce a self-supervised learning framework to enhance both the visual-text encoder and the denoiser decoder; and 2) leverage a diffusion transformer that scales in parameters and capacity to learn visual scene information. Experimental results demonstrate that ViT-TTS achieves new state-of-the-art results, outperforming cascaded systems and other baselines regardless of the visibility of the scene. With low-resource data (1h, 2h, 5h), ViT-TTS achieves results comparable to rich-resource baselines. Audio samples are available at https://ViT-TTS.github.io/.
Submitted 21 April, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Hardware Acceleration for Open Radio Access Networks: A Contemporary Overview
Authors:
Lopamudra Kundu,
Xingqin Lin,
Elena Agostini,
Vikrama Ditya
Abstract:
Radio access networks (RANs) are going through a paradigm shift towards interoperable, intelligent, software-defined, and cloud-native open RAN solutions. A key challenge towards the adoption and deployment of open RAN at scale is performance. Hence, it is critical to leverage the power of hardware acceleration to offload compute-heavy RAN workloads to specialized hardware devices to enable accelerated compute for open RAN deployments. In this article, we provide a state-of-the-art overview of hardware acceleration for open RAN in fifth-generation (5G) wireless networks. We also present a practical implementation of inline hardware acceleration for open RAN layer 1 processing and identify several areas for future exploration towards sixth-generation (6G) wireless networks.
Submitted 16 May, 2023;
originally announced May 2023.
-
Dual Degradation Representation for Joint Deraining and Low-Light Enhancement in the Dark
Authors:
Xin Lin,
Jingtong Yue,
Sixian Ding,
Chao Ren,
Lu Qi,
Ming-Hsuan Yang
Abstract:
Rain in the dark poses a significant challenge to deploying real-world applications such as autonomous driving, surveillance systems, and night photography. Existing low-light enhancement or deraining methods struggle to brighten low-light conditions and remove rain simultaneously. Additionally, cascade approaches such as "deraining followed by low-light enhancement" or the reverse often result in problematic rain patterns or overly blurred and overexposed images. To address these challenges, we introduce an end-to-end model called L²RIRNet, designed to manage both low-light enhancement and deraining in real-world settings. Our model features two main components: a Dual Degradation Representation Network (DDR-Net) and a Restoration Network. The DDR-Net independently learns degradation representations for luminance effects in dark areas and rain patterns in light areas, employing a dual degradation loss to guide the training process. The Restoration Network restores the degraded image using a Fourier Detail Guidance (FDG) module, which leverages near-rainless detailed images, focusing on texture details in the frequency and spatial domains to inform the restoration process. Furthermore, we contribute a dataset containing both synthetic and real-world low-light rainy images. Extensive experiments demonstrate that L²RIRNet performs favorably against existing methods in both synthetic and complex real-world scenarios. All the code and the dataset can be found at https://github.com/linxin0/Low_light_rainy.
Submitted 17 June, 2024; v1 submitted 6 May, 2023;
originally announced May 2023.
-
Application of attention-based Siamese composite neural network in medical image recognition
Authors:
Zihao Huang,
Yue Wang,
Weixing Xin,
Xingtong Lin,
Huizhen Li,
Haowen Chen,
Yizhen Lao,
Xia Chen
Abstract:
Medical image recognition often faces the problem of insufficient data in practical applications. Image recognition and processing under few-shot conditions lead to overfitting, low recognition accuracy, low reliability, and insufficient robustness. Differences between characteristics are often subtle, and recognition is affected by viewing angle, background, occlusion, and other factors, which increases its difficulty. Furthermore, in fine-grained images, the few-shot problem leads to insufficient useful feature information. Considering the characteristics of few-shot and fine-grained image recognition, this study establishes a recognition model based on attention and a Siamese neural network. To address the few-shot problem, a Siamese neural network suitable for classification is proposed. An attention-based neural network is used as the backbone to improve classification performance. COVID-19 lung samples were selected to test the model. The results show that the fewer image samples there are, the more pronounced the model's advantage over an ordinary neural network.
Submitted 15 March, 2024; v1 submitted 19 April, 2023;
originally announced April 2023.
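The abstract does not state the training objective. Siamese networks are commonly trained with a contrastive loss, which pulls same-class pairs together and pushes different-class pairs apart up to a margin, and are applied to few-shot classification by comparing a query against a handful of labeled support examples. A NumPy sketch of both steps follows, assuming a hypothetical linear encoder `embed` in place of the attention-based backbone; it illustrates the general technique, not the authors' model.

```python
import numpy as np

def embed(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Hypothetical shared encoder: both Siamese branches use the same weights W."""
    return x @ W

def contrastive_loss(x1: np.ndarray, x2: np.ndarray, same: np.ndarray,
                     W: np.ndarray, margin: float = 1.0) -> float:
    """Mean contrastive loss over a batch of pairs; same[i] = 1 if pair i shares a class."""
    d = np.linalg.norm(embed(x1, W) - embed(x2, W), axis=-1)
    pull = same * d ** 2                                   # draw matching pairs together
    push = (1 - same) * np.maximum(0.0, margin - d) ** 2   # separate non-matching pairs up to the margin
    return float(0.5 * np.mean(pull + push))

def classify(query: np.ndarray, support: np.ndarray, labels: np.ndarray,
             W: np.ndarray):
    """Few-shot prediction: label of the nearest support example in embedding space."""
    d = np.linalg.norm(embed(support, W) - embed(query, W), axis=-1)
    return labels[int(np.argmin(d))]
```

Because the encoder is shared, the network learns a similarity metric rather than per-class decision boundaries, which is why only a few labeled examples per class are needed at test time.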
-
DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification
Authors:
Yangfu Li,
Jiapan Gan,
Xiaodan Lin
Abstract:
Conventional time-delay neural networks (TDNNs) struggle to handle long-range context; their ability to represent speaker information is therefore limited for long utterances. Existing solutions either rely on increasing model complexity or try to balance local features and global context. To effectively leverage the long-term dependencies of audio signals while constraining model complexity, we introduce a novel module called the Global-aware Filter layer (GF layer), which employs a set of learnable transform-domain filters between a 1D discrete Fourier transform and its inverse to capture global context. Additionally, we develop a dynamic filtering strategy and a sparse regularization method to enhance the performance of the GF layer and prevent overfitting. Based on the GF layer, we present a dual-stream TDNN architecture called DS-TDNN for automatic speaker verification (ASV), which uses two distinct branches to extract local and global features in parallel and employs an efficient strategy to fuse information at different scales. Experiments on the VoxCeleb and SITW databases demonstrate that DS-TDNN achieves a relative improvement of 10% together with a relative reduction of 20% in computational cost compared with ECAPA-TDNN on the speaker verification task. This improvement becomes more evident as utterance duration grows. Furthermore, DS-TDNN also outperforms popular deep residual models and attention-based systems on utterances of arbitrary length.
Submitted 1 August, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
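The GF layer is described as learnable transform-domain filters placed between a 1D DFT and its inverse. A minimal NumPy sketch of that idea follows, assuming a real feature map of shape (channels, frames), a real FFT along the time axis, and one complex weight per channel and frequency bin. The name `global_filter` is hypothetical, and the paper's dynamic filtering strategy and sparse regularization are not modeled.

```python
import numpy as np

def global_filter(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Filter each channel of x over its full time span in the frequency domain.

    x:       real array of shape (channels, frames)
    weights: complex array of shape (channels, frames // 2 + 1); one
             (hypothetical) learnable coefficient per channel and rFFT bin
    """
    n_frames = x.shape[-1]
    spectrum = np.fft.rfft(x, axis=-1)                   # 1D DFT along time
    filtered = spectrum * weights                        # element-wise transform-domain filter
    return np.fft.irfft(filtered, n=n_frames, axis=-1)   # inverse DFT back to frames

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 200))                    # 4 channels, 200 frames
w = rng.standard_normal((4, 101)) + 1j * rng.standard_normal((4, 101))
out = global_filter(feats, w)                            # same shape as feats
```

Multiplying spectra is equivalent to a circular convolution whose kernel spans all frames, so every output frame can depend on the whole utterance at O(T log T) cost per channel; with all weights equal to 1 the layer reduces to the identity.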
-
Speech Modeling with a Hierarchical Transformer Dynamical VAE
Authors:
Xiaoyu Lin,
Xiaoyu Bie,
Simon Leglaive,
Laurent Girin,
Xavier Alameda-Pineda
Abstract:
Dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extend the VAE to model a sequence of observed data and a corresponding sequence of latent vectors. In almost all DVAEs in the literature, the temporal dependencies within each sequence and across the two sequences are modeled with recurrent neural networks. In this paper, we propose to model speech signals with the Hierarchical Transformer DVAE (HiT-DVAE), a DVAE with two levels of latent variables (sequence-wise and frame-wise) in which the temporal dependencies are implemented with the Transformer architecture. We show that HiT-DVAE outperforms several other DVAEs for speech spectrogram modeling while enabling a simpler training procedure, revealing its high potential for downstream low-level speech processing tasks such as speech enhancement.
Submitted 10 May, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
5G-Advanced Towards 6G: Past, Present, and Future
Authors:
Wanshi Chen,
Xingqin Lin,
Juho Lee,
Antti Toskala,
Shu Sun,
Carla Fabiana Chiasserini,
Lingjia Liu
Abstract:
Since the start of 5G work in 3GPP in early 2016, tremendous progress has been made in both standardization and commercial deployments. 3GPP is now entering the second phase of 5G standardization, known as 5G-Advanced, built on the 5G baseline in 3GPP Releases 15, 16, and 17. 3GPP Release 18, the start of 5G-Advanced, includes a diverse set of features that cover both device and network evolutions, providing balanced mobile broadband evolution and further vertical domain expansion and accommodating both immediate and long-term commercial needs. 5G-Advanced will significantly expand 5G capabilities, address many new use cases, transform connectivity experiences, and serve as an essential step in developing mobile communications towards 6G. This paper provides a comprehensive overview of the 3GPP 5G-Advanced development, introducing the prominent state-of-the-art technologies investigated in 3GPP and identifying key evolution directions for future research and standardization.
Submitted 13 March, 2023;
originally announced March 2023.
-
Explicit formulas for the Variance of the State of a Linearized Power System driven by Gaussian stochastic disturbances
Authors:
Xian Wu,
Kaihua Xi,
Aijie Cheng,
Hai Xiang Lin,
Jan H van Schuppen,
Chenghui Zhang
Abstract:
We investigate the fluctuations caused by disturbances in power systems. In the linearized power-system model, the disturbance is modeled by a Brownian motion process, and the fluctuations are described by the covariance matrix of the associated stochastic process at its invariant probability distribution. We derive explicit formulas for the covariance matrix of a system with a uniform damping-inertia ratio. An analysis of the variances in complete graphs and star graphs shows that the variance of the frequency at the node with the disturbance is significantly larger than the sum of the variances at all other nodes, indicating that the disturbance affects its own node the most. Additionally, adding new nodes typically does not help reduce the variations at the disturbance's source node. Finally, the explicit formulas show that line capacity affects the variance of the frequency, and inertia affects the variance of the phase differences.
Submitted 16 March, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
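The abstract characterizes the fluctuations by the covariance matrix at the invariant distribution. For a linear SDE dx = A x dt + B dW with A Hurwitz (all eigenvalues with negative real part), that covariance S is the unique solution of the Lyapunov equation A S + S A^T + B B^T = 0. The paper derives explicit formulas for its power-system model; the NumPy sketch below only solves the equation numerically and checks it on a hypothetical single-node oscillator, d(theta) = omega dt, d(omega) = (-k theta - d omega) dt + b dW, for which Var(theta) = b^2/(2dk), Var(omega) = b^2/(2d), and Cov(theta, omega) = 0. A network model built on a graph Laplacian has a zero mode in the phase angles, so A would first have to be reduced (e.g. to phase differences) before it is Hurwitz.

```python
import numpy as np

def stationary_covariance(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Solve A S + S A^T + B B^T = 0 for the stationary covariance S.

    Uses the row-major vectorization identity
    vec(A S + S A^T) = (kron(A, I) + kron(I, A)) vec(S).
    Assumes A is Hurwitz, so the solution exists and is unique.
    """
    n = A.shape[0]
    I = np.eye(n)
    M = np.kron(A, I) + np.kron(I, A)
    Q = B @ B.T
    S = np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (S + S.T)                 # symmetrize against round-off

# Hypothetical single-node example with unit inertia: state = (theta, omega)
k, d, b = 2.0, 0.5, 1.0
A = np.array([[0.0, 1.0], [-k, -d]])
B = np.array([[0.0], [b]])                 # noise enters the frequency equation only
S = stationary_covariance(A, B)            # expect diag(b^2/(2dk), b^2/(2d)) = diag(0.5, 1.0)
```

The Kronecker formulation is O(n^6) and only suitable for small systems; for larger models a dedicated Lyapunov solver (e.g. a Bartels-Stewart implementation) is the usual choice.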
-
EEG Opto-processor: epileptic seizure detection using diffractive photonic computing units
Authors:
Tao Yan,
Maoqi Zhang,
Sen Wan,
Kaifeng Shang,
Haiou Zhang,
Xun Cao,
Xing Lin,
Qionghai Dai
Abstract:
Electroencephalography (EEG) analysis extracts critical information from brain signals and provides fundamental support for various applications, including brain-disease diagnosis and brain-computer interfaces. However, processing large-scale EEG signals in real time with high energy efficiency poses great challenges for electronic processors on edge computing devices. Here, we propose an EEG opto-processor based on diffractive photonic computing units (DPUs) to effectively process extracranial and intracranial EEG signals and perform epileptic seizure detection. The signals of EEG channels within a second-time window are optically encoded as inputs to the constructed diffractive neural networks for classification, which monitor the brain state to determine whether it shows the symptoms of an epileptic seizure. We developed both free-space and integrated DPUs as edge computing systems and demonstrated their application to real-time epileptic seizure detection on benchmark datasets, i.e., the CHB-MIT extracranial EEG dataset and the Epilepsy-iEEG-Multicenter intracranial EEG dataset, with high computing performance. Together with the channel selection mechanism, both numerical evaluations and experimental results validated sufficiently high classification accuracies of the proposed opto-processors for supporting clinical diagnosis. Our work opens up a new research direction of using photonic computing techniques to process large-scale EEG signals and promote their broader application.
Submitted 9 December, 2022;
originally announced January 2023.
-
Technology Trends for Massive MIMO towards 6G
Authors:
Yiming Huo,
Xingqin Lin,
Boya Di,
Hongliang Zhang,
Francisco Javier Lorca Hernando,
Ahmet Serdar Tan,
Shahid Mumtaz,
Özlem Tuğfe Demir,
Kun Chen-Hu
Abstract:
At the dawn of next-generation wireless systems and networks, massive multiple-input multiple-output (MIMO) has been envisioned as one of the enabling technologies. Having been applied successfully in 5G and beyond, massive MIMO technology has demonstrated its advantages, integrability, and extensibility. Moreover, several evolutionary features and revolutionary trends for massive MIMO have gradually emerged in recent years, which are expected to reshape future 6G wireless systems and networks. Specifically, the functions and performance of future massive MIMO systems will be enabled and enhanced by combining other innovative technologies, architectures, and strategies, such as intelligent omni-surfaces (IOSs)/intelligent reflecting surfaces (IRSs), artificial intelligence (AI), THz communications, and cell-free architectures. Also, more diverse vertical applications based on massive MIMO will emerge and prosper, such as wireless localization and sensing, vehicular communications, non-terrestrial communications, remote sensing, and inter-planetary communications.
Submitted 5 January, 2023; v1 submitted 4 January, 2023;
originally announced January 2023.
-
Improved CNN Prediction Based Reversible Data Hiding
Authors:
Yingqiang Qiu,
Wanli Peng,
Xiaodan Lin,
Huanqiang Zeng,
Zhenxing Qian
Abstract:
This letter proposes an improved CNN predictor (ICNNP) for reversible data hiding (RDH) in images, which consists of a feature extraction module, a pixel prediction module, and a complexity prediction module. Because the ICNNP also predicts the complexity of each pixel during the embedding process, the proposed method achieves better performance than the CNN predictor-based method. Specifically, an input image is first split into two sub-images, i.e., the "Dot" image and the "Cross" image, and each sub-image is used to predict the other. Then, the prediction errors of pixels are sorted by the predicted pixel complexities. Accordingly, a subset of the sorted prediction errors with lower complexity is selected for low-distortion data embedding with a traditional histogram shifting scheme. Experimental results demonstrate that the proposed method achieves better embedding performance than the CNN predictor with the same histogram shifting strategy.
Submitted 3 January, 2023;
originally announced January 2023.
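The embedding pipeline in this abstract (checkerboard split into "Dot" and "Cross" sub-images, prediction errors sorted by predicted complexity, histogram shifting) can be sketched in NumPy as follows. This is an illustration under stated assumptions: the CNN predictor is replaced by precomputed, hypothetical prediction errors and complexities; the peak bins 0 and -1 are one common histogram-shifting choice, not necessarily the letter's; and pixel overflow handling, location maps, and how the decoder reproduces the complexity ordering are not modeled (the sketch simply passes the same complexity array to both functions).

```python
import numpy as np
from typing import Iterable, List, Tuple

def checkerboard_masks(shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
    """Boolean masks selecting the 'Dot' and 'Cross' pixels of a checkerboard split."""
    rows, cols = np.indices(shape)
    dot = (rows + cols) % 2 == 0
    return dot, ~dot

def hs_embed(errors: np.ndarray, complexity: np.ndarray, bits: Iterable[int]) -> np.ndarray:
    """Histogram-shifting embedding with peak bins 0 and -1.

    Positions are visited from lowest to highest predicted complexity; once the
    payload is exhausted, the remaining peak positions carry padding zeros.
    """
    marked = errors.copy()
    payload = iter(bits)
    for idx in np.argsort(complexity, kind="stable"):
        e = int(errors[idx])
        if e >= 1:
            marked[idx] = e + 1                  # shift right, freeing bin 1
        elif e <= -2:
            marked[idx] = e - 1                  # shift left, freeing bin -2
        elif e == 0:
            marked[idx] = e + next(payload, 0)   # 0 -> 0 or 1
        else:                                    # e == -1
            marked[idx] = e - next(payload, 0)   # -1 -> -1 or -2
    return marked

def hs_extract(marked: np.ndarray, complexity: np.ndarray) -> Tuple[np.ndarray, List[int]]:
    """Invert hs_embed: recover the original errors and the embedded bits."""
    restored = marked.copy()
    bits: List[int] = []
    for idx in np.argsort(complexity, kind="stable"):
        e = int(marked[idx])
        if e >= 2:
            restored[idx] = e - 1
        elif e <= -3:
            restored[idx] = e + 1
        elif e >= 0:                             # e in {0, 1}
            bits.append(e)
            restored[idx] = 0
        else:                                    # e in {-1, -2}
            bits.append(-1 - e)
            restored[idx] = -1
    return restored, bits

errors = np.array([0, 3, -1, -4, 0, 1])                 # hypothetical prediction errors
complexity = np.array([0.1, 0.5, 0.2, 0.9, 0.3, 0.05])  # hypothetical predicted complexities
marked = hs_embed(errors, complexity, [1, 0, 1])
restored, bits = hs_extract(marked, complexity)         # restored equals errors exactly
```

Visiting low-complexity positions first matters because smooth regions concentrate prediction errors near 0 and -1, so the payload is carried by few pixels and fewer pixels need to be shifted.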
-
DASECount: Domain-Agnostic Sample-Efficient Wireless Indoor Crowd Counting via Few-shot Learning
Authors:
Huawei Hou,
Suzhi Bi,
Lili Zheng,
Xiaohui Lin,
Yuan Wu,
Zhi Quan
Abstract:
Accurate indoor crowd counting (ICC) is a key enabler for many smart home/office applications. In this paper, we propose a Domain-Agnostic and Sample-Efficient wireless indoor crowd Counting (DASECount) framework that can attain robust cross-domain detection accuracy given very limited data samples in new domains. DASECount leverages the few-shot learning (FSL) paradigm, which consists of two major stages: source domain meta-training and target domain meta-testing. Specifically, in the meta-training stage, we design and train two separate convolutional neural network (CNN) modules on the source domain dataset to fully capture the implicit amplitude and phase features of CSI measurements related to human activities. A subsequent knowledge distillation procedure iteratively updates the CNN parameters for better generalization performance. In the meta-testing stage, we use the partial CNN modules to extract low-dimensional features from the high-dimensional target domain CSI input data. With the obtained low-dimensional CSI features, we can use very few target domain data samples (e.g., 5-shot samples) to train a lightweight logistic regression (LR) classifier and still attain very high cross-domain ICC accuracy. Experimental results show that the proposed DASECount method achieves over 92.68%, and on average 96.37%, detection accuracy in a 0-8 people counting task under various domain setups, significantly outperforming the other representative benchmark methods considered.
Submitted 18 November, 2022;
originally announced November 2022.
-
Road Slope Prediction and Vehicle Dynamics Control for Autonomous Vehicles
Authors:
Gautam Shetty,
Sabir Hossain,
Chuan Hu,
Xianke Lin
Abstract:
Autonomous vehicles can enhance overall performance and implement safety measures in ways that are impossible with conventional automobiles. These functions are executed through vehicle control systems, which have been the subject of considerable research. Autonomous cars have a distinct advantage because they possess various perception sensors that can predict road surface conditions and other phenomena ahead of time. Many modern automotive control systems treat the road slope as a constant and do not account for changes in the road profile in their vehicle models. As a result, vehicle states may be miscalculated, which, in the worst-case scenario, may result in accidents. This is particularly true for high center-of-gravity vehicles such as trailers and delivery trucks. With the help of perception sensors in autonomous vehicles, a road slope estimation system can be developed to help these control systems make informed decisions regarding the vehicle's state. This review is organized into three parts. The first section describes and reviews the individual steps for developing a road slope estimation system. The second provides a detailed review of previous investigations that implemented different methods employing this prediction system to improve overall vehicle performance. Finally, a roll control system is presented as an innovative idea that builds on the whole discussion. A rollover prevention system with prediction abilities is presented because (1) it is a critical safety feature, especially for heavy vehicles such as buses, trucks, and delivery trailers, and (2) little research has been conducted on technologies that integrate a roll stability controller with a slope estimation system.
Submitted 11 October, 2022;
originally announced October 2022.
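The review's point that a constant-slope assumption miscalculates vehicle states can be made concrete with the gravity component along the road, m g sin(theta). A minimal NumPy sketch, assuming a hypothetical look-ahead elevation profile (horizontal distance and elevation samples, e.g. from perception sensors or a map) and a rigid-body vehicle mass; it is a worked illustration of the physics, not the estimation pipeline reviewed in the paper.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def segment_slopes(distance_m: np.ndarray, elevation_m: np.ndarray) -> np.ndarray:
    """Slope angle (rad) of each segment between consecutive look-ahead samples.

    distance_m is horizontal distance along the path; elevation_m is road height.
    """
    return np.arctan2(np.diff(elevation_m), np.diff(distance_m))

def grade_force(mass_kg: float, slope_rad: np.ndarray) -> np.ndarray:
    """Gravity component along the road (N); positive values oppose uphill travel."""
    return mass_kg * G * np.sin(slope_rad)

distance = np.array([0.0, 50.0, 100.0])
elevation = np.array([0.0, 2.5, 2.5])     # hypothetical profile: 5% grade, then flat
slopes = segment_slopes(distance, elevation)
forces = grade_force(20000.0, slopes)     # hypothetical 20 t delivery truck
```

For this profile the first segment carries roughly 9.8 kN of grade resistance that a controller assuming a flat road would omit, which is the kind of state error the review attributes to ignoring the road profile.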
-
PSVRF: Learning to restore Pitch-Shifted Voice without reference
Authors:
Yangfu Li,
Xiaodan Lin,
Jiaxin Yang
Abstract:
Pitch scaling algorithms have a significant impact on the security of Automatic Speaker Verification (ASV) systems. Although numerous anti-spoofing algorithms have been proposed to identify pitch-shifted voice and even restore it to the original version, they either perform poorly or require the original voice as a reference, limiting their applicability. In this paper, we propose a no-reference approach termed PSVRF for high-quality restoration of pitch-shifted voice. Experiments on AISHELL-1 and AISHELL-3 demonstrate that PSVRF can restore voice disguised by various pitch-scaling techniques, which markedly enhances the robustness of ASV systems to pitch-scaling attacks. Furthermore, the performance of PSVRF even surpasses that of the state-of-the-art reference-based approach.
Submitted 13 March, 2023; v1 submitted 6 October, 2022;
originally announced October 2022.