Search | arXiv e-print repository

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.

Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

arXiv:2404.17926 [pdf, other]

Pre-training on High Definition X-ray Images: An Experimental Study

Authors: Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang

Abstract: Existing X-ray based pre-trained vision models are usually conducted on a relatively small-scale dataset (less than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training large models lies in massive training data, and maintaining high resolution in the field of X-ray images is the guarantee of effective solutions to difficult miscellaneous diseases. In this paper, we address these issues by proposing the first high-definition (1280 $\times$ 1280) X-ray based pre-trained foundation vision model on our newly collected large-scale dataset which contains more than 1 million X-ray images. Our model follows the masked auto-encoder framework, which takes the tokens after mask processing (with a high rate) as input, and the masked image patches are reconstructed by the Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy that utilizes the chest contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, including X-ray report generation and disease recognition. Extensive experiments demonstrate that our pre-trained medical foundation vision model achieves comparable or even new state-of-the-art performance on downstream benchmark datasets. The source code and pre-trained models of this paper will be released on https://github.com/Event-AHU/Medical_Image_Analysis.

Submitted 27 April, 2024; originally announced April 2024.

Comments: Technology Report

arXiv:2403.13611 [pdf, other]

Densify & Conquer: Densified, smaller base-stations can conquer the increasing carbon footprint problem in nextG wireless

Authors: Agrim Gupta, Adel Heidari, Jiaming Jin, Dinesh Bharadia

Abstract: Connectivity on-the-go has been one of the most impressive technological achievements in the 2010s decade. However, multiple studies show that this has come at the expense of an increased carbon footprint that also rivals the entire aviation sector's carbon footprint. The two major contributors of this increased footprint are (a) smartphone batteries which affect the embodied footprint and (b) base-stations that occupy ever-increasing energy footprint to provide the last mile wireless connectivity to smartphones. The root-cause of both these turn out to be the same, which is communicating over the last-mile lossy wireless medium. We show in this paper, titled DensQuer, how base-station densification, which is to replace a single larger base-station with multiple smaller ones, reduces the effect of the last-mile wireless, and in effect conquers both these adverse sources of increased carbon footprint. Backed by an open-source ray-tracing computation framework (Sionna), we show how a strategic densification strategy can minimize the number of required smaller base-stations to practically achievable numbers, which leads to about 3x power-savings in the base-station network. Also, DensQuer is able to reduce the required deployment height of base-stations to as low as 15m, which makes the smaller cells easily deployable on trees/street poles instead of requiring a dedicated tower. Further, by utilizing newly introduced hardware power rails in Google Pixel 7a and above phones, we also show that this strategic densified network leads to reduction in mobile transmit power by 10-15 dB, leading to about 3x reduction in total cellular power consumption, and about 50% increase in smartphone battery life when it communicates data via the cellular network.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 12 pages, 14 figures

arXiv:2312.16807 [pdf, other]

Efficient Interference Graph Estimation via Concurrent Flooding

Authors: Haifeng Jia, Yichen Wei, Zhan Wang, Jiani Jin, Haorui Li, Yibo Pi

Abstract: Traditional wisdom for network management allocates network resources separately for the measurement and data transmission tasks. Heavy measurement tasks may take up resources for data transmission and significantly reduce network performance. It is therefore challenging for interference graphs, deemed as incurring heavy measurement overhead, to be used in practice in wireless networks. To address this challenge in wireless sensor networks, we propose to use power as a new dimension for interference graph estimation (IGE) and integrate IGE with concurrent flooding such that IGE can be done simultaneously with flooding using the same frequency-time resources. With controlled and real-world experiments, we show that it is feasible to efficiently achieve IGE via concurrent flooding on the commercial off-the-shelf (COTS) devices by controlling the transmit powers of nodes. We believe that efficient IGE would be a key enabler for the practical use of the existing scheduling algorithms assuming known interference graphs.

Submitted 27 December, 2023; originally announced December 2023.

Comments: Accepted by International Conference on Embedded Wireless Systems and Networking 2023 (EWSN'23), 7 pages with 9 figures, equal contribution by Haifeng Jia and Yichen Wei

ACM Class: C.2

arXiv:2310.03581 [pdf, other]

Resilient Legged Local Navigation: Learning to Traverse with Compromised Perception End-to-End

Authors: Jin Jin, Chong Zhang, Jonas Frey, Nikita Rudin, Matias Mattamala, Cesar Cadena, Marco Hutter

Abstract: Autonomous robots must navigate reliably in unknown environments even under compromised exteroceptive perception, or perception failures. Such failures often occur when harsh environments lead to degraded sensing, or when the perception algorithm misinterprets the scene due to limited generalization. In this paper, we model perception failures as invisible obstacles and pits, and train a reinforcement learning (RL) based local navigation policy to guide our legged robot. Unlike previous works relying on heuristics and anomaly detection to update navigational information, we train our navigation policy to reconstruct the environment information in the latent space from corrupted perception and react to perception failures end-to-end. To this end, we incorporate both proprioception and exteroception into our policy inputs, thereby enabling the policy to sense collisions on different body parts and pits, prompting corresponding reactions. We validate our approach in simulation and on the real quadruped robot ANYmal running in real-time (<10 ms CPU inference). In a quantitative comparison with existing heuristic-based locally reactive planners, our policy increases the success rate over 30% when facing perception failures. Project Page: https://bit.ly/45NBTuh.

Submitted 5 October, 2023; originally announced October 2023.

Comments: Website and videos are available at our Project Page: https://bit.ly/45NBTuh

arXiv:2308.10543 [pdf, other]

An Anchor-Point Based Image-Model for Room Impulse Response Simulation with Directional Source Radiation and Sensor Directivity Patterns

Authors: Chao Pan, Lei Zhang, Yilong Lu, Jilu Jin, Lin Qiu, Jingdong Chen, Jacob Benesty

Abstract: The image model method has been widely used to simulate room impulse responses and the endeavor to adapt this method to different applications has also piqued great interest over the last few decades. This paper attempts to extend the image model method and develops an anchor-point-image-model (APIM) approach as a solution for simulating impulse responses by including both the source radiation and sensor directivity patterns. To determine the orientations of all the virtual sources, anchor points are introduced to real sources, which subsequently lead to the determination of the orientations of the virtual sources. An algorithm is developed to generate room impulse responses with APIM by taking into account the directional pattern functions, fractional time delays, as well as the computational complexity. The developed model and algorithms can be used in various acoustic problems to simulate room acoustics and improve and evaluate processing algorithms.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 19 pages, 8 figures

arXiv:2306.11332 [pdf, ps, other]

Minimum Eigenvalue Based Covariance Matrix Estimation with Limited Samples

Authors: Jing Qian, Juening Jin, Hao Wang

Abstract: In this paper, we consider the interference rejection combining (IRC) receiver, which improves the cell-edge user throughput via suppressing inter-cell interference and requires estimating the covariance matrix including the inter-cell interference with high accuracy. In order to solve the problem of sample covariance matrix estimation with limited samples, a regularization parameter optimization based on the minimum eigenvalue criterion is developed. It is different from traditional methods that aim at minimizing the mean squared error, but goes straight at the objective of optimizing the final performance of the IRC receiver. A lower bound of the minimum eigenvalue that is easier to calculate is also derived. Simulation results demonstrate that the proposed approach is effective and can approach the performance of the oracle estimator in terms of the mutual information metric.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.10461 [pdf, other]

GAN-based Image Compression with Improved RDO Process

Authors: Fanxin Xia, Jian Jin, Lili Meng, Feng Ding, Huaxiang Zhang

Abstract: GAN-based image compression schemes have shown remarkable progress lately due to their high perceptual quality at low bit rates. However, there are two main issues, including 1) the reconstructed image perceptual degeneration in color, texture, and structure as well as 2) the inaccurate entropy model. In this paper, we present a novel GAN-based image compression approach with improved rate-distortion optimization (RDO) process. To achieve this, we utilize the DISTS and MS-SSIM metrics to measure perceptual degeneration in color, texture, and structure. Besides, we absorb the discretized gaussian-laplacian-logistic mixture model (GLLMM) for entropy modeling to improve the accuracy in estimating the probability distributions of the latent representation. During the evaluation process, instead of evaluating the perceptual quality of the reconstructed image via IQA metrics, we directly conduct the Mean Opinion Score (MOS) experiment among different codecs, which fully reflects the actual perceptual results of humans. Experimental results demonstrate that the proposed method outperforms the existing GAN-based methods and the state-of-the-art hybrid codec (i.e., VVC).

Submitted 17 June, 2023; originally announced June 2023.

arXiv:2305.12994 [pdf, ps, other]

Multistatic Integrated Sensing and Communication System in Cellular Networks

Authors: Zixiang Han, Lincong Han, Xiaozhou Zhang, Yajuan Wang, Liang Ma, Mengting Lou, Jing Jin, Guangyi Liu

Abstract: A novel multistatic multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system in cellular networks is proposed. It can make use of widespread base stations (BSs) to perform cooperative sensing over a wide area. This system is important since the deployment of sensing function can be achieved based on the existing mobile communication networks at a low cost. In this system, orthogonal frequency division multiplexing (OFDM) signals transmitted from the central BS are received and processed by each of the neighboring BSs to estimate sensing object parameters. A joint data processing method is then introduced to derive the closed-form solution of the objects' position and velocity. Numerical simulation shows that the proposed multistatic system can improve the position and velocity estimation accuracy compared with monostatic and bistatic systems, demonstrating the effectiveness and promise of implementing ISAC in the upcoming fifth generation advanced (5G-A) and sixth generation (6G) mobile networks.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.11056 [pdf, other]

PETAL: Physics Emulation Through Averaged Linearizations for Solving Inverse Problems

Authors: Jihui Jin, Etienne Ollivier, Richard Touret, Matthew McKinley, Karim G. Sabra, Justin K. Romberg

Abstract: Inverse problems describe the task of recovering an underlying signal of interest given observables. Typically, the observables are related via some non-linear forward model applied to the underlying unknown signal. Inverting the non-linear forward model can be computationally expensive, as it often involves computing and inverting a linearization at a series of estimates. Rather than inverting the physics-based model, we instead train a surrogate forward model (emulator) and leverage modern auto-grad libraries to solve for the input within a classical optimization framework. Current methods to train emulators are done in a black box supervised machine learning fashion and fail to take advantage of any existing knowledge of the forward model. In this article, we propose a simple learned weighted average model that embeds linearizations of the forward model around various reference points into the model itself, explicitly incorporating known physics. Grounding the learned model with physics based linearizations improves the forward modeling accuracy and provides richer physics based gradient information during the inversion process leading to more accurate signal recovery. We demonstrate the efficacy on an ocean acoustic tomography (OAT) example that aims to recover ocean sound speed profile (SSP) variations from acoustic observations (e.g. eigenray arrival times) within simulation of ocean dynamics in the Gulf of Mexico.

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.10198 [pdf, other]

IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events

Authors: Chenyang Shi, Hanxiao Liu, Jing Jin, Wenzhuo Li, Yuzhen Li, Boyi Wei, Yibo Zhang

Abstract: Video frame interpolation aims to generate high-quality intermediate frames from boundary frames and increase frame rate. While existing linear, symmetric and nonlinear models are used to bridge the gap from the lack of inter-frame motion, they cannot reconstruct real motions. Event cameras, however, are ideal for capturing inter-frame dynamics with their extremely high temporal resolution. In this paper, we propose an event-and-frame-based video frame interpolation method named IDO-VFI that assigns varying amounts of computation for different sub-regions via optical flow guidance. The proposed method first estimates the optical flow based on frames and events, and then decides whether to further calculate the residual optical flow in those sub-regions via a Gumbel gating module according to the optical flow amplitude. Intermediate frames are eventually generated through a concise Transformer-based fusion network. Our proposed method maintains high-quality performance while reducing computation time and computational effort by 10% and 17% respectively on Vimeo90K datasets, compared with a unified process on the whole region. Moreover, our method outperforms state-of-the-art frame-only and frames-plus-events methods on multiple video frame interpolation benchmarks. Codes and models are available at https://github.com/shicy17/IDO-VFI.

Submitted 18 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2304.07143 [pdf, other]

doi 10.1109/TIV.2024.3409468

Car-Following Models: A Multidisciplinary Review

Authors: Tianya Terry Zhang, Ph. D., Peter J. Jin, Ph. D., Sean T. McQuade, Ph. D., Alexandre Bayen, Ph. D., Benedetto Piccoli

Abstract: Car-following (CF) algorithms are crucial components of traffic simulations and have been integrated into many production vehicles equipped with Advanced Driving Assistance Systems (ADAS). Insights from the model of car-following behavior help us understand the causes of various macro phenomena that arise from interactions between pairs of vehicles. Car-following models encompass multiple disciplines, including traffic engineering, physics, dynamic system control, cognitive science, machine learning, and reinforcement learning. This paper presents an extensive survey that highlights the differences, complementarities, and overlaps among microscopic traffic flow and control models based on their underlying principles and design logic. It reviews representative algorithms, ranging from theory-based kinematic models, Psycho-Physical Models, and Adaptive cruise control models to data-driven algorithms like Reinforcement Learning (RL) and Imitation Learning (IL). The manuscript discusses the strengths and limitations of these models and explores their applications in different contexts. This review synthesizes existing research across different domains to fill knowledge gaps and offer guidance for future research by identifying the latest trends in car following models and their applications.

Submitted 5 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

arXiv:2302.13092 [pdf, other]

JND-Based Perceptual Optimization For Learned Image Compression

Authors: Feng Ding, Jian Jin, Lili Meng, Weisi Lin

Abstract: Recently, learned image compression schemes have achieved remarkable improvements in image fidelity (e.g., PSNR and MS-SSIM) compared to conventional hybrid image coding ones due to their high-efficiency non-linear transform, end-to-end optimization frameworks, etc. However, few of them take the Just Noticeable Difference (JND) characteristic of the Human Visual System (HVS) into account and optimize learned image compression towards perceptual quality. To address this issue, a JND-based perceptual quality loss is proposed. Considering that the amounts of distortion in the compressed image at different training epochs under different Quantization Parameters (QPs) are different, we develop a distortion-aware adjustor. After combining them together, we can better assign the distortion in the compressed image with the guidance of JND to preserve the high perceptual quality. All these designs enable the proposed method to be flexibly applied to various learned image compression schemes with high scalability and plug-and-play advantages. Experimental results on the Kodak dataset demonstrate that the proposed method has led to better perceptual quality than the baseline model under the same bit rate.

Submitted 8 March, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

Comments: 5 pages, 5 figures, conference

arXiv:2301.12804 [pdf, ps, other]

From ORAN to Cell-Free RAN: Architecture, Performance Analysis, Testbeds and Trials

Authors: Yang Cao, Ziyang Zhang, Xinjiang Xia, Pengzhe Xin, Dongjie Liu, Kang Zheng, Mengting Lou, Jing Jin, Qixing Wang, Dongming Wang, Yongming Huang, Xiaohu You, Jiangzhou Wang

Abstract: Open radio access network (ORAN) provides an open architecture to implement radio access network (RAN) of the fifth generation (5G) and beyond mobile communications. As a key technology for the evolution to the sixth generation (6G) systems, cell-free massive multiple-input multiple-output (CF-mMIMO) can effectively improve the spectrum efficiency, peak rate and reliability of wireless communication systems. Starting from scalable implementation of CF-mMIMO, we study a cell-free RAN (CF-RAN) under the ORAN architecture. Through theoretical analysis and numerical simulation, we investigate the uplink and downlink spectral efficiencies of CF-mMIMO with the new architecture. We then discuss the implementation issues of CF-RAN under ORAN architecture, including time-frequency synchronization and over-the-air reciprocity calibration, low layer splitting, deployment of ORAN radio units (O-RU), and artificial-intelligence-based user associations. Finally, we present some representative experimental results for the uplink distributed reception and downlink coherent joint transmission of CF-RAN with commercial off-the-shelf O-RUs.

Submitted 6 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2210.04987 [pdf, other]

Loop Unrolled Shallow Equilibrium Regularizer (LUSER) -- A Memory-Efficient Inverse Problem Solver

Authors: Peimeng Guan, Jihui Jin, Justin Romberg, Mark A. Davenport

Abstract: In inverse problems we aim to reconstruct some underlying signal of interest from potentially corrupted and often ill-posed measurements. Classical optimization-based techniques proceed by optimizing a data consistency metric together with a regularizer. Current state-of-the-art machine learning approaches draw inspiration from such techniques by unrolling the iterative updates for an optimization-based solver and then learning a regularizer from data. This loop unrolling (LU) method has shown tremendous success, but often requires a deep model for the best performance leading to high memory costs during training. Thus, to address the balance between computation cost and network expressiveness, we propose an LU algorithm with shallow equilibrium regularizers (LUSER). These implicit models are as expressive as deeper convolutional networks, but far more memory efficient during training. The proposed method is evaluated on image deblurring, computed tomography (CT), as well as single-coil Magnetic Resonance Imaging (MRI) tasks and shows similar, or even better, performance while requiring up to 8 times less computational resources during training when compared against a more typical LU architecture with feedforward convolutional regularizers.

Submitted 13 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

arXiv:2208.07583 [pdf, other]

HVS-Inspired Signal Degradation Network for Just Noticeable Difference Estimation

Authors: Jian Jin, Yuan Xue, Xingxing Zhang, Lili Meng, Yao Zhao, Weisi Lin

Abstract: Significant improvement has been made on just noticeable difference (JND) modelling due to the development of deep neural networks, especially for the recently developed unsupervised-JND generation models. However, they have a major drawback that the generated JND is assessed in the real-world signal domain instead of in the perceptual domain in the human brain. There is an obvious difference when JND is assessed in such two domains since the visual signal in the real world is encoded before it is delivered into the brain with the human visual system (HVS). Hence, we propose an HVS-inspired signal degradation network for JND estimation. To achieve this, we carefully analyze the HVS perceptual process in JND subjective viewing to obtain relevant insights, and then design an HVS-inspired signal degradation (HVS-SD) network to represent the signal degradation in the HVS. On the one hand, the well learnt HVS-SD enables us to assess the JND in the perceptual domain. On the other hand, it provides more accurate prior information for better guiding JND generation. Additionally, considering the requirement that reasonable JND should not lead to visual attention shifting, a visual attention loss is proposed to control JND generation. Experimental results demonstrate that the proposed method achieves the SOTA performance for accurately estimating the redundancy of the HVS. Source code will be available at https://github.com/jianjin008/HVS-SD-JND.

Submitted 16 August, 2022; originally announced August 2022.

Comments: Submitted to IEEE Transactions on Cybernetics

arXiv:2205.13866 [pdf, other]

Task Offloading with Multi-Tier Computing Resources in Next Generation Wireless Networks

Authors: Kunlun Wang, Jiong Jin, Yang Yang, Tao Zhang, Arumugam Nallanathan, Chintha Tellambura, Bijan Jabbari

Abstract: With the development of next-generation wireless networks, the Internet of Things (IoT) is evolving towards the intelligent IoT (iIoT), where intelligent applications usually have stringent delay and jitter requirements. In order to provide low-latency services to heterogeneous users in the emerging iIoT, multi-tier computing was proposed by effectively combining edge computing and fog computing. More specifically, multi-tier computing systems compensate for cloud computing through task offloading and dispersing computing tasks to multi-tier nodes along the continuum from the cloud to things. In this paper, we investigate key techniques and directions for wireless communications and resource allocation approaches to enable task offloading in multi-tier computing systems. A multi-tier computing model, with its main functionality and optimization methods, is presented in detail. We hope that this paper will serve as a valuable reference and guide to the theoretical, algorithmic, and systematic opportunities of multi-tier computing towards next-generation wireless networks.

Submitted 27 May, 2022; originally announced May 2022.

arXiv:2204.11669 [pdf]

doi 10.1038/s41746-023-00859-y

Deep-learning-enabled Brain Hemodynamic Mapping Using Resting-state fMRI

Authors: Xirui Hou, Pengfei Guo, Puyang Wang, Peiying Liu, Doris D. M. Lin, Hongli Fan, Yang Li, Zhiliang Wei, Zixuan Lin, Dengrong Jiang, Jin Jin, Catherine Kelly, Jay J. Pillai, Judy Huang, Marco C. Pinho, Binu P. Thomas, Babu G. Welch, Denise C. Park, Vishal M. Patel, Argye E. Hillis, Hanzhang Lu

Abstract: Cerebrovascular disease is a leading cause of death globally. Prevention and early intervention are known to be the most effective forms of its management. Non-invasive imaging methods hold great promise for early stratification, but at present lack the sensitivity for personalized prognosis. Resting-state functional magnetic resonance imaging (rs-fMRI), a powerful tool previously used for mapping neural activity, is available in most hospitals. Here we show that rs-fMRI can be used to map cerebral hemodynamic function and delineate impairment. By exploiting time variations in breathing pattern during rs-fMRI, deep learning enables reproducible mapping of cerebrovascular reactivity (CVR) and bolus arrival time (BAT) of the human brain using resting-state CO2 fluctuations as a natural 'contrast media'. The deep-learning network was trained with CVR and BAT maps obtained with a reference method of CO2-inhalation MRI, which included data from young and older healthy subjects and patients with Moyamoya disease and brain tumors. We demonstrate the performance of deep-learning cerebrovascular mapping in the detection of vascular abnormalities, evaluation of revascularization effects, and vascular alterations in normal aging. In addition, cerebrovascular maps obtained with the proposed method exhibited excellent reproducibility in both healthy volunteers and stroke patients. Deep-learning resting-state vascular imaging has the potential to become a useful tool in clinical cerebrovascular imaging.

Submitted 25 April, 2022; originally announced April 2022.

Journal ref: npj Digital Medicine (2023) 116

arXiv:2203.00629 [pdf, other]

Full RGB Just Noticeable Difference (JND) Modelling

Authors: Jian Jin, Dong Yu, Weisi Lin, Lili Meng, Hao Wang, Huaxiang Zhang

Abstract: Just Noticeable Difference (JND) has many applications in multimedia signal processing, especially in visual data processing. It is generally defined as the minimum change in visual content that humans can perceive, and it has been studied for decades. However, most of the existing methods only focus on the luminance component of JND modelling and simply regard chrominance components as scaled versions of luminance. In this paper, we propose a JND model to generate the JND by taking the characteristics of full RGB channels into account, termed the RGB-JND. To this end, an RGB-JND-NET is proposed, where the visual content in full RGB channels is used to extract features for JND generation. To supervise the JND generation, an adaptive image quality assessment combination (AIC) is developed. Besides, the RGB-JND-NET also takes the visual attention into account by automatically mining the underlying relationship between visual attention and the JND, which is further used to constrain the JND spatial distribution. To the best of our knowledge, this is the first work on careful investigation of JND modelling for full-color space. Experimental results demonstrate that the RGB-JND-NET model outperforms the relevant state-of-the-art JND models.
Besides, the JND of the red and blue channels is larger than that of the green one according to the experimental results of the proposed model, which demonstrates that more changes can be tolerated in the red and blue channels, in line with the well-known fact that the human visual system is more sensitive to the green channel in comparison with the red and blue ones.

Submitted 1 March, 2022; originally announced March 2022.

Comments: 13 pages, 8 figures, 8 tables

arXiv:2201.04756 [pdf]

doi 10.1155/2022/2771085

Roadside Lidar Vehicle Detection and Tracking Using Range And Intensity Background Subtraction

Authors: Tianya Zhang, Peter J. Jin

Abstract: In this paper, we develop a solution for roadside LiDAR object detection using a combination of two unsupervised learning algorithms. The 3D point clouds are first converted into spherical coordinates and filled into the elevation-azimuth matrix using a hash function. After that, the raw LiDAR data are rearranged into new data structures to store the information of range, azimuth, and intensity. Then, the Dynamic Mode Decomposition method is applied to decompose the LiDAR data into low-rank backgrounds and sparse foregrounds based on intensity channel pattern recognition. The Coarse Fine Triangle Algorithm (CFTA) automatically finds the dividing value to separate the moving targets from static background according to range information. After intensity and range background subtraction, the foreground moving objects are detected using a density-based detector and encoded into the state-space model for tracking. The output of the proposed solution includes vehicle trajectories that can enable many mobility and safety applications. The method was validated at both path and point levels and outperformed the state-of-the-art. In contrast to the previous methods that process directly on the scattered and discrete point clouds, the dynamic classification method can establish the less sophisticated linear relationship of the 3D measurement data, which captures the spatial-temporal structure that we often desire.

Submitted 7 June, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Journal ref: Journal of Advanced Transportation, 2022

arXiv:2201.02420 [pdf, ps, other]

Auto-Weighted Layer Representation Based View Synthesis Distortion Estimation for 3-D Video Coding

Authors: Jian Jin, Xingxing Zhang, Lili Meng, Weisi Lin, Jie Liang, Huaxiang Zhang, Yao Zhao

Abstract: Recently, various view synthesis distortion estimation models have been studied to better serve 3-D video coding. However, they can hardly model the relationship quantitatively among different levels of depth changes, texture degeneration, and the view synthesis distortion (VSD), which is crucial for rate-distortion optimization and rate allocation. In this paper, an auto-weighted layer representation based view synthesis distortion estimation model is developed. Firstly, the sub-VSD (S-VSD) is defined according to the level of depth changes and their associated texture degeneration. After that, a set of theoretical derivations demonstrate that the VSD can be approximately decomposed into the S-VSDs multiplied by their associated weights. To obtain the S-VSDs, a layer-based representation of S-VSD is developed, where all the pixels with the same level of depth changes are represented with a layer to enable efficient S-VSD calculation at the layer level. Meanwhile, a nonlinear mapping function is learnt to accurately represent the relationship between the VSD and S-VSDs, automatically providing weights for S-VSDs during the VSD estimation. To learn such a function, a dataset of VSD and its associated S-VSDs is built. Experimental results show that the VSD can be accurately estimated with the weights learnt by the nonlinear mapping function once its associated S-VSDs are available. The proposed method outperforms the relevant state-of-the-art methods in both accuracy and efficiency.
The dataset and source code of the proposed method will be available at https://github.com/jianjin008/.

Submitted 7 January, 2022; originally announced January 2022.

arXiv:2112.10071 [pdf, other]

A New Image Codec Paradigm for Human and Machine Uses

Authors: Sien Chen, Jian Jin, Lili Meng, Weisi Lin, Zhuo Chen, Tsui-Shan Chang, Zhengguang Li, Huaxiang Zhang

Abstract: With the AI of Things (AIoT) development, a huge amount of visual data, e.g., images and videos, are produced in our daily work and life. These visual data are not only used for human viewing or understanding but also for machine analysis or decision-making, e.g., intelligent surveillance, automated vehicles, and many other smart city applications. To this end, a new image codec paradigm for both human and machine uses is proposed in this work. Firstly, the high-level instance segmentation map and the low-level signal features are extracted with neural networks. Then, the instance segmentation map is further represented as a profile with the proposed 16-bit gray-scale representation. After that, both 16-bit gray-scale profile and signal features are encoded with a lossless codec. Meanwhile, an image predictor is designed and trained to achieve the general-quality image reconstruction with the 16-bit gray-scale profile and signal features. Finally, the residual map between the original image and the predicted one is compressed with a lossy codec, used for high-quality image reconstruction. With such designs, on the one hand, we can achieve scalable image compression to meet the requirements of different human consumption; on the other hand, we can directly achieve several machine vision tasks at the decoder side with the decoded 16-bit gray-scale profile, e.g., object classification, detection, and segmentation.
Experimental results show that the proposed codec achieves results comparable to most learning-based codecs and outperforms the traditional codecs (e.g., BPG and JPEG2000) in terms of PSNR and MS-SSIM for image reconstruction. At the same time, it outperforms the existing codecs in terms of the mAP for object detection and segmentation.

Submitted 19 December, 2021; originally announced December 2021.

arXiv:2107.07752 [pdf, other]

doi 10.1016/j.media.2022.102700

NeXtQSM -- A complete deep learning pipeline for data-consistent quantitative susceptibility mapping trained with hybrid data

Authors: Francesco Cognolato, Kieran O'Brien, Jin Jin, Simon Robinson, Frederik B. Laun, Markus Barth, Steffen Bollmann

Abstract: Deep learning based Quantitative Susceptibility Mapping (QSM) has shown great potential in recent years, obtaining similar results to established non-learning approaches. Many current deep learning approaches are not data consistent, require in vivo training data or solve the QSM problem in consecutive steps resulting in the propagation of errors. Here we aim to overcome these limitations and develop a framework to solve the QSM processing steps jointly. We developed a new hybrid training data generation method that enables the end-to-end training for solving background field correction and dipole inversion in a data-consistent fashion using a variational network that combines the QSM model term and a learned regularizer. We demonstrate that NeXtQSM overcomes the limitations of previous deep learning methods. NeXtQSM offers a new deep learning based pipeline for computing quantitative susceptibility maps that integrates each processing step into the training and provides results that are robust and fast.

Submitted 30 August, 2023; v1 submitted 16 July, 2021; originally announced July 2021.

arXiv:2106.04817 [pdf, other]

Detecting and Correcting IMU Movements During Joint Angle Estimation

Authors: Chunzhi Yi, Feng Jiang, Baichun Wei, Chifu Yang, Zhen Ding, Jubo Jin, Jie Liu

Abstract: Inertial measurement units (IMUs) increasingly function as a basic component of wearable sensor network (WSN) systems. IMU-based joint angle estimation (JAE) is a relatively typical usage of IMUs, with extensive applications. However, the issue that IMUs move with respect to their original placement during JAE is still a research gap, and limits the robustness of deploying the technique in real-world application scenarios. In this study, we propose to detect and correct the IMU movement online in a relatively computationally lightweight manner. Particularly, we first experimentally investigate the influence of IMU movements. Second, we design the metrics for detecting IMU movements by mathematically formulating how the IMU movement affects the IMU measurements. Third, we determine the optimal thresholds of metrics by synthetic IMU data from a significantly amended simulation model. Finally, a correction method is proposed to correct the effects of IMU movements. We demonstrate our method on both synthetic data and real-user data. The results demonstrate our method is a promising solution to detecting and correcting IMU movements during JAE.

Submitted 9 June, 2021; originally announced June 2021.

Comments: 12 pages, submitted to SenSys 21

arXiv:2102.08168 [pdf, ps, other]

doi 10.1109/TCSVT.2021.3113572

Just Noticeable Difference for Deep Machine Vision

Authors: Jian Jin, Xingxing Zhang, Xin Fu, Huan Zhang, Weisi Lin, Jian Lou, Yao Zhao

Abstract: As an important perceptual characteristic of the Human Visual System (HVS), the Just Noticeable Difference (JND) has been studied for decades with image and video processing (e.g., perceptual visual signal compression). However, there is little exploration on the existence of JND for the Deep Machine Vision (DMV), although the DMV has made great strides in many machine vision tasks. In this paper, we take an initial attempt, and demonstrate that the DMV has the JND, termed as the DMV-JND. We then propose a JND model for the image classification task in the DMV. It has been discovered that the DMV can tolerate distorted images with average PSNR of only 9.56 dB (the lower the better), by generating JND via unsupervised learning with the proposed DMV-JND-NET. In particular, a semantic-guided redundancy assessment strategy is designed to restrain the magnitude and spatial distribution of the DMV-JND. Experimental results on image classification demonstrate that we successfully find the JND for deep machine vision. Our DMV-JND facilitates a possible direction for DMV-oriented image and video compression, watermarking, quality assessment, deep neural network security, and so on.

Submitted 7 January, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2021

arXiv:2102.07085 [pdf, other]

Light Field Reconstruction via Deep Adaptive Fusion of Hybrid Lenses

Authors: Jing Jin, Mantang Guo, Junhui Hou, Hui Liu, Hongkai Xiong

Abstract: This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations adaptively via the learned attention maps, leading to the final high-resolution LF image with satisfactory results on both plain textured areas and depth discontinuous boundaries. Besides, to promote the effectiveness of our method trained with simulated hybrid data on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones.
To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.

Submitted 17 June, 2023; v1 submitted 14 February, 2021; originally announced February 2021.

Comments: Accepted by IEEE TPAMI. arXiv admin note: text overlap with arXiv:1907.09640

arXiv:2010.10327 [pdf, other]

Can Steering Wheel Detect Your Driving Fatigue?

Authors: Jianchao Lu, Xi Zheng, Tianyi Zhang, Michael Sheng, Chen Wang, Jiong Jin, Shui Yu, Wanlei Zhou

Abstract: Automated Driving System (ADS) has attracted increasing attention from both industrial and academic communities due to its potential for increasing the safety, mobility and efficiency of existing transportation systems. The state-of-the-art ADS follows the human-in-the-loop (HITL) design, where the driver's anomalous behaviour is closely monitored by the system. Though many approaches have been proposed for detecting driver fatigue, they largely depend on vehicle driving parameters and facial features, which lack reliability. Approaches using physiological sensors (e.g., electroencephalogram or electrocardiogram) are either too clumsy to wear or impractical to install. In this paper, we propose a novel driver fatigue detection method by embedding surface electromyography (sEMG) sensors on a steering wheel. Compared with the existing methods, our approach is able to collect bio-signals in a non-intrusive way and detect driver fatigue at an earlier stage. The experimental results show that our approach outperforms existing methods, with weighted average F1 scores of about 90%. We also propose promising future directions to deploy this approach in real-life settings, such as applying multimodal learning using several supplementary sensors.

Submitted 18 October, 2020; originally announced October 2020.

arXiv:2009.12537 [pdf, other]

Deep Selective Combinatorial Embedding and Consistency Regularization for Light Field Super-resolution

Authors: Jing Jin, Junhui Hou, Zhiyu Zhu, Jie Chen, Sam Kwong

Abstract: Light field (LF) images acquired by hand-held devices usually suffer from low spatial resolution as the limited detector resolution has to be shared with the angular dimension. LF spatial super-resolution (SR) thus becomes an indispensable part of the LF camera processing pipeline. The high-dimensionality characteristic and complex geometrical structure of LF images make the problem more challenging than traditional single-image SR. The performance of existing methods is still limited as they fail to thoroughly explore the coherence among LF sub-aperture images (SAIs) and are insufficient in accurately preserving the scene's parallax structure. To tackle this challenge, we propose a novel learning-based LF spatial SR framework. Specifically, each SAI of an LF image is first coarsely and individually super-resolved by exploring the complementary information among SAIs with selective combinatorial geometry embedding. To achieve efficient and effective selection of the complementary information, we propose two novel sub-modules conducted hierarchically: the patch selector provides an option of retrieving similar image patches based on offline disparity estimation to handle large-disparity correlations; and the SAI selector adaptively and flexibly selects the most informative SAIs to improve the embedding efficiency. To preserve the parallax structure among the reconstructed SAIs, we subsequently append a consistency regularization network trained over a structure-aware loss function to refine the parallax relationships over the coarse estimation.
In addition, we extend the proposed method to irregular LF data. To the best of our knowledge, this is the first learning-based SR method for irregular LF data. Experimental results over both synthetic and real-world LF datasets demonstrate the significant advantage of our approach over state-of-the-art methods.

Submitted 6 October, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

Comments: 14 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2004.02215

arXiv:2007.11882 [pdf, other]

Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures

Authors: Mantang Guo, Junhui Hou, Jing Jin, Jie Chen, Lap-Pui Chau

Abstract: Coded aperture is a promising approach for capturing the 4-D light field (LF), in which the 4-D data are compressively modulated into 2-D coded measurements that are further decoded by reconstruction algorithms. The bottleneck lies in the reconstruction algorithms, resulting in rather limited reconstruction quality. To tackle this challenge, we propose a novel learning-based framework for the reconstruction of high-quality LFs from acquisitions via learned coded apertures. The proposed method incorporates the measurement observation into the deep learning framework elegantly to avoid relying entirely on data-driven priors for LF reconstruction. Specifically, we first formulate the compressive LF reconstruction as an inverse problem with an implicit regularization term. Then, we construct the regularization term with an efficient deep spatial-angular convolutional sub-network to comprehensively explore the signal distribution free from the limited representation ability and inefficiency of deterministic mathematical modeling. Experimental results show that the reconstructed LFs not only achieve much higher PSNR/SSIM but also preserve the LF parallax structure better, compared with state-of-the-art methods on both real and synthetic LF benchmarks. In addition, experiments show that our method is efficient and robust to noise, which is an essential advantage for a real camera system. The code is publicly available at https://github.com/angmt2008/LFCA.

Submitted 23 July, 2020; originally announced July 2020.

arXiv:2006.16312 [pdf, other]

Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

Authors: Xiaotian Hao, Zhaoqing Peng, Yi Ma, Guan Wang, Junqi Jin, Jianye Hao, Shan Chen, Rongquan Bai, Mingzhou Xie, Miao Xu, Zhenzhe Zheng, Chuan Yu, Han Li, Jian Xu, Kun Gai

Abstract: In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contributes revenue (e.g., places an order). However, existing advertising systems mainly focus on the immediate revenue with single ad exposures, ignoring the contribution of each exposure to the final conversion, and thus usually fall into suboptimal solutions. In this paper, we formulate the sequential advertising strategy optimization as a dynamic knapsack problem. We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization space while ensuring the solution quality. To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach. Extensive offline and online experiments show the superior performance of our approaches over state-of-the-art baselines in terms of cumulative revenue.

Submitted 29 June, 2020; originally announced June 2020.

Comments: accepted by ICML 2020

arXiv:2004.03113 [pdf, ps, other]

Generalized Quadratic Matrix Programming: A Unified Framework for Linear Precoding With Arbitrary Input Distributions

Authors: Juening Jin, Yahong Rosa Zheng, Wen Chen, Chengshan Xiao

Abstract: This paper investigates a new class of non-convex optimization, which provides a unified framework for linear precoding in single/multi-user multiple-input multiple-output (MIMO) channels with arbitrary input distributions. The new optimization is called generalized quadratic matrix programming (GQMP). Due to the nondeterministic polynomial time (NP)-hardness of GQMP problems, instead of seeking globally optimal solutions, we propose an efficient algorithm which is guaranteed to converge to a Karush-Kuhn-Tucker (KKT) point. The idea behind this algorithm is to construct explicit concave lower bounds for non-convex objective and constraint functions, and then solve a sequence of concave maximization problems until convergence. In terms of application, we consider a downlink underlay secure cognitive radio (CR) network, where each node has multiple antennas. We design linear precoders to maximize the average secrecy (sum) rate with finite-alphabet inputs and statistical channel state information (CSI) at the transmitter. The precoding problems under secure multicast/broadcast scenarios are GQMP problems, and thus they can be solved efficiently by our proposed algorithm. Several numerical examples are provided to show the efficacy of our algorithm.

Submitted 7 April, 2020; originally announced April 2020.

Comments: TSP

arXiv:2004.02215 [pdf, other]

Light Field Spatial Super-resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization

Authors: Jing Jin, Junhui Hou, Jie Chen, Sam Kwong

Abstract: Light field (LF) images acquired by hand-held devices usually suffer from low spatial resolution as the limited sampling resources have to be shared with the angular dimension. LF spatial super-resolution (SR) thus becomes an indispensable part of the LF camera processing pipeline. The high-dimensionality characteristic and complex geometrical structure of LF images make the problem more challenging than traditional single-image SR. The performance of existing methods is still limited as they fail to thoroughly explore the coherence among LF views and are insufficient in accurately preserving the parallax structure of the scene. In this paper, we propose a novel learning-based LF spatial SR framework, in which each view of an LF image is first individually super-resolved by exploring the complementary information among views with combinatorial geometry embedding. For accurate preservation of the parallax structure among the reconstructed views, a regularization network trained over a structure-aware loss function is subsequently appended to enforce correct parallax relationships over the intermediate estimation. Our proposed approach is evaluated over datasets with a large number of testing images including both synthetic and real-world scenes. Experimental results demonstrate the advantage of our approach over state-of-the-art methods, i.e., our method not only improves the average PSNR by more than 1.0 dB but also preserves more accurate parallax details, at a lower computational cost.

Submitted 5 April, 2020; originally announced April 2020.

Comments: This paper was accepted by CVPR 2020

arXiv:2003.13203 [pdf, ps, other]

Linear Precoding for Fading Cognitive Multiple Access Wiretap Channel with Finite-Alphabet Inputs

Authors: Juening Jin, Chengshan Xiao, Meixia Tao, Wen Chen

Abstract: We investigate the fading cognitive multiple access wiretap channel (CMAC-WT), in which two secondary-user transmitters (STs) send secure messages to a secondary-user receiver (SR) in the presence of an eavesdropper (ED) and subject to interference threshold constraints at multiple primary-user receivers (PRs). We design linear precoders to maximize the average secrecy sum rate for multiple-input multiple-output (MIMO) fading CMAC-WT under finite-alphabet inputs and statistical channel state information (CSI) at STs. For this non-deterministic polynomial time (NP)-hard problem, we utilize an accurate approximation of the average secrecy sum rate to reduce the computational complexity, and then present a two-layer algorithm by embedding the convex-concave procedure into an outer approximation framework. The idea behind this algorithm is to reformulate the approximated average secrecy sum rate as a difference of convex functions, and then generate a sequence of simpler relaxed sets to approach the non-convex feasible set. Subsequently, we maximize the approximated average secrecy sum rate over the sequence of relaxed sets by using the convex-concave procedure. Numerical results indicate that our proposed precoding algorithm is superior to the conventional Gaussian precoding method in the medium and high signal-to-noise ratio (SNR) regimes.

Submitted 29 March, 2020; originally announced March 2020.

Comments: TVT

arXiv:2003.11972 [pdf, ps, other]

Hybrid Precoding For Millimeter Wave MIMO Systems: A Matrix Factorization Approach

Authors: Juening Jin, Yahong Rosa Zheng, Wen Chen, Chengshan Xiao

Abstract: This paper investigates the hybrid precoding design for millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with finite-alphabet inputs. The precoding problem is a joint optimization of analog and digital precoders, and we treat it as a matrix factorization problem with power and constant modulus constraints. Our work presents three main contributions: First, we present a sufficient condition and a necessary condition for hybrid precoding schemes to realize unconstrained optimal precoders exactly when the number of data streams $N_s$ satisfies $N_s = \min\{\operatorname{rank}(H), N_{rf}\}$, where $H$ represents the channel matrix and $N_{rf}$ is the number of radio frequency (RF) chains. Second, we show that the coupled power constraint in our matrix factorization problem can be removed without loss of optimality. Third, we propose a Broyden-Fletcher-Goldfarb-Shanno (BFGS)-based algorithm to solve our matrix factorization problem using gradient and Hessian information. Several numerical results are provided to show that our proposed algorithm outperforms existing hybrid precoding algorithms.

Submitted 26 March, 2020; originally announced March 2020.

Comments: TWC

arXiv:1911.08118 [pdf]

doi 10.1002/mrm.28590

Improving FLAIR SAR efficiency at 7T by adaptive tailoring of adiabatic pulse power using deep convolutional neural networks

Authors: Shahrokh Abbasi-Rad, Kieran O'Brien, Samuel Kelly, Viktor Vegh, Anders Rodell, Yasvir Tesiram, Jin Jin, Markus Barth, Steffen Bollmann

Abstract: Purpose: The purpose of this study is to demonstrate a method for Specific Absorption Rate (SAR) reduction for T2-FLAIR MRI sequences at 7T by predicting the required adiabatic pulse power and scaling the amplitude in a slice-wise fashion. Methods: We used a TR-FOCI adiabatic pulse for spin inversion in a T2-FLAIR sequence to improve B1+ homogeneity and calculate the pulse power required for adiabaticity slice-by-slice to minimize the SAR. Drawing on the implicit B1+ inhomogeneity present in a standard localizer scan, 3D AutoAlign localizers and SA2RAGE B1+ maps were acquired in eight volunteers. A convolutional neural network (CNN) was then trained to predict the B1+ profile from the localizers and scale factors for the pulse power for each slice were calculated. The ability to predict the B1+ profile as well as how the derived pulse scale factors affected the FLAIR inversion efficiency were assessed in transverse, sagittal, and coronal orientations. Results: The predicted B1+ maps matched the measured B1+ maps with a mean difference of 4.45% across all slices. The acquisition in the transverse orientation was shown to be most effective for this method and delivered a 40% reduction in SAR along with a 1 min 30 s reduction in scan time (28%) without degradation of image quality.
Conclusion: We propose a SAR reduction technique based on the prediction of B1+ profiles from standard localizer scans using a CNN and show that scaling the inversion pulse power slice-by-slice for FLAIR sequences at 7T reduces SAR and scan time without compromising image quality.

Submitted 19 November, 2019; originally announced November 2019.

arXiv:1909.04012 [pdf]

doi 10.1088/1361-6560/ab7970

Deep Learning-based Radiomic Features for Improving Neoadjuvant Chemoradiation Response Prediction in Locally Advanced Rectal Cancer

Authors: Jie Fu, Xinran Zhong, Ning Li, Ritchell Van Dams, John Lewis, Kyunghyun Sung, Ann C. Raldow, Jing Jin, X. Sharon Qi

Abstract: Radiomic features achieve promising results in cancer diagnosis, treatment response prediction, and survival prediction. Our goal is to compare the handcrafted (explicitly designed) and deep learning (DL)-based radiomic features extracted from pre-treatment diffusion-weighted magnetic resonance images (DWIs) for predicting neoadjuvant chemoradiation treatment (nCRT) response in patients with locally advanced rectal cancer (LARC). 43 patients receiving nCRT were included. All patients underwent DWIs before nCRT and total mesorectal excision surgery 6-12 weeks after completion of nCRT. Gross tumor volume (GTV) contours were drawn by an experienced radiation oncologist on DWIs. The patient-cohort was split into the responder group (n=22) and the non-responder group (n=21) based on the post-nCRT response assessed by postoperative pathology, MRI or colonoscopy. Handcrafted and DL-based features were extracted from the apparent diffusion coefficient (ADC) map of the DWI using conventional computer-aided diagnosis methods and a pre-trained convolution neural network, respectively. Least absolute shrinkage and selection operator (LASSO)-logistic regression models were constructed using extracted features for predicting treatment response. The model performance was evaluated with repeated 20 times stratified 4-fold cross-validation using receiver operating characteristic (ROC) curves and compared using the corrected resampled t-test.
The model built with handcrafted features achieved the mean area under the ROC curve (AUC) of 0.64, while the one built with DL-based features yielded the mean AUC of 0.73. The corrected resampled t-test on AUC showed P-value < 0.05. DL-based features extracted from pre-treatment DWIs achieved significantly better classification performance compared with handcrafted features for predicting nCRT response in patients with LARC.

Submitted 9 September, 2019; originally announced September 2019.

Comments: Review in progress

Journal ref: 2020 Phys. Med. Biol

arXiv:1909.01341 [pdf, other]

Deep Coarse-to-fine Dense Light Field Reconstruction with Flexible Sampling and Geometry-aware Fusion

Authors: Jing Jin, Junhui Hou, Jie Chen, Huanqiang Zeng, Sam Kwong, Jingyi Yu

Abstract: A densely-sampled light field (LF) is highly desirable in various applications, such as 3-D reconstruction, post-capture refocusing and virtual reality. However, it is costly to acquire such data. Although many computational methods have been proposed to reconstruct a densely-sampled LF from a sparsely-sampled one, they still suffer from either low reconstruction quality, low computational efficiency, or the restriction on the regularity of the sampling pattern. To this end, we propose a novel learning-based method, which accepts sparsely-sampled LFs with irregular structures, and produces densely-sampled LFs with arbitrary angular resolution accurately and efficiently. We also propose a simple yet effective method for optimizing the sampling pattern. Our proposed method, an end-to-end trainable network, reconstructs a densely-sampled LF in a coarse-to-fine manner. Specifically, the coarse sub-aperture image (SAI) synthesis module first explores the scene geometry from an unstructured sparsely-sampled LF and leverages it to independently synthesize novel SAIs, in which a confidence-based blending strategy is proposed to fuse the information from different input SAIs, giving an intermediate densely-sampled LF. Then, the efficient LF refinement module learns the angular relationship within the intermediate result to recover the LF parallax structure. Comprehensive experimental evaluations demonstrate the superiority of our method on both real-world and synthetic LF images when compared with state-of-the-art methods.
In addition, we illustrate the benefits and advantages of the proposed approach when applied in various LF-based applications, including image-based rendering and depth estimation enhancement.

Submitted 26 September, 2020; v1 submitted 31 August, 2019; originally announced September 2019.

Comments: 17 pages, 11 figures, 10 tables

arXiv:1907.09640 [pdf, other]

Light Field Super-resolution via Attention-Guided Fusion of Hybrid Lenses

Authors: Jing Jin, Junhui Hou, Jie Chen, Sam Kwong, Jingyi Yu

Abstract: This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation; the other one constructs another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations via the learned attention maps, leading to the final high-resolution LF image. Extensive experiments demonstrate the significant superiority of our approach over state-of-the-art ones. That is, our method not only improves the PSNR by more than 2 dB, but also preserves the LF structure much better. To the best of our knowledge, this is the first end-to-end deep learning method for reconstructing a high-resolution LF image with a hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and also be beneficial to LF data storage and transmission. The code is available at https://github.com/jingjin25/LFhybridSR-Fusion.

Submitted 31 July, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

Comments: This paper was accepted by ACM MM 2020

arXiv:1906.10834 [pdf, other]

Essence Knowledge Distillation for Speech Recognition

Authors: Zhenchuan Yang, Chun Zhang, Weibin Zhang, Jianxiu Jin, Dongpeng Chen

Abstract: It is well known that a speech recognition system that combines multiple acoustic models trained on the same data significantly outperforms a single-model system. Unfortunately, real time speech recognition using a whole ensemble of models is too computationally expensive. In this paper, we propose to distill the knowledge of essence in an ensemble of models (i.e. the teacher model) to a single model (i.e. the student model) that needs much less computation to deploy. Previously, all the softened outputs of the teacher model were used to optimize the student model. We argue that not all the outputs of the ensemble need to be distilled. Some of the outputs may even contain noisy information that is useless or even harmful to the training of the student model. In addition, we propose to train the student model with a multitask learning approach by utilizing both the softened outputs of the teacher model and the correct hard labels. The proposed method achieves some surprising results on the Switchboard data set. When the student model is trained together with the correct labels and the essence knowledge from the teacher model, it not only significantly outperforms another single model with the same architecture that is trained only with the correct labels, but also consistently outperforms the teacher model that is used to generate the soft labels.

Submitted 25 June, 2019; originally announced June 2019.

Comments: 5 pages, 2 figures

arXiv:1901.01038 [pdf, other]

A Full Bayesian Approach to Sparse Network Inference using Heterogeneous Datasets

Authors: Junyang Jin, Ye Yuan, Jorge Goncalves

Abstract: Network inference has been attracting increasing attention in several fields, notably systems biology, control engineering and biomedicine. To develop a therapy, it is essential to understand the connectivity of biochemical units and the internal working mechanisms of the target network. A network is mainly characterized by its topology and internal dynamics. In particular, sparse topology and stable system dynamics are fundamental properties of many real-world networks. In recent years, kernel-based methods have been popular in the system identification community. By incorporating empirical Bayes, this framework, which we call KEB, is able to promote system stability and impose sparse network topology. Nevertheless, KEB may not be ideal for topology detection due to local optima and numerical errors. Here, therefore, we propose an alternative, data-driven, method that is designed to greatly improve inference accuracy, compared with KEB. The proposed method uses dynamical structure functions to describe networks so that the information of unmeasurable nodes is encoded in the model. A powerful numerical sampling method, namely reversible jump Markov chain Monte Carlo (RJMCMC), is applied to explore full Bayesian models effectively. Monte Carlo simulations indicate that our approach produces more accurate networks compared with KEB methods. Furthermore, simulations of a synthetic biological network demonstrate that the performance of the proposed method is superior to that of the state-of-the-art method, namely iCheMA.
The implication is that the proposed method can be used in a wide range of applications, such as controller design, machinery fault diagnosis and therapy development.

Submitted 4 January, 2019; originally announced January 2019.

arXiv:1901.00673 [pdf, other]

High Precision Variational Bayesian Inference of Sparse Linear Networks

Authors: Junyang Jin, Ye Yuan, Jorge Goncalves

Abstract: Sparse networks can be found in a wide range of applications, such as biological and communication networks. Inference of such networks from data has been receiving considerable attention lately, mainly driven by the need to understand and control internal working mechanisms. However, while most available methods have been successful at predicting many correct links, they also tend to infer many incorrect links. Precision is the ratio between the number of correctly inferred links and all inferred links, and should ideally be close to 100%. For example, 50% precision means that half of inferred links are incorrect, and there is only a 50% chance of picking a correct one. In contrast, this paper develops a method, based on variational Bayesian inference and Gaussian processes, that focuses on inferring links with very high precision. In addition, our method does not require full-state measurements and effectively promotes both system stability and network sparsity. Monte Carlo simulations illustrate that our method has 100% or nearly 100% precision, even in the presence of noise. The method should be applicable to a wide range of network inference contexts, including biological networks and power systems.

Submitted 25 December, 2019; v1 submitted 3 January, 2019; originally announced January 2019.

arXiv:1806.11301 [pdf, ps, other]

doi 10.1109/JSAC.2015.2504318

A Low-Latency List Successive-Cancellation Decoding Implementation for Polar Codes

Authors: YouZhe Fan, ChenYang Xia, Ji Chen, Chi-Ying Tsui, Jie Jin, Hui Shen, Bin Li

Abstract: Due to their provably capacity-achieving performance, polar codes have attracted a lot of research interest recently. For a good error-correcting performance, list successive-cancellation decoding (LSCD) with large list size is used to decode polar codes. However, as the complexity and delay of the list management operation rapidly increase with the list size, the overall latency of LSCD becomes large and limits the applicability of polar codes in high-throughput and latency-sensitive applications. Therefore, in this work, the low-latency implementation for LSCD with large list size is studied. Specifically, at the system level, a selective expansion method is proposed such that some of the reliable bits are not expanded to reduce the computation and latency. At the algorithmic level, a double thresholding scheme is proposed as a fast approximate-sorting method for the list management operation to reduce the LSCD latency for large list size. A VLSI architecture of the LSCD implementing the selective expansion and double thresholding scheme is then developed, and implemented using a UMC 90 nm CMOS technology. Experimental results show that, even for a large list size of 16, the proposed LSCD achieves a decoding throughput of 460 Mbps at a clock frequency of 658 MHz.

Submitted 29 June, 2018; originally announced June 2018.

Comments: 15 pages, 13 figures, 5 tables

Journal ref: IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 34, NO. 2, FEBRUARY 2016

arXiv:1805.03000 [pdf, ps, other]

doi 10.23919/FPL.2017.8056843

An Implementation of List Successive Cancellation Decoder with Large List Size for Polar Codes

Authors: ChenYang Xia, YouZhe Fan, Ji Chen, Chi-ying Tsui, ChongYang Zeng, Jie Jin, Bin Li

Abstract: Polar codes are the first class of forward error correction (FEC) codes with a provably capacity-achieving capability. Using list successive cancellation decoding (LSCD) with a large list size, the error correction performance of polar codes exceeds other well-known FEC codes. However, the hardware complexity of LSCD rapidly increases with the list size, which incurs high usage of the resources on the field programmable gate array (FPGA) and significantly impedes the practical deployment of polar codes. To alleviate the high complexity, in this paper, two low-complexity decoding schemes and the corresponding architectures for LSCD targeting FPGA implementation are proposed. The architecture is implemented in an Altera Stratix V FPGA. Measurement results show that, even with a list size of 32, the architecture is able to decode a codeword of 4096-bit polar code within 150 µs, achieving a throughput of 27 Mbps.

Submitted 8 May, 2018; originally announced May 2018.

Comments: 4 pages, 4 figures, 4 tables, Published in 27th International Conference on Field Programmable Logic and Applications (FPL), 2017

arXiv:1805.02916 [pdf, ps, other]

doi 10.1109/TSP.2018.2838554

A High-Throughput Architecture of List Successive Cancellation Polar Codes Decoder with Large List Size

Authors: ChenYang Xia, Ji Chen, YouZhe Fan, Chi-ying Tsui, Jie Jin, Hui Shen, Bin Li

Abstract: As the first kind of forward error correction (FEC) codes that achieve channel capacity, polar codes have attracted much research interest recently. Compared with other popular FEC codes, polar codes decoded by list successive cancellation decoding (LSCD) with a large list size have better error correction performance. However, due to the serial decoding nature of LSCD and the high complexity of list management (LM), the decoding latency is high, which limits the usage of polar codes in practical applications that require low latency and high throughput. In this work, we study the high-throughput implementation of LSCD with a large list size. Specifically, at the algorithmic level, to achieve a low decoding latency with moderate hardware complexity, two decoding schemes, a multi-bit double thresholding scheme and a partial G-node look-ahead scheme, are proposed. Then, a high-throughput VLSI architecture implementing the proposed algorithms is developed with optimizations on different computation modules. From the implementation results on UMC 90 nm CMOS technology, the proposed architecture achieves decoding throughputs of 1.103 Gbps, 977 Mbps and 827 Mbps when the list sizes are 8, 16 and 32, respectively.

Submitted 8 May, 2018; originally announced May 2018.

Comments: 16 pages, 13 figures, 8 tables, accepted by IEEE Transactions on Signal Processing

Journal ref: IEEE Transactions on Signal Processing ( Volume: 66, Issue: 14, 2018 )

arXiv:1705.05415 [pdf, other]

doi 10.1007/978-3-319-92384-0_16

Robotic Wireless Sensor Networks

Authors: Pradipta Ghosh, Andrea Gasparri, Jiong Jin, Bhaskar Krishnamachari

Abstract: In this chapter, we present a literature survey of an emerging, cutting-edge, and multi-disciplinary field of research at the intersection of Robotics and Wireless Sensor Networks (WSN) which we refer to as Robotic Wireless Sensor Networks (RWSN). We define an RWSN as an autonomous networked multi-robot system that aims to achieve certain sensing goals while meeting and maintaining certain communication performance requirements, through cooperative control, learning and adaptation. While both of the component areas, i.e., Robotics and WSN, are very well-known and well-explored, there exists a whole set of new opportunities and research directions at the intersection of these two fields which are relatively or even completely unexplored. One such example would be the use of a set of robotic routers to set up a temporary communication path between a sender and a receiver that uses the controlled mobility to the advantage of packet routing. We find that only a limited number of articles can be directly categorized as RWSN-related works, whereas a range of articles in the robotics and the WSN literature are also relevant to this new field of research. To connect the dots, we first identify the core problems and research trends related to RWSN such as connectivity, localization, routing, and robust flow of information. Next, we classify the existing research on RWSN as well as the relevant state of the art from the robotics and WSN communities according to the problems and trends identified in the first step.
Lastly, we analyze what is missing in the existing literature, and identify topics that require more research attention in the future.

Submitted 2 September, 2018; v1 submitted 15 May, 2017; originally announced May 2017.

arXiv:1609.09660 [pdf, other]

On Identification of Sparse Multivariable ARX Model: A Sparse Bayesian Learning Approach

Authors: J. Jin, Y. Yuan, W. Pan, D. L. T. Pham, C. J. Tomlin, A. Webb, J. Goncalves

Abstract: This paper begins by considering the identification of sparse linear time-invariant networks described by multivariable ARX models. Such models possess a relatively simple structure and are thus used as a benchmark to promote further research. With identifiability of the network guaranteed, this paper presents an identification method that infers both the Boolean structure of the network and the internal dynamics between nodes. Identification is performed directly from data without any prior knowledge of the system, including its order. The proposed method solves the identification problem using Maximum a posteriori estimation (MAP) but with inseparable penalties for complexity, both in terms of element (order of nonzero connections) and group sparsity (network topology). Such an approach is widely applied in Compressive Sensing (CS) and known as Sparse Bayesian Learning (SBL). We then propose a novel scheme that combines sparse Bayesian and group sparse Bayesian to efficiently solve the problem. The resulting algorithm has a form similar to the standard Sparse Group Lasso (SGL), while with known noise variance it simplifies to exact re-weighted SGL. The method and the developed toolbox can be applied to infer networks from a wide range of fields, including systems biology applications such as signaling and genetic regulatory networks.

Submitted 30 September, 2016; originally announced September 2016.

arXiv:1605.09543 [pdf, other]

Sparse Bayesian Inference of Multivariable ARX Networks

Authors: J. Jin, Y. Yuan, A. Webb, J. Goncalves

Abstract: Increasing attention has recently been given to the inference of sparse networks. In biology, for example, most molecules only bind to a small number of other molecules, leading to sparse molecular interaction networks. To achieve sparseness, a common approach consists of applying weighted penalties to the number of links between nodes in the network and the complexity of the dynamics of existing links. The selection of proper weights, however, is non-trivial. Alternatively, this paper proposes a novel data-driven method, called GESBL, that is able to penalise both network sparsity and model complexity without any tuning. GESBL combines Sparse Bayesian Learning (SBL) and Group Sparse Bayesian Learning (GSBL) to introduce penalties for complexity, both in terms of element (system order of nonzero connections) and group sparsity (network topology). The paper considers a class of sparse linear time-invariant networks where the dynamics are represented by multivariable ARX models. Data generated from sparse random ARX networks and synthetic gene regulatory networks indicate that our method, on average, considerably outperforms existing state-of-the-art methods. The proposed method can be applied to a wide range of fields, from systems biology applications in signalling and genetic regulatory networks to power systems.

Submitted 3 January, 2019; v1 submitted 31 May, 2016; originally announced May 2016.

Showing 1–47 of 47 results for author: Jin, J