Search | arXiv e-print repository

Synchronous Multi-modal Semantic CommunicationSystem with Packet-level Coding

Authors: Yun Tian, Jingkai Ying, Zhijin Qin, Ye Jin, Xiaoming Tao

Abstract: Although the semantic communication with joint semantic-channel coding design has shown promising performance in transmitting data of different modalities over physical layer channels, the synchronization and packet-level forward error correction of multimodal semantics have not been well studied. Due to the independent design of semantic encoders, synchronizing multimodal features in both the sem… ▽ More Although the semantic communication with joint semantic-channel coding design has shown promising performance in transmitting data of different modalities over physical layer channels, the synchronization and packet-level forward error correction of multimodal semantics have not been well studied. Due to the independent design of semantic encoders, synchronizing multimodal features in both the semantic and time domains is a challenging problem. In this paper, we take the facial video and speech transmission as an example and propose a Synchronous Multimodal Semantic Communication System (SyncSC) with Packet-Level Coding. To achieve semantic and time synchronization, 3D Morphable Mode (3DMM) coefficients and text are transmitted as semantics, and we propose a semantic codec that achieves similar quality of reconstruction and synchronization with lower bandwidth, compared to traditional methods. To protect semantic packets under the erasure channel, we propose a packet-Level Forward Error Correction (FEC) method, called PacSC, that maintains a certain visual quality performance even at high packet loss rates. Particularly, for text packets, a text packet loss concealment module, called TextPC, based on Bidirectional Encoder Representations from Transformers (BERT) is proposed, which significantly improves the performance of traditional FEC methods. The simulation results show that our proposed SyncSC reduce transmission overhead and achieve high-quality synchronous transmission of video and speech over the packet loss network. △ Less

Submitted 8 August, 2024; originally announced August 2024.

Comments: 12 pages, 9 figures

arXiv:2408.03651 [pdf, other]

SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology

Authors: Mingya Zhang, Liang Wang, Limei Gu, Zhao Li, Yaohui Wang, Tingshen Ling, Xianping Tao

Abstract: The semantic segmentation task in pathology plays an indispensable role in assisting physicians in determining the condition of tissue lesions. Foundation models, such as the SAM (Segment Anything Model) and SAM2, exhibit exceptional performance in instance segmentation within everyday natural scenes. SAM-PATH has also achieved impressive results in semantic segmentation within the field of pathol… ▽ More The semantic segmentation task in pathology plays an indispensable role in assisting physicians in determining the condition of tissue lesions. Foundation models, such as the SAM (Segment Anything Model) and SAM2, exhibit exceptional performance in instance segmentation within everyday natural scenes. SAM-PATH has also achieved impressive results in semantic segmentation within the field of pathology. However, in computational pathology, the models mentioned above still have the following limitations. The pre-trained encoder models suffer from a scarcity of pathology image data; SAM and SAM2 are not suitable for semantic segmentation. In this paper, we have designed a trainable Kolmogorov-Arnold Networks(KAN) classification module within the SAM2 workflow, and we have introduced the largest pretrained vision encoder for histopathology (UNI) to date. Our proposed framework, SAM2-PATH, augments SAM2's capability to perform semantic segmentation in digital pathology autonomously, eliminating the need for human provided input prompts. The experimental results demonstrate that, after fine-tuning the KAN classification module and decoder, Our dataset has achieved competitive results on publicly available pathology data. The code has been open-sourced and can be found at the following address: https://github.com/simzhangbest/SAM2PATH. △ Less

Submitted 7 August, 2024; originally announced August 2024.

Comments: 6 pages , 3 figures

arXiv:2406.10469 [pdf, other]

Object-Attribute-Relation Representation based Video Semantic Communication

Authors: Qiyuan Du, Yiping Duan, Qianqian Yang, Xiaoming Tao, Mérouane Debbah

Abstract: With the rapid growth of multimedia data volume, there is an increasing need for efficient video transmission in applications such as virtual reality and future video streaming services. Semantic communication is emerging as a vital technique for ensuring efficient and reliable transmission in low-bandwidth, high-noise settings. However, most current approaches focus on joint source-channel coding… ▽ More With the rapid growth of multimedia data volume, there is an increasing need for efficient video transmission in applications such as virtual reality and future video streaming services. Semantic communication is emerging as a vital technique for ensuring efficient and reliable transmission in low-bandwidth, high-noise settings. However, most current approaches focus on joint source-channel coding (JSCC) that depends on end-to-end training. These methods often lack an interpretable semantic representation and struggle with adaptability to various downstream tasks. In this paper, we introduce the use of object-attribute-relation (OAR) as a semantic framework for videos to facilitate low bit-rate coding and enhance the JSCC process for more effective video transmission. We utilize OAR sequences for both low bit-rate representation and generative video reconstruction. Additionally, we incorporate OAR into the image JSCC model to prioritize communication resources for areas more critical to downstream tasks. Our experiments on traffic surveillance video datasets assess the effectiveness of our approach in terms of video transmission performance. The empirical findings demonstrate that our OAR-based video coding method not only outperforms H.265 coding at lower bit-rates but also synergizes with JSCC to deliver robust and efficient video transmission. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2405.10514 [pdf, other]

Secrecy Performance Analysis of Multi-Functional RIS-Assisted NOMA Networks

Authors: Yingjie Pei, Wanli Ni, Jin Xu, Xinwei Yue, Xiaofeng Tao, Dusit Niyato

Abstract: Although reconfigurable intelligent surface (RIS) can improve the secrecy communication performance of wireless users, it still faces challenges such as limited coverage and double-fading effect. To address these issues, in this paper, we utilize a novel multi-functional RIS (MF-RIS) to enhance the secrecy performance of wireless users, and investigate the physical layer secrecy problem in non-ort… ▽ More Although reconfigurable intelligent surface (RIS) can improve the secrecy communication performance of wireless users, it still faces challenges such as limited coverage and double-fading effect. To address these issues, in this paper, we utilize a novel multi-functional RIS (MF-RIS) to enhance the secrecy performance of wireless users, and investigate the physical layer secrecy problem in non-orthogonal multiple access (NOMA) networks. Specifically, we derive closed-form expressions for the secrecy outage probability (SOP) and secrecy throughput of users in the MF-RIS-assisted NOMA networks with external and internal eavesdroppers. The asymptotic expressions for SOP and secrecy diversity order are also analyzed under high signal-to-noise ratio (SNR) conditions. Additionally, we examine the impact of receiver hardware limitations and error transmission-induced imperfect successive interference cancellation (SIC) on the secrecy performance. Numerical results indicate that: i) under the same power budget, the secrecy performance achieved by MF-RIS significantly outperforms active RIS and simultaneously transmitting and reflecting RIS; ii) with increasing power budget, residual interference caused by imperfect SIC surpasses thermal noise as the primary factor affecting secrecy capacity; and iii) deploying additional elements at the MF-RIS brings significant secrecy enhancements for the external eavesdropping scenario, in contrast to the internal eavesdropping case. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 14 pages, 9 figures, submitted to IEEE transactions on wireless communication

arXiv:2405.08096 [pdf, other]

Semantic MIMO Systems for Speech-to-Text Transmission

Authors: Zhenzi Weng, Zhijin Qin, Huiqiang Xie, Xiaoming Tao, Khaled B. Letaief

Abstract: Semantic communications have been utilized to execute numerous intelligent tasks by transmitting task-related semantic information instead of bits. In this article, we propose a semantic-aware speech-to-text transmission system for the single-user multiple-input multiple-output (MIMO) and multi-user MIMO communication scenarios, named SAC-ST. Particularly, a semantic communication system to serve… ▽ More Semantic communications have been utilized to execute numerous intelligent tasks by transmitting task-related semantic information instead of bits. In this article, we propose a semantic-aware speech-to-text transmission system for the single-user multiple-input multiple-output (MIMO) and multi-user MIMO communication scenarios, named SAC-ST. Particularly, a semantic communication system to serve the speech-to-text task at the receiver is first designed, which compresses the semantic information and generates the low-dimensional semantic features by leveraging the transformer module. In addition, a novel semantic-aware network is proposed to facilitate the transmission with high semantic fidelity to identify the critical semantic information and guarantee it is recovered accurately. Furthermore, we extend the SAC-ST with a neural network-enabled channel estimation network to mitigate the dependence on accurate channel state information and validate the feasibility of SAC-ST in practical communication environments. Simulation results will show that the proposed SAC-ST outperforms the communication framework without the semantic-aware network for speech-to-text transmission over the MIMO channels in terms of the speech-to-text metrics, especially in the low signal-to-noise regime. Moreover, the SAC-ST with the developed channel estimation network is comparable to the SAC-ST with perfect channel state information. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.16913 [pdf, other]

DE-CGAN: Boosting rTMS Treatment Prediction with Diversity Enhancing Conditional Generative Adversarial Networks

Authors: Matthew Squires, Xiaohui Tao, Soman Elangovan, Raj Gururajan, Haoran Xie, Xujuan Zhou, Yuefeng Li, U Rajendra Acharya

Abstract: Repetitive Transcranial Magnetic Stimulation (rTMS) is a well-supported, evidence-based treatment for depression. However, patterns of response to this treatment are inconsistent. Emerging evidence suggests that artificial intelligence can predict rTMS treatment outcomes for most patients using fMRI connectivity features. While these models can reliably predict treatment outcomes for many patients… ▽ More Repetitive Transcranial Magnetic Stimulation (rTMS) is a well-supported, evidence-based treatment for depression. However, patterns of response to this treatment are inconsistent. Emerging evidence suggests that artificial intelligence can predict rTMS treatment outcomes for most patients using fMRI connectivity features. While these models can reliably predict treatment outcomes for many patients for some underrepresented fMRI connectivity measures DNN models are unable to reliably predict treatment outcomes. As such we propose a novel method, Diversity Enhancing Conditional General Adversarial Network (DE-CGAN) for oversampling these underrepresented examples. DE-CGAN creates synthetic examples in difficult-to-classify regions by first identifying these data points and then creating conditioned synthetic examples to enhance data diversity. Through empirical experiments we show that a classification model trained using a diversity enhanced training set outperforms traditional data augmentation techniques and existing benchmark results. This work shows that increasing the diversity of a training dataset can improve classification model performance. Furthermore, this work provides evidence for the utility of synthetic patients providing larger more robust datasets for both AI researchers and psychiatrists to explore variable relationships. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.09222 [pdf, other]

A Robust Semantic Communication System for Image

Authors: Xiang Peng, Zhijin Qin, Xiaoming Tao, Jianhua Lu, Khaled B. Letaief

Abstract: Semantic communications have gained significant attention as a promising approach to address the transmission bottleneck, especially with the continuous development of 6G techniques. Distinct from the well investigated physical channel impairments, this paper focuses on semantic impairments in image, particularly those arising from adversarial perturbations. Specifically, we propose a novel metric… ▽ More Semantic communications have gained significant attention as a promising approach to address the transmission bottleneck, especially with the continuous development of 6G techniques. Distinct from the well investigated physical channel impairments, this paper focuses on semantic impairments in image, particularly those arising from adversarial perturbations. Specifically, we propose a novel metric for quantifying the intensity of semantic impairment and develop a semantic impairment dataset. Furthermore, we introduce a deep learning enabled semantic communication system, termed as DeepSC-RI, to enhance the robustness of image transmission, which incorporates a multi-scale semantic extractor with a dual-branch architecture for extracting semantics with varying granularity, thereby improving the robustness of the system. The fine-grained branch incorporates a semantic importance evaluation module to identify and prioritize crucial semantics, while the coarse-grained branch adopts a hierarchical approach for capturing the robust semantics. These two streams of semantics are seamlessly integrated via an advanced cross-attention-based semantic fusion module. Experimental results demonstrate the superior performance of DeepSC-RI under various levels of semantic impairment intensity. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: 6 pages

arXiv:2403.09157 [pdf, ps, other]

VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

Authors: Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao

Abstract: In the field of medical image segmentation, models based on both CNN and Transformer have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to exploit the semantic information within images fully. On the other hand, the quadratic computational complexity poses a challenge for Transformers. Recently, State Space Models… ▽ More In the field of medical image segmentation, models based on both CNN and Transformer have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to exploit the semantic information within images fully. On the other hand, the quadratic computational complexity poses a challenge for Transformers. Recently, State Space Models (SSMs), such as Mamba, have been recognized as a promising method. They not only demonstrate superior performance in modeling long-range interactions, but also preserve a linear computational complexity. Inspired by the Mamba architecture, We proposed Vison Mamba-UNetV2, the Visual State Space (VSS) Block is introduced to capture extensive contextual information, the Semantics and Detail Infusion (SDI) is introduced to augment the infusion of low-level and high-level features. We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB and ETIS-LaribPolypDB public datasets. The results indicate that VM-UNetV2 exhibits competitive performance in medical image segmentation tasks. Our code is available at https://github.com/nobodyplayer1/VM-UNetV2. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: 12 pages, 4 figures

arXiv:2402.13073 [pdf, other]

Towards Intelligent Communications: Large Model Empowered Semantic Communications

Authors: Huiqiang Xie, Zhijin Qin, Xiaoming Tao, Zhu Han

Abstract: Deep learning enabled semantic communications have shown great potential to significantly improve transmission efficiency and alleviate spectrum scarcity, by effectively exchanging the semantics behind the data. Recently, the emergence of large models, boasting billions of parameters, has unveiled remarkable human-like intelligence, offering a promising avenue for advancing semantic communication… ▽ More Deep learning enabled semantic communications have shown great potential to significantly improve transmission efficiency and alleviate spectrum scarcity, by effectively exchanging the semantics behind the data. Recently, the emergence of large models, boasting billions of parameters, has unveiled remarkable human-like intelligence, offering a promising avenue for advancing semantic communication by enhancing semantic understanding and contextual understanding. This article systematically investigates the large model-empowered semantic communication systems from potential applications to system design. First, we propose a new semantic communication architecture that seamlessly integrates large models into semantic communication through the introduction of a memory module. Then, the typical applications are illustrated to show the benefits of the new architecture. Besides, we discuss the key designs in implementing the new semantic communication systems from module design to system training. Finally, the potential research directions are identified to boost the large model-empowered semantic communications. △ Less

Submitted 19 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: 7 pages, 6 figures

arXiv:2402.02950 [pdf, other]

Semantic Entropy Can Simultaneously Benefit Transmission Efficiency and Channel Security of Wireless Semantic Communications

Authors: Yankai Rong, Guoshun Nan, Minwei Zhang, Sihan Chen, Songtao Wang, Xuefei Zhang, Nan Ma, Shixun Gong, Zhaohui Yang, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek

Abstract: Recently proliferated deep learning-based semantic communications (DLSC) focus on how transmitted symbols efficiently convey a desired meaning to the destination. However, the sensitivity of neural models and the openness of wireless channels cause the DLSC system to be extremely fragile to various malicious attacks. This inspires us to ask a question: "Can we further exploit the advantages of tra… ▽ More Recently proliferated deep learning-based semantic communications (DLSC) focus on how transmitted symbols efficiently convey a desired meaning to the destination. However, the sensitivity of neural models and the openness of wireless channels cause the DLSC system to be extremely fragile to various malicious attacks. This inspires us to ask a question: "Can we further exploit the advantages of transmission efficiency in wireless semantic communications while also alleviating its security disadvantages?". Keeping this in mind, we propose SemEntropy, a novel method that answers the above question by exploring the semantics of data for both adaptive transmission and physical layer encryption. Specifically, we first introduce semantic entropy, which indicates the expectation of various semantic scores regarding the transmission goal of the DLSC. Equipped with such semantic entropy, we can dynamically assign informative semantics to Orthogonal Frequency Division Multiplexing (OFDM) subcarriers with better channel conditions in a fine-grained manner. We also use the entropy to guide semantic key generation to safeguard communications over open wireless channels. By doing so, both transmission efficiency and channel security can be simultaneously improved. Extensive experiments over various benchmarks show the effectiveness of the proposed SemEntropy. We discuss the reason why our proposed method benefits secure transmission of DLSC, and also give some interesting findings, e.g., SemEntropy can keep the semantic accuracy remain 95% with 60% less transmission. △ Less

Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 13 pages, 12 figures

arXiv:2401.17575 [pdf, other]

Can We Improve Channel Reciprocity via Loop-back Compensation for RIS-assisted Physical Layer Key Generation

Authors: Ningya Xu, Guoshun Nan, Xiaofeng Tao, Na Li, Pengxuan Mao, Tianyuan Yang

Abstract: Reconfigurable intelligent surface (RIS) facilitates the extraction of unpredictable channel features for physical layer key generation (PKG), securing communications among legitimate users with symmetric keys. Previous works have demonstrated that channel reciprocity plays a crucial role in generating symmetric keys in PKG systems, whereas, in reality, reciprocity is greatly affected by hardware… ▽ More Reconfigurable intelligent surface (RIS) facilitates the extraction of unpredictable channel features for physical layer key generation (PKG), securing communications among legitimate users with symmetric keys. Previous works have demonstrated that channel reciprocity plays a crucial role in generating symmetric keys in PKG systems, whereas, in reality, reciprocity is greatly affected by hardware interference and RIS-based jamming attacks. This motivates us to propose LoCKey, a novel approach that aims to improve channel reciprocity by mitigating interferences and attacks with a loop-back compensation scheme, thus maximizing the secrecy performance of the PKG system. Specifically, our proposed LoCKey is capable of effectively compensating for the CSI non-reciprocity by the combination of transmit-back signal value and error minimization module. Firstly, we introduce the entire flowchart of our method and provide an in-depth discussion of each step. Following that, we delve into a theoretical analysis of the performance optimizations when our LoCKey is applied for CSI reciprocity enhancement. Finally, we conduct experiments to verify the effectiveness of the proposed LoCKey in improving channel reciprocity under various interferences for RIS-assisted wireless communications. The results demonstrate a significant improvement in both the rate of key generation assisted by the RIS and the consistency of the generated keys, showing great potential for the practical deployment of our LoCKey in future wireless systems. △ Less

Submitted 30 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted by ICC 2024

arXiv:2401.00859 [pdf, ps, other]

Federated Multi-View Synthesizing for Metaverse

Authors: Yiyu Guo, Zhijin Qin, Xiaoming Tao, Geoffrey Ye Li

Abstract: The metaverse is expected to provide immersive entertainment, education, and business applications. However, virtual reality (VR) transmission over wireless networks is data- and computation-intensive, making it critical to introduce novel solutions that meet stringent quality-of-service requirements. With recent advances in edge intelligence and deep learning, we have developed a novel multi-view… ▽ More The metaverse is expected to provide immersive entertainment, education, and business applications. However, virtual reality (VR) transmission over wireless networks is data- and computation-intensive, making it critical to introduce novel solutions that meet stringent quality-of-service requirements. With recent advances in edge intelligence and deep learning, we have developed a novel multi-view synthesizing framework that can efficiently provide computation, storage, and communication resources for wireless content delivery in the metaverse. We propose a three-dimensional (3D)-aware generative model that uses collections of single-view images. These single-view images are transmitted to a group of users with overlapping fields of view, which avoids massive content transmission compared to transmitting tiles or whole 3D models. We then present a federated learning approach to guarantee an efficient learning process. The training performance can be improved by characterizing the vertical and horizontal data samples with a large latent feature space, while low-latency communication can be achieved with a reduced number of transmitted parameters during federated learning. We also propose a federated transfer learning framework to enable fast domain adaptation to different target domains. Simulation results have demonstrated the effectiveness of our proposed federated multi-view synthesizing framework for VR content delivery. △ Less

Submitted 18 December, 2023; originally announced January 2024.

arXiv:2311.04685 [pdf, other]

An End-Cloud Computing Enabled Surveillance Video Transmission System

Authors: Dingxi Yang, Zhijin Qin, Liting Wang, Xiaoming Tao, Fang Cui, Hengjiang Wang

Abstract: The enormous data volume of video poses a significant burden on the network. Particularly, transferring high-definition surveillance videos to the cloud consumes a significant amount of spectrum resources. To address these issues, we propose a surveillance video transmission system enabled by end-cloud computing. Specifically, the cameras actively down-sample the original video and then a redundan… ▽ More The enormous data volume of video poses a significant burden on the network. Particularly, transferring high-definition surveillance videos to the cloud consumes a significant amount of spectrum resources. To address these issues, we propose a surveillance video transmission system enabled by end-cloud computing. Specifically, the cameras actively down-sample the original video and then a redundant frame elimination module is employed to further reduce the data volume of surveillance videos. Then we develop a key-frame assisted video super-resolution model to reconstruct the high-quality video at the cloud side. Moreover, we propose a strategy of extracting key frames from source videos for better reconstruction performance by utilizing the peak signal-to-noise ratio (PSNR) of adjacent frames to measure the propagation distance of key frame information. Simulation results show that the developed system can effectively reduce the data volume by the end-cloud collaboration and outperforms existing video super-resolution models significantly in terms of PSNR and structural similarity index (SSIM). △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2309.02653 [pdf, other]

Passive Eavesdropping Can Significantly Slow Down RIS-Assisted Secret Key Generation

Authors: Ningya Xu, Guoshun Nan, Xiaofeng Tao

Abstract: Reconfigurable Intelligent Surface (RIS) assisted physical layer key generation has shown great potential to secure wireless communications by smartly controlling signals such as phase and amplitude. However, previous studies mainly focus on RIS adjustment under ideal conditions, while the correlation between the eavesdropping channel and the legitimate channel, a more practical setting in the rea… ▽ More Reconfigurable Intelligent Surface (RIS) assisted physical layer key generation has shown great potential to secure wireless communications by smartly controlling signals such as phase and amplitude. However, previous studies mainly focus on RIS adjustment under ideal conditions, while the correlation between the eavesdropping channel and the legitimate channel, a more practical setting in the real world, is still largely under-explored for the key generation. To fill this gap, this paper aims to maximize the RIS-assisted physical-layer secret key generation by optimizing the RIS units switching under the eavesdropping channel. Firstly, we theoretically show that passive eavesdropping significantly reduces RIS-assisted secret key generation. Keeping this in mind, we then introduce a mathematical formulation to maximize the key generation rate and provide a step-by-step analysis. Extensive experiments show the effectiveness of our method in benefiting the secret key capacity under the eavesdropping channel. We also observe that the key randomness, and unmatched key rate, two metrics that measure the secret key quality, are also significantly improved, potentially paving the way to RIS-assisted key generation in real-world scenarios. △ Less

Submitted 14 October, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted by Globecom 2023

arXiv:2308.05760 [pdf, other]

Unified Statistical Channel Modeling and performance analysis of Vertical Underwater Wireless Optical Communication Links considering Turbulence-Induced Fading

Authors: Dongling Xu, Xiang Yi, Yalçn Ata, Xinyue Tao, Yuxuan Li, Peng Yue

Abstract: The reliability of a vertical underwater wireless optical communication (UWOC) network is seriously impacted by turbulence-induced fading due to fluctuations in the water temperature and salinity, which vary with depth. To better assess the vertical UWOC system performances, an accurate probability distribution function (PDF) model that can describe this fading is indispensable. In view of the lim… ▽ More The reliability of a vertical underwater wireless optical communication (UWOC) network is seriously impacted by turbulence-induced fading due to fluctuations in the water temperature and salinity, which vary with depth. To better assess the vertical UWOC system performances, an accurate probability distribution function (PDF) model that can describe this fading is indispensable. In view of the limitations of theoretical and experimental studies, this paper is the first to establish a more accurate modeling scheme for wave optics simulation (WOS) by fully considering the constraints of sampling conditions on multi-phase screen parameters. On this basis, we complete the modeling of light propagation in a vertical oceanic turbulence channel and subsequently propose a unified statistical model named mixture Weibull-generalized Gamma (WGG) distribution model to characterize turbulence-induced fading in vertical links. Interestingly, the WGG model is shown to provide a perfect fit with the acquired data under all considered channel conditions. We further show that the application of the WGG model leads to closed-form and analytically tractable expressions for key UWOC system performance metrics such as the average bit-error rate (BER). The presented results give valuable insight into the practical aspects of development of UWOC networks. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2307.02900 [pdf, ps, other]

Meta Federated Reinforcement Learning for Distributed Resource Allocation

Authors: Zelin Ji, Zhijin Qin, Xiaoming Tao

Abstract: In cellular networks, resource allocation is usually performed in a centralized way, which brings huge computation complexity to the base station (BS) and high transmission overhead. This paper explores a distributed resource allocation method that aims to maximize energy efficiency (EE) while ensuring the quality of service (QoS) for users. Specifically, in order to address wireless channel condi… ▽ More In cellular networks, resource allocation is usually performed in a centralized way, which brings huge computation complexity to the base station (BS) and high transmission overhead. This paper explores a distributed resource allocation method that aims to maximize energy efficiency (EE) while ensuring the quality of service (QoS) for users. Specifically, in order to address wireless channel conditions, we propose a robust meta federated reinforcement learning (\textit{MFRL}) framework that allows local users to optimize transmit power and assign channels using locally trained neural network models, so as to offload computational burden from the cloud server to the local users, reducing transmission overhead associated with local channel state information. The BS performs the meta learning procedure to initialize a general global model, enabling rapid adaptation to different environments with improved EE performance. The federated learning technique, based on decentralized reinforcement learning, promotes collaboration and mutual benefits among users. Analysis and numerical results demonstrate that the proposed \textit{MFRL} framework accelerates the reinforcement learning process, decreases transmission overhead, and offloads computation, while outperforming the conventional decentralized reinforcement learning algorithm in terms of convergence speed and EE performance across various scenarios. △ Less

Submitted 9 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Submitted to TWC

arXiv:2305.08247 [pdf]

A Fast and Robust Camera-IMU Online Calibration Method For Localization System

Authors: Xiaowen Tao, Pengxiang Meng, Bing Zhu, Jian Zhao

Abstract: Autonomous driving has spurred the development of sensor fusion techniques, which combine data from multiple sensors to improve system performance. In particular, localization system based on sensor fusion , such as Visual Simultaneous Localization and Mapping (VSLAM), is an important component in environment perception, and is the basis of decision-making and motion control for intelligent vehicl… ▽ More Autonomous driving has spurred the development of sensor fusion techniques, which combine data from multiple sensors to improve system performance. In particular, localization system based on sensor fusion , such as Visual Simultaneous Localization and Mapping (VSLAM), is an important component in environment perception, and is the basis of decision-making and motion control for intelligent vehicles. The accuracy of extrinsic calibration parameters between camera and IMU has significant effect on the positioning precision when performing VSLAM system. Currently, existing methods are time-consuming using complex optimization methods and sensitive to noise and outliers due to off-calibration, which can negatively impact system performance. To address these problems, this paper presents a fast and robust camera-IMU online calibration method based space coordinate transformation constraints and SVD (singular Value Decomposition) tricks. First, constraint equations are constructed based on equality of rotation and transformation matrices between camera frames and IMU coordinates at different moments. Secondly, the external parameters of the camera-IMU are solved using quaternion transformation and SVD techniques. Finally, the proposed method is validated using ROS platform, where images from the camera and velocity, acceleration, and angular velocity data from the IMU are recorded in a ROS bag file. The results showed that the proposed method can achieve robust and reliable camera-IMU online calibration parameters results with less tune consuming and less uncertainty. △ Less

Submitted 14 May, 2023; originally announced May 2023.

arXiv:2305.07220 [pdf, other]

Physical-layer Adversarial Robustness for Deep Learning-based Semantic Communications

Authors: Guoshun Nan, Zhichun Li, Jinli Zhai, Qimei Cui, Gong Chen, Xin Du, Xuefei Zhang, Xiaofeng Tao, Zhu Han, Tony Q. S. Quek

Abstract: End-to-end semantic communications (ESC) rely on deep neural networks (DNN) to boost communication efficiency by only transmitting the semantics of data, showing great potential for high-demand mobile applications. We argue that central to the success of ESC is the robust interpretation of conveyed semantics at the receiver side, especially for security-critical applications such as automatic driv… ▽ More End-to-end semantic communications (ESC) rely on deep neural networks (DNN) to boost communication efficiency by only transmitting the semantics of data, showing great potential for high-demand mobile applications. We argue that central to the success of ESC is the robust interpretation of conveyed semantics at the receiver side, especially for security-critical applications such as automatic driving and smart healthcare. However, robustifying semantic interpretation is challenging as ESC is extremely vulnerable to physical-layer adversarial attacks due to the openness of wireless channels and the fragileness of neural models. Toward ESC robustness in practice, we ask the following two questions: Q1: For attacks, is it possible to generate semantic-oriented physical-layer adversarial attacks that are imperceptible, input-agnostic and controllable? Q2: Can we develop a defense strategy against such semantic distortions and previously proposed adversaries? To this end, we first present MobileSC, a novel semantic communication framework that considers the computation and memory efficiency in wireless environments. Equipped with this framework, we propose SemAdv, a physical-layer adversarial perturbation generator that aims to craft semantic adversaries over the air with the abovementioned criteria, thus answering the Q1. To better characterize the realworld effects for robust training and evaluation, we further introduce a novel adversarial training method SemMixed to harden the ESC against SemAdv attacks and existing strong threats, thus answering the Q2. Extensive experiments on three public benchmarks verify the effectiveness of our proposed methods against various physical adversarial attacks. We also show some interesting findings, e.g., our MobileSC can even be more robust than classical block-wise communication systems in the low SNR regime. △ Less

Submitted 11 May, 2023; originally announced May 2023.

Comments: 17 pages, 28 figures, accepted by IEEE jsac

arXiv:2305.06543 [pdf, other]

QoE-based Semantic-Aware Resource Allocation for Multi-Task Networks

Authors: Lei Yan, Zhijin Qin, Chunfeng Li, Rui Zhang, Yongzhao Li, Xiaoming Tao

Abstract: By transmitting task-related information only, semantic communications yield significant performance gains over conventional communications. However, the lack of mature semantic theory about semantic information quantification and performance evaluation makes it challenging to perform resource allocation for semantic communications, especially when multiple tasks coexist in the network. To cope wi… ▽ More By transmitting task-related information only, semantic communications yield significant performance gains over conventional communications. However, the lack of mature semantic theory about semantic information quantification and performance evaluation makes it challenging to perform resource allocation for semantic communications, especially when multiple tasks coexist in the network. To cope with this challenge, we propose a quality-of-experience (QoE) based semantic-aware resource allocation method for multi-task networks in this paper. First, semantic entropy is defined to quantify the semantic information for different tasks, and the relationship between semantic entropy and Shannon entropy is analyzed. Then, we develop a novel QoE model to formulate the semantic-aware resource allocation in terms of semantic compression, channel assignment, and transmit power. The compatibility of the formulated problem with conventional communications is further demonstrated. To solve this problem, we decouple it into two subproblems and solved them by a developed deep Q-network (DQN) based method and a proposed low-complexity matching algorithm, respectively. Finally, simulation results validate the effectiveness and superiority of the proposed method, as well as its compatibility with conventional communications. △ Less

Submitted 8 April, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: This work has been accepted by IEEE Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2205.14530

arXiv:2303.16523 [pdf, other]

Boosting Physical Layer Black-Box Attacks with Semantic Adversaries in Semantic Communications

Authors: Zeju Li, Xinghan Liu, Guoshun Nan, Jinfei Zhou, Xinchen Lyu, Qimei Cui, Xiaofeng Tao

Abstract: End-to-end semantic communication (ESC) system is able to improve communication efficiency by only transmitting the semantics of the input rather than raw bits. Although promising, ESC has also been shown susceptible to the crafted physical layer adversarial perturbations due to the openness of wireless channels and the sensitivity of neural models. Previous works focus more on the physical layer… ▽ More End-to-end semantic communication (ESC) system is able to improve communication efficiency by only transmitting the semantics of the input rather than raw bits. Although promising, ESC has also been shown susceptible to the crafted physical layer adversarial perturbations due to the openness of wireless channels and the sensitivity of neural models. Previous works focus more on the physical layer white-box attacks, while the challenging black-box ones, as more practical adversaries in real-world cases, are still largely under-explored. To this end, we present SemBLK, a novel method that can learn to generate destructive physical layer semantic attacks for an ESC system under the black-box setting, where the adversaries are imperceptible to humans. Specifically, 1) we first introduce a surrogate semantic encoder and train its parameters by exploring a limited number of queries to an existing ESC system. 2) Equipped with such a surrogate encoder, we then propose a novel semantic perturbation generation method to learn to boost the physical layer attacks with semantic adversaries. Experiments on two public datasets show the effectiveness of our proposed SemBLK in attacking the ESC system under the black-box setting. Finally, we provide case studies to visually justify the superiority of our physical layer semantic perturbations. △ Less

Submitted 30 March, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: accepted by ICC2023

arXiv:2301.08376 [pdf, ps, other]

Resource Optimization for Semantic-Aware Networks with Task Offloading

Authors: Zelin Ji, Zhijin Qin, Xiaoming Tao, Han Zhu

Abstract: The limited capabilities of user equipment restrict the local implementation of computation-intensive applications. Edge computing, especially the edge intelligence system, enables local users to offload the computation tasks to the edge servers to reduce the computational energy consumption of user equipment and accelerate fast task execution. However, the limited bandwidth of upstream channels m… ▽ More The limited capabilities of user equipment restrict the local implementation of computation-intensive applications. Edge computing, especially the edge intelligence system, enables local users to offload the computation tasks to the edge servers to reduce the computational energy consumption of user equipment and accelerate fast task execution. However, the limited bandwidth of upstream channels may increase the task transmission latency and affect the computation offloading performance. To overcome the challenge arising from scarce wireless communication resources, we propose a semantic-aware multi-modal task offloading system that facilitates the extraction and offloading of semantic task information to edge servers. To cope with the different tasks with multi-modal data, a unified quality of experience (QoE) criterion is designed. Furthermore, a proximal policy optimization-based multi-agent reinforcement learning algorithm (MAPPO) is proposed to coordinate the resource management for wireless communications and computation in a distributed and low computational complexity manner. Simulation results verify that the proposed MAPPO algorithm outperforms other reinforcement learning algorithms and fixed schemes in terms of task execution speed and the overall system QoE. △ Less

Submitted 29 January, 2024; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: Submitted to TWC, in major revision

arXiv:2301.05837 [pdf, ps, other]

Environment Semantics Aided Wireless Communications: A Case Study of mmWave Beam Prediction and Blockage Prediction

Authors: Yuwen Yang, Feifei Gao, Xiaoming Tao, Guangyi Liu, Chengkang Pan

Abstract: In this paper, we propose an environment semantics aided wireless communication framework to reduce the transmission latency and improve the transmission reliability, where semantic information is extracted from environment image data, selectively encoded based on its task-relevance, and then fused to make decisions for channel related tasks. As a case study, we develop an environment semantics ai… ▽ More In this paper, we propose an environment semantics aided wireless communication framework to reduce the transmission latency and improve the transmission reliability, where semantic information is extracted from environment image data, selectively encoded based on its task-relevance, and then fused to make decisions for channel related tasks. As a case study, we develop an environment semantics aided network architecture for mmWave communication systems, which is composed of a semantic feature extraction network, a feature selection algorithm, a task-oriented encoder, and a decision network. With images taken from street cameras and user's identification information as the inputs, the environment semantics aided network architecture is trained to predict the optimal beam index and the blockage state for the base station. It is seen that without pilot training or the costly beam scans, the environment semantics aided network architecture can realize extremely efficient beam prediction and timely blockage prediction, thus meeting requirements for ultra-reliable and low-latency communications (URLLCs). Simulation results demonstrate that compared with existing works, the proposed environment semantics aided network architecture can reduce system overheads such as storage space and computational cost while achieving satisfactory prediction accuracy and protecting user privacy. △ Less

Submitted 14 January, 2023; originally announced January 2023.

arXiv:2301.04552 [pdf, other]

A Generalized Semantic Communication System: from Sources to Channels

Authors: Zhijin Qin, Feifei Gao, Bo Lin, Xiaoming Tao, Guangyi Liu, Chengkang Pan

Abstract: Semantic communication is regarded as the breakthrough beyond the Shannon paradigm, which transmits only semantic information to significantly improve communication efficiency. This article introduces a framework for generalized semantic communication system, which exploits the semantic information in both the multimodal source and the wireless channel environment. Subsequently, the developed deep… ▽ More Semantic communication is regarded as the breakthrough beyond the Shannon paradigm, which transmits only semantic information to significantly improve communication efficiency. This article introduces a framework for generalized semantic communication system, which exploits the semantic information in both the multimodal source and the wireless channel environment. Subsequently, the developed deep learning enabled end-to-end semantic communication and environment semantics aided wireless communication techniques are demonstrated through two examples. The article concludes with several research challenges to boost the development of such a generalized semantic communication system. △ Less

Submitted 14 March, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

arXiv:2209.11519 [pdf, other]

doi 10.1109/LWC.2023.3255221

Vector Quantized Semantic Communication System

Authors: Qifan Fu, Huiqiang Xie, Zhijin Qin, Gregory Slabaugh, Xiaoming Tao

Abstract: Although analog semantic communication systems have received considerable attention in the literature, there is less work on digital semantic communication systems. In this paper, we develop a deep learning (DL)-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC. Specifically, we propose a convolutional neural network (CNN)-based transceiver to extr… ▽ More Although analog semantic communication systems have received considerable attention in the literature, there is less work on digital semantic communication systems. In this paper, we develop a deep learning (DL)-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC. Specifically, we propose a convolutional neural network (CNN)-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces to perform semantic feature quantization, rendering the data compatible with digital communication systems. Furthermore, we employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator. Experimental results demonstrate that the proposed VQ-DeepSC is more robustness than BPG in digital communication systems and has comparable MS-SSIM performance to the DeepJSCC method. △ Less

Submitted 12 April, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: This five pages article has been accepted for publication in IEEE Wireless Communications Letters. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/LWC.2023.3255221

arXiv:2209.07689 [pdf, other]

A Unified Multi-Task Semantic Communication System for Multimodal Data

Authors: Guangyi Zhang, Qiyu Hu, Zhijin Qin, Yunlong Cai, Guanding Yu, Xiaoming Tao

Abstract: Task-oriented semantic communications have achieved significant performance gains. However, the employed deep neural networks in semantic communications have to be updated when the task is changed or multiple models need to be stored for performing different tasks. To address this issue, we develop a unified deep learning-enabled semantic communication system (U-DeepSC), where a unified end-to-end… ▽ More Task-oriented semantic communications have achieved significant performance gains. However, the employed deep neural networks in semantic communications have to be updated when the task is changed or multiple models need to be stored for performing different tasks. To address this issue, we develop a unified deep learning-enabled semantic communication system (U-DeepSC), where a unified end-to-end framework can serve many different tasks with multiple modalities of data. As the number of required features varies from task to task, we propose a vector-wise dynamic scheme that can adjust the number of transmitted symbols for different tasks. Moreover, our dynamic scheme can also adaptively adjust the number of transmitted features under different channel conditions to optimize the transmission efficiency. Particularly, we devise a lightweight feature selection module (FSM) to evaluate the importance of feature vectors, which can hierarchically drop redundant feature vectors and significantly accelerate the inference. To reduce the transmission overhead, we then design a unified codebook for feature representation to serve multiple tasks, where only the indices of these task-specific features in the codebook are transmitted. According to the simulation results, the proposed U-DeepSC achieves comparable performance to the task-oriented semantic communication system designed for a specific task but with significant reduction in both transmission overhead and model size. △ Less

Submitted 8 June, 2024; v1 submitted 15 September, 2022; originally announced September 2022.

arXiv:2207.11409 [pdf, other]

Computer Vision Aided mmWave Beam Alignment in V2X Communications

Authors: Weihua Xu, Feifei Gao, Xiaoming Tao, Jianhua Zhang, Ahmed Alkhateeb

Abstract: Visual information, captured for example by cameras, can effectively reflect the sizes and locations of the environmental scattering objects, and thereby can be used to infer communications parameters like propagation directions, receiver powers, as well as the blockage status. In this paper, we propose a novel beam alignment framework that leverages images taken by cameras installed at the mobile… ▽ More Visual information, captured for example by cameras, can effectively reflect the sizes and locations of the environmental scattering objects, and thereby can be used to infer communications parameters like propagation directions, receiver powers, as well as the blockage status. In this paper, we propose a novel beam alignment framework that leverages images taken by cameras installed at the mobile user. Specifically, we utilize 3D object detection techniques to extract the size and location information of the dynamic vehicles around the mobile user, and design a deep neural network (DNN) to infer the optimal beam pair for transceivers without any pilot signal overhead. Moreover, to avoid performing beam alignment too frequently or too slowly, a beam coherence time (BCT) prediction method is developed based on the vision information. This can effectively improve the transmission rate compared with the beam alignment approach with the fixed BCT. Simulation results show that the proposed vision based beam alignment methods outperform the existing LIDAR and vision based solutions, and demand for much lower hardware cost and communication overhead. △ Less

Submitted 23 July, 2022; originally announced July 2022.

Comments: 32 pages, 16 figures

arXiv:2206.02596 [pdf, other]

A Robust Deep Learning Enabled Semantic Communication System for Text

Authors: Xiang Peng, Zhijin Qin, Danlan Huang, Xiaoming Tao, Jianhua Lu, Guangyi Liu, Chengkang Pan

Abstract: With the advent of the 6G era, the concept of semantic communication has attracted increasing attention. Compared with conventional communication systems, semantic communication systems are not only affected by physical noise existing in the wireless communication environment, e.g., additional white Gaussian noise, but also by semantic noise due to the source and the nature of deep learning-based… ▽ More With the advent of the 6G era, the concept of semantic communication has attracted increasing attention. Compared with conventional communication systems, semantic communication systems are not only affected by physical noise existing in the wireless communication environment, e.g., additional white Gaussian noise, but also by semantic noise due to the source and the nature of deep learning-based systems. In this paper, we elaborate on the mechanism of semantic noise. In particular, we categorize semantic noise into two categories: literal semantic noise and adversarial semantic noise. The former is caused by written errors or expression ambiguity, while the latter is caused by perturbations or attacks added to the embedding layer via the semantic channel. To prevent semantic noise from influencing semantic communication systems, we present a robust deep learning enabled semantic communication system (R-DeepSC) that leverages a calibrated self-attention mechanism and adversarial training to tackle semantic noise. Compared with baseline models that only consider physical noise for text transmission, the proposed R-DeepSC achieves remarkable performance in dealing with semantic noise under different signal-to-noise ratios. △ Less

Submitted 6 June, 2022; originally announced June 2022.

Comments: 6 pages

arXiv:2205.14070 [pdf, other]

Multi-criteria Decision-making of Intelligent Vehicles under Fault Condition Enhancing Public-private Partnership

Authors: Xin Tao, Mladen Čičić, Jonas Mårtensson

Abstract: With the development of vehicular technologies on automation, electrification, and digitalization, vehicles are becoming more intelligent while being exposed to more complex, uncertain, and frequently occurring faults. In this paper, we look into the maintenance planning of an operating vehicle under fault condition and formulate it as a multi-criteria decision-making problem. The maintenance deci… ▽ More With the development of vehicular technologies on automation, electrification, and digitalization, vehicles are becoming more intelligent while being exposed to more complex, uncertain, and frequently occurring faults. In this paper, we look into the maintenance planning of an operating vehicle under fault condition and formulate it as a multi-criteria decision-making problem. The maintenance decisions are generated by route searching in road networks and evaluated based on risk assessment considering the uncertainty of vehicle breakdowns. Particularly, we consider two criteria, namely the risk of public time loss and the risk of mission delay, representing the concerns of the public sector and the private sector, respectively. A public time loss model is developed to evaluate the traffic congestion caused by a vehicle breakdown and the corresponding towing process. The Pareto optimal set of non-dominated decisions is derived by evaluating the risk of the decisions. We demonstrate the relevance of the problem and the effectiveness of the proposed method by numerical experiments derived from real-world scenarios. The experiments show that neglecting the risk of vehicle breakdown on public roads can cause a high risk of public time loss in dense traffic flow. With the proposed method, alternate decisions can be derived to reduce the risks of public time loss significantly with a low increase in the risk of mission delay. This study aims at catalyzing public-private partnership through collaborative decision-making between the private sector and the public sector, thus archiving a more sustainable transportation system in the future. △ Less

Submitted 27 May, 2022; originally announced May 2022.

arXiv:2205.04603 [pdf, other]

Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis

Authors: Zhenzi Weng, Zhijin Qin, Xiaoming Tao, Chengkang Pan, Guangyi Liu, Geoffrey Ye Li

Abstract: In this paper, we develop a deep learning based semantic communication system for speech transmission, named DeepSC-ST. We take the speech recognition and speech synthesis as the transmission tasks of the communication system, respectively. First, the speech recognition-related semantic features are extracted for transmission by a joint semantic-channel encoder and the text is recovered at the rec… ▽ More In this paper, we develop a deep learning based semantic communication system for speech transmission, named DeepSC-ST. We take the speech recognition and speech synthesis as the transmission tasks of the communication system, respectively. First, the speech recognition-related semantic features are extracted for transmission by a joint semantic-channel encoder and the text is recovered at the receiver based on the received semantic features, which significantly reduces the required amount of data transmission without performance degradation. Then, we perform speech synthesis at the receiver, which dedicates to re-generate the speech signals by feeding the recognized text and the speaker information into a neural network module. To enable the DeepSC-ST adaptive to dynamic channel environments, we identify a robust model to cope with different channel conditions. According to the simulation results, the proposed DeepSC-ST significantly outperforms conventional communication systems and existing DL-enabled communication systems, especially in the low signal-to-noise ratio (SNR) regime. A software demonstration is further developed as a proof-of-concept of the DeepSC-ST. △ Less

Submitted 31 March, 2023; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: arXiv admin note: text overlap with arXiv:2107.11190

arXiv:2202.04554 [pdf]

Real-time decision-making for autonomous vehicles under faults

Authors: Xin Tao, Zhao Yuan

Abstract: This paper addresses the challenges of decision-making for autonomous vehicles under faults during a transport mission. A real-time decision-making problem of vehicle routing planning considering maintenance management is formulated as an optimization problem. The goal is to minimize the total time to finish the transport mission by selecting the optimal workshop to conduct the maintenance and the… ▽ More This paper addresses the challenges of decision-making for autonomous vehicles under faults during a transport mission. A real-time decision-making problem of vehicle routing planning considering maintenance management is formulated as an optimization problem. The goal is to minimize the total time to finish the transport mission by selecting the optimal workshop to conduct the maintenance and the corresponding routes. Two methods are proposed to solve the optimization problem based on two methods of fundamental solutions: (1) Mixed Integer Programming; (2) Dijkstra's algorithm. We adapt these methods to solve the optimization problem and consider improving the computation efficiency. Numerical studies of test cases of highway and urban scenarios are presented to demonstrate the proposed methods, which show the feasibility and high computational efficiency of both methods. △ Less

Submitted 9 February, 2022; originally announced February 2022.

Comments: Accepted by the IEEE 9th International Conference on Industrial Engineering and Applications (ICIEA 2022). Date: November 2021. Email: taoxin@kth.se, zhaoyuan@hi.is, zhaoyuan.epslab@gmail.com

arXiv:2201.01389 [pdf, other]

Semantic Communications: Principles and Challenges

Authors: Zhijin Qin, Xiaoming Tao, Jianhua Lu, Wen Tong, Geoffrey Ye Li

Abstract: Semantic communication, regarded as the breakthrough beyond the Shannon paradigm, aims at the successful transmission of semantic information conveyed by the source rather than the accurate reception of each single symbol or bit regardless of its meaning. This article provides an overview on semantic communications. After a brief review of Shannon information theory, we discuss semantic communicat… ▽ More Semantic communication, regarded as the breakthrough beyond the Shannon paradigm, aims at the successful transmission of semantic information conveyed by the source rather than the accurate reception of each single symbol or bit regardless of its meaning. This article provides an overview on semantic communications. After a brief review of Shannon information theory, we discuss semantic communications with theory, framework, and system design enabled by deep learning. Different from the symbol/bit error rate used for measuring conventional communication systems, performance metrics for semantic communications are also discussed. The article concludes with several open questions in semantic communications. △ Less

Submitted 27 June, 2022; v1 submitted 30 December, 2021; originally announced January 2022.

arXiv:2112.10255 [pdf, other]

Task-Oriented Multi-User Semantic Communications

Authors: Huiqiang Xie, Zhijin Qin, Xiaoming Tao, Khaled B. Letaief

Abstract: While semantic communications have shown the potential in the case of single-modal single-users, its applications to the multi-user scenario remain limited. In this paper, we investigate deep learning (DL) based multi-user semantic communication systems for transmitting single-modal data and multimodal data, respectively. We will adopt three intelligent tasks, including, image retrieval, machine t… ▽ More While semantic communications have shown the potential in the case of single-modal single-users, its applications to the multi-user scenario remain limited. In this paper, we investigate deep learning (DL) based multi-user semantic communication systems for transmitting single-modal data and multimodal data, respectively. We will adopt three intelligent tasks, including, image retrieval, machine translation, and visual question answering (VQA) as the transmission goal of semantic communication systems. We will then propose a Transformer based unique framework to unify the structure of transmitters for different tasks. For the single-modal multi-user system, we will propose two Transformer based models, named, DeepSC-IR and DeepSC-MT, to perform image retrieval and machine translation, respectively. In this case, DeepSC-IR is trained to optimize the distance in embedding space between images and DeepSC-MT is trained to minimize the semantic errors by recovering the semantic meaning of sentences. For the multimodal multi-user system, we develop a Transformer enabled model, named, DeepSC-VQA, for the VQA task by extracting text-image information at the transmitters and fusing it at the receiver. In particular, a novel layer-wise Transformer is designed to help fuse multimodal data by adding connection between each of the encoder and decoder layers. Numerical results will show that the proposed models are superior to traditional communications in terms of the robustness to channels, computational complexity, transmission delay, and the task-execution performance at various task-specific metrics. △ Less

Submitted 19 December, 2021; originally announced December 2021.

Comments: 14 pages, 11 figures

arXiv:2110.08664 [pdf, other]

Finding Critical Scenarios for Automated Driving Systems: A Systematic Literature Review

Authors: Xinhai Zhang, Jianbo Tao, Kaige Tan, Martin Törngren, José Manuel Gaspar Sánchez, Muhammad Rusyadi Ramli, Xin Tao, Magnus Gyllenhammar, Franz Wotawa, Naveen Mohan, Mihai Nica, Hermann Felbinger

Abstract: Scenario-based approaches have been receiving a huge amount of attention in research and engineering of automated driving systems. Due to the complexity and uncertainty of the driving environment, and the complexity of the driving task itself, the number of possible driving scenarios that an ADS or ADAS may encounter is virtually infinite. Therefore it is essential to be able to reason about the i… ▽ More Scenario-based approaches have been receiving a huge amount of attention in research and engineering of automated driving systems. Due to the complexity and uncertainty of the driving environment, and the complexity of the driving task itself, the number of possible driving scenarios that an ADS or ADAS may encounter is virtually infinite. Therefore it is essential to be able to reason about the identification of scenarios and in particular critical ones that may impose unacceptable risk if not considered. Critical scenarios are particularly important to support design, verification and validation efforts, and as a basis for a safety case. In this paper, we present the results of a systematic literature review in the context of autonomous driving. The main contributions are: (i) introducing a comprehensive taxonomy for critical scenario identification methods; (ii) giving an overview of the state-of-the-art research based on the taxonomy encompassing 86 papers between 2017 and 2020; and (iii) identifying open issues and directions for further research. The provided taxonomy comprises three main perspectives encompassing the problem definition (the why), the solution (the methods to derive scenarios), and the assessment of the established scenarios. In addition, we discuss open research issues considering the perspectives of coverage, practicability, and scenario space explosion. △ Less

Submitted 16 October, 2021; originally announced October 2021.

Comments: 37 pages, 24 figures

arXiv:2106.01871 [pdf, other]

doi 10.1016/j.ress.2021.108251

Short-term Maintenance Planning of Autonomous Trucks for Minimizing Economic Risk

Authors: Xin Tao, Jonas Mårtensson, Håkan Warnquist, Anna Pernestål

Abstract: New autonomous driving technologies are emerging every day and some of them have been commercially applied in the real world. While benefiting from these technologies, autonomous trucks are facing new challenges in short-term maintenance planning, which directly influences the truck operator's profit. In this paper, we implement a vehicle health management system by addressing the maintenance plan… ▽ More New autonomous driving technologies are emerging every day and some of them have been commercially applied in the real world. While benefiting from these technologies, autonomous trucks are facing new challenges in short-term maintenance planning, which directly influences the truck operator's profit. In this paper, we implement a vehicle health management system by addressing the maintenance planning issues of autonomous trucks on a transport mission. We also present a maintenance planning model using a risk-based decision-making method, which identifies the maintenance decision with minimal economic risk of the truck company. Both availability losses and maintenance costs are considered when evaluating the economic risk. We demonstrate the proposed model by numerical experiments illustrating real-world scenarios. In the experiments, compared to three baseline methods, the expected economic risk of the proposed method is reduced by up to $47\%$. We also conduct sensitivity analyses of different model parameters. The analyses show that the economic risk significantly decreases when the estimation accuracy of remaining useful life, the maximal allowed time of delivery delay before order cancellation, or the number of workshops increases. The experiment results contribute to identifying future research and development attentions of autonomous trucks from an economic perspective. △ Less

Submitted 28 May, 2021; originally announced June 2021.

Comments: 22 pages, 13 figures, journal, submitted to Reliability Engineering & System Safety, under review

arXiv:2103.11907 [pdf, other]

MEC-Empowered Non-Terrestrial Network for 6G Wide-Area Time-Sensitive Internet of Things

Authors: Chengxiao Liu, Wei Feng, Xiaoming Tao, Ning Ge

Abstract: In the upcoming sixth-generation (6G) era, the demand for constructing a wide-area time-sensitive Internet of Things (IoT) keeps increasing. As conventional cellular technologies are hard to be directly used for wide-area time-sensitive IoT, it is beneficial to use non-terrestrial infrastructures including satellites and unmanned aerial vehicles (UAVs), where a non-terrestrial network (NTN) can be… ▽ More In the upcoming sixth-generation (6G) era, the demand for constructing a wide-area time-sensitive Internet of Things (IoT) keeps increasing. As conventional cellular technologies are hard to be directly used for wide-area time-sensitive IoT, it is beneficial to use non-terrestrial infrastructures including satellites and unmanned aerial vehicles (UAVs), where a non-terrestrial network (NTN) can be built under the cell-free architecture. Driven by the time-sensitive requirements and uneven distribution of IoT devices, the NTN is required to be empowered by mobile edge computing (MEC) while providing oasis-oriented on-demand coverage for the devices. Nevertheless, communication and MEC systems are coupled with each other under the influence of complex propagation environment in the MEC-empowered NTN, which makes it hard to orchestrate the resources. In this paper, we propose a process-oriented framework to design the communication and MEC systems in a time-division manner. Under this framework, the large-scale channel state information (CSI) is used to characterize the complex propagation environment with an affordable cost, where a non-convex latency minimization problem is formulated. After that, the approximated problem is given and it can be decomposed into subproblems. These subproblems are further solved in an iterative way. Simulation results demonstrate the superiority of the proposed process-oriented scheme over other algorithms. These results also indicate that the payload deployments of UAVs should be appropriately predesigned to improve the efficiency of resource use. Furthermore, the results imply that it is advantageous to integrate NTN with MEC for wide-area time-sensitive IoT. △ Less

Submitted 21 June, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: 29 pages, 11 figures, one column

arXiv:2103.03010 [pdf, other]

Perceptual Image Restoration with High-Quality Priori and Degradation Learning

Authors: Chaoyi Han, Yiping Duan, Xiaoming Tao, Jianhua Lu

Abstract: Perceptual image restoration seeks for high-fidelity images that most likely degrade to given images. For better visual quality, previous work proposed to search for solutions within the natural image manifold, by exploiting the latent space of a generative model. However, the quality of generated images are only guaranteed when latent embedding lies close to the prior distribution. In this work,… ▽ More Perceptual image restoration seeks for high-fidelity images that most likely degrade to given images. For better visual quality, previous work proposed to search for solutions within the natural image manifold, by exploiting the latent space of a generative model. However, the quality of generated images are only guaranteed when latent embedding lies close to the prior distribution. In this work, we propose to restrict the feasible region within the prior manifold. This is accomplished with a non-parametric metric for two distributions: the Maximum Mean Discrepancy (MMD). Moreover, we model the degradation process directly as a conditional distribution. We show that our model performs well in measuring the similarity between restored and degraded images. Instead of optimizing the long criticized pixel-wise distance over degraded images, we rely on such model to find visual pleasing images with high probability. Our simultaneous restoration and enhancement framework generalizes well to real-world complicated degradation types. The experimental results on perceptual quality and no-reference image quality assessment (NR-IQA) demonstrate the superior performance of our method. △ Less

Submitted 4 March, 2021; originally announced March 2021.

arXiv:2101.07471 [pdf, other]

doi 10.1109/TCOMM.2021.3107947

Deep Learning Based Channel Covariance Matrix Estimation with User Location and Scene Images

Authors: Weihua Xu, Feifei Gao, Jianhua Zhang, Xiaoming Tao, Ahmed Alkhateeb

Abstract: Channel covariance matrix (CCM) is one critical parameter for designing the communications systems. In this paper, a novel framework of the deep learning (DL) based CCM estimation is proposed that exploits the perception of the transmission environment without any channel sample or the pilot signals. Specifically, as CCM is affected by the user's movement, we design a deep neural network (DNN) to… ▽ More Channel covariance matrix (CCM) is one critical parameter for designing the communications systems. In this paper, a novel framework of the deep learning (DL) based CCM estimation is proposed that exploits the perception of the transmission environment without any channel sample or the pilot signals. Specifically, as CCM is affected by the user's movement, we design a deep neural network (DNN) to predict CCM from user location and user speed, and the corresponding estimation method is named as ULCCME. A location denoising method is further developed to reduce the positioning error and improve the robustness of ULCCME. For cases when user location information is not available, we propose an interesting way that uses the environmental 3D images to predict the CCM, and the corresponding estimation method is named as SICCME. Simulation results show that both the proposed methods are effective and will benefit the subsequent channel estimation. △ Less

Submitted 18 April, 2023; v1 submitted 19 January, 2021; originally announced January 2021.

Comments: 31 pages, 20 figures

arXiv:2007.08826 [pdf, ps, other]

Revisiting Rubik's Cube: Self-supervised Learning with Volume-wise Transformation for 3D Medical Image Segmentation

Authors: Xing Tao, Yuexiang Li, Wenhui Zhou, Kai Ma, Yefeng Zheng

Abstract: Deep learning highly relies on the quantity of annotated data. However, the annotations for 3D volumetric medical data require experienced physicians to spend hours or even days for investigation. Self-supervised learning is a potential solution to get rid of the strong requirement of training data by deeply exploiting raw data information. In this paper, we propose a novel self-supervised learnin… ▽ More Deep learning highly relies on the quantity of annotated data. However, the annotations for 3D volumetric medical data require experienced physicians to spend hours or even days for investigation. Self-supervised learning is a potential solution to get rid of the strong requirement of training data by deeply exploiting raw data information. In this paper, we propose a novel self-supervised learning framework for volumetric medical images. Specifically, we propose a context restoration task, i.e., Rubik's cube++, to pre-train 3D neural networks. Different from the existing context-restoration-based approaches, we adopt a volume-wise transformation for context permutation, which encourages network to better exploit the inherent 3D anatomical information of organs. Compared to the strategy of training from scratch, fine-tuning from the Rubik's cube++ pre-trained weight can achieve better performance in various tasks such as pancreas segmentation and brain tissue segmentation. The experimental results show that our self-supervised learning method can significantly improve the accuracy of 3D deep learning networks on volumetric medical datasets without the use of extra data. △ Less

Submitted 17 July, 2020; originally announced July 2020.

Comments: Accepted by MICCAI 2020

arXiv:2003.07719 [pdf]

Toward a Wearable RFID System for Real-Time Activity Recognition Using Radio Patterns

Authors: Liang Wang, Tao Gu, Xianping Tao, Jian Lu

Abstract: Elderly care is one of the many applications supported by real-time activity recognition systems. Traditional approaches use cameras, body sensor networks, or radio patterns from various sources for activity recognition. However, these approaches are limited due to ease-of-use, coverage, or privacy preserving issues. In this paper, we present a novel wearable Radio Frequency Identification (RFID)… ▽ More Elderly care is one of the many applications supported by real-time activity recognition systems. Traditional approaches use cameras, body sensor networks, or radio patterns from various sources for activity recognition. However, these approaches are limited due to ease-of-use, coverage, or privacy preserving issues. In this paper, we present a novel wearable Radio Frequency Identification (RFID) system aims at providing an easy-to-use solution with high detection coverage. Our system uses passive tags which are maintenance-free and can be embedded into the clothes to reduce the wearing and maintenance efforts. A small RFID reader is also worn on the user's body to extend the detection coverage as the user moves. We exploit RFID radio patterns and extract both spatial and temporal features to characterize various activities. We also address the issues of false negative of tag readings and tag/antenna calibration, and design a fast online recognition system. Antenna and tag selection is done automatically to explore the minimum number of devices required to achieve target accuracy. We develop a prototype system which consists of a wearable RFID system and a smartphone to demonstrate the working principles, and conduct experimental studies with four subjects over two weeks. The results show that our system achieves a high recognition accuracy of 93.6 percent with a latency of 5 seconds. Additionally, we show that the system only requires two antennas and four tagged body parts to achieve a high recognition accuracy of 85 percent. △ Less

Submitted 8 March, 2020; originally announced March 2020.

arXiv:1912.02796 [pdf]

doi 10.1109/ITSC.2018.8569945

Architecting Safety Supervisors for High Levels of Automated Driving

Authors: Martin Törngren, Xinhai Zhang, Naveen Mohan, Matthias Becker, Lars Svensson, Xin Tao, De-Jiu Chen, Jonas Westman

Abstract: The complexity of automated driving poses challenges for providing safety assurance. Focusing on the architecting of an Autonomous Driving Intelligence (ADI), i.e. the computational intelligence, sensors and communication needed for high levels of automated driving, we investigate so called safety supervisors that complement the nominal functionality. We present a problem formulation and a functio… ▽ More The complexity of automated driving poses challenges for providing safety assurance. Focusing on the architecting of an Autonomous Driving Intelligence (ADI), i.e. the computational intelligence, sensors and communication needed for high levels of automated driving, we investigate so called safety supervisors that complement the nominal functionality. We present a problem formulation and a functional architecture of a fault-tolerant ADI that encompasses a nominal and a safety supervisor channel. We then discuss the sources of hazardous events, the division of responsibilities among the channels, and when the supervisor should take over. We conclude with identified directions for further work. △ Less

Submitted 5 December, 2019; originally announced December 2019.

Journal ref: 2018 21st International Conference on Intelligent Transportation Systems (ITSC)

arXiv:1908.05094 [pdf, other]

Segmentation of Multimodal Myocardial Images Using Shape-Transfer GAN

Authors: Xumin Tao, Hongrong Wei, Wufeng Xue, Dong Ni

Abstract: Myocardium segmentation of late gadolinium enhancement (LGE) Cardiac MR images is important for evaluation of infarction regions in clinical practice. The pathological myocardium in LGE images presents distinctive brightness and textures compared with the healthy tissues, making it much more challenging to be segment. Instead, the balanced-Steady State Free Precession (bSSFP) cine images show clea… ▽ More Myocardium segmentation of late gadolinium enhancement (LGE) Cardiac MR images is important for evaluation of infarction regions in clinical practice. The pathological myocardium in LGE images presents distinctive brightness and textures compared with the healthy tissues, making it much more challenging to be segment. Instead, the balanced-Steady State Free Precession (bSSFP) cine images show clearly boundaries and can be easily segmented. Given this fact, we propose a novel shape-transfer GAN for LGE images, which can 1) learn to generate realistic LGE images from bSSFP with the anatomical shape preserved, and 2) learn to segment the myocardium of LGE images from these generated images. It's worth to note that no segmentation label of the LGE images is used during this procedure. We test our model on dataset from the Multi-sequence Cardiac MR Segmentation Challenge. The results show that the proposed Shape-Transfer GAN can achieve accurate myocardium masks of LGE images. △ Less

Submitted 14 August, 2019; originally announced August 2019.

Comments: accepted by STACOM 21019

arXiv:1802.09958 [pdf, ps, other]

Use of Two-Mode Circuitry and Optimal Energy-Efficient Power Control Under Target Delay-Outage Constraints

Authors: Jinkun Xu, Yu Chen, Hao Chen, Qimei Cui, Xiaofeng Tao

Abstract: An accurate energy efficiency analytical model based on a two-mode circuitry was recently proposed; and the model showed that the use of this circuitry can significantly improve a system's energy efficiency. In this paper, we use this analytical model to develop a new power control scheme, a scheme that is capable of allocating a minimum transmission power precisely within the delay-outage probabi… ▽ More An accurate energy efficiency analytical model based on a two-mode circuitry was recently proposed; and the model showed that the use of this circuitry can significantly improve a system's energy efficiency. In this paper, we use this analytical model to develop a new power control scheme, a scheme that is capable of allocating a minimum transmission power precisely within the delay-outage probability constraint. Precision brings substantial benefits as numerical results show that the energy efficiency using our scheme is much higher than other schemes. Results further suggest that data rate values affect energy efficiency non-uniformly, i.e., there exists a specific data rate value that achieves maximum energy efficiency. △ Less

Submitted 27 February, 2018; originally announced February 2018.

Showing 1–42 of 42 results for author: Tao, X