Search | arXiv e-print repository

Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds

Authors: Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang

Abstract: Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect that a seemingly well-trained model ends up misclassifying the input. This paper adds to the understanding of adversarial attacks by presenting Eidos, a framework providing Efficient Imperceptible aDversarial attacks on 3D pOint cloudS. Eidos supports a diverse set of imperceptibility metrics. It employs an iterative, two-step procedure to identify optimal adversarial examples, thereby enabling a runtime-imperceptibility trade-off. We provide empirical evidence relative to several popular 3D point cloud classification models and several established 3D attack methods, showing Eidos' superiority with respect to efficiency as well as imperceptibility.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Preprint
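The Eidos abstract says the framework supports several imperceptibility metrics but does not name them. One metric widely used to compare an original and a perturbed point cloud is the Chamfer distance. The sketch below is a brute-force version under that assumption (squared distances, averaged per cloud; other definitions drop the square or the averaging). It is an illustration only, not Eidos' implementation, and the point clouds are made up.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point clouds.

    a, b: non-empty lists of (x, y, z) tuples. For each point, take the squared
    distance to its nearest neighbour in the other cloud, average over that
    cloud, and add the two directional averages. Brute force, O(len(a) * len(b)).
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Hypothetical "adversarial" copy: one point nudged by 0.1 along z.
perturbed = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

print(chamfer_distance(original, original))             # 0.0
print(round(chamfer_distance(original, perturbed), 6))  # 0.006667
```

A smaller value means the perturbed cloud stays closer to the original shape; an attack that trades runtime for imperceptibility would aim to keep such a score low while still flipping the classifier's prediction.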

arXiv:2405.06230 [pdf]

Fire in SRRN: Next-Gen 3D Temperature Field Reconstruction Technology

Authors: Shenxiang Feng, Xiaojian Hao, Xiaodong Huang, Pan Pei, Tong Wei, Chenyang Xu

Abstract: In aerospace and energy engineering, accurate 3D combustion field temperature measurement is critical. The resolution of traditional methods based on algebraic iteration is limited by the initial voxel division. This study introduces a novel method for reconstructing three-dimensional temperature fields using the Spatial Radiation Representation Network (SRRN). This method utilizes the flame thermal radiation characteristics and differentiable rendering in graphics, and combines it with a multi-layer perceptron to achieve a functional representation of the flame temperature field. The effectiveness of SRRN is evaluated through simulated temperature field reconstruction experiments with different levels of complexity. The maximum root mean square error is 10.17, which proves the robustness of the algorithm to Gaussian noise and salt-and-pepper noise. We conducted a butane flame temperature field reconstruction experiment, and the maximum relative error between the reconstruction result and the thermocouple measurement value was 4.86%, confirming that the algorithm can achieve accurate reconstruction.

Submitted 9 May, 2024; originally announced May 2024.
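The SRRN abstract reports two error figures: a root mean square error on simulated fields and a maximum relative error against thermocouple readings. A minimal sketch of the standard definitions follows; the abstract does not spell out how the paper computes them, and the temperatures below are hypothetical, not data from the paper.

```python
import math

def rmse(estimate, reference):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference))

def max_relative_error(estimate, measured):
    """Largest |estimate - measured| / |measured| over all probe points."""
    return max(abs(e - m) / abs(m) for e, m in zip(estimate, measured))

# Hypothetical temperatures (K) at three thermocouple positions.
measured = [1000.0, 1200.0, 1400.0]
reconstructed = [1010.0, 1188.0, 1400.0]

print(round(rmse(reconstructed, measured), 2))      # 9.02
print(max_relative_error(reconstructed, measured))  # 0.01
```

Note that RMSE carries the units of the field (here kelvin), whereas the relative error is dimensionless and is usually quoted as a percentage.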

arXiv:2404.16305 [pdf, other]

Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model

Authors: Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang

Abstract: Existing works have made strides in video generation, but the lack of sound effects (SFX) and background music (BGM) hinders a complete and immersive viewer experience. We introduce a novel semantically consistent video-to-audio generation framework, namely SVA, which automatically generates audio semantically consistent with the given video content. The framework harnesses a multimodal large language model (MLLM) to understand video semantics from a key frame and generate creative audio schemes, which are then utilized as prompts for text-to-audio models, resulting in video-to-audio generation with natural language as an interface. We show the satisfactory performance of SVA through a case study and discuss the limitations along with future research directions. The project page is available at https://huiz-a.github.io/audio4video.github.io/.

Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15620 [pdf, other]

A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution

Authors: Zhixiong Yang, Jingyuan Xia, Shengxi Li, Xinghua Huang, Shuanghui Zhang, Zhen Liu, Yaowen Fu, Yongxiang Liu

Abstract: Deep learning-based methods have achieved significant success in solving the blind super-resolution (BSR) problem. However, most of them require supervised pre-training on labelled datasets. This paper proposes an unsupervised kernel estimation model, named dynamic kernel prior (DKP), to realize an unsupervised and pre-training-free learning-based algorithm for solving the BSR problem. DKP can adaptively learn dynamic kernel priors to realize real-time kernel estimation, and thereby enables superior HR image restoration performance. This is achieved by a Markov chain Monte Carlo sampling process on random kernel distributions. The learned kernel prior is then assigned to optimize a blur kernel estimation network, which entails a network-based Langevin dynamic optimization strategy. These two techniques ensure the accuracy of the kernel estimation. DKP can be easily used to replace the kernel estimation models in existing methods, such as Double-DIP and FKP-DIP, or be added to off-the-shelf image restoration models, such as diffusion models. In this paper, we combine our DKP model with DIP and with a diffusion model, referred to as DIP-DKP and Diff-DKP, for validation. Extensive simulations on Gaussian and motion kernel scenarios demonstrate that the proposed DKP model can significantly improve the kernel estimation with comparable runtime and memory usage, leading to state-of-the-art BSR results. The code is available at https://github.com/XYLGroup/DKP.

Submitted 25 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: Accepted for publication in CVPR 2024
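The DKP abstract mentions a network-based Langevin dynamic optimization strategy. For readers unfamiliar with Langevin dynamics, the sketch below shows the generic unadjusted Langevin update, x ← x + (η/2)∇log p(x) + √η·z with z ~ N(0, 1), sampling a one-dimensional Gaussian. It is a toy under stated assumptions (a known analytic target, fixed step size η = 0.01), not DKP's kernel-network update; see the linked repository for the actual method.

```python
import math
import random

def langevin(grad_log_p, x0, step, n_steps, rng):
    """Unadjusted Langevin dynamics:
    x <- x + (step / 2) * grad log p(x) + sqrt(step) * N(0, 1)."""
    x = x0
    for _ in range(n_steps):
        x += 0.5 * step * grad_log_p(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
    return x

# Toy target: Gaussian with mean 2.0 and std 0.5, so grad log p(x) = -(x - 2.0) / 0.25.
MU, SIGMA = 2.0, 0.5

def grad_log_p(x):
    return -(x - MU) / SIGMA ** 2

rng = random.Random(0)
samples = [langevin(grad_log_p, 0.0, 0.01, 400, rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.2f} var={var:.3f}")  # roughly 2.0 and 0.25
```

The gradient term pulls samples toward high-probability regions while the injected noise keeps them spread according to p; with a finite step size the stationary variance is slightly inflated (here about 0.2525 instead of 0.25).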

arXiv:2404.13786 [pdf, other]

Soar: Design and Deployment of A Smart Roadside Infrastructure System for Autonomous Driving

Authors: Shuyao Shi, Neiwen Ling, Zhehao Jiang, Xuan Huang, Yuze He, Xiaoguang Zhao, Bufang Yang, Chen Bian, Jingfei Xia, Zhenyu Yan, Raymond Yeung, Guoliang Xing

Abstract: Recently, smart roadside infrastructure (SRI) has demonstrated the potential of achieving fully autonomous driving systems. To explore the potential of infrastructure-assisted autonomous driving, this paper presents the design and deployment of Soar, the first end-to-end SRI system specifically designed to support autonomous driving systems. Soar consists of both software and hardware components carefully designed to overcome various system and physical challenges. Soar can leverage the existing operational infrastructure like street lampposts for a lower barrier of adoption. Soar adopts a new communication architecture that comprises a bi-directional multi-hop I2I network and a downlink I2V broadcast service, which are designed based on off-the-shelf 802.11ac interfaces in an integrated manner. Soar also features a hierarchical DL task management framework to achieve desirable load balancing among nodes and enable them to collaborate efficiently to run multiple data-intensive autonomous driving applications. We deployed a total of 18 Soar nodes on existing lampposts on campus, which have been operational for over two years. Our real-world evaluation shows that Soar can support a diverse set of autonomous driving applications and achieve desirable real-time performance and high communication reliability. Our findings and experiences in this work offer key insights into the development and deployment of next-generation smart roadside infrastructure and autonomous driving systems.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.12973 [pdf, other]

Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

Authors: Xiaofei Wang, Xingxu Huang, Stephen J. Price, Chao Li

Abstract: Recent advances in spatial transcriptomics (ST) allow spatial gene expression within tissue to be characterized for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. However, current super-resolution methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, it remains a challenge to integrate histology images and gene expression for super-resolved ST maps. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps with the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to utilize complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationship of multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.

Submitted 27 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12595 [pdf, other]

Deep Reinforcement Learning-aided Transmission Design for Energy-efficient Link Optimization in Vehicular Communications

Authors: Zhengpeng Wang, Yanqun Tang, Yingzhe Mao, Tao Wang, Xiunan Huang

Abstract: This letter presents a deep reinforcement learning (DRL) approach for transmission design to optimize the energy efficiency in vehicle-to-vehicle (V2V) communication links. Considering the dynamic environment of vehicular communications, the optimization problem is non-convex and mathematically difficult to solve. Hence, we propose scenario identification-based double and Dueling deep Q-Network (SI-D3QN), a DRL algorithm integrating both double deep Q-Network and Dueling deep Q-Network, for the joint design of modulation and coding scheme (MCS) selection and power control. To be more specific, we employ the SI technique to enhance link performance and assist the D3QN agent in refining its decision-making processes. The experimental results demonstrate that, across various optimization tasks, our proposed SI-D3QN agent outperforms the benchmark algorithms in terms of the valid actions and link performance metrics. In particular, while ensuring significant improvement in energy efficiency, the agent facilitates a 29.6% enhancement in the link throughput under the same energy consumption.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 5 pages, 3 figures
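SI-D3QN is described as combining double deep Q-Network and Dueling deep Q-Network. The two generic ingredients can be shown on plain numbers: the dueling head forms Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), and the double-DQN target lets the online network choose the next action while the target network scores it. This sketch omits the scenario-identification (SI) part, the neural networks, and training; the action values are hypothetical stand-ins for MCS/power choices.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, gamma, done, q_online_next, q_target_next):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]

# Hypothetical next-state outputs for three discrete actions
# (e.g. three MCS / transmit-power combinations).
q_online_next = dueling_q(1.0, [0.5, 2.0, -0.5])  # ~[0.833, 2.333, -0.167] -> picks action 1
q_target_next = dueling_q(0.8, [1.0, 0.0, 0.2])   # ~[1.4, 0.4, 0.6]

y_double = double_dqn_target(0.3, 0.9, False, q_online_next, q_target_next)
y_plain = 0.3 + 0.9 * max(q_target_next)  # ordinary DQN target, for comparison
print(round(y_double, 4), round(y_plain, 4))  # 0.66 1.56
```

Decoupling action selection from evaluation is what curbs the overestimation visible in the plain max-based target above.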

arXiv:2403.02566 [pdf, other]

Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning

Authors: Zhaoxin Fan, Runmin Jiang, Junhao Wu, Xin Huang, Tianyang Wang, Heng Huang, Min Xu

Abstract: 3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning. Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation. However, this approach heavily relies on labor-intensive and time-consuming fully annotated ground-truth labels, particularly for 3D volumes. To overcome this limitation, we propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging. Our pipeline integrates three innovative components: a probability-based pseudo-label generation technique for synthesizing dense segmentation masks from sparse annotations, a Probabilistic Multi-head Self-Attention network for robust feature extraction within our Probabilistic Transformer Network, and a Probability-informed Segmentation Loss Function to enhance training with annotation confidence. Demonstrating significant advances, our approach not only rivals the performance of fully supervised methods but also surpasses existing weakly supervised methods in CT and MRI datasets, achieving up to 18.1% improvement in Dice scores for certain organs. The code is available at https://github.com/runminjiang/PW4MedSeg.

Submitted 4 March, 2024; originally announced March 2024.
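The segmentation results above are reported as Dice scores. For reference, the Dice coefficient of a predicted mask P and a ground-truth mask T is 2|P ∩ T| / (|P| + |T|). A minimal sketch on flattened binary masks follows (a 3D volume would simply be flattened first); the masks are hypothetical, and returning 1.0 for two empty masks is an assumed convention.

```python
def dice_score(pred, target):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for binary masks given as flat 0/1 sequences.

    Returns 1.0 when both masks are empty (a common convention, assumed here)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total

# Hypothetical flattened 2x2 masks.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
print(dice_score([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
```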

arXiv:2402.09372 [pdf, other]

Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable to or even better than that of human experts. Nevertheless, the current rib fracture classification solutions are hardly clinically applicable, which makes this an interesting area for future work. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advancements in large-scale pretrained networks and point-based rib segmentation techniques.
The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

arXiv:2402.09048 [pdf, other]

Sensing in Bi-Static ISAC Systems with Clock Asynchronism: A Signal Processing Perspective

Authors: Kai Wu, Jacopo Pegoraro, Francesca Meneghello, J. Andrew Zhang, Jesus O. Lacruz, Joerg Widmer, Francesco Restuccia, Michele Rossi, Xiaojing Huang, Daqing Zhang, Giuseppe Caire, Y. Jay Guo

Abstract: Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks at far-separated transmitters and receivers. This causes the received signal to be affected by time-varying random phase offsets, severely degrading, or even failing, direct sensing. Hence, to effectively enable ISAC, considerable research has been directed toward addressing the clock asynchronism issue in bi-static sensing. This paper provides an overview of the issue and existing techniques developed in an ISAC background. Based on the review and comparison, we also draw insights into the future research directions and open problems, aiming to nurture the maturation of bi-static sensing in ISAC.

Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 20 pages, 6 figures, 1 table
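The ISAC abstract attributes the sensing problem to time-varying random phase offsets caused by unsynchronized clocks. One idea used in this setting is that all antennas of a single receiver share the same clock, so multiplying one antenna's signal by the complex conjugate of another's cancels the common offset. The toy sketch below assumes an idealized split (antenna 1 sees only a moving-target phase ramp, antenna 2 only a static path); real signals mix both components, so practical techniques need further processing. It is not presented as a method from the paper.

```python
import cmath
import math
import random

rng = random.Random(1)

# Hypothetical two-antenna receiver observed over five packets.
dynamic_phase = [0.1 * k for k in range(5)]  # antenna 1: phase ramp from a moving target
static_phase = 0.7                           # antenna 2: dominated by a static path

ant1, ant2 = [], []
for phi in dynamic_phase:
    offset = rng.uniform(-math.pi, math.pi)  # random clock phase offset, common to both antennas
    ant1.append(cmath.exp(1j * (phi + offset)))
    ant2.append(cmath.exp(1j * (static_phase + offset)))

# Multiplying by the conjugate of the other antenna cancels the shared offset.
products = [a * b.conjugate() for a, b in zip(ant1, ant2)]
print([round(cmath.phase(z), 6) for z in products])  # [-0.7, -0.6, -0.5, -0.4, -0.3]
```

The phase ramp of the moving target survives (up to a constant), while the random per-packet offset, which would scramble either antenna's phase on its own, disappears.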

arXiv:2402.06841 [pdf]

Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA

Authors: Shaojie Tang, Penpen Miao, Xingyu Gao, Yu Zhong, Dantong Zhu, Haixing Wen, Zhihui Xu, Qiuyue Wei, Hongping Yao, Xin Huang, Rui Gao, Chen Zhao, Weihua Zhou

Abstract: A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point clouds of the LV epicardial contours (LVECs). Secondly, according to the characteristics of cardiac anatomy, the special points of anterior and posterior interventricular grooves (APIGs) were manually marked in both SPECT and CTA image volumes. Thirdly, we developed an in-house program for coarsely registering the special points of APIGs to ensure a correct cardiac orientation alignment between SPECT and CTA images. Fourthly, we employed ICP, SICP or CPD algorithm to achieve a fine registration for the point clouds (together with the special points of APIGs) of the LV epicardial surfaces (LVERs) in SPECT and CTA images. Finally, the image fusion between SPECT and CTA was realized after the fine registration. The experimental results showed that the cardiac orientation was aligned well and the mean distance error of the optimal registration method (CPD with affine transform) was consistently less than 3 mm. The proposed method could effectively fuse the structures from cardiac CTA and SPECT functional images, and demonstrated a potential in assisting in accurate diagnosis of cardiac diseases by combining complementary advantages of the two imaging modalities.

Submitted 9 February, 2024; originally announced February 2024.
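ICP is one of the fine-registration algorithms named above. To show the idea, here is a basic point-to-point ICP that alternates two steps: match each source point to its nearest target point, then solve for the least-squares rigid transform. It is simplified to 2D so that the optimal rotation has the closed form θ = atan2(Σ(x_s y_d - y_s x_d), Σ(x_s x_d + y_s y_d)) on centred coordinates. This is an illustrative sketch, not the paper's 3D pipeline; it does not cover SICP or CPD (the paper's best result used CPD with an affine transform), and the L-shaped contour is hypothetical.

```python
import math

def nearest(p, cloud):
    """Brute-force nearest neighbour of point p in cloud."""
    return min(cloud, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired points src[i] -> dst[i]."""
    n = len(src)
    csx, csy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cdx, cdy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    num = den = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        sx, sy, dx, dy = sx - csx, sy - csy, dx - cdx, dy - cdy
        num += sx * dy - sy * dx
        den += sx * dx + sy * dy
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)

def transform(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp(src, dst, iterations=20):
    """Point-to-point ICP; returns the aligned source and its mean nearest-neighbour distance."""
    current = list(src)
    for _ in range(iterations):
        matches = [nearest(p, dst) for p in current]
        current = transform(current, *best_rigid_2d(current, matches))
    errors = [math.dist(p, nearest(p, dst)) for p in current]
    return current, sum(errors) / len(errors)

target = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 2)]  # hypothetical L-shaped contour
source = transform(target, math.radians(10), 0.3, -0.2)    # misaligned copy
aligned, mean_err = icp(source, target)
print(mean_err < 1e-9)  # True
```

ICP only converges to the right answer when the initial misalignment is small enough for nearest-neighbour matches to be mostly correct, which is one reason a coarse alignment step (such as the landmark-based APIG registration described above) is done first.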

arXiv:2402.01546 [pdf, other]

doi 10.1109/JIOT.2024.3362587

Privacy-Preserving Distributed Learning for Residential Short-Term Load Forecasting

Authors: Yi Dong, Yingjie Wang, Mariana Gama, Mustafa A. Mustafa, Geert Deconinck, Xiaowei Huang

Abstract: In the realm of power systems, the increasing involvement of residential users in load forecasting applications has heightened concerns about data privacy. Specifically, the load data can inadvertently reveal the daily routines of residential users, thereby posing a risk to their property security. While federated learning (FL) has been employed to safeguard user privacy by enabling model training without the exchange of raw data, these FL models have shown vulnerabilities to emerging attack techniques, such as Deep Leakage from Gradients and poisoning attacks. To counteract these, we initially employ a Secure-Aggregation (SecAgg) algorithm that leverages multiparty computation cryptographic techniques to mitigate the risk of gradient leakage. However, the introduction of SecAgg necessitates the deployment of additional sub-center servers for executing the multiparty computation protocol, thereby escalating computational complexity and reducing system robustness, especially in scenarios where one or more sub-centers are unavailable. To address these challenges, we introduce a Markovian Switching-based distributed training framework, the convergence of which is substantiated through rigorous theoretical analysis. The Distributed Markovian Switching (DMS) topology shows strong robustness towards the poisoning attacks as well. Case studies employing real-world power system load data validate the efficacy of our proposed algorithm.
It not only significantly minimizes communication complexity but also maintains accuracy levels comparable to traditional FL methods, thereby enhancing the scalability of our load forecasting algorithm.

Submitted 2 February, 2024; originally announced February 2024.
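The abstract motivates secure aggregation as a defence against gradient leakage. The core trick behind masking-based secure aggregation fits in a few lines: every pair of clients shares a random mask that one adds and the other subtracts (mod a fixed modulus), so each masked update looks random but the masks cancel in the sum. This toy omits key agreement, dropout recovery, and the sub-center servers the abstract mentions, so it is not the paper's protocol; the integer-encoded updates are hypothetical.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo a fixed integer

def masked_updates(updates, seed=0):
    """Each pair (i, j), i < j, shares a random mask m_ij: client i adds it, client j subtracts it.
    Individual masked values look random, but the masks cancel in the sum."""
    rng = random.Random(seed)  # stands in for pairwise shared secrets in a real protocol
    masked = list(updates)
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            m = rng.randrange(MODULUS)
            masked[i] = (masked[i] + m) % MODULUS
            masked[j] = (masked[j] - m) % MODULUS
    return masked

updates = [12, 7, 30]  # hypothetical integer-encoded model updates from three households
masked = masked_updates(updates)
print(sum(masked) % MODULUS)  # 49, equal to sum(updates)
```

The server learns only the aggregate; the cost is extra coordination among clients, which relates to the overhead and robustness concerns the abstract raises.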

arXiv:2401.16592 [pdf]

A compact and cost-effective laser-powered speckle visibility spectroscopy (SVS) device for measuring cerebral blood flow

Authors: Yu Xi Huang, Simon Mahler, Maya Dickson, Aidin Abedi, Julian M. Tyszka, Jack Lo Yu Tung, Jonathan Russin, Charles Liu, Changhuei Yang

Abstract: In the realm of cerebrovascular monitoring, primary metrics typically include blood pressure, which influences cerebral blood flow (CBF) and is contingent upon vessel radius. Measuring CBF non-invasively poses a persistent challenge, primarily attributed to the difficulty of accessing and obtaining signal from the brain. This study aims to introduce a compact speckle visibility spectroscopy (SVS) device designed for non-invasive CBF measurements, offering cost-effectiveness and scalability while tracking CBF with remarkable sensitivity and temporal resolution. The wearable hardware has a modular design approach consisting solely of a laser diode as the source and a meticulously selected board camera as the detector. They both can be easily placed on the head of a subject to measure CBF with no additional optical elements. The SVS device can achieve a sampling rate of 80 Hz with minimal susceptibility to external disturbances. The device also achieves better SNR compared with traditional fiber-based SVS devices, capturing about 70 times more signal and showing superior stability and reproducibility. It is designed to be paired and distributed in multiple configurations around the head, and measure signals that exceed the quality of prior optical CBF measurement techniques. Given its cost-effectiveness, scalability, and simplicity, this laser-centric tool offers significant potential in advancing non-invasive cerebral monitoring technologies.

Submitted 8 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.
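Speckle-based flow measurement of the kind used by SVS starts from the speckle contrast of a camera frame, K = σ_I / ⟨I⟩: faster scatterer motion blurs the speckle pattern within the exposure and lowers K. The abstract does not describe the device's processing chain, so the sketch below only shows this standard statistic on hypothetical pixel windows; mapping K to a flow index requires a model that is not shown.

```python
import statistics

def speckle_contrast(intensities):
    """Spatial speckle contrast K = std(I) / mean(I) over a window of pixel intensities."""
    return statistics.pstdev(intensities) / statistics.fmean(intensities)

# Hypothetical pixel windows from a camera frame.
high_contrast = [10, 30, 10, 30]  # well-developed speckle (slower decorrelation)
blurred = [18, 22, 18, 22]        # speckle averaged out during the exposure (faster decorrelation)
print(speckle_contrast(high_contrast), speckle_contrast(blurred))  # 0.5 0.1
```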

arXiv:2401.16520 [pdf, other]

MT-HCCAR: Multi-Task Deep Learning with Hierarchical Classification and Attention-based Regression for Cloud Property Retrieval

Authors: Xingyan Li, Andrew M. Sayer, Ian T. Carroll, Xin Huang, Jianwu Wang

Abstract: In the realm of Earth science, effective cloud property retrieval, encompassing cloud masking, cloud phase classification, and cloud optical thickness (COT) prediction, remains pivotal. Traditional methodologies necessitate distinct models for each sensor instrument due to their unique spectral characteristics. Recent strides in Earth Science research have embraced machine learning and deep learning techniques to extract features from satellite datasets' spectral observations. However, prevailing approaches lack novel architectures accounting for hierarchical relationships among retrieval tasks. Moreover, considering the spectral diversity among existing sensors, the development of models with robust generalization capabilities over different sensor datasets is imperative. Surprisingly, there is a dearth of methodologies addressing the selection of an optimal model for diverse datasets. In response, this paper introduces MT-HCCAR, an end-to-end deep learning model employing multi-task learning to simultaneously tackle cloud masking, cloud phase retrieval (classification tasks), and COT prediction (a regression task). The MT-HCCAR integrates a hierarchical classification network (HC) and a classification-assisted attention-based regression network (CAR), enhancing precision and robustness in cloud labeling and COT prediction.
Additionally, a comprehensive model selection method rooted in K-fold cross-validation, one standard error rule, and two introduced performance scores is proposed to select the optimal model over three simulated satellite datasets OCI, VIIRS, and ABI. The experiments comparing MT-HCCAR with baseline methods, the ablation studies, and the model selection affirm the superiority and the generalization capabilities of MT-HCCAR.

Submitted 5 July, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 14 pages, 3 figures, accepted by ECML PKDD 2024

MSC Class: 68T07

ACM Class: I.2.6
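The MT-HCCAR model selection method builds on K-fold cross-validation and the one standard error rule. The generic rule is: find the model with the lowest mean cross-validation error, compute the standard error of that model's fold errors, and pick the simplest model whose mean error is within one standard error of the best. The sketch below shows only this rule on a single error metric; the paper's two additional performance scores are not reproduced, and the model names and fold errors are hypothetical.

```python
import math
import statistics

def one_standard_error_choice(cv_scores):
    """Pick a model with the one-standard-error rule.

    cv_scores: dict mapping model name -> list of per-fold errors (lower is better),
    inserted in order from simplest to most complex model.
    Returns the simplest model whose mean error is within one standard error
    of the best model's mean error.
    """
    means = {m: statistics.fmean(s) for m, s in cv_scores.items()}
    best = min(means, key=means.get)
    folds = cv_scores[best]
    se = statistics.stdev(folds) / math.sqrt(len(folds))
    threshold = means[best] + se
    for model in cv_scores:  # dicts preserve insertion order, so simpler models come first
        if means[model] <= threshold:
            return model

# Hypothetical 5-fold errors, simplest model first.
scores = {
    "small":  [0.30, 0.32, 0.31, 0.29, 0.33],  # mean 0.31
    "medium": [0.25, 0.27, 0.26, 0.24, 0.28],  # mean 0.26
    "large":  [0.23, 0.28, 0.25, 0.21, 0.28],  # mean 0.25, SE about 0.014
}
print(one_standard_error_choice(scores))  # medium
```

Here "large" has the lowest mean error, but "medium" is within one standard error of it and is simpler, so the rule prefers it as less likely to overfit.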

arXiv:2401.12826 [pdf, other]

Digital Twin-Based Network Management for Better QoE in Multicast Short Video Streaming

Authors: Xinyu Huang, Shisheng Hu, Haojun Yang, Xinghan Wang, Yingying Pei, Xuemin Shen

Abstract: Multicast short video streaming can enhance bandwidth utilization by enabling simultaneous video transmission to multiple users over shared wireless channels. The existing network management schemes mainly rely on the sequential buffering principle and general quality of experience (QoE) model, which may deteriorate QoE when users' swipe behaviors exhibit distinct spatiotemporal variation. In this paper, we propose a digital twin (DT)-based network management scheme to enhance QoE. Firstly, user status emulated by the DT is utilized to estimate the transmission capabilities and watching probability distributions of sub-multicast groups (SMGs) for an adaptive segment buffering. The SMGs' buffers are aligned to the unique virtual buffers managed by the DT for a fine-grained buffer update. Then, a multicast QoE model consisting of rebuffering time, video quality, and quality variation is developed, by considering the mutual influence of segment buffering among SMGs. Finally, a joint optimization problem of segment version selection and slot division is formulated to maximize QoE. To efficiently solve the problem, a data-model-driven algorithm is proposed by integrating a convex optimization method and a deep reinforcement learning algorithm. Simulation results based on the real-world dataset demonstrate that the proposed DT-based network management scheme outperforms benchmark schemes in terms of QoE improvement.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 13 pages, 12 figures

arXiv:2401.10070 [pdf, other]

Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks

Authors: Yichao Du, Zhirui Zhang, Linan Yue, Xu Huang, Yuqing Zhang, Tong Xu, Linli Xu, Enhong Chen

Abstract: To protect privacy and meet legal regulations, federated learning (FL) has gained significant attention for training speech-to-text (S2T) systems, including automatic speech recognition (ASR) and speech translation (ST). However, the commonly used FL approach (i.e., FedAvg) in S2T tasks typically suffers from extensive communication overhead due to multi-round interactions based on the whole model, and from performance degradation caused by data heterogeneity among clients. To address these issues, we propose a personalized federated S2T framework that introduces FedLoRA, a lightweight LoRA module for client-side tuning and interaction with the server to minimize communication overhead, and FedMem, a global model equipped with a k-nearest-neighbor (kNN) classifier that captures client-specific distributional shifts to achieve personalization and overcome data heterogeneity. Extensive experiments based on Conformer and Whisper backbone models on CoVoST and GigaSpeech benchmarks show that our approach significantly reduces the communication overhead on all S2T tasks and effectively personalizes the global model to overcome data heterogeneity.
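The communication saving that motivates FedLoRA can be illustrated with a simple parameter count. This sketch assumes the standard LoRA parameterisation, in which a frozen d × k weight is adapted by a trainable product BA with B of size d × r and A of size r × k, so a client would exchange r(d + k) values per adapted layer instead of dk. The layer shape and rank are hypothetical values chosen for illustration, not settings reported in the paper.

```python
def dense_update_size(d: int, k: int) -> int:
    """Values exchanged when a client sends a full d x k weight update."""
    return d * k


def lora_update_size(d: int, k: int, r: int) -> int:
    """Values exchanged for a rank-r LoRA adapter: B (d x r) plus A (r x k)."""
    return r * (d + k)


def saving_factor(d: int, k: int, r: int) -> float:
    """How many times smaller the LoRA exchange is than the dense one."""
    return dense_update_size(d, k) / lora_update_size(d, k, r)


if __name__ == "__main__":
    d, k, r = 1024, 1024, 8  # hypothetical layer shape and rank
    print(dense_update_size(d, k))    # 1048576
    print(lora_update_size(d, k, r))  # 16384
    print(saving_factor(d, k, r))     # 64.0
```

With these assumed sizes the adapter is 64 times smaller than the dense layer; the real saving depends on how many layers carry adapters and on the chosen rank.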

Submitted 18 January, 2024; originally announced January 2024.

Comments: ICASSP 2024

arXiv:2311.14925 [pdf, other]

Coordinate-based Neural Network for Fourier Phase Retrieval

Authors: Tingyou Li, Zixin Xu, Yong S. Chu, Xiaojing Huang, Jizhou Li

Abstract: Fourier phase retrieval is essential for high-definition imaging of nanoscale structures across diverse fields, notably coherent diffraction imaging. This study presents the Single impliCit neurAl Network (SCAN), a tool built upon coordinate neural networks meticulously designed for enhanced phase retrieval performance. Remedying the drawbacks of conventional iterative methods, which are easily trapped in local minima and are sensitive to noise, SCAN adeptly connects object coordinates to their amplitude and phase within a unified network in an unsupervised manner. While many existing methods primarily use the Fourier magnitude in their loss function, our approach incorporates both the predicted magnitude and phase, enhancing retrieval accuracy. Comprehensive tests validate SCAN's superiority over traditional and other deep learning models regarding accuracy and noise robustness. We also demonstrate that SCAN excels in the ptychography setting.

Submitted 8 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.12223 [pdf, other]

Digital Twin-Based User-Centric Edge Continual Learning in Integrated Sensing and Communication

Authors: Shisheng Hu, Jie Gao, Xinyu Huang, Mushu Li, Kaige Qu, Conghao Zhou, Xuemin Shen

Abstract: In this paper, we propose a digital twin (DT)-based user-centric approach for processing sensing data in an integrated sensing and communication (ISAC) system with high accuracy and efficient resource utilization. The considered scenario involves an ISAC device with a lightweight deep neural network (DNN) and a mobile edge computing (MEC) server with a large DNN. After collecting sensing data, the ISAC device either processes the data locally or uploads them to the server for higher-accuracy data processing. To cope with data drifts, the server updates the lightweight DNN when necessary, referred to as continual learning. Our objective is to minimize the long-term average computation cost of the MEC server by optimizing two decisions, i.e., sensing data offloading and sensing data selection for the DNN update. A DT of the ISAC device is constructed to predict the impact of potential decisions on the long-term computation cost of the server, based on which the decisions are made with closed-form formulas. Experiments on executing DNN-based human motion recognition tasks are conducted to demonstrate the outstanding performance of the proposed DT-based approach in computation cost minimization.

Submitted 20 November, 2023; originally announced November 2023.

Comments: submitted to IEEE ICC 2024

arXiv:2311.00483 [pdf, other]

DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation

Authors: Xiaohua Jiang, Yihao Guo, Jian Huang, Yuting Wu, Meiyi Luo, Zhaoyang Xu, Qianni Zhang, Xingru Huang, Hong He, Shaowei Jiang, Jing Ye, Mang Xiao

Abstract: The precise spatial and quantitative delineation of indistinct-boundary medical objects is paramount for the accuracy of diagnostic protocols, efficacy of surgical interventions, and reliability of postoperative assessments. Despite its significance, effective segmentation and instantaneous three-dimensional reconstruction of such objects are significantly impeded by the paucity of representative samples in available datasets and by noise artifacts. To surmount these challenges, we introduce Stochastic Defect Injection (SDi) to augment the representational diversity of challenging indistinct-boundary objects within training corpora. Building on this, we propose the Dual-Encoder Fourier Group Harmonics Network (DEFN) to tailor noise filtration, amplify detailed feature recognition, and bolster representation across diverse medical imaging scenarios. By incorporating a Dynamic Weight Composing (DWC) loss that dynamically adjusts the model's focus based on training progression, DEFN achieves SOTA performance on the OIMHS public dataset, showcasing its effectiveness in indistinct-boundary contexts. Source code for DEFN is available at: https://github.com/IMOP-lab/DEFN-pytorch.

Submitted 19 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: 36 pages, 16 figures, 7 tables

MSC Class: 68; 92

ACM Class: I.4; J.3

arXiv:2310.12039 [pdf, other]

Ordered Reliability Direct Error Pattern Testing Decoding Algorithm

Authors: Reza Hadavian, Xiaoting Huang, Dmitri Truhachev, Kamal El-Sankary, Hamid Ebrahimzad, Hossein Najafi

Abstract: We introduce a novel universal soft-decision decoding algorithm for binary block codes called ordered reliability direct error pattern testing (ORDEPT). Our results, obtained for a variety of popular short high-rate codes, demonstrate that ORDEPT outperforms state-of-the-art decoding algorithms of comparable complexity, such as ordered reliability bits guessing random additive noise decoding (ORBGRAND), in terms of decoding error probability and latency. The improvements carry over to the iterative decoding of product codes and convolutional product-like codes, where we present a new adaptive decoding algorithm and demonstrate the ability of ORDEPT to efficiently find multiple candidate codewords to produce soft output.
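To make the general idea of ordered-reliability error pattern testing concrete, the sketch below shows a generic test-pattern decoder; it is not ORDEPT, whose pattern schedule the abstract does not specify. The decoder takes hard decisions from the LLRs, then tries error patterns in order of increasing Hamming weight and, within each weight, in order of increasing total reliability (the sum of |LLR| over flipped positions), accepting the first candidate that passes the parity check. The (7,4) Hamming code, the sign convention (a positive LLR favours bit 0), and the `max_weight` cap are illustrative assumptions.

```python
from itertools import combinations

# Parity-check matrix of the (7,4) Hamming code: column j holds j+1 in binary
# (least significant bit in the first row).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def passes_parity_check(word):
    """True if H * word^T = 0 over GF(2), i.e. word is a codeword."""
    return all(sum(h * b for h, b in zip(row, word)) % 2 == 0 for row in H)


def decode(llrs, max_weight=2):
    """Generic ordered-reliability error pattern testing (illustrative, not ORDEPT)."""
    hard = [0 if llr >= 0 else 1 for llr in llrs]  # assumed: positive LLR favours 0
    for weight in range(max_weight + 1):
        # Error patterns of this weight, least total reliability first.
        patterns = sorted(
            combinations(range(len(llrs)), weight),
            key=lambda flips: sum(abs(llrs[i]) for i in flips),
        )
        for flips in patterns:
            candidate = hard[:]
            for i in flips:
                candidate[i] ^= 1
            if passes_parity_check(candidate):
                return candidate
    return None  # no codeword found within max_weight flips


if __name__ == "__main__":
    # All-zero codeword sent; position 2 received unreliably with the wrong sign.
    print(decode([2.1, 1.8, -0.3, 2.5, 1.9, 2.2, 1.7]))  # [0, 0, 0, 0, 0, 0, 0]
```

Because the (7,4) Hamming code is perfect with minimum distance 3, every hard-decision word lies within distance 1 of a codeword, so this search always stops at weight 0 or 1; longer codes would need a larger `max_weight` or a cap on the number of patterns tested.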

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.04456 [pdf, other]

Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation

Authors: Shihao Zou, Xianying Huang, Xudong Shen

Abstract: Emotion Recognition in Conversation (ERC) plays an important role in driving the development of human-machine interaction. Emotions can exist in multiple modalities, and multimodal ERC mainly faces two problems: (1) noise in the cross-modal information fusion process, and (2) the prediction of emotion labels with few samples that are semantically similar but belong to different categories. To address these issues and fully utilize the features of each modality, we adopted the following strategies: first, deep emotion cue extraction was performed on modalities with strong representation ability, and feature filters were designed as multimodal prompt information for modalities with weak representation ability. Then, we designed a Multimodal Prompt Transformer (MPT) to perform cross-modal information fusion. MPT embeds multimodal fusion information into each attention layer of the Transformer, allowing prompt information to participate in encoding textual features and to be fused with multi-level textual information to obtain better multimodal fusion features. Finally, we used the Hybrid Contrastive Learning (HCL) strategy to optimize the model's ability to handle labels with few samples. This strategy uses unsupervised contrastive learning to improve the representation ability of multimodal fusion and supervised contrastive learning to mine the information of labels with few samples. Experimental results show that our proposed model outperforms state-of-the-art models in ERC on two benchmark datasets.

Submitted 4 October, 2023; originally announced October 2023.

Comments: Accepted to ACM MM 2023

arXiv:2310.01861 [pdf, other]

Shifting More Attention to Breast Lesion Segmentation in Ultrasound Videos

Authors: Junhao Lin, Qian Dai, Lei Zhu, Huazhu Fu, Qiong Wang, Weibin Li, Wenhao Rao, Xiaoyang Huang, Liansheng Wang

Abstract: Breast lesion segmentation in ultrasound (US) videos is essential for diagnosing and treating axillary lymph node metastasis. However, the lack of a well-established and large-scale ultrasound video dataset with high-quality annotations has posed a persistent challenge for the research community. To overcome this issue, we meticulously curated a US video breast lesion segmentation dataset comprising 572 videos and 34,300 annotated frames, covering a wide range of realistic clinical scenarios. Furthermore, we propose a novel frequency and localization feature aggregation network (FLA-Net) that learns temporal features from the frequency domain and additionally predicts lesion locations to assist with breast lesion segmentation. We also devise a localization-based contrastive loss to reduce the lesion location distance between neighboring frames within the same video and enlarge the location distances between frames from different ultrasound videos. Our experiments on our annotated dataset and two public video polyp segmentation datasets demonstrate that our proposed FLA-Net achieves state-of-the-art performance in breast lesion segmentation in US videos and in video polyp segmentation while significantly reducing time and space complexity. Our model and dataset are available at https://github.com/jhl-Det/FLA-Net.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 10 pages

arXiv:2309.15867 [pdf]

Identifying factors associated with fast visual field progression in patients with ocular hypertension based on unsupervised machine learning

Authors: Xiaoqin Huang, Asma Poursoroush, Jian Sun, Michael V. Boland, Chris Johnson, Siamak Yousefi

Abstract: Purpose: To identify ocular hypertension (OHT) subtypes with different trends of visual field (VF) progression based on unsupervised machine learning and to discover factors associated with fast VF progression. Participants: A total of 3133 eyes of 1568 ocular hypertension treatment study (OHTS) participants with at least five follow-up VF tests were included in the study. Methods: We used a latent class mixed model (LCMM) to identify OHT subtypes using standard automated perimetry (SAP) mean deviation (MD) trajectories. We characterized the subtypes based on demographic, clinical, ocular, and VF factors at baseline. We then identified factors driving fast VF progression using generalized estimating equations (GEE) and justified the findings qualitatively and quantitatively. Results: The LCMM model discovered four clusters (subtypes) of eyes with different trajectories of MD worsening. The numbers of eyes in the clusters were 794 (25%), 1675 (54%), 531 (17%), and 133 (4%). We labelled the clusters as Improvers, Stables, Slow progressors, and Fast progressors based on their mean MD decline, which was 0.08, -0.06, -0.21, and -0.45 dB/year, respectively. Eyes with fast VF progression had higher baseline age, intraocular pressure (IOP), pattern standard deviation (PSD), and refractive error (RE), but lower central corneal thickness (CCT). Fast progression was associated with calcium channel blockers, being male, heart disease history, diabetes history, African American race, stroke history, and migraine headaches.

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.15697 [pdf, other]

Physics Inspired Hybrid Attention for SAR Target Recognition

Authors: Zhongling Huang, Chong Wu, Xiwen Yao, Zhicheng Zhao, Xiankai Huang, Junwei Han

Abstract: There has been a recent emphasis on integrating physical models and deep neural networks (DNNs) for SAR target recognition, to improve performance and achieve a higher level of physical interpretability. The attributed scattering center (ASC) parameters have garnered the most interest, being used as additional input data or as features for fusion in most methods. However, the performance greatly depends on the ASC optimization result, and the fusion strategy is not adaptable to different types of physical information. Meanwhile, the current evaluation scheme is inadequate to assess the model's robustness and generalizability. Thus, we propose a physics inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address the above issues. PIHA leverages the high-level semantics of physical information to activate and guide the feature groups aware of the local semantics of the target, so as to re-weight feature importance based on prior knowledge. It is flexible and generally applicable to various physical models, and can be integrated into arbitrary DNNs without modifying the original architecture. The experiments involve a rigorous assessment using the proposed OFA, which entails training and validating a model on either sufficient or limited data and evaluating on multiple test sets with different data distributions. Our method outperforms other state-of-the-art approaches in 12 test scenarios with the same ASC parameters. Moreover, we analyze the working mechanism of PIHA and evaluate various PIHA-enabled DNNs. The experiments also show that PIHA is effective for different physical information. The source code together with the adopted physical information is available at https://github.com/XAI4SAR.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.02124 [pdf, other]

Exploiting Spatial-temporal Data for Sleep Stage Classification via Hypergraph Learning

Authors: Yuze Liu, Ziming Zhao, Tiehua Zhang, Kang Wang, Xin Chen, Xiaowei Huang, Jun Yin, Zhishu Shen

Abstract: Sleep stage classification is crucial for detecting patients' health conditions. Existing models, which mainly use Convolutional Neural Networks (CNN) for modelling Euclidean data and Graph Convolutional Networks (GCN) for modelling non-Euclidean data, are unable to consider the heterogeneity and interactivity of multimodal data and the spatial-temporal correlation simultaneously, which hinders further improvement of classification performance. In this paper, we propose a dynamic learning framework, STHL, which introduces hypergraphs to encode spatial-temporal data for sleep stage classification. Hypergraphs can model multi-modal/multi-type data instead of relying only on simple pairwise relations between two subjects. STHL creates spatial and temporal hyperedges separately to build node correlations, then conducts a type-specific hypergraph learning process to encode the attributes into the embedding space. Extensive experiments show that our proposed STHL outperforms the state-of-the-art models in sleep stage classification tasks.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2307.11784 [pdf, other]

What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems

Authors: Saddek Bensalem, Chih-Hong Cheng, Wei Huang, Xiaowei Huang, Changshun Wu, Xingyu Zhao

Abstract: Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among these challenges, finding a rigorous yet practical way to achieve safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.02514 [pdf, other]

Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data

Authors: Hongmin Cai, Xiaoke Huang, Zhengliang Liu, Wenxiong Liao, Haixing Dai, Zihao Wu, Dajiang Zhu, Hui Ren, Quanzheng Li, Tianming Liu, Xiang Li

Abstract: Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcript data from the DementiaBank Pitt database. The proposed approach involves pre-trained language models and a Graph Neural Network (GNN) that constructs a graph from the speech transcript and extracts features for AD detection. Data augmentation techniques, including synonym replacement and a GPT-based augmenter, among others, were used to address the small dataset size. Audio data was also introduced, and the WavLM model was used to extract audio features. These features were then fused with text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and using the result for contrastive learning with the original audio. We conducted extensive experiments and analysis on the above methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.17419 [pdf]

Interferometric speckle visibility spectroscopy (iSVS) for measuring decorrelation time and dynamics of moving samples with enhanced signal-to-noise ratio and relaxed reference requirements

Authors: Yu Xi Huang, Simon Mahler, Jerome Mertz, Changhuei Yang

Abstract: Diffusing wave spectroscopy (DWS) is a group of techniques used to measure the dynamics of a scattering medium in a non-invasive manner. DWS methods rely on detecting the speckle light field from the moving scattering media and measuring the speckle decorrelation time to quantify the scattering medium's dynamics. For DWS, the signal-to-noise ratio (SNR) is determined by the ratio of the measured decorrelation time to the standard error of the measurement. This SNR is often low in certain applications because of high noise variances and low signal intensity, especially in biological applications with restricted exposure and emission levels. To address this photon-limited signal-to-noise ratio problem, we investigated, theoretically and experimentally, the SNR of interferometric speckle visibility spectroscopy (iSVS) compared to more traditional DWS methods. We found that iSVS can provide excellent SNR performance through its ability to overcome camera noise. We also proved that the iSVS system has more relaxed constraints on the reference beam properties than most other interferometric systems. For an iSVS system to function properly, we simply require the reference beam to exhibit local temporal stability, while incident angle, reference phase, and intensity uniformity do not need to be constrained. This flexibility can potentially enable more unconventional iSVS implementation schemes.
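The SNR definition above can be written as SNR = τ / SE(τ), where τ is the measured decorrelation time. The sketch below assumes the standard error is estimated from n independent, repeated measurements as the sample standard deviation divided by √n; the abstract does not say how the standard error is obtained, and the measurement values are hypothetical, in arbitrary time units.

```python
import math
import statistics


def decorrelation_snr(tau_samples):
    """SNR = mean decorrelation time / standard error of that mean.

    Assumes the samples are independent repeated measurements, so the
    standard error is the sample standard deviation divided by sqrt(n).
    """
    n = len(tau_samples)
    if n < 2:
        raise ValueError("need at least two repeated measurements")
    mean_tau = statistics.mean(tau_samples)
    std_err = statistics.stdev(tau_samples) / math.sqrt(n)
    if std_err == 0:
        return math.inf  # identical measurements: no measurable spread
    return mean_tau / std_err


if __name__ == "__main__":
    # Hypothetical repeated decorrelation-time measurements (arbitrary units).
    print(decorrelation_snr([1.0, 1.2, 0.8, 1.0]))  # about 12.25, i.e. sqrt(150)
```

Under this estimate, and for a fixed spread of individual measurements, the SNR grows roughly as √n with the number of repeats.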

Submitted 30 June, 2023; originally announced June 2023.

Comments: 14 pages, 5 figures

MSC Class: 92C55

arXiv:2306.05946 [pdf, other]

Digital Twin-Assisted Resource Demand Prediction for Multicast Short Video Streaming

Authors: Xinyu Huang, Wen Wu, Xuemin Sherman Shen

Abstract: In this paper, we propose a digital twin (DT)-assisted resource demand prediction scheme to enhance prediction accuracy for multicast short video streaming. Specifically, we construct user DTs (UDTs) for collecting real-time user status, including channel condition, location, watching duration, and preference. A reinforcement learning-empowered K-means++ algorithm is developed to cluster users based on the collected user status in UDTs, which can effectively exploit the mined intrinsic correlations among users to improve the accuracy of user clustering. We then analyze users' video watching duration and preferences in each multicast group to obtain the swiping probability distribution and recommended videos, respectively. The obtained information is utilized to predict the radio and computing resource demand of each multicast group. Initial results demonstrate that the proposed scheme can effectively abstract multicast groups' swiping probability distributions for accurate resource demand prediction.
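For reference, the sketch below shows plain K-means++ (D² seeding followed by Lloyd iterations) on numeric feature tuples. It does not reproduce the reinforcement learning component described in the abstract, and the two-dimensional "user status" points are hypothetical stand-ins for features such as channel condition or watching duration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """K-means++ seeding: the first centre is uniform; each later centre is drawn
    with probability proportional to its squared distance to the nearest centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centre
            break
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def nearest(p, centres):
    """Index of the centre closest to point p."""
    return min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm started from K-means++ seeds; returns (centres, labels)."""
    rng = random.Random(seed)
    centres = kmeans_pp_seeds(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for p in points:
            clusters[nearest(p, centres)].append(p)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        updated = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centres[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centres:
            break
        centres = updated
    labels = [nearest(p, centres) for p in points]
    return centres, labels


if __name__ == "__main__":
    # Hypothetical 2-D user features forming two well-separated groups.
    users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centres, labels = kmeans(users, k=2)
    print(labels)
```

The D² seeding spreads initial centres apart, which usually reduces the number of Lloyd iterations and the risk of a poor local optimum compared with uniform seeding; how the paper's reinforcement learning component modifies this procedure is not described in the abstract.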

Submitted 9 June, 2023; originally announced June 2023.

Comments: 2 pages, 3 figures

arXiv:2305.14838 [pdf, other]

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

Authors: Chenyang Le, Yao Qian, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng, Xuedong Huang

Abstract: Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language. We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models and optimized data-efficiently for spoken language tasks. Particularly, we propose to incorporate cross-modality learning into transfer learning and conduct them simultaneously for downstream tasks in a multi-task learning manner. Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks, achieving a new state-of-the-art average BLEU score of 31.5 on the multilingual speech to English text translation task for 21 languages, as measured on the public CoVoST2 evaluation set.

Submitted 14 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023, Poster

arXiv:2305.12311 [pdf, other]

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

Authors: Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Abstract: The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models, which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is an integrative system that leverages state-of-the-art single-modality encoders, combining their outputs with a new modality-fusing encoder in order to flexibly project combinations of modalities into a shared representational space. Next, language tokens are generated from these representations via an autoregressive decoder. The whole framework is pretrained end-to-end on a large collection of dual- and single-modality datasets using a novel text completion objective that can be generalized across arbitrary combinations of modalities. i-Code V2 matches or outperforms state-of-the-art single- and dual-modality baselines on 7 multimodal tasks, demonstrating the power of generative multimodal pretraining across a diversity of tasks and signals.

Submitted 20 May, 2023; originally announced May 2023.

arXiv:2305.11548 [pdf, ps, other]

Sensing Aided Uplink Transmission in OTFS ISAC with Joint Parameter Association, Channel Estimation and Signal Detection

Authors: Xi Yang, Hang Li, Qinghua Guo, J. Andrew Zhang, Xiaojing Huang, Zhiqun Cheng

Abstract: In this work, we study sensing-aided uplink transmission in an integrated sensing and communication (ISAC) vehicular network with the use of orthogonal time frequency space (OTFS) modulation. To exploit sensing parameters for improving uplink communications, the parameters must first be associated with the transmitters, which is a challenging task. We propose a scheme that jointly conducts parameter association, channel estimation, and signal detection by formulating it as a constrained bilinear recovery problem. Then we develop a message passing algorithm to solve the problem, leveraging the bilinear unitary approximate message passing (Bi-UAMP) algorithm. Numerical results validate the proposed scheme, showing that relevant performance bounds can be closely approached.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2304.00729 [pdf, other]

Data-Driven Safe Controller Synthesis for Deterministic Systems: A Posteriori Method With Validation Tests

Authors: Yu Chen, Chao Shang, Xiaolin Huang, Xiang Yin

Abstract: In this work, we investigate the data-driven safe control synthesis problem for unknown dynamic systems. We first formulate the safety synthesis problem as a robust convex program (RCP) based on the notion of a control barrier function. To resolve the issue of unknown system dynamics, we follow the existing approach of converting the RCP to a scenario convex program (SCP) by randomly collecting finite samples of system trajectories. However, to improve the sample efficiency needed to achieve a desired confidence bound, we provide a new a posteriori method with validation tests. Specifically, after collecting a set of data for the SCP, we further collect another set of independent \emph{validation data} as posterior information to test the obtained solution. We derive a new overall confidence bound for the safety of the controller that connects the original sample data, the support constraints, and the validation data. The efficiency of the proposed approach is illustrated by a case study of room temperature control. We show that, compared with existing methods, the proposed approach can significantly reduce the number of samples required to achieve a desired confidence bound.

Submitted 3 April, 2023; originally announced April 2023.
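
For context on the sample-size question raised above: one standard existing result for convex scenario programs (Campi and Garatti) bounds the probability that a solution computed from N i.i.d. sampled constraints with d decision variables violates more than a fraction eps of the constraints by a binomial tail. The Python sketch below computes that a priori bound and the smallest N meeting a confidence level beta. It illustrates the baseline the paper compares against in spirit, not the paper's a posteriori bound with validation data; the function names are illustrative.

```python
from math import comb


def scenario_violation_bound(n_samples: int, n_vars: int, eps: float) -> float:
    """A priori bound on P(violation probability > eps) for a convex scenario
    program with n_vars decision variables and n_samples i.i.d. constraints:
    sum_{i=0}^{d-1} C(N, i) * eps^i * (1 - eps)^(N - i)."""
    return sum(comb(n_samples, i) * eps**i * (1 - eps) ** (n_samples - i)
               for i in range(n_vars))


def min_samples(n_vars: int, eps: float, beta: float) -> int:
    """Smallest N whose bound is at most beta (simple linear search; the
    binomial tail is non-increasing in N)."""
    n = n_vars
    while scenario_violation_bound(n, n_vars, eps) > beta:
        n += 1
    return n
```

For a single decision variable, eps = 0.1 and beta = 0.05, the bound reduces to 0.9^N, so 29 samples are needed; more decision variables push the requirement up.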

arXiv:2303.13859 [pdf, other]

XGC-VQA: A unified video quality assessment model for User, Professionally, and Occupationally-Generated Content

Authors: Xinhui Huang, Chunyi Li, Abdelhak Bentaleb, Roger Zimmermann, Guangtao Zhai

Abstract: With the rapid growth in the amount and variety of Internet video data, a unified Video Quality Assessment (VQA) model is needed to guide video communication toward better perceptual quality. To meet the real-time and universal requirements of such guidance, this study proposes a VQA model built on a classification into User Generated Content (UGC), Professionally Generated Content (PGC), and Occupationally Generated Content (OGC). In the time domain, this study uses non-uniform sampling, as each content type has varying temporal importance based on its perceptual quality. In the spatial domain, centralized downsampling is performed before the VQA process by using a patch splicing/sampling mechanism to lower complexity for real-time assessment. The experimental results demonstrate that the proposed method achieves a median correlation of $0.7$ while keeping the computation time below 5 s for all three content types, which ensures that the communication experience of UGC, PGC, and OGC can be optimized together.

Submitted 24 March, 2023; originally announced March 2023.

Comments: 6 pages, 4 figures

arXiv:2303.08015 [pdf, ps, other]

Molecular Communication for Quorum Sensing Inspired Cooperative Drug Delivery

Authors: Yuting Fang, Stuart T. Johnston, Matt Faria, Xinyu Huang, Andrew W. Eckford, Jamie Evans

Abstract: A cooperative drug delivery system is proposed, where quorum sensing (QS), a density-dependent bacterial behavior coordination mechanism, is employed by synthetic bacterium-based nanomachines (B-NMs) for controllable drug delivery. In our proposed system, drug delivery is only triggered when there are enough QS molecules, which in turn only happens when there are enough B-NMs. This allows the proposed system to achieve a high release rate of drug molecules from a large number of B-NMs even when the population density of B-NMs is not known. Analytical expressions are derived for i) the expected activation probability of a B-NM due to randomly distributed B-NMs and ii) the expected aggregate absorption rate of drug molecules due to randomly distributed QS-activated B-NMs. The analytical results are verified by particle-based simulations. Because rigorous diffusion-based molecular channels are considered, the derived results can help to predict and control the impact of environmental factors (e.g., the diffusion coefficient and degradation rate) on the absorption rate of drug molecules. Our results show that the activation probability of a B-NM increases as the B-NM is located closer to the center of the B-NM population, and that the aggregate absorption rate of the drug molecules increases non-linearly with the population density.

Submitted 14 February, 2023; originally announced March 2023.

Comments: 9 pages; 9 figures
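
To make the density dependence described above concrete, the sketch below uses the textbook steady-state concentration of a point source emitting q molecules per second into unbounded 3D space with diffusion coefficient D and first-order degradation rate k, C(r) = q / (4 pi D r) * exp(-r sqrt(k/D)), and activates a nanomachine when the summed signal from its neighbours crosses a threshold. This deterministic threshold rule, the parameter values, and the function names are hypothetical simplifications; the paper derives expected activation probabilities for randomly distributed B-NMs with its own channel model.

```python
import math


def steady_state_conc(r: float, q: float, d: float, k: float) -> float:
    """Steady-state concentration at distance r from a continuous point source
    (emission rate q, diffusion coefficient d, degradation rate k), 3D space."""
    return q / (4 * math.pi * d * r) * math.exp(-r * math.sqrt(k / d))


def is_activated(me, others, q, d, k, threshold) -> bool:
    """Hypothetical QS rule: a B-NM activates when the QS concentration summed
    over all other B-NMs at its own position reaches the threshold."""
    total = sum(steady_state_conc(math.dist(me, p), q, d, k) for p in others)
    return total >= threshold
```

A node surrounded by six neighbours at unit distance sees far more signal than one with two neighbours ten units away, which mirrors the "only when there are enough B-NMs" behaviour in the abstract.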

arXiv:2302.13755 [pdf, ps, other]

Neuroadaptive Distributed Event-triggered Control of Networked Uncertain Pure-feedback Systems with Polluted Feedback

Authors: Libei Sun, Zhirong Zhang, Xinjian Huang, Xiucai Huang

Abstract: This paper investigates the distributed event-triggered control problem for a class of uncertain pure-feedback nonlinear multi-agent systems (MASs) with polluted feedback. Under the setting of event-triggered control, substantial challenges exist in both control design and stability analysis for systems in more general non-affine pure-feedback forms, wherein the state variables are not directly and continuously available and may even be polluted due to sensor failures; thus far, very few results are available in the literature. In this work, a nominal control strategy under regular state feedback is first developed by combining neural network (NN) approximation with a dynamic filtering technique, and then an NN-based distributed event-triggered control strategy is proposed by resorting to a novel replacement policy, which completely circumvents the non-differentiability issue arising from the event-triggering setting. Besides, sensor ineffectiveness is accommodated automatically without using a fault detection and diagnosis unit or controller reconfiguration. It is shown, with the aid of several vital lemmas, that all internal signals are semi-globally uniformly ultimately bounded (SGUUB), while the outputs of all subsystems reach consensus without infinitely fast execution. Finally, the efficiency of the developed algorithm is verified via numerical simulation.

Submitted 27 February, 2023; originally announced February 2023.
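
As background for the event-triggered setting discussed above, the sketch below simulates a generic static triggering rule on a scalar unstable plant x' = a x + u: the controller holds the last transmitted state and a new sample is sent only when the hold error exceeds sigma |x| + eps. The absolute term eps is a common way to rule out infinitely fast triggering near the origin. The plant, gains, and thresholds are assumptions for illustration; this is not the paper's NN-based replacement policy.

```python
def simulate_event_triggered(a=1.0, k=3.0, sigma=0.2, eps=1e-3,
                             x0=1.0, dt=1e-3, t_end=5.0):
    """Forward-Euler simulation of x' = a*x + u with u = -k * x_hat, where the
    held state x_hat is refreshed only when |x_hat - x| > sigma*|x| + eps.
    Returns (final state, number of transmissions, number of time steps)."""
    x, x_hat = x0, x0
    events = 1  # the initial transmission at t = 0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        if abs(x_hat - x) > sigma * abs(x) + eps:
            x_hat = x      # event: sample and transmit the current state
            events += 1
        u = -k * x_hat     # control input is held constant between events
        x += dt * (a * x + u)
    return x, events, steps
```

With these illustrative values the state settles near the origin while transmitting far less often than once per time step, which is the trade-off event triggering is meant to buy.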

arXiv:2302.01728 [pdf, other]

Decentralised and Cooperative Control of Multi-Robot Systems through Distributed Optimisation

Authors: Yi Dong, Zhongguo Li, Xingyu Zhao, Zhengtao Ding, Xiaowei Huang

Abstract: Multi-robot cooperative control has gained extensive research interest due to its wide applications in civil, security, and military domains. This paper proposes a cooperative control algorithm for multi-robot systems with general linear dynamics. The algorithm is based on distributed cooperative optimisation and output regulation, and it achieves the global optimum by utilising only information shared among neighbouring robots. Technically, a high-level distributed optimisation algorithm for multi-robot systems is presented, which serves as an optimal reference generator for each individual agent. Then, based on the distributed optimisation algorithm, an output regulation method is utilised to solve the optimal coordination problem for general linear dynamic systems. The convergence of the proposed algorithm is theoretically proved. Both numerical simulations and real-time physical robot experiments are conducted to validate the effectiveness of the proposed cooperative control algorithms.

Submitted 3 February, 2023; originally announced February 2023.

Comments: Accepted by AAMAS'23
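
The high-level layer described above reaches a global optimum using only neighbour-to-neighbour communication. One well-known algorithm with that property is distributed gradient tracking, sketched below for scalar quadratic costs f_i(x) = 0.5 (x - a_i)^2, whose global minimiser is the mean of the a_i. This is a stand-in chosen for illustration, not necessarily the paper's optimisation algorithm, and it omits the output-regulation layer; the graph, step size, and names are assumptions.

```python
def gradient_tracking(targets, neighbors, alpha=0.1, iters=500):
    """Distributed gradient tracking for min_x sum_i 0.5 * (x - targets[i])**2.
    Agent i only mixes values from neighbors[i] (Metropolis weights)."""
    n = len(targets)

    def grad(i, xi):
        return xi - targets[i]  # gradient of agent i's local cost

    # Metropolis-Hastings weights: symmetric and doubly stochastic
    deg = [len(neighbors[i]) for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            w[i][j] = 1.0 / (1 + max(deg[i], deg[j]))
        w[i][i] = 1.0 - sum(w[i])

    x = [0.0] * n
    y = [grad(i, x[i]) for i in range(n)]  # y_i tracks the network-average gradient
    for _ in range(iters):
        x_new = [sum(w[i][j] * x[j] for j in range(n)) - alpha * y[i]
                 for i in range(n)]
        y = [sum(w[i][j] * y[j] for j in range(n)) + grad(i, x_new[i]) - grad(i, x[i])
             for i in range(n)]
        x = x_new
    return x
```

On a four-robot ring with targets 1, 2, 3 and 6, every agent's estimate converges to the global optimum 3 even though no agent ever sees all targets.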

arXiv:2301.03062 [pdf, ps, other]

doi 10.1109/INFOCOM53939.2023.10229017

AnycostFL: Efficient On-Demand Federated Learning over Heterogeneous Edge Devices

Authors: Peichun Li, Guoliang Cheng, Xumin Huang, Jiawen Kang, Rong Yu, Yuan Wu, Miao Pan

Abstract: In this work, we investigate the challenging problem of on-demand federated learning (FL) over heterogeneous edge devices with diverse resource constraints. We propose a cost-adjustable FL framework, named AnycostFL, that enables diverse edge devices to efficiently perform local updates under a wide range of efficiency constraints. To this end, we design model shrinking to support local model training with elastic computation cost, and gradient compression to allow parameter transmission with dynamic communication overhead. An enhanced parameter aggregation is conducted in an element-wise manner to improve the model performance. Focusing on AnycostFL, we further propose an optimization design to minimize the global training loss with personalized latency and energy constraints. Guided by theoretical insights from the convergence analysis, personalized training strategies are deduced for different devices to match their locally available resources. Experimental results indicate that, compared to state-of-the-art efficient FL algorithms, our learning framework can reduce training latency and energy consumption by up to 1.9 times while reaching a reasonable global testing accuracy. Moreover, the results also demonstrate that our approach significantly improves the converged global accuracy.

Submitted 8 January, 2023; originally announced January 2023.

Comments: Accepted to IEEE INFOCOM 2023

Journal ref: IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, New York City, NY, USA, 2023, pp. 1-10
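
The abstract above pairs gradient compression with element-wise aggregation but does not specify either. As an assumption for illustration only, the sketch below uses top-k sparsification (one common compressor) and averages each coordinate over just the clients that actually transmitted it, so coordinates sent by few clients are not diluted by implicit zeros. The function names are hypothetical.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a gradient as {index: value}.
    Ties keep their original order because Python's sort is stable."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in idx}


def elementwise_aggregate(sparse_updates, dim):
    """Average each coordinate only over the clients that sent it;
    coordinates that no client sent stay at 0.0."""
    total = [0.0] * dim
    count = [0] * dim
    for upd in sparse_updates:
        for i, v in upd.items():
            total[i] += v
            count[i] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]
```

Usage: three clients each send their two largest entries of a 4-dimensional gradient, and the server averages per coordinate.

```python
ups = [top_k_sparsify(g, 2) for g in ([0.1, -2.0, 0.5, 3.0],
                                       [1.0, 0.0, -4.0, 0.2],
                                       [0.0, -1.0, 0.0, 1.0])]
print(elementwise_aggregate(ups, 4))  # [1.0, -1.5, -4.0, 2.0]
```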

arXiv:2212.03329 [pdf, other]

Enhancing Low-Density EEG-Based Brain-Computer Interfaces with Similarity-Keeping Knowledge Distillation

Authors: Xin-Yao Huang, Sung-Yu Chen, Chun-Shu Wei

Abstract: Electroencephalogram (EEG) has been one of the common neuromonitoring modalities for real-world brain-computer interfaces (BCIs) because of its non-invasiveness, low cost, and high temporal resolution. Recently, light-weight and portable EEG wearable devices based on low-density montages have increased the convenience and usability of BCI applications. However, a loss of EEG decoding performance is often inevitable due to the reduced number of electrodes and the reduced scalp coverage of a low-density EEG montage. To address this issue, we introduce knowledge distillation (KD), a learning mechanism developed for transferring knowledge/information between neural network models, to enhance the performance of low-density EEG decoding. Our framework includes a newly proposed similarity-keeping (SK) teacher-student KD scheme that encourages a low-density EEG student model to acquire the inter-sample similarity of a pre-trained teacher model trained on high-density EEG data. The experimental results validate that our SK-KD framework consistently improves motor-imagery EEG decoding accuracy when the number of electrodes decreases for the input EEG data. For both common low-density headphone-like and headband-like montages, our method outperforms state-of-the-art KD methods across various EEG decoding model architectures. As the proposed SK-KD framework is the first KD scheme developed for enhancing EEG decoding, we expect it to facilitate the practicality of low-density EEG-based BCI in real-world applications.

Submitted 6 December, 2022; originally announced December 2022.
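
The abstract above says the student should acquire the teacher's inter-sample similarity but does not give the loss. A minimal sketch, assuming an SK term in the style of similarity-preserving knowledge distillation (Tung and Mori, 2019): build each model's b x b batch Gram matrix, L2-normalise its rows, and penalise the squared Frobenius distance divided by b^2. Because only b x b matrices are compared, teacher (high-density) and student (low-density) feature widths may differ. The names are illustrative.

```python
import math


def _row_normalize(m):
    """L2-normalise every row of a list-of-lists matrix (zero rows left as-is)."""
    out = []
    for row in m:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out


def batch_similarity(features):
    """Row-normalised b x b Gram matrix of a batch of feature vectors."""
    g = [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]
    return _row_normalize(g)


def similarity_kd_loss(teacher_feats, student_feats):
    """Squared Frobenius distance between teacher and student similarity
    matrices, divided by b**2 (b = batch size)."""
    gt = batch_similarity(teacher_feats)
    gs = batch_similarity(student_feats)
    b = len(gt)
    return sum((x - y) ** 2
               for rt, rs in zip(gt, gs) for x, y in zip(rt, rs)) / b**2
```

A student whose features are a scaled copy of the teacher's (even with an extra, unused dimension) incurs essentially zero loss, while a student that ranks sample similarities differently is penalised.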

arXiv:2211.15036 [pdf, ps, other]

Quantized control of non-Lipschitz nonlinear systems: a novel control framework with prescribed transient performance and lower design complexity

Authors: Zongcheng Liu, Jiangshuai Huang, Changyun Wen, Jing Zhou, Xiucai Huang

Abstract: A novel control design framework is proposed for a class of non-Lipschitz nonlinear systems with quantized states, which guarantees prescribed transient performance with lower control design complexity. Firstly, unlike all existing control methods for systems with state quantization, global stability of strict-feedback nonlinear systems is achieved without requiring the nonlinearities of the system model to satisfy global Lipschitz continuity. Secondly, a novel barrier function-free prescribed performance control (BFPPC) method is proposed, which can guarantee prescribed transient performance under quantized states. Thirdly, a new \textit{W}-function-based control scheme is designed such that virtual control signals need not be differentiated repeatedly and the controller can be designed in a simple way, which guarantees global stability and lower design complexity compared with traditional dynamic surface control (DSC). Simulation results demonstrate the effectiveness of our method.

Submitted 27 November, 2022; originally announced November 2022.
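
For readers unfamiliar with the two ingredients named above, the conventional forms are a decaying performance envelope that a tracking error e(t) must stay inside, and a uniform state quantizer with step Delta. These are common textbook forms given as background (a symmetric envelope is assumed here); the paper's barrier-function-free BFPPC and its quantizer model may differ in detail. Here rho_0 > rho_infinity > 0 fix the allowed initial error and steady-state bound, and ell > 0 sets the minimum convergence rate.

```latex
\rho(t) = (\rho_0 - \rho_\infty)\, e^{-\ell t} + \rho_\infty, \qquad
-\rho(t) < e(t) < \rho(t) \quad \forall t \ge 0,
\qquad
q(x) = \Delta \,\operatorname{round}\!\left(\frac{x}{\Delta}\right), \qquad
\lvert q(x) - x \rvert \le \frac{\Delta}{2}.
```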

arXiv:2211.06906 [pdf, other]

Digital Twin-Assisted Collaborative Transcoding for Better User Satisfaction in Live Streaming

Authors: Xinyu Huang, Mushu Li, Wen Wu, Conghao Zhou, Xuemin Sherman Shen

Abstract: In this paper, we propose a digital twin (DT)-assisted cloud-edge collaborative transcoding scheme to enhance user satisfaction in live streaming. We first present a DT-assisted transcoding workload estimation (TWE) model for cloud-edge collaborative transcoding. In particular, two DTs are constructed to emulate the cloud-edge collaborative transcoding process by analyzing the spatial-temporal information of individual videos and the transcoding configurations of transcoding queues, respectively. Two light-weight Bayesian neural networks are adopted to fit the TWE models in the two DTs. We then formulate a transcoding-path selection problem to maximize long-term user satisfaction within an average service delay threshold, taking into account the dynamics of video arrivals and video requests. The problem is transformed into a standard Markov decision process by using Lyapunov optimization and solved by a deep reinforcement learning algorithm. Simulation results based on a real-world dataset demonstrate that the proposed scheme can effectively enhance user satisfaction compared with benchmark schemes.

Submitted 13 November, 2022; originally announced November 2022.

Comments: Submitted to ICC 2023
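
The abstract above uses Lyapunov optimization to handle a long-term average delay constraint. The standard device is a virtual queue that accumulates delay above the threshold, combined with a per-slot drift-plus-penalty decision that trades the objective (weighted by V) against the queue backlog. The sketch below applies that idea greedily to two hypothetical transcoding paths with fixed (satisfaction, delay) values; the paper instead solves the resulting MDP with deep reinforcement learning, and all numbers and names here are assumptions.

```python
def drift_plus_penalty(paths, delay_threshold, v, horizon):
    """Each slot, pick the path maximising v*satisfaction - q*delay, where q is
    a virtual queue of accumulated excess delay.
    paths: list of (name, satisfaction, delay). Returns (choices, q, avg_delay)."""
    q = 0.0
    choices, total_delay = [], 0.0
    for _ in range(horizon):
        name, _sat, delay = max(paths, key=lambda p: v * p[1] - q * p[2])
        choices.append(name)
        total_delay += delay
        q = max(q + delay - delay_threshold, 0.0)  # virtual queue update
    return choices, q, total_delay / horizon
```

Because the queue satisfies q(T) >= sum of (delay - threshold), the time-average delay can exceed the threshold by at most q(T)/T, which shrinks as the horizon grows; a larger V favours satisfaction at the cost of a larger backlog.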

arXiv:2210.15022 [pdf, other]

Automatic Assessment of Infant Face and Upper-Body Symmetry as Early Signs of Torticollis

Authors: Michael Wan, Xiaofei Huang, Bethany Tunik, Sarah Ostadabbas

Abstract: We apply computer vision pose estimation techniques developed expressly for the data-scarce infant domain to the study of torticollis, a common condition in infants for which early identification and treatment are critical. Specifically, we use a combination of facial landmark and body joint estimation techniques designed for infants to estimate a range of geometric measures pertaining to face and upper body symmetry, drawn from an array of sources in the physical therapy and ophthalmology research literature on torticollis. We gauge performance with a range of metrics and show that the estimates of most of these geometric measures are successful, yielding strong to very strong Spearman's $\rho$ correlation with ground truth values. Furthermore, we show that these estimates, derived from pose estimation neural networks designed for the infant domain, cleanly outperform estimates derived from more widely known networks designed for the adult domain.

Submitted 7 November, 2022; v1 submitted 26 October, 2022; originally announced October 2022.
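
Since the evaluation above reports Spearman's rho against ground truth, here is a minimal, dependency-free sketch of how that statistic is computed: replace each value by its rank (tied values share the average of their ranks) and take the Pearson correlation of the ranks. The function names are illustrative; a library routine such as SciPy's would normally be used in practice.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks matter, any strictly increasing relationship between an estimated measure and its ground truth yields rho = 1, even if the relationship is non-linear.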

arXiv:2210.08369 [pdf]

Metasurface Smart Glass for Object Recognition

Authors: Cheng-Chia Tsai, Xiaoyan Huang, Zhicheng Wu, Zongfu Yu, Nanfang Yu

Abstract: Recent years have seen a considerable surge of research on developing heuristic approaches to realize analog computing using physical waves. Among these, neuromorphic computing using light waves is envisioned to feature performance metrics such as computational speed and energy efficiency exceeding those of conventional digital techniques by many orders of magnitude. Yet, neuromorphic computing based on photonics remains a challenge due to the difficulty of training and manufacturing sophisticated photonic structures to support neural networks with adequate expressive power. Here, we realize a diffractive optical neural network (ONN) based on metasurfaces that can recognize objects by directly processing light waves scattered from the objects. Metasurfaces composed of a two-dimensional array of millions of meta-units can precisely control the optical wavefront with subwavelength resolution; thus, when used as constitutive layers of an ONN, they can provide exceptionally high expressive power. We experimentally demonstrate ONNs based on single-layered metasurfaces that modulate the phase and polarization over the optical wavefront for recognizing optically coherent binary objects, including hand-written digits and English alphabetic letters. We further demonstrate, in simulation, ONNs based on metasurface doublets for human facial verification.
The advantageous traits of metasurface-based ONNs, including ultra-compact form factors, zero power consumption, ultra-fast and parallel data processing capabilities, and physics-guaranteed data security, make them suitable as "edge" perception devices that can transform the future of image collection and analysis.

Submitted 15 October, 2022; originally announced October 2022.

Comments: 30 pages, 6 figures

arXiv:2210.07098 [pdf]

Meta-learning Based Short-Term Passenger Flow Prediction for Newly-Operated Urban Rail Transit Stations

Authors: Kuo Han, Jinlei Zhang, Chunqi Zhu, Lixing Yang, Xiaoyu Huang, Songsong Li

Abstract: Accurate short-term passenger flow prediction in urban rail transit stations has great benefits for reasonably allocating resources, easing congestion, and reducing operational risks. However, compared with data-rich stations, passenger flow prediction in newly-operated stations is limited by the volume of passenger flow data, which reduces prediction accuracy and makes station management and operation more difficult. Hence, how to accurately predict passenger flow in newly-operated stations with limited data is an urgent problem. Existing passenger flow prediction approaches generally depend on sufficient data, which may make them unsuitable for newly-operated stations. Therefore, we propose a meta-learning method named Meta Long Short-Term Memory Network (Meta-LSTM) to predict the passenger flow in newly-operated stations. Meta-LSTM constructs a framework that increases the generalization ability of the long short-term memory network (LSTM) to various passenger flow characteristics by learning passenger flow characteristics from multiple data-rich stations and then applying the learned parameters to data-scarce stations through parameter initialization. Meta-LSTM is applied to the subway networks of Nanning, Hangzhou, and Beijing, China. Experiments on these three real-world subway networks demonstrate the effectiveness of our proposed Meta-LSTM over several competitive baseline models.
Results also show that our proposed Meta-LSTM generalizes well to various passenger flow characteristics, which can provide a reference for passenger flow prediction in stations with limited data.

Submitted 13 October, 2022; originally announced October 2022.

Comments: 37 pages, 13 figures, 3 tables
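
The abstract above learns an initialization from data-rich stations and transfers it to data-scarce ones. The abstract does not name the meta-learning algorithm, so the sketch below swaps in Reptile, a simple first-order method, on toy linear tasks y = s * x that stand in for stations (slopes 1, 2, 3); there is no LSTM here, and all names and values are assumptions. The meta-learned starting point lets a new task be fitted in a few gradient steps better than starting from zero.

```python
def inner_adapt(theta, xs, ys, lr, steps):
    """A few gradient steps on 0.5 * mean((w*x - y)^2) for the model y = w * x."""
    w = theta
    n = len(xs)
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


def reptile(tasks, meta_lr=0.1, inner_lr=0.1, inner_steps=5, epochs=300):
    """Reptile: after adapting to each task, nudge the shared initialisation
    theta toward the adapted weights phi."""
    theta = 0.0
    for _ in range(epochs):
        for xs, ys in tasks:
            phi = inner_adapt(theta, xs, ys, inner_lr, inner_steps)
            theta += meta_lr * (phi - theta)
    return theta
```

With the three toy tasks the learned initialisation settles close to the middle slope (about 2.03), so five adaptation steps on an unseen slope of 2.5 land much closer to the target than the same five steps from zero.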

arXiv:2210.04435 [pdf, other]

Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning

Authors: Xiaoyu Huang, Zhongyu Li, Yanzhen Xiang, Yiming Ni, Yufeng Chi, Yunhao Li, Lizhi Yang, Xue Bin Peng, Koushil Sreenath

Abstract: We present a reinforcement learning (RL) framework that enables quadrupedal robots to perform soccer goalkeeping tasks in the real world. Soccer goalkeeping using quadrupeds is a challenging problem that combines highly dynamic locomotion with precise and fast non-prehensile object (ball) manipulation. The robot needs to react to and intercept a potentially flying ball using dynamic locomotion maneuvers in a very short amount of time, usually less than one second. In this paper, we propose to address this problem using a hierarchical model-free RL framework. The first component of the framework contains multiple control policies for distinct locomotion skills, which can be used to cover different regions of the goal. Each control policy enables the robot to track random parametric end-effector trajectories while performing one specific locomotion skill, such as jumping, diving, or sidestepping. These skills are then used by the second part of the framework, a high-level planner that determines the desired skill and end-effector trajectory needed to intercept a ball flying to different regions of the goal. We deploy the proposed framework on a Mini Cheetah quadrupedal robot and demonstrate its effectiveness for various agile interceptions of a fast-moving ball in the real world.

Submitted 10 October, 2022; originally announced October 2022.

Comments: First two authors contributed equally. Accompanying video is at https://youtu.be/iX6OgG67-ZQ
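
To illustrate the kind of decision the high-level planner above has to make, the sketch below predicts where a ball crosses the goal plane with a drag-free, bounce-free ballistic model and then maps that crossing point to one of the three skills. The coordinate frame, the region thresholds, and the mapping itself are hypothetical, chosen only for illustration; the paper's planner also selects an end-effector trajectory and is not reproduced here.

```python
GRAVITY = 9.81  # m/s^2


def predict_goal_crossing(pos, vel):
    """Drag-free ballistic prediction of where the ball crosses the goal plane x = 0.
    pos = (x, y, z) in metres with x > 0 in front of the goal; vel in m/s.
    Returns (y, z, time_to_cross), or None if the ball is not approaching."""
    x, y, z = pos
    vx, vy, vz = vel
    if vx >= 0:
        return None
    t = -x / vx
    return y + vy * t, z + vz * t - 0.5 * GRAVITY * t * t, t


def choose_skill(cross_y, cross_z, jump_height=0.4, side_reach=0.3):
    """Hypothetical region-to-skill rule; the thresholds are illustrative only."""
    if cross_z > jump_height:
        return "jump"
    if abs(cross_y) > side_reach:
        return "dive"
    return "sidestep"
```

For a ball 4 m out moving at 8 m/s toward the goal, the robot has 0.5 s to react, in line with the sub-second reaction times mentioned in the abstract.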

arXiv:2209.06261 [pdf, other]

Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine

Authors: Kun Wang, William R. Johnson III, Shiyang Lu, Xiaonan Huang, Joran Booth, Rebecca Kramer-Bottiglio, Mridul Aanjaneya, Kostas Bekris

Abstract: Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and significant deformations, which enable them to navigate unstructured terrains and survive harsh impacts. They are hard to control, however, due to high dimensionality, complex dynamics, and a coupled architecture. Physics-based simulation is a promising avenue for developing locomotion policies that can be transferred to real robots. Nevertheless, modeling tensegrity robots is a complex task due to a substantial sim2real gap. To address this issue, this paper describes a Real2Sim2Real (R2S2R) strategy for tensegrity robots. This strategy is based on a differentiable physics engine that can be trained given limited data from a real robot. These data include offline measurements of physical properties, such as mass and geometry for various robot components, and the observation of a trajectory using a random control policy. With the data from the real robot, the engine can be iteratively refined and used to discover locomotion policies that are directly transferable to the real robot. Beyond the R2S2R pipeline, key contributions of this work include computing non-zero gradients at contact points, a loss function for matching tensegrity locomotion gaits, and a trajectory segmentation technique that avoids conflicts in gradient evaluation during training. Multiple iterations of the R2S2R process are demonstrated and evaluated on a real 3-bar tensegrity robot.

Submitted 17 September, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

Comments: Accepted to IROS2023; https://sites.google.com/view/sim2real

arXiv:2208.09792 [pdf, other]

Simultaneous Beam and User Selection for the Beamspace mmWave/THz Massive MIMO Downlink

Authors: Kai Wu, J. Andrew Zhang, Xiaojing Huang, Y. Jay Guo, Lajos Hanzo

Abstract: Beamspace millimeter-wave (mmWave) and terahertz (THz) massive MIMO constitute attractive schemes for next-generation communications, given their abundant bandwidth and high throughput. However, their user and beam selection problem has to be efficiently addressed. Motivated by this challenge, we develop low-complexity solutions. We introduce dirty paper coding (DPC) into the joint user and beam selection problem. We unveil the compelling properties of the DPC sum rate in beamspace massive MIMO, showing that it evolves monotonically with the number of users and beams selected. We then exploit these beneficial properties to substantially simplify the joint user and beam selection problem. Furthermore, we develop a set of algorithms striking unique trade-offs for solving the simplified problem, facilitating simultaneous user and beam selection based on partial beamspace channels for the first time. Additionally, we derive the sum rate bound of the algorithms and analyze their complexity. Our simulation results validate the effectiveness of the proposed design and analysis, confirming their superiority over prior solutions.

Submitted 15 January, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

Comments: 12 pages, 8 figures; to appear in IEEE Transactions on Communications

arXiv:2208.09791 [pdf, other]

Joint Communications and Sensing Employing Optimized MIMO-OFDM Signals

Authors: Kai Wu, J. Andrew Zhang, Zhitong Ni, Xiaojing Huang, Y. Jay Guo, Shanzhi Chen

Abstract: Joint communication and sensing (JCAS) has the potential to improve the overall energy, cost and frequency efficiency of Internet-of-Things (IoT) systems. As a first effort, we propose to optimize the MIMO-OFDM data symbols carried by sub-carriers for better time- and spatial-domain signal orthogonality. This not only boosts the availability of usable signals for JCAS, but also makes it significantly easier for IoT devices to perform high-quality sensing. We establish an optimization problem that modifies data symbols on sub-carriers to enhance the above-mentioned signal orthogonality. We also develop an efficient algorithm to solve the problem based on the majorization-minimization framework. Moreover, we discover unique signal structures and features from the newly modeled problem, which substantially reduce the complexity of majorizing the objective function. We also develop new projectors to enforce the feasibility of the obtained solution. Simulations show that, compared with the original communication waveform, the optimized waveform can reduce the signal-to-noise ratio (SNR) required to achieve the same sensing performance by 3 to 4.5 dB, while the SNR loss for the uncoded bit error rate is only 1 to 1.5 dB.

Submitted 20 August, 2022; originally announced August 2022.

Comments: 15 pages, 7 figures; submitted to an IEEE journal

arXiv:2208.09782 [pdf, other]

Green Joint Communications and Sensing Employing Analog Multi-Beam Antenna Arrays

Authors: Kai Wu, J. Andrew Zhang, Xiaojing Huang, Robert W. Heath Jr., Y. Jay Guo

Abstract: Joint communications and sensing (JCAS) is potentially a hallmark technology for the sixth-generation mobile network (6G). Most existing JCAS designs are based on digital arrays, analog arrays with tunable phase shifters, or hybrid arrays, which are effective but generally complicated to design and power-inefficient. This article introduces energy-efficient, easy-to-design multi-beam antenna arrays (MBAAs) for JCAS. Using pre-designed, fixed analog devices such as lenses or Butler matrices, an MBAA can steer multiple beams simultaneously with negligible power consumption compared with other techniques. Moreover, MBAAs enable flexible beam synthesis, accurate angle-of-arrival estimation, and easy handling and utilization of the beam squint effect. None of these features has yet been well recognized by the JCAS community. To raise awareness of them, we illustrate them intuitively and exploit them to construct a multi-beam JCAS framework. Finally, we discuss the challenges and opportunities to foster the development of green JCAS systems.
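The abstract does not specify the beamforming network's internals. As a hedged illustration, a common textbook idealization models an N-port Butler matrix feeding a uniform linear array as an N-point DFT. Under that idealization, each input port excites one fixed beam and all beams are mutually orthogonal, which is how fixed analog hardware can serve several directions at once. The sketch below assumes half-wavelength element spacing and a lossless network, and all function names are hypothetical. A physical Butler matrix typically offsets its beams by half a beam spacing relative to this DFT model.

```python
import cmath
import math


def dft_beam_weights(n_elems):
    """Idealized Butler matrix as a DFT: weights[k][n] is the fixed complex
    weight applied to antenna element n when beam port k is driven."""
    return [
        [cmath.exp(2j * math.pi * n * k / n_elems) / math.sqrt(n_elems)
         for n in range(n_elems)]
        for k in range(n_elems)
    ]


def beam_gain(weights_k, sin_theta):
    """Power gain |w_k^H a(theta)|^2 of one beam toward sin(theta), assuming a
    uniform linear array with half-wavelength spacing (phase pi * n * sin(theta))."""
    acc = sum(
        w.conjugate() * cmath.exp(1j * math.pi * n * sin_theta)
        for n, w in enumerate(weights_k)
    )
    return abs(acc) ** 2


def beam_direction(k, n_elems):
    """In this model beam k peaks where sin(theta) = 2k/N, wrapped into [-1, 1)."""
    s = 2 * k / n_elems
    return s - 2 if s >= 1 else s


N = 8  # hypothetical array size
W = dft_beam_weights(N)
for k in range(N):
    s = beam_direction(k, N)
    gains = [beam_gain(W[j], s) for j in range(N)]
    print(k, round(math.degrees(math.asin(s)), 1), [round(g, 3) for g in gains])
```

In each printed row, only beam k has a nonzero gain (equal to N) at its own peak direction. This orthogonality is what lets the fixed network form several simultaneous beams without per-beam phase shifters.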

Submitted 1 January, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

Comments: to appear in IEEE Communications Magazine; 7 pages, 5 figures, 1 table

arXiv:2208.05616 [pdf, other]

OpenMedIA: Open-Source Medical Image Analysis Toolbox and Benchmark under Heterogeneous AI Computing Platforms

Authors: Jia-Xin Zhuang, Xiansong Huang, Yang Yang, Jiancong Chen, Yue Yu, Wei Gao, Ge Li, Jie Chen, Tong Zhang

Abstract: In this paper, we present OpenMedIA, an open-source toolbox library containing a rich set of deep learning methods for medical image analysis under heterogeneous Artificial Intelligence (AI) computing platforms. Various medical image analysis methods, including 2D/3D medical image classification, segmentation, localisation, and detection, are included in the toolbox, with PyTorch and/or MindSpore implementations for heterogeneous NVIDIA and Huawei Ascend computing systems. To the best of our knowledge, OpenMedIA is the first open-source algorithm library to provide side-by-side PyTorch and MindSpore implementations, together with comparative results on several benchmark datasets. The source code and models are available at https://git.openi.org.cn/OpenMedIA.

Submitted 7 September, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: 12 pages, 1 figure

Showing 1–50 of 195 results for author: Huang, X