
Super-resolution multi-contrast unbiased eye atlases with deep probabilistic refinement

Authors: Ho Hin Lee, Adam M. Saunders, Michael E. Kim, Samuel W. Remedios, Lucas W. Remedios, Yucheng Tang, Qi Yang, Xin Yu, Shunxing Bao, Chloe Cho, Louise A. Mawn, Tonia S. Rex, Kevin L. Schey, Blake E. Dewey, Jeffrey M. Spraggins, Jerry L. Prince, Yuankai Huo, Bennett A. Landman

Abstract: Purpose: Eye morphology varies significantly across the population, especially for the orbit and optic nerve. These variations limit the feasibility and robustness of generalizing population-wise features of eye organs to an unbiased spatial reference. Approach: To tackle these limitations, we propose a process for creating high-resolution unbiased eye atlases. First, to restore spatial details from scans with a low through-plane resolution compared to a high in-plane resolution, we apply a deep learning-based super-resolution algorithm. Then, we generate an initial unbiased reference with an iterative metric-based registration using a small portion of subject scans. We register the remaining scans to this template and refine the template using an unsupervised deep probabilistic approach that generates a more expansive deformation field to enhance the organ boundary alignment. We demonstrate this framework using magnetic resonance images across four different tissue contrasts, generating four atlases in separate spatial alignments. Results: For each tissue contrast, we find a significant improvement using the Wilcoxon signed-rank test in the average Dice score across four labeled regions compared to a standard registration framework consisting of rigid, affine, and deformable transformations. These results highlight the effective alignment of eye organs and boundaries using our proposed process. Conclusions: By combining super-resolution preprocessing and deep probabilistic models, we address the challenge of generating an eye atlas to serve as a standardized reference across a largely variable population.

Submitted 14 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: Revised for submission to SPIE Journal of Medical Imaging. 26 pages, 6 figures

arXiv:2401.02557 [pdf, other]

Multivariate Functional Clustering with Variable Selection and Application to Sensor Data from Engineering Systems

Authors: Zhongnan Jin, Jie Min, Yili Hong, Pang Du, Qingyu Yang

Abstract: Multi-sensor data that track system operating behaviors are widely available nowadays from various engineering systems. Measurements from each sensor over time form a curve and can be viewed as functional data. Clustering of these multivariate functional curves is important for studying the operating patterns of systems. One complication in such applications is the possible presence of sensors whose data do not contain relevant information. Hence it is desirable for the clustering method to be equipped with an automatic sensor selection procedure. Motivated by a real engineering application, we propose a functional data clustering method that simultaneously removes noninformative sensors and groups functional curves into clusters using informative sensors. Functional principal component analysis is used to transform multivariate functional data into a coefficient matrix for data reduction. We then model the transformed data by a Gaussian mixture distribution to perform model-based clustering with variable selection. Three types of penalties, the individual, variable, and group penalties, are considered to achieve automatic variable selection. Extensive simulations are conducted to assess the clustering and variable selection performance of the proposed methods. The application of the proposed methods to an engineering system with multiple sensors shows the promise of the methods and reveals interesting patterns in the sensor data.

Submitted 4 January, 2024; originally announced January 2024.

Comments: 30 pages, 7 figures

arXiv:2401.01933 [pdf, other]

doi 10.3847/1538-4357/ad2f30

Exploring Changing-look Active Galactic Nuclei with the Sloan Digital Sky Survey V: First Year Results

Authors: Grisha Zeltyn, Benny Trakhtenbrot, Michael Eracleous, Qian Yang, Paul Green, Scott F. Anderson, Stephanie LaMassa, Jessie Runnoe, Roberto J. Assef, Franz E. Bauer, W. N. Brandt, Megan C. Davis, Sara E. Frederick, Logan B. Fries, Matthew J. Graham, Norman A. Grogin, Muryel Guolo, Lorena Hernández-García, Anton M. Koekemoer, Mirko Krumpe, Xin Liu, Mary Loli Martínez-Aldama, Claudio Ricci, Donald P. Schneider, Yue Shen, et al. (10 additional authors not shown)

Abstract: "Changing-look" active galactic nuclei (CL-AGNs) challenge our basic ideas about the physics of accretion flows and circumnuclear gas around supermassive black holes. Using first-year Sloan Digital Sky Survey V (SDSS-V) repeated spectroscopy of nearly 29,000 previously known AGNs, combined with dedicated follow-up spectroscopy, and publicly available optical light curves, we have identified 116 CL-AGNs where (at least) one broad emission line has essentially (dis-)appeared, as well as 88 other extremely variable systems. Our CL-AGN sample, with 107 newly identified cases, is the largest reported to date, and includes $\sim0.4\%$ of the AGNs reobserved in first-year SDSS-V operations. Among our CL-AGNs, 67% exhibit dimming while 33% exhibit brightening. Our sample probes extreme AGN spectral variability on months to decades timescales, including some cases of recurring transitions on surprisingly short timescales ($\lesssim 2$ months in the rest frame). We find that CL events are preferentially found in lower-Eddington-ratio ($f_{Edd}$) systems: Our CL-AGNs have a $f_{Edd}$ distribution that significantly differs from that of a carefully constructed, redshift- and luminosity-matched control sample (Anderson-Darling test yielding $p_{\rm AD}\approx 6\times10^{-5}$; median $f_{Edd}\approx0.025$ vs. $0.043$). This preference for low $f_{Edd}$ strengthens previous findings of higher CL-AGN incidence at lower $f_{Edd}$, found in smaller samples. Finally, we show that the broad MgII emission line in our CL-AGN sample tends to vary significantly less than the broad H$\beta$ emission line. Our large CL-AGN sample demonstrates the advantages and challenges in using multi-epoch spectroscopy from large surveys to study extreme AGN variability and physics.

Submitted 1 May, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: Submitted to ApJ. Full tables and figure-sets will be published upon acceptance, and can be made available upon request.

Journal ref: ApJ 966 85 (2024)

arXiv:2401.01643 [pdf, other]

S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery

Authors: Qingyuan Yang, Guanzhou Chen, Xiaoliang Tan, Tong Wang, Jiaqi Wang, Xiaodong Zhang

Abstract: Stereo matching and semantic segmentation are significant tasks in binocular satellite 3D reconstruction. However, previous studies primarily view these as independent parallel tasks, lacking an integrated multitask learning framework. This work introduces a solution, the Single-branch Semantic Stereo Network (S3Net), which innovatively combines semantic segmentation and stereo matching using Self-Fuse and Mutual-Fuse modules. Unlike preceding methods that utilize semantic or disparity information independently, our method identifies and leverages the intrinsic link between these two tasks, leading to a more accurate understanding of semantic information and disparity estimation. Comparative testing on the US3D dataset proves the effectiveness of our S3Net. Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and reduces the D1-Error and average endpoint error (EPE) in disparity estimation from 10.051 to 9.579 and 1.439 to 1.403 respectively, surpassing existing competitive methods. Our codes are available at: https://github.com/CVEO/S3Net.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.00776 [pdf, other]

Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy

Authors: Qin Yang

Abstract: In recent years, edge computing has served as a paradigm that enables many future technologies like AI, Robotics, IoT, and high-speed wireless sensor networks (like 5G) by connecting cloud computing facilities and services to the end users. Especially in medical and healthcare applications, it provides remote patient monitoring and increases voluminous multimedia. From the robotics angle, robot-assisted therapy (RAT) is an active-assistive robotic technology in rehabilitation robotics, attracting many researchers to study and benefit people with disability like autism spectrum disorder (ASD) children. However, the main challenge of RAT is that the model capable of detecting the affective states of ASD people exists and can recall individual preferences. Moreover, involving expert diagnosis and recommendations to guide robots in updating the therapy approach to adapt to different statuses and scenarios is a crucial part of the ASD therapy process. This paper proposes the architecture of edge cognitive computing by combining human experts and assisted robots collaborating in the same framework to help ASD patients with long-term support. By integrating the real-time computing and analysis of a new cognitive robotic model for ASD therapy, the proposed architecture can achieve a seamless remote diagnosis, round-the-clock symptom monitoring, emergency warning, therapy alteration, and advanced assistance.

Submitted 1 January, 2024; originally announced January 2024.

Comments: This paper was accepted by the 38th AAAI 2024 workshop: "Cooperative Multi-Agent Systems Decision-Making and Learning: From Individual Needs to Swarm Intelligence"

arXiv:2401.00658 [pdf, other]

Point Cloud in the Air

Authors: Yulin Shao, Chenghong Bian, Li Yang, Qianqian Yang, Zhaoyang Zhang, Deniz Gunduz

Abstract: Acquisition and processing of point clouds (PCs) is a crucial enabler for many emerging applications reliant on 3D spatial data, such as robot navigation, autonomous vehicles, and augmented reality. In most scenarios, PCs acquired by remote sensors must be transmitted to an edge server for fusion, segmentation, or inference. Wireless transmission of PCs not only puts an increased burden on the already congested wireless spectrum, but also confronts a unique set of challenges arising from the irregular and unstructured nature of PCs. In this paper, we meticulously delineate these challenges and offer a comprehensive examination of existing solutions while candidly acknowledging their inherent limitations. In response to these intricacies, we proffer four pragmatic solution frameworks, spanning advanced techniques, hybrid schemes, and distributed data aggregation approaches. In doing so, our goal is to chart a path toward efficient, reliable, and low-latency wireless PC transmission.

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.17098 [pdf, ps, other]

Representation functions in the set of natural numbers

Authors: Shi-Qiang Chen, Csaba Sándor, Quan-Hui Yang

Abstract: Let $\mathbb{N}$ be the set of all nonnegative integers. For $S\subseteq \mathbb{N}$ and $n\in \mathbb{N}$, let $R_S(n)$ denote the number of solutions of the equation $n=s+s'$, $s, s'\in S$, $s<s'$. In this paper, we determine the structure of all sets $A$ and $B$ such that $A\cup B=\mathbb{N}\setminus\{r+mk:k\in\mathbb{N}\}$, $A\cap B=\emptyset$ and $R_{A}(n)=R_{B}(n)$ for every positive integer $n$, where $m$ and $r$ are two integers with $m\ge 2$ and $r\ge 0$.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 32 pages

MSC Class: 11B34

arXiv:2312.17044 [pdf, other]

Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding

Authors: Liang Zhao, Xiaocheng Feng, Xiachong Feng, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin, Ting Liu

Abstract: Transformer has taken the field of natural language processing (NLP) by storm since its birth. Further, large language models (LLMs) built upon it have captured worldwide attention due to their superior abilities. Nevertheless, all Transformer-based models including these powerful LLMs suffer from a preset length limit and can hardly generalize from short training sequences to longer inference ones, namely, they cannot perform length extrapolation. Hence, a plethora of methods have been proposed to enhance length extrapolation of Transformer, in which the positional encoding (PE) is recognized as the major factor. In this survey, we present these advances towards length extrapolation in a unified notation from the perspective of PE. Specifically, we first introduce extrapolatable PEs, including absolute and relative PEs. Then, we dive into extrapolation methods based on them, covering position interpolation and randomized position methods. Finally, several challenges and future directions in this area are highlighted. Through this survey, we aim to enable the reader to gain a deep understanding of existing methods and provide stimuli for future research.

Submitted 2 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: Work in progress

arXiv:2312.16554 [pdf, other]

A Theoretical Analysis of Efficiency Constrained Utility-Privacy Bi-Objective Optimization in Federated Learning

Authors: Hanlin Gu, Xinyuan Zhao, Gongxi Zhu, Yuxing Han, Yan Kang, Lixin Fan, Qiang Yang

Abstract: Federated learning (FL) enables multiple clients to collaboratively learn a shared model without sharing their individual data. Concerns about utility, privacy, and training efficiency in FL have garnered significant research attention. Differential privacy has emerged as a prevalent technique in FL, safeguarding the privacy of individual user data while impacting utility and training efficiency. Within Differential Privacy Federated Learning (DPFL), previous studies have primarily focused on the utility-privacy trade-off, neglecting training efficiency, which is crucial for timely completion. Moreover, differential privacy achieves privacy by introducing controlled randomness (noise) on selected clients in each communication round. Previous work has mainly examined the impact of noise level ($\sigma$) and communication rounds ($T$) on the privacy-utility dynamic, overlooking other influential factors like the sample ratio ($q$, the proportion of selected clients). This paper systematically formulates an efficiency-constrained utility-privacy bi-objective optimization problem in DPFL, focusing on $\sigma$, $T$, and $q$. We provide a comprehensive theoretical analysis, yielding analytical solutions for the Pareto front. Extensive empirical experiments verify the validity and efficacy of our analysis, offering valuable guidance for low-cost parameter design in DPFL.

Submitted 29 January, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.16141 [pdf, other]

VirtualPainting: Addressing Sparsity with Virtual Points and Distance-Aware Data Augmentation for 3D Object Detection

Authors: Sudip Dhakal, Dominic Carrillo, Deyuan Qu, Michael Nutt, Qing Yang, Song Fu

Abstract: In recent times, there has been a notable surge in multimodal approaches that decorate raw LiDAR point clouds with camera-derived features to improve object detection performance. However, we found that these methods still grapple with the inherent sparsity of LiDAR point cloud data, primarily because fewer points are enriched with camera-derived features for sparsely distributed objects. We present an innovative approach that involves the generation of virtual LiDAR points using camera images and enhancing these virtual points with semantic labels obtained from image-based segmentation networks to tackle this issue and facilitate the detection of sparsely distributed objects, particularly those that are occluded or distant. Furthermore, we integrate a distance-aware data augmentation (DADA) technique to enhance the model's capability to recognize these sparsely distributed objects by generating specialized training samples. Our approach offers a versatile solution that can be seamlessly integrated into various 3D frameworks and 2D semantic segmentation methods, resulting in significantly improved overall detection accuracy. Evaluation on the KITTI and nuScenes datasets demonstrates substantial enhancements in both 3D and bird's-eye view (BEV) detection benchmarks.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.15880 [pdf, other]

KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph

Authors: Tiezheng Guo, Qingwen Yang, Chen Wang, Yanyi Liu, Pan Li, Jiawei Tang, Dapeng Li, Yingyou Wen

Abstract: Large language model (LLM) has achieved outstanding performance on various downstream tasks with its powerful natural language understanding and zero-shot capability, but LLM still suffers from knowledge limitation. Especially in scenarios that require long logical chains or complex reasoning, the hallucination and knowledge limitation of LLM limit its performance in question answering (QA). In this paper, we propose a novel framework KnowledgeNavigator to address these challenges by efficiently and accurately retrieving external knowledge from knowledge graph and using it as a key factor to enhance LLM reasoning. Specifically, KnowledgeNavigator first mines and enhances the potential constraints of the given question to guide the reasoning. Then it retrieves and filters external knowledge that supports answering through iterative reasoning on knowledge graph with the guidance of LLM and the question. Finally, KnowledgeNavigator constructs the structured knowledge into effective prompts that are friendly to LLM to help its reasoning. We evaluate KnowledgeNavigator on multiple public KGQA benchmarks; the experiments show the framework has great effectiveness and generalization, outperforming previous knowledge graph enhanced LLM methods and being comparable to the fully supervised models.

Submitted 19 January, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.15762 [pdf, other]

On Robust Wasserstein Barycenter: The Model and Algorithm

Authors: Xu Wang, Jiawei Huang, Qingyuan Yang, Jinpeng Zhang

Abstract: The Wasserstein barycenter problem is to compute the average of $m$ given probability measures, which has been widely studied in many different areas; however, real-world data sets are often noisy and huge, which impedes its applications in practice. Hence, in this paper, we focus on improving the computational efficiency of two types of robust Wasserstein barycenter problem (RWB): fixed-support RWB (fixed-RWB) and free-support RWB (free-RWB); actually, the former is a subroutine of the latter. Firstly, we improve efficiency through model reducing; we reduce RWB as an augmented Wasserstein barycenter problem, which works for both fixed-RWB and free-RWB. Especially, fixed-RWB can be computed within $\widetilde{O}(\frac{mn^2}{\epsilon_+})$ time by using an off-the-shelf solver, where $\epsilon_+$ is the pre-specified additive error and $n$ is the size of locations of input measures. Then, for free-RWB, we leverage a quality guaranteed data compression technique, coreset, to accelerate computation by reducing the data set size $m$. It shows that running algorithms on the coreset is enough instead of on the original data set. Next, by combining the model reducing and coreset techniques above, we propose an algorithm for free-RWB by updating the weights and locations alternatively. Finally, our experiments demonstrate the efficiency of our techniques.

Submitted 25 December, 2023; originally announced December 2023.

Comments: Algorithms for accelerating robust Wasserstein barycenter problem

arXiv:2312.15584 [pdf]

doi 10.1103/PhysRevB.109.174412

Controllable magnon frequency comb in synthetic ferrimagnets

Authors: Y. Liu, T. T. Liu, Q. Q. Yang, G. Tian, Z. P. Hou, D. Y. Chen, Z. Fan, M. Zeng, X. B. Lu, X. S. Gao, M. H. Qin, J. M. Liu

Abstract: Magnon frequency comb provides opportunities for exploring magnon nonlinear effects and measuring the transmission magnon frequency in magnets, whose controllability becomes vital for modulating the operating frequency and improving the measurement accuracy. Nevertheless, such controllable frequency comb remains to be explored. In this work, we investigate theoretically and numerically the skyrmion-induced magnon frequency comb effect generated by interaction between the magnon excitation mode and skyrmion breathing mode in synthetic ferrimagnets. It is revealed that both the skyrmion breathing mode and the magnon frequency gap closely depend on the net angular momentum δs, emphasizing the pivotal role of δs as an effective control parameter in governing the comb teeth. With the increase of δs, the skyrmion size decreases, which results in the enlargement of the breathing frequency and the distance between the comb teeth. Moreover, the dependences of the magnon frequency gap on δs and the inter-layer coupling allow one to modulate the comb lowest coherent frequency via structural control. Consequently, the coherent modes generated by the comb may range from gigahertz to terahertz frequencies, serving as a bridge between microwave and terahertz waves. Thus, this work represents a substantial advance in understanding the magnon frequency comb effect in ferrimagnets.

Submitted 11 March, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Comments: 27 pages, 8 figures

Journal ref: Physical Review B 109, 174412 (2024)

arXiv:2312.11583 [pdf, other]

AI-Based Energy Transportation Safety: Pipeline Radial Threat Estimation Using Intelligent Sensing System

Authors: Chengyuan Zhu, Yiyuan Yang, Kaixiang Yang, Haifeng Zhang, Qinmin Yang, C. L. Philip Chen

Abstract: The application of artificial intelligence technology has greatly enhanced and fortified the safety of energy pipelines, particularly in safeguarding against external threats. The predominant methods involve the integration of intelligent sensors to detect external vibration, enabling the identification of event types and locations, thereby replacing manual detection methods. However, practical implementation has exposed a limitation in current methods - their constrained ability to accurately discern the spatial dimensions of external signals, which complicates the authentication of threat events. Our research endeavors to overcome the above issues by harnessing deep learning techniques to achieve a more fine-grained recognition and localization process. This refinement is crucial in effectively identifying genuine threats to pipelines, thus enhancing the safety of energy transportation. This paper proposes a radial threat estimation method for energy pipelines based on distributed optical fiber sensing technology. Specifically, we introduce a continuous multi-view and multi-domain feature fusion methodology to extract comprehensive signal features and construct a threat estimation and recognition network. The utilization of collected acoustic signal data is optimized, and the underlying principle is elucidated. Moreover, we incorporate the concept of transfer learning through a pre-trained model, enhancing both recognition accuracy and training efficiency. Empirical evidence gathered from real-world scenarios underscores the efficacy of our method, notably in its substantial reduction of false alarms and remarkable gains in recognition accuracy. More generally, our method exhibits versatility and can be extrapolated to a broader spectrum of recognition tasks and scenarios.

Submitted 25 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

arXiv:2312.11111 [pdf, other]

The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Authors: Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

Abstract: Emotion significantly impacts our daily behaviors and interactions. While recent generative AI models, such as large language models, have shown impressive performance in various tasks, it remains unclear whether they truly comprehend emotions. This paper aims to address this gap by incorporating psychological theories to gain a holistic understanding of emotions in generative AI models. Specifically, we propose three approaches: 1) EmotionPrompt to enhance AI model performance, 2) EmotionAttack to impair AI model performance, and 3) EmotionDecode to explain the effects of emotional stimuli, both benign and malignant. Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it. Additionally, EmotionDecode reveals that AI models can comprehend emotional stimuli akin to the mechanism of dopamine in the human brain. Our work heralds a novel avenue for exploring psychology to enhance our understanding of generative AI models.

Submitted 7 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: International Conference on Machine Learning (ICML) 2024; an extension to EmotionPrompt (arXiv:2307.11760)

arXiv:2312.11102 [pdf, ps, other]

Holographic Imaging with XL-MIMO and RIS: Illumination and Reflection Design

Authors: Giulia Torcolacci, Anna Guerra, Haiyang Zhang, Francesco Guidi, Qianyu Yang, Yonina C. Eldar, Davide Dardari

Abstract: This paper addresses a near-field imaging problem utilizing extremely large-scale multiple-input multiple-output (XL-MIMO) antennas and reconfigurable intelligent surfaces (RISs) already in place for wireless communications. To this end, we consider a system with a fixed transmitting antenna array illuminating a region of interest (ROI) and a fixed receiving antenna array inferring the ROI's scattering coefficients. Leveraging XL-MIMO and high frequencies, the ROI is situated in the radiative near-field region of both antenna arrays, thus enhancing the degrees of freedom (DoF) (i.e., the channel matrix rank) of the illuminating and sensing channels available for imaging, here referred to as holographic imaging. To further boost the imaging performance, we optimize the illuminating waveform by solving a min-max optimization problem having the upper bound of the mean squared error (MSE) of the image estimate as the objective function. Additionally, we address the challenge of non-line-of-sight (NLOS) scenarios by considering the presence of a RIS and deriving its optimal reflection coefficients. Numerical results investigate the interplay between illumination optimization, geometric configuration (monostatic and bistatic), the DoF of the illuminating and sensing channels, image estimation accuracy, and image complexity.

Submitted 13 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10640 [pdf, other]

Inverse design of coherent supercontinuum generation using free-form nanophotonic waveguides

Authors: Chia-Yi Lee, Yanwu Liu, Yinke Cheng, Cheng-Hao Lao, Qihuang Gong, Qi-Fan Yang

Abstract: Many key functionalities of optical frequency combs such as self-referencing and broad spectral access rely on coherent supercontinuum generation (SCG). While nanophotonic waveguides have emerged as a compact and power-efficient platform for SCG, their geometric degrees of freedom have not been fully utilized due to the underlying nonlinear and stochastic physics. Here, we introduce inverse design to unlock free-form waveguides for coherent SCG. The efficacy of our design is numerically and experimentally demonstrated on Si3N4 waveguides, producing flat and coherent spectra from visible to mid-infrared wavelengths. Our work has direct applications in developing chip-based broadband light sources for spectroscopy, metrology, and sensing across multiple spectral regimes.

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2312.10472 [pdf, other]

Analyzing Generalization in Policy Networks: A Case Study with the Double-Integrator System

Authors: Ruining Zhang, Haoran Han, Maolong Lv, Qisong Yang, Jian Cheng

Abstract: Extensive utilization of deep reinforcement learning (DRL) policy networks in diverse continuous control tasks has raised questions regarding performance degradation in expansive state spaces where the input state norm is larger than that in the training environment. This paper aims to uncover the underlying factors contributing to such performance deterioration when dealing with expanded state spaces, using a novel analysis technique known as state division. In contrast to prior approaches that employ state division merely as a post-hoc explanatory tool, our methodology delves into the intrinsic characteristics of DRL policy networks. Specifically, we demonstrate that the expansion of state space induces the activation function $\tanh$ to exhibit saturability, resulting in the transformation of the state division boundary from nonlinear to linear. Our analysis centers on the paradigm of the double-integrator system, revealing that this gradual shift towards linearity imparts a control behavior reminiscent of bang-bang control. However, the inherent linearity of the division boundary prevents the attainment of an ideal bang-bang control, thereby introducing unavoidable overshooting. Our experimental investigations, employing diverse RL algorithms, establish that this performance phenomenon stems from inherent attributes of the DRL policy network, remaining consistent across various optimization algorithms.

Submitted 31 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.10305 [pdf, other]

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Authors: Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

Abstract: Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the reference speech, which are irrelevant to speaker identity, can lead to speaker confusion within the speech extraction network. To overcome this challenge, we propose a self-supervised disentangled representation learning method. Our approach tackles this issue through a two-phase process, utilizing a reference speech encoding network and a global information disentanglement network to gradually disentangle the speaker identity information from other irrelevant factors. We exclusively employ the disentangled speaker identity information to guide the speech extraction network. Moreover, we introduce the adaptive modulation Transformer to ensure that the acoustic representation of the mixed signal remains undisturbed by the speaker embeddings. This component incorporates speaker embeddings as conditional information, facilitating natural and efficient guidance for the speech extraction network. Experimental results substantiate the effectiveness of our meticulously crafted approach, showcasing a substantial reduction in the likelihood of speaker confusion.

Submitted 19 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

arXiv:2312.10031 [pdf, other]

Secular Dynamics of Compact Three-Planet Systems

Authors: Qing Yang, Daniel Tamayo

Abstract: The secular Laplace-Lagrange orbital solution, decomposing eccentricities into a set of uniformly precessing eigenmodes, is a classical result that is typically solved numerically. However, in the limit where orbits are closely spaced, several simplifications make it possible to make analytical progress. We derive simple expressions for the eccentricity eigenmodes in a co-planar 3-planet system where the middle planet is massless, and show that these approximate the true eigenmodes of more general systems with 3 massive planets in various limits. These results provide intuition for the secular dynamics of real systems, and have applications for understanding the stability boundary for compact multi-planet systems.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 10 pages, submitted to ApJ

arXiv:2312.09007 [pdf, other]

LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution

Authors: Hongwei Cui, Yuyang Du, Qun Yang, Yulin Shao, Soung Chang Liew

Abstract: The exploration of large language models (LLMs) for task planning and IoT automation has recently gained significant attention. However, existing works suffer from limitations in terms of resource accessibility, complex task planning, and efficiency. In this paper, we present LLMind, an LLM-based AI agent framework that enables effective collaboration among IoT devices for executing complex tasks. Inspired by the functional specialization theory of the brain, our framework integrates an LLM with domain-specific AI modules, enhancing its capabilities. Complex tasks, which may involve collaborations of multiple domain-specific AI modules and IoT devices, are executed through a control script generated by the LLM using a Language-Code transformation approach, which first converts language descriptions to an intermediate finite-state machine (FSM) before final precise transformation to code. Furthermore, the framework incorporates a novel experience accumulation mechanism to enhance response speed and effectiveness, allowing the framework to evolve and become progressively sophisticated through continuing user and machine interactions.

Submitted 20 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.06723 [pdf, other]

Learning to See Low-Light Images via Feature Domain Adaptation

Authors: Qirui Yang, Qihua Cheng, Huanjing Yue, Le Zhang, Yihao Liu, Jingyu Yang

Abstract: Raw low light image enhancement (LLIE) has achieved much better performance than the sRGB domain enhancement methods due to the merits of raw data. However, the ambiguity between noisy to clean and raw to sRGB mappings may mislead the single-stage enhancement networks. The two-stage networks avoid ambiguity by decoupling the two mappings but usually have large computing complexity. To solve this problem, we propose a single-stage network empowered by Feature Domain Adaptation (FDA) to decouple the denoising and color mapping tasks in raw LLIE. The denoising encoder is supervised by the clean raw image, and then the denoised features are adapted for the color mapping task by an FDA module. We propose a Lineformer to serve as the FDA, which can well explore the global and local correlations with fewer line buffers (friendly to the line-based imaging process). During inference, the raw supervision branch is removed. In this way, our network combines the advantage of a two-stage enhancement process with the efficiency of single-stage inference. Experiments on four benchmark datasets demonstrate that our method achieves state-of-the-art performance with fewer computing costs (60% FLOPs of the two-stage method DNF). Our codes will be released after the acceptance of this work.

Submitted 19 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.06614 [pdf, other]

AttenScribble: Attentive Similarity Learning for Scribble-Supervised Medical Image Segmentation

Authors: Mu Tian, Qinzhu Yang, Yi Gao

Abstract: The success of deep networks in medical image segmentation relies heavily on massive labeled training data. However, acquiring dense annotations is a time-consuming process. Weakly-supervised methods normally employ less expensive forms of supervision, among which scribbles have started to gain popularity lately thanks to their flexibility. However, due to the lack of shape and boundary information, it is extremely challenging to train a deep network on scribbles that generalizes to unlabeled pixels. In this paper, we present a straightforward yet effective scribble-supervised learning framework. Inspired by recent advances in transformer-based segmentation, we create a pluggable spatial self-attention module which can be attached on top of any internal feature layer of an arbitrary fully convolutional network (FCN) backbone. The module infuses global interaction while keeping the efficiency of convolutions. Building on this module, we construct a similarity metric based on normalized and symmetrized attention. This attentive similarity leads to a novel regularization loss that imposes consistency between segmentation prediction and visual affinity. This attentive similarity loss optimizes the alignment of FCN encoders, attention mapping and model prediction. Ultimately, the proposed FCN+Attention architecture can be trained end-to-end guided by a combination of three learning objectives: partial segmentation loss, a customized masked conditional random field, and the proposed attentive similarity loss. Extensive experiments on public datasets (ACDC and CHAOS) showed that our framework not only outperforms the existing state of the art but also delivers performance close to the fully supervised benchmark. Code will be available upon publication.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 11 pages, 3 figures, a modified version was submitted to Computerized Medical Imaging and Graphics and is under review

arXiv:2312.06462 [pdf, other]

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral relatiOns. For the first time, our framework explores three types of bilateral entanglements within AVS: pixel entanglement, modality entanglement, and temporal entanglement. Regarding pixel entanglement, we employ a Siam-Encoder Module (SEM) that leverages prior knowledge to generate more precise visual features from the foundational model. For modality entanglement, we design a Bilateral-Fusion Module (BFM), enabling COMBO to align corresponding visual and auditory signals bi-directionally. As for temporal entanglement, we introduce an innovative adaptive inter-frame consistency loss according to the inherent rules of temporal. Comprehensive experiments and ablation studies on AVSBench-object (84.7 mIoU on S4, 59.2 mIoU on MS3) and AVSBench-semantic (42.1 mIoU on AVSS) datasets demonstrate that COMBO surpasses previous state-of-the-art methods. Code and more results will be publicly available at https://yannqi.github.io/AVS-COMBO/.

Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: CVPR 2024 Highlight. 13 pages, 10 figures

arXiv:2312.06097 [pdf, other]

doi 10.1007/s11433-023-2219-7

The FAST all sky HI survey (FASHI): The first release of catalog

Authors: Chuan-Peng Zhang, M. Zhu, P. Jiang, C. Cheng, J. Wang, J. Wang, J. -L. Xu, X. -L. Liu, N. -P. Yu, L. Qian, H. Yu, M. Ai, Y. Jing, C. Xu, Z. Liu, X. Guan, C. Sun, Q. Yang, M. Huang, Q. Hao, FAST Collaboration

Abstract: The FAST All Sky HI survey (FASHI) was designed to cover the entire sky observable by the Five-hundred-meter Aperture Spherical radio Telescope (FAST), spanning approximately 22000 square degrees of declination between -14 deg and +66 deg, and in the frequency range of 1050-1450 MHz, with the expectation of eventually detecting more than 100000 HI sources. Between August 2020 and June 2023, FASHI had covered more than 7600 square degrees, which is approximately 35% of the total sky observable by FAST. It has a median detection sensitivity of around 0.76 mJy/beam and a spectral line velocity resolution of ~6.4 km/s at a frequency of ~1.4 GHz. As of now, a total of 41741 extragalactic HI sources have been detected in the frequency range 1305.5-1419.5 MHz, corresponding to a redshift limit of z<0.09. By cross-matching FASHI sources with the Siena Galaxy Atlas (SGA) and the Sloan Digital Sky Survey (SDSS) catalogs, we found that 16972 (40.7%) sources have spectroscopic redshifts and 10975 (26.3%) sources have only photometric redshifts. Most of the remaining 13794 (33.0%) HI sources are located in the direction of the Galactic plane, making their optical counterparts difficult to identify due to high extinction or high contamination of Galactic stellar sources. Based on current survey results, the FASHI survey is an unprecedented blind extragalactic HI survey. It has higher spectral and spatial resolution and broader coverage than the Arecibo Legacy Fast ALFA Survey (ALFALFA). When completed, FASHI will provide the largest extragalactic HI catalog and an objective view of HI content and large-scale structure in the local universe.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 22 pages, 12 figures, published in SCPMA. All catalogs are available at https://zcp521.github.io/fashi and https://fast.bao.ac.cn/cms/article/271/

Journal ref: Sci. China-Phys. Mech. Astron. 67, 219511 (2024)

arXiv:2312.05886 [pdf, ps, other]

Edge-connectivity keeping trees in $k$-edge-connected graphs

Authors: Qing Yang, Yingzhi Tian

Abstract: Mader [J. Combin. Theory Ser. B 40 (1986) 152-158] proved that every $k$-edge-connected graph $G$ with minimum degree at least $k+1$ contains a vertex $u$ such that $G-\{u\}$ is still $k$-edge-connected. In this paper, we prove that every $k$-edge-connected graph $G$ with minimum degree at least $k+2$ contains an edge $uv$ such that $G-\{u,v\}$ is $k$-edge-connected for any positive integer $k$. In addition, we show that for any tree $T$ of order $m$, every $k$-edge-connected graph $G$ with minimum degree greater than $4(k+m)^2$ contains a subtree $T'$ isomorphic to $T$ such that $G-V(T')$ is $k$-edge-connected.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.05062 [pdf, ps, other]

Deep Learning Enabled Semantic Communication Systems for Video Transmission

Authors: Zhenguo Zhang, Qianqian Yang, Shibo He, Jiming Chen

Abstract: Semantic communication has emerged as a promising approach for improving efficient transmission in the next generation of wireless networks. Inspired by the success of semantic communication in different areas, we aim to provide a new semantic communication scheme from the semantic level. In this paper, we propose a novel DL-based semantic communication system for video transmission, which compacts semantic-related information to improve transmission efficiency. In particular, we utilize the Bi-optical flow to estimate residual information of inter-frame details. We also propose a feature choice module and a feature fusion module to drop semantically redundant features while paying more attention to the important semantic-related content. We employ a frame prediction module to reconstruct semantic features of the prediction frame from the received signal at the receiver. To enhance the system's robustness, we propose a noise attention module that assigns different importance weights to the extracted features. Simulation results indicate that our proposed method outperforms existing approaches in terms of transmission efficiency, achieving about 33.3\% reduction in the number of transmitted symbols while improving the peak signal-to-noise ratio (PSNR) performance by an average of 0.56 dB.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.05024 [pdf, other]

A Unified Framework for Unsupervised Domain Adaptation based on Instance Weighting

Authors: Jinjing Zhu, Feiyang Ye, Qiao Xiao, Pengxin Guo, Yu Zhang, Qiang Yang

Abstract: Despite the progress made in domain adaptation, solving Unsupervised Domain Adaptation (UDA) problems with a general method under complex conditions caused by label shifts between domains remains a formidable task. In this work, we comprehensively investigate four distinct UDA settings including closed set domain adaptation, partial domain adaptation, open set domain adaptation, and universal domain adaptation, where shared common classes between source and target domains coexist alongside domain-specific private classes. The prominent challenges inherent in diverse UDA settings center around the discrimination of common/private classes and the precise measurement of domain discrepancy. To surmount these challenges effectively, we propose a novel yet effective method called Learning Instance Weighting for Unsupervised Domain Adaptation (LIWUDA), which caters to various UDA settings. Specifically, the proposed LIWUDA method constructs a weight network to assign weights to each instance based on its probability of belonging to common classes, and designs Weighted Optimal Transport (WOT) for domain alignment by leveraging instance weights. Additionally, the proposed LIWUDA method devises a Separate and Align (SA) loss to separate instances with low similarities and align instances with high similarities. To guide the learning of the weight network, Intra-domain Optimal Transport (IOT) is proposed to enforce the weights of instances in common classes to follow a uniform distribution. Through the integration of those three components, the proposed LIWUDA method demonstrates its capability to address all four UDA settings in a unified manner. Experimental evaluations conducted on three benchmark datasets substantiate the effectiveness of the proposed LIWUDA method.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.04822 [pdf, other]

SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles

Authors: Deyuan Qu, Qi Chen, Tianyu Bai, Andy Qin, Hongsheng Lu, Heng Fan, Song Fu, Qing Yang

Abstract: Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles. However, the absence of feature maps shared from other vehicles can lead to a significant decline in object detection performance for cooperative perception models compared to standalone 3D detection models. This drawback impedes the adoption of cooperative perception as vehicle resources are often insufficient to concurrently employ two perception models. To tackle this issue, we present Simultaneous Individual and Cooperative Perception (SiCP), a generic framework that supports a wide range of the state-of-the-art standalone perception backbones and enhances them with a novel Dual-Perception Network (DP-Net) designed to facilitate both individual and cooperative perception. In addition to its lightweight nature with only 0.13M parameters, DP-Net is robust and retains crucial gradient information during feature map fusion. As demonstrated in a comprehensive evaluation on the OPV2V dataset, thanks to DP-Net, SiCP surpasses state-of-the-art cooperative perception solutions while preserving the performance of standalone perception solutions.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04366 [pdf, other]

Chirality induced spin selectivity in chiral crystals

Authors: Qun Yang, Yongkang Li, Claudia Felser, Binghai Yan

Abstract: Chirality is a fundamental property of great importance in physics, chemistry, and biology, and has recently been found to generate unexpected spin polarization for electrons passing through organic molecules, known as chirality-induced spin selectivity (CISS). CISS shows promising application potential in spintronic devices, spin-controlled chemistry, and enantiomer separation. It focuses on organic molecules that exhibit poor electronic conductivity and inherent complexities, such as the debated role of spin-orbit coupling (SOC) at the molecule-metal interface. In this work, we go beyond organic molecules and study chiral solids with excellent electrical conductivity, intrinsic SOC, and topological electronic structures. We demonstrate that electrons exhibit both spin and orbital polarization as they pass through chiral crystals. Both polarizations increase with material thickness before saturating to the bulk values. The spin polarization is proportional to intrinsic SOC while the orbital polarization is insensitive to SOC. The large spin polarization comes with strong electrical magnetochiral anisotropy in the magneto-transport of these chiral crystals (e.g., RhSi). Our work reveals the interplay of chirality, electron spin, and orbital in chiral crystals, paving the way for developing chiral solids for chirality-induced phenomena.

Submitted 23 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 23 pages, 5 figures

arXiv:2312.00360 [pdf, other]

Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning

Authors: Shaohua Dong, Yunhe Feng, Qing Yang, Yan Huang, Dongfang Liu, Heng Fan

Abstract: Multimodal (e.g., RGB-Depth/RGB-Thermal) fusion has shown great potential for improving semantic segmentation in complex scenes (e.g., indoor/low-light conditions). Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy for achieving multimodal semantic segmentation, which is training-costly due to the massive parameter updates in feature extraction and fusion. To address this issue, we propose a surprisingly simple yet effective dual-prompt learning network (dubbed DPLNet) for training-efficient multimodal (e.g., RGB-D/T) semantic segmentation. The core of DPLNet is to directly adapt a frozen pre-trained RGB model to multimodal semantic segmentation, reducing parameter updates. For this purpose, we present two prompt learning modules, comprising multimodal prompt generator (MPG) and multimodal feature adapter (MFA). MPG works to fuse the features from different modalities in a compact manner and is inserted from shallow to deep stages to generate the multi-level multimodal prompts that are injected into the frozen backbone, while MFA adapts prompted multimodal features in the frozen backbone for better multimodal semantic segmentation. Since both the MPG and MFA are lightweight, only a few trainable parameters (3.88M, 4.4% of the pre-trained backbone parameters) are introduced for multimodal feature fusion and learning. Using a simple decoder (3.27M parameters), DPLNet achieves new state-of-the-art performance or is on a par with other complex approaches on four RGB-D/T semantic segmentation datasets while satisfying parameter efficiency. Moreover, we show that DPLNet is general and applicable to other multimodal tasks such as salient object detection and video semantic segmentation. Without special design, DPLNet outperforms many complicated models. Our code will be available at github.com/ShaohuaDong2021/DPLNet.

Submitted 3 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: 11 pages, 4 figures, 9 tables

arXiv:2311.17431 [pdf, other]

Grounding Foundation Models through Federated Transfer Learning: A General Framework

Authors: Yan Kang, Tao Fan, Hanlin Gu, Xiaojin Zhang, Lixin Fan, Qiang Yang

Abstract: Foundation Models (FMs) such as GPT-4 encoded with vast knowledge and powerful emergent abilities have achieved remarkable success in various natural language processing and computer vision tasks. Grounding FMs by adapting them to domain-specific tasks or augmenting them with domain-specific knowledge enables us to exploit the full potential of FMs. However, grounding FMs faces several challenges, stemming primarily from constrained computing resources, data privacy, model heterogeneity, and model ownership. Federated Transfer Learning (FTL), the combination of federated learning and transfer learning, provides promising solutions to address these challenges. In recent years, the need for grounding FMs leveraging FTL, coined FTL-FM, has arisen strongly in both academia and industry. Motivated by the strong growth in FTL-FM research and the potential impact of FTL-FM on industrial applications, we propose an FTL-FM framework that formulates problems of grounding FMs in the federated learning setting, construct a detailed taxonomy based on the FTL-FM framework to categorize state-of-the-art FTL-FM works, and comprehensively overview FTL-FM works based on the proposed taxonomy. We also establish correspondences between FTL-FM and conventional phases of adapting FM so that FM practitioners can align their research works with FTL-FM. In addition, we overview advanced efficiency-improving and privacy-preserving techniques because efficiency and privacy are critical concerns in FTL-FM. Last, we discuss opportunities and future research directions of FTL-FM.

Submitted 29 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: In progress

arXiv:2311.17401 [pdf, ps, other]

Gene-MOE: A sparsely gated prognosis and classification framework exploiting pan-cancer genomic information

Authors: Xiangyu Meng, Xue Li, Qing Yang, Huanhuan Dai, Lian Qiao, Hongzhen Ding, Long Hao, Xun Wang

Abstract: Benefiting from the advancements in deep learning, various genomic analytical techniques, such as survival analysis, classification of tumors and their subtypes, and exploration of specific pathways, have significantly enhanced our understanding of the biological mechanisms driving cancer. However, the overfitting issue, arising from the limited number of patient samples, poses a challenge in improving the accuracy of genome analysis by deepening the neural network. Furthermore, it remains uncertain whether novel approaches such as the sparsely gated mixture of experts (MOE) and self-attention mechanisms can improve the accuracy of genomic analysis. In this paper, we introduce a novel sparsely gated RNA-seq analysis framework called Gene-MOE. This framework exploits the potential of the MOE layers and the proposed mixture of attention expert (MOAE) layers to enhance the analysis accuracy. Additionally, it addresses overfitting challenges by integrating pan-cancer information from 33 distinct cancer types through pre-training. We pre-trained Gene-MOE on the TCGA pan-cancer RNA-seq dataset with 33 cancer types. Subsequently, we conducted experiments involving cancer classification and survival analysis based on the pre-trained Gene-MOE. According to the survival analysis results on 14 cancer types, Gene-MOE outperformed state-of-the-art models on 12 cancer types. Through detailed feature analysis, we found that the Gene-MOE model could learn rich feature representations of high-dimensional genes. According to the classification results, the total accuracy of the classification model for 33 cancer classifications reached 95.8%, representing the best performance compared to state-of-the-art models. These results indicate that Gene-MOE holds strong potential for use in cancer classification and survival analysis.

Submitted 18 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.16938 [pdf, other]

New insights into the doubly charmed exotic mesons

Authors: Di Guo, Qin-He Yang, Ling-Yun Dai, A. P. Szczepaniak

Abstract: Using effective Lagrangians constrained by the heavy quark spin symmetry and chiral symmetry, for the light quarks, we analyze the $D^0 D^0πぱい^+$, $\bar{D}^0D^0πぱい^0$ and $D^0\bar{D}^{*0}$ invariant mass spectra. Performing a simultaneous analysis of the doubly charmed and charm-anti-charm states gives further insights into the nature of the $T^+_{cc}$ and $χかい^0_{c1}(3872)$, exotic hadrons. We find that both states lie below the respective $DD^*$/$D\bar{D}^*$ thresholds. Also, the contributions of the triangle and box diagrams are negligible.

Submitted 10 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 5 pages, 5 figures

arXiv:2311.16919 [pdf, other]

Contribution of coherent electron production to measurements of heavy-flavor decayed electrons in heavy-ion collisions

Authors: Shenghui Zhang, Rongrong Ma, Yuanjing Ji, Zebo Tang, Qian Yang, Yifei Zhang, Wangwei Zha

Abstract: Heavy quarks, produced at early stages of heavy-ion collisions, are an excellent probe of the Quark-Gluon Plasma (QGP) also created in these collisions. Electrons from open heavy-flavor hadron decays (HFE) are good proxies for heavy quarks, and have been measured extensively in the last two decades to study QGP properties. These measurements are traditionally carried out by subtracting all known background sources from the inclusive electron sample. More recently, a significant enhancement of $e^+e^-$ pair production at very low transverse momenta was observed in peripheral heavy-ion collisions. The production characteristics are consistent with coherent photon-photon interactions, which should also constitute a background source to the HFE measurements. In this article, we provide theoretical predictions for the contribution of coherent electron production to HFE as a function of transverse momentum, centrality and collision energy in Au+Au and Pb+Pb collisions.

Submitted 15 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.16167 [pdf, other]

Moving Sampling Physics-informed Neural Networks induced by Moving Mesh PDE

Authors: Yu Yang, Qihong Yang, Yangtao Deng, Qiaolin He

Abstract: In this work, we propose an end-to-end adaptive sampling neural network (MMPDE-Net) based on the moving mesh method, which can adaptively generate new sampling points by solving the moving mesh PDE. This model focuses on improving the quality of sampling point generation. Moreover, we develop an iterative algorithm based on MMPDE-Net, which makes the sampling points more precise and controllable. Since MMPDE-Net is a framework independent of the deep learning solver, we combine it with physics-informed neural networks (PINN) to propose moving sampling PINN (MS-PINN) and demonstrate its effectiveness by error analysis under some assumptions. Finally, we demonstrate the performance improvement of MS-PINN compared to PINN through numerical experiments on four typical examples, which numerically verify the effectiveness of our method.

Submitted 9 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.15451 [pdf, other]

Uncertainty-aware Language Modeling for Selective Question Answering

Authors: Qi Yang, Shreya Ravikumar, Fynn Schmitt-Ulms, Satvik Lolla, Ege Demir, Iaroslav Elistratov, Alex Lavaee, Sadhana Lolla, Elaheh Ahmadi, Daniela Rus, Alexander Amini, Alejandro Perez

Abstract: We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs capable of estimating uncertainty with every prediction. Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems. We evaluate converted models on the selective question answering setting -- to answer as many questions as possible while maintaining a given accuracy, forgoing providing predictions when necessary. As part of our results, we test BERT and Llama 2 model variants on the SQuAD extractive QA task and the TruthfulQA generative QA task. We show that using the uncertainty estimates provided by our approach to selectively answer questions leads to significantly higher accuracy over directly using model probabilities.

Submitted 26 November, 2023; originally announced November 2023.
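
Illustrative note on the selective question answering setting described in the abstract above: the goal is to answer as many questions as possible while keeping accuracy at or above a target, abstaining otherwise. The sketch below is not the authors' method; it only shows how such an operating point could be chosen from held-out data, assuming each prediction comes with a scalar confidence score (higher means more certain). The function name `select_threshold` and the toy data are hypothetical.

```python
def select_threshold(confidences, correct, target_accuracy):
    """Pick the lowest confidence threshold whose answered set still meets
    the target accuracy. Returns (threshold, coverage); threshold is None
    if no non-empty answered set reaches the target.

    Assumption: predictions with confidence >= threshold are answered,
    all others are abstained on.
    """
    # Sort predictions from most to least confident.
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    best_threshold, best_count = None, 0
    n_correct = 0
    for k, (conf, ok) in enumerate(pairs, start=1):
        n_correct += int(ok)
        # Within a group of tied confidences, evaluate only at the last item,
        # because a threshold cannot separate tied predictions.
        if k < len(pairs) and pairs[k][0] == conf:
            continue
        if n_correct / k >= target_accuracy:
            best_threshold, best_count = conf, k
    coverage = best_count / len(pairs) if pairs else 0.0
    return best_threshold, coverage


# Toy held-out set (hypothetical values).
confidences = [0.9, 0.8, 0.7, 0.6, 0.5]
correct = [True, True, False, True, False]
threshold, coverage = select_threshold(confidences, correct, target_accuracy=0.75)
```

On this toy data, answering every question with confidence of at least 0.6 covers four of five questions at 75% accuracy, whereas answering all five would drop accuracy to 60%.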

arXiv:2311.13381 [pdf, other]

Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training

Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Jiming Chen

Abstract: Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices like smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device's memory. A pipeline parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler to allocate different attention heads to heterogeneous compute hardware, including mobile CPUs and GPUs, to maximize the compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves up to 45.3% memory reduction and 8.03x inference speedup in practical settings.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 6 pages, 7 figures; Submitted to HotMobile 2024

arXiv:2311.13217 [pdf, other]

Controllable orbital angular momentum monopoles in chiral topological semimetals

Authors: Yun Yen, Jonas A. Krieger, Mengyu Yao, Iñigo Robredo, Kaustuv Manna, Qun Yang, Emily C. McFarlane, Chandra Shekhar, Horst Borrmann, Samuel Stolz, Roland Widmer, Oliver Gröning, Vladimir N. Strocov, Stuart S. P. Parkin, Claudia Felser, Maia G. Vergniory, Michael Schüler, Niels B. M. Schröter

Abstract: The emerging field of orbitronics aims at generating and controlling currents of electronic orbital angular momentum (OAM) for information processing. Structurally chiral topological crystals could be particularly suitable orbitronic materials because they have been predicted to host topological band degeneracies in reciprocal space that are monopoles of OAM. Around such a monopole, the OAM is locked isotropically parallel or antiparallel to the direction of the electron's momentum, which could be used to generate large and controllable OAM currents. However, OAM monopoles have not yet been directly observed in chiral crystals, and no handle to control their polarity has been discovered. Here, we use circular dichroism in angle-resolved photoelectron spectroscopy (CD-ARPES) to image OAM monopoles in the chiral topological semimetals PtGa and PdGa. Moreover, we also demonstrate that the polarity of the monopole can be controlled via the structural handedness of the host crystal by imaging OAM monopoles and anti-monopoles in the two enantiomers of PdGa, respectively. For most photon energies used in our study, we observe a sign change in the CD-ARPES spectrum when comparing positive and negative momenta along the light direction near the topological degeneracy. This is consistent with the conventional view that CD-ARPES measures the projection of the OAM monopole along the photon momentum. For some photon energies, however, this sign change disappears, which can be understood from our numerical simulations as the interference of polar atomic OAM contributions, consistent with the presence of OAM monopoles. Our results highlight the potential of chiral crystals for orbitronic device applications, and our methodology could enable the discovery of even more complicated nodal OAM textures that could be exploited for orbitronics.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 16 pages, 8 figures

arXiv:2311.11211 [pdf]

Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness

Authors: Gongbo Zhang, Qiao Jin, Denis Jered McInerney, Yong Chen, Fei Wang, Curtis L. Cole, Qian Yang, Yanshan Wang, Bradley A. Malin, Mor Peleg, Byron C. Wallace, Zhiyong Lu, Chunhua Weng, Yifan Peng

Abstract: Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.

Submitted 31 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.10944 [pdf, other]

Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks

Authors: Panfeng Li, Mohamed Abouelenien, Rada Mihalcea, Zhicheng Ding, Qikai Yang, Yiming Zhou

Abstract: Deception detection is gaining increasing interest due to ethical and security concerns. This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic. In particular, we make three main contributions. First, we extract linguistic and physiological features from this data to train and construct the neural network models. Second, we propose a fused convolutional neural network model using both modalities in order to achieve an improved overall performance. Third, we compare our new approach with earlier methods designed for multimodal deception detection. We find that our system outperforms regular classification methods; our results indicate the feasibility of using neural networks for deception detection even in the presence of limited amounts of data.

Submitted 26 June, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: Accepted by 2024 5th International Conference on Information Science, Parallel and Distributed Systems

arXiv:2311.10070 [pdf, ps, other]

Conformally Covariant Boundary Operators and Sharp Higher Order Sobolev Trace Inequalities on Poincaré-Einstein Manifolds

Authors: Joshua Flynn, Guozhen Lu, Qiaohua Yang

Abstract: In this paper we introduce conformally covariant boundary operators for Poincaré-Einstein manifolds satisfying a mild spectral assumption. Using these boundary operators we set up higher order Dirichlet problems whose solutions, when acted on by our boundary operators, recover the fractional order GJMS operators on the conformal infinity of the manifold. We moreover obtain all related higher order trace Sobolev inequalities on these manifolds. In conjunction with Beckner's fractional Sobolev inequalities on the sphere, we obtain as an application the sharp higher order Sobolev trace inequalities on the ball.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 41 pages

arXiv:2311.09956 [pdf, ps, other]

Conformally Covariant Boundary Operators and Sharp Higher Order CR Sobolev Trace Inequalities on the Siegel Domain and Complex Ball

Authors: Joshua Flynn, Guozhen Lu, Qiaohua Yang

Abstract: We first introduce an appropriate family of conformally covariant boundary operators associated to the Siegel domain ${\mathcal U}^{n+1}$ with the Heisenberg group $\mathbb{H}^{n}$ as its boundary and the complex ball $\mathbb{B}_{\mathbb{C}}^{n+1}$ with the complex sphere $\mathbb{S}^{2n+1}$ as its boundary. We provide the explicit formulas of these conformally covariant boundary operators. Second, we establish all higher order extension theorems of Caffarelli-Silvestre type for the Siegel domain and complex ball. Third, we prove all higher order CR Sobolev trace inequalities for the Siegel domain ${\mathcal U}^{n+1}$ and the complex ball $\mathbb{B}_{\mathbb{C}}^{n+1}$. In particular, we generalize the Sobolev trace inequality for $γがんま\in (0, 1)$ in the CR setting by Frank-González-Monticelli-Tan to the case for all $γがんま\in (0, n+1)\backslash \mathbb{N}$. The family of higher order conformally covariant boundary operators we define is naturally intrinsic to the higher order Sobolev trace inequalities on both the Siegel domain ${\mathcal U}^{n+1}$ and complex ball $\mathbb{B}_{\mathbb{C}}^{n+1}$. Finally, we give an explicit solution to the scattering problem on the complex hyperbolic ball. More precisely, we obtain an integral representation and an expansion in terms of special functions for the solution to the scattering problem.

Submitted 22 January, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 40 pages. A new section (Section 8) is added. This section gives an explicit solution to the scattering problem on the complex hyperbolic ball

arXiv:2311.07919 [pdf, other]

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Authors: Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, Jingren Zhou

Abstract: Recently, instruction-following audio-language models have received broad attention for audio interaction with humans. However, the absence of pre-trained audio models capable of handling diverse audio types and tasks has hindered progress in this field. Consequently, most existing works have only been able to support a limited range of interaction capabilities. In this paper, we develop the Qwen-Audio model and address this limitation by scaling up audio-language pre-training to cover over 30 tasks and various audio types, such as human speech, natural sounds, music, and songs, to facilitate universal audio understanding abilities. However, directly co-training all tasks and datasets can lead to interference issues, as the textual labels associated with different datasets exhibit considerable variations due to differences in task focus, language, granularity of annotation, and text structure. To overcome the one-to-many interference, we carefully design a multi-task training framework that conditions the decoder on a sequence of hierarchical tags, encouraging knowledge sharing and avoiding interference through shared and specified tags respectively. Remarkably, Qwen-Audio achieves impressive performance across diverse benchmark tasks without requiring any task-specific fine-tuning, surpassing its counterparts. Building upon the capabilities of Qwen-Audio, we further develop Qwen-Audio-Chat, which accepts various audio and text inputs, enabling multi-turn dialogues and supporting various audio-central scenarios.

Submitted 21 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: The code, checkpoints and demo are released at https://github.com/QwenLM/Qwen-Audio

arXiv:2311.06750 [pdf, other]

Federated Learning for Generalization, Robustness, Fairness: A Survey and Benchmark

Authors: Wenke Huang, Mang Ye, Zekun Shi, Guancheng Wan, He Li, Bo Du, Qiang Yang

Abstract: Federated learning has emerged as a promising paradigm for privacy-preserving collaboration among different parties. Recently, with the popularity of federated learning, an influx of approaches has emerged to address different realistic challenges. In this survey, we provide a systematic overview of the important and recent developments of research on federated learning. Firstly, we introduce the history and terminology of this area. Then, we comprehensively review three basic lines of research: generalization, robustness, and fairness, by introducing their respective background concepts, task settings, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out several open issues in this field and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast advancing field: https://github.com/WenkeHuang/MarsFL.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 22 pages, 4 figures

arXiv:2311.06463 [pdf, other]

Self-suppressed quantum diffusion and fundamental noise limit of soliton microcombs

Authors: Xing Jin, Zhe Lv, Qihuang Gong, Qi-Fan Yang

Abstract: Quantum diffusion of soliton microcombs has long been recognized as their fundamental noise limit. Here we surpass this limit by utilizing dispersive wave dynamics in multimode microresonators. Through the recoil force provided by these dispersive waves, the quantum diffusion can be suppressed to a much lower level that forms the ultimate fundamental noise limit of soliton microcombs. Our findings enable coherence engineering of soliton microcombs in the quantum-limited regime, providing critical guidelines for using soliton microcombs to synthesize ultralow-noise microwave and optical signals.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 8 pages, 5 figures

arXiv:2311.05837 [pdf]

High-efficiency edge couplers enabled by vertically tapering on lithium-niobate photonic chips

Authors: Di Jia, Qiang Luo, Chen Yang, Rui Ma, Xuanyi Yu, Feng Gao, Qifan Yang, Fang Bo, Guoquan Zhang, Jingjun Xu

Abstract: In the past decade, photonic integrated circuits (PICs) based on thin-film lithium niobate (TFLN) have advanced in various fields, including optical communication, nonlinear photonics, and quantum optics. A critical component is an efficient edge coupler connecting PICs to light sources or detectors. Here, we propose an innovative edge coupler design with a wedge-shaped TFLN waveguide and a silicon oxynitride (SiON) cladding. Experimental results show that the coupling loss between the TFLN PIC and a 3-μみゅーm mode field diameter (MFD) lensed fiber is low at 1.52 dBでしべる/facet, with the potential for improvement to 0.43 dBでしべる/facet theoretically. The coupling loss between the edge coupler and a UHNA7 fiber with an MFD of 3.2 μみゅーm is reduced to 0.92 dBでしべる/facet. This design maintains robust fabrication and alignment tolerance. Importantly, the minimum linewidth of the TFLN waveguide of the coupler (600 nm) can be easily achieved using foundry-available i-line stepper lithography. This work benefits the development of TFLN integrated platforms, such as on-chip electro-optic modulators, frequency comb generation, and quantum sensors.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.05827 [pdf, other]

AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Zhiguo Shi, Jiming Chen

Abstract: It is usually infeasible to fit and train an entire large deep neural network (DNN) model using a single edge device due to the limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models, and deploying each of them to a different edge device to collaboratively train a DNN model. However, the communication overhead caused by the large amount of data transmitted from one device to another during training, as well as the sub-optimal partition point due to the inaccurate latency prediction of computation at each edge device, can significantly slow down training. In this paper, we propose AccEPT, an acceleration scheme for edge collaborative pipeline-parallel training. In particular, we propose a light-weight adaptive latency predictor to accurately estimate the computation latency of each layer at different devices, which also adapts to unseen devices through continuous learning. The proposed latency predictor therefore leads to better model partitioning, which balances the computation loads across participating devices. Moreover, we propose a bit-level computation-efficient data compression scheme to compress the data to be transmitted between devices during training. Our numerical results demonstrate that our proposed acceleration approach is able to speed up edge pipeline-parallel training by up to 3 times in the considered experimental settings.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.03612 [pdf, other]

BlockEmulator: An Emulator Enabling to Test Blockchain Sharding Protocols

Authors: Huawei Huang, Guang Ye, Qinde Chen, Zhaokang Yin, Xiaofei Luo, Jianru Lin, Taotao Li, Qinglin Yang, Zibin Zheng

Abstract: Numerous blockchain simulators have been proposed to allow researchers to simulate mainstream blockchains. However, we have not yet found a testbed that enables researchers to develop and evaluate their new consensus algorithms or new protocols for blockchain sharding systems. To fill this gap, we develop BlockEmulator, which is designed as an experimental platform, particularly for emulating blockchain sharding mechanisms. BlockEmulator adopts a lightweight blockchain architecture so that developers can focus only on implementing their new protocols or mechanisms. Using the layered modules and programming interfaces offered by BlockEmulator, researchers can implement a new protocol with minimum effort. Through experiments, we test various functionalities of BlockEmulator in two steps. First, we prove the correctness of the emulation results yielded by BlockEmulator by comparing the theoretical analysis with the observed experimental results. Second, other experimental results demonstrate that BlockEmulator can facilitate the measurement of a series of metrics, including throughput, transaction confirmation latency, cross-shard transaction ratio, the queuing size of transaction pools, and workload distribution across blockchain shards. We have made BlockEmulator open-source on GitHub.

Submitted 11 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.03500 [pdf]

Predicting Age from White Matter Diffusivity with Residual Learning

Authors: Chenyu Gao, Michael E. Kim, Ho Hin Lee, Qi Yang, Nazirah Mohd Khairi, Praitayini Kanakaraj, Nancy R. Newlin, Derek B. Archer, Angela L. Jefferson, Warren D. Taylor, Brian D. Boyd, Lori L. Beason-Held, Susan M. Resnick, The BIOCARD Study Team, Yuankai Huo, Katherine D. Van Schaik, Kurt G. Schilling, Daniel Moyer, Ivana Išgum, Bennett A. Landman

Abstract: Imaging findings inconsistent with those expected at specific chronological age ranges may serve as early indicators of neurological disorders and increased mortality risk. Estimation of chronological age, and deviations from expected results, from structural MRI data has become an important task for developing biomarkers that are sensitive to such deviations. Complementary to structural analysis, diffusion tensor imaging (DTI) has proven effective in identifying age-related microstructural changes within the brain white matter, thereby presenting itself as a promising additional modality for brain age prediction. Although early studies have sought to harness DTI's advantages for age estimation, there is no evidence that the success of this prediction is owed to the unique microstructural and diffusivity features that DTI provides, rather than the macrostructural features that are also available in DTI data. Therefore, we seek to develop white-matter-specific age estimation to capture deviations from normal white matter aging. Specifically, we deliberately disregard the macrostructural information when predicting age from DTI scalar images, using two distinct methods. The first method relies on extracting only microstructural features from regions of interest. The second applies 3D residual neural networks (ResNets) to learn features directly from the images, which are non-linearly registered and warped to a template to minimize macrostructural variations. When tested on unseen data, the first method yields a mean absolute error (MAE) of 6.11 years for cognitively normal participants and an MAE of 6.62 years for cognitively impaired participants, while the second method achieves an MAE of 4.69 years for cognitively normal participants and an MAE of 4.96 years for cognitively impaired participants. We find that the ResNet model captures subtler, non-macrostructural features for brain age prediction.

Submitted 21 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: SPIE Medical Imaging: Image Processing. San Diego, CA. February 2024 (accepted as poster presentation)

Showing 151–200 of 1,421 results for author: Yang, Q