Search | arXiv e-print repository

Towards Unified Facial Action Unit Recognition Framework by Large Language Models

Authors: Guohong Hu, Xing Lan, Hanyu Jiang, Jiayi Lyu, Jian Xue

Abstract: Facial Action Units (AUs) are of great significance in the realm of affective computing. In this paper, we propose AU-LLaVA, the first unified AU recognition framework based on the Large Language Model (LLM). AU-LLaVA consists of a visual encoder, a linear projector layer, and a pre-trained LLM. We meticulously craft the text descriptions and fine-tune the model on various AU datasets, allowing it to generate different formats of AU recognition results for the same input image. On the BP4D and DISFA datasets, AU-LLaVA delivers the most accurate recognition results for nearly half of the AUs. Our model achieves improvements of F1-score up to 11.4% in specific AU recognition compared to previous benchmark results. On the FEAFA dataset, our method achieves significant improvements over all 24 AUs compared to previous benchmark results. AU-LLaVA demonstrates exceptional performance and versatility in AU recognition.

Submitted 12 September, 2024; originally announced September 2024.

arXiv:2409.07388 [pdf, other]

Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective

Authors: Guimin Hu, Yi Xin, Weimin Lyu, Haojian Huang, Chang Sun, Zhihong Zhu, Lin Gui, Ruichu Cai

Abstract: Multimodal affective computing (MAC) has garnered increasing attention due to its broad applications in analyzing human behaviors and intentions, especially in the text-dominated multimodal affective computing field. This survey presents the recent trends of multimodal affective computing from an NLP perspective through four hot tasks: multimodal sentiment analysis, multimodal emotion recognition in conversation, multimodal aspect-based sentiment analysis and multimodal multi-label emotion recognition. The goal of this survey is to explore the current landscape of multimodal affective research, identify development trends, and highlight the similarities and differences across various tasks, offering a comprehensive report on the recent progress in multimodal affective computing from an NLP perspective. This survey covers the formalization of tasks, provides an overview of relevant works, describes benchmark datasets, and details the evaluation metrics for each task. Additionally, it briefly discusses research in multimodal affective computing involving facial expressions, acoustic signals, physiological signals, and emotion causes. We also discuss the technical approaches, challenges, and future directions in multimodal affective computing. To support further research, we released a repository that compiles related works in multimodal affective computing, providing detailed resources and references for the community.

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.07129 [pdf, other]

MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis

Authors: Hanyu Jiang, Jian Xue, Xing Lan, Guohong Hu, Ke Lu

Abstract: This paper introduces MVLLaVA, an intelligent agent designed for novel view synthesis tasks. MVLLaVA integrates multiple multi-view diffusion models with a large multimodal model, LLaVA, enabling it to handle a wide range of tasks efficiently. MVLLaVA represents a versatile and unified platform that adapts to diverse input types, including a single image, a descriptive caption, or a specific change in viewing azimuth, guided by language instructions for viewpoint generation. We carefully craft task-specific instruction templates, which are subsequently used to fine-tune LLaVA. As a result, MVLLaVA acquires the capability to generate novel view images based on user instructions, demonstrating its flexibility across diverse tasks. Experiments are conducted to validate the effectiveness of MVLLaVA, demonstrating its robust performance and versatility in tackling diverse novel view synthesis challenges.

Submitted 11 September, 2024; originally announced September 2024.

Comments: project page: https://jamesjg.github.io/MVLLaVA_homepage/

arXiv:2409.01580 [pdf, other]

Foreactor: Exploiting Storage I/O Parallelism with Explicit Speculation

Authors: Guanzhou Hu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau

Abstract: We introduce explicit speculation, a variant of I/O speculation technique where I/O system calls can be parallelized under the guidance of explicit application code knowledge. We propose a formal abstraction -- the foreaction graph -- which describes the exact pattern of I/O system calls in an application function as well as any necessary computation associated to produce their argument values. I/O system calls can be issued ahead of time if the graph says it is safe and beneficial to do so. With explicit speculation, serial applications can exploit storage I/O parallelism without involving expensive prediction or checkpointing mechanisms. Based on explicit speculation, we implement Foreactor, a library framework that allows application developers to concretize foreaction graphs and enable concurrent I/O with little or no modification to application source code. Experimental results show that Foreactor is able to improve the performance of both synthetic benchmarks and real applications by significant amounts (29%-50%).

Submitted 2 September, 2024; originally announced September 2024.

Comments: 12 pages, 10 figures

arXiv:2409.01576 [pdf, other]

A Unified, Practical, and Understandable Summary of Non-transactional Consistency Levels in Distributed Replication

Authors: Guanzhou Hu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau

Abstract: We present a summary of non-transactional consistency levels in the context of distributed data replication protocols. The levels are built upon a practical object pool model and are defined in a unified framework centered around the concept of ordering. We show that each consistency level can be intuitively defined by specifying two types of constraints that determine the validity of orderings allowed by the level: convergence, which bounds the lineage shape of the ordering, and relationship, which bounds the relative positions of operations in the ordering. We give examples of representative protocols and systems that implement each consistency level. Furthermore, we discuss the availability upper bound of presented consistency levels.

Submitted 2 September, 2024; originally announced September 2024.

Comments: 12 pages, 4 figures

arXiv:2408.14410 [pdf, other]

Generalized Bayesian nonparametric clustering framework for high-dimensional spatial omics data

Authors: Bencong Zhu, Guanyu Hu, Xiaodan Fan, Qiwei Li

Abstract: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has transformed genomic research by enabling high-throughput gene expression profiling while preserving spatial context. Identifying spatial domains within SRT data is a critical task, with numerous computational approaches currently available. However, most existing methods rely on a multi-stage process that involves ad-hoc dimension reduction techniques to manage the high dimensionality of SRT data. These low-dimensional embeddings are then subjected to model-based or distance-based clustering methods. Additionally, many approaches depend on arbitrarily specifying the number of clusters (i.e., spatial domains), which can result in information loss and suboptimal downstream analysis. To address these limitations, we propose a novel Bayesian nonparametric mixture of factor analysis (BNPMFA) model, which incorporates a Markov random field-constrained Gibbs-type prior for partitioning high-dimensional spatial omics data. This new prior effectively integrates the spatial constraints inherent in SRT data while simultaneously inferring cluster membership and determining the optimal number of spatial domains. We have established the theoretical identifiability of cluster membership within this framework. The efficacy of our proposed approach is demonstrated through realistic simulations and applications to two SRT datasets. Our results show that the BNPMFA model not only surpasses state-of-the-art methods in clustering accuracy and estimating the number of clusters but also offers novel insights for identifying cellular regions within tissue samples.

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.13993 [pdf]

Enhanced Cherenkov radiation in twisted hyperbolic Van der Waals crystals

Authors: Hao Hu, Xiao Lin, Guangwei Hu, Francisco J. Garcia-Vidal, Yu Luo

Abstract: Cherenkov radiation in artificial structures experiencing strong radiation enhancements promises important applications in free-electron quantum emitters, broadband light sources, miniaturized particle detectors, etc. However, the momentum matching condition between the swift electron and emitted photons generally restricts the radiation enhancement to a particular momentum. Efficient Cherenkov radiation over a wide range of momenta is in high demand for many applications but remains a challenging task. To this end, we explore the interaction between a swift electron and twisted hyperbolic Van der Waals crystals, and observe enhanced Cherenkov radiation at the flatband resonance frequency. We show that, at the photonic magic angle of the twisted crystals, the electron momentum, once matching with that of the flatband photon, gives rise to a maximum energy loss (corresponding to the surface phonon generation), one order of magnitude higher than that in conventional hyperbolic materials. Such a significant enhancement is attributed to the excitation of flatband surface phonon polaritons over a broad momentum range. Our findings provide a feasible route to highly directional free-electron radiation and radiation shaping.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2408.12800 [pdf, other]

Cap2Sum: Learning to Summarize Videos by Generating Captions

Authors: Cairong Zhao, Chutian Wang, Zifan Song, Guosheng Hu, Haonan Chen, Xiaofan Zhai

Abstract: With the rapid growth of video data on the internet, video summarization is becoming a very important AI technology. However, due to the high labelling cost of video summarization, existing studies have to be conducted on small-scale datasets, leading to limited performance and generalization capacity. In this work, we introduce the use of dense video captions as a supervision signal to train video summarization models. Motivated by this, we propose Cap2Sum, a model that learns to summarize videos by generating captions, to exploit dense video caption annotations. This weakly-supervised approach allows us to train the models on large-scale dense video caption datasets to achieve better performance and generalization capacity. To further improve the generalization capacity, we introduce a CLIP (a strong vision-language model) Prior mechanism to enhance the learning of important objects that captions may ignore in the videos. In practice, Cap2Sum can perform zero-shot video summarization or be fine-tuned by the ground-truth summary or video caption of the target dataset. To examine the performance of Cap2Sum after weakly-supervised fine-tuning by the video captions, we propose two new datasets, TVSum-Caption and SumMe-Caption, which are derived from two common video summarization datasets and will be publicly released. We conduct extensive experiments and the results demonstrate that our method achieves significant improvements in performance and generalization capacity compared with previous methods.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 13 pages, 4 figures

arXiv:2408.06574 [pdf, other]

SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model

Authors: Dayong Wu, Jiaqi Li, Baoxin Wang, Honghong Zhao, Siyuan Xue, Yanjie Yang, Zhijun Chang, Rui Zhang, Li Qian, Bo Wang, Shijin Wang, Zhixiong Zhang, Guoping Hu

Abstract: Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system, Spark Research Assistant (SparkRA), based on our SciLit-LLM. SparkRA is accessible online and provides three primary functions: literature investigation, paper reading, and academic writing. As of July 30, 2024, SparkRA has garnered over 50,000 registered users, with a total usage count exceeding 1.3 million.

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.02164 [pdf, other]

Rethinking Affect Analysis: A Protocol for Ensuring Fairness and Consistency

Authors: Guanyu Hu, Dimitrios Kollias, Eleni Papadopoulou, Paraskevi Tzouveli, Jie Wei, Xinyu Yang

Abstract: Evaluating affect analysis methods presents challenges due to inconsistencies in database partitioning and evaluation protocols, leading to unfair and biased results. Previous studies claim continuous performance improvements, but our findings challenge such assertions. Using these insights, we propose a unified protocol for database partitioning that ensures fairness and comparability. We provide detailed demographic annotations (in terms of race, gender and age), evaluation metrics, and a common framework for expression recognition, action unit detection and valence-arousal estimation. We also rerun the methods with the new protocol and introduce new leaderboards to encourage future research in affect recognition with a fairer comparison. Our annotations, code, and pre-trained models are available on GitHub: https://github.com/dkollias/Fair-Consistent-Affect-Analysis

Submitted 7 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2405.06841

arXiv:2407.21467 [pdf]

Deep Learning-Based Longitudinal Prediction of Childhood Myopia Progression Using Fundus Image Sequences and Baseline Refraction Data

Authors: Mengtian Kang, Yansong Hu, Shuo Gao, Yuanyuan Liu, Hongbei Meng, Xuemeng Li, Xuhang Chen, Hubin Zhao, Jing Fu, Guohua Hu, Wei Wang, Yanning Dai, Arokia Nathan, Peter Smielewski, Ningli Wang, Shiming Li

Abstract: Childhood myopia constitutes a significant global health concern. It exhibits an escalating prevalence and has the potential to evolve into severe, irreversible conditions that detrimentally impact familial well-being and create substantial economic costs. Contemporary research underscores the importance of precisely predicting myopia progression to enable timely and effective interventions, thereby averting severe visual impairment in children. Such predictions predominantly rely on subjective clinical assessments, which are inherently biased and resource-intensive, thus hindering their widespread application. In this study, we introduce a novel, high-accuracy method for quantitatively predicting the myopic trajectory and myopia risk in children using only fundus images and baseline refraction data. This approach was validated through a six-year longitudinal study of 3,408 children in Henan, utilizing 16,211 fundus images and corresponding refractive data. Our method based on deep learning demonstrated predictive accuracy with an error margin of 0.311D per year and AUC scores of 0.944 and 0.995 for forecasting the risks of developing myopia and high myopia, respectively. These findings confirm the utility of our model in supporting early intervention strategies and in significantly reducing healthcare costs, particularly by obviating the need for additional metadata and repeated consultations. Furthermore, our method was designed to rely only on fundus images and refractive error data, without the need for metadata or multiple inquiries from doctors, strongly reducing the associated medical costs and facilitating large-scale screening. Our model can even provide good predictions based on only a single time measurement. Consequently, the proposed method is an important means to reduce medical inequities caused by economic disparities.

Submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.17841 [pdf, ps, other]

Two-Timescale Design for Movable Antenna Array-Enabled Multiuser Uplink Communications

Authors: Guojie Hu, Qingqing Wu, Donghui Xu, Kui Xu, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

Abstract: Movable antenna (MA) technology can flexibly reconfigure wireless channels by adjusting antenna positions in a local region, thus having great potential for enhancing communication performance. This letter investigates MA technology enabled multiuser uplink communications over general Rician fading channels, which consist of a base station (BS) equipped with the MA array and multiple single-antenna users. Since it is practically challenging to collect all instantaneous channel state information (CSI) by traversing all possible antenna positions at the BS, we instead propose a two-timescale scheme for maximizing the ergodic sum rate. Specifically, antenna positions at the BS are first optimized using only the statistical CSI. Subsequently, the receiving beamforming at the BS (for which we consider the three typical zero-forcing (ZF), minimum mean-square error (MMSE) and MMSE with successive interference cancellation (MMSE-SIC) receivers) is designed based on the instantaneous CSI with optimized antenna positions, thus significantly reducing practical implementation complexities. The formulated problems are highly non-convex and we develop projected gradient ascent (PGA) algorithms to effectively handle them. Simulation results illustrate that, compared to a conventional fixed-position antenna (FPA) array, the MA array can achieve significant performance gains by reaping an additional spatial degree of freedom.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.16924 [pdf]

Real-space topology-engineering of skyrmionic spin textures in a van der Waals ferromagnet Fe3GaTe2

Authors: Shuo Mi, Jianfeng Guo, Guojing Hu, Guangcheng Wang, Songyang Li, Zizhao Gong, Shuaizhao Jin, Rui Xu, Fei Pang, Wei Ji, Weiqiang Yu, Xiaolei Wang, Xueyun Wang, Haitao Yang, Zhihai Cheng

Abstract: Realizing magnetic skyrmions in two-dimensional (2D) van der Waals (vdW) ferromagnets offers unparalleled prospects for future spintronic applications. The room-temperature ferromagnet Fe3GaTe2 provides an ideal platform for tailoring these magnetic solitons. Here, skyrmions of distinct topological charges are artificially introduced and spatially engineered using magnetic force microscopy (MFM). The skyrmion lattice is realized by a specific field-cooling process, and can be further controllably erased and painted via delicate manipulation of the tip stray field. The skyrmion lattice with opposite topological charges (S = +1 or -1) can be tailored at the target regions to form topological skyrmion junctions (TSJs) with specific configurations. The delicate interplay of TSJs and spin-polarized device current was finally investigated via in-situ transport measurements, alongside the topological stability of TSJs. Our results demonstrate that Fe3GaTe2 not only serves as a potential building block for room-temperature skyrmion-based spintronic devices, but also presents promising prospects for Fe3GaTe2-based heterostructures with engineered topological spin textures.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.15798 [pdf, other]

Robust Facial Reactions Generation: An Emotion-Aware Framework with Modality Compensation

Authors: Guanyu Hu, Jie Wei, Siyang Song, Dimitrios Kollias, Xinyu Yang, Zhonglin Sun, Odysseus Kaloidas

Abstract: The objective of the Multiple Appropriate Facial Reaction Generation (MAFRG) task is to produce contextually appropriate and diverse listener facial behavioural responses based on the multimodal behavioural data of the conversational partner (i.e., the speaker). Current methodologies typically assume continuous availability of speech and facial modality data, neglecting real-world scenarios where these data may be intermittently unavailable, which often results in model failures. Furthermore, despite utilising advanced deep learning models to extract information from the speaker's multimodal inputs, these models fail to adequately leverage the speaker's emotional context, which is vital for eliciting appropriate facial reactions from human listeners. To address these limitations, we propose an Emotion-aware Modality Compensatory (EMC) framework. This versatile solution can be seamlessly integrated into existing models, thereby preserving their advantages while significantly enhancing performance and robustness in scenarios with missing modalities. Our framework ensures resilience when faced with missing modality data through the Compensatory Modality Alignment (CMA) module. It also generates more appropriate emotion-aware reactions via the Emotion-aware Attention (EA) module, which incorporates the speaker's emotional information throughout the entire encoding and decoding process. Experimental results demonstrate that our framework improves the appropriateness metric FRCorr by an average of 57.2% compared to the original model structure. In scenarios where speech modality data is missing, the performance of appropriate generation shows an improvement, and when facial data is missing, it only exhibits minimal degradation.

Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.08496 [pdf, ps, other]

Convergences of Combinatorial Ricci Flows to Degenerated Circle Packings in Hyperbolic Background Geometry

Authors: Guangming Hu, Sicheng Lu, Dong Tan, Youliang Zhong, Puchun Zhou

Abstract: This paper investigates a kind of degenerated circle packings in hyperbolic background geometry. A main problem is whether prescribed total geodesic curvature data can be realized by a degenerated circle packing or not. We fully characterize the sufficient and necessary conditions and show the uniqueness. Furthermore, we introduce the combinatorial Ricci flow to find the desired degenerated circle packed surface, analogous to the methods of Chow-Luo and Takatsu.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 36 pages, 9 figures

MSC Class: 52C26; 57M50

arXiv:2407.05840 [pdf, other]

A 103-TOPS/mm$^2$ Integrated Photonic Computing Engine Enabling Next-Generation Reservoir Computing

Authors: Dongliang Wang, Yikun Nie, Gaolei Hu, Hon Ki Tsang, Chaoran Huang

Abstract: Reservoir computing (RC) is a leading machine learning algorithm for information processing due to its rich expressiveness. A new RC paradigm has recently emerged, showcasing superior performance and delivering more interpretable results with shorter training data sets and training times, representing the next generation of RC computing. This work presents the first realization of a high-speed next-generation RC system on an integrated photonic chip. Our experimental results demonstrate state-of-the-art forecasting and classification performances under various machine learning tasks and achieve the fastest speeds of 60 Gbaud and a computing density of 103 tera operations/second/mm$^2$ (TOPS/mm$^2$). The passive system, composed of a simple star coupler with on-chip delay lines, offers several advantages over traditional RC systems, including no speed limitations, compact footprint, extremely high fabrication error tolerance, fewer metaparameters, and greater interpretability. This work lays the foundation for ultrafast on-chip photonic RC, representing significant progress toward developing next-generation high-speed photonic computing and signal processing.

Submitted 31 May, 2024; originally announced July 2024.

arXiv:2407.03835 [pdf, other]

7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition

Authors: Dimitrios Kollias, Stefanos Zafeiriou, Irene Kotsia, Abhinav Dhall, Shreya Ghosh, Chunchang Shao, Guanyu Hu

Abstract: This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with ECCV 2024. The 7th ABAW Competition addresses novel challenges in understanding human expressions and behaviors, crucial for the development of human-centered technologies. The Competition comprises two sub-challenges: i) Multi-Task Learning (the goal is to learn at the same time, in a multi-task learning setting, to estimate two continuous affect dimensions, valence and arousal, to recognise between the mutually exclusive classes of the 7 basic expressions and 'other', and to detect 12 Action Units); and ii) Compound Expression Recognition (the target is to recognise between the 7 mutually exclusive compound expression classes). s-Aff-Wild2, which is a static version of the A/V Aff-Wild2 database and contains annotations for valence-arousal, expressions and Action Units, is utilized for the purposes of the Multi-Task Learning Challenge; a part of C-EXPR-DB, which is an A/V in-the-wild database with compound expression annotations, is utilized for the purposes of the Compound Expression Recognition Challenge. In this paper, we introduce the two challenges, detailing their datasets and the protocols followed for each. We also outline the evaluation metrics, and highlight the baseline systems and their results. Additional information about the competition can be found at https://affective-behavior-analysis-in-the-wild.github.io/7th.

Submitted 8 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.14992 [pdf, other]

A multi-mesh approach for accurate computation of multi-target functionals in aerodynamics design

Authors: Guanghui Hu, Ruo Li, Jingfeng Wang

Abstract: Aerodynamic optimal design is crucial for enhancing the performance of aircraft, while calculating multi-target functionals through solving dual equations with arbitrary right-hand sides remains challenging. In this paper, a novel multi-target framework of DWR-based mesh refinement is proposed and analyzed. Theoretically, an extrapolation method is generalized to expand multi-variable functionals, which guarantees the dual equations of different objective functionals can be calculated separately. Numerically, an algorithm of calculating multi-target functionals is designed based on the multi-mesh approach, which can help to obtain different dual solutions simultaneously. One feature of our framework is that the algorithm is easy to implement with the help of the hierarchical geometry tree structure and the calculation avoids the Galerkin orthogonality naturally. The framework strikes a balance between different targets even when they are not of the same order of magnitude. While the existing approach uses a linear combination of different components in multi-target functionals for adaptation, it introduces additional coefficients for adjusting. With each component calculated under a dual-consistent scheme, this multi-mesh framework addresses challenges such as the lift-drag ratio and other kinds of multi-target functionals, ensuring smooth convergence and precise calculations of dual solutions.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.12180 [pdf]

Unusual charge density wave introduced by Janus structure in monolayer vanadium dichalcogenides

Authors: Ziqiang Xu, Yan Shao, Chun Huang, Genyu Hu, Shihao Hu, Zhi-Lin Li, Xiaoyu Hao, Yanhui Hou, Teng Zhang, Jin-An Shi, Chen Liu, Jia-Ou Wang, Wu Zhou, Jiadong Zhou, Wei Ji, Jingsi Qiao, Xu Wu, Hong-Jun Gao, Yeliang Wang

Abstract: As a fundamental structural feature, the symmetry of materials determines the exotic quantum properties in transition metal dichalcogenides (TMDs) with charge density wave (CDW). Breaking the inversion symmetry, the Janus structure, an artificially constructed lattice, provides an opportunity to tune the CDW states and the related properties. However, limited by the difficulties in atomic-level fabrication and material stability, the experimental visualization of the CDW states in 2D TMDs with Janus structure is still rare. Here, using surface selenization of VTe2, we fabricated monolayer Janus VTeSe. With scanning tunneling microscopy, an unusual √13×√13 CDW state with threefold rotational symmetry breaking was observed and characterized. Combined with theoretical calculations, we find this CDW state can be attributed to the charge modulation in the Janus VTeSe, beyond the conventional electron-phonon coupling. Our findings provide a promising platform for studying the CDW states and artificially tuning the electronic properties toward the applications.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.12164 [pdf, other]

A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis

Authors: Guoqiang Hu, Huaning Tan, Ruilai Li

Abstract: Acoustic features play an important role in improving the quality of the synthesised speech. Currently, the Mel spectrogram is a widely employed acoustic feature in most acoustic models. However, due to the fine-grained loss caused by its Fourier transform process, the clarity of speech synthesised by Mel spectrogram is compromised in mutant signals. In order to obtain a more detailed Mel spectrogram, we propose a Mel spectrogram enhancement paradigm based on the continuous wavelet transform (CWT). This paradigm introduces an additional task: a more detailed wavelet spectrogram, which like the post-processing network takes as input the Mel spectrogram output by the decoder. We choose Tacotron2 and Fastspeech2 for experimental validation in order to test autoregressive (AR) and non-autoregressive (NAR) speech systems, respectively. The experimental results demonstrate that the speech synthesised using the model with the Mel spectrogram enhancement paradigm exhibits higher MOS, with an improvement of 0.14 and 0.09 compared to the baseline model, respectively. These findings provide some validation for the universality of the enhancement paradigm, as they demonstrate the success of the paradigm in different architectures.

Submitted 9 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by IALP 2024

arXiv:2406.11030 [pdf, other]

FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture

Authors: Wenyan Li, Xinyu Zhang, Jiaang Li, Qiwei Peng, Raphael Tang, Li Zhou, Weijia Zhang, Guimin Hu, Yifei Yuan, Anders Søgaard, Daniel Hershcovich, Desmond Elliott

Abstract: Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups. To bridge the gap in the literature on the often-overlooked regional diversity in this domain, we introduce FoodieQA, a manually curated, fine-grained image-text dataset capturing the intricate features of food cultures across various regions in China. We evaluate vision-language models (VLMs) and large language models (LLMs) on newly collected, unseen food images and corresponding questions. FoodieQA comprises three multiple-choice question-answering tasks where models need to answer questions based on multiple images, a single image, and text-only descriptions, respectively. While LLMs excel at text-based question answering, surpassing human accuracy, the open-sourced VLMs still fall short by 41% on multi-image and 21% on single-image VQA tasks, although closed-weights models perform closer to human levels (within 10%). Our findings highlight that understanding food and its cultural implications remains a challenging and under-explored direction.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10052 [pdf, other]

Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

Authors: Haoyu Wang, Guoqiang Hu, Guodong Lin, Wei-Qiang Zhang, Jian Li

Abstract: As a robust and large-scale multilingual speech recognition model, Whisper has demonstrated impressive results in many low-resource and out-of-distribution scenarios. However, its encoder-decoder structure hinders its application to streaming speech recognition. In this paper, we introduce Simul-Whisper, which uses the time alignment embedded in Whisper's cross-attention to guide auto-regressive decoding and achieve chunk-based streaming ASR without any fine-tuning of the pre-trained model. Furthermore, we observe the negative effect of the truncated words at the chunk boundaries on the decoding results and propose an integrate-and-fire-based truncation detection model to address this issue. Experiments on multiple languages and Whisper architectures show that Simul-Whisper achieves an average absolute word error rate degradation of only 1.46% at a chunk size of 1 second, which significantly outperforms the current state-of-the-art baseline.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024

arXiv:2406.09776 [pdf, other]

Faster Convergence on Heterogeneous Federated Edge Learning: An Adaptive Clustered Data Sharing Approach

Authors: Gang Hu, Yinglei Teng, Nan Wang, Zhu Han

Abstract: Federated Edge Learning (FEEL) emerges as a pioneering distributed machine learning paradigm for the 6G Hyper-Connectivity, harnessing data from the Internet of Things (IoT) devices while upholding data privacy. However, current FEEL algorithms struggle with non-independent and non-identically distributed (non-IID) data, leading to elevated communication costs and compromised model accuracy. To address these statistical imbalances within FEEL, we introduce a clustered data sharing framework, mitigating data heterogeneity by selectively sharing partial data from cluster heads to trusted associates through sidelink-aided multicasting. The collective communication pattern is integral to FEEL training, where both cluster formation and the efficiency of communication and computation impact training latency and accuracy simultaneously. To tackle the strictly coupled data sharing and resource optimization, we decompose the overall optimization problem into the client clustering and effective data sharing subproblems. Specifically, a distribution-based adaptive clustering algorithm (DACA) is devised based on three deductive cluster forming conditions, which ensures the maximum sharing yield. Meanwhile, we design a stochastic optimization based joint computed frequency and shared data volume optimization (JFVO) algorithm, determining the optimal resource allocation with an uncertain objective function. The experiments show that the proposed framework facilitates FEEL on non-IID datasets with faster convergence rate and higher model accuracy in a limited communication environment.

Submitted 8 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09304 [pdf]

Self-reconfigurable Multifunctional Memristive Nociceptor for Intelligent Robotics

Authors: Shengbo Wang, Mingchao Fang, Lekai Song, Cong Li, Jian Zhang, Arokia Nathan, Guohua Hu, Shuo Gao

Abstract: Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute has been currently omitted, but it is highly desired for artificial nociceptors. Inspired by these shortcomings, this article presents, for the first time, a Self-Directed Channel (SDC) memristor-based self-reconfigurable nociceptor, capable of perceiving hazardous pressure stimuli under different temperatures and demonstrates key features of tactile nociceptors, including 'threshold,' 'no-adaptation,' and 'sensitization.' The maximum amplification of hazardous external stimuli is 1000%, and its response characteristics dynamically adapt to current temperature conditions by automatically altering the generated modulation schemes for the memristor. The maximum difference ratio of the response of memristors at different temperatures is 500%, and this adaptability closely mimics the functions of biological tactile nociceptors, resulting in accurate danger perception in various conditions. Beyond temperature adaptation, this memristor-based nociceptor has the potential to integrate different sensory modalities by applying various sensors, thereby achieving human-like perception capabilities in real-world environments.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures

arXiv:2406.07462 [pdf]

doi 10.1016/j.jmps.2024.105842

Rayleigh surface waves of extremal elastic materials

Authors: Yu Wei, Yi Chen, Wen Cheng, Xiaoning Liu, Gengkai Hu

Abstract: Extremal elastic materials here refer to a specific class of elastic materials whose elastic matrices exhibit one or more zero eigenvalues, resulting in soft deformation modes that, in principle, cost no energy. They can be approximated through artificially designed solid microstructures. Extremal elastic materials have exotic bulk wave properties unavailable with conventional solids due to the soft modes, offering unprecedented opportunities for manipulating bulk waves, e.g., acting as phonon polarizers for elastic waves or invisibility cloaks for underwater acoustic waves. Despite their potential, Rayleigh surface waves, crucially linked to bulk wave behaviors of such extremal elastic materials, have largely remained unexplored so far. In this paper, we theoretically investigate the propagation of Rayleigh waves in extremal elastic materials based on continuum theory and verify our findings with designed microstructure metamaterials based on pantographic structures. Dispersion relations and polarizations of Rayleigh waves in extremal elastic materials are derived, and the impact of higher order gradient effects is also investigated by using strain gradient theory. This study provides a continuum model for exploring surface waves in extremal elastic materials and may stimulate applications of extremal elastic materials for controlling surface waves.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 8 figures

Journal ref: J. Mech. Phys. Solids, 193(105842), 2024

arXiv:2406.01503 [pdf, ps, other]

An inverse obstacle problem with a single pair of Cauchy data: Laplace's equation case

Authors: Xiaoxu Xu, Guanghui Hu

Abstract: This paper is concerned with an inverse obstacle problem for Laplace's equation. The aim is to recover the constant conductivity coefficient in the equation and the boundary of a Dirichlet polygonal obstacle from a single pair of Cauchy data. Uniqueness results are established under some a priori assumptions on the input boundary value data. A domain-defined sampling method, based on the factorization method originating from inverse acoustic scattering, has been proposed to recover both the constant conductivity coefficient and the polygonal obstacle. A hybrid strategy, which combines the sampling method and iterative scheme, is employed to reconstruct the location and shape of the obstacle. Numerical examples indicate that our method is efficient.

Submitted 3 June, 2024; originally announced June 2024.

MSC Class: 35R30; 35R25; 35J25

arXiv:2406.01151 [pdf, other]

A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing

Authors: P. J. Zhou, Q. Yu, M. Chen, Y. C. Wang, L. W. Meng, Y. Zuo, N. Ning, Y. Liu, S. G. Hu, G. C. Qiao

Abstract: Edge-AI computing requires high energy efficiency, low power consumption, and relatively high flexibility and compact area, challenging the AI-chip design. This work presents a 0.96 pJ/SOP heterogeneous neuromorphic system-on-chip (SoC) with fullerene-like interconnection topology for edge-AI computing. The neuromorphic core integrates different technologies to augment computing energy efficiency, including sparse computing, partial membrane potential updates, and non-uniform weight quantization. Multiple neuromorphic cores and multi-mode routers form a fullerene-like network-on-chip (NoC). The average degree of communication nodes exceeds traditional topologies by 32%, with a minimal degree variance of 0.93, allowing advanced decentralized on-chip communication. Additionally, the NoC can be scaled up through extended off-chip high-level router nodes. A RISC-V CPU and a neuromorphic processor are tightly coupled and fabricated within a 5.42 mm^2 die area under 55 nm CMOS technology. The chip has a low power density of 0.52 mW/mm^2, reducing 67.5% compared to related works, and achieves a high neuron density of 30.23 K/mm^2. Eventually, the chip is demonstrated to be effective on different datasets and achieves 0.96 pJ/SOP energy efficiency.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 5 pages, 8 figures

arXiv:2406.00497 [pdf, ps, other]

Recent Advances in End-to-End Simultaneous Speech Translation

Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, Yingfeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles. Secondly, satisfying real-time requirements presents inherent difficulties due to the need for immediate translation output. Thirdly, striking a balance between translation quality and latency constraints remains a critical challenge. Finally, the scarcity of annotated data adds another layer of complexity to the task. Through our exploration of these challenges and the proposed solutions, we aim to provide valuable insights into the current landscape of SimulST research and suggest promising directions for future exploration.

Submitted 20 August, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

Comments: Accepted by IJCAI 2024

arXiv:2405.07408 [pdf, other]

Bayesian Spatially Clustered Compositional Regression: Linking intersectoral GDP contributions to Gini Coefficients

Authors: Jingcheng Meng, Yimeng Ren, Xuening Zhu, Guanyu Hu

Abstract: The Gini coefficient is a universally used measurement of income inequality. Intersectoral GDP contributions reveal the economic development of different sectors of the national economy. Linking intersectoral GDP contributions to Gini coefficients will provide better understandings of how the Gini coefficient is influenced by different industries. In this paper, a compositional regression with spatially clustered coefficients is proposed to explore heterogeneous effects over spatial locations under a nonparametric Bayesian framework. Specifically, a Markov random field constraint mixture of finite mixtures prior is designed for Bayesian log contrast regression with compositional covariates, which allows for both spatially contiguous clusters and discontinuous clusters. In addition, an efficient Markov chain Monte Carlo algorithm for posterior sampling that enables simultaneous inference on both cluster configurations and cluster-wise parameters is designed. The compelling empirical performance of the proposed method is demonstrated via extensive simulation studies and an application to 51 states of United States from 2019 Bureau of Economic Analysis.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.06841 [pdf, other]

Bridging the Gap: Protocol Towards Fair and Consistent Affect Analysis

Authors: Guanyu Hu, Eleni Papadopoulou, Dimitrios Kollias, Paraskevi Tzouveli, Jie Wei, Xinyu Yang

Abstract: The increasing integration of machine learning algorithms in daily life underscores the critical need for fairness and equity in their deployment. As these technologies play a pivotal role in decision-making, addressing biases across diverse subpopulation groups, including age, gender, and race, becomes paramount. Automatic affect analysis, at the intersection of physiology, psychology, and machine learning, has seen significant development. However, existing databases and methodologies lack uniformity, leading to biased evaluations. This work addresses these issues by analyzing six affective databases, annotating demographic attributes, and proposing a common protocol for database partitioning. Emphasis is placed on fairness in evaluations. Extensive experiments with baseline and state-of-the-art methods demonstrate the impact of these changes, revealing the inadequacy of prior assessments. The findings underscore the importance of considering demographic attributes in affect analysis research and provide a foundation for more equitable methodologies. Our annotations, code and pre-trained models are available at: https://github.com/dkollias/Fair-Consistent-Affect-Analysis

Submitted 16 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: accepted at IEEE FG 2024

arXiv:2405.05449 [pdf, other]

Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management

Authors: Gang Hu, Ming Gu

Abstract: Investment portfolios, central to finance, balance potential returns and risks. This paper introduces a hybrid approach combining Markowitz's portfolio theory with reinforcement learning, utilizing knowledge distillation for training agents. In particular, our proposed method, called KDD (Knowledge Distillation DDPG), consists of two training stages: supervised and reinforcement learning stages. The trained agents optimize portfolio assembly. A comparative analysis against standard financial models and AI frameworks, using metrics like returns, the Sharpe ratio, and nine evaluation indices, reveals our model's superiority. It notably achieves the highest yield and Sharpe ratio of 2.03, ensuring top profitability with the lowest risk in comparable return scenarios.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.05179 [pdf, ps, other]

Detection of a piecewise linear crack with one incident wave

Authors: Xiaoxu Xu, Guanqiu Ma, Guanghui Hu

Abstract: This paper is concerned with inverse crack scattering problems for time-harmonic acoustic waves. We prove that a piecewise linear crack with the sound-soft boundary condition in two dimensions can be uniquely determined by the far-field data corresponding to a single incident plane wave or point source. We propose two non-iterative methods for imaging the location and shape of a crack. The first one is a contrast sampling method, while the second one is a variant of the classical factorization method but only with one incoming wave. Newton's iteration method is then employed for getting a more precise reconstruction result. Numerical examples are presented to show the effectiveness of the proposed hybrid method.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04120 [pdf, ps, other]

Movable Antennas-Enabled Two-User Multicasting: Do We Really Need Alternating Optimization for Minimum Rate Maximization?

Authors: Guojie Hu, Qingqing Wu, Donghui Xu, Kui Xu, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

Abstract: Movable antenna (MA) technology, which can reconfigure wireless channels by flexibly moving antenna positions in a specified region, has great potential for improving communication performance. In this paper, we consider a new setup of MAs-enabled multicasting, where we adopt a simple setting in which a linear MA array-enabled source (${\rm{S}}$) transmits a common message to two single-antenna users ${\rm{U}}_1$ and ${\rm{U}}_2$. We aim to maximize the minimum rate among these two users, by jointly optimizing the transmit beamforming and antenna positions at ${\rm{S}}$. Instead of utilizing the widely-used alternating optimization (AO) approach, we reveal, with rigorous proof, that the above two variables can be optimized separately: i) the optimal antenna positions can be firstly determined via the successive convex approximation technique, based on the rule of maximizing the correlation between ${\rm{S}}$-${\rm{U}}_1$ and ${\rm{S}}$-${\rm{U}}_2$ channels; ii) afterwards, the optimal closed-form transmit beamforming can be derived via simple arguments. Compared to AO, this new approach yields the same performance but reduces the computational complexities significantly. Moreover, it can provide insightful conclusions which are not possible with AO.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.19249 [pdf, other]

A Nonnested Augmented Subspace Method for Kohn-Sham Equation

Authors: Guanghui Hu, Hehu Xie, Fei Xu, Gang Zhao

Abstract: In this paper, a novel adaptive finite element method is proposed to solve the Kohn-Sham equation based on the moving mesh (nonnested mesh) adaptive technique and the augmented subspace method. Different from the classical self-consistent field iterative algorithm, which requires solving the Kohn-Sham equation directly in each adaptive finite element space, our algorithm transforms the Kohn-Sham equation into some linear boundary value problems of the same scale in each adaptive finite element space, and then the wavefunctions derived from the linear boundary value problems are corrected by solving a small-scale Kohn-Sham equation defined in a low-dimensional augmented subspace. Since the new algorithm avoids solving the large-scale Kohn-Sham equation directly, a significant improvement in solving efficiency can be obtained. In addition, the adaptive moving mesh technique is used to generate the nonnested adaptive mesh for the nonnested augmented subspace method according to the singularity of the approximate wavefunctions. The modified Hessian matrix of the approximate wavefunctions is used as the metric matrix to redistribute the mesh. Through the moving mesh adaptive technique, the redistributed mesh is almost optimal. A number of numerical experiments are carried out to verify the efficiency and the accuracy of the proposed algorithm.

Submitted 30 April, 2024; originally announced April 2024.

MSC Class: 65N30; 65N25; 65L15; 65B99

arXiv:2404.16271 [pdf]

True random number generation using 1T' molybdenum ditelluride

Authors: Yang Liu, Pengyu Liu, Yingyi Wen, Zihan Liang, Songwei Liu, Lekai Song, Jingfang Pei, Xiaoyue Fan, Teng Ma, Gang Wang, Shuo Gao, Kong-Pang Pun, Xiaolong Chen, Guohua Hu

Abstract: True random numbers are essential for scientific research and various engineering problems. Their generation, however, depends on a reliable entropy source. Here, we present true random number generation using the conductance noise probed from structurally metastable 1T' MoTe2 prepared via electrochemical exfoliation. The noise, fitting a Poisson process, is a robust entropy source capable of remaining stable even at 15 K. Noise spectral density and statistical time-lag suggest the noise originates from the random polarization of the ferroelectric dipoles in 1T' MoTe2. Using a simple circuit, the noise allows true random number generation, enabling their use as the seed for high-throughput secure random number generation over 1 Mbit/s, appealing for applications such as cryptography where secure data protection has now become severe. Particularly, we demonstrate safeguarding key biometric information in neural networks using the random numbers, proving a critical data privacy measure in big data and artificial intelligence.

Submitted 29 July, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.12980 [pdf, other]

Ring-a-Pose: A Ring for Continuous Hand Pose Tracking

Authors: Tianhong Catherine Yu, Guilin Hu, Ruidong Zhang, Hyunchul Lim, Saif Mahmud, Chi-Jung Lee, Ke Li, Devansh Agarwal, Shuyang Nie, Jinseok Oh, François Guimbretière, Cheng Zhang

Abstract: We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three user studies with a total of 30 participants, we evaluate Ring-a-Pose's performance on pose tracking and micro-finger gesture recognition. Without collecting any training data from a user, Ring-a-Pose tracks continuous hand poses with a joint error of 14.1mm. The joint error decreases to 10.3mm for fine-tuned user-dependent models. Ring-a-Pose recognizes 7-class micro-gestures with a 90.60% and 99.27% accuracy for user-independent and user-dependent models, respectively. Furthermore, the ring exhibits promising performance when worn on any finger. Ring-a-Pose enables the future of smart rings to track and recognize hand poses using relatively low-power acoustic sensing.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.06821 [pdf, other]

Uniqueness to inverse acoustic and elastic medium scattering problems with hyper-singular source method

Authors: Chun Liu, Guanghui Hu, Jianli Xiang, Jiayi Zhang

Abstract: This paper is concerned with inverse scattering problems of determining the support of an isotropic and homogeneous penetrable body from knowledge of multi-static far-field patterns in acoustics and in linear elasticity. The normal derivative of the total fields admits no jump on the interface of the scatterer in the trace sense. If the contrast function of the refractive index function or the density function has a positive lower bound near the boundary, we propose a hyper-singular source method to prove uniqueness of inverse scattering with all incoming plane waves at a fixed energy. It is based on subtle analysis of the leading part of the scattered field when hyper-singular sources caused by the first derivative of the fundamental solution approach a boundary point. As a by-product, we show that this hyper-singular method can also be used to determine the boundary value of a Hölder continuous refractive index function in acoustics or a Hölder continuous density function in linear elasticity.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05982 [pdf, other]

The Convergence of Prescribed Combinatorial Ricci Flows for Total Geodesic Curvatures in Spherical Background Geometry

Authors: Guangming Hu, Ziping Lei, Yu Sun, Puchun Zhou

Abstract: In this paper, we study the existence and rigidity of (degenerated) circle pattern metric with prescribed total geodesic curvatures in spherical background geometry. To find the (degenerated) circle pattern metric with prescribed total geodesic curvatures, we define some prescribed combinatorial Ricci flows and study the convergence of flows for (degenerated) circle pattern metrics. We solve the prescribed total geodesic curvature problem and provide two methods to find the degenerated circle pattern metric with prescribed total geodesic curvatures. As far as we know, this is the first degenerated result for total geodesic curvatures in spherical background geometry.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 29 pages, 7 figures

arXiv:2404.03395 [pdf, ps, other]

Movable Antennas-Assisted Secure Transmission Without Eavesdroppers' Instantaneous CSI

Authors: Guojie Hu, Qingqing Wu, Donghui Xu, Kui Xu, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

Abstract: Movable antenna (MA) technology is highly promising for improving communication performance, due to its advantage of flexibly adjusting positions of antennas to reconfigure channel conditions. In this paper, we investigate MAs-assisted secure transmission under a legitimate transmitter Alice, a legitimate receiver Bob and multiple eavesdroppers. Specifically, we consider a practical scenario where Alice has no knowledge about the instantaneous non-line-of-sight component of the wiretap channel. Under this setup, we evaluate the secrecy performance by adopting the secrecy outage probability metric, the tight approximation of which is first derived by interpreting the Rician fading as a special case of Nakagami fading and concurrently exploiting the Laguerre series approximation. Then, we minimize the secrecy outage probability by jointly optimizing the transmit beamforming and positions of antennas at Alice. However, the problem is highly non-convex because the objective includes the complex incomplete gamma function. To tackle this challenge, we, for the first time, effectively approximate the inverse of the incomplete gamma function as a simple linear model. Based on this approximation, we arrive at a simplified problem with a clear structure, which can be solved via the developed alternating projected gradient ascent (APGA) algorithm. Considering the high complexity of the APGA, we further design another scheme where the zero-forcing based beamforming is adopted by Alice, and then we transform the problem into minimizing a simple function which is only related to positions of antennas at Alice. As demonstrated by simulations, our proposed schemes achieve significant performance gains compared to conventional schemes based on fixed-position antennas.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Submitted for journal publication

arXiv:2404.00403 [pdf, other]

UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause

Authors: Guimin Hu, Zhihong Zhu, Daniel Hershcovich, Hasti Seifi, Jiayuan Xie

Abstract: Multimodal emotion recognition in conversation (MERC) and multimodal emotion-cause pair extraction (MECPE) have recently garnered significant attention. Emotions are the expression of affect or feelings; responses to specific events, thoughts, or situations are known as emotion causes. Both are like two sides of a coin, collectively describing human behaviors and intents. However, most existing works treat MERC and MECPE as separate tasks, which may result in potential challenges in integrating emotion and cause in real-world applications. In this paper, we propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality and complementarity between emotion and emotion cause. Concretely, UniMEEC reformulates the MERC and MECPE tasks as two mask prediction problems, enhancing the interaction between emotion and cause. Meanwhile, UniMEEC shares the prompt learning among modalities for probing modality-specific knowledge from the pre-trained model. Furthermore, we propose a task-specific hierarchical context aggregation to control the information flow to the task. Experiment results on four public benchmark datasets verify the model performance on MERC and MECPE tasks and achieve consistent improvements compared with state-of-the-art methods.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.18791 [pdf, other]

Object Pose Estimation via the Aggregation of Diffusion Features

Authors: Tianfu Wang, Guosheng Hu, Hongguang Wang

Abstract: Estimating the pose of objects from images is a crucial task of 3D scene understanding, and recent approaches have shown promising results on very large benchmarks. However, these methods experience a significant performance drop when dealing with unseen objects. We believe that it results from the limited generalizability of image features. To address this problem, we conduct an in-depth analysis of the features of diffusion models, e.g. Stable Diffusion, which hold substantial potential for modeling unseen objects. Based on this analysis, we then introduce these diffusion features for object pose estimation. To achieve this, we propose three distinct architectures that can effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation. Our approach outperforms the state-of-the-art methods by a considerable margin on three popular benchmark datasets, LM, O-LM, and T-LESS. In particular, our method achieves higher accuracy than the previous best methods on unseen objects: 98.2% vs. 93.5% on Unseen LM, 85.9% vs. 76.3% on Unseen O-LM, showing the strong generalizability of our method. Our code is released at https://github.com/Tianfu18/diff-feats-pose.

Submitted 1 June, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR2024

arXiv:2403.17676 [pdf]

Analysis on reservoir activation with the nonlinearity harnessed from solution-processed MoS2 devices

Authors: Songwei Liu, Yang Liu, Yingyi Wen, Jingfang Pei, Pengyu Liu, Lekai Song, Xiaoyue Fan, Wenchen Yang, Danmei Pan, Teng Ma, Yue Lin, Gang Wang, Guohua Hu

Abstract: Reservoir computing is a recurrent neural network that has been applied across various domains in machine learning. The implementation of reservoir computing, however, often demands heavy computations for activating the reservoir. Configuring physical reservoir networks and harnessing the nonlinearity from the underlying devices for activation is an emergent solution to address the computational challenge. Herein, we analyze the feasibility of employing the nonlinearity from solution-processed molybdenum disulfide (MoS2) devices for reservoir activation. The devices, fabricated using liquid-phase exfoliated MoS2, exhibit a high-order nonlinearity achieved by Stark modulation of the MoS2 material. We demonstrate that this nonlinearity can be fitted and employed as the activation function to facilitate reservoir computing implementation. Notably, owing to the high-order nonlinearity, the network exhibits long-term synchronization and robust generalization abilities for approximating complex dynamical systems. Given the remarkable reservoir activation capability, coupled with the scalability of the device fabrication, our findings open the possibility for the physical realization of lightweight, efficient reservoir computing for, for instance, signal classification, motion tracking, and pattern recognition of complex time series as well as secure cryptography. As an example, we show the network can be appointed to generate chaotic random numbers for secure data encryption.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.17221 [pdf, other]

doi 10.1214/24-AOAS1899

Are Made and Missed Different? An analysis of Field Goal Attempts of Professional Basketball Players via Depth Based Testing Procedure

Authors: Kai Qi, Guanyu Hu, Wei Wu

Abstract: In this paper, we develop a novel depth-based testing procedure on spatial point processes to examine the difference in made and missed field goal attempts for NBA players. Specifically, our testing procedure can statistically detect the differences between made and missed field goal attempts for NBA players. We first obtain the depths of two processes under the polar coordinate system. A two-dimensional Kolmogorov-Smirnov test is then performed to test the difference between the depths of the two processes. Through extensive simulation studies, we show that our testing procedure has good frequentist properties under both the null hypothesis and the alternative hypothesis. A comparison against the competing methods shows that our proposed procedure has better testing reliability and testing power. Application to the shot chart data of 191 NBA players in the 2017-2018 regular season offers interesting insights about these players' made and missed shot patterns.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 26 pages, 6 figures

arXiv:2403.15758 [pdf, ps, other]

An endpoint estimate for the maximal Calderón commutator with rough kernel

Authors: Guoen Hu, Xudong Lai, Xiangxing Tao, Qingying Xue

Abstract: In this paper, the authors consider the endpoint estimates for the maximal Calderón commutator defined by $$T_{Ωおめが,\,a}^*f(x)=\sup_{εいぷしろん>0}\Big|\int_{|x-y|>εいぷしろん}\frac{Ωおめが(x-y)}{|x-y|^{d+1}} \big(a(x)-a(y)\big)f(y)dy\Big|,$$ where $Ωおめが$ is homogeneous of degree zero, integrable on $S^{d-1}$ and has vanishing moment of order one, and $a$ is a function on $\mathbb{R}^d$ such that $\nabla a\in L^{\infty}(\mathbb{R}^d)$. The authors prove that if $Ωおめが\in L\log L(S^{d-1})$, then $T^*_{Ωおめが,\,a}$ satisfies an endpoint estimate of $L\log\log L$ type.

Submitted 14 April, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

Comments: 25 pages

MSC Class: 42B20

arXiv:2403.15274 [pdf]

Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review

Authors: Jinge Wang, Zien Cheng, Qiuming Yao, Li Liu, Dong Xu, Gangqing Hu

Abstract: The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.

Submitted 12 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: Peer-reviewed and accepted by Quantitative Biology

arXiv:2403.14242 [pdf, other]

E-Syn: E-Graph Rewriting with Technology-Aware Cost Functions for Logic Synthesis

Authors: Chen Chen, Guangyu Hu, Dongsheng Zuo, Cunxi Yu, Yuzhe Ma, Hongce Zhang

Abstract: Logic synthesis plays a crucial role in the digital design flow. It has a decisive influence on the final Quality of Results (QoR) of the circuit implementations. However, existing multi-level logic optimization algorithms often employ greedy approaches with a series of local optimization steps. Each step breaks the circuit into small pieces (e.g., k-feasible cuts) and applies incremental changes to individual pieces separately. These local optimization steps could limit the exploration space and may miss opportunities for significant improvements. To address the limitation, this paper proposes using e-graph in logic synthesis. The new workflow, named Esyn, makes use of the well-established e-graph infrastructure to efficiently perform logic rewriting. It explores a diverse set of equivalent Boolean representations while allowing technology-aware cost functions to better support delay-oriented and area-oriented logic synthesis. Experiments over a wide range of benchmark designs show our proposed logic optimization approach reaches a wider design space compared to the commonly used AIG-based logic synthesis flow. It achieves on average 15.29% delay saving in delay-oriented synthesis and 6.42% area saving for area-oriented synthesis.

Submitted 25 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted by DAC 2024; Please note that this is not the final camera-ready version

arXiv:2403.11656 [pdf, other]

LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model

Authors: Yuxin Cao, Jinghao Li, Xi Xiao, Derui Wang, Minhui Xue, Hao Ge, Wei Liu, Guangwu Hu

Abstract: Previous work has shown that well-crafted adversarial perturbations can threaten the security of video recognition systems. Attackers can invade such models with a low query budget when the perturbations are semantic-invariant, such as StyleFool. Despite the query efficiency, the naturalness of the minutia areas still requires amelioration, since StyleFool leverages style transfer to all pixels in each frame. To close the gap, we propose LocalStyleFool, an improved black-box video adversarial attack that superimposes regional style-transfer-based perturbations on videos. Benefiting from the popularity and scalable usability of the Segment Anything Model (SAM), we first extract different regions according to semantic information and then track them through the video stream to maintain the temporal consistency. Then, we add style-transfer-based perturbations to several regions selected based on the associative criterion of transfer-based gradient information and regional area. Perturbation fine adjustment follows to make stylized videos adversarial. We demonstrate that LocalStyleFool can improve both intra-frame and inter-frame naturalness through a human-assessed survey, while maintaining competitive fooling rate and query efficiency. Successful experiments on the high-resolution dataset also showcase that scrupulous segmentation of SAM helps to improve the scalability of adversarial attacks under high-resolution data.

Submitted 27 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted to 2024 IEEE Security and Privacy Workshops (SPW)

arXiv:2403.11613 [pdf, other]

Scattering Singularity in Topological Dielectric Photonic Crystals

Authors: Langlang Xiong, Xunya Jiang, Guangwei Hu

Abstract: The exploration of topology in natural materials and metamaterials has garnered significant attention. Notably, the one-dimensional (1D) and two-dimensional (2D) Su-Schrieffer-Heeger (SSH) model, assessed through tight-binding approximations, has been extensively investigated in both quantum and classical systems, encompassing general and higher-order topology. Despite these advancements, a comprehensive examination of these models from the perspective of wave physics, particularly the scattering view, remains underexplored. In this study, we systematically unveil the origin of the 1D and 2D Zak phases stemming from the zero-scattering point, termed the scattering singularity in k-space. Employing an expanded plane wave expansion, we accurately compute the reflective spectrum of an infinite 2D photonic crystal (2D-PhC). Analyzing the reflective spectrum reveals the presence of a zero-scattering line in the 2D-PhC, considered the topological origin of the non-trivial Zak phase. Two distinct models, representing omnidirectional non-trivial cases and directional non-trivial cases, are employed to substantiate these findings. Our work introduces a novel perspective for characterizing the nature of non-trivial topological phases. The identification of the zero-scattering line not only enhances our understanding of the underlying physics but also provides valuable insights for the design of innovative devices.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 8 pages, 4 figures

arXiv:2403.08621 [pdf, ps, other]

Spin-resolved counting statistics as a sensitive probe of spin correlation in transport through a quantum dot spin valve

Authors: Guanjian Hu, Shikuan Wang, Jing Hu, RuiQiang Li, Yiying Yan, JunYan Luo

Abstract: We investigate the noise in spin transport through a single quantum dot (QD) tunnel coupled to ferromagnetic electrodes with noncollinear magnetizations. Based on a spin-resolved quantum master equation, auto- and cross-correlations of spin-resolved currents are analyzed to reveal the underlying spin transport dynamics and characteristics for various polarizations. We find the currents of majority and minority spins could be strongly autocorrelated despite uncorrelated charge transfer. The interplay between tunnel coupling and the Coulomb interaction gives rise to an exchange magnetic field, leading to the precession of the accumulated spin in the QD. It strongly suppresses the bunching of spin tunneling events and results in a unique double-peak structure in the noise of the net spin current. The spin autocorrelation is found to be susceptible to magnetization alignments, which may serve as a sensitive tool to measure the magnetization directions between the ferromagnetic electrodes.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures

arXiv:2403.08450 [pdf, ps, other]

Increasing stability for inverse source problem with limited-aperture far field data at multi-frequencies

Authors: Ibtissem Ben Aïcha, Guanghui Hu, Suliang Si

Abstract: We study the increasing stability of an inverse source problem for the Helmholtz equation from limited-aperture far field data at multiple wave numbers. The measurement data are given by the far field patterns $u^\infty(\hat{x},k)$ for all observation directions in some neighborhood of a fixed direction $\hat{x}$ and for all wave numbers $k$ belonging to a finite interval $(0,K)$. In this paper, we discuss the increasing stability with respect to the width of the wavenumber interval $K>1$. In three dimensions we establish stability estimates of the $L^2$-norm and $H^{-1}$-norm of the source function from the far field data. The ill-posedness of the inverse source problem turns out to be of Hölder type while increasing the wavenumber band $K$. We also discuss an analytic continuation argument of the far-field data with respect to the wavenumbers at a fixed direction.

Submitted 13 March, 2024; originally announced March 2024.

MSC Class: 35R30; 78A46

Showing 1–50 of 538 results for author: Hu, G