Search | arXiv e-print repository

DIABLO: A 6-DoF Wheeled Bipedal Robot Composed Entirely of Direct-Drive Joints

Authors: Dingchuan Liu, Fangfang Yang, Xuanhong Liao, Ximin Lyu

Abstract: Wheeled bipedal robots offer the advantages of both wheeled and legged robots, combining the ability to traverse a wide range of terrains and environments with high efficiency. However, the conventional approach in existing wheeled bipedal robots involves motor-driven joints with high-ratio gearboxes. While this approach provides specific benefits, it also presents several challenges, including increased mechanical complexity, efficiency losses, noise, vibrations, and higher maintenance and lubrication requirements. Addressing the aforementioned concerns, we developed a direct-drive wheeled bipedal robot called DIABLO, which eliminates the use of gearboxes entirely. Our robotic system is simplified as a second-order inverted pendulum, and we have designed an LQR-based balance controller to ensure stability. Additionally, we implemented a comprehensive motion controller, including yaw, split-angle, height, and roll controllers. Through experiments in simulation and on a real-world prototype, we have demonstrated that our platform achieves satisfactory performance.

Submitted 1 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

Comments: This paper has already been accepted by IROS 2024

arXiv:2407.06514 [pdf, other]

Asymmetric Mask Scheme for Self-Supervised Real Image Denoising

Authors: Xiangyu Liao, Tianheng Zheng, Jiayu Zhong, Pingping Zhang, Chao Ren

Abstract: In recent years, self-supervised denoising methods have gained significant success and become critically important in the field of image restoration. Among them, blind-spot-network-based methods are the most typical and have attracted the attention of a large number of researchers. Although the introduction of blind spot operations can prevent identity mapping from noise to noise, it imposes stringent requirements on the receptive fields in the network design, thereby limiting overall performance. To address this challenge, we propose a single mask scheme for self-supervised denoising training, which eliminates the need for blind spot operation and thereby removes constraints on the network structure design. Furthermore, to achieve denoising across the entire image during inference, we propose a multi-mask scheme. Our method, featuring the asymmetric mask scheme in training and inference, achieves state-of-the-art performance on existing real noisy image datasets. All the source code will be made available to the public.

Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.00674 [pdf, other]

Emergent Crowd Grouping via Heuristic Self-Organization

Authors: Xiao-Cheng Liao, Wei-Neng Chen, Xiang-Ling Chen, Yi Mei

Abstract: Modeling crowds has many important applications in games and computer animation. Inspired by the emergent following effect in real-life crowd scenarios, in this work, we develop a method for implicitly grouping moving agents. We achieve this by analyzing local information around each agent and rotating its preferred velocity accordingly. Each agent could automatically form an implicit group with its neighboring agents that have similar directions. In contrast to an explicit group, there are no strict boundaries for an implicit group. If an agent's direction deviates from its group as a result of positional changes, it will autonomously exit the group or join another implicitly formed neighboring group. This implicit grouping is autonomously emergent among agents rather than deliberately controlled by the algorithm. The proposed method is compared with many crowd simulation models, and the experimental results indicate that our approach achieves the lowest congestion levels in some classic scenarios. In addition, we demonstrate that adjusting the preferred velocity of agents can actually reduce the dissimilarity between their actual velocity and the original preferred velocity. Our work is available online.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.16942 [pdf, other]

Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

Abstract: Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RETFound and UIOS, and improved further to 98.44% with a thresholding strategy. In the external test sets obtained from other OCT devices, FMUE achieved an accuracy of 88.75% and 92.73% before and after thresholding, respectively. Our model is superior to two ophthalmologists, with a higher F1 score (95.17% vs. 61.93% and 71.72%). Besides, our model correctly predicts high uncertainty scores for samples with ambiguous features, of non-target-category diseases, or with low quality, to prompt manual checks and prevent misdiagnosis. FMUE provides a trustworthy method for automatic retinal anomaly detection in the real-world clinical open-set environment.

Submitted 17 June, 2024; originally announced June 2024.

Comments: All codes are available at https://github.com/yuanyuanpeng0129/FMUE

arXiv:2406.12779 [pdf, other]

Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition

Authors: Xingming Liao, Nankai Lin, Haowen Li, Lianglun Cheng, Zhuowei Wang, Chong Chen

Abstract: Nested Named Entity Recognition (NNER) focuses on addressing overlapped entity recognition. Compared to Flat Named Entity Recognition (FNER), annotated resources are scarce in the corpus for NNER. Data augmentation is an effective approach to address the insufficient annotated corpus. However, there is a significant lack of exploration in data augmentation methods for NNER. Due to the presence of nested entities in NNER, existing data augmentation methods cannot be directly applied to NNER tasks. Therefore, in this work, we focus on data augmentation for NNER and resort to more expressive structures, Composited-Nested-Label Classification (CNLC) in which constituents are combined by nested-word and nested-label, to model nested entities. The dataset is augmented using the Composited-Nested-Learning (CNL). In addition, we propose the Confidence Filtering Mechanism (CFM) for a more efficient selection of generated data. Experimental results demonstrate that this approach results in improvements in ACE2004 and ACE2005 and alleviates the impact of sample imbalance.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted by CSCWD 2024

arXiv:2406.09140 [pdf, other]

Investigating the translation capabilities of Large Language Models trained on parallel data only

Authors: Javier García Gilabert, Carlos Escolano, Aleix Sant Savall, Francesca De Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero

Abstract: In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce PLUME (Parallel Language Model), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Utilizing this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of the different elements of the prompt, and their cross-lingual representation space.

Submitted 13 June, 2024; originally announced June 2024.

Comments: We release our code at: https://github.com/projecte-aina/Plume

arXiv:2406.07808 [pdf]

Coexistence of ferroelectric and ferrielectric phases in ultrathin antiferroelectric PbZrO3 thin films

Authors: Ying Liu, Ranming Niu, Roger Uriach, David Pesquera, Jose Manuel Caicedo Roque, Jose Santiso, Julie M Cairney, Xiaozhou Liao, Jordi Arbiol, Gustau Catalan

Abstract: Whereas ferroelectricity may vanish in ultra-thin ferroelectric films, it is expected to emerge in ultra-thin anti-ferroelectric films, sparking people's interest in using antiferroelectric materials as an alternative to ferroelectric ones for high-density data storage applications. Lead Zirconate (PbZrO3) is considered the prototype material for antiferroelectricity, and indeed previous studies indicated that nanoscale PbZrO3 films exhibit ferroelectricity. The understanding of such phenomena from the microstructure aspect is crucial but still lacking. In this study, we fabricated a PbZrO3 film with thicknesses varying from 5 nm to 80 nm. Using Piezoresponse Force Microscopy, we discovered the film displayed a transition from antiferroelectric behaviour in the thicker areas to ferroelectric behaviour in the thinner ones, with a critical thickness between 10 and 15 nm. In this critical thickness range, a 12 nm PZO thin film was chosen for further study using aberration-corrected scanning transmission electron microscopy. The investigation showed that the film comprises both ferroelectric and ferrielectric phases. The ferroelectric phase is characterized by polarisation along the pseudocubic [011] projection direction. The positions of Pb, Zr, and O were determined using the integrated differential phase contrast method. This allowed us to ascertain that the ferroelectric PbZrO3 unit cell is half the size of that in the antiferroelectric phase on the ab plane. The observed unit cell is different from the electric field-induced ferroelectric rhombohedral phases. Additionally, we identified a ferrielectric phase with a unique up-up-zero-zero dipole configuration. The finding is crucial for understanding the performance of ultrathin antiferroelectric thin films and the subsequent design and development of antiferroelectric devices.

Submitted 31 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.03875 [pdf, other]

Energy-storing analysis and fishtail stiffness optimization for a wire-driven elastic robotic fish

Authors: Xiaocun Liao, Chao Zhou, Junfeng Fan, Zhuoliang Zhang, Zhaoran Yin, Liangwei Deng

Abstract: Robotic fish, with high propulsion efficiency and good maneuverability, commonly achieve underwater fishlike propulsion by using a motor to drive the fishtail, which causes significant fluctuations of the motor power due to the uneven swing speed of the fishtail within one swing cycle. Hence, we propose a wire-driven robotic fish with a spring-steel-based active-segment elastic spine. This bionic spine can deform elastically to store energy under the action of the wire drive and motor, responding to the fluctuations of the motor power. Further, we analyze the effects of the energy storage of the active-segment elastic spine on the smoothness of motor power. Based on the developed Lagrangian dynamic model and cantilever beam model, a power-variance-based nonlinear optimization model for the stiffness of the active-segment elastic spine is established to respond to the sharp fluctuations of motor power during each fishtail swing cycle. Results validate that the energy storage of the active-segment elastic spine plays a vital role in improving the power fluctuations and maximum frequency of the motor when its stiffness is adjusted reasonably, which is beneficial to achieving high propulsion and high speed for robotic fish. Compared with an active-segment rigid spine that is incapable of storing energy, the energy storage of the active-segment elastic spine increases the maximum frequency of the motor and the average thrust of the fishtail by 0.41 Hz and 0.06 N, respectively.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 14 pages, 19 figures

arXiv:2406.02642 [pdf, other]

E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory

Authors: Zhou Yang, Zhaochun Ren, Chenglong Ye, Yufeng Wang, Haizhou Sun, Chao Chen, Xiaofei Zhu, Yunbing Wu, Xiangwen Liao

Abstract: In-context learning (ICL) achieves remarkable performance in various domains such as knowledge acquisition, commonsense reasoning, and semantic understanding. However, its performance significantly deteriorates for emotion detection tasks, especially fine-grained emotion recognition. The underlying reasons for this remain unclear. In this paper, we identify the reasons behind ICL's poor performance from the perspective of prototype theory and propose a method to address this issue. Specifically, we conduct extensive pilot experiments and find that ICL conforms to the prototype theory on fine-grained emotion recognition. Based on this theory, we uncover the following deficiencies in ICL: (1) It relies on prototypes (example-label pairs) that are semantically similar but emotionally inaccurate to predict emotions. (2) It is prone to interference from irrelevant categories, affecting the accuracy and robustness of the predictions. To address these issues, we propose an Emotion Context Learning method (E-ICL) on fine-grained emotion recognition. E-ICL relies on more emotionally accurate prototypes to predict categories by referring to emotionally similar examples with dynamic labels. Simultaneously, E-ICL employs an exclusionary emotion prediction strategy to avoid interference from irrelevant categories, thereby increasing its accuracy and robustness. Note that the entire process is accomplished with the assistance of a plug-and-play emotion auxiliary model, without additional training. Experiments on the fine-grained emotion datasets EDOS, Empathetic-Dialogues, EmpatheticIntent, and GoEmotions show that E-ICL achieves superior emotion prediction performance. Furthermore, even when the emotion auxiliary model used is lower than 10% of the LLMs, E-ICL can still boost the performance of LLMs by over 4% on multiple datasets.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages, 7 figures, 5 tables

arXiv:2405.21045 [pdf]

An Attention-Based Multi-Context Convolutional Encoder-Decoder Neural Network for Work Zone Traffic Impact Prediction

Authors: Qinhua Jiang, Xishun Liao, Yaofa Gong, Jiaqi Ma

Abstract: Work zones are one of the major causes of non-recurrent traffic congestion and road incidents. Despite the significance of their impact, studies on predicting the traffic impact of work zones remain scarce. In this paper, we propose a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms, and introduce a novel deep learning model to predict the traffic speed and incident likelihood during planned work zone events. The proposed model transforms traffic patterns into 2D space-time images for both model input and output and employs an attention-based multi-context convolutional encoder-decoder architecture to capture the spatial-temporal dependencies between work zone events and traffic variations. Trained and validated on four years of archived work zone traffic data from Maryland, USA, the model demonstrates superior performance over baseline models in predicting traffic speed, incident likelihood, and inferred traffic attributes such as queue length and congestion timings (i.e., start time and duration). Specifically, the proposed model outperforms the baseline models by reducing the prediction error of traffic speed by 5% to 34%, queue length by 11% to 29%, congestion timing by 6% to 17%, and increasing the accuracy of incident predictions by 5% to 7%. Consequently, this model offers substantial promise for enhancing the planning and traffic management of work zones.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20050 [pdf, ps, other]

Stability of the second non trivial eigenvalue of the Neumann Laplacian

Authors: Xin Liao

Abstract: In this paper, building on the ideas of Brasco and Pratelli (Geom. Funct. Anal., 22 (2012), 107-135), we establish a stability estimate for Bucur and Henrot's inequality (Acta Math., 222 (2019), 337-361). Their inequality asserts that, among regular sets of given measure, the disjoint union of two balls with the same radius maximizes the second non trivial eigenvalue of the Neumann Laplacian.

Submitted 30 May, 2024; originally announced May 2024.

MSC Class: 2010 Mathematics Subject Classification: 47A75; 49Q20; 49R05

arXiv:2405.18435 [pdf, other]

QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient 3D methods for uncertainty quantification in 3D segmentation tasks.

Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

Comments: initial technical report

arXiv:2405.17870 [pdf, other]

Full-Stack Allreduce on Multi-Rail Networks

Authors: Enda Yu, Dezun Dong, Xiangke Liao

Abstract: The high communication costs impede scalability in distributed systems. Multimodal models like Sora exacerbate this issue by requiring more resources than current networks can support. However, existing network architectures fail to address this gap. In this paper, we provide full-stack support for allreduce on multi-rail networks, aiming to overcome the scalability limitations of large-scale networks by facilitating collaborative data transfer across various networks. To achieve this, we propose the Nezha system, which integrates TCP, in-network computing protocol SHARP, and RDMA-based protocol GLEX. To maximize data transfer rates, Nezha incorporates a load balancing data allocation scheme based on cost feedback and combines exception handling to achieve reliable data transmission. Our experiments on a six-node cluster demonstrate that Nezha significantly enhances allreduce performance by 58% to 87% in homogeneous dual-rail configurations and offers considerable acceleration in heterogeneous settings, contingent on the performance variance among networks.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Submitted to SC'2024

arXiv:2405.17468 [pdf, other]

Deep Activity Model: A Generative Approach for Human Mobility Pattern Synthesis

Authors: Xishun Liao, Brian Yueshuai He, Qinhua Jiang, Chenchen Kuai, Jiaqi Ma

Abstract: Human mobility significantly impacts various aspects of society, including transportation, urban planning, and public health. The increasing availability of diverse mobility data and advancements in deep learning have revolutionized mobility modeling. Existing deep learning models, however, mainly study spatio-temporal patterns using trajectories and often fall short in capturing the underlying semantic interdependency among activities. Moreover, they are also constrained by the data source. These two factors thereby limit their realism and adaptability, respectively. Meanwhile, traditional activity-based models (ABMs) in transportation modeling rely on rigid assumptions and are costly and time-consuming to calibrate, making them difficult to adapt and scale to new regions, especially regions with a limited amount of the required conventional travel data. To address these limitations, we develop a novel generative deep learning approach for human mobility modeling and synthesis, using ubiquitous and open-source data. Additionally, the model can be fine-tuned with local data, enabling adaptable and accurate representations of mobility patterns across different regions. The model is evaluated on a nationwide dataset of the United States, where it demonstrates superior performance in generating activity chains that closely follow ground truth distributions. Further tests using state- or city-specific datasets from California, Washington, and Mexico City confirm its transferability. This innovative approach offers substantial potential to advance mobility modeling research, especially in generating human activity chains as input for downstream activity-based mobility simulation models and providing enhanced tools for urban planners and policymakers.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.11900 [pdf, ps, other]

Global-in-time well-posedness of the compressible Navier-Stokes equations with striated density

Authors: Xian Liao, Sagbo Marcel Zodji

Abstract: We first show local-in-time well-posedness of the compressible Navier-Stokes equations, assuming striated regularity but no other smoothness or smallness conditions on the initial density. With these local-in-time solutions serving as building blocks, for less regular initial data where the vacuum is permitted, the global-in-time well-posedness follows from the energy estimates and the propagated striated regularity of the density function, if the bulk viscosity coefficient is large enough in the two dimensional case. The global-in-time well-posedness also holds in the three dimensional case, provided that the bulk viscosity coefficient is large and the initial energy is small. This solves the density-patch problem in the exterior domain for the compressible model with $W^{2,p}$-interfaces. Finally, the singular incompressible limit toward the inhomogeneous incompressible model when the bulk viscosity coefficient tends to infinity is obtained.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 35 pages

MSC Class: 35A01; 35A02; 35Q30; 35R35; 76N10

arXiv:2405.11715 [pdf, other]

Semantic Trajectory Data Mining with LLM-Informed POI Classification

Authors: Yifan Liu, Chenchen Kuai, Haoxuan Ma, Xishun Liao, Brian Yueshuai He, Jiaqi Ma

Abstract: Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly enhance the quality of trajectory mining. However, integrating these insights is challenging, as many POIs have incomplete feature information, and current learning-based POI algorithms require complete datasets to perform the classification. In this paper, we introduce a novel pipeline for human travel trajectory mining. Our approach first leverages the strong inferential and comprehension capabilities of large language models (LLMs) to annotate POIs with activity types and then uses a Bayesian-based algorithm to infer the activity for each stay point in a trajectory. In our evaluation using the OpenStreetMap (OSM) POI dataset, our approach achieves a 93.4% accuracy and a 96.1% F-1 score in POI classification, and a 91.7% accuracy with a 92.3% F-1 score in activity inference.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.10600 [pdf, ps, other]

Dirichlet problem for a class of nonlinear degenerate elliptic operators with critical growth and logarithmic perturbation

Authors: Hua Chen, Xin Liao, Ming Zhang

Abstract: In this paper, we investigate the existence of weak solutions for a class of degenerate elliptic Dirichlet problems with critical nonlinearity and a logarithmic perturbation.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2404.18149 [pdf, other]

Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories

Authors: Zongmei Chen, Xin Liao, Xiaoshuai Wu, Yanxiang Chen

Abstract: The misuse of deepfake technology by malicious actors poses a potential threat to nations, societies, and individuals. However, existing methods for detecting deepfakes primarily focus on uncompressed videos, such as noise characteristics, local textures, or frequency statistics. When applied to compressed videos, these methods experience a decrease in detection performance and are less suitable for real-world scenarios. In this paper, we propose a deepfake video detection method based on 3D spatiotemporal trajectories. Specifically, we utilize a robust 3D model to construct spatiotemporal motion features, integrating feature details from both 2D and 3D frames to mitigate the influence of large head rotation angles or insufficient lighting within frames. Furthermore, we separate facial expressions from head movements and design a sequential analysis method based on phase space motion trajectories to explore the feature differences between genuine and fake faces in deepfake videos. We conduct extensive experiments to validate the performance of our proposed method on several compressed deepfake benchmarks. The robustness of the well-designed features is verified by calculating the consistent distribution of facial landmarks before and after video compression. Our method yields satisfactory results and showcases its potential for practical applications.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18136 [pdf, other]

SafePaint: Anti-forensic Image Inpainting with Domain Adaptation

Authors: Dunyun Chen, Xin Liao, Xiaoshuai Wu, Shiwei Chen

Abstract: Existing image inpainting methods have achieved remarkable accomplishments in generating visually appealing results, often accompanied by a trend toward creating more intricate structural textures. However, while these models excel at creating more realistic image content, they often leave noticeable traces of tampering, posing a significant threat to security. In this work, we take the anti-forensic capabilities into consideration, firstly proposing an end-to-end training framework for anti-forensic image inpainting named SafePaint. Specifically, we innovatively formulated image inpainting as two major tasks: semantically plausible content completion and region-wise optimization. The former is similar to current inpainting methods that aim to restore the missing regions of corrupted images. The latter, through domain adaptation, endeavors to reconcile the discrepancies between the inpainted region and the unaltered area to achieve anti-forensic goals. Through comprehensive theoretical analysis, we validate the effectiveness of domain adaptation for anti-forensic performance. Furthermore, we meticulously crafted a region-wise separated attention (RWSA) module, which not only aligns with our objective of anti-forensics but also enhances the performance of the model. Extensive qualitative and quantitative evaluations show our approach achieves comparable results to existing image inpainting methods while offering anti-forensic capabilities not available in other methods.

Submitted 6 August, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.17867 [pdf, other]

Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

Authors: Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin

Abstract: AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on the harmless proactive forensics against Deepfake.

Submitted 27 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.15290 [pdf]

A point cloud processing method of mmWave radar over automotive scenario

Authors: Qingmian Wan, Hongli Peng, Xing Liao, Kuayue Liu

Abstract: This paper introduces in detail the effective method of comprehensive target judgment by using radar RA map and point cloud map. Different output of radar can effectively judge the road boundary of target and the relative coordinates of target, avoid the error of output caused by excessive processing information, and greatly improve the processing efficiency of DBSCAN of the measured target.

Submitted 23 March, 2024; originally announced April 2024.

arXiv:2404.14642 [pdf, other]

Uncertainty Quantification on Graph Learning: A Survey

Authors: Chao Chen, Chenghua Guo, Rui Xu, Xiangwen Liao, Xi Zhang, Sihong Xie, Hui Xiong, Philip Yu

Abstract: Graphical models, including Graph Neural Networks (GNNs) and Probabilistic Graphical Models (PGMs), have demonstrated their exceptional capabilities across numerous fields. These models necessitate effective uncertainty quantification to ensure reliable decision-making amid the challenges posed by model training discrepancies and unpredictable testing scenarios. This survey examines recent works that address uncertainty quantification within the model architectures, training, and inference of GNNs and PGMs. We aim to provide an overview of the current landscape of uncertainty in graphical models by organizing the recent methods into uncertainty representation and handling. By summarizing state-of-the-art methods, this survey seeks to deepen the understanding of uncertainty quantification in graphical models, thereby increasing their effectiveness and safety in critical applications.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13814 [pdf, other]

doi 10.1007/JHEP06(2024)197

Discovering Quirks through Timing at FASER and Future Forward Experiments at the LHC

Authors: Jonathan L. Feng, Jinmian Li, Xufei Liao, Jian Ni, Junle Pei

Abstract: Quirks are generic predictions of strongly-coupled dark sectors. For weak-scale masses and a broad range of confining scales in the dark sector, quirks can be discovered only at the energy frontier, but quirk--anti-quirk pairs are produced with unusual signatures at low $p_T$, making them difficult to detect at the large LHC detectors. We determine the prospects for discovering quirks using timing information at FASER, FASER2, and an "ultimate detector" in the far-forward region at the LHC. NLO QCD corrections are incorporated in the simulation of quirk production, which can significantly increase the production rate. To accurately propagate quirk pairs from the ATLAS interaction point to the forward detectors, the ionization energy loss of charged quirks traveling through matter, the radiation of infracolor glueballs and QCD hadrons during quirk pair oscillations, and the annihilation of quirkonium are properly considered. The quirk signal is separated from the large muon background using timing information from scintillator detectors by requiring either two coincident delayed tracks, based on arrival times at the detector, or two coincident slow tracks, based on time differences between hits in the front and back scintillators. We find that simple cuts preserve much of the signal, but reduce the muon background to negligible levels. With the data already collected, FASER can discover quirks in currently unconstrained parameter space. FASER2, running at the Forward Physics Facility during the HL-LHC era, will greatly extend this reach, probing the TeV-scale quirk masses motivated by the gauge hierarchy problem for the broad range of dark-sector confining scales between 100 eV and 100 keV.

Submitted 20 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: 29 pages, 11 figures, version to appear in JHEP

arXiv:2403.17328 [pdf, other]

Learning Traffic Signal Control via Genetic Programming

Authors: Xiao-Cheng Liao, Yi Mei, Mengjie Zhang

Abstract: The control of traffic signals is crucial for improving transportation efficiency. Recently, learning-based methods, especially Deep Reinforcement Learning (DRL), have garnered substantial success in the quest for more efficient traffic signal control strategies. However, the design of rewards in DRL highly demands domain knowledge to converge to an effective policy, and the final policy also presents difficulties in terms of explainability. In this work, a new learning-based method for signal control in complex intersections is proposed. In our approach, we design a concept of phase urgency for each signal phase. During signal transitions, the traffic light control strategy selects the next phase to be activated based on the phase urgency. We then propose to represent the urgency function as an explainable tree structure. The urgency function can calculate the phase urgency for a specific phase based on the current road conditions. Genetic programming is adopted to perform gradient-free optimization of the urgency function. We test our algorithm on multiple public traffic signal control datasets. The experimental results indicate that the tree-shaped urgency function evolved by genetic programming outperforms the baselines, including a state-of-the-art method in the transportation field and a well-known DRL-based method.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16398 [pdf, other]

Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data

Authors: Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Fengyuan Yu, Huabin Zhu, Binhui Yao, Tao Wang, Xiaolin Zheng, Yanchao Tan

Abstract: Federated learning achieves effective performance in modeling decentralized data. In practice, client data are not well-labeled, which makes it potential for federated unsupervised learning (FUSL) with non-IID data. However, the performance of existing FUSL methods suffers from insufficient representations, i.e., (1) representation collapse entanglement among local and global models, and (2) inconsistent representation spaces among local models. The former indicates that representation collapse in local model will subsequently impact the global model and other local models. The latter means that clients model data representation with inconsistent parameters due to the deficiency of supervision signals. In this work, we propose FedU2 which enhances generating uniform and unified representation in FUSL with non-IID data. Specifically, FedU2 consists of flexible uniform regularizer (FUR) and efficient unified aggregator (EUA). FUR in each client avoids representation collapse via dispersing samples uniformly, and EUA in server promotes unified representation by constraining consistent client model updating. To extensively validate the performance of FedU2, we conduct both cross-device and cross-silo evaluation experiments on two benchmark datasets, i.e., CIFAR10 and CIFAR100.

Submitted 24 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.15836 [pdf, other]

VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification

Authors: Lanfeng Zhong, Xin Liao, Shaoting Zhang, Xiaofan Zhang, Guotai Wang

Abstract: Although deep learning methods have achieved remarkable performance in pathology image classification, they heavily rely on labeled data, demanding extensive human annotation efforts. In this study, we present a novel human annotation-free method for pathology image classification by leveraging pre-trained Vision-Language Models (VLMs). Without human annotation, pseudo labels of the training set are obtained by utilizing the zero-shot inference capabilities of VLM, which may contain a lot of noise due to the domain shift between the pre-training data and the target dataset. To address this issue, we introduce VLM-CPL, a novel approach based on consensus pseudo labels that integrates two noisy label filtering techniques with a semi-supervised learning strategy. Specifically, we first obtain prompt-based pseudo labels with uncertainty estimation by zero-shot inference with the VLM using multiple augmented views of an input. Then, by leveraging the feature representation ability of VLM, we obtain feature-based pseudo labels via sample clustering in the feature space. Prompt-feature consensus is introduced to select reliable samples based on the consensus between the two types of pseudo labels. By rejecting low-quality pseudo labels, we further propose High-confidence Cross Supervision (HCS) to learn from samples with reliable pseudo labels and the remaining unlabeled samples. Experimental results showed that our method obtained an accuracy of 87.1% and 95.1% on the HPH and LC25K datasets, respectively, and it largely outperformed existing zero-shot classification and noisy label learning methods. The code is available at https://github.com/lanfz2000/VLM-CPL.

Submitted 23 March, 2024; originally announced March 2024.

Comments: Under review

arXiv:2403.11170 [pdf, ps, other]

Common substring with shifts in b-ary expansions

Authors: Xin Liao, Dingding Yu

Abstract: Denote by $S_n(x,y)$ the length of the longest common substring of $x$ and $y$ with shifts in their first $n$ digits of $b$-ary expansions. We show that the sets of pairs $(x,y)$, for which the growth rate of $S_n(x,y)$ is $\alpha\log n$ with $0\le \alpha\le \infty$, have full Hausdorff dimension.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.04159 [pdf, other]

Metrical theory of power-2-decaying Gauss-like expansion

Authors: Zhihui Li, Xin Liao, Dingding Yu

Abstract: Each $x\in (0,1]$ can be uniquely expanded as a power-2-decaying Gauss-like expansion, in the form of \begin{equation*} x=\sum_{i=1}^{\infty}2^{-(d_1(x)+d_2(x)+\cdots+d_i(x))},\qquad d_i(x)\in \mathbb{N}. \end{equation*} Let $\phi:\mathbb{N}\to \mathbb{R}^{+}$ be an arbitrary positive function. We are interested in the size of the set $$F(\phi)=\{x\in (0,1]:d_n(x)\ge \phi(n)~~\text{for infinitely many}~n\}.$$ We prove a Borel-Bernstein theorem on the zero-one law of the Lebesgue measure of $F(\phi)$. When the Lebesgue measure of $F(\phi)$ is zero, we calculate its Hausdorff dimension. Furthermore, we analyse the growth rate of the maximal digit among the first $n$ digits from probability and multifractal perspectives.

Submitted 29 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.01798 [pdf, other]

Towards Fair and Efficient Learning-based Congestion Control

Authors: Xudong Liao, Han Tian, Chaoliang Zeng, Xinchen Wan, Kai Chen

Abstract: Recent years have witnessed a plethora of learning-based solutions for congestion control (CC) that demonstrate better performance over traditional TCP schemes. However, they fail to provide consistently good convergence properties, including {\em fairness}, {\em fast convergence} and {\em stability}, due to the mismatch between their objective functions and these properties. Despite being intuitive, integrating these properties into existing learning-based CC is challenging, because: 1) their training environments are designed for the performance optimization of single flow but incapable of cooperative multi-flow optimization, and 2) there is no directly measurable metric to represent these properties into the training objective function. We present Astraea, a new learning-based congestion control that ensures fast convergence to fairness with stability. At the heart of Astraea is a multi-agent deep reinforcement learning framework that explicitly optimizes these convergence properties during the training process by enabling the learning of interactive policy between multiple competing flows, while maintaining high performance. We further build a faithful multi-flow environment that emulates the competing behaviors of concurrent flows, explicitly expressing convergence properties to enable their optimization during training. We have fully implemented Astraea and our comprehensive experiments show that Astraea can quickly converge to fairness point and exhibit better stability than its counterparts. For example, Astraea achieves near-optimal bandwidth sharing (i.e., fairness) when multiple flows compete for the same bottleneck, delivers up to 8.4$\times$ faster convergence speed and 2.8$\times$ smaller throughput deviation, while achieving comparable or even better performance over prior solutions.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01244 [pdf, other]

Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal

Authors: Jianheng Huang, Leyang Cui, Ante Wang, Chengyi Yang, Xinting Liao, Linfeng Song, Junfeng Yao, Jinsong Su

Abstract: Large language models (LLMs) suffer from catastrophic forgetting during continual learning. Conventional rehearsal-based methods rely on previous training data to retain the model's ability, which may not be feasible in real-world applications. When conducting continual learning based on a publicly-released LLM checkpoint, the availability of the original training data may be non-existent. To address this challenge, we propose a framework called Self-Synthesized Rehearsal (SSR) that uses the LLM to generate synthetic instances for rehearsal. Concretely, we first employ the base LLM for in-context learning to generate synthetic instances. Subsequently, we utilize the latest LLM to refine the instance outputs based on the synthetic inputs, preserving its acquired ability. Finally, we select diverse high-quality synthetic instances for rehearsal in future stages. Experimental results demonstrate that SSR achieves superior or comparable performance compared to conventional rehearsal-based approaches while being more data-efficient. Besides, SSR effectively preserves the generalization capabilities of LLMs in general domains.

Submitted 25 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

Comments: ACL 2024 main, long paper

arXiv:2402.17959 [pdf, other]

An Iterative Associative Memory Model for Empathetic Response Generation

Authors: Zhou Yang, Zhaochun Ren, Yufeng Wang, Chao Chen, Haizhou Sun, Xiaofei Zhu, Xiangwen Liao

Abstract: Empathetic response generation aims to comprehend the cognitive and emotional states in dialogue utterances and generate proper responses. Psychological theories posit that comprehending emotional and cognitive states necessitates iteratively capturing and understanding associated words across dialogue utterances. However, existing approaches regard dialogue utterances as either a long sequence or independent utterances for comprehension, which are prone to overlook the associated words between them. To address this issue, we propose an Iterative Associative Memory Model (IAMM) for empathetic response generation. Specifically, we employ a novel second-order interaction attention mechanism to iteratively capture vital associated words between dialogue utterances and situations, dialogue history, and a memory module (for storing associated words), thereby accurately and nuancedly comprehending the utterances. We conduct experiments on the Empathetic-Dialogue dataset. Both automatic and human evaluations validate the efficacy of the model. Variant experiments on LLMs also demonstrate that attending to associated words improves empathetic comprehension and expression.

Submitted 2 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: 12 pages, 4 figures

arXiv:2402.17437 [pdf, other]

Exploiting Emotion-Semantic Correlations for Empathetic Response Generation

Authors: Zhou Yang, Zhaochun Ren, Yufeng Wang, Xiaofei Zhu, Zhihao Chen, Tiecheng Cai, Yunbing Wu, Yisong Su, Sibo Ju, Xiangwen Liao

Abstract: Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue. Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions. However, linguistic research has shown that emotional words in language are dynamic and have correlations with other grammar semantic roles, i.e., words with semantic meanings, in grammar. Previous methods overlook these two characteristics, which easily lead to misunderstandings of emotions and neglect of key semantics. To address this issue, we propose a dynamical Emotion-Semantic Correlation Model (ESCM) for empathetic dialogue generation tasks. ESCM constructs dynamic emotion-semantic vectors through the interaction of context and emotions. We introduce dependency trees to reflect the correlations between emotions and semantics. Based on dynamic emotion-semantic vectors and dependency trees, we propose a dynamic correlation graph convolutional network to guide the model in learning context meanings in dialogue and generating empathetic responses. Experimental results on the EMPATHETIC-DIALOGUES dataset show that ESCM understands semantics and emotions more accurately and expresses fluent and informative empathetic responses. Our analysis results also indicate that the correlations between emotions and semantics are frequently used in dialogues, which is of great significance for empathetic perception and expression.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 12 pages, 3 figures, Findings of EMNLP 2023

arXiv:2402.11801 [pdf, other]

Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models

Authors: Zhou Yang, Zhaochun Ren, Wang Yufeng, Shizhong Peng, Haizhou Sun, Xiaofei Zhu, Xiangwen Liao

Abstract: Empathetic response generation is increasingly significant in AI, necessitating nuanced emotional and cognitive understanding coupled with articulate response expression. Current large language models (LLMs) excel in response expression; however, they lack the ability to deeply understand emotional and cognitive nuances, particularly in pinpointing fine-grained emotions and their triggers. Conversely, small-scale empathetic models (SEMs) offer strength in fine-grained emotion detection and detailed emotion cause identification. To harness the complementary strengths of both LLMs and SEMs, we introduce a Hybrid Empathetic Framework (HEF). HEF regards SEMs as flexible plugins to improve LLM's nuanced emotional and cognitive understanding. Regarding emotional understanding, HEF implements a two-stage emotion prediction strategy, encouraging LLMs to prioritize primary emotions emphasized by SEMs, followed by other categories, which substantially alleviates the difficulties for LLMs in fine-grained emotion detection. Regarding cognitive understanding, HEF employs an emotion cause perception strategy, prompting LLMs to focus on crucial emotion-eliciting words identified by SEMs, thus boosting LLMs' capabilities in identifying emotion causes. This collaborative approach enables LLMs to discern emotions more precisely and formulate empathetic responses. We validate HEF on the Empathetic-Dialogue dataset, and the findings indicate that our framework enhances the refined understanding of LLMs and their ability to convey empathetic responses.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 12 pages, 4 figures

arXiv:2401.16714 [pdf]

A Point Cloud Enhancement Method for 4D mmWave Radar Imagery

Authors: Qingmian Wan, Hongli Peng, Xing Liao, Kuayue Liu, Junfa Mao

Abstract: A point cloud enhancement method for 4D mmWave radar imagery is proposed in this paper. Based on the patch antenna and MIMO array theories, the MIMO array with small redundancy and high SNR is designed to provide the probability of high angular resolution and detection rate. The antenna array is deployed using a ladder shape in vertical direction to decrease the redundancy and improve the resolution in horizontal direction with the constraints of physical factors. Considering the complicated environment of the real world with non-uniform distributed clutters, the dynamic detection method is used to solve the weak target sensing problem. The window size of CFAR detector is assumed variant to be determined using optimization method, making it adaptive to different environments especially when weak targets exist. The angular resolution increase using FT-based DOA method and the designed antenna array is described, which provides the basis of accurate detection and dense point cloud. To verify the performance of the proposed method, experiments of simulations and practical measurements are carried out, whose results show that the accuracy and the point cloud density are improved with comparison of the original manufacturer mmWave radar of TI AWR2243.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.16711 [pdf, other]

Insight into the Galactic Bulge Chemodynamical Properties from Gaia DR3

Authors: Xiaojie Liao, Zhaoyu Li, Iulia Simion, Juntai Shen, Robert Grand, Francesca Fragkoudi, Federico Marinacci

Abstract: We explore the chemodynamical properties of the Galaxy in the azimuthal velocity $V_φふぁい$ and metallicity [Fe/H] space using red giant stars from Gaia Data Release 3. The row-normalized $V_φふぁい$-[Fe/H] maps form a coherent sequence from the bulge to the outer disk, clearly revealing the thin/thick disk and the Splash. The metal-rich stars display bar-like kinematics while the metal-poor stars show dispersion-dominated kinematics. The intermediate-metallicity population ($-1<$[Fe/H]$<-0.4$) can be separated into two populations, one that is bar-like, i.e. dynamically cold ($σしぐま_{V_R}\sim80$ $\rm km\ s^{-1}$) and fast rotating ($V_φふぁい\gtrsim100$ $\rm km\ s^{-1}$), and the Splash, which is dynamically hot ($σしぐま_{V_R}\sim110$ $\rm km\ s^{-1}$) and slow rotating ($V_φふぁい\lesssim100$ $\rm km\ s^{-1}$). We compare the observations in the bulge region with an Auriga simulation where the last major merger event occurred $\sim10$ Gyr ago: only stars born around the time of the merger reveal a Splash-like feature in the $V_φふぁい$-[Fe/H] space, suggesting that the Splash is likely merger-induced, predominantly made up of heated disk stars and the starburst associated with the last major merger. Since the Splash formed from the proto-disk, its lower metallicity limit coincides with that of the thick disk. The bar formed later from the dynamically hot disk with [Fe/H] $>-1$ dex, with the Splash not participating in the bar formation and growth.
Moreover, with a set of isolated evolving $N$-body disk simulations, we confirm that a non-rotating classical bulge can be spun up by the bar and develop cylindrical rotation, consistent with the observation for the metal-poor stars.

Submitted 27 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: ApJ accepted, 20 pages, 15 figures, comments welcome

arXiv:2401.13516 [pdf, other]

Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

Abstract: Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areas that vary between frames. Consequently, existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address this limitation, we propose Delocate, a novel Deepfake detection model that can both recognize and localize unknown domain Deepfake videos. Our method consists of two stages named recovering and localization. In the recovering stage, the model randomly masks regions of interest (ROIs) and reconstructs real faces without tampering traces, leading to a relatively good recovery effect for real faces and a poor recovery effect for fake faces. In the localization stage, the output of the recovery phase and the forgery ground truth mask serve as supervision to guide the forgery localization process. This process strategically emphasizes the recovery phase of fake faces with poor recovery, facilitating the localization of tampered regions. Our extensive experiments on four widely used benchmark datasets demonstrate that Delocate not only excels in localizing tampered areas but also enhances cross-domain detection performance.

Submitted 9 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2308.09921, arXiv:2305.05943

arXiv:2401.03321 [pdf, other]

PIXAR: Auto-Regressive Language Modeling in Pixel Space

Authors: Yintao Tai, Xiyang Liao, Alessandro Suglia, Antonio Vergari

Abstract: Recent work showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations. These models are implemented as autoencoders that reconstruct masked patches of rendered text. However, these pixel-based LLMs are limited to discriminative tasks (e.g., classification) and, similar to BERT, cannot be used to generate text. Therefore, they cannot be used for generative tasks such as free-form question answering. In this work, we introduce PIXAR, the first pixel-based autoregressive LLM that performs text generation. Consisting of only a decoder, PIXAR can perform free-form generative tasks while keeping the number of parameters on par with previous encoder-decoder models. Furthermore, we highlight the challenges of generating text as non-noisy images and show this is due to using a maximum likelihood objective. To overcome this problem, we propose an adversarial pretraining stage that improves the readability and accuracy of PIXAR by 8.1 on LAMBADA and 8.5 on bAbI -- making it comparable to GPT-2 on text generation tasks. This paves the way to build open-vocabulary LLMs that operate on perceptual input only and calls into question the necessity of the usual symbolic input representation, i.e., text as (sub)tokens.

Submitted 23 February, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

arXiv:2401.03315 [pdf, other]

Malla: Demystifying Real-world Large Language Model Integrated Malicious Services

Authors: Zilong Lin, Jian Cui, Xiaojing Liao, XiaoFeng Wang

Abstract: The underground exploitation of large language models (LLMs) for malicious services (i.e., Malla) is witnessing an uptick, amplifying the cyber threat landscape and posing questions about the trustworthiness of LLM technologies. However, there has been little effort to understand this new cybercrime, in terms of its magnitude, impact, and techniques. In this paper, we conduct the first systematic study on 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the Malla ecosystem, revealing its significant growth and impact on today's public LLM services. Through examining 212 Mallas, we uncovered eight backend LLMs used by Mallas, along with 182 prompts that circumvent the protective measures of public LLM APIs. We further demystify the tactics employed by Mallas, including the abuse of uncensored LLMs and the exploitation of public LLM APIs through jailbreak prompts. Our findings enable a better understanding of the real-world exploitation of LLMs by cybercriminals, offering insights into strategies to counteract this cybercrime.

Submitted 4 July, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

Comments: Accepted at the 33rd USENIX Security Symposium (USENIX Security '24). The data and code are available at https://github.com/idllresearch/malicious-gpt

arXiv:2401.01064 [pdf, other]

Robust Inference for Multiple Predictive Regressions with an Application on Bond Risk Premia

Authors: Xiaosai Liao, Xinjue Li, Qingliang Fan

Abstract: We propose a robust hypothesis testing procedure for the predictability of multiple predictors that could be highly persistent. Our method improves the popular extended instrumental variable (IVX) testing (Phillips and Lee, 2013; Kostakis et al., 2015) in that, besides addressing the two bias effects found in Hosseinkouchack and Demetrescu (2021), we find and deal with the variance-enlargement effect. We show that two types of higher-order terms induce these distortion effects in the test statistic, leading to significant over-rejection for one-sided tests and tests in multiple predictive regressions. Our improved IVX-based test includes three steps to tackle all the issues above regarding finite sample bias and variance terms. Thus, the test statistic performs well in size control, while its power performance is comparable with the original IVX. Monte Carlo simulations and an empirical study on the predictability of bond risk premia are provided to demonstrate the effectiveness of the newly proposed approach.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.00865 [pdf, other]

Xorbits: Automating Operator Tiling for Distributed Data Science

Authors: Weizheng Lu, Kaisheng He, Xuye Qin, Chengjie Li, Zhong Wang, Tao Yuan, Xia Liao, Feng Zhang, Yueguo Chen, Xiaoyong Du

Abstract: Data science pipelines commonly utilize dataframe and array operations for tasks such as data preprocessing, analysis, and machine learning. The most popular tools for these tasks are pandas and NumPy. However, these tools are limited to executing on a single node, making them unsuitable for processing large-scale data. Several systems have attempted to distribute data science applications to clusters while maintaining interfaces similar to single-node libraries, enabling data scientists to scale their workloads without significant effort. However, existing systems often struggle with processing large datasets due to Out-of-Memory (OOM) problems caused by poor data partitioning. To overcome these challenges, we develop Xorbits, a high-performance, scalable data science framework specifically designed to distribute data science workloads across clusters while retaining familiar APIs. The key differentiator of Xorbits is its ability to dynamically switch between graph construction and graph execution. Xorbits has been successfully deployed in production environments with up to 5k CPU cores. Its applications span various domains, including user behavior analysis and recommendation systems in the e-commerce sector, as well as credit assessment and risk management in the finance industry. Users can easily scale their data science workloads by simply changing the import line of their pandas and NumPy code. Our experiments demonstrate that Xorbits can effectively process very large datasets without encountering OOM or data-skewing problems.
Compared with the fastest state-of-the-art solutions, Xorbits achieves a 2.66× speedup on average. In terms of API coverage, Xorbits attains a compatibility rate of 96.7%, surpassing the fastest framework by a margin of 60 percentage points. Xorbits is available at https://github.com/xorbitsai/xorbits.

Submitted 19 March, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

Comments: ICDE 2024 Industrial and Application Track

arXiv:2401.00166 [pdf, ps, other]

Block-Level MU-MISO Interference Exploitation Precoding: Optimal Structure and Explicit Duality

Authors: Junwen Yang, Ang Li, Xuewen Liao, Christos Masouros, A. L. Swindlehurst

Abstract: This paper investigates block-level interference exploitation (IE) precoding for multi-user multiple-input single-output (MU-MISO) downlink systems. To overcome the need for symbol-level IE precoding to frequently update the precoding matrix, we propose to jointly optimize all the precoders or transmit signals within a transmission block. The resultant precoders only need to be updated once per block, and while not necessarily constant over all the symbol slots, we refer to the technique as block-level slot-variant IE precoding. Through a careful examination of the optimal structure and the explicit duality inherent in block-level power minimization (PM) and signal-to-interference-plus-noise ratio (SINR) balancing (SB) problems, we discover that the joint optimization can be decomposed into subproblems with smaller variable sizes. As a step further, we propose block-level slot-invariant IE precoding by adding a structural constraint on the slot-variant IE precoding to maintain a constant precoder throughout the block. A novel linear precoder for IE is further presented, and we prove that the proposed slot-variant and slot-invariant IE precoding share an identical solution when the number of symbol slots does not exceed the number of users. Numerical simulations demonstrate that the proposed precoders achieve a significant complexity reduction compared against benchmark schemes, without sacrificing performance.

Submitted 30 December, 2023; originally announced January 2024.

Comments: Submitted to IEEE

arXiv:2312.12023 [pdf, other]

Progressive Frequency-Aware Network for Laparoscopic Image Desmoking

Authors: Jiale Zhang, Wenfeng Huang, Xiangyun Liao, Qiong Wang

Abstract: Laparoscopic surgery offers minimally invasive procedures with better patient outcomes, but smoke presence challenges visibility and safety. Existing learning-based methods demand large datasets and high computational resources. We propose the Progressive Frequency-Aware Network (PFAN), a lightweight GAN framework for laparoscopic image desmoking, combining the strengths of CNN and Transformer for progressive information extraction in the frequency domain. PFAN features CNN-based Multi-scale Bottleneck-Inverting (MBI) Blocks for capturing local high-frequency information and Locally-Enhanced Axial Attention Transformers (LAT) for efficiently handling global low-frequency information. PFAN efficiently desmokes laparoscopic images even with limited training data. Our method outperforms state-of-the-art approaches in PSNR, SSIM, CIEDE2000, and visual quality on the Cholec80 dataset and retains only 629K parameters. Our code and models are made publicly available at: https://github.com/jlzcode/PFAN.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.11577 [pdf, other]

PR-NeuS: A Prior-based Residual Learning Paradigm for Fast Multi-view Neural Surface Reconstruction

Authors: Jianyao Xu, Qingshan Xu, Xinyao Liao, Wanjuan Su, Chen Zhang, Yew-Soon Ong, Wenbing Tao

Abstract: Neural surface learning has shown impressive performance in multi-view surface reconstruction. However, most existing methods use large multilayer perceptrons (MLPs) to train their models from scratch, resulting in hours of training for a single scene. Recently, how to accelerate neural surface learning has received a lot of attention and remains an open problem. In this work, we propose a prior-based residual learning paradigm for fast multi-view neural surface reconstruction. This paradigm consists of two optimization stages. In the first stage, we propose to leverage generalization models to generate a basis signed distance function (SDF) field. This initial field can be quickly obtained by fusing multiple local SDF fields produced by generalization models. This provides a coarse global geometry prior. Based on this prior, in the second stage, a fast residual learning strategy based on hash-encoding networks is proposed to encode an offset SDF field for the basis SDF field. Moreover, we introduce a prior-guided sampling scheme to help the residual learning stage converge better, and thus recover finer structures. With our designed paradigm, experimental results show that our method only takes about 3 minutes to reconstruct the surface of a single scene, while achieving competitive surface quality. Our code will be released upon publication.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.08787 [pdf]

doi 10.1002/aelm.202400102

Switching dynamics and improved efficiency of free-standing antiferroelectric capacitors

Authors: Umair Saeed, David Pesquera, Ying Liu, Ignasi Fina, Saptam Ganguly, Jose Santiso, Jessica Padilla, Jose Manuel Caicedo Roque, Xiaozhou Liao, Gustau Catalan

Abstract: We report the switching dynamics of antiferroelectric lead zirconate (PbZrO3) freestanding capacitors compared to their epitaxial counterparts. Frequency dependence of hysteresis indicates that freestanding capacitors exhibit a lower dispersion of switching fields, lower residual polarization, and faster switching response as compared to epitaxially-clamped capacitors. As a consequence, freestanding capacitor membranes exhibit better energy storage density and efficiency.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 27 pages, 7 Figures, Supplementary material

arXiv:2312.07556 [pdf, other]

Federated Learning for Short Text Clustering

Authors: Mengling Hu, Chaochao Chen, Weiming Liu, Xinting Liao, Xiaolin Zheng

Abstract: Short text clustering has been popularly studied for its significance in mining valuable insights from many short texts. In this paper, we focus on the federated short text clustering (FSTC) problem, i.e., clustering short texts that are distributed in different clients, which is a realistic problem under privacy requirements. Compared with the centralized short text clustering problem, in which short texts are stored on a central server, the FSTC problem has not been explored yet. To fill this gap, we propose a Federated Robust Short Text Clustering (FSTC) framework. FSTC includes two main modules, i.e., a robust short text clustering module and a federated cluster center aggregation module. The robust short text clustering module aims to train an effective short text clustering model with local data in each client. We innovatively combine optimal transport to generate pseudo-labels with a Gaussian-uniform mixture model to ensure the reliability of the pseudo-supervised data. The federated cluster center aggregation module aims to exchange knowledge across clients without sharing local raw data in an efficient way. The server aggregates the local cluster centers from different clients and then sends the global centers back to all clients in each communication round. Our empirical studies on three short text clustering datasets demonstrate that FSTC significantly outperforms the federated short text clustering baselines.

Submitted 23 November, 2023; originally announced December 2023.

arXiv:2312.05990 [pdf, other]

Constructing Vec-tionaries to Extract Message Features from Texts: A Case Study of Moral Appeals

Authors: Zening Duan, Anqi Shao, Yicheng Hu, Heysung Lee, Xining Liao, Yoo Ji Suh, Jisoo Kim, Kai-Cheng Yang, Kaiping Chen, Sijia Yang

Abstract: While researchers often study message features like moral content in text, such as party manifestos and social media, their quantification remains a challenge. Conventional human coding struggles with scalability and intercoder reliability. While dictionary-based methods are cost-effective and computationally efficient, they often lack contextual sensitivity and are limited by the vocabularies developed for the original applications. In this paper, we present an approach to construct vec-tionary measurement tools that boost validated dictionaries with word embeddings through nonlinear optimization. By harnessing semantic relationships encoded by embeddings, vec-tionaries improve the measurement of message features from text, especially those in short format, by expanding the applicability of original vocabularies to other contexts. Importantly, a vec-tionary can produce additional metrics to capture the valence and ambivalence of a message feature beyond its strength in texts. Using moral content in tweets as a case study, we illustrate the steps to construct the moral foundations vec-tionary, showcasing its ability to process texts missed by conventional dictionaries and word embedding methods and to produce measurements better aligned with crowdsourced human assessments. Furthermore, additional metrics from the vec-tionary unveiled unique insights that facilitated predicting outcomes such as message retransmission.

Submitted 8 March, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.04900 [pdf]

Graph for Science: From API based Programming to Graph Engine based Programming for HPC

Authors: Yu Zhang, Zixiao Wang, Jin Zhao, Yuluo Guo, Hui Yu, Zhiying Huang, Xuanhua Shi, Xiaofei Liao

Abstract: Modern scientific applications predominantly run on large-scale computing platforms, necessitating collaboration between scientific domain experts and high-performance computing (HPC) experts. While domain experts are often skilled in customizing domain-specific scientific computing routines, which often involve various matrix computations, HPC experts are essential for achieving efficient execution of these computations on large-scale platforms. This process often involves utilizing complex parallel computing libraries tailored to specific matrix computation scenarios. However, the intricate programming procedure and the need for deep understanding in both application domains and HPC pose significant challenges to the widespread adoption of scientific computing. In this research, we observe that matrix computations can be transformed into equivalent graph representations, and that by utilizing graph processing engines, HPC experts can be freed from the burden of implementing efficient scientific computations. Based on this observation, we introduce a graph engine-based scientific computing (Graph for Science) paradigm, which provides a unified graph programming interface, enabling domain experts to promptly implement various types of matrix computations. The proposed paradigm leverages the underlying graph processing engine to achieve efficient execution, eliminating the need for HPC expertise in programming large-scale scientific applications.
Our results show that the graph engine-based scientific computing paradigm achieves performance comparable to the best-performing implementations based on existing parallel computing libraries and bespoke implementations. Importantly, the paradigm greatly simplifies the development of scientific computations on large-scale platforms, reducing the programming difficulty for scientists and facilitating broader adoption of scientific computing.

Submitted 2 April, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.01780 [pdf]

Parameter Estimation of Differential Equation Model based on Optimal Weight Choice Method

Authors: Jun Wanga, Xianglei Li, Xianghu Lia

Abstract: Differential equations are important tools to portray dynamic problems, and are widely used in finance, engineering and biology. Here, multiple dynamic differential models were built innovatively, and discretized with the Runge-Kutta method. The model parameters were then estimated. The models were averaged using the optimal weight selection method, and the consistency of such parameter estimation was verified. Numerical simulations were also conducted, and the simulated results outperformed ordinary linear models. Finally, the differential averaging model built here was used to empirically analyze the Shanghai Index 300. This method integrated the fluctuation features of multiple Shanghai Composite Index fitting models, and yielded good analytical results. This study provides a methodological reference for analysis of stock market situations, and offers research clues for the parameter estimation of differential equations.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.01263 [pdf]

doi 10.1103/PhysRevLett.131.186302

Gate-Tunable Berry Curvature Dipole Polarizability in Dirac Semimetal Cd3As2

Authors: Tong-Yang Zhao, An-Qi Wang, Xing-Guo Ye, Xing-Yu Liu, Xin Liao, Zhi-Min Liao

Abstract: We reveal the gate-tunable Berry curvature dipole polarizability in Dirac semimetal Cd3As2 nanoplates through measurements of the third-order nonlinear Hall effect. Under an applied electric field, the Berry curvature exhibits an asymmetric distribution, forming a field-induced Berry curvature dipole, resulting in a measurable third-order Hall voltage with a cubic relationship to the longitudinal electric field. Notably, the magnitude and polarity of this third-order nonlinear Hall effect can be effectively modulated by gate voltages. Furthermore, our scaling relation analysis demonstrates that the sign of the Berry curvature dipole polarizability changes when tuning the Fermi level across the Dirac point, in agreement with theoretical calculations. The results highlight the gate control of nonlinear quantum transport in Dirac semimetals, paving the way for promising advancements in topological electronics.

Submitted 2 December, 2023; originally announced December 2023.

Journal ref: Phys. Rev. Lett. 131, 186302 (2023)

arXiv:2311.15486 [pdf, other]

Detection prospects of long-lived quirk pairs at the LHC far detectors

Authors: Jinmian Li, Xufei Liao, Jian Ni, Junle Pei

Abstract: We examine the sensitivity reaches of several LHC far detectors, such as FASER2, MATHUSLA, ANUBIS, SND@LHC, and FACET, to five simplified quirk scenarios. We include the next-to-leading order QCD corrections in our simulation of quirk events, which enhance the total production rate and increase the fraction of events in the forward direction for most cases. We calculate the time scales for the quirk pair to lose energy through radiation and for the quirk pair annihilation. Our results show that these far detectors can offer promising probes to the quirk scenario, complementing the searches at the main detectors. In particular, the FACET and FASER2 detectors can surpass the majority of searches conducted at the LHC main detector, with the exception of the HSCP search, for the color-neutral quirk $\mathcal{E}$.

Submitted 29 April, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 21 pages, 11 figures

Showing 1–50 of 316 results for author: Liao, X