Search | arXiv e-print repository

JSCDS: A Core Data Selection Method with Jason-Shannon Divergence for Caries RGB Images-Efficient Learning

Authors: Peiliang Zhang, Yujia Tong, Chenghu Du, Chao Che, Yongjun Zhu

Abstract: Deep learning-based RGB caries detection improves the efficiency of caries identification and is crucial for preventing oral diseases. The performance of deep learning models depends on high-quality data and requires substantial training resources, making efficient deployment challenging. Core data selection, by eliminating low-quality and confusing data, aims to enhance training efficiency without significantly compromising model performance. However, distance-based data selection methods struggle to distinguish dependencies among high-dimensional caries data. To address this issue, we propose a Core Data Selection Method with Jensen-Shannon Divergence (JSCDS) for efficient caries image learning and caries classification. We describe the core data selection criterion as the distribution of samples in different classes. JSCDS calculates the cluster centers by sample embedding representation in the caries classification network and utilizes Jensen-Shannon Divergence to compute the mutual information between data samples and cluster centers, capturing nonlinear dependencies among high-dimensional data. The average mutual information is calculated to fit the above distribution, serving as the criterion for constructing the core set for model training. Extensive experiments on RGB caries datasets show that JSCDS outperforms other data selection methods in prediction performance and time consumption. Notably, JSCDS exceeds the performance of the full dataset model with only 50% of the core data, with its performance advantage becoming more pronounced at 70% of the core data.
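
As a rough illustration of the quantity named in the abstract above, the following sketch computes the Jensen-Shannon divergence between two discrete probability distributions and ranks samples by their divergence from a center. It is not the authors' JSCDS implementation: the function names, the use of plain probability lists in place of network embeddings, and the ranking helper are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_by_divergence(samples, center):
    """Hypothetical helper: sample indices ordered from closest to farthest."""
    return sorted(range(len(samples)),
                  key=lambda i: js_divergence(samples[i], center))
```

With natural logarithms the divergence is symmetric and bounded above by ln 2, which is reached when the two distributions have disjoint support.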

Submitted 6 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: Accepted in KDD 2024 Workshop AIDSH

arXiv:2406.18606 [pdf, other]

Bayesian Inference for Stochastic Predictions of Non-Gaussian Systems with Applications in Climate Change

Authors: Yunjin Tong

Abstract: Climate change poses significant challenges for accurate climate modeling due to the complexity and variability of non-Gaussian climate systems. To address the complexities of non-Gaussian systems in climate modeling, this thesis proposes a Bayesian framework utilizing the Unscented Kalman Filter (UKF), Ensemble Kalman Filter (EnKF), and Unscented Particle Filter (UPF) for one-dimensional and two-dimensional stochastic climate models, evaluated with real-world temperature and sea level data. We study these methods under varying conditions, including measurement noise, sample sizes, and observed and hidden variables, to highlight their respective advantages and limitations. Our findings reveal that merely increasing data is insufficient for accurate predictions; instead, selecting appropriate methods is crucial. This research provides insights into issues related to information barrier, curse of dimensionality, prediction variability, and measurement noise quantification, thereby enhancing the application of these techniques in real-world climate scenarios.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.17758 [pdf, other]

MotionBooth: Motion-Aware Customized Text-to-Video Generation

Authors: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

Abstract: In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately. Our approach presents subject region loss and video preservation loss to enhance the subject's learning performance, along with a subject token cross-attention loss to integrate the customized subject with motion control signals. Additionally, we propose training-free techniques for managing subject and camera motions during inference. In particular, we utilize cross-attention map manipulation to govern subject motion and introduce a novel latent shift module for camera movement control as well. MotionBooth excels in preserving the appearance of subjects while simultaneously controlling the motions in generated videos. Extensive quantitative and qualitative evaluations demonstrate the superiority and effectiveness of our method. Our project page is at https://jianzongwu.github.io/projects/motionbooth

Submitted 25 June, 2024; originally announced June 2024.

Comments: Project page at https://jianzongwu.github.io/projects/motionbooth

arXiv:2406.16956 [pdf, other]

Data-Driven Computing Methods for Nonlinear Physics Systems with Geometric Constraints

Authors: Yunjin Tong

Abstract: In a landscape where scientific discovery is increasingly driven by data, the integration of machine learning (ML) with traditional scientific methodologies has emerged as a transformative approach. This paper introduces a novel, data-driven framework that synergizes physics-based priors with advanced ML techniques to address the computational and practical limitations inherent in first-principle-based methods and brute-force machine learning methods. Our framework showcases four algorithms, each embedding a specific physics-based prior tailored to a particular class of nonlinear systems, including separable and nonseparable Hamiltonian systems, hyperbolic partial differential equations, and incompressible fluid dynamics. The intrinsic incorporation of physical laws preserves the system's intrinsic symmetries and conservation laws, ensuring solutions are physically plausible and computationally efficient. The integration of these priors also enhances the expressive power of neural networks, enabling them to capture complex patterns typical in physical phenomena that conventional methods often miss. As a result, our models outperform existing data-driven techniques in terms of prediction accuracy, robustness, and predictive capability, particularly in recognizing features absent from the training set, despite relying on small datasets, short training periods, and small sample sizes.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.15440 [pdf, ps, other]

Sufficient D-Stability Conditions for Non-Square Matrices

Authors: Yuhao Tong, Steven W. Su

Abstract: This note explores the extension of D-stability to non-square matrices, applicable to distributed/decentralized controllability analysis. We first present a definition of D-stability for non-square matrices, directly extending from square matrices. We propose sufficient conditions for specific configurations of non-square matrices. Finally, we consider the selection of configurations to ensure the D-stability of a given non-square system.
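
For orientation, the classical square-matrix notion that the abstract above extends can be stated as follows. This is the standard textbook definition, not the non-square definition proposed in the note, which the listing does not reproduce; the symbols $A$, $D$, and $d_i$ are chosen here for illustration.

```latex
% Classical D-stability for a square matrix (standard definition).
% A real matrix A \in \mathbb{R}^{n \times n} is D-stable if DA is Hurwitz
% stable for every positive diagonal matrix D:
\[
  \operatorname{Re}\,\lambda < 0
  \quad \text{for all } \lambda \in \sigma(DA)
  \text{ and all } D = \operatorname{diag}(d_1, \dots, d_n),\; d_i > 0 .
\]
```

Taking $D = I$ shows that every D-stable matrix is itself Hurwitz stable, so D-stability is the stronger property.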

Submitted 29 May, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.00367

arXiv:2406.13368 [pdf]

Lewis Acidity and Basicity Diagnostics of Molten Salt for its Properties and Structure Online Monitoring

Authors: Changzu Zhu, Jia Song, Xiaorui Xu, Chengyu Wang, Yang Tong, Lve Lin, Shaoqiang Guo, Wentao Zhou, Adrien Couet, Yafei Wang

Abstract: Analogous to the aqueous solution where the pH of the solvent affects its multiple behaviors, the Lewis acidity-basicity of molten salts also greatly influences their thermophysical and thermochemical properties. In the study, we develop ion probes to quantitatively determine the acidity-basicity scale of molten NaCl-xAlCl3 (x = 1.5-2.1) salt using in-situ ultra-violet visible (UV-Vis) spectroscopy. With the accumulation of acidity-basicity data of NaCl-AlCl3 molten salt for a variety of compositions, the correlation between the acidity-basicity of salt and its measured fundamental properties is derived. To understand the physical and chemical features controlling the acidity-basicity variations, the structures of NaCl-xAlCl3 molten salts with different chemical compositions are investigated in terms of bonded complexes and coordination numbers. The comprehensive understanding of the correlation between composition, acidity-basicity, properties, and structures of molten salt can serve for the full screening and online monitoring of salt melt in extreme environments by simply measuring the salt acidity-basicity as developed in this study.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.11389 [pdf, other]

SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods such as GNNExplainer. However, post-hoc explanations cannot facilitate the model predictions, and the computational cost of these methods cannot meet practical requirements, thus limiting their application in real-world scenarios. To address these issues, we propose SEFraud, a novel graph-based self-explainable fraud detection framework that simultaneously tackles fraud detection and result interpretability. Concretely, SEFraud first leverages customized heterogeneous graph transformer networks with learnable feature masks and edge masks to learn expressive representations from the informative heterogeneously typed transactions. A new triplet loss is further designed to enhance the performance of mask learning. Empirical results on various datasets demonstrate the effectiveness of SEFraud as it shows considerable advantages in both the fraud detection performance and interpretability of prediction results. Moreover, SEFraud has been deployed and offers explainable fraud detection service for the largest bank in China, Industrial and Commercial Bank of China Limited (ICBC). Results collected from the production environment of ICBC show that SEFraud can provide accurate detection results and comprehensive explanations that align with the expert business understanding, confirming its efficiency and applicability in large-scale online services.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by KDD 2024

arXiv:2406.05422 [pdf, other]

Diffusion-based Reinforcement Learning for Dynamic UAV-assisted Vehicle Twins Migration in Vehicular Metaverses

Authors: Yongju Tong, Jiawen Kang, Junlong Chen, Minrui Xu, Gaolei Li, Weiting Zhang, Xincheng Yan

Abstract: Air-ground integrated networks can relieve communication pressure on ground transportation networks and provide 6G-enabled vehicular Metaverses services offloading in remote areas with sparse RoadSide Units (RSUs) coverage and downtown areas where users have a high demand for vehicular services. Vehicle Twins (VTs) are the digital twins of physical vehicles to enable more immersive and realistic vehicular services, which can be offloaded and updated on RSU, to manage and provide vehicular Metaverses services to passengers and drivers. The high mobility of vehicles and the limited coverage of RSU signals necessitate VT migration to ensure service continuity when vehicles leave the signal coverage of RSUs. However, uneven VT task migration might overload some RSUs, which might result in increased service latency, and thus impact immersive experiences for users. In this paper, we propose a dynamic Unmanned Aerial Vehicle (UAV)-assisted VT migration framework in air-ground integrated networks, where UAVs act as aerial edge servers to assist ground RSUs during VT task offloading. In this framework, we propose a diffusion-based Reinforcement Learning (RL) algorithm, which can efficiently make immersive VT migration decisions in UAV-assisted vehicular networks. To balance the workload of RSUs and improve VT migration quality, we design a novel dynamic path planning algorithm based on a heuristic search strategy for UAVs. Simulation results show that the diffusion-based RL algorithm with UAV assistance performs better than other baseline schemes.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05418 [pdf, other]

Multi-attribute Auction-based Resource Allocation for Twins Migration in Vehicular Metaverses: A GPT-based DRL Approach

Authors: Yongju Tong, Junlong Chen, Minrui Xu, Jiawen Kang, Zehui Xiong, Dusit Niyato, Chau Yuen, Zhu Han

Abstract: Vehicular Metaverses are developed to enhance the modern automotive industry with an immersive and safe experience among connected vehicles and roadside infrastructures, e.g., RoadSide Units (RSUs). For seamless synchronization with virtual spaces, Vehicle Twins (VTs) are constructed as digital representations of physical entities. However, resource-intensive VTs updating and high mobility of vehicles require intensive computation, communication, and storage resources, especially for their migration among RSUs with limited coverages. To address these issues, we propose an attribute-aware auction-based mechanism to optimize resource allocation during VTs migration by considering both price and non-monetary attributes, e.g., location and reputation. In this mechanism, we propose a two-stage matching for vehicular users and Metaverse service providers in multi-attribute resource markets. First, the resource attributes matching algorithm obtains the resource attributes perfect matching, namely, buyers and sellers can participate in a double Dutch auction (DDA). Then, we train a DDA auctioneer using a generative pre-trained transformer (GPT)-based deep reinforcement learning (DRL) algorithm to adjust the auction clocks efficiently during the auction process. We compare the performance of social welfare and auction information exchange costs with state-of-the-art baselines under different settings. Simulation results show that our proposed GPT-based DRL auction schemes have better performance than others.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 16 pages, 6 figures, 3 tables

arXiv:2406.05383 [pdf, other]

A Discrete Exterior Calculus of Bundle-valued Forms

Authors: Theo Braune, Yiying Tong, François Gay-Balmaz, Mathieu Desbrun

Abstract: The discretization of Cartan's exterior calculus of differential forms has been fruitful in a variety of theoretical and practical endeavors: from computational electromagnetics to the development of Finite-Element Exterior Calculus, the development of structure-preserving numerical tools satisfying exact discrete equivalents to Stokes' theorem or the de Rham complex for the exterior derivative have found numerous applications in computational physics. However, there has been a dearth of effort in establishing a more general discrete calculus, this time for differential forms with values in vector bundles over a combinatorial manifold equipped with a connection. In this work, we propose a discretization of the exterior covariant derivative of bundle-valued differential forms. We demonstrate that our discrete operator mimics its continuous counterpart, satisfies the Bianchi identities on simplicial cells, and contrary to previous attempts at its discretization, ensures numerical convergence to its exact evaluation with mesh refinement under mild assumptions.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 58 pages, 20 figures, Fix erroneous line break

MSC Class: 53A70

arXiv:2406.04679 [pdf, other]

XctDiff: Reconstruction of CT Images with Consistent Anatomical Structures from a Single Radiographic Projection Image

Authors: Qingze Bai, Tiange Liu, Zhi Liu, Yubing Tong, Drew Torigian, Jayaram Udupa

Abstract: In this paper, we present XctDiff, an algorithm framework for reconstructing CT from a single radiograph, which decomposes the reconstruction process into two easily controllable tasks: feature extraction and CT reconstruction. Specifically, we first design a progressive feature extraction strategy that is able to extract robust 3D priors from radiographs. Then, we use the extracted prior information to guide the CT reconstruction in the latent space. Moreover, we design a homogeneous spatial codebook to improve the reconstruction quality further. The experimental results show that our proposed method achieves state-of-the-art reconstruction performance and overcomes the blurring issue. We also apply XctDiff to a self-supervised pre-training task. Its effectiveness indicates that it has promising additional applications in medical image analysis. The code is available at: https://github.com/qingze-bai/XctDiff

Submitted 13 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.20282 [pdf, other]

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

Authors: Chaoyang Wang, Xiangtai Li, Lu Qi, Henghui Ding, Yunhai Tong, Ming-Hsuan Yang

Abstract: Semantic segmentation and semantic image synthesis are two representative tasks in visual perception and generation. While existing methods consider them as two distinct tasks, we propose a unified diffusion-based framework (SemFlow) and model them as a pair of reverse problems. Specifically, motivated by rectified flow theory, we train an ordinary differential equation (ODE) model to transport between the distributions of real images and semantic masks. As the training objective is symmetric, samples belonging to the two distributions, images and semantic masks, can be effortlessly transferred reversibly. For semantic segmentation, our approach solves the contradiction between the randomness of diffusion outputs and the uniqueness of segmentation results. For image synthesis, we propose a finite perturbation approach to enhance the diversity of generated results without changing the semantic categories. Experiments show that our SemFlow achieves competitive results on semantic segmentation and semantic image synthesis tasks. We hope this simple framework will motivate people to rethink the unification of low-level and high-level vision. Project page: https://github.com/wang-chaoyang/SemFlow.
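
The reversible ODE transport mentioned in the abstract above can be pictured with a toy example. The sketch below integrates a velocity field with explicit Euler steps, forward from t = 0 to t = 1 and backward again. It is not SemFlow's model: the learned network is replaced by a hypothetical constant straight-line velocity between one fixed pair of points, and every name here is an assumption made for illustration.

```python
def straight_line_velocity(x0, x1):
    """Hypothetical stand-in for a learned field: constant velocity x1 - x0."""
    direction = [b - a for a, b in zip(x0, x1)]
    return lambda x, t: direction


def euler_transport(x, velocity, steps=10, reverse=False):
    """Integrate dx/dt = velocity(x, t) over [0, 1] with explicit Euler steps.

    With reverse=True the same field is followed backward from t = 1 to t = 0.
    """
    dt = 1.0 / steps
    x = list(x)
    for k in range(steps):
        t = 1.0 - k * dt if reverse else k * dt
        v = velocity(x, t)
        sign = -1.0 if reverse else 1.0
        x = [xi + sign * dt * vi for xi, vi in zip(x, v)]
    return x
```

Because the toy field is constant, the forward pass carries `x0` to `x1` and the backward pass returns to `x0` up to floating-point error; a learned, state-dependent field would only be approximately reversible under Euler discretization.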

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.04128 [pdf, other]

Fine-grained Speech Sentiment Analysis in Chinese Psychological Support Hotlines Based on Large-scale Pre-trained Model

Authors: Zhonglong Chen, Changwei Song, Yining Chen, Jianqiang Li, Guanghui Fu, Yongsheng Tong, Qing Zhao

Abstract: Suicide and suicidal behaviors remain significant challenges for public policy and healthcare. In response, psychological support hotlines have been established worldwide to provide immediate help to individuals in mental crises. The effectiveness of these hotlines largely depends on accurately identifying callers' emotional states, particularly underlying negative emotions indicative of increased suicide risk. However, the high demand for psychological interventions often results in a shortage of professional operators, highlighting the need for an effective speech emotion recognition model. This model would automatically detect and analyze callers' emotions, facilitating integration into hotline services. Additionally, it would enable large-scale data analysis of psychological support hotline interactions to explore psychological phenomena and behaviors across populations. Our study utilizes data from the Beijing psychological support hotline, the largest suicide hotline in China. We analyzed speech data from 105 callers containing 20,630 segments and categorized them into 11 types of negative emotions. We developed a negative emotion recognition model and a fine-grained multi-label classification model using a large-scale pre-trained model. Our experiments indicate that the negative emotion recognition model achieves a maximum F1-score of 76.96%. However, it shows limited efficacy in the fine-grained multi-label classification task, with the best model achieving only a 41.74% weighted F1-score. We conducted an error analysis for this task, discussed potential future improvements, and considered the clinical application possibilities of our study. All the code is publicly available.
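
The weighted F1-score reported in the abstract above is commonly computed as the support-weighted mean of per-label F1 scores. The sketch below shows that calculation for multi-label data, assuming each sample is represented as a set of labels. It is not the authors' evaluation code, and the label names used in the usage note are hypothetical.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-label F1 for multi-label predictions.

    y_true and y_pred are equal-length lists of label sets, one per sample.
    """
    labels = set().union(*y_true)
    total_support = sum(len(t) for t in y_true)
    score = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each label by its share of the true label occurrences.
        score += (tp + fn) / total_support * f1
    return score
```

For example, with true labels `[{"sad"}, {"sad", "anxious"}, {"anxious"}]` and a model that always predicts `{"sad"}`, the "sad" label scores F1 = 0.8 and "anxious" scores 0; both have support 2, so the weighted F1 is 0.4. Rare labels that are never predicted pull the score down in proportion to how often they occur.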

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04086 [pdf, other]

Optimizing Language Model's Reasoning Abilities with Weak Supervision

Authors: Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

Abstract: While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on extensively annotated datasets by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities with minimal human supervision. In this work, we introduce self-reinforcement, which begins with Supervised Fine-Tuning (SFT) of the model using a small collection of annotated questions. Then it iteratively improves LLMs by learning from the differences in responses from the SFT and unfinetuned models on unlabeled questions. Our method provides an efficient approach without relying heavily on extensive human-annotated explanations. However, current reasoning benchmarks typically only include golden-reference answers or rationales. Therefore, we present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales across various domains, such as brainteasers, puzzles, riddles, parajumbles, and critical reasoning tasks. A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities. Our experiments underscore the significance of PuzzleBen, as well as the effectiveness of our methodology as a promising direction in future endeavors. Our dataset and code will be published soon on Anonymity Link.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.12731 [pdf, other]

Near-Quantum-limited Haloscope Detection of Dark Photon Dark Matter Enhanced by a High-Q Superconducting Cavity

Authors: Runqi Kang, Man Jiao, Yu Tong, Yang Liu, Youpeng Zhong, Yi-Fu Cai, Jingwei Zhou, Xing Rong, Jiangfeng Du

Abstract: We report new experimental results on the search for dark photons based on a near-quantum-limited haloscope equipped with a superconducting cavity. The loaded quality factor of the superconducting cavity is $6\times10^{5}$, so that the expected signal from dark photon dark matter can be enhanced by more than one order of magnitude compared to a copper cavity. A Josephson parametric amplifier with a near-quantum-limited noise temperature has been utilized to minimize the noise during the search. Furthermore, a digital acquisition card based on field programmable gate arrays has been utilized to maximize data collection efficiency with a duty cycle of 100%. This work has established the most stringent constraints on dark photons at around 26.965 $\mu$eV. In the future, our apparatus can be extended to search for other dark matter candidates, such as axions and axion-like particles, and scrutinize new physics beyond the Standard Model.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.11605 [pdf, other]

VG4D: Vision-Language Model Goes 4D Video Recognition

Authors: Zhichao Deng, Xiangtai Li, Xia Li, Yunhai Tong, Shen Zhao, Mengyuan Liu

Abstract: Understanding the real world through point cloud video is a crucial aspect of robotics and autonomous driving systems. However, prevailing methods for 4D point cloud recognition have limitations due to sensor resolution, which leads to a lack of detailed information. Recent advances have shown that Vision-Language Models (VLM) pre-trained on web-scale text-image datasets can learn fine-grained visual concepts that can be transferred to various downstream tasks. However, effectively integrating VLM into the domain of 4D point clouds remains an unresolved problem. In this work, we propose the Vision-Language Models Goes 4D (VG4D) framework to transfer VLM knowledge from visual-text pre-trained models to a 4D point cloud network. Our approach involves aligning the 4D encoder's representation with a VLM to learn a shared visual and text space from training on large-scale image-text pairs. By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance. To enhance the 4D encoder, we modernize the classic dynamic point cloud backbone and propose an improved version of PSTNet, im-PSTNet, which can efficiently model point cloud videos. Experiments demonstrate that our method achieves state-of-the-art performance for action recognition on both the NTU RGB+D 60 dataset and the NTU RGB+D 120 dataset. Code is available at https://github.com/Shark0-0/VG4D.

Submitted 17 April, 2024; originally announced April 2024.

Comments: ICRA 2024

arXiv:2404.11503 [pdf, other]

Mixing Time of Open Quantum Systems via Hypocoercivity

Authors: Di Fang, Jianfeng Lu, Yu Tong

Abstract: Understanding the mixing of open quantum systems is a fundamental problem in physics and quantum information science. Existing approaches for estimating the mixing time often rely on the spectral gap estimation of the Lindbladian generator, which can be challenging to obtain in practice. We propose a novel theoretical framework to estimate the mixing time of open quantum systems that treats the Hamiltonian and dissipative part separately, thus circumventing the need for a priori estimation of the spectral gap of the full Lindbladian generator. The technique is based on the construction of an energy functional inspired by the hypocoercivity of (classical) kinetic theory.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09083 [pdf]

Interplay between electronic dephasing and localization in finite-sized Chern insulator

Authors: Yunhe Bai, Yuanzhao Li, Jianli Luan, Yang Chen, Zongwei Gao, Wenyu Song, Yitian Tong, Jinsong Zhang, Yayu Wang, Junjie Qi, Chui-Zhen Chen, Hua Jiang, X. C. Xie, Ke He, Yang Feng, Xiao Feng, Qi-Kun Xue

Abstract: Anderson localization is anticipated to play a pivotal role in the manifestation of the quantum anomalous Hall effect, akin to its role in conventional quantum Hall effects. The significance of Anderson localization is particularly pronounced in elucidating the reasons behind the fragility of the observed quantum anomalous Hall state in the intrinsic magnetic topological insulator MnBi2Te4 with a large predicted magnetic gap. Here, employing varying sized MnBi2Te4 micro/nano-structures fabricated from a single molecular-beam-epitaxy-grown thin film, we have carried out a systematic size- and temperature-dependent study on the transport properties of the films regarding the quantum anomalous Hall states. The low-temperature transport properties of the finite-sized MnBi2Te4 samples can be quantitatively understood through Anderson localization, which plays an indispensable role in stabilizing the ground states. At higher temperatures, the failure of electron localization induced by an excessively short electronic dephasing length is identified as the cause of deviation from quantization. The work reveals that electronic dephasing and localization are non-negligible factors in designing high-temperature quantum anomalous Hall systems.

Submitted 13 April, 2024; originally announced April 2024.

Comments: 20 pages, 4 figures

arXiv:2404.07796 [pdf, other]

doi 10.48550/arXiv.2404.07796

Point defects in CdTe and CdTeSe alloy: a first principles investigation with DFT+U

Authors: Xiaofeng Xiang, Yijun Tong, Aaron Gehrke, Scott Dunham

Abstract: CdTe and its alloy CdTeSe are widely used in optoelectronic devices, such as radiation detectors and solar cells, due to their superior electrical properties. However, the formation of defects and defect complexes in these materials can significantly affect their performance. As a result, understanding the defect formation and recombination processes in CdTe and CdTeSe alloy is of great importance. In recent years, density functional theory (DFT) calculations have emerged as a powerful tool for investigating the properties of defects in semiconductors. In this paper, we use DFT+U calculations to comprehensively study the properties of intrinsic defects as well as extrinsic defects induced by commonly used dopants, such as Cu and group V elements, in CdTe and CdTeSe alloy. This work provides insights into the effects of these defects on the electrical and optical properties of the material.

Submitted 1 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: 10 pages, 23 figures

arXiv:2404.06970 [pdf, other]

Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning

Authors: Peipei Liu, Gaosheng Wang, Ying Tong, Jian Liang, Zhenquan Ding, Hongsong Zhu

Abstract: Few-shot named entity recognition can identify new types of named entities based on a few labeled examples. Previous methods employing token-level or span-level metric learning suffer from the computational burden and a large number of negative sample spans. In this paper, we propose the Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning (MsFNER), which splits the general NER into two stages: entity-span detection and entity classification. MsFNER involves three processes: training, finetuning, and inference. In the training process, we train and get the best entity-span detection model and the entity classification model separately on the source domain using meta-learning, where we create a contrastive learning module to enhance entity representations for entity classification. During finetuning, we finetune both models on the support dataset of the target domain. In the inference process, for the unlabeled data, we first detect the entity-spans, then the entity-spans are jointly determined by the entity classification model and the KNN. We conduct experiments on the open FewNERD dataset and the results demonstrate the advance of MsFNER.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.04490 [pdf, other]

Hyperparameter Optimization for SecureBoost via Constrained Multi-Objective Federated Learning

Authors: Yan Kang, Ziyao Ren, Lixin Fan, Linghua Yang, Yongxin Tong, Qiang Yang

Abstract: SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption (HE) to protect data privacy in vertical federated learning. SecureBoost and its variants have been widely adopted in fields such as finance and healthcare. However, the hyperparameters of SecureBoost are typically configured heuristically to optimize model performance (i.e., utility) alone, assuming that privacy is secured. Our study found that SecureBoost and some of its variants are still vulnerable to label leakage. This vulnerability may lead the current heuristic hyperparameter configuration of SecureBoost to a suboptimal trade-off between utility, privacy, and efficiency, which are pivotal elements toward a trustworthy federated learning system. To address this issue, we propose the Constrained Multi-Objective SecureBoost (CMOSB) algorithm, which aims to approximate Pareto optimal solutions, where each solution is a set of hyperparameters achieving an optimal trade-off between utility loss, training cost, and privacy leakage. We design measurements of the three objectives, including a novel label inference attack named instance clustering attack (ICA) to measure the privacy leakage of SecureBoost. Additionally, we provide two countermeasures against ICA. The experimental results demonstrate that the CMOSB yields superior hyperparameters over those optimized by grid search and Bayesian optimization regarding the trade-off between utility loss, training cost, and privacy leakage.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.02102 [pdf]

Atomic magnetometry using a metasurface polarizing beamsplitter in silicon on sapphire

Authors: Xuting Yang, Pritha Mukherjee, Minjeong Kim, Hongyan Mei, Chengyu Fang, Soyeon Choi, Yuhan Tong, Sarah Perlowski, David A. Czaplewski, Alan M. Dibos, Mikhail A. Kats, Jennifer T. Choy

Abstract: We demonstrate atomic magnetometry using a metasurface polarizing beamsplitter fabricated on a silicon-on-sapphire (SOS) platform. The metasurface splits a beam that is near-resonant with the rubidium atoms (795 nm) into orthogonal linear polarizations, enabling measurement of magnetically sensitive circular birefringence in a rubidium vapor through balanced polarimetry. We incorporated the metasurface into an atomic magnetometer based on nonlinear magneto-optical rotation and measured sub-nanotesla sensitivity, which is limited by low-frequency technical noise and transmission loss through the metasurface. To our knowledge, this work represents the first demonstration of SOS nanophotonics for atom-based sensing and paves the way for highly integrated, miniaturized atomic sensors with enhanced sensitivity and portability.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01510 [pdf, ps, other]

Homotopy commutativity in quasitoric manifolds

Authors: Sho Hasui, Daisuke Kishimoto, Yichen Tong, Mitsunobu Tsutaya

Abstract: We prove that the loop space of a quasitoric manifold is homotopy commutative if and only if the underlying polytope is $(Δでるた^3)^n$ and the characteristic matrix is equivalent to a matrix of certain type. We also construct for each $n\ge 2$ and a positive integer $k$, a quasitoric manifold $M(k,n)$ over $(Δでるた^3)^n$ such that its loop space is homotopy commutative if and only if $k$ is even, where every quasitoric manifold over $Δでるた^3$ is equivalent to $\mathbb{C} P^3$ whose loop space is homotopy commutative.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 14 pages

MSC Class: 57S12; 55P35; 55Q15

arXiv:2403.20046 [pdf, other]

Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning

Authors: Yongqi Tong, Dawei Li, Sizhe Wang, Yujia Wang, Fei Teng, Jingbo Shang

Abstract: Recent works have shown the benefits to LLMs from fine-tuning golden-standard Chain-of-Thought (CoT) rationales or using them as correct examples in few-shot prompting. While humans can indeed imitate correct examples, learning from our mistakes is another vital aspect of human cognition. Hence, a question naturally arises: can LLMs learn and benefit from their mistakes, especially for their reasoning? This study investigates this problem from both the prompting and model-tuning perspectives. We begin by introducing CoTErrorSet, a new benchmark with 609,432 questions, each designed with both correct and error references, and demonstrating the types and reasons for making such mistakes. To explore the effectiveness of those mistakes, we design two methods: (1) Self-rethinking prompting guides LLMs to rethink whether they have made similar previous mistakes; and (2) Mistake tuning involves finetuning models in both correct and incorrect reasoning domains, rather than only tuning models to learn ground truth in traditional methodology. We conduct a series of experiments to prove LLMs can obtain benefits from mistakes in both directions. Our two methods offer potentially cost-effective strategies by leveraging errors to enhance reasoning capabilities, which costs significantly less than creating meticulously hand-crafted golden references. We ultimately make a thorough analysis of the reasons behind LLMs' errors, which provides directions that future research needs to overcome. CoTErrorSet will be published soon on https://github.com/YookiTong/Learn-from-Mistakes-CotErrorSet.

Submitted 7 June, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

Comments: The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) - Main Conference

arXiv:2403.15875 [pdf, other]

LAMPER: LanguAge Model and Prompt EngineeRing for zero-shot time series classification

Authors: Zhicheng Du, Zhaotian Xie, Yan Tong, Peiwu Qin

Abstract: This study constructs the LanguAge Model with Prompt EngineeRing (LAMPER) framework, designed to systematically evaluate the adaptability of pre-trained language models (PLMs) in accommodating diverse prompts and their integration in zero-shot time series (TS) classification. We deploy LAMPER in experimental assessments using 128 univariate TS datasets sourced from the UCR archive. Our findings indicate that the feature representation capacity of LAMPER is influenced by the maximum input token threshold imposed by PLMs.

Submitted 23 March, 2024; originally announced March 2024.

Comments: Accepted as tiny paper in ICLR 2024

arXiv:2403.09616 [pdf, other]

Explore In-Context Segmentation via Latent Diffusion Models

Authors: Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan

Abstract: In-context segmentation has drawn more attention with the introduction of vision foundation models. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. In this work, we explore this problem from a new perspective, using one representative generation model, the latent diffusion model (LDM). We observe a task gap between generation and segmentation in diffusion models, but LDM is still an effective minimalist for in-context segmentation. In particular, we propose two meta-architectures and correspondingly design several output alignment and optimization strategies. We have conducted comprehensive ablation studies and empirically found that the segmentation quality counts on output alignment and in-context instructions. Moreover, we build a new and fair in-context segmentation benchmark that includes both image and video datasets. Experiments validate the efficiency of our approach, demonstrating comparable or even stronger results than previous specialist models or visual foundation models. Our study shows that LDMs can also achieve good enough results for challenging in-context segmentation tasks.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.02632 [pdf, other]

doi 10.1109/JIOT.2023.3243944

Human Activity Recognition with Low-Resolution Infrared Array Sensor Using Semi-supervised Cross-domain Neural Networks for Indoor Environment

Authors: Cunyi Yin, Xiren Miao, Jing Chen, Hao Jiang, Deying Chen, Yixuan Tong, Shaocong Zheng

Abstract: Low-resolution infrared-based human activity recognition (HAR) has attracted enormous interest due to its low cost and privacy. In this paper, a novel semi-supervised cross-domain neural network (SCDNN) based on an 8 × 8 low-resolution infrared sensor is proposed for accurately identifying human activity despite changes in the environment at a low cost. The SCDNN consists of a feature extractor, a domain discriminator and a label classifier. In the feature extractor, the unlabeled and minimal labeled target domain data are trained for domain adaptation to achieve a mapping of the source domain and target domain data. The domain discriminator employs unsupervised learning to migrate data from the source domain to the target domain. The label classifier obtained from training the source domain data improves the recognition of target domain activities due to the semi-supervised learning utilized in training the target domain data. Experimental results show that the proposed method achieves 92.12% accuracy for recognition of activities in the target domain by migrating the source and target domains. The proposed approach adapts better to cross-domain scenarios than existing deep learning methods, and it provides a low-cost yet highly adaptable solution for cross-domain scenarios.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01387 [pdf, other]

A Comprehensive Survey of Federated Transfer Learning: Challenges, Methods and Applications

Authors: Wei Guo, Fuzhen Zhuang, Xiao Zhang, Yiqi Tong, Jin Dong

Abstract: Federated learning (FL) is a novel distributed machine learning paradigm that enables participants to collaboratively train a centralized model with privacy preservation by eliminating the requirement of data sharing. In practice, FL often involves multiple participants and requires a third party to aggregate global information to guide the update of the target participant. Therefore, many FL methods do not work well because the training and test data of each participant may not be sampled from the same feature space and the same underlying distribution. Meanwhile, the differences in their local devices (system heterogeneity), the continuous influx of online data (incremental data), and labeled data scarcity may further influence the performance of these methods. To solve this problem, federated transfer learning (FTL), which integrates transfer learning (TL) into FL, has attracted the attention of numerous researchers. However, since FL enables a continuous share of knowledge among participants with each communication round while not allowing local data to be accessed by other participants, FTL faces many unique challenges that are not present in TL. In this survey, we focus on categorizing and reviewing the current progress on federated transfer learning, and outlining corresponding solutions and applications. Furthermore, the common setting of FTL scenarios, available datasets, and significant related research are summarized in this survey.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.16138 [pdf, other]

Integration of Conventional Surface Science Techniques with Surface-Sensitive Azimuthal and Polarization Dependent Femtosecond-Resolved Sum Frequency Generation Spectroscopy

Authors: Zhipeng Huang, Tobias Roos, Yujin Tong, R. Kramer Campen

Abstract: Experimental insight into the elementary processes underlying charge transfer across interfaces has blossomed with the widespread availability of ultra-high vacuum set-ups that allow the preparation and characterization of solid surfaces with well-defined molecular adsorbates over a wide range of temperatures. Thick layers of molecular adsorbates or heterostructures of 2D materials generally preclude the use of electrons or atoms as probes in such characterization. However, with linear photon-in/photon-out techniques it is often challenging to assign the observed optical response to a particular portion of the interface. We and prior workers have demonstrated in work under ambient conditions that by full characterization of the symmetry of the second order nonlinear optical susceptibility, i.e. the $χかい^{(2)}$, in sum frequency generation (SFG) spectroscopy, this problem can be overcome. Here we describe an ultra-high vacuum system built to allow conventional UHV sample preparation and characterization, femtosecond and polarization resolved SFG spectroscopy, the azimuthal sample rotation necessary to fully describe $χかい^{(2)}$ symmetry and with sufficient stability to allow scanning SFG microscopy. We demonstrate these capabilities in proof-of-principle measurements on CO adsorbed on Pt(111) and of the clean Ag(111) surface. Because this set-up allows both full characterization of the nonlinear susceptibility and the temperature control and sample preparation/characterization of conventional UHV set-ups, we expect it to be of great utility in investigation of both the basic physics and applications of solid, 2D material heterostructures.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.05247 [pdf, other]

A Geometric VOF Method for Interface Flow Simulations

Authors: Dezhi Dai, Haomin Yuan, Albert Y. Tong, Adrian Tentner

Abstract: A novel numerical technique designed for interface flow simulations using the Volume of Fluid (VOF) method on arbitrary unstructured meshes has been introduced. The method is called SimPLIC, which seamlessly integrates Piecewise Linear Interface Calculation (PLIC) and Simpson's rule. The main focus of the proposed method is to compute the volume of the primary phase that moves across a mesh face within a single time step. This is achieved by reconstructing the interface and assessing how the submerged face area evolves over time. Simpson's rule is employed to integrate the time evolution of this submerged face area, ensuring an accurate estimation of the volume of the transported primary phase. The method's robustness was validated by solving a spherical interface advection problem in a non-uniform three-dimensional flow across unstructured meshes with diverse cell types and dimensions. Key metrics such as volume conservation, shape retention, friction boundedness and solving efficiency were meticulously monitored and juxtaposed. Numerical outcomes underscored the precision and adequacy of the PLIC-VOF technique when complemented with Simpson's rule in advecting the interface. Furthermore, the SimPLIC method has been integrated into OpenFOAM v2312 as an unofficial extension and is now accessible to the community.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.01713 [pdf, other]

Prompting Large Language Models for Zero-Shot Clinical Prediction with Structured Longitudinal Electronic Health Record Data

Authors: Yinghao Zhu, Zixiang Wang, Junyi Gao, Yuning Tong, Jingkun An, Weibin Liao, Ewen M. Harrison, Liantao Ma, Chengwei Pan

Abstract: The inherent complexity of structured longitudinal Electronic Health Records (EHR) data poses a significant challenge when integrated with Large Language Models (LLMs), which are traditionally tailored for natural language processing. Motivated by the urgent need for swift decision-making during new disease outbreaks, where traditional predictive models often fail due to a lack of historical data, this research investigates the adaptability of LLMs, like GPT-4, to EHR data. We particularly focus on their zero-shot capabilities, which enable them to make predictions in scenarios in which they haven't been explicitly trained. In response to the longitudinal, sparse, and knowledge-infused nature of EHR data, our prompting approach involves taking into account specific EHR characteristics such as units and reference ranges, and employing an in-context learning strategy that aligns with clinical contexts. Our comprehensive experiments on the MIMIC-IV and TJH datasets demonstrate that with our elaborately designed prompting framework, LLMs can improve prediction performance in key tasks such as mortality, length-of-stay, and 30-day readmission by about 35%, surpassing ML models in few-shot settings. Our research underscores the potential of LLMs in enhancing clinical decision-making, especially in urgent healthcare situations like the outbreak of emerging diseases with no labeled data. The code is publicly available at https://github.com/yhzhu99/llm4healthcare for reproducibility.

Submitted 10 February, 2024; v1 submitted 25 January, 2024; originally announced February 2024.

arXiv:2401.12544 [pdf]

Correlation between magnetic domain structures and quantum anomalous Hall effect in epitaxial MnBi2Te4 thin films

Authors: Yang Shi, Yunhe Bai, Yuanzhao Li, Yang Feng, Qiang Li, Huanyu Zhang, Yang Chen, Yitian Tong, Jianli Luan, Ruixuan Liu, Pengfei Ji, Zongwei Gao, Hangwen Guo, Jinsong Zhang, Yayu Wang, Xiao Feng, Ke He, Xiaodong Zhou, Jian Shen

Abstract: We use magnetic force microscopy (MFM) to study spatial uniformity of magnetization of epitaxially grown MnBi2Te4 thin films. Compared to films which exhibit no quantum anomalous Hall effect (QAH), films with QAH are observed to have more spatial uniformity of magnetization with larger domain size. The domain evolution upon magnetic field sweeping indicates that the magnetic domains or the spatial nonuniformity of magnetization originates from the strong pinning of the inherent sample inhomogeneity. A direct correlation between the Hall resistivity and the domain size has been established by analyzing a series of thin films with and without QAH. Our observation shows that one has to suppress the spatial nonuniformity of magnetization to allow the Hall resistivity to be quantized. The fact that a sizable longitudinal resistivity remains even for the QAH sample suggests a quantized Hall insulator scenario. Our work provides important insights to the understanding of the quantization mechanism and the dissipation of the QAH state in MnBi2Te4 system.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 14 pages, 4 figures

arXiv:2401.11450 [pdf]

Reentrant quantum anomalous Hall effect in molecular beam epitaxy-grown MnBi2Te4 thin films

Authors: Yuanzhao Li, Yunhe Bai, Yang Feng, Jianli Luan, Zongwei Gao, Yang Chen, Yitian Tong, Ruixuan Liu, Su Kong Chong, Kang L. Wang, Xiaodong Zhou, Jian Shen, Jinsong Zhang, Yayu Wang, Chui-Zhen Chen, XinCheng Xie, Xiao Feng, Ke He, Qi-Kun Xue

Abstract: In this study, we investigate intrinsic magnetic topological insulator MnBi2Te4 thin films grown by molecular beam epitaxy. We observe a reentrant quantum anomalous Hall effect when the Fermi energy enters the valence band and magnetic field equals zero, indicating the emergence of the Chern Anderson insulator state. The discovery opens a new avenue for realizing the QAH effect and underscores the fundamental role of both Berry curvature and Anderson localization.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 15 pages, 4 figures

arXiv:2401.10228 [pdf, other]

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

Authors: Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang

Abstract: Advanced by transformer architecture, vision foundation models (VFMs) achieve remarkable progress in performance and generalization ability. Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation. However, most VFMs cannot run in real time, which makes it difficult to transfer them into several products. On the other hand, current real-time segmentation mainly has one purpose, such as semantic segmentation on the driving scene. We argue that diverse outputs are needed for real applications. Thus, this work explores a new real-time segmentation setting, named all-purpose segmentation in real-time, to transfer VFMs in real-time deployment. It contains three different tasks, including interactive segmentation, panoptic segmentation, and video segmentation. We aim to use one model to achieve the above tasks in real-time. We first benchmark several strong baselines. Then, we present Real-Time All Purpose SAM (RAP-SAM). It contains an efficient encoder and an efficient decoupled decoder to perform prompt-driven decoding. Moreover, we explore different training strategies and tuning methods to further boost co-training performance. Our code and model are available at https://github.com/xushilin1/RAP-SAM/.

Submitted 18 January, 2024; originally announced January 2024.

Comments: Project Page: https://xushilin1.github.io/rap_sam/

arXiv:2401.10226 [pdf, other]

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Authors: Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy

Abstract: We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled binary masks, a process often tedious and labor-intensive. We present the Remove Objects from Videos by Instructions (ROVI) dataset, containing 5,650 videos and 9,091 inpainting results, to support training and evaluation for this task. We also propose a novel diffusion-based language-driven video inpainting framework, the first end-to-end baseline for this task, integrating Multimodal Large Language Models to understand and execute complex language-based inpainting requests effectively. Our comprehensive results showcase the dataset's versatility and the model's effectiveness in various language-instructed inpainting scenarios. We will make datasets, code, and models publicly available.

Submitted 18 January, 2024; originally announced January 2024.

Comments: Project Page: https://jianzongwu.github.io/projects/rovi

arXiv:2401.04136 [pdf, other]

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

Authors: Haonan Wang, Qianli Shen, Yao Tong, Yang Zhang, Kenji Kawaguchi

Abstract: The commercialization of text-to-image diffusion models (DMs) brings forth potential copyright concerns. Despite numerous attempts to protect DMs from copyright issues, the vulnerabilities of these solutions are underexplored. In this study, we formalized the Copyright Infringement Attack on generative AI models and proposed a backdoor attack method, SilentBadDiffusion, to induce copyright infringement without requiring access to or control over training processes. Our method strategically embeds connections between pieces of copyrighted information and text references in poisoning data while carefully dispersing that information, making the poisoning data inconspicuous when integrated into a clean dataset. Our experiments show the stealth and efficacy of the poisoning data. When given specific text prompts, DMs trained with a poisoning ratio of 0.20% can produce copyrighted images. Additionally, the results reveal that the more sophisticated the DMs are, the easier the success of the attack becomes. These findings underline potential pitfalls in the prevailing copyright protection strategies and underscore the necessity for increased scrutiny to prevent the misuse of DMs.

Submitted 26 May, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

Comments: Accepted for presentation at ICML 2024

arXiv:2401.03664 [pdf]

Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification

Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Jijun Tang, Yan Tong

Abstract: This paper focuses on the classification task of breast ultrasound images and investigates the reliability measurement of classification results. We proposed a dual-channel evaluation framework based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationales based on the improved feature attribution algorithm SP-RISA are gracefully applied. Uncertainty quantification is used to evaluate the predictive reliability via the Test Time Enhancement. The effectiveness of this reliability evaluation framework has been verified on our breast ultrasound clinical dataset YBUS, and its robustness is verified on the public dataset BUSI. The expected calibration errors on both datasets are significantly lower than traditional evaluation methods, which proves the effectiveness of our proposed reliability measurement.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.16197 [pdf, other]

INFAMOUS-NeRF: ImproviNg FAce MOdeling Using Semantically-Aligned Hypernetworks with Neural Radiance Fields

Authors: Andrew Hou, Feng Liu, Zhiyuan Ren, Michel Sarkis, Ning Bi, Yiying Tong, Xiaoming Liu

Abstract: We propose INFAMOUS-NeRF, an implicit morphable face model that introduces hypernetworks to NeRF to improve the representation power in the presence of many training subjects. At the same time, INFAMOUS-NeRF resolves the classic hypernetwork tradeoff of representation power and editability by learning semantically-aligned latent spaces despite the subject-specific models, all without requiring a large pretrained model. INFAMOUS-NeRF further introduces a novel constraint to improve NeRF rendering along the face boundary. Our constraint can leverage photometric surface rendering and multi-view supervision to guide surface color prediction and improve rendering near the surface. Finally, we introduce a novel, loss-guided adaptive sampling method for more effective NeRF training by reducing the sampling redundancy. We show quantitatively and qualitatively that our method achieves higher representation power than prior face modeling methods in both controlled and in-the-wild settings. Code and models will be released upon publication.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.12171 [pdf, other]

Equivariant divergence formula for chaotic flows

Authors: Angxiu Ni, Yao Tong

Abstract: We prove the equivariant divergence formula for the axiom A flow attractors, which is a recursive formula for perturbation of transfer operators of physical measures along center-unstable manifolds. Hence the linear response acquires an `ergodic theorem', which means that it can be sampled by recursively computing only $2u$ many vectors on one orbit, where $u$ is the unstable dimension.

Submitted 19 December, 2023; originally announced December 2023.

Comments: comments are welcome!

arXiv:2312.12012 [pdf, other]

Efficient and Private Federated Trajectory Matching

Authors: Yuxiang Wang, Yuxiang Zeng, Yi Xu, Zimu Zhou, Yongxin Tong

Abstract: Federated Trajectory Matching (FTM) is gaining increasing importance in big trajectory data analytics, supporting diverse applications such as public health, law enforcement, and emergency response. FTM retrieves trajectories that match with a query trajectory from a large-scale trajectory database, while safeguarding the privacy of trajectories in both the query and the database. A naive solution to FTM is to process the query through Secure Multi-Party Computation (SMC) across the entire database, which is inherently secure yet inevitably slow due to the massive secure operations. A promising acceleration strategy is to filter irrelevant trajectories from the database based on the query, thus reducing the SMC operations. However, a key challenge is how to publish the query in a way that both preserves privacy and enables efficient trajectory filtering. In this paper, we design GIST, a novel framework for efficient Federated Trajectory Matching. GIST is grounded in Geo-Indistinguishability, a privacy criterion dedicated to locations. It employs a new privacy mechanism for the query that facilitates efficient trajectory filtering. We theoretically prove the privacy guarantee of the mechanism and the accuracy of the filtering strategy of GIST. Extensive evaluations on five real datasets show that GIST is significantly faster and incurs up to 3 orders of magnitude lower communication cost than state-of-the-art methods.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 14 pages

arXiv:2311.14818 [pdf, other]

Stochastic error cancellation in analog quantum simulation

Authors: Yiyi Cai, Yu Tong, John Preskill

Abstract: Analog quantum simulation is a promising path towards solving classically intractable problems in many-body physics on near-term quantum devices. However, the presence of noise limits the size of the system and the length of time that can be simulated. In our work, we consider an error model in which the actual Hamiltonian of the simulator differs from the target Hamiltonian we want to simulate by small local perturbations, which are assumed to be random and unbiased. We analyze the error accumulated in observables in this setting and show that, due to stochastic error cancellation, with high probability the error scales as the square root of the number of qubits instead of linearly. We explore the concentration phenomenon of this error as well as its implications for local observables in the thermodynamic limit.

Submitted 24 November, 2023; originally announced November 2023.

Comments: 14 pages, 2 figures

arXiv:2311.07858 [pdf]

doi 10.1038/s41467-024-47133-7

Large-area, freestanding single-crystal gold of single nanometer thickness

Authors: Chenxinyu Pan, Yuanbiao Tong, Haoliang Qian, Alexey V. Krasavin, Jialin Li, Jiajie Zhu, Yiyun Zhang, Bowen Cui, Zhiyong Li, Chenming Wu, Zhenxin Wang, Lufang Liu, Linjun Li, Xin Guo, Anatoly V. Zayats, Limin Tong, Pan Wang

Abstract: Two-dimensional single-crystal metals are highly sought after for next-generation technologies. Here, we report large-area (>10^4 μm^2), single-crystal two-dimensional gold with thicknesses down to a single-nanometer level, employing an atomic-level-precision chemical etching approach. The ultrathin thickness and single-crystal quality endow two-dimensional gold with unique properties including significantly quantum-confinement-augmented optical nonlinearity, low sheet resistance, high transparency and excellent mechanical flexibility. By patterning the two-dimensional gold into nanoribbon arrays, extremely-confined near-infrared plasmonic resonances are further demonstrated with quality factors up to 5. The freestanding nature of two-dimensional gold allows its straightforward manipulation and transfer-printing for integration with other structures. The developed two-dimensional gold provides an emerging platform for fundamental studies in various disciplines and opens up new opportunities for applications in high-performance ultrathin optoelectronic, photonic and quantum devices.

Submitted 13 November, 2023; originally announced November 2023.

Journal ref: Nature Commun. 15 (2024) 2840-2849

arXiv:2311.05282 [pdf, other]

Empowering high-dimensional optical fiber communications with integrated photonic processors

Authors: Kaihang Lu, Zengqi Chen, Hao Chen, Wu Zhou, Zunyue Zhang, Hon Ki Tsang, Yeyu Tong

Abstract: Mode division multiplexing (MDM) in optical fibers enables multichannel capabilities for various applications, including data transmission, quantum networks, imaging, and sensing. However, MDM optical fiber systems usually necessitate bulk-optics approaches for launching different orthogonal fiber modes into the multimode optical fiber, and multiple-input multiple-output digital electronic signal processing at the receiver side to undo the arbitrary mode scrambling in a circular-core optical fiber. Here we show that a high-dimensional optical fiber communication system can be entirely implemented by a reconfigurable integrated photonic processor, featuring kernels of multichannel mode multiplexing transmitter and all-optical descrambling receiver. High-speed and inter-chip communications involving six spatial- and polarization modes have been experimentally demonstrated with high efficiency and high-quality eye diagrams, despite the presence of random mode scrambling and polarization rotation in a circular-core few-mode fiber. The proposed photonic integration approach holds promising prospects for future space-division multiplexing applications.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.03675 [pdf, other]

Ultra-compact and efficient integrated multichannel mode multiplexer in silicon for few-mode fibers

Authors: Wu Zhou, Zunyue Zhang, Hao Chen, Hon Ki Tsang, Yeyu Tong

Abstract: Space-division multiplexing (SDM) is one of the key enabling technologies to increase the capacity of fiber communication systems. However, implementing SDM-based systems using multimode fiber has been challenging with the need for compact, low-cost, and scalable mode de/multiplexer (DE/MUX). Here we present a novel integrated mode MUX for few-mode fibers (FMFs) which can launch up to eight spatial and polarization channels. The new design is composed of a two-dimensional multimode grating coupler (MMGC), highly compact mode size converters (MSCs), and adiabatic directional couplers (ADCs). Eight data lanes in FMFs can be selectively launched with integrated optical phase shifters. Experimental results reveal efficient chip-to-fiber coupling with peak efficiencies of -3.8 dB, -5.5 dB, -3.6 dB, and -4.1 dB for LP01, LP11a, LP11b, and LP21b modes, respectively. Meanwhile, the proposed design can efficiently couple all the degenerate LP modes in a two-mode FMF, allowing signal descrambling in the demultiplexer. Thanks to the use of integrated subwavelength Mikaelian lens for mode-independent field size conversion with loss $\leq$0.25 dB, the total footprint of the MMGC and MSCs is only 35x35 $\mu$m$^{2}$. The proposed design shows great potential for densely integrated photonic circuits in future SDM applications.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 10 pages, 5 figures

arXiv:2310.19657 [pdf, other]

doi 10.1103/PhysRevB.109.L161402

Isolating the Nonlinear Optical Response of a MoS$_2$ Monolayer under Extreme Screening of a Metal Substrate

Authors: Tao Yang, Stephan Sleziona, Erik Pollmann, Eckart Hasselbrink, Peter Kratzer, Marika Schleberger, R. Kramer Campen, Yujin Tong

Abstract: Transition metal dichalcogenides (TMDCs) monolayers, as two-dimensional (2D) direct bandgap semiconductors, hold promise for advanced optoelectronic and photocatalytic devices. Interaction with three-dimensional (3D) metals, like Au, profoundly affects their optical properties, posing challenges in characterizing the monolayer's optical responses within the semiconductor-metal junction. In this study, using precise polarization-controlled final-state sum frequency generation (FS-SFG), we successfully isolated the optical responses of a MoS$_2$ monolayer from a MoS$_2$/Au junction. The resulting SFG spectra exhibit a linear lineshape, devoid of A or B exciton features, attributed to the strong dielectric screening and substrate induced doping. The linear lineshape illustrates the expected constant density of states (DOS) at the band edge of the 2D semiconductor, a feature often obscured by excitonic interactions in weak-screening conditions such as in a free-standing monolayer. Extrapolation yields the onset of a direct quasiparticle bandgap of about $1.65\pm0.20$ eV, indicating a strong bandgap renormalization. This study not only enriches our understanding of the optical responses of a 2D semiconductor in extreme screening conditions but also provides a critical reference for advancing 2D semiconductor-based photocatalytic applications.

Submitted 30 October, 2023; originally announced October 2023.

Comments: 14 pages, 4 figures + supplemental material

Journal ref: Physical Review B 109, L161402 (2024)

arXiv:2310.17389 [pdf, other]

ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

Authors: Zi Lin, Zihan Wang, Yongqi Tong, Yangkun Wang, Yuxin Guo, Yujia Wang, Jingbo Shang

Abstract: Despite remarkable advances that large language models have achieved in chatbots, maintaining a non-toxic user-AI interactive environment has become increasingly critical nowadays. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media content, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored. In this work, we introduce ToxicChat, a novel benchmark based on real user queries from an open-source chatbot. This benchmark contains the rich, nuanced phenomena that can be tricky for current toxicity detection models to identify, revealing a significant domain difference compared to social media content. Our systematic evaluation of models trained on existing toxicity datasets has shown their shortcomings when applied to this unique domain of ToxicChat. Our work illuminates the potentially overlooked challenges of toxicity detection in real-world user-AI conversations. In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.

Submitted 26 October, 2023; originally announced October 2023.

Journal ref: EMNLP findings 2023

arXiv:2310.13287 [pdf]

Space-confined solid-phase growth of two-domain 1T'-ReSe2 for tunable optoelectronics

Authors: Yunhao Tong, Fanyi Kong, Lei Zhang, Xinyi Hou, Zhengxian Zha, Zheng Hao, Jianxun Dai, Changsen Sun, Jingfeng Song, Huolin Huang, Chenhua Ji, Lujun Pan, Dawei Li

Abstract: Two-dimensional layered ReX2 (X = Se, S) has attracted great research interest due to its unusual in-plane anisotropic optical and electrical properties and great potential in polarization-sensitive optoelectronic devices, while the clean, energy-saving, and ecological synthesis of highly-crystalline ReSe2 with controlled domains remains challenging yet promising. Here, we develop a novel space-confined solid-phase approach for the growth of high-quality two-domain 1T'-ReSe2 with tunable optoelectronic properties by using pure Re powder film as Re precursor. The results show that ReSe2 can be grown at a temperature as low as 550 °C in a small-tube-assisted space-confined reactor, with its size and shape well-tailored via temperature control. A solid-phase two-domain ReSe2 growth mechanism is proposed, as evidenced by combining in-situ optical monitoring, ex-situ electron microscope and elemental mapping, and polarized optical imaging. Moreover, we have fabricated two-domain ReSe2 transistors, which exhibit switchable transport behavior between n-type and ambipolar character via grain boundary orientation control. This modulation phenomenon is attributed to the different doping levels between the grain boundary and the single domain. Furthermore, the as-fabricated two-domain ReSe2 photodetectors exhibit a highly gate-tunable current on-off ratio (with a maximum value of ~8.2x10^3), a polarization-sensitive photo-response, and a high-speed response time (~300 μs), exceeding most of the previously reported ReX2 photodetectors. Our work thus provides a new, low-consumption, energy-saving growth strategy toward high-quality, domain-controlled ReX2 for highly tunable and high-performance optoelectronics.

Submitted 20 October, 2023; originally announced October 2023.

Comments: 24 pages, 6 figures

arXiv:2310.12342 [pdf, other]

Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking

Authors: Yongqi Tong, Yifan Wang, Dawei Li, Sizhe Wang, Zi Lin, Simeng Han, Jingbo Shang

Abstract: Chain-of-Thought (CoT) prompting and its variants explore equipping large language models (LLMs) with high-level reasoning abilities by emulating human-like linear cognition and logic. However, the human mind is complicated and mixed with both linear and nonlinear thinking. In this work, we propose Inferential Exclusion Prompting (IEP), a novel prompting that combines the principles of elimination and inference in order to guide LLMs to think non-linearly. IEP guides LLMs to plan and then utilize Natural Language Inference (NLI) to deduce each possible solution's entailment relation with context, commonsense, or facts, therefore yielding a broader perspective by thinking back for inferring. This forward planning and backward eliminating process allows IEP to better simulate the complex human thinking processes compared to other CoT-based methods, which only reflect linear cognitive processes. We conducted a series of empirical studies and have corroborated that IEP consistently outperforms CoT across various tasks. Additionally, we observe that integrating IEP and CoT further improves the LLMs' performance on certain tasks, highlighting the necessity of equipping LLMs with mixed logic processes. Moreover, to better evaluate comprehensive features inherent in human logic, we introduce Mental-Ability Reasoning Benchmark (MARB). The benchmark comprises six novel subtasks with a total of 9,115 questions, among which 1,685 are developed with hand-crafted rationale references. We believe both IEP and MARB can serve as a promising direction for unveiling LLMs' logic and verbal reasoning abilities and drive further advancements. MARB will be available at \texttt{anonymity link} soon.

Submitted 14 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.08850 [pdf]

An unprecedented synergy of high-temperature tensile strength and ductility in a NiCoCrAlTi high-entropy alloy

Authors: Hongmin Zhang, Fanchao Meng, Haoyan Meng, Yang Tong, Peter K. Liaw, Xiao Yang, Lei Zhao, Haizhou Wang, Yanfei Gao, Shuying Chen

Abstract: The present work reported a novel L12-strengthening NiCoCrAlTi high entropy alloy (HEA) with an outstanding synergy of tensile strength and ductility at both ambient and high temperatures. Transmission electron microscopy (TEM) characterization revealed a high density of rod-like and spheroidal L12 precipitates distributing in the micro/nanograins and non-recrystallized regions in the annealed specimens. The tremendously high yield stress, ultimate tensile stress (UTS), and ductility of the HEA at 600 °C were ~1060 MPa, 1271 MPa, and 25%, respectively, which were significantly superior to most reported HEAs and Co- and Ni-based superalloys to date. Systematic TEM analysis unveiled that the cooperation among L12 precipitation, extensive stacking faults (SFs), deformation twins (DTs), immobile Lomer-Cottrell (L-C) locks formed from interactions between SFs and SFs/DTs, hierarchical SFs/DTs networks, as well as hetero-deformation-induced strengthening dominated the plastic deformation at 600 °C. Such a unique deformation mechanism enabled extremely high tensile strength and sustained ductility of the HEA at a high temperature.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.01393 [pdf, other]

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

Authors: Shilin Xu, Xiangtai Li, Size Wu, Wenwei Zhang, Yunhai Tong, Chen Change Loy

Abstract: Open-vocabulary object detection (OVOD) aims to detect the objects beyond the set of classes observed during training. This work introduces a straightforward and efficient strategy that utilizes pre-trained vision-language models (VLM), like CLIP, to identify potential novel classes through zero-shot classification. Previous methods use a class-agnostic region proposal network to detect object proposals and consider the proposals that do not match the ground truth as background. Unlike these methods, our method will select a subset of proposals that will be considered as background during the training. Then, we treat them as novel classes during training. We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, and re-training. Compared to previous pseudo methods, our approach does not require re-training and offline labeling processing, which is more efficient and effective in one-shot training. Empirical evaluations on three datasets, including LVIS, V3Det, and COCO, demonstrate significant improvements over the baseline performance without incurring additional parameters or computational costs during inference. In addition, we also apply our method to various baselines. In particular, compared with the previous method, F-VLM, our method achieves a 1.7% improvement on the LVIS dataset. Combined with the recent method CLIPSelf, our method also achieves 46.7 novel class AP on COCO without introducing extra data for pre-training. We also achieve over 6.5% improvement over the F-VLM baseline in the recent challenging V3Det dataset. We release our code and models at https://github.com/xushilin1/dst-det.

Submitted 1 April, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Showing 1–50 of 257 results for author: Tong, Y