Search | arXiv e-print repository

arXiv:2407.19679 [pdf]

Harnessing Large Vision and Language Models in Agriculture: A Review

Authors: Hongyan Zhu, Shuai Qin, Min Su, Chengzhi Lin, Anjie Li, Junfeng Gao

Abstract: Large models can play important roles in many domains. Agriculture is another key factor affecting the lives of people around the world. It provides food, fabric, and coal for humanity. However, facing many challenges such as pests and diseases, soil degradation, global warming, and food security, how to steadily increase the yield in the agricultural sector is a problem that humans still need to solve. Large models can help farmers improve production efficiency and harvest by detecting a series of agricultural production tasks such as pests and diseases, soil quality, and seed quality. They can also help farmers make wise decisions through a variety of information, such as images, text, etc. Herein, we delve into the potential applications of large models in agriculture, from large language model (LLM) and large vision model (LVM) to large vision-language models (LVLM). After gaining a deeper understanding of multimodal large language models (MLLM), it can be recognized that problems such as agricultural image processing, agricultural question answering systems, and agricultural machine automation can all be solved by large models. Large models have great potential in the field of agriculture. We outline the current applications of agricultural large models, and aim to emphasize the importance of large models in the domain of agriculture. In the end, we envisage a future in which farmers use MLLM to accomplish many tasks in agriculture, which can greatly improve agricultural production efficiency and yield.

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.13994 [pdf, other]

Evidential Deep Learning for Interatomic Potentials

Authors: Han Xu, Taoyong Cui, Chenyu Tang, Dongzhan Zhou, Yuqiang Li, Xiang Gao, Xingao Gong, Wanli Ouyang, Shufei Zhang, Mao Su

Abstract: Machine learning interatomic potentials (MLIPs) have been widely used to facilitate large scale molecular simulations with ab initio level accuracy. However, MLIP-based molecular simulations frequently encounter the issue of collapse due to decreased prediction accuracy for out-of-distribution (OOD) data. To mitigate this issue, it is crucial to enrich the training set with active learning, where uncertainty estimation serves as an effective method for identifying and collecting OOD data. Therefore, a feasible method for uncertainty estimation in MLIPs is desired. The existing methods either require expensive computations or compromise prediction accuracy. In this work, we introduce evidential deep learning for interatomic potentials (eIP) with a physics-inspired design. Our experiments demonstrate that eIP consistently generates reliable uncertainties without incurring notable additional computational costs, while the prediction accuracy remains unchanged. Furthermore, we present an eIP-based active learning workflow, where eIP is used not only to estimate the uncertainty of molecular data but also to perform uncertainty-driven dynamics simulations. Our findings show that eIP enables efficient sampling for a more diverse dataset, thereby advancing the feasibility of MLIP-based molecular simulations.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.13446 [pdf, ps, other]

Subsampled One-Step Estimation for Fast Statistical Inference

Authors: Miaomiao Su, Ruoyu Wang

Abstract: Subsampling is an effective approach to alleviate the computational burden associated with large-scale datasets. Nevertheless, existing subsampling estimators incur a substantial loss in estimation efficiency compared to estimators based on the full dataset. Specifically, the convergence rate of existing subsampling estimators is typically $n^{-1/2}$ rather than $N^{-1/2}$, where $n$ and $N$ denote the subsample and full data sizes, respectively. This paper proposes a subsampled one-step (SOS) method to mitigate the estimation efficiency loss utilizing the asymptotic expansions of the subsampling and full-data estimators. The resulting SOS estimator is computationally efficient and achieves a fast convergence rate of $\max\{n^{-1}, N^{-1/2}\}$ rather than $n^{-1/2}$. We establish the asymptotic distribution of the SOS estimator, which can be non-normal in general, and construct confidence intervals on top of the asymptotic distribution. Furthermore, we prove that the SOS estimator is asymptotically normal and equivalent to the full data-based estimator when $n / \sqrt{N} \to \infty$. Simulation studies and real data analyses were conducted to demonstrate the finite sample performance of the SOS estimator. Numerical results suggest that the SOS estimator is almost as computationally efficient as the uniform subsampling estimator while achieving similar estimation efficiency to the full data-based estimator.

Submitted 18 July, 2024; originally announced July 2024.
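
As a rough illustration of the rates quoted in the abstract above, the following sketch compares $n^{-1/2}$, $\max\{n^{-1}, N^{-1/2}\}$, and $N^{-1/2}$ for two sample-size choices. It only evaluates the orders of magnitude and ignores all constants; the function name `rates` and the example sizes are assumptions made for illustration, not taken from the paper.

```python
def rates(n, N):
    """Return the three rate orders named in the abstract, constants ignored:
    (existing subsampling, SOS, full data)."""
    existing = n ** -0.5               # n^{-1/2}
    sos = max(n ** -1.0, N ** -0.5)    # max{n^{-1}, N^{-1/2}}
    full = N ** -0.5                   # N^{-1/2}
    return existing, sos, full

# Case 1: n is large relative to sqrt(N) (n / sqrt(N) = 10),
# so the SOS order coincides with the full-data order.
large_existing, large_sos, large_full = rates(n=10_000, N=1_000_000)

# Case 2: n is small relative to sqrt(N) (n / sqrt(N) = 0.1),
# so the n^{-1} term dominates, which is still smaller than n^{-1/2}.
small_existing, small_sos, small_full = rates(n=100, N=1_000_000)
```

In the first case the SOS order equals the full-data order, consistent with the abstract's statement about the regime $n / \sqrt{N} \to \infty$; in the second it falls between the full-data order and the order of existing subsampling estimators.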

arXiv:2406.14966 [pdf, other]

AIGC-Chain: A Blockchain-Enabled Full Lifecycle Recording System for AIGC Product Copyright Management

Authors: Jiajia Jiang, Moting Su, Xiangli Xiao, Yushu Zhang, Yuming Fang

Abstract: As artificial intelligence technology becomes increasingly prevalent, Artificial Intelligence Generated Content (AIGC) is being adopted across various sectors. Although AIGC is playing an increasingly significant role in business and culture, questions surrounding its copyright have sparked widespread debate. The current legal framework for copyright and intellectual property is grounded in the concept of human authorship, but in the creation of AIGC, human creators primarily provide conceptual ideas, with AI independently responsible for the expressive elements. This disconnect creates complexity and difficulty in determining copyright ownership under existing laws. Consequently, it is imperative to reassess the intellectual contributions of all parties involved in the creation of AIGC to ensure a fair allocation of copyright ownership. To address this challenge, we introduce AIGC-Chain, a blockchain-enabled full lifecycle recording system designed to manage the copyright of AIGC products. It is engineered to meticulously document the entire lifecycle of AIGC products, providing a transparent and dependable platform for copyright management. Furthermore, we propose a copyright tracing method based on an Indistinguishable Bloom Filter, named IBFT, which enhances the efficiency of blockchain transaction queries and significantly reduces the risk of fraudulent copyright claims for AIGC products. In this way, auditors can analyze the copyright of AIGC products by reviewing all relevant information retrieved from the blockchain.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14191 [pdf, other]

Temporal Knowledge Graph Question Answering: A Survey

Authors: Miao Su, Zixuan Li, Zhuo Chen, Long Bai, Xiaolong Jin, Jiafeng Guo

Abstract: Knowledge Base Question Answering (KBQA) has been a long-standing field to answer questions based on knowledge bases. Recently, the evolving dynamics of knowledge have attracted a growing interest in Temporal Knowledge Graph Question Answering (TKGQA), an emerging task to answer temporal questions. However, this field grapples with ambiguities in defining temporal questions and lacks a systematic categorization of existing methods for TKGQA. In response, this paper provides a thorough survey from two perspectives: the taxonomy of temporal questions and the methodological categorization for TKGQA. Specifically, we first establish a detailed taxonomy of temporal questions engaged in prior studies. Subsequently, we provide a comprehensive review of TKGQA techniques of two categories: semantic parsing-based and TKG embedding-based. Building on this review, the paper outlines potential research directions aimed at advancing the field of TKGQA. This work aims to serve as a comprehensive reference for TKGQA and to stimulate further research.

Submitted 5 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 8 pages, 3 figures

arXiv:2406.05504 [pdf, other]

G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes

Authors: Hong Xiong, Feng Wu, Leon Deng, Megan Su, Li-wei H Lehman

Abstract: In the context of medical decision making, counterfactual prediction enables clinicians to predict treatment outcomes of interest under alternative courses of therapeutic actions given observed patient history. Prior machine learning approaches for counterfactual predictions under time-varying treatments focus on static time-varying treatment regimes where treatments do not depend on previous covariate history. In this work, we present G-Transformer, a Transformer-based framework supporting g-computation for counterfactual prediction under dynamic and time-varying treatment strategies. G-Transformer captures complex, long-range dependencies in time-varying covariates using a Transformer architecture. G-Transformer estimates the conditional distribution of relevant covariates given covariate and treatment history at each time point using an encoder architecture, then produces Monte Carlo estimates of counterfactual outcomes by simulating forward patient trajectories under treatment strategies of interest. We evaluate G-Transformer extensively using two simulated longitudinal datasets from mechanistic models, and a real-world sepsis ICU dataset from MIMIC-IV. G-Transformer outperforms both classical and state-of-the-art counterfactual prediction models in these settings. To the best of our knowledge, this is the first Transformer-based architecture for counterfactual outcome prediction under dynamic and time-varying treatment strategies.

Submitted 27 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.03136 [pdf, ps, other]

Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

Authors: Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu

Abstract: We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior and (ii) prove the existence of nearly linear algorithms by controlling the LoRA update computation term by term, assuming the Strong Exponential Time Hypothesis (SETH). For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $\mathbf{X}$, pretrained weights $\mathbf{W^\star}$, and adapter matrices $\alpha \mathbf{B} \mathbf{A} / r$. Specifically, we derive a shared upper bound threshold for such norms and show that efficient (sub-quadratic) approximation algorithms of LoRA exist only below this threshold. For the latter, we prove the existence of nearly linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial (e.g., only $\mathbf{W}_V$ and $\mathbf{W}_Q$) and full adaptations (e.g., $\mathbf{W}_Q$, $\mathbf{W}_V$, and $\mathbf{W}_K$) of weights in attention heads.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.08308 [pdf, other]

Online Test-time Adaptation for Interatomic Potentials

Authors: Taoyong Cui, Chenyu Tang, Dongzhan Zhou, Yuqiang Li, Xingao Gong, Wanli Ouyang, Mao Su, Shufei Zhang

Abstract: Machine learning interatomic potentials (MLIPs) enable more efficient molecular dynamics (MD) simulations with ab initio accuracy, which have been used in various domains of physical science. However, distribution shift between training and test data causes deterioration of the test performance of MLIPs, and even leads to collapse of MD simulations. In this work, we propose an online Test-time Adaptation Interatomic Potential (TAIP) framework to improve the generalization on test data. Specifically, we design a dual-level self-supervised learning approach that leverages global structure and atomic local environment information to align the model with the test data. Extensive experiments demonstrate TAIP's capability to bridge the domain gap between training and test datasets without additional data. TAIP enhances the test performance on various benchmarks, from small molecule datasets to complex periodic molecular systems with various types of elements. Remarkably, it also enables stable MD simulations where the corresponding baseline models collapse.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2404.10354 [pdf]

Physical formula enhanced multi-task learning for pharmacokinetics prediction

Authors: Ruifeng Li, Dongzhan Zhou, Ancheng Shen, Ao Zhang, Mao Su, Mingqian Li, Hongyang Chen, Gang Chen, Yin Zhang, Shufei Zhang, Yuqiang Li, Wanli Ouyang

Abstract: Artificial intelligence (AI) technology has demonstrated remarkable potential in drug discovery, where pharmacokinetics plays a crucial role in determining the dosage, safety, and efficacy of new drugs. A major challenge for AI-driven drug discovery (AIDD) is the scarcity of high-quality data, which often requires extensive wet-lab work. A typical example of this is pharmacokinetic experiments. In this work, we develop a physical formula enhanced multi-task learning (PEMAL) method that predicts four key parameters of pharmacokinetics simultaneously. By incorporating physical formulas into the multi-task framework, PEMAL facilitates effective knowledge sharing and target alignment among the pharmacokinetic parameters, thereby enhancing the accuracy of prediction. Our experiments reveal that PEMAL significantly lowers the data demand, compared to typical Graph Neural Networks. Moreover, we demonstrate that PEMAL enhances the robustness to noise, an advantage that conventional Neural Networks do not possess. Another advantage of PEMAL is its high flexibility, which can be potentially applied to other multi-task machine learning scenarios. Overall, our work illustrates the benefits and potential of using PEMAL in AIDD and other scenarios with data scarcity and noise.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.07323 [pdf, other]

Surrogate modeling for probability distribution estimation: uniform or adaptive design?

Authors: Maijia Su, Ziqi Wang, Oreste Salvatore Bursi, Marco Broccardo

Abstract: The active learning (AL) technique, one of the state-of-the-art methods for constructing surrogate models, has shown high accuracy and efficiency in forward uncertainty quantification (UQ) analysis. This paper provides a comprehensive study on AL-based global surrogates for computing the full distribution function, i.e., the cumulative distribution function (CDF) and the complementary CDF (CCDF). To this end, we investigate the three essential components for building surrogates, i.e., types of surrogate models, enrichment methods for experimental designs, and stopping criteria. For each component, we choose several representative methods and study their desirable configurations. In addition, we devise a uniform design (i.e., space-filling design) as a baseline for measuring the improvement of using AL. Combining all the representative methods, a total of 1,920 UQ analyses are carried out to solve 16 benchmark examples. The performance of the selected strategies is evaluated based on accuracy and efficiency. In the context of full distribution estimation, this study concludes that (i) AL techniques cannot provide a systematic improvement compared with uniform designs, (ii) the recommended surrogate modeling methods depend on the features of the problems (especially the local nonlinearity), target accuracy, and computational budget.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.20134 [pdf, other]

User Modeling Challenges in Interactive AI Assistant Systems

Authors: Megan Su, Yuwei Bao

Abstract: Interactive Artificial Intelligence (AI) assistant systems are designed to offer timely guidance to help human users complete a variety of tasks. One of the remaining challenges is to understand users' mental states during the task for more personalized guidance. In this work, we analyze users' mental states during task executions and investigate the capabilities and challenges for large language models to interpret user profiles for more personalized user guidance.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.12716 [pdf, ps, other]

A New Reduction Method from Multivariate Polynomials to Univariate Polynomials

Authors: Cancan Wang, Ming Su, Gang Wang, Qingpo Zhang

Abstract: Polynomial multiplication is a fundamental problem in symbolic computation. There are efficient methods for the multiplication of two univariate polynomials. However, there are rarely efficient nontrivial methods for the multiplication of two multivariate polynomials. Therefore, we consider a new multiplication mechanism that involves a) reversibly reducing multivariate polynomials into univariate polynomials, b) calculating the product of the derived univariate polynomials by the Toom-Cook or FFT algorithm, and c) correctly recovering the product of multivariate polynomials from the product of two univariate polynomials. This work focuses on step a), expecting the degrees of the derived univariate polynomials to be as small as possible. We propose iterative Kronecker substitution, where smaller substitution exponents are selected instead of standard Kronecker substitution. We also apply the Chinese remainder theorem to polynomial reduction and find its advantages in some cases. Afterwards, we provide a hybrid reduction combining the advantages of both reduction methods. Moreover, we compare these reduction methods in terms of lower and upper bounds of the degree of the product of two derived univariate polynomials, and their computational complexities. With randomly generated multivariate polynomials, experiments show that the degree of the product of two univariate polynomials derived from the hybrid reduction can be reduced even to approximately 3% of that resulting from the standard Kronecker substitution, implying an efficient subsequent multiplication of two univariate polynomials.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 15 pages
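
For orientation, steps a) to c) of the abstract above can be sketched with the standard Kronecker substitution, the baseline the paper compares against. This is a minimal bivariate illustration, not the authors' iterative, CRT-based, or hybrid reduction. All function names are hypothetical, polynomials are assumed to be non-empty dictionaries mapping exponent pairs `(i, j)` of `x^i y^j` to integer coefficients, and schoolbook multiplication stands in for Toom-Cook or FFT.

```python
from collections import defaultdict

def kronecker_pack(poly, stride):
    """Step a): substitute y -> x^stride, giving a univariate {exponent: coeff}."""
    packed = defaultdict(int)
    for (i, j), c in poly.items():
        packed[i + stride * j] += c
    return dict(packed)

def multiply_univariate(a, b):
    """Step b): schoolbook product (placeholder for Toom-Cook or FFT)."""
    out = defaultdict(int)
    for ea, ca in a.items():
        for eb, cb in b.items():
            out[ea + eb] += ca * cb
    # Drop terms whose coefficients cancelled to zero.
    return {e: c for e, c in out.items() if c != 0}

def kronecker_unpack(poly, stride):
    """Step c): recover (x-exponent, y-exponent) from each packed exponent."""
    return {(e % stride, e // stride): c for e, c in poly.items()}

def multiply_bivariate(f, g):
    """Multiply two non-empty bivariate polynomials via Kronecker substitution."""
    deg_x = lambda p: max(i for i, _ in p)
    # The stride must exceed the x-degree of the product so that
    # distinct monomials never collide after packing.
    stride = deg_x(f) + deg_x(g) + 1
    product = multiply_univariate(kronecker_pack(f, stride),
                                  kronecker_pack(g, stride))
    return kronecker_unpack(product, stride)

f = {(1, 0): 1, (0, 1): 1}    # x + y
g = {(1, 0): 1, (0, 1): -1}   # x - y
result = multiply_bivariate(f, g)  # x^2 - y^2
```

The stride grows with the x-degree of the product, so the packed univariate degree grows quickly with more variables; reducing that degree is the concern the abstract's step a) addresses.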

arXiv:2403.07969 [pdf, other]

KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

Authors: Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

Abstract: In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve these, KnowCoder introduces a code-style schema representation method to uniformly transform different schemas into Python classes, with which complex schema information, such as constraints among tasks in UIE, can be captured in an LLM-friendly manner. We further construct a code-style schema library covering over $\textbf{30,000}$ types of knowledge, which is the largest one for UIE, to the best of our knowledge. To ease the learning process of LLMs, KnowCoder contains a two-phase learning framework that enhances its schema understanding ability via code pretraining and its schema following ability via instruction tuning. After code pretraining on around $1.5$B automatically constructed data, KnowCoder already attains remarkable generalization ability and achieves relative improvements by $\textbf{49.8%}$ F1, compared to LLaMA2, under the few-shot setting. After instruction tuning, KnowCoder further exhibits strong generalization ability on unseen schemas and achieves up to $\textbf{12.5%}$ and $\textbf{21.9%}$, compared to sota baselines, under the zero-shot setting and the low resource setting, respectively. Additionally, based on our unified schema representations, various human-annotated datasets can simultaneously be utilized to refine KnowCoder, which achieves significant improvements up to $\textbf{7.5%}$ under the supervised setting.

Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.08940 [pdf, other]

Structure and magnetic properties of a La$_{0.75}$Sr$_{0.25}$Cr$_{0.90}$O$_{3-\delta}$ single crystal

Authors: Kaitong Sun, Yinghao Zhu, Shinichiro Yano, Qian Zhao, Muqing Su, Guanping Xu, Ruifeng Zheng, Ying Ellie Fu, Hai-Feng Li

Abstract: We have successfully grown large and good-quality single crystals of the La$_{0.75}$Sr$_{0.25}$Cr$_{0.90}$O$_{3-\delta}$ compound using the floating-zone method with laser diodes. We investigated the crystal quality, crystallography, chemical composition, magnetic properties and the oxidation state of Cr in the grown single crystals by employing a combination of techniques, including X-ray Laue and powder diffraction, scanning electron microscopy, magnetization measurements, X-ray photoelectron spectroscopy and light absorption. The La$_{0.75}$Sr$_{0.25}$Cr$_{0.90}$O$_{3-\delta}$ single crystal exhibits a single-phase composition, crystallizing in a trigonal structure with the space group $R\bar{3}c$ at room temperature. The chemical composition was determined as La$_{0.75}$Sr$_{0.25}$Cr$_{0.90}$O$_{3-\delta}$, indicating a significant chromium deficiency. Upon warming, we observed five distinctive characteristic temperatures, namely $T_1 =$ 21.50(1) K, $T_2 =$ 34.98(1) K, $T_3 =$ 117.94(1) K, $T_4 =$ 155.01(1) K, and $T_{\textrm{N}} =$ 271.80(1) K, revealing five distinct magnetic anomalies. Our magnetization study allows us to explore the nature of these anomalies. Remarkably, the oxidation state of chromium in the single-crystal La$_{0.75}$Sr$_{0.25}$Cr$_{0.90}$O$_{3-\delta}$, characterized by a band gap of 1.630(8) eV, is exclusively attributed to Cr$^{3+}$ ions, making a departure from the findings of previous studies on polycrystalline materials.

Submitted 18 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 24 pages, 10 figures

arXiv:2402.06852 [pdf]

ChemLLM: A Chemical Large Language Model

Authors: Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Wanli Ouyang, Dongzhan Zhou, Shufei Zhang, Mao Su, Han-Sen Zhong, Yuqiang Li

Abstract: Large language models (LLMs) have made impressive progress in chemistry applications. However, the community lacks an LLM specifically designed for chemistry. The main challenges are two-fold: firstly, most chemical data and scientific knowledge are stored in structured databases, which limits the model's ability to sustain coherent dialogue when used directly. Secondly, there is an absence of an objective and fair benchmark that encompasses most chemistry tasks. Here, we introduce ChemLLM, a comprehensive framework that features the first LLM dedicated to chemistry. It also includes ChemData, a dataset specifically designed for instruction tuning, and ChemBench, a robust benchmark covering nine essential chemistry tasks. ChemLLM is adept at performing various tasks across chemical disciplines with fluid dialogue interaction. Notably, ChemLLM achieves results comparable to GPT-4 on the core chemical tasks and demonstrates competitive performance with LLMs of similar size in general scenarios. ChemLLM paves a new path for exploration in chemical studies, and our method of incorporating structured chemical knowledge into dialogue systems sets a new standard for developing LLMs in various scientific fields. Codes, Datasets, and Model weights are publicly accessible at https://hf.co/AI4Chem

Submitted 25 April, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: 9 pages, 5 figures

arXiv:2401.03747 [pdf, other]

The Importance of Corner Frequency in Site-Based Stochastic Ground Motion Models

Authors: Maijia Su, Mayssa Dabaghi, Marco Broccardo

Abstract: Synthetic ground motions (GMs) play a fundamental role in both deterministic and probabilistic seismic engineering assessments. This paper shows that the family of filtered and modulated white noise stochastic GM models overlooks a key parameter -- the high-pass filter's corner frequency, $f_c$. In the simulated motions, this causes significant distortions in the long-period range of the linear-response spectra and in the linear-response spectral correlations. To address this, we incorporate $f_c$ as an explicitly fitted parameter in a site-based stochastic model. We optimize $f_c$ by individually matching the long-period linear-response spectrum (i.e., $Sa(T)$ for $T \geq 1$s) of synthetic GMs with that of each recorded GM. We show that by fitting $f_c$ the resulting stochastically simulated GMs can precisely capture the spectral amplitudes, variability (i.e., variances of $\log(Sa(T))$), and the correlation structure (i.e., correlation of $\log(Sa(T))$ between distinct periods $T_1$ and $T_2$) of recorded GMs. To quantify the impact of $f_c$, a sensitivity analysis is conducted through linear regression. This regression relates the logarithmic linear-response spectrum ($\log(Sa(T))$) to seven GM parameters, including the optimized $f_c$. The results indicate that the variance of $f_c$ observed in natural GMs, along with its correlation with the other GM parameters, accounts for 26% of the spectral variability in long periods. Neglecting either the $f_c$ variance or $f_c$ correlation typically results in an important overestimation of the linear-response spectral correlation.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 16 pages, 10 figures

arXiv:2401.01718 [pdf]

RHDLPP: A multigroup radiation hydrodynamics code for laser-produced plasmas

Authors: Qi Min, Ziyang Xu, Siqi He, Haidong Lu, Xingbang Liu, Ruizi Shen, Yanhong Wu, Qikun Pan, Chongxiao Zhao, Fei Chen, Maogen Su, Chenzhong Dong

Abstract: We introduce the RHDLPP, a flux-limited multigroup radiation hydrodynamics numerical code designed for simulating laser-produced plasmas in diverse environments. The code bifurcates into two packages: RHDLPP-LTP for low-temperature plasmas generated by moderate-intensity nanosecond lasers, and RHDLPP-HTP for high-temperature, high-density plasmas formed by high-intensity laser pulses. The core radiation hydrodynamic equations are resolved in the Eulerian frame, employing an operator-split method. This method decomposes the solution into two substeps: first, the explicit resolution of the hyperbolic subsystems integrating radiation and fluid dynamics, and second, the implicit treatment of the parabolic part comprising stiff radiation diffusion, heat conduction, and energy exchange. Laser propagation and energy deposition are modeled through a hybrid approach, combining geometrical optics ray-tracing in sub-critical plasma regions with a one-dimensional solution of the Helmholtz wave equation in super-critical areas. The thermodynamic states are ascertained using an equation of state, based on either the real gas approximation or the quotidian equation of state (QEOS). Additionally, RHDLPP includes RHDLPP-SpeIma3D, a three-dimensional spectral simulation post-processing module, for generating both temporally-spatially resolved and time-integrated spectra and imaging, facilitating direct comparisons with experimental data. The paper showcases a series of verification tests to establish the code's accuracy and efficiency, followed by application cases, including simulations of laser-produced aluminum (Al) plasmas, pre-pulse-induced target deformation of tin (Sn) microdroplets relevant to extreme ultraviolet lithography light sources, and varied imaging and spectroscopic simulations.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.00374 [pdf, other]

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Authors: Haiyang Liu, Zihao Zhu, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. Black

Abstract: We propose EMAGE, a framework to generate full-body human gestures from audio and masked gestures, encompassing facial, local body, hands, and global movements. To achieve this, we first introduce BEAT2 (BEAT-SMPLX-FLAME), a new mesh-level holistic co-speech dataset. BEAT2 combines a MoShed SMPL-X body with FLAME head parameters and further refines the modeling of head, neck, and finger movements, offering a community-standardized, high-quality 3D motion captured dataset. EMAGE leverages masked body gesture priors during training to boost inference performance. It involves a Masked Audio Gesture Transformer, facilitating joint training on audio-to-gesture generation and masked gesture reconstruction to effectively encode audio and body gesture hints. Encoded body hints from masked gestures are then separately employed to generate facial and body movements. Moreover, EMAGE adaptively merges speech features from the audio's rhythm and content and utilizes four compositional VQ-VAEs to enhance the results' fidelity and diversity. Experiments demonstrate that EMAGE generates holistic gestures with state-of-the-art performance and is flexible in accepting predefined spatial-temporal gesture inputs, generating complete, audio-synchronized results. Our code and dataset are available at https://pantomatrix.github.io/EMAGE/

Submitted 30 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

Comments: Fix typos; Conflict of Interest Disclosure; CVPR Camera Ready; Project Page: https://pantomatrix.github.io/EMAGE/

arXiv:2312.10359 [pdf, other]

Conformer-Based Speech Recognition On Extreme Edge-Computing Devices

Authors: Mingbin Xu, Alex Jin, Sicheng Wang, Mu Su, Tim Ng, Henry Mason, Shiyi Han, Zhihong Lei, Yaqiao Deng, Zhen Huang, Mahesh Krishnamoorthy

Abstract: With increasingly powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices, such as smartphones, smart wearables, and other smart home automation devices. In this paper, we propose a series of model architecture adaptations, neural network graph transformations, and numerical optimizations to fit an advanced Conformer-based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. We achieve speech recognition over 5.26 times faster than real time (0.19 RTF) on smart wearables while minimizing energy consumption and achieving state-of-the-art accuracy. The proposed methods are widely applicable to other transformer-based server-free AI applications. In addition, we provide a complete theory on optimal pre-normalizers that numerically stabilize layer normalization in any Lp-norm using any floating point precision.

Submitted 13 May, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2311.17071 [pdf, other]

Globular Clusters Contribute to the Nuclear Star Cluster and Galaxy Center Gamma-Ray Excess, Moderated by Galaxy Assembly History

Authors: Yuan Gao, Hui Li, Xiaojia Zhang, Meng Su, Stephen Chi Yung Ng

Abstract: Two unresolved questions at galaxy centers, namely the formation of the nuclear star cluster (NSC) and the origin of the gamma-ray excess in the Milky Way (MW) and Andromeda (M31), are both related to the formation and evolution of globular clusters (GCs). They migrate towards the galaxy center due to dynamical friction, and get tidally disrupted to release the stellar mass content including millisecond pulsars (MSPs), which contribute to the NSC and gamma-ray excess. In this study, we propose a semi-analytical model of GC formation and evolution that utilizes the Illustris cosmological simulation to accurately capture the formation epochs of GCs and simulate their subsequent evolution. Our analysis confirms that our GC properties at z=0 are consistent with observations, and our model naturally explains the formation of a massive NSC in a galaxy similar to the MW and M31. We also find a remarkable similarity in our model prediction with the gamma-ray excess signal in the MW. However, our predictions fall short by approximately an order of magnitude in M31, indicating distinct origins for the two gamma-ray excesses. Meanwhile, we utilize the catalog of Illustris halos to investigate the influence of galaxy assembly history. We find that the earlier a galaxy is assembled, the heavier and spatially more concentrated its GC system is at z=0. This results in a larger NSC mass and brighter gamma-ray emission from deposited MSPs.

Submitted 26 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 11 pages, 18 figures. Accepted for publication in MNRAS. Comments welcome!

arXiv:2311.09566 [pdf, other]

A Knowledge Distillation Approach for Sepsis Outcome Prediction from Multivariate Clinical Time Series

Authors: Anna Wong, Shu Ge, Nassim Oufattole, Adam Dejl, Megan Su, Ardavan Saeedi, Li-wei H. Lehman

Abstract: Sepsis is a life-threatening condition triggered by an extreme infection response. Our objective is to forecast sepsis patient outcomes using their medical history and treatments, while learning interpretable state representations to assess patients' risks in developing various adverse outcomes. While neural networks excel in outcome prediction, their limited interpretability remains a key issue. In this work, we use knowledge distillation via constrained variational inference to distill the knowledge of a powerful "teacher" neural network model with high predictive power to train a "student" latent variable model to learn interpretable hidden state representations to achieve high predictive performance for sepsis outcome prediction. Using real-world data from the MIMIC-IV database, we trained an LSTM as the "teacher" model to predict mortality for sepsis patients, given information about their recent history of vital signs, lab values and treatments. For our student model, we use an autoregressive hidden Markov model (AR-HMM) to learn interpretable hidden states from patients' clinical time series, and use the posterior distribution of the learned state representations to predict various downstream outcomes, including hospital mortality, pulmonary edema, need for diuretics, dialysis, and mechanical ventilation. Our results show that our approach successfully incorporates the constraint to achieve high predictive power similar to the teacher model, while maintaining the generative performance.

Submitted 16 November, 2023; originally announced November 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 12 pages

arXiv:2311.00738 [pdf, other]

Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

Authors: Yuwei Bao, Keunwoo Peter Yu, Yichi Zhang, Shane Storks, Itamar Bar-Yossef, Alexander De La Iglesia, Megan Su, Xiao Lin Zheng, Joyce Chai

Abstract: Despite tremendous advances in AI, it remains a significant challenge to develop interactive task guidance systems that can offer situated, personalized guidance and assist humans in various tasks. These systems need to have a sophisticated understanding of the user as well as the environment, and make timely, accurate decisions on when and what to say. To address this issue, we created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG), based on natural interaction between a human user and a human instructor. We further proposed two tasks: User and Environment Understanding, and Instructor Decision Making. We leveraged several foundation models to study to what extent these models can be quickly adapted to perceptually enabled task guidance. Our quantitative, qualitative, and human evaluation results show that these models can demonstrate fair performance in some cases with no task-specific training, but a fast and reliable adaptation remains a significant challenge. Our benchmark and baselines will provide a stepping stone for future work on situated task guidance.

Submitted 1 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP 2023 Findings

arXiv:2310.05040 [pdf, ps, other]

Global large strong solutions to the radially symmetric compressible Navier-Stokes equations in 2D solid balls

Authors: Xiangdi Huang, Mengluan Su, Wei Yan, Rongfeng Yu

Abstract: In this paper, we consider the initial-boundary value problems of the compressible isentropic Navier-Stokes equations with density-dependent viscosity on two-dimensional solid balls, which was first introduced by Kazhikhov, where shear viscosity $\mu$ is assumed to be constant and the bulk viscosity $\lambda$ is a polynomial of density up to power $\beta$. Under the condition of $\beta>1$, we prove the global existence of the radially symmetric strong solutions to the Kazhikhov models under Dirichlet boundary conditions for arbitrary large initial smooth data. Moreover, the density is shown to be uniformly bounded with respect to time when $\beta\in (\max\{1,\frac{\gamma+2}{4}\},\gamma]$. This improves the previous result of \cite{2016Huang,2022Huang} for general 2D domains where they require $\beta>4/3$ to ensure global existence and is the first result concerning the global existence of classical solutions to the radially symmetric compressible Navier-Stokes equations in 2D solid balls under Dirichlet boundary condition.

Submitted 8 October, 2023; originally announced October 2023.

Comments: 31 pages

MSC Class: 35Q30

arXiv:2309.16994 [pdf]

A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater

Authors: Viorel Munteanu, Victor Gordeev, Michael Saldana, Eva Aßmann, Justin Maine Su, Nicolae Drabcinski, Oksana Zlenko, Maryna Kit, Felicia Iordachi, Khooshbu Kantibhai Patel, Abdullah Al Nahid, Likhitha Chittampalli, Yidian Xu, Pavel Skums, Shelesh Agrawal, Martin Hölzer, Adam Smith, Alex Zelikovsky, Serghei Mangul

Abstract: In light of the continuous transmission and evolution of SARS-CoV-2 coupled with a significant decline in clinical testing, there is a pressing need for scalable, cost-effective, long-term, passive surveillance tools to effectively monitor viral variants circulating in the population. Wastewater genomic surveillance of SARS-CoV-2 has arrived as an alternative to clinical genomic surveillance, allowing continuous monitoring of the prevalence of viral lineages in communities of various sizes at a fraction of the time, cost, and logistic effort and serving as an early warning system for emerging variants, critical for developed communities and especially for underserved ones. Importantly, lineage prevalence estimates obtained with this approach aren't distorted by biases related to clinical testing accessibility and participation. However, the relative performance of bioinformatics methods used to measure relative lineage abundances from wastewater sequencing data is unknown, preventing both the research community and public health authorities from making informed decisions regarding computational tool selection. Here, we perform comprehensive benchmarking of 18 bioinformatics methods for estimating the relative abundance of SARS-CoV-2 (sub)lineages in wastewater by using data from 36 in vitro mixtures of synthetic lineage and sublineage genomes. In addition, we use simulated data from 78 mixtures of lineages and sublineages co-occurring in the clinical setting with proportions mirroring their prevalence ratios observed in real data. Importantly, we investigate how the accuracy of the evaluated methods is impacted by the sequencing technology used, the associated error rate, the read length, and the read depth, but also by the exposure of the synthetic RNA mixtures to wastewater, with the goal of capturing the effects induced by the wastewater matrix, including RNA fragmentation and degradation.

Submitted 21 January, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: For correspondence: serghei.mangul@gmail.com

arXiv:2309.15718 [pdf, other]

doi 10.1038/s42256-024-00818-6

Geometry-enhanced Pre-training on Interatomic Potentials

Authors: Taoyong Cui, Chenyu Tang, Mao Su, Shufei Zhang, Yuqiang Li, Lei Bai, Yuhan Dong, Xingao Gong, Wanli Ouyang

Abstract: Machine learning interatomic potentials (MLIPs) enable molecular dynamics (MD) simulations with ab initio accuracy and have been applied to various fields of physical science. However, the performance and transferability of MLIPs are limited by insufficient labeled training data, which require expensive ab initio calculations to obtain the labels, especially for complex molecular systems. To address this challenge, we design a novel geometric structure learning paradigm that consists of two stages. We first generate a large quantity of 3D configurations of the target molecular system with classical molecular dynamics simulations. Then, we propose geometry-enhanced self-supervised learning consisting of masking, denoising, and contrastive learning to better capture the topology and 3D geometric information from the unlabeled 3D configurations. We evaluate our method on various benchmarks ranging from small molecule datasets to complex periodic molecular systems with more types of elements. The experimental results show that the proposed pre-training method can greatly enhance the accuracy of MLIPs with little extra computational cost and works well with different invariant or equivariant graph neural network architectures. Our method improves the generalization capability of MLIPs and helps to realize accurate MD simulations for complex molecular systems.

Submitted 12 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Journal ref: Published in Nature Machine Intelligence 2024

arXiv:2309.13326 [pdf]

SARS-CoV-2 Wastewater Genomic Surveillance: Approaches, Challenges, and Opportunities

Authors: Viorel Munteanu, Michael Saldana, Dumitru Ciorba, Viorel Bostan, Justin Maine Su, Nadiia Kasianchuk, Nitesh Kumar Sharma, Sergey Knyazev, Victor Gordeev, Eva Aßmann, Andrei Lobiuc, Mihai Covasa, Keith A. Crandall, Wenhao O. Ouyang, Nicholas C. Wu, Christopher Mason, Braden T Tierney, Alexander G Lucaci, Alex Zelikovsky, Fatemeh Mohebbi, Pavel Skums, Cynthia Gibas, Jessica Schlueter, Piotr Rzymski, Helena Solo-Gabriele, et al. (3 additional authors not shown)

Abstract: During the SARS-CoV-2 pandemic, wastewater-based genomic surveillance (WWGS) emerged as an efficient viral surveillance tool that takes into account asymptomatic cases, can identify known and novel mutations, and offers the opportunity to assign known virus lineages based on the detected mutation profiles. WWGS can also hint towards novel or cryptic lineages, but it is difficult to clearly identify and define novel lineages from wastewater (WW) alone. While WWGS has significant advantages in monitoring SARS-CoV-2 viral spread, technical challenges remain, including poor sequencing coverage and quality due to viral RNA degradation. As a result, the viral RNAs in wastewater have low concentrations and are often fragmented, making sequencing difficult. WWGS analysis requires advanced computational tools that are yet to be developed and benchmarked. The existing bioinformatics tools used to analyze wastewater sequencing data are often based on previously developed methods for quantifying the expression of transcripts or viral diversity. Those methods were not developed for wastewater sequencing data specifically, and are not optimized to address unique challenges associated with wastewater. While specialized tools for analysis of wastewater sequencing data have also been developed recently, it remains to be seen how they will perform given the ongoing evolution of SARS-CoV-2 and the decline in testing and patient-based genomic surveillance. Here, we discuss opportunities and challenges associated with WWGS, including sample preparation, sequencing technology, and bioinformatics methods.

Submitted 30 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: V Munteanu and M Saldana contributed equally to this work. M Hölzer, A Smith and S Mangul jointly supervised this work. For correspondence: serghei.mangul@gmail.com

arXiv:2309.12960 [pdf, other]

Nested Event Extraction upon Pivot Element Recognition

Authors: Weicheng Ren, Zixuan Li, Xiaolong Jin, Long Bai, Miao Su, Yantao Liu, Saiping Guan, Jiafeng Guo, Xueqi Cheng

Abstract: Nested Event Extraction (NEE) aims to extract complex event structures where an event contains other events as its arguments recursively. Nested events involve a kind of Pivot Elements (PEs) that simultaneously act as arguments of outer-nest events and as triggers of inner-nest events, and thus connect them into nested structures. This special characteristic of PEs brings challenges to existing NEE methods, as they cannot cope well with the dual identities of PEs. Therefore, this paper proposes a new model, called PerNee, which extracts nested events mainly based on recognizing PEs. Specifically, PerNee first recognizes the triggers of both inner-nest and outer-nest events and further recognizes the PEs via classifying the relation type between trigger pairs. The model uses prompt learning to incorporate information from both event types and argument roles for better trigger and argument representations to improve NEE performance. Since existing NEE datasets (e.g., Genia11) are limited to specific domains and contain a narrow range of event types with nested structures, we systematically categorize nested events in the generic domain and construct a new NEE dataset, called ACE2005-Nest. Experimental results demonstrate that PerNee consistently achieves state-of-the-art performance on ACE2005-Nest, Genia11, and Genia13. The ACE2005-Nest dataset and the code of the PerNee model are available at https://github.com/waysonren/PerNee.

Submitted 7 April, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: Accepted at LREC-COLING 2024

arXiv:2309.09872 [pdf, ps, other]

Moment-assisted GMM for Improving Subsampling-based MLE with Large-scale data

Authors: Miaomiao Su, Qihua Wang, Ruoyu Wang

Abstract: The maximum likelihood estimation is computationally demanding for large datasets, particularly when the likelihood function includes integrals. Subsampling can reduce the computational burden, but it typically results in efficiency loss. This paper proposes a moment-assisted subsampling (MAS) method that can improve the estimation efficiency of existing subsampling-based maximum likelihood estimators. The motivation behind this approach stems from the fact that sample moments can be efficiently computed even if the sample size of the whole data set is huge. Through the generalized method of moments, the proposed method incorporates informative sample moments of the whole data. The MAS estimator can be computed rapidly and is asymptotically normal with a smaller asymptotic variance than the corresponding estimator without incorporating sample moments of the whole data. The asymptotic variance of the MAS estimator depends on the specific sample moments incorporated. We derive the optimal moment that minimizes the resulting asymptotic variance in terms of Loewner order. Simulation studies and real data analysis were conducted to compare the proposed method with existing subsampling methods. Numerical results demonstrate the promising performance of the MAS method across various scenarios.

Submitted 20 July, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

arXiv:2308.02269 [pdf, other]

Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph

Authors: Hiroki Arimura, Shunsuke Inenaga, Yasuaki Kobayashi, Yuto Nakashima, Mizuki Sue

Abstract: In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size $e$ for a text $T$ of length $n$ into other text indexing structures for the same text, suitable for highly repetitive texts: the run-length BWT of size $r$, the irreducible PLCP array of size $r$, and the quasi-irreducible LPF array of size $e$, as well as the lex-parse of size $O(r)$ and the LZ77-parse of size $z$, where $r, z \le e$. As main results, we showed that the above structures can be optimally computed from either the CDAWG for $T$ stored in read-only memory or its self-index version of size $e$ without a text in $O(e)$ worst-case time and words of working space. To obtain the above results, we devised techniques for enumerating a particular subset of suffixes in the lexicographic and text orders using the forward and backward search on the CDAWG by extending the results by Belazzougui et al. in 2015.

Submitted 4 August, 2023; originally announced August 2023.

Comments: The short version of this paper will appear in SPIRE 2023, Pisa, Italy, September 26-28, 2023, Lecture Notes in Computer Science, Springer

arXiv:2306.07505 [pdf]

Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. 305 patients were enrolled from 12 hospitals, and finally 265 patients were included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus constructed models to predict GEV and HRV. Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images, and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the value of LSM and SSM. Moreover, DLRP was better than the model using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperform LSM (p < 0.01). Conclusion: DLRP shows excellent performance in predicting GEV and HRV over canonical risk indicators LSM and SSM. Additionally, the 2D-SWE images of SSM provided more information for better accuracy in predicting HRV than the LSM.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2304.06292 [pdf, ps, other]

Improved Naive Bayes with Mislabeled Data

Authors: Qianhan Zeng, Yingqiu Zhu, Xuening Zhu, Feifei Wang, Weichen Zhao, Shuning Sun, Meng Su, Hansheng Wang

Abstract: Labeling mistakes are frequently encountered in real-world applications. If not treated well, the labeling mistakes can deteriorate the classification performances of a model seriously. To address this issue, we propose an improved Naive Bayes method for text classification. It is analytically simple and free of subjective judgements on the correct and incorrect labels. By specifying the generating mechanism of incorrect labels, we optimize the corresponding log-likelihood function iteratively by using an EM algorithm. Our simulation and experiment results show that the improved Naive Bayes method greatly improves the performances of the Naive Bayes method with mislabeled data.

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2212.04772 [pdf]

doi 10.1021/acs.inorgchem.1c03675

Telluride nanocrystals with adjustable amorphous shell thickness and core-shell structure modulation by aqueous cation-exchange

Authors: Xinyuan Li, Mengyao Su, Yi-Chi Wang, Meng Xu, Minman Tong, Sarah J. Haigh, Jiatao Zhang

Abstract: Engineering the structure of core-shell colloidal semiconductor nanoparticles (CSNPs) is attractive due to the potential to enhance photo-induced charge transfer (PICT) and induce favourable optical and electronic properties. Nonetheless, the sensitivity of telluride CSNPs to high temperatures makes it challenging to precisely modulate their surface crystallinity. Herein, we have developed an efficient strategy for synthesising telluride CSNPs with thin amorphous shells using aqueous cation exchange (ACE). By changing the synthesis temperature in the range 40 to 110 °C, the crystallinity of the CdTe nanoparticles was controllable from perfect crystals with no detectable amorphous shell (c-CdTe) to a core-shell structure with a crystalline CdTe NP core covered by an amorphous shell of tunable thickness up to 7-8 nm (c@a-CdTe). A second ACE step transformed the c@a-CdTe to crystalline CdTe@HgTe core-shell NPs. The c@a-CdTe nanoparticles synthesized at 60 °C, which have a 4-5 nm thick amorphous shell, exhibited the highest surface-enhanced Raman scattering activity with a high enhancement factor of around 8.82x10^5, attributed to the coupling between the amorphous shell and the crystalline core.

Submitted 9 December, 2022; originally announced December 2022.

Comments: 15 pages, 5 figures, plus supplementary information

Journal ref: Inorganic Chemistry 61 (2022) 3989

arXiv:2212.03741 [pdf, other]

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation

Authors: Ronghui Li, Junfan Zhao, Yachao Zhang, Mingyang Su, Zeping Ren, Han Zhang, Yansong Tang, Xiu Li

Abstract: Generating full-body and multi-genre dance sequences from given music is a challenging task, due to the limitations of existing datasets and the inherent complexity of the fine-grained hand motion and dance genres. To address these problems, we propose FineDance, which contains 14.6 hours of music-dance paired data, with fine-grained hand motions, fine-grained genres (22 dance genres), and accurate posture. To the best of our knowledge, FineDance is the largest music-dance paired dataset with the most dance genres. Additionally, to address the monotonous and unnatural hand movements of previous methods, we propose a full-body dance generation network, which utilizes the diverse generation capabilities of the diffusion model to solve monotonous problems, and uses expert nets to solve unreal problems. To further enhance the genre-matching and long-term stability of generated dances, we propose a Genre&Coherent aware Retrieval Module. Besides, we propose a novel metric named Genre Matching Score to evaluate the genre-matching degree between dance and music. Quantitative and qualitative experiments demonstrate the quality of FineDance, and the state-of-the-art performance of FineNet. The FineDance Dataset and more qualitative samples can be found at our website.

Submitted 30 August, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: Accepted by ICCV 2023

arXiv:2210.16190 [pdf]

doi 10.1038/s41524-023-01130-4

Transferable E(3) equivariant parameterization for Hamiltonian of molecules and solids

Authors: Yang Zhong, Hongyu Yu, Mao Su, Xingao Gong, Hongjun Xiang

Abstract: Using the message-passing mechanism in machine learning (ML) instead of self-consistent iterations to directly build the mapping from structures to electronic Hamiltonian matrices will greatly improve the efficiency of density functional theory (DFT) calculations. In this work, we proposed a general analytic Hamiltonian representation in an E(3) equivariant framework, which can fit the ab initio Hamiltonian of molecules and solids by a completely data-driven method and is equivariant under rotation, space inversion, and time reversal operations. Our model reached state-of-the-art precision in the benchmark test and accurately predicted the electronic Hamiltonian matrices and related properties of various periodic and aperiodic systems, showing high transferability and generalization ability. This framework provides a general transferable model that can be used to accelerate the electronic structure calculations on different large systems with the same network weights trained on small structures.

Submitted 4 February, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: 33 pages, 6 figures

arXiv:2209.04260 [pdf, other]

doi 10.1103/PhysRevD.106.063026

Search for relativistic fractionally charged particles in space

Authors: DAMPE Collaboration, F. Alemanno, C. Altomare, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De-Benedittis, I. De Mitri, F. de Palma, M. Deliyergiyev, A. Di Giovanni, M. Di Santo, et al. (126 additional authors not shown)

Abstract: More than a century after the performance of the oil drop experiment, the possible existence of fractionally charged particles (FCPs) still remains unsettled. The search for FCPs is crucial for some extensions of the Standard Model in particle physics. Most of the previously conducted searches for FCPs in cosmic rays were based on experiments underground or at high altitudes. However, there have been few searches for FCPs in cosmic rays carried out in orbit other than AMS-01 flown by a space shuttle and BESS by a balloon at the top of the atmosphere. In this study, we conduct an FCP search in space based on on-orbit data obtained using the DArk Matter Particle Explorer (DAMPE) satellite over a period of five years. Unlike underground experiments, which require an FCP energy of the order of hundreds of GeV, our FCP search starts at only a few GeV. An upper limit of $6.2\times 10^{-10}~~\mathrm{cm^{-2}sr^{-1} s^{-1}}$ is obtained for the flux. Our results demonstrate that DAMPE exhibits higher sensitivity than experiments of similar types by three orders of magnitude, which more stringently restricts the conditions for the existence of FCPs in primary cosmic rays.

Submitted 9 September, 2022; originally announced September 2022.

Comments: 19 pages, 6 figures, accepted by PRD

Report number: 106, 063026

Journal ref: Physical Review D 106.6 (2022): 063026

arXiv:2207.11651 [pdf]

Improved Multi-Dimensional Bee Colony Algorithm for Airport Freight Station Scheduling

Authors: Haiquan Wang, Menghao Su, Ran Zhao, Xiaobin Xu, Hans-Dietrich Haasis, Jianhua Wei, Shengjun Wen, Yan Wang, Ping Liu, Hongjun Li

Abstract: Due to the rapid increase of air cargo and postal transport volume, an efficient automated multi-dimensional warehouse with elevating transfer vehicles (ETVs) should be established and an effective scheduling strategy should be designed for improving the cargo handling efficiency. In this paper, the artificial bee colony (ABC) algorithm, which possesses strong global optimization ability and few parameters, is first introduced to simultaneously optimize the route of the ETV and the assignment of entrances and exits. Moreover, to further improve the optimization performance of ABC, a novel full-dimensional search strategy with parallelization and a random multi-dimensional search strategy are incorporated in the framework of ABC to improve the diversity of the population and the convergence speed, respectively. Our proposed algorithms are evaluated on several benchmark functions, and then applied to solve the combinatorial optimization problem with multiple tasks and multiple entrances and exits in an air cargo terminal. The simulations show that the proposed algorithms can achieve much better performance than the traditional artificial bee colony algorithm at balancing the exploitation and exploration abilities.

Submitted 23 July, 2022; originally announced July 2022.

arXiv:2206.00447 [pdf, other]

CD$^2$: Fine-grained 3D Mesh Reconstruction With Twice Chamfer Distance

Authors: Rongfei Zeng, Mai Su, Ruiyun Yu, Xingwei Wang

Abstract: Monocular 3D reconstruction is to reconstruct the shape of an object and its other information from a single RGB image. In 3D reconstruction, polygon mesh, with detailed surface information and low computational cost, is the most prevalent expression form obtained from deep learning models. However, the state-of-the-art schemes fail to directly generate well-structured meshes, and we identify that most meshes have severe Vertices Clustering (VC) and Illegal Twist (IT) problems. By analyzing the mesh deformation process, we pinpoint that the inappropriate usage of Chamfer Distance (CD) loss is a root cause of VC and IT problems in deep learning models. In this paper, we first demonstrate these two problems induced by CD loss with visual examples and quantitative analyses. Then, we propose a fine-grained reconstruction method CD$^2$ by employing Chamfer distance twice to perform a plausible and adaptive deformation. Extensive experiments on two 3D datasets and comparisons with five latest schemes demonstrate that our CD$^2$ directly generates a well-structured mesh and outperforms others in terms of several quantitative metrics.

Submitted 29 January, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Just accepted by TOMM
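
For readers unfamiliar with the Chamfer Distance named in the abstract above, the following is a minimal sketch of one common definition: the mean squared nearest-neighbour distance from each set to the other, summed over both directions. It is not the CD$^2$ method itself, whose details the abstract does not give; the function name `chamfer_distance` and the choice of squared distances and means are assumptions made for illustration.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed convention (illustrative): mean squared nearest-neighbour
    distance in each direction, summed. Other conventions use unsquared
    distances or sums instead of means.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def sq_dist(p, q):
        # Squared Euclidean distance between two points of equal dimension.
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src, dst):
        # For every point in src, find its nearest neighbour in dst,
        # then average those squared distances.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    target = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(predicted, target))
```

This brute-force version costs O(|a|·|b|) distance evaluations, which is fine for small examples; practical mesh pipelines typically use vectorised or tree-based nearest-neighbour search instead.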

arXiv:2205.12762 [pdf, other]

doi 10.1103/PhysRevA.105.052432

On-demand multimode optical storage in a laser-written on-chip waveguide

Authors: Ming-Xu Su, Tian-Xiang Zhu, Chao Liu, Zong-Quan Zhou, Chuan-Feng Li, Guang-Can Guo

Abstract: Quantum memory is a fundamental building block for large-scale quantum networks. On-demand optical storage with a large bandwidth, a high multimode capacity and an integrated structure simultaneously is crucial for practical application. However, this has not been demonstrated yet. Here, we fabricate an on-chip waveguide in a $\mathrm {^{151}Eu^{3+}:Y_2SiO_5}$ crystal with insertion losses of 0.2 dB, and propose a novel pumping scheme to enable spin-wave atomic frequency comb (AFC) storage with a bandwidth of 11 MHz inside the waveguide. Based on this, we demonstrate the storage of 200 temporal modes using the AFC scheme and conditional on-demand storage of 100 temporal modes using the spin-wave AFC scheme. The interference visibility between the readout light field and the reference light field is $99.0\% \pm 0.6\%$ and $97\% \pm 3\%$ for AFC and spin-wave AFC storage, respectively, indicating the coherent nature of this low-loss, multimode and integrated storage device.

Submitted 25 May, 2022; originally announced May 2022.

Journal ref: Phys. Rev. A 105, 052432 (2022)

arXiv:2205.09127 [pdf, other]

doi 10.1007/JHEP08(2022)140

Naturally Light Dirac and Pseudo-Dirac Neutrinos from Left-Right Symmetry

Authors: K. S. Babu, Xiao-Gang He, Mingxian Su, Anil Thapa

Abstract: We develop a class of left-right symmetric theories based on the gauge group $SU(3)_c \times SU(2)_L \times SU(2)_R \times U(1)$ with a generalized seesaw mechanism for generating the charged fermion masses. Neutrinos are naturally Dirac particles in this setup with their small masses arising from two-loop quantum corrections. We evaluate these two-loop diagrams exactly and analyze the flavor structure of the lepton sector. We find excellent fits to neutrino oscillation data, independent of the right-handed gauge symmetry breaking scale. We also explore the possibility that neutrinos are pseudo-Dirac particles in this framework, with the tiny mass splittings between active and sterile neutrinos arising from Planck-induced corrections and find possible realizations. These models can be tested in the near future with precision cosmological measurements of $ΔN_{\rm eff}$ in CMB which is predicted to be $\simeq 0.14$. This class of models allows for a solution to the strong CP problem via parity symmetry without the need for an axion.

Submitted 18 May, 2022; originally announced May 2022.

Comments: 18 pages and references, 4 figures

arXiv:2205.05475 [pdf]

Efficient determination of the Hamiltonian and electronic properties using graph neural network with complete local coordinates

Authors: Mao Su, Ji-Hui Yang, Hong-Jun Xiang, Xin-Gao Gong

Abstract: Despite the successes of machine learning methods in physical sciences, prediction of the Hamiltonian, and thus electronic properties, is still unsatisfactory. Here, based on graph neural network architecture, we present an extendable neural network model to determine the Hamiltonian from ab initio data, with only local atomic structures as inputs. Rotational equivariance of the Hamiltonian is achieved by our complete local coordinates. The local coordinates information, encoded using the convolutional neural network and designed to preserve Hermitian symmetry, is used to map hopping parameters onto local structures. We demonstrate the performance of our model using graphene and SiGe random alloys as examples. We show that our neural network model, although trained using small-size systems, can predict the Hamiltonian, as well as electronic properties such as band structures and densities of states (DOS) for large-size systems within the ab initio accuracy, justifying its extensibility. In combination with the high efficiency of our model, which takes only seconds to get the Hamiltonian of a 1728-atom system, the present work provides a general framework to predict electronic properties efficiently and accurately, which provides new insights into computational physics and will accelerate the research for large-scale materials.

Submitted 11 January, 2023; v1 submitted 11 May, 2022; originally announced May 2022.

Comments: 21 pages, 7 figures

arXiv:2204.01601 [pdf, other]

Towards Privacy-Preserving and Verifiable Federated Matrix Factorization

Authors: Xicheng Wan, Yifeng Zheng, Qun Li, Anmin Fu, Mang Su, Yansong Gao

Abstract: Recent years have witnessed the rapid growth of federated learning (FL), an emerging privacy-aware machine learning paradigm that allows collaborative learning over isolated datasets distributed across multiple participants. The salient feature of FL is that the participants can keep their private datasets local and only share model updates. Very recently, some research efforts have been initiated to explore the applicability of FL for matrix factorization (MF), a prevalent method used in modern recommendation systems and services. It has been shown that sharing the gradient updates in federated MF entails the privacy risk of revealing users' personal ratings, posing a demand for protecting the shared gradients. Prior art is limited in that it incurs notable accuracy loss or relies on heavy cryptosystems, with a weak threat model assumed. In this paper, we propose VPFedMF, a new design aimed at privacy-preserving and verifiable federated MF. VPFedMF provides guarantees on the confidentiality of individual gradient updates through lightweight and secure aggregation. Moreover, VPFedMF newly supports correctness verification of the aggregation results produced by the coordinating server in federated MF. Experiments on a real-world movie rating dataset demonstrate the practical performance of VPFedMF in terms of computation, communication, and accuracy.

Submitted 11 June, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: Accepted by Knowledge-Based Systems

arXiv:2201.03691 [pdf, other]

doi 10.1103/PhysRevLett.128.180501

On-demand Integrated Quantum Memory for Polarization Qubits

Authors: Tian-Xiang Zhu, Chao Liu, Ming Jin, Ming-Xu Su, Yu-Ping Liu, Wen-Juan Li, Yang Ye, Zong-Quan Zhou, Chuan-Feng Li, Guang-Can Guo

Abstract: Photonic polarization qubits are widely used in quantum computation and quantum communication due to the robustness in transmission and the easy qubit manipulation. An integrated quantum memory for polarization qubits is a fundamental building block for large-scale integrated quantum networks. However, on-demand storage of polarization qubits in an integrated quantum memory is a long-standing challenge due to the anisotropic absorption of solids and the polarization-dependent features of microstructures. Here we demonstrate a reliable on-demand quantum memory for polarization qubits, using a depressed-cladding waveguide fabricated in a 151Eu3+: Y2SiO5 crystal. The site-2 151Eu3+ ions in Y2SiO5 crystal provide a near-uniform absorption for arbitrary polarization states and a new pump sequence is developed to prepare a wideband and enhanced absorption profile. A fidelity of 99.4 ± 0.6% is obtained for the qubit storage process with an input of 0.32 photons per pulse, together with a storage bandwidth of 10 MHz. This reliable integrated quantum memory for polarization qubits reveals the potential for use in the construction of integrated quantum networks.

Submitted 10 January, 2022; originally announced January 2022.

Comments: 20 pages, 5 figures

MSC Class: 81P45

arXiv:2112.08860 [pdf, other]

doi 10.1016/j.scib.2021.12.015

Search for gamma-ray spectral lines with the DArk Matter Particle Explorer

Authors: Francesca Alemanno, Qi An, Philipp Azzarello, Felicia Carla Tiziana Barbato, Paolo Bernardini, Xiao-Jun Bi, Ming-Sheng Cai, Elisabetta Casilli, Enrico Catanzani, Jin Chang, Deng-Yi Chen, Jun-Ling Chen, Zhan-Fang Chen, Ming-Yang Cui, Tian-Shu Cui, Yu-Xing Cui, Hao-Ting Dai, Antonio De Benedittis, Ivan De Mitri, Francesco de Palma, Maksym Deliyergiyev, Margherita Di Santo, Qi Ding, Tie-Kuang Dong, Zhen-Xing Dong, et al. (121 additional authors not shown)

Abstract: The DArk Matter Particle Explorer (DAMPE) is well suited to searching for monochromatic and sharp $γ$-ray structures in the GeV$-$TeV range thanks to its unprecedented high energy resolution. In this work, we search for $γ$-ray line structures using five years of DAMPE data. To improve the sensitivity, we develop two types of dedicated data sets (including the BgoOnly data, which is used here for the first time in the data analysis for calorimeter-based gamma-ray observatories) and adopt the signal-to-noise ratio optimized regions of interest (ROIs) for different DM density profiles. No line signals or candidates are found between 10 and 300 GeV in the Galaxy. The constraints on the velocity-averaged cross section for $χχ\to γγ$ and the decay lifetime for $χ\to γν$, both at 95% confidence level, have been calculated and the systematic uncertainties have been taken into account. Compared to the previous Fermi-LAT results, though DAMPE has an acceptance smaller by a factor of $\sim 10$, similar constraints on the DM parameters are achieved and below 100 GeV the lower limits on the decay lifetime are even stronger by a factor of a few. Our results demonstrate the potential of high-energy-resolution observations for dark matter detection.

Submitted 6 December, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: 14 pages, 8 figures. Update the content to keep up with the published version

Journal ref: Science Bulletin, Volume 67, Issue 7, 15 April 2022, Pages 679-684

arXiv:2112.01215 [pdf]

Adaptive Group Collaborative Artificial Bee Colony Algorithm

Authors: Haiquan Wang, Hans-Dietrich Haasis, Panpan Du, Xiaobin Xu, Menghao Su, Shengjun Wen, Wenxuan Yue, Shanshan Zhang

Abstract: As an effective algorithm for solving complex optimization problems, the artificial bee colony (ABC) algorithm has been shown to be competitive, but like other population-based algorithms, it is poor at balancing global searching in the whole solution space (exploration) and quick searching in the local solution space (exploitation). To improve the performance of ABC, an adaptive group collaborative ABC (AgABC) algorithm is introduced in which the population in different phases is divided into specific groups, different search strategies with different abilities are assigned to the members in each group, and the member or strategy which obtains the best solution is employed for further searching. Experimental results on benchmark functions show that the proposed algorithm with dynamic mechanism is superior to other algorithms in searching accuracy and stability. Furthermore, numerical experiments show that the proposed method can generate the optimal solution for the complex scheduling problem.

Submitted 2 December, 2021; originally announced December 2021.

arXiv:2112.00447 [pdf]

An improved bearing fault detection strategy based on artificial bee colony algorithm

Authors: Haiquan Wang, Wenxuan Yue, Shengjun Wen, Xiaobin Xu, Menghao Su, Shanshan Zhang, Panpan Du

Abstract: The operating state of a bearing directly affects the performance of rotating machinery, and how to accurately and decisively extract features from the original vibration signal and recognize the faulty parts as early as possible is critical. In this study, the one-dimensional ternary model, which has been proved to be an effective statistical method in feature selection, is introduced, and shapelets transformation is proposed to calculate its parameter, which is also the standard deviation of the transformed shapelets and is usually selected by trial and error. Moreover, XGBoost is used to recognize the faults from the obtained features, and an improved artificial bee colony (ABC) algorithm, where the evolution is guided by the importance indices of different search spaces, is proposed to optimize the parameters of XGBoost. Here the value of the importance index is related to the probability of optimal solutions in a certain space, thus the problem of easily falling into local optimality in traditional ABC could be avoided. The experimental results based on the failure vibration signal samples show that the average accuracy of fault signal recognition can reach 97%, which is much higher than the ones corresponding to other extraction strategies, thus the ability of extraction could be improved. And with the improved artificial bee colony algorithm used to optimize the parameters of XGBoost, the classification accuracy could be improved from 97.02% to about 98.60% compared with the traditional classification strategy.

Submitted 2 December, 2021; v1 submitted 1 December, 2021; originally announced December 2021.

arXiv:2110.13499 [pdf, other]

SEDML: Securely and Efficiently Harnessing Distributed Knowledge in Machine Learning

Authors: Yansong Gao, Qun Li, Yifeng Zheng, Guohong Wang, Jiannan Wei, Mang Su

Abstract: Training high-performing deep learning models requires a rich amount of data which is usually distributed among multiple data sources in practice. Simply centralizing these multi-sourced data for training would raise critical security and privacy concerns, and might be prohibited given the increasingly strict data regulations. To resolve the tension between privacy and data utilization in distributed learning, a machine learning framework called private aggregation of teacher ensembles (PATE) has been recently proposed. PATE harnesses the knowledge (label predictions for an unlabeled dataset) from distributed teacher models to train a student model, obviating access to distributed datasets. Despite being enticing, PATE does not offer protection for the individual label predictions from teacher models, which still entails privacy risks. In this paper, we propose SEDML, a new protocol which allows the distributed knowledge in machine learning to be harnessed securely and efficiently. SEDML builds on lightweight cryptography and provides strong protection for the individual label predictions, as well as differential privacy guarantees on the aggregation results. Extensive evaluations show that while providing privacy protection, SEDML preserves the accuracy as in the plaintext baseline. Meanwhile, SEDML's performance in computing and communication is 43 times and 1.23 times higher than the latest technology, respectively.

Submitted 26 October, 2021; originally announced October 2021.
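
As background for the PATE framework mentioned in the abstract above, the sketch below shows the plaintext noisy plurality vote commonly used to aggregate teacher label predictions: per-class vote counts are perturbed with Laplace noise and the class with the largest noisy count is released. It does not implement SEDML's cryptographic protection, which the abstract does not specify; the function name `noisy_plurality_label` and its parameters are hypothetical names chosen for this illustration.

```python
import random
from collections import Counter


def noisy_plurality_label(teacher_votes, num_classes, noise_scale, rng=None):
    """Return the class with the highest Laplace-perturbed vote count.

    teacher_votes: iterable of integer class labels, one per teacher.
    noise_scale:   Laplace scale b (illustrative parameter); 0 disables noise.
    rng:           optional random.Random instance for reproducibility.
    """
    if noise_scale < 0:
        raise ValueError("noise_scale must be non-negative")
    rng = rng or random.Random()
    counts = Counter(teacher_votes)

    def laplace(scale):
        if scale == 0:
            return 0.0
        # The difference of two i.i.d. exponential variables with mean
        # `scale` follows a Laplace distribution with that scale.
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    noisy_counts = [counts.get(c, 0) + laplace(noise_scale)
                    for c in range(num_classes)]
    # Release only the arg-max class, never the individual votes.
    return max(range(num_classes), key=lambda c: noisy_counts[c])


if __name__ == "__main__":
    votes = [1, 1, 2, 0, 1]  # five hypothetical teachers, three classes
    print(noisy_plurality_label(votes, num_classes=3, noise_scale=0.0))
```

In this plaintext form the aggregator still sees every teacher's vote, which is the gap the abstract says SEDML is designed to close.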

arXiv:2110.00123 [pdf, other]

doi 10.3847/2041-8213/ac2de6

Observations of Forbush Decreases of cosmic ray electrons and positrons with the Dark Matter Particle Explorer

Authors: Francesca Alemanno, Qi An, Philipp Azzarello, Felicia Carla Tiziana Barbato, Paolo Bernardini, XiaoJun Bi, MingSheng Cai, Elisabetta Casilli, Enrico Catanzani, Jin Chang, DengYi Chen, JunLing Chen, ZhanFang Chen, MingYang Cui, TianShu Cui, YuXing Cui, HaoTing Dai, Antonio De Benedittis, Ivan De Mitri, Francesco de Palma, Maksym Deliyergiyev, Margherita Di Santo, Qi Ding, TieKuang Dong, ZhenXing Dong , et al. (124 additional authors not shown)

Abstract: The Forbush Decrease (FD) represents the rapid decrease of the intensities of charged particles accompanying coronal mass ejections (CMEs) or high-speed streams from coronal holes. It has mainly been explored with ground-based neutron monitor networks, which indirectly measure the integrated intensities of all species of cosmic rays by counting secondary neutrons produced in interactions between atmospheric atoms and cosmic rays. Space-based experiments can resolve the species of particles, but their energy ranges are limited by the relatively small acceptances, except for the most abundant particles such as protons and helium. Therefore, the FD of cosmic ray electrons and positrons has so far been investigated only by the PAMELA experiment, in the low energy range ($<5$ GeV) and with limited statistics. In this paper, we study the FD event that occurred in September 2017, using the electron and positron data recorded by the Dark Matter Particle Explorer. The evolution of the FDs from 2 GeV to 20 GeV is given with a time resolution of 6 hours. We observe two solar energetic particle events in the time profile of the intensity of cosmic rays; the earlier, weaker one has not been shown in the neutron monitor data. Furthermore, both the amplitude and the recovery time of the electron and positron fluxes show a clear energy dependence, which is important in probing the disturbances of the interplanetary environment by coronal mass ejections.

Submitted 30 September, 2021; originally announced October 2021.

Comments: This article is dedicated to the 72nd anniversary of the People's Republic of China

arXiv:2108.02341 [pdf, other]

doi 10.1103/PhysRevD.104.095017

Loop effect with vector mediator in the coherent neutrino-nucleus scattering

Authors: Wei Chao, Tong Li, Jiajun Liao, Min Su

Abstract: The observation of coherent elastic neutrino-nucleus scattering (CE$\nu$NS) provides opportunities to explore a wide class of new physics. In the Standard Model (SM), the CE$\nu$NS process arises from the vector and axial-vector neutral currents through the exchange of the $Z$ boson, and the axial-vector current contribution turns out to be subdominant. It is thus natural to consider the extra contributions to CE$\nu$NS from more generic new physics beyond the SM with (axial-)vector interactions associated with a new vector mediator $Z'$. Besides the ordinary CE$\nu$NS, the active neutrinos can convert into a new exotic fermion $\chi$ through the process $\nu N\to \chi N$ mediated by $Z'$ without violating the coherence. It would be interesting to consider the implication of this conversion for the new fermion sector beyond the SM. We consider the framework of a simplified neutrino model in which a new Dirac fermion $\chi$ interacts with active neutrinos and a leptophobic vector mediator $Z'$. We evaluate both the tree-level and loop-level contributions to the CE$\nu$NS; in particular, the loop diagrams produce the active neutrino elastic scattering process $\nu N\to \nu N$ with the fermion $\chi$ inside the loops. When the interaction between $Z'$ and the SM quarks is of vector type and axial-vector type, the CE$\nu$NS processes are dominated by the tree-level and loop-level contributions, respectively. We investigate the constraints on the model parameters by fitting to the COHERENT data, assuming a wide range of $m_\chi$. The parameter space with $m_\chi$ larger than the maximal energy of incoming neutrinos can be constrained by including the loop-level contribution. More importantly, the inclusion of loop diagrams can place constraints on the axial-vector interaction, whose tree-level process is absent in coherent neutrino-nucleus scattering.

Submitted 20 November, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: 19 pages, 4 figures. Version published in PRD

arXiv:2107.13208 [pdf, ps, other]

Optimal gamma-ray selections for monochromatic line searches with DAMPE

Authors: Zun-Lei Xu, Kai-Kai Duan, Wei Jiang, Shi-Jun Lei, Xiang Li, Zhao-Qiang Shen, Tao Ma, Meng Su, Qiang Yuan, Chuan Yue, Yi-Zhong Fan, Jin Chang

Abstract: The DArk Matter Particle Explorer (DAMPE) is a space-based high-energy cosmic-ray detector covering a wide energy band with a high energy resolution. One of the key scientific goals of DAMPE is to carry out indirect detection of dark matter by searching for high-energy gamma-ray line structure. To improve the sensitivity of the gamma-ray line search with DAMPE, it is crucial to improve the acceptance and energy resolution for gamma-ray photons. In this paper, we quantitatively prove that the photon sample with the largest ratio of acceptance to energy resolution is optimal for the line search. We therefore develop a line-search sample specifically optimized for the line search. Meanwhile, in order to increase the statistics, we also select the so-called BGO-only photons, which convert into $e^+e^-$ pairs only in the BGO calorimeter. The standard, the line-search, and the BGO-only photon samples are then tested for the line search individually and collectively. The results show that a significantly improved limit can be obtained from an appropriate combination of the data sets, and the improvement is about 20\% in the best case compared with using the standard sample only.

Submitted 11 November, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

arXiv:2106.12288 [pdf, other]

MG-DVD: A Real-time Framework for Malware Variant Detection Based on Dynamic Heterogeneous Graph Learning

Authors: Chen Liu, Bo Li, Jun Zhao, Ming Su, Xu-Dong Liu

Abstract: Detecting newly emerging malware variants in real time is crucial for mitigating cyber risks and proactively blocking intrusions. In this paper, we propose MG-DVD, a novel detection framework based on dynamic heterogeneous graph learning, to detect malware variants in real time. In particular, MG-DVD first models the fine-grained execution event streams of malware variants as dynamic heterogeneous graphs and investigates real-world meta-graphs between malware objects, which can effectively characterize more discriminative malicious evolutionary patterns between malware and their variants. Then, MG-DVD presents two dynamic walk-based heterogeneous graph learning methods to learn more comprehensive representations of malware variants, which significantly reduces the cost of retraining the entire graph. As a result, MG-DVD is able to detect malware variants in real time, and it offers better interpretability by introducing meaningful meta-graphs. Comprehensive experiments on large-scale samples show that our proposed MG-DVD outperforms state-of-the-art methods in detecting malware variants in terms of effectiveness and efficiency.

Submitted 24 June, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

Comments: 8 pages, 7 figures, accepted at the 30th International Joint Conference on Artificial Intelligence (IJCAI 2021)

Showing 1–50 of 141 results for author: Su, M