-
Uni-ELF: A Multi-Level Representation Learning Framework for Electrolyte Formulation Design
Authors:
Boshen Zeng,
Sian Chen,
Xinxin Liu,
Changhong Chen,
Bin Deng,
Xiaoxu Wang,
Zhifeng Gao,
Yuzhi Zhang,
Weinan E,
Linfeng Zhang
Abstract:
Advancements in lithium battery technology heavily rely on the design and engineering of electrolytes. However, current schemes for molecular design and recipe optimization of electrolytes lack an effective computational-experimental closed loop and often fall short in accurately predicting diverse electrolyte formulation properties. In this work, we introduce Uni-ELF, a novel multi-level representation learning framework to advance electrolyte design. Our approach involves two-stage pretraining: reconstructing three-dimensional molecular structures at the molecular level using the Uni-Mol model, and predicting statistical structural properties (e.g., radial distribution functions) from molecular dynamics simulations at the mixture level. Through this comprehensive pretraining, Uni-ELF is able to capture intricate molecular and mixture-level information, which significantly enhances its predictive capability. As a result, Uni-ELF substantially outperforms state-of-the-art methods in predicting both molecular properties (e.g., melting point, boiling point, synthesizability) and formulation properties (e.g., conductivity, Coulombic efficiency). Moreover, Uni-ELF can be seamlessly integrated into an automatic experimental design workflow. We believe this innovative framework will pave the way for automated AI-based electrolyte design and engineering.
Submitted 8 July, 2024;
originally announced July 2024.
-
$\text{Memory}^3$: Language Modeling with Explicit Memory
Authors:
Hongkang Yang,
Zehao Lin,
Wenjin Wang,
Hao Wu,
Zhiyu Li,
Bo Tang,
Wenqiang Wei,
Jinbo Wang,
Zeyun Tang,
Shichao Song,
Chenyang Xi,
Yu Yu,
Kai Chen,
Feiyu Xiong,
Linpeng Tang,
Weinan E
Abstract:
The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.
Submitted 1 July, 2024;
originally announced July 2024.
-
Uni-Mol2: Exploring Molecular Pretraining Model at Scale
Authors:
Xiaohong Ji,
Zhen Wang,
Zhifeng Gao,
Hang Zheng,
Linfeng Zhang,
Guolin Ke,
Weinan E
Abstract:
In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences. The significant advancements in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as the scaling laws. However, the scaling law in molecular pretraining models remains unexplored. In this work, we present Uni-Mol2, an innovative molecular pretraining model that leverages a two-track transformer to effectively integrate features at the atomic level, graph level, and geometry structure level. Along with this, we systematically investigate the scaling law within molecular pretraining models, characterizing the power-law correlations between validation loss and model size, dataset size, and computational resources. Consequently, we successfully scale Uni-Mol2 to 1.1 billion parameters through pretraining on 800 million conformations, making it the largest molecular pretraining model to date. Extensive experiments show consistent improvement in the downstream tasks as the model size grows. The Uni-Mol2 model with 1.1B parameters also outperforms existing methods, achieving an average improvement of 27% on the QM9 dataset and 14% on the COMPAS-1D dataset.
Submitted 1 July, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
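The power-law relation between validation loss and model size mentioned in the abstract can be illustrated with a small fit. This is only a sketch under stated assumptions: the functional form loss ≈ a · N^(-b), the helper name `fit_power_law`, and the synthetic numbers are all hypothetical and are not taken from the Uni-Mol2 paper.

```python
import numpy as np

def fit_power_law(x, loss):
    """Fit loss ~ a * x**(-b) by linear regression in log-log space.

    Hypothetical helper for illustration; returns (a, b).
    """
    slope, intercept = np.polyfit(np.log(x), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, made-up data (NOT results from the paper):
# losses generated exactly from a = 5.0, b = 0.1.
model_sizes = np.array([1e7, 1e8, 1e9])
synthetic_loss = 5.0 * model_sizes ** (-0.1)

a, b = fit_power_law(model_sizes, synthetic_loss)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Fitting in log-log space turns the power law into a straight line, so an ordinary least-squares line fit is enough; real loss curves may also need an irreducible-loss offset, which this sketch omits.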
-
Improving Generalization and Convergence by Enhancing Implicit Regularization
Authors:
Mingze Wang,
Haotian He,
Jinbo Wang,
Zilin Wang,
Guanhua Huang,
Feiyu Xiong,
Zhiyu Li,
Weinan E,
Lei Wu
Abstract:
In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with generic base optimizers without introducing significant computational overhead. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ speed-up compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).
Submitted 31 May, 2024;
originally announced May 2024.
-
Coarse-graining conformational dynamics with multi-dimensional generalized Langevin equation: how, when, and why
Authors:
Pinchen Xie,
Yunrui Qiu,
Weinan E
Abstract:
A data-driven ab initio generalized Langevin equation (AIGLE) approach is developed to learn and simulate high-dimensional, heterogeneous, coarse-grained conformational dynamics. Constrained by the fluctuation-dissipation theorem, the approach can build coarse-grained models in dynamical consistency with all-atom molecular dynamics. We also propose practical criteria for AIGLE to enforce long-term dynamical consistency. Case studies of a toy polymer, with 20 coarse-grained sites, and the alanine dipeptide, with two dihedral angles, elucidate why one should adopt AIGLE or its Markovian limit for modeling coarse-grained conformational dynamics in practice.
Submitted 20 May, 2024;
originally announced May 2024.
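For orientation, a common one-dimensional textbook form of the generalized Langevin equation and the fluctuation-dissipation constraint is written out below. This is a sketch only: the symbols ($m$, $F$, $K$, $R$, $\gamma$) are assumptions made for illustration, and the multi-dimensional, heterogeneous formulation used by AIGLE may differ.

```latex
% Generalized Langevin equation with memory kernel K and noise R
m\,\dot{v}(t) = F\bigl(x(t)\bigr) - \int_0^t K(t-s)\,v(s)\,\mathrm{d}s + R(t),
\qquad
\langle R(t)\,R(s)\rangle = k_B T\,K(t-s).

% Markovian limit: K(t) = 2\gamma\,\delta(t)
m\,\dot{v}(t) = F\bigl(x(t)\bigr) - \gamma\,v(t) + R(t),
\qquad
\langle R(t)\,R(s)\rangle = 2\gamma\,k_B T\,\delta(t-s).
```

In the Markovian limit the memory integral collapses to an instantaneous friction term, which is the ordinary Langevin equation.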
-
On the semiclassical bounce with strong minimal assumptions
Authors:
Wagno Cesar e Silva,
Ilya L. Shapiro
Abstract:
We explore the possibility of avoiding the cosmological singularity with a bounce solution in the early Universe. The main finding is that a simple and well-known semiclassical correction, which describes the mixing of radiation and gravity in the effective action, may provide an analytic solution with a bounce. The solution requires a positive beta function for the total radiation term and the contraction of the Universe at the initial instant. The numerical estimate shows that the bounce may occur in an acceptable range of energies, but only under strong assumptions about the particle physics beyond the Standard Model.
Submitted 26 April, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
Authors:
Mingze Wang,
Weinan E
Abstract:
We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.
Submitted 2 July, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Anchor function: a type of benchmark functions for studying language models
Authors:
Zhongwang Zhang,
Zhiwei Wang,
Junjie Yao,
Zhangchen Zhou,
Xiaolong Li,
Weinan E,
Zhi-Qin John Xu
Abstract:
Understanding transformer-based language models is becoming increasingly crucial, particularly as they play pivotal roles in advancing towards artificial general intelligence. However, language model research faces significant challenges, especially for academic research groups with constrained resources. These challenges include complex data structures, unknown target functions, high computational costs and memory requirements, and a lack of interpretability in the inference process, etc. Drawing a parallel to the use of simple models in scientific research, we propose the concept of an anchor function. This is a type of benchmark function designed for studying language models in learning tasks that follow an "anchor-key" pattern. By utilizing the concept of an anchor function, we can construct a series of functions to simulate various language tasks. The anchor function plays a role analogous to that of mice in diabetes research, particularly suitable for academic research. We demonstrate the utility of the anchor function with an example, revealing two basic operations by attention structures in language models: shifting tokens and broadcasting one token from one position to many positions. These operations are also commonly observed in large language models. The anchor function framework, therefore, opens up a series of valuable and accessible research questions for further exploration, especially for theoretical study.
Submitted 16 January, 2024;
originally announced January 2024.
-
Solving multiscale dynamical systems by deep learning
Authors:
Zhi-Qin John Xu,
Junjie Yao,
Yuxiao Yi,
Liangkai Hang,
Weinan E,
Yaoyu Zhang,
Tianhan Zhang
Abstract:
Multiscale dynamical systems, modeled by high-dimensional stiff ordinary differential equations (ODEs) with wide-ranging characteristic timescales, arise across diverse fields of science and engineering, but their numerical solvers often encounter severe efficiency bottlenecks. This paper introduces a novel DeePODE method, which consists of a global multiscale sampling method and a fitting by deep neural networks to handle multiscale systems. DeePODE's primary contribution is to address the multiscale challenge of efficiently uncovering representative training sets by combining the Monte Carlo method and the ODE system's intrinsic evolution without suffering from the "curse of dimensionality". The DeePODE method is validated in multiscale systems from diverse areas, including a predator-prey model, a power system oscillation, a battery electrolyte auto-ignition, and turbulent flames. Our methods exhibit strong generalization capabilities to unseen conditions, highlighting the power of deep learning in modeling intricate multiscale dynamical processes across science and engineering domains.
Submitted 2 January, 2024;
originally announced January 2024.
-
DPA-2: Towards a universal large atomic model for molecular and material simulation
Authors:
Duo Zhang,
Xinzijian Liu,
Xiangyu Zhang,
Chengqian Zhang,
Chun Cai,
Hangrui Bi,
Yiming Du,
Xuejian Qin,
Jiameng Huang,
Bowen Li,
Yifan Shan,
Jinzhe Zeng,
Yuzhi Zhang,
Siyuan Liu,
Yifan Li,
Junhan Chang,
Xinyan Wang,
Shuo Zhou,
Jianchuan Liu,
Xiaoshan Luo,
Zhenyu Wang,
Wanrun Jiang,
Jing Wu,
Yudi Yang,
Jiyuan Yang
, et al. (17 additional authors not shown)
Abstract:
The rapid development of artificial intelligence (AI) is driving significant changes in the field of atomic modeling, simulation, and design. AI-based potential energy models have been successfully used to perform large-scale and long-time simulations with the accuracy of ab initio electronic structure methods. However, the model generation process still hinders applications at scale. We envision that the next stage would be a model-centric ecosystem, in which a large atomic model (LAM), pre-trained on as many atomic datasets as possible and efficiently fine-tuned and distilled for downstream tasks, would serve as the new infrastructure of the field of molecular modeling. We propose DPA-2, a novel architecture for a LAM, and develop a comprehensive pipeline for model fine-tuning, distillation, and application, associated with automatic workflows. We show that DPA-2 can accurately represent a diverse range of chemical systems and materials, enabling high-quality simulations and predictions with significantly reduced efforts compared to traditional methods. Our approach paves the way for a universal large atomic model that can be widely applied in molecular and material simulation research, opening new opportunities for scientific discoveries and industrial applications.
Submitted 24 December, 2023;
originally announced December 2023.
-
SUSY QED with Lorentz-asymmetric fermionic matter and a glance at the electron's EDM
Authors:
João Paulo S. Melo,
Wagno Cesar e Silva,
José A. Helayël-Neto
Abstract:
This contribution sets out to pursue the investigation of a supersymmetric electrodynamics model with Lorentz-symmetry violation (LSV) manifested by a space-time imbalance in the propagation of the fermionic charged matter. Despite the violation of Lorentz symmetry, the supersymmetry algebra is kept untouched. We then adopt a superspace approach to build up an $\mathcal{N}=1$-supersymmetric Abelian gauge theory in the presence of a Lorentz-violating background supermultiplet that accommodates the space-time asymmetry parameter of the charged matter. We describe, in this scenario, how the particular Lorentz-symmetry breaking, brought about by the fermionic matter, affects its (matter) scalar partners and the photon/photino that minimally couple to charged matter. From the (modified) Dirac, Klein-Gordon and Maxwell field equations, we work out the corresponding dispersion relations to inspect and discuss the physical effects of the LSV Majorana fermion condensates that naturally emerge from the background supermultiplet. Finally, we investigate the Gordon decomposition of the charged lepton electromagnetic current. This is carried out by iterating the (fermion and scalar) matter field equations, which points to an effective contribution to the electron's electric dipole moment (EDM). This result allows us to attain an estimate of the pseudo-vector condensate of the (LSV) Majorana background fermion.
Submitted 18 December, 2023;
originally announced December 2023.
-
Learning Free Terminal Time Optimal Closed-loop Control of Manipulators
Authors:
Wei Hu,
Yue Zhao,
Weinan E,
Jiequn Han,
Jihao Long
Abstract:
This paper presents a novel approach to learning free terminal time closed-loop control for robotic manipulation tasks, enabling dynamic adjustment of task duration and control inputs to enhance performance. We extend the supervised learning approach, namely solving selected optimal open-loop problems and utilizing them as training data for a policy network, to the free terminal time scenario. Three main challenges are addressed in this extension. First, we introduce a marching scheme that enhances the solution quality and increases the success rate of the open-loop solver by gradually refining time discretization. Second, we extend the QRnet in Nakamura-Zimmerer et al. (2021b) to the free terminal time setting to address discontinuity and improve stability at the terminal state. Third, we present a more automated version of the initial value problem (IVP) enhanced sampling method from previous work (Zhang et al., 2022) to adaptively update the training dataset, significantly improving its quality. By integrating these techniques, we develop a closed-loop policy that operates effectively over a broad domain with varying optimal time durations, achieving near globally optimal total costs.
Submitted 29 November, 2023;
originally announced November 2023.
-
DeePTB: A deep learning-based tight-binding approach with $ab$ $initio$ accuracy
Authors:
Qiangqiang Gu,
Zhanghao Zhouyin,
Shishir Kumar Pandey,
Peng Zhang,
Linfeng Zhang,
Weinan E
Abstract:
Simulating electronic behavior in materials and devices with realistic large system sizes remains a formidable task within the $ab$ $initio$ framework. We propose DeePTB, an efficient deep learning-based tight-binding (TB) approach with $ab$ $initio$ accuracy to address this issue. By training with $ab$ $initio$ eigenvalues, our method can efficiently predict TB Hamiltonians for unseen structures. This capability facilitates efficient simulation of large-size systems under external perturbations like strain, which are vital for semiconductor band gap engineering. Moreover, DeePTB, combined with molecular dynamics, can be used to perform efficient and accurate finite temperature simulations of both atomic and electronic behavior simultaneously. This is demonstrated by computing the temperature-dependent properties of a GaP system with $10^6$ atoms.
Submitted 11 July, 2023; v1 submitted 10 July, 2023;
originally announced July 2023.
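To make the link between a tight-binding Hamiltonian and the eigenvalues it is trained against more concrete, here is a minimal sketch for a toy one-orbital ring with nearest-neighbor hopping. Everything in it is an assumption for illustration: `ring_hamiltonian`, the on-site energy `e0`, and the hopping `t` are hypothetical and unrelated to the DeePTB code or its learned parameters.

```python
import numpy as np

def ring_hamiltonian(n_sites, onsite, hopping):
    """Nearest-neighbor tight-binding Hamiltonian on a periodic ring.

    Hypothetical toy model; requires n_sites >= 3 so that each
    neighbor pair is counted exactly once.
    """
    H = onsite * np.eye(n_sites)
    for i in range(n_sites):
        j = (i + 1) % n_sites  # periodic boundary: last site couples to first
        H[i, j] += hopping
        H[j, i] += hopping
    return H

n, e0, t = 8, 0.5, -1.0  # illustrative values only
bands = np.linalg.eigvalsh(ring_hamiltonian(n, e0, t))  # ascending eigenvalues

# Analytic dispersion of the same ring: E(k) = e0 + 2 t cos(k), k = 2*pi*m/n
k = 2 * np.pi * np.arange(n) / n
analytic = np.sort(e0 + 2 * t * np.cos(k))
print(np.allclose(bands, analytic))
```

In a learned tight-binding approach the entries of `H` would come from a model rather than fixed constants, and the diagonalization step would stay the same.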
-
Invertible Coarse Graining with Physics-Informed Generative Artificial Intelligence
Authors:
Jun Zhang,
Xiaohan Lin,
Weinan E,
Yi Qin Gao
Abstract:
Multiscale molecular modeling is widely applied in scientific research of molecular properties over large time and length scales. Two specific challenges are commonly present in multiscale modeling, provided that information between the coarse and fine representations of molecules needs to be properly exchanged: One is to construct coarse grained models by passing information from the fine to coarse levels; the other is to restore finer molecular details given coarse grained configurations. Although these two problems are commonly addressed independently, in this work, we present a theory connecting them, and develop a methodology called Cycle Coarse Graining (CCG) to solve both problems in a unified manner. In CCG, reconstruction can be achieved via a tractable deep generative model, allowing retrieval of fine details from coarse-grained simulations. The reconstruction in turn delivers better coarse-grained models which are informed of the fine-grained physics, and enables calculation of the free energies in a rare-event-free manner. CCG thus provides a systematic way for multiscale molecular modeling, where the finer details of coarse-grained simulations can be efficiently retrieved, and the coarse-grained models can be improved consistently.
Submitted 20 July, 2024; v1 submitted 2 May, 2023;
originally announced May 2023.
-
DeePMD-kit v2: A software package for Deep Potential models
Authors:
Jinzhe Zeng,
Duo Zhang,
Denghui Lu,
Pinghui Mo,
Zeyu Li,
Yixiao Chen,
Marián Rynik,
Li'ang Huang,
Ziyao Li,
Shaochen Shi,
Yingze Wang,
Haotian Ye,
Ping Tuo,
Jiabin Yang,
Ye Ding,
Yifan Li,
Davide Tisi,
Qiyu Zeng,
Han Bao,
Yu Xia,
Jiameng Huang,
Koki Muraoka,
Yibo Wang,
Junhan Chang,
Fengbo Yuan
, et al. (22 additional authors not shown)
Abstract:
DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials (MLP) known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensile properties, type embedding, model deviation, Deep Potential - Range Correction (DPRc), Deep Potential Long Range (DPLR), GPU support for customized operators, model compression, non-von Neumann molecular dynamics (NVNMD), and improved usability, including documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, the article benchmarks the accuracy and efficiency of different models and discusses ongoing developments.
Submitted 18 April, 2023;
originally announced April 2023.
-
The Random Feature Method for Time-dependent Problems
Authors:
Jingrun Chen,
Weinan E,
Yixin Luo
Abstract:
We present a framework for solving time-dependent partial differential equations (PDEs) in the spirit of the random feature method. The numerical solution is constructed using a space-time partition of unity and random feature functions. Two different ways of constructing the random feature functions are investigated: feature functions that treat the spatial and temporal variables (STC) on the same footing, or functions that are the product of two random feature functions depending on spatial and temporal variables separately (SoV). Boundary and initial conditions are enforced by penalty terms. We also study two ways of solving the resulting least-squares problem: the problem is solved as a whole or solved using the block time-marching strategy. The former is termed "the space-time random feature method" (ST-RFM). Numerical results for a series of problems show that the proposed methods, i.e., ST-RFM with STC and ST-RFM with SoV, have spectral accuracy in both space and time. In addition, ST-RFM only requires collocation points, not a mesh. This is important for solving problems with complex geometry. We demonstrate this by using ST-RFM to solve a two-dimensional wave equation over a complex domain. The two strategies differ significantly in terms of the behavior in time. In the case when block time-marching is used, we prove a lower error bound that shows an exponentially growing factor with respect to the number of blocks in time. For ST-RFM, we prove an upper bound with a sublinearly growing factor with respect to the number of subdomains in time. These estimates are also confirmed by numerical results.
Submitted 13 April, 2023;
originally announced April 2023.
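The idea of solving a time-dependent PDE as one space-time least-squares problem over random features can be sketched on a toy problem. The following is a simplified illustration, not the authors' implementation: it uses a single space-time block without a partition of unity, tanh features of (x, t) treated on the same footing, and a 1D advection equation u_t + c u_x = 0 chosen here as an assumed test case; the feature count, weight range, and penalty weight are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, R, c = 200, 3.0, 1.0                 # features, weight range, advection speed (illustrative)
W = rng.uniform(-R, R, size=(M, 2))     # fixed random weights for (x, t)
b = rng.uniform(-R, R, size=M)          # fixed random biases

def features(x, t):
    """Return random features phi and their derivatives phi_x, phi_t."""
    z = np.tanh(np.outer(x, W[:, 0]) + np.outer(t, W[:, 1]) + b)
    dz = 1.0 - z**2
    return z, dz * W[:, 0], dz * W[:, 1]

# PDE residual rows at interior collocation points: u_t + c u_x = 0
g = np.linspace(0.0, 1.0, 20)
X, T = [a.ravel() for a in np.meshgrid(g, g)]
_, phi_x, phi_t = features(X, T)
A_pde, y_pde = phi_t + c * phi_x, np.zeros(X.size)

# Initial condition u(x, 0) = sin(pi x) and inflow condition u(0, t) = -sin(pi c t),
# both enforced as penalty rows in the same least-squares system.
s = np.linspace(0.0, 1.0, 40)
penalty = 10.0
A_ic, y_ic = features(s, np.zeros_like(s))[0], np.sin(np.pi * s)
A_bc, y_bc = features(np.zeros_like(s), s)[0], -np.sin(np.pi * c * s)

A = np.vstack([A_pde, penalty * A_ic, penalty * A_bc])
y = np.concatenate([y_pde, penalty * y_ic, penalty * y_bc])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # only the outer coefficients are solved for

# Compare with the exact solution u(x, t) = sin(pi (x - c t)) at random test points
xt = rng.uniform(0.0, 1.0, size=(500, 2))
u = features(xt[:, 0], xt[:, 1])[0] @ coef
max_err = np.max(np.abs(u - np.sin(np.pi * (xt[:, 0] - c * xt[:, 1]))))
print(f"max error: {max_err:.2e}")
```

Because the inner weights are frozen, the unknowns enter linearly and a single call to a least-squares solver replaces iterative training; only collocation points are needed, not a mesh.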
-
MAC: A unified framework boosting low resource automatic speech recognition
Authors:
Zeping Min,
Qian Ge,
Zhong Li,
Weinan E
Abstract:
We propose a unified framework for low resource automatic speech recognition tasks named meta audio concatenation (MAC). It is easy to implement and can be carried out in extremely low resource environments. Mathematically, we give a clear description of the MAC framework from the perspective of Bayesian sampling. In this framework, we leverage a novel concatenative synthesis text-to-speech system to boost the low resource ASR task. By the concatenative synthesis text-to-speech system, we can integrate language pronunciation rules and adjust the TTS process. Furthermore, we propose a broad notion of meta audio set to meet the modeling needs of different languages and different scenes when using the system. Extensive experiments have demonstrated the great effectiveness of MAC on low resource ASR tasks. For the CTC greedy search, CTC prefix, attention, and attention rescoring decode modes in the Cantonese, Taiwanese, and Japanese ASR tasks, the MAC method can reduce the CER by more than 15%. Furthermore, in the ASR task, MAC beats wav2vec2 (with fine-tuning) on the Common Voice datasets of Cantonese and gets really competitive results on the Common Voice datasets of Taiwanese and Japanese. Among them, it is worth mentioning that we achieve a 10.9% character error rate (CER) on the Common Voice Cantonese ASR task, bringing about a 30% relative improvement compared to wav2vec2 (with fine-tuning).
Submitted 15 February, 2023; v1 submitted 5 February, 2023;
originally announced February 2023.
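The character error rate (CER) quoted in the abstract is conventionally the character-level edit distance between the reference and the hypothesis, divided by the reference length. A minimal sketch of that standard definition follows; the function names are hypothetical, and the exact text normalization used in the paper's evaluation is not specified here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(hyp) + 1))          # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]                            # deleting i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)

print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.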
-
Effective approach to the Antoniadis-Mottola model: quantum decoupling of the higher derivative terms
Authors:
Wagno Cesar e Silva,
Ilya L. Shapiro
Abstract:
We explore the decoupling of the massive ghost mode in the $4D$ (four-dimensional) theory of the conformal factor of the metric. The model was introduced by Antoniadis and Mottola in [1] and can be regarded as a close analog of fourth-derivative quantum gravity. The analysis of the derived one-loop nonlocal form factors includes their asymptotic behavior in the UV and IR limits. In the UV (high-energy) domain, our results reproduce the Minimal Subtraction scheme-based beta functions of [1]. In the IR (i.e., at low energies), the diagrams with massive ghost internal lines collapse into tadpole-type graphs without nonlocal contributions and become irrelevant. On the other hand, those structures that contribute to the running of the parameters of the action and survive in the IR are well correlated with the divergent part (or the leading UV contributions to the form factors) coming from the effective low-energy theory of the conformal factor. This effective theory describes only the light propagating mode. Finally, we discuss whether these results may shed light on the possible running of the cosmological constant at low energies.
Submitted 12 July, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Hybrid Auxiliary Field Quantum Monte Carlo for Molecular Systems
Authors:
Yixiao Chen,
Linfeng Zhang,
Weinan E,
Roberto Car
Abstract:
We propose a quantum Monte Carlo approach to solve the many-body Schrödinger equation for the electronic ground state. The method combines optimization from variational Monte Carlo and propagation from auxiliary field quantum Monte Carlo, in a way that significantly alleviates the sign problem. In application to molecular systems, we obtain highly accurate results for configurations dominated by either dynamic or static electronic correlation.
Submitted 20 April, 2023; v1 submitted 19 November, 2022;
originally announced November 2022.
-
Ab Initio Generalized Langevin Equation
Authors:
Pinchen Xie,
Roberto Car,
Weinan E
Abstract:
We introduce a machine learning-based approach called ab initio generalized Langevin equation (AIGLE) to model the dynamics of slow collective variables in materials and molecules. In this scheme, the parameters are learned from atomistic simulations based on ab initio quantum mechanical models. Force field, memory kernel, and noise generator are constructed in the context of the Mori-Zwanzig formalism, under the constraint of the fluctuation-dissipation theorem. Combined with deep potential molecular dynamics and electronic density functional theory, this approach opens the way to multi-scale modeling in a variety of situations. Here, we demonstrate this capability with a study of two mesoscale processes in crystalline lead titanate, namely the field-driven dynamics of a planar ferroelectric domain wall, and the dynamics of an extensive lattice of coarse-grained electric dipoles. In the first case, AIGLE extends the reach of ab initio simulations to a regime of noise-driven motions not accessible to molecular dynamics. In the second case, AIGLE deals with an extensive set of collective variables by adopting a local approximation for the memory kernel and retaining only short-range noise correlations. The scheme is computationally more efficient than molecular dynamics by several orders of magnitude, and mimics the microscopic dynamics at low frequencies where it reproduces accurately the dominant far-infrared absorption frequency.
Submitted 15 February, 2024; v1 submitted 11 November, 2022;
originally announced November 2022.
-
Initial Value Problem Enhanced Sampling for Closed-Loop Optimal Control Design with Deep Neural Networks
Authors:
Xuanxi Zhang,
Jihao Long,
Wei Hu,
Weinan E,
Jiequn Han
Abstract:
Closed-loop optimal control design for high-dimensional nonlinear systems has been a long-standing challenge. Traditional methods, such as solving the associated Hamilton-Jacobi-Bellman equation, suffer from the curse of dimensionality. Recent literature proposed a new promising approach based on supervised learning, by leveraging powerful open-loop optimal control solvers to generate training data and neural networks as efficient high-dimensional function approximators to fit the closed-loop optimal control. This approach successfully handles certain high-dimensional optimal control problems but still performs poorly on more challenging problems. One of the crucial reasons for the failure is the so-called distribution mismatch phenomenon brought by the controlled dynamics. In this paper, we investigate this phenomenon and propose the initial value problem enhanced sampling method to mitigate this problem. We theoretically prove that this sampling strategy improves over the vanilla strategy on the classical linear-quadratic regulator by a factor proportional to the total time duration. We further numerically demonstrate that the proposed sampling strategy significantly improves the performance on tested control problems, including the optimal landing problem of a quadrotor and the optimal reaching problem of a 7 DoF manipulator.
Submitted 9 July, 2023; v1 submitted 8 September, 2022;
originally announced September 2022.
-
Bridging Traditional and Machine Learning-based Algorithms for Solving PDEs: The Random Feature Method
Authors:
Jingrun Chen,
Xurong Chi,
Weinan E,
Zhouwang Yang
Abstract:
One of the oldest and most studied subjects in scientific computing is algorithms for solving partial differential equations (PDEs). A long list of numerical methods has been proposed and successfully used for various applications. In recent years, deep learning methods have shown their superiority for high-dimensional PDEs where traditional methods fail. However, for low-dimensional problems, it remains unclear whether these methods have a real advantage over traditional algorithms as a direct solver. In this work, we propose the random feature method (RFM) for solving PDEs, a natural bridge between traditional and machine learning-based algorithms. RFM is based on a combination of well-known ideas: 1. representation of the approximate solution using random feature functions; 2. the collocation method to take care of the PDE; 3. the penalty method to treat the boundary conditions, which allows us to treat the boundary condition and the PDE on the same footing. We find it crucial to add several additional components, including multi-scale representation and rescaling of the weights in the loss function. We demonstrate that the method exhibits spectral accuracy and can compete with traditional solvers in terms of both accuracy and efficiency. In addition, we find that RFM is particularly suited for problems with complex geometry, where both traditional and machine learning-based algorithms encounter difficulties.
Submitted 27 July, 2022;
originally announced July 2022.
-
Measurement of the branching fraction for the decay $B \to K^{\ast}(892)\ell^+\ell^-$ at Belle II
Authors:
Belle II Collaboration,
F. Abudinén,
I. Adachi,
R. Adak,
K. Adamczyk,
L. Aggarwal,
P. Ahlburg,
H. Ahmed,
J. K. Ahn,
H. Aihara,
N. Akopov,
A. Aloisio,
F. Ameli,
L. Andricek,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
V. Aulchenko,
T. Aushev,
V. Aushev,
T. Aziz,
V. Babu,
S. Bacher,
H. Bae,
S. Baehr
, et al. (569 additional authors not shown)
Abstract:
We report a measurement of the branching fraction of $B \to K^{\ast}(892)\ell^+\ell^-$ decays, where $\ell^+\ell^- = μ^+μ^-$ or $e^+e^-$, using electron-positron collisions recorded at an energy at or near the $Υ(4S)$ mass and corresponding to an integrated luminosity of $189$ fb$^{-1}$. The data was collected during 2019--2021 by the Belle II experiment at the SuperKEKB $e^{+}e^{-}$ asymmetric-energy collider. We reconstruct $K^{\ast}(892)$ candidates in the $K^+π^-$, $K_{S}^{0}π^+$, and $K^+π^0$ final states. The signal yields with statistical uncertainties are $22\pm 6$, $18 \pm 6$, and $38 \pm 9$ for the decays $B \to K^{\ast}(892)μ^+μ^-$, $B \to K^{\ast}(892)e^+e^-$, and $B \to K^{\ast}(892)\ell^+\ell^-$, respectively. We measure the branching fractions of these decays for the entire range of the dilepton mass, excluding the very low mass region to suppress the $B \to K^{\ast}(892)γ(\to e^+e^-)$ background and regions compatible with decays of charmonium resonances, to be \begin{equation} {\cal B}(B \to K^{\ast}(892)μ^+μ^-) = (1.19 \pm 0.31 ^{+0.08}_{-0.07}) \times 10^{-6}, {\cal B}(B \to K^{\ast}(892)e^+e^-) = (1.42 \pm 0.48 \pm 0.09)\times 10^{-6}, {\cal B}(B \to K^{\ast}(892)\ell^+\ell^-) = (1.25 \pm 0.30 ^{+0.08}_{-0.07}) \times 10^{-6}, \end{equation} where the first and second uncertainties are statistical and systematic, respectively. These results, limited by sample size, are the first measurements of $B \to K^{\ast}(892)\ell^+\ell^-$ branching fractions from the Belle II experiment.
Submitted 19 September, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Ab initio multi-scale modeling of ferroelectrics: The case of PbTiO3
Authors:
Pinchen Xie,
Yixiao Chen,
Weinan E,
Roberto Car
Abstract:
We report an ab initio multi-scale study of lead titanate using the Deep Potential (DP) models, a family of machine learning-based atomistic models, trained on first-principles density functional theory data, to represent potential and polarization surfaces. Our approach includes anharmonic effects beyond the limitations of reduced models and of the linear approximation for the polarization. The calculated enthalpy, spontaneous polarization, specific heat and dielectric susceptibility agree well with experiments on single crystals. In addition, we study how the free energy depends on the polarization with enhanced sampling methods, further supporting the first-order and order-disorder character of the transition. The latter is evidenced by persistence of local dipoles above the transition temperature. The simulated free energy surface as a function of the global polarization leads to a Landau-Devonshire theory of the single domain crystal.
Submitted 24 May, 2022;
originally announced May 2022.
-
Solving optimal control of rigid-body dynamics with collisions using the hybrid minimum principle
Authors:
Wei Hu,
Jihao Long,
Yaohua Zang,
Weinan E,
Jiequn Han
Abstract:
Collisions are common in many dynamical systems with real applications. They can be formulated as hybrid dynamical systems with discontinuities automatically triggered when states traverse certain manifolds. We present an algorithm for the optimal control problem of such hybrid dynamical systems based on solving the equations derived from the hybrid minimum principle (HMP). The algorithm is an iterative scheme following the spirit of the method of successive approximations (MSA), and it is robust to undesired collisions observed in the initial guesses. We analyze the discontinuities in the system and propose a stable collision condition, which is crucial for the convergence of iterative algorithms in systems experiencing collisions. Subsequently, we establish a convergence theorem demonstrating linear convergence for the MSA algorithm when collisions are present. We also address several numerical challenges introduced by the discontinuities. The algorithm is tested on disc collision problems whose optimal solutions exhibit one or multiple collisions. Linear convergence in terms of iteration steps and asymptotic first-order accuracy in terms of time discretization are observed when the algorithm is implemented with the forward-Euler scheme. The numerical results demonstrate that the proposed algorithm has better accuracy and convergence than direct methods based on gradient descent. Furthermore, the algorithm is also simpler, more accurate, and more stable than a deep reinforcement learning method.
Submitted 10 May, 2023; v1 submitted 17 May, 2022;
originally announced May 2022.
-
Empowering Optimal Control with Machine Learning: A Perspective from Model Predictive Control
Authors:
Weinan E,
Jiequn Han,
Jihao Long
Abstract:
Solving complex optimal control problems has posed computational challenges for a long time. Recent advances in machine learning have provided us with new opportunities to address these challenges. This paper takes model predictive control, a popular optimal control method, as the primary example to survey recent progress that leverages machine learning techniques to empower optimal control solvers. We also discuss some of the main challenges encountered when applying machine learning to develop more robust optimal control algorithms.
Submitted 20 July, 2022; v1 submitted 16 May, 2022;
originally announced May 2022.
-
A Machine Learning Enhanced Algorithm for the Optimal Landing Problem
Authors:
Yaohua Zang,
Jihao Long,
Xuanxi Zhang,
Wei Hu,
Weinan E,
Jiequn Han
Abstract:
We propose a machine learning enhanced algorithm for solving the optimal landing problem. Using Pontryagin's minimum principle, we derive a two-point boundary value problem for the landing problem. The proposed algorithm uses deep learning to predict the optimal landing time and a space-marching technique to provide good initial guesses for the boundary value problem solver. The performance of the proposed method is studied using the quadrotor example, a reasonably high dimensional and strongly nonlinear system. Drastic improvement in reliability and efficiency is observed.
Submitted 13 March, 2022;
originally announced March 2022.
-
Deep Potentials for Materials Science
Authors:
Tongqi Wen,
Linfeng Zhang,
Han Wang,
Weinan E,
David J. Srolovitz
Abstract:
To fill the gap between accurate (and expensive) ab initio calculations and efficient atomistic simulations based on empirical interatomic potentials, a new class of descriptions of atomic interactions has emerged and been widely applied; i.e., machine learning potentials (MLPs). One recently developed type of MLP is the Deep Potential (DP) method. In this review, we provide an introduction to DP methods in computational materials science. The theory underlying the DP method is presented along with a step-by-step introduction to their development and use. We also review materials applications of DPs in a wide range of materials systems. The DP Library provides a platform for the development of DPs and a database of extant DPs. We discuss the accuracy and efficiency of DPs compared with ab initio methods and empirical potentials.
Submitted 1 March, 2022;
originally announced March 2022.
-
Trace anomaly and induced action for a metric-scalar background
Authors:
Manuel Asorey,
Wagno Cesar e Silva,
Ilya L. Shapiro,
Públio R. B. do Vale
Abstract:
The conformal anomaly and the anomaly-induced effective action represent useful and economical ways to describe semiclassical contributions to the action of gravity. We discuss the anomaly in the case when the background is formed by metric and scalar fields and formulate the induced action in two standard covariant forms. The analysis of the induced action at low energies reveals an existing connection to the renormalization group and the effective potential. The classification of anomalous terms is extended to the scalar background, and ambiguities in the total derivative terms in the anomaly are considered using Pauli-Villars regularization.
Submitted 7 February, 2023; v1 submitted 31 January, 2022;
originally announced February 2022.
-
A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics
Authors:
Tianhan Zhang,
Yuxiao Yi,
Yifan Xu,
Zhi X. Chen,
Yaoyu Zhang,
Weinan E,
Zhi-Qin John Xu
Abstract:
Machine learning has long been considered a black box for predicting combustion chemical kinetics due to the extremely large number of parameters and the lack of evaluation standards and reproducibility. The current work aims to understand two basic questions regarding the deep neural network (DNN) method: what data the DNN needs and how general the DNN method can be. Sampling and preprocessing determine the DNN training dataset and thereby affect the DNN prediction ability. The current work proposes using the Box-Cox transformation (BCT) to preprocess the combustion data. In addition, this work compares different sampling methods with or without preprocessing, including the Monte Carlo method, manifold sampling, the generative neural network method (cycle-GAN), and the newly proposed multi-scale sampling. Our results reveal that the DNN trained on the manifold data can capture the chemical kinetics in limited configurations but cannot remain robust toward perturbation, which is inevitable for the DNN coupled with the flow field. The Monte Carlo and cycle-GAN samplings can cover a wider phase space but fail to capture small-scale intermediate species, producing poor prediction results. A three-hidden-layer DNN, based on the multi-scale method without specific flame simulation data, can predict chemical kinetics in various scenarios and remains stable during the temporal evolution. This single DNN is readily implemented with several CFD codes and validated in various combustors, including (1) zero-dimensional autoignition, (2) one-dimensional freely propagating flame, (3) two-dimensional jet flame with triple-flame structure, and (4) three-dimensional turbulent lifted flames. The results demonstrate the satisfactory accuracy and generalization ability of the pre-trained DNN. The Fortran and Python versions of the DNN and example code are attached in the supplementary material for reproducibility.
Submitted 12 August, 2022; v1 submitted 9 January, 2022;
originally announced January 2022.
-
A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics
Authors:
Zhiwei Wang,
Yaoyu Zhang,
Enhan Zhao,
Yiguang Ju,
Weinan E,
Zhi-Qin John Xu,
Tianhan Zhang
Abstract:
A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics is proposed and validated using high-temperature auto-ignitions, perfectly stirred reactors (PSR), and one-dimensional freely propagating flames of n-heptane/air mixtures. The mechanism reduction is modeled as an optimization problem on Boolean space, where a Boolean vector, each entry corresponding to a species, represents a reduced mechanism. The optimization goal is to minimize the reduced mechanism size given the error tolerance of a group of pre-selected benchmark quantities. The key idea of DeePMR is to employ a deep neural network (DNN) to formulate the objective function in the optimization problem. In order to explore the high-dimensional Boolean space efficiently, an iterative procedure of DNN-assisted data sampling and DNN training is implemented. The results show that DNN assistance improves sampling efficiency significantly, selecting only $10^5$ samples out of $10^{34}$ possible samples for the DNN to achieve sufficient accuracy. The results demonstrate the capability of the DNN to recognize key species and reasonably predict reduced mechanism performance. The well-trained DNN guarantees the optimal reduced mechanism by solving an inverse optimization problem. A comparison of ignition delay times, laminar flame speeds, and temperatures in PSRs shows that the resulting skeletal mechanism has fewer species (45 species) but the same level of accuracy as the skeletal mechanism (56 species) obtained by the Path Flux Analysis (PFA) method. In addition, the skeletal mechanism can be further reduced to 28 species if only atmospheric, near-stoichiometric conditions (equivalence ratio between 0.6 and 1.2) are considered. DeePMR provides an innovative way to perform model reduction and demonstrates the great potential of data-driven methods in the combustion area.
Submitted 8 September, 2022; v1 submitted 6 January, 2022;
originally announced January 2022.
-
DeePN$^2$: A deep learning-based non-Newtonian hydrodynamic model
Authors:
Lidong Fang,
Pei Ge,
Lei Zhang,
Weinan E,
Huan Lei
Abstract:
A long standing problem in the modeling of non-Newtonian hydrodynamics of polymeric flows is the availability of reliable and interpretable hydrodynamic models that faithfully encode the underlying micro-scale polymer dynamics. The main complication arises from the long polymer relaxation time, the complex molecular structure and heterogeneous interaction. DeePN$^2$, a deep learning-based non-Newtonian hydrodynamic model, has been proposed and has shown some success in systematically passing the micro-scale structural mechanics information to the macro-scale hydrodynamics for suspensions with simple polymer conformation and bond potential. The model retains a multi-scaled nature by mapping the polymer configurations into a set of symmetry-preserving macro-scale features. The extended constitutive laws for these macro-scale features can be directly learned from the kinetics of their micro-scale counterparts. In this paper, we develop DeePN$^2$ using more complex micro-structural models. We show that DeePN$^2$ can faithfully capture the broadly overlooked viscoelastic differences arising from the specific molecular structural mechanics without human intervention.
Submitted 13 April, 2022; v1 submitted 29 December, 2021;
originally announced December 2021.
-
DeepHAM: A Global Solution Method for Heterogeneous Agent Models with Aggregate Shocks
Authors:
Jiequn Han,
Yucheng Yang,
Weinan E
Abstract:
An efficient, reliable, and interpretable global solution method, the Deep learning-based algorithm for Heterogeneous Agent Models (DeepHAM), is proposed for solving high dimensional heterogeneous agent models with aggregate shocks. The state distribution is approximately represented by a set of optimal generalized moments. Deep neural networks are used to approximate the value and policy functions, and the objective is optimized over directly simulated paths. In addition to being an accurate global solver, this method has three additional features. First, it is computationally efficient in solving complex heterogeneous agent models, and it does not suffer from the curse of dimensionality. Second, it provides a general and interpretable representation of the distribution over individual states, which is crucial in addressing the classical question of whether and how heterogeneity matters in macroeconomics. Third, it solves the constrained efficiency problem as easily as it solves the competitive equilibrium, which opens up new possibilities for studying optimal monetary and fiscal policies in heterogeneous agent models with aggregate shocks.
Submitted 21 February, 2022; v1 submitted 28 December, 2021;
originally announced December 2021.
-
A deep potential model with long-range electrostatic interactions
Authors:
Linfeng Zhang,
Han Wang,
Maria Carolina Muniz,
Athanassios Z. Panagiotopoulos,
Roberto Car,
Weinan E
Abstract:
Machine learning models for the potential energy of multi-atomic systems, such as the deep potential (DP) model, make possible molecular simulations with the accuracy of quantum mechanical density functional theory, at a cost only moderately higher than that of empirical force fields. However, the majority of these models lack explicit long-range interactions and fail to describe properties that derive from the Coulombic tail of the forces. To overcome this limitation we extend the DP model by approximating the long-range electrostatic interaction between ions (nuclei+core electrons) and valence electrons with that of distributions of spherical Gaussian charges located at ionic and electronic sites. The latter are rigorously defined in terms of the centers of the maximally localized Wannier distributions, whose dependence on the local atomic environment is modeled accurately by a deep neural network. In the deep potential long-range (DPLR) model, the electrostatic energy of the Gaussian charge system is added to short-range interactions that are represented as in the standard DP model. The resulting potential energy surface is smooth and possesses analytical forces and virial. Missing effects in the standard DP scheme are recovered, improving on accuracy and predictive power. By including long-range electrostatics, DPLR correctly extrapolates to large systems the potential energy surface learned from quantum mechanical calculations on smaller systems. We illustrate the approach with three examples, the potential energy profile of the water dimer, the free energy of interaction of a water molecule with a liquid water slab, and the phonon dispersion curves of the NaCl crystal.
Submitted 16 February, 2022; v1 submitted 26 December, 2021;
originally announced December 2021.
-
Measurement of inclusive electrons from open heavy-flavor hadron decays in $p$+$p$ collisions at $\sqrt{s} = 200$ GeV with the STAR detector
Authors:
STAR Collaboration,
M. S. Abdallah,
B. E. Aboona,
J. Adam,
L. Adamczyk,
J. R. Adams,
J. K. Adkins,
G. Agakishiev,
I. Aggarwal,
M. M. Aggarwal,
Z. Ahammed,
I. Alekseev,
D. M. Anderson,
A. Aparin,
E. C. Aschenauer,
M. U. Ashraf,
F. G. Atetalla,
A. Attri,
G. S. Averichev,
V. Bairathi,
W. Baker,
J. G. Ball Cap,
K. Barish,
A. Behera,
R. Bellwied
, et al. (372 additional authors not shown)
Abstract:
We report a new measurement of the production cross section for inclusive electrons from open heavy-flavor hadron decays as a function of transverse momentum ($p_{\rm T}$) at mid-rapidity ($|y|<$ 0.7) in $p$+$p$ collisions at $\sqrt{s} = 200$ GeV. The result is presented for 2.5 $<p_{\rm T}<$ 10 GeV/$c$ with an improved precision above 6 GeV/$c$ with respect to the previous measurements, providing more constraints on perturbative QCD calculations. Moreover, this measurement also provides a high-precision reference for measurements of nuclear modification factors for inclusive electrons from open-charm and -bottom hadron decays in heavy-ion collisions.
Submitted 3 March, 2022; v1 submitted 27 September, 2021;
originally announced September 2021.
-
Subsurface Carbon-Induced Local Charge of Copper for On-Surface Displacement Reaction
Authors:
Shaoshan Wang,
Pengcheng Ding,
Zhuo Li,
Cristina Mattioli,
Wenlong E,
Ye Sun,
André Gourdon,
Lev Kantorovich,
Flemming Besenbacher,
Xueming Yang,
Miao Yu
Abstract:
Transition metal carbides have sparked unprecedented enthusiasm as high-performance catalysts in recent years. Still, the catalytic properties of copper (Cu) carbide remain unexplored. By introducing subsurface carbon (C) to Cu(111), the displacement reaction of a proton in a carboxylic acid group with a single Cu atom is demonstrated at the atomic scale and at room temperature. Its occurrence is attributed to the C-doping-induced local charge of surface Cu atoms (up to +0.30 e/atom), which accelerates the rate of on-surface deprotonation via reduction of the corresponding energy barrier, thus enabling the instant displacement of a proton with a Cu atom when the molecules land on the surface. Such a well-defined and robust Cu$^{δ+}$ surface based on subsurface C doping offers a novel catalytic platform for on-surface synthesis.
Submitted 14 September, 2021;
originally announced September 2021.
-
On the vector conformal models in an arbitrary dimension
Authors:
Manuel Asorey,
Lesław Rachwał,
Ilya L. Shapiro,
Wagno Cesar e Silva
Abstract:
The conventional model of the gauge vector field is invariant under the local conformal symmetry only in the four-dimensional space ($4d$). Conformal generalization to an arbitrary dimension $d$ is impossible even for the free theory, unlike for scalar and fermion fields. We discuss how to overcome this restriction and eventually construct four vector conformal actions. One of these models is a particular case of the previously known conformal theory of $n$-forms, and the others are new, to our knowledge. In some of these models the gauge invariance is preserved, two of the new models are described by local actions with auxiliary compensating scalar fields, and the extended version of one of these models is on-shell equivalent to the last, non-analytic, purely metric version.
Submitted 3 October, 2021; v1 submitted 27 July, 2021;
originally announced July 2021.
-
MOD-Net: A Machine Learning Approach via Model-Operator-Data Network for Solving PDEs
Authors:
Lulu Zhang,
Tao Luo,
Yaoyu Zhang,
Weinan E,
Zhi-Qin John Xu,
Zheng Ma
Abstract:
In this paper, we propose a machine learning approach via model-operator-data network (MOD-Net) for solving PDEs. A MOD-Net is driven by a model to solve PDEs based on operator representation with regularization from data. For linear PDEs, we use a DNN to parameterize the Green's function and obtain the neural operator to approximate the solution according to the Green's method. To train the DNN, the empirical risk consists of the mean squared loss with the least square formulation or the variational formulation of the governing equation and boundary conditions. For complicated problems, the empirical risk also includes a few labels, which are computed on coarse grid points at low computational cost and significantly improve the model accuracy. Intuitively, the labeled dataset works as a regularization in addition to the model constraints. The MOD-Net solves a family of PDEs rather than a specific one and is much more efficient than the original neural operator because few expensive labels are required. We numerically show that MOD-Net is very efficient in solving the Poisson equation and the one-dimensional radiative transfer equation. For nonlinear PDEs, the nonlinear MOD-Net can be similarly used as an ansatz, as exemplified by solving several nonlinear PDE problems, such as the Burgers equation.
Submitted 28 December, 2021; v1 submitted 8 July, 2021;
originally announced July 2021.
-
Generalization Error of GAN from the Discriminator's Perspective
Authors:
Hongkang Yang,
Weinan E
Abstract:
The generative adversarial network (GAN) is a well-known model for learning high-dimensional distributions, but the mechanism for its generalization ability is not understood. In particular, GAN is vulnerable to the memorization phenomenon, the eventual convergence to the empirical distribution. We consider a simplified GAN model with the generator replaced by a density, and analyze how the discriminator contributes to generalization. We show that with early stopping, the generalization error measured by the Wasserstein metric escapes from the curse of dimensionality, even though memorization is inevitable in the long term. In addition, we present a hardness of learning result for WGAN.
Submitted 5 November, 2021; v1 submitted 8 July, 2021;
originally announced July 2021.
-
An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation
Authors:
Jihao Long,
Jiequn Han,
Weinan E
Abstract:
Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states. However, most analysis of such algorithms gives rise to error bounds that involve either the number of states or the number of features. This paper considers the situation where the function approximation is made either using the kernel method or the two-layer neural network model, in the context of a fitted Q-iteration algorithm with explicit regularization. We establish an $\tilde{O}(H^3|\mathcal {A}|^{\frac14}n^{-\frac14})$ bound for the optimal policy with $Hn$ samples, where $H$ is the length of each episode and $|\mathcal {A}|$ is the size of action space. Our analysis hinges on analyzing the $L^2$ error of the approximated Q-function using $n$ data points. Even though this result still requires a finite-sized action space, the error bound is independent of the dimensionality of the state space.
Submitted 15 February, 2022; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Efficient sampling of high-dimensional free energy landscapes using adaptive reinforced dynamics
Authors:
Dongdong Wang,
Yanze Wang,
Junhan Chang,
Linfeng Zhang,
Han Wang,
Weinan E
Abstract:
Enhanced sampling methods such as metadynamics and umbrella sampling have become essential tools for exploring the configuration space of molecules and materials. At the same time, they have long faced a number of issues, such as inefficiency when dealing with a large number of collective variables (CVs) or systems with high free energy barriers. In this work, we show that with the clustering and adaptive tuning techniques, the reinforced dynamics (RiD) scheme can be used to efficiently explore the configuration space and free energy landscapes with a large number of CVs or systems with high free energy barriers. We illustrate this by studying various representative and challenging examples. First, we demonstrate the efficiency of adaptive RiD compared with other methods, and construct the 9-dimensional free energy landscape of a peptoid trimer, which has energy barriers of more than 8 kcal/mol. We then study the folding of the protein chignolin using 18 CVs. In this case, both the folding and unfolding rates are observed to be equal to 4.30 $μs^{-1}$. Finally, we propose a protein structure refinement protocol based on RiD. This protocol allows us to efficiently employ more than 100 CVs for exploring the landscape of protein structures, and it gives rise to an overall improvement of 14.6 units over the initial Global Distance Test-High Accuracy (GDT-HA) score.
Submitted 26 December, 2021; v1 submitted 4 April, 2021;
originally announced April 2021.
-
An Expectation-Maximization Algorithm for Continuous-time Hidden Markov Models
Authors:
Qingcan Wang,
Weinan E
Abstract:
We propose a unified framework that extends the inference methods for classical hidden Markov models to continuous settings, where both the hidden states and observations occur in continuous time. Two different settings are analyzed: hidden jump process with a finite state space, and hidden diffusion process with a continuous state space. For each setting, we first estimate the hidden states given the observations and model parameters, showing that the posterior distribution of the hidden states can be described by differential equations in continuous time. We then consider the estimation of unknown model parameters, deriving the continuous-time formulas for the expectation-maximization algorithm. We also propose a Monte Carlo method based on the continuous formulation, sampling the posterior distribution of the hidden states and updating the parameter estimation.
Submitted 17 June, 2021; v1 submitted 31 March, 2021;
originally announced March 2021.
-
The Phase Diagram of a Deep Potential Water Model
Authors:
Linfeng Zhang,
Han Wang,
Roberto Car,
Weinan E
Abstract:
Using the Deep Potential methodology, we construct a model that reproduces accurately the potential energy surface of the SCAN approximation of density functional theory for water, from low temperature and pressure to about 2400 K and 50 GPa, excluding the vapor stability region. The computational efficiency of the model makes it possible to predict its phase diagram using molecular dynamics. Satisfactory overall agreement with experimental results is obtained. The fluid phases, molecular and ionic, and all the stable ice polymorphs, ordered and disordered, are predicted correctly, with the exception of ice III and XV that are stable in experiments, but metastable in the model. The evolution of the atomic dynamics upon heating, as ice VII transforms first into ice VII$''$ and then into an ionic fluid, reveals that molecular dissociation and breaking of the ice rules coexist with strong covalent fluctuations, explaining why only partial ionization was inferred in experiments.
Submitted 11 February, 2021; v1 submitted 9 February, 2021;
originally announced February 2021.
-
DeePKS-kit: a package for developing machine learning-based chemically accurate energy and density functional models
Authors:
Yixiao Chen,
Linfeng Zhang,
Han Wang,
Weinan E
Abstract:
We introduce DeePKS-kit, an open-source software package for developing machine learning based energy and density functional models. DeePKS-kit is interfaced with PyTorch, an open-source machine learning library, and PySCF, an ab initio computational chemistry program that provides simple and customized tools for developing quantum chemistry codes. It supports the DeePHF and DeePKS methods. In addition to explaining the details in the methodology and the software, we also provide an example of developing a chemically accurate model for water clusters.
Submitted 21 June, 2021; v1 submitted 29 December, 2020;
originally announced December 2020.
-
A deep learning-based ODE solver for chemical kinetics
Authors:
Tianhan Zhang,
Yaoyu Zhang,
Weinan E,
Yiguang Ju
Abstract:
Developing efficient and accurate algorithms for chemistry integration is a challenging task due to its strong stiffness and high dimensionality. The current work presents a deep learning-based numerical method called DeepCombustion0.0 to solve stiff ordinary differential equation systems. The homogeneous autoignition of a DME/air mixture, including 54 species, is adopted as an example to illustrate the validity and accuracy of the algorithm. The training and testing datasets cover a wide range of temperature, pressure, and mixture conditions: 750-1200 K, 30-50 atm, and equivalence ratio = 0.7-1.5. Both the first-stage low-temperature ignition (LTI) and the second-stage high-temperature ignition (HTI) are considered. The methodology highlights the importance of the adaptive data sampling techniques, power transform preprocessing, and binary deep neural network (DNN) design. By using adaptive random sampling and appropriate power transforms, smooth submanifolds in the state vector phase space are observed, on which two three-layer DNNs can be appropriately trained. The neural networks are end-to-end and predict temporal gradients of the state vectors directly. The results show that temporal evolutions predicted by the DNNs agree well with traditional numerical methods in all state vector dimensions, including temperature, pressure, and species concentrations. In addition, the ignition delay time differences are within 1%. At the same time, the CPU time is reduced by more than 20 times and 200 times compared with the HMTS and VODE methods, respectively. The current work demonstrates the enormous potential of applying the deep learning algorithm in chemical kinetics and combustion modeling.
Submitted 23 November, 2020;
originally announced December 2020.
-
Bounce and stability in the early cosmology with anomaly-induced corrections
Authors:
Wagno Cesar e Silva,
Ilya L. Shapiro
Abstract:
An extremely fast exponential expansion of the Universe is typical for the stable version of the inflationary model, based on the anomaly-induced action of gravity. The total amount of exponential $e$-folds could be very large, before the transition to the unstable version and the beginning of the Starobinsky inflation. Thus, the stable exponential expansion can be seen as a pre-inflationary semiclassical cosmological solution. We explore whether this stable phase could follow after the bounce, subsequent to the contraction of the Universe. Extending the previous consideration of the bounce, we explore both stable expansion and the bounce solutions in the models with non-zero cosmological constant and the presence of background radiation. The critical part of the analysis concerns stability for small perturbations of the Hubble parameter. It is shown that the stability is possible for the variations in the bounce region, but not in the sufficiently distant past in the contraction phase.
Submitted 25 December, 2020; v1 submitted 18 December, 2020;
originally announced December 2020.
-
On the emergence of simplex symmetry in the final and penultimate layers of neural network classifiers
Authors:
Weinan E,
Stephan Wojtowytsch
Abstract:
A recent numerical study observed that neural network classifiers enjoy a large degree of symmetry in the penultimate layer. Namely, if $h(x) = Af(x) +b$ where $A$ is a linear map and $f$ is the output of the penultimate layer of the network (after activation), then all data points $x_{i, 1}, \dots, x_{i, N_i}$ in a class $C_i$ are mapped to a single point $y_i$ by $f$ and the points $y_i$ are located at the vertices of a regular $(k-1)$-dimensional standard simplex in a high-dimensional Euclidean space.
We explain this observation analytically in toy models for highly expressive deep neural networks. In complementary examples, we demonstrate rigorously that even the final output of the classifier $h$ is not uniform over data samples from a class $C_i$ if $h$ is a shallow network (or if the deeper layers do not bring the data samples into a convenient geometric configuration).
Submitted 4 June, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Some observations on high-dimensional partial differential equations with Barron data
Authors:
Weinan E,
Stephan Wojtowytsch
Abstract:
We use explicit representation formulas to show that solutions to certain partial differential equations lie in Barron spaces or multilayer spaces if the PDE data lie in such function spaces. Consequently, these solutions can be represented efficiently using artificial neural networks, even in high dimension. Conversely, we present examples in which the solution fails to lie in the function space associated to a neural network under consideration.
Submitted 4 June, 2021; v1 submitted 2 December, 2020;
originally announced December 2020.
-
Generalization and Memorization: The Bias Potential Model
Authors:
Hongkang Yang,
Weinan E
Abstract:
Models for learning probability distributions such as generative models and density estimators behave quite differently from models for learning functions. One example is found in the memorization phenomenon, namely the ultimate convergence to the empirical distribution, that occurs in generative adversarial networks (GANs). For this reason, the issue of generalization is more subtle than that for supervised learning. For the bias potential model, we show that dimension-independent generalization accuracy is achievable if early stopping is adopted, even though in the long term the model either memorizes the samples or diverges.
Submitted 1 March, 2021; v1 submitted 28 November, 2020;
originally announced November 2020.
-
Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning
Authors:
Pan Zhou,
Jiashi Feng,
Chao Ma,
Caiming Xiong,
Steven Hoi,
Weinan E
Abstract:
It is not yet clear why ADAM-alike adaptive gradient algorithms suffer from worse generalization performance than SGD despite their faster training speed. This work aims to provide an understanding of this generalization gap by analyzing their local convergence behaviors. Specifically, we observe the heavy tails of gradient noise in these algorithms. This motivates us to analyze these algorithms through their Levy-driven stochastic differential equations (SDEs) because of the similar convergence behaviors of an algorithm and its SDE. Then we establish the escaping time of these SDEs from a local basin. The result shows that (1) the escaping time of both SGD and ADAM depends on the Radon measure of the basin positively and the heaviness of gradient noise negatively; (2) for the same basin, SGD enjoys smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM, via adaptively scaling each gradient coordinate, largely diminishes the anisotropic structure in gradient noise and results in a larger Radon measure of a basin; (b) the exponential gradient average in ADAM smooths its gradient and leads to lighter gradient noise tails than SGD. So SGD is more locally unstable than ADAM at sharp minima, defined as the minima whose local basins have small Radon measure, and can better escape from them to flatter ones with larger Radon measure. As flat minima, which here refer to the minima at flat or asymmetric basins/valleys, often generalize better than sharp ones, our result explains the better generalization performance of SGD over ADAM. Finally, experimental results confirm our heavy-tailed gradient noise assumption and theoretical affirmation.
Submitted 28 November, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.