-
Neural Thermodynamic Integration: Free Energies from Energy-based Diffusion Models
Authors:
Bálint Máté,
François Fleuret,
Tristan Bereau
Abstract:
Thermodynamic integration (TI) offers a rigorous method for estimating free-energy differences by integrating over a sequence of interpolating conformational ensembles. However, TI calculations are computationally expensive and typically limited to coupling a small number of degrees of freedom due to the need to sample numerous intermediate ensembles with sufficient conformational-space overlap. In this work, we propose to perform TI along an alchemical pathway represented by a trainable neural network, which we term Neural TI. Critically, we parametrize a time-dependent Hamiltonian interpolating between the interacting and non-interacting systems, and optimize its gradient using a denoising-diffusion objective. The ability of the resulting energy-based diffusion model to sample all intermediate ensembles allows us to perform TI from a single reference calculation. We apply our method to Lennard-Jones fluids, where we report accurate calculations of the excess chemical potential, demonstrating that Neural TI is capable of coupling hundreds of degrees of freedom at once.
Submitted 12 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Roadmap on Data-Centric Materials Science
Authors:
Stefan Bauer,
Peter Benner,
Tristan Bereau,
Volker Blum,
Mario Boley,
Christian Carbogno,
C. Richard A. Catlow,
Gerhard Dehm,
Sebastian Eibl,
Ralph Ernstorfer,
Ádám Fekete,
Lucas Foppa,
Peter Fratzl,
Christoph Freysoldt,
Baptiste Gault,
Luca M. Ghiringhelli,
Sajal K. Giri,
Anton Gladyshev,
Pawan Goyal,
Jason Hattrick-Simpers,
Lara Kabalan,
Petr Karpov,
Mohammad S. Khorrami,
Christoph Koch,
Sebastian Kokott,
et al. (36 additional authors not shown)
Abstract:
Science is and always has been based on data, but the terms "data-centric" and the "4th paradigm of" materials research indicate a radical change in how information is retrieved, handled and research is performed. It signifies a transformative shift towards managing vast data collections, digital repositories, and innovative data analytics methods. The integration of Artificial Intelligence (AI) and its subset Machine Learning (ML) has become pivotal in addressing all these challenges. This Roadmap on Data-Centric Materials Science explores fundamental concepts and methodologies, illustrating diverse applications in electronic-structure theory, soft matter theory, microstructure research, and experimental techniques like photoemission, atom probe tomography, and electron microscopy. While the roadmap delves into specific areas within the broad interdisciplinary field of materials science, the provided examples elucidate key concepts applicable to a wider range of topics. The discussed instances offer insights into addressing the multifaceted challenges encountered in contemporary materials research.
Submitted 1 May, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Condensed-phase molecular representation to link structure and thermodynamics in molecular dynamics
Authors:
Bernadette Mohr,
Diego van der Mast,
Tristan Bereau
Abstract:
Molecular design requires systematic and broadly applicable methods to extract structure-property relationships. The focus of this study is on learning thermodynamic properties from molecular-liquid simulations. The methodology relies on an atomic representation originally developed for electronic properties: the Spectrum of London and Axilrod-Teller-Muto representation (SLATM). SLATM's expansion in one-, two-, and three-body interactions makes it amenable to probing structural ordering in molecular liquids. We show that such representation encodes enough critical information to permit the learning of thermodynamic properties via linear methods. We demonstrate our approach on the preferential insertion of small solute molecules toward cardiolipin membranes and monitor selectivity against a similar lipid. Our analysis reveals simple, interpretable relationships between two- and three-body interactions and selectivity, identifies key interactions to build optimal prototypical solutes, and charts a two-dimensional projection that displays clearly separated basins. The methodology is generally applicable to a variety of thermodynamic properties.
Submitted 1 June, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Identifying sequential residue patterns in bitter and umami peptides
Authors:
Arghya Dutta,
Tristan Bereau,
Thomas A. Vilgis
Abstract:
The primary structures of peptides, originating from food proteins, affect their taste. Connecting primary structure to taste, however, is difficult because the size of the peptide sequence space increases exponentially with increasing peptide length, while experimentally-labeled data on peptides' tastes remain scarce. We propose a method that coarse-grains the sequence space to reduce its size and systematically identifies the most common coarse-grained residue patterns found in known bitter and umami peptides. We select the optimal patterns by performing extensive out-of-sample tests. The optimal patterns better represent the bitter and umami peptides when compared against baseline peptides, bitter peptides with all hydrophobic residues and umami peptides with all negatively charged residues, and peptides with randomly-chosen residues. Our method complements quantitative structure--activity relationship methods by offering generic, coarse-grained bitter and umami residue patterns that can aid in locating short bitter or umami segments in a protein and in designing new umami peptides.
Submitted 18 August, 2022;
originally announced August 2022.
-
Shared Metadata for Data-Centric Materials Science
Authors:
Luca M. Ghiringhelli,
Carsten Baldauf,
Tristan Bereau,
Sandor Brockhauser,
Christian Carbogno,
Javad Chamanara,
Stefano Cozzini,
Stefano Curtarolo,
Claudia Draxl,
Shyam Dwaraknath,
Ádám Fekete,
James Kermode,
Christoph T. Koch,
Markus Kühbach,
Alvin Noe Ladines,
Patrick Lambrix,
Maja-Olivia Lenz-Himmer,
Sergey Levchenko,
Micael Oliveira,
Adam Michalchuk,
Ron Miller,
Berk Onat,
Pasquale Pavone,
Giovanni Pizzi,
Benjamin Regler,
et al. (10 additional authors not shown)
Abstract:
The expansive production of data in materials science, their widespread sharing and repurposing requires educated support and stewardship. In order to ensure that this need helps rather than hinders scientific work, the implementation of the FAIR-data principles (Findable, Accessible, Interoperable, and Reusable) must not be too narrow. Besides, the wider materials-science community ought to agree on the strategies to tackle the challenges that are specific to its data, both from computations and experiments. In this paper, we present the result of the discussions held at the workshop on "Shared Metadata and Data Formats for Big-Data Driven Materials Science". We start from an operative definition of metadata, and what features a FAIR-compliant metadata schema should have. We will mainly focus on computational materials-science data and propose a constructive approach for the FAIRification of the (meta)data related to ground-state and excited-states calculations, potential-energy sampling, and generalized workflows. Finally, challenges with the FAIRification of experimental (meta)data and materials-science ontologies are presented together with an outlook of how to meet them.
Submitted 23 August, 2023; v1 submitted 29 May, 2022;
originally announced May 2022.
-
FAIR data enabling new horizons for materials research
Authors:
Matthias Scheffler,
Martin Aeschlimann,
Martin Albrecht,
Tristan Bereau,
Hans-Joachim Bungartz,
Claudia Felser,
Mark Greiner,
Axel Groß,
Christoph T. Koch,
Kurt Kremer,
Wolfgang E. Nagel,
Markus Scheidgen,
Christof Wöll,
Claudia Draxl
Abstract:
The prosperity and lifestyle of our society are very much governed by achievements in condensed matter physics, chemistry and materials science, because new products for sectors such as energy, the environment, health, mobility and information technology (IT) rely largely on improved or even new materials. Examples include solid-state lighting, touchscreens, batteries, implants, drug delivery and many more. The enormous amount of research data produced every day in these fields represents a gold mine of the twenty-first century. This gold mine is, however, of little value if these data are not comprehensively characterized and made available. How can we refine this feedstock; that is, turn data into knowledge and value? For this, a FAIR (findable, accessible, interoperable and reusable) data infrastructure is a must. Only then can data be readily shared and explored using data analytics and artificial intelligence (AI) methods. Making data 'findable and AI ready' (a forward-looking interpretation of the acronym) will change the way in which science is carried out today. In this Perspective, we discuss how we can prepare to make this happen for the field of materials science.
Submitted 27 April, 2022;
originally announced April 2022.
-
Broad chemical transferability in structure-based coarse-graining
Authors:
Kiran H. Kanekal,
Joseph F. Rudzinski,
Tristan Bereau
Abstract:
Compared to top-down coarse-grained (CG) models, bottom-up approaches are capable of offering higher structural fidelity. This fidelity results from the tight link to a higher-resolution reference, making the CG model chemically specific. Unfortunately, chemical specificity can be at odds with compound-screening strategies, which call for transferable parametrizations. Here we present an approach to reconcile bottom-up, structure-preserving CG models with chemical transferability. We consider the bottom-up CG parametrization of 3,441 C$_7$O$_2$ small-molecule isomers. Our approach combines atomic representations, unsupervised learning, and a large-scale extended-ensemble force-matching parametrization. We first identify a subset of 19 representative molecules, which maximally encode the local environment of all gas-phase conformers. Reference interactions between the 19 representative molecules were obtained from both homogeneous bulk liquids and various binary mixtures. An extended-ensemble parametrization over all 703 state points leads to a CG model that is both structure-based and chemically transferable. Remarkably, the resulting force field is on average more structurally accurate than single-state-point equivalents. Averaging over the extended ensemble acts as a mean-force regularizer, smoothing out both force and structural correlations that are overly specific to a single state point. Our approach aims at transferability through a set of CG bead types that can be used to easily construct new molecules, while retaining the benefits of a structure-based parametrization.
Submitted 14 March, 2022;
originally announced March 2022.
-
Multipolar Force Fields for Amide-I Spectroscopy from Conformational Dynamics of the Alanine-Trimer
Authors:
Padmabati Mondal,
Pierre-André Cazade,
Akshaya K. Das,
Tristan Bereau,
Markus Meuwly
Abstract:
The dynamics and spectroscopy of N-methyl-acetamide (NMA) and trialanine in solution are characterized from molecular dynamics (MD) simulations using different energy functions, including a conventional point charge (PC)-based force field, one based on a multipolar (MTP) representation of the electrostatics, and a semiempirical DFT method. For the 1-d infrared spectra, the frequency splitting between the two amide-I groups is 10 cm$^{-1}$ from the PC, 13 cm$^{-1}$ from the MTP, and 47 cm$^{-1}$ from SCC-DFTB simulations, compared with 25 cm$^{-1}$ from experiment. The frequency trajectory required for determining the frequency fluctuation correlation function (FFCF) is determined from individual (INM) and full normal mode (FNM) analyses of the amide-I vibrations. The spectroscopy, time-zero magnitude of the FFCF $C(t=0)$, and the static component $Δ_0^2$ from simulations using MTP and analysis based on FNM are all consistent with experiments for (Ala)$_3$. Contrary to that, for the analysis excluding mode-mode coupling (INM) the FFCF decays to zero too rapidly and for simulations with a PC-based force field the $Δ_0^2$ is too small by a factor of two compared with experiments. Simulations with SCC-DFTB agree better with experiment for these observables than those from PC-based simulations. The conformational ensemble sampled from simulations using PCs is consistent with the literature, whereas that covered by the MTP-based simulations is dominated by P$_{\rm II}$ which agrees with and confirms recently reported, Bayesian-refined populations based on 1-dimensional infrared experiments. Full normal mode analysis together with an MTP representation provides a meaningful model to correctly describe the dynamics of hydrated trialanine.
Submitted 18 June, 2021;
originally announced June 2021.
-
Dynamical properties across different coarse-grained models for ionic liquids
Authors:
Joseph F. Rudzinski,
Sebastian Kloth,
Svenja Wörner,
Tamisra Pal,
Kurt Kremer,
Tristan Bereau,
Michael Vogel
Abstract:
Room-temperature ionic liquids (RTILs) stand out among molecular liquids for their rich physicochemical characteristics, including structural and dynamic heterogeneity. The significance of electrostatic interactions in RTILs results in long characteristic length- and timescales, and has motivated the development of a number of coarse-grained (CG) simulation models. In this study, we aim to better understand the connection between certain CG parametrization strategies and the dynamical properties and transferability of the resulting models. We systematically compare five CG models: a model largely parametrized from experimental thermodynamic observables; a refinement of this model to increase its structural accuracy; and three models that reproduce a given set of structural distribution functions by construction, with varying intramolecular parametrizations and reference temperatures. All five CG models display limited structural transferability over temperature, and also result in various effective dynamical speedup factors, relative to a reference atomistic model. On the other hand, the structure-based CG models tend to result in more consistent cation-anion relative diffusion than the thermodynamic-based models, for a single thermodynamic state point. By linking short- and long-timescale dynamical behaviors, we demonstrate that the varying dynamical properties of the different coarse-grained models can be largely collapsed onto a single curve, which provides evidence for a route to constructing dynamically-consistent CG models of RTILs.
Submitted 4 February, 2021;
originally announced February 2021.
-
Adversarial reverse mapping of condensed-phase molecular structures: Chemical transferability
Authors:
Marc Stieffenhofer,
Tristan Bereau,
Michael Wand
Abstract:
Switching between different levels of resolution is essential for multiscale modeling, but restoring details at higher resolution remains challenging. In our previous study we have introduced deepBackmap: a deep neural-network-based approach to reverse-map equilibrated molecular structures for condensed-phase systems. Our method combines data-driven and physics-based aspects, leading to high-quality reconstructed structures. In this work, we expand the scope of our model and examine its chemical transferability. To this end, we train deepBackmap solely on homogeneous molecular liquids of small molecules, and apply it to a more challenging polymer melt. We augment the generator's objective with different force-field-based terms as prior to regularize the results. The best performing physical prior depends on whether we train for a specific chemistry, or transfer our model. Our local environment representation combined with the sequential reconstruction of fine-grained structures help reach transferability of the learned correlations.
Submitted 23 February, 2021; v1 submitted 13 January, 2021;
originally announced January 2021.
-
Reweighting non-equilibrium steady-state dynamics along collective variables
Authors:
Marius Bause,
Tristan Bereau
Abstract:
Computer simulations generate microscopic trajectories of complex systems at a single thermodynamic state point. We recently introduced a Maximum Caliber (MaxCal) approach for dynamical reweighting. Our approach mapped these trajectories to a Markovian description on the configurational coordinates, and reweighted path probabilities as a function of external forces. Trajectory probabilities can be dynamically reweighted both from and to equilibrium or non-equilibrium steady states. As the system's dimensionality increases, an exhaustive description of the microtrajectories becomes prohibitive--even with a Markovian assumption. Instead we reduce the dimensionality of the configurational space to collective variables (CVs). Going from configurational to CV space, we define local entropy productions derived from configurationally averaged mean forces. The entropy production is shown to be a suitable constraint on MaxCal for non-equilibrium steady states expressed as a function of CVs. We test the reweighting procedure on two systems: a particle subject to a two-dimensional potential and a coarse-grained peptide. Our CV-based MaxCal approach expands dynamical reweighting to larger systems, for both static and dynamical properties, and across a large range of driving forces.
Submitted 12 March, 2021; v1 submitted 6 January, 2021;
originally announced January 2021.
-
Data-driven equation for drug-membrane permeability across drugs and membranes
Authors:
Arghya Dutta,
Jilles Vreeken,
Luca M. Ghiringhelli,
Tristan Bereau
Abstract:
Drug efficacy depends on its capacity to permeate across the cell membrane. We consider the prediction of passive drug-membrane permeability coefficients. Beyond the widely recognized correlation with hydrophobicity, we additionally consider the functional relationship between passive permeation and acidity. To discover easily interpretable equations that explain the data well, we use the recently proposed sure-independence screening and sparsifying operator (SISSO), an artificial-intelligence technique that combines symbolic regression with compressed sensing. Our study is based on a large in silico dataset of 0.4 million small molecules extracted from coarse-grained simulations. We rationalize the equation suggested by SISSO via an analysis of the inhomogeneous solubility-diffusion model in several asymptotic acidity regimes. We further extend our analysis to the dependence on lipid-membrane composition. Lipid-tail unsaturation plays a key role, but surprisingly contributes stepwise rather than proportionally. Our results are in line with previously observed changes in permeability, suggesting the distinction between liquid-disordered (Ld) and liquid-ordered (Lo) permeation. Together, compressed sensing with analytically derived asymptotes establish and validate an accurate, broadly applicable, and interpretable equation for passive permeability across both drug and lipid-tail chemistry.
Submitted 29 June, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
Computational compound screening of biomolecules and soft materials by molecular simulations
Authors:
Tristan Bereau
Abstract:
Decades of hardware, methodological, and algorithmic development have propelled molecular dynamics (MD) simulations to the forefront of materials-modeling techniques, bridging the gap between electronic-structure theory and continuum methods. The physics-based approach makes MD appropriate to study emergent phenomena, but simultaneously incurs significant computational investment. This topical review explores the use of MD outside the scope of individual systems, but rather considering many compounds. Such an in silico screening approach makes MD amenable to establishing coveted structure--property relationships. We specifically focus on biomolecules and soft materials, characterized by the significant role of entropic contributions and heterogeneous systems and scales. An account of the state of the art for the implementation of an MD-based screening paradigm is described, including automated force-field parametrization, system preparation, and efficient sampling across both conformation and composition. Emphasis is placed on machine-learning methods to enable MD-based screening. The resulting framework enables the generation of compound--property databases and the use of advanced statistical modeling to gather insight. The review further summarizes a number of relevant applications.
Submitted 10 November, 2020; v1 submitted 7 October, 2020;
originally announced October 2020.
-
Coarse-grained conformational surface hopping: Methodology and transferability
Authors:
Joseph F. Rudzinski,
Tristan Bereau
Abstract:
Coarse-grained (CG) conformational surface hopping (SH) adapts the concept of multisurface dynamics, initially developed to describe electronic transitions in chemical reactions, to accurately describe classical molecular dynamics at a reduced level. The SH scheme couples distinct conformational basins (states), each described by its own force field (surface), resulting in a significant improvement of the approximation to the many-body potential of mean force [Phys. Rev. Lett. 121, 256002 (2018)]. The present study first describes CG SH in more detail, through both a toy model and a three-bead model of hexane. We further extend the methodology to non-bonded interactions and report its impact on liquid properties. Finally, we investigate the transferability of the surfaces to distinct systems and thermodynamic state points, through a simple tuning of the state probabilities. In particular, applications to variations in temperature and chemical composition show good agreement with reference atomistic calculations, introducing a promising "weak-transferability regime," where CG force fields can be shared across thermodynamic and chemical neighborhoods.
Submitted 9 November, 2020; v1 submitted 18 September, 2020;
originally announced September 2020.
-
Free-energy landscape of polymer-crystal polymorphism
Authors:
Chan Liu,
Jan Gerit Brandenburg,
Omar Valsson,
Kurt Kremer,
Tristan Bereau
Abstract:
Polymorphism rationalizes how processing can control the final structure of a material. The rugged free-energy landscape and exceedingly slow kinetics in the solid state have so far hampered computational investigations. We report for the first time the free-energy landscape of a polymorphic crystalline polymer, syndiotactic polystyrene. Coarse-grained metadynamics simulations allow us to efficiently sample the landscape at large. The free-energy difference between the two main polymorphs, $α$ and $β$, is further investigated by quantum-chemical calculations. The two methods are in line with experimental observations: they predict $β$ as the more stable polymorph at standard conditions. Critically, the free-energy landscape suggests how the $α$ polymorph may lead to experimentally observed kinetic traps. The combination of multiscale modeling, enhanced sampling, and quantum-chemical calculations offers an appealing strategy to uncover complex free-energy landscapes with polymorphic behavior.
Submitted 23 July, 2020;
originally announced July 2020.
-
Hydration free energies from kernel-based machine learning: Compound-database bias
Authors:
Clemens Rauer,
Tristan Bereau
Abstract:
We consider the prediction of a basic thermodynamic property---hydration free energies---across a large subset of the chemical space of small organic molecules. Our in silico study is based on computer simulations at the atomistic level with implicit solvent. We report on a kernel-based machine learning approach that is inspired by recent work in learning electronic properties, but differs in key aspects: The representation is averaged over several conformers to account for the statistical ensemble. We also include an atomic-decomposition ansatz, which we show offers significant added transferability compared to molecular learning. Finally, we explore the existence of severe biases from databases of experimental compounds. By performing a combination of dimensionality reduction and cross-learning models, we show that the rate of learning depends significantly on the breadth and variety of the training dataset. Our study highlights the dangers of fitting machine-learning models to databases of narrow chemical range.
Submitted 1 July, 2020;
originally announced July 2020.
-
Adversarial Reverse Mapping of Equilibrated Condensed-Phase Molecular Structures
Authors:
Marc Stieffenhofer,
Michael Wand,
Tristan Bereau
Abstract:
A tight and consistent link between resolutions is crucial to further expand the impact of multiscale modeling for complex materials. We herein tackle the generation of condensed molecular structures as a refinement -- backmapping -- of a coarse-grained structure. Traditional schemes start from a rough coarse-to-fine mapping and perform further energy minimization and molecular dynamics simulations to equilibrate the system. In this study we introduce DeepBackmap: A deep neural network based approach to directly predict equilibrated molecular structures for condensed-phase systems. We use generative adversarial networks to learn the Boltzmann distribution from training data and realize reverse mapping by using the coarse-grained structure as a conditional input. We apply our method to a challenging condensed-phase polymeric system. We observe that the model trained in a melt has remarkable transferability to the crystalline phase. The combination of data-driven and physics-based aspects of our architecture help reach temperature transferability with only limited training data.
Submitted 17 March, 2020;
originally announced March 2020.
-
Interpretable Embeddings From Molecular Simulations Using Gaussian Mixture Variational Autoencoders
Authors:
Yasemin Bozkurt Varolgunes,
Tristan Bereau,
Joseph F. Rudzinski
Abstract:
Extracting insight from the enormous quantity of data generated from molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive a priori intuition into the relevant driving forces. In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally force an information bottleneck and, thereby, a low-dimensional embedding of the essential features. While variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, this is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, alanine dipeptide, and a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics.
Submitted 22 December, 2019;
originally announced December 2019.
-
Direct route to reproducing pair distribution functions with coarse-grained models via transformed atomistic cross correlations
Authors:
Svenja J. Woerner,
Tristan Bereau,
Kurt Kremer,
Joseph F. Rudzinski
Abstract:
Coarse-grained (CG) models are often parametrized to reproduce one-dimensional structural correlation functions of an atomically-detailed model along the degrees of freedom governing each interaction potential. While cross correlations between these degrees of freedom inform the optimal set of interaction parameters, the correlations generated from the higher-resolution simulations are often too complex to act as an accurate proxy for the CG correlations. Instead, the most popular methods determine the interaction parameters iteratively, while assuming that individual interactions are uncorrelated. While these iterative methods have been validated for a wide range of systems, they also have disadvantages when parametrizing models for multi-component systems or when refining previously established models to better reproduce particular structural features. In this work, we propose two distinct approaches for the direct (i.e., non-iterative) parametrization of a CG model by adjusting the high-resolution cross correlations of an atomistic model in order to more accurately reflect correlations that will be generated by the resulting CG model. The derived models more accurately describe the low-order structural features of the underlying AA model, while necessarily generating inherently distinct cross correlations compared with the atomically-detailed reference model. We demonstrate the proposed methods for a one-site-per-molecule representation of liquid water, where pairwise interactions are incapable of reproducing the true tetrahedral solvation structure. We then investigate the precise role that distinct cross-correlation features play in determining the correct pair correlation functions, evaluating the importance of the placement of correlation features as well as the balance between features appearing in different solvation shells.
Submitted 29 November, 2019; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Microscopic reweighting for non-equilibrium steady states dynamics
Authors:
Marius Bause,
Timon Wittenstein,
Kurt Kremer,
Tristan Bereau
Abstract:
Computer simulations generate trajectories at a single, well-defined thermodynamic state point. Statistical reweighting offers the means to reweight static and dynamical properties to different equilibrium state points by means of analytic relations. We extend these ideas to non-equilibrium steady states by relying on a maximum path entropy formalism subject to physical constraints. Stochastic thermodynamics analytically relates the forward and backward probabilities of any pathway through the external non-conservative force, enabling reweighting both in and out of equilibrium. We avoid the combinatorial explosion of microtrajectories by systematically constructing pathways through Markovian transitions. We further identify a quantity that is invariant to dynamical reweighting, analogous to the density of states in equilibrium reweighting.
Submitted 19 July, 2019;
originally announced July 2019.
-
Resolution limit of data-driven coarse-grained models spanning chemical space
Authors:
Kiran H. Kanekal,
Tristan Bereau
Abstract:
Increasing the efficiency of materials design and discovery remains a significant challenge, especially given the prohibitively large size of chemical compound space. The use of a chemically transferable coarse-grained model enables different molecular fragments to map to the same bead type, while also reducing computational expense. These properties further increase screening efficiency, as many compounds are screened through the use of a single coarse-grained simulation, effectively reducing the size of chemical compound space. Here, we propose new criteria for the rational design of coarse-grained models that allow for the optimization of their chemical transferability and evaluate the Martini model within this framework. We further investigate the scope of this chemical transferability by parameterizing three Martini-like models, in which the number of bead types ranges from five to sixteen for the different force fields. We then implement a Bayesian approach to determining which chemical groups are more likely to be present on fragments corresponding to specific bead types for each model. We demonstrate that a level of performance and accuracy comparable to Martini can be obtained by using a force field with fewer bead types. However, the advantage of including more bead types is a reduction of uncertainty with respect to back-mapping these bead types to specific chemistries. Just as reducing the size of the coarse-grained particles leads to a finer mapping of conformational space, increasing the number of bead types yields a finer mapping of chemical compound space. Finally, we note that, due to the relatively large size of the chemical fragments that map to a single Martini bead, a clear resolution limit arises when using the water/octanol partition free energy as the only descriptor when coarse-graining chemical compound space.
Submitted 9 July, 2019;
originally announced July 2019.
-
Controlled exploration of chemical space by machine learning of coarse-grained representations
Authors:
Christian Hoffmann,
Roberto Menichetti,
Kiran H. Kanekal,
Tristan Bereau
Abstract:
The size of chemical compound space is too large to be probed exhaustively. This leads high-throughput protocols to drastically subsample and results in sparse and non-uniform datasets. Rather than arbitrarily selecting compounds, we systematically explore chemical space according to the target property of interest. We first perform importance sampling by introducing a Markov chain Monte Carlo scheme across compounds. We then train an ML model on the sampled data to expand the region of chemical space probed. Our boosting procedure enhances the number of compounds by a factor of 2 to 10, enabled by the ML model's coarse-grained representation, which both simplifies the structure-property relationship and reduces the size of chemical space. The ML model correctly recovers linear relationships between transfer free energies. These linear relationships correspond to features that are global to the dataset, marking the region of chemical space up to which predictions are reliable---a more robust alternative to the predictive variance. Bridging coarse-grained simulations with ML gives rise to an unprecedented database of drug-membrane insertion free energies for 1.3 million compounds.
Submitted 30 July, 2019; v1 submitted 6 May, 2019;
originally announced May 2019.
-
Automated detection of many-particle solvation states for accurate characterizations of diffusion kinetics
Authors:
Joseph F. Rudzinski,
Marc Radu,
Tristan Bereau
Abstract:
Discrete-space kinetic models, i.e., Markov state models, have emerged as powerful tools for reducing the complexity of trajectories generated from molecular dynamics simulations. These models require configuration-space representations that accurately characterize the relevant dynamics. Well-established, low-dimensional order parameters for constructing this representation have led to widespread application of Markov state models to study conformational dynamics in biomolecular systems. In contrast, applications to characterize single-molecule diffusion processes have been scarce and typically employ system-specific, higher-dimensional order parameters to characterize the local solvation state of the molecule. In this work, we propose an automated method for generating a coarse configuration-space representation, using generic features of solvation structure---the coordination numbers about each particle. To overcome the inherently noisy behavior of these low-dimensional observables, we treat the features as indicators of an underlying, latent Markov process. The resulting hidden Markov models filter the trajectories of each feature into the most likely latent solvation state at each time step. The filtered trajectories are then used to construct a configuration-space discretization, which accurately describes the diffusion kinetics. The method is validated on a standard model for glassy liquids, where particle jumps between local cages determine the diffusion properties of the system. Not only do the resulting models provide quantitatively accurate characterizations of the diffusion constant, but they also reveal a mechanistic description of diffusive jumps, quantifying the heterogeneity of local diffusion.
Submitted 22 November, 2018; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Accurate structure-based coarse-graining leads to consistent barrier-crossing dynamics
Authors:
Tristan Bereau,
Joseph F. Rudzinski
Abstract:
Structure-based coarse graining of molecular systems offers a systematic route to reproduce the many-body potential of mean force. Unfortunately, common strategies are inherently limited by the molecular mechanics force field employed. Here, we extend the concept of multisurface dynamics, initially developed to describe electronic transitions in chemical reactions, to accurately sample the conformational ensemble of a classical system in equilibrium. In analogy to describing different electronic configurations, a surface-hopping scheme couples distinct conformational basins beyond the additivity of the Hamiltonian. The incorporation of more surfaces leads systematically toward improved cross-correlations. The resulting models naturally achieve consistent long-time dynamics for systems governed by barrier-crossing events.
Submitted 26 December, 2018; v1 submitted 16 August, 2018;
originally announced August 2018.
-
Drug-membrane permeability across chemical space
Authors:
Roberto Menichetti,
Kiran H. Kanekal,
Tristan Bereau
Abstract:
Unraveling the relation between the chemical structure of small drug-like compounds and their rate of passive permeation across lipid membranes is of fundamental importance for pharmaceutical applications. The elucidation of a comprehensive structure-permeability relationship expressed in terms of a few molecular descriptors is unfortunately hampered by the overwhelming number of possible compounds. In this work, we reduce a priori the size and diversity of chemical space to solve an analogous---but smoothed out---structure-property relationship problem. This is achieved by relying on a physics-based coarse-grained model that reduces the size of chemical space, enabling a comprehensive exploration of this space with greatly reduced computational cost. We perform high-throughput coarse-grained (HTCG) simulations to derive a permeability surface in terms of two simple molecular descriptors---bulk partitioning free energy and pKa. The surface is constructed by exhaustively simulating all coarse-grained compounds that are representative of small organic molecules (ranging from 30 to 160 Da) in a high-throughput scheme. We provide results for acidic, basic and zwitterionic compounds. Connecting back to the atomic resolution, the HTCG predictions for more than 500,000 compounds allow us to establish a clear connection between specific chemical groups and the resulting permeability coefficient, enabling for the first time an inverse design procedure. Our results have profound implications for drug synthesis: the predominance of commonly-employed chemical moieties narrows down the range of permeabilities.
Submitted 27 December, 2018; v1 submitted 25 May, 2018;
originally announced May 2018.
-
Polymorphism of syndiotactic polystyrene crystals from multiscale simulations
Authors:
Chan Liu,
Kurt Kremer,
Tristan Bereau
Abstract:
Syndiotactic polystyrene (sPS) exhibits complex polymorphic behavior upon crystallization. Computational modeling of polymer crystallization has remained a challenging task because the relevant processes are slow on the molecular time scale. We report herein a detailed characterization of sPS-crystal polymorphism by means of coarse-grained (CG) and atomistic (AA) modeling. The CG model, parametrized in the melt, shows remarkable transferability properties in the crystalline phase. Not only is the transition temperature in good agreement with atomistic simulations, but the model also stabilizes the main $α$ and $β$ polymorphs observed experimentally. We compare in detail the propensity of polymorphs at the CG and AA level and discuss finite-size as well as box-geometry effects. All in all, we demonstrate the appeal of CG modeling to efficiently characterize polymer-crystal polymorphism at large scale.
Submitted 25 May, 2018;
originally announced May 2018.
-
Non-covalent interactions across organic and biological subsets of chemical space: Physics-based potentials parametrized from machine learning
Authors:
Tristan Bereau,
Robert A. DiStasio Jr.,
Alexandre Tkatchenko,
O. Anatole von Lilienfeld
Abstract:
Classical intermolecular potentials typically require an extensive parametrization procedure for any new compound considered. To do away with prior parametrization, we propose a combination of physics-based potentials with machine learning (ML), coined IPML, which is transferable across small neutral organic and biologically-relevant molecules. ML models provide on-the-fly predictions for environment-dependent local atomic properties: electrostatic multipole coefficients (significant error reduction compared to previously reported), the population and decay rate of valence atomic densities, and polarizabilities across conformations and chemical compositions of H, C, N, and O atoms. These parameters enable accurate calculations of intermolecular contributions---electrostatics, charge penetration, repulsion, induction/polarization, and many-body dispersion. Unlike other potentials, this model is transferable in its ability to handle new molecules and conformations without explicit prior parametrization: All local atomic properties are predicted from ML, leaving only eight global parameters---optimized once and for all across compounds. We validate IPML on various gas-phase dimers at and away from equilibrium separation, where we obtain mean absolute errors between 0.4 and 0.7 kcal/mol for several chemically and conformationally diverse datasets representative of non-covalent interactions in biologically-relevant molecules. We further focus on hydrogen-bonded complexes---essential but challenging due to their directional nature---where datasets of DNA base pairs and amino acids yield an extremely encouraging 1.4 kcal/mol error. Finally, and as a first look, we consider IPML in denser systems: water clusters, supramolecular host-guest complexes, and the benzene crystal.
Submitted 11 January, 2018; v1 submitted 16 October, 2017;
originally announced October 2017.
-
Efficient potential of mean force calculation from multiscale simulations: solute insertion in a lipid membrane
Authors:
Roberto Menichetti,
Kurt Kremer,
Tristan Bereau
Abstract:
The determination of potentials of mean force for solute insertion in a membrane by means of all-atom molecular dynamics simulations is often hampered by sampling issues. A multiscale approach to conformational sampling was recently proposed by Bereau and Kremer (2016). It aims at accelerating the sampling of the atomistic conformational space by means of a systematic backmapping of coarse-grained snapshots. In this work, we first analyze the efficiency of this method by comparing its predictions for propanol insertion into a 1,2-Dimyristoyl-sn-glycero-3-phosphocholine membrane (DMPC) against reference atomistic simulations. The method is found to provide accurate results with a gain of one order of magnitude in computational time. We then investigate the role of the coarse-grained representation in affecting the reliability of the method in the case of a 1,2-Dioleoyl-sn-glycero-3-phosphocholine membrane (DOPC). We find that the accuracy of the results is tightly connected to the presence of a good configurational overlap between the coarse-grained and atomistic models---a general requirement when developing multiscale simulation methods.
Submitted 6 October, 2017;
originally announced October 2017.
-
In silico screening of drug-membrane thermodynamics reveals linear relations between bulk partitioning and the potential of mean force
Authors:
Roberto Menichetti,
Kiran H. Kanekal,
Kurt Kremer,
Tristan Bereau
Abstract:
The partitioning of small molecules in cell membranes---a key parameter for pharmaceutical applications---typically relies on experimentally-available bulk partitioning coefficients. Computer simulations provide a structural resolution of the insertion thermodynamics via the potential of mean force, but require significant sampling at the atomistic level. Here, we introduce high-throughput coarse-grained molecular dynamics simulations to screen thermodynamic properties. This application of physics-based models in a large-scale study of small molecules establishes linear relationships between partitioning coefficients and key features of the potential of mean force. This allows us to predict the structure of the insertion from bulk experimental measurements for more than 400,000 compounds. The potential of mean force hereby becomes an easily accessible quantity---already recognized for its high predictability of certain properties, e.g., passive permeation. Further, we demonstrate how coarse graining helps reduce the size of chemical space, enabling a hierarchical approach to screening small molecules.
Submitted 1 December, 2017; v1 submitted 8 June, 2017;
originally announced June 2017.
-
Concurrent parametrization against static and kinetic information leads to more robust coarse-grained force fields
Authors:
Joseph F. Rudzinski,
Tristan Bereau
Abstract:
The parametrization of coarse-grained (CG) simulation models for molecular systems often aims at reproducing static properties alone. The reduced molecular friction of the CG representation usually results in faster, albeit inconsistent, dynamics. In this work, we rely on Markov state models to simultaneously characterize the static and kinetic properties of two CG peptide force fields---one top-down and one bottom-up. Instead of a rigorous evolution of CG dynamics (e.g., using a generalized Langevin equation), we attempt to improve the description of kinetics by simply altering the existing CG models, which employ standard Langevin dynamics. By varying masses and relevant force-field parameters, we can improve the timescale separation of the slow kinetic processes, achieve a more consistent ratio of mean-first-passage times between metastable states, and refine the relative free-energies between these states. Importantly, we show that the incorporation of kinetic information into a structure-based parametrization improves the description of the helix-coil transition sampled by a minimal CG model. While structure-based models understabilize the helical state, kinetic constraints help identify CG models that improve the ratio of forward/backward timescales by effectively hindering the sampling of spurious conformational intermediate states.
Submitted 19 July, 2016;
originally announced July 2016.
-
Consistent interpretation of molecular simulation kinetics using Markov state models biased with external information
Authors:
Joseph F. Rudzinski,
Kurt Kremer,
Tristan Bereau
Abstract:
Molecular simulations can provide microscopic insight into the physical and chemical driving forces of complex molecular processes. Despite continued advancement of simulation methodology, model errors may lead to inconsistencies between simulated and reference (e.g., from experiments or higher-level simulations) observables. To bound the microscopic information generated by computer simulations within reference measurements, we propose a method that reweights the microscopic transitions of the system to improve consistency with a set of coarse kinetic observables. The method employs the well-developed Markov state modeling framework to efficiently link microscopic dynamics with long-time scale constraints, thereby consistently addressing a wide range of time scales. To emphasize the robustness of the method, we consider two distinct coarse-grained models with significant kinetic inconsistencies. When applied to the simulated conformational dynamics of small peptides, the reweighting procedure systematically improves the time scale separation of the slowest processes. Additionally, constraining the forward and backward rates between metastable states leads to slight improvement of their relative stabilities and, thus, refined equilibrium properties of the resulting model. Finally, we find that difficulties in simultaneously describing both the simulated data and the provided constraints can help identify specific limitations of the underlying simulation approach.
Submitted 11 February, 2016;
originally announced February 2016.
-
Folding and insertion thermodynamics of the transmembrane WALP peptide
Authors:
Tristan Bereau,
W. F. Drew Bennett,
Jim Pfaendtner,
Markus Deserno,
Mikko Karttunen
Abstract:
The anchor of most integral membrane proteins consists of one or several helices spanning the lipid bilayer. The WALP peptide, GWW(LA)$_n$(L)WWA, is a common model helix to study the fundamentals of protein insertion and folding, as well as helix-helix association in the membrane. Its structural properties have been illuminated in a large number of experimental and simulation studies. In this combined coarse-grained and atomistic simulation study, we probe the thermodynamics of a single WALP peptide, focusing on both the insertion across the water-membrane interface and folding in both water and a membrane. The potential of mean force characterizing the peptide's insertion into the membrane shows qualitatively similar behavior across peptides and three force fields. However, the Martini force field exhibits a pronounced secondary minimum for an adsorbed interfacial state, which may even become the global minimum---in contrast to both atomistic simulations and the alternative PLUM force field. Even though the two coarse-grained models reproduce the free energy of insertion of individual amino acid side chains, they both underestimate its corresponding value for the full peptide (as compared with atomistic simulations), hinting at cooperative physics beyond the residue level. Folding of WALP in the two environments indicates the helix as the most stable structure, though with different relative stabilities and chain-length dependence.
Submitted 28 August, 2015;
originally announced August 2015.
-
Transferable atomic multipole machine learning models for small organic molecules
Authors:
Tristan Bereau,
Denis Andrienko,
O. Anatole von Lilienfeld
Abstract:
Accurate representation of the molecular electrostatic potential, which is often expanded in distributed multipole moments, is crucial for an efficient evaluation of intermolecular interactions. Here we introduce a machine learning model for multipole coefficients of atom types H, C, O, N, S, F, and Cl in any molecular conformation. The model is trained on quantum chemical results for atoms in varying chemical environments drawn from thousands of organic molecules. Multipoles in systems with neutral, cationic, and anionic molecular charge states are treated with individual models. The models' predictive accuracy and applicability are illustrated by evaluating intermolecular interaction energies of nearly 1,000 dimers and the cohesive energy of the benzene crystal.
Submitted 31 March, 2015; v1 submitted 18 March, 2015;
originally announced March 2015.
-
Toward transferable interatomic van der Waals interactions without electrons: The role of multipole electrostatics and many-body dispersion
Authors:
Tristan Bereau,
O. Anatole von Lilienfeld
Abstract:
We estimate polarizabilities of atoms in molecules without electron density, using a Voronoi tessellation approach instead of conventional density partitioning schemes. The resulting atomic dispersion coefficients are calculated, as well as many-body dispersion effects on intermolecular potential energies. We also estimate contributions from multipole electrostatics and compare them to dispersion. We assess the performance of the resulting intermolecular interaction model from dispersion and electrostatics for more than 1,300 neutral and charged, small organic molecular dimers. Applications to water clusters, the benzene crystal, the anti-cancer drug ellipticine intercalated between two Watson-Crick DNA base pairs, and six macromolecular host-guest complexes highlight the potential of this method and help to identify points of future improvement. The mean absolute error made by the combination of static electrostatics with many-body dispersion decreases at larger distances, while it plateaus for two-body dispersion, in conflict with the common assumption that the simple $1/R^6$ correction will yield proper dissociative tails. Overall, the method achieves an accuracy well within that of conventional molecular force fields while exhibiting a simple parametrization protocol.
Submitted 13 June, 2014; v1 submitted 26 March, 2014;
originally announced March 2014.
-
Structural Basis of Folding Cooperativity in Model Proteins: Insights from a Microcanonical Perspective
Authors:
Tristan Bereau,
Markus Deserno,
Michael Bachmann
Abstract:
Two-state cooperativity is an important characteristic in protein folding. It is defined by a depletion of states lying energetically between folded and unfolded conformations. While there are different ways to test for two-state cooperativity, most of them probe indirect proxies of this depletion. Yet, generalized-ensemble computer simulations allow one to identify this transition unambiguously by a microcanonical analysis on the basis of the density of states. Here we perform a detailed characterization of several helical peptides using coarse-grained simulations. The level of resolution of the coarse-grained model allows the study of realistic structures ranging from small alpha-helices to a de novo three-helix bundle, without biasing the force field toward the native state of the protein. Linking thermodynamic and structural features shows that while short alpha-helices exhibit two-state cooperativity, the type of transition changes for longer chain lengths because the chain forms multiple helix nucleation sites, stabilizing a significant population of intermediate states. The helix bundle exhibits the signs of two-state cooperativity owing to favorable helix-helix interactions, as predicted from theoretical models. The detailed analysis of secondary and tertiary structure formation fits well into the framework of several folding mechanisms and confirms features observed so far only in lattice models.
Submitted 1 July, 2011;
originally announced July 2011.
-
Interplay between Secondary and Tertiary Structure Formation in Protein Folding Cooperativity
Authors:
Tristan Bereau,
Michael Bachmann,
Markus Deserno
Abstract:
Protein folding cooperativity is defined by the nature of the finite-size thermodynamic transition exhibited upon folding: two-state transitions show a free energy barrier between the folded and unfolded ensembles, while downhill folding is barrierless. A microcanonical analysis, where the energy is the natural variable, has proven better suited to characterizing the nature of the transition unambiguously than its canonical counterpart. Replica exchange molecular dynamics simulations of a high-resolution coarse-grained model allow for the accurate evaluation of the density of states, in order to extract precise thermodynamic information and measure its impact on structural features. The method is applied to three helical peptides: a short helix shows sharp features of a two-state folder, while a longer helix and a three-helix bundle exhibit downhill and two-state transitions, respectively. Extending the results of lattice simulations and theoretical models, we find that it is the interplay between secondary structure and the loss of non-native tertiary contacts which determines the nature of the transition.
Submitted 1 July, 2011;
originally announced July 2011.