-
Reaction Coordinates for Conformational Transitions using Linear Discriminant Analysis on Positions
Authors:
Subarna Sasmal,
Martin McCullagh,
Glen M. Hocky
Abstract:
In this work, we demonstrate that Linear Discriminant Analysis (LDA) applied to atomic positions in two different states of a biomolecule produces a good reaction coordinate between those two states. Atomic coordinates of a macromolecule are a direct representation of a macromolecular configuration, and yet they have not been used in enhanced sampling studies due to a lack of rotational and translational invariance. We resolve this issue using the technique of our prior work, where a molecular configuration is considered a member of an equivalence class in size-and-shape space, which is the set of all configurations that can be translated and rotated to a single point within a reference multivariate Gaussian distribution characterizing a single molecular state. The reaction coordinates produced by LDA applied to positions are shown to be good in two respects: they characterize the transition between two states of a system within a long molecular dynamics (MD) simulation, and they allow us to readily produce free energy estimates along the coordinate using enhanced sampling MD techniques.
Submitted 9 January, 2023;
originally announced January 2023.
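The idea of using a two-class LDA direction as a one-dimensional coordinate can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the frames of each state have already been aligned and flattened into feature vectors (the size-and-shape alignment described in the abstract is not reproduced here), and the function names `lda_direction` and `project`, the ridge term `reg`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def lda_direction(X_a, X_b, reg=1e-6):
    """Two-class Fisher LDA direction for row-wise feature vectors.

    X_a, X_b: arrays of shape (n_frames, n_features), e.g. flattened
    atomic positions that are ASSUMED to be aligned beforehand.
    reg: small relative ridge term (an assumption of this sketch) that
    keeps the within-class scatter matrix invertible.
    """
    mu_a = X_a.mean(axis=0)
    mu_b = X_b.mean(axis=0)
    d_a = X_a - mu_a
    d_b = X_b - mu_b
    # Within-class scatter matrix, summed over both states.
    s_w = d_a.T @ d_a + d_b.T @ d_b
    n = s_w.shape[0]
    s_w = s_w + reg * np.trace(s_w) / n * np.eye(n)
    # Fisher direction: S_w^{-1} (mu_b - mu_a), normalized to unit length.
    w = np.linalg.solve(s_w, mu_b - mu_a)
    return w / np.linalg.norm(w)

def project(X, w):
    """Project each frame onto the LDA direction (the 1D coordinate)."""
    return X @ w

# Synthetic stand-in for two states: identical noise, shifted along feature 0.
rng = np.random.default_rng(0)
n_features = 6
X_a = rng.normal(0.0, 0.5, size=(200, n_features))
X_b = rng.normal(0.0, 0.5, size=(200, n_features))
X_b[:, 0] += 10.0

w = lda_direction(X_a, X_b)
s_a = project(X_a, w)
s_b = project(X_b, w)
```

In this toy setup the projected values `s_a` and `s_b` fall into two well-separated ranges, which is the property that makes such a direction a candidate coordinate between the two states.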
-
Size-and-Shape Space Gaussian Mixture Models for Structural Clustering of Molecular Dynamics Trajectories
Authors:
Heidi Klem,
Glen M. Hocky,
Martin McCullagh
Abstract:
Determining the optimal number and identity of structural clusters from an ensemble of molecular configurations continues to be a challenge. Recent structural clustering methods have focused on the use of internal coordinates due to the innate rotational and translational invariance of these features. The vast number of possible internal coordinates necessitates a feature space supervision step to make clustering tractable, but yields a protocol that can be system type specific. Particle positions offer an appealing alternative to internal coordinates, but suffer from a lack of rotational and translational invariance, as well as a perceived insensitivity to regions of structural dissimilarity. Here, we present a method, denoted shape-GMM, that overcomes the shortcomings of particle positions using a weighted maximum likelihood (ML) alignment procedure. This alignment strategy is then built into an expectation maximization Gaussian mixture model (GMM) procedure to capture metastable states in the free energy landscape. The resulting algorithm distinguishes between a variety of different structures, including those indistinguishable by RMSD and pair-wise distances, as demonstrated on several model systems. Shape-GMM results on an extensive simulation of the fast-folding HP35 Nle/Nle mutant protein support a 4-state folding/unfolding mechanism which is consistent with previous experimental results and provides kinetic detail comparable to previous state-of-the-art clustering approaches, as measured by the VAMP-2 score. Currently, training of shape-GMMs is recommended for systems (or subsystems) that can be represented by $\lesssim$ 200 particles and $\lesssim$ 100K configurations to estimate high-dimensional covariance matrices and balance computational expense. Once a shape-GMM is trained, it can be used to predict the cluster identities of millions of configurations.
Submitted 23 March, 2022; v1 submitted 21 December, 2021;
originally announced December 2021.
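The lack of rotational and translational invariance of particle positions mentioned in the abstract is commonly handled by aligning each configuration to a reference. The sketch below uses the standard unweighted Kabsch algorithm as a stand-in; it is not the weighted maximum likelihood alignment that shape-GMM uses, and the function name `kabsch_align` and the test data are hypothetical.

```python
import numpy as np

def kabsch_align(mobile, reference):
    """Rigidly align `mobile` onto `reference` (both of shape (n_atoms, 3)).

    Uses uniform weights (plain Kabsch), which is an assumption of this
    sketch. Returns the aligned, centered mobile coordinates and the
    centered reference coordinates.
    """
    # Remove translation by centering both structures at the origin.
    mob_c = mobile - mobile.mean(axis=0)
    ref_c = reference - reference.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    h = mob_c.T @ ref_c
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Apply the rotation to every (row-vector) atom position.
    return mob_c @ rot.T, ref_c

# Build a rotated and translated copy of a random reference structure.
rng = np.random.default_rng(1)
reference = rng.normal(size=(10, 3))
angle = 0.7
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
mobile = reference @ rot_z.T + np.array([3.0, -2.0, 5.0])

aligned, ref_c = kabsch_align(mobile, reference)
rmsd = np.sqrt(((aligned - ref_c) ** 2).sum(axis=1).mean())
```

Because `mobile` is an exact rigid-body copy of `reference`, the residual `rmsd` after alignment is zero up to floating-point error. A weighted alignment would replace the uniform sums above with weights derived from a covariance model.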
-
Implicit Solvation Using the Superposition Approximation (IS-SPA): Extension to Polar Solutes in Chloroform
Authors:
Peter T. Lake,
Max A. Mattson,
Martin McCullagh
Abstract:
Efficient, accurate, and adaptable implicit solvent models remain a significant challenge in the field of molecular simulation. A recent implicit solvent model, IS-SPA, based on approximating the mean solvent force using the superposition approximation, provides a platform to achieve these goals. IS-SPA was originally developed to handle non-polar solutes in the TIP3P water model but can be extended to accurately treat polar solutes in other polar solvents. In this manuscript, we demonstrate how to adapt IS-SPA to include the treatment of solvent orientation and long-range electrostatics in a solvent of chloroform. The orientation of chloroform is approximated as that of an ideal dipole aligned in a mean electrostatic field. The solvent-solute force is then considered as an averaged radially symmetric Lennard-Jones component and a multipole expansion of the electrostatic component through the octupole term. Parameters for the model include atom-based solvent density and mean electric field functions that are fit from explicit solvent simulations of independent atoms or molecules. Using these parameters, IS-SPA accounts for asymmetry of charge solvation and reproduces the explicit solvent potential of mean force of dimerization of two oppositely charged Lennard-Jones spheres with high fidelity. Additionally, the model more accurately captures the effect of explicit solvent on the monomer and dimer configurations of alanine dipeptide in chloroform than a generalized Born or constant density dielectric model. The current version of the algorithm is expected to outperform explicit solvent simulations for aggregation of small peptides at concentrations below 150 mM, well above the typical experimental concentrations for these materials.
Submitted 5 October, 2020; v1 submitted 29 September, 2020;
originally announced September 2020.
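For an ideal dipole in a uniform field, the textbook result for the thermally averaged alignment is the Langevin function, L(x) = coth(x) - 1/x with x = pE/(k_B T). The sketch below evaluates that function only; it is not the IS-SPA code, and the function names and the assumption that the dipole moment, field, and thermal energy are supplied in mutually consistent units are choices made for this illustration.

```python
import math

def langevin(x):
    """Langevin function L(x) = coth(x) - 1/x, the mean of cos(theta)
    for an ideal dipole in a uniform field with x = p * E / (k_B * T)."""
    if abs(x) < 1e-4:
        # Series expansion near zero avoids cancellation: x/3 - x^3/45.
        return x / 3.0 - x ** 3 / 45.0
    return 1.0 / math.tanh(x) - 1.0 / x

def mean_alignment(dipole, field, kT):
    """Mean cos(theta) between dipole and field.

    ASSUMES `dipole * field` and `kT` are in the same energy units.
    """
    return langevin(dipole * field / kT)

weak = mean_alignment(1.0, 0.01, 1.0)    # weak field: close to x / 3
strong = mean_alignment(1.0, 50.0, 1.0)  # strong field: approaches 1
```

In the weak-field limit the alignment grows linearly as x/3, and it saturates toward 1 as the field energy becomes large compared with the thermal energy.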