-
FlowMM: Generating Materials with Riemannian Flow Matching
Authors:
Benjamin Kurt Miller,
Ricky T. Q. Chen,
Anuroop Sriram,
Brandon M Wood
Abstract:
Crystalline materials are a fundamental component in next-generation technologies, yet modeling their distribution presents unique computational challenges. Of the plausible arrangements of atoms in a periodic lattice only a vanishingly small percentage are thermodynamically stable, which is a key indicator of the materials that can be experimentally realized. Two fundamental tasks in this area ar…
▽ More
Crystalline materials are a fundamental component in next-generation technologies, yet modeling their distribution presents unique computational challenges. Of the plausible arrangements of atoms in a periodic lattice only a vanishingly small percentage are thermodynamically stable, which is a key indicator of the materials that can be experimentally realized. Two fundamental tasks in this area are to (a) predict the stable crystal structure of a known composition of elements and (b) propose novel compositions along with their stable structures. We present FlowMM, a pair of generative models that achieve state-of-the-art performance on both tasks while being more efficient and more flexible than competing methods. We generalize Riemannian Flow Matching to suit the symmetries inherent to crystals: translation, rotation, permutation, and periodic boundary conditions. Our framework enables the freedom to choose the flow base distributions, drastically simplifying the problem of learning crystal structures compared with diffusion models. In addition to standard benchmarks, we validate FlowMM's generated structures with quantum chemistry calculations, demonstrating that it is about 3x more efficient, in terms of integration steps, at finding stable materials compared to previous open methods.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts
Authors:
Richard Tran,
Janice Lan,
Muhammed Shuaibi,
Brandon M. Wood,
Siddharth Goyal,
Abhishek Das,
Javier Heras-Domingo,
Adeesh Kolluru,
Ammar Rizvi,
Nima Shoghi,
Anuroop Sriram,
Felix Therrien,
Jehad Abed,
Oleksandr Voznyy,
Edward H. Sargent,
Zachary Ulissi,
C. Lawrence Zitnick
Abstract:
The development of machine learning models for electrocatalysts requires a broad set of training data to enable their use across a wide variety of materials. One class of materials that currently lacks sufficient training data is oxides, which are critical for the development of OER catalysts. To address this, we developed the OC22 dataset, consisting of 62,331 DFT relaxations (~9,854,504 single p…
▽ More
The development of machine learning models for electrocatalysts requires a broad set of training data to enable their use across a wide variety of materials. One class of materials that currently lacks sufficient training data is oxides, which are critical for the development of OER catalysts. To address this, we developed the OC22 dataset, consisting of 62,331 DFT relaxations (~9,854,504 single point calculations) across a range of oxide materials, coverages, and adsorbates. We define generalized total energy tasks that enable property prediction beyond adsorption energies; we test baseline performance of several graph neural networks; and we provide pre-defined dataset splits to establish clear benchmarks for future efforts. In the most general task, GemNet-OC sees a ~36% improvement in energy predictions when combining the chemically dissimilar OC20 and OC22 datasets via fine-tuning. Similarly, we achieved a ~19% improvement in total energy predictions on OC20 and a ~9% improvement in force predictions in OC22 when using joint training. We demonstrate the practical utility of a top performing model by capturing literature adsorption energies and important OER scaling relationships. We expect OC22 to provide an important benchmark for models seeking to incorporate intricate long-range electrostatic and magnetic interactions in oxide surfaces. Dataset and baseline models are open sourced, and a public leaderboard is available to encourage continued community developments on the total energy tasks and data.
△ Less
Submitted 7 March, 2023; v1 submitted 17 June, 2022;
originally announced June 2022.
-
Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations
Authors:
Anuroop Sriram,
Abhishek Das,
Brandon M. Wood,
Siddharth Goyal,
C. Lawrence Zitnick
Abstract:
Recent progress in Graph Neural Networks (GNNs) for modeling atomic simulations has the potential to revolutionize catalyst discovery, which is a key step in making progress towards the energy breakthroughs needed to combat climate change. However, the GNNs that have proven most effective for this task are memory intensive as they model higher-order interactions in the graphs such as those between…
▽ More
Recent progress in Graph Neural Networks (GNNs) for modeling atomic simulations has the potential to revolutionize catalyst discovery, which is a key step in making progress towards the energy breakthroughs needed to combat climate change. However, the GNNs that have proven most effective for this task are memory intensive as they model higher-order interactions in the graphs such as those between triplets or quadruplets of atoms, making it challenging to scale these models. In this paper, we introduce Graph Parallelism, a method to distribute input graphs across multiple GPUs, enabling us to train very large GNNs with hundreds of millions or billions of parameters. We empirically evaluate our method by scaling up the number of parameters of the recently proposed DimeNet++ and GemNet models by over an order of magnitude. On the large-scale Open Catalyst 2020 (OC20) dataset, these graph-parallelized models lead to relative improvements of 1) 15% on the force MAE metric for the S2EF task and 2) 21% on the AFbT metric for the IS2RS task, establishing new state-of-the-art results.
△ Less
Submitted 17 March, 2022;
originally announced March 2022.