ColabFit: Collaborative Development of Data-Driven Interatomic Potentials for Predictive Molecular Simulations

Leadership and Funding

The ColabFit project is led by a team of researchers from the departments of Aerospace Engineering and Mechanics (AEM) and Chemical Engineering and Materials Science (CEMS) at the University of Minnesota, Twin Cities, in the United States.

ColabFit is funded by the U.S. National Science Foundation through grant #2039575.


The emergence of data-driven approaches for developing interatomic potentials promises to transform materials design and synthesis. Data-driven interatomic potentials (DDIPs) build on recent advances in machine learning to accurately model the potential energy surface of a material system by inferring its underlying functional form from a large number of quantum input configurations. DDIPs thus enable truly predictive molecular simulations with the accuracy of first-principles methods over length and time scales comparable to those of classical molecular simulations.
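The idea of fitting a potential to reference energies can be illustrated with a deliberately small sketch. It is not a DDIP: it assumes a fixed Lennard-Jones form with two parameters, whereas a DDIP infers a far more flexible functional form. The reference energies are synthetic stand-ins for first-principles data, and the function names `lj_energy` and `fit_lj` are hypothetical.

```python
# Toy illustration only: fit E(r) = A / r**12 - B / r**6 to reference energies.
# The "reference" data below is synthetic, standing in for first-principles results.

def lj_energy(r, a, b):
    """Pair energy at separation r for parameters a (repulsive) and b (attractive)."""
    return a / r**12 - b / r**6


def fit_lj(distances, energies):
    """Linear least squares for (A, B), solved through the 2x2 normal equations."""
    # The model is linear in A and B: E = A * f1 + B * f2.
    f1 = [r**-12 for r in distances]
    f2 = [-(r**-6) for r in distances]
    s11 = sum(x * x for x in f1)
    s12 = sum(x * y for x, y in zip(f1, f2))
    s22 = sum(y * y for y in f2)
    t1 = sum(x * e for x, e in zip(f1, energies))
    t2 = sum(y * e for y, e in zip(f2, energies))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Synthetic training set generated from known parameters A = B = 4.0.
distances = [0.95 + 0.05 * i for i in range(20)]
energies = [lj_energy(r, 4.0, 4.0) for r in distances]

a_fit, b_fit = fit_lj(distances, energies)
print(a_fit, b_fit)
```

Because the synthetic data was generated from the same functional form, the fit recovers the generating parameters to within rounding error. With real first-principles data the residual would be nonzero, which is the motivation for more expressive models.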

This project aims to create a computational framework, “ColabFit”, that enables researchers to rapidly develop and deploy DDIPs for complex material systems by connecting existing cyberinfrastructure resources of first-principles and experimental data with a variety of fitting frameworks. Building on an interoperable standard for machine learning models, researchers using ColabFit will be able to archive their state-of-the-art DDIPs and training sets to the Open Knowledgebase of Interatomic Models (OpenKIM) project, and retrieve existing ones to continue their collaborative development within a supported fitting framework of their choosing. Integration with OpenKIM will ensure that any DDIP created with ColabFit can be immediately used in multiple major simulation packages. ColabFit will be developed in collaboration with an international consortium of leaders in DDIP development, high-throughput first-principles computation cyberinfrastructures, and materials standards organizations. The project will be tested on a target application of DDIP development for phase transformations in 2D transition metal dichalcogenides.

This project addresses a pressing need of the molecular simulation community. The creation of ColabFit will provide materials researchers with a powerful new ability to efficiently synthesize all available data and knowledge related to their particular problem of study. DDIPs developed through ColabFit and shared through OpenKIM will be archived with full provenance and version control and a persistent digital object identifier (DOI) to enable reproducible science and R&D, and be available to other researchers in the community to build upon by extending them for their own needs. Thus, major inefficiencies in today’s materials research industry will be eliminated and all of society can benefit from the resulting increase in scientific advancement.


ColabFit aims to create a middleware layer enabling rapid development of high-quality predictive interatomic potentials by interfacing cyberinfrastructure (CI) resources including OpenKIM and first principles and experimental data repositories with IP/DDIP fitting codes. ColabFit will define a portable universal format for representing potentials (IPs and DDIPs), training data, and the training procedure (e.g. loss function and optimization algorithm), and transferring this information between OpenKIM, CI repositories, and fitting codes as shown schematically in the figure. Existing standards such as the KIM Application Programming Interface (KIM API), the Open Databases Integration for Materials Design (OPTIMADE) standard, and the Open Neural Network Exchange (ONNX) standard will be leveraged and supported. Any necessary translations in data formats will be handled automatically by ColabFit. This will allow archived potentials and training sets to be imported into any supported fitting code to initiate new training (regardless of the code originally used to train them), and for trained potentials to be contributed back to various supported CIs. This also ensures that legacy DDIPs will continue to be supported and available to the community even when no longer actively being developed. ColabFit will be implemented in Python with the ability to be embedded in C, C++ and Fortran programs when necessary. A graphical user interface (GUI) will be provided for direct access to ColabFit functionality outside fitting codes.
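As a rough sketch of what a portable record bundling a potential, its training data, and its training procedure might look like, the example below serializes such a record to JSON and reads it back. Every class and field name here (`Configuration`, `FittingRecord`, `loss_function`, `optimizer`, and so on) is hypothetical and chosen for illustration; this is not the ColabFit format, and the numerical values are made up.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Configuration:
    """One training configuration (hypothetical layout)."""
    species: list    # chemical symbols, one per atom
    positions: list  # Cartesian coordinates, one [x, y, z] per atom
    energy: float    # reference energy, e.g. from a first-principles code


@dataclass
class FittingRecord:
    """Potential + training data + training procedure in one record (hypothetical)."""
    potential_name: str
    parameters: dict
    loss_function: str
    optimizer: str
    training_set: list = field(default_factory=list)

    def to_json(self):
        # asdict() recurses into the nested Configuration dataclasses.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        data["training_set"] = [Configuration(**c) for c in data["training_set"]]
        return cls(**data)


# Illustrative values only; they do not come from any real calculation.
record = FittingRecord(
    potential_name="example_potential",
    parameters={"cutoff": 5.0},
    loss_function="mean_squared_error",
    optimizer="L-BFGS",
    training_set=[
        Configuration(
            species=["Mo", "S", "S"],
            positions=[[0.0, 0.0, 0.0], [1.6, 0.9, 1.5], [1.6, 0.9, -1.5]],
            energy=-21.5,
        )
    ],
)

restored = FittingRecord.from_json(record.to_json())
print(restored == record)
```

A plain-text interchange format of this kind is one way a record written by one fitting code could be read by another. The actual format would also need to handle the translations to the KIM API, OPTIMADE, and ONNX mentioned above.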

This pilot program aims to create a first implementation of ColabFit that will make the above possible. In addition, the KIM-based Learning-Integrated Fitting Framework (KLIFF), a Python package for fitting machine learning potentials compatible with the KIM API, will be provided as a native fitting code for ColabFit and used to develop and test ColabFit functionality. This serves two purposes: it supports testing of ColabFit during development, and it makes KLIFF available to the research community as a native fitting code that fully integrates ColabFit functionality and is maintained by the ColabFit team. The pilot ColabFit will provide support for semi-automated manual DDIP development. The long-term vision is for this effort to enable platforms for fully automated development of DDIPs.

The ColabFit framework with KLIFF will be validated on a materials science target application related to phase transformations in 2D transition metal dichalcogenides (TMDs), an area with important technological applications and one in which the PIs have extensive domain expertise. Specifically, a DDIP will be developed to study phase transformations in MoS₂ and Mo₁₋ₓWₓTe₂ systems, with the aim of generating a temperature-composition phase diagram for the latter. The training set for the DDIP will be designed to include configurations that are important for describing the various phases of the TMD and the transformations between them (e.g. the reaction pathway). It will be constructed using existing results from first-principles and experimental CIs as well as new first-principles calculations performed as part of this project.


As explained above, the objectives of ColabFit are (1) to facilitate the connection between empirical and data-driven interatomic potential (DDIP) fitting codes and major online repositories of first principles (FP) data through the application programming interfaces (APIs) that these repositories have developed; and (2) to make it easy for the materials research community to use and collaborate on DDIP development. To facilitate these goals, the PIs have assembled a consortium of leaders in DDIP development and FP cyberinfrastructures (CIs) who can provide input to help guide the design of ColabFit and technical support to enable the ColabFit research team to interface with their platforms. ColabFit will engage with this consortium through an online kickoff meeting and ongoing consultation to design the ColabFit interface standard and archive formats. ColabFit will work to grow the pool of participants as the project moves forward.

DDIP/IP Projects

  • ALC – Interatomic Potentials “à la carte”, Lawrence Livermore National Laboratory, Livermore, CA, USA

  • Atomicrex – A tool for the construction of interaction models, Chalmers University of Technology, Sweden and Darmstadt University of Technology, Germany

  • DOEIPM – Database optimization for empirical interatomic potential models, University of Illinois at Urbana-Champaign, USA

  • FitSNAP – Software for generating SNAP machine-learning interatomic potentials, Sandia National Laboratories, Albuquerque, NM, USA

  • GAP – Gaussian Approximation Potential, University of Cambridge, Cambridge, UK

  • GP/MFF – Gaussian process-based active learning, Harvard University, Cambridge, MA, USA

  • INNP – Implanted Neural Network Potentials, Harvard University, Cambridge, MA, USA

  • MAML – MAterials Machine Learning, University of California, San Diego, CA, USA

  • MLIP – Machine Learning Interatomic Potentials, Skolkovo Institute of Science and Technology, Moscow, Russia

  • PINNfit – Physically informed artificial neural networks for atomistic modeling of materials, George Mason University, Fairfax, VA, USA

  • POET – Potential Optimization by Evolutionary Techniques, Johns Hopkins University, Baltimore, MD, USA

  • Potfit – Effective potentials from ab-initio data, University of Warwick, Warwick, UK

  • ReaxFF – Reactive Force Field, Pennsylvania State University, University Park, PA, USA

  • RuNNer – Development of Neural Network potential-energy surfaces, University of Göttingen, Göttingen, Germany

  • SchNetPack – Deep Neural Networks for Atomistic Systems, University of Luxembourg, Luxembourg

FP Data Repositories

  • AFLOW – Automatic framework for high-throughput materials discovery, Duke University, Durham, NC, USA

  • CMR – Computational Materials Repository, Technical University of Denmark, Lyngby, Denmark

  • Materials Project, LBNL, Berkeley, CA, USA

  • NOMAD – Novel Materials Discovery, University of Warwick, Warwick, UK

  • OQMD – Open Quantum Materials Database, Northwestern University, Evanston, IL, USA

Standards Organizations

  • OPTIMADE – Open Databases Integration for Materials Design

  • MolSSI – Molecular Sciences Software Institute


For more information, please contact Prof. Ellad Tadmor.