EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations

Equivariant graph neural network force fields (EGraFFs) have shown great promise in modelling complex interactions in atomic systems by exploiting the graphs' inherent symmetries. Recent works have led to a surge in the development of novel architectures that incorporate equivariance-based inductive biases alongside architectural innovations, such as graph transformers and message passing, to model atomic interactions. However, a thorough evaluation of these EGraFFs on the downstream task of real-world atomistic simulations is lacking. To this end, we perform a systematic benchmark of six EGraFF algorithms (NequIP, Allegro, BOTNet, MACE, Equiformer, TorchMDNet), with the aim of understanding their capabilities and limitations for realistic atomistic simulations. In addition to a thorough evaluation and analysis on eight existing datasets drawn from the benchmarking literature, we release two new benchmark datasets, propose four new metrics, and introduce three challenging tasks. The new datasets and tasks evaluate how well EGraFFs perform on out-of-distribution data, in terms of different crystal structures, temperatures, and new molecules. Interestingly, evaluating the EGraFF models through dynamic simulations reveals that a lower error on energy or force does not guarantee a stable or reliable simulation, or a faithful replication of the atomic structures. Moreover, we find that no model clearly outperforms the others on all datasets and tasks. Importantly, we show that the performance of all the models on out-of-distribution datasets is unreliable, pointing to the need for a foundation model for force fields that can be used in real-world simulations. In summary, this work establishes a rigorous framework for evaluating machine learning force fields in the context of atomic simulations and points to open research challenges within this domain.

Results from the Paper


Task: Formation Energy. Metric: MAE (mean absolute error; lower is better).

| Dataset | NequIP | Allegro | BOTNet | MACE |
|---|---|---|---|---|
| 3BPA | 3.15 | 4.13 | 5 | 4 |
| Acetylacetone | 1.38 | 0.92 | 2 | 2 |
| Aspirin | 9.27 | 14.36 | 12.63 | 13.79 |
| Ethanol | 4.99 | 6.94 | 203.83 | 209.96 |
| GeTe | 1780.951 | 1009.4 | 3034 | 2670 |
| LiPS | 165.43 | 31.75 | 28 | 30 |
| LiPS20 | 26.8 | 33.17 | 24.59 | 14.05 |
| Naphthalene | 2.66 | 5.82 | 182.55 | 161.74 |
| Salicylic Acid | 6.29 | 8.59 | 153.06 | 165.29 |
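To make the per-dataset comparison concrete, the following sketch picks the lowest-MAE model on each dataset using the values from the results table and counts how often each model comes out on top. It assumes lower MAE is better and compares models only within a dataset, because the magnitude of the reported values differs widely between datasets. The helper names `best_models` and `win_counts` are hypothetical.

```python
# MAE values copied from the results table (units as reported there).
MAE: dict[str, dict[str, float]] = {
    "3BPA": {"NequIP": 3.15, "Allegro": 4.13, "BOTNet": 5.0, "MACE": 4.0},
    "Acetylacetone": {"NequIP": 1.38, "Allegro": 0.92, "BOTNet": 2.0, "MACE": 2.0},
    "Aspirin": {"NequIP": 9.27, "Allegro": 14.36, "BOTNet": 12.63, "MACE": 13.79},
    "Ethanol": {"NequIP": 4.99, "Allegro": 6.94, "BOTNet": 203.83, "MACE": 209.96},
    "GeTe": {"NequIP": 1780.951, "Allegro": 1009.4, "BOTNet": 3034.0, "MACE": 2670.0},
    "LiPS": {"NequIP": 165.43, "Allegro": 31.75, "BOTNet": 28.0, "MACE": 30.0},
    "LiPS20": {"NequIP": 26.8, "Allegro": 33.17, "BOTNet": 24.59, "MACE": 14.05},
    "Naphthalene": {"NequIP": 2.66, "Allegro": 5.82, "BOTNet": 182.55, "MACE": 161.74},
    "Salicylic Acid": {"NequIP": 6.29, "Allegro": 8.59, "BOTNet": 153.06, "MACE": 165.29},
}


def best_models(scores: dict[str, float]) -> list[str]:
    """Return every model tied for the lowest MAE on one dataset."""
    lowest = min(scores.values())
    return sorted(model for model, value in scores.items() if value == lowest)


def win_counts(table: dict[str, dict[str, float]]) -> dict[str, int]:
    """Count, per model, the datasets on which it has the lowest MAE."""
    counts: dict[str, int] = {}
    for scores in table.values():
        for model in best_models(scores):
            counts[model] = counts.get(model, 0) + 1
    return counts


if __name__ == "__main__":
    for dataset, scores in MAE.items():
        print(f"{dataset}: {', '.join(best_models(scores))}")
    print(win_counts(MAE))
```

With these values, NequIP has the lowest MAE on five of the nine datasets, Allegro on two, and BOTNet and MACE on one each, which is consistent with the observation that no single model is best everywhere. Such a static ranking says nothing about simulation stability, which the paper evaluates separately.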