Size-Extensive Molecular Machine Learning with Global Descriptors [post]

Hyunwook Jung, Sina Stocker, Christian Kunkel, Harald Oberhofer, Byungchan Han, Karsten Reuter, Johannes T. Margraf
2019 unpublished
Machine learning (ML) models are increasingly used to predict molecular properties in a high-throughput setting at a much lower computational cost than conventional electronic structure calculations. Such ML models require descriptors that encode the molecular structure in a vector. These descriptors are generally designed to respect the symmetries and invariances of the target property. However, size-extensivity is usually not guaranteed for so-called global descriptors. In this contribution, we show how extensivity can be built into ML models with global descriptors such as the Many-Body Tensor Representation. Properties of extensive and non-extensive models for the atomization energy are systematically explored by training on small molecules and testing on small, medium and large molecules. Our results show that the non-extensive model is only useful in the size range of its training set, whereas the extensive models provide reasonable predictions across large size differences. Remaining sources of error for the extensive models are discussed.
doi:10.26434/chemrxiv.10002020 fatcat:h454be5xazf2rmbbdzxfyx2ony