Construction Zone: a software package for building complex nanoscale atomic scenes for applications in machine learning data generation pipelines

Luis Rangel DaCosta, Mary Scott
2021 Microscopy and Microanalysis  
Applied machine learning is becoming increasingly prevalent in atomic-resolution electron microscopy. Training popular models such as neural networks to state-of-the-art accuracy, though, often requires vast amounts of well-labeled data, on the order of tens to hundreds of thousands of images. It is similarly important to validate machine learning approaches against some form of ground-truth reference in order to develop a fuller understanding of a particular model or technique's performance. Ground-truth references can also help characterize a particular technique's robustness against label noise in training datasets, which can be detrimental to the training of a neural network [1].
These needs could potentially be well addressed through simulation, and this is perhaps easily done for simple problems of interest such as single-crystal classification, where it is relatively easy to generate enough atomic models to cover the relevant experimental sample space. However, many materials systems of interest have complex nanoscale structures, for which it is much more challenging to generate enough unique samples to represent experimental distributions and thus properly train a machine learning model. For example, in nanoparticle systems with multiply-twinned structures, a variety of synthetic routes have been proposed for the formation of twinned regions [2,3]; studying such systems in a high-throughput manner with machine learning would require generating image data of many nanoparticles with varying grain boundary and twin structures. Current popular software packages for generating and manipulating atomic models in materials science, such as pymatgen [4] and the Atomic Simulation Environment (ASE) [5], are well suited for ab initio-type problems but are ultimately not designed to handle larger nanoscale environments or features. For image simulation of complex structures, researchers often write bespoke scripts by hand that are hard to scale and reuse.
In this work, we discuss the development of a Python-based software package designed to facilitate the generation of nanoscale atomic scenes for large-scale image simulation efforts. We also discuss how such a tool could be integrated into a practical workflow in a machine learning application, namely segmentation of nanoparticles in HRTEM imaging.
Construction Zone (CZ) is a Python package for building and generating nanoscale atomic scenes, primarily for use in conjunction with image simulation. It builds on popular open-source software in the materials simulation community, namely pymatgen and ASE. The design philosophy in CZ is to separate scene construction into two main classes of objects: generators, which populate a region with atoms, and volumes, which describe the boundaries of those regions. CZ also provides simple methods and routines for establishing relationships between these objects. At the most basic level, each generator is attached to a volume. Generators and volumes maintain their own spatial properties (such as origins and orientations) and can be either independently or jointly ...
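The generator/volume separation described above can be illustrated with a minimal, standard-library-only sketch. This is not CZ's API: the names `SphereVolume`, `CubicGenerator`, `supply_atoms`, and `populate` are hypothetical and chosen only for this example, and a simple cubic lattice stands in for a real crystal structure. The volume knows only its boundary, the generator knows only how to lay down atoms from its own origin, and a separate step attaches one to the other.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class SphereVolume:
    """Hypothetical volume: a spherical boundary with its own center."""
    center: tuple
    radius: float

    def contains(self, point):
        # A point is inside if its distance to the center is within the radius.
        return math.dist(point, self.center) <= self.radius

    def bounding_box(self):
        # Axis-aligned box enclosing the sphere, as (low corner, high corner).
        lo = tuple(c - self.radius for c in self.center)
        hi = tuple(c + self.radius for c in self.center)
        return lo, hi


@dataclass
class CubicGenerator:
    """Hypothetical generator: a simple cubic lattice with its own origin."""
    lattice_constant: float
    origin: tuple = (0.0, 0.0, 0.0)
    species: str = "Au"

    def supply_atoms(self, lo, hi):
        # Yield every lattice site that falls inside the box [lo, hi].
        a = self.lattice_constant
        index_ranges = []
        for o, low, high in zip(self.origin, lo, hi):
            n_min = math.ceil((low - o) / a)
            n_max = math.floor((high - o) / a)
            index_ranges.append(range(n_min, n_max + 1))
        for idx in itertools.product(*index_ranges):
            yield tuple(o + n * a for o, n in zip(self.origin, idx))


def populate(generator, volume):
    """Attach a generator to a volume: keep only sites inside the boundary."""
    lo, hi = volume.bounding_box()
    return [
        (generator.species, site)
        for site in generator.supply_atoms(lo, hi)
        if volume.contains(site)
    ]


if __name__ == "__main__":
    sphere = SphereVolume(center=(0.0, 0.0, 0.0), radius=1.5)
    # Lattice aligned with the sphere center.
    print(len(populate(CubicGenerator(1.0), sphere)))  # 19
    # Same volume, but the generator's origin is shifted independently.
    print(len(populate(CubicGenerator(1.0, origin=(0.5, 0.5, 0.5)), sphere)))  # 8
```

Shifting only the generator's origin changes which lattice sites land inside the unchanged boundary, which is the kind of independent manipulation the text describes.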
doi:10.1017/s1431927621010424 fatcat:pidi3grrevgi3bjz7nzk5t7kgq