SISSO++: A C++ Implementation of the Sure-Independence Screening and Sparsifying Operator Approach

Thomas A. R. Purcell, Matthias Scheffler, Christian Carbogno, Luca M. Ghiringhelli
2022 Journal of Open Source Software  
The sure independence screening and sparsifying operator (SISSO) approach (Ouyang et al., 2018) is an artificial-intelligence algorithm, more specifically a combination of symbolic regression and compressed sensing. As a symbolic regression method, SISSO identifies the mathematical functions, i.e. the descriptors, that best predict the target property of a data set. Furthermore, the compressed sensing aspect of SISSO allows it to find sparse linear models ... tens to thousands of data points. SISSO has been introduced for both regression and classification tasks.

In practice, SISSO first constructs a large, exhaustive feature space of trillions of potential descriptors: it takes a set of user-provided primary features as a dataframe and then iteratively applies a set of unary and binary operators (e.g. addition, multiplication, exponentiation, and squaring) according to a user-defined specification. From this pool of candidate descriptors, sure-independence screening identifies those most correlated with the target property, and an ℓ0 regularization then finds, among them, the low-dimensional linear models with the lowest error.

Because symbolic regression generates an interpretable equation, it has become an increasingly popular concept across scientific disciplines (Neumann et al., 2020; Udrescu & Tegmark, 2020; Wang et al., 2019). A particular advantage of these approaches is their capability to model complex phenomena using relatively simple descriptors. SISSO has been used successfully in the past to model, explore, and predict important material properties, including the stability of different phases (…
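The three stages described above (operator-based feature construction, sure-independence screening, and an exhaustive ℓ0 search over small descriptor subsets) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SISSO++ implementation or its API: the operator set, the function names (build_feature_space, sis, l0_search), the RMSE selection criterion, and the toy data are all hypothetical. It also performs a single screening pass against the target, whereas SISSO additionally screens against the residuals of lower-dimensional models.

```python
import itertools
import math

# Minimal SISSO-style sketch; NOT the SISSO++ API. Hypothetical, deliberately tiny
# operator sets: name -> (function, name template). Both binary ops are commutative.
UNARY = {"sq": (lambda x: x * x, "({})^2"), "exp": (math.exp, "exp({})")}
BINARY = {"add": (lambda a, b: a + b, "({} + {})"), "mul": (lambda a, b: a * b, "({} * {})")}


def build_feature_space(primary, rungs):
    """Apply every operator `rungs` times; `primary` maps name -> per-sample values."""
    space = dict(primary)
    for _ in range(rungs):
        items = list(space.items())  # snapshot: new features join the next rung
        candidates = {}
        for name, vals in items:
            for fn, template in UNARY.values():
                try:
                    candidates[template.format(name)] = [fn(v) for v in vals]
                except OverflowError:  # math.exp raises on very large inputs
                    pass
        for (n1, v1), (n2, v2) in itertools.combinations(items, 2):
            for fn, template in BINARY.values():
                candidates[template.format(n1, n2)] = [fn(a, b) for a, b in zip(v1, v2)]
        for name, vals in candidates.items():
            if name not in space and all(math.isfinite(v) for v in vals):
                space[name] = vals  # keep only new, finite features
    return space


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 for constant inputs."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def sis(space, target, n_keep):
    """Sure-independence screening: keep the n_keep features most correlated with target."""
    ranked = sorted(space, key=lambda k: abs(pearson(space[k], target)), reverse=True)
    return ranked[:n_keep]


def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_rmse(columns, target):
    """Least-squares fit of target = c0 + sum_i c_i * columns[i]; returns (RMSE, coefficients)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(target))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    coefs = solve(xtx, xty)  # normal equations (X^T X) c = X^T y
    resid = [t - sum(c * v for c, v in zip(coefs, r)) for r, t in zip(rows, target)]
    return math.sqrt(sum(e * e for e in resid) / len(target)), coefs


def l0_search(space, selected, target, dim):
    """Exhaustively fit every dim-sized subset of the screened features (the l0 step)."""
    best = None
    for combo in itertools.combinations(selected, dim):
        try:
            rmse, coefs = fit_rmse([space[name] for name in combo], target)
        except ValueError:  # skip collinear subsets
            continue
        if best is None or rmse < best[0]:
            best = (rmse, combo, coefs)
    return best


# Hypothetical toy data generated from y = 2 * x1^2 + 3 * x2 + 1 (no noise).
primary = {"x1": [1.0, 2.0, 3.0, 4.0, 5.0], "x2": [0.5, 1.0, 0.3, 2.0, 1.5]}
y = [2 * a * a + 3 * b + 1 for a, b in zip(primary["x1"], primary["x2"])]
space = build_feature_space(primary, rungs=1)
selected = sis(space, y, n_keep=len(space))  # tiny example: keep every feature
rmse, combo, coefs = l0_search(space, selected, y, dim=2)
print(combo, [round(c, 6) for c in coefs], rmse)
```

Because every subset of the screened features is fitted, the cost of the ℓ0 step grows combinatorially with the number of screened features and the model dimension, which is why the screening step is used to shrink the pool first. On this tiny example all eight rung-1 features are kept, and the search should recover the (x1)^2 and x2 terms used to generate the data.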
doi:10.21105/joss.03960 dblp:journals/jossw/PurcellSCG22 fatcat:pdfei5gq3jabjhti5nv7plidzq