A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
The size of chemical compound space is too large to be probed exhaustively. This leads high-throughput protocols to drastically subsample and results in sparse and non-uniform datasets. Rather than arbitrarily selecting compounds, we systematically explore chemical space according to the target property of interest. We first perform importance sampling by introducing a Markov chain Monte Carlo scheme across compounds. We then train an ML model on the sampled data to expand the region ofarXiv:1905.01897v1 fatcat:2tyfrtrtifgufejlphvswwvp2e