QM/NN QSPR Models with Error Estimation: Vapor Pressure and LogP

Bernd Beck, Andreas Breindl, Timothy Clark
2000 Journal of Chemical Information and Computer Sciences
QSPR models for logP and vapor pressures of organic compounds based on neural net interpretation of descriptors derived from quantum mechanical (semiempirical MO; AM1) calculations are presented. The models are cross-validated by dividing the compound set into several equal portions and training several individual multilayer feedforward neural nets (trained by the back-propagation of errors algorithm), each with a different portion as test set. The results of these nets are combined to give a mean predicted property value and a standard deviation. The performance of two models, for logP and the vapor pressure at room temperature, is analyzed, and the reliability of the predictions is tested.

Estimation of physical properties from molecular structures (quantitative structure-property relationships, QSPRs) has been one of the cornerstones of computational chemistry since the pioneering work of Hansch and Leo. [1,2] Many models for properties such as the logarithm of the octanol/water partition coefficient, logP, [3-5] standard enthalpies of formation, [6-8] boiling points, [9] melting points, [10] and aqueous solubility [11,12] are based on incremental approaches in which the molecule is divided into atoms or groups and each fragment is assigned an incremental contribution. With correction factors to take unusual interactions into account, these 2D-methods can be very accurate and are computationally very efficient. They do, however, suffer from the disadvantages that increments may not have been derived for some rare fragments, that exceptional molecular features not found in the parametrization set may render the results unreliable for "exotic" molecules, and that they may tend to be overfitted for some classes of molecules. The typical characteristic of such models is that they fit "normal" molecules very well but that their performance falls very sharply for the "exotics". This need not necessarily be a serious disadvantage as, for instance, the limitations on candidate drug molecules are relatively restrictive, [13,14] so that property estimation within a limited class of compounds is needed.

We [15,16] and others [17,18] have investigated alternative approaches in which quantum mechanically derived 3D-descriptors are used to derive the QSPR, either using linear regression [17] or simple multilayer feedforward neural nets [15,16,18] as the interpretative tool.
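The cross-validation scheme outlined in the abstract (several equal portions, one net per held-out portion, results combined into a mean and a standard deviation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the single hidden layer of five tanh units, the learning rate, the number of portions, the use of the sample standard deviation (ddof=1), and the synthetic data standing in for real descriptors.

```python
import numpy as np

def train_net(X, y, hidden=5, epochs=2000, lr=0.05, seed=0):
    """Train a one-hidden-layer feedforward net by plain back-propagation.
    Architecture and hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)               # hidden activations
        err = h @ W2 + b2 - y                  # error of the linear output unit
        dh = np.outer(err, W2) * (1.0 - h**2)  # error propagated back to hidden layer
        # Gradient-descent step on half the mean squared error
        W2 -= lr * (h.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dh) / n
        b1 -= lr * dh.mean(axis=0)
    return W1, b1, W2, b2

def predict_net(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def train_ensemble(X, y, n_portions=5, seed=0):
    """Split the data into n_portions parts; train one net per held-out part."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_portions)
    nets, test_rmse = [], []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        params = train_net(X[train_idx], y[train_idx], seed=seed + i)
        resid = predict_net(params, X[test_idx]) - y[test_idx]
        nets.append(params)
        test_rmse.append(float(np.sqrt(np.mean(resid**2))))
    return nets, test_rmse

def predict_ensemble(nets, X):
    """Combine the individual nets into a mean and a standard deviation."""
    preds = np.stack([predict_net(p, X) for p in nets])  # (n_nets, n_samples)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

# Synthetic stand-in data: 200 "compounds" with 3 made-up descriptors
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, (200, 3))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

nets, rmse = train_ensemble(X, y)
mean, sd = predict_ensemble(nets, X[:5])
print("per-net test RMSE:", np.round(rmse, 3))
print("mean prediction:", np.round(mean, 3))
print("standard deviation:", np.round(sd, 3))
```

In a sketch of this kind, the spread between the individual nets serves as a per-compound indication of how far the prediction can be trusted; whether that spread tracks the true error is exactly the kind of question the paper sets out to test.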
Our results for logP [15] suggest that models so derived may be very robust and general and that such models may provide a real alternative to the conventional 2D-approaches. 3D-Techniques also offer the possibility that conformationally dependent properties can be calculated. This would, however, require very accurate experimental parametrization data for a large number of conformationally rigid molecules and is presently impractical for almost all physical properties.

Our aim in this work is to derive two quantum mechanics/neural net (QM/NN) QSPR models (for logP and the logarithm of the vapor pressure at 25 °C) using implicit cross-validation of the neural net models and with an estimate of the likely error limits of the predicted values built into the model. We will analyze the performance of the models and investigate a technique for estimating the reliability of the individual predictions.

DESCRIPTORS

As in our previous work on logP, [15] we use predominantly electrostatic descriptors, including those derived by Politzer et al., [19] calculated for the AM1 [20] optimized structures using the NAO-PC model [21-23] to calculate the molecular electrostatic potential. The descriptors used for the logP nets have been described previously. [15] For the nets used to estimate vapor pressures, the descriptor set defined in Table 1 was used.

Training/Test Sets. The set of 1085 molecules used previously for the logP model [15] was also used in this work. For the parametrization, data for the vapor pressure of compounds measured at 25 °C, or in a temperature range that allowed us to use Antoine's equation to calculate the vapor pressure at 25 °C, were chosen from ref 24. This results in a total of 551 compounds for the combined training/test set. No data are available that allow us to judge the experimental errors.

Semiempirical MO-Calculations. The molecular structures were optimized without symmetry constraints to a gradient norm of 0.4 kcal mol⁻¹ Å⁻¹ with VAMP 6.5 [25] and VAMP 7.0 [26] using the standard default EF optimizer. [27] The standard AM1 [20] Hamiltonian and parameter set were used throughout. The optimized geometries for the logP dataset were those defined in ref 15. The starting geometries for the vapor pressure compounds were derived from SMILES strings [28] using CORINA [29] for the 2D-3D conversion. The single conformations resulting from optimization of these starting geometries were used throughout. The molecular electrostatic potentials were calculated using the natural atomic orbital/point charge (NAO-PC) model, [21-23] and
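The training/test set description mentions using Antoine's equation to obtain vapor pressures at 25 °C from data measured over a temperature range. The text does not say which form or units were used, so the following is only a sketch under stated assumptions: the common three-parameter form log10(P) = A - B / (C + T) with T in °C and P in mmHg, a hypothetical helper name, an optional validity-range check, and widely tabulated constants for water (roughly valid from 1 to 100 °C) as illustrative input that does not come from the paper's dataset.

```python
import math

def antoine_pressure(a, b, c, t_celsius, valid_range=None):
    """Vapor pressure from the Antoine equation log10(P) = A - B / (C + T).

    Units follow the constants supplied (here assumed: T in deg C, P in mmHg).
    valid_range is an optional (t_min, t_max) tuple; evaluating outside it
    raises an error, since the fit should not be extrapolated blindly.
    """
    if valid_range is not None:
        t_min, t_max = valid_range
        if not (t_min <= t_celsius <= t_max):
            raise ValueError("temperature outside the fitted range")
    return 10.0 ** (a - b / (c + t_celsius))

# Illustrative constants for water (assumption; not taken from the paper)
A, B, C = 8.07131, 1730.63, 233.426

p_25 = antoine_pressure(A, B, C, 25.0, valid_range=(1.0, 100.0))
print(f"P(25 C) = {p_25:.2f} mmHg, log10 P = {math.log10(p_25):.3f}")
```

The logarithm printed at the end corresponds to the quantity the vapor pressure model is described as predicting, namely the logarithm of the vapor pressure at 25 °C.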
doi:10.1021/ci990131n pmid:10955536 fatcat:etbstqoxhbbfvknihirkeys7da