Oral presentation

2021 The EuroBiotech Journal  
The aim was to compare the performance of two adaptive boosting (AdaBoost) ensembles, one based on ECFP and one based on 16 selected physicochemical descriptors, using the F1 score on an external set of 54 molecules. The training molecules were obtained from BitterDB, SupersweetDB and Fenaroli's Handbook of Flavor Ingredients. The 1024-bit ECFP6 fingerprints were generated with the RDKit package in an Anaconda environment. The physicochemical descriptors were calculated with the Mordred package. The method for selecting the [...] descriptors was mutual information. The AdaBoost models were trained with the scikit-learn library. Five-fold cross-validation accuracy was used as the metric for fine-tuning the models. Afterwards, the F1 score on the external set was calculated. The model based on ECFP showed a cross-validation accuracy of 0.80 and an F1 score of 0.71 on the UNIMI set. The model based on physicochemical descriptors showed a cross-validation accuracy of 0.77 and an F1 score of 0.78 on the UNIMI set. The latter result indicates that the selected physicochemical descriptors led to a better separation of the bitter and non-bitter molecules. The ECFP and the physicochemical descriptors provide different information about the training set, and together they could provide a better input for bitterness prediction.
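The modelling workflow described in the abstract (mutual-information feature selection, an AdaBoost classifier, five-fold cross-validation, then an F1 score on a held-out set) can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic stand-ins generated with `make_classification`, the hyperparameters (`n_estimators=100`, the random seeds) are arbitrary, and the 54-sample hold-out only mimics the size of the external set. Computing the ECFP fingerprints with RDKit and the descriptors with Mordred is not shown.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a molecule-by-descriptor matrix (NOT the study's data).
# Label 1 plays the role of "bitter", label 0 of "non-bitter".
X, y = make_classification(
    n_samples=400, n_features=60, n_informative=10, random_state=0
)

# Hold out 54 samples to mimic the size of the external set (assumption:
# the real external set was collected separately, not split off like this).
X_train, X_ext, y_train, y_ext = train_test_split(
    X, y, test_size=54, stratify=y, random_state=0
)

# Keep the 16 features with the highest mutual information with the label,
# then fit an AdaBoost ensemble on them. Hyperparameters are illustrative.
selector = SelectKBest(score_func=partial(mutual_info_classif, random_state=0), k=16)
model = make_pipeline(selector, AdaBoostClassifier(n_estimators=100, random_state=0))

# Five-fold cross-validation accuracy on the training data.
cv_accuracy = cross_val_score(
    model, X_train, y_train, cv=5, scoring="accuracy"
).mean()

# Refit on all training data and score the held-out set with F1.
model.fit(X_train, y_train)
ext_f1 = f1_score(y_ext, model.predict(X_ext))

print(f"5-fold CV accuracy: {cv_accuracy:.2f}")
print(f"F1 on held-out set: {ext_f1:.2f}")
```

Placing the selector inside the pipeline means the mutual-information ranking is recomputed within each cross-validation fold, so the validation folds do not influence which features are kept. Whether the original study selected descriptors inside or outside the cross-validation loop is not stated in the abstract.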
doi:10.2478/ebtj-2021-0030 fatcat:z4m6cey7tbf5ll5c473ocvb52e