Classifying the toxicity of pesticides to honey bees via support vector machines with random walk graph kernels
Pesticides benefit agriculture by increasing crop yield, quality, and security. However, pesticides may inadvertently harm bees, which are agriculturally and ecologically vital as pollinators. The development of new pesticides, driven by pest resistance to incumbent pesticides and by demands to reduce their negative environmental impacts, necessitates assessments of pesticide toxicity to bees. We leverage a data set of 382 molecules labeled from honey bee toxicity experiments to train a classifier that predicts the toxicity of a new pesticide molecule to honey bees. Traditionally, the first step of a molecular machine learning task is to explicitly convert molecules into feature vector representations for input to the classifier. Instead, we (i) adopt the fixed-length random walk graph kernel to express the similarity between any two molecular graphs and (ii) use the kernel trick to train a support vector machine (SVM) to classify the bee toxicity of pesticides represented as molecular graphs. We assess the performance of the graph-kernel-SVM classifier under different walk lengths used to describe the molecular graphs. The optimal classifier, with walk length 5, achieves a mean (over 100 runs) accuracy, precision, and recall of 0.83, 0.71, and 0.72, respectively, on a test data set.
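To make the kernel idea concrete, the following is a minimal sketch of one common formulation of a fixed-length random walk kernel: it counts pairs of walks, one in each molecular graph, that have the same length and the same sequence of atom and bond labels. The graph representation, the function names (`make_graph`, `walk_kernel`, `gram_matrix`), and the convention that walk length is measured in bonds are illustrative assumptions, not details taken from the paper's implementation.

```python
from itertools import product


def make_graph(atoms, bonds):
    """Build an undirected labeled graph (hypothetical helper).

    atoms: list of element symbols, one per vertex.
    bonds: list of (i, j, bond_label) tuples.
    """
    adj = {i: [] for i in range(len(atoms))}
    for i, j, label in bonds:
        adj[i].append((j, label))
        adj[j].append((i, label))
    return atoms, adj


def walk_kernel(g1, g2, length):
    """Count label-matching pairs of walks with `length` edges.

    Assumption: a walk may revisit vertices, and two walks match when
    their atom labels and bond labels agree position by position.
    """
    atoms1, adj1 = g1
    atoms2, adj2 = g2
    # counts[(u, v)]: number of matching walk pairs ending at u in g1, v in g2.
    counts = {
        (u, v): 1
        for u, v in product(range(len(atoms1)), range(len(atoms2)))
        if atoms1[u] == atoms2[v]
    }
    for _ in range(length):
        extended = {}
        for (u, v), c in counts.items():
            for u2, b1 in adj1[u]:
                for v2, b2 in adj2[v]:
                    # Extend both walks by one step only if labels agree.
                    if b1 == b2 and atoms1[u2] == atoms2[v2]:
                        extended[(u2, v2)] = extended.get((u2, v2), 0) + c
        counts = extended
    return sum(counts.values())


def gram_matrix(graphs, length):
    """Pairwise kernel values between all graphs, as a list of lists."""
    return [[walk_kernel(a, b, length) for b in graphs] for a in graphs]


# Toy heavy-atom skeletons (illustrative, not from the data set):
# a C-C-O chain and a C-O fragment, all single bonds (label 1).
chain = make_graph(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
fragment = make_graph(["C", "O"], [(0, 1, 1)])
K = gram_matrix([chain, fragment], length=1)
```

Because the kernel is evaluated directly on pairs of graphs, no feature vectors are ever constructed. A Gram matrix like `K`, computed over the training molecules, could then be passed to any SVM solver that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.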