SU-QMI: A Feature Selection Method Based on Graph Theory for Prediction of Antimicrobial Resistance in Gram-Negative Bacteria

Abu Chowdhury, Douglas Call, Shira Broschat
2020 Proceedings of the 1st International Electronic Conference on Microbiology (unpublished)
Machine learning can be used as an alternative to similarity algorithms such as BLASTp when the latter fail to identify dissimilar antimicrobial-resistance genes (ARGs) in bacteria; however, determining the most informative characteristics, known as features, for antimicrobial resistance (AMR) is essential to obtain accurate predictions. In this paper we introduce a feature selection algorithm called symmetrical uncertainty-qualitative mutual information (SU-QMI) which selects features based on estimates of their relevance, redundancy, and interdependency. We use these together with graph theory to derive a feature selection method for identifying putative ARGs in Gram-negative bacteria. We extract physicochemical, evolutionary, and structural features from the protein sequences of five genera of Gram-negative bacteria (Acinetobacter, Klebsiella, Campylobacter, Salmonella, and Escherichia) which confer resistance to acetyltransferase (aac), β-lactamase (bla), and dihydrofolate reductase (dfr). Our SU-QMI algorithm is then used to find the best subset of features, and a support vector machine (SVM) model is trained for AMR prediction using this feature subset. We evaluate performance using an independent set of protein sequences from three Gram-negative bacterial genera (Pseudomonas, Vibrio, and Enterobacter) and achieve prediction accuracy ranging from 88% to 100%. Compared to the SU-QMI method, BLASTp requires similarity as low as 53% for comparable classification results. Our results indicate the effectiveness of the SU-QMI method for selecting the best protein features for AMR prediction in Gram-negative bacteria.
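The abstract does not spell out the SU-QMI algorithm, so the following is only a minimal sketch of the standard symmetrical uncertainty measure that the method's name refers to, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), applied to discrete values. It is not the authors' implementation: the function names, the toy feature names, and the made-up data are all assumptions for illustration, and the qualitative mutual information and graph-theoretic steps are not covered.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing is shared
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical, made-up data: two discretized features and a binary
# resistant / non-resistant label. Not taken from the paper.
labels = [1, 1, 0, 0, 1, 0]
features = {
    "feature_a": [1, 1, 0, 0, 1, 0],  # identical to the label
    "feature_b": [0, 1, 0, 1, 0, 1],  # only weakly related to the label
}

# Rank features by relevance to the label (highest SU first).
ranking = sorted(
    features,
    key=lambda name: symmetrical_uncertainty(features[name], labels),
    reverse=True,
)
print(ranking)
```

A score like this captures only relevance of a single feature to the class label; computing the same measure between pairs of features is one common way to estimate redundancy, which a full selection method would also have to weigh.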
doi:10.3390/ecm2020-07129 fatcat:jezyv3sbvzcnhcvec5khb3gqiq