Using an Optimal Set of Features with a Machine Learning-Based Approach to Predict Effector Proteins for Legionella pneumophila

by Zhila Esna Ashari, Kelly A Brayton, Shira L Broschat

Released as a post by Cold Spring Harbor Laboratory.

2018  

Abstract

Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells, altering the host environment to make it hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, their results are inconsistent with one another. Previously, we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This work focuses on the best way to use these optimal features by designing three machine learning classifiers, comparing our results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires' disease, because it has many validated effector proteins and because others have developed machine learning prediction tools for it. All of our models give good results, indicating that our optimal features are quite robust, but Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 760 effector proteins, more than any other study, 315 of which have been validated. Although the results of our three models agree well with those of other researchers, their models predicted only 126 and 311 candidate effectors.
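
As a rough illustration of the kind of classifier the abstract describes for Model 1 (a support vector machine over 370 features), the following is a minimal sketch and not the authors' pipeline. The data are synthetic stand-ins generated with scikit-learn, and the kernel, regularisation constant, feature scaling, sample count, and 5-fold cross-validation are all assumptions made for the example; the abstract specifies none of them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_FEATURES = 370  # feature count stated in the abstract

# Synthetic stand-in for a protein feature matrix (assumption: real features
# would be computed from protein sequences; none are reproduced here).
# y = 1 plays the role of "effector", y = 0 of "non-effector".
X, y = make_classification(
    n_samples=200,
    n_features=N_FEATURES,
    n_informative=20,
    random_state=0,
)

# Assumed setup: standardise each feature, then fit an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# Estimate accuracy with 5-fold cross-validation (fold count is an assumption).
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_acc = float(np.mean(scores))
print(f"mean cross-validated accuracy: {mean_acc:.3f}")
```

Wrapping the scaler and the SVM in one pipeline keeps the scaling parameters fitted on the training folds only, so the held-out fold does not leak into preprocessing.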

Type  post
Stage   unknown
Date   2018-08-02