Accurate prediction of bacterial two-component signaling with a deep recurrent neural network ORAKLE

Jan Balewski, Zachary Hallberg
2019, bioRxiv pre-print
Two-component systems (2CS) are a primary mechanism by which bacteria detect and respond to environmental stimuli. A receptor histidine kinase (HK) detects an environmental signal and activates the appropriate response regulator (RR). Genes for such cognate HK-RR pairs are often located next to each other on the chromosome, which makes it easier to identify the target of a particular signal. However, almost half of all HK and RR proteins are orphans, with no nearby partner, complicating identification of the proteins that respond to a particular signal. To address this problem, we trained a neural network on the amino acid sequences of known 2CS pairs. We then developed a recommender algorithm that, for a fixed RR from any species whose amino acid sequences are known, ranks a set of candidate HKs. The recommender strongly favors known 2CS pairs and correctly selects orphan pairs in Escherichia coli. We expect these results to enable rapid discovery of orphan HK-RR pairs.
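The abstract does not specify how sequences are fed to the network or how the recommender queries it. The sketch below is only an illustration under stated assumptions: residues are one-hot encoded (a common input format for recurrent networks, assumed here), and the trained model is treated as a hypothetical callable `score(hk_seq, rr_seq)` that returns a higher value for a more likely cognate pair. The names `one_hot`, `rank_hks`, and `toy_score` are hypothetical and are not the authors' code; `toy_score` is a trivial stand-in so the example runs without a trained model.

```python
from typing import Callable, Dict, List, Tuple

# The 20 standard amino acids; any other letter (e.g. X, B) encodes as an all-zero row.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[List[int]]:
    """Encode a protein sequence as one 20-dim one-hot vector per residue (assumed input format)."""
    rows = []
    for aa in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1
        rows.append(row)
    return rows


# Hypothetical scorer interface: (HK sequence, RR sequence) -> pairing score.
# In the paper this role would be played by the trained network.
PairScorer = Callable[[str, str], float]


def rank_hks(rr_seq: str, hk_seqs: Dict[str, str], score: PairScorer) -> List[Tuple[str, float]]:
    """Rank candidate HKs for one fixed RR by descending pairing score."""
    scored = [(name, score(hk, rr_seq)) for name, hk in hk_seqs.items()]
    # Highest score first; ties broken alphabetically by name for a deterministic order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def toy_score(hk: str, rr: str) -> float:
    """Toy stand-in for a trained model: fraction of aligned positions with identical residues."""
    n = min(len(hk), len(rr))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(hk, rr)) / n


if __name__ == "__main__":
    candidates = {"hkA": "MKLV", "hkB": "MKAG", "hkC": "GGGG"}
    print(rank_hks("MKLA", candidates, toy_score))
    # prints [('hkA', 0.75), ('hkB', 0.5), ('hkC', 0.0)]
```

Separating the scorer from the ranking keeps the recommender step independent of the model: swapping `toy_score` for a real pairing model changes the scores but not the ranking logic.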
doi:10.1101/532721