Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs
Lecture Notes in Computer Science
Proper subcellular localization is critical for proteins to perform their cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations before it reaches its final destination. The pathway through which a protein is transported is determined by carrier proteins that bind to specific sequence motifs. This thesis introduces new computational methods that extract these sequence motifs and carrier proteins,
... d learn the sorting pathways. We first develop a system that uses the known cellular sorting pathways to learn sequence motifs and predict locations. We propose a discriminative motif finding method that identifies potential targeting motifs, using a tree structure that mimics the known targeting pathways. Using these motifs, we improved localization prediction on a benchmark dataset of yeast proteins. The identified motifs are more conserved than the average protein sequence. Using our motif-based predictions, we were also able to correct errors in the location annotations of some proteins in public databases. Furthermore, we present a new method that uses a hidden Markov model (HMM) to integrate sequence, motif, and protein interaction data and model how proteins are sorted through these pathways. Using yeast data, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the paths they use. We extend this model to support alternative splicing and multiple cell types in higher organisms. Using this method, we performed the first systematic discovery of targeting pathways in the human proteome, based on confocal microscopy images from the Human Protein Atlas (HPA). We show that our pathway structure improves localization prediction and that the learned structure is consistent with our basic understanding of cellular sorting mechanisms.
Acknowledgements
First and foremost, I would like to thank my advisers Ziv Bar-Joseph and Robert F. Murphy for their support and guidance. Both my professional and my personal relationships with them have been invaluable. Their encouragement is the most important reason I was able to finish my graduate studies. I learned a lot from how Ziv chooses and defines research problems, how he approaches them, what he emphasizes, and what he avoids.
Bob introduced me to the field that became my research focus and inspired me to seek out biological problems with real impact. His experience taught me a great deal about how to do research in computational biology.