Accelerating the search for the missing proteins in the human proteome

Mark S. Baker, Seong Beom Ahn, Abidali Mohamedali, Mohammad T. Islam, David Cantor, Peter D. Verhaert, Susan Fanayan, Samridhi Sharma, Edouard C. Nice, Mark Connor, Shoba Ranganathan
2017 Nature Communications  
The Human Proteome Project (HPP) aims to discover high-stringency data for all proteins encoded by the human genome. Currently, ~18% of the proteins in the human proteome (the missing proteins) do not have high-stringency evidence (for example, mass spectrometry) confirming their existence, while much additional information is available about many of these missing proteins. Here, we present MissingProteinPedia as a community resource to accelerate the discovery and understanding of these proteins.
The Human Proteome Project (HPP) supports defining what it is to be human in molecular terms. It strives to 'know thyself' by finding high-stringency evidence for the ~20,000 proteins encoded by the human genome. Here, we focus on what has been termed the human proteome's 'missing proteins', discuss what renders them currently unobservable using high-stringency proteomic approaches, and outline a road-map that aims to accelerate the HPP. We review milestones and the progress of this global scientific effort to accurately identify and understand the biology of genome-coded human proteins. We focus on what has been achieved to date and we identify some areas where progress may be made. We provide a comprehensive survey of the characteristics of the so-called 'missing proteins', a term initially coined by Hancock and colleagues and defined in Box 1 (refs 1,2), and we emphasize why they may be difficult to detect using mass spectrometry (MS) and/or validated antibody (Abs) techniques. Our re-analysis of publicly available MS data for the largest family of missing proteins (olfactory receptors), viewed in conjunction with other specific missing protein examples, reveals a need for the community to capture as much complementary evidence as possible about missing proteins, in addition to high-stringency MS data. With this aim, we launch MissingProteinPedia, a community biological database that is complementary to the high-stringency HPP methodologies currently underway. MissingProteinPedia is a low-stringency communal database that will increase our understanding of the spatiotemporal biology of missing proteins, and accelerate their discovery by high-stringency MS.
Schema 1: The MissingProteinPedia collates and displays protein information from existing databases using various web services and application programming interfaces. Furthermore, the web interface allows researchers to collaborate and share data not available through other databases.
The schema includes the recent illustration of the high-stringency HPP metrics engine (ref. 9). MissingProteinPedia facilitates HPP cross-disciplinary collaboration by providing a complementary, unfiltered, lower stringency perspective to both the HPP metrics and guidelines approaches, enabling community evaluation and scrutiny. MissingProteinPedia incorporates text mining technology to fetch
doi:10.1038/ncomms14271 pmid:28117396 pmcid:PMC5286205 fatcat:htunxdfvlvgnbbdnc7zizwuddi