mzResults: An Interactive Viewer for Interrogation and Distribution of Proteomics Results

James T. Webber, Manor Askenazi, Jarrod A. Marto
2011 Molecular & Cellular Proteomics  
The growing use of mass spectrometry in the context of biomedical research has been accompanied by an increased demand for distribution of results in a format that facilitates rapid and efficient validation of claims by reviewers and other interested parties. However, the continued evolution of mass spectrometry hardware, sample preparation methods, and peptide identification algorithms complicates standardization and creates hurdles related to compliance with journal submission requirements.
Moreover, the recently announced Philadelphia Guidelines (1, 2) suggest that authors provide native mass spectrometry data files in support of their peer-reviewed research articles. These trends highlight the need for data viewers and other tools that work independently of manufacturers' proprietary data systems and seamlessly connect proteomics results with original data files to support user-driven data validation and review. Based upon our recently described API-based framework for mass spectrometry data analysis (3, 4), we created an interactive viewer (mzResults) that is built on established database standards and enables efficient distribution and interrogation of results associated with proteomics experiments, while also providing a convenient mechanism for authors to comply with data submission standards as described in the Philadelphia Guidelines. In addition, the architecture of mzResults supports in-depth queries of the native mass spectrometry files through our multiplierz software environment. We use phosphoproteomics data to illustrate the features and capabilities of mzResults.
Molecular & Cellular Proteomics 10: 10.1074/mcp.M110.003970, 1-7, 2011.
Burgeoning demand for systematic generation of mass spectrometry data in support of biomedical studies continues to drive the development of proteomics technologies at a rapid pace. The concomitant expansion of proteomics data that accompany scientific reports, often as supplementary materials, has catalyzed numerous and somewhat disparate efforts to standardize results reporting (5-11). However, as an emerging field of endeavor, proteomics faces a unique set of challenges that complicate efforts to establish open and portable formats for sharing results; specific obstacles include: (i) Technology innovation: mass spectrometry in particular continues to evolve rapidly, leading to multiple hardware configurations, scan functions, and proprietary file formats.
(ii) Discovery mode experiments: the majority of studies are performed in discovery mode with the goal of maximizing new information about the protein content of each sample. As a result, methods are in a state of flux with correspondingly little standardization. (iii) Unbounded measurement space: genetic alterations (alternate splicing, translocations, etc.) and post-translational processing (modifications, enzymatic cleavage, etc.) significantly amplify the number of chemically distinct protein products relative to that predicted by the genetic code. As a result, a large fraction of MS/MS spectra go unassigned in typical database search strategies. However, increased recognition that the vast repertoire of gene- and protein-level modifications is correlated with biological function ensures that archived proteomics data, particularly unassigned MS/MS spectra, will be frequently revisited as new information becomes available. (iv) Uncertainty in sequence assignment: search algorithms (Mascot, Sequest, Protein Pilot, etc.) may assign different peptide sequences to the same MS/MS spectrum. In addition, a large fraction of peptides cannot be uniquely assigned to a single protein, leading to an ambiguous relationship between a subset of the major claims in proteomics experiments (e.g. protein ID/quantification) and the underlying primary measurements (e.g. peptide sequences). Collectively, these phenomena create a difficult environment in which to simultaneously standardize the reporting of results and enable interested third parties to browse relevant data and test alternative hypotheses with respect to peptide identification or other claims. These problems are exacerbated as the scale of proteomics studies increases. Several groups (12-15) have developed powerful and flexible pipelines to facilitate sample tracking, data acquisition, archiving, and analysis of proteomics experiments. Although
doi:10.1074/mcp.m110.003970 pmid:21266631 pmcid:PMC3098584 fatcat:7g422ax5orejzhx6lcmnryf6j4