Common sequence variants affect molecular function more than rare variants?

Yannick Mahlich, Jonas Reeb, Maximilian Hecht, Maria Schelling, Tjaart Andries Petrus De Beer, Yana Bromberg, Burkhard Rost
2017 Scientific Reports  
Any two unrelated individuals differ by about 10,000 single amino acid variants (SAVs). Do these impact molecular function? Experiments cannot answer this comprehensively, while state-of-the-art prediction methods can. We predicted the functional impacts of SAVs within humans and for variants between humans and other species. Several surprising results stood out. Firstly, four methods (CADD, PolyPhen-2, SIFT, and SNAP2) agreed within 10 percentage points on the percentage of rare SAVs
predicted with effect. However, they differed substantially for the common SAVs: SNAP2 predicted, on average, more effect for common than for rare SAVs. Given the large ExAC data sets sampling 60,706 individuals, the differences were extremely significant (p-value < 2.2e-16). We provided evidence that SNAP2 might be closer to reality for common SAVs than the other methods, due to its different focus in development. Secondly, we predicted significantly higher fractions of SAVs with effect between healthy individuals than between species; the difference increased for more distantly related species. The same trends were maintained for subsets of only housekeeping proteins and when moving from exomes of 1,000 to 60,000 individuals. SAVs frozen at speciation might maintain protein function, while many variants within a species might bring about crucial changes, for better or worse.

Single nucleotide variants (SNVs) constitute the most frequent form of human genetic variation [1]. Here, we focus on non-synonymous SNVs, i.e. genomic variants that result in single amino acid variants (SAVs) in protein sequences. Children differ by about two SAVs from their parents (de novo variation), while any two unrelated individuals can differ by as many as 10-20 K [2]. The vast majority (99%) of the known unique SAVs are rare, i.e. observed in less than 1% of the population [1, 3]. Only about 0.5% of the unique SAVs are common, i.e. observed in over 5% of the population [1, 3].

SAVs can impact protein function in many ways. We might be inclined to classify SAVs according to what they affect or do not affect. Effects upon protein function are commonly distinguished from effects upon protein structure. This distinction has limited value because what changes structure often tends to affect function. Similarly, we might distinguish between the effect upon molecular function (e.g.
binding stronger or not binding), upon the role of a protein in a process (native process hampered, blocked, or non-native role acquired), or upon the localization of a protein (e.g. protein makes it to the membrane or not). Again, the problem with this distinction is that these aspects are coupled: for instance, effects upon molecular function and localization might affect the process or not. All of the above we might classify as effects upon the protein. Unfortunately, from all experiments monitoring SAV effects in many model organisms, just a few tens of thousands of effects are available in public databases. Only for a tiny subset of these is enough detail available to consider all effect types (structure vs. function, molecular vs. process vs. localization). We might consider the effect upon the protein as molecular, as opposed to the effect upon the organism, such as disease. Toward this end, the distinction is often made between SAVs that cause severe monogenic diseases [4] (referred to as OMIM-type SAVs) or contribute to complex diseases [5] and low-effect SAVs, which are only cumulatively linked to our phenotypic

Scientific Reports 7: 1608
doi:10.1038/s41598-017-01054-2 pmid:28487536 pmcid:PMC5431670 fatcat:kivphgy27nfvrmbjl3oxorn3kq