Characterizing Protein Conformational Spaces using Dimensionality Reduction and Algebraic Topology [article]

Arpita Joshi, Nurit Haspel, Eduardo Gonzalez
2021 bioRxiv   pre-print
Datasets representing the conformational landscapes of protein structures are high-dimensional and hence present computational challenges. Efficient and effective dimensionality reduction of these datasets is therefore paramount to our ability to analyze the conformational landscapes of proteins and extract important information regarding protein folding, conformational changes and binding. Representing the structures with fewer attributes that capture the most variance of the data makes for quicker and more precise analysis of these structures. In this work we use dimensionality reduction methods both to reduce the number of instances and to reduce the number of features. The reduced dataset is then subjected to topological and quantitative analysis. In this step we perform hierarchical clustering to obtain different sets of conformation clusters that may correspond to intermediate structures. The structures represented by these conformations are then analyzed by studying their high-dimensional topological properties to identify truly distinct conformations and holes in the conformational space that may represent high energy barriers. Our results show that the clusters closely follow known experimental results about intermediate structures, as well as binding and folding events.
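As a rough illustration of how hierarchical clustering and topology connect, the sketch below computes the scales at which connected components merge (0-dimensional persistent homology of a Vietoris-Rips filtration) for a small point set. These merge scales coincide with the merge heights of single-linkage hierarchical clustering. The abstract does not state which linkage, distance, or software the authors used, so the single-linkage choice, the Euclidean distance, the function names `h0_persistence` and `count_clusters`, and the toy coordinates are all assumptions made for illustration. Detecting holes (1-dimensional features) would require a dedicated persistent homology library and is not shown.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the scales at which connected components merge.

    Every point starts as its own component (born at scale 0). Edges are
    added in order of increasing Euclidean length; each edge that joins
    two different components ends ("kills") one of them at that length.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)    # one component dies at this scale
    return deaths


def count_clusters(deaths, scale):
    """Number of components still separate at the given distance scale."""
    return 1 + sum(1 for d in deaths if d > scale)


# Hypothetical toy data: two well-separated groups in a 2-D reduced space.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
deaths = h0_persistence(toy)
print(count_clusters(deaths, 1.0))
```

A large gap between consecutive merge scales, as between the two groups in the toy data, suggests clusters that are distinct rather than artifacts of the chosen cut-off.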
doi:10.1101/2021.11.16.468545 fatcat:aofpyw75qrh7li5p4svh6ngsty