A novel pattern recognition algorithm to classify membrane protein unfolding pathways with high-throughput single-molecule force spectroscopy
Motivation: Misfolding of membrane proteins plays an important role in many human diseases such as retinitis pigmentosa, hereditary deafness and diabetes insipidus. Membrane proteins remain poorly understood because very few high-resolution structures are available. Single-molecule force spectroscopy is a novel technique that measures the force necessary to pull a protein out of a membrane. Such force curves contain valuable information on the protein structure, conformation, and inter- and intramolecular forces. High-throughput force spectroscopy experiments generate hundreds of force curves, comprising spurious curves and good curves, the latter corresponding to different unfolding pathways. Manual analysis of these data is a bottleneck and a source of inconsistent and subjective annotation.
Results: We propose a novel algorithm for the identification of spurious curves and curves representing different unfolding pathways. Our algorithm proceeds in three stages: first, we reduce noise in the curves by applying dimension reduction; second, we align the curves with dynamic programming and compute pairwise distances; and third, we cluster the curves based on these distances. We apply our method to a hand-curated dataset of 135 force curves of bacteriorhodopsin mutant P50A. Our algorithm achieves a success rate of 81% in distinguishing spurious from good curves and a success rate of 76% in classifying unfolding pathways. As a result, we discuss five different unfolding pathways of bacteriorhodopsin, including three main unfolding events and several minor ones. Finally, we link folding barriers to the degree of conservation of residues. Overall, the algorithm tackles the force spectroscopy bottleneck and leads to more consistent and reproducible results, paving the way for high-throughput analysis of structural features of membrane proteins.
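The three stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the specific dimension reduction, alignment scoring or clustering method, so the sketch assumes piecewise averaging for noise reduction, a dynamic-time-warping style recurrence for the alignment distance, and average-linkage agglomerative clustering with a distance cutoff. All function names, parameters and the toy curves are hypothetical and do not come from the P50A dataset.

```python
from itertools import combinations


def reduce_curve(forces, n_segments):
    """Stage 1 (assumed method): average the curve over equal-width segments."""
    n = len(forces)
    n_segments = min(n_segments, n)  # never ask for more segments than points
    reduced = []
    for s in range(n_segments):
        lo = s * n // n_segments
        hi = (s + 1) * n // n_segments
        chunk = forces[lo:hi]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def alignment_distance(a, b):
    """Stage 2 (assumed scoring): cost of the best monotone alignment of a and b,
    computed by dynamic programming with a DTW-style recurrence."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row for the empty prefix of a
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            cur.append(cost + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return prev[-1]


def cluster(dist, threshold):
    """Stage 3 (assumed method): average-linkage agglomerative clustering that
    stops once the closest pair of clusters is farther apart than threshold."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        if linkage(clusters[p], clusters[q]) > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # safe because p < q
    return clusters


def classify_curves(curves, n_segments=8, threshold=50.0):
    """Run the three stages and return groups of curve indices."""
    reduced = [reduce_curve(c, n_segments) for c in curves]
    n = len(reduced)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = alignment_distance(reduced[i], reduced[j])
    return cluster(dist, threshold)


# Synthetic toy curves (force values in arbitrary units): two single-peak
# curves with shifted peaks, and two double-peak curves with shifted peaks.
toy_curves = [
    [0, 0, 100, 100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 100, 100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 100, 100, 0, 0, 0, 0, 200, 200, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 100, 100, 0, 0, 0, 0, 200, 200, 0, 0, 0, 0],
]
groups = classify_curves(toy_curves)
```

On these toy curves the alignment absorbs the shift in peak position, so curves with the same number of peaks end up in the same group. In this sketch, spurious curves would be expected to show up as small or singleton clusters far from the others; how the published algorithm separates them is not specified in the abstract.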