Fast and adaptive protein structure representations for machine learning [article]

Janani Durairaj, Mehmet Akdel, Dick de Ridder, Aalt D.J. van Dijk
2021 bioRxiv pre-print
The growing prevalence and popularity of protein structure data, both experimental and computationally modelled, necessitates fast tools and algorithms to enable exploratory and interpretable structure-based machine learning. Alignment-free approaches have been developed for divergent proteins, but proteins sharing functional and structural similarity are often better understood via structural alignment, which has typically been too computationally expensive for larger datasets. Here, we introduce the concept of rotation-invariant shape-mers to multiple structure alignment, creating a structure aligner that scales well with the number of proteins and allows for aligning over a thousand structures in 20 minutes. We demonstrate how alignment-free shape-mer counts and aligned structural features, when used in machine learning tasks, can adapt to different levels of functional hierarchy in protein kinases, pinpointing residues and structural fragments that play a role in catalytic activity.
doi:10.1101/2021.04.07.438777 fatcat:qospbzohkbbwbkrxdozbok7oje