Deep Generative Models of Protein Domain Structures Can Uncover Distant Relationships: Evidence for an Urfold

Eli Draizen, Menuka Jaiswal, Saad Saleem, Yonghyeon Kweon, Stella Veretnik, Cameron Mura, Philip E. Bourne
2020 Zenodo  
Recent advances in protein structure determination and prediction offer new opportunities to decipher relationships amongst proteins, a task that entails 3D structure comparison and classification. Historically, protein domain classification has been somewhat manual and heuristic. While CATH and related resources represent significant steps towards a more systematic and automatable approach, more scalable and objective classification methods will enable a fuller exploration of protein structure [...] 'fold' space. Comparative analyses of protein structure latent spaces may uncover distant relationships, and will potentially entail a large-scale restructuring of traditional classification schemes. We have developed 3D convolutional variational autoencoders to 'define' ideal geometries and biophysical properties of proteins at CATH's homologous superfamily (SF) level. To quantitatively evaluate pairwise 'distances' between SFs, we built one model per SF and compared the evidence lower bound (ELBO) loss functions of the models when evaluated with different SF structure representatives. Clustering on these distance matrices provides a new view of protein interrelationships: a view that extends beyond simple structural/geometric similarity, towards the realm of structure/function properties, and that is consistent with a recently proposed 'Urfold' concept.
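The comparison step described in the abstract (one model per SF, cross-evaluated ELBO losses, then clustering) can be illustrated with a minimal sketch. The abstract does not specify how the losses are turned into distances or which clustering algorithm is used, so everything here is an assumption: the input layout `loss[i][j]`, the symmetrized "excess loss" distance, the average-linkage clustering, and the function names `elbo_distance_matrix` and `average_linkage` are all hypothetical.

```python
from itertools import combinations


def elbo_distance_matrix(loss):
    """Turn cross-evaluated ELBO losses into a symmetric distance matrix.

    Assumed (hypothetical) layout: loss[i][j] is the mean ELBO loss of the
    model trained on superfamily i when evaluated on representatives of
    superfamily j.
    """
    n = len(loss)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Excess loss relative to each model's loss on its own superfamily.
        excess_ij = loss[i][j] - loss[i][i]
        excess_ji = loss[j][i] - loss[j][j]
        # Average both directions so the matrix is symmetric; clip at zero.
        d = max(0.0, 0.5 * (excess_ij + excess_ji))
        dist[i][j] = dist[j][i] = d
    return dist


def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage (an assumed choice)."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # a < b always holds, so deleting index b leaves index a valid.
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Made-up toy losses for three hypothetical superfamilies, not real data.
    toy_loss = [
        [1.0, 1.2, 3.0],
        [1.3, 1.1, 3.2],
        [2.9, 3.1, 0.9],
    ]
    print(average_linkage(elbo_distance_matrix(toy_loss), n_clusters=2))
```

In this toy input the first two superfamilies reconstruct each other's representatives almost as well as their own, so they merge first. In practice a library routine for hierarchical clustering would likely replace the hand-written loop.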
doi:10.5281/zenodo.4298958 fatcat:klh6psjzbja23bpnjrxdwijdci