Geometric Detection Algorithms for Cavities on Protein Surfaces in Molecular Graphics: A Survey
Computer graphics forum (Print)
Detecting and analyzing protein cavities provides significant information about active sites for biological processes (e.g., protein-protein or protein-ligand binding) in molecular graphics and modeling. Using the three-dimensional structure of a given protein (i.e., atom types and their locations in 3D) as retrieved from a PDB (Protein Data Bank) file, it is now computationally viable to determine a description of these cavities. Such cavities correspond to pockets, clefts, invaginations,
... , tunnels, channels, and grooves on the surface of a given protein. In this work, we survey the literature on protein cavity computation and classify algorithmic approaches into three categories: evolution-based, energy-based, and geometry-based. Our survey focuses on geometric algorithms, whose taxonomy is extended to include not only sphere-, grid-, and tessellation-based methods, but also surface-based, hybrid geometric, consensus, and time-varying methods. Finally, we detail those techniques that have been customized for GPU (Graphics Processing Unit) computing.

binding sites of proteins [KG07]. This explains why detecting molecular cavities is still a very active research area [HSAH*09]. Although several authors have surveyed cavity detection algorithms [GS11, ZGWW12, BCG*13, Duk13, KSL*15], these surveys offer only brief citations backed by summary descriptions and do not describe the algorithms in enough detail. Furthermore, these surveys agree on a simplified classification of cavity detection algorithms into the following classes: sphere-based, grid-based, and Voronoi-based. More importantly, they lack a critical comparison between algorithms. As an exception, a more detailed survey focusing on the visual analysis of biomolecular cavities was recently published [KKL*16], with an emphasis on molecular visualization. In contrast, our survey adopts a more geometry-based approach to protein cavity detection. It falls within the scope of molecular graphics and modeling, a research area at the intersection of computational biology, bioinformatics, computational geometry, and computer graphics. More specifically, this article addresses the computer graphics and computational geometry side of cavity detection methods, i.e., the geometry of proteins; hence, the focus is on geometry-based algorithms for identifying cavities on protein surfaces such as those depicted in Fig. 1.
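As a concrete illustration of the input assumed by the geometric algorithms surveyed here (atom types and their 3D locations read from a PDB file), the following minimal Python sketch extracts atom names, elements, and coordinates from the ATOM records of PDB-format text. The names `Atom` and `parse_pdb_atoms` are illustrative assumptions and do not belong to any surveyed tool; the column positions follow the fixed-column layout of the PDB format, and HETATM records, alternate locations, and multiple models are deliberately ignored.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    """One atom read from an ATOM record (hypothetical helper type)."""
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "MET"
    chain: str     # chain identifier, e.g. "A"
    element: str   # element symbol, e.g. "N" (empty if the column is absent)
    x: float       # orthogonal coordinates in Angstroms
    y: float
    z: float


def parse_pdb_atoms(text: str) -> List[Atom]:
    """Return the atoms found in the ATOM records of PDB-format text.

    Assumes the fixed-column PDB layout: atom name in columns 13-16,
    residue name in 18-20, chain in 22, x/y/z in 31-38/39-46/47-54,
    and element symbol in 77-78 (1-based columns).
    """
    atoms = []
    for line in text.splitlines():
        # Keep only protein atom records long enough to hold coordinates.
        if not line.startswith("ATOM  ") or len(line) < 54:
            continue
        atoms.append(Atom(
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21:22].strip(),
            element=line[76:78].strip(),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms
```

The resulting list of atom centers (together with per-element radii, which are not assigned here) is the kind of point set on which a geometric method would operate.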
As mentioned above, geometric methods for detecting cavities on proteins fall into three main categories: grid-based, sphere-based, and Voronoi-based. We extend this classification of geometric methods and use it to organize the survey itself, as illustrated in Fig. 2.

Background

There has been considerable work on cavity detection for molecules. This is especially relevant for molecular docking and related problems. A molecule is considered to be an orderly grouping of atoms bound by favorable chemical connections [JKSS96, WM97]. In particular, the family of biomolecules spans the building blocks of living organisms. This family includes large macromolecules (namely proteins, polysaccharides, lipids, and nucleic acids) and small molecules (e.g., primary and secondary metabolites). In this paper, we are interested in proteins and their cavities, where their interactions with ligands usually take place.

Proteins

Proteins constitute about twenty percent of the human body, and play a crucial role in most biological processes. Amino acids are the building blocks that make up proteins [Whi05]. In brief, a protein can be understood at four distinct structural levels [AJL*07]. The primary structure of a protein is given by its sequence (or chain) of amino acids. The secondary structure of a protein comprises amino acid subsequences that exhibit a specific structural regularity. These regular secondary structures are known as alpha-helices (alpha-helixes) and beta-pleated sheets (beta-sheets). Alternatively, the secondary structure can be defined using the regularity of the backbone dihedral angles of amino acid residues. The tertiary structure denotes the geometric shape of a given protein, i.e., it refers to the folding of the whole protein chain (including the secondary structures) into its final three-dimensional shape. Recall that it is the protein folding that makes the protein acquire its functional shape or
Simões et al.