Pattern recognition tools for proteomics

Virginio Cantoni
2014 The European Physical Journal Plus  
The computer science community started research activities in pattern recognition and artificial vision in the sixties. After decades of study and research, successfully applied in several fields, the pattern recognition community has started to bring the know-how, computing strategies, technologies, methods and tools it has acquired to new areas, such as computational and structural biology and, in particular, proteomics. Some of the lines and perspectives of this initiative are presented in this Focus Point. The goal is not a complete survey of the strategies pursued; rather, it is to present to the community of physicists some remarkable examples of the novel and promising approaches currently under development. The selected contributions are related to:

i) The identification of motifs and domains conserved in families of proteins on a geometric-topological basis. Protein structure analysis and comparison are important for understanding the evolutionary relationships among proteins and for predicting protein folding and protein function. A structural motif is a compact 3D protein block that appears in a variety of molecules; several motifs packed together form a domain. The paper Motifs and structural blocks retrieval by GHT by Virginio Cantoni, Alessio Ferone, Alfredo Petrosino and Ozlem Polat discusses investigations of protein analysis at various structural levels, within a single protein or across the entire PDB, and presents in detail a survey of approaches to 3D geometrical and topological structure retrieval and comparison based on a very effective pattern recognition technique, the Generalized Hough Transform (GHT).

ii) The prediction of interactions among proteins and with small molecules. The identification of protein-binding sites, their classification and their analysis are of interest for drug design and the treatment of diseases. Binding-site recognition is generally based on geometry combined with physico-chemical properties, since the conformation, size and chemical composition of the protein surface are all relevant to the interaction with a specific ligand. The body of work in this area is very large. In the paper Predicting protein-ligand and protein-peptide interfaces by Paola Bertolazzi, Concettina Guerra and Giampaolo Liuzzi, a taxonomy of the different approaches is given and their advantages and disadvantages are compared.
Broadly speaking, four main categories are identified: (a) shape-based methods; (b) alignment-based methods; (c) graph-theoretic approaches; and (d) machine-learning methods. The case of protein-peptide interfaces is considered in detail, since the peculiarities of the binding region call for specialized versions of both geometric and machine-learning methods.

iii) The volumetric data structure for protein representation and morphological analysis. The 3D matching problem is central to the discovery of protein "active sites", i.e. complementary regions that are compatible biochemically, geometrically and topologically, so that their concave and convex segments match. The problem is usually addressed with ad hoc pattern descriptors, which are often point-based and cumbersome to manage and process. In the paper Structural representation data structures by Virginio Cantoni, Luca Lombardi, Alessandro Gaggia and Riccardo Gatti, two new approaches based on different representations (and consequently different data structures) are discussed: the former is based on a hierarchical data structure, the latter on first-order statistics. The first step is the segmentation of the protein "solvent-excluded surface" into concave and convex regions, performed with classical operators of mathematical morphology. The interface areas, which are potential active sites, are then represented effectively through a suitably rich "concavity tree" (CT) and/or through ...

Contribution to the Focus Point on "Pattern Recognition Tools for Proteomics" edited by V. Cantoni.
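The morphological segmentation step described in item iii) can be illustrated on a voxelized molecule. This is a minimal sketch under stated assumptions, not the authors' implementation: the molecule is assumed to be a boolean voxel grid, the probe is a digital ball, concavities are taken as the black top-hat (closing minus object) and convexities as the white top-hat (object minus opening). The helper names and the `radius` parameter are hypothetical, and plain NumPy is used instead of a morphology library.

```python
import itertools
import numpy as np

def ball_offsets(radius):
    """Integer offsets of a digital ball (hypothetical probe); radius 1 gives the 6-neighbour cross plus the centre."""
    rng = range(-radius, radius + 1)
    return [o for o in itertools.product(rng, repeat=3) if sum(c * c for c in o) <= radius * radius]

def shift(vol, off):
    """Translate a boolean volume by an integer offset; voxels shifted in from outside are False."""
    out = np.zeros_like(vol)
    src = tuple(slice(max(0, -o), n - max(0, o)) for n, o in zip(vol.shape, off))
    dst = tuple(slice(max(0, o), n - max(0, -o)) for n, o in zip(vol.shape, off))
    out[dst] = vol[src]
    return out

def dilate(vol, offs):
    out = np.zeros_like(vol)
    for o in offs:
        out |= shift(vol, o)
    return out

def erode(vol, offs):
    out = np.ones_like(vol)
    for o in offs:
        out &= shift(vol, tuple(-c for c in o))
    return out

def concave_convex(vol, radius=1):
    """Assumed segmentation of a voxelized molecule (radius >= 1):
    concave = closing(vol) & ~vol  -> background voxels the probe ball cannot reach (pockets)
    convex  = vol & ~opening(vol)  -> object voxels no inner ball can cover (protrusions)"""
    offs = ball_offsets(radius)
    padded = np.pad(vol.astype(bool), radius)  # padding avoids border artifacts
    closing = erode(dilate(padded, offs), offs)
    opening = dilate(erode(padded, offs), offs)
    core = tuple(slice(radius, -radius) for _ in range(3))
    concave = (closing & ~padded)[core]
    convex = (padded & ~opening)[core]
    return concave, convex
```

A larger `radius` models a bigger probe, so wider pockets are labelled as concave and broader bumps as convex.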
doi:10.1140/epjp/i2014-14130-3
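The Generalized Hough Transform named in item i) for motif retrieval can be illustrated with a deliberately simplified sketch. It assumes that motif and protein are both reduced to 3D point sets (for example Cα coordinates or secondary-structure centroids) and that the motif appears only translated, not rotated; a full GHT for protein structures would also have to account for orientation. The function names, the centroid reference point and `bin_size` are hypothetical choices, not the method of the cited paper.

```python
from collections import Counter
import numpy as np

def build_r_table(motif):
    """R-table: displacement from each motif point to the motif's reference point (its centroid)."""
    motif = np.asarray(motif, dtype=float)
    return motif.mean(axis=0) - motif  # shape (m, 3)

def ght_vote(protein, r_table, bin_size=1.0):
    """Each protein point casts one vote per R-table entry for a candidate reference position."""
    protein = np.asarray(protein, dtype=float)
    candidates = protein[:, None, :] + r_table[None, :, :]  # shape (n, m, 3)
    bins = np.floor(candidates.reshape(-1, 3) / bin_size).astype(int)
    return Counter(map(tuple, bins))  # accumulator: voxel cell -> number of votes

def find_motif(protein, motif, bin_size=1.0):
    """Hypothetical translation-only GHT: return the centre of the most-voted cell and its vote count."""
    acc = ght_vote(protein, build_r_table(motif), bin_size)
    cell, votes = acc.most_common(1)[0]
    center = (np.array(cell) + 0.5) * bin_size
    return center, votes
```

A peak whose vote count equals the number of motif points indicates that every motif point found a consistent match; in practice a threshold below that count would tolerate missing or displaced points.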