A Constraint Dynamic Graph Approach to Identify the Secondary Structure Topology from cryoEM Density Data in Presence of Errors

Abhishek Biswas, Dong Si, Kamal Al Nasr, Desh Ranjan, Mohammad Zubair, Jing He
2011 IEEE International Conference on Bioinformatics and Biomedicine
Figure 1. SSEs and the topology. (A) The density map (grey) was simulated to 10 Å resolution using protein 3PBA from the Protein Data Bank (PDB) and EMAN software [1]. The SSEs (red: helix sticks, blue: sheet) were detected using SSE Tracer, an extended version of Helix Tracer [2], and viewed by Chimera. For clear viewing, only SSEs at the front of the structure are labeled. Arrows: the direction of the protein sequence; (B) The true topology of the protein sequence (arrow) and for the sticks ... cross and dot); (C) H1 to H10: helix segments; E1 to E4: β-strands; "...": loops longer than 2 amino acids.

Abstract

Determining the secondary structure topology is a critical step in deriving the atomic structure from a protein density map obtained by electron cryo-microscopy. This step often relies on matching two sources of information. One source is the set of secondary structures detected from the protein density map at medium resolution, such as 5-10 Å. The other is the set of secondary structures predicted from the amino acid sequence. Because both sources are uncertain, a pool of possible secondary structure positions has to be sampled in order to include the true answer. A naïve way to find the native topology is to exhaustively map the pool of possible secondary structures detected in the density map to the pool of secondary structures predicted from the sequence and search for the topology with the lowest cost. This paper studies how to reduce the computation of the mapping when the uncertainty of the secondary structure predictions is considered. We present a method that combines the concept of a dynamic graph with our previous work on using a constrained shortest path to identify the topology of the secondary structures. We show a reduction of about 34.55% in time compared with the naïve way of handling the inaccuracies. To our knowledge, this is the first computationally effective exact algorithm to identify the optimal topology of the secondary structures when the inaccuracy of the predicted data is considered.
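To make the cost of the naïve search concrete, the following is a minimal Python sketch of exhaustive topology enumeration for a single set of predicted segments. It is an illustration under stated assumptions, not the method of the paper: the function names `count_topologies` and `naive_topology`, the stick representation as pairs of 3D end points, the toy cost function (penalizing loops too short to span the gap between consecutive sticks), and the 3.8 Å per-residue span are all hypothetical choices made for this example.

```python
from itertools import permutations, product
from math import dist, factorial


def count_topologies(n):
    """Number of candidate topologies for n sticks: n! orders times 2^n directions."""
    return factorial(n) * 2 ** n


def naive_topology(sticks, loop_lengths, residue_span=3.8):
    """Exhaustively search stick orders and directions for the lowest toy cost.

    sticks: list of (end_a, end_b) 3D points, one pair per detected stick.
    loop_lengths: residues in the loop between consecutive sequence segments
        (length n - 1 for n sticks).
    residue_span: assumed maximum distance (in Å) covered by one loop residue.
    Returns (cost, order, flips) for the first lowest-cost candidate found.
    """
    n = len(sticks)
    best = None
    for order in permutations(range(n)):
        for flips in product((False, True), repeat=n):
            cost = 0.0
            for k in range(n - 1):
                a, b = sticks[order[k]]
                c, d = sticks[order[k + 1]]
                # An unflipped stick is traversed from its first end to its second.
                exit_pt = a if flips[k] else b
                entry_pt = d if flips[k + 1] else c
                gap = dist(exit_pt, entry_pt)
                # Hypothetical cost: penalize gaps the loop cannot span.
                cost += max(0.0, gap - residue_span * loop_lengths[k])
            if best is None or cost < best[0]:
                best = (cost, order, flips)
    return best


# Two antiparallel sticks whose near ends are 1 Å apart.
sticks = [((0, 0, 0), (10, 0, 0)), ((10, 1, 0), (0, 1, 0))]
best_cost, best_order, best_flips = naive_topology(sticks, loop_lengths=[1])
```

Even in this simplified form the search space grows as n! · 2^n, and every alternative set of predicted secondary structures would require a full repeat of the search, which is the kind of repeated work a more economical method aims to avoid.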
doi:10.1109/bibm.2011.73 dblp:conf/bibm/BiswasSARZH11 fatcat:3isdkurzrjc2xbdipumoj56dva