Defining structural and evolutionary modules in proteins: a community detection approach to explore sub-domain architecture
BMC Structural Biology
Assessing protein modularity is important to understand protein evolution. Still the question of the existence of a sub-domain modular architecture remains. We propose a graph-theory approach with significance and power testing to identify modules in protein structures. In the first step, clusters are determined by optimizing the partition that maximizes the modularity score. Second, each cluster is tested for significance. Significant clusters are referred to as modules. Evolutionary modules
... e identified by analyzing homologous structures. Dynamic modules are inferred from sets of snapshots of molecular simulations. We present here a methodology to identify sub-domain architecture robustly, biologically meaningful, and statistically supported. Results: The robustness of this new method is tested using simulated data with known modularity. Modules are correctly identified even when there is a low correlation between landmarks within a module. We also analyzed the evolutionary modularity of a data set of α-amylase catalytic domain homologs, and the dynamic modularity of the Niemann-Pick C1 (NPC1) protein N-terminal domain. The α-amylase contains an (α/β) 8 barrel (TIM barrel) with the polysaccharides cleavage site and a calcium-binding domain. In this data set we identified four robust evolutionary modules, one of which forms the minimal functional TIM barrel topology. The NPC1 protein is involved in the intracellular lipid metabolism coordinating sterol trafficking. NPC1 N-terminus is the first luminal domain which binds to cholesterol and its oxygenated derivatives. Our inferred dynamic modules in the protein NPC1 are also shown to match functional components of the protein related to the NPC1 disease. Conclusions: A domain compartmentalization can be found and described in correlation space. To our knowledge, there is no other method attempting to identify sub-domain architecture from the correlation among residues. Most attempts made focus on sequence motifs of protein-protein interactions, binding sites, or sequence conservancy. We were able to describe functional/structural sub-domain architecture related to key residues for starch cleavage, calcium, and chloride binding sites in the α-amylase, and sterol opening-defining modules and disease-related residues in the NPC1. We also described the evolutionary sub-domain architecture of the α-amylase catalytic domain, identifying the already reported minimum functional TIM barrel.