Go-Kit: A Tool To Enable Energy Landscape Exploration of Proteins
Coarse-grained Go-like models, based on the principle of minimal frustration, provide valuable insight into fundamental questions in the field of protein folding and dynamics. In conjunction with the commonly used molecular dynamics (MD) simulations, energy landscape exploration methods like discrete path sampling (DPS) with Go-like models can provide quantitative details of the thermodynamics and kinetics of proteins. Here we present Go-kit, software that facilitates the setup of MD and DPS simulations of several flavours of Go-like models. Go-kit is designed for use with the open-source MD (GROMACS) and DPS (PATHSAMPLE) simulation engines. The Go-kit code is written in Python 2.7 and is also open source. A case study for the ribosomal protein S6 is discussed to illustrate the utility of the software. The software is available at https://github.com/gokit1/gokit.
The potential energy landscape for a biomolecule is the potential energy as a function of all the configurational degrees of freedom included in the model. 1-6 The kinetics and thermodynamics of the system in question are encoded in this underlying landscape. 3,7,8 Deterministic molecular dynamics (MD) simulations are a very popular tool to study the time evolution of proteins on the landscape. 9-11 In contrast, stochastic global optimisation methods like basin-hopping, 12-14 combined with discrete path sampling (DPS), 1,15 are based on geometry optimisation techniques, statistical mechanics, and unimolecular rate theory. Together they enable construction of a kinetic transition network representation of the landscape and provide a general overview of the multidimensional energy landscape. 2,16-18
To study the folding behaviour of proteins, their potential energy is often approximated by coarse-grained models. 19-25 Among coarse-grained models of proteins, Go-like models 26 encode the protein structure and are commonly used. 25,27-34 In its simplest reduction, a protein is represented by a chain of connected beads, where the Hamiltonian is parameterised to support the known native state as the global minimum. The success of this simplified representation of a protein in reproducing the thermodynamics and kinetics of folding is due to the fact that the folding process has evolved to satisfy the principle of minimal frustration.
20,31,35 Many variations of this model have been successful in elucidating protein folding rates, 24,36 pathways 37 and mechanisms. 23,24,38-44 Current web servers such as the SMOG-server 25,34,45,46 assist in setting up a one-bead coarse-grained representation and an all-heavy-atom representation 34 of the protein with structure-based Go-models. The software presented here, Go-kit, facilitates the exploration of energy landscapes of proteins for multiple flavours of Go-like models using MD and DPS simulations. A PDB file, usually of an experimental protein structure from the Protein Data Bank, 47 is the only input required. The generated output files can be used directly to perform MD with the open-source GROMACS package, 9-11 and DPS with the open-source PATHSAMPLE package. 48 We provide a general overview of the workflow followed in Go-kit next.
Software Overview
Go-kit is a Python program that can be installed and run on Linux and macOS machines. It generates input files for two purposes: to enable MD simulations with the open-source GROMACS package 9-11 and discrete path sampling simulations with the open-source PATHSAMPLE package. 1,15,48 The argument parser interface is called to read variables from the command line. A help module listing all the arguments can be accessed with the --help argument. All the variables that can be changed from the command line, along with their default values, are listed in the supplementary information (Tables S1 and S2). This is the first version of the software and it supports single-chain proteins only. Future versions of the software will incorporate multiple protein chains and DNA/RNA-protein interactions. The workflow is depicted in Figure 1. The only input the user provides is a PDB ID or a PDB-format file containing all the atoms in the native state of the protein. This file can be downloaded from the Protein Data Bank (PDB). 47 The BIOPYTHON 49 module is employed with strict parsing to check the provided PDB file for errors.
This is the native structure that the coarse-grained system should support as the global minimum.
Representation of the protein in Go-like models
In the basic Go-model, each residue in the protein is coarse-grained to one or more beads that are parameterised to reproduce residue-residue interactions. The system is represented by a chain of connected beads biased toward the native configuration. Go-kit coarse-grains the protein to one- and two-bead models.
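To make the one- and two-bead reductions concrete, the following is a minimal standalone sketch and not the Go-kit implementation. The bead placement is an assumption, since the text above does not specify it: the first bead is put on the C-alpha atom, and the optional second bead on the centre of geometry of the remaining side-chain atoms. The function names (parse_atoms, coarse_grain) and the bead labels "CA" and "SC" are hypothetical. The sketch reads the fixed-column ATOM records of a PDB file directly and assumes a single-chain, heavy-atom-only structure.

```python
# Backbone atom names excluded from the (assumed) side-chain bead.
BACKBONE = {"N", "CA", "C", "O", "OXT"}


def parse_atoms(pdb_text):
    """Yield (residue_key, residue_name, atom_name, xyz) per ATOM record.

    Uses the fixed-column layout of the PDB format; HETATM records
    and all other record types are ignored.
    """
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        res_key = (line[21], int(line[22:26]))  # (chain ID, residue number)
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield res_key, res_name, atom_name, xyz


def coarse_grain(pdb_text, two_bead=False):
    """Reduce an all-atom structure to one or two beads per residue.

    Assumption: bead 1 sits on C-alpha; bead 2 (if requested) sits on
    the centre of geometry of the non-backbone atoms. Residues without
    side-chain atoms (e.g. glycine) get only the C-alpha bead.
    """
    residues = {}  # insertion order follows the order in the file
    for res_key, res_name, atom_name, xyz in parse_atoms(pdb_text):
        residues.setdefault(res_key, (res_name, []))[1].append((atom_name, xyz))

    beads = []
    for res_key, (res_name, atoms) in residues.items():
        ca = [xyz for name, xyz in atoms if name == "CA"]
        if not ca:
            raise ValueError("residue %s%d has no CA atom" % res_key)
        beads.append((res_key, res_name, "CA", ca[0]))
        if two_bead:
            side = [xyz for name, xyz in atoms if name not in BACKBONE]
            if side:
                centroid = tuple(sum(axis) / len(side) for axis in zip(*side))
                beads.append((res_key, res_name, "SC", centroid))
    return beads
```

Calling coarse_grain(text) on the contents of a PDB file returns one C-alpha bead per residue, while coarse_grain(text, two_bead=True) adds a side-chain bead wherever side-chain atoms are present. Bead types, bonded terms and native-contact parameters would still have to be assigned before the beads could be used in a simulation.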