Preface: Modeling and Simulation of Gene Regulation and Metabolic Pathways

Julio Collado-Vides, Ralf Hofestädt, Michael Mavrovouniotis, Gerhard Michal
1998 unpublished
The second Dagstuhl seminar on Modeling and Simulation of Gene Regulation and Metabolic Pathways was held from June 21 to 26, 1998. It was a multidisciplinary seminar with 59 participants from 15 countries. Schloss Dagstuhl workshops generally emphasize computer science, and we were delighted to focus on the rapidly developing links between the biosciences and computer science. The 1998 meeting was a sequel to the 1995 Dagstuhl seminar on the same topic. Both were generously supported by grants from the Volkswagen Stiftung and the European Community (TMR Grant).
The availability of a rapidly increasing volume of molecular data enhances our capability to study cell behavior. To exploit molecular data, one must investigate the link between genes and proteins, the link between protein structure and protein function, and the concerted effects of many proteins acting on, and interacting with, the mixture of small and large molecules within a cell. This last step, the study of gene regulation and metabolic pathways, was the topic of the Dagstuhl seminar. The molecular data must be stored and analyzed. Database systems for genes and proteins (EMBL, GENBANK, PIR, SWISS-PROT) offer access via the Internet, and in molecular biology this access makes the analysis of metabolic processes possible. To understand the molecular logic of cells, we must be able to analyze metabolic processes in qualitative and quantitative terms, which makes modeling and simulation important methods. They also influence medicine and (human) genetics at the microscopic level. Today, integrative molecular information systems that represent different kinds of molecular knowledge (data) are available. The state of the art is shown by P. Karp's system EcoCyc, which represents the metabolic pathways of E. coli. For every gene or protein within a specific metabolic pathway, EcoCyc provides access to all corresponding genes and/or proteins. Moreover, the electronic information system KEGG represents all biochemical networks and allows access to the protein and gene database systems via metabolic pathways. However, both systems are based on a static representation of molecular data and knowledge. The next important step is to implement and integrate powerful interactive simulation environments that allow access to different molecular database systems and the simulation of complex biochemical reactions.
Molecular information systems for gene regulation and metabolic pathways were one topic of the Dagstuhl seminar. The idea was to discuss the progress of this research field and the integration of molecular database systems with simulation tools. The organisers of the seminar invited colleagues, who presented their ideas in 42 talks and computer demos. More than 30 years ago, Gerhard Michal started to collect all biochemical reactions. His classification is presented in the Boehringer pathway chart. This data collection was extended by the KEGG research group, which implemented the first electronic representation of this data in 1996. Today, all these biochemical reactions are available via the Internet through the KEGG system. KEGG provides links to molecular database systems for the genes, proteins, and enzymes that are elements of metabolic pathways. Thus a link to the EMBL database system provides more information about a specific gene, and a link to the SWISS-PROT system provides more information about the protein (enzyme). However, KEGG does not yet represent quantitative or kinetic data. Furthermore, in addition to molecular data (genes, proteins, and pathways), the first molecular information systems that represent cell-signaling data are now available: besides the Japanese Cell Transduction Database, the GENENET database system is available. Taken together, these two information systems can be seen as the first scientific step toward surveying cellular reaction processes from gene regulation to cell communication. For molecular biology, the phenomenon of gene regulation is the central question. Its systematic study relies on an electronic representation of molecular knowledge that allows complex analysis of the data. For that reason, specific database systems have been implemented (OperonDB, TRANSFAC and TRRD).
These database systems represent all known operons of E. coli (OperonDB) and the transcription factors of eukaryotic cells (TRANSFAC, TRRD). Today, this data supports two research fields: the prediction of promoter sequences and the modeling of gene regulation. Promoter prediction is important because the promoter is the start signal for a structural gene, which carries the genetic information. The Human Genome Project will sequence the whole genome by the year 2004 (6.4 × 10^9 base pairs). The next step is to derive the corresponding genetic map, for which sequence pattern-matching algorithms must be developed and implemented. In addition, modeling and simulation of gene regulation processes will support the systematic analysis of metabolic pathways. John Reinitz opened the seminar. He presented ideas about the modeling of genetic factors and analyzed the process of segment determination in Drosophila by numerically inverting a chemical kinetic equation that describes the regulatory circuitry and accounts for the synthesis rate, diffusion, and decay of gene products. The molecular mechanisms of gene regulation were presented by Edgar Wingender. During the last decade he has been analyzing the molecular mechanisms of eukaryotic gene regulation and collecting transcription factors, which are available in his database system TRANSFAC. The prediction of promoter sequences based on this data was one important topic of the gene regulation session. Julio Collado-Vides, Gary Stormo, and Thomas Werner showed algorithms for the detection of promoter sequences in E. coli and eukaryotic cells. The molecular mechanisms of cell death were discussed by Dominique Bergeron, and Luiz Mendoza talked about complex metabolic networks. The modeling of regulatory networks belongs to biophysics and biomathematics; in addition, discrete models are being developed with the methods of bioinformatics.
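Reinitz's kinetic equation is described above only in words. To make its three terms concrete, here is a minimal sketch that assumes the gene-circuit form commonly used for Drosophila segmentation. For the concentration v[i][a] of gene product a in nucleus i, the rate of change is synthesis R[a] * g(sum over b of T[a][b] * v[i][b] + h[a]), plus diffusion D[a] times the concentration differences to the neighbouring nuclei, minus first-order decay lam[a] * v[i][a]. The sigmoid g, all parameter names, the forward-Euler integrator, and the toy values are illustrative assumptions, not details taken from the talk. The numerical inversion itself (fitting R, T, h, D, and lam to measured expression data) is not shown; the sketch covers only the forward model that such a fit would repeatedly evaluate.

```python
import math


def g(u):
    """Assumed sigmoid regulation-expression function mapping total input u into (0, 1)."""
    return 0.5 * (u / math.sqrt(u * u + 1.0) + 1.0)


def derivative(v, R, T, h, D, lam):
    """Right-hand side of a gene-circuit equation (illustrative parameter names).

    v[i][a]  concentration of gene product a in nucleus i
    R[a]     maximum synthesis rate of product a
    T[a][b]  regulatory weight of product b on gene a (negative = repression)
    h[a]     threshold / bias term for gene a
    D[a]     diffusion coefficient between neighbouring nuclei
    lam[a]   first-order decay rate
    """
    n, m = len(v), len(R)
    dv = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for a in range(m):
            u = sum(T[a][b] * v[i][b] for b in range(m)) + h[a]
            synthesis = R[a] * g(u)
            # Exchange with neighbours; end nuclei have only one neighbour (no-flux boundary).
            diffusion = 0.0
            if i > 0:
                diffusion += v[i - 1][a] - v[i][a]
            if i < n - 1:
                diffusion += v[i + 1][a] - v[i][a]
            dv[i][a] = synthesis + D[a] * diffusion - lam[a] * v[i][a]
    return dv


def simulate(v0, R, T, h, D, lam, dt=0.01, steps=1000):
    """Integrate with forward Euler (chosen for simplicity, not accuracy)."""
    v = [row[:] for row in v0]
    for _ in range(steps):
        dv = derivative(v, R, T, h, D, lam)
        v = [[v[i][a] + dt * dv[i][a] for a in range(len(R))] for i in range(len(v))]
    return v


# Toy example: two mutually repressing genes in a row of five nuclei (made-up parameters).
v0 = [[0.1 * i, 0.1 * (4 - i)] for i in range(5)]
result = simulate(v0, R=[1.0, 1.0], T=[[0.0, -5.0], [-5.0, 0.0]],
                  h=[0.0, 0.0], D=[0.05, 0.05], lam=[0.5, 0.5])
for i, row in enumerate(result):
    print(i, [round(x, 3) for x in row])
```

Fitting the model to data would wrap `simulate` in an optimizer that minimizes the difference between simulated and observed expression patterns; that step is deliberately left out here.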
At the beginning of that session, Jay Mittenthal presented the pentose phosphate cycle. Gerhard Michal is the creator of the Boehringer pathway chart, which inspired many of us to pursue databases and integrative methods for the study of metabolism. In his talk he gave a brief overview of the issues surrounding the development of graphical representations and displays of metabolic pathways and other biological information. Among analytic models, Michael Savageau introduced a model that allows the simulation of complex kinetic effects. Using graph-theoretical methods, Michael Kohn discussed his model for the simulation of metabolic networks. Stefan Schuster outlined several powerful methods for determining key features of a metabolic pathway or network. He showed how conservation relations may be identified and how elementary biochemical routes (and hence the spectrum of behaviors of the biochemical network) may be determined. Further, he outlined the principles of metabolic control analysis and its extensions. A new grammatical model for the analysis of complex metabolic processes was presented by Simone Bentolila. Another topic of the seminar was molecular database systems. At the beginning of this session, Thomas Mück discussed new topics in database research, and Vladimir Babenko introduced new techniques for the integration of molecular database systems. Minoru Kanehisa presented the pathway database system KEGG and discussed further applications. Fedor Kolpakov demonstrated the database system GENENET, which is organized similarly to the Japanese database system for Cellular Signal Transduction presented by Takako Takai-Igarashi. Rolf Apweiler talked about the SWISS-PROT database, and Daniel Kahn demonstrated a new database system for the integration of protein knowledge. One important application of this molecular data is the diagnosis of metabolic diseases.
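Of the methods Schuster outlined, identifying conservation relations is the simplest to sketch. Assume the usual convention that the stoichiometric matrix N has one row per metabolite and one column per reaction. A conservation relation is then a vector g with g^T N = 0, so the weighted sum of concentrations given by g stays constant whatever the reaction rates are; the relations therefore form the left null space of N. The sketch below computes that null space exactly with Gauss-Jordan elimination over rational numbers. The function names and the enzyme example (S + E <-> ES -> E + P) are illustrative assumptions; elementary-mode analysis and metabolic control analysis are not shown.

```python
from fractions import Fraction


def nullspace(matrix):
    """Basis of the right null space of `matrix` (a list of rows), computed exactly
    with Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        # Clear column c in every other row (reduced row echelon form).
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    # Each free column yields one basis vector of the null space.
    free_cols = [c for c in range(n_cols) if c not in pivot_cols]
    basis = []
    for f in free_cols:
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for i, pc in enumerate(pivot_cols):
            vec[pc] = -rows[i][f]
        basis.append(vec)
    return basis


def conservation_relations(N):
    """Vectors g with g^T N = 0: the left null space of the stoichiometric matrix N
    (rows = metabolites, columns = reactions)."""
    transpose = [list(col) for col in zip(*N)]
    return nullspace(transpose)


# Metabolites (rows): S, E, ES, P. Reactions (columns): S + E <-> ES (net forward), ES -> E + P.
N = [
    [-1, 0],   # S
    [-1, 1],   # E
    [1, -1],   # ES
    [0, 1],    # P
]
for g in conservation_relations(N):
    print([int(x) for x in g])   # prints [0, 1, 1, 0] then [1, -1, 0, 1]
```

The first vector is total enzyme, E + ES. The second, S - E + P, is one valid basis choice among many: adding the two gives S + ES + P, the total amount of substrate in free, bound, and converted form.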
For inborn errors of metabolism, Manuela Prüss introduced the database system MDDB. The final topic of the seminar was the integration and simulation of metabolic networks. The first generation of powerful simulation environments for the control of metabolic networks was discussed. These tools use the biochemical data and the diverse models presented in the earlier sessions. Pedro Mendes demonstrated his simulation environment GEPASI, which allows the analytical modeling of metabolic processes. A first information system based on the integration of molecular databases and a grammatical simulation environment was introduced by Uwe Scholz and Ralf Hofestädt. Finally, an expert system for the modeling of metabolic processes was presented by Jaime Lagunez.

Concluding remarks

It is not sufficient to know what each protein or gene does in the cell (it usually catalyzes or regulates a biochemical reaction); one must also decipher what they are all doing together (they form pathways of elaborate transformations and regulatory networks). To decipher the metabolic pathways that define the behavior of the cell as a whole, one must use information on single-protein activity. But information also flows in the reverse direction: the position and role of an enzyme in the metabolic network provide crucial insights and hypotheses about its genetic regulation and its relationship to other proteins. Genes and proteins are routinely sequenced and stored in database systems. Data on biochemical pathways has been systematically collected for the last three decades (in pictorial and text form), and the accumulation of such data has increased dramatically in recent years (and shifted to computational representations). The systematic use of the collected data is also advancing steadily.
Methods for computational modeling and simulation are made feasible by the availability of data and are driven by the need to understand the behavior of complex biological systems. Integrating information, especially combinations of genes, enzymes, and metabolic pathways, will be necessary for studying biological regulatory structures, which usually involve multiple facets, components, and scales of action. Database systems and powerful models are already available, and the first practical simulation tools have been implemented on the basis of powerful theoretical methods. These information-integrative activities will increasingly shed light on the biochemical mechanisms of life. The final discussion focused on the open questions of the seminar and reached the following conclusions. The number of molecular database systems is increasing, and these systems are available via the Internet. The access techniques now available are WWW links to the relevant molecular database systems, which support navigation through the molecular data. However, this data must also be available for further analysis. The detection of promoter structures is one current example, which also shows the algorithmic problems of this research field. Besides algorithmic analysis, modeling and simulation based on this molecular data are important. Different tools have been developed and implemented; however, the choice of model depends on the question at hand. The main task for the coming years is the integration of database systems and simulation environments, which will allow the simulation of complex metabolic networks.

Acknowledgement

The organisers thank the Volkswagen Stiftung and the European Community (TMR Grant) for their generous financial support. Further information about the Dagstuhl seminar: