Genome-specific models of bacterial cells and reverse engineering of prokaryote biology


Discovering the network of biochemical processes underlying the behavior of Geobacteria and other microbes is achieved by creating a suite of interoperable systems biology modules. The workflow takes multiplex bioanalytical data as input, discovers the transcriptional regulatory network (TRN) and other process networks, and then uses cell simulation to derive microbial behavior, notably the biotechnical characteristics relevant to environmental remediation and energy production. To attain this goal, we integrate a number of bioinformatics, cell modeling, and multiplex data/model integration tools. We have started this project with a TRN discovery system (a preliminary version is available online). Input to this system is microarray data on gene expression profiles generated by the bacterium in response to thermal, chemical, or gene insertion/deletion perturbations. A database provides a preliminary TRN, which serves as a training set for the systems biology modules. Network inference using a similarity measure assumes that the activity of a transcription factor (TF) is represented by the expression of the gene that encodes it. The failure to observe a high correlation between mRNA level and TF activity in E. coli shows that this assumption does not hold. Therefore, in order to use expression data, we estimate TF activities independently of the expression level of the mRNA that is translated into the TF. To accomplish this, we developed a novel algorithm to predict TF activities from the expression levels of all genes that the TF regulates. This module is integrated with gene ontology and phylogenetic similarity modules using a Bayesian framework. A second microarray data analyzer extracts transcription and RNA degradation kinetic rate constants and TF/gene binding constants at sites of gene regulation.
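The idea of inferring TF activity from the expression of a TF's target genes, rather than from the TF's own mRNA, can be illustrated with a minimal sketch. The abstract does not describe the authors' algorithm, so the code below substitutes ordinary least squares as a stand-in: it assumes a linear model in which the expression matrix is approximately the product of a known signed connectivity matrix (taken from a preliminary TRN) and an unknown activity matrix. The function name, the matrix layout, and the toy values are all hypothetical.

```python
import numpy as np


def estimate_tf_activities(expression, connectivity):
    """Least-squares estimate of TF activities (illustrative stand-in only).

    expression:   (n_genes, n_conditions) log-expression ratios.
    connectivity: (n_genes, n_tfs) signed regulatory strengths from a
                  preliminary TRN (0 where a TF does not regulate a gene).
    Returns the (n_tfs, n_conditions) matrix A that minimizes
    ||expression - connectivity @ A||.
    """
    expression = np.asarray(expression, dtype=float)
    connectivity = np.asarray(connectivity, dtype=float)
    if expression.shape[0] != connectivity.shape[0]:
        raise ValueError("expression and connectivity need one row per gene")
    activities, _, rank, _ = np.linalg.lstsq(connectivity, expression, rcond=None)
    if rank < connectivity.shape[1]:
        raise ValueError("connectivity is rank deficient; activities not identifiable")
    return activities


# Toy data (hypothetical): 5 target genes, 2 TFs, 3 perturbation conditions.
toy_connectivity = np.array([
    [1.0, 0.0],
    [0.5, 0.0],
    [0.0, -1.0],
    [1.0, 1.0],
    [0.0, 2.0],
])
true_activities = np.array([
    [0.2, 1.0, -0.5],
    [1.5, 0.0, 0.3],
])
toy_expression = toy_connectivity @ true_activities  # noise-free for clarity

estimated = estimate_tf_activities(toy_expression, toy_connectivity)
```

Note that the TFs' own transcript levels never enter the calculation; only the target genes' profiles and the assumed network structure do. With noisy data or an incomplete TRN the estimate would only be approximate, which is presumably one reason for combining it with other evidence.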
The resulting TRN and information on the genes encoding the TFs are fed into a final module, which derives the general behavior and the critical conditions for dramatic changes in cell performance/characteristics. The workflow is demonstrated on several cellular systems.

Identifying distinct types of functionally associated proteins in the Caulobacter crescentus Pathway/Genome Database
Michelle L Green* and Peter D Karp, Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025.

The PathoLogic program constructs a Pathway/Genome Database (PGDB) using a genome’s annotation to predict the set of metabolic pathways present in an organism. PathoLogic determines the set of reactions composing those pathways from the list of enzymes in the organism, and computationally predicts operons for the organism. PathoLogic also includes predictors for protein complexes and transporters that require manual curator review. Previously, we extended our pathway hole filler (PHFiller) to include genome context data (e.g., co-occurrence profiles, conserved gene neighbors, gene fusions) in the search for missing enzymes. Adding genome context data improved the coverage of PHFiller by eliminating the need for known enzyme analog sequences from other organisms. PHFiller-GC works by identifying functionally associated proteins for each known enzyme in a pathway and then predicting, based on genome context data, the probability that each functionally associated protein catalyzes the missing reaction in the pathway. We have further extended the PHFiller-GC algorithm to predict additional types of functional associations beyond proteins that appear in the same pathway. These additional functional associations include:
1. Proteins that appear in the same complex.
2. Protein pairs where protein A transports a compound that is acted on by the pathway in which protein B operates as an enzyme.
3. Protein pairs whose genes appear in the same operon.
4. Protein pairs where protein A regulates transcription of the gene encoding protein B.
Our predictor integrates co-occurrence profiles, conserved gene neighbors, gene fusions, gene clusters, and co-expression profiles to identify candidate pairs and evaluate the probability that two genes are functionally associated by one or more of the above criteria. The predictor was trained using proteins from EcoCyc known to be associated by one or more of these functional association criteria. We performed cross-validation studies in EcoCyc to determine the predictive value of the algorithm for identifying known functionally associated protein pairs. We also applied the full predictor (i.e., identifying any functional association) and the individual predictors (i.e., identifying pairs in the same pathway, same complex, etc.) to the Caulobacter crescentus PGDB, CauloCyc, and identified functional associations previously available only through manual curation.

Global Analyses of Two-Component Signal Transduction Pathways in Caulobacter crescentus
Michael T. Laub, Emanuele G. Biondi, Jeffrey M. Skerker, Barrett S. Perchuk, Department of Biology, Massachusetts Institute of Technology

Two-component signal transduction systems, composed of histidine kinases and their response regulator substrates, are the predominant means by which bacteria sense and respond to signals. These systems allow cells to adapt to prevailing conditions by modifying cellular physiology, including initiating programs of gene expression, catalyzing reactions, or modifying protein-protein interactions. These signaling pathways have also been demonstrated to play a role in coordinating bacterial cell cycle progression and development. We have initiated a system-level investigation of two-component pathways in the tractable model organism Caulobacter crescentus, which encodes 62 histidine kinases and 44 response regulators.
Comprehensive deletion and overexpression screens have identified more than 40 of these 106 two-component genes as required for growth, viability, or proper cell cycle progression. We have also developed a systematic biochemical approach, called phosphotransfer profiling, to map the connectivity of histidine kinases and response regulators. By combining these genetic and biochemical approaches, we have begun mapping pathways critical to growth and cell cycle progression. These include a complex genetic circuit that controls the activity of CtrA, the master regulator of the Caulobacter cell cycle. At the heart of this circuit are two phosphorelays, one of which culminates in phosphorylation of CtrA and the other of which leads to proteolytic stabilization of CtrA. Both phosphorelays are initiated by the essential histidine kinase CckA. Once activated and stabilized by these two phosphorelays, CtrA triggers expression of target genes, including the essential regulator divK. DivK then feeds back to down-regulate CckA and, consequently, CtrA. Our results thus define a negative feedback loop that drives cell cycle oscillations in C. crescentus. We have also used our systematic phosphotransfer profiling technique to probe the molecular basis for specificity in two-component signaling pathways. We have found that histidine kinases are endowed with a global, kinetic preference in vitro for their in vivo cognate response regulators. This system-wide selectivity insulates two-component pathways from one another, preventing unwanted cross-talk. Moreover, it suggests that the specificity of two-component signaling pathways is determined almost exclusively at the biochemical level. By analyzing patterns of co-evolution between cognate histidine kinases and response regulators, we have mapped the amino acids that dictate specificity in these pathways. Site-directed mutagenesis of these amino acids has been used to “rewire” signaling pathways.
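The co-evolution analysis mentioned above can be sketched in a few lines. The abstract does not say which covariation measure was used; the code below assumes mutual information between alignment columns, a common choice for this kind of analysis, and scores every pairing of a histidine kinase column with a response regulator column. The function names and the four-sequence toy alignments are hypothetical, and row i of each alignment is assumed to come from the same cognate pair.

```python
from collections import Counter
from math import log2


def mutual_information(column_a, column_b):
    """Mutual information (bits) between two columns of paired sequences."""
    if len(column_a) != len(column_b) or len(column_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(column_a)
    count_a = Counter(column_a)
    count_b = Counter(column_b)
    count_ab = Counter(zip(column_a, column_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def covariation_scores(hk_alignment, rr_alignment):
    """Score every (HK column, RR column) pair across cognate pairs."""
    if len(hk_alignment) != len(rr_alignment):
        raise ValueError("alignments must list the same cognate pairs in order")
    hk_columns = list(zip(*hk_alignment))
    rr_columns = list(zip(*rr_alignment))
    return {
        (i, j): mutual_information(hk_col, rr_col)
        for i, hk_col in enumerate(hk_columns)
        for j, rr_col in enumerate(rr_columns)
    }


# Toy alignments (hypothetical): HK column 0 covaries with RR column 0,
# while HK column 1 and RR column 1 are invariant.
toy_hk = ["AL", "AL", "GL", "GL"]
toy_rr = ["KD", "KD", "RD", "RD"]
scores = covariation_scores(toy_hk, toy_rr)
top_pair = max(scores, key=scores.get)
```

High-scoring column pairs are candidates for specificity-determining residues. On real alignments, such scores are usually corrected for phylogenetic and sampling bias before any residue is chosen for mutagenesis.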
This rewiring both serves as a proof of specificity and may enable (i) the prediction of HK-RR pairs in other organisms and (ii) the rational design of novel signaling pathways for construction of biosensors or synthetic genetic

