Large-scale Comparative Genomic Ranking of Taxonomically Restricted Genes (TRGs) in Bacterial and Archaeal Genomes using the “Quality Index for Predicted Proteins” (QIPP)

June 1st, 2007

Speaker: Dawn Field

G.A. Wilson1, E.J. Feil2, A.K. Lilley1, and D. Field1

1 CEH-Oxford, Mansfield Road, Oxford, OX1 3SR, UK
2 Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, UK BA2 7AY

Lineage-specific, or taxonomically restricted, genes (TRGs), especially those that are species- and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or only have homologues in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein. We have developed “QIPP” (“Quality Index for Predicted Proteins”), an index that scores the ‘quality’ of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein by comparing its features to those of other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores and identifies many high-scoring orphans as potentially ‘authentic’ (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes and a trend for less well-conserved genes to have lower QIPP scores. The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine.

Evolution of minimal metabolic networks

June 1st, 2007

Speaker: Csaba Pal

It is possible to infer aspects of an organism’s lifestyle from its gene content. Can the reverse also be done? Here we consider this issue by modelling evolution of the reduced genomes of endosymbiotic bacteria. The diversity of gene content in these bacteria may reflect both variation in selective forces and contingency-dependent loss of alternative pathways. Using an in silico representation of the metabolic network of Escherichia coli, we examine the role of contingency by repeatedly simulating the successive loss of genes while controlling for the environment. The minimal networks that result are variable in both gene content and number. Partially different metabolisms can thus evolve owing to contingency alone. The simulation outcomes do preserve a core metabolism, however, which is over-represented in strict intracellular bacteria. Moreover, differences between minimal networks based on lifestyle are predictable: by simulating their respective environmental conditions, we can model evolution of the gene content in Buchnera aphidicola and Wigglesworthia glossinidia with over 80% accuracy.
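The gene-loss procedure described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the six-gene toy network, the gene and metabolite names, and the viability test, which here is a simple reachability check rather than the in silico E. coli model used in the study. Genes are removed one at a time in random order, a deletion is kept only if the cell can still make every biomass component from the available nutrients, and the run is repeated with different random orders to see how much the resulting minimal networks differ.

```python
import random

# Hypothetical toy network: gene -> (substrates consumed, products made).
TOY_NETWORK = {
    "gA": ({"glucose"}, {"pyruvate"}),
    "gB": ({"glucose"}, {"pyruvate"}),      # redundant with gA
    "gC": ({"pyruvate"}, {"amino_acid"}),
    "gD": ({"pyruvate"}, {"lipid"}),
    "gE": ({"acetate"}, {"lipid"}),         # alternative route to lipid
    "gF": ({"lipid"}, {"vitamin"}),         # not required for biomass
}
BIOMASS = {"amino_acid", "lipid"}           # hypothetical growth requirement


def producible(genes, nutrients):
    """Return every metabolite reachable from the nutrients using only `genes`."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for gene in genes:
            substrates, products = TOY_NETWORK[gene]
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def is_viable(genes, nutrients):
    """A gene set is viable if all biomass components can be produced."""
    return BIOMASS <= producible(genes, nutrients)


def minimal_network(nutrients, seed):
    """Delete genes in a random order, keeping each deletion that stays viable."""
    rng = random.Random(seed)
    genes = set(TOY_NETWORK)
    order = sorted(genes)
    rng.shuffle(order)
    for gene in order:
        trial = genes - {gene}
        if is_viable(trial, nutrients):
            genes = trial
    return frozenset(genes)


def simulate(nutrients, runs=200):
    """Collect the distinct minimal networks found over many random orders."""
    return {minimal_network(nutrients, seed) for seed in range(runs)}


if __name__ == "__main__":
    networks = simulate({"glucose"})
    core = frozenset.intersection(*networks)
    for net in sorted(networks, key=sorted):
        print(sorted(net))
    print("core genes:", sorted(core))
```

In this toy setting the environment is fixed and only the order of deletions changes, so any difference between the resulting networks is due to contingency alone; the genes shared by all of them play the role of the preserved core. Changing the nutrient set passed to `simulate` is the analogue of modelling a different lifestyle.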

Genomic Reconstruction and Experimental Assessment of Metabolic Subsystems in Bacteria

June 1st, 2007

Speaker: Andrei Osterman

Burnham Institute for Medical Research, CA
Fellowship for Interpretation of Genomes, IL

Large-scale sequencing and comparative analysis of diverse genomes have strongly enhanced our ability to project the knowledge of genes and pathways from model organisms to others. This progress is particularly notable in the reconstruction of key metabolic subsystems that are present, with some variations, in a variety of diverse microbial species. Capturing and analyzing functional variants of annotated subsystems projected across a large collection of integrated genomes is one of the goals of The SEED project (http://theseed.uchicago.edu/FIG/index.cgi). The analysis of functional and genomic context, most importantly conserved operons and regulons, allows us to efficiently address open problems, e.g., to propose the most likely gene candidates to fill in the gaps in pathways. Although genomics-based functional predictions are emerging on a massive scale, the actual impact of this effort is contingent on our ability to complement it with systematic experimental verification. I will outline the concept of an annotation-reconstruction-prediction-verification pipeline as an efficient and scalable approach to address this problem. The power of the approach will be illustrated by examples from our systematic survey of carbohydrate utilization pathways in environmental bacteria. This and related studies set the stage for genome-scale metabolic modeling of diverse microbial species and interpretation of the emerging metagenomic data.

Computational Comparative Genomics: Genes, Regulation, Evolution

June 1st, 2007

Speaker: Manolis Kellis

Comparative genomics of multiple related species has emerged as one of the most powerful and systematic ways to identify the functional elements encoded in any genome, by virtue of their conservation across millions of years of evolution. These elements are subtle and hidden in millions of non-functional nucleotides, requiring new techniques to decipher them and to understand their roles across diverse experimental and functional datasets.

In my group, we have used comparative genomics of multiple closely related species for the de novo discovery of genes, regulatory motifs, microRNAs, gene targets, and enhancer elements. By studying the conservation properties of known functional elements, we define evolutionary signatures, specific to each type of functional element and dictated by the precise selective constraints under which it evolves. We have successfully applied such approaches to study multiple fungal, fly, and mammalian genomes, refining the annotation of existing elements and revealing hundreds of new functional elements. In addition, comparative genomics has enabled us to study the evolutionary dynamics of genomes, gene families, and regulatory motifs, and their role in the emergence of new gene functions.

This talk will cover our recent work analyzing 12 flies and 17 fungi. In particular, I will focus on new probabilistic methods for the accurate reconstruction of gene family evolution across complex phylogenies and complete genomes, and the identification of unusual gene structures suggesting new mechanisms of post-transcriptional regulation.

The BioCyc Collection of 260+ Pathway/Genome Databases

June 1st, 2007

Speaker: Peter Karp

Peter D. Karp, Pallavi Kaipa, Alexander Shearer, SRI International

The BioCyc [1] collection of 260+ Pathway/Genome Databases (PGDBs) is available at URL BioCyc.org. BioCyc includes the EcoCyc DB, which describes the metabolic and genetic regulatory network of Escherichia coli; and the MetaCyc DB, which describes more than 900 metabolic pathways that were experimentally elucidated in more than 900 organisms.

In addition, BioCyc contains PGDBs for most organisms with completely sequenced genomes. Each BioCyc PGDB combines the annotated genome sequence with predicted metabolic pathways, predictions of pathway hole fillers, and operon predictions. BioCyc is also available through SRI’s BioWarehouse relational database system, and as a downloadable Lisp executable.

Each BioCyc PGDB uses the same DB schema, facilitating comparisons among the DBs. We have developed a number of systems-level comparisons of pathway and genome information that will be presented. In addition, we will present several novel tools for the analysis of omics datasets that can be applied to any BioCyc PGDB.

[1] “Expansion of the BioCyc collection of pathway/genome databases to 160 genomes,” Nucleic Acids Research 33(19):6083-89, 2005.

Evolution of bacterial regulatory systems

June 1st, 2007

Speaker: Mikhail S. Gelfand

Institute for Information Transmission Problems, RAS
Bolshoi Karetny pereulok 19, Moscow 127994, Russia

When one has hundreds of bacterial genomes to compare, with dozens of genomes in some “popular” taxa (alpha- and gamma-proteobacteria, Firmicutes), it becomes possible to study the evolution of regulatory systems. These studies demonstrate the remarkable flexibility of these systems, with rapid regulon expansion, changes in regulator specificity, merging of regulons following regulator loss, complete replacement of regulatory systems, etc. I believe it is somewhat premature to speak of a quantitative description of such events, but we can now compile an inventory of the main types of events. It is also possible to reconstruct the detailed evolutionary history of some regulatory systems. I will try to do that using three main examples: the LacI family of transcription factors, RNA-level T-box regulation in the Firmicutes, and transcription factors regulating iron homeostasis in alpha-proteobacteria.

Software tools for high-throughput genome annotation

June 1st, 2007

Speaker: Dmitrij Frishman

Genomic sequences are being deciphered at an unprecedented pace, and the demand for sequence data is also continuously growing, fueled by thrilling potential applications that range from personalized genome-based medicine and targeted cancer therapies to microbial strain optimization and bioterrorism prevention. In my talk I will focus on modern computational methods and software infrastructure for analyzing protein function at genomic scale. The new version of the PEDANT genome analysis system and its integration with SIMAP, a precomputed similarity matrix for six million amino acid sequences, will be described. Another novel software system, PROMPT, is a comprehensive bioinformatics software environment that enables the user to compare arbitrary protein sequence sets, revealing statistically significant differences in their annotation features. I will also discuss possible ways to reduce the error level of automatically generated annotation by using machine intelligence.

Micro-SIG: Comparative genomics, evolution, and regulation in microbes

February 28th, 2007

ISMB 2007 SIG Meeting
Date: Friday, July 20, 2007
Start time: 9:00 a.m. - End time: 6:30 p.m.
Schedule

With more than 400 microbial genomes sequenced and more than 900 in the sequencing pipelines, comparative genomics is turning into a major tool for the annotation and analysis of sequences and genomes. To solve many problems related to the ever-growing number of bacterial sequences, new algorithms and software programs for post-sequencing functional analysis are being developed by the scientific community. Whole-genome comparison and visualization, discovery of coding regions and regulatory signals, and deduction of the mechanisms and history of genome evolution have been very active areas of research in recent years.

Metagenomics, an emerging subdiscipline of microbiology, allows for the study of genomic sequences found in ecosystems, including genomes of species that are difficult to culture. Metagenomic studies provide exciting opportunities to discover the extent of co-existing genetic diversity within natural ecosystems, and even microdiversity within populations of the same species.

This one-day SIG will be devoted to discussing the latest advances in computational techniques and biological insights in the area of comparative microbial genomics and regulation in bacteria. This session will bring together bioinformaticians and experimentalists, and will consist of invited talks and discussions chaired by leaders in the field. Talks will be grouped in four major areas:

  1. Comparative, population, and environmental genomics
  2. Elucidation of gene regulation in microbes
  3. Inference of genome phylogenies and dynamics
  4. Building and optimization of publicly available computational tools and databases for microbial analysis.


This session is organized by
Adam Arkin, Lawrence Berkeley National Laboratory and UC Berkeley
Eric Alm, MIT
Inna Dubchak, Lawrence Berkeley National Laboratory and Joint Genome Institute

 

About

February 26th, 2007

This website provides information about the Special Interest Group meeting on Comparative genomics, evolution, and regulation in microbes at the 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) & 6th European Conference on Computational Biology (ECCB) in Vienna, Austria.

We will provide abstracts, agendas, and links to external information on this site. If you have any questions, feel free to email us; the list of organizers is on the home page of the website.