Comparative genomic inference of bacterial regulatory systems

Speaker: Erik van Nimwegen

In this talk I will briefly discuss two recent projects. First, we have developed a Bayesian model for predicting two-component system interactions across all sequenced bacteria directly from the amino acid sequences of the kinases and receivers. Tests on predicting known cognate interactions, and a comparison of predictions with known orphan interactions in Caulobacter, suggest that the method accurately reconstructs two-component signaling networks.

Second, we have developed a method for quantifying selection at non-coding positions genome-wide (both intergenic positions and silent positions within genes) from multiple alignments of clades of related bacterial genomes. The method explicitly models evolution at non-coding positions, taking into account the phylogenetic relations between the species, base composition biases, and codon biases. Using this method we find strikingly universal characteristics of selection patterns across all sequenced clades, with some interesting implications. For example, whereas silent sites evolve according to a neutral background model, intergenic regions show significant evidence of selection in all clades, with consistently more selection upstream than downstream of genes. Although the number of transcription factors grows approximately quadratically with the number of genes in bacterial genomes, our analysis strongly suggests that the number of regulatory elements per upstream region is the same in bacteria with large and small genomes. This has important implications for how the structure of regulatory networks differs between large and small bacterial genomes. We also find a universal pattern of high adenosine frequency, significant selection at silent positions, and strong avoidance of RNA secondary structure in the area immediately around translation start sites in all clades. This suggests that selection for translation initiation efficiency shapes the sequence composition around the translation start in all clades. In addition, this universal pattern may be used to improve the accuracy of gene start annotation.
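As a rough illustration of the kind of signal described around translation start sites, the sketch below computes the per-position frequency of A in a set of aligned sequence windows centered on annotated start codons. It is not the method from the talk: the function name `adenine_frequency_profile`, the window layout (six bases upstream, then the start codon and three downstream bases), and the toy sequences are all assumptions made for illustration.

```python
def adenine_frequency_profile(windows):
    """Return the fraction of 'A' at each column of equal-length windows.

    `windows` is a list of DNA strings, each cut from a genome so that the
    annotated start codon sits at the same offset in every string
    (a hypothetical layout chosen for this example).
    """
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    n = len(windows)
    # Count A's column by column and normalize by the number of windows.
    return [
        sum(1 for w in windows if w[i].upper() == "A") / n
        for i in range(length)
    ]


# Toy data (invented, not real genes): 6 upstream bases, ATG, 3 downstream.
toy_windows = [
    "AAGGAGATGAAA",
    "TAGGAAATGACT",
    "CAAGAGATGGCA",
    "AAAGGTATGAAA",
]
profile = adenine_frequency_profile(toy_windows)
for offset, freq in enumerate(profile, start=-6):
    print(offset, round(freq, 2))
```

On real data, a candidate start codon whose surrounding profile matches the genome-wide pattern poorly could be flagged for re-annotation, which is one way the pattern might feed into gene start prediction.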
