The prediction of operons, the smallest unit of transcription in prokaryotes, is the first step towards reconstruction of a regulatory network at the whole genome level. Sequence information, in particular the distance between open reading frames, has been used to predict if adjacent Escherichia coli genes are in an operon. While appreciably successful, these predictions need to be validated and refined experimentally. As a growing number of gene expression array experiments on E.coli became available, we investigated to what extent they could be used to improve and validate these predictions. To this end, we examined a large collection of published microarry data. The correlation between expression ratios of adjacent genes was used in a Bayesian classification scheme to predict whether the genes are in an operon or not. We found that for the genes whose expression levels change significantly across the experiments in the data set, the currently available gene expression data allowed a significant refinement of the sequenced-based predictions. We report these co-expression correlations in an E.coli genomic map. For a significant portion of gene pairs, however, the set of array experiments considered did not contain sufficient information to determine whether they are in the same transcriptional unit. This is not due to unreliability of the array data per se, but to the design of the experiments analyzed. In general, experiments that perturb a large number of genes offer more information for operon prediction than confined perturbations. These results provide a rationale for conducting expression studies comparing conditions that cause global changes in gene expression.
ASJC Scopus subject areas