Long-Awaited Bacterial Genome Debuts

Last week, Frederick R. Blattner of the University of Wisconsin-Madison made the announcement that the microbiology community had been anxiously anticipating: He and his colleagues have finished sequencing the genome of Escherichia coli, the bacterium studied for decades by biologists.

"It's the most important bacterium there is," says Eric C. Lander of the Whitehead Institute for Biomedical Research in Cambridge, Mass. "E. coli is the bacterium of choice for studying how bacteria work. It's an invaluable sequence."

Blattner broke the news publicly at a meeting on small genomes in Hilton Head, S.C. He revealed that the bacterium's genome consists of 4,638,858 nucleotide base pairs, the chemical subunits of DNA, and appears to contain 4,300 genes. The exact number of genes remains "fluid" as rigorous analysis of the sequence continues, says Blattner.

The first full sequencing of a bacterial genome was announced in 1995, and completion of several other small genomes followed quickly. At the South Carolina meeting, scientists from The Institute for Genomic Research (TIGR) in Rockville, Md., reported that they were putting the final touches on four more sequences: the bacteria that cause syphilis, ulcers, and Lyme disease, as well as an archaeon, one of the unusual microorganisms that form the so-called third branch of life.

Yet microbiologists have looked forward to having E. coli's genome more than any other. "The main advantage of E. coli is that there's an enormous biology literature on this organism. That means when you have a gene and a gene product, you can fit them into the vast understanding of its biology," says Monica Riley of the Marine Biological Laboratory in Woods Hole, Mass., who called news of the genome's completion "exhilarating."

"E. coli also provides a reference point for all the other small genomes being sequenced, because for many of those organisms there's very little known about their biology," she adds.

As with the other recently unveiled genomes, E. coli's offers a bounty of novel genes. Almost 2,500 of them bear no strong resemblance to any known genes, leaving scientists with few clues to their roles. "All this is going to take a while to analyze. It's a massive amount of data," says Riley, who maintains an online encyclopedia of E. coli genes.

That more than half of the bacterium's genome remains a complete mystery may seem a surprise, considering how extensively investigators have utilized E. coli. Yet petri dishes and test tubes aren't normal environments for the bacterium, so many of its genes may never have made their presence known to scientists. "I think a lot of the genes in E. coli function in niches other than the laboratory," observes Blattner.

Blattner's group, which started the E. coli project in 1991, deposited the last few sequences of the genome into public databases on Jan. 16, narrowly beating a Japanese group led by Hirotada Mori of the Nara Institute of Science and Technology. Mori's team, using previously published data from Blattner's group, actually produced a composite sequence of more than one E. coli strain. "It was kind of a race at the finish," admits Blattner.

Researchers have just begun to compare the two genomes to see what genes distinguish the strains. Blattner notes that an important project will be to compare these genomes with those of E. coli strains that can cause fatal food poisoning. With reference genome sequences in hand, it should become relatively easy to make such comparisons, he says.

The full E. coli sequence may even help researchers clean up their growing wealth of data on the human genome, adds Blattner. Some studies suggest that because of laboratory contamination, 15 percent of the human gene sequences now in databases contain parts of the E. coli genome.