
After the First Decade of Metagenomics, Adolescent Growth Spurt Anticipated

But computational and other bottlenecks still need to be addressed

Contact: David Gilbert, DOE JGI Public Affairs Manager
(925) 296-5643, [email protected]

September 29, 2008

WALNUT CREEK, CA—Mostly hidden from the scrutiny of the naked eye, microbes have been said to run the world.  The challenge is how best to characterize them given that less than one percent of the estimated hundreds of millions of microbial species can be cultured in the laboratory. The answer is metagenomics—an increasingly popular approach for extracting the genomes of uncultured microorganisms and discerning their specific metabolic capabilities directly from environmental samples. Now, some ten years after the term was coined, metagenomics is going mainstream and already paying provocative dividends according to a “Q&A,” News and Views by the U.S. Department of Energy Joint Genome Institute (DOE JGI) microbial ecology program head Philip Hugenholtz and MIT researcher Gene Tyson, published in the September 25, 2008 edition of the journal Nature.

“By employing the techniques of metagenomics we can go beyond the identification of specific players to creating an inventory of the genes in that environment,” said Hugenholtz. “We find that genes occurring more frequently in a particular community seem to confer attributes beneficial for maintenance of the function of that particular ecological niche.”

Hugenholtz and Tyson were part of the team assembled by University of California, Berkeley geochemist Jillian Banfield to investigate microbial communities associated with the acid mine drainage of Iron Mountain in far Northern California in 2004. In the dank recesses of the mine, protected by moon suits from the highly acidic effluent, the researchers scooped up pink biofilm growing on the surface of acid mine drainage streams. By extracting the nucleic acid from the sample and applying DOE JGI’s powerful DNA sequencing resources to it, the Banfield team was able to reconstruct the metabolic profiles of the organisms living under such inhospitable conditions, a task like putting many Humpty-Dumpties back together again. Their findings, published in Nature 428, 37–43 (01 Feb 2004), showed that reconstructing the genomes of dominant populations from the environment was feasible and that the imprints of evolutionary selection could be discerned in these genomes.

[Figure: A cross-section of a hypersaline Guerrero Negro microbial mat]

Since this pioneering work, DOE JGI has gone on to characterize many other metagenomes, with more newly selected targets in the sequencing queue at the Walnut Creek, Calif., Production Genomics Facility. These range from the hindguts of termites, probed for microbes that produce cellulose-degrading enzymes, to the microbial communities in the cow rumen, the foregut of the tammar wallaby, and the crop of the hoatzin, the Amazon stinkbird. Beyond guts, the DOE JGI, through its Community Sequencing Program (CSP), is enabling metagenomic explorations of Lake Washington near Seattle, Antarctica’s Lake Vostok, and the Great Salt Lake, in addition to the hypersaline mats at Guerrero Negro, Baja California. A video podcast of the Lake Vostok CSP project is featured on the DOE JGI site. Nature features an audio podcast on its site that includes an interview with Hugenholtz.

Responding to the steadily increasing need to manage and interpret the terabases and terabytes of metagenomic data now bubbling up into the public domain, DOE JGI launched the Integrated Microbial Genomes with Microbiome Samples (IMG/M) data management and analysis system, developed in collaboration with Berkeley Lab’s Biological Data Management and Technology Center. IMG/M provides tools for analyzing the functional capability of microbial communities based on the DNA sequence of the metagenome in question.

“Metagenomic tools are becoming more widely available and improving at a steady pace,” said Hugenholtz. “But, there are still computational and other bottlenecks to be addressed, such as the high percentage of uncharacterized genes emerging from metagenomic studies.”

In the Nature piece, Hugenholtz and Tyson go on to cite the emergence of next-generation sequencing technologies, which are already creating a deluge of data that has outstripped the computational power available to cope with it.

“Nevertheless, it’s not necessary to compare all the data to glean useful biological insights,” Hugenholtz said. “What we can capture will help steer the direction toward a relevant data subset to investigate. At least with metagenomics, we have the environmental genetic blueprints awaiting our interpretation. We are still far from capturing and characterizing the dazzling diversity of the microbial life on earth—but at least we have hit upon the gold standard for scratching the surface.”

The U.S. Department of Energy Joint Genome Institute, supported by the DOE Office of Science, unites the expertise of five national laboratories—Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest—along with the Stanford Human Genome Center to advance genomics in support of the DOE missions related to clean energy generation and environmental characterization and cleanup. DOE JGI’s Walnut Creek, CA, Production Genomics Facility provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges.

The Biological Data Management and Technology Center (BDMTC) at Lawrence Berkeley National Laboratory serves as a source of expertise in, and provides support for, data management and bioinformatics tool development projects for several organizations in the San Francisco Bay Area. The Center enables collaborating organizations to share experience, expertise, technology, and results across projects, employing industry practices in developing data management systems and bioinformatics tools, while maintaining high academic standards for the underlying data generation, interpretation, and analysis methods and algorithms.