LBL Researchers Work on Plant Genome Project

April 2, 1993

By Lynn Yarris, [email protected]

A considerable amount of public attention has been focused on the human genome project, in which LBL researchers are playing an important role. There is, however, another national genome project that has received relatively little attention even though it too offers potentially enormous dividends for all of humanity. This effort is the plant genome project of the U.S. Department of Agriculture (USDA), and LBL researchers are playing an important role in it as well.

As Earth's population continues to grow, the need for new varieties of crops that are hardier, higher-yielding, and more nutritious has never been greater. Creating plant varieties with desired characteristics is an age-old practice, but agricultural scientists are starting to replace traditional methods of selective breeding with modern genetic engineering techniques. What used to be a trial-and-error approach involving many generations of plants over a period of years promises to become a much more controlled and rapid procedure.

"We are seeing a real data explosion in plant genetics and there is a critical need to collect, organize, and integrate this wide variety of new data as soon as possible," says John McCarthy, a computer scientist in LBL's Information and Computing Services division (ICSD) who has been a key player in developing databases for LBL's Human Genome Center.

In the spring of 1991, McCarthy was approached by USDA about adapting LBL human genome database models for use with plant genome data. Using database design tools developed by LBL's data management group under Victor Markowitz, McCarthy and other members of ICSD's genome computing group began working with USDA-sponsored researchers to develop databases for the genomes of wheat, soybean, and forest trees.

About a year into the project, McCarthy and his colleagues elected to work with ACEDB (A Caenorhabditis Elegans Data Base), a system developed in 1991 for the international nematode genome project by an English biologist, Richard Durbin, and a French physicist, Jean Thierry-Mieg (who worked with the theoretical physics group at LBL in 1983-84).

"ACEDB has many of the capabilities our genome databases need beyond traditional relational systems," says McCarthy. "Its open architecture enables groups like ours to build on its foundation."

The versatility of ACEDB had also been demonstrated when it was successfully adapted for a database on Arabidopsis thaliana, a type of mustard that is a model organism for plant biologists, comparable to animal models like the fruit fly (Drosophila) and C. elegans.

"Durbin and Thierry-Mieg provide ACEDB's source code free and welcome collaborators in its ongoing development," explains McCarthy. "Therefore, at LBL we've been able to make major contributions to design and development of new display modules and extensions to the database core."

Despite its origin as a genome database for a specific organism, ACEDB is a general-purpose, object-oriented, hypertext system, according to McCarthy. Information is presented in multiple window displays containing text, diagrams, and pictures. Scientists can browse data using a computer mouse and click on components such as gene names to bring up detail windows.

"An object-oriented approach simplifies database design, maintenance and user interface development, which in turn makes it easier to collect and upload data," McCarthy says. "This helps move information much more quickly from lab notebooks into public archives."

ACEDB itself runs on Unix systems, but through programs such as X-windows it can be run over networks from Macintosh and IBM-PC compatible computers as well as Unix workstations. McCarthy reports that a stand-alone test version of ACEDB for the Macintosh is also available.

In less than a year, McCarthy and his LBL colleagues have been able to help plant genome researchers design, implement, test, and transfer individual database operations to remote computers at Iowa State University (SoyBase), Cornell (GrainGenes), and the USDA Regional Laboratory here in Albany (Dendrome). These databases have since been uploaded into the National Agricultural Library's Plant Genome Database so that researchers connected to the Internet can now run them from computers anywhere in the world.

"It has really been enjoyable to work with the plant genome and ACEDB database projects," says McCarthy. "The strong cooperation between different collaborating groups has already brought us a long way."