Machines and the Human Genome Project

Spring 1992

By Lynn Yarris, LCYarris@lbl.gov

The world's fastest marathon runners can cover the 26 mile distance of their race in a little more than two hours. At the Indy 500, the world's fastest cars routinely cover the same distance in a little less than ten minutes. Machines can do things faster than people can. This axiom has been put to good use in a great number of human endeavors but only recently has it been brought to bear on one of the most monumental endeavors of all -- the Human Genome Project.

The Human Genome Project is a national effort to decipher the human genetic blueprint that is being spearheaded by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH). It will require "mapping" the location of some 100,000 genes along the 23 pairs of human chromosomes, then "sequencing," or determining the order of, the three billion base pairs of nucleotides that make up these chromosomes (see sidebar).

To appreciate the enormity of the Human Genome Project's goal, consider that if all of the DNA in the nucleus of a typical human cell were to be unraveled and stretched out flat, it would extend about six feet in length. To date, less than half an inch of this DNA has been sequenced. Mapping efforts have not progressed much further. Despite nearly 50 years of work, fewer than 2,000 genes have been mapped. Everyone involved with the Human Genome Project agrees that the process must be speeded up for the project to be completed within a reasonable period of time and at an affordable cost.

"Using current procedures, a single research group might be able to sequence a human chromosome in about 1,000 years for $600 million," says Michael Palazzolo, a geneticist with LBL's Human Genome Center, one of three centers established by DOE to construct high-resolution (within a few million base pairs) genetic maps. The center at LBL, which is lead by Jasper Rine (see sidebar), is primarily responsible for mapping and sequencing chromosome 21. With an estimated total of 900 genes, 21 is the smallest of the human chromosomes but commands a lot of research attention because it carries the genes for Down's syndrome (caused by an extra copy of 21), and familial Alzheimer's disease.

"Our Center has announced the goal of developing a complete contig (a set of overlapping pieces of DNA that spans an uninterrupted stretch of the genome) of the long arm of chromosome 21 within two years," says Rine. Such a map will cover 10 megabases (10 million base pairs) and require the cloning of thousands of pieces of DNA grown in colonies of yeast or bacteria cells. Another major goal of the center, Rine says, is to be able, within the next three years, to sequence DNA at the rate of 1.2 megabases per year, a sixfold increase over today's best rates. This too shall require thousands of mind- numbingly repetitive operations -- the kind that machines such as robots and computers thrive on.

"From the beginning, laboratory automation has been recognized as an essential element of the Human Genome Project," says Ed Theil a computer systems engineer who has been working with the Human Genome Center's instrumentation group. "Beyond the advantages of speed and relief from tedium, automation minimizes human errors and captures data instantly as part of the process."

Adds Tony Hansen, a physicist in the instrumentation group, "Automation also allows the development of new biochemical procedures that would otherwise be inconceivable due to the impracticality of numbers or the volume of work."

Researchers in the Genome Center's instrumentation group have taken what they call a "bottom-up evolutionary" approach to bringing robotics and automation to the Human Genome Project. Explains long-time group leader Joe Jaklevic, "Individual tasks are being addressed with the knowledge that each forms only a piece of a larger picture and that the whole picture is not yet fully understood. Equipment to perform different tasks must therefore be modular with the capability of being linked at some future time."

The advantage in this approach is that it allows for a multifaceted system of automation to evolve as the necessary hardware and software elements are developed. This means that a single component of the system can be put to use immediately when it is ready rather than waiting for the rest of the system to be in place. Such a component can also be used by itself in a "stand-alone mode" for research purposes other than mapping and sequencing the human genome, and is easy to replace if newer and improved technology becomes available.

Automation of laboratory procedures at LBL starts with communication between the biologists and the engineers to identify where the needs and opportunities are. The general criterion is that tasks that have been done many times before and will need to be done many times again should be automated. The next step is to look for commercially available machines.

"We don't want to reinvent the wheel," says Jaklevic, "so we prefer to take what is out there and adapt it to the specific needs of our biologists."

At its simplest level, automation involves the use of an anthropomorphic robot, a machine that serves as the mechanical equivalent of a human technician by mimicking the actions of a human hand. Basically, this means the robot picks things up and puts them down -- loads and unloads. While the robot may not necessarily do these actions any faster than humans, it is tireless and will not make mistakes.

One such robot that was originally developed by the Hewlett-Packard company as a "microassay system" for pharmaceutical research has been customized at LBL to load growth medium into microtiter plates. Each plate is lined with rows of tiny wells inside which geneticists grow colonies of yeast or bacteria cells that contain fragments of human DNA. Since human chromosomes are far too large to be handled intact by any known biological procedure, the DNA must be broken into fragments that, through cloning techniques, can be faithfully reproduced for mapping, sequencing, and other research purposes. Fragments as large as one megabase are cloned in yeast artificial chromosomes (YACs). Smaller fragments are usually cloned in bacterial cells, either in the form of P1 pacmids, which can accommodate 100 kilobases (100 thousand base pairs), or as cosmids, which can hold up to about 45 kilobases, or as phage chromosomes, which can accommodate nearly 25 kilobases, or as plasmids, which can carry up to 12 kilobases of human DNA.

The Hewlett-Packard robot consists of a computer-controlled arm with a gripper hand, stackers for handling microtiter plates and pipet tip racks, and eight programmable syringes for pipetting and dispensing liquid growth medium into as many as 16 microtiter wells at once. LBL engineers have added a replication tool that contains 96 stainless steel pins -- matching the total number of wells on each microtiter plate -- with which the robot, using its gripper hand, can transfer cells from one plate to another.

"We've programmed this robot to fill up to 100 plates (9,600 wells) with liquid growth medium in less than three hours," says Don Huber, one of the engineers who, along with Bill Searles and Linda Sindelar, helped adapt the robot to the needs of the Human Genome Center and now oversees its operation.

Geneticists determine the chromosomal position of their DNA fragments through libraries of carefully arrayed YACs, P1 pacmids, or cosmids. These clone libraries may consist of 50,000 or more individual fragments, each growing in its own colony in its own microtiter well. Assembling such libraries requires that cells of yeast or bacteria growing at random in Petri dishes be picked up and transferred to the growth medium in microtiter plate wells. Since any mistake in the order of this transfer can create problems that may haunt researchers for years, many top scientists insist on picking their own colonies rather than delegate the task to technicians or students. The task is traditionally performed with toothpicks, a steady hand and a sharp eye.

To free up the time of scientists for more creative endeavors, and eliminate the errors that crop up because hands and eyes lose their edge after several hours of tedium, the instrumentation group has developed a high-speed automated colony picker and arrayer that can pick colonies at the rate of one colony per second, or 3,600 an hour -- an order of magnitude faster than experienced humans.

"In our first large-scale test, we picked and arrayed a library of 10,800 colonies in a ten hour run," says Jaklevic.

The automatic colony picker is an example of what the instrumentation group calls the "second level" of automation, in which a general purpose robot (which had been tried and found to be too slow) is replaced by highly specialized machinery.

Says Theil, "By breaking an overall protocol into smaller, functional pieces, it is possible to design limited-purpose machines that can do one thing many times faster than a general purpose robot."

In the case of the high-speed colony picker, the machinery consists of two tables capable of computer-controlled movement positioned underneath a pair of plungers with a rotating carousel wheel in between that is fitted with 24 reusable needles.

To operate the system, a Petri dish containing colonies of yeast or bacteria is placed on one table, and a 96-well microtiter plate is placed on the other. Guided by a digital image that shows the location of colonies in the dish, the first table moves a colony directly underneath a plunger which at the same time is being armed with a needle from the carousel. On a signal from a computer, the plunger dips the needle into the colony and immediately retracts it with "picked" cells. The carousel then rotates the needle over to the second plunger and the table bearing the microtiter plate moves until the correct well is directly underneath it. The computer activates the second plunger, dipping the needle into the growth medium where some of the cells will be deposited. After retraction from the growth medium, the needle is rotated by the carousel through a tank filled with sterilizing liquid and made ready for reuse.
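The bookkeeping behind moving "the correct well" under the second plunger can be sketched in a few lines of Python. This is only an illustration: it assumes colonies are deposited in pick order and that each 96-well plate is filled row by row (rows A through H, columns 1 through 12). The fill order and the function name well_for_pick are assumptions made for the example, not details of the LBL system.

```python
ROWS = "ABCDEFGH"                    # 8 rows on a 96-well plate
COLS = 12                            # 12 columns per row
WELLS_PER_PLATE = len(ROWS) * COLS   # 96

def well_for_pick(pick_index):
    """Map a zero-based pick number to (plate number, well label).

    Assumes row-by-row filling; the real picker's fill order is not
    described in the article.
    """
    plate, offset = divmod(pick_index, WELLS_PER_PLATE)
    row, col = divmod(offset, COLS)
    return plate + 1, f"{ROWS[row]}{col + 1}"

print(well_for_pick(0))    # first colony picked
print(well_for_pick(95))   # last well of the first plate
print(well_for_pick(96))   # first well of the second plate
```

Under these assumptions, a library of 10,800 colonies would fill 112 plates and half of a 113th.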

The system is designed so that while one needle is picking, another is depositing, and another is being sterilized. When fully operational and interacting with other components of the automation system being designed for the Human Genome Center, the instrumentation group expects their colony picker to be able to pick and array as many as a million colonies a year. Already, the colony picker has helped make LBL's Human Genome Center one of the major distributors of clone libraries to research organizations across the nation.

Other biological procedures important for mapping and sequencing are also in the process of being automated by the instrumentation group. In keeping with the group's bottom-up philosophy, each task is handled at its own individual station and each station can be operated by itself or be integrated into an overall flow of operations as the need arises. For example, the digital image that guides the positioning of the automatic colony picker's first table is generated at an "imaging station." This station uses a charge-coupled device (CCD) camera to produce computer-readable digitized images of colonies, gels, dot blots, and other biological materials either directly or off film autoradiographs.

Says Jaklevic, "A number of critical processes in existing and proposed mapping and sequencing protocols rely on the generation and interpretation of image data. For a sequencing rate of 1.2 megabases per year, we'll need to process approximately 150 images per week."

The instrumentation group is now testing the use of a special "cooled" CCD camera because of its superiority at detecting the faint images produced through chemiluminescence and fluorescence, two environmentally benign alternatives to the use of radioactive probes for tagging biological material. Plans call for the semiconductor diode array that replaces photographic film in a CCD camera to be chilled to about minus 40 degrees Celsius before images are recorded. The camera and the illumination stages will be contained inside a light-tight box. Objects to be imaged, such as gels or dot-blot membranes, would be introduced into the box through a side-loader that would be designed to prevent accidental exposure to light.

Another station that has been developed by the instrumentation group is the "thermal cycler," a device that automatically cycles DNA and associated reagents through the temperature changes required for the polymerase chain reaction (PCR) process. PCR, a cloning technique in which special enzymes are used to make thousands of copies of a DNA fragment within a couple of hours, has quickly become one of the most important tools for genetic research. During this process, DNA and the replicating enzymes must react together at three different temperatures -- approximately 95, 70, and 55 degrees Celsius. The need is for temperature switching to be achieved as rapidly as possible since the time spent at each temperature is only a few seconds. The time spent in between each cycle must also be brief in order to prevent the production of faulty DNA copies that can occur during temperature changes.

The thermal cycler accomplishes all of these goals by circulating water through a bath at each of the three required temperatures. The water, which is circulated from three separate heating tanks, transfers its heat to a thin-walled microtiter plate placed directly into the bath. Water flow can be switched immediately from one tank to another. It takes about 30 seconds to transfer this heat to the reagents inside the plate, and it takes a little less than two minutes to complete an entire three-temperature cycle.

Says Hansen, "The thermal cycler can easily perform 500 PCR amplifications per day, and a unit serving six stations could perform over a million PCR amplifications per year if continuously loaded and unloaded by a general purpose laboratory robot."
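Hansen's yearly figure can be checked with simple arithmetic. The short Python sketch below assumes that each of the six stations performs 500 amplifications a day and that the unit runs every day of the year. Both are readings of the quote, not numbers stated separately by the group.

```python
AMPLIFICATIONS_PER_STATION_PER_DAY = 500   # figure quoted by Hansen
STATIONS = 6                               # stations served by one unit
DAYS_PER_YEAR = 365                        # assumes no downtime

per_year = AMPLIFICATIONS_PER_STATION_PER_DAY * STATIONS * DAYS_PER_YEAR
print(per_year)  # 1095000
```

The result, 1,095,000, is consistent with "over a million" amplifications per year, but only if the robot keeps the stations loaded continuously.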

Also under development by the instrumentation group are automated stations for preparing high-density gels and dot blots that would enable geneticists to speed up their screening of clone libraries, and automated stations for the preparation and synthesis of DNA from cloned vectors. DNA preparation means separating the cloned DNA from the yeast, bacteria, or enzyme in which it was made. The LBL engineers have laid out a design for a high-speed centrifuge that could accommodate a 96-well microtiter plate and an accompanying robot to load and unload it. They have also designed a system in which magnetic beads, dropped into a microtiter plate well, become attached to the DNA and a magnetic field is then used to draw the bead-DNA combo over to one side of the well.

The automation of DNA synthesis is also being looked into. Says Jaklevic, "Because the chemicals involved degrade rapidly in the presence of water and oxygen, commercial synthesizers must use a lot of tricky plumbing to keep the reagents dry. We intend to build a chamber with an inert atmosphere. Synthesis will be carried out inside this chamber under a controlled environment."

Engineering aspects of the inert-atmosphere chamber have been studied and analyzed, Jaklevic says. There are biochemical issues that require further study and this is now being done by the Human Genome Center biologists.

Perhaps the single most time-consuming step in mapping and sequencing research is the sorting of DNA fragments according to size in order to determine the positions of genes on chromosomes or the sequences of the base pairs that make up those genes. The most widely used technique for doing this is called gel electrophoresis, in which fragments are placed in a polymeric gel (like agarose) and an electric field is applied. Because of DNA's negative electrical charge, the fragments move across the gel toward the positive electrode, with shorter fragments moving faster than longer fragments. Gel electrophoresis is sensitive enough to distinguish a size difference of only a single base pair, but the process is slow because it takes so long for the fragments to traverse a gel.

The instrumentation group is exploring the possibility of eliminating the need for gel electrophoresis through the use of mass spectrometry. In a mass spectrometer, a molecular sample is vaporized and the resulting ions are separated according to their mass as they pass through a magnetic field. The ions are then directed into a detector for identification and analysis. Mass spectrometry data can be collected in less than one minute per sample, which is substantially faster than the approximately four hours required to complete a gel electrophoresis run. However, past attempts at applying mass spectrometry to genome research were unsuccessful because no one could vaporize molecules as large as a typical DNA fragment.

Says Henry Benner, a chemist in the instrumentation group who is studying this problem, "Recent progress in matrix-assisted laser desorption and plasma desorption has come close to solving the vaporization problem. Our calculations suggest that sequences up to 300 bases long should be analyzable with modest extensions of existing technology."

Benner and his colleagues have constructed what they call a "time-of-flight mass spectrometry test stand" to test and develop new ways of vaporizing and ionizing DNA fragments that are larger than 300 bases. The most promising approach so far, Benner believes, is matrix-assisted laser desorption, which, potentially, could handle fragments as long as two kilobases. The matrix is an ultraviolet radiation-absorbing material that is mixed with the DNA fragments. When a pulsed UV laser zaps the mixture, the matrix absorbs the light's energy. This energy is transferred to the DNA, ionizing it and driving it down a time-of-flight tube and into a detector. Since velocity decreases with an increase in mass, the smaller fragments reach the detector first.
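The relation between mass and arrival time can be made concrete with the textbook time-of-flight formula. An ion of charge q accelerated through a voltage V gains kinetic energy qV = (1/2)mv^2, so its drift time down a tube of length L is t = L * sqrt(m / (2qV)). The Python sketch below applies that formula. The accelerating voltage, tube length, single charge per ion, and the rough figure of 330 daltons per base are all illustrative assumptions, not parameters of the LBL test stand.

```python
import math

E_CHARGE = 1.602176634e-19   # coulombs per elementary charge
DALTON = 1.66053906660e-27   # kilograms per dalton
DA_PER_BASE = 330.0          # rough average mass of one nucleotide (assumption)

def flight_time(n_bases, volts=20e3, tube_m=1.0, charge=1):
    """Drift time in seconds for an ionized fragment of n_bases.

    Uses t = L * sqrt(m / (2*q*V)). The voltage and tube length are
    illustrative values, not those of the LBL instrument.
    """
    mass_kg = n_bases * DA_PER_BASE * DALTON
    return tube_m * math.sqrt(mass_kg / (2 * charge * E_CHARGE * volts))

for n in (100, 299, 300, 2000):
    print(n, f"{flight_time(n) * 1e6:.2f} microseconds")
```

Because flight time grows only as the square root of mass, fragments of 299 and 300 bases arrive within a fraction of a percent of each other, which is one way to see why detector performance matters so much.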

Benner says there are limitations with the detectors now available that require repeated analysis of a sample to accumulate a useful mass spectrum when matrix-assisted laser desorption is used to ionize the DNA fragments. The instrumentation group is also exploring the use of particle beam approaches, such as the electrospray technique in which the sample molecules are sprayed with electrically charged droplets and the mix is evaporated. So far, however, ions generated in this fashion possess a multiplicity of charge that makes the resulting mass spectra confusing and difficult to interpret. To overcome these problems, the instrumentation group plans to test new types of ion detectors on macromolecules prepared by the biologists at the Human Genome Center. These detectors will be based on either high voltage ion acceleration, secondary ion emission, or a combination of both. In addition, more advanced bolometric detectors which directly measure the thermal energy associated with ions will also be explored.

Rine has said that the challenge of sequencing by mass spectrometry is to be able to resolve the mass of, for example, the 300th base from the base at 299 with sufficient precision to know if the 300th base is guanine, adenine, thymine, or cytosine. Building upon the traditional strength of LBL in the field of detector design and development, he believes that researchers at the Human Genome Center will be able to use mass spectrometry to measure the inheritance of human genetic markers by the end of 1992.

Just as the fastest cars in the world can't get very far in a traffic jam, the advantages of using fast machines to generate mapping and sequencing data will be lost if there are bottlenecks in the collection, storage, and management of this data. Consequently, the use of robots and other automated machines must be accompanied by the use of computers. It is more than a question of sheer speed. Accuracy is also an issue. The rate of error for manual data entry is conservatively estimated to be about one percent (but realistically is probably more like five percent), which in a 50,000 fragment YAC library would mean 500 fragments could be out of order.

"Anytime you have humans transcribing data, the process will be slow and there will be errors," says Suzanna Lewis, a computer scientist with the Human Genome Center's informatics group ("informatics" being the biologists' term for anything having to do with computers). "Computers not only capture data far more quickly and accurately, but they can also send results from one machine to another."

This ability of computers to collect and distribute data from machine to machine is critical to the overall automation plans at the Human Genome Center. As Theil, who recently became the new leader of the informatics group, describes it, a central computer will serve as a "laboratory controller," collecting the data gathered from each of the automation stations, then routing it on to one or more appropriate databases that are accessible through local workstations. This laboratory controller, Theil says, will be similar to, but less elaborate than, the accelerator control systems that are now in use or under development at LBL.

As in the case of the instrumentation group, the informatics group must first meet the immediate needs of the biologists at LBL's Human Genome Center, then look to the more long-term need for integrating the genome data acquired at LBL with the data acquired at other laboratories. Also like the instrumentation group, the informatics group wants to adapt commercially available software whenever possible, rather than develop it from scratch, since software development is so expensive.

"We want to leverage off what has been done by others to meet the needs of our biologists as soon as possible," says Lewis. "So far, however, very little of the software which has been developed is transportable to other labs because in each instance, the software has been customized to solve particular problems."

The need for customized software is especially acute in the area of data management. Although there are a number of existing commercial databases that are capable of managing the volumes of data generated by mapping and sequencing research, they are not set up to represent the data in a format that is comprehensible to biologists. Design tools that can adapt commercial databases to meet the special needs of genome researchers are being developed by computer scientists Victor Markowitz, Arie Shoshani, and Ernest Szeto. One of the most widely used to date is called ERDraw. It is a graphical editing program for describing data in biological terms. With ERDraw, a database designer can specify a number of different types of "entities," for example, chromosomes, persons, or bibliographical citations, and describe the relationship between these entities and the respective attributes of each. ERDraw is part of a general purpose database design tool kit that has been used successfully in the design of several databases at LBL and other laboratories. This tool kit is now part of the Laboratory Information Management System, a package of software that enables geneticists to create their own personal database or electronic notebook.

One of the first large-scale data management systems developed at LBL by the Human Genome Center's informatics group was the Chromosome Information System (CIS), an experimental prototype for supporting the collaborative development of genomic maps. Similar to the CAD (Computer-Aided Design) programs for engineers, CIS lets biologists study chromosome maps and related data through text and pictures. The data is easy to access and manipulate through an interface called GenomeGuide, and readily shared through a management system called GenomeBase. Information shown on any given map can be followed to reveal other relevant information, such as contact persons or bibliographic citations, to trace data back to its experimental origins, or to get more information about a specific entry.

In addition to its human genome applications, CIS has attracted the interest of the U.S. Department of Agriculture which has several plant genome mapping projects underway. LBL's John McCarthy, one of the original developers of CIS, is now helping USDA computer scientists adapt the program to the wheat, soybean, and pine tree genomes.

The informatics group has also put together a data management system called "BIOPIX" to support the biological imaging activities at the Human Genome Center. BIOPIX stores images, such as those generated at the CCD camera-based imaging station, as separate files, and records the attributes of each in a database. Specific sets of images can be readily retrieved from this database and passed on to separate software programs for analysis.

Explains computer scientist Frank Olken, one of the developers of this system, "BIOPIX will permit Human Genome Center researchers to conduct nearly all of their imaging activities completely with digital images. We anticipate that this will facilitate proposed large-scale genetic and physical mapping experiments which must analyze hundreds or thousands of images."

Of the programs developed by the informatics group for image analysis, one of the newest and most promising is called TAP for Transposon Analysis Program. A transposon is a sequence of bacterial DNA that inserts itself into other sequences of DNA in order to activate the expression of a gene. Sometimes called "jumping genes" because of their ability to jump from place to place along bacterial genomes, transposons are being used by Human Genome Center geneticists Palazzolo and Charles Martin to break human DNA into fragments that are about 400 bases long, the largest size that current sequencing technology can accommodate. This is done by introducing transposons into samples of human DNA clones and selecting only those sets of resulting fragments which are spaced 400 base pairs apart. TAP, which was developed by Kevin Gong, working with Lewis and Theil, enables a computer to analyze images of electrophoretic gels and dot blots and select a set of fragments containing transposon inserts at a desired spacing.
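The selection step can be pictured with a small Python sketch. It is not TAP's actual algorithm, which the article does not describe. It assumes the insert positions have already been estimated from the gel images, and it uses a simple greedy rule with a hypothetical tolerance of 50 bases around the 400-base target.

```python
def select_inserts(positions, spacing=400, tolerance=50):
    """Greedily pick insert positions roughly `spacing` bases apart.

    positions: estimated transposon insert sites, in base pairs.
    Returns the chosen sites in ascending order.
    """
    chosen = []
    for pos in sorted(positions):
        if not chosen:
            chosen.append(pos)       # start from the leftmost insert
            continue
        gap = pos - chosen[-1]
        if abs(gap - spacing) <= tolerance:
            chosen.append(pos)       # close enough to the target spacing
    return chosen

# Made-up insert positions, for illustration only.
sites = [30, 120, 410, 450, 790, 860, 1230, 1500, 1640]
print(select_inserts(sites))  # [30, 410, 790, 1230, 1640]
```

A greedy pass like this stalls if no insert falls inside the tolerance window, so a practical program would need a fallback, such as accepting the nearest shorter gap.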

"It takes about two hours for a person to analyze enough images to select a set of fragments," says Lewis. "TAP can do the same thing in a matter of seconds."

Although TAP was originally developed to handle the specific needs of Palazzolo and Martin for their transposon-based sequencing research, an enhanced version of TAP is expected to be applicable to a wide variety of experiments which use gel assays, including the dog genome and fruit fly genome mapping projects.

Other image processing programs that assist in the enhancement as well as the analysis of digital images have also been developed. One such program is called GENIAL. Developed by William Johnston, former informatics group leader and the original architect of BIOPIX, GENIAL is designed for the study of phosphor-based images.

As part of its more long-term mission of integrating the genome data gathering at LBL's Human Genome Center with the national effort, the informatics group is maintaining a satellite copy of GenBank, the sequence database operated by the Los Alamos National Laboratory (LANL). LBL is one of only eight sites in the world where a satellite copy of GenBank is being maintained to test the concept. The satellite copy is updated every night from the master database at LANL rather than the quarterly update that regular subscribers to GenBank receive. LBL computer scientist Manfred Zorn, who runs the satellite copy, is also developing software that will convert GenBank sequence data into a format usable by standard data analysis programs.

Says Zorn, "GenBank is quite difficult to use. We are working to develop a user friendly interface for GenBank that will enable users to ask it questions or do data searches without going through a lot of complicated procedures."

Zorn is now working towards this end with biologists at LBL and the University of California at Berkeley, and with computer scientists at LANL and other GenBank satellite sites. He is examining other databases which are being developed for large-scale genome use as well. These include not only databases of base pair sequences, but also databases that contain sequence information for specific purposes, such as "Prosite," a database of protein sequences, and "Entrez," a database from NIH that integrates base pair and protein sequence data with bibliographies.

Computers can also be used to help biologists plan mapping and sequencing strategies. Already, computer simulations have been used to determine that the most efficient mapping strategy is one based on paired, nonrandomly selected sequence-tagged sites (STSs). STSs are short sequences of DNA that identify a unique physical location on the genome. Many previous mapping schemes have proposed the use of STSs that are derived at random from anywhere along the genome. With the help of computer simulations, biologists at LBL's Human Genome Center have found that it is better to limit the potential STSs to those derived from clone libraries.

Says Palazzolo, "The data from our simulations suggests that the nonrandom procedures require three to four-fold fewer STS assays and lead to a map that provides greater coverage and larger contigs than can be accomplished using randomly sampled probes."

As a biologist who is now working closely with both the instrumentation and informatics groups, Palazzolo believes their efforts will do more than just reduce the cost and time it takes to complete the Human Genome Project.

Says Palazzolo, "The application of automation and computers will also improve the quality and thus the utility of the genome map that is presented to the research community it is ultimately intended to benefit."

What is the human genome?

Are you male or female? What color is your skin? Your eyes? Your hair? Are you tall? Short? Fat? Thin? Somewhere in between? In appearances, temperament, talents, and skills, do you favor your mother or your father, a grandmother or grandfather, or are you an original, a one-of-a-kind?

In the nucleus of each of the hundred trillion cells in your body is a very special heirloom, a "recipe book" of sorts, containing about a hundred thousand individual recipes, organized into 24 chapters, that, taken as a whole, provide a complete set of instructions for making the being that is you. This recipe book, made from your full complement of genetic material, is called a genome, and it is written in an arcane language that is universal for all life on this planet.

Were we able to read this genetic language well enough to decipher the human genome, we would have at our disposal what would almost certainly be the most powerful resource for biological and medical research ever developed. Although the genome of each of us is uniquely ours, the variation from person-to-person averages but two-tenths of one percent (less for "identical" twins). This means that a model of one human genome could serve as a standard reference for us all. More than 4,000 diseases, including cancer and heart disease, have been identified as occurring because of a breakdown in the genetic process. A model of the human genome could reveal where and why such breakdowns occur, providing an unprecedented asset for the diagnosis and prevention of their associated diseases. It should also prove enormously valuable in the diagnosis and treatment of genetic damage caused by external factors, such as radiation, toxic chemicals and other environmental pollutants.

Just as archaeologists needed the Rosetta stone to decipher Egyptian hieroglyphics, biologists will need a "key" to decipher the human genome. This "key" is expected to be provided by the mapping and sequencing of the genome, which is the main objective of the Human Genome Project. To understand "mapping" and "sequencing" it helps to review the process of how the information in the genome is translated into a human being, a living, four-dimensional space-time entity.

Your body is a community of different types of tissue (bone tissue, skin tissue, muscle tissue, nerve tissue, etc.), which in turn are communities of specific types of cells. What determines the structure of a given cell, and also regulates much of the chemical activity that drives and defends the body, is protein. The ingredients for making protein are chemical compounds called amino acids, of which there are 20 varieties. Amino acids from one or more of these different varieties link together to form a "polypeptide chain." A typical protein is a polypeptide chain of some 300 amino acids. The varieties of amino acids linked and the order in which the individual acids are joined determine the nature of the protein made. This assemblage of amino acids into polypeptide chains, crucial to determining whether the final product is a fish, a flower, a microorganism, or a human being, is directed by the instructions contained in the genetic code.

It all starts with the intricately beautiful architecture of the DNA molecule, deoxyribonucleic acid, alternating lengths of phosphates and deoxyribose sugar, strung together into two strands that wind around one another like a great spiraling staircase -- the famous "double helix." Forming the steps of the staircase are nitrogenous compounds called nucleotides or "bases," which connect the two strands at each stretch of sugar. There are four types of nucleotides -- adenine (A), cytosine (C), guanine (G), and thymine (T). They represent the alphabet of the genetic language.

In binding together to connect the DNA double strands, adenine will always pair off opposite thymine, and cytosine will always couple with guanine. A bound pair of nucleotides is often referred to as a "base pair." The human genome is made up of approximately 3 billion base pairs, or 6 billion individual nucleotides. Since the same two nucleotides always pair off, the order in which nucleotides are arranged on one strand determines the order on the other. For example, a nucleotide pattern of ACGT on one strand means a pattern of TGCA on the other. The pattern of nucleotide distribution is called a "sequence."
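The pairing rule is simple enough to write down as a few lines of Python. What follows is only an illustrative sketch: the names PAIRING and paired_strand are invented for this example, and the function reads off the opposite strand position by position, exactly as in the ACGT example above.

```python
# Pairing rule from the text: adenine (A) pairs with thymine (T),
# and cytosine (C) pairs with guanine (G).
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def paired_strand(sequence):
    """Return the bases that sit opposite each base of `sequence`.

    The result is read position by position, in the same direction
    as the input, matching the ACGT -> TGCA example in the text.
    """
    return "".join(PAIRING[base] for base in sequence.upper())


print(paired_strand("ACGT"))  # prints TGCA
```

Because each base has exactly one partner, applying the function twice returns the original strand, which is another way of saying that either strand carries enough information to rebuild the other.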

The genetic code is a dictionary whose "words" are formed by sequences of three nucleotides called "codons." Each codon defines or codes for a specific amino acid. This genetic dictionary consists of a grand total of 64 codons. Since there are only 20 amino acids, many codons must have the same meaning (some codons or pairs of codons signal "stop" or "go" during the process of translation). Despite this limited vocabulary, the genetic code can transmit a tremendous amount of information. Stored in a typical molecule of human DNA is enough "text" to fill 200 Manhattan telephone directories or three years' worth of New York Times newspapers (Sunday editions included!). A series of codons is arranged to form messages, called "genes." Genes are the individual "recipes" in the genome recipe book, listing the amino acid ingredients and telling how to combine them in order to make a specific kind of protein.
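The figure of 64 codons follows from simple counting: four possible letters in each of three positions gives 4 x 4 x 4 = 64 three-letter words. The short Python sketch below checks that count and shows how a stretch of sequence can be read off three letters at a time. The names ALL_CODONS and split_into_codons, and the nine-letter sample sequence, are made up for illustration and are not taken from any real gene.

```python
from itertools import product

BASES = "ACGT"

# Every possible three-letter word over the four-letter alphabet.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def split_into_codons(sequence):
    """Split a sequence into consecutive three-letter codons.

    Any leftover one or two letters at the end are dropped, since
    they do not form a complete codon.
    """
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


print(len(ALL_CODONS))                 # prints 64
print(split_into_codons("ATGGCCTAA"))  # prints ['ATG', 'GCC', 'TAA']
```

With 64 words available and only 20 amino acids to name, several codons necessarily share a meaning, which is the redundancy described above.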

Several thousand nucleotides may go into the writing of a gene, but only a thousand of them might contain any actual protein-making instructions. The sequences of nucleotides in a gene that contain codes for proteins are called "exons," and tend to be relatively short in length. The remaining sequences, which usually are much longer, are called "introns," and have no known function. Introns have been labeled by many biologists as "junk DNA," but this "junk" comprises nearly 95 percent of the human genome. The preponderance of so much genetic clutter has led to the speculation that at least some of the introns play a role in activating a gene so that its message is "expressed" -- translated into protein-making action. Being able to read the code might answer this speculation, and might also explain the existence of several versions of the same gene -- called alleles -- and what makes some of these versions "dominant," like the gene for brown eyes, and others "recessive," like the gene for blue eyes.

The "chapters" of the human genome recipe book into which the information of the genes is organized are the tangled-together strings of DNA called "chromosomes." Untangled and separated, human chromosomes can be matched according to length into 23 distinct pairs in which one chromosome will have been contributed from each parent. The first 22 chromosome pairs -- the "autosomes" -- have been assigned numbers, according to their length, with number 1 being the largest (the average is about one fifty-thousandth of an inch). In the 23rd pair are the sex chromosomes: two so-called "X" chromosomes if you are female, an "X" and a "Y" chromosome if you are male. Each chromosome in each pair ranges from 50 million to 250 million nucleotides in length and carries several thousand genes, arranged linearly along the chromosome like a sentence on a page. Identifying on which chromosome a specific gene is located and where along that chromosome it can be found -- the equivalent of having a table of contents for our genetic recipe book -- is called "mapping."

Beyond the applications to biological and medical research already touched upon, the Human Genome Project also holds out the promise of providing answers to some of the most vexing mysteries of life. What distinguishes the animate from the inanimate? What is the biological basis of thought? Every minute approximately 3 billion of the cells in your body die, but replacements have already been produced and put into action. At least that is the scenario until a certain point in your life when, though the genome "recipes" are still there, the cells no longer follow the script. This is what "aging" is all about, and understanding the genome might at long last explain why it happens, and whether anything can be done to make it a more graceful process. At the other end of the scale, the ability to read the human genetic code might also reveal how the nation of cells that is you today sprang from what once upon a time was a single cell, a fertilized egg in the womb of your mother.

For all of these reasons and more, the Human Genome Project has been called the "Holy Grail of Biology."

The Role and Mission of the Human Genome Center at LBL

"I see the Human Genome Center as an opportunity to establish for biology the same synergistic relationship between LBL and the University of California's Berkeley campus as exists in physics and chemistry," said molecular geneticist Jasper Rine when he agreed to serve as the acting director of LBL's Human Genome Center last July. In January of this year, the word "acting" was dropped from Rine's title.

Said LBL director Charles V. Shank, when he named Rine to head the Lab's Human Genome Center, "Jasper Rine's leadership has been remarkable and inspiring, to those working directly with him as well as to the rest of LBL. I couldn't be more delighted by the present or more excited about the future."

Rine holds a Ph.D. in molecular genetics from the University of Oregon and was a postdoctoral fellow at Stanford University's School of Medicine. He has been with UC Berkeley since 1982 and is currently a professor of genetics in the Department of Molecular and Cellular Biology. At LBL, he has an appointment in the Cell and Molecular Biology Division (CMBD), which oversees the Laboratory's Human Genome Center.

Said CMBD director Mina Bissell, "Jasper Rine is the ideal director for our Human Genome Center. In addition to being a superb bench scientist, he appreciates the opportunities and support LBL can offer as a national laboratory and he knows how to put it to good use."

LBL's Human Genome Center was one of three such centers established by the Department of Energy (DOE) in 1987 as part of the national effort to decipher the human genetic code. The Center's first director, Charles Cantor, who was appointed in 1988, resigned in 1990 to become the principal scientist of DOE's human genome program. During the search for a new director, the Center was guided by biologist Sylvia Spengler, who is now its deputy director.

LBL's Human Genome Center has specific responsibilities for mapping and sequencing chromosome 21. Under Rine's leadership, the Center has gone through a number of organizational and scientific changes that culminated in a highly successful DOE site review at the beginning of this year.

"The purpose of the LBL Human Genome Center is to bring a focused and comprehensive fusion of complementary talents to the goals of the Human Genome Project," Rine told members of the site review. "Specifically, we are working to provide very high resolution genetic maps of human chromosomes in the shortest possible time, based upon genetic markers with universal availability and maximum possible informativeness."

One of the guiding principles of the Center under Rine's leadership has been that "every dollar spent is a precious resource." Therefore, Rine believes that the major consumers of genome data should be the main beneficiaries of the Center's research and technology developments. "At this time in history," he says, "the major consumers are investigators whose focus is on the isolation of disease genes."