Sung-Hou Kim, a senior staff scientist with Berkeley Lab's
Physical Biosciences Division and an internationally
recognized leader in X-ray crystallography, is putting
together a structural biology undertaking that might one day
rival the Human Genome Project. The goal is to determine the
structures of a core set of proteins that would be
representative of all types of protein structures.
Thomas Earnest and Natasha Khlebtsova are among those working at the
Macromolecular Crystallography Facility, where the intense X-rays of the Advanced Light Source (ALS) are providing
new opportunities for solving protein structures. (Photo by Roy Kaltschmidt)
Function follows form in the spiraled, zig-zagged, folded world of proteins. Knowing
the structure of a protein encoded by a specific gene sequence is critical for biomedical
researchers seeking to understand the what and how of that protein's function. X-ray
crystallography in combination with computerized image analysis is an excellent technique
for determining protein structures. However, even with the availability of new
state-of-the-art resources, such as the Macromolecular Crystallography Facility at
Berkeley Lab's Advanced Light Source, researchers such as Kim cannot keep pace with the
enormous influx of new genetic sequencing data.
Noting that the human genome might code for as many as 100,000 different proteins, Kim
recently told a reporter for Science magazine, "There are too many genes,
making it too difficult to reasonably determine them all."
In response to this dilemma, Kim organized a meeting of some 70 other protein
crystallographers that took place on Jan. 24 and 25 at Argonne National Laboratory. At
this meeting, they discussed a "targeted" strategy as a practical alternative to
trying to determine the structure of every protein identified through genetic sequencing.
The targeted strategy has been warmly received by representatives from the U.S.
Department of Energy and the National Institutes of Health, the two agencies most likely
to foot the bill for such a national initiative. Many details remain to be worked out,
however, including the basis on which to classify proteins into a representative core. One
approach put forward would group proteins according to "folds," a term that describes
how the key structural components of a protein are arranged in three dimensions. While promising, this approach could also be
unwieldy because there may be more than 1,000 different types of protein folds.
Kim has suggested that one way to speed up the discovery of new protein folds would be
to target only proteins that appear to be exceptionally good candidates for having novel
folds. To find such candidates, he proposes to study genes from simple organisms.
"The idea is just to pick a small, self-replicating organism that presumably has a
small number of genes but a large fraction of the three-dimensional folds," he told Science.
Kim and his colleagues here at Berkeley are testing this idea on the proteins coded for
in the fully sequenced genome of a microbe called Methanococcus jannaschii. They are
looking at 20 of the microbe's genes -- 10 that resemble known genes and 10 that do not.
From this gene pool, they have so far purified five proteins and determined the structures
of three of them. None of the three revealed a new fold, but Kim remains confident in the idea.
The number of proteins that might constitute a representative sample of all proteins
could range from 1,000 to 10,000. At the current price of $150,000 per structure, even a
targeted strategy for identifying protein structures could cost as much as $1.5 billion (10,000 structures at $150,000 each). Expensive,
yes -- but Kim says there may be no choice. "What else can we do if we're trying to
get the function of as many gene products as possible?"
At the conclusion of the January meeting, attendees agreed to meet in April and again
in October to try to work out a more detailed plan.