Protein Structure Initiative Relies On X-Ray Crystallography

February 20, 1998

By Lynn Yarris

Sung-Hou Kim, a senior staff scientist with Berkeley Lab's Physical Biosciences Division and an internationally recognized leader in X-ray crystallography, is putting together a structural biology undertaking that might one day rival the Human Genome Project. The goal is to determine the structures of a core set of proteins that would be representative of all types of protein structures.

Thomas Earnest and Natasha Khlebtsova are among those working at the Macromolecular Crystallography Facility, where the intense X-rays of the ALS are providing new opportunities for solving protein structures. | Photo by Roy Kaltschmidt

Function follows form in the spiraled, zig-zagged, folded world of proteins. Knowing the structure of a protein encoded by a specific gene sequence is critical for biomedical researchers seeking to understand the what and how of that protein's function. X-ray crystallography in combination with computerized image analysis is an excellent technique for determining protein structures. However, even with the availability of new state-of-the-art resources, such as the Macromolecular Crystallography Facility at Berkeley Lab's Advanced Light Source, researchers such as Kim cannot keep pace with the enormous influx of new genetic sequencing data.

Noting that the human genome might code for as many as 100,000 different proteins, Kim recently told a reporter for Science magazine, "There are too many genes, making it too difficult to reasonably determine them all."

In response to this dilemma, Kim organized a meeting of some 70 other protein crystallographers that took place on Jan. 24 and 25 at Argonne National Laboratory. At this meeting, they discussed a "targeted" strategy as a practical alternative to trying to determine the structure of every protein identified through genetic sequencing.

The targeted strategy has been warmly received by representatives from the U.S. Department of Energy and the National Institutes of Health, the two agencies most likely to foot the bill for such a national initiative. Many details remain to be worked out, however, including the basis on which to classify proteins into a representative core. One approach put forward would group proteins according to "folds," which refers to how key components of a protein are arranged. While promising, this approach could also be unwieldy because there may exist more than 1,000 different types of folds for proteins.

Kim has suggested that one way to speed up the discovery of new protein folds would be to target only proteins that appear to be exceptionally good candidates for having novel folds. To find such candidates, he proposes to study genes from simple organisms.

"The idea is just to pick a small, self-replicating organism that presumably has a small number of genes but a large fraction of the three-dimensional folds," he told Science.

Kim and his colleagues here at Berkeley are testing this idea on the proteins coded for in the fully sequenced genome of a microbe called Methanococcus jannaschii. They are looking at 20 of the microbe's genes -- 10 that resemble known genes and 10 that do not. From this gene pool, they have so far purified five proteins and determined the structures of three of them. None yielded a new fold, but Kim remains confident in the idea.

The number of proteins that might constitute a representative sample of all proteins could range from 1,000 to 10,000. At the current price of $150,000 per structure, even a targeted strategy for identifying protein structures could cost as much as $1.5 billion. Expensive, yes -- but Kim says there may be no choice. "What else can we do if we're trying to get the function of as many gene products as possible?"
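For readers who want to check the arithmetic, the cost range implied by those figures can be worked out in a few lines of Python. This is only a rough sketch: it assumes the quoted $150,000 per structure stays flat across the whole project, and the function name initiative_cost is a hypothetical label chosen for illustration, not anything used by Kim or the funding agencies.

```python
# Figures quoted in the article (1998 estimates).
COST_PER_STRUCTURE_USD = 150_000   # current price per solved structure
CORE_SET_MIN = 1_000               # low end of the representative core set
CORE_SET_MAX = 10_000              # high end of the representative core set


def initiative_cost(num_structures, cost_per_structure=COST_PER_STRUCTURE_USD):
    """Total cost in dollars, assuming a flat per-structure price (an assumption)."""
    return num_structures * cost_per_structure


low = initiative_cost(CORE_SET_MIN)
high = initiative_cost(CORE_SET_MAX)
print(f"${low:,} to ${high:,}")  # prints "$150,000,000 to $1,500,000,000"
```

The $1.5 billion figure corresponds to the upper end of the range, a core set of 10,000 proteins; a core set of 1,000 would come to $150 million under the same assumption.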

At the conclusion of the January meeting, attendees agreed to meet in April and again in October to try to work out a more detailed plan.
