Berkeley Lab Highlights
Basic and Computational Science
 
Science Grid Creates Home for Far-Flung Research
    Today's science is big science. A modern high-energy physics experiment requires hundreds of researchers from several countries and generates enough data to fill millions of CD-ROMs. To bring these resources together, and to foster the next generation of large-scale Department of Energy (DOE) science research, Berkeley Lab scientists are spearheading the development of the DOE Science Grid, an infrastructure for using and managing widely distributed computing and data resources in the science environment.
     
   
   
Bill Johnston is the principal investigator of the DOE Science Grid, which enables a geographically diverse team of scientists to form the multidisciplinary and multinational collaborations critical to today's complex scientific experiments.
   

The Science Grid's goal is to link instruments such as electron microscopes and synchrotrons with vast data repositories and DOE's National Energy Research Scientific Computing Center (NERSC)--located at Berkeley Lab and home to some of the nation's most powerful unclassified supercomputers. These resources are joined over DOE's Energy Sciences Network (ESnet), a high-speed network managed and operated by Berkeley Lab staff that serves thousands of scientists worldwide. Ultimately, the Science Grid will provide easy access to the computing, data, and instrument resources at Berkeley Lab, Oak Ridge National Lab, Pacific Northwest National Lab, and Argonne National Lab.

The result is a virtual lab in which a researcher from Illinois can obtain data from an observatory in Hawaii and analyze it using a supercomputer in California. More important, these resources exist on a common network, enabling a geographically diverse team of scientists to form the multidisciplinary and multinational collaborations critical to today's experiments.

"Science has mushroomed in its complexity and dependence on collaborations, and the Science Grid reflects this change," says Bill Johnston, head of Berkeley Lab's Distributed Systems Department and the principal investigator of the DOE Science Grid.

What are grids? They're middleware that enables the routine interaction of multi-institutional science and engineering projects. They provide uniform and secure access to large and small-scale computing, data, and instrument systems. They support the construction of application frameworks and science portals. And they provide distributed applications with required infrastructure such as security services that enable operation in multi-institutional environments.

The Science Grid is one such grid. It facilitates increasingly complex DOE science projects that encompass several institutions, require robust computing and data management resources, and involve some of the world's most sophisticated and expensive scientific instruments.

Consider the Supernova Cosmology Project. Based at Berkeley Lab, the project involves searching the sky for supernovae in their earliest evolutionary stages, and then measuring the changes in their spectra and magnitude during the several weeks of their most explosive activity. By observing these distant and ancient exploding stars, researchers have determined that the universe is expanding at an accelerating rate.

This discovery, Science magazine's breakthrough of the year in 1998, owes its success to an approach called observational cosmology--a fundamental departure from the days when scientists reserved time at an observatory and hoped to glimpse a specific celestial phenomenon.

Instead, raw sky images are transferred nightly from the Hubble Space Telescope and from observatories in Hawaii, the Canary Islands, and Chile to a central computing facility, in this case NERSC. Here, computational calibrations eliminate sky tracking errors as well as instrument and atmospheric effects. Baseline comparisons eliminate asteroids and man-made satellites. Finally, automated search algorithms comb through this refined data for the indications of the onset of supernova activity.
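As a rough illustration of the calibration and search steps, the toy Python sketch below subtracts a constant instrument offset from a tiny made-up image and then flags pixels that have brightened sharply relative to an earlier reference image. It is not the project's actual pipeline; the function names, the pixel values, and the threshold are all assumptions chosen for the example.

```python
def calibrate(image, dark_level):
    """Subtract a constant instrument offset from every pixel (hypothetical step)."""
    return [[px - dark_level for px in row] for row in image]


def find_candidates(new_image, reference, threshold):
    """Return (row, col) positions that brightened by more than `threshold`."""
    hits = []
    for r, (new_row, ref_row) in enumerate(zip(new_image, reference)):
        for c, (new_px, ref_px) in enumerate(zip(new_row, ref_row)):
            if new_px - ref_px > threshold:
                hits.append((r, c))
    return hits


# Made-up 3x3 "sky images": an earlier reference and tonight's raw exposure.
reference = [[10, 10, 10], [10, 12, 10], [10, 10, 10]]
tonight_raw = [[15, 15, 15], [15, 17, 60], [15, 15, 15]]

tonight = calibrate(tonight_raw, dark_level=5)
candidates = find_candidates(tonight, reference, threshold=20)
print(candidates)
```

A real search would also have to reject moving objects such as asteroids and satellites, which this sketch does not attempt.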

"It's a classic grid problem. Widely distributed data must be closely analyzed in order to extract the correct data," Johnston says. "The Science Grid makes it practical to obtain this essential information in a uniform way."

It gets even more complex. The Supernova Cosmology Project involves researchers in England, France, Germany, and Sweden. A related project, the Nearby Supernova Factory, will utilize additional observatories located on the summits of Haleakala, Hawaii, and Mount Palomar, California. This project will also serve as a test bed for the SuperNova/Acceleration Probe, another Berkeley Lab-led project that will combine an optical field imager, near-infrared imager, and spectrometer in a single satellite.

In short, the search for supernovae involves many institutions, many people, and many instruments: a perfect Science Grid application.

And it's one of many DOE science initiatives that will benefit from the Science Grid. Every day, Berkeley Lab's synchrotron, the Advanced Light Source, and several electron and x-ray microscopes yield an avalanche of data. These instruments further our understanding of everything from the structure of proteins and cells to the characterization of metals and computer chips. And soon, the Spallation Neutron Source will map, in unprecedented detail, the atomic properties of materials.

Although cutting edge, these inquiries suffer from an old problem: once an experiment is initiated, researchers can't determine if it's on track until the data is analyzed days or weeks later. The trick is monitoring the experiment in real time. But because the difference between fuzzy noise and important data is often extremely subtle, evaluating such data streams requires enormous computing power.

Another classic grid problem, says Johnston. When NERSC is incorporated into the Science Grid at the end of 2002, researchers will be able to link instruments with computational models that can potentially steer experiments. They can also compare experiments with high-resolution models of the process they are studying. If the two match, the experiment is most likely on target. Coupling instruments and supercomputers will also bring real-time data analysis to research.

In addition, a massive NERSC data repository that uses software developed by a DOE Lab and IBM consortium will soon be incorporated, furthering the Science Grid's ability to manage complex data archives such as those generated by the DOE and National Institutes of Health's Human Genome Project.

"Ten years ago, data facilities were secondary to supercomputers," Johnston says. "But as science becomes increasingly data driven, storage facilities are rivaling computing resources in their importance to science."

Ultimately, the Science Grid will provide easy access to a network of computing and data archiving resources that, together, can perform trillions of calculations per second and quickly store and retrieve 400 times the amount of information contained in the Library of Congress. And it will bring this power to the desktop. Using one protocol, a scientist could run the National Center for Atmospheric Research's global climate model on NERSC using climate data stored at Oak Ridge National Laboratory. Without the Science Grid, this project would require navigating a labyrinth of protocols.

Perhaps even more important than easy access to data, instruments, and supercomputers is the easy and secure access to other researchers. The grid is safeguarded with a two-tiered, cryptographic authorization process. Only certified people can access the Science Grid, and among those only uniquely qualified researchers can access specific resources. Both U.S. and European grids use this system, allowing the secure exchange of information between different grids and different nations.
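The two tiers can be pictured with a minimal Python sketch. It models only the decision logic described above, first whether a person is certified to use the grid at all, then whether that person is cleared for a particular resource. It leaves out the cryptography entirely, and the class, user names, and resource name are hypothetical, not part of the Science Grid's software.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    # Tier 1: identities certified to use the grid at all (hypothetical names).
    certified_users: set = field(default_factory=set)
    # Tier 2: for each resource, the users cleared to use that resource.
    resource_acl: dict = field(default_factory=dict)

    def can_access(self, user: str, resource: str) -> bool:
        # Fail the first tier and the resource list is never consulted.
        if user not in self.certified_users:
            return False
        # Second tier: the user must also be cleared for this resource.
        return user in self.resource_acl.get(resource, set())


grid = Grid(
    certified_users={"alice", "bob"},
    resource_acl={"nersc-supercomputer": {"alice", "mallory"}},
)

print(grid.can_access("alice", "nersc-supercomputer"))
print(grid.can_access("bob", "nersc-supercomputer"))
print(grid.can_access("mallory", "nersc-supercomputer"))
```

In this sketch "mallory" appears on the resource list but is not certified, so the first tier turns the request away regardless.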

"This is a major step in furthering international science collaborations," Johnston says, adding that the more resources on the Science Grid, the better. "It's like the Web: it's great not because there are thousands of servers, but thousands of servers with interesting information."

-- Dan Krotz

     
 
 