Bill Johnston, the principal investigator
of the DOE Science Grid, which enables a geographically
diverse team of scientists to form the multidisciplinary
and multinational collaborations critical to today's complex
scientific experiments.

The Science Grid's goal is to link instruments such as electron
microscopes and synchrotrons with vast data repositories and
DOE's National Energy Research Scientific Computing Center
(NERSC)--located at Berkeley Lab and home to some of the nation's
most powerful unclassified supercomputers. These resources
are joined over DOE's Energy Sciences Network (ESnet), a high-speed
network managed and operated by Berkeley Lab staff that serves
thousands of scientists worldwide. Ultimately, the Science
Grid will provide easy access to the computing, data, and
instrument resources at Berkeley Lab, Oak Ridge National Lab,
Pacific Northwest National Lab, and Argonne National Lab.
The result is a virtual lab in which a researcher from Illinois
can obtain data from an observatory in Hawaii and analyze
it using a supercomputer in California. More important, these
resources exist on a common network, enabling a geographically
diverse team of scientists to form the multidisciplinary and
multinational collaborations critical to today's experiments.
"Science has mushroomed in its complexity and dependence
on collaborations, and the Science Grid reflects this change,"
says Bill Johnston, head of Berkeley Lab's Distributed Systems
Department and the principal investigator of the DOE Science
Grid.
What are grids? They're middleware that enables the routine
interaction of multi-institutional science and engineering
projects. They provide uniform and secure access to large-
and small-scale computing, data, and instrument systems. They
support the construction of application frameworks and science
portals. And they provide distributed applications with required
infrastructure such as security services that enable operation
in multi-institutional environments.
The Science Grid is one such grid. It facilitates increasingly
complex DOE science projects that encompass several institutions,
require robust computing and data management resources, and
involve some of the world's most sophisticated and expensive
scientific instruments.
Consider the Supernova Cosmology Project. Based at Berkeley
Lab, the project involves searching the sky for supernovae
in their earliest evolutionary stages, and then measuring
the changes in their spectra and magnitude during the several
weeks of their most explosive activity. By observing these
distant and ancient exploding stars, researchers have determined
that the universe is expanding at an accelerating rate.
This discovery, Science magazine's breakthrough of
the year in 1998, owes its success to an approach called observational
cosmology--a fundamental departure from the days when scientists
reserved time at an observatory and hoped to glimpse a specific
celestial phenomenon.
Instead, raw sky images are transferred nightly from the Hubble
Space Telescope and from observatories in Hawaii, the Canary
Islands, and Chile to a central computing facility, in this
case NERSC. Here, computational calibrations eliminate sky
tracking errors as well as instrument and atmospheric effects.
Baseline comparisons eliminate asteroids and man-made satellites.
Finally, automated search algorithms comb through this refined
data for the indications of the onset of supernova activity.
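
The processing steps described above can be pictured as a small pipeline. The Python sketch below is illustrative only: the function names, the constant offset used for calibration, the brightness threshold, and the tiny 3x3 pixel grids are all assumptions made for this example, not the project's actual software. It also assumes that moving objects are rejected by requiring a new source to appear at the same position in two exposures, which is one plausible reading of "baseline comparisons" rather than a documented method.

```python
# Hypothetical sketch: every name and number here is an assumption
# made for illustration, not part of the real search software.

def calibrate(image, dark_level):
    """Subtract a constant instrument offset from every pixel."""
    return [[pixel - dark_level for pixel in row] for row in image]


def subtract_reference(image, reference):
    """Difference against an earlier image of the same patch of sky."""
    return [[new - old for new, old in zip(new_row, old_row)]
            for new_row, old_row in zip(image, reference)]


def find_bright_pixels(difference, threshold):
    """Return (row, col) positions that brightened by more than threshold."""
    return [(r, c)
            for r, row in enumerate(difference)
            for c, value in enumerate(row)
            if value > threshold]


def new_sources(raw, reference, dark_level, threshold):
    """Run calibration, reference subtraction, and detection in order."""
    calibrated = calibrate(raw, dark_level)
    difference = subtract_reference(calibrated, reference)
    return find_bright_pixels(difference, threshold)


def keep_stationary(candidates_a, candidates_b):
    """Keep only positions seen in both exposures (moving objects shift)."""
    return sorted(set(candidates_a) & set(candidates_b))


# Reference image taken earlier; one known galaxy at the center.
reference = [[10, 10, 10], [10, 50, 10], [10, 10, 10]]
# Two exposures from the same night. A moving object sits at (0, 0) in the
# first and (0, 1) in the second; a new fixed source sits at (2, 2) in both.
exposure_1_raw = [[95, 15, 15], [15, 55, 15], [15, 15, 95]]
exposure_2_raw = [[15, 95, 15], [15, 55, 15], [15, 15, 95]]

first = new_sources(exposure_1_raw, reference, dark_level=5, threshold=20)
second = new_sources(exposure_2_raw, reference, dark_level=5, threshold=20)
stationary = keep_stationary(first, second)
print(stationary)
```

In this toy run only the source at position (2, 2) survives, because it is the only new bright spot that stays put between the two exposures.
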
"It's a classic grid problem. Widely distributed data
must be closely analyzed in order to extract the correct data,"
Johnston says. "The Science Grid makes it practical to
obtain this essential information in a uniform way."
It gets even more complex. The Supernova Cosmology Project
involves researchers in England, France, Germany, and Sweden.
A related project, the Nearby Supernova Factory, will utilize
additional observatories located on the summits of Haleakala,
Hawaii, and Mount Palomar, California. This project will also
serve as a test bed for the SuperNova/Acceleration Probe,
another Berkeley Lab-led project that will combine an optical
field imager, near-infrared imager, and spectrometer in a
single satellite.
In short, the search for supernovae involves many institutions,
many people, and many instruments--a perfect Science Grid application.
And it's one of many DOE science initiatives that will benefit
from the Science Grid. Every day, Berkeley Lab's synchrotron,
the Advanced Light Source, and several electron and x-ray
microscopes yield an avalanche of data. These instruments
further our understanding of everything from the structure
of proteins and cells to the characterization of metals and
computer chips. And soon, the Spallation Neutron Source will
map, in unprecedented detail, the atomic properties of materials.
Although cutting edge, these inquiries suffer from an old
problem: once an experiment is initiated, researchers can't
determine if it's on track until the data is analyzed days
or weeks later. The trick is monitoring the experiment in
real time. But because the difference between fuzzy noise
and important data is often extremely subtle, evaluating such
data streams requires enormous computing power.
Another classic grid problem, says Johnston. When NERSC is
incorporated into the Science Grid at the end of 2002, researchers
will be able to link instruments with computational models
that can potentially steer experiments. They can also compare
experiments with high-resolution models of the process they
are studying. If the two match, the experiment is most likely
on target. Coupling instruments and supercomputers will also
bring real-time data analysis to research.
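
The idea of checking a running experiment against a model can be sketched in a few lines. The Python example below is a minimal illustration under stated assumptions: the root-mean-square comparison, the tolerance value, and the sample numbers are all hypothetical choices made here, and real steering software would need a comparison suited to the instrument and the physics involved.

```python
import math

# Hypothetical sketch: the metric, tolerance, and data are assumptions.


def rms_difference(measured, predicted):
    """Root-mean-square gap between measurements and model predictions."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def on_track(measured, predicted, tolerance):
    """Treat the experiment as on target if it stays close to the model."""
    return rms_difference(measured, predicted) <= tolerance


model = [1.0, 2.0, 3.0, 4.0]          # values the model predicts
good_run = [1.1, 1.9, 3.1, 3.9]       # small scatter around the model
drifting_run = [1.0, 2.5, 4.0, 5.5]   # steadily departing from the model

print(on_track(good_run, model, tolerance=0.2))
print(on_track(drifting_run, model, tolerance=0.2))
```

A check like this would have to run continuously on incoming data to be useful, which is where the computing power described above comes in.
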
In addition, a massive NERSC data repository that uses software
developed by a DOE Lab and IBM consortium will soon be incorporated,
furthering the Science Grid's ability to manage complex data
archives such as those generated by the DOE and National Institutes
of Health's Human Genome Project.
"Ten years ago, data facilities were secondary to supercomputers,"
Johnston says. "But as science becomes increasingly
data-driven, storage facilities are rivaling computing resources
in their importance to science."
Ultimately, the Science Grid will provide easy access to a
network of computing and data archiving resources that, together,
can perform trillions of calculations per second and quickly
store and retrieve 400 times the amount of information contained
in the Library of Congress. And it will bring this power to
the desktop. Using one protocol, a scientist could run the
National Center for Atmospheric Research's global climate
model on NERSC using climate data stored at Oak Ridge National
Laboratory. Without the Science Grid, this project would require
navigating a labyrinth of protocols.
Perhaps even more important than easy access to data, instruments,
and supercomputers is the easy and secure access to other
researchers. The grid is safeguarded with a two-tiered, cryptographic
authorization process. Only certified people can access the
Science Grid, and among those only uniquely qualified researchers
can access specific resources. Both U.S. and European grids
use this system, allowing the secure exchange of information
between different grids and different nations.
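
The two tiers described above, certification first and then per-resource permission, can be sketched as follows. This Python example is a simplified illustration, not the Science Grid's actual mechanism: the keyed hash (HMAC) merely stands in for whatever cryptographic certificate scheme the grid uses, and the key, user names, and resource names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical sketch: key, users, and resources are invented for illustration.
AUTHORITY_KEY = b"demo-key-not-for-real-use"

# Tier two: which certified users may use which resource.
RESOURCE_ACL = {
    "supercomputer": {"alice"},
    "climate-archive": {"alice", "bob"},
}


def issue_credential(user):
    """The certifying authority signs the user name with its secret key."""
    return hmac.new(AUTHORITY_KEY, user.encode(), hashlib.sha256).hexdigest()


def is_certified(user, credential):
    """Tier one: does the credential really come from the authority?"""
    expected = issue_credential(user)
    return hmac.compare_digest(expected, credential)


def can_access(user, credential, resource):
    """Grant access only if both tiers pass."""
    if not is_certified(user, credential):
        return False
    return user in RESOURCE_ACL.get(resource, set())


alice_credential = issue_credential("alice")
bob_credential = issue_credential("bob")
forged_credential = "0" * 64

print(can_access("alice", alice_credential, "supercomputer"))
print(can_access("bob", bob_credential, "supercomputer"))
print(can_access("mallory", forged_credential, "climate-archive"))
```

Here the first check succeeds, the second fails at the resource tier because the user is certified but not on that resource's list, and the third fails at the certification tier.
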
"This is a major step in furthering international science
collaborations," Johnston says, adding that the more
resources on the Science Grid, the better. "It's like
the Web: it's great not because there are thousands of servers,
but thousands of servers with interesting information."
-- Dan Krotz