Ten billion bits a second: the real-world benefits
Shalf: "From the computing side, there's also a real-world
need and benefit. The source of data for our demonstration was the Cactus
simulation code, developed by the Numerical Relativity group led by Ed
Seidel at the Albert Einstein Institute (the Max Planck Institute for
Gravitational Physics) in Potsdam, Germany. Cactus is a modular framework capable of supporting
many different simulation applications, such as general relativity, binary
neutron stars, magnetohydrodynamics, and chemistry, but in this case we
were interested in binary black-hole mergers. These simulations will help
us better understand what wave signatures we should be looking for in
gravitational wave observatories like LIGO and VIRGO.
[Photo: In Berkeley Lab's Access Grid Node, John Shalf, foreground, works with Visapult's rendering of the huge Cactus simulation.]
"Codes like Cactus can easily consume an entire supercomputer like
the 3,328-processor IBM SP at NERSC. The Cactus team ran the code at NERSC
for 1 million CPU-hours, or 114 CPU-years, performing the first-ever simulations
of the in-spiraling coalescence of two black holes. When you make these
big heroic runs, you don't want to find out after a week that one parameter
was wrong and the simulation fell apart after a few days. You need high
bandwidth to keep up with the enormous data production rate of these simulations
-- one terabyte per time step -- and with 10-gig E you can get an accurate
look at how the code is running. Otherwise, you can only get low-resolution
snapshots that are of limited usefulness.
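To put that data rate in perspective, a rough back-of-the-envelope calculation (assuming the full nominal line rate and ignoring protocol overhead, so a best case) shows how long moving a single one-terabyte time step would take at each speed:

```python
# Rough transfer-time estimate for one 1 TB time step,
# assuming the full nominal line rate with no protocol overhead.
TERABYTE_BITS = 8 * 10**12  # 1 TB expressed in bits (decimal units)

for name, rate_bps in [("1-gig E", 10**9), ("10-gig E", 10**10)]:
    seconds = TERABYTE_BITS / rate_bps
    print(f"{name}: {seconds / 60:.1f} minutes per time step")
```

At ten gigabits per second a time step moves in minutes rather than hours, which is what makes monitoring the simulation as it runs feasible at all.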
"Remote monitoring and visualization require a system that can provide
visualization capability over wide area network connections without compromising
interactivity or the simulation performance. We used Visapult, developed
by Wes Bethel of LBNL's Visualization Group for DOE's Next Generation
Internet Combustion Corridor project several years ago. Visapult allows
you to use your desktop workstation to perform interactive volume visualization
of remotely computed datasets without downsampling of the original data.
It does so by employing the same massively parallel distributed memory
computational model employed by the simulation code in order to keep up
with the data production rate of the simulation. It also uses high performance
networking in order to distribute its computational pipeline across a
WAN [wide area network] so as to provide a remote visualization capability
that is decoupled from the cycle time of the simulation code itself."
Science Beat: "What about other applications for this capability?"
Bennett: "Initially, I think the major interest will come
from the research and university communities, until the cost comes down.
That said, right now we have found that 10-gig E costs about the same as aggregating
ten 1-gig E connections. One area that could benefit would be health care.
Having 10-gig E capability will allow streaming video at motion picture
quality, which could be useful in performing surgery and teaching. It
will also make it easier to transmit high-res medical images.
"Also, services that rely on bandwidth can benefit. Data centers
operating web servers or providing bandwidth on demand for commercial
clients would be able to offer better service, as would metropolitan area
Ethernet service providers. Basically, any place now running 1-gig E stands
to benefit from this. Farther down the road, I think the financial services
industry will find this capability useful."
Smith: "A couple of colleagues who work at Pixar [Animation
Studios] came by to view the demo. Their computer animations are a good
candidate to benefit from higher bandwidth connections. They said they
were getting a new cluster in the coming weeks and this gave them some
good ideas, especially since it is going to be a Linux cluster, as are
ours."
Science Beat: "What were the obstacles to achieving true
10-gigabit Ethernet performance?"
Bennett: "The first one is getting the 1-gig network interfaces
to run as close to that line rate as possible. Many of them only run at
600-700 megabits. Chip Smith worked with SysKonnect to get up to the gigabit
level."
Smith: "The speed bump was with Linux. The kernel driver libraries
for the SysKonnect cards have a default behavior that limits the cards
to an average line rate of 600-700 megabits per second.
"Working with SysKonnect, I was able to change one of the libraries
in the kernel, and using a recent virtual Ethernet interface module, I
was able to get 950 to 1,000 megabits out of the single interfaces. This
enabled us to run the demonstration with one-third fewer machines than
it would have taken without the kernel work. In the long run, getting
this to work also saves money on machines and on the per-port price
factored in when buying new machines, for anyone who wants to set up
a similar system. It also shows that 1-gig E is viable in a cluster setting."
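Smith's "one-third fewer machines" figure follows from simple division. A sketch of that arithmetic, using the throughput numbers quoted above and assuming each cluster node drives a single interface:

```python
import math

TARGET_BPS = 10 * 10**9          # a 10-gig E aggregate to saturate

# Nodes needed = target rate / per-interface rate, rounded up.
untuned = math.ceil(TARGET_BPS / (650 * 10**6))   # ~650 Mb/s at the default
tuned = math.ceil(TARGET_BPS / (975 * 10**6))     # ~975 Mb/s after the kernel fix

print(untuned, tuned)            # node counts before and after tuning
print(1 - tuned / untuned)       # fractional savings: roughly one-third
```

Sixteen nodes at the untuned rate versus eleven at the tuned rate is a savings of just over 31 percent, consistent with the one-third figure.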
[Photo: From left, Wes Bethel, John Christman, John Shalf, Chip Smith, and Mike Bennett, with the Force10 switch and the cluster running the Cactus application.]
Bennett: "The second obstacle was getting network equipment
that can deliver at that rate. Force10 was able to provide the network
equipment that could handle it. Thanks to all the contributing vendors,
the demo was a success.
"But the most work involved building the cluster and getting the
applications to run on it, which are John's and Chip's areas of expertise."
Shalf: "And certainly it's a nontrivial feat to design an
application like Wes's Visapult that can fully overlap its computation
with pulling data off of the network at full line rate. This requires
considerable performance tuning at the application level, as well as novel
visualization algorithms like the Rogers and Crawfis image-based rendering
method on which Visapult is loosely based."
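The compute/network overlap Shalf credits to Visapult can be illustrated (again only as a toy sketch, with threads and in-memory buffers standing in for sockets and parallel renderers) by a classic double-buffering pattern: while one buffer is being processed, the next is already being filled off the wire.

```python
import threading

def receive(step):
    # Stand-in for pulling one time step off the network at line rate.
    return [step] * 4

def process(buf):
    # Stand-in for the rendering computation.
    return sum(buf)

def pipeline(n_steps):
    results = []
    nxt = {}
    fetcher = None
    for step in range(n_steps):
        if fetcher is None:                 # prime the first buffer
            buf = receive(step)
        else:
            fetcher.join()                  # wait for the prefetched buffer
            buf = nxt["buf"]
        if step + 1 < n_steps:              # start filling the next buffer...
            fetcher = threading.Thread(
                target=lambda s=step + 1: nxt.update(buf=receive(s)))
            fetcher.start()
        results.append(process(buf))        # ...while this one is processed
    return results

print(pipeline(3))
```

Keeping the receive side a full step ahead of the compute side is what lets an application consume data at line rate without the network ever sitting idle.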
Science Beat: "Any other challenges?"
Bennett: "Well, it's definitely an exciting process. When
you're working with new technology like this, you almost hope you'll run
into a new and interesting bug -- something you haven't seen before. It's
also exciting to be able to offer this to users of our network here at
the Lab."
Science Beat: "Did you notice any similarities to previous
increases in bandwidth?"
Shalf: "At SC'95 we were asked, 'With these OC-3 lines, how
are you going to deal with this infinite bandwidth?' Our demonstration
shows that 10-gig E indeed isn't 'infinite bandwidth.' We are quite capable
of consuming this and more using an existing production simulation and
visualization application. So our excitement over the possibilities that
this new technology unlocks is tempered by the fact that we remain such
a long distance away from anything approximating 'infinite bandwidth.'"
Bennett: "I saw the same cycle when 1-gig E was rolled out
in 1998. People thought it was too expensive and that no one would use
all that bandwidth right away. But as the cost came down, demand and usage
went up. Here at the Lab, we have 1-gig E network distribution connections
to the buildings. As that fills up, we're going to be looking at upgrading
to 10-gig E."
Additional information:
An interview with the 10-gig E team leaders, Part 1