Next Generation Supercomputers: Cluster of Multi-processor Systems Project Launched
By Jeffery Kahn
November 25, 1996
BERKELEY, CA - Mass manufacturing and high speed computer networks -- exploit both and you will build the supercomputer of the future. That's the conviction of researchers at Ernest Orlando Lawrence Berkeley National Laboratory (Berkeley Lab).
Joining with the University of California at Berkeley's Computer Science Division and Sun Microsystems, Berkeley Lab will develop a unique networked cluster of largely off-the-shelf equipment that will be put to work by scientists and then evaluated. Many here believe that this COMPS (Cluster of Multiprocessor Systems) project will serve as the architectural blueprint for the next generation of supercomputers.
Initially, the COMPS prototype will consist of three Sun SMP (symmetric multiprocessor) computers connected together and to the world via a mix of lightning-fast networks. Two of these computers will have eight processors (CPUs) and the third will have between two and four processors. For these relatively inexpensive machines to function as a supercomputer, systems software and network technology must be developed that allow the 18-20 CPUs on the three separate computers, along with multiple memory storage units, to function as though they were a single machine. The challenge is formidable.
To succeed, researchers must overcome the communication delay or "latency" that has been inherent when clusters of computers are linked together via a network. Anybody who has ever downloaded a World Wide Web page is familiar with such delays. To speed up the flow of information between machines, multiple 622 megabit/second ATM networks will interconnect the Sun computers as well as link them to scientific instruments. Another type of network technology, SCI (Scalable Coherent Interface), will be used to accelerate the progression of parallel programming tasks and exchange of information between the many processors.
Berkeley Lab's National Energy Research Scientific Computing Center (NERSC), which is home to the most powerful combination of unclassified computing and networking resources in the United States, will host the COMPS project. Lead scientists on the project include Horst Simon, who heads Berkeley Lab's NERSC Division; Berkeley Lab's William Johnston, who heads its Imaging and Distributed Computing Group; David Culler, UC Berkeley Computer Science Division professor and NERSC researcher; and Greg Papadopoulos, vice president and chief technology officer for Sun.
Sun, which has provided two Enterprise 4000 computers, is making its entry into the field of supercomputers with this project. Said Papadopoulos, "Sun's system architecture and high performance computing strategy are ideally suited to help NERSC achieve its goal of being a leader in the next wave of high performance computing. In addition, this collaborative research program will assist Sun in the development of many of the key technologies needed to advance the capabilities and performance of networked clusters of SMP systems."
COMPS is being launched at a time when the future design of high performance computers is particularly uncertain. The architecture of high performance computing systems ranges from machines with a few powerful vector processors to massively parallel machines with thousands of processors to networks of single-processor workstations, and now to COMPS, a network of multiprocessor workstations. As for the processors used in these different machines, they can be inexpensive off-the-shelf commodities or unique, custom-made CPUs that cost tens of millions of dollars.
Horst Simon says NERSC, which provides high performance computing to thousands of Department of Energy researchers all over the world, has a major stake in helping to resolve the architectural puzzle posed by the current array of choices.
"What is the most efficient and inexpensive way to put together a set of components to do supercomputing?" asks Simon. "Clusters of symmetric multiprocessor systems have demonstrated their supercomputing potential. What we're now seeking to determine is whether COMPS provides a price and performance advantage for NERSC users. Our approach with COMPS is to network clusters of SMP machines that use off-the-shelf, inexpensive processors. What NERSC needs to determine is whether this approach really can provide capability computing for a large national user facility."
David Culler has tackled the problems of transforming a cluster of workstations into a single system. Currently, he heads the Network of Workstations (NOW) project, which links 140 single-processor workstations within Berkeley's Soda Hall. COMPS differs from NOW in that it links multiprocessor workstations. Yet many of the communications issues involved are similar.
When information flows over a network, it is broken down into small packets. The time spent preparing packets for transmission and receiving them off the network is extremely high, and slows communications immensely. Culler and his team at Berkeley have developed a package of Fast Communications tools, including Fast Sockets, that eliminates much of the communication overhead and lag.
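The cost of per-packet handling can be illustrated with a rough model. The sketch below is not taken from the COMPS or NOW software; it assumes a fixed software overhead per packet and adds the time the data spends on the wire at the 622 megabit/second ATM rate mentioned above. The function names, packet size, and overhead values are hypothetical, and the model ignores packet headers, propagation delay, and any overlap between processing and transmission.

```python
import math

# ATM link rate quoted in the article, in bits per second.
LINK_BITS_PER_SECOND = 622e6


def packet_count(message_bytes, packet_payload_bytes):
    """Number of packets needed to carry a message of the given size."""
    if message_bytes < 0 or packet_payload_bytes <= 0:
        raise ValueError("sizes must be non-negative, payload must be positive")
    return math.ceil(message_bytes / packet_payload_bytes)


def transfer_time(message_bytes, packet_payload_bytes, per_packet_overhead_s,
                  link_bits_per_second=LINK_BITS_PER_SECOND):
    """Estimated seconds to send a message: software overhead plus wire time.

    per_packet_overhead_s is a hypothetical fixed cost for preparing and
    receiving one packet; it is an assumption, not a measured figure.
    """
    packets = packet_count(message_bytes, packet_payload_bytes)
    overhead = packets * per_packet_overhead_s
    wire = message_bytes * 8 / link_bits_per_second
    return overhead + wire


def overhead_fraction(message_bytes, packet_payload_bytes, per_packet_overhead_s,
                      link_bits_per_second=LINK_BITS_PER_SECOND):
    """Share of the estimated transfer time spent on per-packet overhead."""
    total = transfer_time(message_bytes, packet_payload_bytes,
                          per_packet_overhead_s, link_bits_per_second)
    if total == 0:
        return 0.0
    packets = packet_count(message_bytes, packet_payload_bytes)
    return packets * per_packet_overhead_s / total


if __name__ == "__main__":
    # Illustrative values only: 1 MB message, 1 KB packets, two overhead levels.
    for overhead_s in (100e-6, 10e-6):
        share = overhead_fraction(1_000_000, 1_000, overhead_s)
        print(f"overhead {overhead_s * 1e6:.0f} us/packet -> "
              f"{share:.0%} of transfer time")
```

Under this simplified model, a fast link does little good if each packet carries a large fixed software cost, which is why reducing that overhead matters as much as raising the raw network speed.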
Says Culler, "The use of multiprocessor nodes is essential for very large systems and it opens up valuable opportunities for clusters of all scale, including higher transfer bandwidth using multiple network interfaces, better resource sharing, and new approaches to fault tolerance."
William Johnston heads a team that is creating virtual laboratories, using the Internet to make unique scientific equipment accessible from all over the world. Collaborators at different locations can conduct an experiment and discuss results while it is in progress. Like Culler, Johnston, too, must contend with the issue of communications latency if true remote control of machines is to be achieved.
Says Johnston, "Scientific experiment instrumentation control is increasingly using 'machine intelligence' approaches to automate experiment control functions through the real-time analysis of experiment data, which is also a very computationally intensive process. It is our hypothesis that the COMPS architecture will be capable of addressing both the traditional numerical computation, and the rapidly increasing needs of the experimental science community for high performance computing and storage. Further, all of these scientific computing activities can use, and will benefit from, a common and incrementally scalable computing system architecture that can be -- as needed -- widely distributed around the new high-speed, wide area networks."
The COMPS prototype will be developed and tested through actual use by scientists. In one project, 200 megabit/second data streams from physics detector experiments will be fed to COMPS for analysis. A second test, this one involving a "machine intelligence" based control system, will analyze live digital video from an electron microscope at Berkeley Lab's National Center for Electron Microscopy. COMPS also will be used for the on-line analysis and control of an ultra-high resolution micro-spectroscopy instrument at Berkeley Lab's Advanced Light Source.
Berkeley Lab conducts unclassified scientific research for the U.S. Department of Energy. It is located in Berkeley, California and is managed by the University of California.