Taking the Measure of Supercomputer Architectures
Contact: Jon Bashor
Members of Berkeley Lab's Computing Sciences divisions are applying their expertise in running scientific codes and evaluating high-performance computers to achieve "real world" assessments of leading supercomputers around the world. Their goal is to determine which architectures are best suited for advancing computational science.
With the re-emergence of viable vector computing systems such as the Earth Simulator and the Cray X1, and with IBM and DOE's BlueGene/L taking the number-one spot on the TOP500 list of the world's fastest computers, there is renewed debate about which architecture is best suited for running large-scale scientific applications.
In order to cut through conflicting claims, researchers from Berkeley Lab's Computational Research and NERSC Center divisions have been putting various architectures through their paces, running benchmarks as well as scientific applications key to Department of Energy programs. The team includes Lenny Oliker, Julian Borrill, Andrew Canning, and John Shalf of CRD, Jonathan Carter and David Skinner of NERSC, and Stephane Ethier of the Princeton Plasma Physics Laboratory. Their evaluations have resulted in a half-dozen papers published in journals and presented at conferences in the United States, Norway, Japan, and Spain.
In the initial part of their study, the team traveled to Japan in December 2004 and tested five different systems by running four scientific applications key to DOE research programs. As part of the effort, the group became the first international team to conduct a performance evaluation study of the 5,120-processor Earth Simulator.
The team also assessed the performance of
"This effort relates to the fact that the gap between peak and actual performance for scientific codes keeps growing," said team leader Lenny Oliker. "Because of the increasing cost and complexity of HPC systems" high-performance computing systems "it is critical to determine which classes of applications are best suited for a given architecture."
The four applications and research areas selected by the team for the evaluation were
"The four applications successfully ran on the Earth Simulator with high parallel efficiency," Oliker said. "And they ran faster than on any other measured architecture generally by a large margin." However, Oliker added, only codes that scale well and are suited to the vector architecture may be run on the Earth Simulator. "Vector architectures are extremely powerful for the set of applications that map well to those architectures," Oliker said. "But if even a small part of the code is not vectorized, overall performance degrades rapidly."
One of the codes, LBMHD, ran at 67 percent of peak system performance, even when scaled up to 4,800 processors. However, as with most scientific inquiries, the ultimate solution to the problem is neither simple nor straightforward.
"We're at a point where no single architecture is well suited to the full spectrum of scientific applications," Oliker said. "One size does not fit all, so we need a range of systems. It's conceivable that future supercomputers would have heterogeneous architectures within a single system, with different sections of a code running on different components."
One of the codes the group intended to run in this study, MADCAP (the Microwave Anisotropy Dataset Computational Analysis Package), did not scale well enough to be used on the Earth Simulator. MADCAP, developed by Julian Borrill, is a parallel implementation of cosmic microwave background map-making and power spectrum estimation algorithms. Since MADCAP has high input-output (I/O) requirements, its performance was hampered by the lack of a fast global file system on the Earth Simulator.
Undeterred, the team retuned MADCAP and returned to Japan to try again. The results, outlined in a paper titled "Performance characteristics of a cosmology package on leading HPC architectures" and presented at the Eleventh International Conference on HPC in Bangalore, India, showed that the Cray X1 had the best runtimes for MADCAP but suffered the lowest parallel efficiency. The Earth Simulator and IBM Power3 demonstrated the best scalability, and the code achieved the highest percentage of peak on the Power3. The paper concluded, "Our results highlight the complex interplay between the problem size, architectural paradigm, interconnect, and vendor-supplied numerical libraries, while isolating the I/O filesystem as the key bottleneck across all the platforms."
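The distinction between best runtime, parallel efficiency, and percentage of peak can be made concrete with a small calculation. The sketch below assumes the common strong-scaling definition of parallel efficiency (processor-seconds at a baseline processor count divided by processor-seconds at the larger count) and takes percentage of peak as the sustained rate over the aggregate peak rate; the paper may define these differently. All timings are invented for illustration and are not measurements from the study.

```python
def parallel_efficiency(base_procs: int, base_seconds: float,
                        procs: int, seconds: float) -> float:
    """Strong-scaling efficiency relative to a baseline run (1.0 is ideal)."""
    return (base_procs * base_seconds) / (procs * seconds)


def percent_of_peak(sustained_gflops: float, procs: int,
                    peak_gflops_per_proc: float) -> float:
    """Sustained rate as a percentage of the aggregate theoretical peak."""
    return 100.0 * sustained_gflops / (procs * peak_gflops_per_proc)


if __name__ == "__main__":
    # Hypothetical systems: (baseline procs, baseline s, scaled procs, scaled s).
    runs = {
        "system_a": (16, 100.0, 64, 40.0),  # faster, but scales poorly
        "system_b": (16, 300.0, 64, 80.0),  # slower, but scales well
    }
    for name, (p0, t0, p1, t1) in runs.items():
        eff = parallel_efficiency(p0, t0, p1, t1)
        print(f"{name}: runtime {t1:.0f} s, parallel efficiency {eff:.2f}")
```

In this made-up case the first system finishes sooner at 64 processors yet has the lower parallel efficiency, the same pattern the paper reports for the Cray X1.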
BlueGene/L is currently the world's fastest supercomputer, with the first BlueGene system being installed at Lawrence Livermore National Laboratory. David Skinner is serving as Berkeley Lab's representative to a new BlueGene/L consortium led by Argonne National Laboratory. The consortium aims to bring together institutions active in HPC research and build a community focused on the BlueGene family as a next step toward petascale computing. Its members will work together to develop or port BlueGene applications and system software, conduct detailed performance analysis on applications, develop mutual training and support mechanisms, and contribute to future platform directions.