ALS 2000 Abstract
Performance Comparison of LAM/MPI, MPICH,
and MVICH on a Linux Cluster connected by a
Gigabit Ethernet Network.
Hong Ong and Paul Farrell, Kent State University
Abstract
Due to the increase in network hardware speed and the availability of low cost, high performance workstations, cluster
computing has become increasingly popular. Many research institutes, universities, and industrial sites around the
world have started to purchase or build low cost clusters, such as Linux Beowulf-class clusters, for their parallel
processing needs at a fraction of the price of mainframes or supercomputers.
On these cluster systems, parallel processing is usually accomplished through parallel programming libraries such as
MPI, PVM [Geist], and BSP [Jonathan]. These environments provide well-defined, portable mechanisms with which
concurrent applications can be developed easily. In particular, MPI has been widely accepted in scientific parallel
computing, and its use has broadened over time. Two of the most extensively used MPI implementations are MPICH
[Gropp], [Gropp2], from Mississippi State University and Argonne National Laboratory, and LAM [LSC], originally
from the Ohio Supercomputer Center and now maintained by the University of Notre Dame. The modular design
approach taken by MPICH and LAM has allowed research organizations and commercial vendors to port the software
to a great variety of multiprocessor and multicomputer platforms and distributed environments.
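To make this concrete, the following is a minimal sketch, in C, of the kind of ping-pong microbenchmark commonly used to measure point-to-point latency between two MPI processes. It uses only standard MPI calls; the message size and repetition count are arbitrary illustrative choices and are not taken from this paper.

/*
 * Minimal MPI ping-pong sketch (illustrative only; not the benchmark
 * used in this paper).  Rank 0 sends a buffer to rank 1, which echoes
 * it back; half the averaged round-trip time approximates the one-way
 * latency for the chosen message size.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_SIZE 1024     /* message size in bytes (assumed for illustration) */
#define REPS     1000     /* number of round trips to average over */

int main(int argc, char *argv[])
{
    int rank, i;
    char *buf;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = (char *) malloc(MSG_SIZE);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("average one-way latency: %f microseconds\n",
               (t1 - t0) * 1.0e6 / (2.0 * REPS));

    free(buf);
    MPI_Finalize();
    return 0;
}

Because only standard MPI calls are used, the same source can typically be compiled with each implementation's wrapper compiler (for example, mpicc) and launched with mpirun -np 2, which is what makes like-for-like comparisons of different MPI implementations possible.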
Naturally, there has been great interest in the performance of LAM and MPICH for enabling high-performance
computing on clusters. Large scale distributed applications that use MPI (either LAM or MPICH) as their
communication transport on a cluster of computers impose heavy demands on the communication network. Gigabit
Ethernet technology, among other high-speed networks, can in principle provide the bandwidth required to meet these
demands. Moreover, it holds the promise of considerable price reductions, possibly even to commodity levels, as Gigabit
over copper devices become more available and their use increases. However, the increase in network speed has also
shifted the communication bottleneck from the network media to protocol processing. Since LAM and MPICH use
TCP/UDP socket interfaces to communicate messages between nodes, there have been substantial efforts to reduce the
overhead incurred in processing the TCP/IP stack. These efforts, however, have yielded only moderate improvement.
Consequently, many systems, such as U-Net [Welsh], BIP [Geoffray], and Active Messages [Martin], have been
proposed to provide low latency and high bandwidth message passing between clusters of workstations and I/O devices
connected by a network. More recently, the Virtual Interface Architecture (VIA) [Compaq] has been developed to
standardize these ideas. VIA defines mechanisms that bypass layers of the protocol stack and avoid intermediate copies
of data when sending and receiving messages. Eliminating this overhead not only enables significant increases in
communication performance but also significantly reduces the processor utilization of the communication subsystem.
Since the introduction of VIA, there have been several software and hardware implementations, among them Berkeley
VIA [Buonadonna], Giganet VIA [Speight], M-VIA [MVIA], and FirmVIA [Banikazemi]. This has also led to the recent
development of VIA-based MPI communication libraries, most notably MVICH [MVICH].
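For contrast, the hypothetical sketch below (in C; the port number, message size, and echoing peer are assumptions, not details from this paper) shows the kind of TCP round trip a socket-based MPI transport performs for every message: each write() copies the data into a kernel socket buffer and each read() copies it back out, with full TCP/IP protocol processing on both sides. This per-message overhead is precisely what VIA is designed to bypass.

/*
 * Hypothetical sketch of one TCP round trip, assuming a peer process
 * on the named host is listening on PORT and echoing whatever it
 * receives.  Each write()/read() pair crosses the user/kernel boundary
 * and traverses the TCP/IP stack; VIA avoids these intermediate copies
 * by posting descriptors directly to the network interface.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#define PORT     5000     /* assumed port number */
#define MSG_SIZE 1024     /* assumed message size in bytes */

int main(int argc, char *argv[])
{
    char buf[MSG_SIZE];
    struct sockaddr_in addr;
    struct hostent *host;
    int sock, one = 1;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <echo-server-host>\n", argv[0]);
        exit(1);
    }
    host = gethostbyname(argv[1]);
    if (host == NULL) {
        fprintf(stderr, "unknown host: %s\n", argv[1]);
        exit(1);
    }

    sock = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(PORT);
    memcpy(&addr.sin_addr, host->h_addr_list[0], host->h_length);
    if (connect(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        perror("connect");
        exit(1);
    }

    /* Disable Nagle's algorithm, as socket-based MPI transports
       commonly do, so small messages are not delayed in the kernel. */
    setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

    /* One round trip: user-to-kernel copy on write(), kernel-to-user
       copy on read(), plus TCP/IP processing on each end. */
    memset(buf, 0, MSG_SIZE);
    write(sock, buf, MSG_SIZE);
    read(sock, buf, MSG_SIZE);    /* a real transport would loop until
                                     all MSG_SIZE bytes have arrived */

    close(sock);
    return 0;
}

In a VIA-based library such as MVICH, this socket path is broadly replaced by send and receive descriptors posted to the network interface over registered user buffers, so data can move without the intermediate kernel copies shown above.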