Linux.SYS-CON.com Cover Story: Rapid Cluster Deployment

From delivery to production in hours

Here's what I did:

Create the PBS scripts to run the tests (truncated versions, following the example script above):

#!/bin/bash
#PBS -l nodes=1:ppn=2
# Write an 8000MB test file (-s 8000), skip the file-creation tests (-n 0),
# skip the per-character I/O tests (-f), and work in the NFS directory (-d).
cmd="/home/tools/bonnie++/sbin/bonnie++ -s 8000 -n 0 -f -d /home/alice-nfs-directory/bonnie"
$cmd >& $PBS_O_WORKDIR/Log.d/run.nfs/log.bonnie.nfs.$PBS_JOBID

#!/bin/bash
#PBS -l nodes=1:ppn=2
# Same test, run against the PanFS directory.
cmd="/home/tools/bonnie++/sbin/bonnie++ -s 8000 -n 0 -f -d /home/alice-panfs-directory/bonnie"
$cmd >& $PBS_O_WORKDIR/Log.d/run.panfs/log.bonnie.panfs.$PBS_JOBID

Run the tests:
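The submission step itself isn't shown. As a minimal sketch, queuing eight or 16 copies of one script might look like the following. It assumes the PBS qsub command is on the path. The helper submit_instances, the QSUB override, and the file name bonnie.nfs.pbs are hypothetical names introduced here for illustration, not taken from the original setup.

```shell
#!/bin/bash
# Sketch only: queue N copies of one PBS script.
# Hypothetical names: submit_instances, QSUB, bonnie.nfs.pbs.

submit_instances() {
    local script=$1              # PBS script to submit
    local count=$2               # number of concurrent instances
    local submit=${QSUB:-qsub}   # set QSUB=echo for a dry run
    local i
    for ((i = 1; i <= count; i++)); do
        "$submit" "$script" || return 1   # stop at the first failed submission
    done
}

# Example (commented out so sourcing this file submits nothing):
# the scripts redirect into $PBS_O_WORKDIR/Log.d/run.nfs, so that directory
# has to exist in the submission directory before the jobs start.
# mkdir -p Log.d/run.nfs
# submit_instances bonnie.nfs.pbs 8
```

Setting QSUB=echo prints the submissions instead of sending them, which is a cheap way to check the count before tying up 16 nodes.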
Results from eight instances of the NFS script:
17.80MB/sec for concurrent write using NFS with eight dual-processor jobs
36.97MB/sec during read process

Results from eight instances of the PanFS script:
154MB/sec for concurrent write using PanFS with eight dual-processor jobs
190MB/sec during read process

The numbers were excellent, so I decided to scale it up:

Results from 16 instances of the NFS script:
20.59MB/sec for concurrent write using NFS with 16 dual-processor jobs
27.41MB/sec during read process

Results from 16 instances of the PanFS script:
187MB/sec for concurrent write using PanFS with 16 dual-processor jobs
405MB/sec during read process

It's pretty much a no-brainer here. We moved to the new system. Having the Rocks Clusters platform let us integrate the Panasas solution in just under two hours.
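To put a number on the gap, the reported figures can be divided pairwise. The sketch below is just that arithmetic. The helper name ratio is hypothetical, and awk is used only because bash has no floating-point division.

```shell
#!/bin/bash
# Sketch only: PanFS-to-NFS throughput ratios from the figures reported above.
# ratio is a hypothetical helper; awk does the floating-point division.

ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", a / b }'
}

echo "8 jobs,  write: $(ratio 154 17.80)x"
echo "8 jobs,  read:  $(ratio 190 36.97)x"
echo "16 jobs, write: $(ratio 187 20.59)x"
echo "16 jobs, read:  $(ratio 405 27.41)x"
```

With these inputs the ratios come out to roughly 8.7x (write) and 5.1x (read) at eight jobs, and 9.1x (write) and 14.8x (read) at 16 jobs. The read gap widens at 16 jobs because NFS read throughput fell while PanFS read throughput more than doubled.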

Final Thoughts
First off, this may make setting up and managing a cluster sound very simple. While the description above looks straightforward, the real work starts once your first user logs on to the front end. I have a group of power users that I've unleashed on systems in the past, and I'm comfortable with their ability to compile and run known-good code and to give me clear information about the issues they encounter on the system. I wouldn't advise setting up a system and turning everyone loose on it, since you might spend the next few weeks trying to resolve issues that may or may not be cluster-related.

This is where the Rocks discussion list comes in handy. Although your configuration and code may differ from other Rocks installations, you share a common thread with everyone on the list: the same tools to troubleshoot and apply changes.

You'll also find your cluster may not be as different from other clusters as you think. I've been able to manage a large number of clusters without the overhead of many system administrators, primarily because of interactions with people on the Rocks list. By far the easiest system I've installed was a Penguin AMD Opteron cluster running Infinicon (now Silverstorm) InfiniBand equipment. The team at the AMD Developer Center runs Rocks on the multiple clusters it uses to provide service for developers running on its test systems, and rebuilds those clusters each time a new client requests system time. The team has been an excellent resource on the Rocks list for information about cluster building and day-to-day management.

About the Science
Researchers in the mechanical engineering and aeronautics & astronautics departments at Stanford University are involved in several research programs that require large-scale massively parallel computing resources to carry out first-of-a-kind simulations. The cluster is being used to compute the details of the flow and acoustic fields created by helicopters in forward flight. For this purpose two major simulation codes, SUmb and CDP, are run simultaneously to compute the flow in separate areas of the domain: SUmb computes the flow in the region near the surface of the blades where compressibility and viscous effects are dominant, and CDP resolves the wake portion where the identification of the strength and location of the trailing vortices is of fundamental importance.

SUmb (Stanford University multi-block) and CDP (named for the late Charles David Pierce) are both massively parallel flow solvers developed at Stanford under the sponsorship of the Department of Energy's ASC program. SUmb can be used to solve for the compressible flow in many applications including, but not limited to, jet engines, subsonic and supersonic aircraft, helicopters, launch vehicles, space and re-entry vehicles, and a host of other research applications.

SUmb uses a multi-block structured meshing approach. The mesh is decomposed into a number of pieces that are distributed among the processors in a calculation, and the Message Passing Interface (MPI) standard is used to communicate between processors over a high-bandwidth, low-latency network (InfiniBand in the case of our Rocks cluster).

CDP uses a fully unstructured grid approach to allow more flexibility in concentrating the grid points in regions of interest. The code was developed to simulate the multiphase reacting flow in jet engine combustors, but can be applied more generally to simulate a variety of flows where important flow structure persists for a relatively long time, such as the trailing vortices generated by the helicopter's blade tips.

For coupled simulations where SUmb needs to interact with CDP, the codes use a Python-based interface to simplify access to the data structures, while allowing the core portions of the solution to be carried out using highly optimized compiled languages. Both codes have been routinely run on thousands of processors and have been readied for computations on large parallel computers such as BlueGene/L, which is expected to reach a total of about 130,000 processors.

More Stories By Steve Jones

Steve Jones is currently the technology operations manager at the Institute for Computational and Mathematical Engineering at Stanford University. Steve designed and administered a Top 500 supercomputer and speaks regularly about the design and management of high-performance computing clusters, most recently as a keynote speaker at the annual Rocks-a-Palooza conference at the San Diego Supercomputer Center. His free time is spent with his significant other, Leilani, far away from a keyboard. More information about Steve can be found at http://www.hpcclusters.org.
