Linux.SYS-CON.com Cover Story: Rapid Cluster Deployment
From delivery to production in hours
Mar. 17, 2006 08:15 AM
After building a number of clusters from the ground up, including one that made it onto the Top500 Supercomputer list, I decided to try a service that many vendors now offer: having the system racked and stacked at the factory and then shipped to us. Such a service saves a huge amount of time, not to mention my back, since I don't have to build the cluster and cable all the equipment together. I've long been a fan of well-cabled systems and have found the quality control to be acceptable. The key is to agree on the pre-build requirements and verify them before the system is built, which ensures that the system arriving at your front door is the one you expected. A multi-rack configuration can still need a fair amount of cabling once it arrives, but the work is usually limited to plugging in the system's power and public network.
Once this is done, the fun begins...
I've tried a few cluster distribution toolkits, and the one that works for me is the Rocks Cluster Distribution from the San Diego Supercomputer Center. I came across the package in a simple Google search in 2002 and was immediately sold on it. I use the term "sold" loosely, since it's released under an open source BSD-style license, is freely available for download, and is supported by a broad range of technical people who answer most questions on the Rocks user list. I've found support on the list to be better than that of most commercial distributions, though this may be because there are over 500 registered systems on the Rocks Register.
Here's how simple it is: insert the boot CD, complete a few screens' worth of configuration data, and grab a coffee, because it's a fairly simple base installation. The Rocks solution is extensible, with a mechanism that lets users and software vendors ensure their customizations are correctly installed on the system at setup. The mechanism is called a Roll.
The Roll typically consists of packages (RPMS/SRPMS/source) that have to be installed and scripts that are needed to ensure the packages are properly installed and distributed on the cluster. The Rocks team has extensive documentation for the Roll developer in the user manual.
Rocks 4.0.0 is a "cluster on a CD" set. That is, it contains all the bits and configuration needed to build a cluster from "naked" hardware. The core OS bundled with Rocks is CentOS 4, which is a freely downloadable rebuild of Red Hat Enterprise Linux 4. As a side note, in Rocks, CentOS 4 is encapsulated as the "OS Roll," and this OS Roll can be substituted with any Red Hat Enterprise Linux 4 rebuild (e.g., Scientific Linux), including the official bits from Red Hat. Rolls are used in Rocks to customize your cluster. For example, the HPC Roll contains cluster-specific packages, such as an MPI environment for developing and running parallel programs. Two other examples are the Ganglia Roll, which provides cluster-monitoring tools, and the Area51 Roll, which provides security tools such as Tripwire and chkrootkit.
The Software
The core OS we used for the cluster in this article is CentOS 4.0, and the rolls we used to customize the cluster to our needs were the Compute Roll and the PBS Roll from the University of Tromso in Norway.
The Hardware
- 1 - Front-end node - a Dell PowerEdge 2850 with dual 3.6GHz Intel Xeon EM64T processors and 4GB RAM
- 48 - Compute nodes - Dell PowerEdge SC 1425s with dual 3.4GHz Intel Xeon EM64T processors, 2GB RAM and a Topspin PCI-X Infiniband HCA card
- 1 - Topspin 270 Infiniband chassis with modules
- 4 - Dell PowerConnect 5324 Gigabit Ethernet switches
- 1 - Panasas Storage Cluster with one DirectorBlade and 10 StorageBlades
- 2 - Dell 19-inch racks
Start the build process ***time 0:00:00***
Setting up the front-end:
- Insert Compute Roll and boot the system
- Select hpc, kernel, ganglia, base, java, and area51 as the rolls to install
- Select "Yes" for additional roll
- Insert CentOS disk 1
- Select "Yes" for additional roll
- Insert CentOS disk 2
- Select "Yes" for additional roll
- Insert PBS roll
- Select "No" for additional rolls
- Input data on the configuration screen (e.g., fully qualified domain name, root password, IP addresses)
- Select "Disk Druid" to create partitions
- Create / partition ext3 64GB
- Create swap partition 4GB
- Create /export partition 64GB
- Insert CDs as requested to merge them into the distribution
The most important step...grab a mocha and enjoy it while the install runs.
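Once the install has finished, it can be worth confirming that the Disk Druid layout took effect. The following is a minimal sketch rather than part of the Rocks tooling: the helper name `mounted_on` is hypothetical, and it assumes the / and /export mount points chosen in the steps above.

```shell
# Hypothetical helper: print the "Mounted on" column from the
# first data row of `df -P` output supplied on stdin.
mounted_on() {
    awk 'NR == 2 { print $6 }'
}

# Intended use on the front-end (not run here):
#   [ "$(df -P /export | mounted_on)" = "/export" ] && echo "/export has its own partition"
#   swapon -s    # lists the active swap areas
```

If `df -P /export` reports `/` rather than `/export`, the export area ended up on the root file system instead of its own partition.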
After the front-end installation completes, the site-specific customization of the front-end starts. The base installation of CentOS 4.0 x86_64 has the 2.6.9-5.0.5.ELsmp kernel, and we need the 2.6.9-11.ELsmp kernel for many of the packages that will be included with our cluster. Below we'll describe how we do this key upgrade, then continue with a number of package and mount point customizations.
Customization of the front-end:
The first step is to apply the updated kernel packages:
- # rpm -ivh kernel-smp-2.6.9-11.EL.x86_64.rpm
- # rpm -ivh kernel-smp-devel-2.6.9-11.EL.x86_64.rpm
- # rpm -ivh kernel-sourcecode-2.6.9-11.EL.x86_64.rpm
I always check /boot/grub/grub.conf to be sure the system is booting from the proper kernel after an update.
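The grub.conf check can also be scripted. A minimal sketch, assuming a legacy GRUB configuration with a numeric `default=` line; the function name `grub_default_kernel` is hypothetical:

```shell
# Hypothetical helper: print the kernel path of the entry that
# legacy GRUB boots by default. Entries are counted from 0 in the
# order of their "title" lines; a missing default= is treated as 0.
grub_default_kernel() {
    awk '
        /^[ \t]*default[ \t=]/ { sub(/^[ \t]*default[ \t=]+/, ""); def = $0 + 0 }
        /^[ \t]*title[ \t]/    { idx++ }
        /^[ \t]*kernel[ \t]/   { kern[idx] = $2 }
        END                    { print kern[def + 1] }
    ' "${1:-/boot/grub/grub.conf}"
}

# Intended use on the front-end (not run here):
#   grub_default_kernel
```

If the reported path does not name the 2.6.9-11.ELsmp kernel, adjust the `default=` line before rebooting.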
Then apply an RPM to resolve a known (to us) library issue:
- # rpm -ivh compat-libstdc++-33-3.2.3-47.3.i386.rpm
Prepare for Panasas Storage Cluster and Topspin integration on the front-end:
- # rpm -ivh panfs-2.6.9-11.EM-mp-2.2.3-166499.27.rhel_4_amd64.rpm
- # rpm -ivh topspin-ib-rhel4-3.1.0-113.x86_64.rpm
- # rpm -ivh topspin-ib-mod-rhel4-2.6.9-11.ELsmp-3.1.0-113.x86_64.rpm
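Kernel-module packages such as the Topspin module package above carry the kernel version they were built for in their file name, so comparing that name against the target kernel can catch a mismatch before installation. A minimal sketch; `built_for_kernel` is a hypothetical helper, and it assumes the version string appears verbatim in the file name:

```shell
# Hypothetical helper: succeed if the package file name ($1)
# contains the kernel release string ($2), e.g. the output of `uname -r`.
built_for_kernel() {
    [ -n "$2" ] || return 2          # refuse an empty kernel string
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Intended use after rebooting into the new kernel (not run here):
#   built_for_kernel topspin-ib-mod-rhel4-2.6.9-11.ELsmp-3.1.0-113.x86_64.rpm "$(uname -r)" \
#       || echo "module package does not match the running kernel"
```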
Time for a break. ***time 1:05:00***
I need to complete the disk setup on the front-end: it has two RAID volumes, and Rocks configures only the first disk (the boot disk), leaving the other disks untouched.
Create a partition on the second volume for applications:
# fdisk /dev/sdb
(400GB single partition on our system)
Create the file system and mount point:
# mkfs -t ext3 /dev/sdb1
# mkdir /space
Modify /etc/fstab to include the mount point then mount it:
# mount /space
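For illustration, the fstab entry and a check for it might look like the following. This is a sketch under assumptions: the mount options shown and the `fstab_has_mount` helper are hypothetical and not taken from the original setup.

```shell
# Hypothetical /etc/fstab entry for the new volume (options are assumptions):
#   /dev/sdb1   /space   ext3   defaults   1 2

# Hypothetical helper: succeed if the fstab file ($2, default /etc/fstab)
# has a non-comment entry whose mount point field equals $1.
fstab_has_mount() {
    awk -v mp="$1" '
        /^[ \t]*#/ { next }              # skip comment lines
        $2 == mp   { found = 1 }
        END        { exit (found ? 0 : 1) }
    ' "${2:-/etc/fstab}"
}

# Intended use on the front-end (not run here):
#   fstab_has_mount /space && mount /space
```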
Now let's start adding some of the goods...
- Install Portland Group compilers in /space/apps/pgi/
- Install Intel 9.0 compilers in /space/apps/pgi
- Install OSU MVAPICH 0.95 in /space/apps/mvapich
- Build a version of MVAPICH for each of the Intel, GNU, and PGI compilers
- Install our own version of Python in /space/apps/python64
- Install f2py, Numeric, pyMPI built against our vanilla version of python64
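To make the tools under /space/apps available to users, one option is a small profile script. This is a sketch, not the setup used on this cluster: the `bin` subdirectories, the `prepend_path` helper, and the idea of placing the snippet in a file under /etc/profile.d are all assumptions.

```shell
# Hypothetical helper: put a directory at the front of PATH
# unless it is already listed there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
}

# Assumed locations of the executables for two of the installs above.
prepend_path /space/apps/python64/bin
prepend_path /space/apps/mvapich/bin
export PATH
```

Guarding against duplicates keeps PATH from growing each time the script is sourced by a nested shell.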
The Rocks solution uses a simple XML framework to provide a programmatic way to apply site-specific customizations to compute nodes. While the XML syntax involves a small learning curve, once you make the transition from "administering" your cluster to "programming" your cluster, you'll find that writing programs (in the form of scripts) is a powerful way to ensure your site customizations are consistent across all compute nodes. The following describes how we used the XML framework to apply our customizations.
About Steve Jones
Steve Jones is currently the technology operations manager at the Institute for Computational and Mathematical Engineering at Stanford University. Steve designed and administered a Top 500 Supercomputer and speaks regularly about the design and management of High Performance Computing Clusters, most recently as a keynote speaker at the annual Rocks-a-Palooza conference at the San Diego Supercomputer Center. His free time is spent with his significant other, Leilani, far away from a keyboard. More information about Steve can be found at http://www.hpcclusters.org.