Linux.SYS-CON.com Cover Story: Rapid Cluster Deployment

From delivery to production in hours

Put the package you want to add in:

/home/install/contrib/4.0.0/arch/RPMS

Create a new XML configuration file that will extend the current compute.xml configuration file:

# cd /home/install/site-profiles/4.0.0/nodes
# cp skeleton.xml extend-compute.xml

Inside the extend-compute.xml, add the package name with a <package> tag.

Insert any post installation scripting in the <post> section.

This is a truncated version of an extend-compute.xml script we use:

<?xml version="1.0" standalone="no"?>
<kickstart>

<!-- There may be as many packages as needed here. -->
<package> kernel-smp </package>
<package> kernel-smp-devel </package>
<package> kernel-sourcecode </package>
<package> topspin-ib-rhel4 </package>
<package> topspin-ib-mod-rhel4 </package>
<package> compat-libstdc++-33 </package>
<package> panfs </package>
<package> F2PY-fix </package>
<post>

<file name="/etc/profile.d/topspinvars.sh">
if ! echo ${LD_LIBRARY_PATH} | grep -q /usr/local/topspin/mpi/mpich/lib64 ; then
LD_LIBRARY_PATH=/usr/local/topspin/mpi/mpich/lib64:${LD_LIBRARY_PATH}
fi
if ! echo ${LD_LIBRARY_PATH} | grep -q /usr/local/topspin/lib64 ; then
LD_LIBRARY_PATH=/usr/local/topspin/lib64:${LD_LIBRARY_PATH}
fi
export LD_LIBRARY_PATH
</file>

<file name="/etc/profile.d/topspinvars.csh">
if ( ! $?LD_LIBRARY_PATH ) setenv LD_LIBRARY_PATH ""
if ( "${LD_LIBRARY_PATH}" !~ */usr/local/topspin/mpi/mpich/lib64* ) then
setenv LD_LIBRARY_PATH /usr/local/topspin/mpi/mpich/lib64:${LD_LIBRARY_PATH}
endif
if ( "${LD_LIBRARY_PATH}" !~ */usr/local/topspin/lib64* ) then
setenv LD_LIBRARY_PATH /usr/local/topspin/lib64:${LD_LIBRARY_PATH}
endif
</file>

<!-- Setup for Portland Group Compilers -->
<file name="/etc/profile.d/pgi-path.sh">
LM_LICENSE_FILE=/space/apps/pgi/license.dat
export LM_LICENSE_FILE
export PGI=/space/apps/pgi
if ! echo ${PATH} | grep -q /space/apps/pgi/linux86-64/6.0/bin ; then
PATH=/space/apps/pgi/linux86-64/6.0/bin:${PATH}
fi
</file>

<file name="/etc/profile.d/pgi-path.csh">
setenv PGI /space/apps/pgi
setenv LM_LICENSE_FILE $PGI/license.dat
if ( "${path}" !~ */space/apps/pgi/linux86-64/6.0/bin* ) then
set path = ( /space/apps/pgi/linux86-64/6.0/bin $path )
endif
</file>

</post>

</kickstart>
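The Bourne-shell fragments in the listing all repeat one idiom: prepend a directory to a colon-separated variable only if it is not already there. As a minimal sketch, that idiom could be factored into a single helper. The function name prepend_path is hypothetical and is not part of Rocks or of the script we use. Note that it compares whole path components rather than doing the substring match that grep -q does in the listing, and it avoids a trailing colon when the variable starts out empty.

```shell
#!/bin/sh
# prepend_path VAR DIR  (hypothetical helper, not part of Rocks)
# Put DIR at the front of the colon-separated list held in VAR,
# unless DIR is already one of its components, then export VAR.
# Plain POSIX sh has no "local", so var, dir and current are globals.
prepend_path() {
    var=$1
    dir=$2
    eval "current=\${$var:-}"        # read the current value of VAR
    case ":$current:" in
        *":$dir:"*) ;;               # already listed, nothing to do
        *) eval "$var=\$dir\${current:+:\$current}" ;;
    esac
    export "$var"
}

# Same directories, same order as in the listing above.
prepend_path LD_LIBRARY_PATH /usr/local/topspin/mpi/mpich/lib64
prepend_path LD_LIBRARY_PATH /usr/local/topspin/lib64
prepend_path PATH /space/apps/pgi/linux86-64/6.0/bin
```

Because the check is idempotent, sourcing the file again from a nested login shell leaves the variables unchanged.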

To apply the customized configuration scripts to the compute nodes, rebuild the distribution:

# cd /home/install
# rocks-dist dist
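Since the rebuild takes a while, it can be worth checking the new XML file before running it. The following is only a sketch: check_node_xml is a hypothetical helper, not a Rocks command, and it assumes the xmllint tool from libxml2 may or may not be installed on the front end. It verifies that the file is readable and, where xmllint is available, that it is well-formed XML. It does not validate the file against anything Rocks-specific.

```shell
#!/bin/sh
# check_node_xml FILE  (hypothetical pre-flight helper, not a Rocks command)
# Returns 0 if FILE is readable and, when xmllint is installed,
# parses as well-formed XML. Returns 1 otherwise.
check_node_xml() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    if command -v xmllint >/dev/null 2>&1; then
        # --noout suppresses the parsed document; only errors are printed
        xmllint --noout "$file" || return 1
    else
        echo "xmllint not found, skipping well-formedness check" >&2
    fi
    return 0
}

# Example use before rebuilding (path taken from the steps above):
#   check_node_xml /home/install/site-profiles/4.0.0/nodes/extend-compute.xml \
#       && cd /home/install && rocks-dist dist
```

A missing closing tag, for example, would be reported here rather than surfacing later as a node that fails to install.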

Time for a break. ***time 2:30:00***

Switch Configurations
- Set up IP information
- Enable "fast link" on all ports
- Link aggregation (four aggregated 1Gb links between switches, four aggregated 1Gb links for the Panasas installation)
- Set up the configuration on the Topspin 270 switch

Insert the Compute Nodes
- Invoke 'insert-ethers' on the front-end node and select 'compute node' for the appliance type (insert-ethers is used to discover compute nodes and ensure they get the proper configuration)

I chose to modify each compute node to allow remote management access (remote boot, power cycling, and so on), which added two minutes to each node's install time. I also like to make sure the nodes are discovered in the order in which I power them on (left to right, bottom to top in the racks). This takes more time up front, but it saves a huge amount of time later when tracking down hardware failures. It also makes labeling the nodes after installation easy, using the utility Rocks provides for creating labels with identification information. (This little tool really shows that the distribution was developed by people who manage clusters.) With my requirements, this process took 48 nodes x 3.5 minutes per node = 2 hours 48 minutes. It could take far less time depending on your needs.
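To estimate this step for a different cluster size, the arithmetic above can be sketched as a small shell function. The name discovery_time is hypothetical. The per-node time is passed in seconds because POSIX shell arithmetic is integer-only, so 3.5 minutes becomes 210. The second call assumes the 3.5 minutes includes the two minutes added by the remote management changes, which would leave about 1.5 minutes (90 seconds) per node without them.

```shell
#!/bin/sh
# discovery_time NODES SECONDS_PER_NODE  (hypothetical helper)
# Print the total time for a one-at-a-time node discovery as "Hh MMm".
discovery_time() {
    total_min=$(( $1 * $2 / 60 ))
    printf '%dh %02dm\n' $(( total_min / 60 )) $(( total_min % 60 ))
}

discovery_time 48 210   # 48 nodes at 3.5 minutes each: prints "2h 48m"
discovery_time 48 90    # assumed 1.5 minutes each:     prints "1h 12m"
```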

Time for a break. ***time 5:40:00***

Time for Panasas Integration
The Panasas Storage Cluster arrived in three boxes on one pallet. From the time I clipped the first band on the pallet to having the system fully operational was only 1 hour 55 minutes. Here's how the process went.

First off, we unpacked the equipment and identified all components and parts. I found a location in the rack, set up the mounting hardware, then mounted the chassis. I loaded the chassis with one DirectorBlade and 10 StorageBlades, along with the network, battery, and power supply modules, and connected four 1Gb Ethernet links to the link aggregation point on the switch.

Now it's time to configure the software side of it.

- Connect a console cable from my laptop to the serial port on the DirectorBlade
- Log in as admin
- Configure the system via the pancli (a modified Panasas command line interface on the Unix-based operating system) with IP information

I was then able to connect to the system via the Web GUI to complete some additional configuration:

- Customer ID
- System name (required for global file system configuration)
- IP range for the DHCP pool for the DirectorBlade and StorageBlades
- Timezone

This minimum configuration is all that's required for the system to function as a basic network attached storage (NAS) system. Additional modifications can include DNS, NIS, NTP, NFS, CIFS, and SMTP (for notifications) server settings. You can also modify volume and quota information. If you want user home directories located on the Panasas Storage Cluster, which is one way we use the unit, the useradd scripts need some modifications that depend on your settings and auto-mount options.

More Stories By Steve Jones

Steve Jones is currently the technology operations manager at the Institute for Computational and Mathematical Engineering at Stanford University. Steve designed and administered a Top 500 Supercomputer and speaks regularly about the design and management of High Performance Computing Clusters, most recently as a keynote speaker at the annual Rocks-a-Palooza conference at the San Diego Supercomputing Center. His free time is spent with his significant other, Leilani, far away from a keyboard. More information about Steve can be found at http://www.hpcclusters.org.

Most Recent Comments
clusteradmin.net 02/18/08 06:17:49 PM EST

For those who came here searching for cluster resources you may consider visiting my blog (http://clusteradmin.net) about cluster administration. Some introductory stuff, load-balancing guide, monitoring and other articles.

Thanks,

-marek

Grid 04/01/06 10:38:44 AM EST

Seems like SGE was not mentioned:
http://gridengine.sunsource.net

SYS-CON Belgium News Desk 03/17/06 09:36:01 AM EST

After building a number of clusters from the ground up - including one that made it to the Top500 Supercomputer list - I decided to try a service that many vendors now offer - having a system racked and stacked at the factory then shipped to us. Such a service saves a huge amount of time, not to mention my back, not having to build the cluster and cable all the equipment together. I've been a fan of well-cabled systems and have found the quality control to be acceptable. The key component is the pre-build requirements and verification before the system is built. This will ensure the system shipped is what is expected when it arrives at your front door. There can still be a fair amount of cabling that has to be done once it arrives, if you have a multi-rack configuration, but it's usually limited to plugging in the system's power and public network.