Preparing for the Revolution

Dual-core technology for HPC Clusters

There's a revolution (or evolution) occurring in the high-performance computing (HPC) industry. Recently both AMD and Intel introduced chips with multiple processing units in a single package. Instead of having one central processor, or brain, computers will now have multiple brains with which to run programs. While this technique isn't new, it's the first time these types of architectures have been mass-produced and sold to the commodity PC and server markets.

This revolution will affect everyone who uses a computer, including users of high-performance clustered systems. From laptops to game consoles to large servers, the multi-core age has begun. From an end-user's perspective, this change will remain hidden. However, the expectation of continued price-to-performance gains like those experienced over the past 20 years will remain.

Programmers will find providing additional price-to-performance advantages on multi-core designs a challenge. There's no silver bullet or automated technology that can adapt current software to multi-core systems. This article will address these challenges and provide programmers and managers with a basic understanding of the issues and the solutions that will be required to leverage the new multi-core revolution.

The Road to Multi-Core
The computer market has enjoyed a steady uptick in processor speeds. A processor's speed is largely determined by how fast a clock tells the processor to perform instructions. The faster the clock, the more instructions can be performed in a given timeframe. The physics of semiconductors, however, has put some constraints on the rate at which processor clock speeds can be increased. This trend is shown quite clearly in Figure 1, where the average clock speed and heat dissipation of Intel and AMD processors are plotted over time.

From a power consumption perspective, it was clear that something had to be done. The continued spikes in power consumption (and thus heat generation) required additional cooling and electrical service to keep the processor operating. The solution was to scale out processor cores instead of scaling up the clock rate. The drop-off in clock speed on the graph indicates the delivery of the first dual-core processors from AMD and Intel. These processors are designed to run at slower clock rates than single-core designs due to heat issues. These dual-core chips can, in theory, deliver twice the performance of a single-core chip and so continue the processor performance march.

Multi-Core Road Maps
Both Intel and AMD are selling multi-core processors today. From publicly available documents, they expect to release quad-cores in 2007 and speculation has eight-way cores being introduced in 2009-2010.

       2005  Dual-Cores
       2007  Four-Cores
       2009+ Eight-Cores

For servers and workstations that have traditionally had two processor sockets available, this means the total number of cores per motherboard can easily reach 16 by the end of the decade. AMD's HyperTransport (Direct Connect) technology already allows eight-way motherboard designs (two four-processor motherboards). Extrapolating this to eight-way cores means that 64-core servers aren't an unreasonable expectation.

The Challenges
The challenge facing the HPC cluster industry is how to use the sudden doubling of processor power. Fortunately, modern operating systems are equipped to take advantage of multiple processors and may extend some immediate benefits to end users in the near term. Using dual-core processors to their fullest potential on a per-application basis is harder (it requires reprogramming) and is considered a longer-term benefit. An analogy will help explain the situation.

The Multiprocessor Store
We've all stood in line at the grocery store. The speed at which we get our order checked out (processed) is related to the number of cash registers the store uses.

A store with one cash register is like a modern-day single-processor computer. Each customer has a cartful of items (programs) to be tabulated (computed) by the cash register (processor). Modern operating systems use a trick called time sharing (or multitasking) to make it look like multiple programs are running at the same time. For instance, extending the store analogy, if an extremely efficient cashier with a smart cash register processes some of your order and then processes some of the next customer's, you'd both appear to be moving through the line at the same time. Using this method, customers get the illusion that they are moving through the line, but in reality, they'll always go faster if they're the only customer.

The obvious solution to anyone waiting in line is to use more than one cash register. And this is actually what large stores do to improve the flow of customers through the checkout line. The same effect will happen when dual-core processors become mainstream in the next few years. More customers (programs) can be serviced (run) at the same time, but you won't get through the line any faster than you would if there were only your order and one cash register. In computer terminology, this is referred to as Symmetric Multiprocessing or SMP.

The market has grown accustomed to faster and faster "cashiers" over the last 20 years so that orders that once took minutes to tabulate now take seconds and customers (programs) move faster than before. As mentioned above, processor technology is having trouble making the processors (cashiers) faster, so the industry has introduced more cash registers instead.

In the near-term, more processors (cash registers) means more of the users' programs work at the same time without impacting each other's performance. Using modern SMP-enabled operating systems, this benefit will be immediate and transparent to all users. The longer-term challenge facing software developers is how to make individual programs go faster using more than one processor.

The Long-Term Performance Challenge
Going back to our store analogy, it's obvious that breaking your order into smaller orders and distributing them over two or more cash registers lets you get finished faster. The same applies to computer programs. If the program is amenable to distribution, it can use multiple processors and execute faster. Commonly referred to as parallel computing, this method will be responsible for almost all performance gains in the immediate future. Parallel computing almost always requires reprogramming existing sequential applications to execute in parallel. The amount of reprogramming can be trivial or monumental depending on the application. The choice of tools and techniques for this task will be critical for success in the future. Fortunately there are existing software methods and tools for exploiting parallelism in applications. Many of these techniques are currently used successfully in the Linux-dominated HPC market.
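
To make "breaking your order into smaller orders" concrete, here is a minimal sketch in C of one common way to divide n items among several workers as evenly as possible. The function name chunk_bounds and its parameters are hypothetical and chosen only for illustration; real applications may need a different decomposition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from any library): compute the half-open range
   [*start, *end) of items that worker `id` (0-based) handles when `n` items
   are split as evenly as possible among `workers` workers. */
void chunk_bounds(size_t n, size_t workers, size_t id,
                  size_t *start, size_t *end)
{
    assert(workers > 0 && id < workers);

    size_t base  = n / workers;  /* every worker gets at least this many */
    size_t extra = n % workers;  /* the first `extra` workers get one more */

    *start = id * base + (id < extra ? id : extra);
    *end   = *start + base + (id < extra ? 1 : 0);
}
```

With 10 items and 3 workers, for example, the ranges come out as [0, 4), [4, 7) and [7, 10): no item is skipped or counted twice, and no worker carries more than one item above the others.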

Programming Methods
Dealing with multiple CPUs isn't a new idea. Multiprocessor systems have been around for years and have been studied quite extensively. There's no general consensus, however, on how to program multiple processors. There are two general methods that the programmer can use. The first is threaded programming and the second is message passing. Both have their advantages and disadvantages. The right choice depends on the application and target hardware.

Threads
The thread model is a way for a program to split itself into two or more concurrent tasks. These tasks can be run on a single processor in a time-shared mode or on a separate processor (e.g., the two cores on a dual-core processor can each run threads). The term thread comes from "thread of execution" and is similar to how a fabric (a computer program) can be pulled apart into threads (concurrent parts). In the cash register analogy, it would be similar to breaking your order up into components and using separate cash registers. Threads are different from individual processes (or independent programs) because they inherit much of the state information and memory from the parent process.

With Linux and Unix systems, threads are often implemented using a POSIX Thread Library (pthreads). There are several other thread models (Windows threads) that the programmer can choose; however, using a standards-based implementation, like POSIX, is highly recommended. As a low-level library, pthreads can be easily included in almost all programming applications.

Threads provide the ability to share memory and offer very fine-grained synchronization with other sibling threads. These low-level features can provide a very fast and flexible approach to parallel execution. Software coding at the thread level isn't without its challenges. Threaded applications require attention to detail and considerable amounts of extra code in the application. Finally, threaded apps are ideal for multi-core designs because the processors share local memory.
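
As a rough illustration of the thread model, the following sketch uses pthreads to sum an array with two threads that share the same memory. The names threaded_sum, sum_slice, struct slice and NUM_THREADS are hypothetical, and the thread count of two is an assumption matching a dual-core chip; this is one possible arrangement, not a recommended design.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 2  /* assumption: one thread per core of a dual-core chip */

/* Per-thread work description (hypothetical, for illustration only). */
struct slice {
    const double *data;  /* shared array, visible to every thread */
    size_t start, end;   /* half-open range [start, end) this thread sums */
    double partial;      /* result written by the thread */
};

/* Thread entry point: sum one slice of the shared array. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    double total = 0.0;

    for (size_t i = s->start; i < s->end; i++)
        total += s->data[i];

    s->partial = total;  /* each thread writes only its own slot, so no lock is needed */
    return NULL;
}

/* Split the array between NUM_THREADS threads and combine their partial sums. */
double threaded_sum(const double *data, size_t n)
{
    pthread_t tid[NUM_THREADS];
    struct slice work[NUM_THREADS];
    double total = 0.0;

    for (int t = 0; t < NUM_THREADS; t++) {
        work[t].data = data;
        work[t].start = n * t / NUM_THREADS;
        work[t].end = n * (t + 1) / NUM_THREADS;
        work[t].partial = 0.0;

        int rc = pthread_create(&tid[t], NULL, sum_slice, &work[t]);
        assert(rc == 0);
        (void)rc;
    }

    /* Joining is the synchronization point: after it, partial results are safe to read. */
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(tid[t], NULL);
        total += work[t].partial;
    }
    return total;
}
```

Even in this small case, the bookkeeping (work descriptors, thread creation, joining) outweighs the actual computation, which is the "extra code" cost mentioned above. With GCC, a program using this function is typically built with the -pthread option.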

OpenMP
Because native thread programming can be cumbersome, a higher level of abstraction called OpenMP has been developed. As with all higher-level approaches, flexibility has been sacrificed for ease of coding. At its core OpenMP uses threads, but the details are hidden from the programmer. OpenMP is usually implemented as compiler directives: specially formatted comments in Fortran and pragmas in C/C++. Typically, computationally heavy loops are augmented with OpenMP directives that the compiler uses to automatically "thread the loop." This approach has the distinct advantage of leaving the original program "untouched" (except for directives) and providing simple recompilation for a sequential (non-threaded) version where the OpenMP directives are ignored.

There are several commercial and open source OpenMP compilers available for C/C++ and Fortran. Like pthreads, OpenMP is ideal for multi-core designs.
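
For comparison with hand-written threads, here is a minimal sketch of "threading a loop" with a single OpenMP directive in C. The function name dot is hypothetical and the dot product is just a convenient example of a computationally heavy loop.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two arrays of length n (hypothetical example function). */
double dot(const double *a, const double *b, long n)
{
    double sum = 0.0;
    assert(a != NULL && b != NULL);

    /* The directive asks the compiler to divide the loop iterations among
       threads. The reduction clause gives each thread a private copy of
       `sum` and adds the copies together when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

With GCC, OpenMP is enabled by compiling with -fopenmp. Without that option the pragma is ignored and the same source builds as an ordinary sequential loop. Because floating-point addition is not associative, the threaded version may differ from the sequential one in the last few digits on general data.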

More Stories By Douglas Eadline

Dr. Douglas Eadline has over 25 years of experience in high-performance computing. You can contact him through Basement Supercomputing (http://basement-supercomputing.com).
