YOUR FEEDBACK
Rapid Module Development for DotNetNuke
MICHEAL SMITH wrote: GO TO THE LINK, U HAVE EVERYTHING U WANT THERE. MICHEAL...


2007 West
GOLD SPONSORS:
Active Endpoints
Your SOA Needs BPEL for Orchestration
BEA
Virtualized SOA: Adaptive Infrastructure for Demanding Applications
Nexaweb
Overcoming Bandwidth Challenges with Nexaweb
TIBCO
What is Service Virtualization?
SILVER SPONSORS:
WSO2
Using Web Services Technologies and FOSS Solutions
Click For 2007 East
Event Webcasts

2008 East
PLATINUM SPONSORS:
Appcelerator
Think Fast: Accelerate AJAX Development with Appcelerator
GOLD SPONSORS:
DreamFace Interactive
The Ultimate Framework for Creating Personalized Web 2.0 Mashups
ICEsoft
AJAX and Social Computing for the Enterprise
Kaazing
Enterprise Comet: Real–Time, Real–Time, or Real–Time Web 2.0?
Nexaweb
Now Playing: Desktop Apps in the Browser!
Sun
jMaki as an AJAX Mashup Framework
POWER PANELS:
The Business Value
of RIAs
What Lies Beyond AJAX?
KEYNOTES:
Douglas Crockford
Can We Fix the Web?
Anthony Franco
2008: The Year of the RIA
Click For 2007 Event Webcasts
SYS-CON.TV
TOP LINKS YOU MUST CLICK ON


Preparing for the Revolution
Dual-core technology for HPC Clusters

Digg This!

Page 1 of 2   next page »

There's revolution (or evolution) occurring in the high-performance computing (HPC) industry. Recently both AMD and Intel introduced chips with multiple processing units in a single package. Instead of having one central processor, or brain, computers will now have multiple brains with which to run programs. While this technique isn't new, it's the first time these types of architectures have been mass-produced and sold to the commodity PC and server markets.

This revolution will affect everyone who uses a computer including high-performance clustered systems. From laptops to game consoles to large servers, the multi-core age has begun. From an end-user's perspective, this change will remain hidden. However, the expectation of continued price-to-performance gains like those experienced over the past 20 years will remain.

Programmers will find providing additional price-to-performance advantages on multi-core designs a challenge. There's no silver bullet or automated technology that can adapt current software to multi-core systems. This article will address these challenges and provide programmers and managers with a basic understanding of the issues and the solutions that will be required to leverage the new multi-core revolution.

The Road to Multi-Core
The computer market has enjoyed the steady uptick in processor speeds. A processor's speed is largely determined by how fast a clock tells the processor to perform instructions. The faster the clock the more instructions can be performed in a given timeframe. The physics of semiconductors have put some constraints on the rate at which processor clock speeds can be increased. This trend is shown quite clearly in Figure 1 where the average clock speed and heat dissipation of Intel and AMD processors are plotted over time.

From a power consumption perspective, it was clear that something had to be done. The continued spikes in power consumption (and thus heat generation) required additional cooling and electrical service to keep the processor operating. The solution was to scale out processor cores instead of scaling up the clock rate. The drop-off in clock speed on the graph indicates the delivery of the first dual-core processors from AMD and Intel. These processors are designed to run at slower clock rates than single-core designs due heat issues. These dual-core chips can, in theory, deliver twice the performance of a single-core chip and so continue the processor performance march.

Multi-Core Road Maps
Both Intel and AMD are selling multi-core processors today. From publicly available documents, they expect to release quad-cores in 2007 and speculation has eight-way cores being introduced in 2009-2010.

       2005 Dual-Cores
       2007 Four-Cores
       2009 + Eight-Cores

For servers and workstations that have traditionally had two processor sockets available, this means the total number of cores per motherboard can easily reach 16 by the end of the decade. AMD's HyperTransport (Direct Connect) technology already allows eight-way motherboard designs (two four-processor motherboards). Extrapolating this to eight-way cores means that 64-core servers aren't an unreasonable expectation.

The Challenges
The challenge facing the HPC cluster industry is how to use the sudden doubling processor power. Fortunately modern operating systems are equipped to take advantage of multiple processors and may extend some immediate benefits to end users near-term. Using dual-core processors to their fullest potential on a per-application basis is harder (it requires re-programming) and is considered a longer-term benefit. An analogy will help explain the situation.

The Multiprocessor Store
We've stood in line at the grocery store. The speed at which we get our order checked out (processed) is related to the number of cash registers the store uses.

A store with one cash register is like a modern day single-processor computer. Each customer has a cartful of items (programs) to be tabulated (computed) by the cash register (processor). Modern operating systems use a trick called time sharing (or multitasking) to make it look like multiple programs are running at the same time. For instance, extending the store analogy, if an extremely efficient cashier with a smart cash register processes some of your order then process some of the next customer's, you'd both appear to be moving though the line at the same time. Using this method, customers get the illusion that they are moving through the line, but in reality, they'll always go faster if they're the only customer.

The obvious solution to anyone waiting in line is to use more than one cash register. And this is actually what large stores do to improve the flow of customers through the checkout line. The same affect will happen when dual-core processors become mainstream in the next few years. More customers (programs) can be serviced (run) at the same time, but you won't get through the line any faster than you would if there was only your order and one cash register. In computer terminology, this is referred to as Symmetric Multiprocessing or SMP.

The market has grown accustomed to faster and faster "cashiers" over the last 20 years so that orders that once took minutes to tabulate now take seconds and customers (programs) move faster than before. As mentioned above, processor technology is having trouble making the processors (cashiers) faster so it's introduced more cash registers.

In the near-term, more processors (cash registers) means more of the users' programs work at the same time without impacting each other's performance. Using modern SMP-enabled operating systems, this benefit will be immediate and transparent to all users. The longer-term challenge facing software developers is how to make individual programs go faster using more than one processor.

The Long-Term Performance Challenge
Going back to our store analogy, it's obvious that breaking your order into smaller orders and distributing them over two or more cash registers lets you get finished faster. The same applies to computer programs. If the program is amenable to distribution, it can use multiple processors and execute faster. Commonly referred to as parallel computing, this method will be responsible for almost all performance gains in the immediate future. Parallel computing almost always requires reprogramming existing sequential applications to execute in parallel. The amount of reprogramming can be trivial or monumental depending on the application. The choice of tools and techniques for this task will be critical for success in the future. Fortunately there are existing software methods and tools for exploiting parallelism in applications. Many of these techniques are currently used successfully in the Linux-dominated HPC market.

Programming Methods
Dealing with multiple CPUs isn't a new idea. They've been around for years and studied quite extensively. There's no general consensus, however, on how to program multiple processors. There are two general methods that the programmer can use. The first is threaded programming and the second is message passing. Both have their advantages and disadvantages. The right choice depends on the application and target hardware.

Threads
The thread model is a way for a program to split itself into two or more concurrent tasks. These tasks can be run on a single processor in a time-shared mode or on a separate processor (e.g., the two cores on a dual-core processor can each run threads). The term thread comes from "thread of execution" and is similar to how a fabric (a computer program) can be pulled apart into threads (concurrent parts). In the cash register analogy, it would be similar to breaking your order up into components and using separate cash registers. Threads are different from individual processes (or independent programs) because they inherit much of the state information and memory from the parent process.

With Linux and Unix systems, threads are often implemented using a POSIX Thread Library (pthreads). There are several other thread models (Windows threads) that the programmer can choose; however, using a standards-based implementation, like POSIX, is highly recommended. As a low-level library, pthreads can be easily included in almost all programming applications.

Threads provide the ability to share memory and offer very fine-grained synchronization with other sibling threads. These low-level features can provide a very fast and flexible approach to parallel execution. Software coding at the thread level isn't without its challenges. Threaded applications require attention to detail and considerable amounts of extra code in the application. Finally, threaded apps are ideal for multi-core designs because the processors share local memory.

OpenMP
Because native thread programming can be cumbersome, a higher level of abstraction has been developed called OpenMP. As with all higher-level approaches, flexibility has been sacrificed for ease of coding. At its core OpenMP uses threads, but the details are hidden from the programmer. OpenMP is usually implemented as compiler directives in program comments. Typically, computationally heavy loops are augmented with OpenMP directives that the compiler uses to automatically "thread the loop." This approach has the distinct advantage of leaving the original program "untouched" (except for directives) and providing simple recompilation for a sequential (non-threaded) version where the OpenMP directives are ignored.

There are several commercial and Open Source (C/C++, Fortran) OpenMP compilers available. Like pthreads OpenMP is ideal for multi-core designs.


Page 1 of 2   next page »

About Douglas Eadline
Dr. Douglas Eadline has over 25 years of experience in high-performance computing. You can contact him through Basement Supercomputing (http://basement-supercomputing.com). This article was sponsored by Concurrent (http://www.ccur.com), a global leader in providing digital on-demand systems to the broadband industry and real-time/HPC computer systems for industry and government.

LATEST LINUX STORIES
Kevin Hoffman's Review of Iron Man
I took the advice of a friend of mine and steered clear of the 'normal' movie theaters and went a little out of the way to go to a DLP movie theater. The experience of comparing a regular movie theater to a DLP movie theater is like comparing standard def analog TV with a 1080i HDTV si
3rd International Virtualization Conference & Expo: Themes & Topics
From Application Virtualization to Xen, a round-up of the virtualization themes & topics being discussed in NYC June 23-24, 2008 by the world-class speaker faculty at the 3rd International Virtualization Conference & Expo being held by SYS-CON Events in The Roosevelt Hotel, in midtown
Verizon Becomes a Counter-Android Linux Convert
Verizon Wireless is snubbing Google's Linux-based Android initiative to go with the LiMo Foundation's mobile Linux spec for its next wave of mobile phones expected next year. Along with Verizon, Mozilla signed up - giving the consortium its first major open source ISV - and a key one f
Adaptec Launches New Series 2 RAID Controller For Linux Users
Adaptec unveiled a new family of entry-level Unified Serial RAID controllers. The new low-profile Series 2 RAID controllers, built on the same Adaptec dual core RAID-on-Chip (ROC) architecture used in its successful Series 5 RAID controllers, provide significant performance enhancement
JavaOne 2008: Sun Challenges Linux
Sun's mule train has finally pulled into Indiana after three years on the road. Indiana is the Linux-friendly Fedora-like OpenSolaris project meant to move the Solaris-shy Linux community off Linux and on to Solaris tempted by Solaris widgetry like the highly scalable, rollback-easy, 1
Curl Announces Support for Ubuntu for Enterprise RIA Platform
Curl announced it has released the availability of an Ubuntu Installer for the Curl Rich Internet Application (RIA) platform. Curl is a Rich Internet Application platform that competes with Adobe AIR/Flex, Silverlight, and Ajax. Curl has been shipping with Linux support for RedHat 9, S
SUBSCRIBE TO THE WORLD'S MOST POWERFUL NEWSLETTERS
SUBSCRIBE TO OUR RSS FEEDS & GET YOUR SYS-CON NEWS LIVE!
Click to Add our RSS Feeds to the Service of Your Choice:
Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
Publish Your Article! Please send it to editorial(at)sys-con.com!

Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021

SYS-CON FEATURED WHITEPAPERS

ADS BY GOOGLE