Preparing for the Revolution

Dual-core technology for HPC Clusters

There's revolution (or evolution) occurring in the high-performance computing (HPC) industry. Recently both AMD and Intel introduced chips with multiple processing units in a single package. Instead of having one central processor, or brain, computers will now have multiple brains with which to run programs. While this technique isn't new, it's the first time these types of architectures have been mass-produced and sold to the commodity PC and server markets.

This revolution will affect everyone who uses a computer, including users of high-performance clustered systems. From laptops to game consoles to large servers, the multi-core age has begun. From an end-user's perspective, the change will remain hidden. However, the expectation of continued price-to-performance gains like those experienced over the past 20 years will persist.

Programmers will find providing additional price-to-performance advantages on multi-core designs a challenge. There's no silver bullet or automated technology that can adapt current software to multi-core systems. This article will address these challenges and provide programmers and managers with a basic understanding of the issues and the solutions that will be required to leverage the new multi-core revolution.

The Road to Multi-Core
The computer market has enjoyed a steady climb in processor speeds. A processor's speed is largely determined by how fast a clock tells the processor to perform instructions. The faster the clock, the more instructions can be performed in a given timeframe. The physics of semiconductors, however, has placed constraints on the rate at which processor clock speeds can be increased. This trend is shown quite clearly in Figure 1, where the average clock speed and heat dissipation of Intel and AMD processors are plotted over time.

From a power consumption perspective, it was clear that something had to be done. The continued spikes in power consumption (and thus heat generation) required additional cooling and electrical service to keep the processor operating. The solution was to scale out processor cores instead of scaling up the clock rate. The drop-off in clock speed on the graph marks the delivery of the first dual-core processors from AMD and Intel. These processors are designed to run at slower clock rates than single-core designs due to heat issues. These dual-core chips can, in theory, deliver twice the performance of a single-core chip and so continue the processor performance march.

Multi-Core Road Maps
Both Intel and AMD are selling multi-core processors today. From publicly available documents, they expect to release quad-core processors in 2007, and speculation has eight-core processors being introduced in 2009-2010.

       2005: Dual-core
       2007: Quad-core
       2009+: Eight-core

For servers and workstations that have traditionally had two processor sockets available, this means the total number of cores per motherboard can easily reach 16 by the end of the decade. AMD's HyperTransport (Direct Connect) technology already allows eight-processor motherboard designs (in effect, two four-processor boards linked together). Extrapolating this to eight-core processors means that 64-core servers aren't an unreasonable expectation.

The Challenges
The challenge facing the HPC cluster industry is how to use this sudden doubling of processor power. Fortunately, modern operating systems are equipped to take advantage of multiple processors and can deliver some immediate benefits to end users in the near term. Using dual-core processors to their fullest potential on a per-application basis is harder (it requires re-programming) and is considered a longer-term benefit. An analogy will help explain the situation.

The Multiprocessor Store
We've all stood in line at the grocery store. The speed at which we get our order checked out (processed) is related to the number of cash registers the store uses.

A store with one cash register is like a modern-day single-processor computer. Each customer has a cartful of items (programs) to be tabulated (computed) by the cash register (processor). Modern operating systems use a trick called time sharing (or multitasking) to make it look like multiple programs are running at the same time. Extending the store analogy, if an extremely efficient cashier with a smart cash register processes some of your order, then processes some of the next customer's, you'd both appear to be moving through the line at the same time. Using this method, customers get the illusion of simultaneous progress, but in reality, each would always go faster as the only customer in line.

The obvious solution to anyone waiting in line is to use more than one cash register. And this is actually what large stores do to improve the flow of customers through the checkout line. The same effect will happen when dual-core processors become mainstream in the next few years. More customers (programs) can be serviced (run) at the same time, but you won't get through the line any faster than you would if there was only your order and one cash register. In computer terminology, this is referred to as Symmetric Multiprocessing, or SMP.

The market has grown accustomed to faster and faster "cashiers" over the last 20 years, so that orders that once took minutes to tabulate now take seconds, and customers (programs) move faster than before. As mentioned above, processor technology is having trouble making the processors (cashiers) faster, so it's introducing more cash registers instead.

In the near-term, more processors (cash registers) means more of the users' programs work at the same time without impacting each other's performance. Using modern SMP-enabled operating systems, this benefit will be immediate and transparent to all users. The longer-term challenge facing software developers is how to make individual programs go faster using more than one processor.

The Long-Term Performance Challenge
Going back to our store analogy, it's obvious that breaking your order into smaller orders and distributing them over two or more cash registers lets you get finished faster. The same applies to computer programs. If the program is amenable to distribution, it can use multiple processors and execute faster. Commonly referred to as parallel computing, this method will be responsible for almost all performance gains in the immediate future. Parallel computing almost always requires reprogramming existing sequential applications to execute in parallel. The amount of reprogramming can be trivial or monumental depending on the application. The choice of tools and techniques for this task will be critical for success in the future. Fortunately there are existing software methods and tools for exploiting parallelism in applications. Many of these techniques are currently used successfully in the Linux-dominated HPC market.

Programming Methods
Dealing with multiple CPUs isn't a new idea. Multiprocessor systems have been around for years and have been studied quite extensively. There's no general consensus, however, on how to program multiple processors. There are two general methods the programmer can use. The first is threaded programming and the second is message passing. Both have their advantages and disadvantages. The right choice depends on the application and target hardware.

Threads
The thread model is a way for a program to split itself into two or more concurrent tasks. These tasks can be run on a single processor in a time-shared mode or on a separate processor (e.g., the two cores on a dual-core processor can each run threads). The term thread comes from "thread of execution" and is similar to how a fabric (a computer program) can be pulled apart into threads (concurrent parts). In the cash register analogy, it would be similar to breaking your order up into components and using separate cash registers. Threads are different from individual processes (or independent programs) because they inherit much of the state information and memory from the parent process.

On Linux and Unix systems, threads are often implemented using a POSIX Thread Library (pthreads). There are several other thread models (e.g., Windows threads) that the programmer can choose; however, using a standards-based implementation, like POSIX, is highly recommended. As a low-level library, pthreads can be easily included in almost all programming applications.

Threads provide the ability to share memory and offer very fine-grained synchronization with other sibling threads. These low-level features can provide a very fast and flexible approach to parallel execution. Software coding at the thread level isn't without its challenges, however. Threaded applications require attention to detail and considerable amounts of extra code in the application. Finally, threaded apps are ideal for multi-core designs because the cores share local memory.

OpenMP
Because native thread programming can be cumbersome, a higher level of abstraction has been developed called OpenMP. As with all higher-level approaches, flexibility has been sacrificed for ease of coding. At its core OpenMP uses threads, but the details are hidden from the programmer. OpenMP is implemented as compiler directives (pragmas in C/C++ and specially formatted comments in Fortran). Typically, computationally heavy loops are augmented with OpenMP directives that the compiler uses to automatically "thread the loop." This approach has the distinct advantage of leaving the original program "untouched" (except for directives) and providing simple recompilation for a sequential (non-threaded) version where the OpenMP directives are ignored.

There are several commercial and Open Source (C/C++, Fortran) OpenMP compilers available. Like pthreads, OpenMP is well suited to multi-core designs.

More Stories By Douglas Eadline

Dr. Douglas Eadline has over 25 years of experience in high-performance computing. You can contact him through Basement Supercomputing (http://basement-supercomputing.com).

