Preparing for the Revolution

Dual-core technology for HPC Clusters

There's a revolution (or evolution) occurring in the high-performance computing (HPC) industry. Recently both AMD and Intel introduced chips with multiple processing units in a single package. Instead of having one central processor, or brain, computers will now have multiple brains with which to run programs. While this technique isn't new, it's the first time these architectures have been mass-produced and sold to the commodity PC and server markets.

This revolution will affect everyone who uses a computer, including users of high-performance clustered systems. From laptops to game consoles to large servers, the multi-core age has begun. From an end user's perspective, the change will remain hidden. However, the expectation of continued price-to-performance gains like those experienced over the past 20 years will remain.

Programmers will find providing additional price-to-performance advantages on multi-core designs a challenge. There's no silver bullet or automated technology that can adapt current software to multi-core systems. This article will address these challenges and provide programmers and managers with a basic understanding of the issues and the solutions that will be required to leverage the new multi-core revolution.

The Road to Multi-Core
The computer market has enjoyed a steady uptick in processor speeds. A processor's speed is largely determined by how fast a clock tells the processor to perform instructions: the faster the clock, the more instructions can be performed in a given timeframe. The physics of semiconductors, however, has placed constraints on the rate at which processor clock speeds can be increased. This trend is shown quite clearly in Figure 1, where the average clock speed and heat dissipation of Intel and AMD processors are plotted over time.

From a power consumption perspective, it was clear that something had to be done. The continued spikes in power consumption (and thus heat generation) required additional cooling and electrical service to keep the processor operating. The solution was to scale out processor cores instead of scaling up the clock rate. The drop-off in clock speed on the graph indicates the delivery of the first dual-core processors from AMD and Intel. These processors are designed to run at slower clock rates than single-core designs due to heat issues. These dual-core chips can, in theory, deliver twice the performance of a single-core chip and so continue the processor performance march.

Multi-Core Road Maps
Both Intel and AMD are selling multi-core processors today. From publicly available documents, they expect to release quad-cores in 2007 and speculation has eight-way cores being introduced in 2009-2010.

       2005  Dual-core
       2007  Quad-core
       2009+ Eight-core

For servers and workstations that have traditionally had two processor sockets, this means the total number of cores per motherboard can easily reach 16 by the end of the decade. AMD's HyperTransport (Direct Connect) technology already allows eight-socket motherboard designs (two linked four-processor motherboards). Combining eight-socket boards with eight-core processors means that 64-core servers aren't an unreasonable expectation.

The Challenges
The challenge facing the HPC cluster industry is how to use this sudden doubling of processor power. Fortunately, modern operating systems are equipped to take advantage of multiple processors and can extend some immediate benefits to end users in the near term. Using dual-core processors to their fullest potential on a per-application basis is harder (it requires reprogramming) and is considered a longer-term benefit. An analogy will help explain the situation.

The Multiprocessor Store
We've all stood in line at the grocery store. The speed at which our order gets checked out (processed) is related to the number of cash registers the store uses.

A store with one cash register is like a modern-day single-processor computer. Each customer has a cartful of items (programs) to be tabulated (computed) by the cash register (processor). Modern operating systems use a trick called time sharing (or multitasking) to make it look like multiple programs are running at the same time. Extending the store analogy, if an extremely efficient cashier with a smart cash register processes some of your order, then some of the next customer's, you'd both appear to be moving through the line at the same time. Using this method, customers get the illusion that they're moving through the line together, but in reality, each would always go faster as the only customer.

The obvious solution to anyone waiting in line is to use more than one cash register. And this is actually what large stores do to improve the flow of customers through the checkout line. The same effect will happen when dual-core processors become mainstream in the next few years. More customers (programs) can be serviced (run) at the same time, but you won't get through the line any faster than you would if there was only your order and one cash register. In computer terminology, this is referred to as Symmetric Multiprocessing, or SMP.

The market has grown accustomed to faster and faster "cashiers" over the last 20 years, so that orders that once took minutes to tabulate now take seconds and customers (programs) move faster than before. As mentioned above, processor technology is having trouble making the processors (cashiers) faster, so it has introduced more cash registers instead.

In the near-term, more processors (cash registers) means more of the users' programs work at the same time without impacting each other's performance. Using modern SMP-enabled operating systems, this benefit will be immediate and transparent to all users. The longer-term challenge facing software developers is how to make individual programs go faster using more than one processor.

The Long-Term Performance Challenge
Going back to our store analogy, it's obvious that breaking your order into smaller orders and distributing them over two or more cash registers lets you get finished faster. The same applies to computer programs. If the program is amenable to distribution, it can use multiple processors and execute faster. Commonly referred to as parallel computing, this method will be responsible for almost all performance gains in the immediate future. Parallel computing almost always requires reprogramming existing sequential applications to execute in parallel. The amount of reprogramming can be trivial or monumental depending on the application. The choice of tools and techniques for this task will be critical for success in the future. Fortunately there are existing software methods and tools for exploiting parallelism in applications. Many of these techniques are currently used successfully in the Linux-dominated HPC market.

Programming Methods
Dealing with multiple CPUs isn't a new idea. They've been around for years and have been studied quite extensively. There's no general consensus, however, on how to program multiple processors. There are two general methods the programmer can use: the first is threaded programming and the second is message passing. Both have their advantages and disadvantages, and the right choice depends on the application and target hardware.

Threads
The thread model is a way for a program to split itself into two or more concurrent tasks. These tasks can be run on a single processor in a time-shared mode or on separate processors (e.g., the two cores of a dual-core processor can each run a thread). The term thread comes from "thread of execution" and is similar to how a fabric (a computer program) can be pulled apart into threads (concurrent parts). In the cash register analogy, it would be similar to breaking your order into components and using separate cash registers. Threads are different from individual processes (or independent programs) because they inherit much of the state information and memory of the parent process.

On Linux and Unix systems, threads are often implemented using the POSIX thread library (pthreads). There are several other thread models (e.g., Windows threads) the programmer can choose; however, using a standards-based implementation like POSIX is highly recommended. As a low-level library, pthreads can be easily included in almost any programming application.

Threads provide the ability to share memory and offer very fine-grained synchronization with sibling threads. These low-level features can provide a very fast and flexible approach to parallel execution, but coding at the thread level isn't without its challenges: threaded applications require attention to detail and considerable amounts of extra code. Still, threaded applications are ideal for multi-core designs because the cores share local memory.

OpenMP
Because native thread programming can be cumbersome, a higher level of abstraction has been developed called OpenMP. As with all higher-level approaches, flexibility has been sacrificed for ease of coding. At its core OpenMP uses threads, but the details are hidden from the programmer. OpenMP is implemented as compiler directives: pragmas in C/C++ and specially formatted comments in Fortran. Typically, computationally heavy loops are augmented with OpenMP directives that the compiler uses to automatically "thread the loop." This approach has the distinct advantage of leaving the original program "untouched" (except for directives): a compiler without OpenMP support simply ignores the directives and produces the sequential (non-threaded) version.

There are several commercial and open source OpenMP compilers (C/C++, Fortran) available. Like pthreads, OpenMP is ideal for multi-core designs.

More Stories By Douglas Eadline

Dr. Douglas Eadline has over 25 years of experience in high-performance computing. You can contact him through Basement Supercomputing (http://basement-supercomputing.com).
