Welcome!

Linux Containers Authors: Derek Weeks, Larry Alton, Cloud Best Practices Network, Elizabeth White, Carmen Gonzalez

Related Topics: Linux Containers

Linux Containers: Article

Hyper-Threading Linux

Hyper-Threading Linux

With the introduction of the Xeon, Xeon DP, and Xeon MP processors using the P4 core architecture, Intel has incorporated a new feature known as Hyper-Threading or HT. HT is Intel's implementation of a technology known as Simultaneous Multi-Threading, or SMT, that allows a single physical processor to execute multiple threads concurrently. This new feature has great potential in the heavily threaded back-end systems that Linux is targeting in the enterprise data center.

Understanding Hyper-Threading
In an SMT system, a single physical processor duplicates some of the on-chip architectural state, allowing the processor core to make greater use of available resources. The second architectural state holds another thread context, allowing the processor to more completely use its resources when an active thread encounters some type of latency.

For example, when a processor encounters a cache miss, there is a slice of time that is normally wasted while the processor makes a long-latency read from main memory. In this brief slice of time, the vast majority of the processor's resources sit idle, while the processor reports itself as busy to the operating system. In an SMT system, the processor will use an on-board thread scheduler to immediately execute the second on-chip thread context's instructions, making use of otherwise wasted cycles.

Figure 1 illustrates the basic architecture of an SMT processor. Most of the processor's resources, such as the cache and the computational units, are shared between the two on-chip thread contexts.

SMT does incur some overhead. When two threads contend for the same processor resources, it is the responsibility of the on-chip thread scheduler to interleave the two active threads. For this reason, in certain situations a non-HT processor will outperform an HT processor. The net effect however is an overall improvement in performance for multi-threaded applications running on HT-enabled systems.

HT-Enabled Systems
From a hardware perspective, three subsystems must work together to enable HT: the processor, the chipset, and the BIOS.

Processor
Currently, all members of Intel's Xeon processor family support HT. Xeon here is not to be confused with PIII Xeon. When Intel converted the Xeon's architecture to the P4 core, it dropped the Pentium designation, calling the new processors simply Xeon.

Xeons currently come in three flavors: Xeon, Xeon DP, and Xeon MP. All recent versions of these processors will support HT. Some older Xeon and Xeon DP processors, commonly characterized by a smaller 256 Kb L2 cache, do not support HT. If you are purchasing a used Xeon system or used Xeon processors, be sure to confirm that they support HT.

In early 2003, Intel released the 3.06GHz P4 on 0.13 micron technology. This new P4 supports HT, and signals the introduction of HT to desktop systems. Look for Intel to continue to support HT on all of its subsequent P4 releases.

Chipset/BIOS
HT requires chipset and BIOS support. Most of Intel's newer chipsets are supporting HT. The following link presents a table of Intel's current server/workstation chipset offerings. The last row in the table indicates whether the chipset supports HT technology.

www.intel.com/design/chipsets/ linecard/svr_wkstn.htm

The Basic Input/Output System, or BIOS, allows a user to set parameters affecting system hardware, before the system boots to an operating system. As such, the BIOS is generally tightly coupled to the chipset on which it is installed. In a BIOS that supports HT, the user will have an option to enable/disable HT support on the processor/chipset. With HT enabled on the system, the BIOS presents each physical processor to the operating system as a pair of logical processors. From that point, it is the responsibility of the operating system to make intelligent use of the additional hardware resources.

Linux Support for Hyper-Threading
Given a processor/chipset/BIOS combination that supports HT, the operating system also needs to support the feature. SMT introduces many nuances that affect thread scheduler performance. The first Linux kernel with explicit support for HT was 2.4.18. Since then the 2.5.x kernel's thread scheduler has incorporated numerous enhancements that will further increase performance on HT-enabled systems.

Next, we'll look at HT support in the 2.4 and 2.5(2.6) series kernels.

Hyper-Threading in the 2.4.18+Linux Kernel
The current stable Linux kernel branch is 2.4.x, initially released in January 2001. The 2.4 kernel has since undergone extensive patching, initially for critical bug fixes, later for feature enhancements and support for new hardware.

Because the BIOS will present even a single HT-enabled processor to the OS as two logical processors, all HT configurations should use SMP (Symmetric Multi-Processing) kernels. Pre-2.4.18 SMP kernels may recognize two processors in an HT configuration; however, the scheduler is completely unaware of the logical/physical processor differentiation. The 2.4.18 patch release added some features to the stock scheduler to make it behave better with HT hardware. A 2.4.18+ kernel is strongly recommended for HT configurations.

Enabling Hyper-Threading in a 2.4 system
Given an HT-enabled hardware configuration, use the following steps to enable HT in a 2.4 kernel:
1.   First, confirm that your kernel is version 2.4.18 or later, with SMP support. There are many ways to do this, the easiest is to execute the "uname -a" command in a shell. For Red Hat users, Red Hat 7.3 was the first distribution release to support HT, incorporating a 2.4.18 kernel. If you are using another distribution, check the kernel version before attempting to use HT.
2.   Next, modify your bootloader (grub or lilo), adding the following parameter to any other boot parameters currently necessary for your system:

acpismp=force

It would be wise to add this as a different boot configuration so that you can boot HT or non-HT. (To create an explicitly non-HT configuration, add the 'noht' boot flag.)
3.   Finally, reboot the system. Before it restarts, enter the BIOS setup program. Under the processor options you will be able to enable or disable HT. Enable HT, and boot to the 2.4.18 or later SMP kernel with the additional parameters.

Once you have successfully booted the HT configuration, run top. If HT is properly configured, you should see twice as many CPU states as you have physical processors (two virtual CPUs per physical CPU).

Figure 2 is an example of top running on a Red Hat 7.3 system (2.4.18) with two physical Xeons and HT fully enabled. Note the CPU states 0-4, indicating the four logical processors.

Hyper-Threading on 2.4.18+Thread Scheduler
Performance testing multithreaded benchmarks under the 2.4 kernel series still shows some wide scatter in the data. This is because the scheduler still cannot make intelligent choices regarding logical/physical processors in many situations. Under some conditions, 2.4 will still schedule two active threads on the same physical CPU, causing performance degradation. This condition is often random, causing data points from multithreaded benchmarks to vary considerably. "Full" HT scheduler support was not incorporated into the kernel until 2.5.32.

Hyper-Threading in the 2.5.xLinux Kernel
As is standard in Linux kernel versioning, the 2.5.x versions of the kernel are the development branch that will become the 2.6.x stable releases. The 2.5.x kernel added a number of features to its thread scheduler that should extend the performance improvements of HT even further.

2.5.x Thread Scheduler Improvements
A scheduler patch in 2.5.32 introduced the concept of a shared runqueue. The shared runqueue allows two (logical) CPUs, which share resources like cache, to have a scheduler parallel known as a shared runqueue. The shared runqueue may have many applications, but the initial implementation was created specifically with HT in mind. This new concept optimizes the kernel thread scheduler for HT in the following ways:

  • HT-aware passive load balancing: This feature addresses the physical CPU imbalance problem - one physical CPU may be running two active threads, while a second physical CPU sits idle. Passive load balancing will attempt to schedule new active threads on an idle physical processor.
  • HT-aware active load balancing: Active load balancing also addresses the physical CPU imbalance problem, this time for currently active threads. If three threads are running on three logical CPUs, and one thread goes idle freeing a physical processor, the scheduler will migrate an active thread from the physical processor running two threads to a physical processor running none.
  • Thread affinity: Thread affinity is important in SMP as well as SMT systems. Processors use cache memory to hold data and instructions that the processor is using at the moment. By attempting to keep threads scheduled on the same processor, the efficiency of the cache is greatly increased. Moving a thread between physical processors requires the processor to repopulate its cache from main memory, causing performance degradation.

In an SMT system, because the logical processors share cache, the thread scheduler need only attempt to keep threads attached to a physical processor. The scheduler is free to move threads between adjacent logical processors with no performance degradation due to a stale cache.

  • HT-aware task pickup: This will allow the scheduler to pick up tasks on a per-physical CPU basis, rather than per-logical CPU basis. Task pickup is related to thread affinity above.
  • HT-aware wakeup: This allows threads that were woken up on active logical processors with an idle sibling to be woken up on the sibling processor. (As you might imagine, sibling processors are adjacent logical processors.)

These features work together in the 2.5.32+ kernel to make more efficient use of the new hardware features of HT systems. In addition, the kernel performs in a more consistent manner by continually making optimal use of the processors. The 2.4.18 kernel still performs better as a whole on an HT system, however, it does so in a less predictable manner.

Performance Gains Using Hyper-Threading
OK, you've built a Xeon-based HT system. What kind of performance improvement can be expected? Which applications will benefit from HT, and which will suffer?

Needless to say, HT is targeted at heavily threaded applications. Single-threaded, compute-intensive applications will see minimal performance enhancements. It should be noted, however, that nearly all modern desktop and server systems make extensive use of threads. Server applications generally process socket IO on a thread-per-socket basis. Desktop applications under X Windows will often be processing socket or disk io, X calls, and the application code in parallel.

To date, performance benchmarks for HT systems have focused on server-side systems. This should not be surprising; Intel only recently released HT on a desktop-focused processor (the recent P4). A Web search will quickly find many papers from the past year detailing performance of HT systems.

A recent IBM white paper by Duc Vianney ran several benchmarks both with and without HT enabled on 2.4 and 2.5 kernels. Vianney's work showed a slight performance degradation of single-threaded processes with HT enabled, but performance improvement for the 2.4.19 kernel was approximately 30%. With the enhanced scheduler in the 2.5.32 kernel, the same benchmarks showed a 51% improvement.

Data from an upcoming Java Developer's Journal article exploring heavily threaded Java applications on HT systems indicated typical performance gains of 10-15%, with some tests indicating gains of up to 75% running Java 1.4 on a 2.4.18 HT system.

Summary
SMT is here to stay. As processors become more sophisticated, the raw speed of the processor will become even less of a factor in overall system performance due to added features like HT. Some have speculated that SMT and related technologies will spell the end of the megahertz wars.

As with any new hardware technology, software is catching up. Subsequent Linux kernel releases will make more sophisticated use of the available hardware features. Over time, Linux support for HT will mature, resulting in further performance gains.

The Linux community is waiting with bated breath for Linus and crew to tackle the final bugs in 2.5.x, and release the 2.6 Linux kernel. After a stabilization period (which could be significant), major distributions will migrate to the 2.6 kernel. All the while, HT-enabled hardware will be finding its way into enterprise server racks. When the 2.6-enabled distributions hit this hardware, server-side performance will measurably increase, with no hardware investment whatsoever.

Hyper-Threading technology specifically targets performance gains on heavily threaded applications. These applications are most commonly found in enterprise server platforms - application servers, Web servers, Web services platforms, and Java-based systems. Dell, HP (Compaq), and IBM are all putting forth powerful Xeon-based systems with 2-16 processors running Linux. If HT can improve performance by a conservative 25% in heavily threaded server applications, there's an even stronger case for Linux servers over major Unix platforms for data center use on a cost/performance basis.

Hyper-Threading technology promises to make the Intel/Linux combination even more attractive to IT managers and systems architects looking to upgrade their enterprise software platforms.

More Stories By Paul Bemowski

Paul Bemowski is an independent consultant, focusing on Java and
Linux solutions to enterprise computing problems.
email: [email protected]
url: http://www.jetools.com

Comments (1) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
tcx 12/05/03 07:23:10 AM EST

very useful and detailed information.
while there are already "minor" support for HT in Linux, the next Kernel generation will enhance it very much!

for details search g**gle.com for
" hyperthreading site:kernel.org "
and you'll get patch information about changes ou should do to your HT linux system

@ThingsExpo Stories
IoT is at the core or many Digital Transformation initiatives with the goal of re-inventing a company's business model. We all agree that collecting relevant IoT data will result in massive amounts of data needing to be stored. However, with the rapid development of IoT devices and ongoing business model transformation, we are not able to predict the volume and growth of IoT data. And with the lack of IoT history, traditional methods of IT and infrastructure planning based on the past do not app...
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place June 6-8, 2017, at the Javits Center in New York City, New York, is co-located with 20th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry p...
"LinearHub provides smart video conferencing, which is the Roundee service, and we archive all the video conferences and we also provide the transcript," stated Sunghyuk Kim, CEO of LinearHub, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
Internet of @ThingsExpo, taking place June 6-8, 2017 at the Javits Center in New York City, New York, is co-located with the 20th International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. @ThingsExpo New York Call for Papers is now open.
"There's a growing demand from users for things to be faster. When you think about all the transactions or interactions users will have with your product and everything that is between those transactions and interactions - what drives us at Catchpoint Systems is the idea to measure that and to analyze it," explained Leo Vasiliou, Director of Web Performance Engineering at Catchpoint Systems, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York Ci...
The 20th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held June 6-8, 2017, at the Javits Center in New York City, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Containers, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal ...
WebRTC is the future of browser-to-browser communications, and continues to make inroads into the traditional, difficult, plug-in web communications world. The 6th WebRTC Summit continues our tradition of delivering the latest and greatest presentations within the world of WebRTC. Topics include voice calling, video chat, P2P file sharing, and use cases that have already leveraged the power and convenience of WebRTC.
20th Cloud Expo, taking place June 6-8, 2017, at the Javits Center in New York City, NY, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy.
Discover top technologies and tools all under one roof at April 24–28, 2017, at the Westin San Diego in San Diego, CA. Explore the Mobile Dev + Test and IoT Dev + Test Expo and enjoy all of these unique opportunities: The latest solutions, technologies, and tools in mobile or IoT software development and testing. Meet one-on-one with representatives from some of today's most innovative organizations
DevOps is being widely accepted (if not fully adopted) as essential in enterprise IT. But as Enterprise DevOps gains maturity, expands scope, and increases velocity, the need for data-driven decisions across teams becomes more acute. DevOps teams in any modern business must wrangle the ‘digital exhaust’ from the delivery toolchain, "pervasive" and "cognitive" computing, APIs and services, mobile devices and applications, the Internet of Things, and now even blockchain. In this power panel at @...
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at Cloud Expo, Ed Featherston, a director and senior enterprise architect at Collaborative Consulting, discussed the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
"A lot of times people will come to us and have a very diverse set of requirements or very customized need and we'll help them to implement it in a fashion that you can't just buy off of the shelf," explained Nick Rose, CTO of Enzu, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
The WebRTC Summit New York, to be held June 6-8, 2017, at the Javits Center in New York City, NY, announces that its Call for Papers is now open. Topics include all aspects of improving IT delivery by eliminating waste through automated business models leveraging cloud technologies. WebRTC Summit is co-located with 20th International Cloud Expo and @ThingsExpo. WebRTC is the future of browser-to-browser communications, and continues to make inroads into the traditional, difficult, plug-in web co...
Buzzword alert: Microservices and IoT at a DevOps conference? What could possibly go wrong? In this Power Panel at DevOps Summit, moderated by Jason Bloomberg, the leading expert on architecting agility for the enterprise and president of Intellyx, panelists peeled away the buzz and discuss the important architectural principles behind implementing IoT solutions for the enterprise. As remote IoT devices and sensors become increasingly intelligent, they become part of our distributed cloud enviro...
In 2014, Amazon announced a new form of compute called Lambda. We didn't know it at the time, but this represented a fundamental shift in what we expect from cloud computing. Now, all of the major cloud computing vendors want to take part in this disruptive technology. In his session at 20th Cloud Expo, John Jelinek IV, a web developer at Linux Academy, will discuss why major players like AWS, Microsoft Azure, IBM Bluemix, and Google Cloud Platform are all trying to sidestep VMs and containers...
SYS-CON Events announced today that MobiDev, a client-oriented software development company, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place June 6-8, 2017, at the Javits Center in New York City, NY, and the 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MobiDev is a software company that develops and delivers turn-key mobile apps, websites, web services, and complex softw...
WebRTC is about the data channel as much as about video and audio conferencing. However, basically all commercial WebRTC applications have been built with a focus on audio and video. The handling of “data” has been limited to text chat and file download – all other data sharing seems to end with screensharing. What is holding back a more intensive use of peer-to-peer data? In her session at @ThingsExpo, Dr Silvia Pfeiffer, WebRTC Applications Team Lead at National ICT Australia, looked at differ...
Fact is, enterprises have significant legacy voice infrastructure that’s costly to replace with pure IP solutions. How can we bring this analog infrastructure into our shiny new cloud applications? There are proven methods to bind both legacy voice applications and traditional PSTN audio into cloud-based applications and services at a carrier scale. Some of the most successful implementations leverage WebRTC, WebSockets, SIP and other open source technologies. In his session at @ThingsExpo, Da...
Growth hacking is common for startups to make unheard-of progress in building their business. Career Hacks can help Geek Girls and those who support them (yes, that's you too, Dad!) to excel in this typically male-dominated world. Get ready to learn the facts: Is there a bias against women in the tech / developer communities? Why are women 50% of the workforce, but hold only 24% of the STEM or IT positions? Some beginnings of what to do about it! In her Day 2 Keynote at 17th Cloud Expo, Sandy Ca...
SYS-CON Media announced today that @WebRTCSummit Blog, the largest WebRTC resource in the world, has been launched. @WebRTCSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. @WebRTCSummit Blog can be bookmarked ▸ Here @WebRTCSummit conference site can be bookmarked ▸ Here