By Paul Bemowski
August 11, 2003 12:00 AM EDT
In an SMT (simultaneous multithreading) system, a single physical processor duplicates part of its on-chip architectural state so that the processor core can make greater use of its available resources. The second copy of the architectural state holds another thread context, which the processor can draw on to keep its resources busy when the active thread encounters some type of latency.
For example, when a processor encounters a cache miss, there is a slice of time that is normally wasted while the processor makes a long-latency read from main memory. In this brief slice of time, the vast majority of the processor's resources sit idle, while the processor reports itself as busy to the operating system. In an SMT system, the processor will use an on-board thread scheduler to immediately execute the second on-chip thread context's instructions, making use of otherwise wasted cycles.
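To make the latency-hiding argument concrete, here is a deliberately simplified model, offered as an illustration rather than a description of the hardware: each thread context computes for a fixed number of cycles and then stalls on a cache miss, and the core keeps running one context until it stalls, then switches to another ready context. Real HT hardware interleaves instructions from both contexts far more finely. The function `utilization` and its parameters are hypothetical.

```python
from typing import List, Optional


def utilization(contexts: int, compute: int, miss: int, cycles: int) -> float:
    """Fraction of cycles in which the core does useful work (toy model).

    Each thread context repeats: `compute` cycles of work, then a cache miss
    that stalls it for `miss` cycles while main memory is read.
    """
    work: List[int] = [compute] * contexts  # cycles left in each context's burst
    stall: List[int] = [0] * contexts       # cycles left waiting on main memory
    current = 0                             # context that issued most recently
    busy = 0
    for _ in range(cycles):
        # Keep running the current context if it is ready; otherwise switch
        # to the next ready context (if any) in round-robin order.
        chosen: Optional[int] = None
        if stall[current] == 0:
            chosen = current
        else:
            for step in range(1, contexts):
                candidate = (current + step) % contexts
                if stall[candidate] == 0:
                    chosen = candidate
                    break
        # Outstanding memory reads make progress whether or not the core is busy.
        for c in range(contexts):
            if stall[c] > 0:
                stall[c] -= 1
        if chosen is not None:
            busy += 1
            current = chosen
            work[chosen] -= 1
            if work[chosen] == 0:
                # Burst finished: the next access misses in cache.
                work[chosen] = compute
                stall[chosen] = miss
    return busy / cycles


if __name__ == "__main__":
    for n in (1, 2):
        print(n, "context(s):", utilization(n, compute=10, miss=30, cycles=4000))
```

With 10 cycles of work per 30-cycle miss, a single context keeps the core busy 25% of the time; a second context raises that to 50%, because one thread's miss is overlapped with the other thread's work.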
SMT does incur some overhead. When two threads contend for the same processor resources, the on-chip thread scheduler must interleave them, so in certain situations a non-HT processor will outperform an HT processor. The net effect, however, is an overall improvement in performance for multithreaded applications running on HT-enabled systems.
From a hardware perspective, three subsystems must work together to enable HT: the processor, the chipset, and the BIOS.
Currently, all new members of Intel's Xeon processor family support HT. Xeon here is not to be confused with the Pentium III Xeon: when Intel moved the Xeon's architecture to the P4 core, it dropped the Pentium designation and called the new processors simply Xeon.
Xeons currently come in three flavors: Xeon, Xeon DP, and Xeon MP. All recent versions of these processors support HT, but some older Xeon and Xeon DP processors, commonly identifiable by a smaller 256KB L2 cache, do not. If you are purchasing a used Xeon system or used Xeon processors, be sure to confirm that they support HT.
In early 2003, Intel released the 3.06GHz P4 on 0.13 micron technology. This new P4 supports HT, and signals the introduction of HT to desktop systems. Look for Intel to continue to support HT on all of its subsequent P4 releases.
HT also requires chipset and BIOS support, and most of Intel's newer chipsets support it. The following link presents a table of Intel's current server/workstation chipset offerings; the last row in the table indicates whether each chipset supports HT technology.
The Basic Input/Output System, or BIOS, allows a user to set parameters affecting system hardware, before the system boots to an operating system. As such, the BIOS is generally tightly coupled to the chipset on which it is installed. In a BIOS that supports HT, the user will have an option to enable/disable HT support on the processor/chipset. With HT enabled on the system, the BIOS presents each physical processor to the operating system as a pair of logical processors. From that point, it is the responsibility of the operating system to make intelligent use of the additional hardware resources.
Linux Support for Hyper-Threading
Given a processor/chipset/BIOS combination that supports HT, the operating system also needs to support the feature. SMT introduces many nuances that affect thread scheduler performance. The first Linux kernel with explicit support for HT was 2.4.18. Since then the 2.5.x kernel's thread scheduler has incorporated numerous enhancements that will further increase performance on HT-enabled systems.
Next, we'll look at HT support in the 2.4 and 2.5 (2.6) series kernels.
Hyper-Threading in the 2.4.18+ Linux Kernel
The current stable Linux kernel branch is 2.4.x, initially released in January 2001. The 2.4 kernel has since undergone extensive patching, initially for critical bug fixes, later for feature enhancements and support for new hardware.
Because the BIOS will present even a single HT-enabled processor to the OS as two logical processors, all HT configurations should use SMP (Symmetric Multi-Processing) kernels. Pre-2.4.18 SMP kernels may recognize two processors in an HT configuration; however, the scheduler is completely unaware of the logical/physical processor differentiation. The 2.4.18 patch release added some features to the stock scheduler to make it behave better with HT hardware. A 2.4.18+ kernel is strongly recommended for HT configurations.
Enabling Hyper-Threading in a 2.4 system
Given an HT-enabled hardware configuration, use the following steps to enable HT in a 2.4 kernel:
1. First, confirm that your kernel is version 2.4.18 or later, with SMP support. There are many ways to do this; the easiest is to run the "uname -a" command in a shell. For Red Hat users, Red Hat 7.3 was the first distribution release to support HT, incorporating a 2.4.18 kernel. If you are using another distribution, check the kernel version before attempting to use HT.
2. Next, modify your bootloader configuration (GRUB or LILO), adding the following parameter to any other boot parameters currently necessary for your system:
It is wise to add this as a separate boot entry so that you can boot either an HT or a non-HT configuration. (To create an explicitly non-HT configuration, add the 'noht' boot flag.)
3. Finally, reboot the system. As it starts up, enter the BIOS setup program. Under the processor options you will be able to enable or disable HT. Enable HT, then boot the 2.4.18 or later SMP kernel with the additional boot parameter.
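The check in step 1 can also be scripted. The following is a minimal sketch built on a hypothetical helper, `ht_kernel_ok`, not part of any distribution's tooling: it applies the two criteria above (version 2.4.18 or later, SMP build) to the release and version strings that `uname -r` and `uname -v` print, read here through Python's standard `platform` module. It assumes an SMP build includes "SMP" in its version string, and it says nothing about whether HT is enabled in the BIOS.

```python
import platform
import re


def ht_kernel_ok(release: str, version: str) -> bool:
    """Heuristic: is this a 2.4.18-or-later SMP kernel?

    `release` is what `uname -r` prints (e.g. "2.4.18-3smp"); `version` is
    what `uname -v` prints, which contains "SMP" for SMP builds.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        return False
    numbers = tuple(int(part) for part in match.groups())
    return numbers >= (2, 4, 18) and "SMP" in version


if __name__ == "__main__":
    # On Linux these return the running kernel's release and version strings.
    print(ht_kernel_ok(platform.release(), platform.version()))
```

Comparing the parsed numbers as a tuple avoids the classic string-comparison trap, where "2.4.9" would sort after "2.4.18".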
Once you have successfully booted the HT configuration, run top. If HT is properly configured, you should see twice as many CPU states as you have physical processors (two logical CPUs per physical CPU).
Figure 2 is an example of top running on a Red Hat 7.3 system (2.4.18) with two physical Xeons and HT fully enabled. Note the CPU states 0 through 3, indicating the four logical processors.
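Besides top, /proc/cpuinfo can confirm the logical/physical split. The sketch below counts "processor" entries (logical CPUs) and distinct "physical id" values (physical packages). It assumes the kernel reports a "physical id" field, as HT-aware kernels do; whether a particular 2.4.18 build prints it is an assumption, so the function returns None for the physical count when the field is absent. The name `count_cpus` is hypothetical.

```python
import os
from typing import Optional, Set, Tuple


def count_cpus(cpuinfo: str) -> Tuple[int, Optional[int]]:
    """Return (logical CPUs, physical packages) parsed from /proc/cpuinfo text.

    The physical count is None when the kernel does not report "physical id".
    """
    logical = 0
    packages: Set[str] = set()
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value.strip())
    return logical, (len(packages) if packages else None)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        logical, physical = count_cpus(f.read())
    print("logical:", logical, "physical:", physical)
```

On a two-Xeon system with HT enabled, as in Figure 2, you would expect four logical CPUs spread across two physical IDs.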
Hyper-Threading and the 2.4.18+ Thread Scheduler
Multithreaded benchmarks under the 2.4 kernel series still show wide scatter in the data, because in many situations the scheduler cannot make intelligent choices about logical versus physical processors. Under some conditions, 2.4 will still schedule two active threads on the same physical CPU, degrading performance. Because this placement is often effectively random, results from multithreaded benchmarks can vary considerably from run to run. "Full" HT scheduler support was not incorporated into the kernel until 2.5.32.
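The size of that scatter can be estimated under a crude assumption. The real 2.4 scheduler is not literally random; the sketch simply supposes that a scheduler blind to the logical/physical distinction places two active threads on two distinct logical CPUs chosen uniformly at random. The function name and the CPU numbering are hypothetical, since real sibling numbering varies by system.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict


def same_package_probability(topology: Dict[int, int]) -> Fraction:
    """Chance that two threads placed on two distinct logical CPUs, chosen
    uniformly at random, end up sharing one physical package.

    `topology` maps each logical CPU number to its physical package.
    """
    pairs = list(combinations(sorted(topology), 2))
    shared = sum(1 for a, b in pairs if topology[a] == topology[b])
    return Fraction(shared, len(pairs))


# Hypothetical numbering: logical CPUs 0-1 on package 0, 2-3 on package 1.
TWO_XEONS_HT = {0: 0, 1: 0, 2: 1, 3: 1}
```

For two HT-enabled Xeons there are six possible pairs of logical CPUs, two of which share a package, so roughly one run in three would land both threads on the same physical CPU while the other sits idle; that mix of good and bad runs is the kind of scatter described above.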
Hyper-Threading in the 2.5.x Linux Kernel
As is standard in Linux kernel versioning, the 2.5.x versions of the kernel are the development branch that will become the 2.6.x stable releases. The 2.5.x kernel added a number of features to its thread scheduler that should extend the performance improvements of HT even further.
2.5.x Thread Scheduler Improvements
A scheduler patch in 2.5.32 introduced the concept of a shared runqueue: logical CPUs that share resources such as cache also share a single scheduler runqueue, rather than each having its own. The shared runqueue may have many applications, but the initial implementation was created specifically with HT in mind. This new concept optimizes the kernel thread scheduler for HT in the following ways:
- HT-aware passive load balancing: This feature addresses the physical CPU imbalance problem - one physical CPU may be running two active threads, while a second physical CPU sits idle. Passive load balancing will attempt to schedule new active threads on an idle physical processor.
- HT-aware active load balancing: Active load balancing also addresses the physical CPU imbalance problem, but for threads that are already running. If three threads are running on three logical CPUs and one thread goes idle, freeing a physical processor, the scheduler will migrate an active thread from the physical processor running two threads to the physical processor running none.
- Thread affinity: Thread affinity is important in SMP as well as SMT systems. Processors use cache memory to hold the data and instructions they are currently using, so keeping a thread scheduled on the same processor greatly increases cache efficiency. Moving a thread between physical processors forces the new processor to repopulate its cache from main memory, degrading performance.
In an SMT system, because the logical processors share cache, the thread scheduler need only attempt to keep threads attached to a physical processor. The scheduler is free to move threads between adjacent logical processors with no performance degradation due to a stale cache.
- HT-aware task pickup: This allows the scheduler to pick up tasks on a per-physical-CPU basis, rather than a per-logical-CPU basis. Task pickup is related to thread affinity above.
- HT-aware wakeup: When a thread is woken up on an active logical processor whose sibling is idle, this feature wakes it up on the idle sibling instead. (As you might imagine, sibling processors are the logical processors that share one physical processor.)
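The passive and active load-balancing ideas above can be illustrated with a toy model. This is not the kernel's code or data structures: busy/idle state is a simple dictionary, the topology is hypothetical (real sibling numbering varies by BIOS and kernel), and the names `pick_cpu` and `rebalance` are made up for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical topology: logical CPU -> physical package (two Xeons with HT).
TOPOLOGY: Dict[int, int] = {0: 0, 1: 0, 2: 1, 3: 1}


def package_load(busy: Dict[int, bool], topology: Dict[int, int]) -> Dict[int, int]:
    """Number of busy logical CPUs on each physical package."""
    load: Dict[int, int] = {}
    for cpu, pkg in topology.items():
        load[pkg] = load.get(pkg, 0) + int(busy[cpu])
    return load


def pick_cpu(busy: Dict[int, bool], topology: Dict[int, int]) -> Optional[int]:
    """Passive balancing: place a new thread on an idle logical CPU,
    preferring one whose whole physical package is idle."""
    load = package_load(busy, topology)
    idle = [cpu for cpu in sorted(topology) if not busy[cpu]]
    if not idle:
        return None
    return min(idle, key=lambda cpu: (load[topology[cpu]], cpu))


def rebalance(busy: Dict[int, bool],
              topology: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Active balancing: if one package runs two threads while another runs
    none, return (source CPU, destination CPU) for a single migration."""
    load = package_load(busy, topology)
    crowded = [pkg for pkg in sorted(load) if load[pkg] >= 2]
    empty = [pkg for pkg in sorted(load) if load[pkg] == 0]
    if not crowded or not empty:
        return None
    src = min(cpu for cpu in topology if topology[cpu] == crowded[0] and busy[cpu])
    dst = min(cpu for cpu in topology if topology[cpu] == empty[0])
    return src, dst
```

With only CPU 0 busy, a scheduler unaware of HT might hand the next thread to CPU 1, its sibling; `pick_cpu` chooses CPU 2 on the idle package instead. In the active-balancing scenario, threads run on CPUs 0, 1, and 2 and the thread on CPU 2 goes idle; `rebalance` then proposes moving the thread on CPU 0 to CPU 2.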
These features work together in the 2.5.32+ kernel to make more efficient use of the hardware features of HT systems. In addition, the kernel performs more consistently because it continually makes optimal use of the processors. The 2.4.18 kernel still gains performance overall on an HT system; however, it does so less predictably.
Performance Gains Using Hyper-Threading
OK, you've built a Xeon-based HT system. What kind of performance improvement can be expected? Which applications will benefit from HT, and which will suffer?
Needless to say, HT is targeted at heavily threaded applications. Single-threaded, compute-intensive applications will see minimal performance gains. It should be noted, however, that nearly all modern desktop and server systems make extensive use of threads. Server applications generally process socket I/O on a thread-per-socket basis. Desktop applications under the X Window System will often be processing socket or disk I/O, X calls, and application code in parallel.
To date, performance benchmarks for HT systems have focused on server-side systems. This should not be surprising; Intel only recently released HT on a desktop-focused processor (the recent P4). A Web search will quickly find many papers from the past year detailing performance of HT systems.
A recent IBM white paper by Duc Vianney ran several benchmarks with and without HT enabled on 2.4 and 2.5 kernels. Vianney's work showed a slight performance degradation for single-threaded processes with HT enabled, but multithreaded workloads improved by approximately 30% on the 2.4.19 kernel. With the enhanced scheduler in the 2.5.32 kernel, the same benchmarks showed a 51% improvement.
Data from an upcoming Java Developer's Journal article exploring heavily threaded Java applications on HT systems indicated typical performance gains of 10-15%, with some tests indicating gains of up to 75% running Java 1.4 on a 2.4.18 HT system.
SMT is here to stay. As processors become more sophisticated, the raw speed of the processor will become even less of a factor in overall system performance due to added features like HT. Some have speculated that SMT and related technologies will spell the end of the megahertz wars.
As with any new hardware technology, software is catching up. Subsequent Linux kernel releases will make more sophisticated use of the available hardware features. Over time, Linux support for HT will mature, resulting in further performance gains.
The Linux community is waiting with bated breath for Linus and crew to tackle the final bugs in 2.5.x, and release the 2.6 Linux kernel. After a stabilization period (which could be significant), major distributions will migrate to the 2.6 kernel. All the while, HT-enabled hardware will be finding its way into enterprise server racks. When the 2.6-enabled distributions hit this hardware, server-side performance will measurably increase, with no hardware investment whatsoever.
Hyper-Threading technology specifically targets performance gains on heavily threaded applications. These applications are most commonly found in enterprise server platforms - application servers, Web servers, Web services platforms, and Java-based systems. Dell, HP (Compaq), and IBM are all putting forth powerful Xeon-based systems with 2-16 processors running Linux. If HT can improve performance by a conservative 25% in heavily threaded server applications, there's an even stronger case for Linux servers over major Unix platforms for data center use on a cost/performance basis.
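As a back-of-the-envelope illustration of that cost/performance argument, assume (as a simplification) that throughput scales linearly with the number of servers and that the 25% gain applies uniformly. A fixed workload that needs N non-HT servers would then need roughly:

```latex
N_{\mathrm{HT}} = \frac{N}{1 + 0.25} = 0.8\,N
```

That is, about 20% fewer servers for the same workload, with no change to the hardware beyond enabling HT.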
Hyper-Threading technology promises to make the Intel/Linux combination even more attractive to IT managers and systems architects looking to upgrade their enterprise software platforms.