The kernel of pain

Let's call a spade a spade: For large servers, the 2.4 kernel has been a disaster.

(LinuxWorld) -- Let's start from the beginning. In July 2001, I was responsible for upgrading a customer's server from Red Hat 6.2 to Mandrake 8.0. The machine was built from scratch, and Mandrake was installed onto a freshly formatted RAID 5 array. We then migrated the Red Hat 6.2 applications to the new machine.

After a little configuration, the machine seemed to run fine. We successfully migrated the entire system in less than five hours. Considering this was a large-scale server, that was quite a feat and was certainly welcomed by our paying customer.

However, about a month into deployment, I started noticing strange problems with the machine. Intermittent lockups were the most common. The lockups appeared to be hard lockups, and the machine could not be recovered without a reboot.

While researching the problem, I learned there was a serious sync() bug in the 2.4 kernel. This bug affects all 2.4 kernel versions until 2.4.6. The solution seemed simple: upgrade the kernel.

About a week later, the machine locked up cold -- again. We considered it a fluke and rebooted. The very next day, the machine locked up -- again. We did further research and found that the original 2.4 VM (virtual memory) implementation was causing problems. In my frustration and embarrassment, I would be inclined to call it bad design, but I don't know enough about the intricacies of the Linux kernel to say whether it was.

The VM problem was so severe that the kernel team decided to rip out the older implementation and replace it with a completely new design. The problems continued as the kernel versions worked their way up through 2.4.11, which had a serious symlink bug that could lead to corrupted inodes. By 2.4.13, things finally seemed to be cleaned up a bit, and the kernel showed more stability. Then we hit kernel 2.4.15.

Linux version 2.4.15 contained a bug that was arguably worse than the VM bug. Essentially, if you unmounted a file system via reboot -- or any other common method -- you would get file system corruption. A fix, in the form of kernel 2.4.16, was released 24 hours later.

Kernel 2.4.16 now appeared to be the kernel of choice. It seemed possible that, after almost a year of "stable" status, the 2.4 kernel would finally be usable in a production environment.

We still aren't there yet

Alas, the mire of trouble within the 2.4 series kernels continues. As of kernel 2.4.16, there is a serious bug in the OOM (out-of-memory) handling that can cause system lockups. This lockup bug has supposedly been fixed in 2.4.17pre4aa1.

The current kernel release is 2.4.17, and one would hope that it is stable. However, a brief review of the changelog shows that the kernel team is still fine-tuning the new VM design, and the sheer number of changes already makes me wary of it.

As I reviewed the archives from late December, I found that the per-user limit support in the 2.4 series kernels is broken. With limit support broken, any user -- privileged or not -- can potentially consume all of the machine's resources, effectively mounting an internal DoS (denial-of-service) attack. A user could even do this accidentally, which would cause a great deal of grief for any system administrator.

So, what does all of this mean for me? It means that after five months of battling the new, better-than-fresh-butter, enterprise-ready 2.4 kernel, I am moving my customer back to the stodgy, conservative, more-enterprise-ready-than-2.4-has-been-since-its-release-almost-a-year-ago, 2.2 kernel-based Red Hat 6.2.

The 2.2 kernels may not handle large SMP machines as well, they may not handle large amounts of memory well (only 2 gigabytes), and they may have a practical limit of 2 gigabytes on a single file, but the 2.2 kernels don't crash or cause phone calls at 5:00 AM. Moreover, the 2.2 kernels don't make customers regret choosing Linux as their server solution.

What does this mean for you?

What does all of this mean for you? That is your decision. You just read mine.

I hope Red Hat, SuSE, and Mandrake are taking a long, hard look at the 2.4 process and formulating long-term plans to avoid problems like this. I know, for example, that Red Hat has its own stress testing for the kernel, and that the Red Hat-shipped kernel is a fork of the standard Linux kernel. This fork is a good thing, because it means that Red Hat is able to apply patches that, in theory, make its kernel more stable.

On the desktop where I am writing this article, I am running Red Hat 7.2 with the 2.4.9-enterprise kernel. (It's a long story that involves this machine's AMD Duron processor.) I have not had a single lockup with the Red Hat kernel since I upgraded to 2.4.9. Red Hat 7.2 seems reasonable and usable (at least as a desktop machine), but I am unsure whether any 2.4 kernel-based system would be considered acceptable in a production server environment today.

More Stories By Joshua Drake

Joshua Drake is the co-founder of Command Prompt, Inc., a PostgreSQL and Linux custom development company. He is also the current author of the Linux Networking HOWTO, Linux PPP HOWTO, and Linux Consultants HOWTO. His most demanding project at this time is a new PostgreSQL book for O'Reilly, 'Practical PostgreSQL'.
