By Hovhannes Avoyan
July 28, 2011 10:15 AM EDT
Before reading this article, I strongly suggest getting familiar with the concepts covered in the previous article on IO tuning.
Your IO please, sir
How is your IO characterized? Yes, this question has to be asked yet again. It makes a big difference whether you are tuning for random-access reads or for sequential reads.
How does your application behave in that respect? You should know better than I do.
My main approach when optimizing for read IO is to access the disks as little as possible. Disks are slow, really slow, compared to CPU and RAM. If we can avoid them, we do.
Write some proper code
Consider a use case: reading large chunks from your HDs and serving them over the network to many users.
Obviously, reading the file on demand will save memory; however, it will increase your latency and cause a bad user experience. The secret here is to:
- Prefetch some more
- Try to predict what should be prefetched
Prefetching (or buffering) the actual data you are going to serve can offload your HDs by reading the data just once and serving it multiple times.
However, when your IO is random and you can’t really predict what’s going to be fetched, you can still statistically improve your cache hit rate by buffering as much data as you can in memory.
Unfortunately, for random IO, at the end of the day you should expect your HDs to do real work, and you should expect the extra delay that comes with it.
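The prefetch-and-cache idea described above can be sketched roughly as follows. This is a minimal illustration, not a production cache: the class name `ChunkCache`, the chunk size, the read-ahead window and the LRU eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class ChunkCache:
    """Serve fixed-size file chunks from memory, reading ahead on a miss.

    Hypothetical helper for illustration; sizes and policy are assumptions.
    """

    def __init__(self, path, chunk_size=64 * 1024, max_chunks=256, read_ahead=4):
        self.path = path
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks
        self.read_ahead = read_ahead
        self.chunks = OrderedDict()  # chunk index -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def _load(self, first, count):
        # One sequential read covers the requested chunk plus the read-ahead window.
        with open(self.path, "rb") as f:
            f.seek(first * self.chunk_size)
            data = f.read(count * self.chunk_size)
        for i in range(count):
            piece = data[i * self.chunk_size:(i + 1) * self.chunk_size]
            if not piece:
                break  # reached end of file
            self.chunks[first + i] = piece
            self.chunks.move_to_end(first + i)
        while len(self.chunks) > self.max_chunks:
            self.chunks.popitem(last=False)  # evict the least recently used chunk
        return data[:self.chunk_size]

    def get(self, index):
        """Return chunk `index`, touching the disk only on a cache miss."""
        if index in self.chunks:
            self.hits += 1
            self.chunks.move_to_end(index)
            return self.chunks[index]
        self.misses += 1
        return self._load(index, 1 + self.read_ahead)
```

With sequential access, most calls to `get` are served from memory because the previous miss already pulled in the following chunks. With random access the read-ahead helps much less, and the hit rate depends mostly on how large `max_chunks` can be made.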
Ah, and of course: if the “disk grinding” application is actually a database, then avoiding full table scans and adding proper indexes where needed are the keys to better performance.
Linux to the rescue?
I talked about IO schedulers in the previous article. Together with filesystems, they affect your performance not only when writing but also when reading.
What I do want to mention here is that Linux does a pretty good job with its buffer cache. Ever wondered why you never seem to have any free memory under Linux? That is because Linux is doing an awesome job for you, caching disk IO without even asking. So if you didn’t implement proper caches in your application, Linux might take care of it for you.
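To see how much of that "missing" free memory is really disk cache, one option is to read `/proc/meminfo`. The sketch below assumes a Linux system; the function names are made up for this example, while `MemTotal`, `MemFree`, `Buffers` and `Cached` are field names the kernel reports in that file.

```python
def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo as a dict (most are in kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def cache_summary(info):
    """Pick out total, free, page cache and buffer sizes (in kB)."""
    return {
        "total_kb": info.get("MemTotal", 0),
        "free_kb": info.get("MemFree", 0),
        "page_cache_kb": info.get("Cached", 0),
        "buffers_kb": info.get("Buffers", 0),
    }


def read_cache_summary(path="/proc/meminfo"):
    """Read the live values; only works where /proc/meminfo exists (Linux)."""
    with open(path) as f:
        return cache_summary(parse_meminfo(f.read()))
```

On a busy server, `page_cache_kb` is typically far larger than `free_kb`. That memory is handed back to applications when they need it, so a small free figure is not a problem in itself.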
TMPFS and RAMFS are, for some reason, only rarely used. These are in-memory filesystems. They can be great for caching files and making sure they stay in your RAM for ultra-fast access.
This is yet another countermeasure we can undertake in order to avoid the slowness of HDs.
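As a rough sketch of using an in-memory filesystem as a file cache, the helper below copies a file into a RAM-backed directory and returns the new path. It assumes `/dev/shm` is a tmpfs mount, which is common on Linux distributions but should be checked on your system; `stage_in_ram` is a hypothetical name.

```python
import os
import shutil


def stage_in_ram(src_path, ram_dir="/dev/shm"):
    """Copy src_path into ram_dir and return the path of the copy.

    ram_dir is assumed to be a tmpfs or ramfs mount point.
    """
    dst_path = os.path.join(ram_dir, os.path.basename(src_path))
    shutil.copyfile(src_path, dst_path)
    return dst_path
```

The application then opens the returned path instead of the original. Keep in mind that tmpfs pages can be swapped out under memory pressure, while ramfs pages cannot, and that everything stored this way is lost on reboot.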
Yet another tuneable
For some odd reason, the ‘noatime’ and ‘nodiratime’ mount options are not on by default on Linux filesystems.
You should switch them on. Without them, accessing a file or directory can trigger a filesystem metadata write to update its access time. Yes, that applies to reads too. I disabled access-time updates even on my humble netbook (for battery-saving reasons, so the HD spins less), so you should certainly disable them on your server.
I encourage you to do it now – also on your desktop:
# vim /etc/fstab
Just add ‘noatime’ and ‘nodiratime’ to the mount options of every non-swap filesystem.
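To check which entries still need the change, a small script can scan the file. This is a sketch under the assumption that `/etc/fstab` uses the standard whitespace-separated layout (device, mount point, type, options, dump, pass); the device names in the sample are hypothetical.

```python
# Hypothetical fstab content, used only to illustrate the expected layout.
SAMPLE_FSTAB = """\
# <device>  <mount point>  <type>  <options>                    <dump> <pass>
/dev/sda1   /              ext4    defaults                     0      1
/dev/sda2   /home          ext4    defaults,noatime,nodiratime  0      2
/dev/sda3   none           swap    sw                           0      0
"""


def missing_noatime(fstab_text):
    """Return mount points of non-swap filesystems mounted without noatime."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, fs_type, options = fields[1], fields[2], fields[3].split(",")
        if fs_type == "swap":
            continue
        if "noatime" not in options:
            missing.append(mount_point)
    return missing
```

For the sample above, only the root filesystem is reported. To run it against a real system, pass in the contents of `/etc/fstab`.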
If you have a lot of memory in your system – then use it.
Many times I’ve seen people struggle with a 10GB database that serves heavy reads, heavy writes, or both at once.
The simple solution is to keep the whole DB in memory on a proper machine with more than 10GB of memory. Trust me: no matter what HD setup you have underneath, if 100% of your database is in memory, the HDs will not slow your reads down.
Consider it for databases larger than 10GB as well. In my own work I’ve seen what happened with a fairly optimized 100GB DB. It had a gazillion (OK, maybe I’m exaggerating here) updates and queries per minute. Jumping from 8GB of RAM to 64GB made the difference between a system that didn’t work and a system that actually copes with the load.
In most cases, if your system is that heavily loaded and you can justify these huge amounts of memory, you are probably also making enough money to afford it.
RAID1 simply mirrors your data over 2 (or more) HDs, usually to provide redundancy. Say one of your HDs is broken – your system can continue to operate from the other.
RAID1 is usually considered “wasteful” because you don’t get extra storage for every HD you add. However, with RAID1 and a proper RAID controller, you can get tremendous read performance boosts.
How come? Imagine you have 10 read IO requests and one HD to serve them. They might get ordered sequentially and served optimally, but still only one HD will serve them.
Let’s imagine you now have 10 HDs in a RAID1 and the same 10 read IO requests. You now have 10 HDs to serve them! Theoretically speaking, it’ll be 10 times faster than one HD.
That holds only if you have a proper RAID controller. Some cheaper controllers won’t give you the desired behavior of reading from multiple disks at the same time.
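The arithmetic behind that theoretical speedup can be written down as a toy model. It assumes every read costs the same and that the controller spreads reads perfectly across mirrors, neither of which is guaranteed in practice; the 10 ms service time is an invented figure.

```python
import math


def batch_read_time_ms(requests, mirrors, service_time_ms):
    """Idealized time to serve a batch of equal-cost reads on a RAID1 array.

    Each mirror serves one request at a time, so the batch needs
    ceil(requests / mirrors) rounds of service_time_ms each.
    """
    rounds = math.ceil(requests / mirrors)
    return rounds * service_time_ms


# 10 reads at a hypothetical 10 ms each:
one_disk = batch_read_time_ms(10, 1, 10)    # 100 ms
ten_disks = batch_read_time_ms(10, 10, 10)  # 10 ms
four_disks = batch_read_time_ms(10, 4, 10)  # 30 ms (three rounds)
```

The model also shows why the gain is not always proportional: with 4 mirrors, 10 requests still need three rounds, so the batch is a bit over 3 times faster rather than 4 times.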
Be cautious when building a RAID1 configuration – the write speed of the array will be the write speed of the slowest HD in the array – it will decrease write performance noticeably if you have slower disks in the array.
Another word about RAID1 vs. RAID10 – RAID1 can be easily expanded – whenever you notice a performance problem – just chuck another HD in the RAID array and you’re good to go with some more performance!
Hands up, don’t move!
SSDs can yield a big performance boost if you care mainly about reading.
They are, however, 10 times more expensive, but also 10 or more times faster when it comes to random access of data.
How come? There’s nothing spinning in there and nothing is moving: all of your stored data is “at the same distance” from you. On traditional mechanical magnetic HDs, by contrast, you have to wait for a head to move to the correct location before any data can be retrieved.
It is a sort of last-resort optimization in my opinion, as it can easily double or triple your expenses.
One last dirty trick
Mechanical disks spin, some at 7.2K RPM, some at 10K and some even at 15K.
In addition to the spinning platters, they have a head that is positioned over the location of the desired data.
For a given HD, the angular velocity of the platter is constant, be it 7.2K, 10K or 15K RPM.
However, the further out the head sits on the platter, the faster the surface moves beneath it, because the radius is bigger while the angular velocity stays the same. This is simple physics.
For a more thorough explanation I suggest reading this brilliant article.
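The physics above boils down to v = ω · r. A small calculation makes the difference concrete; the inner and outer radii used here (20 mm and 45 mm) are assumed round numbers for illustration, not measurements of any particular drive.

```python
import math


def linear_velocity_m_s(rpm, radius_m):
    """Surface speed under the head: v = omega * r, with omega in rad/s."""
    omega = 2 * math.pi * rpm / 60.0
    return omega * radius_m


INNER_RADIUS_M = 0.020  # assumed innermost track radius
OUTER_RADIUS_M = 0.045  # assumed outermost track radius

v_inner = linear_velocity_m_s(7200, INNER_RADIUS_M)
v_outer = linear_velocity_m_s(7200, OUTER_RADIUS_M)
speed_ratio = v_outer / v_inner  # equals OUTER_RADIUS_M / INNER_RADIUS_M
```

The ratio depends only on the two radii, not on the RPM. Higher surface speed turns into higher throughput only when the outer tracks also store more data per revolution, which is the case on drives that use zoned recording.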
What does it mean for us? Well, in order to squeeze the best out of our HDs, we can use just the outer parts of their platters. Usually the outermost tracks of an HD are at the beginning of the disk as far as partition creation is concerned.
The bag of tools I published in the previous article is just as useful here.
Examining your application at the application level is highly recommended as well. Give yourself verbose logs and see where you hit the caches and where you miss them. Your task, obviously, is to reduce the cache misses to a bare minimum.
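One simple way to get that visibility is to count and log every cache lookup. The sketch below uses Python's standard `logging` module; the class name `CacheStats` and the log format are assumptions for the example.

```python
import logging


class CacheStats:
    """Count cache hits and misses and log each lookup at DEBUG level."""

    def __init__(self, name, logger=None):
        self.name = name
        self.log = logger or logging.getLogger(__name__)
        self.hits = 0
        self.misses = 0

    def record(self, key, hit):
        """Register one lookup; `hit` is True if the cache had the key."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        self.log.debug(
            "cache=%s key=%r result=%s", self.name, key, "hit" if hit else "miss"
        )

    def hit_ratio(self):
        """Fraction of lookups served from the cache (0.0 if none yet)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Calling `record` from the cache lookup path and reporting `hit_ratio()` periodically shows whether a tuning change actually reduced the misses.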
That’s all folks
Truly, these two IO tuning articles are not a mere grocery-store checklist that you can run through to gain an extra boost. They were meant to give you tools to think, plan, design and eventually carry out successful architectures. Good luck!