Preventing Performance Bottlenecks with Inline Deduplication

In a typical enterprise storage system the bottleneck to performance is in media bandwidth or computational overhead

Achieving high performance in enterprise storage is a constant battle to find and eliminate the next system bottleneck. Normally the bottleneck alternates between the limits of the underlying media and the computational overhead of metadata management, but choosing the wrong approach to deduplication can introduce a third performance challenge that can be impossible to overcome. Storage that takes a multi-pass approach to data optimization, such as staged or post-process deduplication, is inherently at a disadvantage in both computational and media overhead.

Common Performance Bottlenecks
In a typical enterprise storage system the bottleneck to performance is in one of two places: media bandwidth or computational overhead.

For the storage system designer, media overhead is the simplest to address: add more or faster media. In a hard disk-based storage system, this means adding faster drives, more drives, and larger drive sets. In a flash storage system, bandwidth is increased by using SLC flash, adding more independent modules, allocating more over-provisioned space, and improving the flash translation layer.

Reducing computational overhead is a greater challenge. Once you have enough media bandwidth the problem becomes shuffling data to and from the storage initiators. Identifying the bottlenecks to performance here can be devilishly complex, as the designer must be concerned about matters such as system memory bandwidth, number of data copies and, especially in today's multi-core world, synchronization between multiple requests. There's no silver bullet here, so the only solution is having a very talented team of software engineers designing and optimizing the storage platform.

Adding deduplication introduces complexity directly into this most challenging area for performance improvement. Any deduplication implementation must interact directly with the storage metadata that is so critical to performance, since I/O requests are redirected or eliminated based on the system's knowledge of duplicate data. Unless the deduplication technology has been designed and implemented to be inline, scalable across multiple cores, and low in memory overhead, system architects often try to split deduplication into a separate layer that makes a second pass through the data. This is a mistake that harms storage performance in a way that cannot be repaired.
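To make the interaction between deduplication and storage metadata concrete, here is a minimal Python sketch of an inline write path. It is an illustration under stated assumptions, not any vendor's implementation: the class name `InlineDedupStore`, the choice of SHA-256 fingerprints, and the in-memory dictionaries standing in for storage metadata are all hypothetical. Overwrites, reference counting, and space reclamation are deliberately left out.

```python
import hashlib


class InlineDedupStore:
    """Toy block store that deduplicates on the write path (illustrative only)."""

    def __init__(self):
        self.index = {}        # fingerprint -> physical block number (assumed metadata)
        self.media = []        # physical blocks; a list stands in for the media
        self.lba_map = {}      # logical block address -> physical block number
        self.media_writes = 0  # how many blocks actually reached the media

    def write(self, lba, data):
        # Fingerprint the block before it touches the media.
        fingerprint = hashlib.sha256(data).digest()
        pba = self.index.get(fingerprint)
        if pba is None:
            # New content: store it once and remember where it lives.
            pba = len(self.media)
            self.media.append(data)
            self.index[fingerprint] = pba
            self.media_writes += 1
        # Duplicate content: the write is eliminated and only metadata changes.
        self.lba_map[lba] = pba

    def read(self, lba):
        # Reads are redirected through the logical-to-physical map.
        return self.media[self.lba_map[lba]]
```

In this sketch a duplicate block costs one hash computation and one metadata update, and never generates a media write or a later read-back.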

The Impossible Challenge of Multi-Pass Deduplication
This second pass through the data commonly occurs in one of two places: on the final storage media or when transferring data from a staging area to the final storage media.

The first case is always called post-process deduplication. Data are written to their resting media location, and a separate process later reads them back, as time and bandwidth allow, to determine whether any portions are duplicates. If there are duplicates, the storage metadata is updated to note this and space is freed for reuse. I've written extensively in the past about the risks of post-process deduplication. Because it always requires additional media bandwidth and computational overhead, it severely harms performance, and because there are no guarantees about when deduplication will occur, it does not meet the requirements of high-change-rate use cases such as VDI.

[Figure: Post-process Deduplication]
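The post-process flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions rather than a description of any real product: the class name `PostProcessStore`, the SHA-256 fingerprints, and the I/O counters are hypothetical, and overwritten blocks are not reclaimed. The counters exist only to show where the extra media traffic comes from.

```python
import hashlib


class PostProcessStore:
    """Toy block store that deduplicates in a later pass (illustrative only)."""

    def __init__(self):
        self.media = {}       # physical block number -> data
        self.lba_map = {}     # logical block address -> physical block number
        self.next_pba = 0
        self.media_writes = 0
        self.media_reads = 0

    def write(self, lba, data):
        # Every block lands on the media, duplicate or not.
        pba = self.next_pba
        self.next_pba += 1
        self.media[pba] = data
        self.media_writes += 1
        self.lba_map[lba] = pba

    def dedup_pass(self):
        """Second pass: read blocks back, fingerprint them, free duplicates."""
        index = {}  # fingerprint -> physical block number that is kept
        freed = 0
        for lba, pba in sorted(self.lba_map.items()):
            data = self.media[pba]
            self.media_reads += 1          # extra media read caused by the pass
            fingerprint = hashlib.sha256(data).digest()
            keeper = index.setdefault(fingerprint, pba)
            if keeper != pba:
                self.lba_map[lba] = keeper  # metadata update
                del self.media[pba]         # space freed for reuse
                freed += 1
        return freed

    def read(self, lba):
        return self.media[self.lba_map[lba]]
```

Until `dedup_pass` runs, the media holds every duplicate in full, and the pass itself reads each written block back once in this model.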

The second case, where data are deduplicated as they are moved from a staging location to a final media location, is often erroneously called inline because it is inline with the destaging process, but it is really just a modified form of post-process deduplication. As with conventional post-process deduplication, another round of data reads and processing must occur. In addition, both the staging media and the final media must now provide the full system level of performance, or either one can become the bottleneck.

For example, some flash storage systems stage all data to a small arena of SLC flash prior to deduplication. This design doubles the number of possible performance bottlenecks in the architecture from two to four: front-end data ingestion, write performance of the staging area, performance of the final storage media, and the deduplication and destaging process itself. This sort of multi-pass deduplication process retains all of the negative performance aspects of a traditional post-process implementation.

[Figure: Staged Post-process Deduplication]
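A rough back-of-the-envelope comparison of the three approaches can be written as a small Python function. The model is a deliberate simplification and every name in it (`media_ops`, `total_ops`, the dictionary keys) is hypothetical: it counts only block-sized data transfers, ignores metadata I/O and caching, and assumes that a post-process or destaging pass reads every written block exactly once.

```python
def media_ops(total_blocks, unique_blocks):
    """Count block-sized media operations per approach (simplified model)."""
    if not 0 <= unique_blocks <= total_blocks:
        raise ValueError("unique_blocks must be between 0 and total_blocks")
    return {
        # Inline: only unique blocks ever reach the final media.
        "inline": {
            "staging_writes": 0,
            "staging_reads": 0,
            "final_writes": unique_blocks,
            "final_reads": 0,
        },
        # Post-process: everything is written, then read back for deduplication.
        "post_process": {
            "staging_writes": 0,
            "staging_reads": 0,
            "final_writes": total_blocks,
            "final_reads": total_blocks,
        },
        # Staged: everything is written to and read from the staging area,
        # then unique blocks are written to the final media.
        "staged": {
            "staging_writes": total_blocks,
            "staging_reads": total_blocks,
            "final_writes": unique_blocks,
            "final_reads": 0,
        },
    }


def total_ops(counts):
    """Sum all media operations for one approach."""
    return sum(counts.values())
```

For instance, with 1,000 incoming blocks of which 250 are unique, this model gives 250 media operations for inline, 2,000 for post-process, and 2,250 for staged deduplication. The absolute numbers matter less than the shape: under these assumptions, both multi-pass designs move every block across the media at least twice.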

High Performance Requires Inline Deduplication
Any form of multi-pass deduplication introduces new bottlenecks that prevent an enterprise storage system from delivering the highest levels of performance. Post-process deduplication, whether on the final media or during a destaging process, creates additional overhead in both media access and data processing. For flash storage platforms requiring the highest levels of performance, only tightly integrated inline deduplication can meet all system requirements.

More Stories By Jered Floyd

Jered Floyd, Chief Technology Officer and Founder of Permabit Technology Corporation, is responsible for exploring strategic future directions for Permabit’s products and providing thought leadership to guide the company’s data optimization initiatives. He previously deployed Permabit’s effective software development methodologies and was responsible for developing the core protocol and the initial server and system architectures of Permabit’s products.

Prior to Permabit, Floyd was a Research Scientist on the Microbial Engineering project at the MIT Artificial Intelligence Laboratory, working to bridge the gap between biological and computational systems. Earlier at Turbine, he developed a robust integration language for managing active objects in a massively distributed online virtual environment. Floyd holds Bachelor’s and Master’s degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.
