It's Not Your Father's Backup Anymore

New technology transforms backup and restore

Backup is the most important method for protecting mission-critical data. Traditionally, a backup system meant a tape drive attached to a server or mainframe. Software on the server regularly dumped an image of the entire set of disks to the tape each night. If things went well, someone pulled the tape out in the morning and put a new one in before going home at night. Advances in tape backup centered around making this process more efficient and safe by adding digital tape, encryption, automation, and compression. The core technologies - magnetic tape, tape drives, SCSI, and server software - didn't change. Even the addition of networked tape backup, either over a LAN or a Storage Area Network, only extended the old-fashioned model.

The reliance on tape technology, however, has made it difficult to meet the demands of the 24x7x365 data center. When service level agreements require that systems be restored in a few hours or even in minutes, even the best tape technology is too slow. The fastest tape backup still isn't fast enough when the backup window is near zero.

Disk-to-Disk Backup
Two new developments in backup system technology are having a profound effect on the way organizations protect vital data. The first is disk-based backup, usually called disk-to-disk backup or d2d. With disk-to-disk backup, an array of hard drives replaces the tapes as the backup medium. The disk system then emulates a tape system, using a method called virtual tape. In this way, the disk-based system is a drop-in replacement for existing tape libraries, only much faster.

While this seems like a small change, the result is immediately noticeable. First, the amount of time it takes to back up and restore data plummets. Given the speed of a typical Fibre Channel disk array (roughly 200 megabytes per second), restoring a one-terabyte disk array can happen in under 90 minutes. Even the best tape libraries would take roughly four hours to do the same thing. In environments that require fast restore times, this is a critical advantage.
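The restore figures above follow from simple arithmetic. The short Python sketch below reproduces them, assuming decimal units (one terabyte = 10^12 bytes, one megabyte = 10^6 bytes) and a sustained transfer rate with no overhead. The function name is hypothetical, and the tape rate is only the rate implied by the four-hour figure, not a measured number.

```python
def transfer_minutes(capacity_bytes: float, rate_bytes_per_sec: float) -> float:
    """Minutes needed to move capacity_bytes at a sustained rate (no overhead assumed)."""
    return capacity_bytes / rate_bytes_per_sec / 60

TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

# Disk array from the text: roughly 200 MB/s, restoring 1 TB.
disk_minutes = transfer_minutes(1 * TB, 200 * MB)

# Sustained rate a tape library would need to finish the same restore in four hours.
implied_tape_rate_mb = (1 * TB) / (4 * 3600) / MB

print(f"Disk restore: {disk_minutes:.1f} minutes")
print(f"Tape rate implied by a four-hour restore: {implied_tape_rate_mb:.1f} MB/s")
```

Under these assumptions the disk restore comes to about 83 minutes, which is where the "under 90 minutes" figure comes from.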

Backup time also shrinks considerably when using disk-to-disk backup. In fact, since the system can back up at disk system speeds, the backup window can shrink to nearly zero. This provides an opportunity to do point-in-time backups throughout the day rather than only once a day. Considering how system administrators struggle with backup windows, this alone can justify new disk-to-disk systems.

More Than Raw Speed
Besides raw speed, disk-to-disk backup systems have other important benefits. Disk drive systems are, by nature, random access storage devices. This makes it much easier (and faster) to find specific information in a backup. Backup software maintains a catalog of what's in the backup image, which is used to find a particular piece of information stored in the image. It's like having a big sign on a haystack that points to the needle you're looking for. However, retrieving that information can be time-consuming with magnetic tape. The tape has to be wound to the point where the information is stored. If the tape is at its beginning and the information at the end, it can take quite a bit of time to get to it. Compared to disks, tape drives move slowly. When the next bit of data that's needed is at the beginning, you have to go all the way back to the beginning again. Tape systems are made to stream data to and from the media all at once. Disks are designed to access data quickly from any portion of the disk. With disk-to-disk backup, retrieving specific application objects is much faster.
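A toy model makes the difference between sequential and random access concrete. The Python sketch below is illustrative only: the catalog entries, the position units, and both function names are hypothetical, and real drives have far more complicated timing. It assumes a tape must travel the full distance between consecutive objects, while a disk pays a small, roughly constant cost per lookup.

```python
def tape_travel(requests, catalog, start=0):
    """Total distance the tape moves, in arbitrary position units, to serve the requests in order."""
    position, travelled = start, 0
    for name in requests:
        target = catalog[name]
        travelled += abs(target - position)  # tape must wind past everything in between
        position = target
    return travelled

def disk_seeks(requests, cost_per_seek=1):
    """Random access: every lookup costs about the same, wherever the data sits."""
    return len(requests) * cost_per_seek

# Hypothetical catalog: object name -> position within the backup image.
catalog = {"mail/inbox.db": 950, "etc/passwd": 10, "home/report.doc": 900}
requests = ["mail/inbox.db", "etc/passwd", "home/report.doc"]

tape_cost = tape_travel(requests, catalog)
disk_cost = disk_seeks(requests)
print("Tape travel:", tape_cost, "units; disk seeks:", disk_cost)
```

The catalog tells both systems where each object lives. Only the disk can jump straight there.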

Disk systems also have very large capacities. Capacity combined with speed makes it viable to perform full backups more often, perhaps even daily. Full backups hasten recovery from failures. Organizations that rely on incremental backups run the risk that restoring operations will take more time than they can afford. First, the last full backup has to be restored, then every incremental backup made since, in order. This can stretch out the time it takes to recover from a disaster. Disk-to-disk systems not only do this faster but can eliminate the need for a week's worth of incremental backups in the first place.
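The restore sequence described above can be sketched in a few lines of Python. This is a simplified model, not any vendor's format: each backup is a hypothetical dictionary with a "kind" and a "files" mapping, and an incremental marks a deleted file with None.

```python
def restore(backups):
    """Rebuild the file set from the last full backup plus every later incremental.

    Returns the restored files and the number of backup sets that had to be read.
    """
    last_full = max(i for i, b in enumerate(backups) if b["kind"] == "full")
    state = dict(backups[last_full]["files"])
    sets_read = 1
    for backup in backups[last_full + 1:]:
        for path, content in backup["files"].items():
            if content is None:
                state.pop(path, None)  # file was deleted after the full backup
            else:
                state[path] = content  # file was created or changed
        sets_read += 1
    return state, sets_read

# Hypothetical week: one full backup followed by three incrementals.
week = [
    {"kind": "full", "files": {"a.txt": "v1", "b.txt": "v1"}},
    {"kind": "incremental", "files": {"a.txt": "v2"}},
    {"kind": "incremental", "files": {"c.txt": "v1"}},
    {"kind": "incremental", "files": {"b.txt": None}},
]
files, sets_read = restore(week)

# The same data protected by a daily full backup needs only one set.
daily_files, daily_sets = restore([{"kind": "full", "files": dict(files)}])
print(files, sets_read, daily_sets)
```

Every incremental in the chain is one more set to locate and read, which is why frequent full backups shorten recovery.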

Disk-to-disk backup systems have one major deficit compared to tape systems - they're immobile. A tape can be removed from a tape drive and sent to a secure location. Typically disks can't. The solution has been to combine the two backup technologies into a disk-to-disk-to-tape system. With disk-to-disk-to-tape the advantages of both approaches are merged, each overcoming the other's shortcomings. Typically, the backups occur throughout the day to a disk system and a copy is made once a day or once a week to a tape system. The backups proceed more quickly - the primary disk system isn't hampered by the slow tape speeds - and the data can still be removed to a safe place (see Figure 1).

Continuous Data Protection
The second big change in backup technology is called Continuous Data Protection or CDP for short. This is primarily a software technology. With CDP, information is backed up continuously when it's created or changed. Although there are CDP systems that back up entire disk sets and volumes, most vendors focus on saving file or application objects. The advantage of CDP is the ability to restore application objects to a specific state from a specific point in time. It's a little like going back in history to correct a mistake that has significant negative consequences.

CDP provides a very fine level of protection. For example, say a member of the Sales department gets an important e-mail. This e-mail gives the salesperson what he needs to clinch the big deal he's working on. But disaster strikes. The salesperson accidentally deletes the e-mail! He can't very well ask the sender - an insider at the new customer's headquarters - to resend it. Not only might that send up a flag at that end, but it would certainly make the salesperson look stupid.

An application-specific CDP system would save the day. The e-mail would have been backed up as soon as it was created on the server. The system administrator can now restore just the single e-mail. Traditional backup systems would require finding and loading tapes and perhaps even restoring an entire day's worth of e-mail. Only then could the single e-mail be found. Of course, that would be a futile effort since the backups from yesterday wouldn't have the e-mail and today's backups haven't been run yet.
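The idea behind object-level CDP can be sketched as a journal that records every change to an object along with the time it happened. The Python below is a minimal illustration, not how any particular product works: the CdpJournal class, its method names, the object key, and the integer timestamps are all hypothetical, and it assumes changes to a given object are recorded in time order.

```python
import bisect

class CdpJournal:
    """Toy continuous-data-protection journal keyed by object name."""

    def __init__(self):
        self._history = {}  # key -> (sorted timestamps, values at those times)

    def record(self, key, timestamp, value):
        """Log a change the moment it happens; value None means the object was deleted."""
        times, values = self._history.setdefault(key, ([], []))
        times.append(timestamp)  # assumption: timestamps only increase per key
        values.append(value)

    def restore(self, key, at):
        """Return the object as it existed at time `at`, or None if it did not exist."""
        times, values = self._history.get(key, ([], []))
        idx = bisect.bisect_right(times, at)
        return values[idx - 1] if idx else None

journal = CdpJournal()
journal.record("inbox/msg-1", 100, "Pricing details for the big deal")
journal.record("inbox/msg-1", 250, None)  # the accidental deletion

recovered = journal.restore("inbox/msg-1", at=249)     # just before the deletion
after_delete = journal.restore("inbox/msg-1", at=300)  # the message is gone
print(recovered)
```

Because the message was journaled when it arrived, it can be recovered even though no nightly backup ever contained it.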

Continuous Data Protection is so useful that it's being integrated into other data protection products, even low-end ones. Rudimentary forms of CDP are even starting to show up in desktop systems. In the none-too-distant future, CDP will be a feature of all backup software and perhaps even a core operating system function.

Only Red Hat or SuSE Need Apply
There is a downside to this technology for Linux users - lack of support for most distributions. A quick look at the products in the disk-to-disk backup and CDP categories confirms the impression that the only supported distributions are Red Hat and SuSE. Traditional software vendors, unlike Open Source providers, have limited resources and will only target specific operating systems. From their point of view, Linux is not an OS. The Linux distribution is the OS. Linux is treated the way Unix and Windows are: support is available only for specific types and versions. If the Linux flavor of choice is Debian, Mandrake, or some other distribution, you can't expect that the disk-to-disk backup and CDP offerings will work or that the vendors will support them. This is an area that the Open Source community should be attacking, since backup is an incredibly necessary function even for desktop environments.

CDP creates an incredibly safe environment when merged with disk-to-disk-to-tape backup. The CDP software constantly copies changed or new objects to a backup disk, providing protection from immediate events. The disk-to-disk-to-tape backup provides protection from wholesale disaster or events that destroy entire systems and facilities.

Conclusion
Backup is being transformed even as we speak. New technology is making backup more robust and significantly faster. Given the 24x7x365 nature of today's systems, this is a necessary and welcome change. Disk-to-disk backup systems coupled with Continuous Data Protection software will allow data to be protected from the minute it's created and restored the moment it's needed. And for many of us, this means we can sleep better at night.

References:

  • Data Protection and Information Lifecycle Management, Tom Petrocelli, Prentice Hall, 2005
  • Using SANs and NAS, W. Curtis Preston, O'Reilly
  • You can also look to vendors' sites for statistics on the latest products. I suggest www.quantum.com, www.ibm.com, and www.emc.com.

More Stories By Tom Petrocelli

Tom Petrocelli, president of Technology Alignment Partners, is a veteran of over 21 years in the technology arena. His background encompasses software engineering, marketing, IT, sales, and general management. He has worked in various industries including defense, digital signal processing, call center/CRM, networking, data storage, and storage networking. Tom is also the author of a new book entitled Data Protection and Information Lifecycle Management, published by Prentice Hall.

Most Recent Comments
LinuxWorld News Desk 12/15/05 01:07:53 PM EST

Information Storage & Security Journal: It's Not Your Father's Backup Anymore.