|By Brian Carrier||
|August 12, 2005 03:00 PM EDT||
We have all done it before. You accidentally type in the wrong argument to rm or select the wrong file for deletion. As you hit enter, you notice your mistake and your stomach drops. You reach for the backup of the system and realize that there isn't one.
There are many undelete tools for FAT and NTFS file systems, but there are few for Ext3, which is currently the default file system for most Linux distributions. This is because of the way that Ext3 files are deleted. Crucial information that stores where the file content is located is cleared during the deletion process.
In this article, we take a low-level look at why recovery is difficult and look at some approaches that are sometimes effective. We will use some open source tools for the recovery, but the techniques are not completely automated.
What Is a File?
Before we can see how to recover files, we need to look at how files are stored. Typically, file systems are located inside of a disk partition. The partition is usually organized into 512-byte sectors. When the partition is formatted as Ext3, consecutive sectors will be grouped into blocks, whose size can range from 1,024 to 4,096 bytes. The blocks are grouped together into block groups, whose size will be tens of thousands of blocks. Each file has data stored in three major locations: blocks, inodes, and directory entries. The file content is stored in blocks, which are allocated for the exclusive use of the file. A file is allocated as many blocks as it needs. Ideally, the file will be allocated consecutive blocks, but this is not always possible.
The metadata for the file is stored in an inode structure, which is located in an inode table at the beginning of a block group. There are a finite number of inodes and each is assigned to a block group. File metadata includes the temporal data such as the last modified, last accessed, last changed, and deleted times. Metadata also includes the file size, user ID, group ID, permissions, and block addresses where the file content is stored.
The addresses of the first 12 blocks are saved in the inode and additional addresses are stored externally in blocks, called indirect blocks. If the file requires many blocks and not all of the addresses can fit into one indirect block, a double indirect block is used whose address is given in the inode. The double indirect block contains addresses of single indirect blocks, which contain addresses of blocks with file content. There is also a triple indirect address in the inode that adds one more layer of pointers.
Last, the file's name is stored in a directory entry structure, which is located in a block allocated to the file's parent directory. An Ext3 directory is similar to a file and its blocks contain a list of directory entry structures, each containing the name of a file and the inode address where the file metadata is stored. When you use the ls -i command, you can see the inode address that corresponds to each file name. We can see the relationship between the directory entry, the inode, and the blocks in Figure 1.
When a new file is created, the operating system (OS) gets to choose which blocks and inode it will allocate for the file. Linux will try to allocate the blocks and inode in the same block group as its parent directory. This causes files in the same directory to be close together. Later we'll use this fact to restrict where we search for deleted data.
The Ext3 file system has a journal that records updates to the file system metadata before the update occurs. In case of a system crash, the OS reads the journal and will either reprocess or roll back the transactions in the journal so that recovery will be faster then examining each metadata structure, which is the old and slow way. Example metadata structures include the directory entries that store file names and inodes that store file metadata. The journal contains the full block that is being updated, not just the value being changed. When a new file is created, the journal should contain the updated version of the blocks containing the directory entry and the inode.
Several things occur when an Ext3 file is deleted from Linux. Keep in mind that the OS gets to choose exactly what occurs when a file is deleted and this article assumes a general Linux system.
At a minimum, the OS must mark each of the blocks, the inode, and the directory entry as unallocated so that later files can use them. This minimal approach is what occurred several years ago with the Ext2 file system. In this case, the recovery process was relatively simple because the inode still contained the block addresses for the file content and tools such as debugfs and e2undel could easily re-create the file. This worked as long as the blocks had not been allocated to a new file and the original content was not overwritten.
With Ext3, there is an additional step that makes recovery much more difficult. When the blocks are unallocated, the file size and block addresses in the inode are cleared; therefore we can no longer determine where the file content was located. We can see the relationship between the directory entry, the inode, and the blocks of an unallocated file in Figure 2.
Now that we know the components involved with files and which ones are cleared during deletion, we can examine two approaches to file recovery (besides using a backup). The first approach uses the application type of the deleted file and the second approach uses data in the journal. Regardless of the approach, you should stop using the file system because you could create a file that overwrites the data you are trying to recover. You can power the system off and put the drive in another Linux computer as a slave drive or boot from a Linux CD.
The first step for both techniques is to determine the deleted file's inode address. This can be determined from debugfs or The Sleuth Kit (TSK). I'll give the debugfs method here. debugfs comes with most Linux distributions and is a file system debugger. To start debugfs, you'll need to know the device name for the partition that contains the deleted file. In my example, I have booted from a CD and the file is located on /dev/hda5:
# debugfs /dev/hda5
debugfs 1.37 (21-Mar-2005)
We can then use the cd command to change to the directory of the deleted file:
debugfs: cd /home/carrier/
The ls -d command will list the allocated and deleted files in the directory. Remember that the directory entry structure stores the name and the inode of the file and this listing will give us both values because neither is cleared during the deletion process. The deleted files have their inode address surrounded by "<" and ">":
debugfs: ls -d
415848 (12) . 376097 (12) .. 415864 (16) .bashrc
<415926> (28) oops.dat
|theusr 07/09/09 09:29:00 AM EDT|
The figure 2 maybe misleading: the links between the address blocks and the file content are still there (though the address blocks are unallocated), that what's make the recovery possible.
|Mike Kay 01/15/08 03:57:07 PM EST|
Excellent article. Followed it step by step and successfully recovered a .XLS spreadsheet that had been deleted from the /tmp folder on Ubuntu Gutsy. It also found an associated .jpg that I wasn't looking for!
Saved me hours of retyping. Thanks a lot.
|Jahangir 10/22/07 05:26:36 PM EDT|
This was really the best article i could find inspite of 3 hrs of googling.
But what if you are trying to recover a 6GB VM.
|ruintower 04/23/06 09:07:29 PM EDT|
Trackback Added: ext3 undelete; I “mis-deleted” a big file several days ago. So I umount the the partition immediately and searched the recovery method because I knew (but forgot) some methods to recovery file in Linux. However, the result is disappointed. Alt...
|marco 03/13/06 08:04:20 AM EST|
U have saved my life.
U are a GURU,
|marco 03/13/06 08:04:04 AM EST|
U have saved my life.
U are a GURU,
SYS-CON Events announced today that Super Micro Computer, Inc., a global leader in compute, storage and networking technologies, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Supermicro (NASDAQ: SMCI), the leading innovator in high-performance, high-efficiency server technology, is a premier provider of advanced server Building Block Solutions® for Data Center, Cloud Computing, Enterprise IT, Hadoop/...
Apr. 28, 2017 05:15 AM EDT Reads: 2,365
With billions of sensors deployed worldwide, the amount of machine-generated data will soon exceed what our networks can handle. But consumers and businesses will expect seamless experiences and real-time responsiveness. What does this mean for IoT devices and the infrastructure that supports them? More of the data will need to be handled at - or closer to - the devices themselves.
Apr. 28, 2017 03:15 AM EDT Reads: 910
SYS-CON Events announced today that DatacenterDynamics has been named “Media Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY. DatacenterDynamics is a brand of DCD Group, a global B2B media and publishing company that develops products to help senior professionals in the world's most ICT dependent organizations make risk-based infrastructure and capacity decisions.
Apr. 28, 2017 03:00 AM EDT Reads: 6,224
SYS-CON Events announced today that Hitachi, the leading provider the Internet of Things and Digital Transformation, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Hitachi Data Systems, a wholly owned subsidiary of Hitachi, Ltd., offers an integrated portfolio of services and solutions that enable digital transformation through enhanced data management, governance, mobility and analytics. We help globa...
Apr. 28, 2017 02:30 AM EDT Reads: 1,366
NHK, Japan Broadcasting, will feature the upcoming @ThingsExpo Silicon Valley in a special 'Internet of Things' and smart technology documentary that will be filmed on the expo floor between November 3 to 5, 2015, in Santa Clara. NHK is the sole public TV network in Japan equivalent to the BBC in the UK and the largest in Asia with many award-winning science and technology programs. Japanese TV is producing a documentary about IoT and Smart technology and will be covering @ThingsExpo Silicon Val...
Apr. 28, 2017 01:15 AM EDT Reads: 9,268
The explosion of new web/cloud/IoT-based applications and the data they generate are transforming our world right before our eyes. In this rush to adopt these new technologies, organizations are often ignoring fundamental questions concerning who owns the data and failing to ask for permission to conduct invasive surveillance of their customers. Organizations that are not transparent about how their systems gather data telemetry without offering shared data ownership risk product rejection, regu...
Apr. 28, 2017 12:45 AM EDT Reads: 1,683
The 20th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held June 6-8, 2017, at the Javits Center in New York City, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Containers, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal ...
Apr. 28, 2017 12:15 AM EDT Reads: 1,377
Grape Up is a software company, specialized in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market across the USA and Europe, we work with a variety of customers from emerging startups to Fortune 1000 companies.
Apr. 27, 2017 10:45 PM EDT Reads: 2,436
Financial Technology has become a topic of intense interest throughout the cloud developer and enterprise IT communities. Accordingly, attendees at the upcoming 20th Cloud Expo at the Javits Center in New York, June 6-8, 2017, will find fresh new content in a new track called FinTech.
Apr. 27, 2017 10:30 PM EDT Reads: 2,456
@GonzalezCarmen has been ranked the Number One Influencer and @ThingsExpo has been named the Number One Brand in the “M2M 2016: Top 100 Influencers and Brands” by Analytic. Onalytica analyzed tweets over the last 6 months mentioning the keywords M2M OR “Machine to Machine.” They then identified the top 100 most influential brands and individuals leading the discussion on Twitter.
Apr. 27, 2017 10:30 PM EDT Reads: 1,311
Cognitive Computing is becoming the foundation for a new generation of solutions that have the potential to transform business. Unlike traditional approaches to building solutions, a cognitive computing approach allows the data to help determine the way applications are designed. This contrasts with conventional software development that begins with defining logic based on the current way a business operates. In her session at 18th Cloud Expo, Judith S. Hurwitz, President and CEO of Hurwitz & ...
Apr. 27, 2017 10:15 PM EDT Reads: 9,253
In his keynote at @ThingsExpo, Chris Matthieu, Director of IoT Engineering at Citrix and co-founder and CTO of Octoblu, focused on building an IoT platform and company. He provided a behind-the-scenes look at Octoblu’s platform, business, and pivots along the way (including the Citrix acquisition of Octoblu).
Apr. 27, 2017 10:15 PM EDT Reads: 1,488
SYS-CON Events announced today that Interoute, owner-operator of one of Europe's largest networks and a global cloud services platform, has been named “Bronze Sponsor” of SYS-CON's 20th Cloud Expo, which will take place on June 6-8, 2017 at the Javits Center in New York, New York. Interoute is the owner-operator of one of Europe's largest networks and a global cloud services platform which encompasses 12 data centers, 14 virtual data centers and 31 colocation centers, with connections to 195 add...
Apr. 27, 2017 10:00 PM EDT Reads: 2,078
Web Real-Time Communication APIs have quickly revolutionized what browsers are capable of. In addition to video and audio streams, we can now bi-directionally send arbitrary data over WebRTC's PeerConnection Data Channels. With the advent of Progressive Web Apps and new hardware APIs such as WebBluetooh and WebUSB, we can finally enable users to stitch together the Internet of Things directly from their browsers while communicating privately and securely in a decentralized way.
Apr. 27, 2017 09:15 PM EDT Reads: 9,134
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
Apr. 27, 2017 09:15 PM EDT Reads: 7,315
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place June 6-8, 2017, at the Javits Center in New York City, New York, is co-located with 20th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry p...
Apr. 27, 2017 09:15 PM EDT Reads: 1,292
Multiple data types are pouring into IoT deployments. Data is coming in small packages as well as enormous files and data streams of many sizes. Widespread use of mobile devices adds to the total. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists will look at the tools and environments that are being put to use in IoT deployments, as well as the team skills a modern enterprise IT shop needs to keep things running, get a handle on all this data, and deli...
Apr. 27, 2017 08:45 PM EDT Reads: 2,517
SYS-CON Events announced today that CollabNet, a global leader in enterprise software development, release automation and DevOps solutions, will be a Bronze Sponsor of SYS-CON's 20th International Cloud Expo®, taking place from June 6-8, 2017, at the Javits Center in New York City, NY. CollabNet offers a broad range of solutions with the mission of helping modern organizations deliver quality software at speed. The company’s latest innovation, the DevOps Lifecycle Manager (DLM), supports Value S...
Apr. 27, 2017 08:00 PM EDT Reads: 1,155
The Internet of Things is clearly many things: data collection and analytics, wearables, Smart Grids and Smart Cities, the Industrial Internet, and more. Cool platforms like Arduino, Raspberry Pi, Intel's Galileo and Edison, and a diverse world of sensors are making the IoT a great toy box for developers in all these areas. In this Power Panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists discussed what things are the most important, which will have the most profound e...
Apr. 27, 2017 07:45 PM EDT Reads: 2,325
SYS-CON Events announced today that Grape Up will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company specializing in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market across the U.S. and Europe, Grape Up works with a variety of customers from emergi...
Apr. 27, 2017 06:45 PM EDT Reads: 2,250