Why Recovering a Deleted Ext3 File Is Difficult . . .

and why you should back up important files

We have all done it before. You accidentally type in the wrong argument to rm or select the wrong file for deletion. As you hit enter, you notice your mistake and your stomach drops. You reach for the backup of the system and realize that there isn't one.

There are many undelete tools for FAT and NTFS file systems, but there are few for Ext3, which is currently the default file system for most Linux distributions. This is because of the way that Ext3 files are deleted: the crucial information recording where the file content is located is cleared during the deletion process.

In this article, we take a low-level look at why recovery is difficult and examine some approaches that are sometimes effective. We will use some open source tools for the recovery, but the techniques are not completely automated.

What Is a File?
Before we can see how to recover files, we need to look at how files are stored. Typically, file systems are located inside a disk partition. The partition is usually organized into 512-byte sectors. When the partition is formatted as Ext3, consecutive sectors are grouped into blocks, whose size can range from 1,024 to 4,096 bytes. The blocks are in turn grouped into block groups, which typically hold several thousand to tens of thousands of blocks (by default, 8,192 blocks with 1,024-byte blocks and 32,768 blocks with 4,096-byte blocks). Each file has data stored in three major locations: blocks, inodes, and directory entries. The file content is stored in blocks, which are allocated for the exclusive use of the file. A file is allocated as many blocks as it needs. Ideally, the file will be allocated consecutive blocks, but this is not always possible.
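
The sector, block, and block group arithmetic can be sketched in a few lines of Python. This is a minimal sketch that assumes the default mkfs geometry (eight blocks per group for every byte in a block, and the first group starting at block 1 only when blocks are 1,024 bytes); on a real file system you would read the actual values, such as "Blocks per group" and "First block", from `dumpe2fs -h` output instead.

```python
SECTOR_SIZE = 512  # bytes per disk sector

def block_to_sector(block: int, block_size: int) -> int:
    """Return the first sector, counted from the partition start, of a file system block."""
    return block * (block_size // SECTOR_SIZE)

def block_group_of(block: int, block_size: int) -> int:
    """Return the block group that contains a block, assuming default mkfs settings."""
    # Assumption: one bitmap block per group, so a group holds 8 * block_size blocks.
    blocks_per_group = 8 * block_size
    # With 1,024-byte blocks, block 0 is reserved for boot code and group 0 starts at block 1.
    first_data_block = 1 if block_size == 1024 else 0
    return (block - first_data_block) // blocks_per_group

print(block_to_sector(100, 4096))   # 800
print(block_group_of(40000, 4096))  # 1
```

The two helpers are kept separate because sector numbers are what raw disk tools see, while block and group numbers are what the file system structures store.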

The metadata for the file is stored in an inode structure, which is located in an inode table near the beginning of a block group. There is a finite number of inodes, and each is assigned to a block group. File metadata includes temporal data, such as the last modified, last accessed, last changed, and deleted times. Metadata also includes the file size, user ID, group ID, permissions, and the block addresses where the file content is stored.
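
To make the inode table concrete, the sketch below maps an inode number to its block group and its byte offset within that group's inode table, and decodes the fixed-position fields listed above from a raw inode. It assumes the classic little-endian Ext2/Ext3 inode layout; `inodes_per_group` and `inode_size` must come from the superblock (for example, debugfs's `stats` command or `dumpe2fs -h`), and the 16,384 inodes per group in the usage line is a hypothetical value.

```python
import struct
from dataclasses import dataclass

@dataclass
class Inode:
    mode: int     # file type and permission bits
    uid: int      # low 16 bits of the user ID
    size: int     # low 32 bits of the file size in bytes
    atime: int    # last accessed (Unix seconds)
    ctime: int    # last changed (Unix seconds)
    mtime: int    # last modified (Unix seconds)
    dtime: int    # deleted (Unix seconds, 0 if not deleted)
    gid: int      # low 16 bits of the group ID
    links: int    # hard link count
    block: tuple  # 12 direct addresses, then single, double, and triple indirect

def locate_inode(inum: int, inodes_per_group: int, inode_size: int) -> tuple:
    """Return (block group, byte offset into that group's inode table)."""
    index = inum - 1  # inode numbers start at 1, not 0
    return index // inodes_per_group, (index % inodes_per_group) * inode_size

def parse_inode(raw: bytes) -> Inode:
    """Decode the fixed-position fields of an on-disk Ext2/Ext3 inode."""
    fields = struct.unpack_from("<HHIIIIIHH", raw, 0)  # mode through links, offsets 0-27
    block = struct.unpack_from("<15I", raw, 40)        # i_block[15] starts at offset 40
    return Inode(*fields, block=block)

group, offset = locate_inode(415926, inodes_per_group=16384, inode_size=128)
print(group, offset)  # 25 809600
```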

The addresses of the first 12 blocks are saved in the inode, and the addresses of any additional blocks are stored externally in blocks called indirect blocks. The inode holds the address of a single indirect block, which contains addresses of blocks with file content. If the file requires so many blocks that not all of the addresses fit into one indirect block, a double indirect block is used, whose address is also given in the inode. The double indirect block contains addresses of single indirect blocks, which in turn contain addresses of blocks with file content. There is also a triple indirect address in the inode that adds one more layer of pointers.
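
The following sketch shows which inode pointer reaches a given logical block of a file (block 0 being the first block of content). It assumes 32-bit block addresses, so each indirect block holds block_size / 4 of them, and it ignores other limits on the maximum file size.

```python
def addressing_level(logical_block: int, block_size: int) -> str:
    """Say which inode pointer reaches the given logical block of a file."""
    ptrs = block_size // 4  # 32-bit addresses per indirect block
    n = logical_block
    if n < 12:
        return "direct"
    n -= 12
    if n < ptrs:
        return "single indirect"
    n -= ptrs
    if n < ptrs ** 2:
        return "double indirect"
    n -= ptrs ** 2
    if n < ptrs ** 3:
        return "triple indirect"
    raise ValueError("logical block is beyond the triple indirect range")

for n in (11, 12, 1036, 1049612):
    print(n, addressing_level(n, 4096))
```

With 4,096-byte blocks, the 12 direct pointers cover only the first 48 KiB of a file; anything larger depends on at least one indirect block.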

Finally, the file's name is stored in a directory entry structure, which is located in a block allocated to the file's parent directory. An Ext3 directory is similar to a file, and its blocks contain a list of directory entry structures, each containing the name of a file and the inode address where the file metadata is stored. When you use the ls -i command, you can see the inode address that corresponds to each file name. We can see the relationship between the directory entry, the inode, and the blocks in Figure 1.
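
A directory block can be walked in Python as a chain of variable-length records. This sketch assumes the Linux `ext2_dir_entry_2` layout (4-byte inode, 2-byte record length, 1-byte name length, 1-byte file type, then the name). The second, inner loop is a heuristic and an assumption, not necessarily how debugfs finds deleted names: it relies on a removed entry usually being hidden by enlarging the previous entry's record length rather than being erased, so its name and inode may survive in the slack space.

```python
import struct
from typing import Iterator, NamedTuple

class DirEntry(NamedTuple):
    inode: int
    name: str
    deleted: bool  # True if the entry was found in slack space

HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, file_type (8 bytes)

def ideal_len(name_len: int) -> int:
    """Bytes an entry really needs: 8-byte header plus name, padded to 4 bytes."""
    return (HEADER.size + name_len + 3) & ~3

def read_entry(block: bytes, pos: int) -> tuple:
    inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, pos)
    start = pos + HEADER.size
    return inode, rec_len, name_len, block[start:start + name_len].decode("utf-8", "replace")

def walk_dir_block(block: bytes) -> Iterator[DirEntry]:
    pos = 0
    while pos + HEADER.size <= len(block):
        inode, rec_len, name_len, name = read_entry(block, pos)
        if rec_len < HEADER.size:
            break  # corrupt record length; stop rather than loop forever
        if inode != 0:
            yield DirEntry(inode, name, False)
        end = min(pos + rec_len, len(block))
        # Heuristic: space past this entry's ideal length may hold deleted entries.
        slack = pos + ideal_len(name_len)
        while slack + HEADER.size <= end:
            d_inode, d_rec, d_len, d_name = read_entry(block, slack)
            if d_inode == 0 or d_len == 0 or d_rec < HEADER.size or slack + HEADER.size + d_len > end:
                break
            yield DirEntry(d_inode, d_name, True)
            slack += ideal_len(d_len)
        pos = end
```

The record length is the value debugfs prints in parentheses next to each name in its ls listing.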

When a new file is created, the operating system (OS) chooses which blocks and inode to allocate for it. Linux tries to allocate the blocks and inode in the same block group as the file's parent directory, so files in the same directory tend to be close together. Later we'll use this fact to restrict where we search for deleted data.
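
The search restriction can be sketched as a small calculation: from the parent directory's inode number, find its block group and the range of blocks that group covers. The geometry values (`inodes_per_group`, `blocks_per_group`, `first_data_block`) must come from the superblock; the 16,384 and 32,768 used in the usage line are hypothetical.

```python
from typing import Optional

def search_range(parent_inode: int, inodes_per_group: int, blocks_per_group: int,
                 first_data_block: int = 0, total_blocks: Optional[int] = None) -> tuple:
    """Return (group, first block, last block) of the parent directory's block group."""
    group = (parent_inode - 1) // inodes_per_group  # inode numbers start at 1
    first = first_data_block + group * blocks_per_group
    last = first + blocks_per_group - 1
    if total_blocks is not None:
        last = min(last, total_blocks - 1)  # the final group may be shorter
    return group, first, last

print(search_range(415848, inodes_per_group=16384, blocks_per_group=32768))
# (25, 819200, 851967)
```

With 4,096-byte blocks, such a range could be copied out for searching with `dd if=/dev/hda5 of=group25.img bs=4096 skip=819200 count=32768`, writing the output file (a hypothetical name) to a different file system so that it cannot overwrite the deleted data.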

The Ext3 file system has a journal that records updates to the file system metadata before the updates occur. After a system crash, the OS reads the journal and either replays or discards the transactions it contains, which is much faster than the old and slow approach of examining every metadata structure. Example metadata structures include the directory entries that store file names and the inodes that store file metadata. The journal contains the full block that is being updated, not just the value being changed. When a new file is created, the journal should therefore contain the updated versions of the blocks containing its directory entry and its inode.
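
Because the journal stores whole blocks, a descriptor block tells you which file system block each following journal block is a copy of. The sketch below parses one descriptor block, assuming the original 32-bit JBD format used by Ext3: a big-endian 12-byte header with magic number 0xC03B3998, 8-byte tags (block number and flags), and a 16-byte UUID after every tag that lacks the SAME_UUID flag. It does not handle 64-bit or checksum journal features, and reading the journal's blocks in the first place (the journal is normally stored in inode 8; debugfs's `logdump` command can also display it) is left out.

```python
import struct
from typing import List, Optional

JBD_MAGIC = 0xC03B3998
DESCRIPTOR_BLOCK = 1
COMMIT_BLOCK = 2
FLAG_SAME_UUID = 0x2  # no UUID follows this tag
FLAG_LAST_TAG = 0x8   # final tag in this descriptor

def journal_block_type(block: bytes) -> Optional[int]:
    """Return the JBD block type, or None if the block has no journal header."""
    magic, blocktype, _sequence = struct.unpack_from(">III", block, 0)
    return blocktype if magic == JBD_MAGIC else None

def descriptor_targets(block: bytes) -> List[int]:
    """Return the file system block numbers whose copies follow this descriptor."""
    if journal_block_type(block) != DESCRIPTOR_BLOCK:
        raise ValueError("not a journal descriptor block")
    targets = []
    pos = 12  # skip the journal header
    while pos + 8 <= len(block):
        blocknr, flags = struct.unpack_from(">II", block, pos)
        targets.append(blocknr)
        pos += 8
        if not flags & FLAG_SAME_UUID:
            pos += 16  # skip the 16-byte UUID that follows this tag
        if flags & FLAG_LAST_TAG:
            break
    return targets
```

If a descriptor lists the block that holds a deleted file's inode, the matching journal copy may still contain the block addresses that were cleared from the live inode.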

Deletion Process
Several things occur when an Ext3 file is deleted in Linux. Keep in mind that the OS decides exactly what happens when a file is deleted; this article assumes a typical Linux system.

At a minimum, the OS must mark each of the blocks, the inode, and the directory entry as unallocated so that later files can use them. This minimal approach is what Linux did several years ago with the Ext2 file system. In that case, recovery was relatively simple because the inode still contained the block addresses for the file content, and tools such as debugfs and e2undel could easily re-create the file. This worked as long as the blocks had not been allocated to a new file and the original content had not been overwritten.
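
In Ext2 and Ext3, allocation status is tracked in per-group bitmaps (one for blocks, one for inodes), so "marking a block unallocated" amounts to clearing one bit. A minimal sketch for the block bitmap, assuming the bitmap for the block's group has already been read into memory and that bits are numbered from the least significant bit of each byte; the group geometry in the test is hypothetical.

```python
def bitmap_position(block: int, blocks_per_group: int, first_data_block: int = 0) -> tuple:
    """Return (group, byte index, bit mask) of a block in its group's block bitmap."""
    group, bit = divmod(block - first_data_block, blocks_per_group)
    return group, bit // 8, 1 << (bit % 8)

def is_allocated(bitmap: bytes, block: int, blocks_per_group: int, first_data_block: int = 0) -> bool:
    _group, byte, mask = bitmap_position(block, blocks_per_group, first_data_block)
    return bool(bitmap[byte] & mask)

def mark_unallocated(bitmap: bytearray, block: int, blocks_per_group: int, first_data_block: int = 0) -> None:
    """Clear the block's bit; the block's content itself is left untouched."""
    _group, byte, mask = bitmap_position(block, blocks_per_group, first_data_block)
    bitmap[byte] &= ~mask & 0xFF
```

Clearing the bit leaves the block's content untouched, which is why the data survives until a new file is allocated that block.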

With Ext3, there is an additional step that makes recovery much more difficult. When the blocks are unallocated, the file size and block addresses in the inode are cleared; therefore, we can no longer determine where the file content was located. We can see the relationship between the directory entry, the inode, and the blocks of an unallocated file in Figure 2.
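
The difference between the two cases can be checked directly on a raw inode. This sketch assumes the classic Ext2/Ext3 inode layout (size at offset 4, deletion time at offset 20, 15 block pointers starting at offset 40) and flags the pattern described above: a deletion time is set, while the size and every block pointer are zero.

```python
import struct

def inode_summary(raw: bytes) -> dict:
    """Pull the size, deletion time, and block pointers out of a raw inode."""
    (size,) = struct.unpack_from("<I", raw, 4)    # i_size
    (dtime,) = struct.unpack_from("<I", raw, 20)  # i_dtime
    blocks = struct.unpack_from("<15I", raw, 40)  # i_block[15]
    return {"size": size, "dtime": dtime, "blocks": blocks}

def looks_wiped_by_ext3(raw: bytes) -> bool:
    """True if the inode is deleted and its size and block pointers were cleared."""
    info = inode_summary(raw)
    return info["dtime"] != 0 and info["size"] == 0 and not any(info["blocks"])
```

An inode deleted in the Ext2 style would have a deletion time but still carry its size and pointers, so the function returns False for it, and those pointers are exactly what tools like e2undel could follow.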

Recovery Approaches
Now that we know the components involved with files and which ones are cleared during deletion, we can examine two approaches to file recovery (besides using a backup). The first approach uses the application type of the deleted file, and the second uses data in the journal. Regardless of the approach, stop using the file system immediately, because any file you create could overwrite the data you are trying to recover. You can power the system off and put the drive in another Linux computer as a slave drive, or boot from a Linux CD.
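
Using the application type generally means searching raw blocks for the signature that files of that type start (and sometimes end) with, which is what carving tools such as foremost automate. The sketch below is a deliberately simple header/footer carver for JPEG (start marker FF D8 FF, end marker FF D9). It illustrates the general idea and is not necessarily the procedure this article goes on to describe. It assumes the file's blocks are contiguous, and a naive footer match can cut an image short when an embedded thumbnail has its own end marker.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus the next marker's 0xFF
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker

def carve_jpegs(data: bytes, max_size: int = 10 * 1024 * 1024) -> list:
    """Return byte runs that start with a JPEG header and end at the next footer."""
    found = []
    start = data.find(JPEG_HEADER)
    while start != -1:
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            # No footer within max_size bytes: skip this header and keep looking.
            start = data.find(JPEG_HEADER, start + 1)
            continue
        found.append(data[start:end + len(JPEG_FOOTER)])
        start = data.find(JPEG_HEADER, end + len(JPEG_FOOTER))
    return found
```

Combined with the block group observation earlier, the input would ideally be only the unallocated blocks near the deleted file's parent directory, for example `carve_jpegs(open("group25.img", "rb").read())` on a hypothetical copy of one block group.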

The first step for both techniques is to determine the deleted file's inode address, which you can find with debugfs or The Sleuth Kit (TSK). I'll give the debugfs method here. debugfs is a file system debugger that comes with most Linux distributions. To start it, you'll need to know the device name of the partition that contains the deleted file. In my example, I have booted from a CD and the file is located on /dev/hda5:

# debugfs /dev/hda5
debugfs 1.37 (21-Mar-2005)
debugfs:

We can then use the cd command to change to the directory of the deleted file:

debugfs: cd /home/carrier/

The ls -d command will list the allocated and deleted files in the directory. Remember that the directory entry structure stores the name and the inode of the file and this listing will give us both values because neither is cleared during the deletion process. The deleted files have their inode address surrounded by "<" and ">":

debugfs: ls -d
415848 (12) . 376097 (12) .. 415864 (16) .bashrc
[...]
<415926> (28) oops.dat
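
In a large directory, the deleted names can be pulled out of the ls -d listing mechanically. A minimal sketch, assuming the listing format shown above (inode number, record length in parentheses, name) and names without embedded spaces; the listing could be captured non-interactively with `debugfs -R "ls -d /home/carrier" /dev/hda5`.

```python
import re

# inode (wrapped in <...> when deleted), "(rec_len)", then the file name
ENTRY = re.compile(r"(<?)(\d+)>?\s+\((\d+)\)\s+(\S+)")

def deleted_entries(listing: str) -> list:
    """Return (inode, name) pairs for the entries debugfs marks as deleted."""
    return [(int(inode), name)
            for mark, inode, _rec_len, name in ENTRY.findall(listing)
            if mark == "<"]

sample = """415848 (12) . 376097 (12) .. 415864 (16) .bashrc
[...]
<415926> (28) oops.dat"""
print(deleted_entries(sample))  # [(415926, 'oops.dat')]
```

Each inode found this way can then be inspected inside debugfs, for example with `stat <415926>`.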

More Stories By Brian Carrier

Brian Carrier has authored several leading computer forensic tools, including The Sleuth Kit (formerly The @stake Sleuth Kit) and the Autopsy Forensic Browser. He has authored several peer-reviewed conference and journal papers and has created publicly available testing images for forensic tools. Currently pursuing a Ph.D. in Computer Science and Digital Forensics at Purdue University, he is also a research assistant at the Center for Education and Research in Information Assurance and Security (CERIAS) there. He formerly served as a research scientist at @stake and as the lead for the @stake Response Team and Digital Forensic Labs. Carrier has taught forensics, incident response, and file systems at SANS, FIRST, the @stake Academy, and SEARCH. He is the author of File System Forensic Analysis (Addison-Wesley, ISBN 0321268172).

Most Recent Comments
theusr 07/09/09 09:29:00 AM EDT

Figure 2 may be misleading: the links between the address blocks and the file content are still there (though the address blocks are unallocated), and that is what makes the recovery possible.

Mike Kay 01/15/08 03:57:07 PM EST

Excellent article. Followed it step by step and successfully recovered a .XLS spreadsheet that had been deleted from the /tmp folder on Ubuntu Gutsy. It also found an associated .jpg that I wasn't looking for!

Saved me hours of retyping. Thanks a lot.

Jahangir 10/22/07 05:26:36 PM EDT

This was really the best article I could find in spite of 3 hrs of googling.

But what if you are trying to recover a 6GB VM.
Since VMware files are not recognized by foremost, how can we get the magic number to get the header for the VM files ??

marco 03/13/06 08:04:20 AM EST

U have saved my life.
I had lost all my application files under tomcat with de deploy command... no backup ..gulp
now I have a 128MB ascii file with my lost files, it's great.

U are a GURU,
thanx
