|By Brian Carrier||
|August 12, 2005 03:00 PM EDT||
We have all done it before. You accidentally type in the wrong argument to rm or select the wrong file for deletion. As you hit enter, you notice your mistake and your stomach drops. You reach for the backup of the system and realize that there isn't one.
There are many undelete tools for FAT and NTFS file systems, but there are few for Ext3, which is currently the default file system for most Linux distributions. This is because of the way that Ext3 files are deleted. Crucial information that stores where the file content is located is cleared during the deletion process.
In this article, we take a low-level look at why recovery is difficult and look at some approaches that are sometimes effective. We will use some open source tools for the recovery, but the techniques are not completely automated.
What Is a File?
Before we can see how to recover files, we need to look at how files are stored. Typically, file systems are located inside of a disk partition. The partition is usually organized into 512-byte sectors. When the partition is formatted as Ext3, consecutive sectors will be grouped into blocks, whose size can range from 1,024 to 4,096 bytes. The blocks are grouped together into block groups, whose size will be tens of thousands of blocks. Each file has data stored in three major locations: blocks, inodes, and directory entries. The file content is stored in blocks, which are allocated for the exclusive use of the file. A file is allocated as many blocks as it needs. Ideally, the file will be allocated consecutive blocks, but this is not always possible.
The metadata for the file is stored in an inode structure, which is located in an inode table at the beginning of a block group. There are a finite number of inodes and each is assigned to a block group. File metadata includes the temporal data such as the last modified, last accessed, last changed, and deleted times. Metadata also includes the file size, user ID, group ID, permissions, and block addresses where the file content is stored.
The addresses of the first 12 blocks are saved in the inode and additional addresses are stored externally in blocks, called indirect blocks. If the file requires many blocks and not all of the addresses can fit into one indirect block, a double indirect block is used whose address is given in the inode. The double indirect block contains addresses of single indirect blocks, which contain addresses of blocks with file content. There is also a triple indirect address in the inode that adds one more layer of pointers.
Last, the file's name is stored in a directory entry structure, which is located in a block allocated to the file's parent directory. An Ext3 directory is similar to a file and its blocks contain a list of directory entry structures, each containing the name of a file and the inode address where the file metadata is stored. When you use the ls -i command, you can see the inode address that corresponds to each file name. We can see the relationship between the directory entry, the inode, and the blocks in Figure 1.
When a new file is created, the operating system (OS) gets to choose which blocks and inode it will allocate for the file. Linux will try to allocate the blocks and inode in the same block group as its parent directory. This causes files in the same directory to be close together. Later we'll use this fact to restrict where we search for deleted data.
The Ext3 file system has a journal that records updates to the file system metadata before the update occurs. In case of a system crash, the OS reads the journal and will either reprocess or roll back the transactions in the journal so that recovery will be faster then examining each metadata structure, which is the old and slow way. Example metadata structures include the directory entries that store file names and inodes that store file metadata. The journal contains the full block that is being updated, not just the value being changed. When a new file is created, the journal should contain the updated version of the blocks containing the directory entry and the inode.
Several things occur when an Ext3 file is deleted from Linux. Keep in mind that the OS gets to choose exactly what occurs when a file is deleted and this article assumes a general Linux system.
At a minimum, the OS must mark each of the blocks, the inode, and the directory entry as unallocated so that later files can use them. This minimal approach is what occurred several years ago with the Ext2 file system. In this case, the recovery process was relatively simple because the inode still contained the block addresses for the file content and tools such as debugfs and e2undel could easily re-create the file. This worked as long as the blocks had not been allocated to a new file and the original content was not overwritten.
With Ext3, there is an additional step that makes recovery much more difficult. When the blocks are unallocated, the file size and block addresses in the inode are cleared; therefore we can no longer determine where the file content was located. We can see the relationship between the directory entry, the inode, and the blocks of an unallocated file in Figure 2.
Now that we know the components involved with files and which ones are cleared during deletion, we can examine two approaches to file recovery (besides using a backup). The first approach uses the application type of the deleted file and the second approach uses data in the journal. Regardless of the approach, you should stop using the file system because you could create a file that overwrites the data you are trying to recover. You can power the system off and put the drive in another Linux computer as a slave drive or boot from a Linux CD.
The first step for both techniques is to determine the deleted file's inode address. This can be determined from debugfs or The Sleuth Kit (TSK). I'll give the debugfs method here. debugfs comes with most Linux distributions and is a file system debugger. To start debugfs, you'll need to know the device name for the partition that contains the deleted file. In my example, I have booted from a CD and the file is located on /dev/hda5:
# debugfs /dev/hda5
debugfs 1.37 (21-Mar-2005)
We can then use the cd command to change to the directory of the deleted file:
debugfs: cd /home/carrier/
The ls -d command will list the allocated and deleted files in the directory. Remember that the directory entry structure stores the name and the inode of the file and this listing will give us both values because neither is cleared during the deletion process. The deleted files have their inode address surrounded by "<" and ">":
debugfs: ls -d
415848 (12) . 376097 (12) .. 415864 (16) .bashrc
<415926> (28) oops.dat
|theusr 07/09/09 09:29:00 AM EDT|
The figure 2 maybe misleading: the links between the address blocks and the file content are still there (though the address blocks are unallocated), that what's make the recovery possible.
|Mike Kay 01/15/08 03:57:07 PM EST|
Excellent article. Followed it step by step and successfully recovered a .XLS spreadsheet that had been deleted from the /tmp folder on Ubuntu Gutsy. It also found an associated .jpg that I wasn't looking for!
Saved me hours of retyping. Thanks a lot.
|Jahangir 10/22/07 05:26:36 PM EDT|
This was really the best article i could find inspite of 3 hrs of googling.
But what if you are trying to recover a 6GB VM.
|ruintower 04/23/06 09:07:29 PM EDT|
Trackback Added: ext3 undelete; I “mis-deleted” a big file several days ago. So I umount the the partition immediately and searched the recovery method because I knew (but forgot) some methods to recovery file in Linux. However, the result is disappointed. Alt...
|marco 03/13/06 08:04:20 AM EST|
U have saved my life.
U are a GURU,
|marco 03/13/06 08:04:04 AM EST|
U have saved my life.
U are a GURU,
IoT is at the core or many Digital Transformation initiatives with the goal of re-inventing a company's business model. We all agree that collecting relevant IoT data will result in massive amounts of data needing to be stored. However, with the rapid development of IoT devices and ongoing business model transformation, we are not able to predict the volume and growth of IoT data. And with the lack of IoT history, traditional methods of IT and infrastructure planning based on the past do not app...
Feb. 24, 2017 01:00 AM EST Reads: 1,871
WebRTC services have already permeated corporate communications in the form of videoconferencing solutions. However, WebRTC has the potential of going beyond and catalyzing a new class of services providing more than calls with capabilities such as mass-scale real-time media broadcasting, enriched and augmented video, person-to-machine and machine-to-machine communications. In his session at @ThingsExpo, Luis Lopez, CEO of Kurento, introduced the technologies required for implementing these idea...
Feb. 23, 2017 11:30 PM EST Reads: 6,282
Why do your mobile transformations need to happen today? Mobile is the strategy that enterprise transformation centers on to drive customer engagement. In his general session at @ThingsExpo, Roger Woods, Director, Mobile Product & Strategy – Adobe Marketing Cloud, covered key IoT and mobile trends that are forcing mobile transformation, key components of a solid mobile strategy and explored how brands are effectively driving mobile change throughout the enterprise.
Feb. 23, 2017 11:00 PM EST Reads: 7,004
Apache Hadoop is emerging as a distributed platform for handling large and fast incoming streams of data. Predictive maintenance, supply chain optimization, and Internet-of-Things analysis are examples where Hadoop provides the scalable storage, processing, and analytics platform to gain meaningful insights from granular data that is typically only valuable from a large-scale, aggregate view. One architecture useful for capturing and analyzing streaming data is the Lambda Architecture, represent...
Feb. 23, 2017 10:00 PM EST Reads: 4,592
SYS-CON Events announced today that delaPlex will exhibit at SYS-CON's @CloudExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. delaPlex pioneered Software Development as a Service (SDaaS), which provides scalable resources to build, test, and deploy software. It’s a fast and more reliable way to develop a new product or expand your in-house team.
Feb. 23, 2017 09:15 PM EST Reads: 1,507
The explosion of new web/cloud/IoT-based applications and the data they generate are transforming our world right before our eyes. In this rush to adopt these new technologies, organizations are often ignoring fundamental questions concerning who owns the data and failing to ask for permission to conduct invasive surveillance of their customers. Organizations that are not transparent about how their systems gather data telemetry without offering shared data ownership risk product rejection, regu...
Feb. 23, 2017 08:45 PM EST Reads: 1,289
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place June 6-8, 2017, at the Javits Center in New York City, New York, is co-located with 20th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry p...
Feb. 23, 2017 08:30 PM EST Reads: 1,445
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform and how we integrate our thinking to solve complicated problems. In his session at 19th Cloud Expo, Craig Sproule, CEO of Metavine, demonstrated how to move beyond today's coding paradigm and sh...
Feb. 23, 2017 07:45 PM EST Reads: 3,374
SYS-CON Events announced today that IoT Now has been named “Media Sponsor” of SYS-CON's 20th International Cloud Expo, which will take place on June 6–8, 2017, at the Javits Center in New York City, NY. IoT Now explores the evolving opportunities and challenges facing CSPs, and it passes on some lessons learned from those who have taken the first steps in next-gen IoT services.
Feb. 23, 2017 06:45 PM EST Reads: 1,449
SYS-CON Events announced today that WineSOFT will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Based in Seoul and Irvine, WineSOFT is an innovative software house focusing on internet infrastructure solutions. The venture started as a bootstrap start-up in 2010 by focusing on making the internet faster and more powerful. WineSOFT’s knowledge is based on the expertise of TCP/IP, VPN, SSL, peer-to-peer, mob...
Feb. 23, 2017 06:30 PM EST Reads: 1,720
As organizations realize the scope of the Internet of Things, gaining key insights from Big Data, through the use of advanced analytics, becomes crucial. However, IoT also creates the need for petabyte scale storage of data from millions of devices. A new type of Storage is required which seamlessly integrates robust data analytics with massive scale. These storage systems will act as “smart systems” provide in-place analytics that speed discovery and enable businesses to quickly derive meaningf...
Feb. 23, 2017 06:30 PM EST Reads: 6,292
The Internet of Things can drive efficiency for airlines and airports. In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect with GE, and Sudip Majumder, senior director of development at Oracle, discussed the technical details of the connected airline baggage and related social media solutions. These IoT applications will enhance travelers' journey experience and drive efficiency for the airlines and the airports.
Feb. 23, 2017 06:00 PM EST Reads: 1,499
SYS-CON Media announced today that @WebRTCSummit Blog, the largest WebRTC resource in the world, has been launched. @WebRTCSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. @WebRTCSummit Blog can be bookmarked ▸ Here @WebRTCSummit conference site can be bookmarked ▸ Here
Feb. 23, 2017 06:00 PM EST Reads: 13,044
In his keynote at @ThingsExpo, Chris Matthieu, Director of IoT Engineering at Citrix and co-founder and CTO of Octoblu, focused on building an IoT platform and company. He provided a behind-the-scenes look at Octoblu’s platform, business, and pivots along the way (including the Citrix acquisition of Octoblu).
Feb. 23, 2017 05:45 PM EST Reads: 1,217
With billions of sensors deployed worldwide, the amount of machine-generated data will soon exceed what our networks can handle. But consumers and businesses will expect seamless experiences and real-time responsiveness. What does this mean for IoT devices and the infrastructure that supports them? More of the data will need to be handled at - or closer to - the devices themselves.
Feb. 23, 2017 05:30 PM EST Reads: 1,971
SYS-CON Events announced today that Dataloop.IO, an innovator in cloud IT-monitoring whose products help organizations save time and money, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Dataloop.IO is an emerging software company on the cutting edge of major IT-infrastructure trends including cloud computing and microservices. The company, founded in the UK but now based in San Fran...
Feb. 23, 2017 04:45 PM EST Reads: 2,561
SYS-CON Events announced today that Outlyer, a monitoring service for DevOps and operations teams, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Outlyer is a monitoring service for DevOps and Operations teams running Cloud, SaaS, Microservices and IoT deployments. Designed for today's dynamic environments that need beyond cloud-scale monitoring, we make monitoring effortless so you...
Feb. 23, 2017 03:45 PM EST Reads: 1,660
SYS-CON Events announced today that CA Technologies has been named “Platinum Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY, and the 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business – from apparel to energy – is being rewritten by software. From ...
Feb. 23, 2017 03:00 PM EST Reads: 1,883
In his keynote at 18th Cloud Expo, Andrew Keys, Co-Founder of ConsenSys Enterprise, provided an overview of the evolution of the Internet and the Database and the future of their combination – the Blockchain. Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settle...
Feb. 23, 2017 02:15 PM EST Reads: 1,224
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, whic...
Feb. 23, 2017 02:15 PM EST Reads: 1,183