An Advanced File System for Linux

Demanded by enterprises and beneficial to everyone

As Linux made its way further into the enterprise, one key feature it lacked was a journaling file system. That was true in 1999, but today four journaling file systems can meet enterprise server requirements. This article focuses on one of them: JFS.

The file system is one of the most important parts of an operating system. It stores and manages user data on disk drives and ensures that what's read from storage is identical to what was originally written. In addition to storing user data in files, the file system also creates and manages information about files and about itself. Besides guaranteeing the integrity of all that data, file systems are also expected to be extremely reliable and have excellent performance.

Before the year 2000, Ext2 was the de facto file system for most Linux machines; it was robust, reliable, and suitable for most deployments. However, as Linux displaced Unix and other operating systems in more and more large server and computing environments, Ext2 was pushed to its limits. In fact, many now-common requirements - large hard-disk volumes, quick recovery from crashes, high-performance I/O, and the need to store millions of files representing terabytes of data - exceed the capabilities of Ext2.

Fortunately, a number of other Linux file systems pick up where Ext2 leaves off. Linux now offers four alternatives to Ext2: Ext3, JFS, ReiserFS, and XFS. In addition to meeting some or all of the requirements above, each of these file systems supports journaling, a feature demanded by enterprises but beneficial to anyone running Linux. Journaling simplifies restarts after a crash, and these file systems also reduce fragmentation and accelerate I/O through techniques such as extent-based allocation. Better yet, after a crash a journaling file system replays its log instead of running a full fsck, which on a large volume can take a very long time.

To better appreciate the benefits of these file systems, it helps to know some file system vocabulary.

  • Logical block (or a file system's block size): The smallest unit of storage that can be allocated by the file system. A logical block is measured in bytes, and it may take several blocks to store a single file.
  • Logical volume: One or more physical disks or some subset of the physical disk space.
  • Block allocation: A method of allocating blocks in which the file system allocates one block at a time. With this method, a pointer to every block in a file is maintained and recorded. Ext2 uses block allocation.
  • Extent: A run of contiguous blocks. Each extent is described by a triple, consisting of file offset, starting block number, and length. File offset is the offset of the extent's first block from the beginning of the file; starting block number is the first block in the extent; and length is the number of blocks in the extent. Extents are allocated and tracked as a single unit, meaning that a single pointer tracks a group of blocks. For large files, extent allocation is a much more efficient technique than block allocation. Figure 1 shows how extents are used.
  • File system metadata: The file system's internal data structures - everything concerning a file except the actual data inside the file. Metadata includes date and time stamps, ownership information, file access permissions, other security information such as access control lists (if they exist), the file's size, and the storage location or locations on disk.
  • Inode: Stores all the information about a file except the data itself. You can think of an inode as a "bookkeeping" file for a file (indeed, an inode is a structure that consumes blocks, too). An inode contains file permissions, file types, and the number of links to the file. Every inode has a unique inode number that distinguishes it from every other inode.
An extent is described by its block offset in the file, the location of the first block in the extent, and the length of the extent. If file sample.txt requires 18 blocks, and the file system is able to allocate one extent of length 8, a second extent of length 5, and a third extent of length 5, the file system would look something like Figure 1. The first extent has offset 0 (block A in the file), location 10, and length 8. The second extent has offset 8 (block I), location 20, and length 5. The last extent has offset 13 (block N), location 35, and length 5.
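To make the Figure 1 arithmetic concrete, here is a minimal Python sketch that stores sample.txt's three extents as (file offset, starting block, length) triples and translates a logical block number within the file to a physical block on disk. The `Extent` type, `map_block`, and `pointers_needed` are hypothetical names chosen for illustration; JFS's actual on-disk extent descriptors are more elaborate than this.

```python
from bisect import bisect_right
from typing import NamedTuple


class Extent(NamedTuple):
    offset: int  # offset of the extent's first block from the start of the file
    start: int   # physical block number where the extent begins
    length: int  # number of contiguous blocks in the extent


# The three extents of sample.txt from Figure 1 (18 blocks in total).
SAMPLE_TXT = [Extent(0, 10, 8), Extent(8, 20, 5), Extent(13, 35, 5)]


def map_block(extents: list[Extent], logical: int) -> int:
    """Translate a logical block number within a file to a physical block.

    Extents must be sorted by offset. A binary search finds the last extent
    whose offset is <= logical, then checks that the block falls inside it.
    """
    offsets = [e.offset for e in extents]
    i = bisect_right(offsets, logical) - 1
    if i < 0 or logical >= extents[i].offset + extents[i].length:
        raise ValueError(f"block {logical} is not mapped")
    e = extents[i]
    return e.start + (logical - e.offset)


def pointers_needed(extents: list[Extent]) -> tuple[int, int]:
    """Compare bookkeeping: one pointer per block versus one per extent."""
    blocks = sum(e.length for e in extents)
    return blocks, len(extents)
```

With these extents, block I (logical block 8) maps to physical block 20 and block N (logical block 13) maps to 35. `pointers_needed(SAMPLE_TXT)` returns `(18, 3)`: block allocation would record 18 pointers where extent allocation records three descriptors.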

How File Systems Go Bad

With these concepts in mind, here's what happens when a three-block file is modified and grows to be a five-block file:
  1. Two new blocks are allocated to hold the new data.
  2. The file's inode is updated to record the new size of the file.
  3. The actual data is written into the blocks.
As you can see, while writing data to a file appears to be a single atomic operation, the actual process involves a number of steps (even more steps than shown here if you consider all of the accounting required to remove the two blocks from the free list of blocks and other metadata changes).
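The danger of those intermediate states can be shown with a small model. The Python sketch below is purely hypothetical: `ToyFS`, `grow_file`, and `check` are illustrative names, and the model tracks a `written` set that a real file system does not have. It grows a three-block file to five blocks, stops after a chosen number of the three steps, and reports what is left inconsistent.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFS:
    """Hypothetical in-memory model of one three-block file and its metadata."""
    free: set[int] = field(default_factory=lambda: set(range(100, 110)))
    allocated: set[int] = field(default_factory=lambda: {1, 2, 3})
    inode_blocks: list[int] = field(default_factory=lambda: [1, 2, 3])
    written: set[int] = field(default_factory=lambda: {1, 2, 3})


def grow_file(fs: ToyFS, extra: int, crash_after: int) -> None:
    """Grow the file by `extra` blocks, stopping after `crash_after` of the 3 steps."""
    new_blocks = sorted(fs.free)[:extra]
    if crash_after >= 1:  # Step 1: allocate the new blocks (remove from the free list)
        fs.free -= set(new_blocks)
        fs.allocated |= set(new_blocks)
    if crash_after >= 2:  # Step 2: update the inode to record the new size
        fs.inode_blocks += new_blocks
    if crash_after >= 3:  # Step 3: write the actual data into the blocks
        fs.written |= set(new_blocks)


def check(fs: ToyFS) -> list[str]:
    """List the inconsistencies a post-crash scan of this model would find."""
    problems = []
    leaked = sorted(fs.allocated - set(fs.inode_blocks))
    if leaked:
        problems.append(f"allocated but not in any inode: {leaked}")
    stale = [b for b in fs.inode_blocks if b not in fs.written]
    if stale:
        problems.append(f"inode covers never-written blocks: {stale}")
    return problems
```

A crash after step 1 leaks blocks 100 and 101; a crash after step 2 leaves the inode claiming blocks that hold whatever stale data was there before. Only when all three steps finish does `check` return an empty list. A real fsck cannot see the second problem at all, because stale data looks just like valid data.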

If all the steps to write a file complete (and they usually do), the file is saved successfully. However, if the process is interrupted at any point (perhaps by a power failure or other system failure), a non-journaling file system can be left in an inconsistent state. Corruption occurs because the logical operation of writing (or updating) a file is actually a sequence of I/O operations, and at any given moment the media may reflect only part of that sequence. A journaling file system groups related metadata changes into transactions and records them in a log. During recovery the log is replayed, returning the file system to a consistent state as of the last committed transaction; changes from transactions that never committed are discarded.
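The recovery idea can be sketched as generic redo logging in Python. This is not JFS's actual log format: the `Update` and `Commit` record types are hypothetical, and the sketch assumes that uncommitted changes never reach their home location on disk, so simply skipping them is enough. Replay applies the updates of every transaction that has a commit record, in log order.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    txid: int
    key: str      # which piece of metadata changes, e.g. "inode7.size" (hypothetical)
    value: object


@dataclass(frozen=True)
class Commit:
    txid: int


def replay(on_disk: dict[str, object], log: list[Update | Commit]) -> dict[str, object]:
    """Redo committed transactions in log order; skip uncommitted ones."""
    committed = {rec.txid for rec in log if isinstance(rec, Commit)}
    recovered = dict(on_disk)
    for rec in log:
        if isinstance(rec, Update) and rec.txid in committed:
            recovered[rec.key] = rec.value
    return recovered


# The growing file from the three-step example: transaction 1 committed,
# transaction 2 was interrupted before its commit record reached the log.
on_disk = {"inode7.size": 3, "inode7.blocks": (1, 2, 3)}
log = [
    Update(1, "inode7.size", 5),
    Update(1, "inode7.blocks", (1, 2, 3, 100, 101)),
    Commit(1),
    Update(2, "inode7.size", 9),
]
recovered = replay(on_disk, log)
```

Here `recovered` reflects transaction 1 (size 5, five blocks) and ignores transaction 2. Because each record stores an absolute value rather than a delta, replaying the same log twice gives the same result, which matters if the system crashes again during recovery.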

Features of JFS

JFS for Linux is a file system based on IBM's JFS file system for OS/2 Warp Server for e-business. Released as open source in early 2000 with a GPL license and ported to Linux soon after, JFS is well suited for enterprise environments. JFS uses many advanced techniques to boost performance, provide for very large file systems, and, of course, journal changes to the file system. Some of the features of JFS include:
  • Extent-based addressing structures: JFS uses extent-based addressing structures, along with aggressive block allocation policies to produce compact, efficient, and scalable structures for mapping logical offsets within files to physical addresses on disk. This feature yields excellent performance.
  • Dynamic inode allocation: JFS dynamically allocates space for disk inodes as required, freeing the space when it is no longer required. This is a radical improvement over Ext2, which reserves a fixed amount of space for disk inodes at file system creation time. With dynamic inode allocation, users do not have to estimate the maximum number of files and directories that a file system will contain. Additionally, this feature decouples disk inodes from fixed disk locations.
  • Directory organization: Two different directory organizations are provided: one is used for small directories and the other for large directories. The contents of a small directory (up to eight entries) are stored within the directory's inode. This eliminates the need for separate directory block I/O and the need to allocate separate storage. The contents of larger directories are organized in a B+ tree keyed on name. B+ trees provide faster directory lookup, insertion, and deletion capabilities when compared to traditional unsorted directory organizations.
  • Online resizing: Allows the file system to grow while it is mounted. This feature is used with a volume manager.
  • Online snapshot: Enables backing up an active file system. It provides an online backup mechanism by creating a point-in-time image of the file system, which removes the need to take the system offline to obtain a consistent backup. This feature is used with a volume manager.
  • No-integrity mount option (nointegrity): Allows the file system to skip journaling metadata changes. A restore program can use this option to shorten restore time.
  • 64-bit: JFS is a full 64-bit file system. All of the relevant file system structure fields are 64 bits wide, which allows JFS to support very large files and volumes.
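The two directory organizations from the list above can be modeled in a few lines of Python. This is a hypothetical sketch, not JFS code: a sorted Python list searched with `bisect` stands in for the B+ tree, so lookups behave like a keyed search, but insertion into a list costs O(n), which is exactly the cost a real B+ tree avoids. The eight-entry inline limit comes from the text.

```python
from __future__ import annotations

from bisect import bisect_left, insort

INLINE_LIMIT = 8  # entries stored inside the directory's inode, per the text


class Directory:
    """Toy model of small (inline) and large (name-keyed) directories.

    A sorted list searched with bisect stands in for the B+ tree.
    """

    def __init__(self) -> None:
        self.entries: list[tuple[str, int]] = []  # (name, inode number)
        self.inline = True  # True while the entries live in the inode

    def add(self, name: str, ino: int) -> None:
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        if self.inline and len(self.entries) < INLINE_LIMIT:
            self.entries.append((name, ino))  # no separate directory block needed
            return
        if self.inline:
            # The ninth entry triggers conversion to the name-keyed layout.
            self.entries.sort()
            self.inline = False
        insort(self.entries, (name, ino))

    def lookup(self, name: str) -> int | None:
        if self.inline:
            # A linear scan is cheap for at most eight entries.
            return next((ino for n, ino in self.entries if n == name), None)
        # Binary search keyed on name: (name,) sorts just before (name, ino).
        i = bisect_left(self.entries, (name,))
        if i < len(self.entries) and self.entries[i][0] == name:
            return self.entries[i][1]
        return None
```

Adding eight entries keeps the directory inline; the ninth switches it to the sorted layout, after which `lookup` uses a binary search on the name instead of a scan.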
There are other advanced features in JFS, such as allocation groups, which speed up file access by maximizing locality. Two additional features are extended attributes and Access Control Lists. Because Access Control Lists give users finer control over file permissions, it helps to review Linux's traditional file permissions first.

If you've spent even a little time with a Linux system, you're probably quite familiar with Linux's file permission scheme. In a nutshell, you may read, write, or execute a file (or in the case of a directory, search the directory) only if you have the proper permission. Furthermore, the traditional Linux read, write, and execute permissions are distinct, and each of those rights can be granted separately to the owner (a user) of the file, to the group that owns the file, and to "other," meaning everyone who is neither the owner nor a member of the owning group. The chmod command changes a file's permissions, while chown and chgrp change its owning user and group.
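The traditional check can be sketched in Python using the standard `stat` module's permission constants. `may_read` is a hypothetical helper, and it ignores the superuser, who bypasses these checks. Note that exactly one class applies: if you own the file, only the owner bits are consulted, even when the group bits would grant more.

```python
import stat


def may_read(mode: int, owner_uid: int, owner_gid: int,
             uid: int, gids: set[int]) -> bool:
    """Traditional Unix read check: owner, then group, then other."""
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)  # owner bits only
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)  # group bits only
    return bool(mode & stat.S_IROTH)      # everyone else
```

For a file with mode `0o640` owned by UID 1000 and GID 50, the owner and members of group 50 may read it and everyone else may not. There is no way to express "Debbie and Bo, but nobody else" without a group that contains exactly those two users.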

In general, Linux's simple permission scheme works well, and it is especially effective when access rights align with the users and groups on the system. But if you want to grant access rights to a list of users who do not belong to an existing group, the scheme breaks down. For example, if you want to share one of your personal files, phones.txt, with every member of your group, say, staff, you can grant that access with two commands: chgrp staff phones.txt and chmod g+r phones.txt. However, if you want to give read access to friends.txt to Debbie and Bo, and read access to colleagues.txt to Bo and Abby, you'd have to create two different groups with Bo in each one. (Or, more accurately, your system administrator would have to create the groups.)

More Flexibility with Fine-Grained Control

As you can see, managing permissions through "special interest groups" is terribly inconvenient, and worse, it doesn't scale. A more flexible scheme is Access Control Lists, or ACLs. Instead of capturing permissions in just a few flags, ACLs record permissions in an individual and extensible list of access rights that are attached to each file or directory. Access control rights can be assigned to a specific user, a specific group, or to multiple users or groups in any combination. In a sense, ACLs are like the "Will Call" list at the hottest restaurant in town: if you're not on the access control list, you don't get in.

Reusing the example above, if you want to give access to friends.txt to Debbie and Bo, you simply grant read access to both users. No (administrative) group is needed. Need to grant access to a third user? Simply give that user the appropriate access rights. In a sense, ACLs enhance security because ACLs can implement an access policy directly, even if the policy is different for every file on the system.

ACLs can also be used to build advanced system applications like Samba, which serves files to Windows clients whose security model is built on ACLs. (For more information on how Samba uses ACLs, see the sidebar "ACL Support in Samba.") First, let's see how Extended Attributes work and how they can be used.

File Access Control Lists and Extended Attributes (EAs) are currently supported by the Ext2, Ext3, JFS, ReiserFS, and XFS file systems. You've already seen what an ACL is for; EAs are simply the underlying mechanism used to record ACLs.

An EA is a name/value pair that associates an arbitrary piece of metadata (data about data) with a file or directory. EAs are not part of the file's data; instead, they are stored separately and managed automatically by the file system.

More than one EA can be attached to a specific file or directory, and an EA can store system objects (such as access control lists or the capabilities of an executable) and user objects (such as the MIME type or character set of a file). Applications can define and associate extended attributes with a file object (remember, a directory is just a special file) through file system function calls.
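On Linux, those file system calls include setxattr and getxattr, which Python exposes as `os.setxattr` and `os.getxattr`. The sketch below stores a MIME type as a user-defined EA and reads it back. The attribute name `user.mime_type` is an illustrative choice; the `user.` prefix is the Linux namespace for user-defined EAs, while POSIX ACLs themselves are stored as EAs in the `system.` namespace (`system.posix_acl_access` and `system.posix_acl_default`). The helper returns None where the platform or file system does not support user EAs.

```python
import errno
import os
import tempfile
from typing import Optional

# Illustrative attribute name in the Linux "user." EA namespace.
ATTR = "user.mime_type"


def tag_mime_type(path: str, mime: str) -> Optional[bytes]:
    """Store a MIME type as an EA on path and read it back.

    Returns None if the platform or file system does not support user EAs.
    """
    if not hasattr(os, "setxattr"):  # os.setxattr is available on Linux only
        return None
    try:
        os.setxattr(path, ATTR, mime.encode())
        return os.getxattr(path, ATTR)
    except OSError as exc:
        if exc.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
            return None
        raise


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print(tag_mime_type(f.name, "text/plain"))
```

The file's contents are untouched by this call; the EA travels alongside the file as separate metadata, which is what makes EAs a convenient place to keep ACLs.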

Extended attributes can be used to store almost anything. You can maintain a file's history; categorize the contents of the file (such as text, icons, bitmaps); record the version of the file; append additional data; or do all of the above. For example, Figure 2 shows five extended attributes (Version, File Type, Additional data, Install, and History) of fileA.

With EAs in place, ACLs are relatively easy to implement. An Access Control Entry, or ACE, is an individual entry in an ACL. Each ACE is a triple defined by an entry type, either group or user; a group name, username, numeric UID, or numeric GID, depending on the value of the first field; and the access permission or right (read, write, execute) associated with the ACE. So, in the abstract, giving Debbie permission to read friends.txt means that the ACL attached to friends.txt contains an ACE (user, Debbie, read).
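The ACE triple maps naturally onto a small data type. The Python sketch below models the friends.txt and colleagues.txt example with hypothetical `ACE` and `allowed` names. Its evaluation rule is deliberately simplified (a matching user entry decides; otherwise any matching group entry grants access); real POSIX ACLs on Linux also involve owner, mask, and other entries and a defined evaluation order.

```python
from typing import NamedTuple


class ACE(NamedTuple):
    kind: str              # entry type: "user" or "group"
    who: str               # user or group name (could also be a UID or GID)
    perms: frozenset[str]  # subset of {"read", "write", "execute"}


def allowed(acl: list[ACE], user: str, groups: set[str], perm: str) -> bool:
    """Simplified check: a matching user entry decides; else any group entry."""
    user_entries = [a for a in acl if a.kind == "user" and a.who == user]
    if user_entries:
        return any(perm in a.perms for a in user_entries)
    return any(a.kind == "group" and a.who in groups and perm in a.perms
               for a in acl)


READ = frozenset({"read"})
ACLS = {
    "friends.txt": [ACE("user", "debbie", READ), ACE("user", "bo", READ)],
    "colleagues.txt": [ACE("user", "bo", READ), ACE("user", "abby", READ)],
}
```

Granting a third user access to friends.txt is one more `ACE` appended to its list; no group has to be created. On a real Linux system the equivalent grant is made with setfacl, for example `setfacl -m u:debbie:r friends.txt`.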

Currently, ACLs are the only Linux feature dependent on EAs. Other operating systems have had EAs for several years, and uses of EAs on those operating systems are broader.

ACL Support in Samba

To make Samba as portable as possible, the designers of Samba decided against a custom implementation of ACLs. Instead, each Samba server converts NT ACL specifications (sent via MS-RPC) into a POSIX ACL, and then converts that neutral ACL into an ACL that's platform-specific. A conceptual illustration of Samba's ACL subsystem is shown below.

If the Samba server's underlying file system supports ACLs, and the POSIX ACL can be converted to a native ACL, Windows users can manipulate server-side ACLs on the Samba server using the common Windows NT commands.

Samba 2.2 included support for ACLs, but up until now, Samba has had no way to store ACLs directly on the file system since there was no ACL support available for Linux. That's no longer an issue, and Samba will preserve NTFS ACLs rather than mapping ACL permissions to the less-flexible, standard Unix permissions. (Windows NT and Windows 2000 use ACLs to set permissions on files and directories. That scheme offers a much finer-grained control over permissions than the traditional "one user, one group" solution that most Unix systems use.)

Native ACL support, in combination with winbind, allows a Linux-based system to "assimilate" Windows NT users, groups, and ACL permissions. Quite an impressive solution!


Steve Best is a Senior Software Engineer in the Linux Technology Center of IBM in Austin, Texas. He is currently working on the Journaled File System (JFS) for Linux project. Steve has done extensive work in operating system development, with a focus in the areas of file systems, internationalization, and security.
