Original article: Ext2 File System Analysis

Introduction

An analysis of the ext2 file system: first get familiar with ext2, then implement the path resolution process within the file system, and finally implement a virtual file system based on ext2.

Linux Storage Stack

Introduction

The assignment is divided into three parts. The division is meant to help you with time management while working on the assignment, and to ensure you make steady progress on it instead of leaving it to the last minute. You will submit all three parts at once.

  • In part 1, you will obtain information about an ext2 file system, based on the data contained in the first copy of the superblock and group descriptor table.
  • In part 2, you will implement a path resolution process based on the ext2 file system. You will identify the inode associated with each file in the file system, and read both the metadata and the contents of the file.
  • Finally, in part 3, you will implement a virtual file system based on an ext2 file system volume.

General Assignment Rules

  • This assignment may be done in pairs or alone. Groups of 3 or more are not permitted. You are strongly encouraged to work with a partner.
  • Your program must be written in C. Submissions written in C++ or any other language will not be marked and will receive 0.
  • In the top level directory, typing make must produce the executable programs ext2test and ext2fs (the provided Makefile already does this, so if you make changes to it don't break it).
  • To be marked, the program must compile and run on the department-provided Linux undergraduate servers. In particular, note that parts of the provided code may not work at all in other environments, including Mac OS X. If your program does not compile by simply typing make in the assignment directory, it will not be marked and 0 will be awarded. In this situation no part marks will be granted, regardless of the amount of work done. If you are testing your code in your own Linux environment, you may need to install the package libfuse-dev (or its equivalent in your distribution).
  • You are not permitted to modify the Makefile or change compilation options to suppress warnings, to eliminate the need for prototypes, or to change the version of the compiler used.
  • The only branch of your repo that will be marked is master. If we have to mark a different branch or a different version of the assignment a penalty of 20% will be applied.
  • As usual, the late penalty for this assignment is 33.33% per day, rounded by the minute. To allow marking to start as fast as possible, late submissions will be limited to 24 hours after the deadline, unless some previous arrangement is made with the instructor.

Setting up your repository

Just like in previous assignments, we will use version control to distribute the files that you will be working on. It will also be the way that you hand in your assignment.

As indicated above, you may do this assignment either by yourself or with a partner; teams of three or more are not allowed. Before you can access the git repo with your copy of the assignment, you will need to register your team, or yourself if you aren't working with a partner. You must do this even if you are working by yourself; in this case, leave the Partner field blank. When you are working with a partner, both of you must register your team before the assignment repo will be created. For additional information on using git, the format of repo names, etc., see the information supplied with assignment 1.

As you work on your code you will want to periodically "push" changes to the Stash repository. It is highly recommended that you do this frequently. You might want to push these changes every time you stop working on the code and when you get to important implementation milestones.

Details of the tasks

Part 1: File System Information

You have access to a series of images of ext2 file systems in the directory ~cs313/EXTimages, which should be available on any of the department's Linux servers. Your task is to write code to determine some basic information about each of these file systems. The following is a list of some of the information to be determined about the file system:

  • its size, in bytes.
  • its block size, in sectors.
  • its volume name.
  • the total number of blocks and inodes.
  • the number of reserved blocks.
  • the number of blocks and inodes in each block group.
  • the number of block groups.
  • for each block group, the number of free blocks and inodes.
  • other relevant information stored in the superblock.

Most of the information above can be read, or computed, from the information stored in the superblock (located 1024 bytes from the start of the image) or the group descriptor table (located in the following block) of the file system volume file. Relevant details about the ext2 file systems can be found here:

To simplify your implementation of this part, and to clarify the format of the output of your code, skeleton code is provided in the files ext2test.c, ext2.c, and ext2.h. In this part you will need to implement the following functions in ext2.c: open_volume_file, close_volume_file, and read_block (as well as any additional helper functions you see fit). You are allowed to change the function implementations and signatures, provided the information printed in ext2test.c is still printed in your final version of those files. Note, however, that the function prototypes are provided in a manner that is expected to be helpful for the last part of the assignment, so you should limit your changes to those that aim to improve clarity.
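As a rough illustration of what a block read involves, the sketch below seeks to block_no * block_size and reads one block. The name read_block_sketch, its parameters, and the use of a FILE * handle are assumptions made for this example; the real read_block prototype is the one given in ext2.h.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not the assignment's read_block: reads block
 * `block_no` (block_size bytes) from an already opened volume file.
 * Returns 0 on success, -1 on a failed seek or a short read. */
int read_block_sketch(FILE *volume, uint32_t block_size,
                      uint32_t block_no, void *buffer)
{
    /* Assumes the byte offset fits in a long (true on 64-bit Linux). */
    long offset = (long)block_no * (long)block_size;

    if (fseek(volume, offset, SEEK_SET) != 0)
        return -1;
    if (fread(buffer, 1, block_size, volume) != block_size)
        return -1;
    return 0;
}
```

A short read is reported as an error here because a block number past the end of the volume usually indicates a bug or a corrupt pointer.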

You are allowed to make some assumptions: you may assume that your code is running on a little-endian CPU, which allows you to read the integer values in the image file in its raw format. You are also allowed to assume that the file system uses version 1.0 of the ext2 specification, and that it follows the specific format described for the Linux environment, in particular regarding the use of 32-bit values for UID and GID.
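The following is a minimal sketch of how the Part 1 values could be derived from a raw copy of the superblock and the group descriptor table. The struct and function names (sb_summary, gd_summary, parse_superblock, parse_group_desc) are hypothetical and are not part of the provided skeleton. The byte offsets follow the standard ext2 on-disk layout, and 32-byte group descriptors are assumed. The sketch decodes bytes explicitly, so it does not depend on the little-endian assumption, although a direct struct read would also work under that assumption.

```c
#include <stdint.h>
#include <string.h>

#define EXT2_MAGIC 0xEF53u

struct sb_summary {                 /* hypothetical summary type */
    uint32_t inodes_count, blocks_count, reserved_blocks;
    uint32_t first_data_block, block_size;
    uint32_t blocks_per_group, inodes_per_group, group_count;
    uint64_t size_bytes;
    char volume_name[17];           /* 16 bytes on disk plus a terminator */
};

struct gd_summary {                 /* hypothetical summary type */
    uint32_t inode_table;
    uint16_t free_blocks, free_inodes;
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* `sb` points at the 1024 bytes read from offset 1024 of the volume.
 * Returns 0 on success, -1 if the data does not look like ext2. */
int parse_superblock(const uint8_t *sb, struct sb_summary *out)
{
    uint32_t log_block_size = le32(sb + 24);

    if (le16(sb + 56) != EXT2_MAGIC || log_block_size > 16)
        return -1;
    out->inodes_count     = le32(sb + 0);
    out->blocks_count     = le32(sb + 4);
    out->reserved_blocks  = le32(sb + 8);
    out->first_data_block = le32(sb + 20);
    out->block_size       = 1024u << log_block_size;
    out->blocks_per_group = le32(sb + 32);
    out->inodes_per_group = le32(sb + 40);
    if (out->blocks_per_group == 0 ||
        out->blocks_count < out->first_data_block)
        return -1;
    /* Round up: the last group may be only partially filled. */
    out->group_count = (uint32_t)(((uint64_t)out->blocks_count
        - out->first_data_block + out->blocks_per_group - 1)
        / out->blocks_per_group);
    out->size_bytes = (uint64_t)out->blocks_count * out->block_size;
    memcpy(out->volume_name, sb + 120, 16);
    out->volume_name[16] = '\0';
    return 0;
}

/* `gdt` points at the start of the group descriptor table. */
void parse_group_desc(const uint8_t *gdt, uint32_t group,
                      struct gd_summary *out)
{
    const uint8_t *gd = gdt + (size_t)group * 32;

    out->inode_table = le32(gd + 8);
    out->free_blocks = le16(gd + 12);
    out->free_inodes = le16(gd + 14);
}
```

With these values, the block size in sectors would be block_size / 512, assuming 512-byte sectors.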

To allow easier testing, your code must compile by using the make command. Your program must also take as an argument the name of the file containing the file system. This functionality is provided, so make sure you don't break it. If you would like to implement additional arguments for testing purposes, you must make sure that calling the program with a single argument still works as expected.

Part 2: Inodes and Path Resolution

The goal of this part is to be able to deal with inodes, and extract information about the associated files. More specifically, you must implement the functions read_inode, read_ind_block_entry, get_inode_block_no, read_file_block and read_file_content in ext2.c, so that they can obtain both the metadata and content of an inode.
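Reading an inode starts with locating it on disk. The sketch below shows the usual arithmetic, assuming inode numbers start at 1 and that the inode size is known (128 bytes in the original ext2 revision, otherwise taken from the superblock). The names locate_inode and inode_byte_offset are hypothetical, and the inode table block is expected to come from the group descriptor of the computed group.

```c
#include <stdint.h>

struct inode_loc {          /* hypothetical result type */
    uint32_t group;         /* block group holding the inode */
    uint32_t index;         /* position inside that group's inode table */
};

/* Inode numbers are 1-based, so inode 1 is index 0 of group 0. */
struct inode_loc locate_inode(uint32_t ino, uint32_t inodes_per_group)
{
    struct inode_loc loc;

    loc.group = (ino - 1) / inodes_per_group;
    loc.index = (ino - 1) % inodes_per_group;
    return loc;
}

/* Byte offset of the inode within the volume, given the first block of
 * the inode table of its group. */
uint64_t inode_byte_offset(uint32_t inode_table_block, uint32_t block_size,
                           uint32_t index, uint32_t inode_size)
{
    return (uint64_t)inode_table_block * block_size
         + (uint64_t)index * inode_size;
}
```

For example, with 128 inodes per group, the root directory (inode 2) is entry 1 of group 0, and inode 129 is entry 0 of group 1.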

One topic that is not properly explained in the reference links above is sparse files. In ext2 (and other similar file systems), if the block number associated with a particular location in the file is 0 (zero), then no block is allocated for the region in question, and all data bytes for that block are considered to be zero. This allows the file system to store very large files with sparse content (e.g., a 13 GB file with only a handful of non-zero blocks) by only using blocks for regions of the file that actually contain non-zero data. This applies to both data blocks and indirect blocks.
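The sketch below illustrates one way to map a file-relative block index to a volume block number while treating 0 as a hole at every level. It assumes the standard ext2 layout of 12 direct pointers followed by single, double, and triple indirect pointers, and, to stay self-contained, it works on a volume held entirely in memory. The names map_file_block and read_file_block_sketch are hypothetical; the assignment's own functions read blocks from the volume file instead.

```c
#include <stdint.h>
#include <string.h>

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Entry `i` of indirect block `blk`; a missing indirect block (0) means
 * every entry it would have held is a hole as well. */
static uint32_t ind_entry(const uint8_t *vol, uint32_t bs,
                          uint32_t blk, uint64_t i)
{
    if (blk == 0)
        return 0;
    return le32(vol + (size_t)blk * bs + (size_t)i * 4);
}

/* Returns the volume block number for file block `idx`, or 0 for a hole. */
uint32_t map_file_block(const uint8_t *vol, uint32_t bs,
                        const uint32_t i_block[15], uint64_t idx)
{
    uint64_t n = bs / 4;            /* pointers per indirect block */
    uint32_t l1, l2;

    if (idx < 12)
        return i_block[idx];
    idx -= 12;
    if (idx < n)
        return ind_entry(vol, bs, i_block[12], idx);
    idx -= n;
    if (idx < n * n) {
        l1 = ind_entry(vol, bs, i_block[13], idx / n);
        return ind_entry(vol, bs, l1, idx % n);
    }
    idx -= n * n;
    if (idx < n * n * n) {
        l1 = ind_entry(vol, bs, i_block[14], idx / (n * n));
        l2 = ind_entry(vol, bs, l1, (idx / n) % n);
        return ind_entry(vol, bs, l2, idx % n);
    }
    return 0;                       /* beyond the largest mappable index */
}

/* Copies one file block into `out`; holes read back as zero bytes. */
void read_file_block_sketch(const uint8_t *vol, uint32_t bs,
                            const uint32_t i_block[15], uint64_t idx,
                            uint8_t *out)
{
    uint32_t blk = map_file_block(vol, bs, i_block, idx);

    if (blk == 0)
        memset(out, 0, bs);
    else
        memcpy(out, vol + (size_t)blk * bs, bs);
}
```

Checking for 0 inside ind_entry keeps the hole rule in one place, so a missing double or triple indirect block falls through to a hole without extra branches.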

You must also implement the function follow_directory_entries in the same file. This function is used to traverse all the entries in a directory, calling a function for each entry. The function is used in the implementation of find_file_in_directory, which is provided.
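To make the traversal concrete, here is a sketch of walking the entries of a single directory block and invoking a callback for each one. It assumes the standard ext2 directory entry layout (4-byte inode, 2-byte record length, 1-byte name length, 1-byte file type, then the name, which is not NUL-terminated). The names walk_dir_block, dir_entry_fn, and find_in_dir_block are hypothetical and do not match the signatures in the provided skeleton.

```c
#include <stdint.h>
#include <string.h>

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Callback type: return non-zero to stop the traversal early. */
typedef int (*dir_entry_fn)(uint32_t inode, const char *name,
                            uint8_t name_len, void *ctx);

/* Returns 1 if the callback stopped the walk, 0 if the whole block was
 * visited, -1 if an entry looks corrupt. */
int walk_dir_block(const uint8_t *block, uint32_t block_size,
                   dir_entry_fn fn, void *ctx)
{
    uint32_t pos = 0;

    while (pos + 8 <= block_size) {
        const uint8_t *e = block + pos;
        uint32_t inode = le32(e);
        uint16_t rec_len = le16(e + 4);
        uint8_t name_len = e[6];

        if (rec_len < 8 || pos + rec_len > block_size ||
            8u + name_len > rec_len)
            return -1;
        /* Inode 0 marks an unused entry; skip it but keep walking. */
        if (inode != 0 && fn(inode, (const char *)e + 8, name_len, ctx))
            return 1;
        pos += rec_len;
    }
    return 0;
}

struct name_query { const char *name; uint32_t inode; };

static int match_name(uint32_t inode, const char *name,
                      uint8_t name_len, void *ctx)
{
    struct name_query *q = ctx;

    if (strlen(q->name) == name_len &&
        memcmp(q->name, name, name_len) == 0) {
        q->inode = inode;
        return 1;
    }
    return 0;
}

/* Returns the inode number for `name`, or 0 if it is not in the block. */
uint32_t find_in_dir_block(const uint8_t *block, uint32_t block_size,
                           const char *name)
{
    struct name_query q = { name, 0 };

    walk_dir_block(block, block_size, match_name, &q);
    return q.inode;
}
```

Comparing lengths before comparing bytes matters because a stored name such as FOO must not match a query for FO or FOOD.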

Next, you must implement function find_file_from_path, which will perform the path resolution process on the file system. In particular, this function will split a path into components (file names and directory names); for each component the function will find the directory entry for the individual name, then find the inode associated to that entry. For example, if the path is /FOO/BAR/BUZZ.TXT, the function will:

  • read the inode for the root directory;
  • visit the entries in the root directory until it finds one whose name is FOO, keeping track of the inode number (see find_file_in_directory);
  • find and read the inode for the directory FOO (see read_inode);
  • visit the entries in the content of this directory until it finds one whose name is BAR;
  • find and read the inode for the directory BAR;
  • visit the entries in the content of this directory until it finds one whose name is BUZZ.TXT;
  • find and read the inode for the file, returning this inode.
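The splitting step in the walk-through above can be sketched as a small tokenizer that yields one path component per call. The name next_component and its return convention are hypothetical; the lookup of each component would still go through find_file_in_directory and read_inode, starting from the root directory, which is inode 2 in ext2.

```c
#include <stddef.h>
#include <string.h>

/* Copies the next component of *path into `out` and advances *path.
 * Returns 1 if a component was produced, 0 when the path is exhausted,
 * and -1 if the component does not fit in `out` (capacity `cap`). */
int next_component(const char **path, char *out, size_t cap)
{
    const char *p = *path;
    size_t len;

    p += strspn(p, "/");        /* skip any run of separators */
    len = strcspn(p, "/");      /* length up to the next separator */
    if (len == 0) {
        *path = p;
        return 0;
    }
    if (len >= cap)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    *path = p + len;
    return 1;
}
```

Calling it repeatedly on /FOO/BAR/BUZZ.TXT yields FOO, BAR, and BUZZ.TXT in turn. A path consisting only of / yields no components, which corresponds to returning the root inode itself.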

Note that the file ext2test.c already has some preliminary tests for the functions above. These tests are based on files that exist in some of the images provided above (though some of the images may not contain all of the files in the test). You should, however, extend that file to test additional cases, such as corner cases (e.g., files not in the first block of a directory) and errors (e.g., a file with a path whose directory does not exist).

Part 3: A Virtual File System

This part of the system aims to give you a better understanding of how a file system actually works. You will use the FUSE (Filesystem in Userspace) library to create a functional file system that will allow a user to read the data in the file system using tools provided by the operating system.

In a FUSE-based file system, any system calls received by the kernel related to actions in files inside the mounted file system will trigger calls to predetermined functions associated to those actions. So, for example, when a user opens a file inside the file system, the kernel will call the function associated to the open operation. When a user reads data from that file, the kernel will call the function associated to the read operation, and will use its result as the result of the read operation.

The provided file ext2fs.c contains the skeleton of a FUSE-based file system that uses an ext2 volume file as the underlying data structure. This skeleton provides support for the following operations:

  • init: this operation is called automatically when the file system is first mounted.
  • destroy: this operation is called automatically just before the file system is un-mounted.
  • getattr: read the metadata of a file (similar to the stat command, or the stat system call).
  • readdir: lists the contents of a directory.
  • open: opens a file (similar to the open or fopen functions).
  • read: reads the contents at a specific position of a file (similar to an fseek followed by fread, or a call to pread).
  • release: releases a file (used in functions like close or fclose).
  • readlink: reads the target of a symbolic (soft) link.

In this part of the assignment you are responsible for implementing the operations above (except for init and destroy, which are already implemented). The comments provided in the file should give you enough information to implement the operations successfully, but if you require additional information, you can find it at http://libfuse.github.io/doxy...
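One detail of the read operation that is easy to get wrong is a request that extends past the end of the file. A minimal sketch of the arithmetic follows, assuming the operation returns the number of bytes actually read and 0 at end of file. The names clamp_read, read_span, and first_span are hypothetical helpers, not part of the skeleton or of the FUSE API.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes a read of `size` bytes at `offset` may return for a
 * file of `file_size` bytes. */
size_t clamp_read(size_t size, uint64_t offset, uint64_t file_size)
{
    uint64_t remaining;

    if (offset >= file_size)
        return 0;               /* at or past end of file */
    remaining = file_size - offset;
    return remaining < size ? (size_t)remaining : size;
}

struct read_span {              /* hypothetical result type */
    uint64_t first_block;       /* file-relative index of the first block */
    uint32_t offset_in_block;   /* where the read starts inside it */
};

/* Locates the first file block touched by a read starting at `offset`. */
struct read_span first_span(uint64_t offset, uint32_t block_size)
{
    struct read_span s;

    s.first_block = offset / block_size;
    s.offset_in_block = (uint32_t)(offset % block_size);
    return s;
}
```

For example, a 10-byte request at offset 45 of a 50-byte file would return 5 bytes, and a read at offset 2500 with 1024-byte blocks starts 452 bytes into file block 2.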

Testing your file system

To run the file system above, you must use the following command:


./ext2fs -f -s -o use_ino mountpoint file.img

where mountpoint is the directory where you will mount your file system (details below), and file.img is the file containing the ext2 volume where the data used by the file system is obtained. The command-line option -f indicates that the file system should run in the foreground (details below). The command-line option -s is used to indicate your file system will run single-threaded, i.e., only one file operation will be called at a time. You can also use the option -d for additional debugging information.

The command-line option -o is used to set additional options. In the example above, it is used to add the use_ino option (see below). You are welcome to explore further options (such as allow_other, direct_io, max_read or nonempty; see man 8 mount.fuse for more details), but you are not required to support them, and your final implementation should not rely on these options being set.

By default, FUSE will generate inode numbers for any new file it is made aware of. The use_ino option will ask FUSE to use the inode number provided by the getattr function instead of the one generated by FUSE. Your code must support the use_ino option, i.e., it must set the st_ino field for the struct stat buffer in ext2_getattr to the inode used in the file system.
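A sketch of how the struct stat buffer might be filled from inode metadata is shown below. The function name fill_stat_sketch and its parameter list are hypothetical; in the assignment the values would come from the inode read by your ext2 code. It relies on the fact that the ext2 i_mode field uses the same type and permission bits as st_mode on Linux, so it can be copied directly.

```c
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: copies basic inode metadata into a stat buffer. */
void fill_stat_sketch(struct stat *st, uint32_t ino, uint16_t i_mode,
                      uint16_t i_links_count, uint32_t i_size,
                      uint32_t uid, uint32_t gid)
{
    memset(st, 0, sizeof *st);
    st->st_ino = ino;           /* reported as-is when use_ino is set */
    st->st_mode = i_mode;       /* file type and permission bits */
    st->st_nlink = i_links_count;
    st->st_size = i_size;
    st->st_uid = uid;
    st->st_gid = gid;
}
```

With use_ino enabled, the first column printed by ls -ali should then show the ext2 inode numbers rather than numbers generated by FUSE.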

The mountpoint is the directory where you will mount your file system. Once you mount it, any access to a file or directory within the mountpoint will automatically be redirected to the file system. So, for example, if you mount a file system in /tmp/myfs, and run the command:


ls -ali /tmp/myfs

the command will request the directory listing from the kernel, and the kernel will request the directory listings (readdir operation) from the file system, with the path /. If you run the command:


cat /tmp/myfs/sp.c

the kernel will call operations getattr (to confirm file exists), open, read, and release in the file system, with the path /sp.c.

You can choose almost any empty directory to be your mountpoint. However, there are some restrictions. First of all, you can only mount one file system in each directory. If you choose a mountpoint inside your home directory in the department Linux computers, this directory must be readable/executable by everybody, and its path must be executable. So, for example, if you choose the mountpoint /home/r/r2d2/cs313/a4/mp, you can run the following commands to provide access to the directory:


chmod a+x /home/r/r2d2 /home/r/r2d2/cs313 /home/r/r2d2/cs313/a4
chmod a+rx /home/r/r2d2/cs313/a4/mp

Alternatively, you can choose a mountpoint in a folder outside of your home directory, like a folder in /tmp (not /tmp itself!!!). In this case, make sure to choose a unique name that won't conflict with other students using the same computer, like your account (e.g., /tmp/r2d2).

You can run your file system in the foreground (with the -f option) or in the background (without it). If you run it in the background, the file system will be created, and the command shell will be restored for you, so you can use the same terminal for testing. The disadvantage, though, is that you won't have access to the standard output and debugging messages. If you run it in the foreground, the program will stay running in the terminal, and will wait there for operations. You will need to open a new terminal to test your file system. It is recommended that you run your system in the foreground, as this provides more flexibility to debug your code and see it running.

After testing your program, to unmount and close the file system, you have two options. One is to hit Ctrl-C in your file system terminal, if you are running it in the foreground. The other is to run:


fusermount -u mountpoint

replacing mountpoint with the mountpoint of your file system. If your file system crashed because of a bug (e.g., segmentation fault), you must still unmount the file system using the command above before you are able to reuse the same mountpoint.

To test your file system, you can use commands like:


stat FILE
ls FILE
cat FILE
readlink FILE

If you change into the mountpoint directory using a command like cd, note that you will not be able to cleanly unmount the file system until you leave that directory.

Bonus

If you would like to try some extra activities, the following tasks will allow you to get bonus points in this assignment. Note that only complete and correct implementations of these features will be entertained, and there will be no partial marks.

  • Modify your code in part 3 so it can support multi-threading, i.e., multiple file operations at the same time. Note that this means you will need to handle race conditions for simultaneous access to the volume file in different offsets (e.g., different block numbers).
  • Implement operations that allow you to modify the ext2 volume, like writing to a file, creating a new file, moving or deleting a file, or creating or deleting a directory.

(This article is from csprojectedu.com; please credit the source when reposting.)

