Writing an OS kernel from scratch: a simple file system

Series catalog

Prepare

In the previous articles, we established the framework for processes and system calls and implemented the first system call, fork. So far, all processes and their threads have been created manually in the kernel, and the thread entry functions are fixed functions prepared in advance purely for testing. A real OS, of course, needs to load user-supplied programs and run them in a process. That is the job of the second system call we want to implement, exec.

Before that, however, some preparatory work is needed. A user program has to be loaded from disk, and at present our kernel has no ability to interact with the disk. This article implements a very simple file system to fill that gap.

File system

The term file system is ambiguous and means different things in different contexts, which often confuses beginners. For example, we often hear about the FAT and NTFS file systems on Windows and the EXT file systems on Linux, and sometimes also about the Linux virtual file system, VFS (Virtual File System).

There is a saying in the computer world that any technical problem can be solved by adding an intermediate layer. The file system architecture of Linux embodies this philosophy well. The terms above simply belong to different layers of the file system stack: the virtual file system, the storage file system, and the hardware driver.

Let's take a look at the specific responsibilities of these three layers.

Virtual File System

Going from top to bottom, the top layer is the Virtual File System, an abstract file system constructed by the Linux kernel. It roughly corresponds to the files and directories we usually see in the system:

bash> ls -l /
drwxr-xr-x   2 root root  4096 Jan 13  2019 bin
drwxr-xr-x   4 root root  4096 Jan 11  2019 boot
drwxr-xr-x   3 root root  4096 Feb  3  2020 data

/usr/bin/cat
/home/foo/hello.txt

This layer is closest to the user's mental model of a file system, but it is abstract: you do not know which device these files live on or in what format they are stored, and as a user you do not need to care. The VFS hides these low-level details, which is why this layer is called a virtual file system. Logically the VFS is a tree, with the root directory / at the top, and each node is either a directory or an ordinary file.

Storage file system

The file or directory at each node of the VFS is abstract; it corresponds to a file entity on a specific storage device (such as a disk), which is managed by the layer below the VFS. This is where names such as EXT2 and NTFS come in. Although they are also called file systems, they describe how files are stored and organized on the hardware, so a more precise name is "storage file system". Like memory, the data on a disk is not laid out arbitrarily. It must be organized in a defined structure so that the upper layer can parse it and locate the desired data according to the format's specification.

For example, the EXT2 file system format:

The entire storage space of EXT2 is divided into several block groups, and file storage is organized inside each group. A group contains various meta information, the most important of which is the inode. Each file has an inode, which stores the file's basic meta information together with pointers to the data blocks that hold the file's contents.

This storage layer is also where the directory hierarchy is actually organized. Some inodes are ordinary files and some are directories, and a directory leads you to the inodes below it. The whole on-disk file system is like the index of a book, telling you where to find the data of each file.

A storage file system is generally built on what we usually call a disk partition, such as the C: and D: drives on Windows, or /dev/sda1 and /dev/hda1 on Linux. What we usually call formatting a disk means initializing a partition according to a particular storage file system format, which is like laying a logical grid over the partition.

There are many storage file systems, and EXT2 is just one of them. We can even design one ourselves. In this project, we will implement about the simplest file system possible and use it to build the user disk image.

Hardware driver layer

The next layer down is the hardware IO layer, that is, the hardware driver, which interacts directly with the hardware. It has no notion of data organization or storage logic; it performs plain IO. For example, you tell it to read the data from position x to position y on the hard disk, or to write some data to the range from position w to position z.

Access a file

How is a storage file system, or disk partition, attached to the tree structure of the VFS? In Linux, this is called mounting (mount). At the beginning the whole VFS tree is empty, with only the root node /. We usually have a system partition, such as /dev/sda1, which is typically the partition Linux was installed on. Suppose this partition holds an EXT2 file system. It is mounted at the root directory / of the VFS, so that the VFS can start looking up directories and files under /.

For example, the user needs to read a file:

/home/hello.txt

The system resolves this path from front to back, level by level:

/ is the root directory. The /dev/sda1 partition is mounted there, and the partition is in the EXT2 storage format, so the system follows the EXT2 format to look up the node named home at the top level of the partition. Note that just as the VFS has a tree structure, EXT2 also has a tree structure and can be traversed from top to bottom;
The home node is found at the top level of EXT2, and it is indeed a directory node, so the lookup continues: search for hello.txt in the home directory and, if it is found, read it;

The lookup here always follows the EXT2 format, level by level, on the /dev/sda1 partition. Although a path in the VFS is an abstract concept, when a file is actually accessed the path is projected onto the file system of the disk partition mounted there, and the query is carried out in that file system.
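The level-by-level lookup described above can be sketched in C. This is only an illustration over a hypothetical in-memory tree: node_t, find_child, and resolve_path are names invented for this sketch, not part of this project or of Linux, and a real storage file system would read directory entries from disk instead of following pointers.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

// Hypothetical in-memory node: either a directory or an ordinary file.
typedef struct node {
  const char* name;
  int is_dir;
  struct node* children[MAX_CHILDREN];
  int child_count;
} node_t;

// Look up one path component among the children of a directory.
static node_t* find_child(node_t* dir, const char* name, size_t len) {
  for (int i = 0; i < dir->child_count; i++) {
    node_t* child = dir->children[i];
    if (strlen(child->name) == len && strncmp(child->name, name, len) == 0) {
      return child;
    }
  }
  return NULL;
}

// Resolve an absolute path such as "/home/hello.txt" one level at a time.
// Returns NULL if a component is missing or a non-directory is traversed.
node_t* resolve_path(node_t* root, const char* path) {
  node_t* current = root;
  const char* p = path;
  while (*p != '\0') {
    while (*p == '/') p++;              // skip separators
    if (*p == '\0') break;
    const char* end = p;
    while (*end != '\0' && *end != '/') end++;
    if (!current->is_dir) return NULL;  // cannot descend into a file
    current = find_child(current, p, (size_t)(end - p));
    if (current == NULL) return NULL;
    p = end;
  }
  return current;
}
```

With a root that contains home, which in turn contains hello.txt, resolve_path(&root, "/home/hello.txt") returns the hello.txt node, while a path with a missing component returns NULL.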

The example above mounts only a single disk partition. In fact, under Linux you can pick a directory node in the VFS and mount another disk partition on it. That partition does not even need to be in an EXT format, as long as the kernel supports its format. For example, suppose we have a disk partition /dev/hda2 in NTFS format (such as the D:\ drive of the Windows installation on a dual-boot machine), and we mount it at the VFS node /mnt.

From the perspective of the VFS, everything under /mnt is then accessed in the NTFS file system format. For example, to read this file:

/mnt/bar

When the VFS reaches the mnt node, it finds that this is a mount point and that the mounted disk partition holds an NTFS file system. It then resolves the rest of the path in the NTFS format: it tries to find and read the path /bar on this disk partition.
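One way to picture the mount-point decision is a table of mounts searched by longest matching prefix. The sketch below is an assumption made for illustration; mount_t, is_under, and find_mount are hypothetical names, and the real Linux VFS resolves mounts during its path walk rather than with a flat table like this.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical mount table entry: a VFS path prefix and the storage file
// system type mounted there.
typedef struct {
  const char* mount_point;  // e.g. "/" or "/mnt"
  const char* fs_type;      // e.g. "ext2" or "ntfs"
} mount_t;

// Returns 1 if `path` lies at or below `mount_point`.
static int is_under(const char* path, const char* mount_point, size_t len) {
  if (strncmp(path, mount_point, len) != 0) return 0;
  if (len == 1) return 1;  // the root "/" contains every absolute path
  return path[len] == '/' || path[len] == '\0';
}

// Pick the mount with the longest matching prefix and report the path
// relative to that partition through `relative`.
const mount_t* find_mount(const mount_t* mounts, int count,
                          const char* path, const char** relative) {
  const mount_t* best = NULL;
  size_t best_len = 0;
  for (int i = 0; i < count; i++) {
    size_t len = strlen(mounts[i].mount_point);
    if (len > best_len && is_under(path, mounts[i].mount_point, len)) {
      best = &mounts[i];
      best_len = len;
    }
  }
  if (best == NULL) return NULL;
  // "/mnt/bar" on "/mnt" becomes "/bar"; paths on "/" stay unchanged.
  *relative = (best_len == 1) ? path : path + best_len;
  if (**relative == '\0') *relative = "/";
  return best;
}
```

With mounts for / (ext2) and /mnt (ntfs), the path /mnt/bar selects the ntfs entry with the relative path /bar, while /home/hello.txt stays on the ext2 root.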

File system interface

As mentioned above, when the VFS accesses a file at some node, it tracks which disk partition the file belongs to and which storage file system that partition uses (such as EXT or NTFS), and then reads the partition data using the corresponding format. To support many file systems, the VFS first defines a unified set of file operation interfaces, and each concrete file system implements these interfaces separately. This is a typical object-oriented paradigm, for example:

class FileSystem {
  public:
    virtual ~FileSystem() {}

    // Pure virtual: every concrete file system must implement these.
    virtual int32 read_file(const char* filename,
                            char* buffer,
                            uint32 start,
                            uint32 length) = 0;

    virtual int32 write_file(const char* filename,
                             const char* buffer,
                             uint32 start,
                             uint32 length) = 0;

    virtual int32 stat_file(const char* filename,
                            file_stat_t* stat) = 0;

    // ...
};

The above is a demonstration in C++ (the kernel itself is written in C; C++ is used here only to illustrate the object-oriented pattern). It defines the abstract class FileSystem, whose file operation interfaces are all pure virtual functions. Each concrete file system only needs to inherit from it and implement these interfaces, for example:

class Ext2FileSystem : public FileSystem {
  public:
    int32 read_file(const char* filename,
                    char* buffer,
                    uint32 start,
                    uint32 length) override;
    // ...
};

Again, the above is just for demonstration purposes. Of course, the interface and implementation in the real Linux VFS are not so simple, but the structure is similar.

Code

This project will not use a complex file system like EXT, nor will it implement a complete VFS. It only builds the basic framework and plugs in a very simple storage file system of our own design.

First define the file system interface, similar to the above abstract class, in the src/fs/vfs.h file:

struct file_system {
  enum fs_type type;
  disk_partition_t partition;

  // functions
  stat_file_func stat_file;
  list_dir_func list_dir;
  read_data_func read_data;
  write_data_func write_data;
};

typedef struct file_system fs_t;

You can see that the function pointers for various file operations are defined as interfaces above, and their prototypes are:

typedef int32 (*stat_file_func)(const char* filename,
                                file_stat_t* stat);

typedef int32 (*list_dir_func)(char* dir);

typedef int32 (*read_data_func)(const char* filename,
                                char* buffer,
                                uint32 start,
                                uint32 length);

typedef int32 (*write_data_func)(const char* filename,
                                 const char* buffer,
                                 uint32 start,
                                 uint32 length);
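To see how a struct of function pointers stands in for the abstract class, here is a reduced sketch with only the read operation. Everything prefixed with demo_ is hypothetical and exists only for this illustration (it serves one fixed file from memory); the int32 and uint32 typedefs are assumed to match the project's own.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t int32;
typedef uint32_t uint32;

typedef int32 (*read_data_func)(const char* filename, char* buffer,
                                uint32 start, uint32 length);

// Reduced version of the interface struct: only the read operation.
typedef struct {
  read_data_func read_data;
} demo_fs_t;

// Hypothetical backend that serves one fixed file from memory.
static const char demo_content[] = "hello from demo_fs";

static int32 demo_read_data(const char* filename, char* buffer,
                            uint32 start, uint32 length) {
  if (strcmp(filename, "hello.txt") != 0) return -1;
  uint32 size = (uint32)(sizeof(demo_content) - 1);
  if (start >= size) return 0;
  if (length > size - start) length = size - start;
  memcpy(buffer, demo_content + start, length);
  return (int32)length;
}

// Plays the role of a constructor: bind the implementation to the interface.
void init_demo_fs(demo_fs_t* fs) {
  fs->read_data = demo_read_data;
}

// Caller code only sees the interface, never the concrete backend.
int32 demo_vfs_read(const demo_fs_t* fs, const char* filename, char* buffer,
                    uint32 start, uint32 length) {
  return fs->read_data(filename, buffer, start, length);
}
```

Swapping in a different backend only requires assigning a different function to read_data; demo_vfs_read and its callers stay unchanged, which is the point of the interface.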

naive_fs implementation

We don't need to implement a complex storage file system like EXT. In this project, we only implement a very simple file system with very limited functionality:

The disk image data is prepared in advance and is read-only;
There is only the root directory, with no subdirectories;

This file system has two purposes: demonstration, and actual use in the project. We need it to store user programs for loading and running, so being able to read is enough, and no complicated directory structure is needed. A single level is sufficient, and all files are placed in it. Although it is very basic, it is still a file system. We might as well name it naive_fs, because it really is naive and simple.

The storage structure of naive_fs consists of three parts:

The header is an integer that records the total number of files, which is fixed;
After it comes the meta information of each file;
The last part is the actual file data. The meta information of each file (file offset, file size) is used to locate where its data is stored;

You will find that this is much the same as the heap we implemented earlier: a simple and straightforward meta + data structure.
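The layout above can be sketched as C structures plus a lookup over an image held in memory. The exact field widths are assumptions made for this sketch (a 32-bit file count, a 64-byte name field, and an offset measured from the start of the image); the demo_ names are hypothetical and the project's real naive_file_meta_t may differ.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_NAME_LEN 64  // assumed; the real field width may differ

// Assumed layout of one meta record.
typedef struct {
  char filename[DEMO_NAME_LEN];
  uint32_t offset;  // byte offset of the data, from the start of the image
  uint32_t size;    // file size in bytes
} demo_file_meta_t;

// Image layout: [uint32 file_num][file_num metas][file data ...]
uint32_t demo_file_num(const uint8_t* image) {
  uint32_t n;
  memcpy(&n, image, sizeof(n));
  return n;
}

// Copy the i-th meta record out of the image.
demo_file_meta_t demo_meta_at(const uint8_t* image, uint32_t i) {
  demo_file_meta_t meta;
  memcpy(&meta, image + sizeof(uint32_t) + i * sizeof(meta), sizeof(meta));
  return meta;
}

// Find a file by name; returns a pointer to its data, or NULL if absent.
const uint8_t* demo_find_file(const uint8_t* image, const char* name,
                              uint32_t* size) {
  uint32_t n = demo_file_num(image);
  for (uint32_t i = 0; i < n; i++) {
    demo_file_meta_t meta = demo_meta_at(image, i);
    if (strncmp(meta.filename, name, DEMO_NAME_LEN) == 0) {
      *size = meta.size;
      return image + meta.offset;
    }
  }
  return NULL;
}
```

Because every meta record has a fixed size, the position of record i can be computed directly, and the data region simply starts after the last record.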

I wrote a tool, user/disk_image_writer.c, that reads the user/progs directory (this directory does not exist yet; in the next article we will compile and link the user programs into it) and writes the files it finds into the disk image file user_disk_image, following the naive_fs format above. That image is then written into our kernel disk image scroll.img:

dd if=user/user_disk_image of=scroll.img bs=512 count=2048 seek=2057 conv=notrunc

Writing starts at sector 2057 of the disk (seek=2057, counting from 0), because the sectors before it hold the boot loader and the kernel image.
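For reference, the dd parameters translate into byte positions as follows: with bs=512, seek=2057 places the first written byte at 2057 × 512 = 1,053,184, and count=2048 copies at most 2048 × 512 = 1,048,576 bytes (1 MiB). The two helpers below are hypothetical names used only to show the arithmetic.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

// Byte offset at which a given sector starts.
uint32_t sector_to_byte_offset(uint32_t sector) {
  return sector * SECTOR_SIZE;
}

// Number of sectors needed to hold `bytes` bytes (rounded up).
uint32_t bytes_to_sectors(uint32_t bytes) {
  return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}
```

The kernel side needs the same starting position so that the partition offset it reads from matches where dd placed the image.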

Then we will implement naive_fs , which is actually the implementation of the above function pointers. The code is in src/fs/naive_fs.c :

static fs_t naive_fs;

void init_naive_fs() {
  naive_fs.type = NAIVE;

  naive_fs.stat_file = naive_fs_stat_file;
  naive_fs.read_data = naive_fs_read_file;
  naive_fs.write_data = naive_fs_write_file;
  naive_fs.list_dir = naive_fs_list_dir;
  
  // load file metas to memory.
  // ...
}

In init_naive_fs, the meta records of all files are read from disk and kept in memory, forming something like a file list. The read, write, and stat operations can then work on files based on this meta information, which is very simple.

For example, to read a file, first find the meta based on the file name, get the offset and size of the file on the disk, and then call the underlying driver to read the data:

static int32 naive_fs_read_file(const char* filename,
                                char* buffer,
                                uint32 start,
                                uint32 length) {
  // Find file meta by name.
  naive_file_meta_t* file_meta = nullptr;
  for (int i = 0; i < file_num; i++) {
    naive_file_meta_t* meta = file_metas + i;
    if (strcmp(meta->filename, filename) == 0) {
      file_meta = meta;
      break;
    }
  }
  if (file_meta == nullptr) {
    return -1;
  }

  uint32 offset = file_meta->offset;
  uint32 size = file_meta->size;
  // Clamp the request so that it never reads past the end of the file.
  if (start >= size) {
    return 0;
  }
  if (length > size - start) {
    length = size - start;
  }

  // Read file data from disk.
  read_hard_disk(buffer, naive_fs.partition.offset + offset + start, length);
  return length;
}

Disk driver

We also need to implement the bottom-level disk IO driver that naive_fs relies on. It is mainly a single function, read_hard_disk, because we only need to read the disk. For simplicity it works like read_disk in the boot loader: it operates directly on the ports of the disk controller. This is a synchronous implementation. A real operating system must handle disk IO asynchronously, because the disk is very slow and the system cannot block waiting for it. Instead, it goes on to do other work after issuing the read or write command, and the disk controller later uses an interrupt to notify the system that the IO is complete and the data is ready.
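A disk is read in whole sectors, so a byte-range read has to be translated into sector reads. The sketch below shows that translation under stated assumptions: the hardware is replaced by an in-memory array, demo_read_sector stands in for the port-level routine, and all demo_ names are hypothetical. It is not the project's read_hard_disk, only an illustration of the offset arithmetic such a function needs.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define DEMO_SECTORS 8u

// Stand-in for the hardware: a tiny "disk" kept in memory.
static uint8_t demo_disk[DEMO_SECTORS * SECTOR_SIZE];

// Stand-in for the port-level routine that reads exactly one sector.
static void demo_read_sector(uint32_t lba, uint8_t* out) {
  memcpy(out, demo_disk + lba * SECTOR_SIZE, SECTOR_SIZE);
}

// Read `length` bytes starting at byte `offset`, one sector at a time.
void demo_read_bytes(char* buffer, uint32_t offset, uint32_t length) {
  uint8_t sector[SECTOR_SIZE];
  while (length > 0) {
    uint32_t lba = offset / SECTOR_SIZE;
    uint32_t in_sector = offset % SECTOR_SIZE;  // where the data begins
    uint32_t chunk = SECTOR_SIZE - in_sector;   // bytes usable from it
    if (chunk > length) chunk = length;
    demo_read_sector(lba, sector);
    memcpy(buffer, sector + in_sector, chunk);
    buffer += chunk;
    offset += chunk;
    length -= chunk;
  }
}
```

Only the first and last sectors of a request can be partial; every sector in between is copied in full. The loop blocks until all data is copied, which mirrors the synchronous behavior described above.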

Summary

Above we have implemented a simple VFS and the file system naive_fs. Let's see how the kernel uses them to read a file, for example:

char* buffer = (char*)kmalloc(1024);
read_file("hello.txt", buffer, 0, 100);

It calls the interface of the top-level VFS, in vfs.c :

int32 read_file(char* filename, char* buffer, uint32 start, uint32 length) {
  fs_t* fs = get_fs(filename);
  return fs->read_data(filename, buffer, start, length);
}

Given the file path filename, the VFS determines which file system it belongs to and which disk partition it corresponds to. Here we mount only a single partition, whose file system type is naive_fs, so get_fs simply returns the naive_fs instance:

fs_t* get_fs(char* path) {
  return get_naive_fs();
}

The next step is to read the file through the read_data function pointer of this fs.

This article covered the file system. It is very simple and elementary, for demonstration purposes only. I hope it helps you form an overall understanding of how an operating system manages files and the underlying storage.


navi
