Write OS kernel from scratch-enter user mode

User mode thread

In the previous articles, we started kernel threads and implemented multi-thread scheduling. Next, we need to start user threads. After all, this OS is meant for users, and most threads in the future will run in user mode.

Here we need to clarify the relationship between threads and stacks, since some readers may lack an intuitive understanding of it.

Each user thread has two stacks: one in user space and one in kernel space;
Under normal conditions, a user thread runs on its user stack;
When an interrupt occurs (a hardware interrupt, an exception, or a soft int), execution switches to the thread's kernel stack to run the interrupt handler, and returns to the user stack to resume once the handler completes;

Start from the kernel thread

To be clear, a user thread does not appear out of thin air. In essence, it still starts as a kernel thread and then jumps to the user's code + stack to run. So let us first review how a kernel thread is started, as explained in detail in the article on the first kernel thread. The core work there is initializing the kernel stack. We built a stack as shown in the figure below:

Execution then starts at resume_thread: with the stack pointer at its initial position kernel_esp, it pops each general register and then executes ret, which jumps to the function kernel_thread. As you can see, the kernel thread ends up running the kernel_thread work function and always runs on the kernel stack (the light blue part).

How to switch to user mode

To enter user mode from the kernel, two problems have to be solved:

How to jump to the user code, which requires changing the CPU privilege level from 0 to 3;
How to switch to the user stack, which lives in user space below 3GB;

Regarding question 1, note that an ordinary jmp or call instruction cannot do this. On x86, the privilege level is not lowered by a plain jump; the standard way to do it, and the one used here, is to return from an interrupt with the iret instruction;

For question 2, note that switching to a stack in user space requires changing the ss register so that it points to the user data segment (you may want to brush up on segments, which were set up when the global descriptor table (GDT) was initialized);
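
As a reminder of what a ring 3 selector looks like, here is a minimal sketch. The bit layout of a selector is fixed by the x86 architecture, but the GDT slot numbers below (3 for user code, 4 for user data) and the helper names are assumptions made only for illustration; the real values depend on how the GDT was laid out in the earlier article.

```c
#include <stdint.h>

// x86 segment selector layout:
//   bits 15..3 = descriptor index
//   bit  2     = table indicator (0 = GDT, 1 = LDT)
//   bits 1..0  = requested privilege level (RPL)
#define RPL_KERNEL 0u
#define RPL_USER   3u

// Build a selector that refers to a GDT entry (table indicator = 0).
static inline uint16_t make_gdt_selector(uint16_t index, uint16_t rpl) {
  return (uint16_t)((index << 3) | (rpl & 0x3u));
}

// Extract the privilege level requested by a selector.
static inline uint16_t selector_rpl(uint16_t selector) {
  return (uint16_t)(selector & 0x3u);
}

// ASSUMPTION: hypothetical GDT slots, not taken from the article.
#define GDT_INDEX_U_CODE 3u
#define GDT_INDEX_U_DATA 4u
```

With these assumed slots, the user code selector would be 0x1B and the user data selector 0x23. The low two bits being 3 is what marks them as ring 3 selectors.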

So in essence, initializing a user thread means simulating a return from an interrupt. Recall the interrupt stack structure from the interrupt handling article:

The interrupt stack is composed of two parts, one is the registers that the CPU automatically pushes, and the other is what we push when the interrupt handler starts. The entire interrupt stack is described by the following struct:

typedef struct isr_params {
  uint32 ds;
  uint32 edi, esi, ebp, esp, ebx, edx, ecx, eax;
  uint32 int_num;
  uint32 err_code;
  uint32 eip, cs, eflags, user_esp, user_ss;
} isr_params_t;

typedef isr_params_t interrupt_stack_t;

Here we give it the alias interrupt_stack_t, which will be used to represent the interrupt stack from now on;

Pay attention to the part of the stack that the CPU pushes automatically; it already solves the two problems above:

The location of the user code is stored in cs and eip;
The location of the user stack is stored in user_ss and user_esp;

When iret executes, the data saved above is popped automatically. So we only need to build the interrupt_stack_t structure on the kernel stack, set these values, and go through a simulated interrupt return, which will "return" us to user mode.
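
To make the split between the two parts of the interrupt stack concrete, the sketch below repeats the structure from above and computes where the CPU-pushed part begins. The macro IRET_FRAME_OFFSET and the helper iret_frame_words are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t uint32;

// Same layout as the isr_params structure shown above.
typedef struct isr_params {
  uint32 ds;
  uint32 edi, esi, ebp, esp, ebx, edx, ecx, eax;
  uint32 int_num;
  uint32 err_code;
  uint32 eip, cs, eflags, user_esp, user_ss;
} isr_params_t;

typedef isr_params_t interrupt_stack_t;

// Byte offset of the first word that iret pops (hypothetical helper macro).
// Everything below this offset is pushed and popped by our own handler code.
#define IRET_FRAME_OFFSET offsetof(interrupt_stack_t, eip)

// Number of 32-bit words iret pops when returning to a lower privilege level:
// eip, cs, eflags, user_esp, user_ss.
static inline size_t iret_frame_words(void) {
  return (sizeof(interrupt_stack_t) - IRET_FRAME_OFFSET) / sizeof(uint32);
}
```

The last five words are exactly the ones that carry the user code location and the user stack location, which is why filling them in by hand is enough to steer iret into user mode.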

As usual, here is the main source code for this article, chiefly the following functions:

They are called in turn. The init_thread function was already used in the article on starting the kernel thread. There it initialized a kernel thread, and now it also performs the additional initialization for a user thread, switched on by the parameter user, because a user thread must also grow out of a kernel thread.

Prepare the kernel stack

First, let us review how the kernel stack was initialized for a kernel thread:

Pay attention to the dotted user interrupt stack part above. We did not touch this area when starting a kernel thread, because a kernel thread does not use it. Now we need to fill it with the interrupt_stack_t structure described above.

Look at the part of init_thread that initializes it:

interrupt_stack_t* interrupt_stack =
    (interrupt_stack_t*)((uint32)thread->kernel_esp
                         + sizeof(switch_stack_t));
// data segments
interrupt_stack->ds = SELECTOR_U_DATA;
// general regs
interrupt_stack->edi = 0;
interrupt_stack->esi = 0;
...
// user-level code env
interrupt_stack->eip = (uint32)function;
interrupt_stack->cs = SELECTOR_U_CODE;
interrupt_stack->eflags =
    EFLAGS_IOPL_0 | EFLAGS_MBS | EFLAGS_IF_1;
// user stack
interrupt_stack->user_ss = SELECTOR_U_DATA;

First, interrupt_stack is located: it sits above kernel_esp, at an offset equal to the size of the switch_stack_t structure;

The next step is to initialize each register in the interrupt stack:

ds is initialized to the user-space data segment;
The general registers are initialized to 0;
cs is initialized to the user-space code segment;
eip is initialized to the work function of the user thread, which is passed in when the thread is created;
eflags is initialized with interrupts enabled and I/O privilege level 0;
user_ss is also initialized to the user-space data segment;
user_esp is not initialized here. Where did it go? The location of the user stack has not been determined yet, so its initialization is discussed later;
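
For the eflags value in particular, the following sketch shows what the three flags combine to. The bit positions are defined by the x86 architecture; that the kernel's EFLAGS_* macros are defined exactly like this is an assumption, and the helper functions are hypothetical.

```c
#include <stdint.h>

// Bit positions in the x86 EFLAGS register.
// ASSUMPTION: the kernel's macros of the same name are defined this way.
#define EFLAGS_MBS    (1u << 1)   // bit 1: reserved, must be set
#define EFLAGS_IF_1   (1u << 9)   // bit 9: interrupt enable flag on
#define EFLAGS_IOPL_0 (0u << 12)  // bits 12-13: I/O privilege level 0

// Initial eflags for a new user thread (hypothetical helper).
static inline uint32_t initial_user_eflags(void) {
  return EFLAGS_IOPL_0 | EFLAGS_MBS | EFLAGS_IF_1;
}

// Nonzero if maskable interrupts are enabled in the given eflags value.
static inline int eflags_interrupts_enabled(uint32_t eflags) {
  return (eflags & EFLAGS_IF_1) != 0;
}

// I/O privilege level stored in the given eflags value (0..3).
static inline uint32_t eflags_iopl(uint32_t eflags) {
  return (eflags >> 12) & 0x3u;
}
```

The result is 0x202. Interrupts must be enabled so that the clock interrupt can preempt the user thread, and IOPL 0 means that code running at privilege level 3 cannot execute cli or sti itself.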

Start running thread

As mentioned above, a user thread also starts as a kernel thread, so both start in the same way. The difference is thread_entry_eip, which is the initial entry point of the whole thread.

For comparison, for a kernel thread thread_entry_eip is set to the kernel_thread work function; here it is set to the switch_to_user_mode function. Its code is very simple:

switch_to_user_mode:
  add esp, 8
  jmp interrupt_exit

Looking back at how a kernel thread starts running: starting from kernel_esp on the stack, it pops all the general registers, and then ret pops start_eip (that is, thread_entry_eip) and jumps to it, which here is the switch_to_user_mode function. Then add esp, 8 moves esp so that it finally arrives at the user interrupt stack position:

Then the interrupt_exit function starts executing. It is the lower half of the interrupt handler isr_common_stub: it exits the interrupt and restores the context from before the interrupt occurred:
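
The stack pointer arithmetic can be replayed in a few lines of C. This is only a sketch: the layout of switch_stack_t below (seven general registers, start_eip, and two further words that add esp, 8 skips) is an assumption for illustration, since the real structure is defined in the kernel thread article, and both helper functions are hypothetical.

```c
#include <stdint.h>

typedef uint32_t uint32;

// ASSUMPTION: hypothetical layout, not copied from the kernel source.
typedef struct switch_stack {
  uint32 edi, esi, ebp, ebx, edx, ecx, eax;  // popped by resume_thread
  uint32 start_eip;                          // popped by ret
  uint32 unused_retaddr;                     // skipped by add esp, 8
  uint32 function;                           // skipped by add esp, 8
} switch_stack_t;

// Replays how esp moves from kernel_esp up to the point where
// switch_to_user_mode jumps to interrupt_exit.
static inline uint32 esp_at_interrupt_exit(uint32 kernel_esp) {
  uint32 esp = kernel_esp;
  esp += 7u * (uint32)sizeof(uint32);  // pop the seven general registers
  esp += (uint32)sizeof(uint32);       // ret pops start_eip
  esp += 8u;                           // add esp, 8
  return esp;
}

// The address init_thread computes for the interrupt stack.
static inline uint32 interrupt_stack_addr(uint32 kernel_esp) {
  return kernel_esp + (uint32)sizeof(switch_stack_t);
}
```

Under this assumed layout both functions give the same address, which is the whole point: when interrupt_exit starts popping, esp points at the ds field of the interrupt_stack_t that init_thread filled in.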

interrupt_exit:
  ; recover the original data segment
  pop eax
  mov ds, ax
  mov es, ax
  mov fs, ax
  mov gs, ax

  popa
  ; clean up the pushed error code and pushed ISR number
  add esp, 8

  ; make sure interrupt is enabled
  sti
  ; the processor pops eip, cs, eflags, user_esp, and user_ss
  iret

Combined with the user interrupt stack, you can see that it begins to "restore" the user context (it is not really restored, since we constructed it ourselves). This solves the two problems mentioned earlier, which are essentially the two core elements of a thread that we keep emphasizing:

user code ( cs + eip )
user stack ( user ss + user esp )

Of course, the user context also includes the user data segment, the general registers, eflags, and other registers, all of which are initialized here. At this point, the running environment of the user thread is initialized.

This chapter may feel a bit messy at first glance. You may need to review segments, as well as interrupts and the kernel thread initialization and startup process. Essentially, we need to be clear about the roles of the two structures on the kernel stack:

switch stack : This is the stack used to run the code in the kernel mode. All the kernel code about this thread runs here, and the context switch between multi-threads also occurs here;
interrupt stack : This is the interrupt stack for the user state to enter and exit the kernel state, which is jointly constructed by the CPU and the interrupt handler when an interrupt occurs;

The initial run of a user thread essentially takes two steps:

At the beginning, just as for a kernel thread, the environment in which the kernel thread runs is initialized;
But at the jump to start_eip, the original kernel_thread work function is replaced by the switch_to_user_mode function, which simulates the interrupt return process: it enters the interrupt stack, sets up the initial user context there, and finally jumps to user mode (code + stack) through the iret instruction and starts running;

Prepare user stack

The kernel stack is now prepared, so execution can jump into the user code + stack by way of an interrupt return. But the user stack is not ready yet, and this area also needs some simple initialization.

First, we need to specify the location of the user stack. Generally speaking, it is located at the top of the 3GB user space:

The location of the user stack should be managed by the process that owns the thread, but we have not started building anything process-related yet. For testing, we can temporarily pick an arbitrary location for the user stack. In the actual create_new_user_thread function, the parameter process is passed in to specify which process the thread should be created under, and the process allocates a stack location in user space for this user thread.

tcb_t* create_new_user_thread(pcb_t* process,
                              char* name,
                              void* user_function,
                              uint32 argc,
                              char** argv);

In this way, the user stacks of multiple threads under the same process are arranged roughly like this, and they must not overlap:

With the location of the user stack known, we can initialize the stack. Running a user thread is essentially a function call, so there is nothing special about initializing its stack: we build a function call stack, which mainly consists of two parts:
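
One simple way to guarantee that the stacks do not overlap is to carve the area below 3GB into fixed-size slots. The sketch below illustrates that idea only; the constants and function names are assumptions and are not taken from the kernel source.

```c
#include <stdint.h>

// ASSUMPTIONS: all three constants are illustrative values.
#define USER_STACK_TOP   0xBFFF0000u     // a little below 3GB (0xC0000000)
#define USER_STACK_SIZE  (64u * 1024u)   // 64KB per thread
#define USER_STACK_SLOTS 16u             // at most 16 threads per process

// Top (highest address, exclusive) of the stack in the given slot,
// or 0 if the slot index is out of range.
static inline uint32_t user_stack_top(uint32_t slot) {
  if (slot >= USER_STACK_SLOTS) {
    return 0;
  }
  return USER_STACK_TOP - slot * USER_STACK_SIZE;
}

// Lowest address of the stack in the given slot, or 0 if out of range.
static inline uint32_t user_stack_bottom(uint32_t slot) {
  uint32_t top = user_stack_top(slot);
  return top == 0 ? 0 : top - USER_STACK_SIZE;
}
```

Because the bottom of slot i is exactly the top of slot i + 1, the slots tile the region without gaps or overlap. A real process would also need to track which slots are in use and free them when threads exit.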

Parameters
Return address

Next, we initialize these two parts.

Copy parameters

The parameters are the argc and argv passed in to the create_new_user_thread function. We need to copy them to the user stack so that user_function can run with them as its arguments. This is exactly the form of main:

int main(int argc, char** argv);

Of course, main is only the function of the process's main thread. If you go on to create new threads in the process, they take essentially a similar form. For example, the commonly used pthread library also takes a thread function and its argument:

int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine)(void *),
                   void *arg);

The copying of parameters is mainly implemented in the function prepare_user_stack. It involves argv, which is an array of strings, so we first copy all the strings in argv to the top of the user stack, record their starting addresses to form an array of char*, and then let argv point to this array. The relationship between the pointers is a bit convoluted; the following figure helps:
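
The copying steps can be sketched in plain C against a simulated user stack. This is not the kernel's prepare_user_stack: the user_stack_t type, the helper names, the MAX_ARGS limit, and the use of 32-bit addresses backed by a host buffer are all assumptions made so that the layout can be tried outside the kernel.

```c
#include <stdint.h>
#include <string.h>

#define MAX_ARGS 16u  // ASSUMPTION: arbitrary limit for this sketch

// A simulated user stack: `mem` backs the 32-bit address range
// [base, base + size). `base` is assumed to be 4-byte aligned.
typedef struct {
  uint8_t* mem;
  uint32_t base;
  uint32_t size;
} user_stack_t;

// Translate a simulated user address into a host pointer.
static uint8_t* stack_ptr(const user_stack_t* s, uint32_t addr) {
  return s->mem + (addr - s->base);
}

static void push_u32(const user_stack_t* s, uint32_t* esp, uint32_t value) {
  *esp -= 4;
  memcpy(stack_ptr(s, *esp), &value, 4);
}

// Builds [ret_addr][argc][argv] at the returned esp, with the argv pointer
// array and the strings above it. Returns 0 if the arguments do not fit.
static uint32_t prepare_user_stack_sketch(const user_stack_t* s, uint32_t argc,
                                          char** argv, uint32_t ret_addr) {
  uint32_t esp = s->base + s->size;
  uint32_t arg_addrs[MAX_ARGS];
  if (argc > MAX_ARGS) return 0;

  // 1. Copy every string to the top of the stack and record its address.
  for (uint32_t i = 0; i < argc; i++) {
    uint32_t len = (uint32_t)strlen(argv[i]) + 1;
    if (len > esp - s->base) return 0;
    esp -= len;
    memcpy(stack_ptr(s, esp), argv[i], len);
    arg_addrs[i] = esp;
  }
  esp &= ~3u;  // keep the words below 4-byte aligned

  // 2. Build the char* array, pushing in reverse so argv[0] is lowest.
  if ((argc + 3u) * 4u > esp - s->base) return 0;
  for (uint32_t i = argc; i > 0; i--) push_u32(s, &esp, arg_addrs[i - 1]);
  uint32_t argv_addr = esp;

  // 3. Build the call frame: arguments right to left, then return address.
  push_u32(s, &esp, argv_addr);
  push_u32(s, &esp, argc);
  push_u32(s, &esp, ret_addr);
  return esp;
}
```

The returned value is what would go into user_esp of the interrupt stack. On entry to the thread function the stack then looks exactly like an ordinary call: the return address at esp, argc at esp + 4, and argv at esp + 8.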

Thread ends and returns

Finally, there is still one ret addr that has not been set, which is the return address used after the thread ends.

Here we need to ask ourselves a question: what should happen after a thread ends? The CPU instruction stream has to continue somewhere. Therefore, after the thread work function returns, it must jump to a place where the kernel can do the final cleanup on it. Only then is the life cycle of this thread over, and the scheduler schedules the next thread to run.

In fact, this issue has come up before. For kernel threads, the reason we wrap the work function in kernel_thread is that a thread needs a unified exit mechanism:

void kernel_thread(thread_func* function) {
  function();
  schedule_thread_exit();
}

schedule_thread_exit is the exit mechanism after the thread ends. It does not return; instead it enters the kernel's termination and cleanup process. Its main job is to release the thread's resources, mark its status as TASK_DEAD, and then call the scheduler. When the scheduler finds that the thread is TASK_DEAD, it cleans it up and schedules the next thread to run.

So, can we just set the thread return address on the user stack to schedule_thread_exit and be done? The answer is no.

Because schedule_thread_exit is kernel code, it cannot be called directly from user mode; attempting to do so triggers a protection fault. The code segment in user space is limited to below 3GB and runs at privilege level CPL 3, so it cannot call kernel code that lives above 3GB with DPL 0 (you may need to review the content on segments).

So how can we enter kernel mode from user mode and finally call the schedule_thread_exit function? The answer is an interrupt, or more precisely, a system call. System calls will be covered in detail in a future article. For now you only need to know that a thread in user mode should end by running a function roughly like this:

void user_thread_exit() {
  // This is a system call.
  thread_exit();
}

The system call thread_exit takes us into kernel mode, where we eventually reach the schedule_thread_exit function, which performs the cleanup for the ending thread.

It looks perfect, but here comes a new problem. The user_thread_exit above is just our assumption, or rather a function we wish existed. In reality, the user's program is written by the user and then loaded from disk and run, so the kernel cannot know whether it contains such a function. We end up in a paradox: the kernel can never set a valid user_thread_exit function as the ret addr of a user thread.

I have not studied this problem carefully, and I do not know what the standard solution is. My approach is to abandon the ret addr on the user stack entirely, which means the work function of a user thread never returns to it. Instead, the work function is wrapped in a function similar to kernel_thread, such as user_thread:

void user_thread(thread_func* function, int argc, void** argv) {
  function(argc, argv);
  thread_exit();
}

But in practice, we cannot force users to create user threads according to this convention. Therefore, thread creation in user mode should not let users invoke the underlying system calls directly; it should be wrapped by standardized library functions, such as pthread, and users call these library functions to operate on threads. The library wraps the function and parameters passed in by the user into a user_thread, and then invokes the system call provided by the OS to create the thread.

As for main, which is also a thread, its exit mechanism is solved in a similar way, and more easily. main must also be wrapped in a function:

void _start(int argc, void** argv) {
  main(argc, argv);
  thread_exit();
}

The _start function is this outer wrapper. It is in fact the real entry function of the user program. In theory it should be provided by the standard library, linked in when the C program is linked, and set as the entry address of the ELF executable file.

Set tss

The initialization and termination of the user thread are now taken care of, and everything seems ready, but there is one hole left to fill. As mentioned before, if an interrupt occurs in user mode, the CPU automatically switches to the kernel stack for interrupt handling. (If a user thread were never interrupted, this would not be needed, but that is impossible: clock interrupts, the most typical case, keep occurring, and page faults are also inevitable.)

This jump from user to kernel is done automatically by the CPU and is determined by the hardware. So the question is, how does the CPU know where the kernel stack of this thread is?

The answer is the tss (task state segment). This thing is really tedious, and I do not want to go through it all here. For now, you only need to know that the tss structure is registered in the gdt, and the thread's kernel stack is stored in its esp0 field, so that every time the CPU traps from user mode into kernel mode, it finds the tss structure and uses the esp0 field to locate the kernel stack.

The tss initialization code is in the function write_tss.
Remember that on every thread switch, the scheduler needs to update the esp0 field of the tss to point to the top (high address) of the new thread's kernel stack:
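
For reference, the 32-bit tss is a fixed 104-byte structure defined by the hardware, of which only a few fields matter when task switching is done in software. The field names, the SELECTOR_K_DATA value, and the init_tss_sketch helper below are assumptions for illustration and are not the kernel's actual write_tss.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t uint32;
typedef uint16_t uint16;

// 32-bit task state segment, in hardware order (104 bytes).
typedef struct tss_entry_struct {
  uint32 prev_tss;
  uint32 esp0;  // kernel stack pointer loaded on a ring 3 -> ring 0 switch
  uint32 ss0;   // kernel stack segment loaded together with esp0
  uint32 esp1, ss1, esp2, ss2;
  uint32 cr3, eip, eflags;
  uint32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
  uint32 es, cs, ss, ds, fs, gs;
  uint32 ldt;
  uint16 trap;
  uint16 iomap_base;
} tss_entry_t;

// ASSUMPTION: illustrative kernel data selector (GDT slot 2, RPL 0).
#define SELECTOR_K_DATA 0x10u

// Hypothetical initializer: zero everything, then set the kernel stack.
static inline void init_tss_sketch(tss_entry_t* tss, uint32 kernel_stack_top) {
  memset(tss, 0, sizeof(*tss));
  tss->ss0 = SELECTOR_K_DATA;
  tss->esp0 = kernel_stack_top;
  // Pointing the I/O map base at the end of the tss means "no I/O bitmap".
  tss->iomap_base = (uint16)sizeof(*tss);
}
```

On a privilege change the CPU loads ss and esp from ss0 and esp0 and then pushes user_ss, user_esp, eflags, cs, and eip onto that stack, which is how the upper part of interrupt_stack_t gets built.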

void update_tss_esp(uint32 esp) {
  tss_entry.esp0 = esp;
}

void do_context_switch() {
  // ...
  update_tss_esp(next_thread->kernel_stack + KERNEL_STACK_SIZE);
  // ...
}

Summary

The content of this article is a bit complicated, which is inherent to threads: they touch a wide range of topics. You may need to review segments, interrupts, and the creation and startup of kernel threads to connect everything. Once you get past this stage, I believe you will have a comprehensive understanding of how threads work in the OS, which includes the following key points:

The relationship between user and kernel threads/stacks, and their respective roles;
How the transition between user mode and kernel mode happens and returns, and how code + stack are switched;
The layout of the kernel stack when a thread starts, when an interrupt occurs, is handled, and returns, and during a context switch, and the role of each part;

In the next article, we will define the concept of a process.
