Series catalog
- Preface
- Preparation work
- BIOS boot to real mode
- GDT and protected mode
- A preliminary study of virtual memory
- Loading and entering the kernel
- Display and print
- Global Descriptor Table (GDT)
- Interrupt handling
- Completing virtual memory
- Implementing the heap and malloc
- The first kernel thread
- Running and switching multiple threads
- Locks and multi-thread synchronization
- Entering user mode
- Processes
- System calls
- A simple file system
- Loading executable programs
- Keyboard driver
- Running the shell
System call
Following on from the previous article on processes, this article starts to actually create processes, using the familiar fork system call. First, though, we need to build the system call framework.
The concept of a system call needs little introduction. It is the interface the kernel exposes to user programs, and the main way for a user program to actively request kernel services. Because the call crosses from user mode into kernel mode, it has to be triggered by an interrupt. We follow the classic 32-bit Linux approach and enter a syscall through the software interrupt int 0x80.
Since syscall is provided for users, its implementation has two parts:
- User part: a unified function interface whose bottom layer triggers an interrupt with int 0x80;
- Kernel part: handled much like a normal interrupt.
User interface
First, look at the user part. Note that this code is compiled and linked into the user program, not the kernel. It is packaged in a form similar to a standard library and linked in when we write user programs later.
The code in this section lives mainly in the following files, listed from top to bottom of the call chain:
- syscall.h and syscall.c: the user-level interface;
- syscall_trigger.S: triggers the interrupt and passes the parameters.
Start with the user-level interface in syscall.c. These are the syscall functions that the user calls directly, similar to what we normally use on Linux:
int32 fork();
int32 exec(char* path, uint32 argc, char* argv[]);
Underneath, they call the trigger functions provided by syscall_trigger.S, which is where the syscall interrupt is actually triggered and the parameters are passed:
int32 fork() {
return trigger_syscall_fork();
}
int32 exec(char* path, uint32 argc, char* argv[]) {
return trigger_syscall_exec(path, argc, argv);
}
The trigger_syscall_xxx functions are implemented in syscall_trigger.S.
All syscalls are triggered through the same int 0x80 interrupt. Because there are many syscalls, each one is assigned a number, for example:
SYSCALL_FORK_NUM equ 1
SYSCALL_EXEC_NUM equ 2
In addition, a syscall differs from an ordinary interrupt in that parameters need to be passed. So syscall_trigger.S defines a set of macros according to the number of parameters, such as this one for a syscall without parameters:
%macro DEFINE_SYSCALL_TRIGGER_0_PARAM 2
[GLOBAL trigger_syscall_%1]
trigger_syscall_%1:
mov eax, %2
int 0x80
ret
%endmacro
DEFINE_SYSCALL_TRIGGER_0_PARAM fork, SYSCALL_FORK_NUM
Expanding the macro gives the underlying trigger function for fork:
[GLOBAL trigger_syscall_fork]
trigger_syscall_fork:
mov eax, SYSCALL_FORK_NUM
int 0x80
ret
A syscall always carries at least one parameter: eax holds the syscall number. If the syscall itself takes parameters, further registers such as ecx, edx and ebx are used. These assignments are simply a convention we choose ourselves.
For example, exec has three parameters:
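trigger_syscall_exec:
push ebx ; ebx is callee-saved, preserve it on the user stack
mov eax, SYSCALL_EXEC_NUM ; syscall number (%2 inside the macro)
mov ecx, [esp + 8] ; arg 1: path
mov edx, [esp + 12] ; arg 2: argc
mov ebx, [esp + 16] ; arg 3: argv
int 0x80
pop ebx
ret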
The three parameters of trigger_syscall_exec are passed in ecx, edx and ebx, in that order. Note that ebx is pushed first: under the x86 calling convention ebx is a callee-saved register, so the function must save and restore it itself.
Once the registers are loaded, the trigger function issues int 0x80. This interrupt is the unified entry point for all system calls; from here execution enters the kernel.
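The offsets used in trigger_syscall_exec follow from the cdecl stack layout after `push ebx`. The following is a minimal sketch of that arithmetic only; the slot names and the esp_offset helper are hypothetical and do not appear in the project.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 32-bit user stack inside trigger_syscall_exec,
 * right after `push ebx`. Slot 0 is the word at [esp]; each slot is 4 bytes. */
enum {
    SLOT_SAVED_EBX = 0,   /* [esp + 0]  pushed by trigger_syscall_exec itself */
    SLOT_RETURN_ADDR = 1, /* [esp + 4]  pushed by the caller's `call` */
    SLOT_ARG_PATH = 2,    /* [esp + 8]  first cdecl argument */
    SLOT_ARG_ARGC = 3,    /* [esp + 12] second cdecl argument */
    SLOT_ARG_ARGV = 4     /* [esp + 16] third cdecl argument */
};

/* Byte offset from esp for a slot, as used in `mov ecx, [esp + N]`. */
uint32_t esp_offset(int slot) {
    assert(slot >= 0);
    return (uint32_t)slot * 4u;
}
```

Without the `push ebx`, every offset would be 4 bytes smaller, which is why the parameterless fork trigger never touches the stack at all.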
Kernel handles syscall
The main code for this section is in the following files:
- syscall_wrapper.S: the unified entry for syscall handling;
- syscall_impl.h and syscall_impl.c: the actual syscall implementations.
Before any of that, though, a syscall is an interrupt, so we must first register a handler for interrupt 0x80 in src/interrupt/interrupt.c. The entry is the syscall_entry function:
set_idt_gate(SYSCALL_INT_NUM,
(uint32)syscall_entry,
SELECTOR_K_CODE,
IDT_GATE_ATTR_DPL3);
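The DPL3 attribute matters here: a software `int n` executed in ring 3 raises a general protection fault unless the gate's DPL is 3. As a rough sketch of what such a registration encodes, the code below builds a 32-bit interrupt gate descriptor. The names idt_gate_t, make_gate and the GATE_* constants are hypothetical, and the project's set_idt_gate and IDT_GATE_ATTR_DPL3 may be defined differently.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit x86 interrupt gate descriptor, 8 bytes in total. */
typedef struct {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector of the handler */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  attr;         /* present bit | DPL | gate type */
    uint16_t offset_high;  /* handler address bits 16..31 */
} idt_gate_t;

/* Attribute byte fields (names are assumptions for this sketch). */
#define GATE_PRESENT    0x80u                 /* bit 7: descriptor is valid */
#define GATE_DPL(n)     (((n) & 0x3u) << 5)   /* bits 5..6: privilege level */
#define GATE_TYPE_INT32 0x0Eu                 /* bits 0..3: 32-bit interrupt gate */

idt_gate_t make_gate(uint32_t handler, uint16_t selector, uint8_t dpl) {
    idt_gate_t gate;
    gate.offset_low = (uint16_t)(handler & 0xFFFFu);
    gate.selector = selector;
    gate.zero = 0;
    gate.attr = (uint8_t)(GATE_PRESENT | GATE_DPL(dpl) | GATE_TYPE_INT32);
    gate.offset_high = (uint16_t)(handler >> 16);
    return gate;
}
```

With dpl set to 3 the attribute byte comes out as 0xEE, while ordinary hardware interrupt gates that user code must not invoke directly would use dpl 0, giving 0x8E.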
The syscall_entry function is basically the same as the entry of a general interrupt and likewise has two halves.
The upper half saves the user's context, including all general-purpose registers and the data segment, and then calls syscall_handler to enter the actual syscall dispatch:
syscall_entry:
; push dummy to match struct isr_params_t
push byte 0
push byte 0
; save common registers
pusha
; save original data segment
mov cx, ds
push ecx
; load the kernel data segment descriptor
mov cx, 0x10
mov ds, cx
mov es, cx
mov fs, cx
mov gs, cx
sti ; allow interrupt during syscall
call syscall_handler
The lower half is the return path. It resembles an ordinary interrupt return and restores the registers saved above. One thing to note: eax must not be popped, because the syscall has a return value, and eax holds the return value of syscall_handler:
syscall_exit:
; recover the original data segment.
; Do NOT use eax because it's the syscall ret value!
pop ecx
mov ds, cx
mov es, cx
mov fs, cx
mov gs, cx
pop edi
pop esi
pop ebp
pop esp
pop ebx
pop edx
pop ecx
; skip eax because it is used as return value
; for syscall_handler
add esp, 4
; pop dummy values
add esp, 8
; pop cs, eip, eflags, user_ss, and user_esp by processor
iret
syscall_handler is the actual syscall dispatch function. It reads the syscall number from eax and calls the corresponding implementation:
int32 syscall_handler(isr_params_t isr_params) {
// syscall num saved in eax.
// args list: ecx, edx, ebx, esi, edi
uint32 syscall_num = isr_params.eax;
switch (syscall_num) {
case SYSCALL_FORK_NUM:
return syscall_fork_impl();
case SYSCALL_EXEC_NUM:
return syscall_exec_impl((char*)isr_params.ecx,
isr_params.edx,
(char**)isr_params.ebx);
default: PANIC();
}
}
Note that syscall_handler, like an ordinary interrupt handler, receives the data on the interrupt stack as a whole, in the form of the isr_params structure, as its parameter.
For a normal interrupt, the general-purpose register values saved on the stack exist only to save and restore the context from before the interrupt. In a syscall their role changes: some of them carry the syscall's parameters, which syscall_handler reads and passes on to the individual syscall implementations.
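The definition of isr_params_t is not listed in this article. Judging from the push order in syscall_entry, a layout along the following lines would be consistent with it. This is a sketch: the type name isr_params_sketch_t, the dummy_* field names and the syscall_arg helper are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields are listed from the lowest stack address upwards, i.e. in the
 * reverse of the order in which syscall_entry and the CPU pushed them. */
typedef struct {
    uint32_t ds;                                      /* `push ecx` holding the old ds */
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;  /* saved by `pusha` */
    uint32_t dummy_int_num, dummy_err_code;           /* the two `push byte 0` */
    uint32_t eip, cs, eflags, user_esp, user_ss;      /* pushed by the CPU on int 0x80 */
} isr_params_sketch_t;

/* Returns syscall argument `index`, following the order given in the
 * syscall_handler comment: ecx, edx, ebx, esi, edi. */
uint32_t syscall_arg(const isr_params_sketch_t* params, int index) {
    assert(index >= 0 && index < 5);
    switch (index) {
    case 0: return params->ecx;
    case 1: return params->edx;
    case 2: return params->ebx;
    case 3: return params->esi;
    default: return params->edi;
    }
}
```

Because the structure is simply a view of the stack, writing to one of its fields in the kernel changes the value that the return path later pops back into the corresponding register.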
Recall where the registers used for parameter passing were set: in the trigger_syscall_xxx function on the user side, which loads the arguments of the user's call into the registers:
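trigger_syscall_exec:
push ebx ; ebx is callee-saved, preserve it on the user stack
mov eax, SYSCALL_EXEC_NUM ; syscall number (%2 inside the macro)
mov ecx, [esp + 8] ; arg 1: path
mov edx, [esp + 12] ; arg 2: argc
mov ebx, [esp + 16] ; arg 3: argv
int 0x80
pop ebx
ret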
The whole parameter-passing chain of a syscall is therefore:
- On the user side, the trigger function stores the parameters in general-purpose registers;
- The interrupt fires and execution switches to the kernel stack, where the values of these registers are pushed onto the interrupt stack, wrapped in the isr_params structure, and finally handed to the syscall_handler function.
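The chain can be modelled in plain C to make the data flow visible. This is only a user-space simulation under stated assumptions: fake_int80 stands in for the real `int 0x80` plus the entry stub, and regs_t, the fake_* functions and the sketch_* wrappers are hypothetical names with made-up return values.

```c
#include <assert.h>
#include <stdint.h>

enum { SYSCALL_FORK_NUM = 1, SYSCALL_EXEC_NUM = 2 };

/* Registers the user-side trigger loads before `int 0x80`. */
typedef struct { uintptr_t eax, ecx, edx, ebx; } regs_t;

/* Stand-ins for the kernel implementations (return values are invented). */
int32_t fake_fork_impl(void) { return 42; }
int32_t fake_exec_impl(const char* path, uint32_t argc, char** argv) {
    (void)path;
    (void)argv;
    return (int32_t)argc;
}

/* Kernel side: dispatch on eax and read the arguments from the saved registers. */
int32_t fake_syscall_handler(const regs_t* saved) {
    assert(saved != 0);
    switch (saved->eax) {
    case SYSCALL_FORK_NUM:
        return fake_fork_impl();
    case SYSCALL_EXEC_NUM:
        return fake_exec_impl((const char*)saved->ecx,
                              (uint32_t)saved->edx,
                              (char**)saved->ebx);
    default:
        return -1;
    }
}

/* Models `int 0x80`: the entry stub saves the registers, the handler runs,
 * and its return value travels back to the user in eax. */
int32_t fake_int80(const regs_t* regs) {
    regs_t saved = *regs;  /* snapshot, like the pushes in syscall_entry */
    return fake_syscall_handler(&saved);
}

/* User side: what trigger_syscall_exec and trigger_syscall_fork do with registers. */
int32_t sketch_exec(const char* path, uint32_t argc, char** argv) {
    regs_t regs = { SYSCALL_EXEC_NUM, (uintptr_t)path, argc, (uintptr_t)argv };
    return fake_int80(&regs);
}

int32_t sketch_fork(void) {
    regs_t regs = { SYSCALL_FORK_NUM, 0, 0, 0 };
    return fake_int80(&regs);
}
```

The only contract between the two sides is the register assignment: the user wrapper and the kernel handler must agree on which register carries which argument.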
Notice also that when a callee-saved register is used to pass a parameter, its value is first saved on the user stack, as with ebx above. In other words, part of the user context is saved and restored by trigger_syscall_xxx on the user stack rather than after entering the interrupt. The slots for these registers on the interrupt stack hold the syscall parameters, which overwrite the original values, so the originals must be saved on the user stack beforehand. This is another difference between a syscall and an ordinary interrupt.
The underlying reason is that a syscall is initiated deliberately rather than arriving unpredictably like a general interrupt, so it behaves more like an ordinary function call. As long as the user side follows the x86 calling convention and first saves the callee-saved registers on its own stack, it is free to use those registers to pass parameters before issuing int 0x80 and entering the kernel.
Implementation of fork
Everything above is the syscall framework. Now we implement the first syscall: fork.
In syscall_handler, fork is dispatched to the syscall_fork_impl function; the actual work is done by the process_fork function in src/task/process.cff.
You are probably familiar with how fork is used:
int pid = fork();
if (pid > 0) {
// parent process
} else if (pid == 0) {
// child process
} else {
// fork failed
}
Unfortunately, our first system call, fork, is a rather complicated one. fork creates a new child process that is a copy of the parent. Both return from fork and continue executing; the only difference is the return value. The parent gets the pid of the newly created child, and the child gets 0.
First, the create_process function is called to create a brand-new process structure and initialize its fields. Note that the child's page directory is copied from the parent, so that the two initially share the same virtual memory contents:
pcb_t* create_process(char* name, uint8 is_kernel_process) {
pcb_t* process = (pcb_t*)kmalloc(sizeof(pcb_t));
memset(process, 0, sizeof(pcb_t));
//...
// The child's page directory is cloned from the current (parent) one.
process->page_dir = clone_crt_page_dir();
return process;
}
Next comes the most critical function, fork_crt_thread, which copies the current thread. Its main job is to copy the current kernel stack and then arrange the copy to look like the stack of a newly started thread, so that the child thread can later be scheduled and started like any new thread. Although it is running for the first time, it appears to return from fork just as the parent does.
Recall what the stack looks like when a kernel thread starts.
Execution starts at kernel_esp, pops the general-purpose registers, and then jumps to start_eip as the entry point. Here we set the child thread's start_eip to syscall_fork_exit:
thread->kernel_esp = kernel_stack + KERNEL_STACK_SIZE
- (sizeof(interrupt_stack_t) + sizeof(switch_stack_t));
switch_stack_t* switch_stack = (switch_stack_t*)thread->kernel_esp;
switch_stack->thread_entry_eip = (uint32)syscall_fork_exit;
syscall_fork_exit, which would more accurately be named syscall_fork_child_exit, is the return path for the child process after fork completes. It differs from the normal syscall return in how it restores the general-purpose registers:
pop edi
pop esi
pop ebp
; Do NOT pop old esp!
; Child process is its own stack, not parent's.
add esp, 4
pop ebx
pop edx
pop ecx
; child process returns 0.
mov eax, 0
add esp, 4
esp and eax get special treatment:
- The esp saved on the stack is the parent's esp, and the child has already been allocated its own stack, so it is skipped;
- eax is the return value of fork, which must be 0 in the child.
When execution reaches iret, the interrupt returns and the CPU restores the state the user thread was in before the syscall, that is, the code and stack information of the user thread:
- code: saved in cs + eip;
- stack: saved in user_ss + user_esp.
This information is identical to what is on the parent's stack, because the child's kernel stack was copied from the parent. That is why, after returning to user mode, the child returns from fork just like the parent, as if the parent had made a mirror image of itself. Their virtual memory spaces are isolated, of course, using the copy-on-write mechanism described in the previous article.
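To illustrate why a copied kernel stack is enough to make the child resume at the same user-mode location, here is a small user-space model of the copy. It is a sketch under assumptions: frame_sketch_t mirrors the push order of syscall_entry, SKETCH_STACK_WORDS is a tiny stand-in for KERNEL_STACK_SIZE, and fork_frame_sketch is a hypothetical helper, not the project's fork_crt_thread. In particular, the model writes 0 into the child's saved eax, whereas the article's code loads 0 into eax inside syscall_fork_exit; the net effect for the child is the same.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_STACK_WORDS 64  /* stand-in for KERNEL_STACK_SIZE, in 4-byte words */

/* Interrupt frame as laid out by syscall_entry, lowest address first. */
typedef struct {
    uint32_t ds;
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
    uint32_t dummy0, dummy1;
    uint32_t eip, cs, eflags, user_esp, user_ss;
} frame_sketch_t;

/* Copies the parent's interrupt frame to the very top of the child's
 * kernel stack and returns a pointer to the child's copy. */
frame_sketch_t* fork_frame_sketch(const frame_sketch_t* parent_frame,
                                  uint32_t* child_stack) {
    assert(parent_frame != 0 && child_stack != 0);
    uint8_t* stack_top = (uint8_t*)(child_stack + SKETCH_STACK_WORDS);
    frame_sketch_t* child_frame =
        (frame_sketch_t*)(stack_top - sizeof(frame_sketch_t));
    memcpy(child_frame, parent_frame, sizeof(*child_frame));
    child_frame->eax = 0;  /* models "fork returns 0 in the child" */
    return child_frame;
}
```

Since cs, eip, user_ss and user_esp are copied unchanged, iret in the child lands on the instruction right after the parent's `int 0x80`, on a user stack with the same contents.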
After fork_crt_thread returns, the parent finishes the remaining work of creating the child process and then returns. The return value is the pid of the child just created:
// Create new process and fork current thread.
pcb_t* process = create_process(nullptr, /* is_kernel_process = */ false);
tcb_t* thread = fork_crt_thread();
if (thread == nullptr) {
return -1;
}
// Bind child thread to child process.
add_process_thread(process, thread);
// Add child thread to scheduler to run.
add_thread_to_schedule(thread);
// Parent should return the new pid.
return process->id;
As you can see, the parent returns from the syscall normally, while the child's kernel stack has been modified so that it runs as a thread started for the first time. Two points need attention:
- Its interrupt stack must be identical to the parent's, so that the interrupt return restores the same user-thread environment as the parent. After returning to user mode, the child looks like a task identical to the parent that simply continues running, which is exactly the purpose of fork;
- Its return value must be 0.
Summary
This article covered a lot of ground. First we implemented the syscall framework. It is important to distinguish the responsibilities of the user side and the kernel side, and to understand the similarities and differences between a syscall and an ordinary interrupt. The most important part is how data flows through the registers and the stack. On top of this framework we implemented fork, the most challenging syscall. Hopefully it helps you understand the syscall entry and return mechanism in depth.