Series catalog
- Preface
- Preparation work
- BIOS boot to real mode
- GDT and protected mode
- A preliminary study of virtual memory
- Loading and entering the kernel
- Display and printing
- Global Descriptor Table GDT
- Interrupt handling
- Completing virtual memory
- Implementing the heap and malloc
- The first kernel thread
- Multi-threaded operation and switching
- Locks and multi-thread synchronization
- Entering user mode
- Implementing processes
- System calls
- A simple file system
- Loading executable programs
- Keyboard driver
- Running the shell
Multithreaded competition
In the previous article we finally got multithreading running and set up an initial task scheduler, so this kernel is at last starting to look like an operating system. Building on multi-threaded execution, the next steps are to enter user mode to run threads, establish processes, and load user executable programs.
However, before that, an important and dangerous problem arrives along with running multiple threads: thread competition and synchronization. You probably already have user-mode multithreaded programming experience, understand the problems and causes of races and synchronization between threads, and know the concept and use of locks. This article examines and discusses locks and their implementation in the kernel.

It should be noted that locking is a huge and complex subject, and there are many differences between how locks are used and implemented in the kernel and in user mode (although they share most fundamentals). This article is only my personal, superficial understanding and implementation; discussion and corrections are welcome.
Lock
I don't think the data problems caused by competition between threads need much explanation here. After getting threads up and running in the previous two articles, we should clearly realize that an interrupt may happen at any time, on any instruction, so any non-atomic operation on shared data may cause a data race between threads.
For our current kernel there are actually many places where a lock needs to be added to protect access to shared data structures, for example (see the sketch after this list):

- the `page fault` handler, which allocates from the physical frame bitmap and obviously needs protection;
- the `kheap`, from which all threads are carving out memory;
- the various task queues in the `scheduler`, such as `ready_tasks`;
- ......
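As a concrete illustration of the first item, consider a simplified physical-frame allocator. This is a made-up sketch, not the kernel's real bitmap code: it only shows that allocation is a check-then-modify sequence, so an ill-timed interrupt between the two steps can hand the same frame to two threads:

```c
// Hypothetical sketch, not the kernel's real code: a simplified
// physical-frame bitmap where allocation is a check-then-modify sequence.
static uint32 frame_bitmap[32];  // 1024 frames, one bit per frame

int allocate_frame_unsafe() {
  for (int i = 0; i < 1024; i++) {
    if ((frame_bitmap[i / 32] & (1u << (i % 32))) == 0) {  // check: frame i is free
      // An interrupt and thread switch right here lets another thread see
      // the same bit as free, so the same frame can be handed out twice.
      frame_bitmap[i / 32] |= (1u << (i % 32));            // modify: mark frame used
      return i;
    }
  }
  return -1;  // no free frame
}
```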
Most programming languages that support multithreading provide lock-related concepts and tools. As a humble kernel project, we have to implement them ourselves.

Locking is a complicated subject. With safety as the first priority, the quality of a lock's design, implementation and usage greatly affects system performance. Bad lock design and usage may lead to unreasonable thread scheduling and a large waste of CPU time, reducing system throughput.

Next, starting from the underlying principles of locks, we discuss several common kinds of locks, their implementations, and their usage scenarios.
Atomic instruction operation
Logically speaking, lock
is very simple:
if (lock_hold == 0) {
  // Not locked, I get the lock!
  lock_hold = 1;
} else {
  // Already locked :(
}
Here `lock_hold` stores whether the lock is currently held, with value `true / false`. A thread trying to take the lock first checks whether it is 0. If it is 0, the lock is not held by anyone else, so the thread takes the lock and sets it to 1, marking it as locked to keep later arrivals from taking it.
However, the flaw in the above implementation is that the `if` condition check and the `lock_hold = 1` assignment below it are not atomic. Two threads may both read `lock_hold` while it is 0, before either has had a chance to modify it with `lock_hold = 1`, so both pass the `if` check and take the lock together.

The core problem here is that the lock's check and modify are not atomic: they are not completed by a single instruction, so interleaved execution of two threads inside them can cause a data race.
Therefore, for any lock, the lowest-level implementation must be an atomic instruction, that is, a single instruction that completes both the check and the change. This ensures that only one thread can pass the instruction successfully while the others are turned away. For example:
uint32 compare_and_exchange(volatile uint32* dst,
                            uint32 src);
It has to be implemented in assembly:

compare_and_exchange:
  mov edx, [esp + 4]       ; edx = dst (pointer to the lock variable)
  mov ecx, [esp + 8]       ; ecx = src (new value to store)
  mov eax, 0               ; eax = expected old value (0 = unlocked)
  lock cmpxchg [edx], ecx  ; if [edx] == eax then [edx] = ecx, else eax = [edx]
  ret                      ; return value in eax
The `cmpxchg` instruction is a `compare and exchange` instruction. It compares its first operand with the value of `eax`:

- if they are equal, the second operand is loaded into the first operand;
- if they differ, the value of the first operand is loaded into `eax`;
(The `cmpxchg` here is prefixed with `lock`, so that when the instruction executes on a multi-core CPU its memory access is guaranteed to be exclusive and visible to the other cores. This touches on multi-core cache coherence, which you can skip for now; for the single-core CPU used in this project's experiments, the `lock` prefix is not strictly necessary.)
In fact, this instruction merges the check and the modify into one atomic operation. We use it to implement the lock: the operand `dst` marks whether the lock is held, and it is compared against `eax = 0`:

- If they are equal, it is the first case: 0 means the lock is free, so 1 is stored into `dst`, marking the lock as held, and the return value is `eax = 0`;
- If not, `dst` equals 1 and the lock is already held by someone else. This is the second case: the value of `dst` (which is 1) is loaded into `eax`, so the return value `eax` has been changed to 1;
volatile uint32 lock_hold = 0;

void acquire_lock() {
  if (compare_and_exchange(&lock_hold, 1) == 0) {
    // Got the lock!
  } else {
    // Did NOT get the lock.
  }
}
Besides the `cmpxchg` instruction, another way to implement this is the `xchg` instruction, which I personally find easier to understand:
atomic_exchange:
  mov ecx, [esp + 4]  ; ecx = dst (pointer to the lock variable)
  mov eax, [esp + 8]  ; eax = new value to store
  xchg [ecx], eax     ; atomically swap eax with [ecx]
  ret                 ; return the old value of [ecx] in eax
The `xchg` instruction has two operands whose values it swaps. The `atomic_exchange` function then returns the value left in the second operand after the swap, which is the old value of the first parameter before the swap.
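To make the semantics concrete, here is a non-atomic C sketch of what `atomic_exchange` does; the name `atomic_exchange_concept` is my own, and the real version performs these steps in a single `xchg` instruction, which is exactly what makes it safe:

```c
// Conceptual (NON-atomic) equivalent of atomic_exchange, for illustration only.
// The real implementation does all of this in one atomic xchg instruction.
uint32 atomic_exchange_concept(volatile uint32* dst, uint32 new_value) {
  uint32 old_value = *dst;  // read the old value
  *dst = new_value;         // store the new value
  return old_value;         // return what was there before the swap
}
```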
Using `atomic_exchange` to implement the lock looks almost the same as before:
volatile uint32 lock_hold = 0;

void acquire_lock() {
  if (atomic_exchange(&lock_hold, 1) == 0) {
    // Got the lock!
  } else {
    // Did NOT get the lock.
  }
}
A thread trying to acquire the lock always exchanges the value 1 (locked) with `lock_hold`, so `atomic_exchange` always returns the old value of `lock_hold`: it swaps the old value out and returns it. Only when the old value of `lock_hold` was 0 does the check above pass, meaning the lock was not held by anyone before and has now been acquired successfully.

As you can see, only one instruction is used to complete the check-and-modify of `lock_hold`. The interesting thing is that it modifies first and checks afterwards, yet this does not affect its correctness in the slightest.
Spinlock
Above we discussed the lowest-level implementation of `acquire` for a lock, but that is just the tip of the iceberg of lock-related issues. The real complexity of locks lies in what happens after an acquire fails. This is also a very important way of classifying locks, and it greatly affects a lock's performance and usage scenarios.

The first lock to discuss is the spin lock (`spinlock`), which simply keeps retrying after a failed acquire until it succeeds:
#define LOCKED_YES  1
#define LOCKED_NO   0

void spin_lock() {
  while (atomic_exchange(&lock_hold, LOCKED_YES) != LOCKED_NO) {}
}
This is busy waiting: the current thread keeps hold of the CPU and retries the acquire over and over, which is simple and crude.
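Releasing the spinlock is much simpler. Here is a minimal sketch (the function name `spin_unlock` is my own; only the current lock holder should call it):

```c
void spin_unlock() {
  // Only the current lock holder writes this. On x86 an aligned 32-bit
  // store is atomic, so resetting the flag is enough to release the lock.
  lock_hold = LOCKED_NO;
}
```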
First of all, it should be clear that a `spinlock` makes no sense on a single-core CPU. On a single-core CPU only one thread executes at any moment; if the lock cannot be obtained, spinning in place is pointless, because the thread holding the lock has no chance to release it during that time.
On a multi-core CPU, however, a `spinlock` is useful. If the acquire fails, retrying for a while may catch the moment the lock holder releases the lock, because that holder is very likely running on another core at this very moment, inside the `critical section`.

This requires that the `critical section` be very small and that competition for the lock not be very fierce, because only then is the spin wait unlikely to last long. If the current thread instead gave up the CPU and blocked, it might pay a higher price, which will be discussed in detail later.

However, if the `critical section` is relatively large or competition for the lock is fierce, the spin lock is not appropriate even on a multi-core CPU; endlessly spinning in place and wasting CPU time is unwise.
yield lock
As mentioned above, a `spinlock` is not suitable for a single-core CPU, but our kernel happens to run on a single-core CPU emulator, so we need to implement a lightweight lock similar to a spinlock. I will call it a `yield lock`.
As the name implies, a `yield lock` actively gives up the CPU after a failed acquire. In other words: I can't get the lock for now, so I'll take a break and let the other threads run first; once they have had a turn, I'll come back and try again.

Its behaviour is essentially still spinning, but unlike spinning in place it does not waste CPU time. It immediately hands the CPU to others, so the thread holding the lock gets a chance to run, and by the time the next round of time slices has finished it has probably released the lock:
void yield_lock() {
  while (atomic_exchange(&lock_hold, LOCKED_YES) != LOCKED_NO) {
    schedule_thread_yield();
  }
}
Note that `schedule_thread_yield` must be inside the while loop: even if the thread holding the lock has released it, that does not mean the current thread will get the lock next time around, because there may be other competitors. So after the yield returns, it must compete for the lock again.
Like the `spinlock`, the `yield lock` is also suited to situations where the `critical section` is small and competition is not intense; otherwise many threads will wait in vain over and over again, which wastes CPU resources.
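The `mutex` in the next section uses a `yieldlock_t` type together with `yieldlock_lock` / `yieldlock_unlock`, whose definitions are not shown in this article. Here is a sketch of what they might look like, simply wrapping the logic above into a reusable struct (the field name `hold` and the `yieldlock_init` helper are my assumptions):

```c
// Sketch of a reusable yield lock type; field and init names are assumptions.
typedef struct yieldlock {
  volatile uint32 hold;
} yieldlock_t;

void yieldlock_init(yieldlock_t* lock) {
  lock->hold = LOCKED_NO;
}

void yieldlock_lock(yieldlock_t* lock) {
  while (atomic_exchange(&lock->hold, LOCKED_YES) != LOCKED_NO) {
    // Could not get the lock; give up the CPU and retry later.
    schedule_thread_yield();
  }
}

void yieldlock_unlock(yieldlock_t* lock) {
  lock->hold = LOCKED_NO;
}
```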
Blocking lock
Both locks above are non-blocking locks: a thread does not block when an acquire fails, but keeps retrying, either immediately or after a while; essentially it is always retrying. However, when the `critical section` is relatively large or competition for the lock is fierce, constant retrying is likely to be futile and a waste of CPU resources.
To solve this problem there is the blocking lock, which maintains a queue internally. If a thread cannot get the lock, it adds itself to the queue and goes to sleep, giving up the CPU. While sleeping it will not be scheduled to run again; that is, it enters the blocked state. When the thread holding the lock releases it, one thread is taken out of the queue and woken up.
For example, we define the following blocking lock, named `mutex`:
struct mutex {
  volatile uint32 hold;
  linked_list_t waiting_task_queue;
  yieldlock_t ydlock;
};
The implementation of `lock`:
void mutex_lock(mutex_t* mp) {
  yieldlock_lock(&mp->ydlock);
  while (atomic_exchange(&mp->hold, LOCKED_YES) != LOCKED_NO) {
    // Add current thread to wait queue.
    thread_node_t* thread_node = get_crt_thread_node();
    linked_list_append(&mp->waiting_task_queue, thread_node);

    // Mark this task status TASK_WAITING so that it will not be
    // put into the ready_tasks queue by the scheduler.
    schedule_mark_thread_block();
    yieldlock_unlock(&mp->ydlock);
    schedule_thread_yield();

    // Woken up; try to acquire the lock again.
    yieldlock_lock(&mp->ydlock);
  }
  yieldlock_unlock(&mp->ydlock);
}
The lock implementation here is already more complicated. It is in fact the standard implementation of a conditional wait. A so-called `conditional wait` means waiting, in a blocking way, for an expected condition to be satisfied. The condition waited on here is: the lock has been released, so I can try to acquire it.
After a failed attempt to acquire the lock, the current thread adds itself to the mutex's `waiting_task_queue`, marks itself as `TASK_WAITING`, and then gives up the CPU. Giving up the CPU here uses the same `schedule_thread_yield` function as the `yield lock`, but there is an essential difference:

- in the `yield lock`, a thread that yields is still placed into the `ready_tasks` queue, so the scheduler will still schedule it later;
- the thread here first marks itself `TASK_WAITING`, so inside `schedule_thread_yield` it will not be added to the `ready_tasks` queue. It truly enters the blocked state and will not be scheduled again until the thread holding the lock calls `unlock`, takes it out of the mutex's `waiting_task_queue` to wake it up, and puts it back into the `ready_tasks` queue;
void mutex_unlock(mutex_t* mp) {
  yieldlock_lock(&mp->ydlock);
  mp->hold = LOCKED_NO;
  if (mp->waiting_task_queue.size > 0) {
    // Wake up a waiting thread from the queue.
    thread_node_t* head = mp->waiting_task_queue.head;
    linked_list_remove(&mp->waiting_task_queue, head);
    // Put the woken-up thread back into the ready_tasks queue.
    add_thread_node_to_schedule(head);
  }
  yieldlock_unlock(&mp->ydlock);
}
A key element in the `lock` and `unlock` code above is that `mutex` defines an internal `yieldlock`. This looks strange: `mutex` is itself a lock, yet its internal data needs another lock to protect it. Isn't this a nesting doll?

In terms of implementation, `mutex` is already a fairly complex lock. It maintains a waiting queue internally, and that queue obviously needs protection, hence the nesting-doll paradox above. The key point is that the two layers of locks are essentially different in type and purpose:
- `mutex` is a heavyweight lock provided for external use. Its purpose and the object it protects are open-ended; generally speaking, its `critical section` is relatively large and competition for it fierce;
- the internal `yield lock` is a lightweight lock whose purpose and protected object are fixed: it protects the `mutex`'s own internal operations. This `critical section` can be kept very small, which makes introducing this lock necessary and reasonable;
The price of the internal `yield lock` is that it introduces new competition on top of the competition threads already have for the `mutex` itself. However, given the design and intended use of `mutex`, this extra cost is unavoidable, and in a sense it is affordable and negligible: the `critical section` that the `mutex` protects externally is usually considered large compared with the region protected by its internal `yield lock`.
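As a hypothetical usage sketch (the counter and the initialization comment are illustrative, not the kernel's actual API), guarding a piece of shared data with this mutex looks like this:

```c
// Hypothetical usage sketch of the mutex; the names here are illustrative.
static mutex_t counter_mutex;       // assumed initialized elsewhere (e.g. a mutex_init helper)
static uint32 shared_counter = 0;   // shared data touched by many threads

void increment_counter() {
  mutex_lock(&counter_mutex);
  // Critical section: only one thread at a time runs this.
  shared_counter++;
  mutex_unlock(&counter_mutex);
}
```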
Kernel and user mode lock
What we have discussed above are the principles and implementations of several locks and the differences in their usage scenarios. A very important distinguishing principle is the size of the `critical section` and the intensity of competition, which essentially reflect how easy (or how likely) it is for each thread to obtain the lock. Based on this we can divide usage into two categories:

- if the `critical section` is small and competition is not fierce, use a spin-style lock (including `spinlock` and `yieldlock`), which is non-blocking;
- if the `critical section` is very large, or competition is fierce, use a blocking lock;
In fact, choosing which lock to use in kernel mode is far from this simple. It also depends on where the lock is used, such as interrupt context versus thread context, and many other considerations, which bring many restrictions and differences. I will try to write a separate article on these issues when I have time; consider this a placeholder for now.
In user mode, locks are quite different from the kernel mode; the most discussed topic is the choice between `spinlock` and blocking locks. As mentioned above, blocking locks are usually used when the `critical section` is very large or competition is intense, because they avoid large amounts of CPU spinning and so appear to save CPU resources. However, a user-mode thread has to trap into the kernel to enter blocking sleep, which is itself quite expensive, and it may not be more cost-effective than spinning in place (assuming a multi-core CPU). So the trade-offs here differ considerably from lock usage in kernel mode.
Summary
This article discussed the principles and implementation of locks. Limited by my own level, it is only my superficial understanding; I hope it can be helpful to readers, and discussion and comments are welcome. In this scroll project, performance is not a consideration for now; for simplicity and safety I use `yieldlock` extensively as the main lock in the kernel.