
Condition Variables

We introduced locks earlier, but locks are not the only primitive needed in concurrent programming. In many cases, a thread needs to check whether a certain condition is met before it continues to run. For example, a parent thread may need to check whether a child thread has finished executing. How can such waiting be implemented?

Note: Concurrent programs have two major requirements: mutual exclusion and waiting. Mutual exclusion arises because threads share data; waiting arises because threads depend on one another.

We can try to use a shared variable, as in the code below. This approach generally works, but it is inefficient: the parent thread spins, repeatedly checking the variable and wasting CPU time. We would prefer some way to put the parent thread to sleep until the condition it is waiting for (the child finishing execution) becomes true.

volatile int done = 0;

void *child(void *arg) {
    printf("child\n");
    done = 1;
    return NULL;
}

int main(int argc, char *argv[]) {
    printf("parent: begin\n");
    pthread_t c;
    Pthread_create(&c, NULL, child, NULL); // create child
    while (done == 0)
        ; // spin
    printf("parent: end\n");
    return 0;
}

Definition and usage

Threads can use condition variables to wait for a condition to become true. A condition variable is an explicit queue: when some state of execution (i.e., some condition) is not as desired, a thread can add itself to the queue and wait there. When another thread changes that state, it can wake one of the waiting threads in the queue by signaling on the condition, allowing it to continue executing.

In the POSIX library, declaring a condition variable is as simple as writing pthread_cond_t c (note that proper initialization is also required). A condition variable has two associated operations: wait() and signal(). A thread calls wait() when it wants to put itself to sleep; it calls signal() when it wants to wake a thread sleeping on the condition variable. Here is a typical example:

int done = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t c = PTHREAD_COND_INITIALIZER;

void thr_exit() {
    Pthread_mutex_lock(&m);
    done = 1;
    Pthread_cond_signal(&c);
    Pthread_mutex_unlock(&m);
}

void *child(void *arg) {
    printf("child\n");
    thr_exit();
    return NULL;
}

void thr_join() {
    Pthread_mutex_lock(&m);
    while (done == 0)
        Pthread_cond_wait(&c, &m);
    Pthread_mutex_unlock(&m);
}

int main(int argc, char *argv[]) {
    printf("parent: begin\n");
    pthread_t p;
    Pthread_create(&p, NULL, child, NULL);
    thr_join();
    printf("parent: end\n");
    return 0;
}

The wait() call takes a second parameter besides the condition variable: a mutex. It assumes that this mutex is locked when wait() is called. wait()'s job is to release the lock and put the calling thread to sleep, atomically; when the thread wakes up, it must reacquire the lock before returning to the caller. These complicated steps exist to avoid certain race conditions that can arise as a thread puts itself to sleep.
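For reference, the POSIX prototypes of the two routines are:

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);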

There are two cases to consider. In the first, the parent thread creates the child thread but keeps running itself, and immediately calls thr_join() to wait for the child. In this case, it acquires the lock, checks whether the child has finished, and, since it has not, calls wait() to put itself to sleep (releasing the lock in the process). The child eventually runs, prints "child", and calls thr_exit() to wake the parent: that code acquires the lock, sets the state variable done, and signals the parent. Finally, the parent runs (returning from wait() with the lock held), releases the lock, and prints "parent: end".

In the second case, the child runs immediately after being created, sets done to 1, signals on the condition (though no thread is sleeping on it), and finishes. When the parent later runs and calls thr_join(), it finds that done is already 1 and returns without waiting.

Note that in the code above, both the state variable done and the mutex m are necessary. What happens if we drop the state variable, so that the code looks like the following?

void thr_exit() {
    Pthread_mutex_lock(&m);
    Pthread_cond_signal(&c);
    Pthread_mutex_unlock(&m);
}

void thr_join() {
    Pthread_mutex_lock(&m);
    Pthread_cond_wait(&c, &m);
    Pthread_mutex_unlock(&m);
}

Assume the child runs immediately and calls thr_exit(). The child signals, but there is no thread sleeping on the condition variable to receive the signal. When the parent later runs, it calls wait() and blocks there forever; no other thread will ever wake it. This example should make clear the importance of the variable done, which records the value the threads care about: sleeping, waking, and locking are all built around it.

In the next example, suppose threads do not hold the lock when signaling and waiting. What goes wrong then?

void thr_exit() {
    done = 1;
    Pthread_cond_signal(&c);
}

void thr_join() {
    if (done == 0)
        Pthread_cond_wait(&c);
}

The problem here is a subtle race condition. Specifically, the parent thread calls thr_join() and checks that the value of done is 0; it then intends to sleep, but before it calls wait() and actually goes to sleep, it is interrupted. The child thread then runs, sets done to 1, and signals, but no thread is waiting at that point. When the parent runs again, it calls wait() and sleeps forever.

Therefore, we should adhere to this principle: when using condition variables, always hold the lock while calling signal or wait.

The Producer/Consumer Problem

Suppose there are one or more producer threads and one or more consumer threads. Producers put data items into a buffer; consumers take data items out of the buffer and consume them in some way. Many real systems have this pattern. For example, in a multi-threaded web server, a producer puts HTTP requests into a work queue, and consumer threads take requests off the queue and process them.

Because the bounded buffer is a shared resource, we must synchronize access to it to avoid race conditions. To understand the problem better, let's look at some actual code.

First, we need a shared buffer into which producers put data and from which consumers take data. For simplicity, we use a single integer as the buffer, along with two helper functions to put a value into the buffer and get a value out of it.

int buffer;
int count = 0; // initially, empty

void put(int value) {
    assert(count == 0);
    count = 1;
    buffer = value;
}

int get() {
    assert(count == 1);
    count = 0;
    return buffer;
}

The put() function assumes the buffer is empty, stores a value in it, and sets count to 1 to mark the buffer as full. The get() function does the opposite: it asserts that the buffer is full, marks it empty by setting count to 0, and returns the value.

Now we need routines that produce and consume data. We call a thread that runs the produce routine a producer, and a thread that runs the consume routine a consumer. Below is a non-thread-safe pair of such routines: the producer puts integers into the shared buffer loops times, and the consumer repeatedly takes data out of the shared buffer and prints it. Our goal is to use condition variables to turn this into a thread-safe version.

void *producer(void *arg) {
    int i;
    int loops = (int) arg;
    for (i = 0; i < loops; i++) {
        put(i);
    }
}

void *consumer(void *arg) {
    int i;
    while (1) {
        int tmp = get();
        printf("%d\n", tmp);
    }
}

A broken solution

Obviously, put() and get() contain critical sections, because put() updates the buffer and get() reads from it. Our first attempt at protecting them is as follows:

cond_t cond;
mutex_t mutex;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);           // p1
        if (count == 1)                       // p2
            Pthread_cond_wait(&cond, &mutex); // p3
        put(i);                               // p4
        Pthread_cond_signal(&cond);           // p5
        Pthread_mutex_unlock(&mutex);         // p6
    }
}

void *consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);           // c1
        if (count == 0)                       // c2
            Pthread_cond_wait(&cond, &mutex); // c3
        int tmp = get();                      // c4
        Pthread_cond_signal(&cond);           // c5
        Pthread_mutex_unlock(&mutex);         // c6
        printf("%d\n", tmp);
    }
}

When the producer wants to fill the buffer, it waits for the buffer to become empty (p1 ~ p3). The consumer has exactly the same logic, but waits for a different condition: the buffer becoming full (c1 ~ c3).

With just one producer and one consumer, the code above works. But if there is more than one of either, this solution has two serious problems.

Let's look at the first problem, which concerns the if statement before the wait. Suppose there are two consumers (Tc1 and Tc2) and one producer (Tp). First, consumer Tc1 runs: it acquires the lock (c1), checks whether there is anything to consume (c2), and, finding nothing, waits (c3), which releases the lock.

Then the producer (Tp) runs. It acquires the lock (p1), checks whether the buffer is full (p2), and, since it is not, puts a number into it (p4). The producer then signals that the buffer is full (p5). Crucially, this moves the first consumer (Tc1) off the condition-variable queue and into the ready queue: Tc1 can now run, but is not yet running. The producer keeps going until it finds the buffer full and goes to sleep (p6, then p1 ~ p3).

Now the problem occurs: another consumer (Tc2) sneaks in and consumes the value in the buffer. Suppose Tc1 then runs; just before returning from wait(), it reacquires the lock and returns. It calls get() (c4), but there is nothing left in the buffer to consume: the assertion fails and the code does not behave as expected.

The reason is simple: after Tc1 was woken by the producer, but before it actually ran, the state of the buffer changed because Tc2 ran first. Signaling a thread only wakes it up; it is a hint that the state has changed, but there is no guarantee that the state will still be as desired when the woken thread finally runs.

Better, but still flawed: use while, not if

Fixing this problem is simple: change the if to a while. When consumer Tc1 wakes, it immediately re-checks the shared variable (c2); if the buffer is empty at that moment, the consumer simply goes back to sleep (c3). The corresponding if in the producer is also changed to a while (p2).

cond_t cond;
mutex_t mutex;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);           // p1
        while (count == 1)                    // p2
            Pthread_cond_wait(&cond, &mutex); // p3
        put(i);                               // p4
        Pthread_cond_signal(&cond);           // p5
        Pthread_mutex_unlock(&mutex);         // p6
    }
}

void *consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);           // c1
        while (count == 0)                    // c2
            Pthread_cond_wait(&cond, &mutex); // c3
        int tmp = get();                      // c4
        Pthread_cond_signal(&cond);           // c5
        Pthread_mutex_unlock(&mutex);         // c6
        printf("%d\n", tmp);
    }
}

We should remember a simple rule about condition variables: always use while loops, re-checking the condition after waking. (Re-checking also protects against spurious wakeups, which some implementations allow.)

However, this code still has a problem; it is the second of the two problems mentioned above, and it is related to the fact that we use only one condition variable.

Suppose two consumers (Tc1 and Tc2) run first and both go to sleep (c3). Then the producer runs, puts a value into the buffer, wakes one consumer (say Tc1), and loops around; finding the buffer full, it goes to sleep. Now one consumer (Tc1) is about to run, and two threads (Tc2 and Tp) are sleeping on the same condition variable.

Consumer Tc1 wakes and returns from wait() (c3), re-checks the condition (c2), finds the buffer full, and consumes the value (c4). It then signals on the condition (c5), waking exactly one sleeping thread. But which thread will be woken?

Because the consumer has just emptied the buffer, it clearly should wake the producer. But if it instead wakes Tc2 (which is entirely possible, depending on how the wait queue is managed), we have a problem. Tc2 wakes up, finds the buffer empty (c2), and goes back to sleep (c3). The producer Tp, which has a value to put into the buffer, stays asleep. The other consumer, Tc1, also goes back to sleep after looping around. All three threads are now asleep, which is obviously a serious bug.

We can see that signaling is clearly needed, but it must be more directed: a consumer should not wake other consumers, only producers, and vice versa.

The correct solution for a single-value buffer

The solution to this problem is simple: use two condition variables instead of one, so that when the state of the system changes, the right kind of thread is signaled. The resulting code is shown below.

cond_t empty, fill;
mutex_t mutex;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);
        while (count == 1)
            Pthread_cond_wait(&empty, &mutex);
        put(i);
        Pthread_cond_signal(&fill);
        Pthread_mutex_unlock(&mutex);
    }
}

void *consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);
        while (count == 0)
            Pthread_cond_wait(&fill, &mutex);
        int tmp = get();
        Pthread_cond_signal(&empty);
        Pthread_mutex_unlock(&mutex);
        printf("%d\n", tmp);
    }
}

The final solution

We now have a working producer/consumer solution, but it is not very general. The last change we make improves concurrency and efficiency: specifically, we add more buffer slots, so that the producer can produce several values before sleeping, and likewise the consumer can consume several values before sleeping.

With a single producer and consumer, this approach is more efficient because it reduces context switches; with multiple producers or consumers, it even allows production and consumption to proceed concurrently. Compared with the previous scheme, the changes are small.

The first change is to the buffer structure itself and to the corresponding put() and get() functions:

int buffer[MAX];
int fill  = 0;
int use   = 0;
int count = 0;

void put(int value) {
    buffer[fill] = value;
    fill = (fill + 1) % MAX;
    count++;
}

int get() {
    int tmp = buffer[use];
    use = (use + 1) % MAX;
    count--;
    return tmp;
}

The final code logic is shown below. With that, we have solved the producer/consumer problem.

cond_t empty, fill;
mutex_t mutex;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);            // p1
        while (count == MAX)                   // p2
            Pthread_cond_wait(&empty, &mutex); // p3
        put(i);                                // p4
        Pthread_cond_signal(&fill);            // p5
        Pthread_mutex_unlock(&mutex);          // p6
    }
}

void *consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        Pthread_mutex_lock(&mutex);            // c1
        while (count == 0)                     // c2
            Pthread_cond_wait(&fill, &mutex);  // c3
        int tmp = get();                       // c4
        Pthread_cond_signal(&empty);           // c5
        Pthread_mutex_unlock(&mutex);          // c6
        printf("%d\n", tmp);
    }
}

Covering conditions

Let's look at one more example of condition variables in use. This code fragment is from a simple multi-threaded memory allocation library:

// how many bytes of the heap are free?
int bytesLeft = MAX_HEAP_SIZE;

// need lock and condition too
cond_t c;
mutex_t m;

void *allocate(int size) {
    Pthread_mutex_lock(&m);
    while (bytesLeft < size)
        Pthread_cond_wait(&c, &m);
    void *ptr = ...; // get mem from heap
    bytesLeft -= size;
    Pthread_mutex_unlock(&m);
    return ptr;
}

void free(void *ptr, int size) {
    Pthread_mutex_lock(&m);
    bytesLeft += size;
    Pthread_cond_signal(&c); // whom to signal??
    Pthread_mutex_unlock(&m);
}

As the code shows, a thread that calls into the allocation code may have to wait when there is not enough memory. Correspondingly, when a thread frees memory, it signals that more memory has become free. But this code has a problem: which of the waiting threads (there may be several) should be woken?

The solution is straightforward: use pthread_cond_broadcast() instead of pthread_cond_signal() in the code above, waking all waiting threads. This guarantees that every thread that should wake up does. The downside, of course, is the possible performance impact: we may needlessly wake many threads that should not (yet) be awake; those threads will wake up, re-check their condition, and immediately go back to sleep.
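As a minimal sketch, the only change needed in the allocator above is in free(), replacing the signal with a broadcast:

void free(void *ptr, int size) {
    Pthread_mutex_lock(&m);
    bytesLeft += size;
    Pthread_cond_broadcast(&c); // wake ALL threads waiting in allocate()
    Pthread_mutex_unlock(&m);
}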

This kind of condition is called a covering condition, because it (conservatively) covers all the cases in which a thread needs to wake up. Generally speaking, if you find that your program only works once you change a signal to a broadcast, you probably have a bug; but in some scenarios, such as the memory allocator above, broadcast may be the most straightforward and effective solution.

Semaphores

Semaphores were invented by Dijkstra and his colleagues as a single primitive for all things related to synchronization: a semaphore can be used both as a lock and as a condition variable.

Definition

A semaphore is an object with an integer value that we manipulate with two routines; in the POSIX standard, these are sem_wait() and sem_post(). Because the initial value of the semaphore determines its behavior, we must first initialize it before calling any other routine to interact with it:

#include <semaphore.h>
sem_t s;
sem_init(&s, 0, 1);

This declares a semaphore s and initializes its value to 1 through the third argument. The second argument of sem_init() is 0 in all our examples, meaning the semaphore is shared between the threads of a single process. After the semaphore is initialized, we can call sem_wait() or sem_post() to interact with it.

sem_wait() atomically decrements the value of the semaphore by one and returns right away if the value was one or higher; otherwise, the calling thread is put on a queue associated with the semaphore and sleeps until it is woken. sem_post() atomically increments the value of the semaphore; it does not wait for any condition, and if there are threads waiting, it wakes one of them. When the value of the semaphore is negative, its magnitude equals the number of waiting threads.
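Conceptually, the two routines behave as sketched below (pseudocode only; a real implementation performs these steps atomically and maintains the waiting queue itself):

int sem_wait(sem_t *s) {
    // decrement the value of semaphore s by one
    // wait (sleep) if the value of s is now negative
}

int sem_post(sem_t *s) {
    // increment the value of semaphore s by one
    // if one or more threads are waiting, wake one
}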

Binary semaphore (lock)

The first use of semaphores is one we are already familiar with: using a semaphore as a lock. In the code snippet below, we simply surround the critical section with a sem_wait()/sem_post() pair. For this to work correctly, the initial value X of the semaphore m is critical. What should X be?

sem_t m;
sem_init(&m, 0, X); // initialize semaphore to X; what should X be?

sem_wait(&m);
// critical section here
sem_post(&m);

Recalling the definition of the sem_wait() and sem_post() functions, we find that the initial value should be 1.

Assume a scenario with two threads. The first thread (thread 1) calls sem_wait(), which decrements the value of the semaphore to 0; because the value was 1 when it called, the function returns immediately and thread 1 enters the critical section. If no other thread tries to acquire the lock in the meantime, its later call to sem_post() simply restores the semaphore to 1 (and wakes no one, because no thread is waiting).

If, while thread 1 holds the lock, another thread (thread 2) calls sem_wait() to try to enter the critical section, it decrements the semaphore to −1 and waits. When thread 1 runs again and finally calls sem_post(), the semaphore rises to 0 and the waiting thread (thread 2) is woken and can acquire the lock. When thread 2 later finishes, it calls sem_post() in turn, restoring the semaphore to 1.

Because a lock has only two states (held and not held), this usage is sometimes called a binary semaphore.

Semaphore used as a condition variable

Here is a simple example. Suppose one thread creates another thread and waits for it to end. What should be the initial value X of the semaphore?

sem_t s;

void *child(void *arg) {
    printf("child\n");
    sem_post(&s); // signal here: child is done
    return NULL;
}

int main(int argc, char *argv[]) {
    sem_init(&s, 0, X); // what should X be?
    printf("parent: begin\n");
    pthread_t c;
    Pthread_create(&c, NULL, child, NULL);
    sem_wait(&s); // wait here for child
    printf("parent: end\n");
    return 0;
}

There are two cases to consider. In the first, the parent creates the child but the child has not yet run, so the parent calls sem_wait() before the child calls sem_post(). We want the parent to wait for the child to run, and the only way for that to happen is if the value of the semaphore is not greater than 0; therefore the initial value X should be 0. When the parent runs, it decrements the semaphore to −1 and sleeps; when the child runs, it calls sem_post(), raising the semaphore to 0 and waking the parent, which then returns from sem_wait() and the program finishes.

The second case is that the child finishes before the parent has a chance to call sem_wait(). Here the child calls sem_post() first, raising the semaphore from 0 to 1. When the parent later runs and calls sem_wait(), it finds the value is 1, decrements it to 0, and returns from sem_wait() without waiting, which is again the desired effect.

The Producer/Consumer Problem

Here we discuss how to use semaphores to solve the producer/consumer (bounded buffer) problem described above. The put() and get() helper functions are as follows:

int buffer[MAX];
int fill = 0;
int use  = 0;

void put(int value) {
    buffer[fill] = value;    // line f1
    fill = (fill + 1) % MAX; // line f2
}

int get() {
    int tmp = buffer[use];   // line g1
    use = (use + 1) % MAX;   // line g2
    return tmp;
}

A first attempt

We use two semaphores, empty and full, to indicate when the buffer is empty or full. Below is our first attempt at solving the producer/consumer problem.

sem_t empty;
sem_t full;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        sem_wait(&empty); // line P1
        put(i);           // line P2
        sem_post(&full);  // line P3
    }
}

void *consumer(void *arg) {
    int i, tmp = 0;
    while (tmp != -1) {
        sem_wait(&full);  // line C1
        tmp = get();      // line C2
        sem_post(&empty); // line C3
        printf("%d\n", tmp);
    }
}

int main(int argc, char *argv[]) {
    // ...
    sem_init(&empty, 0, MAX); // MAX buffers are empty to begin with...
    sem_init(&full, 0, 0);    // ... and 0 are full
    // ...
}

Let us first assume MAX=1 and check that the program works. Suppose there are two threads, one producer and one consumer, running on a single CPU. The consumer runs first: it reaches line C1 and calls sem_wait(&full). Because full was initialized to 0, the call decrements full to −1, and the consumer sleeps, waiting for another thread to call sem_post(&full). That is what we want.

Now suppose the producer runs. It reaches line P1 and calls sem_wait(&empty). Because empty was initialized to MAX (here, 1), the producer continues: empty is decremented to 0, the producer puts data into the buffer, and at line P3 it calls sem_post(&full), raising full from −1 to 0 and waking the consumer.

From here, two things can happen. If the producer keeps running and loops back to line P1, it blocks, because empty is now 0. If instead the producer is interrupted and the consumer runs, it returns from its earlier call to sem_wait(&full), finds that the buffer is indeed full, and consumes the value. Either case is the expected behavior.

You can continue this kind of reasoning and convince yourself that, with MAX=1, the code works correctly even with multiple producers and consumers.

Now assume MAX is greater than 1 and that there are multiple producers and multiple consumers. We now have a problem: a race condition. Suppose two producers (Pa and Pb) call put() at almost the same time. Pa runs first and writes its value at line f1 (with fill=0). Before Pa can update the fill counter to 1, it is interrupted and Pb runs, also writing its value to slot 0 of the buffer at line f1. The data there is overwritten, which means the producer's data is lost.

Adding mutual exclusion

As we can see, putting an element into the buffer and incrementing the buffer index form a critical section and must be carefully protected. So we use a binary semaphore as a lock for mutual exclusion. The corresponding code is below.

sem_t empty;
sem_t full;
sem_t mutex;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        sem_wait(&mutex); // line p0 (NEW LINE)
        sem_wait(&empty); // line p1
        put(i);           // line p2
        sem_post(&full);  // line p3
        sem_post(&mutex); // line p4 (NEW LINE)
    }
}

void *consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        sem_wait(&mutex); // line c0 (NEW LINE)
        sem_wait(&full);  // line c1
        int tmp = get();  // line c2
        sem_post(&empty); // line c3
        sem_post(&mutex); // line c4 (NEW LINE)
        printf("%d\n", tmp);
    }
}

int main(int argc, char *argv[]) {
    // ...
    sem_init(&empty, 0, MAX); // MAX buffers are empty to begin with...
    sem_init(&full, 0, 0);    // ... and 0 are full
    sem_init(&mutex, 0, 1);   // mutex=1 because it is a lock (NEW LINE)
    // ...
}

We have now added a lock around the entire put()/get() portion, as indicated by the lines marked NEW LINE in the comments. That seems like the right idea, but there is still a problem: deadlock.

Suppose there are two threads, one producer and one consumer. The consumer runs first: it acquires the mutex (c0) and then calls sem_wait() on the full semaphore (c1); because there is no data yet, it blocks and yields the CPU. The problem is that the consumer still holds the mutex. The producer then runs; it first calls sem_wait() on the binary mutex semaphore (p0), but the lock is already held by the consumer, so the producer is stuck as well.

There is a cycle of waiting here: the consumer holds the mutex and waits on the full semaphore; the producer could post on full, but it is waiting for the mutex. The producer and the consumer each wait for the other, a classic deadlock.

The final working solution

To solve this problem, we simply reduce the scope of the lock. The final, working solution is shown below: we move the acquire and release of the mutex so that they surround only the critical section, and leave the waits and posts on full and empty outside the lock. The result is a simple and effective bounded buffer, a pattern used frequently in multithreaded programs.

sem_t empty;
sem_t full;
sem_t mutex;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        sem_wait(&empty); // line p1
        sem_wait(&mutex); // line p1.5 (MOVED MUTEX HERE...)
        put(i);           // line p2
        sem_post(&mutex); // line p2.5 (... AND HERE)
        sem_post(&full);  // line p3
    }
}

void *consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        sem_wait(&full);  // line c1
        sem_wait(&mutex); // line c1.5 (MOVED MUTEX HERE...)
        int tmp = get();  // line c2
        sem_post(&mutex); // line c2.5 (... AND HERE)
        sem_post(&empty); // line c3
        printf("%d\n", tmp);
    }
}

int main(int argc, char *argv[]) {
    // ...
    sem_init(&empty, 0, MAX); // MAX buffers are empty to begin with...
    sem_init(&full, 0, 0);    // ... and 0 are full
    sem_init(&mutex, 0, 1);   // mutex=1 because it is a lock
    // ...
}

Reader-Writer Lock

Another classic problem stems from the desire for more flexible locking primitives that recognize that different data-structure accesses may require different kinds of locking. For example, a concurrent linked list has many insert and lookup operations; an insert changes the state of the list, while a lookup only reads it. As long as no insert is in progress, we can safely allow many lookups to proceed concurrently. A reader-writer lock is exactly the primitive for this. The code for such a lock is shown below.

typedef struct _rwlock_t {
    sem_t lock;      // binary semaphore (basic lock)
    sem_t writelock; // used to allow ONE writer or MANY readers
    int   readers;   // count of readers reading in critical section
} rwlock_t;

void rwlock_init(rwlock_t *rw) {
    rw->readers = 0;
    sem_init(&rw->lock, 0, 1);
    sem_init(&rw->writelock, 0, 1);
}

void rwlock_acquire_readlock(rwlock_t *rw) {
    sem_wait(&rw->lock);
    rw->readers++;
    if (rw->readers == 1)
        sem_wait(&rw->writelock); // first reader acquires writelock
    sem_post(&rw->lock);
}

void rwlock_release_readlock(rwlock_t *rw) {
    sem_wait(&rw->lock);
    rw->readers--;
    if (rw->readers == 0)
        sem_post(&rw->writelock); // last reader releases writelock
    sem_post(&rw->lock);
}

void rwlock_acquire_writelock(rwlock_t *rw) {
    sem_wait(&rw->writelock);
}

void rwlock_release_writelock(rwlock_t *rw) {
    sem_post(&rw->writelock);
}

A thread that wants to update the data structure calls rwlock_acquire_writelock() to acquire the write lock and rwlock_release_writelock() to release it. Internally, the writelock semaphore ensures that only one writer can hold the lock and enter the critical section to update the data structure.

When acquiring a read lock, the reader first acquires lock and then increments the readers variable to track how many readers are currently inside the data structure. The important step is that the first reader to acquire a read lock also acquires the write lock, by calling sem_wait() on the writelock semaphore, before releasing lock with sem_post().

Once one reader has acquired a read lock, other readers can acquire it too; a thread that wants the write lock, however, must wait until all readers have finished. The last reader to exit calls sem_post() on the writelock semaphore, allowing a waiting writer to acquire the lock.
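For illustration only, here is a hypothetical usage sketch on a concurrent list; list_t, list_lookup(), and list_insert() are assumed helpers, not part of the code above:

rwlock_t rw; // assume rwlock_init(&rw) has already been called

int lookup(list_t *list, int key) {
    rwlock_acquire_readlock(&rw);  // many lookups may run concurrently
    int found = list_lookup(list, key);
    rwlock_release_readlock(&rw);
    return found;
}

void insert(list_t *list, int key) {
    rwlock_acquire_writelock(&rw); // one writer at a time, and no readers
    list_insert(list, key);
    rwlock_release_writelock(&rw);
}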

This scheme works, but it has some drawbacks, especially regarding fairness: readers can easily starve writers. More sophisticated solutions exist, for example preventing new readers from entering once a writer is waiting. Finally, reader-writer locks add extra locking overhead, so they often have no performance advantage over simpler, faster locks.

How to implement semaphores

Finally, let's use the lower-level synchronization primitives we already have, locks and condition variables, to implement our own semaphore, named a Zemaphore.

typedef struct _Zem_t {
    int value;
    pthread_cond_t cond;
    pthread_mutex_t lock;
} Zem_t;

// only one thread can call this
void Zem_init(Zem_t *s, int value) {
    s->value = value;
    Cond_init(&s->cond);
    Mutex_init(&s->lock);
}

void Zem_wait(Zem_t *s) {
    Mutex_lock(&s->lock);
    while (s->value <= 0)
        Cond_wait(&s->cond, &s->lock);
    s->value--;
    Mutex_unlock(&s->lock);
}

void Zem_post(Zem_t *s) {
    Mutex_lock(&s->lock);
    s->value++;
    Cond_signal(&s->cond);
    Mutex_unlock(&s->lock);
}

There is one subtle difference between our Zemaphore and Dijkstra's semaphore: we do not maintain the invariant that a negative value reflects the number of waiting threads; in fact, the value will never go below zero. This behavior is easier to implement and matches existing Linux implementations.
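As a quick usage sketch, a Zemaphore initialized to 1 behaves like the binary semaphore (lock) shown earlier:

Zem_t z;
Zem_init(&z, 1);

Zem_wait(&z);
// critical section here
Zem_post(&z);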

