Completely Fair Scheduler的实现

原文地址:Completely Fair Scheduler的实现


基于XINU操作系统,实现一个Completely Fair Scheduler(CFS),并且进行性能分析。



The objectives of this lab are to instrument XINU to monitor system performance and implement a simplified version of Linux CFS in XINU.


Chapter 5 from the XINU textbook.

Instrumenting kernels: monitoring system uptime, process lifetime and CPU usage

Fine resolution system uptime

XINU measures in unit of second the time elapsed since a backend was bootloaded (i.e., system uptime). It does so by updating the global variable clktime in clkhandler(), keeping in mind that clock interrupt handling in XINU's lower half runs every 1 millisecond. Declare a new global variable, uint32 clktimefine, in the same file where clktime is declared and initialize it to 0. Update clktimefine in clkhandler() so that it maintains system uptime in unit of millisecond.

Process lifetime

Add a new process table field, uint32 prbdate, that specifies the time when a process is created (i.e., "birthday") using clktimefine in 3.1. Modify create() accordingly. Create a new system call, syscall procage(pid32 pid), which returns the age of a process given by argument pid in unit of millisecond. procage() returns SYSERR if the app programmer is using it incorrectly. As in Problem 4 of lab2, procage() should follow the default format of XINU system calls. Place the code in procage.c under system/. Note that procage() whose return value is of type syscall will not return the correct age of a process if it is sufficiently old since prbdate is of type uint32. See the discussion in 3.3. For our testing purposes, this won't matter since test runs are limited to around 30 seconds.

Process CPU usage

For a number of reasons including scheduling and accounting, it is important for a kernel to monitor how much CPU time a process has consumed during its lifetime. Add a new process table field, uint32 prcputime, initialized to 0 when a process is created that tracks its CPU usage. Create a new system call, syscall proccputime(pid32 pid), following the default format of XINU system calls, that returns CPU usage of a process given by pid and SYSERR when there is an error. Note that the return type of proccputime() is declared as syscall (i.e., int32), hence cannot return the full range of prcputime which is of type uint32 (i.e., unsigned int32). This is the same situation for system calls in Linux/UNIX and Windows where there is no uniform return type. For example, for some system calls such as getpid(), there is no need to consider returning an error since it can be designed to never fail. For other system calls where they can fail (negative return value) and the range of return value is insufficient (e.g., unsigned int), pointers must be used as arguments. Put proccputime.c in system/.

There are two issues to consider when implementing CPU usage tracking in prcputime and querying CPU usage through proccputime(). First, when CPU usage of a process that is not current (i.e., is not running on the CPU) is queried either by calling proccputtime() (user app) or direct inspection of prcputime in the process table (kernel code), the value in prcputime will specify accurate CPU usage in msec. However, if CPU usage is queried of the current process, the value returned should be prcputime + (QUANTUM - preempt). That is, since XINU sets the global time slice variable preempt to QUANTUM before a process is allocated CPU cycles, QUANTUM - preempt specifies CPU time consumed by the current process since it has been context-switched in. There is no need to add overhead to clkhandler() which runs every 1 msec by updating prcputime of the current process.

Second, in our design where unnecessary operations in the lower half of XINU are avoided, prcputime does need to be updated when the current process is context-switched out at which time prcputime must hold the accurate CPU usage value. Based on the control flow of the XINU kernel during scheduling, make kernel modifications so that the aforementioned update is performed. Explain in Lab3Answers.pdf where you chose to make the modification and why.

Dynamic priority scheduling

Objective of fair scheduling

The default XINU scheduler is a static (or fixed) priority scheduler that uses priorities specified in the create() system call and never changes them. XINU's process scheduler, resched(), when invoked, picks a highest priority ready process to run next. If there is a tie round-robin scheduling is implemented among the group of highest priority processes. In computing systems where multiple processes share CPU resources, "fair" allocation of CPU cycles is a widely accepted goal. Giving an app programmer direct control over process priority through create() cannot be allowed. The underlying substrate of a scheduler -- picking a highest priority ready process and breaking ties in a round-robin matter -- remains unchanged. What does change is that a process's priority becomes dynamic, i.e., changes over time, so as to facilitate equitable sharing of CPU cycles.

Linux's scheduler was completely revamped several years back to implement a form of "fair" scheduling that was more characteristic of packet scheduling at routers in computer networks and its underlying theory. Instead of the previous approach, still followed by UNIX and Windows operating systems, Linux adopted process CPU usage as a prime metric for deciding which ready process should run next. The basic idea is to allocate CPU cycles to a ready process that has received the least CPU usage -- a form of load balancing -- so that CPU usage across processes over time evens out. To do so, process CPU usage is monitored (Problem 3.3) and when a scheduler is invoked by a kernel's upper (system call) or lower half (interrupt), a process of least CPU usage is selected to become current. Equivalently, a process's CPU usage is interpreted as its priority with smaller numbers (i.e., smaller CPU usage) meaning higher priority.

XINU compatibility

To implement fair scheduling in XINU where CPU usage is viewed as priority, its ready list -- a priority queue -- would have to be modified so that a process of least CPU usage is at the front of the list. Alternatively, using real numbers (i.e., floating-point representation) we could set priority as 1 / CPU usage which would preserve XINU's notion of priority: larger value means higher priority. For our purposes where we aim to, one, reduce volume of coding to focus on quality of coding, two, not use floating-point representations when possible (recall our discussion of router operating systems), and three, be backward-compatible with legacy XINU to the extent feasible, we will implement a third approach that requires least change. We will define the priority of a process as 30000 - CPU usage. 30000 msec, i.e., 30 seconds will suffice as an upper bound on tests and performance evaluation in lab assignments. As such, a process will not incur more than 30 seconds of CPU usage during testing. Thus a process with a smaller CPU usage than another will have a correspondingly higher priority value which allows us to use XINU's ready list as is.

Another simplification is not to use a data structure such as a heap with logarithmic insert/extract time complexity. Instead we will continue to use XINU's priority queue which has linear insert overhead. The reason is three-fold: one, you will know from data structures how to code heaps and balanced search trees, two, we will not be testing with hundreds and thousands of processes where difference in overhead will surface as a dominant performance factor, and three, kernel support for dynamic memory allocation which is needed for managing heaps will be discussed later in the course under memory management.

Linux CFS scheduler

Linux's current scheduler, CFS (completely fair scheduler), was introduced in 2007 and implements a specific form of fair scheduling. Fair scheduling based on process CPU usage as outlined above cannot be used in real-world operating environments where processes come and go. For example, suppose a process is newly created (hence CPU usage is 0), however, processes that were created an hour ago are still running which have tallied up significant CPU usage. For an existing process with 15 minutes of CPU usage, the newly created process would need at least 15 minutes of CPU time to catch up, which means that during this time the older process (maybe a web browser or mail client) would be completely starved of CPU cycles. This, of course, is unacceptable.

Another important consideration in real-world operating environments is the need to provide interactive applications with fast response times. For example, applications with user interfaces (UIs) that perform frequent I/O operations (e.g., shells, chat apps, some gaming apps) may not consume significant CPU time but require fast response times to affect desirable user experience. Thus an interactive process that is blocking on a message from a server/peer to arrive should be assigned a high priority when it does become ready so that it is allocated CPU cycles in a timely manner.

Linux CFS addresses the first concern by (i) setting the initial CPU usage of a newly created process not to 0 but the maximum CPU usage across all ready processes. That is, a newly created process is assigned the lowest priority. The second concern is addressed by (ii) assigning an interactive process that becomes ready a new CPU usage value that is the minimum CPU usage across all ready processes. That is, an interactive process that was blocked but just became unblocked (i.e., ready) begets the highest priority. To do so, the kernel needs to distinguish I/O-bound (includes interactive) from CPU-bound processes. Linux uses the criterion that a process that blocks, i.e., makes a blocking system call, is considered I/O-bound. Linux (and similarly UNIX and Windows) does not consider a process's entire history to assess whether it is I/O- or CPU-bound. A process's last action -- did it make a blocking system call or was it involuntarily switched out by the kernel because it depleted its time slice -- is the sole criterion. For XINU, where we have not yet discussed blocking system calls stemming from inter-process communication and device I/O, we will consider sleepms() as the sole system call through which test apps block, i.e., voluntarily relinquish the CPU despite having time left in preempt.

When incorporating mechanisms (i) and (ii) into XINU, we cannot use the process CPU usage time field, prcputime, introduced in Problem 3.3 since the two mechanism change prcputime. That is, prcputime ceases to be a variable that accurately tracks CPU usage of a process. We will introduce a second process table field, uint32 prvcputime, which undergoes the same updates as prcputime and, in addition, is subject to updates by (i) and (ii). We call this cousin of prcputime, virtual CPU usage, hence the letter "v" in its name. XINU's scheduler uses prvcputime when making scheduling decisions, i.e., priority becomes 30000 - prvcputime, and leaves prcputime as a pure accounting variable.

XINU fair scheduler implementation

Following the features described in 4.1, 4.2 and 4.3 which captures the gist of Linux CFS, implement fair scheduling in XINU that dynamically adjusts process priority over time so as to provide equitable sharing of CPU cycles while promoting responsiveness of I/O-bound applications. Set the time slice parameter QUANTUM to 25 msec. Note that in fair scheduling a fixed time slice is not needed. For example, if the highest priority process has CPU usage 100 msec and the second highest process has CPU usage 180 msec, then for scheduling purposes a time slice of 80 msec will suffice since that amount of time is required for the highest priority process to catch to the second highest priority process. There is no need to call the scheduler earlier than that from the clock interrupt handler (unless an even higher priority process wakes up) which adds unnecessary overhead. For simplicity we will fix time slice at 25 msec.

When a new process is spawned using create(), ignore the third argument specifying the process priority and follow mechanism (i) to set the process's initial priority. XINU's null process (PID 0) must be treated separately so that its priority always remains 0 and strictly less than the priority of all other processes in the system. Thus the null process does not undergo dynamic priority changes. Explain in Lab3Answers.pdf how you go about assuring that. When coding your XINU fair scheduler, make sure to clearly specify where in your code you have made changes (note your initial) along with brief comments on what and why. Code that is inadequately annotated will result in point deductions.

Evaluation of XINU fair scheduler

Coding and testing individual components in Problem 4 needs to be augmented by system-wide testing and performance evaluation to gauge correctness of your implementation.

Test applications

Code two test applications, void appcpu(void), that serves as a CPU-bound app and, void appio(void), that acts as an I/O-bound app. The code of appcpu() (place in system/appcpu.c) works as follows:

int x, y;
x = 0;
y = clktimefine;
while(clktimefine - y < 29000)
kprintf("cpu-bound: %d %d %d\n", currpid, x, proctab[currpid].prcputime);

Since clktimefine from Problem 3.1 is updated by XINU's clock interrupt handler, appcpu() keeps performing ALU operations until 29 seconds have elapsed at which time it prints its PID, counter x, CPU usage, and terminates.

Note: From a C programming perspective, note that y is assigned clktimefine which is a global variable that is asynchronously updated by the clock interrupt handler every 1 msec (unless interrupts are disabled). gcc, by default, performs a number of optimizations so that the machine code generated is efficient. Sometimes gcc tries too hard which can lead to code that is incorrect. One example is a C variable that is asynchronously updated by interrupt handlers. The qualifier volatile is used to inform the C compiler not to engage in optimizations that may result in unintended behavior. Carefully consider if this is required in the test applications or any other code in lab3 where clktimefine comes into play.

The I/O-bound application, appio() (in system/appio.c), works as follows:

int x, y;
x = 0;
y = clktimefine;
while(clktimefine - y < 29000) {
kprintf("io-bound: %d %d %d\n", currpid, x, proctab[currpid].prcputime);

We expect the value of x and CPU usage of appio() to be significantly smaller than appcpu().

Test scenario A

Create 5 CPU-bound processes from main() back-to-back. If your scheduler is implemented correctly, we would expect to see the 5 processes printing similar CPU usage and x values. Repeat the benchmark tests two more times and inspect your results. Discuss your findings in Lab3Answers.pdf.

Test scenario B

Create 5 I/O-bound processes from main() and perform the same benchmark tests as in Problem 5.2. Discuss your findings in Lab3Answers.pdf.

Test scenario C

Create 5 CPU-bound processes and 5 I/O-bound processes. We would expect the 5 CPU-bound processes to output similar x values and CPU usage with respect to each other, and the same goes for the 5 I/O-bound processes. Between the two groups, however, we would expect CPU-bound processes to receive significantly more CPU time than I/O-bound processes. Discuss your findings in Lab3Answers.pdf.

Test scenario D

A variant of test scenario A, create 5 CPU-bound processes in sequence with 5 second delays (by calling sleepms()) added between successive create() (nested with resume()) system calls. Estimate how much CPU time the first process should receive, and the same goes for the other 4 processes. Compare your back-of-the-envelop calculations with the actual performance results from benchmarking. Discuss your findings in Lab3Answers.pdf.

Bonus problem

Mechanism (ii) is aimed at enhancing responsiveness of I/O-bound applications. The test cases B and C do not quantitatively gauge if (ii) has indeed improved responsiveness. What might be a way to quantify the benefit of (ii)? Describe your solution in Lab3Answers.pdf, implement it, and discuss your results.

Note: The bonus problem provides an opportunity to earn extra credits that count toward the lab component of the course. It is purely optional.

Turn-in instructions

  1. Format for submitting written lab answers and kprintf() added for testing and debugging purposes in kernel code:

    • Provide your answers to the questions below in Lab3Answers.pdf and place the file in system/. You may use any document editing software but your final output must be exported and submitted as a pdf file.
    • For problems where you are asked to print values using kprintf(), use conditional compilation (C preprocessor directives #ifdef and #define) with macros, please follow the instructions specified in the TA Notes.
  2. Before submitting your work, make sure to double-check the TA Notes to ensure that additional requirements and instructions have been followed.
  3. Electronic turn-in instructions:

    • i) Go to the xinu-spring2019/compile directory and run "make clean".
    • ii) Go to the directory of which your xinu-spring2019 directory is a subdirectory. (Note: please do not rename xinu-spring2019 or any of its subdirectories.)

      • e.g., if /homes/bob/xinu-spring2019 is your directory structure, go to /homes/bob
    • iii) Type the following command

You can check/list the submitted files using

turnin -c cs354le1 -p lab3 -v

Please make sure to disable all debugging output before submitting your code.


阅读 2k

分享Computer Science技术

Microsoft, ACMer, 现BAT全栈工程师。

751 声望
199 粉丝
0 条评论

Microsoft, ACMer, 现BAT全栈工程师。

751 声望
199 粉丝