Check out the new USENIX Web site. next up previous
Next: Performance Measurements Up: Adaptive Page Replacement to Previous: Why an adaptive replacement

The Design and Implementation of Thrashing Protection Patch

The main idea of the patch is quite simple and intuitive. Once multiple ``CPU cycle eager'' processes but with high page fault rates and low CPU utilization coexist, the patch will make a temporal tuning on the page replacement to help one of the processes to establish its working set and let it consume CPU cycles. Otherwise, the patch is almost dormant and adds little intervention to the original system.

Our patch implementation on Kernel 2.2 consists of two kernel utilities: detection and protection routines. The detection routine is used to dynamically monitor the page fault rate of each process and the CPU utilization of the system. The protection routine will be awakened to adjust the page replacement when the CPU utilization is lower than a predetermined threshold, and when the page fault rates of more than one interacting process exceed a threshold. It then grants a privilege to an identified process which will only contribute a limited number of NRU pages. The detection routine also monitors if the identified process has lowered its page fault rate to a certain degree. If so, its privilege will be disabled. This action will retain the memory utilization by treating each process equally.

There are four predetermined parameters used in the patch:

  1. CPU_Low: the lowest CPU utilization the system can tolerate.
  2. CPU_High: the targeted CPU utilization for the patch to achieve.
  3. PF_Low: the targeted page fault rate 2 of the identified process for the patch to achieve.
  4. PF_High: the page fault rate threshold for a process to potentially cause thrashing.
We add one global linked list, high_PF_proc, in the kernel to record interacting processes with high page fault rates. A process enters the list when its page fault rate exceeds ``PF_High'', and exits from the list when its page fault rate lowers below ``PF_Low''.

The kernel memory management has the following three states with dynamic transitions:

  1. normal state: In this state, no monitoring activities are conducted. The system deals with page faults exactly as the original Linux kernel does. The system keeps track of the number of page faults for each process and places the processes with page fault rates higher than ``PF_High'' into ``high_PF_proc".

  2. monitoring state: In this state, the detection routine is awakened to start monitoring the CPU utilization and the page fault rates of processes in the linked list. If the protection condition is satisfied, the detection routine will select a process in ``high_PF_proc" for protection and go to the protection state. The system returns to the normal state when there is no more than one processes in ``high_PF_proc".

  3. protection state: The protection routine will mark the selected process and let its ``swap_cnt'' reset to 0 no matter whether a replaced page has been successfully found, which let the process contribute at most one page continuously and help it quickly establish its working set. In the protection state, the detection routine keeps monitoring the CPU utilization and the page fault rate of each process in the list. The detection routine is deactivated and the protection state transfers to the monitoring state as soon as the protected process obtains low page fault rate, and/or the CPU utilization has been sufficiently improved.

Figure 4 describes the dynamic transitions among the three states, which gives a complete description of the patch. When the system is normal (no page faults occur), detection and protection routines are not involved. The patch only adds limited operations for each page fault and checks several system parameters with the interval of one second. So, overhead involved in detection and protection is trivial compared with the CPU overhead to deal with page faults.

Figure 4: Dynamic transitions among normal, monitoring, and protection states in the improved kernel system.
\begin{figure}\centering \centerline{\psfig{figure=state.eps,width=3.5in,height=1.5in}} \end{figure}


next up previous
Next: Performance Measurements Up: Adaptive Page Replacement to Previous: Why an adaptive replacement
Song Jiang 2001-09-09