|
optimal peripheral access using pipe-based double-buffering
by Diomidis Spinellis
Diomidis Spinellis holds a Ph.D. in computer science. Currently
he is leading software development at SENA S.A. His research interests
include information security, software engineering, and programming
languages.
The Problem The problem I wanted to solve was to minimize the actual wall-clock time taken by our backup process, which currently needs eight hours for a full backup. I thought I could achieve this by making our drive stream, and by increasing the parallelism of the backup process. Our LAN consists of Windows 95/98/NT and UNIX machines. We perform our backups on a machine running Linux (2.0.36). At backup time, the Linux machine mounts the disks of all Windows hosts using the SMB filesystem and runs a tar(1) based backup process. The process is responsible for:
The heart of the process is essentially the following command: tar czf /dev/tape -b 1000 /smb The command creates a tar file in /dev/tape from the contents of smb using gzip compression and a blocking factor of 1000. The blocking factor is supposed to make tar write to the tape device in chunks of 1000 blocks (i.e., of 0.5MB). Accessing the tape in large blocks is mandatory for some devices, and according to some man pages for tar can be more efficient than access in smaller chunks. Since tape drives typically read or write data at a fixed rate, if that rate cannot be maintained on the rest of the system the drive has to read or write some data and then stop and backtrack to a point where the next block can be written. When this phenomenon occurs, the performance of the drive drastically deteriorates, and the drive is no longer said to be streaming. Writing in large blocks can help the drive to stream. Our drive consistently refused to stream. Increasing the size of the blocking factor had absolutely no effect. Going through the source code of tar we are using (GNU tar 1.11.8 with a small local modification to handle the lack of stable inode numbers in SMB filesystems), I noticed that when compression is specified, tar ignores the blocking factor. That explained the lack of streaming and prompted me to examine the matter further. Measurements and the Solution A golden rule of performance optimization is to profile the process to be optimized. I therefore first measured the time it takes for tar and gzip to process a medium-sized directory:
% time sh -c 'tar cf - /usr/src/linux | gzip -c >/dev/null'
This measurement establishes our performance bottom line. No matter how much we improve tape access, we will never be able to make the process faster than seven minutes. The next step was to measure the overhead of writing to tape:
% time sh -c 'tar cf - /usr/src/linux | gzip -c >/dev/tape'
So we now know that writing to the tape increases the execution time by about three minutes, or 47 percent. It is this increase that we want to fight. Since our full backup takes about eight hours, optimizing tape access could result in significant time savings. The measurement also tells us that the CPU is idle about 30 percent of the time. Our task mix is therefore I/O bound, and we have CPU power to spare in order to optimize it. As a first try, I piped the gzip output through dd(1) specifying a large (0.5MB) output block size. This causes dd to accumulate data from its input until it has 0.5MB of data available and then write that data in one large chunk to the tape:
% time sh -c 'tar cf - /usr/src/linux | gzip -c |
This time I could hear and see the tape drive stream; a beautiful sound and sight. Embarrassingly, the execution time increased by more than one minute (11 percent), while CPU utilization decreased. Streaming the tape drive did not appear to help. An analysis of the situation was now in order. Apparently, in the system I am using, write commands to the tape device are processed synchronously; that is, the process issuing the write system call is suspended until the data is written to the device. In the first case involving the tape drive, while data was being written by gzip to the tape drive (and gzip was waiting), tar could continue to process some files writing its output to the pipe buffer connecting the two processes. In the second case where dd was added, dd took a longer time to write the 0.5MB block to the tape drive. During that time, the pipe buffer between gzip and dd got full, thereby blocking gzip. That, in turn, made the pipe buffer between tar and gzip fill up, blocking tar as well. The explanation of the timing anomaly made me realize the problem's root and think about a possible solution. In order to optimize the backup process I would need a way to prevent the pipeline stalls while data was being written to the tape drive. The addition of another dd process after gzip could provide this buffer:
% tar cf - /usr/src/linux | gzip -c | \
In the pipeline above, gzip writes data to the first dd process. As soon as this process accumulates 0.5MB, the data is written to the second dd process using a single write call. As soon as the second dd process accumulates 0.5MB (i.e., immediately after the first process writes it), it will issue a write call to send the data to the tape drive. While data is being written to the tape, the second dd process is blocked; however, the first one is still running, since it has written its block to the second process and is now again accumulating the next block to write. Therefore tar and gzip can continue gathering and compressing data at the same time that data is being written to the tape drive. Thus, the two dd processes effectively implement a double-buffering mechanism. The time measurements prove the validity of this approach:
% time sh -c 'tar cf - /usr/src/linux | gzip -c |
Our change resulted in a 28 percent decrease in the tape overhead, giving a 10 percent decrease in overall execution time. Depending on the performance differences between the CPU and the peripheral in question, the execution time can decrease even more. Conclusions
As I mentioned at the beginning, one of the responsibilities of an OS
is to optimize the utilization of a computer system by the appropriate
scheduling of tasks. In our initial case, while the system was waiting
for the tape drive to write the data, it could indeed run other
unrelated processes such as sendmail or httpd.
However, the operation of the processes in the backup pipeline could
not be parallelized beyond the limits imposed by the small buffer
provided by the implementation of pipes under UNIX. Soon after one
process in the pipeline blocked, the whole pipeline was stalled. The
addition of another dd process before the process writing to
the tape provided extra breathing space to the pipeline and increased
the possibility for parallel operation. The same approach can be used
in all cases where a pipeline consists of a mixture of I/O and
CPU-bound tasks.
|
|
Last changed: 18 Nov. 1999 mc |
|