2001 FREENIX Track Technical Program - Abstract
User-level Checkpointing for LinuxThreads Programs
William R. Dieter and James E. Lumpp, Jr., University of Kentucky
Abstract
Running multiple threads in a single, shared address
space is a simple model for writing parallel programs for symmetric
multiprocessor (SMP) machines and for overlapping I/O and computation
in programs run on either SMP or single-processor machines. The user of
a long-running program often wants the program to save its state
periodically in a checkpoint from which it can recover in case
of a failure. This paper introduces the first system to provide
checkpointing support for multithreaded programs that use LinuxThreads,
the POSIX-based threads library for Linux.
The checkpointing library is simple to use, flexible, and efficient.
Virtually all of the overhead of the checkpointing system comes from
saving the checkpoint to disk. The checkpointing library added no
measurable overhead to tested application programs when they took no
checkpoints. A checkpoint file is approximately the same size as
the checkpointed process's address space. With the current
implementation, WATER-SPATIAL from the SPLASH2 benchmark suite saved a
2.8 MB checkpoint in about 0.18 seconds to a local disk or about 21.55
seconds to an NFS-mounted disk. The overhead of saving state to disk
can be minimized through various techniques including varying the
checkpoint interval and excluding regions of the address space from
checkpoints.
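
As a rough illustration of why the checkpoint interval matters, the sketch below works through the arithmetic implied by the figures above. It is not part of the checkpointing library. The function names, the candidate intervals, and the assumption that the program pauses for the full duration of each save are all hypothetical; only the 2.8 MB size and the two save times come from the abstract. Under those assumptions the quoted figures correspond to roughly 15.6 MB/s for local disk and roughly 0.13 MB/s for NFS.

```python
# Figures quoted in the abstract for WATER-SPATIAL.
CHECKPOINT_MB = 2.8
SAVE_SECONDS = {"local disk": 0.18, "NFS": 21.55}


def effective_bandwidth(size_mb, save_seconds):
    """Checkpoint size divided by the time taken to save it (MB/s)."""
    return size_mb / save_seconds


def overhead_fraction(save_seconds, interval_seconds):
    """Fraction of wall-clock time spent saving checkpoints.

    Assumption (not from the paper): the program does no useful work
    while a checkpoint is being saved, and it takes one checkpoint
    after every `interval_seconds` of useful work.
    """
    return save_seconds / (interval_seconds + save_seconds)


if __name__ == "__main__":
    for target, seconds in SAVE_SECONDS.items():
        bandwidth = effective_bandwidth(CHECKPOINT_MB, seconds)
        print(f"{target}: {bandwidth:.2f} MB/s")
        # Illustrative intervals only; the abstract does not specify any.
        for interval in (60, 600, 3600):
            fraction = overhead_fraction(seconds, interval)
            print(f"  interval {interval:>4} s -> {fraction:.2%} overhead")
```

Under this simple model, lengthening the interval lowers the share of time spent writing checkpoints, at the cost of losing more work if a failure occurs between checkpoints.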