Dynamic memory allocation is considered to be a solved problem for most applications. The ubiquitous malloc(3) routine is a part of the C run-time library, and most programmers use it without a second thought. Much research and development efforts have been invested in optimizing malloc to fit the typical allocation pattern of applications, which is characterized by a small number of object sizes [2].
Most real (commercial) applications consider memory allocation performance early on. They often rewrite the memory allocator specifically to tune it to their application. For example, the Apache web server [8] and a large -body simulation program [1, column 6] contained specialized implementation of malloc.
We also considered malloc to be a robust building block, and we used it with Hummingbird [7], a light-weight file system for caching web proxies. To our surprise, when we run Hummingbird on several operating systems with different malloc implementations, the heap size of the process was several times larger than the total size of live memory objects. We knew that it was not a memory leak problem, since we run Hummingbird under Purify [5]. When we used other malloc implementations, most of them also caused excessive heap fragmentation (increased heap size). This is a serious problem, since it may cause heap overflows, thrashing or reduce the size of memory that can be used to store live objects. One malloc implementation even caused a heap overflow, which is unacceptable.
Hummingbird implements a memory-based cache, which stores variable-sized objects in addition to fixed-sized objects. The total size of live objects is fixed. We believe the Hummingbird's memory access pattern is actually quite common for many long-running applications, which store dynamic memory objects of variable sizes. This could be the memory access pattern of many Web related programs that maintain a cache in memory.
Many algorithms for memory allocators have been studied and documented. Most memory allocations are optimized for short-lived programs with a few fixed allocation sizes [9,2]. Recently, Larson and Krishnan [4] studied the scalability of memory allocators and the impact of memory allocation on long-running applications with fixed-size objects. Not many people have studied the effect of heap fragmentation on long-running applications with variable-sized objects.
While the excessive heap fragmentation was surprising to us, it is not a new problem. System developers have been facing the problem of malloc consuming excessive memory due to fragmentation, and have alleviated the problem by implementing their own memory management schemes. For example, the Apache web server [8] uses its own scatter-gather memory allocation scheme.
In this paper we describe the dynamic memory activity pattern of Hummingbird, and compare the operation of multiple malloc implementations given the same sequence of memory allocation and deallocation operations, which were captured from a live run of Hummingbird. We observed wide variation of heap consumption and running times between various malloc implementations. The best malloc we measured was PhK/BSD malloc version 42. It caused 30.5% heap fragmentation. The worst malloc was SunOS ``space efficient'' malloc, distributed with Sun OS version 5.8. It could not complete the run due to a heap overflow.
Since Hummingbird is not a commonly used program, we looked for a widely available tool that allocates objects of varying sizes and runs for a long time. We picked GNU Emacs version 20.7, and captured its dynamic memory activity when it was used to edit the source files in a large source hierarchy. See Section 3 for details. The memory activity of GNU Emacs is quite different than that of Hummingbird regarding object size distribution, object lifetime, and total amount of live memory, as described in Section 3. In particular, Emacs's total amount of live memory grows continuously during the run, while Hummingbird's total amount of live memory is fixed. We measured a smaller variation between the various mallocs when used with the Emacs dynamic memory activity trace relative to the Hummingbird trace. The best malloc we measured was Doug Lea's malloc version 2.6.6, which caused 2.69% fragmentation. The worst malloc was again the SunOS ``space efficient'' malloc, which caused 101.48% fragmentation. PhK/BSD malloc was a close fifth with 3.65% fragmentation.
The main contribution of this paper is in exposing an unexpected problem with malloc, which is an existing building block that has been extensively optimized. Since the internal algorithms of the various mallocs are so different, we did not attempt to analyze the root cause of this problem. We hope that this paper will help future developers recognize that some mallocs may cause excessive fragmentation, which can be alleviated by switching to a different malloc. In the long run, we hope that this paper will spur research in dynamic memory allocation in order to analyze and rectify the problem we identified.
The rest of the paper is organized as follows. Section 2 describes the Hummingbird light-weight file system and characterizes its dynamic memory activity. Section 3 describes the GNU Emacs workload and characterizes its dynamic memory activity. Section 4 explains the trace-driven program we used to compare the various malloc implementations. Section 5 contains a measurements of the various malloc implementations using the Hummingbird and Emacs memory activity traces, and Section 6 concludes.