Profiling [14] is an important step in software development. We use the term profiling in a broad sense to mean the ability to monitor and trace events that occur at run time, to track the cost of these events, and to attribute that cost to specific parts of the program. For example, a profiler may report which portion of the program consumes the most CPU time, or which portion allocates the most memory.
This paper is mainly concerned with profilers that provide information to programmers, as opposed to profilers that feed information back to the compiler or run-time system. Although the fundamental principles of profiling are the same, the two kinds of profilers have different design requirements. For example, a profiler that sends feedback to the run-time system must incur as little overhead as possible so that it does not slow down program execution. A profiler that constructs the complete call graph, on the other hand, may be permitted to slow down program execution significantly.
This paper discusses techniques for profiling support in the Java virtual machine [17]. Java applications are written in the Java programming language [10], and compiled into machine-independent binary class files, which can then be executed on any compatible implementation of the Java virtual machine. The Java virtual machine is a multi-threaded and garbage-collected execution environment that generates various events of interest for the profiler. For example:
The first contribution of this paper is to present a general-purpose, extensible, and portable Java virtual machine profiling architecture. Existing profilers typically rely on custom instrumentation in the Java virtual machine and measure limited types of resource consumption. In contrast, our framework relies on an interface that provides comprehensive support for profilers that can be built independently of the Java virtual machine. A profiler can obtain information about CPU usage hot spots, heavy memory allocation sites, unnecessary object retention, monitor contention, and thread deadlocks. Both code instrumentation and statistical sampling are supported. Adding new features typically requires introducing new event types, and does not require changes to the profiling interface itself. The profiling interface is also portable: it does not depend on the internal implementation of the Java virtual machine. For example, the heap profiling support is independent of the garbage collection implementation, and can present useful information for a wide range of garbage collection algorithms. The benefit of this approach is that tool vendors can ship profilers that work with any virtual machine that implements the interface. Equivalently, users of a Java virtual machine can easily take advantage of the profilers available from different tool vendors.
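To illustrate why new event types need not change the interface, the following is a minimal sketch in the Java programming language. It is not the profiling interface described in this paper: the names `ProfilerEvent`, `ProfilerAgent`, `SketchVm`, `AllocationCounter`, and the event type constants are hypothetical and chosen only for illustration. The sketch assumes that every event is delivered through a single notification entry point and is identified by an integer type, so a profiler simply ignores the types it does not understand.

```java
// Hypothetical sketch; none of these names come from the actual profiling interface.
final class ProfilerEvent {
    final int type;       // integer event type; adding a type leaves the interface unchanged
    final long threadId;  // thread in which the event occurred
    final long payload;   // event-specific datum, e.g. the size of an allocated object

    ProfilerEvent(int type, long threadId, long payload) {
        this.type = type;
        this.threadId = threadId;
        this.payload = payload;
    }
}

// The single entry point through which the virtual machine notifies the profiler.
interface ProfilerAgent {
    void notifyEvent(ProfilerEvent event);
}

// Stand-in for the virtual machine side of the interface.
final class SketchVm {
    static final int EVENT_METHOD_ENTRY = 1;  // assumed event type
    static final int EVENT_OBJECT_ALLOC = 2;  // assumed event type

    private ProfilerAgent agent;

    void attach(ProfilerAgent agent) {
        this.agent = agent;
    }

    void post(int type, long threadId, long payload) {
        if (agent != null) {
            agent.notifyEvent(new ProfilerEvent(type, threadId, payload));
        }
    }
}

// A profiler that only cares about allocation events and ignores all others.
final class AllocationCounter implements ProfilerAgent {
    private long bytesAllocated;

    @Override
    public void notifyEvent(ProfilerEvent event) {
        if (event.type == SketchVm.EVENT_OBJECT_ALLOC) {
            bytesAllocated += event.payload;
        }
        // Unknown or uninteresting event types fall through untouched.
    }

    long bytesAllocated() {
        return bytesAllocated;
    }
}
```

Under these assumptions, a virtual machine that later posts an event type unknown to `AllocationCounter` does not break it, which is the kind of extensibility the architecture aims for.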
The second contribution of this paper is to introduce an algorithm that obtains accurate CPU-time profiles in a multi-threaded execution environment with minimal overhead. It is a standard technique to perform statistical CPU-time profiling by periodically sampling the running program. What is less well known, however, is how to obtain accurate per-thread CPU time usage on the majority of operating systems, which do not provide access to the thread scheduler or to a high-resolution per-thread CPU timer. In these cases, it is difficult to attribute elapsed time to threads that are actually running, as opposed to threads that are blocked, for example, in an I/O operation. Our solution is to determine whether a thread has run in a sampling interval by comparing the checksum of its register set against the checksum recorded at the previous sample. To our knowledge, this is the most portable technique for obtaining thread-aware CPU-time profiles on modern operating systems.
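The register-checksum idea can be sketched as follows. This is a simulation under stated assumptions, not the implementation in the virtual machine: reading a thread's registers requires native, platform-specific support, so the sketch represents a register set as a `long[]` snapshot supplied by the caller. The class `ThreadSampler`, the particular checksum function, and the choice to count the first sample of a thread as "not run" are all assumptions made for illustration.

```java
// Hypothetical sketch; register snapshots are simulated as long[] arrays.
final class ThreadSampler {
    private final long[] lastChecksum; // checksum seen at the previous sample, per thread
    private final boolean[] seen;      // whether a thread has been sampled before
    private final long[] ticks;        // sampling intervals charged to each thread

    ThreadSampler(int threadCount) {
        lastChecksum = new long[threadCount];
        seen = new boolean[threadCount];
        ticks = new long[threadCount];
    }

    // Simple polynomial checksum over the register values (an assumed choice).
    static long checksum(long[] registers) {
        long sum = 17;
        for (long r : registers) {
            sum = 31 * sum + r;
        }
        return sum;
    }

    // Called once per sampling interval for each thread. Returns true, and charges
    // one tick, if the checksum differs from the one recorded at the previous sample.
    boolean sample(int thread, long[] registers) {
        long current = checksum(registers);
        boolean ran = seen[thread] && current != lastChecksum[thread];
        seen[thread] = true;
        lastChecksum[thread] = current;
        if (ran) {
            ticks[thread]++;
        }
        return ran;
    }

    long ticks(int thread) {
        return ticks[thread];
    }
}
```

In this sketch a blocked thread, whose registers do not change between samples, accumulates no ticks. A thread whose registers happen to return to identical values, or whose snapshots collide under the checksum, would be misclassified as not having run; the sketch makes no attempt to handle that case.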
The third contribution is to demonstrate how our approach supports interactive profiling with minimal overhead. Users can selectively enable or disable different types of profiling while the application is running. This is achieved with very low space and time overhead. Neither the virtual machine nor the profiler needs to accumulate large amounts of trace data. The Java virtual machine incurs only the overhead of a test and branch for a disabled profiling event. Most events occur in code paths that can tolerate the overhead of an added check. As a result, the Java virtual machine can be deployed with profiling support in place.
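The cost of a disabled event can be pictured with the following minimal sketch. It assumes, purely for illustration, that each event type corresponds to one bit in a mask and that event types are numbered 0 through 63; the class `EventGate` and its methods are hypothetical and do not describe how the virtual machine actually stores its enable flags. The sketch is written for a single posting thread.

```java
// Hypothetical sketch; one bit per event type, event types assumed to be 0..63.
final class EventGate {
    private volatile long enabledMask; // bit i set means event type i is enabled
    private long delivered;            // stand-in for events actually built and sent

    synchronized void enable(int type) {
        enabledMask |= 1L << type;
    }

    synchronized void disable(int type) {
        enabledMask &= ~(1L << type);
    }

    // Called at each point in the virtual machine where an event could occur.
    void post(int type) {
        if ((enabledMask & (1L << type)) == 0) {
            return; // disabled: the only cost is this test and branch
        }
        delivered++; // enabled: this is where event data would be gathered and delivered
    }

    long delivered() {
        return delivered;
    }
}
```

Because no event data is gathered on the disabled path, enabling and disabling event types at run time changes only the mask, which is consistent with leaving the profiling hooks in a deployed virtual machine.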
We have implemented all the techniques discussed in this paper in the Java Development Kit (JDK) 1.2 [15]. Numerous tool vendors have already built profiling front-ends that rely on the comprehensive profiling support built into the JDK 1.2 virtual machine.
We begin by introducing the general-purpose profiling architecture, and then discuss the underlying techniques in detail. We assume the reader is familiar with the basic concepts of the Java programming language [10] and the Java virtual machine [17].