Check out the new USENIX Web site.


Optimizations

Since Jikes RVM is written in Java, our runtime call graph construction mechanism may incur two kinds of overhead. First, adding one instruction per call can potentially consume many CPU cycles because Jikes RVM itself is compiled using the same compilers used for compiling the applications, and it also inserts many system calls into applications for runtime checks, locks and object allocations. Second, a CTB array is a normal Java array with a three-word header; thus CTB arrays can increase memory usage and create extra work for garbage collectors.


Table 1: Distribution of CTB sizes
#callers Java Libraries SpecJVM App
0 2214 69.08% 507 19.32%
1 291 78.16% 815 50.38%
2-3 172 83.53% 608 73.55%
4-7 170 88.83% 283 84.34%
8- 358   411  
TOTAL 3205 2624


Table 1 shows the distribution of the CTB sizes for the SpecJVM98 benchmarks [27] profiled in a FastAdaptiveSemiSpace boot image. The boot image contains mostly RVM classes and a few Java utility classes. We only profiled methods from Java libraries and benchmarks. A small number of methods of classes in the boot image may have CTB arrays allocated at runtime because there is no clear cut mechanism for distinguishing between Jikes RVM code and application code. The first column shows the range of the number of callers. The second and third columns list the distributions of methods belonging to Java libraries and SpecJVM application code.3 To demonstrate that most methods have few callers, we calculated the cumulative percentages of methods that have no caller, $\leq 1$, $\leq 3$ and $\leq 7$ callers in the first to fourth rows. We found that 89% of methods from (loaded classes in) Java libraries and 84% of methods from SpecJVM98 have no more than 7 callers. In these cases, it is not wise to create short CTB arrays because each array header takes 3 words. The last data row labelled ``TOTAL" gives the total number of methods of all classes and the number of methods in each of two sub-categories.

To avoid the overhead of array headers for CTBs, and to eliminate the extra instruction to load the CTB array from a TIB in the code for invokevirtual instructions, a local optimization is to inline the first few elements of the CTB into the TIB. Since caller indices are assigned at compile time, a compiler knows which part of the CTB will be accessed in the generated code. To accommodate the inlined part of the CTB, a class' TIB entry is expanded to allow a method to have several entries. Figure 1(d) shows the layout of TIBs with one inlined CTB element. When generating instructions for a virtual call, the value of the caller's CTB index, caller_index, is examined: if the index falls into the inlined part of the CTB, then invocation is done by three instructions:

  TIB = * (ptr + TIB_OFFSET);
  INSTR = TIB[method_offset + caller_index]; 
  JMP INSTR
Whenever a CTB index is greater than or equal to the inlined CTB size, INLINED_CTB_SIZE, then four instructions must be used for the call:
  TIB = * (ptr + TIB_OFFSET);  
  CTB = TIB[method_offset + CTB_ARRAY_OFFSET]; 
  INSTR = CTB[caller_index - INLINED_CTB_SIZE];
  JMP INSTR

Note that in addition to saving the extra instruction for inlined CTB entries, the space overhead of the CTB header is eliminated in the common cases where all CTB entries are inlined.

Another source of optimization is to avoid the overhead of handling system code, such as runtime checks and locks, inserted by compilers, because this code is frequently called and ignoring them does not affect the semantics of applications. To achieve this, the first CTB entry is reserved for the purpose of system inserted calls. Instead of being initialized with the address of a call graph profiling stub, the first entry has the address of a lazy method compilation code stub or method instructions. When the compiler generates code for a system call, it always assigns the zero caller_index to the caller. To avoid the extra load instruction, the first entry of a CTB array is always inlined into the TIB.

Feng Qian 2004-02-16