Optimization Framework for the CACAO VM

master's thesis


A virtual machine is a software that takes charge of executing another program. This either can be done by interpreting the instructions of the program, or by compiling parts of it when needed. The second approach is also called just-in-time (JIT) compilation.

The Cacao VM is a virtual machine for Java bytecode. It follows a JIT-only approach meaning that all methods are compiled into native machine code prior execution. This requires a fast compiler to minimize the latency for program execution. Consequently, the compiler, called baseline compiler, uses simple data representation and the passes are highly integrated. While the generated machine code is adequate for most parts of a program, the over-all performance can be improved, if more time for better optimization is invested for frequently executed methods. It can not be decided a priori which methods will be called regularly. To gain this knowledge the virtual machine profiles the run-time behavior of the program and selects methods for recompilation with more efforts. This approach is known as adaptive optimization.

So far the Cacao VM uses the baseline compiler with a higher optimization level for recompilation. This approach has two problems. On the one hand, the additional optimizations complicate the baseline compiler and maintenance is more difficult. On the other hand and more important, due to the simple but efficient construction the baseline compiler is inflexible and new optimizations are hard to implement.

This work presents a new compilation framework for Cacao, which replaces the baseline compiler as the optimizing compiler. It features two intermediate representations. The graph-based high-level representation is intended for program analysis and transformation. It is designed to make optimization development fast and easy. The low-level representation is focused on tasks close to the machine level, like register allocation and code emission. The compiler contains a pass manager to administer the execution and data exchange of passes. Currently, the transformation pipeline includes IR construction, structural analysis, scheduling, instruction selection, register allocation and code emission.

An empirical comparison between the baseline compiler and the new framework discloses the potential of the new framework and sets the agenda for future directions of the project.