Reducing dynamic compilation latency - concurrent and parallel dynamic compilation

Abstract

The main challenge faced by a dynamic compilation system is to detect frequently executed program regions and translate them into highly efficient native code as fast as possible. Depending on application requirements, state-of-the-art dynamic compilation systems focus either on peak performance, applying many optimisations at the cost of low compilation speed, or on response time, trading peak performance of the generated machine code for compilation speed. Faster availability of optimised native code reduces the time spent in the unoptimised version, thereby improving application performance. Because dynamic compilation adds to the overall execution time, it is often decoupled and runs in a separate thread, independent of the main execution loop. This approach improves application responsiveness by reducing pause times due to dynamic compilation, but it does not reduce dynamic compilation latency. In this talk we present two innovative contributions that work together to effectively reduce dynamic compilation latency. The first is an incremental region-based compilation approach that considers all frequently executed paths in a program for dynamic compilation, as opposed to previous trace-based approaches, where compilation is restricted to paths through loops. The second reduces dynamic compilation latency by compiling several hot regions in a concurrent and parallel task farm, using LLVM as the underlying compilation framework. The proposed scheme was implemented and evaluated in the context of an industry-strength dynamic binary translator. Using more than 60 industry-standard benchmarks from various domains, we demonstrate speedups of up to 2.08x on a standard quad-core machine. Across short- and long-running benchmarks the dynamic compilation scheme is robust and never results in a slowdown. In fact, using four processors, total execution time is reduced by 11.5% on average over state-of-the-art decoupled (or asynchronous) dynamic compilation. Finally, we will show two live demos that showcase the capability and successful application of our technology to real-world applications, ranging from video decoding and playback with the WebM video codec to instruction-accurate dynamic binary translation of a full operating system binary.
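
The idea of a concurrent and parallel task farm for hot regions can be illustrated with a minimal Python sketch. It is not the implementation presented in the talk: the names TaskFarm, compile_region and HOT_THRESHOLD are hypothetical, the "compiler" is a stand-in for a real backend such as LLVM, and a thread pool only models the dispatch structure rather than true parallel compilation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import threading

HOT_THRESHOLD = 3  # hypothetical: executions before a region counts as hot


def compile_region(region):
    # Stand-in for a real optimising backend; returns fake "native code".
    return f"native<{region}>"


class TaskFarm:
    """Interpreter-side view of a decoupled, multi-worker compilation farm."""

    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.counts = Counter()   # per-region execution counts (profiling)
        self.pending = {}         # region -> Future for regions already dispatched
        self.cache = {}           # region -> installed native code
        self.lock = threading.Lock()

    def _install(self, region, future):
        # Runs when a worker finishes; publishes the code to the cache.
        with self.lock:
            self.cache[region] = future.result()

    def execute(self, region):
        # Use native code if it is already available.
        with self.lock:
            if region in self.cache:
                return "native"
        # Otherwise interpret, profile, and dispatch once the region is hot.
        self.counts[region] += 1
        if self.counts[region] >= HOT_THRESHOLD and region not in self.pending:
            future = self.pool.submit(compile_region, region)
            self.pending[region] = future
            future.add_done_callback(lambda f, r=region: self._install(r, f))
        return "interpreted"

    def drain(self):
        # Wait for all outstanding compilations (and their callbacks).
        self.pool.shutdown(wait=True)


farm = TaskFarm(workers=4)
trace = ["a", "b", "a", "a", "b", "b", "c"]
modes = [farm.execute(region) for region in trace]
farm.drain()
```

In this sketch the main loop never blocks on compilation: it keeps interpreting a region until a worker has installed the compiled version, and several hot regions can be in flight at once, which is the property that shortens the wait for optimised code.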

Date
Location
Hotel Russell, 1-8 Russell Square, Bloomsbury, London, WC1B 5BE, UK.