DesignWare ARC nSIM - ARC processor model

The DesignWare® ARC® nSIM Instruction Set Simulator provides an instruction-accurate processor model for the DesignWare ARC processor families.

ArcSim Project - Flexible Ultra-High Speed Instruction Set Simulator

ArcSim is our ‘Swiss army knife’ for high-speed functional and cycle-accurate instruction set simulation of the EnCore processor. It provides various simulation modes and yields a wealth of statistics and metrics about simulated programs.

PASTA Project - Processor Automated Synthesis by iTerative Analysis

In the PASTA project we seek to automate the design and optimisation of customisable embedded processors. We do this by creating tools that are able to learn about the physical characteristics of the underlying silicon technology, and use that knowledge to synthesise the structure of an embedded processor.

HBURG - Haskell Bottom Up Rewrite Generator

Design and implementation of a code generator generator based on tree pattern matching and dynamic programming, using the functional programming language Haskell.
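To illustrate the core idea behind bottom-up rewrite generators such as HBURG, here is a small Python sketch (not Haskell, and not HBURG's generated code). Each rule maps a one-level tree pattern to a nonterminal with a cost. A bottom-up dynamic programming pass labels every node with the cheapest rule for each nonterminal, and a second walk emits the chosen rules. The rule names, nonterminals (`reg`, `imm`) and costs are hypothetical, and chain rules are omitted for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    kids: list = field(default_factory=list)
    best: dict = field(default_factory=dict)   # nonterminal -> (cost, rule name)

# Hypothetical rules: (name, result nonterminal, pattern, cost).
# A pattern is (operator, child nonterminal, ...) and matches one tree node.
RULES = [
    ("reg_const", "reg", ("CONST",), 1),            # load a constant into a register
    ("imm_const", "imm", ("CONST",), 0),            # constant used as an immediate operand
    ("reg_add",   "reg", ("ADD", "reg", "reg"), 1),
    ("reg_addi",  "reg", ("ADD", "reg", "imm"), 1),
    ("reg_load",  "reg", ("LOAD", "reg"), 2),
]

def label(node):
    """Bottom-up dynamic programming: cheapest rule per nonterminal at each node."""
    for kid in node.kids:
        label(kid)
    for name, nt, pat, cost in RULES:
        if pat[0] != node.op or len(pat) - 1 != len(node.kids):
            continue
        total = cost
        for kid, kid_nt in zip(node.kids, pat[1:]):
            if kid_nt not in kid.best:
                break                      # this child cannot be derived as kid_nt
            total += kid.best[kid_nt][0]
        else:
            if nt not in node.best or total < node.best[nt][0]:
                node.best[nt] = (total, name)
    return node

def emit(node, nt, out):
    """Walk the labeled tree and emit the chosen rules in post-order."""
    name = node.best[nt][1]
    pat = next(p for n, _, p, _ in RULES if n == name)
    for kid, kid_nt in zip(node.kids, pat[1:]):
        emit(kid, kid_nt, out)
    out.append(name)
    return out

tree = label(Node("ADD", [Node("LOAD", [Node("CONST")]), Node("CONST")]))
print(tree.best["reg"])        # (4, 'reg_addi')
print(emit(tree, "reg", []))   # ['reg_const', 'reg_load', 'imm_const', 'reg_addi']
```

In this example the `ADD` node can be covered by `reg_add` (cost 5) or by `reg_addi` (cost 4). Because the dynamic programming pass keeps the cheapest derivation per nonterminal, it selects the add-immediate form.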

PortBrowser - A user interface for the BSD ports system

PortBrowser is a free, easy-to-use front end for the BSD ports system. It was developed for OpenBSD but should also work on FreeBSD.

At PLDI’11 we demonstrated, in our paper “Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator”, that a concurrent dynamic compilation model is highly effective. This work was also presented at Euro LLVM’12 under the title “Reducing dynamic compilation latency - concurrent and parallel dynamic compilation”.

Three years later, Google announced support for concurrent compilation to make Chrome, more specifically its V8 JavaScript engine, faster.

More recently, Facebook announced concurrent JIT compilation support in HHVM, their open source virtual machine designed for executing programs written in Hack and PHP.


Instruction set simulators (ISS) are vital tools for compiler, operating system, and application development, as well as for processor architecture design space exploration and verification. Because these demands are so different, designing an ISS that caters to all of these application scenarios is a constant challenge. On the one hand, HW verification demands absolute precision with respect to architectural behavior, even for randomly generated corner-case scenarios that are unlikely to occur in reality. Compiler developers, on the other hand, require functional correctness, performance, and rich profiling feedback to create an optimizing compiler before the actual HW is ready.


Very happy to see that our simulation research has made it into a very successful Synopsys Inc. product that keeps its customers happy.


The official case study about the research impact of The EnCore Microprocessor and the ArcSim Simulator project has been released. We are all very happy to see industry value our ideas and work.


I was invited to present the outcome of my PhD at the Euro LLVM’12 conference. It seems that, at that time, we were the first to have built a production-ready concurrent JIT compiler using the LLVM framework.

A few years later, Google and Facebook applied our research results in their virtual machines.





The main challenge faced by a dynamic compilation system is to detect frequently executed program regions and translate them into highly efficient native code as fast as possible. To reduce dynamic compilation latency effectively, a dynamic compilation system must improve its workload throughput, i.e., compile more application hotspots per unit of time. Because time spent on dynamic compilation adds to the overall execution time, the dynamic compiler is often decoupled from the main execution loop and runs in a separate thread to reduce its overhead. This thesis proposes innovative techniques aimed at effectively speeding up dynamic compilation...
[PhD Thesis]
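The decoupled model described above can be sketched in a few lines of Python. The main loop keeps interpreting and profiling regions. Once a region reaches an execution-count threshold, it is handed to a pool of worker threads (a simple task farm), and later executions use the compiled version as soon as it appears in the code cache. The threshold, class and method names are hypothetical, and "compilation" is only a stand-in for a real JIT backend. Python threads are also limited by the GIL, so the sketch shows the structure of the approach rather than an actual speedup. It does not reproduce the techniques proposed in the thesis.

```python
import queue
import threading

HOT_THRESHOLD = 3  # hypothetical: executions before a region is queued for compilation

class TaskFarmJit:
    """Main loop keeps interpreting while worker threads compile hot regions."""

    def __init__(self, n_workers=2):
        self.counts = {}        # region id -> execution count (main thread only)
        self.queued = set()     # regions already handed to the task farm
        self.code_cache = {}    # region id -> "compiled" callable
        self.lock = threading.Lock()
        self.work = queue.Queue()
        self.workers = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(n_workers)]
        for w in self.workers:
            w.start()

    def _compile(self, region):
        # Stand-in for an expensive JIT compilation step.
        return lambda x: region(x)

    def _worker(self):
        while True:
            region_id, region = self.work.get()
            if region_id is None:          # shutdown sentinel
                self.work.task_done()
                return
            native = self._compile(region)
            with self.lock:
                self.code_cache[region_id] = native
            self.work.task_done()

    def execute(self, region_id, region, x):
        with self.lock:
            native = self.code_cache.get(region_id)
        if native is not None:
            return native(x), "native"
        # Interpret and profile; never block waiting for the compiler.
        self.counts[region_id] = self.counts.get(region_id, 0) + 1
        if self.counts[region_id] >= HOT_THRESHOLD and region_id not in self.queued:
            self.queued.add(region_id)
            self.work.put((region_id, region))
        return region(x), "interpreted"

    def shutdown(self):
        for _ in self.workers:
            self.work.put((None, None))
        self.work.join()

jit = TaskFarmJit(n_workers=2)
square = lambda x: x * x
modes = [jit.execute("sq", square, i)[1] for i in range(3)]
jit.work.join()                  # demo only: wait so the next call is deterministic
print(modes, jit.execute("sq", square, 4))
jit.shutdown()
```

The main thread never waits for compilation. In a real system, execution would continue in the interpreter while the farm works, and the demo's explicit `join()` exists only to make the printed result predictable.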

Embedded systems, as typified by modern mobile phones, are already seeing a drive toward using multi-core processors. The number of cores will likely increase rapidly in the future. Engineers and researchers need to be able to simulate systems as they are expected to look a few generations from now, running simulations of many-core devices on today's multi-core machines. These requirements place heavy demands on the scalability of simulation engines, the fastest of which have typically evolved from just-in-time (JIT) dynamic binary translators (DBT) ...

In recent years multi-core processors have seen broad adoption in application domains ranging from embedded systems through general-purpose computing to large-scale data centres. Simulation technology for multi-core systems, however, lags behind and does not provide the simulation speed required to effectively support design space exploration and parallel software development. While state-of-the-art instruction set simulators (ISS) for single-core machines reach or exceed the performance levels of speed-optimised silicon...

Dynamic Binary Translation (DBT) is the key technology behind cross-platform virtualization and allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Under the hood, DBT is typically implemented using Just-In-Time (JIT) compilation of frequently executed program regions, also called traces. The main challenge is translating frequently executed program regions as fast as possible into highly efficient native code. As time for JIT compilation adds to the ...
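To make the translation step concrete, here is a toy dynamic binary translator in Python. It decodes guest instructions starting at a program counter until a block-ending branch, emits equivalent host code (here, Python source compiled with the built-in `compile`/`exec`), and caches the result by guest address. The guest ISA (`li`, `add`, `addi`, `blt`, `halt`) is hypothetical. Unlike a real DBT, the sketch translates every block on first execution instead of profiling to find hot traces.

```python
def translate_block(program, pc):
    """Translate the guest basic block starting at pc into a host (Python) function."""
    lines = ["def block(r):"]
    while True:
        op, *args = program[pc]
        if op == "li":
            rd, imm = args
            lines.append(f"    r[{rd}] = {imm}")
        elif op == "add":
            rd, rs, rt = args
            lines.append(f"    r[{rd}] = r[{rs}] + r[{rt}]")
        elif op == "addi":
            rd, rs, imm = args
            lines.append(f"    r[{rd}] = r[{rs}] + {imm}")
        elif op == "blt":                    # conditional branch ends the block
            rs, rt, target = args
            lines.append(f"    return {target} if r[{rs}] < r[{rt}] else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    namespace = {}
    exec(compile("\n".join(lines), "<dbt>", "exec"), namespace)
    return namespace["block"]

def run(program, n_regs=4):
    """Dispatch loop: look up or translate the block at pc, then execute it."""
    regs = [0] * n_regs
    cache = {}                               # guest pc -> translated host function
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
        pc = cache[pc](regs)
    return regs, cache

# Guest program: r0 = sum of 0..4
prog = [
    ("li", 0, 0),        # 0: r0 = 0 (sum)
    ("li", 1, 0),        # 1: r1 = 0 (i)
    ("li", 2, 5),        # 2: r2 = 5 (limit)
    ("add", 0, 0, 1),    # 3: sum += i
    ("addi", 1, 1, 1),   # 4: i += 1
    ("blt", 1, 2, 3),    # 5: if i < limit goto 3
    ("halt",),           # 6
]
regs, cache = run(prog)
print(regs, sorted(cache))   # [10, 5, 5, 0] [0, 3, 6]
```

After the first pass, the loop body at guest address 3 runs entirely from the code cache. Reducing the cost of producing such translations is the latency problem the abstract describes.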

Instruction set simulators (ISS) are vital tools for compiler and processor architecture design space exploration and verification. State-of-the-art simulators using just-in-time (JIT) dynamic binary translation (DBT) techniques are able to simulate complex embedded processors at speeds above 500 MIPS. However, these functional ISS do not provide microarchitectural observability. In contrast, low-level cycle-accurate ISS are too slow to simulate full-scale applications, forcing developers to revert to FPGA-based simulations. In this paper we demonstrate that it is possible to run ultra-high-speed cycle-accurate instruction set simulations surpassing...
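One way to picture "microarchitectural observability" is a timing model layered on top of functional simulation. The functional simulator resolves what each instruction does (including branch outcomes), and a separate model charges cycles for pipeline effects. The Python sketch below assumes a hypothetical single-issue in-order pipeline with a one-cycle load-use stall and a two-cycle taken-branch penalty, and it ignores pipeline fill. It only illustrates the idea and does not model the EnCore pipeline or the paper's technique.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LOAD_USE_STALL = 1        # hypothetical: bubble when a load result is used immediately
TAKEN_BRANCH_PENALTY = 2  # hypothetical: cycles lost refetching after a taken branch

@dataclass
class Instr:
    op: str
    dst: Optional[int] = None
    srcs: Tuple[int, ...] = ()
    taken: bool = False   # branch outcome, as resolved by the functional simulator

def count_cycles(trace):
    """Charge cycles for a dynamic instruction trace on a simple in-order pipeline."""
    cycles = 0
    prev = None
    for ins in trace:
        cycles += 1                                   # one instruction issued per cycle
        if prev is not None and prev.op == "load" and prev.dst in ins.srcs:
            cycles += LOAD_USE_STALL                  # wait for the loaded value
        if ins.op == "branch" and ins.taken:
            cycles += TAKEN_BRANCH_PENALTY            # discard wrongly fetched instructions
        prev = ins
    return cycles

trace = [
    Instr("load", dst=1, srcs=(2,)),
    Instr("add", dst=3, srcs=(1, 4)),        # uses r1 right after the load: stall
    Instr("add", dst=5, srcs=(3, 3)),
    Instr("branch", srcs=(5,), taken=True),
]
print(count_cycles(trace))   # 7
```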

For memory-constrained embedded systems, code size is at least as important as performance. One way of increasing code density is to exploit compact instruction formats, e.g., ARM Thumb2, where the processor operates either in standard or in compact instruction mode. The ARCompact ISA considered in this paper is different in that it allows free-form mixing of 16- and 32-bit instructions without a mode switch. Compact 16-bit instructions can be used anywhere in the code, provided that additional register constraints are satisfied. In this paper we present an integrated instruction selection and register allocation methodology and develop two approaches for mixed-mode code generation: a simple opportunistic ...
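The register constraint is what links instruction selection to register allocation. The Python sketch below shows a purely opportunistic selector that emits the 16-bit form whenever an instruction's registers and immediate happen to satisfy the compact-format constraints, and falls back to 32 bits otherwise. The compact register subset (r0 to r3 and r12 to r15), the set of compact-capable operations, and the immediate limit are assumptions made for illustration. The sketch is not the approach developed in the paper.

```python
COMPACT_REGS = {0, 1, 2, 3, 12, 13, 14, 15}    # assumed subset usable by 16-bit encodings
COMPACT_OPS = {"add", "sub", "mov", "ld", "st"}  # hypothetical ops with a 16-bit form
COMPACT_IMM_MAX = 7                              # hypothetical unsigned immediate limit

def encoding_size(op, regs, imm=None):
    """Pick the 16-bit form (2 bytes) if all compact constraints hold, else 32-bit (4 bytes)."""
    fits = (
        op in COMPACT_OPS
        and all(r in COMPACT_REGS for r in regs)
        and (imm is None or 0 <= imm <= COMPACT_IMM_MAX)
    )
    return 2 if fits else 4

def code_size(instrs):
    return sum(encoding_size(*ins) for ins in instrs)

block = [
    ("add", (0, 1, 2)),       # all registers compact: 2 bytes
    ("mov", (5, 1)),          # r5 is outside the compact set: 4 bytes
    ("add", (13, 13), 4),     # small immediate: 2 bytes
    ("add", (0, 0), 100),     # immediate too large: 4 bytes
]
print(code_size(block))       # 12
```

If the register allocator had assigned r2 instead of r5 to the `mov`, that instruction would also shrink to 2 bytes. Choices like this motivate integrating register allocation with instruction selection rather than treating compact encodings as an afterthought.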

Building compiler back ends from declarative specifications that map tree structured intermediate representations onto target machine code is the topic of this thesis. Although many tools and approaches have been devised to tackle the problem of automated code generation, there is still room for improvement...
[MSc Thesis]