

# Intermediate Representations

COMP 412 Fall 2005

Copyright 2005, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use.





- Front end produces an intermediate representation (IR)
- Middle end transforms the IR into an equivalent IR that runs more efficiently
- Back end transforms the IR into native code
- *IR* encodes the compiler's knowledge of the program
- Middle end usually consists of several passes

### Intermediate Representations



- Decisions in IR design affect the speed and efficiency of the compiler
- Some important *IR* properties
  - Ease of generation
  - Ease of manipulation
  - Procedure size
  - Freedom of expression
  - Level of abstraction
- The importance of different properties varies between compilers
  - Selecting an appropriate IR for a compiler is critical

# Types of Intermediate Representations

Three major categories

- Structural
  - Graphically oriented
  - Heavily used in source-to-source translators
  - Tend to be large
  - Examples: trees, DAGs
- Linear
  - Pseudo-code for an abstract machine
  - Level of abstraction varies
  - Simple, compact data structures
  - Easier to rearrange
  - Examples: three address code, stack machine code
- Hybrid
  - Combination of graphs and linear code
  - Example: control-flow graph


#### Level of Abstraction

- The level of detail exposed in an IR influences the profitability and feasibility of different optimizations.
- Two different representations of an array reference:

High-level AST (good for memory disambiguation): a `subscript` node over A, i, and j.

Low-level linear code (good for address calculation):

| Opcode | Operand 1 | Operand 2 |    | Result    |
|--------|-----------|-----------|----|-----------|
| loadI  | 1         |           | => | $r_1$     |
| sub    | $r_j$     | $r_1$     | => | $r_2$     |
| loadI  | 10        |           | => | $r_3$     |
| mult   | $r_2$     | $r_3$     | => | $r_4$     |
| sub    | $r_i$     | $r_1$     | => | $r_5$     |
| add    | $r_4$     | $r_5$     | => | $r_6$     |
| loadI  | @A        |           | => | $r_7$     |
| add    | $r_7$     | $r_6$     | => | $r_8$     |
| load   | $r_8$     |           | => | $r_{Aij}$ |
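Read back as arithmetic, the low-level code for the array reference computes a single address expression. The following is a reading of that code, assuming (as the constants 1 and 10 in the code suggest, but the slide does not state) that both lower bounds are 1, the leading dimension is 10, and each element occupies one addressable unit:

$$
r_{Aij} = \text{MEM}\big[\, @A + (j - 1) \times 10 + (i - 1) \,\big]
$$

Each term of this expression is visible to an optimizer in the low-level form, whereas the high-level form keeps the fact that this is a reference to element (i, j) of A.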



#### Level of Abstraction



- Structural IRs are usually considered high-level
- Linear IRs are usually considered low-level
- Not necessarily true; for example, this is high-level linear code: `loadArray A,i,j`



#### Abstract Syntax Tree

An abstract syntax tree (AST) is the procedure's parse tree with the nodes for most non-terminal symbols removed



- Can use linearized form of the tree
  - Easier to manipulate than pointers
    - `x 2 y * -` in postfix form (for `x - 2 * y`)
    - `- x * 2 y` in prefix form
- S-expressions are (essentially) ASTs
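The linearized forms can be produced by a tree walk. The following is a minimal sketch, not taken from the course materials: the `Leaf` and `BinOp` classes and the `postfix` and `prefix` functions are hypothetical names chosen for illustration, and the tree is the one for `x - 2 * y`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    name: str  # a variable name or a literal, kept as text


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, BinOp]


def postfix(node: Node) -> list[str]:
    """Post-order walk: operands first, then the operator."""
    if isinstance(node, Leaf):
        return [node.name]
    return postfix(node.left) + postfix(node.right) + [node.op]


def prefix(node: Node) -> list[str]:
    """Pre-order walk: operator first, then the operands."""
    if isinstance(node, Leaf):
        return [node.name]
    return [node.op] + prefix(node.left) + prefix(node.right)


# AST for x - 2 * y
tree = BinOp("-", Leaf("x"), BinOp("*", Leaf("2"), Leaf("y")))

print(" ".join(postfix(tree)))  # x 2 y * -
print(" ".join(prefix(tree)))   # - x * 2 y
```

Both outputs are flat sequences, so they can be stored in an array and scanned without following pointers.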

#### Directed Acyclic Graph



A directed acyclic graph (DAG) is an AST with a unique node for each value



- Makes sharing explicit
- Encodes redundancy

Same expression twice means that the compiler might arrange to evaluate it just once!
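One common way to get a unique node per value is to look up each node in a table before creating it. The sketch below assumes that approach; `DagBuilder` and its methods are hypothetical names, and the expression `(2 * y) + (2 * y)` is an invented example. It also assumes `y` is not redefined between the two uses, which a real compiler would have to check.

```python
class DagBuilder:
    """Builds expression nodes, reusing an existing node when one matches."""

    def __init__(self):
        self.nodes = []   # node id -> (op, left, right) tuple
        self.index = {}   # (op, left, right) tuple -> node id

    def _intern(self, key):
        # Create the node only if an identical one does not exist yet.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self._intern(("leaf", name, None))

    def binop(self, op, left, right):
        return self._intern((op, left, right))


b = DagBuilder()

# (2 * y) + (2 * y): the AST has 7 nodes, the DAG needs only 4.
first = b.binop("*", b.leaf("2"), b.leaf("y"))
second = b.binop("*", b.leaf("2"), b.leaf("y"))
total = b.binop("+", first, second)

print(first == second)  # True: both operands of + are the same node
print(len(b.nodes))     # 4
```

Because the two operands of `+` are the same node, a code generator walking this DAG can evaluate `2 * y` once and use the result twice.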



#### Stack Machine Code

Originally used for stack-based computers, now Java

- Example: `x - 2 * y` becomes `push x`, `push 2`, `push y`, `multiply`, `subtract`

Advantages

- Compact form
- Introduced names are *implicit*, not *explicit*
- Simple to generate and execute code

Useful where code is transmitted over slow communication links (*the net*)

Implicit names take up no space, where explicit ones do!
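To see why execution is simple and why no temporary names are needed, here is a small interpreter for the five-instruction example. It is a sketch under assumptions: the `run` function, the instruction spelling, and the sample values for `x` and `y` are illustrative, not part of any real stack machine.

```python
def run(code, env):
    """Execute stack machine code; env maps variable names to values."""
    stack = []
    for instr in code:
        op, *args = instr.split()
        if op == "push":
            arg = args[0]
            # A known name pushes its value; anything else is an integer literal.
            stack.append(env[arg] if arg in env else int(arg))
        elif op == "multiply":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "subtract":
            right = stack.pop()
            left = stack.pop()
            stack.append(left - right)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack.pop()


code = ["push x", "push 2", "push y", "multiply", "subtract"]
result = run(code, {"x": 10, "y": 3})
print(result)  # 4, that is 10 - 2 * 3
```

The intermediate value `2 * y` never receives a name; it lives only on the stack between `multiply` and `subtract`.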



#### Three Address Code

Several different representations of three address code

- In general, three address code has statements of the form $x \leftarrow y \ \underline{op} \ z$, with 1 operator ($\underline{op}$) and, at most, 3 names (x, y, and z)
- Compact form

#### Three Address Code: Quadruples

Naïve representation of three address code

- Table of k \* 4 small integers
- Simple record structure
- Easy to reorder
- Explicit names

RISC assembly code:

| Opcode | Destination | Source 1 | Source 2 |
|--------|-------------|----------|----------|
| load   | r1          | y        |          |
| loadI  | r2          | 2        |          |
| mult   | r3          | r2       | r1       |
| load   | r4          | x        |          |
| sub    | r5          | r4       | r3       |

Quadruples:

| Opcode | Destination | Source 1 | Source 2 |
|--------|-------------|----------|----------|
| load   | 1           | y        |          |
| loadI  | 2           | 2        |          |
| mult   | 3           | 2        | 1        |
| load   | 4           | x        |          |
| sub    | 5           | 4        | 3        |

The original FORTRAN compiler used "quads"
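The "simple record structure" and "easy to reorder" points can be illustrated with a list of four-field records. This is a sketch only: the `Quad` type, the `execute` helper, and the sample memory contents are assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    dest: str
    src1: str
    src2: Optional[str] = None


# x - 2 * y as quadruples with explicit register names
quads = [
    Quad("load", "r1", "y"),
    Quad("loadI", "r2", "2"),
    Quad("mult", "r3", "r2", "r1"),
    Quad("load", "r4", "x"),
    Quad("sub", "r5", "r4", "r3"),
]


def execute(quads, memory):
    """Run the quads in order and return the final register contents."""
    regs = {}
    for q in quads:
        if q.op == "load":
            regs[q.dest] = memory[q.src1]
        elif q.op == "loadI":
            regs[q.dest] = int(q.src1)
        elif q.op == "mult":
            regs[q.dest] = regs[q.src1] * regs[q.src2]
        elif q.op == "sub":
            regs[q.dest] = regs[q.src1] - regs[q.src2]
        else:
            raise ValueError(f"unknown opcode: {q.op}")
    return regs


memory = {"x": 10, "y": 3}

# Move "load r4, x" to the front. No other quad has to be edited,
# because every operand is referred to by an explicit name.
reordered = [quads[3]] + quads[:3] + quads[4:]

print(execute(quads, memory)["r5"])      # 4
print(execute(reordered, memory)["r5"])  # 4
```

The reordering is a plain list operation; the cost is that every record stores its destination name explicitly.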



#### Three Address Code: Triples

- Index used as implicit name
- 25% less space consumed than quads
- Much harder to reorder
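Dropping the destination field leaves three fields instead of four, which is where the 25% saving comes from. The sketch below shows the price: since a triple is named by its position, moving one forces every reference to be patched. The tuple layout and the `reorder` function are illustrative assumptions; an `int` operand refers to an earlier triple, and a `str` operand is a name or literal.

```python
# x - 2 * y as triples; the list index is the implicit name of each result
triples = [
    ("load", "y", None),   # (0)
    ("loadI", "2", None),  # (1)
    ("mult", 0, 1),        # (2)
    ("load", "x", None),   # (3)
    ("sub", 3, 2),         # (4)
]


def reorder(triples, order):
    """Return the triples in a new order, patching index operands.

    order[k] is the old index of the triple that moves to position k.
    """
    new_index = {old: new for new, old in enumerate(order)}

    def patch(operand):
        # Only integer operands are references to other triples.
        return new_index[operand] if isinstance(operand, int) else operand

    return [
        (op, patch(a), patch(b))
        for op, a, b in (triples[old] for old in order)
    ]


# Move "load x" to the front: both later triples must be rewritten.
moved = reorder(triples, [3, 0, 1, 2, 4])
for i, t in enumerate(moved):
    print(i, t)
```

Moving a single instruction changed the operands of both `mult` and `sub`, even though neither of them moved relative to the other.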





#### Three Address Code: Indirect Triples

- List first triple in each statement
- Implicit name space
- Uses more space than triples, but easier to reorder

| Statement list | Index | Opcode | Operand 1 | Operand 2 |
|----------------|-------|--------|-----------|-----------|
| (100)          | (100) | load   | y         |           |
| (105)          | (101) | loadI  | 2         |           |
|                | (102) | mult   | (100)     | (101)     |
|                | (103) | load   | x         |           |
|                | (104) | sub    | (103)     | (102)     |

- Major tradeoff between quads and triples is compactness versus ease of manipulation
  - In the past compile-time space was critical
  - Today, speed may be more important
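The extra level of indirection is what buys back ease of reordering. In the sketch below the triple table never changes; only a separate order list is permuted. This simplifies the scheme described above, since the order list here has one entry per triple rather than one per statement, and the `execute` helper and memory values are assumptions for illustration.

```python
# Triple table: keys are the implicit names, which never change
triples = {
    100: ("load", "y", None),
    101: ("loadI", "2", None),
    102: ("mult", 100, 101),
    103: ("load", "x", None),
    104: ("sub", 103, 102),
}

# Execution order is kept separately from the table
order = [100, 101, 102, 103, 104]


def execute(triples, order, memory):
    """Evaluate triples in the given order; return the last value computed."""
    values = {}
    for idx in order:
        op, a, b = triples[idx]
        if op == "load":
            values[idx] = memory[a]
        elif op == "loadI":
            values[idx] = int(a)
        elif op == "mult":
            values[idx] = values[a] * values[b]
        elif op == "sub":
            values[idx] = values[a] - values[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return values[order[-1]]


memory = {"x": 10, "y": 3}

# Hoist "load x" by editing only the order list; no operand is patched.
new_order = [103, 100, 101, 102, 104]

result = execute(triples, order, memory)
reordered_result = execute(triples, new_order, memory)
print(result, reordered_result)  # 4 4
```

The order list is the additional space that indirect triples spend compared with plain triples.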



#### Two Address Code

- Allows statements of the form $x \leftarrow x \ \underline{op} \ y$, with 1 operator ($\underline{op}$) and, at most, 2 names (x and y)
- Can be very compact

Example: $z \leftarrow x - 2 * y$ becomes

$$
\begin{aligned}
t_1 &\leftarrow 2 \\
t_2 &\leftarrow \text{load } y \\
t_2 &\leftarrow t_2 * t_1 \\
z &\leftarrow \text{load } x \\
z &\leftarrow z - t_2
\end{aligned}
$$

Problems

- Machines no longer rely on destructive operations
- Difficult name space
  - Destructive operations make reuse hard
  - Good model for machines with destructive ops (PDP-11)

#### Control-flow Graph

Models the transfer of control in the procedure

- Nodes in the graph are basic blocks
  - Can be represented with quads or any other linear representation
- Edges in the graph represent control flow
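A control-flow graph can be built from linear code by finding the first instruction of each basic block and then linking blocks by branch and fall-through. The sketch below follows that idea; the `Instr` record, the `build_cfg` function, and the small loop used as input are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    label: Optional[str] = None   # label attached to this instruction
    targets: tuple = ()           # labels this instruction may branch to
    falls_through: bool = True    # False for an unconditional jump


def build_cfg(code):
    """Return (blocks, edges): blocks are lists of instruction indices,
    edges are (from_block, to_block) pairs."""
    label_at = {ins.label: i for i, ins in enumerate(code) if ins.label}

    # A block starts at the first instruction, at every branch target,
    # and right after every branch or jump.
    leaders = {0}
    for i, ins in enumerate(code):
        if ins.targets or not ins.falls_through:
            leaders.update(label_at[t] for t in ins.targets)
            if i + 1 < len(code):
                leaders.add(i + 1)

    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    blocks = [list(range(s, e)) for s, e in zip(starts, ends)]
    block_of = {s: b for b, s in enumerate(starts)}

    edges = set()
    for b, block in enumerate(blocks):
        last_index = block[-1]
        last = code[last_index]
        for t in last.targets:
            edges.add((b, block_of[label_at[t]]))
        if last.falls_through and last_index + 1 < len(code):
            edges.add((b, block_of[last_index + 1]))
    return blocks, edges


code = [
    Instr("x <- ..."),
    Instr("y <- ..."),
    Instr("if (x >= k) goto next", targets=("next",)),
    Instr("x <- x + 1", label="loop"),
    Instr("y <- y + x"),
    Instr("if (x < k) goto loop", targets=("loop",)),
    Instr("...", label="next"),
]

blocks, edges = build_cfg(code)
print(blocks)         # [[0, 1, 2], [3, 4, 5], [6]]
print(sorted(edges))  # [(0, 1), (0, 2), (1, 1), (1, 2)]
```

Each block holds indices into the linear code, so the instructions inside a block stay in whatever linear representation the compiler already uses.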





## Static Single Assignment Form



- The main idea: each name defined exactly once

```
Original                  SSA-form

x ← ...                         x₀ ← ...
y ← ...                         y₀ ← ...
while (x < k)                   if (x₀ ≥ k) goto next
    x ← x + 1             loop: x₁ ← φ(x₀, x₂)
    y ← y + x                   y₁ ← φ(y₀, y₂)
                                x₂ ← x₁ + 1
                                y₂ ← y₁ + x₂
                                if (x₂ < k) goto loop
                          next: ...
```

Strengths of SSA-form

- Sharper analysis
- (sometimes) faster algorithms
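The renaming half of the idea can be shown on straight-line code, where no φ-functions are needed. The sketch below is deliberately limited to that case: `rename_straight_line` is a hypothetical helper, statements are given as a target plus a list of tokens, and names that are used before any definition (here `a` and `b`) are treated as inputs and left unsubscripted. Placing φ-functions at join points requires control-flow information and is not attempted.

```python
def rename_straight_line(stmts):
    """Give each definition a fresh subscript; uses refer to the latest one.

    stmts is a list of (target, tokens) pairs for straight-line code only.
    """
    version = {}  # name -> subscript of its most recent definition
    renamed = []
    for target, tokens in stmts:
        # Rewrite uses first, so "x <- x + 1" reads the previous x.
        new_tokens = [
            f"{tok}{version[tok]}" if tok in version else tok
            for tok in tokens
        ]
        version[target] = version.get(target, -1) + 1
        renamed.append((f"{target}{version[target]}", new_tokens))
    return renamed


code = [
    ("x", ["a"]),
    ("y", ["b"]),
    ("x", ["x", "+", "1"]),
    ("y", ["y", "+", "x"]),
]

result = rename_straight_line(code)
for target, tokens in result:
    print(target, "<-", " ".join(tokens))
# x0 <- a
# y0 <- b
# x1 <- x0 + 1
# y1 <- y0 + x1
```

After renaming, every use names exactly one definition, which is the property that makes analysis on SSA-form sharper.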



- Repeatedly lower the level of the intermediate representation
  - Each intermediate representation is suited towards certain optimizations
- Example: the Open64 compiler
  - WHIRL intermediate format
    - Consists of 5 different *IRs* that are progressively more detailed and less abstract

### Memory Models



Two major models

- Register-to-register model
  - Keep all values that can legally be stored in a register in registers
  - Ignore machine limitations on number of registers
  - Compiler back-end must insert loads and stores
- Memory-to-memory model
  - Keep all values in memory
  - Only promote values to registers directly before they are used
  - Compiler back-end can remove loads and stores
- Compilers for RISC machines usually use register-to-register
  - Reflects programming model
  - Easier to determine when registers are used

#### The Rest of the Story...

Representing the code is only part of an IR

There are other necessary components

- Symbol table (already discussed)
- Constant table
  - Representation, type
  - Storage class, offset
- Storage map
  - Overall storage layout
  - Overlap information
  - Virtual register assignments

