# Compiler and Interpreter Technology

08:15 Monday, April 4th, 2016

Building Intermediate Representations.

• Evaluation Orders
• Translating Statements
• Translating Expressions
• Building the CFG

• We saw last time the Intermediate Representation has two levels:
• 3-Address instructions that describe the sequence of operations.
• A Control-Flow-Graph that describes the execution orders of these sequences.
• Today we look at how to build this representation; how to translate the Semantic Value into basic-blocks.
• Part I is about translating expression trees into blocks.
• Mixes up parts of Chapter 5 and 6.
• Part II is about translating statement trees into a CFG.
• Mainly Chapter 6.6 onwards.
• The entire presentation is about the Tree Rewriting approach.
• Cleaner implementation than following the Attribute Grammar approach in the text.

# 2. Overview of translating expression trees

• A high-level view of translation from expression trees into 3-address code.
• Each node in the tree describes both an operation, and a resulting value.
• To keep this uniform we may need dummy operations.
• So we split the translation into two parts:
• A local (node) decision to convert one node into one instruction.
• A global (tree) decision about which order to do the conversions.

All of Chapter 5 describes the constraints for the global ordering process.
Figure 6.19 shows an example of the local rules.

# 3. What's in a name?

• A 3-address instruction contains four parts:
• Two names of values to read.
• The name of one value to write to.
• An operation that defines what to do.
• Names are labels in the symbol table that refer to a particular value.
• Can be a reference to a variable's current value.
• Can be a constant.
• For most tree nodes there is a direct correspondence.
• Non-leaf nodes are connected to three other nodes in the tree.
• Binary operators have a left and a right subtree.
• The value they compute is used by a parent node.
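As a sketch of this structure (the struct and its field names are my own, not from the text), the four parts map directly onto a small record:

```cpp
#include <cassert>
#include <string>

// A minimal sketch of one 3-address instruction: result <- lhs op rhs.
// The three names are labels that the symbol table can resolve to values.
struct Instruction {
    std::string result;    // the one value written
    std::string op;        // the operation that defines what to do
    std::string lhs, rhs;  // the two values read
};
```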

# 4. Names from trees

• Most tree nodes in expressions are a single operation.
• The node type maps onto a 3-address instruction operation.
• The operation creates a value, so we generate a name for it.
• The operation uses two values: the names of each sub-tree's root.
• If we translate the nodes in one particular order then it is easy.
Key Observation
The order that we process nodes during translation is the same as during interpretation.

# 5. Evaluation order

Consequence of the key observation
When expressed as a tree walk, the translation process is a form of interpreter.
• I'm unaware of this being stated explicitly anywhere in the text.
• It is generally known in the research literature.
• All pretty printers are a form of interpreter, and the translator is a form of pretty printer.
• This evaluation order generalises from trees to DAGs.
• On a DAG this ordering is called a topological sort (hence its appearance in Chapter 5).
• First we'll look at the general system in Chapter 5.
• Then we'll see the specific case for translation of trees.
• Later we'll return to see how it is defined on DAGs.

# 6. Evaluation order: Semantic Rules

• Chapter 5 jumps around quite a bit.
• Dependencies and evaluation orders are introduced in the context of semantic rules.
• Control-flow is opaque for semantic rules: code is executed when productions apply.
• This disguises the point somewhat: we will go through a cleaner description.
• The simple (lexeme-only) parse trees that we built with Bison are implied by the grammar.
• Although we needed Semantic Actions to build this trivial tree, that is not their purpose.
• The purpose of injecting code is to attach a full set of attributes to each node.
• If we work in this style then the actions inside a production specify how data flows outside, between the nodes; this can be unintuitive.

# 7. Evaluation order: Trivial Actions

• The style taught in the labs did not propagate attributes between nodes inside the rules.
• Instead each node stored the lexeme only.
• Restructuring the trees ("less ugly") demonstrated complex actions, breaking this trivial mapping.
• To maintain this trivial mapping each rule should only build the node that defines that production.
• This separates the problem of defining the family of parse-tree shapes from the problem of propagating dependencies across the parse-trees.
• This style of Semantic Actions is trivial in two different senses:
• They express the minimum required to build the tree.
• They are simple enough to build automatically from the grammar.

# 8. Evaluation order: Attribute Dependencies

• Dependency graphs are a structure drawn on top of parse-trees.
• They indicate where each attribute is dependent on the value of another.
• Grammars do not turn directly into dependency graphs.
• The actions/rules in the grammar define the dependencies of every tree.
• This is a level of indirection: code implies dependencies in the generated structure.
• The direction of information flow defines the type of dependency.
• Data flows upwards from children to synthesized attributes in parents.
• Data flows downwards from parents to inherited attributes in children.

# 9. Evaluation order: Dependency Constraints

• If all attributes in a grammar are synthesized:
• This implies we build every attribute in a node by looking at the children.
• This is called an S-attributed grammar.
• It can be computed as a tree-walk (post-order traversal).
• So why do we need anything more complex?
• Type declarations are the classic example for inherited attributes.
• The declaration node has a synthesized attribute to represent the type.
• The identifiers have inherited attributes to store this type.
• This is an L-attributed grammar.

# 9b. Code for the previous example

```cpp
DeclNode *processDecl(ParseNode *node) {
    Type *t = processType(node->left);
    if (!validCombination(t))
        throw("Bad tree");
    IdList *ids = processIdList(node->right, t);
    return new DeclNode(ids);
}
```
• Trying to analyse / fix info-flow in the grammar is awkward.
• Building a simple tree, then traversing it is easier.
• The previous example is shown above.
• The synthesized attribute (the type value) is a result from a function.
• The inherited attribute is passing that result as an argument.

# 10. Evaluation order: Summary

• Attributes are attached to nodes in the parse-tree.
• Each attribute may depend on the value of other attributes.
• These constraints induce orders for correct evaluation.
• It is vital that we avoid cycles in these dependencies.
• S-attributed grammars only propagate data up the parse-tree.
• L-attributed grammars allow limited downwards propagation.
• By restricting to left-to-right flows among siblings.
• Now we introduce the evaluation order for translation as a tree rewrite.
• It is a very specific case, so much simpler than the general case.
• Two passes over the parse-tree.
• Generate names (addresses) for each node.
• Output 3-address instructions in the correct order.

# 11. Generating names

• Every node needs a unique name: use a scheme to generate an unlimited number.
• There are several in Chapter 5.
• I'll assume that the source language does not allow leading underscores.
• Walk over the tree and use a counter to generate _t0, _t1...
• It seems wasteful, but we can remove unused names later.
```
counter = 0
function names(node current)
    if current is a leaf
        current.output = variable name
    else
        current.output = "_t" + counter++
        names(current.left)
        names(current.right)
names(root)
```

# 12. Emitting instructions

• Emit one 3-address instruction per node.
• Choose the order so that sources are evaluated before targets.
• This is just a post-order traversal of the tree (ignoring inherited values).
```
function process(node current)
    if current is not a leaf
        process(current.left)
        process(current.right)
        output current.name
        output "<-"
        output current.left.name
        output current.operator
        output current.right.name
process(root)
```

For x + y*z this produces:

```
t2 <- y * z
t1 <- x + t2
```
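Putting the two passes together, here is a self-contained C++ sketch; the Node layout and function signatures are my own assumptions, following the pseudo-code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type: leaves hold a variable name,
// interior nodes hold a binary operator.
struct Node {
    std::string op;    // operator for interior nodes, empty for leaves
    std::string name;  // variable name (leaf) or generated temporary
    Node *left = nullptr, *right = nullptr;
    bool isLeaf() const { return left == nullptr; }
};

// Pass 1: generate a fresh temporary name for each interior node.
void names(Node *cur, int &counter) {
    if (cur->isLeaf()) return;  // leaves keep their variable name
    cur->name = "_t" + std::to_string(counter++);
    names(cur->left, counter);
    names(cur->right, counter);
}

// Pass 2: post-order emission, so operands are written before they are read.
void process(Node *cur, std::vector<std::string> &out) {
    if (cur->isLeaf()) return;
    process(cur->left, out);
    process(cur->right, out);
    out.push_back(cur->name + " <- " + cur->left->name + " " +
                  cur->op + " " + cur->right->name);
}
```

For the tree of x + y*z this emits the multiplication first, then the addition, matching the ordering constraint above (the exact temporary numbers depend on the counter scheme).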

# 13. Inherited Attributes

• The pseudo-code walks the tree as a post-order traversal.
• And it processes children left-right.
• So it is already suitable for handling L-attributed grammars.
• The pseudo-code (for translation) showed a single uniform case for all tree nodes.
• Q: What if we need inherited attributes in the Semantic Values?
• A: Add another tree-walk in between the parser and the translation.
• Use the uniform case as a base-case (for the node class).
• Sub-type specific behaviour in the parent node for inherited attributes.
• Call the walk on the left-child, read the created value.
• Pass the value to the call on the right-child (specific behaviour).
• Writing individual tree-walks is very modular.
• Using this scheme simplifies the parser, allows easy semantic analysis and then simple translation into 3-address instructions.

# 14. Constants

• We have two different options for how to handle constants:
• They can act as their own name.
• e.g. t3 <- x + 1
• When the operand is converted to assembly language it is inserted directly as an immediate value, e.g. LOAD #1, R1
• Or we can place the constant into the symbol table.
• Gives it a name, store that name in the node.
• Now uniform with the other cases.
• Ultimately the choice depends on whether the type is atomic or complex.
• e.g. constants for objects/aggregates are easier to process in the symbol table.

Intermission

# 15. Building the IR

• In the first part we saw how to build basic blocks from expression trees.
• Recursive (family of) function(s) to walk the tree.
• Allocate names first, append instructions to a list.
• These sequences are the contents of basic blocks.
• Next we look at the parse-tree around the expression sub-trees.
• The parts that correspond to statements in the source language.
• Assume that the previous psuedo-code has been implemented in:
• string convert(Expr* tree, BBlock *target);
• The return value is the final name computed (the root of the subtree).
• The BBlock includes a sequence container for instructions.
• How do we process the non-expression nodes in the parse-tree?

# 16. Translating assignment statements

• In the source we have x := E where E is an expression.
• Let's assume that we parsed this into the tree shown.
• When we call convert on the tree, it:
• Fills instructions into our current block.
• Produces the name (address) of the result of evaluating E.
• So all we need to do is copy this value into x.
• Copy is a unary operation (one operand).
• x <- copy resultE, where resultE is the name returned for E.
• This all happens within the same basic block;
• A sequence of assignment statements will occupy the same basic block.
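A minimal sketch of this step, assuming the convert interface from slide 15; the convert body here is a hard-coded stand-in (the real version walks the expression tree):

```cpp
#include <cassert>
#include <list>
#include <string>

// Minimal BBlock stand-in: just the instruction sequence, as text.
struct BBlock {
    std::list<std::string> instructions;
};

// Stand-in for convert(Expr*, BBlock*): emits fixed instructions for
// x + y * z and returns the name holding the result.
std::string convert(BBlock *target) {
    target->instructions.push_back("_t1 <- y * z");
    target->instructions.push_back("_t0 <- x + _t1");
    return "_t0";
}

// Translating "x := E": translate E into the current block, then copy
// its result into x. No new blocks are needed.
void translateAssign(const std::string &lhs, BBlock *current) {
    std::string resultE = convert(current);  // name of E's value
    current->instructions.push_back(lhs + " <- copy " + resultE);
}
```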

# 17. The other kinds of statements

• The rest of the statements are more complicated.
• As well as looking at how to construct the basic blocks...
• ...we must look at how to connect them together into a CFG.
• Each of the statement types that modify control flow:
• If-then-else
• While
• (anything else that is a sugared form of these)
• All involve steps to modify both blocks, and the CFG.
• What do we need to represent a basic block / CFG?

# 18. Representation of basic blocks

• A basic block contains a sequence of 3-address instructions.
• These instructions imply the contents of part of a symbol table.
• There may be 0,1 or 2 outgoing edges.
• A rough sketch would be:
```cpp
class BBlock {
    std::list<Instruction> contents;  // the sequence of 3-address instructions
public:
    BBlock *trueExit, *falseExit;
    int numExits;
    ...
    std::set<std::string> symbols();
    std::string lastValueWritten();
    ...
};
```

# 19. Comparison to assembly language

• If you have spent any time writing code in assembly then you will be used to labels and jumps.
• We try to avoid schemes that involve explicit labels and jumps.
• They work out messier and more complex than schemes using blocks.
• Blocks correspond to cutting a program into pieces wherever there is a label or a jump.
• Up to two jumps at the end of a block.
• Non-conditional jumps finish a block immediately.
• Labels always start a new block: merges together at least two control flows.

# 20. Outline of statement processing

• We want a common interface to translation of statement nodes:
• Assume that we always have a "current" block that we are building.
• We can add instruction into this block.
• If we need control-flow then we must make new blocks.
• Decide which block is "current" at the end.
• This should fit into a recursive scheme to process the parse-tree.
• At times it will call the expression translation on sub-trees.
• Now we can fill in this outline for:
• Sequences
• If-then-else
• While

# 21. Translation of sequences

• The description of assignment earlier already fitted the outline.
• No new blocks, append instructions into the current block.
• Consider a sequence of assignment statements.
• They should continue to fill up the same basic block.
• To translate a sequence we simply call the translation on the nodes below.
• As long as they update the current block at the end, it works.

# 22. Translation of If-then-else

• Consider the example with simple statements inside the then and else branches.
• The intention is to select one of the two branches to execute.
• Ultimately this will be chosen by a conditional jump instruction.
• There are at least four blocks involved in this structure.
• A common initial block; evaluate the condition in the boolean expression.
• A block to execute on the true branch.
• A block to execute on the false branch.
• The following block where control-flow merges after either case.
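One way to wire these four blocks together, sketched in C++; the helper functions and the block-linking scheme are assumptions, not the text's definitive implementation:

```cpp
#include <cassert>

// Sketch of CFG construction for if-then-else, assuming a BBlock with
// up to two outgoing edges.
struct BBlock {
    BBlock *trueExit = nullptr, *falseExit = nullptr;
};

// Hypothetical stand-ins for the real translators.
void translateCondition(BBlock *) {}                    // emits the comparison
BBlock *translateStmt(BBlock *entry) { return entry; }  // returns its final block

// Returns the "current" block after the whole construct: the join block.
BBlock *translateIfThenElse(BBlock *current) {
    translateCondition(current);   // condition evaluated in the entry block
    BBlock *trueBlk  = new BBlock; // block for the true branch
    BBlock *falseBlk = new BBlock; // block for the false branch
    BBlock *join     = new BBlock; // control-flow merges here afterwards
    current->trueExit  = trueBlk;
    current->falseExit = falseBlk;
    BBlock *trueEnd  = translateStmt(trueBlk);  // branches may grow more blocks
    BBlock *falseEnd = translateStmt(falseBlk);
    trueEnd->trueExit  = join;     // unconditional exit: a single edge
    falseEnd->trueExit = join;
    return join;                   // join becomes the new current block
}
```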

# 23. Translation of while loops

• An if statement with no else-branch simply forwards the other edge to the block that rejoins control flow.
• A while loop is simply an if-statement that repeats; instead of the true-block rejoining it jumps back to evaluating the condition.
• NOTE: The diagram is a do-while loop.
• Discuss: How do we move the arrows to form a while-loop?
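As one possible answer to the discussion question, a sketch that builds a genuine while-loop (condition tested before the body); the helpers are hypothetical stand-ins:

```cpp
#include <cassert>

// Same minimal BBlock shape as for if-then-else.
struct BBlock {
    BBlock *trueExit = nullptr, *falseExit = nullptr;
};

void translateCondition(BBlock *) {}                    // emits the comparison
BBlock *translateStmt(BBlock *entry) { return entry; }  // returns its final block

// While loop: a fresh block re-evaluates the condition, and the body's
// exit jumps back to it instead of on to the join block.
BBlock *translateWhile(BBlock *current) {
    BBlock *cond = new BBlock;   // condition gets its own block (a jump target)
    BBlock *body = new BBlock;   // loop body
    BBlock *join = new BBlock;   // executed once the condition fails
    current->trueExit = cond;    // fall through into the condition
    translateCondition(cond);
    cond->trueExit  = body;
    cond->falseExit = join;
    BBlock *bodyEnd = translateStmt(body);
    bodyEnd->trueExit = cond;    // back edge: repeat the test
    return join;                 // join becomes the new current block
}
```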

# 24. Example of de-sugaring

• There is a simple equivalence between C-style for-loops and while.
• We can use this to "de-sugar" the for-loop into a while-loop.
• The transformation can operate on trees or CFGs.
• If we want to simplify the IR translation then it makes sense to rewrite the parse-tree.
• for(A;B;C) D; => A; while(B) { D; C; }
• Desugaring reduces the number of different cases to process in following phases.

# 25. Nesting

• Examples so far have used one block "inside" each of the constructs.
• e.g. while(x<y) x:=x+1;
• We need to handle the parse-tree recursively during translation.
• We must be able to nest these constructions inside one another.
• e.g. while(x<y) if x%2==1 x:=x+1 else x:=x*2
• Thankfully this turns out to be quite easy.
• The structures shown produce graphs that have one entry and one exit point.
• A single block has one entry and one exit point.
• So they are already similar...

# 26. Nesting

• The interface to building a piece of the graph...
• ...is walking the parse-tree to translate a node.
• This starts filling the current block.
• Adds new blocks as necessary.
• Ends by updating / returning the current block.
• If we allow empty blocks in the CFG then this just works.
• The red grouping has the same structure as a single block.
• Note to self: it's actually a while loop this time.

# 27. Final Thought

• There are bits that did not fit naturally into this presentation.
• We have some holes to fill in later:
• Boolean expressions (and the true/false exits). [Lecture 10]
• Function calls. [Lecture 11]
• DAGs, translation into them and their relation to SSA. [Lecture 9]
• Topological sorts. [Lecture 9]
• We've seen most of how to build the IR from the parse-tree.
• Some details are better to explain in a later context.