DV1465 / DV1505 / DV1511:
Compiler and Interpreter Technology
13:15 Friday 8th April, 2016
Runtime memory organisation.
Table of Contents
- Data and Memory
- Issues in the runtime
- Runtime techniques
- Procedure Activations
- Stack Organisation
- Call Sequences
1. Roadmap type stuff
- In the first half today:
- Issues and techniques in the runtime.
- How we use memory, and how we present it to the programmer.
- Second half is about stacks:
- Procedure calls, activation trees.
- Stack organisation and frames.
- Procedure call mechanism.
2. The Program and the Machine
- The running program and the machine have a fundamental disagreement about what memory is.
- The program sees a sparse, dynamic, hierarchical namespace.
- The machine provides a dense, static, flat address-space.
- The runtime system needs to support the abstraction that the language gives to the programmer.
3. Runtime issues: naming
- The data-model that the programmer uses in the program is a namespace.
- Use names instead of addresses to access data.
- Decrease the defect-rate by making the problem (of programming) easier.
- Namespaces can be flat or hierarchical.
- Compound names give more useful abstractions (structures, objects etc).
- Namespaces can be modelled as dictionaries, or as trees of dictionaries.
- Dictionaries are a sparse, unordered form of storage (their flexibility makes them expressive).
- An address-space is basically an array; implementation issue.
4. Runtime issues: change
- Data is variable: if it doesn't change we use a constant.
- Memory is mutable: changes inside a datum are easy.
- But we have to handle changes to the collection of data.
- Consider a flat namespace: a single dictionary.
- We can change it in two ways:
- Overwrite values under the same keys.
- Add / Remove pairs of keys and values.
- Both of these kinds of changes are awkward in different ways.
- What if a value update changes its size?
- If we change the key-space how does it alter the address-space?
5. Runtime issues: lifetimes
- Details of memory allocators are in previous courses.
- Summary: the performance / density trade-off is complex.
- It depends heavily on access patterns.
- Language Design is influenced by those results.
- The lifetime of a value is the time between creation and disposal.
- The more that we constrain lifetimes, the more efficient the allocation that we can exploit.
- If every value has an independent lifetime then it is as hard as general memory allocation.
- Can we group lifetimes together: reduce the work?
- Can we nest lifetimes inside one another: reduce fragmentation...
6. Runtime issues: fixed / static / dynamic
- Fixed properties are constant.
- Static properties can be calculated at compile-time.
- They are defined by the source, same for every execution of the program,
- Dynamic properties are data-dependent.
- They can vary in different executions of the same source - they are only known at runtime.
- If we compare the flexibility of the namespace with the mutability of the values:
- Cases that need a memory allocator.
- Cases where we can do better (faster).
7. Runtime techniques: precompute offsets
- When we know the set of names, and the value sizes at compile-time:
- Arrange the values in a table.
- The order doesn't matter - pick any.
- The sizes tell us the positions of the values that follow.
- The positions are offsets relative to the beginning.
- Replace each name with the offset.
- We've precomputed the relative addresses of the values.
- Resolved the dictionary-lookup at compile-time.
- The cost of access is zero.
8. Runtime techniques: managing the heap
- The heap is the data-segment in the process address space.
- It is defined as a start address and an extent.
- The process can ask for more by calling brk() to change the extent.
- This is a single dense array.
- There are many memory allocation algorithms used to implement new/delete or malloc/free on top of the heap.
- If we allow variable sized blocks: fragmentation is a problem.
- If we fix the size of blocks: wasted memory and hardcoded limit in the language.
- We need a "least worst" solution (dlmalloc is popular).
- What does using the heap cost?
- At least one pointer dereference per access (~400 cycles on an L3 cache-miss).
- Sparse organisation of blocks will impair cache performance.
- Unknown latency for allocation.
9. Runtime techniques: partition the space
- Optimise for different behaviour:
- Heap for arbitrary graph-like data.
- Fixed allocation for globals.
- Stack for well-nested data (lifetimes correlate with activations).
- Guarantee zero fragmentation for the data with the fastest churn.
- Provide flexibility for data with unpredictable lifetimes.
Testing for real
- Programmers like using arbitrary graph-like datatypes: e.g. objects/references.
- Exploiting static properties of data allows much higher performance.
- Very difficult to measure the difference in isolation.
- Rough "rule of thumb": 5-10x difference in performance.
10. Runtime techniques: handles
- When we have a variable-size piece of data:
- The position and size in memory are both dynamic.
- We may need to move it around in order to resize.
- Handles are a technique to split the data into two pieces.
- A fixed-size piece called: reference, pointer, handle etc.
- A second variable-size part.
- The handle can be passed about more efficiently.
- Comes with a technique to find the dynamic part.
- Pointers are a cheap implementation technique.
- Handles create an aliasing problem:
- If we copy the fixed part, we have two references to the variable part.
- This causes much woe in the world.
11. Runtime techniques: handles
- Pointers are one implementation of handles, there are others.
- References are a more hygienic version of pointers.
- Arithmetic operations are disallowed: references can be created and copied, but never forged.
- The only way to alias the target of a reference is to copy the reference.
- If we track copy operations explicitly then we can control aliasing.
- Many different implementations: smart pointers (e.g. boost).
- Control of aliasing means that we can check for many common errors.
- Double freeing a pointer (crashes a system built on raw pointers / malloc).
- Bounds checking.
- Automatically tracking lifetimes, automatically freeing memory...
12. Runtime techniques: garbage collection
- If handles are hygienic then we can automate garbage collection.
- Data that can be reached from a root is still valid.
- Data that is unreachable is garbage - deallocate it.
- This doesn't work on pointers - can build an arbitrary address.
- Will stop memory leaks in the target code.
- As long as we invalidate (delete) the handles.
13. Runtime techniques: garbage collection
- Reference counting: hide a counter in each object.
- Increment for each handle creation / copy.
- Decrement for each handle deletion.
- Problem: overhead per operation (slow).
- Problem: can't handle cycles (graphs).
- Mark and sweep: hide a flag in each object.
- Every so often stop the world.
- Wipe all the flags.
- Follow the edges from every root, setting flags to indicate reachability.
- Delete every unreachable object.
- Problem: unpredictable pauses in execution.
- Pauseless: break mark and sweep into incremental pieces.
- Realtime: current research area.
14. Summary of runtime
- We've only looked at storage issues in the runtime.
- Tends to be the most complex part of the language runtime.
- Generally the I/O layer is quite thin:
- Design the I/O semantics in the language to match the OS.
- Very wide range of possibilities:
- Static language features can be used to dramatically increase performance.
- Static languages are awkward to work in.
- Dynamic language features can dramatically increase productivity.
- But - while dynamic properties improve "writability" of a language, they impact the "readability".
- We've largely divided the world into two: heaps and stacks.
- Stacks are covered in the next part along with procedure calls.
15. Procedure calls
- Motivation: allow modularity in programs.
- Problem: divert control-flow temporarily.
- Continue at the same point after the call.
- The call must not corrupt local data (state).
- Arguments must be specified to define the sub-problem to work on.
- Results must be communicated back to the caller for the sub-problem.
- This is a form of resource sharing: similarity to paging or context-switching.
- Activation of a procedure makes it the current user of the machine.
- Calling another procedure (method or function) makes it active.
16. Activation Trees
- Run the program on some particular data.
- Record the activations in that execution.
- The activations form a tree; each parent node called its children nodes in sequence.
- i.e. ignore parallelism and asynchronous calls.
- Each edge represents parameters in the call, and results being returned.
function fib(n)
  if n < 3 then return 1
  else return fib(n-1) + fib(n-2)
  end
end
-- some initial call: fib(4)
17. Activation Trees as traces
- Emphasis: This shows the structure of one particular execution of the program.
- Not the structure of the program source.
- High-level trace of one execution of the program.
- If we change the data then we get a different call sequence; draw a different tree.
- So this is a record of the execution path in a program.
- We are using very simple example programs.
- Real applications have very complex activation trees.
18. Paths in the tree are stacks
- At any point during execution the program has a current location in the Activation Tree.
- Each node is one activation.
- The path to a node is the set of saved activations that will continue.
- In a walk across the tree we only need to rewrite the end of the path, one item at a time.
- Hence using a stack to represent program location (cheapest structure on the machine).
19. Activation Records (general theory)
- What needs to go in each record; how do we walk across the Activation Tree?
- Must resume suspended activations:
- Local storage
- Machine state.
- Must cross the tree edges in two directions:
- Parameters (going down)
- Returns (going up)
- Each record is a variable size:
- So we need a pointer (to the previous record) to be able to pop.
20. Activation Records (SYS V on x86_64)
- Frame pointers simplify management of variable length structures.
- Can be disabled to gain RBP as a free register.
- The first six integer or pointer parameters are passed in registers.
- RDI, RSI, RDX, RCX, R8, R9 (in that order).
- Any more spill into the stack frame.
- As well as the 8-byte alignment of individual slots, each frame is aligned on a 16-byte boundary.
- Red zone is 128 bytes of scratch space (safe from interrupts).
- R10, R11 are the caller's responsibility (save them before making a call if needed):
- RBX, RBP, RSP are the callee's responsibility (save if used).
- RSP can be recalculated from the frame pointer.
21. Example Call Sequence
- Example is both a callee and a caller.
- This lets us examine the steps in the whole sequence.
- Compiler will optimise away the bits we want to see.
- So gcc -march=x86-64 -O0 dummy.c -S
- f1 is active as the last frame on the stack.
- At this point in time it looks like a leaf in the activation tree.
- Small symbol tables live in the red zone, but this one is bigger...
long f1(long a, long b, long c, long d)
    long x=a*b, y=c*d;

pushq %rbp
movq %rsp, %rbp
22. Example Call Sequence
- When the symbol table overflows the red zone, or the function is a caller, it needs to make the space explicit (move RSP).
- RSP is tracking the end of the stack frame.
- RBP is used to index into the frame.
- We can use the multiplications to work out where we are relative to the source.
- Work backwards to associate the offsets with symbols.
subq $48, %rsp
movq %rdi, -24(%rbp)
movq %rsi, -32(%rbp)
movq %rdx, -40(%rbp)
movq %rcx, -48(%rbp)
movq -24(%rbp), %rax
imulq -32(%rbp), %rax
movq %rax, -8(%rbp)
movq -40(%rbp), %rax
imulq -48(%rbp), %rax
movq %rax, -16(%rbp)
23. Example Call Sequence
- Put the symbols into registers to match the calling convention.
- The call pushes the return address onto the stack (through RSP).
- So we are back to the same state as two slides ago (callee needs to execute their prologue).
- Cleanup will recover the old frame: RSP = RBP, RBP = *RBP, and the pop leaves RSP at old RBP + 8.
- The ret pops the return address.
- Results are normally in RAX.
movq -16(%rbp), %rdx
movq -8(%rbp), %rax
movq %rdx, %rsi
movq %rax, %rdi
24. More complex frames
- The prologue can be a lot more complicated.
- Non-trivial sizes need to be calculated.
- Arrays need space to be allocated.
- This can depend on the result of previous calculations.
- The variable-length array prologue wouldn't fit on the slides.
- Removing the frame pointer is a common optimisation.
- This frees up a register: win for dense arithmetic code, can prevent a spill.
- Makes symbol access more complex.
- If we need the offset from RSP it could be non-constant...
- Can make some code slower: complex trade-off.
25. Security Issues: return addresses
- The stack contains both code and data.
- The code is explicit: return addresses affect the control-flow.
- If we can overwrite a return address then we can jump somewhere the programmer did not intend.
- When data and code are separated it is impossible for a data access to redirect control-flow.
- It can only affect it indirectly through decisions in the program.
- When they mix - we must be very careful.
- Any unsanitised data is dangerous.
- Using handles only (no arrays outside the heap) offers some protection.
- Setting noexec on the stack prevents injection of arbitrary code.
26. Security Issues: buffer overruns
- Problems on the stack are a consequence of performance.
- We could check every array access to prevent overruns.
- But this is slow...
- There are other ways to abuse the lack of bounds checks.
- If we can influence index expressions we can write to arbitrary addresses.
- It doesn't matter if the array is on the stack or not.
- The program has permissions over its own address space.
- This is a much harder class of exploit to get working.
- It requires detailed knowledge of the address-space of the process.
- We can gain this through debugging if there are constants between executions.
- Address-space randomisation makes this harder (impossible?).
27. Summary
- We've reached the end of the course!
- We've seen a complete end-to-end compilation of (some of) Lua into x86_64.
- So we've seen a taste of how the subject applies to the real world.
- Hopefully you've learnt lots of transferable knowledge.
- Parsing is ugly - but useful.
- Embedded languages, file formats, entire language tools.
- Dirty details of some of the ABI on a modern machine.
- Intermediate Representations.
- Has a lot of application in understanding how your programs behave in other people's compilers.
28. Final Thoughts
- But this is a huge area; there is enough material for several courses.
- There are many interesting starting points for independent study.
- Static Analysis, Types and safety.
- Error checking and recovery.
- Program Analysis: recovering the behaviour.
- There are powerful tools to extend what you've learnt.
- LLVM - Production-strength IR for C / C++ style languages.
- Please give feedback - course evaluations will be at the end of LP4.