# Real-Time Operating Systems

08:15-10:00 Tuesday, September 20th, 2016

The Memory Subsystem.

§3-3.3.1 (pg 181-198)

Topics: Memory Issues · Allocation Techniques · Swapping · MMU · Allocation

# 1. Introduction

• Programmers see a simple abstraction of memory.
• It can be cut up into many pieces.
• Pointers link pieces together into data-structures.
• The interface to the O/S is simple: ask for some memory, agree to give some back.
• Ideally: large, fast and cheap.
• In reality: too slow, too small and too expensive.
• The memory manager abstracts the memory hierarchy and provides a simple interface.

# 2. No Abstraction: Physical Addressing

• Without a memory manager there is no abstraction: physical addresses.
• Program sees the entire physical address space (partly reserved for OS).
• Addresses in the program are sent to the memory chips: e.g. LOAD $23, R1
• No way to do multiprogramming; only one program in memory.
• Programs use hardcoded addresses in physical memory.

# 3. Types of memory management

• In this chapter we look at two kinds of memory manager:
• Partitions (with swapping): older, simpler scheme; simple hardware; fixed size programs.
• Virtual Memory (with paging): newer, more complex scheme; requires an MMU; different program sizes.
• The newer scheme reuses some parts of the simpler scheme.
• First we look at the simpler approach.
• Look at memory allocation and the fragmentation problem.
• Introduce the MMU hardware.
• Then look at the more complex approach.

# 4. Co-existence is difficult

• How do we run multiple programs?
• Timesharing is easy (we look at swapping later).
• Putting both programs in memory at the same time is harder.
• If we look inside the code for a typical program:
• There is an asymmetry between code and data.
• When we access data the address is normally calculated.
• When we access code the address is normally constant.
• Moving the data to new addresses is easier: just insert some extra calculation.
• Moving the code to new addresses is harder...

# 5. The Relocation Problem

• The asymmetry in the instruction set makes it harder to move the code.
• Each jump target is a hardcoded number inside the instruction.
• If we try to put the code at new addresses, it still uses the original ones whenever it jumps.
• Left: two programs using addresses 0-16K.
• Right: the white program copied to higher addresses.
• These two programs would not last long before crashing.
• Digging the addresses out of the instruction encoding is surprisingly hard.
• Do something else instead.

# 6. Address Spaces: Concept

Address Space: a mapping from labels (keys) to values (normally numbers).

• Think of an address space as a map (or "dictionary") data-type.
• The key-space can be sparse, e.g. domain names.
• The value-space is a dense range, e.g. physical addresses.
• This is a level of indirection in between the program and the machine.

Virtualisation Analogy: processes appear to use a private CPU; the address-space "virtualises" memory.

• Each program can have its own private address range.
• These can overlap, e.g. 0x1000 can be used in both program A and program B.
• At runtime: map the private addresses onto the global physical range.

# 7. Address Spaces: Base and Limit

• Simple implementation of the concept:
• The base register holds an offset into memory.
• The limit register holds the size of the memory.
• Very simple dynamic scheme inside the CPU.
• For every instruction that is executed:
• Take any address that is used: x.
• If x >= limit, generate an error.
• Otherwise x' = x + base: add the offset.
• Use x' as the physical address.
• Same example as earlier:
• The address 28 is still hardcoded.
• The CPU jumps to address 16384+28=16412.
• This check is small enough to sketch directly, below.
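A minimal C sketch of the base+limit check, assuming the 16K partition from the example. The function name and the software implementation are ours; real hardware performs this inside the CPU on every access.

```c
#include <stdint.h>
#include <stdio.h>

uint32_t base  = 16384;  /* partition start: 16K            */
uint32_t limit = 16384;  /* partition size: 16K of memory   */

/* Returns the physical address, or 0 on a limit violation. */
uint32_t translate(uint32_t x)
{
    if (x >= limit) {             /* address outside the partition */
        fprintf(stderr, "fault: %u out of range\n", x);
        return 0;                 /* real hardware would trap here */
    }
    return x + base;              /* relocate by the base offset   */
}

int main(void)
{
    /* The hardcoded jump target 28 from the slides becomes 16412. */
    printf("%u\n", translate(28));
    return 0;
}
```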
# 8. Address Spaces: Swapping

• The base and limit scheme allows multiple processes.
• But the memory is partitioned: fixed sizes cause wasted space.
• Each partition must be as big as the program might need.
• During the execution of one program: no other memory is used.
• Swapping moves programs out of memory when they are not running.
• Example lifecycle of a process (only in memory when running):
• a) Process A is loaded.
• b) B is loaded while A runs.
• c) A is written to disk: "swapped out".
• d) The memory for A is freed.
• g) Next execution of A: a different address.

# 9. Address Spaces: Swapping

• If the set of processes is scheduled forever: ABCDABCD...
• Allocation and freeing is always in the same order.
• Always one big hole looping through the system.
• When processes exit in a different order: a new hole appears.
• e.g. in diagram c), if processes A and C both exit before B.
• Fragmentation: multiple holes with different sizes.
• The free space is split into fragments throughout memory.
• Compaction is one solution: move everything down.
• Much like defragging a disk: a very slow process.
• Problem: base+limit is a single consecutive range per process.

# 10. Address Spaces: Swapping

• There is a similar problem if a process asks for more memory.
• If there is no adjacent hole: need to rearrange memory - slow.
• Avoid the problem by leaving some extra space.
• The unused space does not need to be loaded/saved - just counted.
• When swapping a process in, if the slack is too small it can be increased cheaply (no rearranging).
• Generally we leave slack for both the heap and the stack to grow into.

# 11. Representing Memory Allocation

• The system doesn't track free memory in bytes: too costly.
• Choose an allocation unit size, e.g. 4KB, and track which units are used/free.
• Example: 5 processes occupying units, three holes (free memory).
• Two approaches for representing this allocation state:
• An array of bits (bitmap) - one bit per unit.
• A linked list of records.

# 12. Representing Memory Allocation

• The allocator needs to answer calls: alloc(n) with a large enough hole.
• Bitmaps are always a fixed size: $$\frac{\text{memory size}}{\text{unit size}}$$ bits.
• Finding free holes big enough to allocate into requires a scan.
• A linked list holds the same information in a different form.
• Different trade-offs for fast/slow operations.
• It is a variable length structure: the implementation is more complex.

# 13. Memory Management: Fragmentation

• When processes and holes are kept in a list it is easy to find neighbours.
• There are four possible cases for the neighbouring segments.
• In a) the process is replaced by a new hole.
• In b) / c) the new space extends a neighbouring hole.
• In d) two existing holes are merged into one larger hole.
• When the machine is busy, case a) is statistically more likely.
• This increases fragmentation: lots of uniquely sized holes.
• e.g. 20 units free as 1+3+1+7+4+4: an 8-unit request cannot be satisfied.

# 14. Searching for free blocks

• In order to allocate a segment of memory, the manager must find a large enough hole.
• When the representation is a linked list this requires scanning the list.
• The search can be performed several ways.
• The examples on the right show an allocation of 18 units.
• First fit: find the first hole large enough.
• Next fit: find the next hole large enough (remember where the last scan finished).
• Best fit: find the hole that wastes the least space (same as first fit in the example).
• Worst fit: find the hole that wastes the most space.

# Break (15mins)

Intermission

# 15. Searching for free blocks

• Each algorithm encodes assumptions:
• First fit - minimise the amount of searching to do.
• Next fit - don't repeat the search already done.
• Best fit - minimise the wasted space.
• Worst fit - leave the most useful hole behind.
• These assumptions depend on a complex tradeoff between memory and time efficiency for the (unknown) applications: simulate it.
• The list representation, first-fit search and coalescing are sketched below.

Carter Bays. A comparison of next-fit, first-fit, and best-fit. CACM, Vol 20, Issue 3, March 1977. http://dl.acm.org/citation.cfm?id=359453
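To make the list representation and the fits concrete, here is a minimal C sketch of a first-fit allocator over a linked list of segments, with coalescing of a free right-hand neighbour (one of the four cases from slide 13). The names (`seg`, `alloc_first_fit`) and the simplifications (no left-hand merge) are ours.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* One record per segment, kept in address order (the list
 * representation from slide 12). Sizes are allocation units. */
struct seg {
    bool        free;
    unsigned    start, len;
    struct seg *next;
};

/* First fit: scan for the first hole with len >= n, and split
 * off any unused tail as a smaller hole.                      */
struct seg *alloc_first_fit(struct seg *list, unsigned n)
{
    for (struct seg *s = list; s; s = s->next) {
        if (!s->free || s->len < n)
            continue;
        if (s->len > n) {                      /* split the hole */
            struct seg *tail = malloc(sizeof *tail);
            tail->free  = true;
            tail->start = s->start + n;
            tail->len   = s->len - n;
            tail->next  = s->next;
            s->next     = tail;
        }
        s->free = false;
        s->len  = n;
        return s;
    }
    return NULL;                               /* no hole fits   */
}

/* Freeing: mark the segment free, then merge with a free
 * right-hand neighbour; a full version would also merge with
 * the left-hand neighbour (cases b/c/d from slide 13).        */
void seg_free(struct seg *s)
{
    s->free = true;
    if (s->next && s->next->free) {
        struct seg *dead = s->next;
        s->len += dead->len;
        s->next = dead->next;
        free(dead);
    }
}

int main(void)
{
    /* One 64-unit hole covering all of memory. */
    struct seg *list = malloc(sizeof *list);
    *list = (struct seg){ .free = true, .start = 0, .len = 64 };

    struct seg *a = alloc_first_fit(list, 18); /* slide 14's 18 units */
    printf("A at unit %u\n", a->start);
    seg_free(a);                               /* coalesces to 64 again */
    printf("hole is %u units\n", list->len);
    return 0;
}
```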
# 16. Memory Efficiency

• The memory manager uses a block-size for allocations, e.g. 16kb in the example.
• An older process (C) was allocated at address 64kb.
• A hole existed between 0-63kb.
• Process A is to be allocated 40kb.
• Round up to the block size: 3 blocks = 48kb.
• 8kb is lost to rounding, inside a block.
• This is called internal fragmentation.
• The 64kb hole is filled with 3 blocks, leaving a hole of 1 block.
• The 16kb hole is a full block, but can only be used for a small process.
• This is called external fragmentation.

# 17. Tradeoffs in memory allocation

• Internal and External Fragmentation occur in many systems.
• Any time we need to cut a divisible resource into variable-sized pieces:
• File-systems, memory partitions, run-time memory-management (e.g. new/delete in the standard C++ library).
• Internal fragmentation can be reduced by using a smaller block-size.
• But this increases external fragmentation.
• Finer granularity in allocation sizes means more uniquely sized holes.
• External fragmentation is a disaster: compaction means "stop the world".
• Internal fragmentation is wasteful, but not catastrophic.
• To avoid external fragmentation we can tolerate reasonably low internal fragmentation.
• We can ask: can we approximate the optimal layout within a constant factor?

# 18. The Buddy System

• If we round allocations up to the nearest power of two...
• ...the internal fragmentation can be no more than 50%.
• There are fewer powers of two than arbitrary request sizes.
• Trade some internal fragmentation for higher performance.
• Start with a single piece of memory.
• Cut it in half repeatedly until we reach the rounded-up size.
• When a request can be satisfied with an existing piece it is fast.
• When memory is freed we only need to check whether it can coalesce with its buddy.

# 19. The Buddy System

• When blocks are split it is always into two pieces.
• We can draw the state of the allocation as a tree.
• The root is the entire memory range.
• If a block is split we colour the node and add two children.
• Unsplit nodes point directly to their region of memory.
• Each level of the tree contains nodes of the same size.
• Nodes can be linked into lists (the blue lines).
• A node can be reached either via the tree (for free / coalesce) or via a list (for allocation).
• No scanning is necessary for allocation.
• Only splitting blocks requires work: very fast.
• The size rounding and the buddy calculation are sketched below.
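The two calculations that make the buddy system cheap, rounding a request up to a power of two and locating a block's buddy, fit in a few lines of C. The function names are ours; a real allocator would wrap these in the per-level free lists described above.

```c
#include <stdio.h>

/* Round a request up to the next power of two (slide 18). */
unsigned round_up_pow2(unsigned n)
{
    unsigned size = 1;
    while (size < n)
        size <<= 1;
    return size;
}

/* A block's buddy is the neighbour it was split from: flip
 * the bit of the block size in the block's offset.          */
unsigned buddy_of(unsigned offset, unsigned size)
{
    return offset ^ size;
}

int main(void)
{
    /* A 40-unit request becomes a 64-unit block: <50% waste. */
    printf("%u\n", round_up_pow2(40));      /* 64 */

    /* The 64-unit block at offset 128 coalesces with its     */
    /* buddy at offset 192, and vice versa.                   */
    printf("%u\n", buddy_of(128, 64));      /* 192 */
    printf("%u\n", buddy_of(192, 64));      /* 128 */
    return 0;
}
```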
# 20. SLABs

• If we allocate many small regions, each will waste some memory (internal fragmentation).
• Worst case in the example: a request of $$2^n+1$$ rounds up to $$2^{n+1}$$, wasting almost half.
• If they are all the same size we can manage better than this.
• e.g. one kind of common kernel data-structure.
• A more efficient approach is to insert an extra layer (the slab) before the memory allocator.
• The slab manages a pool (array) of fixed size elements.
• Zero internal fragmentation in the slab.
• Allocating/freeing items in the slab is very fast (never any splitting).

# 21. Linux Physical Allocation

• A 4kb block of physical memory (in the global address space) is called a page frame.
• Slabs and user data sit inside allocations from this page-allocator.
• Doubly-linked lists of free blocks, one per power-of-two block size.
• An array of pointers into the free lists.
• Lower addresses (below 16MB) have special rules (ZONE_DMA).
• The rest is ZONE_NORMAL.
• Data-structures used inside the kernel are stored in slabs called object caches.
• Provides fast memory management for the kernel.

# 22. Overlays

• Not all of the code in a program runs at the same time.
• We can take advantage of this to write programs bigger than the available memory.
• Idea: only load the parts of the program that are currently needed.
• Swap independent parts into the same region of memory.
• Similar to swapping, but with small pieces of a program's code, not the whole memory image.
• Writing good overlay descriptions is tedious and difficult.
• Try to minimise the number of swaps to improve performance.
• Doing this automatically would be better.

# 23. Virtual Memory

• We've now pushed the abstraction provided by dynamic partitions as far as it can go.
• External fragmentation is an issue that cannot be solved completely with partitions.
• Instead we use a technique called Virtual Memory.
• There are two changes to the memory abstraction that we have seen so far:
• The base+limit registers are replaced with a separate mapping from each page to a frame.
• Swapping is performed on each page, rather than an entire process memory space.
• To make this work we need a new piece of hardware in the machine...

# 24. Virtual Memory: MMU

• The Memory Management Unit (MMU) sits between the processor and the memory bus.
• Physically it is often within the same package.
• Logically it is a separate unit, independent of the processor.
• The processor uses a separate address space from the memory.
• The processor address space contains virtual addresses.
• The memory address space contains physical addresses.
• Addresses are split into two parts: high bits = page, low bits = offset.
• The MMU implements a mapping between virtual pages and physical frames.

# 25. Virtual Memory: MMU

• The memory knows nothing of virtual addresses.
• It simply performs read/write operations on physical addresses.
• The program executing on the CPU does not need to know about physical addresses at all.
• It operates entirely within the virtual address space.
• The MMU is a layer of indirection in between, isolating these two components from one another.
• The O/S programs the MMU with the details of the mapping.
• Idea: it is a dictionary mapping on addresses.

# 26. Virtual Memory: MMU Examples

• A program executes movl $3, %eax; movl (%eax), %ebx.
• The virtual address 3 is sent to the MMU as a read request.
• The MMU sees this is page 0 (drop the low 12 bits).
• Page 0 maps onto frame 2.
• The offset within the page is 3: target address is 8192+3=8195.
• Another program tries movl $32780, %eax; movl (%eax), %ebx.
• This is page 8, offset 12 ($$2^{15}=32768$$, so $$32780=32768+12$$).
• The page is unmapped in the MMU.
• It signals the CPU that there is an error... (both lookups are simulated in the sketch below).
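A minimal C simulation of these two lookups. The page-table array, its size, and the names are ours; a real MMU holds the mapping in hardware, programmed by the O/S.

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u      /* so the offset is the low 12 bits */
#define NPAGES    16
#define UNMAPPED  -1

int page_table[NPAGES];      /* virtual page -> physical frame   */

/* Split the address into page and offset, look the page up,
 * and rebuild the physical address (or report a fault).         */
long translate(uint32_t vaddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;   /* drop the low 12 bits */
    uint32_t offset = vaddr % PAGE_SIZE;
    if (page_table[page] == UNMAPPED)
        return -1;                         /* signal a fault       */
    return (long)page_table[page] * PAGE_SIZE + offset;
}

int main(void)
{
    for (int i = 0; i < NPAGES; i++)
        page_table[i] = UNMAPPED;
    page_table[0] = 2;                     /* page 0 -> frame 2    */

    printf("%ld\n", translate(3));         /* 8192 + 3 = 8195      */
    printf("%ld\n", translate(32780));     /* -1: page 8 unmapped  */
    return 0;
}
```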

# 27. Quick sketch of Page Faults

• The book delays page-fault details until § 3.6.2 (lecture 8).
• But logically page faults are part of the execution of the MMU.
• So here is a brief sketch of how they work now.
• There is a hardware interrupt called the page fault.
• This is trapped by the OS, which swaps pages between memory and disk.
• The instruction is then repeated and succeeds (simulated in the sketch below).
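A toy C simulation of the fault-and-retry cycle, in the style of the translation sketch above. The trivial "next free frame" policy stands in for the real work of choosing a frame and reading the page from disk; all names are ours.

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NPAGES    16
#define UNMAPPED  -1

int page_table[NPAGES];
int next_free_frame = 0;

long translate(uint32_t vaddr)              /* as in the MMU sketch */
{
    if (page_table[vaddr / PAGE_SIZE] == UNMAPPED)
        return -1;                          /* page fault           */
    return (long)page_table[vaddr / PAGE_SIZE] * PAGE_SIZE
         + vaddr % PAGE_SIZE;
}

long access(uint32_t vaddr)
{
    long paddr = translate(vaddr);
    if (paddr < 0) {                        /* trap to the OS...    */
        /* ...which would pick a frame and load the page from disk. */
        page_table[vaddr / PAGE_SIZE] = next_free_frame++;
        paddr = translate(vaddr);           /* repeat: now succeeds */
    }
    return paddr;
}

int main(void)
{
    for (int i = 0; i < NPAGES; i++)
        page_table[i] = UNMAPPED;
    printf("%ld\n", access(32780));  /* faults, maps page 8 to frame 0, */
                                     /* then returns 0 + 12 = 12        */
    return 0;
}
```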

# 28. Summary

• Direct use of physical memory causes problems.
• A single dense address range is difficult to manage.
• Programs need to be relocated.
• External fragmentation is a difficult issue.
• Virtual Memory solves this issue.
• The cost is extra hardware to support a new abstraction.
• We look at the details of Virtual Memory in the next lecture.