DV1460 / DV1492:
Real-Time Operating Systems
08:15-10:00 Thursday, September 14th, 2017
The Memory Subsystem.
§3-3.3.1 (pg 181-198)
Table of Contents
Memory Issues | Allocation Techniques | Base+Limit | Tradeoffs | Swapping | MMU | Allocation
1. Introduction
- Programmers see a simple abstraction of memory.
- It can be cut up into many pieces.
- Single global address space.
- Pointers link pieces together into data-structures.
- The interface to the O/S is simple:
- Ask for more memory.
- Agree to give some back.
- Ideally: large, fast and cheap.
- Really: too slow, too small and too expensive.
- The memory manager abstracts the memory hierarchy and provides a simple interface.
2. No Abstraction: Physical Addressing
- Without a memory manager there is no abstraction: physical addresses.
- Program sees the entire physical address space (partly reserved for OS).
- Addresses in the program are sent to the memory chips: e.g. LOAD $23, R1
- No way to do multiprogramming; only one program in memory.
- Programs use hardcoded addresses in physical memory.
3. Types of memory management
- In this chapter we look at two kinds of memory manager:
Partitions (with swapping)
Older simpler scheme, simple hardware, fixed size programs
Virtual Memory (with paging)
Newer (more complex) scheme, requires MMU, different program sizes
- The newer scheme reuses some parts of the simpler scheme.
- First we look at the simpler approach.
- Look at memory allocation and the fragmentation problem.
- Introduce the MMU hardware.
- Look at the more complex approach.
4. Co-existence is difficult
- How do we run multiple programs?
- Timesharing is easy (we look at swapping later).
- Putting both programs in memory at the same time is harder.
- If we look inside the code for a typical program.
- There is an asymmetry between code and data.
- When we access data the address is normally calculated.
- When we access code the address is normally constant.
- Moving the data to new addresses is easier.
- Just insert some extra calculation.
- Moving the code to new addresses is harder...
5. The Relocation Problem
- The asymmetry in the instruction set makes it harder to move the code.
- Each jump target is a hardcoded number inside the instruction.
- If we try to put the code into new addresses...
- It still uses the originals whenever it jumps.
- Left: two programs using addresses 0-16K
- Right: the white program is copied to higher addresses.
- These two programs would not run for long before crashing.
- Digging the addresses out of the instruction encoding is surprisingly hard.
- Do something else instead.
6. Address Spaces: Concept
Address Space
Mapping from labels (keys) to values (normally numbers).
- Think of an address space as a map (or "dictionary") data-type.
- The key-space can be sparse, e.g. domain names.
- The value-space is a dense range, e.g. physical addresses.
- This is a level of indirection in between the program and the machine.
Virtualisation Analogy
Processes appear to use a private CPU, the address-space "virtualises" memory.
- Each program can have its own private address range.
- These can overlap, e.g. 0x1000 can be used in program A and B.
- At runtime: map the private addresses onto the global physical range.
7. Address Spaces: Base and Limit
- Simple implementation of the concept:
- base register holds an offset into memory.
- limit register holds the size of the program's memory.
- Very simple dynamic scheme inside the CPU.
- Every instruction that is executed:
- Take any address that is used: x.
- If x>=limit generate an error.
- x'=x+base add the offset.
- Use x' as the physical address.
- Same example as earlier:
- The address 28 is still hardcoded.
- CPU jumps to address 16384+28=16412.
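The check and offset performed on every address can be sketched as a small C function (a minimal sketch; the function name and signature are illustrative, not real hardware):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulate base+limit address translation.
   Returns false on a protection error (address past the limit),
   otherwise writes the physical address into *phys. */
static bool translate(uint32_t x, uint32_t base, uint32_t limit,
                      uint32_t *phys) {
    if (x >= limit)       /* address outside the partition: error */
        return false;
    *phys = x + base;     /* add the partition offset */
    return true;
}
```

With base=16384 and limit=16384, the hardcoded address 28 translates to 16412, matching the example above.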
8. Address Spaces: Swapping
- The Base and Limit scheme allows multiple processes.
- But the memory is partitioned: fixed size causes wasted space.
- Each partition must be as big as the program might need.
- During the execution of one program: no other memory is used.
- Swapping moves programs out of memory when not running.
- Example lifecycle of a process (only in memory when running).
- a) Process A loaded.
- b) B is loaded while A runs.
- c) A is written to disk "swapped out"
- d) Memory for A is freed.
- g) Next execution of A - different address.
9. Address Spaces: Swapping
- If the set of processes is scheduled forever: ABCDABCD...
- Allocation and freeing always happen in the same order.
- Always one big hole looping through the system. Life is good.
- Problem: if processes exit in a different order a new hole appears.
- e.g. in diagram c) if processes A and C both exit before B.
- Fragmentation: multiple holes with different sizes.
- The free space is split into fragments throughout memory.
- Compaction is one solution: move everything down.
- Much like defragging a disk: very slow process.
- Problem: base+limit is a single consecutive range per process.
10. Address Spaces: Swapping
- There is a similar problem if a process asks for more memory.
- If there is no adjacent hole: need to rearrange memory - slow.
- Avoid problem by leaving some extra space.
- The unused space does not need to be loaded/saved - just counted.
- When swapping a process in, if the slack is too small it can be increased cheaply (no rearranging).
- Generally we leave the slack for both the heap and the stack to grow into.
11. Representing Memory Allocation
- System doesn't track free memory in bytes: too costly.
- Chooses an allocation unit size e.g. 4KB, tracks units used/free.
- Example: 5 processes occupying units, three holes (free memory).
- Two approaches for representing this allocation state:
- Array of bits (bitmap) - one per unit.
- Linked list of records.
12. Representing Memory Allocation
- Allocator needs to answer calls: alloc(n) with a large enough hole.
- Bitmaps are always a fixed size: \(\frac{\text{memory size}}{\text{unit size}}\) bits.
- Finding free holes big enough to allocate into - requires a scan.
- Linked list holds the same information in a different form.
- Different trade-offs for fast/slow operations.
- Variable length structure: implementation is more complex.
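The bitmap approach can be sketched in a few lines of C (a minimal sketch with an assumed memory size of 64 units; the names are illustrative). It shows why allocation requires a scan: finding a hole means searching for a run of zero bits.

```c
#include <stdint.h>

#define UNITS 64                      /* memory size / unit size */
static uint8_t bitmap[UNITS / 8];     /* one bit per allocation unit */

static int unit_used(int u) { return (bitmap[u / 8] >> (u % 8)) & 1; }

static void set_unit(int u, int used) {
    if (used) bitmap[u / 8] |=  (1 << (u % 8));
    else      bitmap[u / 8] &= ~(1 << (u % 8));
}

/* Scan for n consecutive free units: returns the first unit of the
   hole, or -1 if no hole is large enough. */
static int find_hole(int n) {
    int run = 0;
    for (int u = 0; u < UNITS; u++) {
        run = unit_used(u) ? 0 : run + 1;
        if (run == n) return u - n + 1;
    }
    return -1;
}
```

Marking a unit used or free is O(1), but `find_hole` is O(memory size) in the worst case, which is the trade-off the linked-list form tries to improve.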
13. Memory Management: Fragmentation
- When processes and holes are kept in a list it is easy to find neighbours.
- There are four possible cases for the neighbouring segments.
- In a) the process is replaced by a new hole.
- In b) / c) the new space extends a neighbouring hole.
- In d) two existing holes are extended / merged into one larger hole.
- When the machine is busy, case a) is statistically more likely.
- This increases fragmentation: lots of uniquely sized holes.
- e.g. 20 units free: 1+3+1+7+4+4, cannot allocate 8-units.
14. Searching for free blocks
- In order to allocate a segment of memory, the manager must find a large enough hole.
- When the representation is a linked list this requires scanning the list to search.
- The search can be performed in several ways.
- The examples on the right show an allocation of 18 units.
- Find the first hole large enough.
- Find the next hole large enough (remember where the last scan finished).
- Find the best hole that wastes the least space (same as first in the example).
- Find the worst hole that wastes the most space.
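First fit, the simplest of the four scans, can be sketched over a linked list of segment records (a minimal sketch; the struct layout is an assumption, not the book's exact representation). Next, best and worst fit differ only in where the scan starts or which candidate hole it keeps.

```c
#include <stddef.h>

struct segment {              /* one record in the allocation list */
    int start, length;        /* in allocation units */
    int is_hole;              /* 1 = free hole, 0 = process */
    struct segment *next;
};

/* First fit: return the first hole with at least n units, or NULL. */
static struct segment *first_fit(struct segment *list, int n) {
    for (struct segment *s = list; s != NULL; s = s->next)
        if (s->is_hole && s->length >= n)
            return s;
    return NULL;
}
```

The caller would then split the returned hole: the first n units become a process segment and the remainder stays in the list as a smaller hole.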
Break (15mins)
Intermission
15. Searching for free blocks
- Each algorithm encodes assumptions:
- First fit - minimise the amount of searching to do.
- Next fit - don't repeat the search already done.
- Best fit - minimise the wasted space.
- Worst fit - leave the most useful hole behind.
- These assumptions depend on a complex tradeoff between memory and time efficiency for the (unknown) applications: simulate it.
16. Memory Efficiency
- The memory manager uses a block-size for allocations, e.g. 16kb in the example.
- Older process (C) allocated at address 64kb.
- A hole existed between 0-63kb.
- Process A is to be allocated 40kb.
- Round up to the block size: 3 blocks = 48kb.
- 8kb is lost from rounding, inside a block.
- This is called internal fragmentation.
- 64kb hole filled with 3 blocks, and a hole of 1 block.
- The 16kb hole is a full block, but can only be used by a small process.
- This is called external fragmentation.
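The rounding arithmetic behind internal fragmentation is worth making explicit (a small sketch; sizes are in kb, matching the example, and the function names are illustrative):

```c
/* Round a request up to whole blocks; the leftover space inside
   the last block is internal fragmentation. */
static int blocks_needed(int request, int block) {
    return (request + block - 1) / block;
}

static int internal_waste(int request, int block) {
    return blocks_needed(request, block) * block - request;
}
```

For the example above: a 40kb request with 16kb blocks needs 3 blocks (48kb), losing 8kb inside the last block.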
17. Tradeoffs in memory allocation
- Internal and External Fragmentation occur in many systems.
- Any time we need to cut a divisible resource into variable-sized pieces.
- File-systems, Memory Partitions, Run-time memory-management (e.g. new/delete in the standard C++ library).
- Internal Fragmentation can be reduced by using a smaller block-size.
- But this increases External Fragmentation.
- Finer granularity in allocation sizes means more uniquely sized holes.
- External Fragmentation is a disaster: compaction means "stop the world".
- Internal Fragmentation is wasteful, but not catastrophic.
- In finding a way to avoid external fragmentation we can tolerate reasonably low internal fragmentation.
- We can ask: can we approximate optimal layout by a constant factor?
18. The Buddy System
- If we round allocations up to the nearest power of two...
- ...the internal fragmentation can be no more than 50%.
- There are fewer powers of two than arbitrary request sizes.
- Trade some internal fragmentation for higher performance.
- Start with a single piece of memory.
- Cut it into half until we reach the rounded-up size.
- When request can be satisfied with an existing piece it is fast.
- When memory is freed - only need to check if it can coalesce with its buddy.
19. The Buddy System
- When blocks are split it is always into two pieces.
- We can draw the state of the allocation in a tree.
- The root is the entire memory range.
- If it is split we colour the node, and add two children.
- Unsplit nodes point directly to the region of memory.
- Each level of the tree contains nodes of the same size.
- Nodes can be linked into lists (the blue lines).
- Nodes can be reached either via the tree (for free / coalesce), or via the list for allocation.
- No scanning is necessary for allocation.
- Only splitting blocks requires work: very fast.
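Two small calculations do most of the work in a buddy allocator, and both can be sketched in C (a minimal sketch; offsets are relative to the start of the managed region, which is an assumption of the XOR trick):

```c
#include <stdint.h>

/* Round a request up to the next power of two: the block size
   the buddy system will actually allocate. */
static uint32_t round_pow2(uint32_t n) {
    uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* A block of size s at offset a has its buddy at a XOR s: the two
   halves produced by the same split differ only in that one bit,
   so coalescing never needs a scan. */
static uint32_t buddy_of(uint32_t addr, uint32_t size) {
    return addr ^ size;
}
```

This is why freeing is fast: given a block's offset and size, its buddy's offset is a single XOR, and the allocator only has to check whether that buddy is also free.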
20. SLABs
- If we allocate many small regions, each will waste some memory (internal fragmentation).
- Worst case in the example: a request of \(2^n+1\) rounds up to \(2^{n+1}\), wasting almost half the block.
- If they are all the same size we can manage better than this.
- e.g. one kind of common kernel data-structure.
- A more efficient approach is to insert an extra layer before the memory allocation (slab).
- The slab manages a pool (array) of fixed size elements.
- Zero internal fragmentation in the slab.
- Allocation/Freeing items in the slab is very fast (never any splitting).
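A slab-style pool can be sketched by threading a free list through the unused slots themselves (a minimal sketch under assumed sizes; real slab allocators such as Linux's add per-CPU caches and constructors on top of this idea):

```c
#include <stddef.h>

#define SLOTS    32
#define OBJ_SIZE 48   /* all objects in this slab are the same size */

/* The pool itself; each free slot's first bytes hold a pointer
   to the next free slot. */
static unsigned char pool[SLOTS][OBJ_SIZE];
static void *free_list;

static void slab_init(void) {
    free_list = NULL;
    for (int i = 0; i < SLOTS; i++) {   /* push every slot */
        *(void **)pool[i] = free_list;
        free_list = pool[i];
    }
}

static void *slab_alloc(void) {         /* O(1): pop the free list */
    void *obj = free_list;
    if (obj) free_list = *(void **)obj;
    return obj;
}

static void slab_free(void *obj) {      /* O(1): push back on the list */
    *(void **)obj = free_list;
    free_list = obj;
}
```

Because every slot is the same size there is never any splitting or coalescing, which is exactly why allocation and freeing are constant time.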
21. Linux Physical Allocation
- A 4kb block of physical memory (the global address space) is called a page frame.
- Slabs and user data sit inside allocations from this page allocator.
- Doubly-linked lists for powers of two block sizes (free blocks).
- Array of pointers into free lists.
- Lower addresses (16MB) have special rules (ZONE_DMA).
- The rest is ZONE_NORMAL.
- Data-structures used inside the kernel are stored in slabs called object caches.
- Provides fast memory management for the kernel.
22. Overlays
- Not all of the code in a program runs at the same time.
- We can take advantage of this to write bigger programs than there is available memory.
- Idea: only load in the parts of the program that are currently needed.
- Swap independent parts into the same region of memory.
- Similar to swapping, but small pieces of programs code, not whole memory.
- Writing good overlay descriptions is tedious and difficult.
- Try to minimise the number of swaps to improve performance.
- Doing this automatically would be better.
23. Virtual Memory
- We've now pushed the abstraction provided by dynamic partitions as far as it can go.
- External fragmentation is an issue that cannot be solved completely with partitions.
- Instead we use a technique called Virtual Memory.
- There are two changes to the memory abstraction that we have seen so far.
- The base+limit registers are replaced with a separate mapping from each page to a frame.
- Swapping is performed on each page, rather than an entire process memory space.
- To make this work we need a new piece of hardware in the machine...
24. Virtual Memory: MMU
- The Memory Management Unit (MMU) sits between the processor and the memory bus.
- Physically it is often within the same package.
- Logically it is a separate unit, independent from the processor.
- The processor uses a separate address space to the memory.
- The processor address space contains virtual addresses.
- The memory address space contains physical addresses.
- Addresses are split into two parts: high=page, low=offset.
- The MMU implements a mapping between virtual and physical page.
25. Virtual Memory: MMU
- Memory knows nothing of virtual addresses.
- It simply performs read/write operations on physical addresses.
- The program executing on the CPU does not need to know about physical addresses at all.
- It operates entirely within the virtual address space.
- The MMU is a layer of indirection in between, isolating these two components from one another.
- The O/S programs the MMU with the details of the mapping.
- Idea: it's a dictionary mapping on addresses.
26. Virtual Memory: MMU Examples
- A program executes movl $3, %eax; movl (%eax), %ebx.
- The virtual address 3 is sent to the MMU as a read request.
- The MMU sees this is page 0 (drop the low 12 bits).
- Page 0 maps onto frame 2.
- The offset within the page is 3: target address is 8192+3=8195.
- Another program tries movl $32780, %eax; movl (%eax), %ebx.
- This is page 8, offset 12 (\(32780 = 8 \times 4096 + 12\)).
- The page is unmapped in the MMU.
- It signals the CPU there is an error...
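The page/offset arithmetic in both examples can be sketched directly (a minimal sketch assuming 4KB pages, so the low 12 bits are the offset; the function names are illustrative):

```c
#include <stdint.h>

#define PAGE_BITS   12                        /* 4KB pages */
#define OFFSET_MASK ((1u << PAGE_BITS) - 1)

/* Split a virtual address into page number and offset, then
   recombine the offset with the frame number from the mapping. */
static uint32_t vpage(uint32_t va)  { return va >> PAGE_BITS; }
static uint32_t voffset(uint32_t va) { return va & OFFSET_MASK; }
static uint32_t phys_addr(uint32_t frame, uint32_t va) {
    return (frame << PAGE_BITS) | voffset(va);
}
```

Checking against the examples: virtual address 3 is page 0, offset 3, and with page 0 mapped to frame 2 the physical address is 8192+3=8195; virtual address 32780 is page 8, offset 12.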
27. Quick sketch of Page Faults
- The book delays page-fault details until § 3.6.2 (lecture 8).
- But logically page faults are part of the execution of the MMU.
- So we have a brief sketch of how they work now.
- There is a hardware interrupt called the page fault.
- This is trapped by the OS, which swaps pages in memory/disk.
- The instruction is then repeated and succeeds.
28. Summary
- Direct use of memory imposes problems.
- A single dense address range is difficult to manage.
- Programs need to be relocated.
- External fragmentation is a difficult issue.
- Virtual Memory solves this issue.
- The cost is extra hardware to support a new abstraction.
- We look at the details of Virtual Memory in the next lecture.