DV1460 / DV1492:

Real-Time Operating Systems

08:15-10:00 Tuesday, September 20th, 2016

The Memory Subsystem.

§3-3.3.1 (pg 181-198)

Table of Contents
  • Memory Issues
  • Base+Limit
  • Swapping
  • Allocation
  • Allocation Techniques
  • Tradeoffs
  • MMU

1. Introduction

  • Programmers see a simple abstraction of memory.
    • It can be cut up into many pieces.
    • Single global address space.
    • Pointers link pieces together into data-structures.
  • The interface to the O/S is simple:
    • Ask for more memory.
    • Agree to give some back?
  • Ideally: large, fast and cheap.
  • In reality: too slow, too small and too expensive.
  • The memory manager abstracts the memory hierarchy and provides a simple interface.

2. No Abstraction: Physical Addressing

  • Without a memory manager there is no abstraction: physical addresses.
  • Program sees the entire physical address space (partly reserved for OS).
  • Addresses in the program are sent to the memory chips: e.g. LOAD $23, R1
  • No way to do multiprogramming; only one program in memory.
    • Programs use hardcoded addresses in physical memory.
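
A hedged sketch (not from the slides) of what this model corresponds to in C: the program dereferences a hardcoded numeric address, exactly like LOAD $23, R1. The address 23 is taken from the slide's example; under a modern protected OS this would simply fault.

```c
#include <stdint.h>

int main(void) {
    /* Physical addressing: the address is a hardcoded number.
     * This mirrors LOAD $23, R1; on a machine without memory
     * protection it reads physical address 23 directly. */
    volatile uint8_t *addr = (volatile uint8_t *)(uintptr_t)23;
    uint8_t r1 = *addr;   /* would crash under any modern OS */
    (void)r1;
    return 0;
}
```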

3. Types of memory management

  • In this chapter we look at two kinds of memory manager:
Partitions (with swapping)
Older, simpler scheme; simple hardware; fixed-size programs
Virtual Memory (with paging)
Newer, more complex scheme; requires an MMU; supports different program sizes
  • The newer scheme reuses some parts of the simpler scheme.
  • First we look at the simpler approach:
    • memory allocation and the fragmentation problem.
  • Then we introduce the MMU hardware.
  • Finally we look at the more complex approach.

4. Co-existence is difficult

  • How do we run multiple programs?
    • Timesharing is easy (we look at swapping later).
    • Putting both programs in memory at the same time is harder.
  • If we look inside the code of a typical program:
    • There is an asymmetry between code and data.
    • When we access data the address is normally calculated.
    • When we access code the address is normally constant.
  • Moving the data to new addresses is easier.
    • Just insert some extra calculation.
  • Moving the code to new addresses is harder...

5. The Relocation Problem

  • The asymmetry in the instruction set makes it harder to move the code.
  • Each jump target is a hardcoded number inside the instruction.
  • If we try to put the code into new addresses...
    • It still uses the originals whenever it jumps.
  • Left: two programs using addresses 0-16K
  • Right: the white program copied to higher addresses.
  • These two programs would not last for long before crashing.
  • Digging the addresses out of the instruction encoding is surprisingly hard.
  • Do something else instead.

6. Address Spaces: Concept

Address Space
Mapping from labels (keys) to values (normally numbers).
  • Think of an address space as a map (or "dictionary") data-type.
    • The key-space can be sparse, e.g. domain names.
    • The value-space is a dense range, e.g. physical addresses.
  • This is a level of indirection in between the program and the machine.
Virtualisation Analogy
Processes appear to use a private CPU, the address-space "virtualises" memory.
  • Each program can have its own private address range.
  • These can overlap, e.g. 0x1000 can be used in program A and B.
  • At runtime: map the private addresses onto the global physical range.

7. Address Spaces: Base and Limit

  • Simple implementation of the concept:
    • base register holds an offset into memory.
    • limit register holds the size of the partition.
  • Very simple dynamic scheme inside the CPU.
  • Every instruction that is executed:
    • Take any address that is used: x.
    • If x >= limit, generate an error.
    • Otherwise x' = x + base: add the offset.
    • Use x' as the physical address.
  • Same example as earlier:
    • The address 28 is still hardcoded.
    • CPU jumps to address 16384+28=16412.
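
A minimal C sketch of this check, assuming illustrative names (mmu_regs, translate); real hardware performs it on every memory access, not in software.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;   /* offset of the partition in physical memory */
    uint32_t limit;  /* size of the partition                      */
} mmu_regs;

/* Returns false to model the error case (address out of bounds). */
bool translate(const mmu_regs *r, uint32_t x, uint32_t *phys) {
    if (x >= r->limit)
        return false;        /* x >= limit: generate an error */
    *phys = x + r->base;     /* x' = x + base                 */
    return true;
}
```

With base = 16384 and limit = 16384, translating the address 28 yields 16412, matching the example above.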

8. Address Spaces: Swapping

  • The Base and Limit scheme allows multiple processes.
    • But the memory is partitioned: fixed size causes wasted space.
    • Each partition must be as big as the program might need.
  • During the execution of one program: no other memory is used.
  • Swapping moves programs out of memory when not running.
  • Example lifecycle of a process (only in memory when running).
  • a) Process A loaded.
  • b) B is loaded while A runs.
  • c) A is written to disk ("swapped out").
  • d) Memory for A is freed.
  • g) Next execution of A - different address.

9. Address Spaces: Swapping

  • If the set of processes is scheduled forever: ABCDABCD...
    • Allocation and freeing always happen in the same order.
    • Always one big hole looping through the system.
  • When processes exit in a different order: a new hole appears.
    • e.g. in diagram c), if processes A and C both exit before B.
  • Fragmentation: multiple holes with different sizes.
    • The free space is split into fragments throughout memory.
  • Compaction is one solution: move everything down.
    • Much like defragging a disk: a very slow process.
  • Problem: base+limit is a single consecutive range per process.

10. Address Spaces: Swapping

  • There is a similar problem if a process asks for more memory.
    • If there is no adjacent hole: need to rearrange memory - slow.
  • Avoid problem by leaving some extra space.
    • The unused space does not need to be loaded/saved - just counted.
    • When swapping a process in, if the slack is too small it can be increased cheaply (no rearranging).
  • Generally we leave the slack for both the heap and the stack to grow into.

11. Representing Memory Allocation

  • System doesn't track free memory in bytes: too costly.
  • Chooses an allocation unit size e.g. 4KB, tracks units used/free.
  • Example: 5 processes occupying units, three holes (free memory).
  • Two approaches for representing this allocation state:
    • Array of bits (bitmap) - one per unit.
    • Linked list of records.
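
A hedged sketch of the two representations; the field names are illustrative, and the unit size is the 4KB from the example above.

```c
#include <stdint.h>

#define UNITS 1024                 /* number of 4KB allocation units */

/* Representation 1: bitmap, one bit per unit (set = in use). */
static uint8_t bitmap[UNITS / 8];

/* Representation 2: linked list of hole/process records. */
struct segment {
    int is_hole;                   /* 1 = hole, 0 = process segment */
    unsigned start, length;        /* measured in allocation units  */
    struct segment *next;          /* next segment by address order */
};
```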

12. Representing Memory Allocation

  • Allocator needs to answer calls: alloc(n) with a large enough hole.
  • Bitmaps have a fixed size: \(\frac{\text{memory size}}{\text{unit size}}\) bits.
    • Finding a free hole big enough to allocate into requires a scan (sketched below).
  • Linked list holds the same information in a different form.
    • Different trade-offs for fast/slow operations.
    • Variable length structure: implementation is more complex.
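
To make the bitmap scan concrete, here is a hedged sketch of finding n consecutive free units; it is linear in the size of memory, which is the cost called out above.

```c
/* Scan the bitmap for a run of n free (zero) bits.
 * Returns the index of the first unit of the hole, or -1. */
int find_run(const uint8_t *bm, int units, int n) {
    int run = 0;
    for (int i = 0; i < units; i++) {
        if (bm[i / 8] & (1u << (i % 8)))
            run = 0;                    /* unit in use: restart    */
        else if (++run == n)
            return i - n + 1;           /* found a big-enough hole */
    }
    return -1;
}
```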

13. Memory Management: Fragmentation

  • When processes and holes are kept in a list it is easy to find neighbours.
  • There are four possible cases for the neighbouring segments.
  • In a) the process is replaced by a new hole.
  • In b) / c) the new space extends a neighbouring hole.
  • In d) two existing holes are extended / merged into one larger hole.
  • When the machine is busy, case a) is statistically more likely.
  • This increases fragmentation: lots of uniquely sized holes.
    • e.g. 20 units free as 1+3+1+7+4+4: an 8-unit request cannot be satisfied.
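
A hedged sketch of freeing with coalescing, covering the four cases with the illustrative struct segment defined earlier; prev is the segment immediately before s (NULL at the head). Node recycling is omitted to keep the sketch short.

```c
void free_segment(struct segment *prev, struct segment *s) {
    s->is_hole = 1;                       /* case a): becomes a new hole */
    if (s->next && s->next->is_hole) {    /* cases c), d): merge right   */
        s->length += s->next->length;
        s->next = s->next->next;          /* (node leaked: sketch only)  */
    }
    if (prev && prev->is_hole) {          /* cases b), d): merge left    */
        prev->length += s->length;
        prev->next = s->next;
    }
}
```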

14. Searching for free blocks

  • In order to allocate a segment of memory, the manager must find a large enough hole.
  • When the representation is a linked list this requires scanning the list to search.
  • The search can be performed in several ways (a first-fit sketch follows the list below).
  • The examples on the right show an allocation of 18 units.
    • Find the first hole large enough.
    • Find the next hole large enough (remember where the last scan finished).
    • Find the best hole that wastes the least space (same as first in the example).
    • Find the worst hole that wastes the most space.
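
A hedged first-fit sketch over the same illustrative segment list: take the first hole of length >= n and split off the remainder as a smaller hole. Next/best/worst fit differ only in which hole the scan selects.

```c
#include <stdlib.h>

struct segment *alloc_first_fit(struct segment *head, unsigned n) {
    for (struct segment *s = head; s; s = s->next) {
        if (!s->is_hole || s->length < n)
            continue;                       /* not a big-enough hole  */
        if (s->length > n) {                /* split off the leftover */
            struct segment *rest = malloc(sizeof *rest);
            rest->is_hole = 1;
            rest->start   = s->start + n;
            rest->length  = s->length - n;
            rest->next    = s->next;
            s->next   = rest;
            s->length = n;
        }
        s->is_hole = 0;                     /* first fit: take it   */
        return s;
    }
    return NULL;                            /* no hole large enough */
}
```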

Break (15 mins)

15. Searching for free blocks

  • Each algorithm encodes assumptions:
    • First fit - minimise the amount of searching to do.
    • Next fit - don't repeat the search already done.
    • Best fit - minimise the wasted space.
    • Worst fit - leave the most useful hole behind.
  • These assumptions depend on a complex tradeoff between memory and time efficiency for the (unknown) applications: simulate it.
A comparison of next-fit, first-fit, and best-fit
Carter Bays. CACM, Vol 20 Issue 3, March 1977
http://dl.acm.org/citation.cfm?id=359453

16. Memory Efficiency

  • The memory manager uses a block size for allocations, e.g. 16KB in the example.
    • An older process (C) was allocated at address 64KB.
    • A hole existed between 0-63KB.
  • Process A is to be allocated 40KB.
    • Round up to the block size: 3 blocks = 48KB.
    • 8KB is lost to rounding, inside a block.
    • This is called internal fragmentation.
  • The 64KB hole is filled with 3 blocks, leaving a hole of 1 block.
    • The 16KB hole is a full block, but can only be used by a small process.
    • This is called external fragmentation.
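
The slide's rounding arithmetic as a tiny runnable check (16KB blocks, 40KB request):

```c
#include <stdio.h>

int main(void) {
    unsigned block = 16, request = 40;                /* sizes in KB */
    unsigned blocks = (request + block - 1) / block;  /* round up: 3 */
    unsigned internal = blocks * block - request;     /* wasted: 8KB */
    printf("%u blocks, %u KB internal fragmentation\n", blocks, internal);
    return 0;
}
```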

17. Tradeoffs in memory allocation

  • Internal and External Fragmentation occur in many systems.
    • Any time we need to cut a divisible resource into variable-sized pieces.
    • File-systems, Memory Partitions, Run-time memory-management (e.g. new/delete in the standard C++ library).
  • Internal Fragmentation can be reduced by using a smaller block-size.
    • But this increases External Fragmentation.
    • Finer granularity in allocation sizes means more uniquely sized holes.
  • External Fragmentation is a disaster: compaction means "stop the world".
  • Internal Fragmentation is wasteful, but not catastrophic.
  • To find a way to avoid external fragmentation, we can tolerate a reasonable amount of internal fragmentation.
  • We can ask: can we approximate optimal layout by a constant factor?

18. The Buddy System

  • If we round allocations up to the nearest power of two...
    • ...the internal fragmentation can be no more than 50%.
  • There are fewer powers of two than arbitrary request sizes.
    • Trade some internal fragmentation for higher performance.
  • Start with a single piece of memory.
    • Cut it in half repeatedly until we reach the rounded-up size.
  • When a request can be satisfied with an existing piece it is fast.
  • When memory is freed, we only need to check whether it can coalesce with its buddy.
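
The two arithmetic tricks behind this are worth seeing; a hedged sketch with illustrative function names. Because every block size is a power of two, a block's buddy is found by flipping a single bit of its offset:

```c
#include <stdint.h>

/* Smallest power of two >= n: the rounded-up allocation size. */
uint32_t round_up_pow2(uint32_t n) {
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Offset of the buddy of the block at `offset` with size `size`
 * (offset must be size-aligned), e.g. buddy_of(0, 64) == 64. */
uint32_t buddy_of(uint32_t offset, uint32_t size) {
    return offset ^ size;
}
```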

19. The Buddy System

  • When blocks are split it is always into two pieces.
  • We can draw the state of the allocation in a tree.
  • The root is the entire memory range.
  • If it is split we colour the node, and add two children.
  • Unsplit nodes point directly to the region of memory.
  • Each level of the tree contains nodes of the same size.
  • Nodes can be linked into lists (the blue lines).
  • Nodes can be reached either via the tree (for free/coalesce) or via the list (for allocation).
  • No scanning is necessary for allocation.
  • Only splitting blocks requires work: very fast.

20. SLABs

  • If we allocate many small regions, each will waste some memory (internal fragmentation).
  • Worst case in the example: a request of \(2^n+1\) units is rounded up to \(2^{n+1}\), wasting almost half the block.
  • If they are all the same size we can manage better than this.
    • e.g. one kind of common kernel data-structure.
  • A more efficient approach is to insert an extra layer before the memory allocation (slab).
  • The slab manages a pool (array) of fixed size elements.
  • Zero internal fragmentation in the slab.
  • Allocation/Freeing items in the slab is very fast (never any splitting).
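
A minimal slab sketch, assuming a fixed object size and pool length (OBJ_SIZE and NOBJ are made-up numbers): free objects are threaded onto a free list through their own first bytes, so allocation and freeing are single pointer operations.

```c
#include <stddef.h>

#define OBJ_SIZE 64     /* illustrative fixed element size */
#define NOBJ     128    /* illustrative pool length        */

static unsigned char pool[NOBJ][OBJ_SIZE];
static void *free_list;

void slab_init(void) {              /* thread every object onto the list */
    for (int i = 0; i < NOBJ; i++) {
        *(void **)pool[i] = free_list;
        free_list = pool[i];
    }
}

void *slab_alloc(void) {            /* O(1): pop the head, no splitting */
    void *obj = free_list;
    if (obj)
        free_list = *(void **)obj;
    return obj;
}

void slab_free(void *obj) {         /* O(1): push back onto the list */
    *(void **)obj = free_list;
    free_list = obj;
}
```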

21. Linux Physical Allocation

  • A 4KB block of physical memory (the global address space) is called a page frame.
  • Slabs and user data sit inside allocations from this page allocator.
  • Doubly-linked lists for powers of two block sizes (free blocks).
  • Array of pointers into free lists.
  • Lower addresses (16MB) have special rules (ZONE_DMA).
  • The rest is ZONE_NORMAL.
  • Data-structures used inside the kernel are stored in slabs called object caches.
  • Provides fast memory management for the kernel.
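
As a simplified model (not actual kernel code) of the structure described above; MAX_ORDER and the names are illustrative:

```c
/* One doubly-linked free list per power-of-two block size:
 * free_area[k] holds free blocks of 2^k page frames. */
#define MAX_ORDER 11

struct block { struct block *prev, *next; };

static struct block *free_area[MAX_ORDER];
```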

22. Overlays

  • Not all of the code in a program runs at the same time.
  • We can take advantage of this to write bigger programs than there is available memory.
  • Idea: only load in the parts of the program that are currently needed.
  • Swap independent parts into the same region of memory.
  • Similar to swapping, but with small pieces of the program's code, not the whole memory image.
  • Writing good overlay descriptions is tedious and difficult.
    • Try to minimise the number of swaps to improve performance.
  • Doing this automatically would be better.

23. Virtual Memory

  • We've now pushed the abstraction provided by dynamic partitions as far as it can go.
    • External fragmentation is an issue that cannot be solved completely with partitions.
  • Instead we use a technique called Virtual Memory.
  • There are two changes to the memory abstraction that we have seen so far.
    • The base+limit registers are replaced with a separate mapping from each page to a frame.
    • Swapping is performed on each page, rather than an entire process memory space.
  • To make this work we need a new piece of hardware in the machine...

24. Virtual Memory: MMU

  • The Memory Management Unit (MMU) sits between the processor and the memory bus.
    • Physically it is often within the same package.
    • Logically it is a separate unit, independent from the processor.
  • The processor uses a separate address space from the memory.
    • The processor address space contains virtual addresses.
    • The memory address space contains physical addresses.
  • Addresses are split into two parts: high=page, low=offset.
    • The MMU implements a mapping between virtual and physical page.
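
A hedged C sketch of the split and lookup, assuming 4KB pages (12 offset bits) and a toy single-level table called page_table:

```c
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_SIZE (1u << PAGE_BITS)            /* 4KB pages */

static uint32_t page_table[16];                /* virtual page -> frame  */

uint32_t mmu_translate(uint32_t vaddr) {
    uint32_t page   = vaddr >> PAGE_BITS;      /* high bits: page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1); /* low bits: offset       */
    return (page_table[page] << PAGE_BITS) | offset;
}
```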

25. Virtual Memory: MMU

  • Memory knows nothing of virtual addresses.
    • It simply performs read/write operations on physical addresses.
  • The program executing on the CPU does not need to know about physical addresses at all.
    • It operates entirely within the virtual address space.
  • The MMU is a layer of indirection in between, isolating these two components from one another.
    • The O/S programs the MMU with the details of the mapping.
  • Idea: it's a dictionary mapping on addresses.

26. Virtual Memory: MMU Examples

  • A program executes movl $3, %eax; movl (%eax), %ebx.
    • The virtual address 3 is sent to the MMU as a read request.
    • The MMU sees this is page 0 (drop the low 12 bits).
    • Page 0 maps onto frame 2.
    • The offset within the page is 3: target address is 8192+3=8195.
  • Another program tries movl $32780, %eax; movl (%eax), %ebx.
    • This is page 8, offset 12 (\(32780 = 8 \times 4096 + 12\)).
    • The page is unmapped in the MMU.
    • It signals the CPU there is an error...
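
Running the slide's two examples through the mmu_translate sketch above: with page_table[0] = 2, mmu_translate(3) gives (2 << 12) | 3 = 8195; for 32780 the page index is 32780 >> 12 = 8, and an unmapped entry at index 8 is exactly the error case sketched next.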

27. Quick sketch of Page Faults

  • The book delays page-fault details until § 3.6.2 (lecture 8).
  • But logically page faults are part of the execution of the MMU.
    • So we have a brief sketch of how they work now.
  • There is a hardware trap called the page fault.
  • It is caught by the OS, which swaps the required page between disk and memory.
  • The faulting instruction is then repeated and succeeds.
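
A hedged sketch of that cycle, reusing the toy page_table from the MMU sketch above; evict_some_frame and load_from_disk are hypothetical placeholders for the machinery covered in lecture 8.

```c
extern uint32_t evict_some_frame(void);                     /* hypothetical */
extern void load_from_disk(uint32_t page, uint32_t frame);  /* hypothetical */

/* Invoked by the trap; vaddr is the faulting virtual address. */
void page_fault_handler(uint32_t vaddr) {
    uint32_t page  = vaddr >> PAGE_BITS;
    uint32_t frame = evict_some_frame();   /* pick a victim, swap it out */
    load_from_disk(page, frame);           /* swap the needed page in    */
    page_table[page] = frame;              /* repair the mapping         */
    /* returning from the trap re-runs the faulting instruction */
}
```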

28. Summary

  • Direct use of memory imposes problems.
    • A single dense address range is difficult to manage.
    • Programs need to be relocated.
    • External fragmentation is a difficult issue.
  • Virtual Memory solves this issue.
    • The cost is extra hardware to support a new abstraction.
  • We look at the details of Virtual Memory in the next lecture.