DV1460 / DV1492:

Realtime- (and) Operating-Systems

13:15-15:00 Thursday September 22nd, 2016

Design and Implementation Issues.

§3.5-3.9 pg222-254

Table of Contents
  • Design Issues
  • Page Fault Details
  • Implementation Issues
  • Segmentation
  • Windows
  • Linux

1. Overview

  • We've covered the core of a modern memory subsystem.
    • Page-tables and their caching in the TLB.
    • Handling page-faults.
    • Core function of a Virtual Memory implementation.
  • The memory subsystem has to interact with many others.
  • Today we look at a range of issues around it.

2. Design Issues

  • Covered major aspects of the subsystem.
    • Three major interactions with system.
  • Design of paging has large impact.
  • Memory performance is a significant factor in overall system performance.
  • The memory abstraction is a core interface between a program and the machine.
  • Some issues overlap: improve the abstraction offered to programs, but need to consider the impact on performance.

3. Allocation Policies

  • a) shows processes A,B,C in memory.
    • Page fault - need to load A6.
  • b) shows a local allocation policy.
    • LRU within memory of process A
  • c) shows a global allocation policy.
    • LRU within all process memories.
  • "Fairness" vs overall efficiency.
  • Can adjust the ratio in response to process behaviour (fault rate), as sketched after this list.
  • Expect a response curve as shown.
  • Attempt to keep processes in a range.
  • Diminishing returns: don't over- or under-allocate.
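
A minimal sketch of that adjustment in C, assuming hypothetical thresholds and a per-process frame counter (illustrative only, not a real kernel API):

      /* Page-fault-frequency sketch: grow or shrink a process's frame
         allocation based on its measured fault rate. */
      #define PFF_HIGH 10.0   /* faults/sec above which we add frames  */
      #define PFF_LOW   2.0   /* faults/sec below which we take frames */

      struct proc_mem {
          double fault_rate;  /* measured faults per second */
          int    frames;      /* frames currently allocated */
      };

      void adjust_allocation(struct proc_mem *p)
      {
          if (p->fault_rate > PFF_HIGH)
              p->frames++;        /* faulting too often: allocate more  */
          else if (p->fault_rate < PFF_LOW && p->frames > 1)
              p->frames--;        /* plenty of headroom: reclaim frames */
          /* between the thresholds the process is in the target range */
      }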

4. Load Control / Page Size

  • Thrashing: repeatedly paging the same data.
  • The combined working set of all processes is larger than memory.
  • As we schedule processes, each switch causes page-faults.
  • Shown in the timeline on the left.
  • Inefficient: too much waiting time.
  • Temporarily swapping out an entire process can be faster overall.
  • Page size tradeoff: internal fragmentation vs number of entries in the page table and TLB (a worked estimate follows below).
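
A classic back-of-the-envelope estimate, following the textbook's analysis: with average process size \(s\), page size \(p\) and page-table entry size \(e\), the per-process overhead is \(\frac{se}{p} + \frac{p}{2}\) (page-table entries plus internal fragmentation in the last page). Setting the derivative to zero gives the optimum \(p = \sqrt{2se}\); for example \(s = 1\)MB and \(e = 8\) bytes give \(p = 4\)KB.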

5. Cleaning Policy

  • Flushing an old page imposes latency before loading a new one.
  • If there are no clean pages a page fault takes twice as long.
  • Flushing pages ahead of time decreases latency when we need them.
  • A paging daemon periodically flushes dirty pages.
  • Can be integrated into the clock algorithm for choosing pages (sketched after this list).
  • Two "hands" spin around the clock.
  • Hopefully more pages are clean when we must evict during a fault.
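
A minimal sketch of the two-handed variant, assuming a circular array of page descriptors and a hypothetical schedule_write helper (illustrative, not a real kernel interface):

      #define NPAGES 1024

      struct page { int referenced, dirty; };
      static struct page pages[NPAGES];
      static int front;  /* cleaning hand; the eviction hand trails it */

      /* Hypothetical helper: queue an asynchronous write-back. */
      static void schedule_write(int idx) { (void)idx; }

      /* Run periodically by the paging daemon: the front hand cleans
         dirty pages so the trailing eviction hand finds them clean. */
      static void cleaning_tick(void)
      {
          if (pages[front].dirty) {
              schedule_write(front);
              pages[front].dirty = 0;  /* clean once the write lands */
          }
          front = (front + 1) % NPAGES;
      }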

6. Separating Instructions and Data

  • It is not mandatory for code and data to share an address space.
  • If we avoid self-modifying code, we can split it into I-space and D-space.
  • Larger address-spaces for the program to work with.
  • Requires linker involvement (to handle relocation addresses).
  • Improves isolation: make the I-space read-only.
  • Avoids classes of exploits that patch running programs.

7. Shared Pages

  • Sharing pages reduces contention.
    • Fewer page-faults means higher performance.
  • Sharing read-only memory is easy.
    • Without writing no synchronisation is needed.
  • If we assume code is read-only, multiple instances of same program can share.
  • Processes use the same pointer to I-space.
  • Different pointers to D-space.
  • On fork: pages are marked read-only and shared copy-on-write (demonstrated below).
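
A small POSIX demonstration: fork shares all frames copy-on-write, so the child's write faults into a private copy and the parent's value is unchanged:

      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void)
      {
          int value = 42;           /* page shared copy-on-write */
          pid_t pid = fork();
          if (pid == 0) {
              value = 99;           /* fault: kernel copies the page */
              printf("child sees %d\n", value);
              _exit(0);
          }
          wait(NULL);
          printf("parent still sees %d\n", value);  /* prints 42 */
          return 0;
      }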

8. Shared Libraries

  • Sharing the entire I-space between processes prevents relocation.
  • When libraries are shared between different programs this can be an issue.
  • Requires position-independent code (example after this list).
    • No references to absolute addresses.
    • Relative (to PC) jumps only.
  • Prevents the problem where a library only works at certain addresses.
    • Would be unable to use arbitrary shared libs together.
  • Allows the same (code) pages to be located at different addresses.
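
On a typical ELF toolchain, position-independent code is requested when the library is compiled, e.g. cc -fPIC -shared -o libfoo.so foo.c (file names illustrative): -fPIC makes the compiler emit only PC-relative references, so the same code pages can be mapped at a different address in each process.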

9. Memory-mapping and Interface

  • I/O is costly: library calls, syscalls, and the kernel copying the process's local buffer into kernel-space.
  • Memory-mapped files expose a shared buffer to the process.
  • No read/write calls - simply access the memory (sketched after this list).
  • Memory subsystem pages changes in and out.
  • A different memory abstraction: explicitly shared regions in the address-space.
  • Can be used to build IPC (message-passing) primitives.
  • Explicit naming / sharing of regions allows communication.
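
A minimal POSIX sketch: map a file and read it through memory, with no read() calls on the access path (data.bin is an illustrative file name):

      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/stat.h>
      #include <unistd.h>

      int main(void)
      {
          int fd = open("data.bin", O_RDONLY);
          struct stat st;
          if (fd < 0 || fstat(fd, &st) < 0) { perror("open"); return 1; }

          /* The memory subsystem pages the file in behind this pointer. */
          char *buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
          if (buf == MAP_FAILED) { perror("mmap"); return 1; }

          long sum = 0;
          for (off_t i = 0; i < st.st_size; i++)
              sum += buf[i];                  /* plain memory accesses */
          printf("checksum %ld\n", sum);

          munmap(buf, st.st_size);
          close(fd);
          return 0;
      }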

10. Implementation Issues

  • Just as the function of the memory subsystem raised certain design issues...
    • ...implementation creates issues of its own.
  • Some of these are relatively simple.
  • Creation of process:
    • Page-table(s) need to be created.
    • Swap space needs to be allocated.
  • Context switch:
    • TLB flushed, MMU reprogrammed.
    • Preload some pages (otherwise will start with page faults).
  • Process exit:
    • Must free up the page-table.
  • Handling page-fault is a bit more complex...

11. Page Fault Example

  • Top of pg198, and pg235: walk through an example.
  • The example system has a virtual memory of 4 pages.
    • This is twice the size of the physical memory (2 frames).
    • A disk with 3 blocks is used as backing.
  • Initially pages 0 and 2 are loaded into the memory.
    • Pages 1 and 3 are stored in blocks on the disk.
  • The page table in the MMU reflects this with pages 1 and 3 marked as unmapped.
  • In this state the CPU executes an instruction...

12. Page Faults

  • The CPU executes an instruction with a memory access.
  • The MMU looks in its translation table: page is unmapped.
  • Signals the CPU there is a page fault.
  • Causes a trap: current instruction is interrupted.
  • CPU despatches the trap to the kernel page fault handler.
  • The handler picks a victim page to evict from memory.
    • We cover the algorithms for this later on.
  • For now we simply assume page 0 is the victim to be evicted.

13. Page Faults

  • In the example page 0 is the victim, handler writes it to disk.
  • The data in memory is now stale: can be overwritten.
  • Load page 1 from the disk into the frame.
  • Data on the disk is now stale: block can be reused without loss.
  • The handler then rewrites the MMU table.
    • Page 0 is marked as unmapped.
    • Page 1 is now mapped into frame 0.
  • The system can now service the memory access.
  • Resume the instruction that was interrupted: fetch the data.

14. Further implementation issues

  • Most CISC instruction sets use variable-length instruction encoding.
  • On execution the PC increments to the next opcode.
  • If a page-fault needs to rewind one instruction - how many bytes?
    • Hopefully the hardware shadows the PC (now and next).
  • Locking memory: pinning of pages.
    • When I/O is occurring the buffer must remain in memory.
    • It is probably DMA; paging the buffer out would corrupt the transfer.
    • Need a way to force pages to stay in memory (sketched below).
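
POSIX exposes pinning to user space as mlock; a driver setting up DMA does the kernel-side equivalent. A minimal sketch:

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mman.h>

      int main(void)
      {
          size_t len = 4096;
          char *buf = malloc(len);

          /* Pin the buffer: it may not be paged out until munlock. */
          if (buf == NULL || mlock(buf, len) != 0) { perror("mlock"); return 1; }

          memset(buf, 0, len);   /* e.g. hand this region to a device */

          munlock(buf, len);
          free(buf);
          return 0;
      }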

15. Last implementation issues

  • Options for arranging the backing store.
  • Static (direct) mapping, or dynamic.
  • Starting processes in memory (paging out when evicted).
    • Over-estimating the working set (more thrashing).
  • Starting processes on disk (page in on demand).
    • Under-estimating, slow start.
  • Separation of policy and mechanism (very general point).

Break (15mins)


16. Segmentation

  • We can provide a 2D view of memory to programs.
  • A segment is a separate address-space.
  • Programs now manipulate addresses in two parts:
    • The segment (name or id).
    • The address within the segment.
  • Richer representation of the data-model and program structure.
  • Pure segmentation allows swapping of entire segments between disk and memory.
  • Can also combine with paging...

17. Segmentation: Paging in MULTICS

  • MULTICS combines both paging and segmentation in one system.
  • Ran on hardware with 36-bit words.
  • \(2^{24}\) physical words (72MB).
  • A 34-bit address was split as shown (decomposed in the sketch after this list).
  • Segment contained up to \(2^{16}\) words.
  • \(2^{18}\) possible segments.
  • Each segment is a virtual address space with a page-table.
  • Paging provides fixed-size blocks to work with in the memory manager.
    • Whole segment does not need to occupy physical memory.
  • Programmer gains the advantages of segmentation.
    • Expressive data-model, modularity and sharing, protection.
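
A sketch of decomposing such an address in C, assuming the split in the book's figure: an 18-bit segment number, then a 16-bit in-segment address made of a 6-bit page number and a 10-bit offset (1024-word pages):

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
          uint64_t addr = 0x2ABCDEFFULL;             /* any 34-bit value */

          unsigned segment = (addr >> 16) & 0x3FFFF; /* top 18 bits */
          unsigned page    = (addr >> 10) & 0x3F;    /* next 6 bits */
          unsigned offset  =  addr        & 0x3FF;   /* low 10 bits */

          printf("segment %u page %u offset %u\n", segment, page, offset);
          return 0;
      }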

18. Segmentation: Paging in MULTICS

  • Each process has a segment table.
  • Each segment entry links to a page-table in memory.
  • Page-tables are aligned on 64-word (\(2^6\)) boundaries.
  • Only the top 18 bits of their physical addresses need to be stored.
  • Flexibility is costly: segment table could occupy \(\frac{1}{64}\) of memory.
  • Solution: segment table for the process is itself a paged segment.
  • Don't store the entire table when fewer segments are used.

19. Segmentation: Paging on x86

  • In MULTICS the segment number was a field inside the address.
  • If we wanted to ignore segmentation we could put a segment every 64K words and use a flat address-space.
  • Segmentation was easy to use (or ignore) for programmer / compiler.
  • Intel chose a different approach.
  • Separate registers to choose the "current" segment.
  • "Long pointers" need to explicitly store two values.
  • The program changes the segment register before an access.

20. Segmentation: Paging on x86

  • The indirection to walk segment/page tables is quite slow: MULTICS caches 16 entries.
  • x86 caches two entries: code and data segment registers.
  • The segment entries are stored in two tables in memory: LDT and GDT.
  • To use a segment the entry is loaded into the register from the table.
  • Processor knows location of table, index register to choose current segment.
  • Current segment defines a base+limit scheme to translate a segment address into a "linear address" (sketched below).
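
A sketch of that translation (fields simplified; a real descriptor also carries type and protection bits):

      #include <stdint.h>

      struct descriptor { uint32_t base, limit; };

      /* Turn a segment-relative offset into a linear address, failing
         where the hardware would raise a protection fault. */
      int translate(const struct descriptor *seg, uint32_t offset,
                    uint32_t *linear)
      {
          if (offset > seg->limit)
              return -1;
          *linear = seg->base + offset;
          return 0;
      }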

21. Segmentation: Paging on x86

  • Backward compatibility: x86 implements segmentation both with and without paging.
  • If paging is switched off (286 compatibility) then linear addresses are physical addresses.
  • If paging is switched on, linear addresses are virtual addresses put through the MMU.
  • Two-level scheme: directories and tables (field split sketched below).
  • Again: table walking is slow - TLB to cache frequent entries.
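
For classic 32-bit x86 paging the linear address splits 10/10/12; a sketch of extracting the three fields:

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
          uint32_t linear = 0xC0A8123Cu;            /* any linear address */

          unsigned dir    = (linear >> 22) & 0x3FF; /* page-directory index */
          unsigned table  = (linear >> 12) & 0x3FF; /* page-table index     */
          unsigned offset =  linear        & 0xFFF; /* offset in 4KB page   */

          printf("dir %u table %u offset %u\n", dir, table, offset);
          return 0;
      }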

22. Segmentation Summary

  • It seems as if Tanenbaum likes segmentation.
    • A richer memory interface for the programmer is appealing.
    • My first job was real-mode x86 programming.
      • Segmented memory was appalling (YMMV).
    • It seems to have died out for now, may be reinvented later.
  • We've looked at the memory sub-system now.
    • Providing the illusion of large, fast, uniform memory is complex.
  • Many steps just to access a single memory location.
    • Common perception of "software bloat".
    • Performance expectations don't match the 4,000,000-fold increase in processor speed.
    • Main reason is trying to hide the slow memory interface.

23. Linux case-study

  • Memory management on Linux is pretty close to the textbook solution.
  • The virtual address space gives 3GB to the process, 1GB to the kernel.
  • Text segment holds code.
  • Initialised data.
  • BSS is uninitialised - simply store length on disk.
  • Copy-on-write allows a single zero page.
  • Stack grows through page-faults.
  • Memory-mapped files allow sharing of physical page frames.
  • High bandwidth - processes must handle synchronisation themselves (sketch below).
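
A minimal sketch of that sharing on Linux: a MAP_SHARED anonymous mapping stays visible to both sides of a fork, so the child's write reaches the parent through the shared frame (real code would add synchronisation, as the slide notes):

      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void)
      {
          /* One physical frame mapped into both processes after fork. */
          int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
          if (shared == MAP_FAILED) { perror("mmap"); return 1; }

          *shared = 0;
          if (fork() == 0) {
              *shared = 123;          /* child writes through the frame */
              _exit(0);
          }
          wait(NULL);
          printf("parent reads %d\n", *shared);  /* prints 123 */
          munmap(shared, sizeof(int));
          return 0;
      }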

24. Windows case-study

  • Example is 32-bit system, 4GB virtual address space per process.
    • Upper 2GB is OS - shaded parts are shared by all.
    • Lower 2GB is user-memory for process.
  • Demand-paged in 4KB pages.
  • OS space is only accessible in supervisor mode.
  • Permanently present to avoid the cost of switching the MMU on a syscall.
  • Each virtual page is in one of three states (see the sketch after this list):
    • invalid: access triggers an exception.
    • reserved: planned growth, access triggers allocation.
    • committed: active, either on disk or in RAM.
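
The reserved/committed distinction appears directly in the Win32 API; a minimal sketch:

      #include <windows.h>

      int main(void)
      {
          /* Reserve 1MB of address space: nothing is committed yet,
             so touching it would fault (the "reserved" state). */
          SIZE_T size = 1 << 20;
          char *region = VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS);
          if (region == NULL) return 1;

          /* Commit the first page: now backed by RAM or the paging
             file (the "committed" state). */
          VirtualAlloc(region, 4096, MEM_COMMIT, PAGE_READWRITE);
          region[0] = 'x';            /* legal only in the committed page */

          VirtualFree(region, 0, MEM_RELEASE);
          return 0;
      }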

25. Windows case-study

  • No preallocation in the paging file.
    • Page is either in memory, or on disk, not both.
  • Allocation in paging file occurs when page needs to be evicted.
  • Writing of pages can be asynchronous.
    • When page is evicted, marked as being written out.
    • Batches of I/O can be done together.
    • Preallocated swap would be sparse on disk.
    • Just-in-time allocation keeps the swap area dense.

26. Windows case-study

  • Page replacement operates on working sets.
    • Windows tries to estimate the "important" pages in a process.
    • Minimise page-faults system-wide by keeping each process's most frequently used pages resident.
    • This is a local-allocation approach: attempt to distribute page-faults fairly.
  • Size of the working-sets is flexible; the system adapts to memory pressure.
    • Low pressure - track pages by age, spend little effort on eviction.
    • High pressure - impose working set limits, aggressively evict pages.
  • Working set manager (called every second).
    • Estimates memory pressure and process working sets, throttles disk I/O.

27. Windows case-study

  • When an application runs Windows can record the page activity.
    • Application startup will probably be quite regular.
    • Same initialisation of data-structures.
    • Should produce a similar pattern of page-faults each time.
  • This is the working-set for the beginning of the program.
  • SuperFetch records the page-fault sequence for an application.
    • Next time it is started those pages can be preloaded.
    • If they are in memory by the time they are used: page-fault is avoided.
  • Exploiting the working set concept from earlier.

28. Summary

  • End of the memory sub-system.
    • We've seen a range of abstractions available to the programmer.
    • How the OS can implement each one.
    • Seen how the costs are per-access or per-fault.
    • How various layers of caching can mitigate these.
    • Wrapping up with a sketch of real-world systems.
  • Next we venture into the exciting area of File Systems.