# Realtime- (and) Operating-Systems

13:15-15:00 Thursday September 22nd, 2016

Design and Implementation Issues.

§3.5-3.9 pg222-254

• Design Issues
• Page Fault Details
• Implementation Issues
• Segmentation
• Windows
• Linux

# 1. Overview

• We've covered the core of a modern memory subsystem.
• Page-tables and their caching in the TLB.
• Handling page-faults.
• Core function of a Virtual Memory implementation.
• The memory subsystem has to interact with many others.
• Today we look at a range of issues around it.

# 2. Design Issues

• Covered major aspects of the subsystem.
• Three major interactions with system.
• Design of paging has large impact.
• Memory performance is a significant factor in overall system performance.
• The memory abstraction is a core interface between a program and the machine.
• Some issues overlap: improve the abstraction offered to programs, but need to consider the impact on performance.

# 3. Allocation Policies

• (a) shows processes A, B, C in memory.
• Page fault: need to load A6.
• (b) shows a local allocation policy.
• LRU within the memory of process A.
• (c) shows a global allocation policy.
• LRU across all process memories.
• "Fairness" vs overall efficiency.
• Can adjust ratio in response to process behaviour (fault rate).
• Expect a response curve as shown.
• Attempt to keep processes in a range.
• Diminishing returns: don't over- or under-allocate.
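The local/global distinction above can be sketched in a few lines of Python (the frame list and timestamps are invented for illustration): under LRU only the candidate set changes between the two policies.

```python
# Sketch of local vs global page replacement under LRU.
# Each frame records (owner process, page id, last-use timestamp);
# the LRU victim is the frame with the smallest timestamp.

def pick_victim(frames, policy, faulting_proc):
    """frames: list of (proc, page_id, last_used). Returns the frame to evict."""
    if policy == "local":
        # Only frames already owned by the faulting process are candidates.
        candidates = [f for f in frames if f[0] == faulting_proc]
    else:  # "global": any frame in the machine is fair game.
        candidates = frames
    return min(candidates, key=lambda f: f[2])

frames = [("A", 1, 10), ("A", 2, 3), ("B", 5, 1), ("C", 7, 2)]
print(pick_victim(frames, "local", "A"))   # ('A', 2, 3): oldest page of A
print(pick_victim(frames, "global", "A"))  # ('B', 5, 1): oldest page overall
```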

# 4. Load Control / Page Size

• Thrashing: repeatedly paging the same data.
• The working set of all processes is larger than memory.
• As we schedule processes it causes page-faults.
• Shown in timeline on the left.
• Inefficient: too much waiting time.
• Temporarily swapping out an entire process can be faster overall.
• Page size tradeoff: internal fragmentation vs number of entries in table and TLB.
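The page-size tradeoff can be made concrete with Tanenbaum's overhead model: for an average process size of s bytes and a page-table entry size of e bytes, a page size p costs roughly s·e/p bytes of page table plus p/2 bytes of internal fragmentation, minimised at p = √(2se). The example numbers below are illustrative only:

```python
import math

# overhead(p) = s*e/p (page-table space) + p/2 (internal fragmentation)
# d/dp = -s*e/p^2 + 1/2 = 0  =>  p = sqrt(2*s*e)

def optimal_page_size(s, e):
    """s: average process size in bytes, e: page-table entry size in bytes."""
    return math.sqrt(2 * s * e)

# e.g. a 1MB average process with 8-byte entries suggests 4KB pages:
print(optimal_page_size(1 << 20, 8))  # 4096.0
```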

# 5. Cleaning Policy

• Flushing an old page imposes latency before loading a new one.
• If there are no clean pages a page fault takes twice as long.
• Flushing pages ahead of time decreases latency when we need them.
• Paging daemon periodically flushes dirty pages.
• Can be integrated into the clock algorithm for choosing a page.
• Two "hands" spin around the clock.
• Hopefully more pages are clean when we must evict during a fault.
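A minimal sketch of the two-handed clock (data structures invented for illustration): the front hand schedules write-backs of dirty frames, and the trailing hand evicts frames that are clean and unreferenced by the time it reaches them.

```python
class Frame:
    def __init__(self):
        self.dirty = False
        self.referenced = False

def sweep(frames, front, back, flush):
    """Advance both hands one step; flush(i) writes frame i back to disk.
    Returns the new hand positions and an evictable frame index (or None)."""
    n = len(frames)
    if frames[front].dirty:          # front hand: clean dirty frames early
        flush(front)
        frames[front].dirty = False
    f = frames[back]
    victim = back if (not f.referenced and not f.dirty) else None
    f.referenced = False             # give the page a second chance
    return (front + 1) % n, (back + 1) % n, victim

frames = [Frame() for _ in range(4)]
frames[2].dirty = True
flushed = []
front, back, victim = sweep(frames, 2, 0, flushed.append)
print(flushed, victim)  # [2] 0: frame 2 was cleaned, frame 0 is evictable
```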

# 6. Separating Instructions and Data

• It is not mandatory for code and data to share an address space.
• If we avoid self-modifying code we can split into I-space and D-space.
• Larger address-spaces for program to work with.
• Improves isolation: make the I-space read-only.
• Avoids classes of exploits that patch running programs.

# 7. Shared Pages

• Sharing pages reduces memory pressure.
• Fewer page-faults means higher performance.
• Sharing read-only memory is easy.
• Without writes, no synchronisation is needed.
• If we assume code is read-only, multiple instances of same program can share.
• Processes use the same pointer to I-space.
• Different pointers to D-space.
• On fork: read-only pages, copy-on-write.
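Copy-on-write on fork can be modelled in a few lines (page tables here are plain dicts, purely illustrative): fork marks every shared page read-only, and the first write "faults" and copies the frame privately.

```python
frames = {0: bytearray(b"data")}   # physical frames, indexed by frame number
next_frame = [1]

def cow_fork(ptable):
    """Share all frames read-only; return the child's page table."""
    for e in ptable.values():
        e["writable"] = False
    return {p: dict(e) for p, e in ptable.items()}

def write_byte(ptable, page, i, val):
    e = ptable[page]
    if not e["writable"]:                     # "protection fault": copy frame
        new = next_frame[0]; next_frame[0] += 1
        frames[new] = bytearray(frames[e["frame"]])
        e["frame"], e["writable"] = new, True
    frames[e["frame"]][i] = val

parent = {0: {"frame": 0, "writable": True}}
child = cow_fork(parent)
write_byte(child, 0, 0, ord("X"))             # child copies, parent unaffected
print(bytes(frames[parent[0]["frame"]]), bytes(frames[child[0]["frame"]]))
# b'data' b'Xata'
```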

# 8. Shared Libraries

• Sharing the entire I-space of processes prevents relocation.
• When libraries are shared between different programs this can be an issue.
• Requires position independent code.
• No references to absolute addresses.
• Relative (to PC) jumps only.
• Prevents problem where library only works at certain addresses.
• Would be unable to use arbitrary shared libs together.
• Allows the same (code) pages to be located at different addresses.

# 9. Memory-mapping and Interface

• I/O is costly: library calls, syscalls, and the kernel copying the process's local buffer into kernel-space.
• Memory-mapped files expose a shared buffer to the process.
• No read/write calls - simply access the memory.
• Memory subsystem pages changes in and out.
• Different memory abstraction: explicitly shared regions ("holes") in the address-space.
• Can be used to build IPC primitives.
• Explicit naming / sharing of regions allows communications.
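Python's mmap module shows the interface difference: once the mapping exists, the file is updated by plain memory stores rather than write() calls (the file name and sizes below are arbitrary).

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)          # one page of backing file

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 4096) as m:
        m[0:5] = b"hello"            # a memory store, not a write() call

with open(path, "rb") as f:
    print(f.read(5))                 # b'hello': the paging subsystem did the I/O
```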

# 10. Implementation Issues

• Just as the function of the memory subsystem raised certain design issues...
• ...implementation creates issues of its own.
• Some of these are relatively simple.
• Creation of process:
• Page-table(s) need to be created.
• Swap space needs to be allocated.
• Context switch:
• TLB flushed, MMU reprogrammed.
• Process exit:
• Must free up the page-table.
• Handling page-fault is a bit more complex...

# 11. Page Fault Example

• Top of pg198, and pg235: walk through an example.
• The example system has a virtual memory of 4 pages.
• This is twice the size of the physical memory (2 frames).
• A disk with 3 blocks is used as backing.
• Initially pages 0 and 2 are loaded into the memory.
• Pages 1 and 3 are stored in blocks on the disk.
• The page table in the MMU reflects this with pages 1 and 3 marked as unmapped.
• In this state the CPU executes an instruction...

# 12. Page Faults

• The CPU executes an instruction with a memory access.
• The MMU looks in its translation table: page is unmapped.
• Signals the CPU there is a page fault.
• Causes a trap: current instruction is interrupted.
• CPU despatches the trap to the kernel page fault handler.
• The handler picks a victim page to evict from memory.
• We cover the algorithms for this later on.
• For now we simply assume page 0 is the victim to be evicted.

# 13. Page Faults

• In the example page 0 is the victim: the handler writes it to disk.
• The in-memory copy is now redundant: the frame can be overwritten.
• Load page 1 from the disk into the freed frame.
• The on-disk copy of page 1 is now redundant: its block can be reused without loss.
• The handler then rewrites the MMU table.
• Page 0 is marked as unmapped.
• Page 1 is now mapped into frame 0.
• The system can now service the memory access.
• Resume the instruction that was interrupted: fetch the data.
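The whole §11-13 walkthrough fits in a small model (page contents and table layout invented to mirror the example): 4 virtual pages, 2 frames, disk blocks as backing, with page 0 hard-wired as the victim.

```python
# page_table maps virtual page -> ("ram", frame) or ("disk", block)
page_table = {0: ("ram", 0), 1: ("disk", 0), 2: ("ram", 1), 3: ("disk", 1)}
frames = ["page0-data", "page2-data"]
disk   = {0: "page1-data", 1: "page3-data"}

def access(page):
    where, loc = page_table[page]
    if where == "ram":
        return frames[loc]                     # hit: no fault
    # --- page fault ---
    victim = 0                                 # assume page 0 is the victim
    frame = page_table[victim][1]
    free_block = max(disk) + 1
    disk[free_block] = frames[frame]           # write the victim out
    page_table[victim] = ("disk", free_block)  # mark the victim unmapped
    frames[frame] = disk.pop(loc)              # load the faulting page
    page_table[page] = ("ram", frame)          # rewrite the MMU table
    return frames[frame]                       # resume: service the access

print(access(1))  # faults, evicts page 0, returns 'page1-data'
```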

# 14. Further implementation issues

• Most CISC instruction sets use variable-length instruction encoding.
• On execution the PC increments to the next opcode.
• If a page-fault needs to rewind one instruction - how many bytes?
• Hopefully the hardware shadows the PC (now and next).
• Locking Memory: pinning of pages.
• While I/O is occurring the buffer must remain in memory.
• If the transfer is DMA, paging the buffer out would corrupt it.
• Need a way to force pages to stay in memory.

# 15. Last implementation issues

• Options for arranging the backing store.
• Static (direct) mapping, or dynamic.
• Starting processes in memory (paging out when evicted).
• Over-estimating the working set (more thrashing).
• Starting processes on disk (page in on demand).
• Under-estimating, slow start.
• Separation of policy and mechanism (very general point).

Intermission

# 16. Segmentation

• We can provide a 2D view of memory to programs.
• A segment is a separate address-space.
• Programs now manipulate addresses in two parts:
• The segment (name or id).
• The address within the segment.
• Richer representation of the data-model and program structure.
• Pure segmentation allows swapping of entire segments between disk and memory.
• Can also combine with paging...

# 17. Segmentation: Paging in MULTICS

• MULTICS combines both paging and segmentation in one system.
• Ran on hardware with 36-bit words.
• $$2^{24}$$ physical words (72MB).
• A 34-bit address was split as shown.
• Segment contained up to $$2^{16}$$ words.
• $$2^{18}$$ possible segments.
• Each segment is a virtual address space with a page-table.
• Paging provides fixed-size blocks to work with in the memory manager.
• Whole segment does not need to occupy physical memory.
• Programmer gains the advantages of segmentation.
• Expressive data-model, modularity and sharing, protection.
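The 34-bit address split can be checked with a few shifts and masks, assuming the layout above: an 18-bit segment number, then 16 bits within the segment, which paging divides into a 6-bit page number and a 10-bit offset (1024-word pages).

```python
def split_multics(addr):
    """Decompose a 34-bit MULTICS address into (segment, page, offset)."""
    offset  = addr & 0x3FF            # low 10 bits: word within page
    page    = (addr >> 10) & 0x3F     # next 6 bits: page within segment
    segment = addr >> 16              # top 18 bits: segment number
    return segment, page, offset

# Build an address for segment 5, page 3, offset 17 and split it back:
addr = (5 << 16) | (3 << 10) | 17
print(split_multics(addr))  # (5, 3, 17)
```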

# 18. Segmentation: Paging in MULTICS

• Each process has a segment table.
• Each segment entry links to a page-table in memory.
• Pages are aligned on 64-word ($$2^6$$) boundaries.
• Only store the top 18-bits of the physical address.
• Flexibility is costly: segment table could occupy $$\frac{1}{64}$$ of memory.
• Solution: segment table for the process is itself a paged segment.
• Don't store the entire table when fewer segments are used.

# 19. Segmentation: Paging on x86

• In MULTICS the segment number was a field inside the address.
• If we wanted to ignore segmentation we could put a segment every 64K words and use a flat address-space.
• Segmentation was easy to use (or ignore) for programmer / compiler.
• Intel chose a different approach.
• Separate registers to choose the "current" segment.
• "Long pointers" need to explicitly store two values.
• The program changes the segment register before an access.

# 20. Segmentation: Paging on x86

• The indirection to walk segment/page tables is quite slow: MULTICS caches 16 entries.
• x86 caches two entries: code and data segment registers.
• The segment entries are stored in two tables in memory: LDT and GDT.
• To use a segment the entry is loaded into the register from the table.
• Processor knows location of table, index register to choose current segment.
• Current segment defines a base+limit scheme to translate segment address into "linear address".

# 21. Segmentation: Paging on x86

• Backward compatibility: x86 implements segmentation both with, and without, paging.
• If paging is switched off (286 compatibility) then linear addresses are physical addresses.
• If paging is switched on, linear addresses are virtual addresses put through the MMU.
• Two-level scheme: directories and tables.
• Again: table walking is slow - TLB to cache frequent entries.
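The two-level walk translates a 32-bit linear address as follows (a 10+10+12-bit split for 4KB pages; the directory and table contents below are made-up frame numbers):

```python
def split_linear(addr):
    """32-bit linear address -> (directory index, table index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF

def translate(addr, directory):
    d, t, off = split_linear(addr)
    frame = directory[d][t]           # two memory lookups per translation
    return (frame << 12) | off

directory = {1: {2: 0x40}}            # dir entry 1 -> table; table entry 2 -> frame 0x40
addr = (1 << 22) | (2 << 12) | 0x123
print(hex(translate(addr, directory)))  # 0x40123
```

This double lookup on every access is exactly why the TLB caches frequent translations.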

# 22. Segmentation Summary

• It seems as if Tanenbaum likes segmentation.
• A richer memory interface for the programmer is appealing.
• My first job was real-mode x86 programming.
• Segmented memory was appalling (YMMV).
• It seems to have died out for now, may be reinvented later.
• We've looked at the memory sub-system now.
• Providing the illusion of large, fast, uniform memory is complex.
• Many steps just to access a single memory location.
• Common perception of "software bloat".
• Performance expectations don't match the 4,000,000-fold increase in processor speed.
• Main reason is trying to hide the slow memory interface.

# 23. Linux case-study

• Memory management on Linux is pretty close to the textbook solution.
• The virtual address space gives 3GB to the process, 1GB to the kernel.
• Text segment holds code.
• Initialised data.
• BSS is uninitialised: only its length is stored on disk.
• Copy-on-write allows a single zero page.
• Stack grows through page-faults.
• Memory-mapped files allow sharing of physical page frames.
• High bandwidth - processes must handle synchronisation themselves.

# 24. Windows case-study

• Example is 32-bit system, 4GB virtual address space per process.
• Upper 2GB is OS - shaded parts are shared by all.
• Lower 2GB is user-memory for process.
• Demand-paged in 4KB pages.
• OS space only accessible in supervisor mode.
• Permanently present to avoid cost of switching MMU on a syscall.
• Each virtual page is in one of three states:
• invalid: access triggers an exception.
• reserved: planned growth; access triggers allocation.
• committed: active, either on disk or in RAM.

# 25. Windows case-study

• No preallocation in the paging file.
• Page is either in memory, or on disk, not both.
• Allocation in paging file occurs when page needs to be evicted.
• Writing of pages can be asynchronous.
• When page is evicted, marked as being written out.
• Batches of I/O can be done together.
• Preallocated swap would be sparse on disk.
• Just-in-time allocation is a dense area.

# 26. Windows case-study

• Page replacement is based on working sets.
• Windows tries to estimate the "important" pages in a process.
• Minimise page-faults system-wide, keeping most frequent pages per-process.
• This is a local-allocation approach: attempt to distribute page-faults fairly.
• Size of the working-sets is flexible; the system adapts to memory pressure.
• Low pressure - track pages by age, spend little effort on eviction.
• High pressure - impose working set limits, aggressively evict pages.
• Working set manager (called every second).
• Estimates memory pressure, adjusts process working sets, throttles disk I/O.

# 27. Windows case-study

• When an application runs Windows can record the page activity.
• Application startup will probably be quite regular.
• Same initialisation of data-structures.
• Should produce similar pattern of page-faults each time.
• This is the working-set for the beginning of the program.
• SuperFetch records the page-fault sequence for an application.
• Next time it is started those pages can be preloaded.
• If they are in memory by the time they are used: page-fault is avoided.
• Exploiting the working set concept from earlier.

# 28. Summary

• End of the memory sub-system.
• We've seen a range of abstractions available to the programmer.
• How the OS can implement each one.
• Seen how the costs are per-access or per-fault.
• How various layers of caching can mitigate these.
• Wrapping up with a sketch of real-world systems.
• Next we venture into the exciting area of File Systems.