DV1460 / DV1492:

Realtime- (and) Operating-Systems

08:15-10:00 Tuesday September 6th, 2016

Processes and Threads.

Chapter 2 - §2.2 (pg 35-81)

Table of Contents
Processes
Threads.

1. Introduction

  • Processes: abstraction for multi-processing.
    • Isolation of physical resources (e.g. memory).
    • Time multiplexing of programs.
  • Chapter 2 covers the multiplexing aspect.
    • Why we use processes.
    • What is a process (as a unit of scheduling).
    • How do we switch between them?
  • After the break we look at threads.
    • Why they differ from processes.
    • What changes when we use them.
    • How are they scheduled in the kernel.

2. Web server example

  • A simple web server serves static content.
    • Note: different to §2.2.1 / pg100 / slide 18.
  • Static content is .html files loaded from the disk.
  • When an HTTP request is received:
    • Despatch from network stack to process code.
    • Parse the request: URL -> filename.
    • Send request to the disk: read file contents.
    • Very long wait for the data to be returned.
  • The CPU is busy for \(\frac{1}{10}\)ms, idle for 10ms.
  • Utilisation of the CPU is about 1%.
  • CPU should do useful work instead of sitting idle.
  • Schedule another task: fill the gap.
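
As a worked check of the figures above, taking 0.1 ms of CPU work followed by a 10 ms disk wait per request (the symbols \(t_{busy}\) and \(t_{idle}\) are notation introduced here, not from the slide):

\[
U = \frac{t_{busy}}{t_{busy} + t_{idle}} = \frac{0.1\,\text{ms}}{0.1\,\text{ms} + 10\,\text{ms}} \approx 0.0099 \approx 1\%
\]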

3. Make example

  • Build system for a large application.
    • Many separate compilation jobs.
    • Some parts are independent of one another.
    • Other parts are shared.
    • Compilation is largely disk-bound.
  • Similar problem to before: avoid idle periods.
  • If we have more than one CPU - run jobs in parallel.
  • Goal: minimise total time taken.
  • Goal: don't repeat work, share results.
  • Goal: compiler shouldn't know (complex enough).

4. Process switching

  • Running four processes: A...D
  • Provide illusion of four separate PCs.
    • Four independent "current" locations.
  • A single CPU has only one real PC.
    • Must be shared between all four processes.
    • Switch the active process.
    • Execute a piece of A, switch to B...
  • The overall result is that the CPU is always busy.
  • Fill all of time horizontally.
    • Never execute idle / waiting process.
    • Interleave execution of each process.

5. Context Switch

  • The switch between processes is a Context Switch:
    • In interrupt handler: syscall, I/O, quantum.
    • Save the state of the running process.
    • Load the state of another process.
    • Resume execution.
  • When saving the state of a process:
    • The transition must be mostly invisible.
    • There will be a gap in time between instructions.
    • Process may resume on a different processor.
    • Nothing should affect the logic of the program.
  • OS must store information about each process.
    • Data-structure: Process Control Block...

6. Typical process-table entry (PCB)

| Execution | Memory (all pointers into memory sub-system) | Files |
|---|---|---|
| PID (unique key) | Text Segment (code) | Root directory (jails) |
| Registers | Data Segment (heap) | Working directory (CWD) |
| Program Counter (PC) | Stack Segment | File Descriptor Table |
| Stack Pointer | | User ID |
| Status Word | | Group ID |
| Scheduling State | | |
| Priority | | |
| Scheduling Parameters | | |
| Parent PID | | |
| Signals | | |
| Time started | | |
| CPU time | | |

7. Life-cycle of a process

| Reason | Event | Expectation |
|---|---|---|
| Boot Sequence | System init. | |
| Background Service Processing. | Daemon process forks. | Short duration. |
| User request. | Foreground process forks. | Fast response time. |
| Foreground request. | Script execution. | High CPU utilisation. |
| Application quit, command finished. | main returned / exit(). | Free resources. Propagate success. |
| Process detected a problem, e.g. parameter, configuration, environment. | exit() / abort() | Free resources. Propagate failure. |
| Illegal access or instruction. | Exception. | Clean up, provide notification. |
| Terminated by another program. | Received a kill signal. | Who should clean up - parent? |

8. Process States

  • The model incorporates blocking I/O cleanly.
  • We only need three states to describe the scheduling of processes.
  • The Running state means that the process is on a CPU.
  • If the process attempts I/O, e.g. calls read() then it must wait (1).
  • It is descheduled: the context is saved in the PCB.
  • The state in the PCB is set to Blocked - the CPU is now free.
  • Later an interrupt signals that the data has arrived, and the PCB is marked as Ready (4).
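
The three-state model can be written down as a small transition table. This is a sketch, not scheduler code: the event names are hypothetical, and the directions given to transitions (2) and (3) are an assumption based on the usual textbook figure (timer preemption and dispatch).

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    READY = "ready"
    BLOCKED = "blocked"


# Numbers in the comments refer to the transitions in the slide text.
TRANSITIONS = {
    (State.RUNNING, "io_wait"): State.BLOCKED,   # (1) e.g. read() must wait
    (State.RUNNING, "preempt"): State.READY,     # (2) timing interrupt (assumed)
    (State.READY, "dispatch"): State.RUNNING,    # (3) scheduler picks it (assumed)
    (State.BLOCKED, "io_done"): State.READY,     # (4) data has arrived
}


def step(state, event):
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in state {state.name}") from None
```

There is no entry from Blocked to Running: a process whose data has arrived only becomes Ready and must still be dispatched.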

9. Process States

  • To stop programs keeping the CPU we can use a timing interrupt (2 and 3).
  • Blocking I/O allows programs to collaborate effectively: modular system.
  • cat is a program that views text files.
  • Executing cat chap1 chap2 chap3 will print the three files to the terminal.
  • grep is a program that searches for text.
  • They can be combined with a pipeline: this looks like a file to both programs.
  • cat chap1 chap2 chap3 | grep tree prints the lines containing "tree".
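
The same pipeline can be assembled by hand to show that each program only sees a file-like stream. This is a sketch assuming a UNIX-like system with cat and grep on the PATH; `cat_grep` is a hypothetical helper name.

```python
import subprocess


def cat_grep(paths, pattern):
    """Run the equivalent of `cat paths... | grep pattern`; return matching lines."""
    # cat writes to a pipe instead of the terminal.
    cat = subprocess.Popen(["cat", *paths], stdout=subprocess.PIPE)
    # grep reads from the same pipe as if it were an ordinary file.
    grep = subprocess.Popen(["grep", pattern], stdin=cat.stdout,
                            stdout=subprocess.PIPE, text=True)
    cat.stdout.close()          # the parent drops its copy of the read end
    out, _ = grep.communicate()
    cat.wait()
    return out.splitlines()
```

Neither program is told about the other: blocking reads and writes on the pipe are enough for the kernel to schedule the two processes in turn.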

10. Process State Example

  • One real file (on disk), two components with file interface.
  • Two processes, each in the blocked / ready / running state.
  • Assume files with fixed size buffers (e.g. char buffer[512]).
  • Data arriving from the disk triggers cat process.
  • It writes to stdout until the buffer fills.
  • As data arrives in the buffer grep becomes ready to run.
  • When the buffer fills cat is descheduled, grep takes the CPU.

11. Process-structure model

  • Processes are a good abstraction.
    • Programs use simple blocking I/O: read(), write() etc.
    • Adaptive behaviour - e.g. load balancing CPU in a pipeline.
    • Scales well across multiple CPUs.
  • Scheduler details hidden internally:
    • Interrupts, Service Routines, Scheduling.
  • Behaviour of the model can be reasoned about easily:
    • Parallel execution, suspension, signals, waiting...

12. CPU Utilization

  • Model: process waits on IO with probability \(p\)
  • With \(n\) processes (assumed independent), all are waiting at the same time with probability \(p^n\).
  • Probability that something is running: \(1-p^n\)
  • Simple approximation: shows the trend.
  • Can use the model to plan capacity of a system.
    • Adding 8GB, run 7 processes instead of 3: 79% utilization.
    • Adding 8GB more: 91% utilization - diminishing returns.
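
The capacity figures can be reproduced with a few lines of Python. Two assumptions are made here that the slide does not state: an I/O wait probability of \(p = 0.8\), and 11 processes after the second memory upgrade. They were chosen because they match the quoted 79% and 91%.

```python
def cpu_utilisation(p, n):
    """Probability that at least one of n processes is not waiting on I/O."""
    return 1 - p ** n


# p = 0.8 and the process counts are assumptions (see the text above).
for n in (3, 7, 11):
    print(n, f"{cpu_utilisation(0.8, n):.0%}")
```

Going from 3 to 7 processes gains far more than going from 7 to 11, which is the diminishing return noted above.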

13. Process Hierarchies

  • The only mechanism (in UNIX) for creating a new process is fork().
    • Results in a child / parent process.
    • The processes always form a tree.
    • Parents cannot "disown" their children.
    • If a parent is terminated first, orphaned processes are given to pid 1.
  • Simplifies the security model.
    • Parent is an "owner" in some contexts.
    • Parent is expected to clean up.
    • Signals are sent to parent, propagate down tree until handled.
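
A minimal sketch of the parent / child relationship, using Python's thin wrappers around the UNIX calls (so it only runs on UNIX-like systems). `run_child` is a hypothetical helper; the parent's clean-up duty is represented by the waitpid() call.

```python
import os


def run_child(exit_code):
    """Fork a child that exits with exit_code; the parent waits and returns it."""
    pid = os.fork()
    if pid == 0:
        # Child: a copy of the parent. Exit immediately, skipping the
        # parent's clean-up handlers.
        os._exit(exit_code)
    # Parent: fork() returned the child's PID. Reap the child to collect
    # its status and free its process-table entry.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Calling `run_child(7)` in the parent returns 7, the status propagated from the child.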

14. Information Leakage

  • Processes work best when programs are limited by I/O, not CPU.
    • After the break we talk about an alternative model.
  • Security perspective: isolation is not guaranteed.
FLUSH+RELOAD: a High Resolution, Low Noise, L3 Cache Side-Channel Attack. Yuval Yarom, Katrina Falkner. SEC'14. USENIX Association.

Theoretical Use of Cache Memory as a Cryptanalytic Side-Channel. Dan Page. IACR Cryptology ePrint Archive 2002: 169 (2002)

Break (15 mins)

Intermission

15. What is a thread?

  • Processes: mechanism for scheduling and a private memory space.
  • The privacy of the memory space is to make programs more robust.
  • If data is shared between two processes...
    • ...must copy between two address spaces.
    • Neither process has privileges: requires a syscall.
  • It is possible to trade the safety for higher performance.
Threads - the basic idea.
Multiple executions scheduled with a single address space.
  • When two threads use shared data they can just pass the address (pointer).
  • No copy needed - no syscall to make.
  • This is an optimisation of processes (lighter-weight version).
  • The trade-off is the loss of safety in the programming model.
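
A small illustration of passing a reference instead of copying, using Python's threading module as a stand-in for a native thread library. The names `word_count` and `results` are hypothetical. Each thread writes a different key, so no extra synchronisation is used in this sketch.

```python
import threading


def word_count(shared, key, text):
    # The worker receives a reference to `shared`, not a copy of it.
    shared[key] = len(text.split())


results = {}   # one object in the single shared address space
texts = {"a": "one two three", "b": "four five"}
workers = [
    threading.Thread(target=word_count, args=(results, key, text))
    for key, text in texts.items()
]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(results)
```

With processes the same exchange would need the data copied between address spaces through the kernel, for example over a pipe.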

16. Word Processor Problem

  • Word processor: typical interactive example.
    • Complex internal data representation.
    • Words, paragraphs, styles, fonts, pages
    • Typically: object graph linked by pointers.
  • UI must respond quickly - low event latency.
    • Operations slow - walk the structure.
    • e.g. move one word, reflow all pages.
  • UI cannot wait until done - high latency (zzz).
  • Perform multiple functions concurrently.
  • Must access the same memory / address-space.
  • Threads are a more natural fit than processes.

17. Word Processor Solution

  • In the figure: circle = shared address space, wiggle = concurrent control flow.
  • Extends easily to other concurrent tasks: e.g. backup to disk.
    • Third thread can loop through structure calling write().
  • Raises a question: what happens if the disk I/O blocks?
    • Does only one thread block? Then the kernel must know about threads and suspend just that one.
    • Do all threads block - i.e. kernel blocks the process?
Design Issue
Does the thread scheduler live in the kernel, or in user-mode?

18. Example: Web server

  • The web-server based on processes assumed each file was independent.
  • But some pages on a server are requested more frequently than others.
  • Caching is a simple optimisation: store some pages in memory.
  • Requires all the connections to access a shared data structure.
  • Dispatch thread; same role as the parent process before.
  • Worker threads; same role as the child processes.
  • Higher performance / less robust: error in one thread can affect all.

19. Lightweight context switches

| Per-process | Per-thread |
|---|---|
| Address-space, global variables | Program Counter |
| Open resources, signals, alarms | Stack |
| Child processes | Registers |
| Accounting info (e.g. cpu-time) | State (e.g. heap variables) |

  • Context switch between processes must save / load the entire PCB.
  • Saving / loading only the per-thread data is much faster.
  • We could write the scheduler directly inside the program:
    • Don't need to save everything, manually save data; switch task.
    • Must write the program in an incremental way to do this.
    • Event-based style from the first lecture: Finite State Machine.
  • If we use a thread library: avoid breaking the code up manually.
    • Write sequence of calls instead e.g. read(), write()...

20. Asynchronous programming styles.

Design Issue
Does the thread scheduler support synchronous or asynchronous code?
  • A synchronous style requires blocking I/O.
    • Operations happen in a sequence in the code: delay return until complete.
    • Easier to understand code: more robust program.
  • An asynchronous style requires the programmer to write incremental code.
    • State indicates what is happening, big dispatch: operations return instantly.
    • Program is more complex: but performance can be higher.
    • To support asynchronous styles the OS must offer asynchronous I/O.
    • A select() call to check if I/O is ready.
    • Each operation's return code indicates either completion or "try again later".
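
The readiness check can be tried with Python's wrapper around select(). This is a sketch for UNIX-like systems, where select() accepts pipe descriptors; `readable` is a hypothetical helper name.

```python
import os
import select


def readable(fd, timeout=0.0):
    """Return True if a read() on fd would not block right now."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return fd in ready


r, w = os.pipe()
print(readable(r))       # nothing has been written yet
os.write(w, b"hello")
print(readable(r))       # data is waiting in the pipe
print(os.read(r, 5))     # this read returns immediately
os.close(r)
os.close(w)
```

An asynchronous program would call a check like this from its dispatch loop and move on to other work whenever the descriptor is not ready.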

21. Classical thread model

  • a) process-only model - implicit single thread of control, no shared data.
  • b) shows multiple threads with shared data.
  • Each thread needs its own PC and stack.
    • Stack stores call sequence - allows returns.
  • Per-thread ready / running / blocked state.
  • fork() is now difficult.
    • Thread blocked on I/O: two copies?
  • Run-time is also difficult:
    • Memory manager is now shared structure.
    • Conflicting calls on shared resources.
    • e.g. Thread 1 read(), thread 2 close().

22. pthreads

  • IEEE 1003.1c: specification for a threading API / library.
    • Implementation is p(-osix) threads: available almost everywhere.
  • Core functionality for managing threads:
    • pthread_create: execute the supplied procedure in another thread.
    • pthread_exit: terminate the calling thread without returning from its procedure.
    • pthread_join: wait for another thread to terminate and collect its status code.
    • pthread_yield: a CPU-heavy thread can explicitly give up the CPU.
  • Shared data: threads must guard against race-hazards and data-corruption.
    • We look at these problems / solutions over next two lectures.
    • Learning to use pthreads is outside the scope of this course.
    • DV2544: Multiprocessor systems (parallel programming) with Håkan Grahn.
    • Tutorial available for the interested here (cmu).

23. Thread-scheduler location.

  • First option: implement threads in a library.
    • The thread table is in the process memory.
    • The scheduler code is linked into the program.
    • Kernel does not know about the threads: process looks like a single-thread process.
    • Kernel scheduler chooses between processes.
  • Second option: threads implemented in kernel.
    • Thread table in kernel memory.
    • Part of the kernel process scheduler.
    • Chooses which thread in which process should run next.

24. Trade-offs between kernel- and user-threads.

| | User-mode threads | Kernel-mode threads |
|---|---|---|
| Context switch | Replace PC / Stack / Regs. Cost ~ call < 50ns | Full (lmbench lat_ctx), 3-10µs |
| Creation / Deletion | Very cheap - no syscall | Expensive - syscall required |
| Support needed | Library - just link. | Kernel support + user-mode library. |
| Syscalls | Wrapper needed to check for blocking. Limits performance. | Use any syscall normally |
| Page-faults | Whole process will suspend | Another thread can be scheduled |
| Pre-emption | No clock interrupt - cooperative only. | Threads can be preempted. |
| Utilisation | Lower - two schedulers. No global info. | Higher - single scheduler can see entire system. |

25. Windows / Linux

  • Linux (§10.3.3 [pg 740]), Windows (§11.4 [pg913]).
  • Similar approaches: both use kernel-mode threads.
    • Both systems give a different PID / TID for each execution context (task).
    • e.g. process is a collection of tasks (same PIDs, different TIDs).
    • Kernel schedules tasks (whole collection of process/threads).
    • Kernel knows what is happening with threads inside the process.
    • Can avoid blocking entire process on a single thread.
  • In both OSes each thread has two stacks (one user-mode, one kernel-mode) to simplify syscalls.
  • Windows also offers "fibers".
    • Pure user-mode threads.
    • Part of an older API - somewhat deprecated.
  • Linux: Primary API for multi-programming is clone, rather than fork.
    • Allows explicit control over sharing (pg743-4: no safe default).

26. Pools: Scheduler Activations

Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism
Thomas Anderson et al.
http://homes.cs.washington.edu/~tom/pubs/sched_act.pdf
  • Kernel-mode threads are expensive to create.
  • The normal optimisation is to create a "pool" and reuse threads from it.
  • Scheduler activations simplify this approach.
    • Hybrid approach: part kernel / part library.
    • Virtual Processors are an abstraction of thread pools (each is a kernel thread).
    • The messaging component of a Scheduler Activation is called an upcall.
Basic Idea
Treat threads as disposable - after launch runs to completion and gets deleted.
  • Scheduler Activations are known to be short-duration and are always deleted.
  • So the logic for the pool can be put in the kernel: fewer syscalls overall.
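
For comparison, the plain pool optimisation looks like this when done entirely in a user-mode library. The sketch uses Python's standard ThreadPoolExecutor; it illustrates thread reuse only and is not an implementation of scheduler activations. `handle` and `serve` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(request):
    """Hypothetical short-lived job standing in for one unit of work."""
    return request.upper()


def serve(requests, pool_size=4):
    # The pool creates at most pool_size threads and reuses them for
    # every job, instead of creating one thread per request.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(handle, requests))
```

The cost of thread creation is paid at most `pool_size` times, however many requests are served.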

27. Pools: Pop-up threads

  • Alternative approach to thread pools.
  • Similar idea - threads waiting to be used, deleted afterwards.
  • Pop-up thread: created to respond to an event.
  • Advantage: no previous context to restore - cheap to switch.
  • No need to save context at end - deleted after use.
  • These are used to reduce latency in message handlers.
  • Where should the pop-up thread run?
    • Kernel: lower overhead for switching, but the handler is trusted code and must not crash.

28. Summary

  • Concurrent programming has two main issues:
    • How to handle multiple flows of control through a program.
    • Whether or not the data should be shared.
  • The OS offers two abstractions to support this:
  • Process - flow of control with private memory.
  • Thread - flow of control with shared memory.
  • Next lecture is about communication between processes.
    • Primitives for synchronisation and control.