DV1460 / DV1492:

Real-Time Operating Systems

08:15-10:00 Thursday, September 7th, 2017

Interprocess communication (IPC).

§2.3 (pg 119-149)

Table of Contents
  • Race Hazards
  • Critical Regions
  • Semaphores
  • Monitors

1. Introduction

  • Working in a multi-programming environment:
    • Problems are split up for processing by processes.
    • Processes need ways to collaborate with each other.
    • e.g. pipelines passing streams of text.
  • Note: throughout §2.3 treat the word "process" also as "thread".
  • When independent processes collaborate, three issues arise.
    1. Passing data: how to get the data from one process to another.
    2. Avoiding interference: actions happening simultaneously must not clash.
    3. Preserving order: if an action depends on another they must occur in the correct order.

2. Example: Race Conditions in a spooler

Race Condition (Hazard)
Distinct outcomes, selected by the order in which events occur.
  • Abstraction failure: details leak.
    • Hide internal details of other processes.
    • But combined order of steps changes result.
  • Print spooler is a simple queue.
    • Variables: next to print, next to queue.
    • Process adding a file uses non-atomic steps.
    • slot[in++] = name;
    • Operation is not atomic.
    • Read in, write slot, calculate, write in.
  • Both A and B try at the "same" time...

3. What goes wrong?

  • A process can be interrupted at any time.
  • Processes have their instructions interleaved.
  • Not all interleavings work correctly.
    • Initially in=7 read by both A & B.
    • Both write into the same slot.
    • One value destroys the other.
    • Afterwards the damage is invisible.
    • in is the next free slot.
Murphy's law
If it can go wrong, sooner or later, it will.
Randomised system
Picks from all the possible interleavings - sooner or later the bad one occurs.

4. Critical Regions

  • Arbitrary interleaving of operations between processes causes problems.
Atomic operation
All steps in the operation are executed together - cannot be split apart
  • First approach is very coarse and simple: Critical Region.
  • Program has one atomic part, and the rest is scheduled normally.
  • Any scheme for critical regions must meet the following properties.
1. Max one process in critical region.
2. Must work on any hardware / number CPUs.
3. No process outside region may block another.
4. Every process gets in eventually. (fairness)

5. Exclusion: Disabling Interrupts

  • Pre-emptive multitasking relies on timing interrupts.
    • If the interrupt is disabled then there are no context switches.
  • Scheme: disable interrupts during critical region.
  • Without context switches every other process is "blocked".
If the process does not re-enable interrupts the entire OS fails.
Timing interrupts are per-processor; only works on a single core machine.
Problem (if all processors' interrupts disabled)
Performance is terrible, only \(\frac{1}{n}\) processors running during critical region.

6. Exclusion: Locks

  • Idea: a "lock" with only one key [pg123].
  • Code illustrates how we want this to work.
  • Problem: race hazard between testing and setting variable, must be atomic.
while(lock != 0) /* loop */;
lock = 1;
... critical region ...
lock = 0;

// Process A: ME=0 OTHER=1
// Process B: ME=1 OTHER=0
while(turn != ME) /* loop */;
... critical region ...
turn = OTHER;
... non-critical ...
  • Strict Alternation: [Fig 2-23], two processes take turns.
  • The while loop is exited after the other process sets the value.
  • There is no set after the test, which avoids the race hazard.
  • Problem: busy-waiting (uses all available CPU).
  • Violates condition 3 - forced alternation is blocking other process.

7. Exclusion: Peterson's Solution

Solution of a problem in concurrent programming control.
E. W. Dijkstra
  • First solution for any number of processes is credited to Dijkstra.
    • Generalises a 2-process technique invented by Dekker.
    • Dijkstra generalised it to arbitrary number of processes, proved correctness.
  • Simpler approach invented by Peterson [Fig 2-24].
  • Array of flags to show that process is interested in entering.
  • The turn variable is who gets to try next.
proc enter0 =
  0r: interested[0] = True
  0t: turn = 0
  0e: while(turn==0 && interested[1]==True) /* loop */;

proc enter1 =
  1r: interested[1] = True
  1t: turn = 1
  1e: while(turn==1 && interested[0]==True) /* loop */;

8. Exclusion: Explanation of Peterson's Algorithm

  • Pg 125 provides a description of how Peterson's algorithm works.
    • Waiting: while(turn==process && interested[other]==TRUE);
    • Inverse: turn!=process || interested[other]==FALSE
  • This process can continue if turn has changed.
    • If the other process started enter(), this interested flag was already set.
    • Because the other is blocked, this process can enter.
  • This process can continue if the other process's interested flag is clear.
    • The other process has not yet started enter().
  • If the turn flag has flipped since this process set it.
    • The other process must have set its interested flag already.
    • Block until it clears.

9. Exclusion: Proof of Peterson's Algorithm

  • Red states: turn value, two interested flags.
  • Black transitions use labels from slide 7.
  • Multiple states mean we are unsure which state the system is in.
    • e.g. No initialisation of turn.
  • Edges with a line mean the process is blocked in its busy wait.
    • Can see this by considering the states.
  • Repetition of sub-trees: path to each state encodes the PC of the two processes.
  • Cannot get stuck: at least one transition out of every state.
  • Nowhere are both sections entered.

10. Exclusion: Test and Set

  • Put it in hardware: atomic instructions.
    • Read/write in single instruction.
    • Works on multiprocessors (locks the bus).
    • Cache coherent (the write is forced out to memory).
    • Spin-locks for more than two processes.
  • TSL : Test and Set Lock. (pg126)
    • Reads the old value.
    • Writes "1" into the location.
  • XCHG: Atomic swap.
    • More general: update with any value.
  • More general than a critical section:
  • Multiple locks in different parts of program.
enter:  TSL REG, lock
        CMP $0, REG
        JNE enter
        ...

enter:  MOV $1, REG
        XCHG REG, lock
        CMP $0, REG
        JNE enter
        ...

leave:  MOV $0, lock
        ...

11. Problem: Producer-Consumer

  • Every technique so far is a busy-wait.
    • Waste CPU-time / power instead of sleeping.
  • Abstract problem: motivates a general solution.
  • Producer-Consumer Problem: many real cases.
  • Two processes share a fixed-size buffer.
  • Producer creates items, consumer uses them.
  • Buffer smooths out differences in their run-speeds.
  • Either work at full speed, or sleep (save power).
  • Producer sleeps when buffer is full (no overflow).
  • Consumer sleeps when the buffer is empty.

12. Producer-Consumer Examples

  • Print-spooler is consumer, multiple producers.
    • Multiplexes a physical resource (the printer).
    • Preserves order of delivery.
    • Replace printer with disk: logging daemon.
  • Video-player, e.g. streaming from network.
    • Producer reads network stream, puts frames in a buffer.
    • Consumer displays images on the screen (attempts constant rate).
  • Web-server:
    • The dispatcher on the listening port is the producer.
    • The consumers are request handlers, building responses.

13. Sleep / Wake primitives

  • Introduce new primitives to work with.
  • sleep() deschedules the current process.
  • wakeup(x) continues process x.
  • Idea is to avoid busy waiting.
  • Buffer full: producer sleeps, consumer runs.
  • Buffer empty: consumer sleeps, producer runs.
  • Problem: race hazard in the if() ... sleep() pair.
    • The condition can change after the check.
    • Consumer evaluates buf.used()==0.
    • Producer adds an item, calls wakeup() - lost, the consumer is not asleep yet.
    • Consumer then calls sleep() and never wakes: deadlock.
producer:
    while(1) {
        Item *i = makeNew();
        if( buf.used()==N ) sleep();
        buf.add(i); count++;
        if( count==1 ) wakeup(consumer);
    }

consumer:
    while(1) {
        if( buf.used()==0 ) sleep();
        Item *i = buf.get();
        if( buf.avail()>0 ) wakeup(producer);
        doStuff(i);
    }

14. Producer-Consumer: Key Idea

  • The API is treating sleeping and waking as flags.
    • If a wake-up is sent to a non-sleeping process it is lost.
    • The flag is already in the target state (running=True).
  • We must remember the number of wake-ups sent.
    • Storing a wake-up that we don't use yet... for later.
Synchronised flags (locks) are not powerful enough
Synchronised counters are necessary to solve the problem.
  • Summary: Simplest approach is busy-waiting.
    • To improve efficiency we introduce flags (locks).
    • To make them work reliably we need new primitives.
    • After the break we look at semaphores and mutexes.

Break (15mins)


15. Technique: Semaphore description

  • Semaphore: a shared (unsigned) integer.
  • Interface has two atomic methods:
    • down - decrement if positive, if zero sleep.
    • up - increment the counter.
  • If processes are sleeping up() will wake one.
  • Metaphor: the counter is storing "wakeups"
    • Calling down() consumes a wake-up.
    • If none remain, caller sleeps until one arrives.
    • Calling up() provides a wake-up.
  • Interface is balanced - up()s match down()s.
  • Enforces conservation of a resource.

16. Solution using semaphores

  • Uses three separate semaphores.
    • lock guards the buffer.
    • free number of free slots.
    • used number of used slots.
  • The buffer changes (reading/writing the item and updating the count) are in a critical region.
    • Access for only one process at a time.
  • Producer adding when full blocks on free.down().
    • Buffer full = zero free slots.
  • Consumer taking when empty blocks on used.down().
    • Buffer empty = zero used slots.
semaphore lock=1, used=0;
semaphore free = N;

producer:
    while(1) {
        Item *i = makeNew();
        free.down();
        lock.down();
        buf.add(i);
        lock.up();
        used.up();
    }

consumer:
    while(1) {
        used.down();
        lock.down();
        Item *i = buf.get();
        lock.up();
        free.up();
        doStuff(i);
    }

17. Technique: Solution correctness

  • Sounds plausible: proof?
  • Observe: used = N - free holds at the start.
  • Producer: free.down and used.up
  • Consumer: used.down and free.up
  • Both loop bodies preserve the property.
    • Inductively: they remain synchronised.
  • Properties of semaphores prevent <0 or >N items in buffer.
  • If makeNew and doStuff always terminate.
    • Then processes run forever; no deadlock.
  • Can use multiple consumers or producers.
semaphore lock=1, used=0;
semaphore free = N;

producer:
    while(1) {
        Item *i = makeNew();
        free.down();
        lock.down();
        buf.add(i);
        lock.up();
        used.up();
    }

consumer:
    while(1) {
        used.down();
        lock.down();
        Item *i = buf.get();
        lock.up();
        free.up();
        doStuff(i);
    }

18. Technique: Mutexes

  • A mutex is an efficient way to guard access to a resource.
    • It is a kind of lock: but doesn't busy-wait like the earlier solutions.
    • It acts like a semaphore limited to 0,1 (a binary semaphore).
  • If we have TSL and a yield() operation then it is simple.
mutexLock:
        TSL REG, mutex
        CMP $0, REG
        JZE ok
        CALL yield
        JMP mutexLock
ok:     RET

mutexUnlock:
        MOV $0, mutex
        RET
  • Very similar to the spin-lock shown earlier.
    • On failure, don't try immediately again, run something else first.
  • If these are user-threads then no OS call - very fast...

19. Mutexes: Threads vs Processes

  • Discussion on pg134 about address spaces.
  • If we are scheduling threads: shared address space.
  • Both threads can access the mutex location.
    • Top diagram: mutex provided by library in process.
  • When scheduling processes they have private memory... middle diagram.
  • Requires syscalls to access the mutex location.
    • Typically lock(), unlock() interface.
    • This is very slow: a lock using TSL takes tens of cycles.
    • Making an OS call takes thousands of cycles.
  • To avoid the syscall we want to achieve something like the bottom diagram...

20. Technique: Futex

  • Low contention = conflicts are rare, with long average delays between them.
    • The overhead of an occasional OS call can be tolerated.
  • High contention = conflicts are frequent, with short average delays; syscall overhead dominates.
  • Linux: processes can really share memory.
    • Same physical page in both page-tables.
  • Futex: Fast User-space Mutex.
    • Atomic instruction on shared memory.
    • If available - grabs it straight away (low cost).
    • If in use - syscall to sleep().
  • Release the lock:
    • If zero processes sleeping - avoid the syscall.

21. Semaphores in pthreads

  • Semaphores are POSIX, but not pthreads.
    • Might not be available directly.
  • pthread_cond_t is a condition variable.
    • Sleep/wakeup interface, race hazards.
  • pthread_mutex_t is a mutex (safe).
  • Fig 2.32 is not good: threads work in lock-step.
  • Better approach is to build semaphores in pthreads.
  • Note: p_ means pthread_.
typedef struct {
    p_mutex_t lock;
    p_cond_t  wait;
    int       count;
} Sem;

void down(Sem *s) {
    p_mutex_lock(&s->lock);
    while (s->count == 0)            /* while, not if: wakeups can be spurious */
        p_cond_wait(&s->wait, &s->lock);
    s->count--;
    p_mutex_unlock(&s->lock);
}

void up(Sem *s) {
    p_mutex_lock(&s->lock);
    s->count++;
    p_cond_signal(&s->wait);
    p_mutex_unlock(&s->lock);
}

22. Motivation for monitors

  • Semaphores and mutexes, powerful and safe: so all done?
  • Many decades of real-world experience suggests otherwise.
  • Concurrent programming using them is reputed to be quite hard.
  • Consider two similar ways of doing the same thing:
producer (working).... : free.down() criticalRegion.down() ...
producer (broken).... : criticalRegion.down() free.down() ...
  • In the broken case the producer enters its critical region first.
    • The second call to remove a free slot (to put the item into) can block.
    • If the producer blocks while inside its critical region the consumer cannot run.
  • System deadlocks. Game over.

23. Technique: Monitors

  • Monitors are a higher-level primitive for concurrency.
  • Idea: no synchronisation variables - use control-flow instead.
  • Easier for programmers to work with - matches procedures in program.
The monitor's procedures share one lock: only one process may be active inside the monitor at a time.
  • Programmers must think about control-flow between procedures anyway.
  • A whole procedure becomes an atomic operation.
  • The actual mechanism can be handled by the compiler / run-time.
    • Insert lower-level mutexes inside the compiled program.
    • Java: the synchronized keyword indicates a monitor.
  • [Fig 2-34] very simple - insert() and remove() become atomic actions.
  • Java: wait() / notify() can only be called while holding the monitor lock (inside synchronized) - no race hazard.

24. Technique: Message Passing (MPI)

  • Monitors: higher-level approach on-top of mutexes.
    • Requires shared memory, works on small machines.
    • What about large-scale servers with 1000s of processors?
    • Overhead limits shared-memory to about 8-16 nodes.
  • If we use a network between nodes - what about message loss?
  • Full details in DV2544, rough overview:
    • Build MPI on-top of a layer that performs retransmission (e.g. TCP/IP).
    • Build MPI on-top of a faster layer (UDP/IP) and handle retransmission.
  • Reuse the same interface locally (use IPC instead of network).
    • Simplifies code: learn the one MPI interface for all parallel programming.
  • Scales very well (100 000s of processors).

25. Technique: Message Passing (MPI)

  • Abstraction provides guaranteed delivery.
  • Implemented on retransmission layer.
  • Mailbox: queue of messages to process.
  • Block when sending to a full mailbox.
  • Block when reading an empty mailbox.
  • Shared fixed-size buffer, rather than a stream.
  • Different to TCP abstraction.
  • Idea: Treat messages as tokens.
  • Maintain a fixed number in the system.
  • Leads to simple code.
  • Implementation is very fast, scales well.
producer (repeats forever):
    produce item
    receive msg            // might block here
    fill msg with item
    send msg

consumer:
    send N empty msgs      // one-time init
    forever:
        receive msg        // might block
        take item from msg
        send empty msg
        process item

26. Technique: Barriers

  • A barrier is a way for a group of processes to synchronise.
  • The barrier spans a group (library needs to know group size \(n\)).
  • Processes arrive at the barrier() individually (a).
  • First \(n-1\) processes block (b), all continue when last arrives (c).
  • Easy to build synchronous channels from this primitive.

27. Technique: Avoiding Locks

  • The fastest technique for locks: no locks.
  • Basic idea: hide invalid intermediate steps, only make atomic actions visible.
  • e.g. Preparing the node to add first (with child links), atomic overwrite (X->E).
  • e.g. Removing links into B first, then deleting B.
  • Context: assume users do not hold pointers into structure for long.
  • High-performance, short-duration calls.
  • Timeout: let users finish B before removing.

28. Summary

  • We've seen the different concurrency primitives.
  • Each interacts with the scheduler in some way.
  • We saw one part of the scheduler already:
    • The context switch / blocking I/O.
  • Next time we look at the rest of the scheduler:
    • How to choose which process runs next.