DV1460 / DV1492:

Real-Time Operating Systems

13:15-15:00 Thursday, September 8th, 2016

Scheduling Algorithms.

§2.4-2.7 (pg 149-173)

Table of Contents
  • Overview
  • Batch Algorithms
  • Interactive Algorithms
  • Realtime
  • Linux Implementation
  • Classic IPC Problems

1. Introduction to Scheduling

  • We've seen the primitives programmers use for multi-processing.
    • These operate in a system that chooses a schedule.
      • A running order of timeslices across processes.
    • Today we look at the scheduling algorithms the OS uses.
    • We finish up some motivating problems in IPC.
    • These are to give an idea of how concurrency is used.
      • More detail on concurrency is available in a later elective course.
  • The scheduler is concerned with the currently runnable processes.
    • Blocked processes are held in separate queues (for their resource).
    • We see these later when we look at I/O.

2. Process Behaviour

  • Empirical observation: a process tends to repeat similar behaviour.
Principle of Locality
A program is more likely to repeat recent code than execute something new
  • A program that did not repeat would contain few loops.
  • This observation leads to the idea behind caching (later in course).
  • Today we see how it affects scheduling of processes.
  • We can predict that a bursty process will remain bursty, and vice versa.

3. When scheduling decisions are made

  • When a new process is created, forked from parent:
    • Should the parent continue, or the child run?
  • When the current process terminates:
    • Which process should run next?
  • When the current process yields:
    • Which process should run next?
  • When the current process blocks:
    • Which process should run next?
  • When an I/O interrupt fires:
    • Switch to a process blocked on that resource?
  • When a timing interrupt fires:
    • Should we preempt the process and pick another?

4. Goals / Metrics

All systems
  • Fairness: equal division of resources within a category.
  • Policy: arbitrary design decision.
  • Balance: avoid bottlenecks, use the whole system.
Batch systems
  • Throughput: completed jobs per hour.
  • Turnaround: average time to complete from submission.
  • CPU utilization: percentage of time CPU is busy.
Interactive systems
  • Response time: time from request to the response.
  • Proportionality: matching users' perceptions of how long things should take.
Real-time systems.
  • Meeting deadlines: must vs should.
  • Predictability: good estimates of task duration.

5. Batch Algorithms

  • Batch systems are non-interactive: response time is irrelevant.
    • Maximise CPU utilization.
    • Maximise throughput.
    • Minimise turnaround time.
    • Consequence: fewer context switches.
    • Extreme case: no timeslicing.
      • Run entire jobs to completion.
  • Take a representative example.
  • Use it to examine three batch algorithms:
    • First Come First Served.
    • Shortest Job First
    • Shortest Remaining Time Next

6. Batch: First Come First Served (FCFS)

  • Jobs arrive continuously over time: store in linked list.
  • Simplest algorithm: run the jobs in the order they arrived.
  • No preemption: minimal number of context switches.
  • Advantages: don't need to know job durations, simple implementation.
  • Disadvantage: shorter jobs can be delayed by running after longer jobs.
  • Time per completed job: \(\frac{5}{1}, \frac{8}{2}, \frac{14}{3}, \frac{16}{4}, \frac{17}{5}\); the final value is fixed for any order.
  • Throughput: \(\frac{1}{5}, \frac{2}{8}, \frac{3}{14}, \frac{4}{16}, \frac{5}{17}\); the final value is also fixed over all orders.
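The running values above can be reproduced with a short simulation. This is a sketch: the job durations 5, 3, 6, 2, 1 are assumed from the cumulative completion times 5, 8, 14, 16, 17 shown in the fractions.

```python
def fcfs_metrics(durations):
    """Return (completion_times, throughput_after_each_job) under FCFS."""
    completions, t = [], 0
    for d in durations:
        t += d                       # job runs to completion, no preemption
        completions.append(t)
    # jobs completed so far divided by elapsed time
    throughput = [(i + 1) / c for i, c in enumerate(completions)]
    return completions, throughput

completions, throughput = fcfs_metrics([5, 3, 6, 2, 1])
print(completions)   # [5, 8, 14, 16, 17]
print(throughput)    # [0.2, 0.25, ..., 5/17]
```

Reordering the jobs changes the intermediate values but not the final ones: the total work is the same.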

7. Batch: Shortest Job First (SJF)

  • If we know the duration of each job we can do better.
    • Either declared (by user) or estimated by system.
  • Submitting the shortest job first is optimal (for the metrics).
  • Shortest jobs have the shortest turnaround time.
    • Approaches the final value from beneath: best for most users.
  • Turnaround and throughput are reciprocals:
    • Approaches the final value from above.
  • When durations are known and all jobs are available at the start: optimal schedule.
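The optimality claim can be checked numerically. A minimal sketch, reusing the assumed example durations 5, 3, 6, 2, 1: running the jobs sorted shortest-first never increases the average turnaround.

```python
def avg_turnaround(durations):
    """Average completion time when jobs run to completion in the given order."""
    t, total = 0, 0
    for d in durations:
        t += d            # this job's completion time
        total += t
    return total / len(durations)

jobs = [5, 3, 6, 2, 1]                    # assumed example durations
fcfs = avg_turnaround(jobs)               # arrival order: 12.0
sjf = avg_turnaround(sorted(jobs))        # shortest job first: 7.6
assert sjf <= fcfs
```

Intuition: a short job ahead of a long one delays many jobs a little; the reverse delays few jobs a lot.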

8. Batch: Shortest Remaining Time Next (SRTN)

  • What if jobs are not all available at the start?
    • Often we schedule systems that run forever: jobs arrive during execution.
  • If the shortest job is not available: SJF is no longer optimal.
  • If we are allowed to preempt jobs: we can re-run SJF when a job arrives.
    • The current job is partially executed: look at the remaining time.
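A sketch of SRTN with arrivals during execution. The job set (arrival, duration) is an assumed example; on every arrival the job with the least remaining time is chosen, preempting if necessary.

```python
import heapq

def srtn(jobs):
    """jobs: list of (arrival, duration). Returns {job_index: completion_time}.
    Preemptive: at each arrival, re-pick the shortest *remaining* job."""
    events = sorted((a, i, d) for i, (a, d) in enumerate(jobs))
    ready, done, t, k = [], {}, 0, 0
    while k < len(events) or ready:
        if not ready:                          # idle until the next arrival
            t = max(t, events[k][0])
        while k < len(events) and events[k][0] <= t:
            a, i, d = events[k]
            heapq.heappush(ready, (d, i))      # keyed by remaining time
            k += 1
        rem, i = heapq.heappop(ready)
        # run until completion or the next arrival (the preemption point)
        nxt = events[k][0] if k < len(events) else t + rem
        run = min(rem, nxt - t)
        t += run
        if run < rem:
            heapq.heappush(ready, (rem - run, i))  # preempted: put back
        else:
            done[i] = t
    return done

print(srtn([(0, 8), (1, 4), (2, 9), (3, 5)]))
```

With these assumed jobs, the long first job is preempted by the shorter arrival at t=1 and only finishes after the medium jobs.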

9. Interactive Algorithms

  • Interactive systems tend to have (impatient) users.
    • Minimize response time.
    • Analogue to turnaround time for irregularly repeating tasks.
    • Maximise efficiency.
    • Consequence: again, reduce number of context switches.
  • Algorithms:
    • Round Robin.
    • Priority Scheduling.
    • Multiple Queues.
    • Shortest Process Next.
    • Guaranteed Scheduling.
    • Lottery Scheduling.
    • Fair-share Scheduling.

10. Interactive: Round-robin

  • Quantum: the length of the time-slice given to a process.
  • The process either blocks, or runs until the end of its quantum.
  • Deschedule and put at the back of the queue.
  • Shorter quanta: more overhead in the system.
  • Longer quanta: longer latency between slices.
  • Tuning parameter: only the length of the quantum.
  • Depends on: tolerated level of overhead, and length of context switch.
  • Works best when scheduling short-burst, heavy I/O tasks.
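A minimal round-robin sketch, ignoring context-switch overhead and assuming all processes are ready at t=0 with the CPU needs given:

```python
from collections import deque

def round_robin(bursts, quantum):
    """bursts: remaining CPU need per process. Returns {pid: completion_time}."""
    queue = deque(enumerate(bursts))
    t, done = 0, {}
    while queue:
        pid, rem = queue.popleft()
        run = min(quantum, rem)     # run until finish or end of quantum
        t += run
        if rem > run:
            queue.append((pid, rem - run))   # deschedule: back of the queue
        else:
            done[pid] = t
    return done

print(round_robin([3, 5, 2], quantum=2))
```

Shrinking the quantum here increases the number of iterations (switches) without changing the total work, which is the overhead trade-off described above.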

11. Interactive: Priority Scheduling

  • It can be difficult to tune the quantum length to maximise performance.
  • Goal: Avoid the longer latency between slices.
  • Idea: Split the processes into groups and prioritise them.
  • Within each priority level we can use round-robin scheduling (simple).
    • Shorter time to cycle back around to the beginning.
  • How to assign priorities to processes is a policy (we don't care).
    • The mechanism is to run the highest priority process available.
  • Desired policy may require dynamic changes of priority levels...

12. Interactive: Multiple Queues

  • One implementation of priority scheduling is to hold multiple queues.
    • When a process finishes execution decide which queue to add it to.
  • Idea: reward low-latency processes by moving to a higher queue.
    • Punish high-latency processes by moving to a lower queue.
  • Simple feedback mechanism to allow the system to adapt to workload.
  • Exponential backoff works well: same scheme as TCP/IP retransmission.
    • If something goes wrong (exceeding a quantum) double next estimate.
    • Compensate by reducing priority (choice of queue) by the inverse.
Analysis of Multiple-Queue Task Scheduling Algorithms for Multiple-SIMD Machines
D. L. Tuomenoksa, H. J. Siegel
http://www.engr.colostate.edu/~hj/conferences/54.pdf
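The queue-hopping scheme above can be sketched as follows. The number of levels and the base quantum are arbitrary assumptions; the quantum doubles per level (the exponential backoff), so CPU hogs sink to long-quantum queues and interactive tasks rise.

```python
from collections import deque

NUM_LEVELS = 4          # assumption: 4 priority levels
BASE_QUANTUM = 10       # assumption: quantum (ms) at the top level

class MLFQ:
    """Minimal multiple-queue scheduler sketch with feedback."""
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_LEVELS)]

    def add(self, pid, level=0):
        self.queues[level].append(pid)

    def pick(self):
        """Return (pid, level, quantum) of the next task, or None if idle."""
        for level, q in enumerate(self.queues):
            if q:
                return q.popleft(), level, BASE_QUANTUM * (2 ** level)
        return None

    def on_quantum_expired(self, pid, level):
        self.add(pid, min(level + 1, NUM_LEVELS - 1))   # demote: punish hog

    def on_blocked(self, pid, level):
        self.add(pid, max(level - 1, 0))                # promote: reward I/O
```

A task that keeps exceeding its quantum ends up running rarely but for long stretches; one that blocks quickly climbs back to the short-latency queues.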

13. Interactive: Shortest Process Next

  • We saw that SJF was optimal in batch systems.
    • Can we apply the same idea to an interactive system?
  • We can think of an interactive process as being many short jobs:
  • Each burst occupies the CPU then the process is descheduled.
  • Idea: assume the next burst will be similar length to the last.
  • This looks very similar to multiple queues:
    • Record the length of each burst as estimate of the next.
    • Sort the processes by expected next time.
    • Run the shortest first.
  • Multiple queues is doing this: each queue is a bucket in the sort.

14. Interactive: Better burst length estimates

  • The sequence of burst-durations is a time-sequence.
    • The best estimate is the average of the sequence.
  • Storing an infinite list of times and averaging them is too much overhead.
  • Moving window: store the last \(n\) samples: \(\frac{1}{n}\sum_{i=1}^n t_i\)
  • Weights: respond more quickly to changes, give less importance to older data:
    • \( \sum_{i=1}^n w_i\cdot t_i \), choose a set of weights that sum to one.
  • Empirically exponential weighting works well: \( w_i = 2^{-i} \)
Aging (exponential smoothing: an IIR filter)
\(e' = \alpha \cdot x + (1-\alpha)\cdot e\)
  • Only store the current estimate \(e\), overwrite with new estimate \(e'\).
    • The sample \(x\) is only used to update the estimate.
    • When \(\alpha\) is \(0.5\) the update becomes just an add and a shift.
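The aging update in code, with the \(\alpha = 0.5\) add-and-shift special case for integer times:

```python
def aged_estimate(e, x, alpha=0.5):
    """e' = alpha*x + (1-alpha)*e : new sample x folded into estimate e."""
    return alpha * x + (1 - alpha) * e

def aged_estimate_shift(e, x):
    """alpha = 0.5 with integer times: one add and one shift."""
    return (x + e) >> 1

est = 8                       # assumed initial burst estimate
for burst in [4, 4, 4]:       # the process becomes shorter-burst
    est = aged_estimate_shift(est, burst)
print(est)   # 4 - the estimate has converged to the new behaviour
```

Only the single value `est` is stored; older samples decay geometrically inside it, which is exactly the \(w_i = 2^{-i}\) weighting.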

Break (15mins)

15. Interactive: Guaranteed Scheduling

  • Systems with quotas divide their resources among users.
    • Each user has a proportion of the CPU they are entitled to.
    • Policy decision for mapping user quotas onto processes.
    • Greedy approach: any process can consume user's quota.
    • Come back to "Fair Share" approach later.
Mechanism
1. Record start-time and CPU-time per process.
2. Calculate quota of wall-time for process.
3. For each scheduling decision: pick the process with lowest ratio.
  • Example: Three users with equal quotas, one process each.
    • User A: process has 2s of CPU over 4s of wall-time.
    • Quota = \( \frac{4}{3}\), Ratio = \( \frac{2}{1.33} = 1.5 \)
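The mechanism above, with the slide's example numbers (the other two users' CPU times are made-up values for illustration):

```python
def pick_guaranteed(procs, users, now):
    """procs: list of (pid, user, start_time, cpu_time).
    Each user is entitled to 1/len(users) of the CPU since start.
    Pick the process with the lowest actual/entitled ratio."""
    share = 1 / len(users)
    def ratio(p):
        pid, user, start, cpu = p
        entitled = (now - start) * share   # quota of wall-time so far
        return cpu / entitled
    return min(procs, key=ratio)[0]

# User A's process: 2 s of CPU over 4 s of wall-time.
# Entitled = 4/3 s, so its ratio is 2 / (4/3) = 1.5.
procs = [("A", "u1", 0, 2.0), ("B", "u2", 0, 1.0), ("C", "u3", 0, 1.2)]
print(pick_guaranteed(procs, ["u1", "u2", "u3"], now=4))   # "B" (ratio 0.75)
```

The process furthest under its entitlement always wins the next decision, pulling every ratio toward 1.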

16. Interactive: Lottery Scheduling

  • Guaranteed scheduling has some pathological cases, e.g.:
    • What if the process with the lowest ratio blocked a lot (it is far under quota)?
    • But now that it has its data, it will be compute-bound for a long time.
  • Difficult to implement a system that avoids all corner cases.
  • Randomisation can prevent a system settling into a bad state.
  • Idea: make a weighted random pick for each decision.
    • Weights can be expressed as "lottery tickets".
    • More tickets is a higher chance of winning.
  • Actually using discrete tokens has advantages:
    • They can be traded between processes in the system.
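A weighted random pick is a few lines; the ticket counts below are assumptions for illustration. Over many decisions each process receives CPU roughly in proportion to its tickets:

```python
import random

def lottery_pick(tickets, rng=random):
    """tickets: {pid: ticket_count}. Draw one ticket uniformly at random."""
    total = sum(tickets.values())
    draw = rng.randrange(total)          # the winning ticket number
    for pid, count in tickets.items():
        if draw < count:
            return pid
        draw -= count

random.seed(0)                           # deterministic for the demo
wins = {"A": 0, "B": 0}
for _ in range(10_000):
    wins[lottery_pick({"A": 75, "B": 25})] += 1
print(wins)   # "A" wins roughly 75% of the decisions
```

No per-process history is kept, so there is no state to settle into a pathological pattern; trading tickets just moves counts between dictionary entries.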

17. Interactive: Fair-Share Scheduling

  • We saw that a greedy choice of process was one way to implement Guaranteed Scheduling.
    • Alternative: divide a user's quota uniformly over their processes.
    • Cheaper to implement than calculating ratios.
    • Check owner of process during scheduling decision.
  • Also applies to threads within processes (all same user): CFS later.

18. Realtime

  • Hard-realtime system uses tasks of known duration (e.g. WCET).
    • Each task is associated with deadlines - when it must execute by.
    • Normally these deadlines are periodic.
    • A feasible system is one in which the deadlines can be met.
    • This is a static decision.
    • At runtime the feasible schedule is executed in order.
  • Some kinds of realtime systems allow flexibility.
    • Attempt to meet deadlines - soft realtime is "best effort".
    • Normally handled by the priority mechanism already seen.

19. Linux: Scheduler

  • Hard realtime Linux is a work in progress.
  • Soft realtime is defined in POSIX.
  • Linux uses kernel-threads, all processes/threads are tasks, scheduled together.
  • Priorities range over 140 levels.
  • Static part.
    • Defined by class: FIFO, round-robin, non-realtime.
    • Affected by "niceness" (set through nice utility).
  • Dynamic adjustment of priority.
    • Kernel can be configured with different schedulers.
    • Different schemes for adjusting priority.
  • Purpose is to determine "interactivity".
  • Two common schedulers are O(1) and CFS.

20. Linux: O(1) Scheduler

  • Aging approximates exponential weighted average.
    • But it needs to be calculated for every task: \(\mathcal{O}(n)\)
  • The scheduler keeps a "runqueue" of tasks.
  • O(1) scheduler splits it into two: active and expired.
    • There is a pair for every priority level.
    • If a task is descheduled within its timeslice: added back to active.
    • If a task finishes its timeslice: moved to expired.
    • When the active list is empty: move the expired list into the active slot.
  • A running score is kept (-5,+5) of how many ticks a task has slept or run for: offset to static priority.
    • Improves response time for interactive tasks.
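The active/expired mechanism for a single priority level can be sketched as below; this is a minimal model of the list swap, not the kernel's implementation:

```python
from collections import deque

class O1Runqueue:
    """One priority level of the O(1) scheduler: active and expired lists."""
    def __init__(self):
        self.active, self.expired = deque(), deque()

    def pick(self):
        if not self.active:
            # active list drained: swap in the expired list (O(1))
            self.active, self.expired = self.expired, self.active
        return self.active.popleft() if self.active else None

    def descheduled_early(self, pid):
        self.active.append(pid)     # blocked within its timeslice

    def timeslice_expired(self, pid):
        self.expired.append(pid)    # waits for the next epoch
```

Every operation is a constant-time deque append, pop, or pointer swap, which is where the scheduler's name comes from.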

21. Linux: CFS Scheduler

  • "Completely" Fair Scheduling:
    • Maintain tasks sorted by their CPU-time in a red-black tree.
  • Least executed task is left-most in tree.
    • Execute it; after adding its running time, reinsert it into the tree.
    • (blocked tasks are not in the run-queue).
  • Choosing the next task: \( \mathcal{O}(1) \).
  • Reinsertion: \( \mathcal{O}(\mathrm{log}\;n) \)
  • No quantum: tasks are ranked by previous CPU time.
  • Each task receives \( \frac{1}{n} \) CPU-time.
    • Interactive tasks block quickly - so use little CPU.
    • When ready they are reinserted in the left of the tree.
  • Rebalances as tasks change their behaviour.
  • Designed to fix multimedia playback (among other issues).
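The pick-leftmost/reinsert loop can be modelled with a heap standing in for the kernel's red-black tree (both give O(log n) reinsertion); the task names are illustrative:

```python
import heapq

class CFS:
    """Sketch: always run the task with the least CPU time so far."""
    def __init__(self):
        self.runqueue = []                 # heap of (cpu_time, pid)

    def add(self, pid, cpu_time=0.0):
        heapq.heappush(self.runqueue, (cpu_time, pid))

    def run_next(self, slice_len):
        """Run the least-executed task, account its time, reinsert it."""
        cpu_time, pid = heapq.heappop(self.runqueue)   # "leftmost" task
        heapq.heappush(self.runqueue, (cpu_time + slice_len, pid))
        return pid

cfs = CFS()
cfs.add("interactive")
cfs.add("batch")
order = [cfs.run_next(1.0) for _ in range(4)]
print(order)   # the two tasks alternate: equal CPU shares
```

A task that blocks (is removed from the runqueue) accumulates little CPU time, so on reinsertion it sits at the front and runs promptly, which is the multimedia fix mentioned above.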

22. Dining Philosophers Problem

  • The design of the process sub-system is motivated by typical problems.
    • There are two classic IPC problems that model many pieces of software.
  • Dining Philosophers:
    • Five philosophers (processes) around a circular table.
    • A fork (resource) lies between each pair.
    • Philosophers spend time thinking (CPU) and eating (I/O).
    • In order to eat they need two forks (slippery spaghetti).
  • Relevance: Processes that need (overlapping) sets of resources.
  • Important problem because cannot order/prioritise (circular).

23. Deadlock and Livelock

Deadlock
Every process is stuck in the waiting state for a resource that will never be released.
  • If every philosopher follows the same strategy, the system will deadlock: e.g.
    • Every philosopher picks up the left fork (locks), then the right (waits...).
Livelock (starvation)
Every process keeps running but never acquires all the resources it needs to work.
  • A simple attempt to fix: timeouts on the locks.
    • Each philosopher fails (i.e. an exception) on timeout, tries again...
  • In some contexts (e.g. TCP/IP) adding a random wait fixes the problem.
    • In other cases (e.g. consistent behaviour of software) it is not so great.
  • We need another approach...

24. Solution: Set of mutexes

  • A philosopher with only one fork is bad.
    • If two neighbours can enter this state at the same time then deadlock/livelock.
  • Idea: only one philosopher changes state at a time, and eats only when both neighbours are not eating.
    • Need to track neighbour's transitions.
    • There is no check() / peek() operation.
  • Idea: add an extra state to communicate.
    • When a philosopher wants to eat, they communicate it by changing into the hungry state.
    • Used to wake-up a neighbour after eating.

25. Solution: Set of mutexes

// test() — runs inside the critical region
if (self == HUNGRY && left != EAT && right != EAT) {
    self = EAT;      // take both forks (one branch)
    self.up();       // so the later down() does not block
}
// ... outside the critical region ...
self.down();         // blocks only if test() did not up()
  • put_forks calls test on both neighbours.
    • If a neighbour is HUNGRY and can now eat, test does the up to unblock them.
    • If they are not HUNGRY: no change.

26. Problem: Readers and Writers

  • A collection of data: a structure that requires mutual exclusion during writing.
  • A collection of writing processes: when they need to update the collection they lock it.
    • Only one writer at most is active at a time.
  • A collection of reading processes: multiple reads do not interfere.
    • Many readers can access in parallel.
  • If a writer holds a lock: everybody else should block.
  • If a reader holds a "lock": writers should block.
    • New readers can be allowed in - there is an issue with writer starvation if they are unprioritised.

27. Solution: Readers and Writers

  • Semaphores are a blocking counter.
    • Lower bound is zero.
    • Block on attempts to go lower.
    • No blocking behaviour for positive integers.
  • This problem requires exclusion while the number of readers is larger than zero: a kind of inverse.
  • Solution uses a simple counter (no concurrency).
  • Mutual exclusion to access it.
  • Triggers semaphore operations at condition boundaries.
mutex.down()
rc++;
if (rc == 1) db.down()   // First reader locks out writers
mutex.up()
... read the data ...
mutex.down()
rc--;
if (rc == 0) db.up()     // Last reader releases the lock
mutex.up()
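The counter scheme translates directly to Python's threading semaphores; the `reader`/`writer` wrappers below are this sketch's own framing:

```python
import threading

mutex = threading.Semaphore(1)   # protects the reader count rc
db = threading.Semaphore(1)      # "the database": held during writes
rc = 0                           # number of readers currently inside

def reader(read):
    global rc
    with mutex:
        rc += 1
        if rc == 1:
            db.acquire()         # first reader locks out writers
    read()                       # many readers may run here in parallel
    with mutex:
        rc -= 1
        if rc == 0:
            db.release()         # last reader lets writers back in

def writer(write):
    with db:                     # at most one writer, and no readers
        write()
```

The counter itself is never touched concurrently: every `rc` update sits inside `mutex`, and only the boundary crossings (0→1 and 1→0) touch `db`. Writer starvation under a steady stream of readers remains, as noted above.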

28. Summary

  • When a process is descheduled, the scheduler decides what to do next.
    • We have looked at the algorithms used in batch systems.
      • Designed to maximise throughput and minimise turnaround time.
    • Interactive systems use algorithms designed with a different purpose.
      • Minimise response time for the user.
      • We looked at the popular schedulers in Linux.
        • O(1) - get the process out of the kernel quickly.
        • CFS - try to balance runtimes between all threads.
    • We've seen some classical IPC problems to motivate how it is all used.