# Real-Time Operating Systems

08:15-10:00 Tuesday, September 12th, 2017

Scheduling Algorithms.

§2.4-2.7 (pg 149-173)

• Overview
• Batch Algorithms
• Interactive Algorithms
• Realtime
• Linux Implementation
• Classic IPC Problems

# 1. Introduction to Scheduling

• We've seen the primitives programmers use for multi-processing.
• These operate in a system that chooses a schedule.
• A running order of timeslices among the processes.
• Today we look at the scheduling algorithms the OS uses.
• We finish up some motivating problems in IPC.
• These are to give an idea of how concurrency is used.
• More detail on concurrency is available in a later elective course.
• The scheduler is concerned with the currently runnable processes.
• Blocked processes are held in separate queues (for their resource).
• We see these later when we look at I/O.

# 2. Process Behaviour

• Empirical observation: a process tends to repeat similar behaviour.
Principle of Locality
A program is more likely to repeat recent code than execute something new
• A program that did not repeat would contain few loops.
• This observation leads to the idea behind caching (later in course).
• Today we see how it affects scheduling of processes.
• We can predict that a bursty process will remain bursty, and vice versa.

# 3. When scheduling decisions are made

• When a new process is created, forked from parent:
• Which process should parent continue, or child run?
• When the current process terminates:
• Which process should run next?
• When the current process yields:
• Which process should run next?
• When the current process blocks:
• Which process should run next?
• When an I/O interrupt fires:
• Switch to a process blocked on that resource?
• When a timing interrupt fires:
• Should we preempt the process and pick another?

# 4. Goals / Metrics

All systems
• Fairness: equal division of resources within a category.
• Policy: arbitrary design decision.
• Balance: avoid bottlenecks, use the whole system.
Batch systems
• Throughput: completed jobs per hour.
• Turnaround: average time to complete from submission.
• CPU utilization: percentage of time CPU is busy.
Interactive systems
• Response time: time from request to the response.
• Proportionality: matching (and shaping) users' perceptions of how long tasks should take.
Real-time systems
• Meeting deadlines: must vs should.
• Predictability: good estimates of task duration.

# 5. Batch Algorithms

• Batch systems are non-interactive: response time is irrelevant.
• Maximise CPU utilization.
• Maximise throughput.
• Minimise turnaround time.
• Consequence: fewer context switches.
• Extreme case: no timeslicing.
• Run entire jobs to completion.
• Take a representative example.
• Use it to examine three batch algorithms:
• First Come First Served.
• Shortest Job First
• Shortest Remaining Time Next

# 6. Batch: First Come First Served (FCFS)

• Jobs arrive continuously over time: store in linked list.
• Simplest algorithm: run the jobs in the order they arrived.
• No preemption: minimal number of context switches.
• Advantages: don't need to know job durations, simple implementation.
• Disadvantage: shorter jobs can be delayed by running after longer jobs.
• Example: five jobs with run times 5, 3, 6, 2, 1 complete at times 5, 8, 14, 16, 17.
• Avg Turnaround: $$\frac{5}{1}, \frac{8}{2}, \frac{14}{3}, \frac{16}{4}, \frac{17}{5}$$; the final value is fixed for any order.
• Avg Throughput: $$\frac{1}{5}, \frac{2}{8}, \frac{3}{14}, \frac{4}{16}, \frac{5}{17}$$; the final value is also fixed over all orders.
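The FCFS figures above can be reproduced with a short sketch. The job durations 5, 3, 6, 2, 1 are an assumption inferred from the completion times 5, 8, 14, 16, 17; `fcfs_metrics` is a hypothetical helper, not part of any real scheduler.

```python
# FCFS sketch: run jobs to completion in arrival order (no preemption)
# and compute the batch metrics. All jobs are assumed to arrive at t=0.

def fcfs_metrics(durations):
    """Return (completion_times, avg_turnaround, throughput)."""
    t = 0
    completions = []
    for d in durations:            # no preemption: each job runs to the end
        t += d
        completions.append(t)
    avg_turnaround = sum(completions) / len(completions)
    throughput = len(completions) / t     # jobs per unit of time
    return completions, avg_turnaround, throughput

comps, turn, thru = fcfs_metrics([5, 3, 6, 2, 1])
print(comps)   # [5, 8, 14, 16, 17]
print(turn)    # 12.0
```

The final throughput, 5/17, depends only on the total work, so it is the same for any ordering of the jobs.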

# 7. Batch: Shortest Job First (SJF)

• If we know the duration of each job we can do better.
• Either declared (by user) or estimated by system.
• Submitting the shortest job first is optimal (for the metrics).
• Shortest jobs have the shortest turnaround time (blue).
• Approaches the final value from beneath: best for most users.
• Turnaround and throughput are (running) reciprocals:
• Approaches the final value from above (red).
• When durations are known, all jobs are available: optimal schedule.
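The optimality claim can be brute-force checked on a small example. The durations 5, 3, 6, 2, 1 are the assumed values from the earlier slide; `avg_turnaround` is a hypothetical helper.

```python
# SJF sketch: when all jobs are available at t=0, sorting shortest-first
# minimises average turnaround. Verify by trying every other order.
from itertools import permutations

def avg_turnaround(durations):
    t, total = 0, 0
    for d in durations:
        t += d
        total += t          # turnaround = completion time (arrival at 0)
    return total / len(durations)

jobs = [5, 3, 6, 2, 1]
sjf = avg_turnaround(sorted(jobs))          # shortest job first
assert all(avg_turnaround(p) >= sjf for p in permutations(jobs))
print(sjf)   # 7.6, versus 12.0 for the FCFS arrival order
```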

# 8. Batch: Shortest Remaining Time Next (SRTN)

• What if jobs are not all available at the start?
• Often we schedule systems that run forever: jobs arrive during execution.
• If the shortest job is not available: SJF is no longer optimal.
• If we are allowed to preempt jobs: we can re-run SJF when a job arrives.
• The current job is partially executed: look at the remaining time.
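A minimal SRTN sketch, re-running the shortest-first choice at each arrival. The job set (a long job at t=0, a short one arriving at t=1) is an assumed example, not from the slide.

```python
# SRTN sketch: at every arrival or completion, run the job with the
# least *remaining* time, preempting the current job if necessary.

def srtn(jobs):
    """jobs: list of (arrival, duration). Return completion time per index."""
    remaining = {i: d for i, (a, d) in enumerate(jobs)}
    t, done = 0, {}
    while remaining:
        ready = [i for i in remaining if jobs[i][0] <= t]
        if not ready:                      # CPU idle until the next arrival
            t = min(jobs[i][0] for i in remaining)
            continue
        i = min(ready, key=lambda j: remaining[j])   # shortest remaining
        future = [jobs[j][0] for j in remaining if jobs[j][0] > t]
        run = remaining[i] if not future else min(remaining[i], min(future) - t)
        t += run                           # run until done or next arrival
        remaining[i] -= run
        if remaining[i] == 0:
            done[i] = t
            del remaining[i]
    return done

# the short job arriving at t=1 preempts the long job
print(srtn([(0, 8), (1, 2)]))   # {1: 3, 0: 10}
```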

# 9. Interactive Algorithms

• Interactive systems tend to have (impatient) users.
• Minimize response time.
• Analogue to turnaround time for irregularly repeating tasks.
• Maximise efficiency.
• Consequence: again, reduce number of context switches.
• Algorithms:
• Round Robin.
• Priority Scheduling.
• Multiple Queues.
• Shortest Process Next.
• Guaranteed Scheduling.
• Lottery Scheduling.
• Fair-share Scheduling.

# 10. Interactive: Round-robin

• Quantum: the length of the time-slice given to a process.
• Either the process blocks, or it runs until the end of its quantum.
• Deschedule and put at the back of the queue.
• Shorter quanta: more overhead in the system.
• Longer quanta: longer latency between slices.
• Tuning parameter: only the length of the quantum.
• Depends on: tolerated level of overhead, and length of context switch.
• Works best when scheduling short-burst, heavy I/O tasks.
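A round-robin sketch, showing the quantum/overhead trade-off; the burst lengths are assumed example values.

```python
# Round-robin sketch: run each process for at most one quantum, then
# move it to the back of the queue. Count descheduling events as a
# rough proxy for context-switch overhead.
from collections import deque

def round_robin(durations, quantum):
    """Return (finish_order, deschedule_count)."""
    queue = deque(enumerate(durations))
    order, switches = [], 0
    while queue:
        i, left = queue.popleft()
        if left > quantum:
            queue.append((i, left - quantum))   # preempted: back of queue
        else:
            order.append(i)                     # finished within the slice
        switches += 1
    return order, switches

print(round_robin([4, 2, 3], quantum=2))   # ([1, 0, 2], 5)
print(round_robin([4, 2, 3], quantum=4))   # ([0, 1, 2], 3): fewer switches
```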

# 11. Interactive: Priority Scheduling

• It can be difficult to tune the quantum length to maximise performance.
• Goal: Avoid the longer latency between slices.
• Idea: Split the processes into groups and prioritise them.
• Within each priority level we can use round-robin scheduling (simple).
• Shorter time to cycle back around to the beginning.
• How to assign priorities to processes is a policy (we don't care).
• The mechanism is to run the highest priority process available.
• Desired policy may require dynamic changes of priority levels...

# 12. Interactive: Multiple Queues

• One implementation of priority scheduling is to hold multiple queues.
• When a process finishes execution decide which queue to add it to.
• Idea: reward low-latency processes by moving to a higher queue.
• Punish high-latency processes by moving to a lower queue.
• Simple feedback mechanism to allow the system to adapt to workload.
• Exponential backoff works well: same scheme as TCP/IP retransmission.
• If something goes wrong (exceeding a quantum) double next estimate.
• Compensate by reducing priority (choice of queue) by the inverse.
Reference: D. L. Tuomenoksa and H. J. Siegel, "Analysis of Multiple-Queue Task Scheduling Algorithms for Multiple-SIMD Machines", http://www.engr.colostate.edu/~hj/conferences/54.pdf
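The feedback mechanism can be sketched as a tiny multilevel-queue model. The three levels and the doubling quanta are assumptions chosen to illustrate the exponential backoff; real systems tune both.

```python
# Multiple-queue sketch: a task that exhausts its quantum drops a level
# (double quantum, lower priority); a task that blocks quickly moves
# back up. Levels use exponentially growing quanta: 1, 2, 4 ticks.

class MLFQ:
    def __init__(self, levels=3):
        self.levels = levels

    def next_level(self, level, used_full_quantum):
        if used_full_quantum:                  # punish high-latency tasks
            return min(level + 1, self.levels - 1)
        return max(level - 1, 0)               # reward low-latency tasks

    def quantum(self, level):
        return 1 << level                      # exponential backoff

q = MLFQ()
lvl = 0
for hog in (True, True, False):   # two full quanta, then a quick block
    lvl = q.next_level(lvl, hog)
print(lvl, q.quantum(lvl))   # 1 2
```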

# 13. Interactive: Shortest Process Next

• We saw that SJF was optimal in batch systems.
• Can we apply the same idea to an interactive system?
• We can think of an interactive process as being many short jobs:
• Each burst occupies the CPU then the process is descheduled.
• Idea: assume the next burst will be similar length to the last.
• This looks very similar to multiple queues:
• Record the length of each burst as estimate of the next.
• Sort the processes by expected next time.
• Run the shortest first.
• Multiple queues is doing this: each queue is a bucket in the sort.

# 14. Interactive: Better burst length estimates

• The sequence of burst-durations is a time-sequence.
• The best estimate is the average of the sequence.
• Storing an infinite list of times and averaging them is too much overhead.
• Moving Window: store last $$n$$ samples: $$\frac{1}{n}\Sigma_{i=1}^n t_i$$
• Weights: respond more quickly to changes, less importance to older data:
• $$\Sigma_{i=1}^n w_i\cdot t_i$$, choosing a set of weights with $$\Sigma_{i=1}^n w_i = 1$$.
• Empirically exponential weighting works well: $$w_i = 2^{-i}$$
Aging (exponential smoothing, an IIR filter)
$$e' = \alpha \cdot x + (1-\alpha)\cdot e$$
• Only store the current estimate $$e$$, overwrite with new estimate $$e'$$.
• The sample $$x$$ is only used to update the estimate.
• When $$\alpha = 0.5$$ the update becomes just an add and a shift.
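The aging update above in a few lines; the initial estimate of 8 and the burst samples are assumed example values.

```python
# Aging sketch: e' = alpha*x + (1-alpha)*e. Only the current estimate
# is stored; each new burst sample x updates it in place.

def age(estimate, sample, alpha=0.5):
    return alpha * sample + (1 - alpha) * estimate

e = 8
for burst in (4, 4, 4):      # the process's bursts get shorter
    e = age(e, burst)
print(e)                     # 4.5: converging towards the new behaviour

# with alpha = 0.5 the integer update is one add and one shift
assert (8 + 4) >> 1 == age(8, 4)
```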

Intermission

# 15. Interactive: Guaranteed Scheduling

• Systems with quotas divide their resources among users.
• Each user has a proportion of the CPU they are entitled to.
• Policy decision for mapping user quotas onto processes.
• Greedy approach: any process can consume user's quota.
• Come back to "Fair Share" approach later.
Mechanism
1. Record start-time and CPU-time per process.
2. Calculate quota of wall-time for process.
3. For each scheduling decision: pick the process with the lowest ratio of consumed CPU-time to quota.
• Example: Three users, uniform distribution, one process each.
• User A: process has 2s of CPU over 4s of wall-time.
• Quota = $$\frac{4}{3}$$, Ratio = $$\frac{2}{1.33} = 1.5$$
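The ratio computation from the example, as a sketch; `ratio` is a hypothetical helper, and the numbers mirror the slide (three users with a uniform split, user A's process with 2s of CPU over 4s of wall-time).

```python
# Guaranteed-scheduling sketch: a process's quota is its user's uniform
# share of wall-time; schedule the process with the lowest ratio of
# consumed CPU-time to quota.

def ratio(cpu_time, wall_time, n_users):
    quota = wall_time / n_users        # entitled CPU-time so far
    return cpu_time / quota

r = ratio(cpu_time=2, wall_time=4, n_users=3)
print(r)   # 2 / (4/3) = 1.5: over quota, so prefer another process
```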

# 16. Interactive: Lottery Scheduling

• Guaranteed scheduling has some pathological cases, e.g.:
• What if the process with the lowest ratio has blocked a lot (so it is far under quota)?
• Now that it has its data, it will be compute-bound for a long time.
• Difficult to implement a system that avoids all corner cases.
• Randomisation can prevent a system settling into a bad state.
• Idea: make a weighted random pick for each decision.
• Weights can be expressed as "lottery tickets".
• More tickets means a higher chance of winning.
• Actually using discrete tokens has advantages:
• They can be traded between processes in the system.
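A minimal lottery draw; the ticket split (75/25) is an assumed example, and the seed is fixed only to make the sketch reproducible.

```python
# Lottery sketch: draw one ticket uniformly at random per decision.
# Over many decisions, each process's CPU share approaches its share
# of the tickets.
import random

def draw(tickets, rng):
    """tickets: {pid: count}. Return the winning pid."""
    n = rng.randrange(sum(tickets.values()))
    for pid, count in tickets.items():
        if n < count:
            return pid
        n -= count

rng = random.Random(42)
tickets = {"A": 75, "B": 25}
wins = {"A": 0, "B": 0}
for _ in range(10000):
    wins[draw(tickets, rng)] += 1
print(wins["A"] / 10000)   # close to 0.75
```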

# 17. Interactive: Fair-Share Scheduling

• We saw that a greedy choice of process was one way to implement Guaranteed Scheduling.
• Alternative: divide a user's quota uniformly over their processes.
• Cheaper to implement than calculating ratios.
• Check owner of process during scheduling decision.
• Also applies to threads within processes (all same user): CFS later.

# 18. Realtime

• Hard-realtime system uses tasks of known duration (e.g. WCET).
• Each task is associated with deadlines - when it must execute by.
• Normally these deadlines are periodic.
• A feasible system is one in which the deadlines can be met.
• This is a static decision.
• At runtime the feasible schedule is executed in order.
• Some kinds of realtime systems allow flexibility.
• Attempt to meet deadlines - soft realtime is "best effort".
• Normally handled by the priority mechanism already seen.
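The static feasibility decision can be sketched as a utilisation check: for periodic tasks with worst-case execution time $$C_i$$ and period $$T_i$$, total utilisation $$\Sigma_i C_i/T_i \le 1$$ is a necessary condition on one CPU. The task set here is an assumed example.

```python
# Feasibility sketch: sum of WCET/period over all periodic tasks must
# not exceed 1 on a single CPU, or some deadline will be missed.

def utilisation(tasks):
    """tasks: list of (wcet, period) pairs."""
    return sum(c / t for c, t in tasks)

tasks = [(1, 4), (2, 8), (1, 10)]
u = utilisation(tasks)
print(u <= 1.0, round(u, 2))   # True 0.6
```

This check is necessary but not, for every scheduling policy, sufficient; a full schedulability test depends on the algorithm used.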

# 19. Linux: Scheduler (pg746)

• Hard realtime Linux is a work in progress.
• Soft realtime is defined in POSIX.
• Priorities range over 140 levels.
• Static part.
• Defined by class: FIFO, round-robin, non-realtime.
• Affected by "niceness" (set through nice utility).
• Kernel can be configured with different schedulers.
• Different schemes for adjusting priority.
• Purpose is to determine "interactivity".
• Two common schedulers are O(1) and CFS.

# 20. Linux: O(1) Scheduler

• Aging approximates exponential weighted average.
• But it needs to be calculated for every task: $$\mathcal{O}(n)$$
• The scheduler keeps a "runqueue" of tasks.
• O(1) scheduler splits it into two: active and expired.
• There is a pair for every priority level.
• If a task is descheduled within its timeslice: added back to active.
• If a task finishes its timeslice: moved to expired.
• When the active list is empty: move the expired list into the active slot.
• A running score is kept (-5,+5) of how many ticks a task has slept or run for: offset to static priority.
• Improves response time for interactive tasks.
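The active/expired trick can be sketched for one priority level; this is a toy model of the idea, not kernel code.

```python
# O(1) sketch: per priority level, keep an active and an expired queue.
# Tasks that finish their timeslice go to expired; when active is
# empty, swap the two queues in constant time.
from collections import deque

class O1Level:
    def __init__(self, tasks):
        self.active = deque(tasks)
        self.expired = deque()

    def pick(self):
        if not self.active:                     # O(1) swap, no rescan
            self.active, self.expired = self.expired, self.active
        return self.active.popleft()

    def timeslice_done(self, task):
        self.expired.append(task)               # used its whole slice

level = O1Level(["A", "B"])
t1 = level.pick(); level.timeslice_done(t1)     # A expires
t2 = level.pick(); level.timeslice_done(t2)     # B expires
t3 = level.pick()                               # queues swapped
print(t1, t2, t3)   # A B A
```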

# 21. Linux: CFS Scheduler

• "Completely" Fair Scheduling:
• Maintain tasks sorted by their CPU-time in a red-black tree.
• Least executed task is left-most in tree.
• Execute it, when adding running time reinsert in tree.
• (blocked tasks are not in the run-queue).
• Choosing next task: $$\mathcal{O}(1)$$.
• Reinsertion: $$\mathcal{O}(\mathrm{log}\;n)$$
• No quantum - ranking tasks by previous CPU time.
• Each task receives $$\frac{1}{n}$$ CPU-time.
• Interactive tasks block quickly - so use little CPU.
• When ready they are reinserted in the left of the tree.
• Rebalances as tasks change their behaviour.
• Designed to fix multimedia playback (among other issues).
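The pick-leftmost/reinsert loop can be sketched in a few lines. The kernel uses a red-black tree; this sketch substitutes a binary heap, which also gives $$\mathcal{O}(\mathrm{log}\;n)$$ reinsertion and a cheap minimum, and is only a model of the idea.

```python
# CFS sketch: always run the task with the least accumulated CPU-time
# (vruntime), then add its running time and reinsert it.
import heapq

def cfs(names, ticks):
    """Run `ticks` one-tick slices over tasks starting at vruntime 0."""
    tree = [(0, name) for name in sorted(names)]
    heapq.heapify(tree)
    history = []
    for _ in range(ticks):
        vruntime, name = heapq.heappop(tree)         # left-most task
        history.append(name)
        heapq.heappush(tree, (vruntime + 1, name))   # reinsert, O(log n)
    return history

history = cfs(["A", "B", "C"], 6)
print(history)   # ['A', 'B', 'C', 'A', 'B', 'C']: each gets 1/n of the CPU
```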

# 22. Dining Philosophers Problem (pg167)

• The design of the process sub-system is motivated by typical problems.
• There are two classic IPC problems that model many pieces of software.
• Dining Philosophers:
• Five philosophers (processes) around a circular table.
• A fork (resource) lies in between each pair.
• Philosophers spend time thinking (CPU) and eating (I/O).
• In order to eat they need two forks (slippery spaghetti).
• Relevance: Processes that need (overlapping) sets of resources.
• Important problem because cannot order/prioritise (circular).

# 23. Problem: Deadlock and Livelock

Deadlock
Every process is stuck in the waiting state for a resource that will never be released.
• If every philosopher tries to do the same thing, the system will deadlock, e.g.:
• Every philosopher picks up their left fork (lock), then tries to pick up the right (waits...).
Livelock (starvation)
Every process keeps running but never acquires all the resources it needs to work.
• A simple attempt to fix: timeouts on the locks.
• Each philosopher fails (i.e. an exception) on timeout, then tries again...
• In some contexts (e.g. TCP/IP) adding a random wait fixes the problem.
• In other cases (e.g. consistent behaviour of software) it is not so great.
• We need another approach...

# 24. Solution: Set of mutexes (Fig 2-47)

• A philosopher with only one fork is bad.
• If two neighbours can enter this state at the same time then deadlock/livelock.
• Idea: a philosopher may eat only when neither neighbour is eating.
• Need to track neighbour's transitions.
• There is no check() / peek() operation.
• Idea: add an extra state to communicate.
• When a philosopher wants to eat, they communicate it by changing into the hungry state.
• Used to wake-up a neighbour after eating.

# 25. Solution: Set of mutexes

    // test, called inside the critical region
    if (self == HUNGRY && left != EAT && right != EAT) {
        self = EAT;      // one branch: take both forks
        self.up();       // avoid blocking in the down() below
    }
    // ... outside the critical region ...
    self.down();         // block if test() did not up()
• put_forks calls test on both neighbours.
• If a neighbour is blocked in its down, test does the up.
• If they are not HUNGRY: no change.

# 26. Problem: Readers and Writers

• A collection of data: a structure that requires mutual exclusion during writing.
• A collection of writing processes: when they need to update the collection they lock it.
• Only one writer at most is active at a time.
• Many readers can access in parallel.
• If a writer holds a lock: everybody else should block.
• If a reader holds a "lock": writers should block.
• New readers can be allowed in - there is an issue with writer starvation if they are unprioritised.

# 27. Solution: Readers and Writers

• Semaphores are blocking counters.
• Lower bound is zero.
• Block on attempts to go lower.
• No blocking behaviour for positive integers.
• This problem requires exclusion while the number of readers is greater than zero: a kind of inverse.
• Solution uses a simple counter (no concurrency).
• Mutual exclusion to access it.
• Triggers semaphore operations at condition boundaries.
    // reader entry
    mutex.down();
    rc++;
    if (rc == 1) db.down();   // first reader locks out writers
    mutex.up();
    // ... read the data ...
    // reader exit
    mutex.down();
    rc--;
    if (rc == 0) db.up();     // last reader releases the lock
    mutex.up();
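The counter scheme above can be exercised sequentially (a sketch: the real version protects `rc` with a mutex and uses a semaphore for `db`; here the db lock is modelled as a flag).

```python
# Readers-writers sketch: the first reader takes the db lock on behalf
# of all readers, the last reader releases it; a writer needs the db
# lock alone, so it blocks while any reader is inside.
class ReadersWriters:
    def __init__(self):
        self.rc = 0               # reader count (mutex-protected in real code)
        self.db_locked = False    # models the db semaphore being held

    def start_read(self):
        self.rc += 1
        if self.rc == 1:
            self.db_locked = True     # first reader locks out writers

    def end_read(self):
        self.rc -= 1
        if self.rc == 0:
            self.db_locked = False    # last reader releases the lock

    def writer_may_enter(self):
        return not self.db_locked

rw = ReadersWriters()
rw.start_read(); rw.start_read()      # two readers in parallel
print(rw.writer_may_enter())          # False: readers hold the db lock
rw.end_read(); rw.end_read()
print(rw.writer_may_enter())          # True: last reader released it
```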

# 28. Summary

• When a process is descheduled, the scheduler decides what to do next.
• We have looked at the algorithms used in batch systems.
• Designed to maximise throughput and minimise turnaround time.
• Interactive systems use algorithms designed with a different purpose.
• Minimise response time for the user.
• We looked at the popular schedulers in Linux.
• O(1) - get the process out of the kernel quickly.
• CFS - try to balance runtimes between all threads.
• We've seen some classical IPC problems to motivate how it is all used.