# Realtime- (and) Operating-Systems

I/O : Hardware Principles

§5.1 (pg 337-351)


# 1. Hardware Devices

• A logical view of I/O is shown to the right.
• Different kinds of devices, plugged into a common bus.
• Each can communicate with CPU or Memory.
• Huge degree of variation between the devices.
• Bandwidth: 1 byte/sec - 20GB/sec.
• Latency: <1ms - 1s
• Complexity: single function - small computer.
• Controller interfaces the device to the bus.
• Somewhat common interface: status and control registers, though exact details differ.

# 2. Device Controllers

• The actual electronics in a device are completely device-specific.
• The controller is there to make it a little bit sane.
• Commands sent between the device and the CPU (both directions).
• Perform some specific function.
• Data sent between the device and the CPU (again, both ways).
• Each register is a word, e.g. 32-bits.
• Individual bits, or groups of bits, encode each of the above possibilities.

# 3. Device Controllers

• Register organisation is specific to each device (e.g. brand and model).
• CPU accesses these registers over the bus with read/write operations.
• Control registers are written to issue commands.
• Data registers are read/written for transfers.
• Code on the CPU needs to know the specific details for each device.
• The device IDs are hardcoded (known) constants to determine which device the bus operations are addressed to.

# 4. Examples

• The structure of lectures 11,12 and 13:
• Look at the software (device drivers).
| Example  | Control Reg | Status Reg | Data Reg | Interrupts | DMA |
|----------|-------------|------------|----------|------------|-----|
| Keyboard | 8           | 8          | 8        | No         | No  |
| Mouse    | 8           | 8          | 8        | Yes        | No  |
| Harddisk | 10          | 24         | 16       | Yes        | Yes |
| Network  | 16?         | 128        | 64+      | Yes        | Yes |
• Register sizes are measured in bits.
• Rough guide to how complex the device and interface will be.
• Examples are based on real hardware (PS/2, IDE and 3c509c) but simplified.
• The various registers are universal, usage of interrupts and DMA varies.

# 5. Using I/O Ports

• To communicate with a device the CPU needs to read/write shared registers.
• The simplest scheme for I/O is to add two instructions to the CPU:
• IN REG, PORT: read a value from the given port number.
• OUT REG, PORT: write a value to the given port number.
• Implication: a global address space for device registers.
• It is shared across all brands and models of devices.
• This makes it quite hard to administer.
• This is common in embedded systems: known set of devices.
• The code of programs can mix up memory access, computation and I/O:
```
movl $7, %eax
addl (%eax), %ebx
out  %ebx, $3
```

# 6. I/O Ports Example

• I thought USB would make an interesting real-world example.
• PS/2 connectors with an 8042 controller chip are much simpler.
• This pair of registers forms the communication channel with a PS/2 device.
• To use it we need to access the ports (not native to C/C++).
```c
inline void outb(uint16_t port, uint8_t value) {
    __asm("outb %b0, %w1" :: "a"(value), "d"(port));
}
```

# 7. I/O Ports Example

• Only two registers: pack a different function into each bit (e.g. status).
• Multiplex different registers on the same port.
• Sending a command can choose what the output buffer does.
• I/O programming is very low-level.
| Bit | Name | Meaning                   |
|-----|------|---------------------------|
| 0   | OBF  | Output Buffer Full        |
| 1   | IBF  | Input Buffer Full         |
| 2   | SYS  | Reset: cold or software?  |
| 3   | A2   | Internal address line 2   |
| 4   | INH  | Keyboard inhibited?       |
| 5   | MOBF | Mouse buffer full         |
| 6   | TO   | Timeout, bad?             |
| 7   | PERR | Parity error, connected?  |
```
kbRead: inb  $0x64, al   ; Read Status byte
        andb $0x02, al   ; Test IBF flag (Status<1>)
        jz   kbRead      ; Wait for IBF = 1
        inb  $0x60, al   ; Read input buffer
```

... or equivalently in C ...

```c
while (!(inb(0x64) & 0x02));
scancode = inb(0x60);
```

# 8. Address space options

• Previous example corresponds to design a).
• Separate address space for I/O ports and memory.
• Option b) puts both in the same address space.
• The processor does not need separate instructions.
• Accessing ports looks the same as memory.

```
movl $portaddr, %eax
movb $0x10, (%eax)
```
• Option c) is a hybrid design.
• Some registers (normally status and control) have port addresses.
• Others (such as data) have memory addresses.
• Program code mixes up port and memory accesses.

# 9. Comparison between approaches

|                    | Common address space                                          | Separate spaces                                  |
|--------------------|---------------------------------------------------------------|--------------------------------------------------|
| Hardware           | One bus with address lines; each controller must filter ranges. | Address lines + bus selection lines.           |
| Language interface | Fits into C, C++.                                             | Requires inline asm.                             |
| Protection         | Flags in the page-table.                                      | Root / supervisor privileges (instruction mode). |
| Instructions       | Any instruction with an address can be used.                  | Dedicated instruction for access, then function. |
| Caching            | Must exclude I/O ranges.                                      | No special treatment.                            |

# 10. Multiple buses

• Figure a) shows a single-bus system: I/O and memory go over the same wires.
• Each controller has to look at every operation, decide if it is the target.
• This means that the controller has to work at the same speed as memory.
• Bandwidth must be split between I/O and memory traffic.
• Figure b) shows a dedicated memory bus (high speed and bandwidth).
• This requires the memory to sit on both buses (multi-ported).
• Bus snooping.

# 11. Multiple buses

• With memory-mapped I/O, which bus should the CPU use?
• Option: Try the fast bus first, fall back if no response.
• Simple to implement, but adds latency to I/O operations.
• Option: I/O devices watch addresses on memory bus, signal when interested.
• Controller must run at high frequency for the fast bus.
• Option: Reserve a memory range for I/O devices.
• A bit more complex at boot-time; this is how PCs work.

# 12. Summary of Registers

| Example  | Control Reg | Status Reg | Data Reg | Interrupts | DMA |
|----------|-------------|------------|----------|------------|-----|
| Keyboard | 8           | 8          | 8        | No         | No  |
| Mouse    | 8           | 8          | 8        | Yes        | No  |
| Harddisk | 10          | 24         | 16       | Yes        | Yes |
| Network  | 16?         | 128        | 64+      | Yes        | Yes |
• The example so far looked at the register interface for an 8042 controller.
• Each bit of the register has a meaning.
• Very low-level interface to the raw hardware behind the controller.
• Writing 0s/1s into control registers issues commands / activates circuits.
• Reading 0s/1s from status registers checks specific circuits in the device.
• For more complex devices we can see the size of the interface increases.
• Network cards are not the most complex device: GPU interface 10-100x larger.
• We will take a more abstract look at the final two examples.

# 13. Harddisk example

• Red lines show communication between units in the HDD.
• The control logic is linked to everything, so its wiring is not shown.
• The interface visible to the CPU has to control the physical hardware in the drive.
• More complex than mouse / keyboard: bi-directional data flow.

Intermission

# 14. NIC example

• Big fat lines show high-bandwidth communication paths.
• Data is bi-directional, and now the device initiates communications.
• Multiple layers of network model: much more control / status.

# 15. The other hardware features

| Example  | Control Reg | Status Reg | Data Reg | Interrupts | DMA |
|----------|-------------|------------|----------|------------|-----|
| Keyboard | 8           | 8          | 8        | No         | No  |
| Mouse    | 8           | 8          | 8        | Yes        | No  |
| Harddisk | 10          | 24         | 16       | Yes        | Yes |
| Network  | 16?         | 128        | 64+      | Yes        | Yes |
• Now we've seen the four examples and how they scale up in complexity.
• The last two mechanisms that we look at are interrupts and DMA.
• These allow more sophisticated interaction with devices.
• So far we've seen very simple port writing / reading by the CPU.
• Both the PS/2 mouse and keyboard share the same controller.
• Keyboard polling is hidden behind an external interrupt (clock).
• The mouse generates its own interrupts to deliver X/Y coordinates.
• Both are quite low bandwidth communications.
• HDD and NIC are higher bandwidth; NICs require low latency as well.
• Harddisks are higher bandwidth, quite tolerant of latency.
• Ethernet is very high bandwidth - requires low latency.

# 16. Example of real code (3c509c driver)

```c
// Drop everything, so we are not driving the data, and run the
// clock through 32 cycles in case the PHY is trying to tell us
// something. Then read the data line, since the PHY's pull-up
// will read as a 1 if it's present.
select_window(nic, 4);
outpw(nic->iobase + PHYSICAL_MANAGEMENT, 0);
for (i = 0; i < 32; i++) {
    udelay(1);
    outpw(nic->iobase + PHYSICAL_MANAGEMENT, PHY_CLOCK);
    udelay(1);
    outpw(nic->iobase + PHYSICAL_MANAGEMENT, 0);
}
```
• Example from the NIC negotiating the physical layer.
• As well as setting individual bits to control a hardware function...
• ...there are timing constraints as well.

# 17. Motivation for DMA

• When the CPU sets a value in a register - very fast transition in device.
• Actual hardware may need to operate slower.
• Depending on the length of delay:
• Busy loops (very small pauses).
• Concurrency primitive (e.g. semaphore) within kernel.
• Suspend operation and wait for an interrupt.
• For very short pauses busy loops may be the only option (low-latency).
• But they tie up the CPU.
• Similar problem with a sequence of operations.
• e.g. writing a buffer to a register one word at a time.
• Very simple jobs occupying the CPU when it could be running code.

# 18. Direct Memory Access

• Direct Memory Access (DMA) controller.
• A very simple co-processor.
• CPU programs it by writing into its control registers.
• Perform repetitive tasks - e.g. read from a register and write to memory.
• Goal: the CPU can do useful work while the low-level transfer is happening.
• DMA controllers have several banks of control registers.
• Each one programs a different transfer sequence (a "channel").
• Allows multiple transfers to run at once.
• DMA controllers can be built into the bus (e.g. PCI).
• High-bandwidth controllers may have dedicated DMA controllers.
• The CPU is no longer the only device issuing requests on the bus.
• DMA controller(s) can also write onto the bus.
• This is a strong motivation for a single system bus.

# 20. Different types of transfers in DMA

• Word at a time (also called cycle stealing).
• No need for DMA controller to lock the bus.
• When the DMA controller wants a transfer, the CPU has to wait.
• Low overhead for low bandwidth applications.
• Burst mode.
• DMA controller locks the bus.
• Transfers several words at once.
• Then frees the bus to allow other devices to use it.
• Lower overhead for high bandwidth applications.
• Fly-by mode: device writes to memory directly.
• Alternative: devices write to the DMA controller, which then transfers.
• Lower complexity in devices; allows buffering + burst.

# 21. Combination of I/O mechanisms

• The hardware introduced so far fulfils different roles in I/O.
• Individual commands and responses: control and status registers.
• Small transfers: direct use of data registers.
• Large transfers: automated use of data registers via DMA.
• The final hardware mechanism used in I/O is the interrupt.
• We've seen their use a few times already:
• Syscalls - manage privilege escalation via software interrupts.
• Processor scheduling - timer interrupt to preempt a process.
• Virtual Memory - handling of page-faults.
• File-system - assumption that disk blocks arrive somehow.
• Now we can fill in the details of how they work.

# 22. Interrupts mechanism

• The interrupt controller is a separate logical entity from the CPU.
• Can be integrated into the same physical package (bus controller).
• Can also be integrated into the device controller.
• Allows the device to initiate communication with the CPU.
• Diagram shows the steps that occur over the bus.

# 23. Multiple Interrupts

• Conceptually interrupts are simple when we consider a single controller.
• CPU sees a signal and calls a handler routine.
• How can multiple interrupts be serviced?
• The new interrupt can be delayed until the old one has finished its service handler.
• If the interrupt is time-critical then data could be lost, e.g. if a buffer overflows.
• The CPU needs a way to choose which interrupt to service.
• Controllers identify the source of the interrupt by placing a number on the address lines.
• This is called the Interrupt Request Line (IRQ).

# 24. Multiple Interrupts

• The IRQ is used as an index into the Interrupt Service Vector (ISV).
• This is a table of addresses stored at a constant address in memory.
• The issue of which interrupt to service can then be solved by prioritisation.
• When an interrupt is issued the controller places the IRQ on the address lines.
• The CPU can then distinguish between interrupt sources.
• Each IRQ can be given a priority level, e.g. lower or higher values have priority.
• An interrupt can itself be interrupted if the new priority level is higher.

# 25. Issues in saving state

• The fastest approach is to save state inside the processor.
• A hidden (not programmer visible) bank of registers.
• But this will be overwritten if a new interrupt is higher priority.
• To use this fast approach the OS must disable (not acknowledge) interrupts until processing is finished.
• The alternative approach is to save information on the stack.
• Which stack? The user process's, or the kernel's?
• Process stack - we cannot guarantee it is valid.
• A bug in a user program could crash the interrupt handler...
• Kernel stack - we need to switch stacks (takes time, flushes caches/TLB).
• No fixed answer - depends on performance tradeoffs within the OS.

# 26. Precision of interrupts

• A scalar processor is shown in a), it executes instructions strictly in sequence.
• The value of the PC tells us exactly which instructions have been executed, and which have not.
• A superscalar processor executes instructions out-of-order.
• The PC does not tell us which instructions have finished any more.
• Precise interrupts on a CPU allow the operating system to determine the state of each instruction.
• On a superscalar processor, some partially executed instructions may be "rolled back".

# 27. Summary

• We've covered the mechanisms available to the OS to implement basic I/O on a range of devices.
• Shared registers (status, control and data).
• Interrupt handling.
• DMA.
• Next time we look at how the O/S software uses these mechanisms.