DV1460 / DV1492:

Realtime- (and) Operating-Systems

I/O : Hardware Principles

§5.1 (pg 337-351)

Table of Contents

1. Hardware Devices

  • A logical view of I/O is shown to the right.
    • Different kinds of devices, plugged into a common bus.
    • Each can communicate with CPU or Memory.
  • Huge degree of variation between devices.
    • Bandwidth: 1 byte/sec - 20 GB/sec.
    • Latency: <1ms - 1s
    • Complexity: single function - small computer.
  • Controller interfaces the device to the bus.
    • Broadly similar interface: status and control registers, though the exact details differ.

2. Device Controllers

  • The actual electronics in a device are completely specific.
    • The controller is there to make it a little bit sane.
  • Commands sent between the device and the CPU (both directions).
    • Perform some specific function.
  • Data sent between the device and the CPU (again, both ways).
  • Each register is a word, e.g. 32 bits.
    • Individual bits, or groups of bits, encode each of the above possibilities.
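
  • A minimal sketch of how single bits and groups of bits in one register word can carry commands and data. The layout (a 4-bit command in bits 0-3, a ready flag in bit 4, a data byte in bits 8-15) and all names are hypothetical, chosen only for illustration; a real device defines its own layout.

```c
#include <stdint.h>

/* Hypothetical register layout (not taken from any real device):
 *   bits 0-3  : command code
 *   bit  4    : ready flag
 *   bits 8-15 : data byte
 */
#define REG_CMD_MASK    0x0000000Fu
#define REG_READY_BIT   0x00000010u
#define REG_DATA_SHIFT  8
#define REG_DATA_MASK   0x0000FF00u

/* Build a register word from a command code and a data byte. */
static uint32_t reg_pack(uint32_t cmd, uint8_t data)
{
    return (cmd & REG_CMD_MASK) | ((uint32_t)data << REG_DATA_SHIFT);
}

/* Extract the command code (bits 0-3). */
static uint32_t reg_command(uint32_t reg)
{
    return reg & REG_CMD_MASK;
}

/* Test the ready flag (bit 4). */
static int reg_is_ready(uint32_t reg)
{
    return (reg & REG_READY_BIT) != 0;
}

/* Extract the data byte (bits 8-15). */
static uint8_t reg_data(uint32_t reg)
{
    return (uint8_t)((reg & REG_DATA_MASK) >> REG_DATA_SHIFT);
}
```

  • Masks and shifts are used here rather than C bit-fields because the order of bit-fields within a word is implementation-defined.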

3. Device Controllers

  • Register organisation is specific to each device (e.g. brand and model).
  • CPU accesses these registers over the bus with read/write operations.
  • Status registers are read to discover information about the device.
  • Control registers are written to issue commands.
  • Data registers are read/written for transfers.
  • Code on the CPU needs to know the specific details for each device.
  • The device IDs are hardcoded (known) constants to determine which device the bus operations are addressed to.
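
  • The three register roles can be sketched against a simulated device. `struct fake_dev`, the status bit, and the command code are all hypothetical; a real driver takes these values from the device's documentation and reaches the registers over the bus rather than through a plain struct.

```c
#include <stdint.h>

/* Simulated controller; on real hardware these registers live on the bus. */
struct fake_dev {
    uint32_t status;   /* read by the CPU */
    uint32_t control;  /* written by the CPU */
    uint32_t data;     /* read or written for transfers */
};

#define ST_DATA_READY 0x1u   /* hypothetical status bit */
#define CMD_START     0x1u   /* hypothetical command code */

/* Stand-in for the hardware: reacts to a command by producing data. */
static void fake_dev_step(struct fake_dev *d)
{
    if (d->control == CMD_START) {
        d->data = 42;
        d->status |= ST_DATA_READY;
        d->control = 0;
    }
}

/* Driver-side sequence: write control, read status, read data.
 * Returns 0 and stores the value on success, -1 if no data is ready. */
static int dev_read_value(struct fake_dev *d, uint32_t *out)
{
    d->control = CMD_START;          /* issue the command */
    fake_dev_step(d);                /* the simulated device does its work */
    if (!(d->status & ST_DATA_READY))
        return -1;
    *out = d->data;                  /* transfer the result */
    d->status &= ~ST_DATA_READY;     /* simulation only: mark data consumed */
    return 0;
}
```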

4. Examples

  • The structure of lectures 11, 12 and 13:
    • Start with the hardware, some simple examples.
    • Look at the software (device drivers).
    • Return to more complex examples to show the combination.
Example | Control Reg | Status Reg | Data Reg | Interrupts | DMA
  • Register sizes are measured in bits.
    • Rough guide to how complex the device and interface will be.
  • Examples are based on real hardware (PS/2, IDE and 3c509c) but simplified.
    • The various registers are universal, usage of interrupts and DMA varies.

5. Using I/O Ports

  • To communicate with a device the CPU needs to read/write shared registers.
  • The simplest scheme for I/O is to add two instructions to the CPU:
    • IN REG, PORT: read a value from the port number.
    • OUT REG, PORT: write a value to the port number.
  • Implication: a global address space for device registers.
    • It is shared across all brands and models of devices.
    • This makes it quite hard to administer.
  • This is common in embedded systems: known set of devices.
  • The code of programs can mix up memory access, computation and I/O:
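movl $7, %eax          # computation: load a constant
addl (%eax), %ebx      # memory access: add the word at that address
out  %ebx, $3          # I/O: write the result to port 3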
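
  • One way to see why a single global port space is hard to administer: every device must be given a port range that no other device uses. The sketch below is a hypothetical allocation table that rejects overlapping claims. The 16-bit port space is an assumption (it matches x86), and `claim_ports` and the table layout are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CLAIMS 16

struct port_claim {
    uint16_t base;      /* first port number */
    uint16_t count;     /* number of consecutive ports */
    const char *owner;  /* device name, for diagnostics */
};

static struct port_claim claims[MAX_CLAIMS];
static size_t num_claims;

/* Reserve ports [base, base + count).
 * Returns 0 on success, -1 if the range is invalid, the table is full,
 * or the range collides with an earlier claim. */
static int claim_ports(uint16_t base, uint16_t count, const char *owner)
{
    uint32_t end = (uint32_t)base + count;   /* one past the last port */

    if (count == 0 || end > 0x10000u || num_claims == MAX_CLAIMS)
        return -1;

    for (size_t i = 0; i < num_claims; i++) {
        uint32_t c_start = claims[i].base;
        uint32_t c_end = c_start + claims[i].count;
        if (base < c_end && c_start < end)
            return -1;                       /* ranges overlap */
    }

    claims[num_claims++] = (struct port_claim){ base, count, owner };
    return 0;
}
```

  • With a known, fixed set of devices (the embedded case) such a table can be written down once at design time.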

6. I/O Ports Example

  • I thought USB would make an interesting real-world example.
  • PS/2 connectors with an 8042 controller chip are much simpler.
  • Start with the port specification.
Port | Access | Function
0x60 | Read | Read Input Buffer
0x60 | Write | Write Output Buffer
0x64 | Read | Read Status
0x64 | Write | Send Command
  • This pair of registers forms the communication channel with a PS/2 device.
  • To use it we need to access the ports (not native to C/C++).
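#include <stdint.h>

/* Write one byte to an x86 I/O port (GCC/Clang inline assembly). */
static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %b0, %w1" : : "a"(value), "d"(port));
}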

7. I/O Ports Example

  • Only two registers: pack a different function into each bit (e.g. status).
  • Multiplex different registers on the same port.
    • Sending a command can choose what the output buffer does.
  • I/O programming is very low-level.
Bit | Name | Meaning
0 | OBF | Output Buffer Full
1 | IBF | Input Buffer Full
2 | SYS | Reset cold or software?
3 | A2 | Internal address line 2
4 | INH | Keyboard inhibited?
5 | MOBF | Mouse buffer full
6 | TO | Timeout, bad?
7 | PERR | Parity error, connected?
kbRead: inb  $0x64, %al    # Read status byte
        andb $0x02, %al    # Test IBF flag (Status<1>)
        jz   kbRead        # Wait for IBF = 1
        inb  $0x60, %al    # Read input buffer

... or equivalently in C ...

while (!(inb(0x64) & 0x02))
    ;                      /* wait for IBF = 1 */
scancode = inb(0x60);
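
  • A sketch of the same polling loop with a bound on the number of polls, so that a missing or stuck device cannot hang the CPU forever. The port numbers and the IBF bit follow the simplified tables in these slides. `sim_inb` is a hypothetical stand-in for a real `inb`, and it simulates a device whose flag becomes set after a few status reads.

```c
#include <stdint.h>

#define KBD_DATA_PORT   0x60
#define KBD_STATUS_PORT 0x64
#define KBD_STATUS_IBF  0x02   /* bit 1 of the status byte */

/* Simulated hardware state (illustration only). */
static int sim_reads_until_ready = 3;
static uint8_t sim_scancode = 0x1E;

/* Stand-in for a real port read. */
static uint8_t sim_inb(uint16_t port)
{
    if (port == KBD_STATUS_PORT) {
        if (sim_reads_until_ready > 0) {
            sim_reads_until_ready--;
            return 0x00;               /* nothing to read yet */
        }
        return KBD_STATUS_IBF;         /* data is waiting */
    }
    if (port == KBD_DATA_PORT)
        return sim_scancode;
    return 0xFF;                       /* unclaimed port */
}

/* Poll the status port at most max_polls times.
 * Returns the scancode (0-255), or -1 if the flag never became set. */
static int kbd_read_polled(unsigned max_polls)
{
    while (max_polls-- > 0) {
        if (sim_inb(KBD_STATUS_PORT) & KBD_STATUS_IBF)
            return sim_inb(KBD_DATA_PORT);
    }
    return -1;
}
```

  • Note the parentheses around the mask test: `!` binds more tightly than `&` in C, so the status byte must be masked before it is negated or tested.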

8. Address space options

  • Previous example corresponds to design a).
    • Separate address space for I/O ports and memory.
  • Option b) puts both in the same address space.
    • The processor does not need separate instructions.
    • Accessing ports looks the same as memory.
movl $portaddr, %eax
movb $0x10, (%eax)
  • Option c) is a hybrid design.
    • Some registers (normally status and control) have port addresses.
    • Others (such as data) have memory addresses.
    • Program code mixes port and memory accesses.
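
  • With memory-mapped registers (option b), plain C pointer accesses are enough. A minimal sketch: `fake_register` is a hypothetical stand-in for a device register, since a real one would sit at a fixed address given in the platform documentation. The `volatile` qualifier tells the compiler that every access matters, so reads and writes are not optimised away.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped device register (illustration only). */
static volatile uint8_t fake_register;

#define PORTADDR (&fake_register)

/* Write one byte to a memory-mapped register. */
static void mmio_write8(volatile uint8_t *addr, uint8_t value)
{
    *addr = value;
}

/* Read one byte from a memory-mapped register. */
static uint8_t mmio_read8(volatile uint8_t *addr)
{
    return *addr;
}
```

  • Usage mirrors the assembly above: `mmio_write8(PORTADDR, 0x10);`. Note that `volatile` only constrains the compiler; keeping the range out of the CPU cache is a separate, hardware-level matter.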

9. Comparison between approaches

Aspect | Common address space | Separate spaces
Hardware | One bus with address lines, each controller must filter ranges. | Address lines + bus selection lines.
Language Interface | Fits into C, C++ | Requires inline asm
Protection | Flags in page-table | root / supervisor privileges
Instruction mode | Any instruction with address can use | Dedicated instruction for access, then function
Caching | Must exclude I/O ranges | No special treatment

10. Multiple buses

  • Figure a) shows a single-bus system: I/O and memory traffic go over the same wires.
  • Each controller has to look at every operation, decide if it is the target.
  • This means that the controller has to work at the same speed as memory.
  • Bandwidth must be split between I/O and memory traffic.
  • Figure b) shows a dedicated memory bus (high speed and bandwidth).
  • This requires the memory to sit on both buses (multi-ported).
  • Bus snooping.
  • Address filtering.

11. Multiple buses

  • With memory-mapped I/O, which bus should the CPU use?
  • Option: Try the fast bus first, fall back if no response.
    • Simple to implement, but adds latency to I/O operations.
  • Option: I/O devices watch addresses on memory bus, signal when interested.
    • Controller must run at high frequency for the fast bus.
  • Option: Reserve a memory range for I/O devices.
    • Slightly more complex at boot time; this is how PCs work.
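
  • The third option amounts to a range check on every address. A minimal sketch, assuming a single reserved window; the bounds `IO_BASE` and `IO_END` are hypothetical, and a real system would discover or configure them at boot time.

```c
#include <stdint.h>

#define IO_BASE 0xE0000000u  /* hypothetical start of the reserved I/O range */
#define IO_END  0xEFFFFFFFu  /* hypothetical end of the range (inclusive) */

enum bus { BUS_MEMORY, BUS_IO };

/* Decide which bus an access to addr should use. */
static enum bus route_address(uint32_t addr)
{
    return (addr >= IO_BASE && addr <= IO_END) ? BUS_IO : BUS_MEMORY;
}
```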

12. Summary of Registers

Example | Control Reg | Status Reg | Data Reg | Interrupts | DMA
  • The example so far looked at the register interface for an 8042 controller.
    • Each bit of the register has a meaning.
    • Very low-level interface to the raw hardware behind the controller.
    • Writing 0s/1s into control registers issues commands / activates circuits.
    • Reading 0s/1s from status registers checks specific circuits in device.
  • For more complex devices we can see the size of the interface increases.
    • Network cards are not the most complex device: GPU interface 10-100x larger.
  • We will take a more abstract look at the final two examples.

13. Harddisk example

  • Red lines show communication between units in the HDD.
    • The control logic is linked to everything so no wiring shown.
  • The interface visible to the CPU has to control the physical hardware in the drive.
  • More complex than mouse / keyboard: bi-directional data flow.

14. NIC example

  • Big fat lines show high-bandwidth communication paths.
  • Data is bi-directional, and now the device initiates communications.
  • Multiple layers of network model: much more control / status.

15. The other hardware features

Example | Control Reg | Status Reg | Data Reg | Interrupts | DMA
  • Now we've seen the four examples and how they scale up in complexity.
  • The last two mechanisms that we look at are interrupts and DMA.
    • These allow more sophisticated interaction with devices.
    • So far we've seen very simple port writing / reading by the CPU.
  • Both the PS/2 mouse and keyboard share the same controller.
    • Keyboard polling is hidden behind an external interrupt (clock).
    • The mouse generates its own interrupts to deliver X/Y coordinates.
    • Both are quite low bandwidth communications.
  • HDD and NIC are higher bandwidth; NICs require low latency as well.
  • Harddisks are higher bandwidth, quite tolerant of latency.
  • Ethernet is very high bandwidth - requires low latency.

16. Example of real code (3c509c driver)

// Drop everything, so we are not driving the data, and run the
// clock through 32 cycles in case the PHY is trying to tell us
// something. Then read the data line, since the PHY's pull-up
// will read as a 1 if it's present.
select_window(nic, 4);
outpw(nic->iobase + PHYSICAL_MANAGEMENT, 0);
for (i = 0; i < 32; i++) {
    udelay(1);
    outpw(nic->iobase + PHYSICAL_MANAGEMENT, PHY_CLOCK);
    udelay(1);
    outpw(nic->iobase + PHYSICAL_MANAGEMENT, 0);
}
  • Example from the NIC negotiating the physical layer.
    • As well as setting individual bits to control a hardware function...
    • ...timing constraints as well.

17. Motivation for DMA

  • When the CPU sets a value in a register, the device sees a very fast transition.
    • Actual hardware may need to operate slower.
    • Depending on the length of delay:
      • Busy loops (very small pauses).
      • Concurrency primitive (e.g. semaphore) within kernel.
      • Suspend operation and wait for an interrupt.
  • For very short pauses busy loops may be the only option (low-latency).
    • But they tie up the CPU.
  • Similar problem with a sequence of operations.
    • e.g. writing a buffer to a register one word at a time.
  • Very simple jobs occupying the CPU when it could be running code.

18. Direct Memory Access

  • Direct Memory Access (DMA) controller.
    • A very simple co-processor.
    • CPU programs it by writing into its control registers.
    • Perform repetitive tasks - e.g. read from a register and write to memory.
  • Goal: the CPU can do useful work while the low-level transfer is happening.
  • DMA controllers have several banks of control registers.
    • Each one programs a different transfer sequence ("channel").
    • Allows multiple transfers to run at once.
  • DMA controllers can be built into the bus (e.g. PCI).
    • High-bandwidth controllers may have dedicated DMA controllers.
  • The CPU is no longer the only device issuing requests on the bus.
    • DMA controller(s) can also write onto the bus.
  • This is a strong motivation for a single system bus.
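
  • A software simulation of the idea, as a sketch only: each channel is a bank of control registers that the CPU fills in, and the controller then moves words without further CPU involvement. The number of channels, the field names, and the one-word-per-tick behaviour are assumptions made for illustration and do not describe any particular DMA controller.

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_CHANNELS 4   /* hypothetical number of channels */

/* One bank of control registers per channel. */
struct dma_channel {
    const uint32_t *src;   /* source address */
    uint32_t *dst;         /* destination address */
    size_t count;          /* words still to transfer */
    int active;            /* nonzero while a transfer is in progress */
};

static struct dma_channel dma[DMA_CHANNELS];

/* CPU side: program a channel.
 * Returns 0, or -1 if the channel number is invalid or the channel is busy. */
static int dma_program(unsigned ch, const uint32_t *src, uint32_t *dst,
                       size_t count)
{
    if (ch >= DMA_CHANNELS || dma[ch].active)
        return -1;
    dma[ch] = (struct dma_channel){ src, dst, count, count > 0 };
    return 0;
}

/* Controller side: move one word on every active channel.
 * Returns the number of channels that are still active afterwards. */
static unsigned dma_tick(void)
{
    unsigned still_active = 0;

    for (unsigned ch = 0; ch < DMA_CHANNELS; ch++) {
        struct dma_channel *c = &dma[ch];
        if (!c->active)
            continue;
        *c->dst++ = *c->src++;
        if (--c->count == 0)
            c->active = 0;   /* a real controller would signal completion here */
        else
            still_active++;
    }
    return still_active;
}
```

  • Between calls to `dma_tick` the CPU is free to run other code, which is the point of the mechanism; in hardware the controller runs concurrently rather than being called.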

19. Steps in using DMA

20. Different types of transfers in DMA

  • Word at a time (also called cycle stealing).
    • No need for DMA controller to lock the bus.
    • When it wants a transfer the CPU has to wait.
    • Low overhead for low bandwidth applications.
  • Burst mode.
    • DMA controller locks the bus.
    • Transfers several words at once.
    • Then frees the bus for other devices to use.
    • Lower overhead for high bandwidth applications.
  • Fly-by mode: device writes to memory directly.
    • Alternative: devices write to the DMA controller, which then transfers to memory.
    • Lower complexity devices, allows buffering+burst.
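
  • A rough way to compare the two modes is to count how often the bus must be acquired. The model below is a deliberate simplification (one acquisition per word in word-at-a-time mode, one per burst in burst mode) and the function names are made up for illustration.

```c
#include <stddef.h>

/* Word-at-a-time: one bus acquisition per word transferred. */
static size_t bus_requests_word_mode(size_t words)
{
    return words;
}

/* Burst mode: one bus acquisition per burst of up to burst_len words.
 * A burst_len of 0 is invalid and yields 0. */
static size_t bus_requests_burst_mode(size_t words, size_t burst_len)
{
    if (burst_len == 0)
        return 0;
    return (words + burst_len - 1) / burst_len;   /* round up */
}
```

  • Under this model, 1024 words need 1024 acquisitions word-at-a-time but only 64 with 16-word bursts, which is why burst mode has the lower overhead at high bandwidth.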

21. Combination of I/O mechanisms

  • The hardware introduced so far fulfils different roles in I/O.
    • Individual commands and responses: control and status registers.
    • Small transfers: direct use of data registers.
    • Large transfers: automated use of data registers via DMA.
  • The final hardware mechanism used in I/O is the interrupt.
  • We've seen their use a few times already:
    • Syscalls - manage privilege escalation via software interrupts.
    • Processor scheduling - timing interrupt to preempt a process.
    • Virtual Memory - handling of page-faults.
    • File-system - assumption that disk blocks arrive somehow.
  • Now we can fill in the details of how they work.

22. Interrupts mechanism

  • The interrupt controller is a separate logical entity from the CPU.
    • Can be integrated into the same physical package (bus controller).
    • Can also be integrated into the device controller.
  • Allows the device to initiate communication with the CPU.
  • Diagram shows the steps that occur over the bus.

23. Multiple Interrupts

  • Conceptually interrupts are simple when we consider a single controller.
    • CPU sees a signal and calls a handler routine.
  • How can multiple interrupts be serviced?
    • The new interrupt can be delayed until the old one has finished its service handler.
    • If the interrupt is time-critical then data could be lost, e.g. if a buffer overflows.
    • The CPU needs a way to choose which interrupt to service.
  • Controllers identify the source of the interrupt by placing a number on the address lines.
  • This number identifies the Interrupt Request (IRQ) line that raised the interrupt.

24. Multiple Interrupts

  • The IRQ is used as an index into the Interrupt Service Vector (ISV).
    • This is a table of addresses stored at a constant address in memory.
  • The issue of which interrupt to service can then be solved by prioritisation.
    • When an interrupt is issued the controller places the IRQ on the address lines.
    • The CPU can then distinguish between interrupt sources.
    • Each IRQ can be given a priority level, e.g. lower or higher values have priority.
  • An interrupt can itself be interrupted if the new priority level is higher.
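
  • A sketch of the table lookup and prioritisation in software. The table size, the pending bitmask, and the rule that a lower IRQ number means higher priority are assumptions for illustration; `timer_isr` is a hypothetical example handler. Nested interruption of a running handler is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_IRQS 8   /* hypothetical number of interrupt lines */

typedef void (*isr_t)(void);

static isr_t vector_table[NUM_IRQS];   /* handler addresses, indexed by IRQ */
static uint8_t pending;                /* bit n set means IRQ n is pending */

/* Example handler: counts how often it has run. */
static int timer_ticks;
static void timer_isr(void)
{
    timer_ticks++;
}

/* Install a handler. Returns 0, or -1 if the IRQ number is out of range. */
static int register_isr(unsigned irq, isr_t handler)
{
    if (irq >= NUM_IRQS)
        return -1;
    vector_table[irq] = handler;
    return 0;
}

/* Device side: mark an IRQ as pending. */
static void raise_irq(unsigned irq)
{
    if (irq < NUM_IRQS)
        pending |= (uint8_t)(1u << irq);
}

/* Service the highest-priority pending IRQ (lowest number first).
 * Returns the IRQ serviced, or -1 if none is pending. */
static int dispatch_one(void)
{
    for (unsigned irq = 0; irq < NUM_IRQS; irq++) {
        if (pending & (1u << irq)) {
            pending &= (uint8_t)~(1u << irq);
            if (vector_table[irq])
                vector_table[irq]();   /* jump through the table */
            return (int)irq;
        }
    }
    return -1;
}
```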

25. Issues in saving state

  • The fastest approach is to save state inside the processor.
    • A hidden (not programmer visible) bank of registers.
    • But this will be overwritten if a new interrupt is higher priority.
    • To use this fast approach the OS must disable (not acknowledge) interrupts until processing is finished.
  • The alternative approach is to save information on the stack.
    • Which stack? The user process, the kernel?
    • Process stack - we cannot guarantee it is valid.
    • A bug in a user program could crash the interrupt handler...
    • Kernel stack - we need to switch stacks (takes time, flushes caches/TLB).
  • No fixed answer - depends on performance tradeoffs within the OS.

26. Precision of interrupts

  • A scalar processor is shown in a), it executes instructions strictly in sequence.
  • The value of the PC tells us exactly which instructions have been executed, and which have not.
  • A superscalar processor may execute instructions out of order.
  • The PC does not tell us which instructions have finished any more.
  • Precise interrupts on a CPU allow the operating system to determine the state of each instruction.
  • On a superscalar processor, some partially executed instructions may be "rolled back".

27. Summary

  • We've covered the mechanisms available to the OS to implement basic I/O on a range of devices.
    • Shared registers (status, control and data).
    • Interrupt handling.
    • DMA.
  • Next time we look at how the O/S software uses these mechanisms.