DV1460 / DV1492:

Realtime- (and) Operating-Systems

08:15-10:00 Tuesday September 27th, 2016

FS Overview and Implementation.

§ 4-4.4.3 pg 263-290

Table of Contents
Overview
Free space
File structure and formats
File organisation
File / Directory Organisation

1. Introduction

  • Computers generally do three things.
    • We've already looked at Processing.
    • Memory was the short-term overlap.
    • Transient: lost when process exits.
  • Long-term information storage.
    • Data should be stored until further notice.
    • User needs to be confident that data cannot be lost accidentally / unexpectedly.
    • Data inside programs is explicit for the programmer, but implicit for the user...

2. Explicit storage of data

  • Users require explicit control over data storage.
  • Persistence: decouples lifetime of data from process.
    • Allows user to control when it is written.
    • How long it lasts before it is removed.
    • Stored data should be available until explicitly deleted.
  • Copying: guarantees duplication of data.
    • Allows checkpointing (recovering intermediate states, e.g. thesis_v5.tex).
    • Allows sharing and communication (e.g. taking data from word processor, sending it by email).
  • Organisation: finding data at a later time.
    • Expressing relationships between data (e.g. part of same project).

3. Definitions

File
A logical unit of information.
  • Different meanings in different contexts.
    • Each application can map data onto a collection of files.
    • User can control which information is stored together.
Persistent
Remaining in existence until explicitly removed.
  • Implies that stored data should not be lost when a process exits or the system reboots.
File System (FS)
The storage and management of collections of files.

4. Issues

  • Some of the issues in design and implementation of a FS:
    • How do we locate information?
    • How do we provide confidentiality?
    • How do we provide integrity?
    • How do we manage free space?
    • What structure is supported for data within files?
    • What operations are supported on files?
    • How does the logical organisation map onto disk operations?
  • We look at the user interface to storage first.
  • This leads into how we implement what the user sees.
  • Which design choices arise.

5. File Structure

  • We have different options for a "Logical unit of information".
  • Different forms of structure for the contents within each file.
  • Affects how much freedom each program has in choosing structure.
  • Raw byte sequence: least structured, program has complete control.
    • Raw abstraction of most systems; program implements everything.
  • Records: sequence of fixed length pieces (array).
    • Not used directly anymore, indirectly implemented by programs.
  • Tree: highly structured, less flexible (in terms of what can be written).
    • Seen on mainframes, OSX resource forks, the fabled Reiser4 etc.

6. File Structure

  • There is a wide design-space of options for file structure.
  • Files are used to transfer data between systems.
  • Some systems have richer representation, some less so.
    • How should files be exported for use on other systems?
    • e.g. mounting HFS on non-mac systems and accessing resource forks.
  • The trend over time has been towards least-common-denominator.
Providing file structure in the OS has largely been superseded by applications defining structure over a raw byte stream.
  • Optimising the simplest case for data storage, adding more structure in file-formats, seems to work better overall.

7. File Types (Formats)

  • Problem: When files are raw streams of bytes programs need to agree on the structure of data within.
  • Solution: Define structure of each kind case-by-case; specify a file-format.
  • Many common formats are textual: sequence of symbols.
    • ASCII - 128 characters (other 128 depend on context).
      • Each symbol is always a byte.
    • 7-bit clean files are easily recognisable.
    • UTF-8 (Unicode) represents more symbols.
      • Breaks the connection between a symbol and a byte.
      • A code-point (basically a symbol) can be a sequence of bytes.
      • Harder to process, but more useful.
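
The break between symbols and bytes can be seen in a minimal sketch using Python's built-in str.encode. The sample characters are chosen only for illustration.

```python
# ASCII characters stay one byte in UTF-8; other code points take 2-4 bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]   # A, e-acute, euro sign, emoji
lengths = [len(ch.encode("utf-8")) for ch in samples]
for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s)")

text = "na\u00efve"                  # five symbols
encoded = text.encode("utf-8")       # six bytes: the i-diaeresis needs two
print(len(text), len(encoded))

# A 7-bit clean file has no byte with the top bit set.
is_7bit_clean = all(b < 128 for b in encoded)
print(is_7bit_clean)
```

Counting bytes is therefore not the same as counting symbols, which is what makes UTF-8 text harder to index into than ASCII.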

8. Text vs Binary

  • ASCII (or UTF-8) provide a common interface.
    • Many viewers, editors, processing tools.
    • Least common denominator (tools are available everywhere).
  • More specific structure can be layered on top.
    • e.g. Tree data as XML, Records as CSV.
  • Anything not encoded as UTF-8 or ASCII is lumped together as "binary".
    • Each binary format is different - needs specific tools.
    • More difficult to process (both for users and programs).
    • Better optimised for specific tasks.
      • Smaller file-sizes.
      • Faster access.

9. Examples of binary formats

  • Magic number: explicit constant to indicate file type.
    • File formats are similar to network message specification.
  • The nice clean lines don't really exist.
    • The sequence of bytes needs to express divisions into different pieces.
    • Fixed length parts can be implicit.
    • Variable length parts need well-defined encoding of size (e.g. known position/size before field).
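
A minimal sketch of these ideas, using Python's struct module. The format itself is hypothetical (the magic number "DEMO" and the field layout are invented for illustration): a fixed-length header carries the magic number and the sizes of the variable-length parts that follow.

```python
import struct

MAGIC = b"DEMO"          # hypothetical 4-byte magic number
HEADER = "<4sHH"         # magic, name length, value count (little-endian)

def pack_record(name, values):
    name_bytes = name.encode("utf-8")
    # Fixed-length header: its layout is implicit in the format definition.
    header = struct.pack(HEADER, MAGIC, len(name_bytes), len(values))
    # Variable-length parts: their sizes were written before the fields.
    body = name_bytes + struct.pack(f"<{len(values)}I", *values)
    return header + body

def unpack_record(data):
    magic, name_len, count = struct.unpack_from(HEADER, data, 0)
    if magic != MAGIC:
        raise ValueError("wrong magic number: not a DEMO file")
    offset = struct.calcsize(HEADER)
    name = data[offset:offset + name_len].decode("utf-8")
    offset += name_len
    values = list(struct.unpack_from(f"<{count}I", data, offset))
    return name, values

blob = pack_record("temps", [20, 21, 19])
print(len(blob), unpack_record(blob))
```

The reader never searches for boundaries: every division in the byte sequence is either fixed by the format or announced by an earlier size field.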

10. Explicit Naming

  • Method to choose between files: names as human-readable labels.
  • Either strings in ASCII or UTF-8.
  • Design choice: should names be case-sensitive?
    • NTFS and UNIX generally choose yes - only equal strings name the same file.
    • The Mac chooses to be different - HFS+ is insensitive by default.
    • Making it easier for people vs programs (programmers).
  • Fixed-length names are easier to work with (but can get awkward).
    • Variable-length names make the implementation more intricate.
  • Names allow the selection of files.
    • Need to be stored in containers for collections of data...

11. Flat Directory Structures

  • Within a single-level directory system all names must be unique.
    • e.g. naming scheme on a camera IMG00000.JPG, IMG00001.JPG ...
  • Doesn't scale up well:
    • Typical desktops / small networks store millions of files.
    • Uniqueness becomes difficult.
    • Same name often works in many contexts (e.g. readme.txt).
  • Independent naming contexts.
  • Remembering which files amongst millions relate to each other is hard.
  • Users want to group files together.
  • Allowing directories of files and also directories...
    • ... solves both issues of uniqueness and grouping...

12. Hierarchical Directory Structure

  • Grouping files together allows users to organise data.
    • Tagging is slightly more powerful (database style queries on relations).
    • Hierarchy is cheaper to implement.
  • Directories can contain both files and directories.
    • Arbitrary number of groups, arbitrary depth of nesting.
  • Powerful enough to cover most use-cases.
  • Can indicate privacy over entire group (directory or sub-tree).
    • Good organisational tools for multi-user systems.
  • Names must be unique within a single directory, not the entire FS.

13. Paths

  • Path: How to navigate a tree.
    • e.g. /usr/jim or /usr/lib/dict
  • Component: each name in the path.
  • Separator: illegal character for a component.
  • The path is a sequence of directions.
  • Each component is where to go next: name in directory.
  • Most UNIX systems use / as separator.
  • Windows uses \.

14. Paths

  • Absolute path: starts with the separator.
    • Instructions start at the root.
    • e.g. /usr or "\Program Files".
    • UNIX uses a single root: FS is one tree.
    • Windows uses a root per drive (C:, D: etc), forest of trees.
    • Locates a single file or directory.
  • Relative path: starts with a component.
    • Different targets, relative to a location.
  • Current Working Directory (CWD) is stored in each process.
  • Relative paths are instructions from the CWD.
    • e.g. with CWD=/usr, lib/dict refers to /usr/lib/dict
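
Path resolution for UNIX-style paths can be sketched with Python's posixpath module. The helper names resolve and components are hypothetical. Note that normpath collapses "." and ".." purely textually, which is a simplification of what a real FS does.

```python
import posixpath

def components(path):
    """Split a path on the separator; empty pieces are dropped."""
    return [c for c in path.split("/") if c]

def resolve(path, cwd):
    """Return an absolute path; relative paths start from cwd."""
    if not posixpath.isabs(path):        # relative: starts with a component
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)      # textual clean-up of "." and ".."

print(components("/usr/lib/dict"))       # the sequence of directions
print(resolve("lib/dict", cwd="/usr"))   # relative to the CWD
print(resolve("/usr/jim", cwd="/tmp"))   # absolute: CWD is ignored
```

The same relative path gives different targets for processes with different working directories, which is why the CWD has to be stored per process.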

Break (15mins)


15. Directories

  • Special names: exist inside every directory.
    • . is the directory itself, e.g. filename = ./filename
    • .. is the parent directory i.e. .. is always up in the tree.
    • .. of root is the root (a loop).
  • Every directory is a namespace (dictionary).
    • Names within each directory are unique keys.
    • Values are locations of other directories or files.
  • We will look at different schemes for writing locations.
    • They are equivalent to pointers to disk blocks.
    • Multiple pointers can refer to the same location.
    • Allows different names (paths) for a file / directory.
    • Directories and paths are a logical organisation.
    • Using pointers separates it from the physical organisation.
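
The namespace view can be sketched with Python dictionaries standing in for directories. This is a toy model only: object references play the role of pointers to disk blocks, and the names make_dir and walk are hypothetical.

```python
def make_dir(parent=None):
    d = {}
    d["."] = d                                        # the directory itself
    d[".."] = parent if parent is not None else d     # root's parent is root
    return d

root = make_dir()
usr = root["usr"] = make_dir(root)
lib = usr["lib"] = make_dir(usr)
lib["dict"] = ["word", "list"]       # stand-in for a file's blocks
usr["words"] = lib["dict"]           # a second name for the same location

def walk(path, cwd):
    """Follow each component in turn, starting at the root or the CWD."""
    node = root if path.startswith("/") else cwd
    for name in path.split("/"):
        if name:
            node = node[name]        # names are unique keys in a directory
    return node

print(walk("/usr/lib/dict", root))
print(walk("/usr/words", root) is walk("lib/dict", usr))
```

Two different paths reach the same object, while the tree of names says nothing about where that object physically lives.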

16. Typical Directory API

  • To manipulate directories the system provides an API (normally POSIX).
  • These operations are exposed in UNIX as shell commands.
    • Creation: . and .. are automatically added and linked.
    • Deletion: only empty directories can be deleted.
      • Multiple ways to handle deleting trees with links.
      • The program must decide, by removing the contents itself first.
    • Open/Close/Read: allows a program to access the list of entries.
    • Rename: move.
    • Link/Unlink: add/remove names in the directory (the file is deleted on the last unlink).
  • This concludes the user view of the FS, move on to implementation.
    • More information on using it in DV1466 (Intro to Linux).
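
The operations above map onto functions in Python's os module, which wrap the POSIX calls. The following sketch runs them in a temporary directory; it assumes a UNIX-like system whose file-system supports hard links, and the file names are invented.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    d = os.path.join(base, "project")
    os.mkdir(d)                                    # creation
    a, b, c = (os.path.join(d, n) for n in ("a.txt", "b.txt", "c.txt"))
    with open(a, "w") as f:
        f.write("hello")
    os.link(a, b)                                  # link: a second name
    listing = sorted(os.listdir(d))                # read the list of entries
    link_count = os.stat(a).st_nlink               # names pointing at the file
    os.rename(b, c)                                # rename: move
    try:
        os.rmdir(d)                                # only empty dirs can go
        removed_non_empty = True
    except OSError:
        removed_non_empty = False
    os.unlink(a)                                   # file survives as c.txt
    os.unlink(c)                                   # last unlink: file deleted
    os.rmdir(d)                                    # now empty, so this works
    remaining = os.listdir(base)

print(listing, link_count, removed_non_empty, remaining)
```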

17. File-system Layout

  • Program (user) view is the logical organisation of the data in the FS.
  • FS Implementation is mainly concerned with the physical organisation.
    • How to map the structure described onto the disk API.
Disk API
Normally a disk has a fixed number of fixed-size blocks (e.g. \(10^9\) 4KB blocks) and is accessed by read(k) and write(k).
  • Typically the disk is divided into smaller logical pieces (partitions).
    • Each partition (drive) contains an independent file-system.
    • Isolate failures (generally mechanical devices).
  • Each partition is accessed through an API similar to the raw disk.
    • Read(\(k\)) or write(\(k\)) for \(k<n\).
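
A minimal in-memory sketch of this interface. The class names Disk and Partition are hypothetical, and a real disk would of course not be a Python list. The partition offers the same read(k) / write(k) API as the raw disk, shifted by its starting block.

```python
BLOCK_SIZE = 4096

class Disk:
    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(n_blocks)]

    def _check(self, k):
        if not 0 <= k < self.n_blocks:
            raise IndexError(f"block {k} out of range")

    def read(self, k):
        self._check(k)
        return self.blocks[k]

    def write(self, k, data):
        self._check(k)
        if len(data) != BLOCK_SIZE:
            raise ValueError("only whole blocks can be written")
        self.blocks[k] = bytes(data)

class Partition:
    """A contiguous slice of a disk: blocks 0..n-1 map to start..start+n-1."""
    def __init__(self, disk, start, n_blocks):
        self.disk, self.start, self.n_blocks = disk, start, n_blocks

    def _check(self, k):
        if not 0 <= k < self.n_blocks:           # k < n, as on the raw disk
            raise IndexError(f"block {k} outside partition")

    def read(self, k):
        self._check(k)
        return self.disk.read(self.start + k)

    def write(self, k, data):
        self._check(k)
        self.disk.write(self.start + k, data)

disk = Disk(16)
part = Partition(disk, start=8, n_blocks=4)
part.write(1, b"x" * BLOCK_SIZE)
print(disk.read(9)[:4])
```

The bounds check in the partition is what keeps one file-system from touching blocks that belong to another.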

18. File-system Layout

  • Boot-strap problem: code to access the FS inside the OS inside the FS...
  • Example of a possible file-system layout.
    • Boot-block in a known location: solves the boot-strap problem.
    • Simple interface for BIOS/EFI to load known blocks, execute contents.
    • Superblock contains admin: size, FS type, locations of other boxes.
    • Description of which blocks on the disk are free.
    • I-nodes are the tree nodes for the FS.
    • Link into the files and directories.

19. File Allocation: Contiguous

  • Allocating a file as a contiguous range of disk blocks:
    • Easy to represent, file is start block and size.
    • Indexing into block \(k\) of the file is just addition.
    • Maximum read performance on spinning disks.
  • When files are deleted they leave holes...
    • Same fragmentation problem we saw in memory allocation.
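
A toy sketch of contiguous allocation, assuming a first-fit search over a free map and file sizes of at least one block. The class and file names are hypothetical. It shows both the cheap indexing and the holes left by deletion.

```python
class ContiguousDisk:
    def __init__(self, n_blocks):
        self.free = [True] * n_blocks
        self.files = {}                      # name -> (start block, size)

    def allocate(self, name, size):
        run = 0                              # length of the current free run
        for i, is_free in enumerate(self.free):
            run = run + 1 if is_free else 0
            if run == size:                  # first hole that is big enough
                start = i - size + 1
                self.free[start:i + 1] = [False] * size
                self.files[name] = (start, size)
                return start
        raise RuntimeError("no contiguous hole large enough")

    def block_of(self, name, k):
        start, size = self.files[name]
        if not 0 <= k < size:
            raise IndexError(k)
        return start + k                     # indexing is just addition

    def delete(self, name):
        start, size = self.files.pop(name)
        self.free[start:start + size] = [True] * size

d = ContiguousDisk(10)
for name in "ABC":
    d.allocate(name, 3)                      # A: 0-2, B: 3-5, C: 6-8
d.delete("A")
d.delete("C")
free_blocks = sum(d.free)                    # 7 blocks free in total...
try:
    d.allocate("D", 5)                       # ...but the largest hole is 4
    fits = True
except RuntimeError:
    fits = False
print(free_blocks, fits, d.block_of("B", 2))
```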

20. File Allocation: Contiguous

  • Avoiding fragmentation: we could preallocate (max) space for the file.
    • Prevents deletion and reallocation when it changes.
    • Works pretty well in write-once applications: all file sizes are known.
    • e.g. burning UDF FS onto an optical disk.
    • e.g. creating read-only boot systems (building a server image).
  • Also works well if we write out incremental snapshots.
    • Version-control at the FS level, e.g. Fossil (Plan9) or ZFS.
  • Another use is to split the file into extents.
    • A collection of contiguous pieces of the file.

21. File Allocation: Linked List

  • Another approach to file allocation is a linked list for each file.
  • Each block in a file stores the next block in the file.
  • Zero fragmentation.
  • Sequential file access becomes random access on the disk.
  • Mechanical disks are slower at random access than sequential.
  • Problem: Indexing into block \(k\) (random access in the file) requires reading \(k\) blocks to follow the list.
  • Problem: The data held in each block is slightly less than \(2^n\) bytes (the pointer takes space): alignment problems, e.g. a 4KB access will span blocks.

22. File Allocation Table (FAT)

  • File Allocation Table (FAT): use a single table to hold all the lists together.
  • Directory entry points to start.
    • e.g. A: 4,7,10,12.
  • No pointer inside blocks: each block stores a full \(2^n\) bytes of data.
  • Keep table in memory - faster to index into for random access.
  • One entry for each block on disk: problem for larger FS.
  • Sentinel (-1) at end of each list.
  • FAT was introduced in MS-DOS and FAT32 with Windows 95; it is still used as the standard for most removable media.
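
A minimal sketch of the table, using the chain for file A from the example above. The table size of 16 and the 4-byte entry width in the size estimate are assumptions for illustration.

```python
END = -1                       # sentinel at the end of each list
FREE = None

fat = [FREE] * 16              # one entry per disk block
for block, nxt in [(4, 7), (7, 10), (10, 12), (12, END)]:
    fat[block] = nxt
directory = {"A": 4}           # directory entry points to the start

def chain(name):
    """All blocks of a file, in order."""
    blocks, block = [], directory[name]
    while block != END:
        blocks.append(block)
        block = fat[block]
    return blocks

def block_at(name, k):
    """Block k of a file: k table lookups in memory, no disk reads."""
    block = directory[name]
    for _ in range(k):
        block = fat[block]
        if block == END:
            raise IndexError("file has fewer blocks than that")
    return block

print(chain("A"), block_at("A", 2))

# Cost of one entry per block: 10^9 blocks with 4-byte entries.
ENTRY_BYTES = 4
table_bytes = 10**9 * ENTRY_BYTES
print(table_bytes / 10**9, "GB of table")
```

Random access is still a walk along the list, but the walk happens in memory, which is where the speed-up comes from. The size estimate shows why keeping the whole table resident becomes a problem for large file-systems.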

23. I-nodes

  • Last approach for file allocation: index-nodes (i-nodes), each a block listing the blocks of one file.
  • Splits FAT into many indices - one per file.
  • Don't put entire disk table in memory - only open files.
  • Scales much better for large FS.
  • Problem: FAT allows arbitrary length files, i-node can only hold fixed number of pointers.
  • Solution: chain i-nodes together (linked-list of index blocks).
  • Used in UNIX / NTFS.
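
A toy sketch of a per-file index with chaining. The capacity of four pointers is deliberately tiny, and the names IndexBlock, append_block and block_at are hypothetical. Real systems differ in how they extend the index.

```python
POINTERS_PER_INODE = 4             # tiny, for illustration only

class IndexBlock:
    def __init__(self):
        self.pointers = []         # disk block numbers, in file order
        self.next = None           # next index block in the chain

def append_block(inode, block_no):
    node = inode
    while len(node.pointers) == POINTERS_PER_INODE:   # this one is full
        if node.next is None:
            node.next = IndexBlock()                  # chain a new one
        node = node.next
    node.pointers.append(block_no)

def block_at(inode, k):
    """Disk block holding block k of the file."""
    if k < 0:
        raise IndexError(k)
    node = inode
    while k >= POINTERS_PER_INODE:    # skip whole index blocks
        node = node.next
        if node is None:
            raise IndexError("beyond end of file")
        k -= POINTERS_PER_INODE
    return node.pointers[k]

inode = IndexBlock()
for b in range(100, 110):             # a 10-block file
    append_block(inode, b)

print(block_at(inode, 0), block_at(inode, 9))
```

Only the index blocks of open files need to be in memory, so the cost scales with the files in use and not with the size of the disk.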

24. Implementing directories

  • The file-allocation records where each file is on the disk.
  • But to open/read these files by their path, the OS must:
    • Walk through the directory tree, according to the path components.
    • Find the location of the file (block-range, first block or inode).
  • The implementation of a directory must:
    • Map the ASCII name string onto some disk block.
    • Could be the files (as above) or the next directory component in the path.
  • Mapping could be fixed-length names:
    • 8+3 in MSDOS, a label and an extension indicating file format.
    • 14 characters in older unix, any mix of labels and/or extensions.
    • 255 characters in most modern systems (almost arbitrary strings).
  • Variable-length names are slightly more challenging...

25. Variable-length filenames

  • a) shows file-names inline.
  • Names are padded to 4-byte boundaries.
  • Each entry is a different size.
  • Making listing the directory more complex.
  • Fragmentation inside listing if files are deleted.
  • b) shows a heap approach.
  • Each entry is the same size.
  • Simpler to list entries, no fragmentation in entries themselves.
  • Still need to compact the heap when it is fragmented (can be done in memory).
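
The two layouts can be sketched with Python's struct module. The exact fields (a 4-byte entry length or name offset plus a 4-byte i-node number) and the file names are assumptions for illustration, not the layout of any particular file-system.

```python
import struct

def pack_inline(entries):
    """Scheme a): entry length, i-node number, then the padded name."""
    out = b""
    for name, inode in entries:
        raw = name.encode("ascii") + b"\0"
        raw += b"\0" * (-len(raw) % 4)           # pad to a 4-byte boundary
        out += struct.pack("<II", 8 + len(raw), inode) + raw
    return out

def unpack_inline(data):
    entries, offset = [], 0
    while offset < len(data):
        length, inode = struct.unpack_from("<II", data, offset)
        name = data[offset + 8:offset + length].split(b"\0", 1)[0]
        entries.append((name.decode("ascii"), inode))
        offset += length                         # each entry is a new size
    return entries

def pack_heap(entries):
    """Scheme b): fixed-size (name offset, i-node) entries plus a heap."""
    table, heap = b"", b""
    for name, inode in entries:
        table += struct.pack("<II", len(heap), inode)
        heap += name.encode("ascii") + b"\0"
    return table, heap

def heap_entry(table, heap, index):
    name_offset, inode = struct.unpack_from("<II", table, 8 * index)
    end = heap.index(b"\0", name_offset)
    return heap[name_offset:end].decode("ascii"), inode

files = [("project-budget", 19), ("personnel", 42), ("foo", 88)]
inline = pack_inline(files)
table, heap = pack_heap(files)
print(len(inline), unpack_inline(inline))
print(heap_entry(table, heap, 1))                # direct access by index
```

With the heap layout entry i sits at a known offset, whereas the inline layout must be scanned from the start to find it.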

26. Scalability of directory structures

  • Schemes shown so far assume linear search for filenames.
    • So walking a path is \(\mathcal{O}(n)\) for each component.
    • Must search for name inside directory at each step.
  • Large directories slow down the system (even opening their subdirectories).
  • What about servers, NFS, NAS - millions / billions of files?
    • Hash filenames.
    • Split the lists seen so far into the chain in each hash.
    • Average lookup is \(\mathcal{O}(1)\).
  • Implementation is more complex.
    • inode requires hashtable structure followed by multiple lists.
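
A minimal sketch of a hashed directory with one chain per bucket. The class name, the bucket count, and the choice of zlib.crc32 as the hash function are assumptions for illustration.

```python
import zlib

class HashedDirectory:
    def __init__(self, n_buckets=64):
        self.buckets = [[] for _ in range(n_buckets)]   # one chain each

    def _chain(self, name):
        h = zlib.crc32(name.encode("utf-8")) % len(self.buckets)
        return self.buckets[h]

    def add(self, name, location):
        chain = self._chain(name)
        if any(n == name for n, _ in chain):     # names must stay unique
            raise FileExistsError(name)
        chain.append((name, location))

    def lookup(self, name):
        for n, location in self._chain(name):    # linear, but only one chain
            if n == name:
                return location
        raise FileNotFoundError(name)

d = HashedDirectory()
for i in range(1000):
    d.add(f"file{i}", 5000 + i)                  # location: e.g. an i-node

longest = max(len(chain) for chain in d.buckets)
print(d.lookup("file123"), longest)
```

The linear search remains, but it runs over one chain and not over the whole directory. With a table that grows along with the directory, chains stay short and the average lookup is constant time.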

27. Attributes in directories

  • Attributes are general meta-data for files.
    • Track usage information: creation / access times.
    • Ownership
    • Access rights
    • More general tags / categories.
  • These can be stored in the directory entry as shown in a).
  • Can be put into a separate structure: b) shows dedicated i-node.
  • Flexibility.
  • Scope for optimising small files (e.g. reusing the i-node).

28. Summary

  • Logical organisation: user visible organisation of files and directories.
  • Physical organisation: layout of blocks on the disk.
  • We've seen most of the implementation section:
    • File Allocation.
    • Directories.
  • Rest is in the next lecture.
    • Links, Journalling File Systems and VFS.