File System


File Structure

Unix hides the “chunkiness” of tracks, sectors, etc. and presents each file as a “smooth” array of bytes with no internal structure. Application programs can, if they wish, use the bytes in the file to represent structures. For example, a widespread convention in Unix is to use the newline character (the character with bit pattern 00001010) to break text files into lines. Some other systems provide a variety of other types of files. The most common are files that consist of an array of fixed or variable size records and files that form an index mapping keys to values. Indexed files are usually implemented as B-trees.
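The newline convention can be illustrated with a short Python sketch. The function name `split_lines` is made up for this example; the only thing taken from the text is the bit pattern of the newline character. The point is that the line structure lives entirely in the application, not in the file system.

```python
NEWLINE = 0b00001010  # the newline byte from the text (0x0A, decimal 10)

def split_lines(data: bytes) -> list[bytes]:
    """Break a flat array of bytes into lines at each newline byte."""
    lines = []
    start = 0
    for i, byte in enumerate(data):
        if byte == NEWLINE:
            lines.append(data[start:i])  # the line, without its newline
            start = i + 1
    if start < len(data):
        lines.append(data[start:])  # final line with no trailing newline
    return lines

print(split_lines(b"first\nsecond\nthird"))
# prints [b'first', b'second', b'third']
```

As far as the file system is concerned, the input here is just 18 bytes; a different program is free to interpret the same bytes in some other way.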

File Types

Most systems divide files into various “types.” The concept of “type” is a confusing one, partly because the term can mean different things in different contexts. Unix initially supported only four types of files: directories, two kinds of special files (discussed later), and “regular” files. Just about any type of file is considered a “regular” file by Unix. Within this category, however, it is useful to distinguish text files from binary files. Binary files include executable files (which contain machine-language code) and data files. Text files might be source files in a particular programming language (e.g., C or Java), or they may be human-readable text in some markup language such as HTML (Hypertext Markup Language). Data files may be classified according to the program that created them or is able to interpret them; e.g., a file may be a Microsoft Word document, an Excel spreadsheet, or the output of TeX. The possibilities are endless.

In general (not just in Unix) there are three ways of indicating the type of a file:

  1. The operating system may record the type of a file in meta-data stored separately from the file, but associated with it. Unix only provides enough meta-data to distinguish a regular file from a directory (or special file), but other systems support more types.
  2. The type of a file may be indicated by part of its contents, such as a header made up of the first few bytes of the file. In Unix, files that store executable programs start with a two-byte magic number that identifies them as executable and selects one of a variety of executable formats. In the original Unix executable format, called the a.out format, the magic number is the octal number 0407, which happens to be the machine code for a branch instruction on the PDP-11 computer, one of the first computers to implement Unix. The operating system could run a file by loading it into memory and jumping to the beginning of it. The 0407 code, interpreted as an instruction, jumps to the word following the 16-byte header, which is the beginning of the executable code in this format. The PDP-11 computer is extinct by now, but it lives on through the 0407 code!
  3. The type of a file may be indicated by its name. Sometimes this is just a convention, and sometimes it’s enforced by the OS or by certain programs. For example, the Unix Java compiler refuses to believe that a file contains Java source unless its name ends with .java.
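The three approaches in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular system does it: the function names are invented for the example, and the magic-number check assumes the two bytes are stored as a little-endian 16-bit word, as on the PDP-11.

```python
import os
import stat
import struct
import tempfile

A_OUT_MAGIC = 0o407  # the octal magic number mentioned in the text

def kind_from_metadata(path):
    """Way 1: ask the operating system, which keeps the type in meta-data."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular"
    return "special"  # devices, pipes, and so on

def has_aout_magic(path):
    """Way 2: look at the first two bytes of the contents."""
    with open(path, "rb") as f:
        header = f.read(2)
    if len(header) < 2:
        return False
    # Assumption: little-endian 16-bit word, as on the PDP-11.
    (magic,) = struct.unpack("<H", header)
    return magic == A_OUT_MAGIC

def looks_like_java_source(path):
    """Way 3: trust the name, as the Java compiler does."""
    return path.endswith(".java")

# Build a tiny fake "executable": the magic word plus padding to 16 bytes.
with tempfile.TemporaryDirectory() as d:
    prog = os.path.join(d, "prog")
    with open(prog, "wb") as f:
        f.write(struct.pack("<H", A_OUT_MAGIC) + b"\0" * 14)
    print(kind_from_metadata(d), kind_from_metadata(prog), has_aout_magic(prog))
    print(looks_like_java_source("Hello.java"))
```

Note that only the first function relies on something the operating system guarantees; the other two are conventions that a program may or may not choose to honor.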

Some systems enforce the types of files more vigorously than others. File types may be enforced

  • Not at all,
  • Only by convention,
  • By certain programs (e.g. the Java compiler), or
  • By the operating system itself.

Unix tends to be very lax in enforcing types.

Access Modes

Systems support various access modes for operations on a file.

  • Sequential. Read or write the next record or next n bytes of the file. Usually, sequential access also allows a rewind operation.
  • Random. Read or write the nth record or bytes i through j. Unix provides an equivalent facility by adding a seek operation to the sequential operations listed above. This packaging of operations allows random access but encourages sequential access.
  • Indexed. Read or write the record with a given key. In some cases, the “key” need not be unique; there can be more than one record with the same key. In this case, programs use a combination of indexed and sequential operations: Get the first record with a given key, then get other records with the same key by doing sequential reads.
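The way Unix builds random access out of a seek operation plus sequential reads can be sketched in Python. The helper names and the 8-byte record size are assumptions made for this example, and `io.BytesIO` stands in for a file opened in binary mode, since it offers the same `seek` and `read` operations.

```python
import io

RECORD_SIZE = 8  # hypothetical fixed record size, chosen for this example

def read_record(f, n):
    """Random access: read the nth record (counting from 0)."""
    f.seek(n * RECORD_SIZE)      # move the current position
    return f.read(RECORD_SIZE)   # then do an ordinary sequential read

def read_bytes(f, i, j):
    """Random access: read bytes i through j, inclusive."""
    f.seek(i)
    return f.read(j - i + 1)

# Four 8-byte records: rec00000, rec00001, rec00002, rec00003.
f = io.BytesIO(b"".join(b"rec%05d" % k for k in range(4)))

print(read_record(f, 2))   # b'rec00002'
print(f.read(RECORD_SIZE)) # b'rec00003' (sequential read continues from there)
print(read_bytes(f, 3, 7)) # b'00000'
f.seek(0)                  # rewind
print(f.read(3))           # b'rec'
```

The second `print` shows why this packaging encourages sequential access: after any seek, the next read simply continues from the current position.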

Note that access modes are distinct from file structure (e.g., a record-structured file can be accessed either sequentially or randomly), but the two concepts are not entirely unrelated. For example, indexed access mode only makes sense for indexed files.

File Attributes

File attributes are the area where there is the most variation among file systems. Attributes can be grouped by general category.

Name.

Ownership and Protection.

Owner, owner’s “group,” creator, access-control list (information about who can do what to this file; for example, perhaps the owner can read or modify it, other members of the owner’s group can only read it, and others have no access).

Time stamps.

Time created, time last modified, time last accessed, time the attributes were last changed, etc. Unix maintains the last three of these. Some systems record not only when the file was last modified, but by whom.

Sizes.

Current size, size limit, “high-water mark,” space consumed (which may be larger than size because of internal fragmentation or smaller because of various compression techniques).

Type Information.

As described above: File is ASCII, is executable, is a “system” file, is an Excel spreadsheet, etc.
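Several of these attributes can be inspected from Python with `os.stat`. The sketch below assumes a Unix-like system; the function name `describe` and the dictionary keys are made up for this example. On Unix, `st_blocks` counts 512-byte units, and the field may be missing on other platforms, so the code falls back to `None` there.

```python
import os
import stat
import tempfile

def describe(path):
    """Collect a few attributes of a file, grouped roughly as in the text."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # not available on every platform
    return {
        # Ownership and protection
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "permissions": oct(stat.S_IMODE(st.st_mode)),
        # Time stamps (seconds since the epoch)
        "modified": st.st_mtime,
        "accessed": st.st_atime,
        "attributes_changed": st.st_ctime,  # Unix meaning; differs elsewhere
        # Sizes
        "size_bytes": st.st_size,
        "space_consumed": None if blocks is None else blocks * 512,
    }

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "notes.txt")
    with open(path, "wb") as f:
        f.write(b"hello\n")
    for name, value in describe(path).items():
        print(f"{name}: {value}")
```

For a small file like this one, the space consumed is typically larger than the size, which is the internal fragmentation mentioned under Sizes. Note that the name does not appear in the result: it is the argument used to find the file in the first place.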

