UNIX Files and File Hierarchy


Motivation

Essentially, computers are used to store and process information. Information that is being processed by the computer right now is usually stored in RAM. Why? Because the CPU can access RAM much faster than it can access a disk.

Nonetheless, information that you store in computers is not always being processed. For example, you may need to store information for days, weeks, months, or even years. Two common media for storing information for long periods of time are hard drives and floppy disks. Specifically, we will look at how information is organized on disks (floppies and hard drives), but similar organization is used with other media, like CDs.

Files

What types of information might we want to store?

If we want to store this information on a disk, we don't want to scatter it all over the disk haphazardly, but rather, we want to store it in an organized fashion. To do this, the operating system (e.g., UNIX) allows us to create files.

File
An abstract object that holds information. In other words, a logical unit of information on a disk.

Why do I say that a file is an abstract object? Because, even though we may think of a file as a single chunk of information on the disk, it could actually be spread throughout the disk in some fashion.

It is therefore an abstract object in the sense that you will always deal with it as a single object; the operating system (e.g., UNIX) isolates you from the fact that it may be broken up.

What else might we store in files?

How to refer to files - naming them

Just as you can refer to individual people by name, so too do we use names to refer to individual files. For example, I might give the files holding my resume, my English paper, and the picture of Aunt Betty the following names respectively:

With human names, there are some general rules: they consist mainly of letters and some punctuation marks (-, ', etc.), but rarely contain things like digits. In UNIX, there are likewise rules for what characters (e.g., letters, digits, punctuation, etc.) can be used in a filename.

In UNIX, almost any character can be used in filenames. Nonetheless, many characters have special meanings and must be treated in a special way if you want to use them in a filename. Therefore, it is most common to stick to letters, digits, underscores (_), and periods. You will see a few other characters used as well. Note that spaces are among the characters that are hard to deal with in filenames, so we usually use the underscore (_) in their place (as in aunt_betty above).
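
As a rough illustration of this convention, the following Python sketch checks whether a name sticks to letters, digits, underscores, and periods, and shows one way to replace spaces with underscores. The names is_conventional_name and make_conventional are made up for this example; they are not part of UNIX or of any standard library, and restricting the check to ASCII letters is an assumption.

```python
import re

# Letters, digits, underscores, and periods only (the conventional set).
# ASCII letters are assumed here for simplicity.
_CONVENTIONAL = re.compile(r"[A-Za-z0-9_.]+")


def is_conventional_name(name):
    """Return True if name uses only the conventional characters."""
    return _CONVENTIONAL.fullmatch(name) is not None


def make_conventional(name):
    """Replace each space with an underscore (hypothetical helper)."""
    return name.replace(" ", "_")


if __name__ == "__main__":
    print(is_conventional_name("aunt_betty"))  # True
    print(is_conventional_name("aunt betty"))  # False
    print(make_conventional("aunt betty"))     # aunt_betty
```

A name that fails this check is not necessarily invalid in UNIX; it merely contains characters that may need special treatment.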

Filenames are case-sensitive

In UNIX, filenames are case-sensitive. All that means is that UNIX treats upper and lowercase letters as distinct in a filename. For example, the filename Aunt_Betty is a different filename than aunt_betty.
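
One way to see this for yourself is the small Python sketch below, which creates both names in a scratch directory and reports which entries exist afterwards. The function name create_both is made up for this example. On a typical UNIX file system two separate entries should result; some other systems use case-insensitive file systems, where only one entry would appear.

```python
import os
import tempfile


def create_both(names=("Aunt_Betty", "aunt_betty")):
    """Create an empty file for each name in a scratch directory and
    return the sorted list of names that exist afterwards."""
    with tempfile.TemporaryDirectory() as scratch:
        for name in names:
            # Opening in append mode creates the file if it is missing.
            with open(os.path.join(scratch, name), "a"):
                pass
        return sorted(os.listdir(scratch))


if __name__ == "__main__":
    found = create_both()
    # Two entries are expected on a case-sensitive file system.
    print(len(found), found)
```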

File Extensions

Often, you'll see filenames that include something called an extension. For example, if I had a set of documents that I created with my favorite word processor I might give them filenames like:

These files have names just like the files mentioned before; the only difference is that these filenames all end in .doc. The .doc part of the filename is what we call the extension, and the part before .doc is called the base. For example, in paper1.doc the base is paper1 and the extension is .doc. Note that paper1.doc is just a filename, just as paper1 is a filename, and in that sense the extension is nothing special.

We use extensions to indicate, as part of the filename itself, the format in which a file's information is stored. For example, I used .doc above to indicate files that were prepared by my word processor.

Extensions, by convention, consist of a period followed by a short abbreviation. For example, if I prepared some text with an editor (like Emacs) and then wanted to store that in a file, I might name that file something.txt.
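
The split into base and extension can be tried out with the os.path.splitext function from Python's standard library. This is only an illustration; the wrapper name split_name is made up for this example.

```python
import os.path


def split_name(filename):
    """Split a filename into (base, extension) using os.path.splitext."""
    return os.path.splitext(filename)


if __name__ == "__main__":
    print(split_name("paper1.doc"))  # ('paper1', '.doc')
    print(split_name("paper1"))      # ('paper1', '')
```

A filename without an extension simply yields an empty extension, which fits the point that the extension is just part of the name.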

Historically, it has become common to use extensions in this way. For certain file formats, common extensions are often used. For example,

.doc
a file prepared by a word processor.
.txt
a plain text file.
.c
source code for a program written in the C language.
.bak
a backup copy of another file.

It should be noted that you can often choose whatever extensions you want; however, there are some common ones that most people use for certain file formats, and you should use them. Furthermore, some programs will require you to put certain extensions on files in order to deal with them properly. Over time, you will become familiar with which extensions are commonly used and when certain extensions are required.
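
To make the idea of conventional extensions concrete, here is a small Python sketch that looks up the four extensions listed above. The table COMMON_EXTENSIONS and the function describe are made up for this example; nothing in UNIX enforces or consults such a table.

```python
import os.path

# Conventions taken from the list above; this table is illustrative,
# not exhaustive, and nothing in UNIX enforces it.
COMMON_EXTENSIONS = {
    ".doc": "a file prepared by a word processor",
    ".txt": "a plain text file",
    ".c": "source code for a program written in the C language",
    ".bak": "a backup copy of another file",
}


def describe(filename):
    """Return the conventional meaning of the filename's extension,
    or None if the extension is not in the table."""
    _base, extension = os.path.splitext(filename)
    return COMMON_EXTENSIONS.get(extension)


if __name__ == "__main__":
    print(describe("something.txt"))  # a plain text file
    print(describe("paper1"))         # None
```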

File Hierarchy

Organizing Files

Suppose that I want to keep my resume in a file called resume.doc and you want to keep your resume in a file called resume.doc too. If all that made a file unique was its filename, then we would be in trouble, because we cannot have a single filename referring to two different files.

However, as it turns out, we don't have a problem: my resume.doc exists in my area and yours exists in your area, and because they are in different areas, they do not conflict. Your area is where you are automatically put when you log in.

What are these areas?

In order to organize files, UNIX has what are called directories. A directory is simply something that can hold files. As a physical analogy, think of directories as folders in which you can put files.

Thinking about our earlier example, my resume and yours do not conflict because mine is in my folder (directory) and yours is in your folder.

When we each log into our account, we are each put in our own directory; these are called our home directories.
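
The following Python sketch illustrates the same point in a scratch directory: two files can both be called resume.doc as long as they live in different directories. The directory names mine and yours and the function name two_resumes are made up for this example.

```python
import os
import tempfile


def two_resumes():
    """Create resume.doc in two separate directories and return the
    relative paths of both files."""
    with tempfile.TemporaryDirectory() as scratch:
        for owner in ("mine", "yours"):  # hypothetical directory names
            directory = os.path.join(scratch, owner)
            os.mkdir(directory)
            with open(os.path.join(directory, "resume.doc"), "w") as f:
                f.write("resume belonging to " + owner + "\n")
        # Collect owner/filename pairs before the scratch area is removed.
        return sorted(
            os.path.join(owner, name)
            for owner in os.listdir(scratch)
            for name in os.listdir(os.path.join(scratch, owner))
        )


if __name__ == "__main__":
    for path in two_resumes():
        print(path)
```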
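
If you are curious where your own home directory is, Python's standard pathlib module can report it. This is a minimal sketch; the wrapper name home_directory is made up for this example, and the location printed will differ from system to system.

```python
from pathlib import Path


def home_directory():
    """Return the current user's home directory as a Path."""
    return Path.home()


if __name__ == "__main__":
    print(home_directory())
```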


Aside:
Although you can think of UNIX directories as holding other files, they are actually implemented as specially-treated files that contain information about what files are in that directory. In our folder analogy, this would mean that instead of putting a bunch of papers in a folder, we write down which files belong in that folder on another piece of paper. Nonetheless, you can always think of directories as folders and don't have to worry about the underlying details. In that sense, directories are abstract objects like files, i.e., the operating system hides the messy details.

More than just one directory to organize stuff

After you have had an account for a while, you may have created many files to hold information. For example, several English papers: paper1, paper2, short_essay; several lists: birthdays, addresses, phone_list; and several pictures: aunt_betty, uncle_jim, mary.

As you can imagine, your home directory can become cluttered with many files. To solve this problem (using the folder analogy), you would like to put folders within folders. Under UNIX we can do this, i.e., we can have a directory inside our home directory, called papers, that holds all of our English paper files. Similarly, we could have another directory for lists and one for pictures.


Definition: If one directory is in another directory, we say that the first is a subdirectory of the other. For example, papers would be a subdirectory of my home directory.
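
As a sketch of how such subdirectories might be created from a program, the Python example below makes papers, lists, and pictures inside a scratch directory that stands in for a home directory. The function name make_subdirectories is made up for this example.

```python
import tempfile
from pathlib import Path


def make_subdirectories(home, names=("papers", "lists", "pictures")):
    """Create one subdirectory of home per name and return the names
    of all subdirectories found there afterwards."""
    for name in names:
        (home / name).mkdir(exist_ok=True)
    return sorted(p.name for p in home.iterdir() if p.is_dir())


if __name__ == "__main__":
    # A scratch directory stands in for a real home directory here.
    with tempfile.TemporaryDirectory() as scratch:
        print(make_subdirectories(Path(scratch)))
```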

Since we can have directories within directories, we say that files and directories form a hierarchy, which we can draw as an upside down tree. The top of the figure is called its root (remember, the tree is drawn upside down). Here is an example of a file hierarchy drawn as a tree:

Notice that in this tree only directories have other files or directories underneath them, since only directories can contain other files.

Continuing with our example, your part of the UNIX hierarchy that consists of your home directory and anything underneath it might look like:

Here, you have a single home directory (typically named the same as your login name). In this home directory, you have the resume file and three subdirectories to organize papers, lists, and pictures. Notice that the subdirectory papers holds three files, one for each paper you have written: paper1, paper2, and short_essay.
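
To see such a hierarchy as an indented tree, the Python sketch below builds the example layout in a scratch directory and lists it, marking directories with a trailing slash. The function names build and tree_lines are made up for this example, and the contents of lists and pictures are assumed from the filenames mentioned earlier.

```python
import os
import tempfile

# Example layout; the scratch directory stands in for the home directory.
# The contents of lists and pictures are assumptions for illustration.
LAYOUT = {
    "papers": ["paper1", "paper2", "short_essay"],
    "lists": ["birthdays", "addresses", "phone_list"],
    "pictures": ["aunt_betty", "uncle_jim", "mary"],
}


def build(home):
    """Create the example files and subdirectories under home."""
    open(os.path.join(home, "resume"), "w").close()
    for directory, files in LAYOUT.items():
        os.mkdir(os.path.join(home, directory))
        for name in files:
            open(os.path.join(home, directory, name), "w").close()


def tree_lines(directory, depth=0):
    """Return one indented line per entry, directories marked with '/'."""
    lines = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            lines.append("  " * depth + name + "/")
            # Only directories can have entries underneath them.
            lines.extend(tree_lines(path, depth + 1))
        else:
            lines.append("  " * depth + name)
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        build(home)
        print("\n".join(tree_lines(home)))
```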

The Big UNIX File Hierarchy

UNIX machines organize files under one big file hierarchy. The top directory in this file hierarchy is called the root directory and is named by a single forward slash (/). Some parts of the hierarchy hold programs needed to run the computer. Some parts are used to hold your e-mail. Your directories and files (those under your home directory) are also part of this file hierarchy.
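
The idea that everything hangs beneath the root directory can be sketched in Python by listing every directory between the root and a given location. The path /home/alice/papers is a made-up example (where home directories actually live varies between systems), and the function name ancestors is hypothetical.

```python
from pathlib import PurePosixPath


def ancestors(path):
    """Return every directory from the root (/) down to path itself."""
    p = PurePosixPath(path)
    # p.parents runs from the nearest directory up to the root,
    # so reverse it to start at the root.
    return [str(a) for a in list(p.parents)[::-1]] + [str(p)]


if __name__ == "__main__":
    # /home/alice/papers is a hypothetical location.
    for directory in ancestors("/home/alice/papers"):
        print(directory)
```

Whatever location you start from, the first entry is always the root directory, /.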


BU CAS CS - UNIX Files and File Hierarchy
Copyright © 1993-2000 by Robert I. Pitts <rip@bu.edu> All Rights Reserved.