Essentially, computers are used to store and process information. Information that is being processed by the computer right now is usually stored in RAM. Why? Because RAM is faster for the CPU to access.
Nonetheless, information that you store in computers is not always being processed. For example, you may need to store information for days, weeks, months, and even years. Two common media for storing information for long periods of time are hard drives and floppy disks. Specifically, we will look at how information is organized on disks (floppies and hard drives), but similar organization is used with other media, like CDs.
If we want to store this information on a disk, we don't want to scatter it all over the disk haphazardly, but rather, we want to store it in an organized fashion. To do this, the operating system (e.g., UNIX) allows us to create files.
Why do I say that a file is an abstract object? Because, even though we may think of a file as a single chunk of information on the disk, it could actually be spread throughout the disk in some fashion.
It is therefore an abstract object in the sense that you will always deal with it as a single object; the operating system (e.g., UNIX) isolates you from the fact that it is broken up.
- resume
- paper1
- aunt_betty
With human names, there are some general rules: they consist mainly of
letters and some punctuation marks (-, ',
etc.), but rarely contain things like digits. In UNIX, there are rules
for what characters (e.g., letters, digits, punctuation, etc.) can be
used in a filename.
In UNIX, almost any character can be used in filenames....
Nonetheless, many characters have special meanings and must be treated
in a special way if you want to use them in a filename. Therefore, it
is most common to stick to: letters, digits, underscores
(_), and periods. You will see a few other
characters that are used as well. Note that spaces are one of those
characters that are hard to deal with in filenames, so we usually use
the underscore (_) in its place (as in
aunt_betty above).
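As a rough illustration of the convention just described, here is a minimal Python sketch that checks whether a name sticks to letters, digits, underscores, and periods, and that swaps spaces for underscores. The function names is_conservative_name and to_conservative_name are made up for this example; UNIX itself does not enforce this rule.

```python
import re

# The "safe" character set described above:
# letters, digits, underscores, and periods.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.]+")


def is_conservative_name(name):
    """Return True if name uses only the commonly used 'safe' characters."""
    return SAFE_NAME.fullmatch(name) is not None


def to_conservative_name(name):
    """Replace each space with an underscore, as in aunt_betty."""
    return name.replace(" ", "_")


for candidate in ("resume", "paper1", "aunt betty"):
    print(candidate, is_conservative_name(candidate), to_conservative_name(candidate))
```

A name like "aunt betty" is still legal in UNIX; the check only flags it as one that needs special treatment when typed at the command line.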
Filenames in UNIX are also case-sensitive, so Aunt_Betty is a different
filename than aunt_betty.
paper1.doc
paper2.doc
short_essay.doc
resume.doc
These files have names just like the files mentioned before; the only
difference is that these filenames all end in .doc. The
.doc part of the filename is what we call the extension
and the part before .doc is called the base.
Note that paper1.doc is just a filename just as
paper1 is a filename and, in that sense, the extension
is nothing special.
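To make the base/extension split concrete, here is a small Python sketch that uses the standard library function os.path.splitext. The wrapper name split_name is just for this example. Notice that a name without a period simply has an empty extension, which matches the point that the extension is only part of the filename.

```python
import os.path


def split_name(filename):
    """Split a filename into (base, extension); the extension keeps its period."""
    base, extension = os.path.splitext(filename)
    return base, extension


for name in ("paper1.doc", "short_essay.doc", "paper1"):
    print(name, "->", split_name(name))
```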
We use extensions so that the filename itself tells us something about
the format in which information is stored in the file. For example,
I used .doc above to indicate files
that were prepared by my word processor.
Extensions, by convention, begin with a period and are usually
followed by a short abbreviation. For example, if I prepared
some text with an editor (like Emacs) and then wanted
to store that in a file, I might name that file something.txt.
Historically, it has become common to use extensions in this way. For certain file formats, common extensions are often used. For example,
.doc
.txt
.c
.bak
Suppose I have a file called
resume.doc and you want to keep your resume in a file
called resume.doc too. If all that made a file unique
was its filename, then we would be in trouble, because we cannot have a
single filename referring to 2 different files.
However, as it turns out, we don't have a problem since my
resume.doc exists in my area and yours exists in
your area and because they are in different areas,
they do not conflict. These areas are where you are automatically put
when you log in.
In order to organize files, UNIX has what are called directories. A directory is simply something that can hold files. As a physical analogy, think of directories as folders in which you can put files.
Thinking about our earlier example, my resume and yours do not conflict because mine is in my folder (directory) and yours is in your folder.
When we each log into our account, we are each put in our own directory; these are called our home directories.
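The following Python sketch shows the same idea in miniature. It creates two directories inside a temporary directory and puts a file named resume.doc in each. The directory names mine and yours are hypothetical stand-ins for two users' areas, not real home directories.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # "mine" and "yours" are stand-ins for two users' areas.
    mine = Path(tmp) / "mine"
    yours = Path(tmp) / "yours"
    mine.mkdir()
    yours.mkdir()

    # Same filename, different directories: two separate files.
    (mine / "resume.doc").write_text("my resume\n")
    (yours / "resume.doc").write_text("your resume\n")

    same_name = (mine / "resume.doc").name == (yours / "resume.doc").name
    my_text = (mine / "resume.doc").read_text()
    your_text = (yours / "resume.doc").read_text()

print("same filename:", same_name)
print("different contents:", my_text != your_text)
```

Writing to one resume.doc leaves the other untouched, because the directory is part of what identifies each file.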
As you can imagine, your home directory can become cluttered with
many files. To solve this problem (using the old analogy) you would
like to put folders within folders. Under UNIX we can do
this, i.e., we can have a directory inside our home directory, called
papers, that holds all of our English paper files.
Similarly, we could have another directory for lists and
one for pictures.
papers
would be a subdirectory of my home directory.
Since we can have directories within directories, we say that files and directories form a hierarchy, which we can draw as an upside down tree. The top of the figure is called its root (remember, the tree is drawn upside down). Here is an example of a file hierarchy drawn as a tree:
Notice that in this tree only directories have other files or directories underneath them, since only directories can contain other files.
Continuing with our example, your part of the UNIX hierarchy that consists of your home directory and anything underneath it might look like:
Here, you have a single home directory (typically named the same as
your login name). In this home directory, you have the
resume file and 3 subdirectories to organize papers,
lists and pictures. Notice that the subdirectory
papers has 3 files corresponding to papers you have
written and each is stored in a separate file: paper1, paper2
and short_essay.
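As a sketch of how such a hierarchy could be built and walked, the Python code below recreates the home directory described above inside a temporary directory (so nothing in your real home directory is touched) and prints it as an indented tree. The function names build_example and tree_lines are made up for this example, and all the files are created empty.

```python
import tempfile
from pathlib import Path


def build_example(home):
    """Create the example hierarchy under home (all files are empty)."""
    (home / "resume").touch()
    for sub in ("papers", "lists", "pictures"):
        (home / sub).mkdir()
    for name in ("paper1", "paper2", "short_essay"):
        (home / "papers" / name).touch()


def tree_lines(directory, depth=0):
    """Return one indented line per entry; directories end with '/'."""
    lines = []
    for entry in sorted(directory.iterdir()):
        if entry.is_dir():
            lines.append("    " * depth + entry.name + "/")
            # Only directories can have entries underneath them.
            lines.extend(tree_lines(entry, depth + 1))
        else:
            lines.append("    " * depth + entry.name)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)  # stand-in for a home directory
    build_example(home)
    example_tree = tree_lines(home)

print("\n".join(example_tree))
```

The recursion mirrors the shape of the tree: the function only descends when it meets a directory, since plain files have nothing beneath them.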
UNIX machines organize files under one big file hierarchy. The top directory in this file hierarchy is called the root directory and is named by a single forward slash (/). Some parts of the hierarchy hold programs needed to run the computer. Some parts are used to hold your e-mail. Your directories and files (those under your home directory) are also part of this file hierarchy.
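To see how a file deep in the hierarchy relates to the root directory, here is a short Python sketch that walks upward from a file, one directory at a time, until it reaches /. The path /home/you/papers/paper1 is a hypothetical example; where home directories actually live depends on the system.

```python
from pathlib import PurePosixPath

# Hypothetical location of a file; the real path depends on the system.
paper = PurePosixPath("/home/you/papers/paper1")

# Each parent is one level closer to the root directory.
ancestors = [str(parent) for parent in paper.parents]

for directory in ancestors:
    print(directory)
```

Whatever file you start from, the last directory printed is always /, since everything sits under the one root.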