Disk Cache
The disk cache was added to Mosaic for two main reasons. First and foremost, we did not want a prefetched document to be placed in the memory cache, where it might evict something more important (i.e., anything that was cached because the user actually accessed its URL). Second, concurrent prefetching was much easier to implement on top of a disk cache: child processes can write their individual data to disk simultaneously, instead of passing the data through an interprocess communication mechanism. The only communication between processes is therefore a small message containing the URL and the local filename of the data.
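
The hand-off between a prefetching child and the parent can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in Mosaic: the function name prefetch_to_disk, the tab-separated message layout, and the use of a POSIX pipe are all hypothetical, and the data argument stands in for bytes that the child would really fetch from the network.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_MAX 2048   /* hypothetical upper bound on one message */

/*
 * Fork a child that stores `data` in `local_path` and then reports
 * "URL<TAB>local_path<NEWLINE>" to the parent over a pipe. Only this
 * short message crosses the process boundary; the document itself
 * goes straight to disk. Returns 0 on success, -1 on failure.
 */
int prefetch_to_disk(const char *url, const char *local_path,
                     const char *data, char *msg, size_t msg_size)
{
    int fds[2];
    int status;
    pid_t pid;
    ssize_t n;
    size_t total = 0;

    assert(msg != NULL && msg_size > 0);
    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                       /* child process */
        char line[MSG_MAX];
        size_t len = strlen(data);
        int msg_len;
        FILE *fp;

        close(fds[0]);
        fp = fopen(local_path, "w");
        if (fp == NULL)
            _exit(1);
        if (fwrite(data, 1, len, fp) != len || fclose(fp) != 0)
            _exit(1);
        msg_len = snprintf(line, sizeof line, "%s\t%s\n", url, local_path);
        if (msg_len < 0 || (size_t)msg_len >= sizeof line)
            _exit(1);
        if (write(fds[1], line, (size_t)msg_len) != (ssize_t)msg_len)
            _exit(1);
        _exit(0);
    }

    /* Parent: read the message until the child closes its end. */
    close(fds[1]);
    while (total < msg_size - 1 &&
           (n = read(fds[0], msg + total, msg_size - 1 - total)) > 0)
        total += (size_t)n;
    msg[total] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

For brevity the parent here blocks until the child finishes. A browser would more plausibly keep several children running and poll their pipes from its event loop, which is what makes the prefetching concurrent.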

Our implementation of the disk cache was modeled directly on Mosaic's memory caching system for images. We used the same table sizes and the same hashing functions to keep the tables updated, maximizing code reuse. The data structures for storing node information, however, had to be quite different. For each piece of data cached to disk (image or text), we store the following information: number of accesses, last access date, local file path, URL, size, and a flag stating whether the data has been prefetched but not yet accessed. This flag is used to gather data for our performance measures.
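
As a rough sketch of what such a node might look like, the fields listed above map onto a C structure along the following lines. The type name, field names, bucket count, and hash function are assumptions made for illustration; the actual code reuses the table size and hashing functions of Mosaic's image cache, which are not reproduced here. Starting the access count at zero for prefetched entries is likewise an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical bucket count; the real tables reuse the image cache's size. */
#define DISK_CACHE_BUCKETS 127

/* One node per object cached on disk (image or text). */
typedef struct disk_cache_entry {
    char *url;                      /* key */
    char *local_path;               /* where the data lives on disk */
    unsigned long accesses;         /* number of accesses */
    time_t last_access;             /* last access date */
    size_t size;                    /* size in bytes */
    int prefetched;                 /* 1 = prefetched and not yet accessed */
    struct disk_cache_entry *next;  /* chain within one bucket */
} disk_cache_entry;

static disk_cache_entry *disk_cache[DISK_CACHE_BUCKETS];

/* Stand-in string hash; not Mosaic's actual hashing function. */
static unsigned int hash_url(const char *url)
{
    unsigned int h = 0;
    while (*url != '\0')
        h = h * 31u + (unsigned char)*url++;
    return h % DISK_CACHE_BUCKETS;
}

static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Add an entry; does not check for an existing entry with the same URL. */
disk_cache_entry *disk_cache_add(const char *url, const char *local_path,
                                 size_t size, int prefetched)
{
    unsigned int b;
    disk_cache_entry *e;

    assert(url != NULL && local_path != NULL);
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->url = copy_string(url);
    e->local_path = copy_string(local_path);
    if (e->url == NULL || e->local_path == NULL) {
        free(e->url);
        free(e->local_path);
        free(e);
        return NULL;
    }
    e->accesses = prefetched ? 0 : 1;   /* assumption: a prefetch is not an access */
    e->last_access = time(NULL);
    e->size = size;
    e->prefetched = prefetched ? 1 : 0;
    b = hash_url(url);
    e->next = disk_cache[b];
    disk_cache[b] = e;
    return e;
}

/* Look up a URL; a hit counts as an access and clears the prefetched flag. */
disk_cache_entry *disk_cache_lookup(const char *url)
{
    disk_cache_entry *e;

    assert(url != NULL);
    for (e = disk_cache[hash_url(url)]; e != NULL; e = e->next) {
        if (strcmp(e->url, url) == 0) {
            e->accesses++;
            e->last_access = time(NULL);
            e->prefetched = 0;
            return e;
        }
    }
    return NULL;
}
```

Clearing the flag on the first hit is what lets a performance measure distinguish prefetched documents that were eventually used from those that never were.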

For image data, we relied almost completely on Mosaic's internal functions. When Mosaic loads image data, it first saves the data to disk, determines its format, and then loads it into memory and displays it. We used this to our advantage by simply accessing the data while it is on disk, so no new functions had to be written to save image data. Since Mosaic already has functions for reading image data from disk, we used those as well on disk cache hits for images. Text data, by contrast, is never written to disk by Mosaic, so we wrote functions to save it and to read it back into the proper data structures upon a disk cache hit. Note that the disk cache currently stores all non-text data, even if Mosaic does not recognize the format. Also, all network loads are cached to disk, and a copy of a document may reside on disk and in memory at the same time. We chose not to keep each document in only one of the two caches: we felt that the overhead of tracking where each piece of data was located, together with the much larger number of disk accesses involved, was too great. This is especially true because in many instances the disk cache will be significantly bigger than the memory cache, making the disk space saved while something is in memory insignificant.

The disk cache was also made persistent, so that entries are not deleted at the end of each session. Instead, all cached data remains in place (in a directory named .mosaic-disk-cache), and we dump the contents of the disk cache hash table to disk (into a file named .mosaic-cache-index) at the end of each session. This file contains all the relevant data about each entry, and it is loaded and parsed at the beginning of each session. Figure 2 shows an example of this file.
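
The dump-and-reload cycle might be sketched as below. The record layout used here (one whitespace-separated line per entry) is an assumption chosen for simplicity and is not necessarily the format shown in Figure 2; the type and function names are hypothetical as well. Because fields are separated by whitespace, this sketch rejects URLs or paths that contain spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELD 1024   /* hypothetical limit on URL and path length */

typedef struct {
    char url[MAX_FIELD];
    char local_path[MAX_FIELD];
    unsigned long accesses;
    long last_access;        /* seconds since the epoch */
    unsigned long size;
    int prefetched;
} index_record;

/* Fill a record; returns -1 if a string is empty, too long, or has whitespace. */
int make_record(index_record *r, const char *url, const char *path,
                unsigned long accesses, long last_access,
                unsigned long size, int prefetched)
{
    assert(r != NULL && url != NULL && path != NULL);
    if (url[0] == '\0' || path[0] == '\0')
        return -1;
    if (strlen(url) >= MAX_FIELD || strlen(path) >= MAX_FIELD)
        return -1;
    if (strpbrk(url, " \t\r\n") != NULL || strpbrk(path, " \t\r\n") != NULL)
        return -1;
    strcpy(r->url, url);
    strcpy(r->local_path, path);
    r->accesses = accesses;
    r->last_access = last_access;
    r->size = size;
    r->prefetched = prefetched ? 1 : 0;
    return 0;
}

/* End of session: write n records, one per line. Returns 0 or -1. */
int write_cache_index(const char *index_file, const index_record *recs, size_t n)
{
    FILE *fp;
    size_t i;

    assert(recs != NULL || n == 0);
    fp = fopen(index_file, "w");
    if (fp == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        if (fprintf(fp, "%lu %ld %lu %d %s %s\n",
                    recs[i].accesses, recs[i].last_access, recs[i].size,
                    recs[i].prefetched, recs[i].url, recs[i].local_path) < 0) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Start of session: read up to max records. Returns the count, or -1. */
long read_cache_index(const char *index_file, index_record *recs, size_t max)
{
    FILE *fp;
    size_t n = 0;

    assert(recs != NULL || max == 0);
    fp = fopen(index_file, "r");
    if (fp == NULL)
        return -1;
    while (n < max &&
           fscanf(fp, "%lu %ld %lu %d %1023s %1023s",
                  &recs[n].accesses, &recs[n].last_access, &recs[n].size,
                  &recs[n].prefetched, recs[n].url, recs[n].local_path) == 6)
        n++;
    fclose(fp);
    return (long)n;
}
```

A missing index file is reported as -1 rather than as an empty cache, so a caller can tell a first run apart from a session that ended with nothing cached.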

Because the goal of this project was to implement prefetching, the disk caching scheme as it stands is not complete. For our purposes it does its job well, but it is missing two elements: a cache replacement strategy and a cache consistency scheme. That is to say, we assume that we have an infinite disk cache, and we never check whether the data that we store locally has been updated on the server. However, since these two elements are required in a fully functional caching strategy, we included two empty functions within the code. The first, mo_check_cache_limit, is called whenever something needs to be added to the disk cache. It returns a flag indicating whether to drop the data or to go ahead and cache it. A cache replacement strategy could be placed in this function, so that it performs any necessary manipulations before allowing the caching to proceed, or possibly rejects it outright. The second function, mo_check_consistency, is called whenever a disk cache fetch takes place (before any data is moved). Cache consistency could be added through this function, which would check the remote time stamp, compare it with the date on the cached entry, and return a flag either to proceed with the disk load or to label the file as stale, thus forcing a network load.
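
To make the two hooks concrete, the sketch below shows one way they could be filled in. Only the names mo_check_cache_limit and mo_check_consistency come from the code described above, where both functions are empty. The parameter lists, flag values, byte-budget bookkeeping, and the helper mo_set_cache_limit are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical flag values returned by the two hooks. */
enum { CACHE_DROP = 0, CACHE_STORE = 1 };
enum { ENTRY_STALE = 0, ENTRY_FRESH = 1 };

/* Hypothetical bookkeeping; a limit of 0 means "infinite cache". */
static size_t cache_bytes_used = 0;
static size_t cache_bytes_limit = 0;

/* Hypothetical helper for configuring the byte budget. */
void mo_set_cache_limit(size_t limit_bytes)
{
    cache_bytes_limit = limit_bytes;
}

/*
 * Called before something is added to the disk cache.
 * Returns CACHE_STORE to go ahead or CACHE_DROP to discard the data.
 */
int mo_check_cache_limit(size_t incoming_size)
{
    if (cache_bytes_limit == 0) {          /* no limit configured */
        cache_bytes_used += incoming_size;
        return CACHE_STORE;
    }
    /*
     * A replacement strategy would go here: for example, evict the
     * entries with the oldest last access date until the new object
     * fits. This sketch evicts nothing and simply rejects what does
     * not fit in the remaining space.
     */
    if (cache_bytes_used > cache_bytes_limit ||
        incoming_size > cache_bytes_limit - cache_bytes_used)
        return CACHE_DROP;
    cache_bytes_used += incoming_size;
    assert(cache_bytes_used <= cache_bytes_limit);
    return CACHE_STORE;
}

/*
 * Called before a disk cache fetch. `cached_at` is the date stored with
 * the cached entry; `remote_modified` is the server's modification time,
 * or (time_t)-1 if it could not be obtained. How the remote time stamp
 * is fetched is outside the scope of this sketch.
 */
int mo_check_consistency(time_t cached_at, time_t remote_modified)
{
    if (remote_modified == (time_t)-1)
        return ENTRY_FRESH;                /* assumption: trust the cache */
    return (remote_modified > cached_at) ? ENTRY_STALE : ENTRY_FRESH;
}
```

With no limit configured and no remote time stamp available, both functions always allow the operation, which matches the infinite, never-validated cache described above.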

