A Client Based Prefetching Implementation for the WWW

Modifications to Mosaic

To implement our prefetching algorithm, we needed to make a number of additions and modifications to the Mosaic code. We started from an already modified version, described in [5], that records exact timings for all network retrievals, header sizes, and some other timing information. The next step was to add the data structures needed to collect the history data used in the creation of the prefetching tables. To achieve this, we had to fix the original Mosaic history code so that it worked correctly, and then extend it to keep a strict chronological history for each session.

A disk cache was added next, to provide storage for prefetched data. We concluded that prefetched data is less important than data cached because the user actually visited a document. We therefore decided not to let the client store prefetched data directly in the memory cache. Instead, all prefetched data is stored on local disk until accessed by the user (if ever), at which point it is loaded into memory.
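The placement policy just described can be sketched as a small two-tier model. This is only an illustration under our own assumptions: the names (cache_store_prefetched, cache_access, and so on) are hypothetical, the fixed-size table stands in for whatever structures Mosaic really uses, and no actual disk I/O is performed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the placement policy only; not Mosaic's cache code. */
enum tier { TIER_NONE, TIER_DISK, TIER_MEMORY };

#define MAX_ENTRIES 64
#define MAX_URL 256

struct entry {
    char url[MAX_URL];
    enum tier where;
};

struct cache {
    struct entry entries[MAX_ENTRIES];
    int count;
};

/* Linear search for a URL; returns NULL if it is not cached. */
static struct entry *find(struct cache *c, const char *url)
{
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->entries[i].url, url) == 0)
            return &c->entries[i];
    return NULL;
}

/* Record a document in the given tier. Returns 0 on success,
   -1 if the table is full or the URL is too long. */
static int put(struct cache *c, const char *url, enum tier where)
{
    assert(c != NULL && url != NULL);
    struct entry *e = find(c, url);
    if (e == NULL) {
        if (c->count == MAX_ENTRIES || strlen(url) >= MAX_URL)
            return -1;
        e = &c->entries[c->count++];
        strcpy(e->url, url);
        e->where = TIER_NONE;
    }
    if (e->where != TIER_MEMORY)   /* never demote a memory-resident entry */
        e->where = where;
    return 0;
}

/* Documents the user visited go straight to the memory cache. */
int cache_store_visited(struct cache *c, const char *url)
{
    return put(c, url, TIER_MEMORY);
}

/* Prefetched documents are kept on disk only. */
int cache_store_prefetched(struct cache *c, const char *url)
{
    return put(c, url, TIER_DISK);
}

/* User access: report where the document was found and
   promote a disk-resident entry to memory. */
enum tier cache_access(struct cache *c, const char *url)
{
    struct entry *e = find(c, url);
    if (e == NULL)
        return TIER_NONE;
    enum tier found = e->where;
    e->where = TIER_MEMORY;
    return found;
}
```

In this model, the first access to a prefetched document reports a disk hit and later accesses report memory hits, which mirrors the rule that prefetched data earns a place in memory only once the user asks for it.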

The third large change to the client was the addition of prefetching itself. This was broken up into two major parts: the implementation of the prefetching algorithm that creates the prefetch tables, and the code that actually performs the concurrent prefetching. The former, although the most important piece of code, was fairly straightforward; the latter involves forking child processes and interprocess communication through a message queue. Concurrency was required so that prefetching could be done without disrupting the client's (parent's) operation.
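The fork-and-message-queue arrangement can be sketched roughly as follows. We assume a System V message queue (msgget, msgsnd, msgrcv), which may or may not be the mechanism the client uses; the message layout, the function names, and the fake_fetch stand-in for a network retrieval are all hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/wait.h>
#include <unistd.h>

#define URL_MAX_LEN 256

/* Hypothetical message layout: the child reports what it fetched. */
struct prefetch_msg {
    long mtype;              /* required first field of a SysV message, > 0 */
    char url[URL_MAX_LEN];
    long bytes;
};

/* Stand-in for a real network retrieval (assumption, for illustration). */
static long fake_fetch(const char *url)
{
    return (long)strlen(url) * 100;
}

/* Fork a child that "prefetches" url and reports back through a
   message queue. Returns 0 and fills *result on success, -1 on error. */
int prefetch_in_child(const char *url, struct prefetch_msg *result)
{
    assert(url != NULL && result != NULL);
    if (strlen(url) >= URL_MAX_LEN)
        return -1;

    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        msgctl(qid, IPC_RMID, NULL);
        return -1;
    }

    if (pid == 0) {
        /* Child: do the retrieval, send one message, and exit. */
        struct prefetch_msg m;
        memset(&m, 0, sizeof m);
        m.mtype = 1;
        strcpy(m.url, url);
        m.bytes = fake_fetch(url);
        int rc = msgsnd(qid, &m, sizeof m - sizeof m.mtype, 0);
        _exit(rc == -1 ? 1 : 0);
    }

    /* Parent: for simplicity this sketch waits for the child, then
       collects the message without blocking. */
    int status;
    if (waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        msgctl(qid, IPC_RMID, NULL);
        return -1;
    }
    ssize_t n = msgrcv(qid, result,
                       sizeof *result - sizeof result->mtype, 1, IPC_NOWAIT);
    msgctl(qid, IPC_RMID, NULL);   /* remove the queue when done */
    return n == -1 ? -1 : 0;
}
```

The sketch waits for the child only to stay short and deterministic. A client that must remain responsive would instead return to its event loop after fork() and poll the queue periodically with IPC_NOWAIT, reaping finished children as they exit.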

Finally, we added a few options that may be set by the user, as well as some performance measures that give a general indication of how well the client is behaving. All the new user options are set in a file named .mosaic-init; they allow prefetching and disk caching to be turned on or off, and various prefetching parameters to be changed. The performance measures track bytes accessed, bytes prefetched, and various cache hit ratios. All these measures will be described in detail in a later section of this paper.
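As a rough illustration of the kind of bookkeeping such measures require, the sketch below keeps a few counters and derives two ratios from them. The counter names and the ratio definitions here are our own assumptions for illustration; the measures actually used are the ones defined later in the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Where a requested document was served from (hypothetical names). */
enum source { SRC_NETWORK, SRC_MEMORY, SRC_DISK };

struct perf_counters {
    long requests;          /* documents requested by the user */
    long memory_hits;       /* requests served from the memory cache */
    long disk_hits;         /* requests served from the disk cache */
    long bytes_accessed;    /* bytes of documents the user requested */
    long bytes_prefetched;  /* bytes retrieved by prefetching */
};

/* Record one user request; bytes is the size as measured by the client. */
void record_access(struct perf_counters *p, enum source src, long bytes)
{
    assert(p != NULL && bytes >= 0);
    p->requests++;
    p->bytes_accessed += bytes;
    if (src == SRC_MEMORY)
        p->memory_hits++;
    else if (src == SRC_DISK)
        p->disk_hits++;
}

/* Record one prefetched document. */
void record_prefetch(struct perf_counters *p, long bytes)
{
    assert(p != NULL && bytes >= 0);
    p->bytes_prefetched += bytes;
}

/* Fraction of requests served from either cache (0 if no requests). */
double overall_hit_ratio(const struct perf_counters *p)
{
    if (p->requests == 0)
        return 0.0;
    return (double)(p->memory_hits + p->disk_hits) / (double)p->requests;
}

/* Fraction of requests served from the disk cache (0 if no requests). */
double disk_hit_ratio(const struct perf_counters *p)
{
    if (p->requests == 0)
        return 0.0;
    return (double)p->disk_hits / (double)p->requests;
}
```

Since prefetched data lives only in the disk cache until it is accessed, a disk-level ratio of this kind is one plausible way to see how often prefetching paid off.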

As an aside, we must point out that all size measurements used by our client include the header that is sent during a network fetch. This applies uniformly to all measurements, so it does not affect the results obtained from the performance measures. We include the note to explain the discrepancy between the file sizes used by the client and the actual sizes of files saved to disk.
