A Client Based Prefetching Implementation for the WWW

Prefetch Table Creation

The algorithm used to determine which URLs to prefetch only relies on three indicators: the time a user takes to access a new URL from the one he/she is currently visiting (prefetch window), the number of accesses to the source and destination URLs (prefetch probability), and the size of the file that could be prefetched (max prefetch size). The first indicator is used in the creation of the prefetch tables, while the latter two are used to check if a URL is to be prefetched during program (client) execution. Note that all of these thresholds are set by the user in the initialization file, which is discussed in a future section. Also, in the current version of the code, all data is compiled at the end of each session, and saved to disk. During initialization, the tables are read from disk and remain in memory during execution.

The creation of the prefetch tables begins with a step by step analysis of the chronological history for the current session. In the chronological history, we record every access from every document visited by the user. So, for each source document, we first process all the inlines that were accessed and add them to the prefetch list for that document. Once the inlines are taken care of, we move on to the accessed links. If the link was accessed before a set number of seconds (the prefetch window), then we add that link to the source documents prefetch list as well. If it does not fall within the prefetch window, then we add nothing. In either case, the user has traversed a link, so we start the process again with the new source document, and repeat until we have parsed the entire session history. Note that the addition of a URL to a document's prefetch list does not necessarily imply that that URL will be prefetched, only that it is a likely candidate for prefetching. The other indicators (checked at run time) determine if anything is prefetched or not.

We do make a distinction between movement using embedded links in a document and movement using the Back/Forward/Home/History mechanisms. Specifically, we do not associate any correlation between two documents if the user used one of the latter navigation methods to arrive at the destination. In this case, nothing is added to the source document's prefetch list. We just continue the parsing process with the destination becoming the new source. Once all the parsing is complete, all data gets written to disk for use in future sessions.

The format for the prefetch file is as follows (see Figure 3): each source URL that was encountered is listed on a line beginning with a +. That line contains the URL and the total number of accesses for that URL. Each line underneath that begins with a - has been found to be a possible candidate for prefetching (for this particular source). The second character in the line is either a 0 or 1. This designates whether the URL is an inline image. If it it, then it is always prefetched, since it would be automatically loaded by the source URL. The rest of the line contains the URL and the number of times that is has been accessed from this particular source. Inlines contain a zero in this last field since they are always accessed when the source is loaded. This data is used to determine at run time whether to actually prefetch a non-inline URL. This is determined by finding the percentage of accesses to this particular destination from the current source. If the percentage is higher or equal to the prefetch probability, then we move on to check the size indicator. If the size of the file to be prefetched is smaller than the max prefetch size threshold, we go ahead and fetch the file. Otherwise we do nothing.

Back | Index | Forward