Concurrent Prefetching
In designing the code that performs the prefetching during a session, our primary goal was to make the fetching completely invisible to the user, apart from its effect on latency when traversing from one document to the next. We implemented this by spawning multiple child processes that prefetch relevant data while the parent continues to function normally.
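
A minimal sketch of this fork-based scheme follows; prefetch_url() and record_child() are hypothetical stand-ins for the client's actual fetching and bookkeeping routines.

```c
#include <sys/types.h>
#include <unistd.h>

extern void prefetch_url(const char *url);            /* hypothetical */
extern void record_child(pid_t pid, const char *url); /* hypothetical */

/* Spawn one child to prefetch a URL while the parent keeps running. */
void spawn_prefetch(const char *url)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: fetch the document to disk, then exit quietly. */
        prefetch_url(url);
        _exit(0);
    } else if (pid > 0) {
        /* Parent: note which child is fetching which URL and
         * return immediately, so the user sees no interruption. */
        record_child(pid, url);
    }
    /* pid < 0: fork failed; the data can be fetched on demand later. */
}
```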

To achieve this, a number of additional data structures had to be added to the client to keep track of its children and what they are fetching. We used a linked list to store this information, where each node contains the process id of a child and the URL it is currently fetching. Whenever the user requests a piece of data (a new document or an image), this list is checked to see whether the data has already been requested and is currently being fetched by a child. If it has not, the parent continues and loads the data in the normal fashion. If it has, the parent must communicate with the child to learn the status of the data it needs.
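
A sketch of that list is given below; the node layout and the name find_fetch() are our own, chosen for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* One node per outstanding child: its pid and the URL it is fetching. */
struct fetch_node {
    pid_t              pid;
    char               url[1024];
    struct fetch_node *next;
};

static struct fetch_node *fetch_list = NULL;

/* Return the node for a URL already being fetched by a child,
 * or NULL, in which case the parent loads the data normally. */
struct fetch_node *find_fetch(const char *url)
{
    struct fetch_node *n;
    for (n = fetch_list; n != NULL; n = n->next)
        if (strcmp(n->url, url) == 0)
            return n;
    return NULL;
}
```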

The method we chose for interprocess communication was the message queue, for two reasons. First, with message queues we did not need to worry about the atomicity of write operations or about providing concurrency control for multiple writers (as shared memory would require, for example). Second, we do not need to receive the messages in any particular order, as long as each one is eventually delivered. In truth, any form of interprocess communication could have been used; the alternatives differ only in ease of implementation.
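
As a sketch, the queue itself takes a single System V call to create; using IPC_PRIVATE is our assumption here, since a queue created before fork() is inherited by every child.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Create a message queue shared by the parent and the children it
 * later forks. IPC_PRIVATE suffices because the queue id is
 * inherited across fork(); no well-known key is needed. */
int open_prefetch_queue(void)
{
    return msgget(IPC_PRIVATE, IPC_CREAT | 0600);
}
```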

Once the parent knows a child is fetching the data it needs, it starts processing messages off the queue. It knows the process id of the child it is waiting for, so it keeps processing messages until the right one arrives. Before it terminates, each child puts a message on the queue containing its process id, the status of the fetch (successful or not), the path to the data, and the size of the data. Because each child writes the data it fetches to disk, the messages on the queue stay small. Thus, as the parent processes each message, it simply adds the information about the file fetched by that child to the disk cache table. If that is the data the parent is waiting for, it loads the file from the disk cache into memory and resumes normal execution (i.e., displaying the data). If it is not, the parent records the entry in the disk cache and waits for the next message on the queue.
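
The exchange might look like the sketch below; the message layout and the helper add_to_disk_cache() are assumptions for illustration, not the client's actual definitions.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

extern void add_to_disk_cache(const char *path, long size); /* hypothetical */

/* One completion message per child, sent just before it terminates. */
struct fetch_msg {
    long  mtype;        /* required first field; always 1 here    */
    pid_t pid;          /* process id of the reporting child      */
    int   ok;           /* status of the fetch: 1 on success      */
    long  size;         /* size of the fetched data               */
    char  path[1024];   /* where the data was written on disk     */
};

#define FETCH_MSG_LEN (sizeof(struct fetch_msg) - sizeof(long))

/* Child side: report the result of the fetch before exiting. */
void report_fetch(int qid, int ok, const char *path, long size)
{
    struct fetch_msg m;
    m.mtype = 1;
    m.pid   = getpid();
    m.ok    = ok;
    m.size  = size;
    strncpy(m.path, path, sizeof(m.path) - 1);
    m.path[sizeof(m.path) - 1] = '\0';
    msgsnd(qid, &m, FETCH_MSG_LEN, 0);
}

/* Parent side: drain messages, caching every reported file, until
 * the child we are waiting on reports in. */
void wait_for_child(int qid, pid_t wanted)
{
    struct fetch_msg m;
    while (msgrcv(qid, &m, FETCH_MSG_LEN, 0, 0) >= 0) {
        if (m.ok)
            add_to_disk_cache(m.path, m.size);  /* record in cache table */
        if (m.pid == wanted)
            return;  /* the awaited data is now in the disk cache */
    }
}
```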

In the current implementation, there is no mechanism for periodically draining the message queue, so if the user never accesses any prefetched material, the queue simply keeps growing as more data is prefetched. At the end of each session, however, the queue is always parsed and each prefetched file is added to the disk cache in the hope that it will be accessed in a future session. This behavior will change with whatever cache replacement strategy is added to the code. Also, the code removes the message queue only upon normal exit and on a segmentation fault or bus error signal; the user should be aware that after any other abnormal termination the message queue is not removed automatically and must be deleted by hand.
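
A sketch of that cleanup follows, with handler and variable names of our own choosing; under any other abnormal termination the queue survives and must be removed manually (on most Unix systems, with the ipcrm command).

```c
#include <stddef.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

static int prefetch_qid;  /* set when the queue is created */

/* On SIGSEGV or SIGBUS, remove the message queue before dying.
 * No other abnormal termination is caught, so the queue can be
 * orphaned and must then be deleted by hand. */
static void cleanup_queue(int sig)
{
    (void)sig;
    msgctl(prefetch_qid, IPC_RMID, NULL);
    _exit(1);
}

void install_cleanup_handlers(void)
{
    signal(SIGSEGV, cleanup_queue);
    signal(SIGBUS,  cleanup_queue);
    /* The normal exit path also calls msgctl(qid, IPC_RMID, NULL). */
}
```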

