An important factor to consider is the distribution of file sizes among the requests a web server receives. This distribution has been shown to be heavy-tailed: most requests are for small files, whereas very few are for large files. As a result, typically less than 1% of the requested files account for half of the load experienced by the server. This heavy-tail property significantly affects both the performance achieved by the web server and the performance of the scheduling policy in place.
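The heavy-tail property can be illustrated with a small numerical sketch. The example below assumes, purely for illustration, that file sizes follow a bounded Pareto distribution; the parameters `k`, `p`, and `alpha` and the helper names `bounded_pareto` and `top_share` are hypothetical and not taken from any measured workload. It estimates what fraction of the total bytes is carried by the largest 1% of files.

```python
import random


def bounded_pareto(rng, k, p, alpha):
    """Draw one sample from a bounded Pareto distribution on [k, p].

    Uses inverse-transform sampling. k is the smallest size, p the largest,
    and alpha the tail index (smaller alpha means a heavier tail).
    """
    u = rng.random()
    return k / (1.0 - u * (1.0 - (k / p) ** alpha)) ** (1.0 / alpha)


def top_share(sizes, fraction):
    """Return the share of the total size contributed by the largest files.

    fraction is the proportion of files to count, e.g. 0.01 for the top 1%.
    """
    ordered = sorted(sizes, reverse=True)
    top_n = max(1, int(len(ordered) * fraction))
    return sum(ordered[:top_n]) / sum(ordered)


# Assumed, illustrative parameters: sizes in bytes between 512 B and 10 GB.
rng = random.Random(42)
sizes = [bounded_pareto(rng, k=512.0, p=1e10, alpha=1.1) for _ in range(100_000)]

share = top_share(sizes, 0.01)
print(f"Largest 1% of files carry {share:.1%} of the total bytes")
```

With a tail index close to 1 the largest files tend to dominate the total, although the exact share depends on the seed and on the assumed parameters.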
Our goal is to study how the distribution of file sizes affects the performance of the different components of a web server architecture, and to study the performance improvements achievable with scheduling policies that use knowledge of the sizes of the requests received at the server. The study has both a theoretical and an experimental component. The theoretical component deals with creating and analyzing models of web server architectures in order to obtain theoretical performance bounds. For the experimental component, the QoS Networking Laboratory serves as a testing environment for implementing and experimenting with new web server architectures. We are currently studying the effect of applying different scheduling policies at different components of a web server architecture, namely the CPU, disk, and network.
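To make the idea of size-aware scheduling concrete, the following sketch simulates a single resource (which could stand for a CPU, disk, or network link) serving heavy-tailed requests under two policies. The choice of policies is an assumption made for illustration: first-come-first-served (FCFS), which ignores sizes, and shortest-remaining-processing-time (SRPT), which uses them. The workload parameters, the target load of 0.7, and all function names are likewise hypothetical; this is not a model of any particular server.

```python
import heapq
import random


def bounded_pareto(rng, k, p, alpha):
    """Inverse-transform sample from a bounded Pareto distribution on [k, p]."""
    u = rng.random()
    return k / (1.0 - u * (1.0 - (k / p) ** alpha)) ** (1.0 / alpha)


def simulate_fcfs(jobs):
    """Mean response time when jobs are served in arrival order.

    jobs is a list of (arrival_time, size) tuples sorted by arrival time.
    """
    t = 0.0
    total = 0.0
    for arrival, size in jobs:
        start = max(t, arrival)      # wait for the server or for the arrival
        t = start + size
        total += t - arrival
    return total / len(jobs)


def simulate_srpt(jobs):
    """Mean response time under preemptive shortest-remaining-time-first."""
    heap = []                        # entries are (remaining_work, arrival_time)
    t = 0.0
    total = 0.0
    i = 0
    n = len(jobs)
    while i < n or heap:
        if not heap:
            t = max(t, jobs[i][0])   # server idle: jump to the next arrival
        while i < n and jobs[i][0] <= t:
            heapq.heappush(heap, (jobs[i][1], jobs[i][0]))
            i += 1
        remaining, arrival = heapq.heappop(heap)
        next_arrival = jobs[i][0] if i < n else float("inf")
        if t + remaining <= next_arrival:
            t += remaining           # job completes before anyone else arrives
            total += t - arrival
        else:
            remaining -= next_arrival - t   # run until the arrival, then re-queue
            t = next_arrival
            heapq.heappush(heap, (remaining, arrival))
    return total / n


# Assumed workload: Poisson arrivals, bounded Pareto sizes, target load 0.7.
rng = random.Random(7)
sizes = [bounded_pareto(rng, k=1.0, p=1e4, alpha=1.1) for _ in range(20_000)]
rate = 0.7 / (sum(sizes) / len(sizes))
clock = 0.0
jobs = []
for size in sizes:
    clock += rng.expovariate(rate)
    jobs.append((clock, size))

fcfs_mean = simulate_fcfs(jobs)
srpt_mean = simulate_srpt(jobs)
print(f"FCFS mean response time: {fcfs_mean:.2f}")
print(f"SRPT mean response time: {srpt_mean:.2f}")
```

As a small worked case, for two jobs `[(0.0, 10.0), (1.0, 1.0)]` FCFS makes the short job wait behind the long one (mean response time 10), whereas SRPT preempts the long job when the short one arrives (mean response time 6). A real web server shares several resources at once, so this single-queue model only indicates the kind of comparison involved.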