Software and Datasets Related to My Research

Mark E. Crovella


1995 WWW Client Datasets

These datasets are the basis for many published studies. They are available from the Internet Traffic Archives. The format of the traces and collection process are documented in our tech report BUCS-TR-1995-010.

This material is based upon work supported by the National Science Foundation under Grant no. CCR-9501822.


1998 WWW Client Datasets

In 1998 we captured a new set of client logs, using a method different from the 1995 set. The format of the trace data and collection process are documented in our tech report BUCS-TR-1999-011, and the trace itself is here.

This material is based upon work supported by the National Science Foundation under Grant no. CCR-9501822.


tcpeval

Tcpeval constructs critical path analyses of TCP transactions. It was developed by Paul Barford during his PhD thesis research. Its algorithms are described in the paper Critical Path Analysis of TCP Transactions in Proceedings of the 2000 ACM SIGCOMM Conference, Stockholm, Sweden, September 2000.
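Critical path analysis in general finds the longest chain of dependent events, because that chain's total duration determines end-to-end latency. The minimal Python sketch below shows only that generic computation on a dependency graph. The function name, event names, durations, and edges are hypothetical, and tcpeval's own rules for deriving such a graph from packet traces are described in the paper.

```python
from collections import defaultdict, deque


def critical_path(durations, edges):
    """Return (total_time, path) for the longest dependency chain in a DAG.

    durations: dict mapping event name -> time the event takes
    edges: iterable of (earlier, later) pairs; `later` cannot start
           until `earlier` has finished
    """
    succ = defaultdict(list)
    indeg = {n: 0 for n in durations}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1

    # Kahn's algorithm: compute a topological order of the events.
    order = []
    queue = deque(n for n, d in indeg.items() if d == 0)
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(durations):
        raise ValueError("dependency graph contains a cycle")

    # Earliest start of each event = latest finish among its predecessors;
    # pred remembers which predecessor imposed that start time.
    start = {n: 0 for n in durations}
    pred = {n: None for n in durations}
    finish = {}
    for n in order:
        finish[n] = start[n] + durations[n]
        for m in succ[n]:
            if pred[m] is None or finish[n] > start[m]:
                start[m] = finish[n]
                pred[m] = n

    # Walk back from the event that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = pred[node]
    return total, path[::-1]


if __name__ == "__main__":
    # Hypothetical toy transaction, durations in ms.
    durations = {"handshake": 30, "request": 15, "server": 40,
                 "data1": 15, "ack1": 15, "data2": 15}
    edges = [("handshake", "request"), ("request", "server"),
             ("server", "data1"), ("data1", "ack1"),
             ("ack1", "data2"), ("server", "data2")]
    print(critical_path(durations, edges))
    # -> (130, ['handshake', 'request', 'server', 'data1', 'ack1', 'data2'])
```

In the toy example the second data packet must wait for the ACK of the first, so the direct server-to-data2 dependency is not on the critical path.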

The source code is available in a compressed tarfile here. Included in the tarfile is a HOWTO with installation instructions.

If you download this code, please send an email to Paul Barford (pb at cs.wisc.edu) to let him know that you are using it and whether you find it useful.

This material is based upon work supported by the National Science Foundation under Grant no. CCR-9706685.


Surge

Surge, which generates Web requests intended to mimic the statistical properties of measured Web workloads, is available here.

The paper describing Surge's rationale and design is Generating Representative Web Workloads for Network and Server Performance Evaluation in Proceedings of Performance '98/ACM SIGMETRICS '98.

However, the default models and parameter settings used in this version of Surge are based on analyses of the 1998 dataset, documented in Changes in Web Client Access Patterns: Characteristics and Caching Implications in World Wide Web, Special Issue on Characterization and Performance Evaluation, Vol. 2, pp. 15-28, 1999.
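Surge's workload models are built from several statistical distributions, including heavy-tailed ones. The Python sketch below illustrates just two such ingredients for a single simulated user, assuming Zipf-like file popularity and Pareto-distributed OFF (think) times. The function names and parameter values (`s`, `off_alpha`, `off_min`) are hypothetical and are not Surge's defaults; this is not Surge's code and it omits its other models.

```python
import random


def zipf_weights(n_files, s=1.0):
    # Popularity of the k-th most popular file is proportional to 1 / k**s.
    return [1.0 / (k ** s) for k in range(1, n_files + 1)]


def generate_requests(n_requests, n_files, off_alpha=1.5, off_min=1.0, seed=None):
    """Return a list of (think_time_s, file_index) pairs for one simulated user.

    off_alpha, off_min: hypothetical Pareto shape and minimum OFF time (seconds).
    """
    rng = random.Random(seed)
    weights = zipf_weights(n_files)
    requests = []
    for _ in range(n_requests):
        # paretovariate(alpha) returns values >= 1; scale to the minimum OFF time.
        think = off_min * rng.paretovariate(off_alpha)
        # Pick a file index with Zipf-like probabilities (index 0 = most popular).
        file_index = rng.choices(range(n_files), weights=weights)[0]
        requests.append((think, file_index))
    return requests


if __name__ == "__main__":
    for think, idx in generate_requests(5, 100, seed=42):
        print(f"wait {think:.2f} s, then request file {idx}")
```

Heavy-tailed OFF times mean that most pauses are short but a few are very long, which is one reason synthetic workloads built this way differ from simple Poisson request streams.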

This is the HTTP/1.1 compliant version of the code (HTTP/1.0 is still supported in this release). There is a detailed HOW-TO included which should get you going.

If you download this code, please send an email to Paul Barford (pb at cs.wisc.edu), who developed it; he will add your name to the SURGE interest mailing list so that you are notified about future updates. We would also like to know what you will be using the code for, so a brief overview sent to him would be appreciated.

While the HOW-TO suggests using MIT pthreads, if you are using a 2.2 Linux kernel we recommend compiling with kernel threads (make sure your thread limit is set high enough!). To do that, make the following modifications:

This material is based upon work supported by the National Science Foundation under Grant nos. CCR-9501822 and CCR-9706685.


BPROBE

BPROBE is a tool for measuring bottleneck bandwidth of an Internet path, using the packet-pair technique. It was developed by Bob Carter during his PhD research. Source code for BPROBE is available here and the paper describing the design of BPROBE is here.
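The packet-pair idea: when two packets of size S leave back to back, the bottleneck link spreads them apart by S/B, so the gap between their arrivals gives an estimate of the bottleneck bandwidth B. Cross traffic distorts individual pairs, so many estimates have to be combined. Below is a minimal Python sketch, assuming receive timestamps in seconds; the function names are hypothetical, BPROBE's own filtering of the estimates is described in its paper, and the median used here is only a simple stand-in.

```python
from statistics import median


def packet_pair_estimates(packet_size_bytes, arrival_pairs):
    """Bandwidth estimates (bits/s), one per pair of back-to-back probes.

    arrival_pairs: list of (t_first, t_second) receive times in seconds.
    """
    estimates = []
    for t1, t2 in arrival_pairs:
        gap = t2 - t1
        if gap > 0:  # skip reordered or identically timestamped pairs
            estimates.append(packet_size_bytes * 8 / gap)
    return estimates


def bottleneck_bandwidth(packet_size_bytes, arrival_pairs):
    """Combine per-pair estimates with a median (a stand-in for BPROBE's filtering)."""
    est = packet_pair_estimates(packet_size_bytes, arrival_pairs)
    if not est:
        raise ValueError("no usable packet pairs")
    return median(est)


if __name__ == "__main__":
    # Two clean pairs (1.2 ms gap) and one pair stretched by cross traffic.
    pairs = [(0.0, 0.0012), (1.0, 1.0012), (2.0, 2.003)]
    print(bottleneck_bandwidth(1500, pairs))  # about 1e7 bits/s (10 Mb/s)
```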

This material is based upon work supported by the National Science Foundation under Grant no. CCR-9501822.


Aest: A Tool For Estimating the Heavy Tail Index from Scaling Properties

This tool estimates the tail index alpha of empirical heavy-tailed distributions, such as those encountered in telecommunication systems. It uses a method (called the "scaling estimator") based on the scaling properties of sums of heavy-tailed random variables. The software is available here, and the paper describing aest is available here.
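The scaling idea can be written out in the idealized case. Assume X_1, ..., X_m are i.i.d., zero-centered, and strictly alpha-stable with 0 < alpha < 2, so that their sum has the same distribution as m^{1/alpha} X_1. If x_1 and x_m are the points at which the complementary distributions of X_1 and of the m-aggregated sum X^{(m)} reach the same tail probability (assuming continuous, strictly decreasing tails), the horizontal distance delta between the two curves on a log-log plot yields alpha. This is a worked derivation of the principle only; how aest chooses aggregation levels and tail points is described in the paper.

```latex
X^{(m)} = X_1 + X_2 + \cdots + X_m \overset{d}{=} m^{1/\alpha} X_1
\quad\Longrightarrow\quad
P\left[X^{(m)} > x\right] = P\left[X_1 > m^{-1/\alpha}\, x\right]

P\left[X^{(m)} > x_m\right] = P\left[X_1 > x_1\right]
\quad\Longrightarrow\quad
x_m = m^{1/\alpha} x_1
\quad\Longrightarrow\quad
\delta = \log x_m - \log x_1 = \frac{\log m}{\alpha},
\qquad
\hat{\alpha} = \frac{\log m}{\delta}
```

For instance, with m = 2 and a measured shift of delta = 0.2 in base-10 logarithms, the estimate is log10(2) / 0.2, or about 1.5.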


Traffic Matrices

In the paper Mining Anomalies Using Traffic Feature Distributions we used data from two networks: GEANT and Abilene. The GEANT data was provided to us under a nondisclosure agreement, so we cannot distribute it, but the Abilene data is freely distributable. It can be downloaded here as a Matlab file with associated metadata and instructions. Note: this data consists of byte counts per unit time (not the entropy measures used in the paper).
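The entropy measures mentioned above summarize how concentrated or dispersed the values of a traffic feature are within one time bin. A minimal Python sketch of sample entropy follows, assuming the feature values for one bin (for example, destination ports of the packets seen in it) are given as a list. The function name is hypothetical, and the exact features, binning, and any normalization used in the paper may differ.

```python
import math
from collections import Counter


def sample_entropy(values):
    """Sample entropy (bits) of the empirical distribution of a traffic feature.

    H = -sum_i (n_i / S) * log2(n_i / S), where n_i is the count of the i-th
    distinct value and S is the total number of observations.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(sample_entropy([80, 443, 22, 53]))  # 2.0: four equally common ports
    print(sample_entropy([80, 80, 80, 80]))   # 0.0 (may print -0.0): one port only
```

Low entropy indicates traffic concentrated on a few values (as in a flood toward one port), while high entropy indicates dispersion (as in a port scan), which is why such measures can expose anomalies that byte counts alone do not.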

This material is based upon work supported by the National Science Foundation under Grant no. CCR-0325701.


Latency Matrices (Virtual Landmarks)

Virtual Landmarks uses a Lipschitz embedding of network nodes based on their distances to landmarks, along with dimensionality reduction via principal component analysis (PCA). The method is described in this paper. The datasets used in that paper are here.
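The two steps can be sketched in pure Python: each node's coordinate vector is its list of latencies to the landmarks (the Lipschitz embedding), and PCA then keeps only the top few directions of variation. The function names are hypothetical, and power iteration with deflation stands in for the eigensolver that a real implementation would normally use.

```python
import math


def lipschitz_embed(latency, landmarks):
    """Map node i to its vector of latencies to each landmark node."""
    return [[latency[i][l] for l in landmarks] for i in range(len(latency))]


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pca_project(vectors, d, iters=500):
    """Project mean-centered vectors onto their top-d principal components.

    Eigenvectors of the covariance matrix are found by power iteration,
    removing each one from the matrix (deflation) before finding the next.
    """
    n, k = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(k)]
    centered = [[v[j] - mean[j] for j in range(k)] for v in vectors]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(k)]
           for a in range(k)]
    components = []
    for _ in range(d):
        vec = [1.0 + j for j in range(k)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = _matvec(cov, vec)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # no variance left in any direction
                vec = [0.0] * k
                break
            vec = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(vec, _matvec(cov, vec)))
        components.append(vec)
        cov = [[cov[a][b] - lam * vec[a] * vec[b] for b in range(k)]
               for a in range(k)]
    return [[sum(c[j] * comp[j] for j in range(k)) for comp in components]
            for c in centered]


if __name__ == "__main__":
    # Hypothetical nodes on a line at 0, 10, 20, 40 ms; landmarks are nodes 0 and 3.
    pos = [0, 10, 20, 40]
    latency = [[abs(a - b) for b in pos] for a in pos]
    coords = lipschitz_embed(latency, [0, 3])
    print(pca_project(coords, 1))
```

In this toy case the landmark vectors all lie on one line, so a single principal component captures all of their variation; real latency data typically needs several.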

This material is based upon work supported by the National Science Foundation under Grant no. ANI-0322990.


Constraint-Based Geolocation

Constraint-Based Geolocation (CBG) uses round-trip time (RTT) measurements to estimate geographic position. The technique is described in this paper. The code for CBG is here as a collection of routines in R (you can get R itself here).
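A rough Python sketch of the constraint idea: each landmark's RTT bounds the target's distance from that landmark, the feasible region is the intersection of the resulting disks, and the position estimate is taken here as the centroid of that region. This assumes a flat plane measured in kilometres and a fixed conversion of 100 km per millisecond of RTT (roughly two-thirds of the speed of light, one way). CBG itself calibrates the delay-to-distance conversion separately for each landmark and works on the Earth's surface, and the names below are hypothetical.

```python
import math

KM_PER_MS_RTT = 100.0  # assumption: ~200 km/ms one way in fiber, halved for RTT


def cbg_estimate(landmarks, rtts_ms, step_km=10.0):
    """Estimate a target's (x, y) position in km from landmark RTTs.

    landmarks: list of (x, y) landmark positions in km (flat-plane assumption)
    rtts_ms: measured RTT from each landmark to the target, in milliseconds
    Returns (estimate, feasible_points); estimate is None when the distance
    constraints have no common grid point.
    """
    radii = [rtt * KM_PER_MS_RTT for rtt in rtts_ms]
    # Every feasible point lies in the first landmark's disk, so only its
    # bounding box needs to be searched.
    x0, y0 = landmarks[0]
    r0 = radii[0]
    n = int(2 * r0 / step_km) + 1
    feasible = []
    for i in range(n):
        for j in range(n):
            x = x0 - r0 + i * step_km
            y = y0 - r0 + j * step_km
            if all(math.hypot(x - lx, y - ly) <= r
                   for (lx, ly), r in zip(landmarks, radii)):
                feasible.append((x, y))
    if not feasible:
        return None, []
    cx = sum(p[0] for p in feasible) / len(feasible)
    cy = sum(p[1] for p in feasible) / len(feasible)
    return (cx, cy), feasible


if __name__ == "__main__":
    lms = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
    est, region = cbg_estimate(lms, [6.0, 9.1, 7.7])
    print(est, len(region))
```

Because every disk is convex, the centroid of the feasible points also satisfies all of the constraints, and the size of the region gives a rough sense of the estimate's uncertainty.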

This material is based upon work supported by the National Science Foundation under Grant no. ANI-0322990.


Multidimensional Scaling in the Poincare Disk

Multidimensional Scaling in the Poincare Disk (MDS-PD) is a method of embedding a set of points equipped with interpoint distances (or dissimilarities) into the Poincare model of hyperbolic space, in a way that seeks to minimize the difference between the input distances and the distances as measured in the embedding. Matlab code for MDS-PD is here and the method is described in this paper.
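The quantity that MDS-PD tries to minimize can be sketched in Python: the hyperbolic distance between two points of the open unit disk, and a raw stress that sums the squared differences between input dissimilarities and embedded distances. This is an illustrative sketch of the objective only, with hypothetical function names; the exact objective, weighting, and optimization procedure used by the Matlab code are described in the paper.

```python
import math
from itertools import combinations


def poincare_distance(u, v):
    """Hyperbolic distance between points u, v strictly inside the unit disk.

    d(u, v) = arccosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    """
    diff2 = (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2
    nu = 1.0 - (u[0] ** 2 + u[1] ** 2)
    nv = 1.0 - (v[0] ** 2 + v[1] ** 2)
    if nu <= 0 or nv <= 0:
        raise ValueError("points must lie strictly inside the unit disk")
    return math.acosh(1.0 + 2.0 * diff2 / (nu * nv))


def stress(points, dissim):
    """Sum over pairs of (embedded hyperbolic distance - input dissimilarity)^2."""
    return sum((poincare_distance(points[i], points[j]) - dissim[i][j]) ** 2
               for i, j in combinations(range(len(points)), 2))


if __name__ == "__main__":
    # From the origin to (r, 0) the distance is ln((1 + r) / (1 - r)).
    print(poincare_distance((0.0, 0.0), (0.5, 0.0)), math.log(3.0))
```

Distances grow without bound as points approach the disk's edge, which is what lets a bounded disk hold tree-like or hierarchical distance structures that are hard to fit in Euclidean space.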

This material is based upon work supported by the National Science Foundation under Grant no. CNS-1018266.


Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
All code on this page is licensed under a Creative Commons License.
