BibTeX Entry

  author	= {Agosta, John-Mark and Chandrasekar, Jaideep and Crovella, Mark and Taft, Nina and Ting, Daniel},
  title		= {Mixture Models of Endhost Network Traffic},
  booktitle	= {Proceedings of the Infocom 2013 Miniconference},
  year		= {2013},
  address	= {Turin, Italy},
  URL		= {},
  doi		= {10.1109/INFCOM.2013.6566768},
  abstract	= {In recent years, there has been much interest in modeling internet traffic that comes from inside large networks, such as at routers or gateways. In this work we focus on modeling a little-studied type of traffic, namely the network traffic generated from endhosts. We study traffic data collected from hundreds of enterprise laptop users. We introduce a parsimonious parametric model of the marginal distribution for connection arrivals. We employ mixture models based on a convex combination of component distributions with both heavy and light tails. These models can be fitted with high accuracy using maximum likelihood techniques. Our methodology assumes that the underlying user data can be fitted to one of many modeling options, and we apply Bayesian model selection criteria as a rigorous way to choose the preferred combination of components. Our experiments show that a simple Pareto-exponential mixture model is preferred for a wide range of users, over both simpler and more complex alternatives. This model has the desirable property of modeling the entire distribution, effectively segmenting the traffic into heavy-tailed and non-heavy-tailed components. Scaling the time window used to bin the data strongly changes the relative contributions of the components, but has less effect on the component parameters. We illustrate that this technique has the flexibility to capture the wide diversity of user behaviors.}