CS 791 - A Critical Look at Network Protocols
Prof. John Byers
Lecture Notes
Scribe: Gabriel Nasser
9/30/1999
Overview
Today's lecture focuses on the paper Why We Don't Know How to Simulate the
Internet, by Paxson and Floyd. The paper discusses the challenges involved in
modelling and simulating the Internet, surveys what has been done to improve
on this front, and outlines considerations for future research.
Before discussing the paper, we begin by reviewing some mathematical
notions: heavy-tailed distributions and the Poisson distribution.
Mathematics and such...
Poisson Distribution
We define the Poisson distribution as the function
P(k; L) = (L^k * e^(-L)) / k!
which gives the probability of observing exactly k events in an interval,
where L is a constant and k = 0, 1, 2, ... varies.
L is the average rate of occurrence of the events. This distribution
appears in many natural situations, such as the number of telephone calls
per minute arriving at a switchboard. The associated arrival process is
memoryless: the numbers of events in disjoint time intervals are independent,
so there is no correlation between past and future arrivals.
Furthermore, for a Poisson arrival process the time interval between
occurrences of events is exponentially distributed, with density and
cumulative distribution functions
PDF: f(x; L) = L * e^(-Lx)
CDF: F(x; L) = Pr[X <= x] = 1 - e^(-Lx)
(where L stands for lambda, the average rate).
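To make the connection between the two formulations concrete, here is a minimal sketch (in Python; not part of the original notes) that generates Poisson arrivals by drawing exponentially distributed inter-arrival times. The rate parameter plays the role of L above, and the switchboard numbers are made up for illustration.

import random

def poisson_arrivals(rate, horizon):
    """Arrival times of a Poisson process with the given rate (events per
    unit time) over [0, horizon), built from exponential inter-arrival times."""
    arrivals = []
    t = random.expovariate(rate)   # first inter-arrival time
    while t < horizon:
        arrivals.append(t)
        t += random.expovariate(rate)
    return arrivals

# Example: a switchboard receiving on average 2 calls per minute, over one hour.
calls = poisson_arrivals(rate=2.0, horizon=60.0)
print(len(calls), "calls; expected about", 2.0 * 60.0)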
Log-normal Distribution (Heavy-tailed)
A random variable X is distributed log-normally if there is a random
variable Y such that
Y = ln X
is distributed normally.
[Figure: the relation between the random variables Y and X; equally spaced
values of Y map to exponentially spaced values of X.]
A log-normal distribution has finite variance (as opposed to a Pareto
distribution).
A good rule-of-thumb to follow when testing for log-normality is to
plot the distribution on a log-scale, and verify that it looks normally
distributed.
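As a concrete illustration of this rule of thumb, here is a minimal sketch (in Python; not part of the original notes, and the parameters 3.0 and 1.0 are arbitrary illustration values): draw log-normal samples, take logs, and check that the result has the moments of the underlying normal. In practice one would also plot a histogram of ln X and eyeball it.

import math
import random

# Draw samples of X = exp(Y), where Y is normal with (made-up) mean 3.0 and std dev 1.0.
samples = [math.exp(random.gauss(3.0, 1.0)) for _ in range(100000)]

# Rule of thumb: work with ln X and check that it looks normal.
logs = [math.log(x) for x in samples]
mean = sum(logs) / len(logs)
std = (sum((v - mean) ** 2 for v in logs) / len(logs)) ** 0.5
print("ln X: mean =", mean, "std dev =", std)   # should be close to 3.0 and 1.0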
Pareto Distribution (Heavy-tailed)
A Pareto distribution is another example of a heavy-tailed distribution.
In particular, the Pareto distribution stands out from the other distributions
in that it has infinite variance (for shape parameter k <= 2; see lecture
slides). We write the density function of such a distribution as
PDF: f(x; k, O) = (k * O^k) / x^(k+1), for x >= O,
where k is the shape parameter and O is the minimum possible value.
[Figure: a Pareto density function.]
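Here is a minimal sketch (in Python; not part of the original notes) of drawing Pareto samples by inverting the CDF F(x) = 1 - (O/x)^k, and of the practical consequence of infinite variance. The shape 1.2 and minimum 1.0 are illustrative values.

import random

def pareto_sample(k, o):
    """One sample from a Pareto distribution with shape k and minimum value o,
    via inverse-CDF sampling: F(x) = 1 - (o / x)^k for x >= o."""
    u = random.random()                 # uniform in [0, 1)
    return o / ((1.0 - u) ** (1.0 / k))

# For shape k <= 2 the variance is infinite: occasional huge samples dominate
# the sum, so the empirical mean fluctuates wildly from run to run.
samples = [pareto_sample(k=1.2, o=1.0) for _ in range(100000)]
print("max:", max(samples), "empirical mean:", sum(samples) / len(samples))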
The Internet Challenge
[PF97] V. Paxson and S. Floyd, "Why We Don't Know How to Simulate the
Internet," in Proceedings of the Winter Simulation Conference, December 1997.
The Internet is a huge, heterogeneous, unpredictable and irregularly
distributed system. These properties make it hard for anyone to try
to model and simulate it. The Internet is challenging on many fronts.
In particular we will take closer looks at its topology, or its physical
layout; the behavior of traffic over its links; and how it has evolved
in recent years, as well as the outlook for the future.
Topology
The topology of the Internet is not well-known to anyone. There exist
various maps of certain areas of the Internet, but nothing too significant.
Also, its structure is very irregular as there seems to be no pattern for
how it is laid out. This makes it very hard to speak of a representative
topology of the Internet: there is no such thing (yet). This aspect makes
it very hard for researchers to make sound assumptions about its topology
as they attempt to argue for a new technology, and it makes modelling
the Internet very hard. Moreover, even if we had a representative topology
of the Internet today, it might not be representative of tomorrow's Internet.
A single representative topology may also seem far-fetched given the sheer
size of the Internet: developing protocols in small, supposedly representative
test environments is no guarantee that these protocols will perform
or scale well once they are deployed in the Internet at large.
Traffic
Another difficult aspect is that of a representative traffic mix.
It is very hard to model Internet traffic due to its highly dynamic nature.
Several attempts have been made to monitor traffic behavior using per-packet
trace simulations, but these are not a good indicator of the nature of traffic.
Instead, the tendency is to monitor source-level behavior (i.e. the application
that generates the traffic) when studying traffic trends, which gives a more
accurate result. Because traffic generation is so tricky, it is hard to come
up with even a representative mix. Nevertheless, there are certain invariants
which make things a little easier...
A closer look at user session arrivals reveals that they follow a nice Poisson
distribution, and they are modeled as such. The only caveat is that
there are time-of-day effects: the average rate of occurrence of sessions
during the day differs from that at night, and from that during the weekend,
etc. In terms of the equations described in the previous section, this is
captured by changing the value of L.
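As a sketch of how changing L captures time-of-day effects (in Python, with made-up hourly rates; not code from the paper), one can treat each hour as a homogeneous Poisson process with its own rate:

import random

# Hypothetical session-arrival rates (sessions per minute): night, daytime, evening.
RATE_BY_HOUR = {h: (0.5 if h < 8 else 5.0 if h < 18 else 2.0) for h in range(24)}

def session_arrivals_one_day():
    """Session arrival times (in minutes since midnight) over a 24-hour day,
    modelling each hour as a Poisson process with its own rate L."""
    arrivals = []
    for hour, rate in sorted(RATE_BY_HOUR.items()):
        t = hour * 60.0 + random.expovariate(rate)
        while t < (hour + 1) * 60.0:
            arrivals.append(t)
            t += random.expovariate(rate)
    return arrivals

print(len(session_arrivals_one_day()), "sessions in one simulated day")

Because the exponential distribution is memoryless, restarting the inter-arrival clock at each hour boundary still yields a proper piecewise-constant-rate Poisson process.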
At a finer level of granularity, when we look at the connections within
a single user session, we observe that the connections follow a heavy-tailed
distribution. More precisely, we observe a large number of short-lived
connections alongside a small number of very large connections.
[Figure: density function of the connection size.]
And, when observing the traffic at the gateway level, we observe a self-similar
process with long-term correlation.
[Figure: a complete picture of traffic and connection behavior at the
different levels.]
One thing to note about this figure is that, due to the Poisson distribution
of session arrivals, there is no temporal correlation between sessions.
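Putting the levels together, here is a rough sketch (in Python; not code from [PF97], and all rates, shapes, and sizes are made-up illustration values) of the structural model described above: Poisson session arrivals, with heavy-tailed (Pareto) connection sizes inside each session.

import random

SESSION_RATE = 3.0      # sessions per minute: the L of the Poisson session model
CONNS_PER_SESSION = 5   # assumed constant here, purely for simplicity
PARETO_SHAPE = 1.2      # heavy tail: infinite variance for shape <= 2
MIN_CONN_BYTES = 500.0  # minimum connection size (the O of the Pareto model)

def connection_size():
    """Heavy-tailed connection size in bytes, drawn from a Pareto distribution."""
    u = random.random()
    return MIN_CONN_BYTES / ((1.0 - u) ** (1.0 / PARETO_SHAPE))

def one_hour_of_sessions():
    """List of (session start time, list of connection sizes) over one hour,
    with Poisson session arrivals and Pareto connection sizes."""
    sessions = []
    t = random.expovariate(SESSION_RATE)
    while t < 60.0:
        sessions.append((t, [connection_size() for _ in range(CONNS_PER_SESSION)]))
        t += random.expovariate(SESSION_RATE)
    return sessions

traffic = one_hour_of_sessions()
sizes = [s for _, conns in traffic for s in conns]
print(len(traffic), "sessions; largest connection:", max(sizes),
      "bytes out of", round(sum(sizes)), "total")

Aggregating many such heavy-tailed transfers is one way the self-similar, long-range-correlated traffic observed at the gateway level can arise.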
Trends
One important issue to consider is how the Internet has been evolving over
recent years and whether the trend is likely to continue in the future.
This question remains unanswered, as the Internet exhibits unpredictable
behavior, especially with regard to how fast (or slow) it is growing.
The Internet is growing at an unpredictable pace, making it even more
difficult to obtain even the slightest reasonable result or assumption.
Even if we came up with a representative "traffic mix" and topology today,
who or what is to guarantee that they will be representative tomorrow?
Consider the latest changes on the Internet:
- Web caching is more and more popular.
- The slow move towards fair queuing policies.
- Deployment of multicast capabilities is ongoing but slow.
These changes are not major modifications. No real major change or
"revolution" has emerged lately on the Internet. Does the Internet seem
to have (finally) stabilized?
In summary, here are the major challenges that lie ahead (the challenge...
and the considerations):
- Realistic traffic mix: perform a careful combination of per-packet trace
simulation and source-level behavior simulation, together with careful
analysis.
- Representative topology: no such thing exists yet, though there must be
certain patterns in the layout and structure of the Internet; we need to
exploit these patterns.
- Future considerations: today's Internet is certainly not tomorrow's. We
need to find scalable solutions, account for "rare" events which are no
longer rare, and allow for potential revolutions such as "killer apps".
Conclusion
The key to success lies in building on existing work with a careful
combination of analysis and simulation, so that the results are theoretically
robust and stand their ground when put to the test. The emphasis should be on
feeding into existing successful work (such as the VINT project), because this
is the most efficient and promising approach to handling the big challenge
that is the Internet.