::::::::::::::
1993-001
::::::::::::::
Title: Performance Evaluation of Two-Shadow Speculative Concurrency Control
Author: A. Bestavros, S. Braoudakis, E. Panagos, Boston University
Date: February 1993
Abstract:
Speculative Concurrency Control (SCC) is a new concurrency control
approach especially suited for real-time database applications. It
relies on the use of redundancy to ensure that serializable schedules
are discovered and adopted as early as possible, thus increasing the
likelihood of the timely commitment of transactions with strict timing
constraints. In a recent publication by two of the authors, SCC-nS, a
generic algorithm that characterizes a family of SCC-based algorithms,
was described, and its correctness was established by showing that it
only admits serializable histories. In this paper, we evaluate the
performance of the Two-Shadow SCC algorithm (SCC-2S), a member of the
SCC-nS family, which is notable for its minimal use of redundancy. In
particular, we show that SCC-2S (as a representative of SCC-based
algorithms) provides significant performance gains over the widely
used Optimistic Concurrency Control with Broadcast Commit (OCC-BC),
under a variety of operating conditions and workloads.
::::::::::::::
1993-002
::::::::::::::
Author: Azer Bestavros, Boston University
Title: Speculative Concurrency Control for Real-Time Databases
Date: January 1993
Abstract:
In this paper, we propose a new class of Concurrency Control Algorithms that
is especially suited for real-time database applications. Our approach relies
on the use of (potentially) redundant computations to ensure that
serializable schedules are found and executed as early as possible, thus
increasing the chances of a timely commitment of transactions with strict
timing constraints. Due to its nature, we term our concurrency control
algorithms Speculative. The aforementioned description encompasses many
algorithms that we call collectively Speculative Concurrency Control (SCC)
algorithms.
SCC algorithms combine the advantages of both Pessimistic and Optimistic
Concurrency Control (PCC and OCC) algorithms, while avoiding their
disadvantages. On the one hand, SCC resembles PCC in that conflicts are
detected as early as possible, thus making alternative schedules available in
a timely fashion in case they are needed. On the other hand, SCC resembles
OCC in that it allows conflicting transactions to proceed concurrently, thus
avoiding unnecessary delays that may jeopardize their timely commitment.
::::::::::::::
1993-003
::::::::::::::
Title: Quadsim Student Manual
Author: Marwan Shaban, Boston University
Date: April 1993
Abstract:
Quadsim is an intermediate code simulator. It allows you to "run"
programs that your compiler generates in intermediate code format.
Its user interface is similar to most debuggers in that you can step
through your program, instruction by instruction, set breakpoints,
examine variable values, and so on.
The intermediate code format used by Quadsim is that described in
[Aho 86]. If your compiler generates intermediate code in this
format, you will be able to take intermediate-code files generated by
your compiler, load them into the simulator, and watch them "run."
You are provided with functions that hide the internal representation
of intermediate code. You can use these functions within your
compiler to generate intermediate code files that can be read by the
simulator.
Quadsim was inspired and greatly influenced by [Aho 86]. The
material in chapter 8 (Intermediate Code Generation) of [Aho 86]
should be considered background material for users of Quadsim.
::::::::::::::
1993-004
::::::::::::::
Title: Proceedings of Sixth International Workshop on Unification
Author: Wayne Snyder, Boston University
Date: April 1993
Abstract:
The Proceedings of the Sixth International Workshop on
Unification contains short papers presented at the workshop
which took place at the Dagstuhl conference center in
Germany, in June 1992.
::::::::::::::
1993-005
::::::::::::::
Title: Mermera: Non-coherent Distributed Shared Memory for Parallel Computing
Author: Himanshu Shekhar Sinha, Boston University
Date: May 1993
Abstract:
Ph.D. Thesis, Computer Science Department, Boston University.
The proliferation of inexpensive workstations and networks has
prompted several researchers to use such distributed systems for
parallel computing.
Attempts have been made to offer a shared-memory programming
model on such distributed memory computers.
Most systems provide a shared-memory that is {\em coherent} in that
all processes that use it agree on the order of all memory events.
This dissertation explores the possibility of a significant improvement in
the performance of some applications when they use {\em
non-coherent} memory.
First, a new formal model to describe existing non-coherent memories
is developed.
I use this model to prove that certain problems can be solved using
asynchronous iterative algorithms on shared-memory in which the
coherence constraints are substantially relaxed.
In the course of the development of the model I discovered a new type of
non-coherent behavior called {\em Local Consistency}.
Second, a programming model, {\sc Mermera}, is proposed.
It provides programmers with a choice of hierarchically related
non-coherent behaviors along with one coherent behavior.
Thus, one can trade off the ease of programming with coherent memory for
improved performance with non-coherent memory.
As an example, I present a program to solve a linear system of equations using
an asynchronous iterative algorithm.
This program uses all the behaviors offered by {\sc Mermera}.
Third, I describe the implementation of {\sc Mermera} on a BBN
Butterfly TC2000 and on a network of workstations.
The performance of a version of the equation solving program that uses
all the behaviors of {\sc Mermera} is compared with that of a
version that uses coherent behavior only.
For a system of 1000 equations the former exhibits at least a 5-fold
improvement in convergence time over the latter.
The version using coherent behavior only does not benefit from
employing more than one workstation to solve the problem while the
program using non-coherent behavior continues to achieve improved
performance as the number of workstations is increased from 1
to 6.
This measurement corroborates our belief that non-coherent shared memory can
be a performance boon for some applications.
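As an illustration of the kind of asynchronous iterative computation described above, the following Python sketch (not taken from the thesis) solves a small, strictly diagonally dominant linear system. Each simulated worker updates its own unknown from a private and possibly stale copy of the solution vector, loosely imitating reads from a memory that is not kept coherent. The function name `async_jacobi`, the `refresh_every` parameter, and the staggered refresh rule are all assumptions made for illustration; they do not reflect the Mermera interface.

```python
def async_jacobi(A, b, sweeps=200, refresh_every=3):
    """Solve A x = b with Jacobi updates computed from stale local views.

    Hypothetical sketch: worker i owns x[i] and refreshes its private copy
    of the shared vector only once every `refresh_every` sweeps, so
    different workers see values of different ages.
    """
    n = len(b)
    shared = [0.0] * n                          # latest value published by each owner
    views = [list(shared) for _ in range(n)]    # per-worker, possibly stale copies
    for sweep in range(sweeps):
        new_values = []
        for i in range(n):
            # Staggered refresh: worker i re-reads shared state only occasionally.
            if (sweep + i) % refresh_every == 0:
                views[i] = list(shared)
            view = views[i]
            off_diag = sum(A[i][j] * view[j] for j in range(n) if j != i)
            new_values.append((b[i] - off_diag) / A[i][i])
        shared = new_values                     # every owner publishes its update
    return shared


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0],
         [1.0, 5.0, 2.0],
         [0.0, 2.0, 6.0]]
    b = [6.0, 17.0, 22.0]                       # chosen so the exact solution is [1, 2, 3]
    print(async_jacobi(A, b))
```

Strict diagonal dominance is assumed here because it is a standard sufficient condition for this style of iteration to converge despite bounded staleness; the sketch makes no claim about the systems used in the thesis.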
::::::::::::::
1993-006
::::::::::::::
Title: An Implementation of Mermera: A Shared Memory System that Mixes Coherence with Non-coherence
Author: Abdelsalam Heddaya and Himanshu Sinha
Date: June 1993
Abstract:
Coherent shared memory is a convenient, but inefficient, method of
inter-process communication for parallel programs. By contrast,
message passing can be less convenient, but more efficient. To get
the benefits of both models, several non-coherent memory behaviors
have recently been proposed in the literature.
We present an implementation of Mermera, a shared memory system that
supports both coherent and non-coherent behaviors in a manner that
enables programmers to mix multiple behaviors in the same
program~\cite{HeddayaS93}. A programmer can debug a Mermera program
using coherent memory, and then improve its performance by selectively
reducing the level of coherence in the parts that are critical to
performance.
Mermera permits a trade-off of coherence for performance. We analyze
this trade-off through measurements of our implementation, and by an
example that illustrates the style of programming needed to exploit
non-coherence. We find that, even on a small network of workstations,
the performance advantage of non-coherence is compelling. Raw
non-coherent memory operations perform 20-40~times better than
coherent memory operations. An example application program is
shown to run 5-11~times faster when permitted to exploit
non-coherence. We conclude by commenting on our use of the Isis
Toolkit of multicast protocols in implementing Mermera.
Keywords: Distributed Shared Memory, Weak consistency,
Parallel Computing, Asynchronous Iterative Methods, Isis.
::::::::::::::
1993-007
::::::::::::::
Title: Using Warp to Control Network Contention in Mermera
Author: Abdelsalam Heddaya, Kihong Park, and Himanshu Sinha, Boston University
Date: June 1993
Abstract:
Parallel computing on a distributed system, such as a network of
workstations, can saturate the communication network, leading to
excessive message delays and consequently poor application
performance. Current operating systems offer only partial support for
flow control protocols that can help insulate application performance
from extraneous traffic on the shared network. We examine empirically
the consequences of integrating one such protocol, called Warp
control~\cite{Park93}, into Mermera, a software shared memory system
that supports parallel computing on distributed
systems~\cite{HeddayaS93hicss}.
Preliminary performance measurements are reported for an asynchronous
iterative program to solve a system of linear equations, under varying
levels of network contention. The experiments were conducted on a
network of seven Sun Sparc~1+ workstations, using an auxiliary traffic
generator. These measurements show that Warp succeeds in stabilizing
the network behavior when there is high contention, increasing the
effective throughput available to the application, and consequently
decreasing its completion time. In some cases, however, Warp control
does not achieve the performance attainable by fixed size buffering
when using a statically optimal buffer size. Based on the nature of
Warp and the underlying communication layers, we offer explanations
for our results.
Our use of Warp to regulate the allocation of network bandwidth
emphasizes the possibility for integrating it with the allocation of
other resources, such as CPU cycles and disk bandwidth, so as to
optimize overall system throughput, and enable fully-shared execution
of parallel programs.
Keywords: Distributed non-coherent shared memory, network contention,
flow control, iterative methods, Isis.
::::::::::::::
1993-008
::::::::::::::
Title: Fixed Point vs. First-Order Logic on Finite Ordered Structures with Unary Relations
Author: A. J. Kfoury and M. Wymann-Boeni, Boston University
Date: August 1993
Abstract:
We prove that first-order logic is strictly weaker than fixed point logic
over every infinite class of finite ordered structures with additional
unary relations: Over these classes there is always an inductive unary
relation which cannot be defined by a first-order formula, even when every
inductive sentence (i.e., closed formula) can be expressed in first-order
over this particular class.
Our proof first establishes a property valid for every unary relation
definable by first-order logic over these classes which is peculiar to
classes of ordered structures with unary relations. In a second step we
show that this property itself can be expressed in fixed point logic and
can be used to construct a non-elementary unary relation.
::::::::::::::
1993-009
::::::::::::::
Title: A Characterization of First-Order Definable Subsets on Classes of Finite Total Orders
Author: A.J. Kfoury and M. Wymann-Boeni, Boston University
Date: August 1993
Abstract:
We give an explicit and easy-to-verify characterization for subsets in
finite total orders (infinitely many of them in general) to be definable by
the same first-order formula over any class of finite total orders.
From this characterization we derive immediately that Beth's definability
theorem does not hold in any class of finite total orders, as well as that
McColm's first conjecture is true for all classes of finite total orders.
Another consequence is a natural 0-1 law for definable subsets on finite
total orders expressed as a statement about the possible densities of
first-order definable subsets.
::::::::::::::
1993-010
::::::::::::::
Title: Learning Unions of Rectangles with Queries
Author: Zhixiang Chen and Steve Homer, Boston University
Date: September 1993
Abstract:
We investigate the efficient learnability of unions of $k$ rectangles
in the discrete plane $\{1,\ldots,n\}^{2}$ with equivalence and
membership queries. We exhibit a learning algorithm that learns any
union of $k$ rectangles with $O(k^{3}\log n)$ queries, while the time
complexity of this algorithm is bounded by $O(k^{5}\log n)$. We
design our learning algorithm by finding ``corners'' and ``edges'' for
rectangles contained in the target concept and then constructing the
target concept from those ``corners'' and ``edges''. Our result
provides a first approach to on-line learning of nontrivial subclasses
of unions of intersections of halfspaces with equivalence and
membership queries.
::::::::::::::
1993-011
::::::::::::::
Title: Typability and Type Checking in the Second-Order Lambda-Calculus Are Equivalent and Undecidable
Author: J. B. Wells, Boston University
Date: September 1993
Abstract:
We consider the problems of typability and type checking in the
Girard/Reynolds second-order polymorphic typed lambda calculus, for which
we use the short name ``System F'' and which we use in the ``Curry style''
where types are assigned to pure lambda terms. These problems have been
considered and proven to be decidable or undecidable for various
restrictions and extensions of System F and other related systems, and
lower-bound complexity results for System F have been achieved, but they
have remained ``embarrassing open problems'' for System F itself. We
first prove that type checking in System F is undecidable by a reduction
from semi-unification. We then prove typability in System F is
undecidable by a reduction from type checking. Since the reverse
reduction is already known, this implies the two problems are equivalent.
The second reduction uses a novel method of constructing lambda terms such
that in all type derivations, specific bound variables must always be
assigned a specific type. Using this technique, we can require that
specific subterms must be typable using a specific, fixed type assignment
in order for the entire term to be typable at all. Any desired type
assignment may be simulated. We develop this method, which we call
``constants for free'', for both the lambda-K and lambda-I calculi.
::::::::::::::
1993-012
::::::::::::::
Title: Building Responsive Systems from Physically-correct Specifications
Author: Azer Bestavros, Boston University
Date: October 1993
Abstract:
Predictability -- the ability to foretell that an implementation will
not violate a set of specified reliability and timeliness requirements
-- is a crucial, highly desirable property of responsive embedded
systems. This paper overviews a development methodology for responsive
systems, which enhances predictability by eliminating potential
hazards resulting from physically-unsound specifications.
The backbone of our methodology is the Time-constrained Reactive
Automaton (TRA) formalism, which adopts a fundamental notion of space
and time that restricts expressiveness in a way that allows the
specification of only reactive, spontaneous, and causal computation.
Using the TRA model, unrealistic systems -- possessing properties such
as clairvoyance, caprice, infinite capacity, or perfect timing --
cannot even be specified. We argue that this ``ounce of prevention''
at the specification level is likely to spare a lot of time and energy
in the development cycle of responsive systems -- not to mention the
elimination of potential hazards that would have gone, otherwise,
unnoticed.
The TRA model is presented to system developers through the Cleopatra
programming language. Cleopatra features a C-like imperative syntax
for the description of computation, which makes it easier to
incorporate in applications already using C. It is event-driven, and
thus appropriate for embedded process control applications. It is
object-oriented and compositional, thus advocating modularity and
reusability. Cleopatra is semantically sound; its objects can be
transformed, mechanically and unambiguously, into formal TRA automata
for verification purposes, which can be pursued using model-checking
or theorem proving techniques. Since 1989, an ancestor of Cleopatra
has been in use as a specification and simulation language for
embedded time-critical robotic processes.
::::::::::::::
1993-013
::::::::::::::
Title: A Minimal GB Parser
Author: Marwan Shaban, Boston University
Date: October 1993
Abstract:
We describe a GB parser implemented along the lines of those written
by Fong [Fong91] and Dorr [Dorr87]. The phrase structure recovery
component is an implementation of Tomita's generalized LR parsing
algorithm (described in [Tomi86]), with recursive control flow
(similar to Fong's implementation). The major principles implemented
are government, binding, bounding, trace theory, case theory,
theta-theory, and barriers. The particular version of GB theory we
use is that described by Haegeman [Haeg91].
The parser is minimal in the sense that it implements the major
principles needed in a GB parser, and has fairly good coverage of
linguistically interesting portions of the English language.
::::::::::::::
1993-014
::::::::::::::
Title: Multi-version Speculative Concurrency Control with Delayed Commit
Author: Azer Bestavros and Biao Wang, Boston University
Date: October 1993
Abstract:
This paper presents an algorithm which extends the relatively new
notion of speculative concurrency control by delaying the commitment
of transactions, thus allowing other conflicting transactions to
continue execution and commit rather than restart. This algorithm
propagates uncommitted data to other outstanding transactions thus
allowing more speculative schedules to be considered. The algorithm is
shown always to find a serializable schedule, and to avoid cascading
aborts. Like speculative concurrency control, it considers strictly
more schedules than traditional concurrency control algorithms.
Further work is needed to determine which of these speculative methods
performs better on actual transaction loads.
::::::::::::::
1993-015
::::::::::::::
Title: How good are genetic algorithms at finding large cliques: an experimental study
Author: Bob Carter and Kihong Park, Boston University
Date: November 1993
Abstract:
This paper investigates the power of genetic algorithms at solving
the MAX-CLIQUE problem. We measure
the performance of a standard genetic algorithm on an elementary set
of problem instances consisting of embedded cliques in random graphs.
We indicate the need for improvement, and
introduce a new genetic algorithm, the {\em multi-phase annealed
GA}, which exhibits superior performance on the same
problem set.
As we scale up the problem size and test on ``hard''
benchmark instances, we notice a
degraded performance in the algorithm
caused by premature convergence to local minima.
To alleviate this problem,
a sequence of modifications is implemented, ranging from changes in
input representation to systematic local search. The most recent version,
called {\em union GA}, incorporates the features of union cross-over,
greedy replacement, and diversity enhancement. It shows a marked
speed-up in the number of iterations required to find a given
solution, as well as
some improvement in the clique size found.
We discuss issues related to the SIMD implementation of the
genetic algorithms on a Thinking Machines CM-5,
which was necessitated by the intrinsically
high time complexity ($O(n^3)$) of the serial algorithm for computing
one iteration.
Our preliminary conclusions are: (1) a genetic algorithm
needs to be heavily customized to work ``well'' for the clique problem;
(2) a GA is computationally very expensive, and its use is
only recommended if it is known to find larger cliques than other
algorithms; (3) although our customization effort is bringing forth
continued improvements, there is no clear evidence, at this time, that a
GA will have better success in circumventing local minima.
::::::::::::::
1993-016
::::::::::::::
Title: An Algebraic Characterization of First-Order Definability
Author: A.J. Kfoury and M. Wymann-Boeni
Date: November 1993
Abstract:
We give a variable-free relational calculus which defines exactly
all first-order definable relations in an arbitrary structure.
We then show that, over an arbitrary class $\C$ of finite ordered
structures with signature $\{ \LE, R_1, \ldots, R_\alpha \}$,
the unary relations uniformly defined by this calculus over $\C$
are characterized by another simplified variable-free calculus which we
call $\Q$. $\Q$ is the least set of formal expressions such that:
\begin{eqnarray*}
\Q &\supseteq&\ \{ \varnothing, R_1,\ldots, R_\alpha \}\ \cup\\
& &\ \{ (Q\PLUS x)\ |\ Q\in\Q, x\in\omega \cup \{\infty\} \}\ \cup
\ \{ (Q\MINUS x)\ |\ Q\in\Q, x\in\omega \cup \{\infty\} \}\ \cup \\
& &\ \{ (\NOT Q)\ |\ Q\in\Q\}\ \cup
\ \{ (Q_1\AND Q_2)\ |\ Q_1,Q_2\in\Q\}\ \cup
\ \{ (Q_1\OR Q_2)\ |\ Q_1,Q_2\in\Q\}\ .\
\end{eqnarray*}
where $\PLUS$ and $\MINUS$ are ``shift'' operators defined in Section 3.
::::::::::::::
1993-017
::::::::::::::
Title: A Direct Algorithm for Type Inference in the Rank 2 Fragment of the Second-Order Lambda-Calculus
Author: Joe Wells, Boston University
Date: November 1993
Abstract:
We study the problem of type inference for a family of polymorphic type
disciplines containing the power of Core-ML. This family comprises all
levels of the stratification of the second-order lambda-calculus by
``rank'' of types. We show that typability is an undecidable problem at
every rank k >= 3 of this stratification. While it was already known that
typability is decidable at rank <= 2, no direct and easy-to-implement
algorithm was available. To design such an algorithm, we develop a new
notion of reduction and show how to use it to reduce the problem of
typability at rank 2 to the problem of acyclic semi-unification. A
by-product of our analysis is the publication of a simple solution
procedure for acyclic semi-unification.
::::::::::::::
1993-018
::::::::::::::
Title: A General Theory of Semi-Unification
Author: Said Jahama and A. J. Kfoury
Date: December 1993
Abstract:
Various restrictions on the terms allowed for substitution give rise
to different cases of semi-unification. Semi-unification on finite
and regular terms has already been considered in the literature. We
introduce a general case of semi-unification where substitutions are
allowed on non-regular terms, and we prove the equivalence of this
general case to a well-known undecidable data base dependency problem,
thus establishing the undecidability of general semi-unification.
We present a unified way of looking at the various problems of
semi-unification. We give some properties that are common to all the
cases of semi-unification. We also study the principality property and the
solution set for those problems. We prove that semi-unification on
general terms has the principality property. Finally, we present a
recursive inseparability result between semi-unification on regular
terms and semi-unification on general terms.
::::::::::::::
1993-019
::::::::::::::
Title: Type Reconstruction in the Presence of Polymorphic Recursion and Recursive Types
Author: Said Jahama, Boston University
Date: December 1993
Abstract:
We establish that type reconstruction with polymorphic recursion and
recursive types is equivalent to regular semi-unification, which proves
the undecidability of the corresponding type reconstruction problem. We also
establish the equivalence of type reconstruction with polymorphic recursion
and positive recursive types to a special case of regular semi-unification
which we call positive regular semi-unification. The decidability of positive
regular semi-unification is an open problem.
::::::::::::::
1993-020
::::::::::::::
Title: AIDA-based Distributed File System
Author: Azer Bestavros and Mohammad Makarechian
Date: December 1993
Abstract:
This paper describes a prototype implementation of a Distributed File
System (DFS) based on the Adaptive Information Dispersal Algorithm
(AIDA). Using AIDA, a file block is encoded and dispersed into smaller
blocks stored on a number of DFS nodes distributed over a network. The
implementation devises file creation, read, and write operations. In
particular, when reading a file, the DFS accepts an optional timing
constraint, which it uses to determine the level of redundancy needed
for the read operation. The tighter the timing constraint, the more
nodes in the DFS are queried for encoded blocks. Write operations
update all blocks in all DFS nodes--with future implementations
possibly including the use of read and write quorums.
This work was conducted under the supervision of Professor Azer
Bestavros (best@cs.bu.edu) in the Computer Science Department as part
of Mohammad Makarechian's Master's project.
::::::::::::::
1994-001
::::::::::::::
Title: On the Performance of Polynomial-time CLIQUE Algorithms on Very Large Graphs
Author: Steve Homer and Marcus Peinado, Boston University
Date: January 1994
Abstract:
The performance of a randomized version of the subgraph-exclusion
algorithm (called Ramsey) for CLIQUE by Boppana and Halld\'{o}rsson is
studied on very large graphs. We compare the performance of this
algorithm with the performance of two common heuristic algorithms, the
greedy heuristic and a version of simulated annealing. These
algorithms are tested on graphs with up to 10,000 vertices on a
workstation and graphs as large as 70,000 vertices on a Connection
Machine. Our implementations establish the ability to run clique
approximation algorithms on very large graphs. We test our
implementations on a variety of different graphs. Our conclusions
indicate that on randomly generated graphs minor changes to the
distribution can cause dramatic changes in the performance of the
heuristic algorithms. The Ramsey algorithm, while not as good as the
others for the most common distributions, seems more robust and
provides a more even overall performance. In general, and especially
on deterministically generated graphs, a combination of simulated
annealing with either the Ramsey algorithm or the greedy heuristic
seems to perform best. This combined algorithm works particularly
well on large Keller and Hamming graphs and has a competitive overall
performance on the DIMACS benchmark graphs.
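For readers unfamiliar with the greedy heuristic used as a baseline above, the following Python sketch shows one common form of it. The abstract does not say which greedy variant was tested, so the degree-based ordering, the function name `greedy_clique`, and the adjacency-set representation are assumptions made for illustration only.

```python
def greedy_clique(adj):
    """Return a maximal clique found greedily.

    `adj` maps each vertex to the set of its neighbours in an undirected
    graph. Vertices are tried in order of decreasing degree (ties broken
    by vertex label); a vertex joins the clique only if it is adjacent to
    every vertex already chosen.
    """
    clique = []
    for v in sorted(adj, key=lambda u: (-len(adj[u]), u)):
        if all(v in adj[w] for w in clique):
            clique.append(v)
    return clique


if __name__ == "__main__":
    # A triangle {0, 1, 2} with a path 2-3-4 attached.
    graph = {
        0: {1, 2},
        1: {0, 2},
        2: {0, 1, 3},
        3: {2, 4},
        4: {3},
    }
    print(greedy_clique(graph))
```

The result is always a maximal clique (no further vertex can be added), but it is not guaranteed to be a maximum clique, which is why such heuristics are evaluated empirically.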
::::::::::::::
1994-002
::::::::::::::
Title: On Learning Counting Functions With Queries
Author: Zhixiang Chen and Steven Homer
Date: February 1994
Abstract:
We investigate the problem of learning disjunctions of counting
functions, which are general cases of parity and modulo functions,
with equivalence and membership queries. We prove that, for any prime
number $p$, the class of disjunctions of integer-weighted counting
functions with modulus $p$ over the domain $Z^{n}_{q}$ (or $Z^{n}$)
for any given integer $q \ge 2$ is polynomial time learnable using at
most $n+1$ equivalence queries, where the hypotheses issued by the
learner are disjunctions of at most $n$ counting functions with
weights from $Z_{p}$. The result is obtained through learning linear
systems over an arbitrary field. In general a counting function may
have a composite modulus. We prove that, for any given integer $q \ge
2$, over the domain $Z_{2}^{n}$, the class of read-once disjunctions
of Boolean-weighted counting functions with modulus $q$ is polynomial
time learnable with only one equivalence query, and the class of
disjunctions of $\log \log n$ Boolean-weighted counting functions with
modulus $q$ is polynomial time learnable. Finally, we present an
algorithm for learning graph-based counting functions.
::::::::::::::
1994-003
::::::::::::::
Title: Mapping parallel iterative algorithms onto workstation networks
Author: Abdelsalam Heddaya and Kihong Park, Boston University
Date: February 1994
Abstract:
For communication-intensive parallel applications, the maximum degree
of concurrency achievable is limited by the communication throughput
made available by the network. In previous work, we showed
experimentally that the performance of certain parallel applications
running on a workstation network can be enhanced significantly if a
congestion control protocol is used to enhance network performance.
In this paper, we characterize and analyze the communication
requirements of a large class of supercomputing applications that fall
under the category of fixed-point problems, amenable to solution by
parallel iterative methods. This results in a set of interface and
architectural features sufficient for the efficient implementation of
the application over a large-scale distributed system. In particular,
we propose a direct link between the application and network layer,
supporting congestion control actions at both ends. This in turn
enhances the system's responsiveness to network congestion, improving
performance.
Preliminary results of a prototype system are summarized showing the
efficacy of our scheme to support large-scale parallel computations.
We conclude with a description of a full implementation in progress.
KEYWORDS: Parallel iterative methods, congestion control,
communication architecture.
::::::::::::::
1994-004
::::::::::::::
Title: A Hybrid GLR Algorithm for Parsing with Epsilon Grammars
Author: Marwan Shaban, Boston University
Date: March 22, 1994
Abstract:
We give a hybrid algorithm for parsing $\epsilon$-grammars based on
Tomita's non-$\epsilon$-grammar parsing algorithm (\cite{tomi86}) and
Nozohoor-Farshi's $\epsilon$-grammar recognition algorithm
(\cite{fars91}). The hybrid parser handles the same set of grammars
handled by Nozohoor-Farshi's recognizer. The algorithm's details and
an example of its use are given. We also discuss the deployment of
the hybrid algorithm within a GB parser, and the reason an
$\epsilon$-grammar parser is needed in our GB parser.
::::::::::::::
1994-005
::::::::::::::
Title: Structure Sharing and Parallelization in a GB Parser
Author: Marwan Shaban, Boston University
Date: March 22, 1994
Abstract:
By utilizing structure sharing among its parse trees, a GB parser can
increase its efficiency dramatically. Using a GB parser which has as
its phrase structure recovery component an implementation of Tomita's
algorithm (as described in \cite{tomi86}), we investigate how a GB
parser can preserve the structure sharing output by Tomita's
algorithm. In this report, we discuss the implications of using
Tomita's algorithm in GB parsing, and we give some details of the
structure-sharing parser currently under construction. We also
discuss a method of parallelizing a GB parser, and relate it to the
existing literature on parallel GB parsing. Our approach to
preserving sharing within a shared-packed forest is applicable not
only to GB parsing, but also whenever we want to preserve structure sharing
in a parse forest in the presence of features.
::::::::::::::
1994-006
::::::::::::::
Title: Adding Polymorphic Abstraction to ML (Detailed Abstract)
Author: A. J. Kfoury and J. B. Wells, Boston University
Date: May 1994
Abstract:
The ML programming language restricts type polymorphism to occur only
in the ``let-in'' construct and requires every occurrence of a formal
parameter of a function (a lambda abstraction) to have the same type.
Milner in 1978 refers to this restriction (which was adopted to help
ML achieve automatic type inference) as a serious limitation. We show
that this restriction can be relaxed enough to allow universal
polymorphic abstraction without losing automatic type inference. This
extension is equivalent to the rank-2 fragment of System F. We
precisely characterize the additional program phrases (lambda terms)
that can be typed with this extension and we describe typing anomalies
both before and after the extension. We discuss how macros may be
used to gain some of the power of rank-3 types without losing
automatic type inference. We also discuss user-interface problems in
how to inform the programmer of the possible types a program phrase
may have.
::::::::::::::
1994-007
::::::::::::::
Title: Timeliness via Speculation for Real-Time Databases
Author: Azer Bestavros and Spyridon Braoudakis, Boston University
Date: May 1994
Abstract:
Various concurrency control algorithms differ in the time when
conflicts are detected, and in the way they are resolved. In that
respect, the Pessimistic and Optimistic Concurrency Control (PCC and
OCC) alternatives represent two extremes. PCC locking protocols detect
conflicts as soon as they occur and resolve them using {\em
blocking}. OCC protocols detect conflicts at transaction commit time
and resolve them using {\em rollbacks} (restarts). For real-time
databases, blockages and rollbacks are hazards that increase the
likelihood of transactions missing their deadlines. We propose a {\em
Speculative} Concurrency Control (SCC) technique that minimizes the
impact of blockages and rollbacks. SCC relies on the use of added
system resources to {\em speculate} on potential serialization orders
and to ensure that if such serialization orders materialize, the
hazards of blockages and rollbacks are minimized. We present a number
of SCC-based algorithms that differ in the level of speculation they
introduce, and the amount of system resources (mainly memory) they
require. We show the performance gains (in terms of number of
satisfied timing constraints) to be expected when a representative SCC
algorithm (SCC-2S) is adopted.
::::::::::::::
1994-008
::::::::::::::
Title: Towards Physically-Correct Specifications of Embedded Real-Time Systems
Author: Azer Bestavros, Boston University
Date: May 1994
Abstract:
Predictability (the ability to foretell that an implementation will
not violate a set of specified reliability and timeliness
requirements) is a crucial, highly desirable property of responsive
embedded systems. This paper overviews a development methodology for
responsive systems, which enhances predictability by eliminating
potential hazards resulting from physically-unsound specifications.
The backbone of our methodology is a formalism that restricts
expressiveness in a way that allows the specification of only
reactive, spontaneous, and causal computation. Unrealistic systems
(possessing properties such as clairvoyance, caprice, infinite
capacity, or perfect timing) cannot even be specified. We argue that
this ``ounce of prevention'' at the specification level is likely to
spare a lot of time and energy in the development cycle of responsive
systems -- not to mention the elimination of potential hazards that
would have gone, otherwise, unnoticed.
::::::::::::::
1994-009
::::::::::::::
Title: A lower-bound result on the power of a genetic algorithm
Author: Kihong Park, Computer Science Dept., Boston University
Date: October 12, 1994
Abstract:
This paper presents a lower-bound result on the computational power of
a genetic algorithm in the context of combinatorial optimization. We describe
a new genetic algorithm, the merged genetic algorithm, and prove that
for the class of monotonic functions, the algorithm finds the optimal solution,
and does so with an exponential convergence rate. The analysis pertains to the
ideal behavior of the algorithm where the main task reduces to showing
convergence of probability distributions over the search space of combinatorial
structures to the optimal one. We take exponential convergence to be indicative
of efficient solvability for the sample-bounded algorithm, although a sampling
theory is needed to better relate the limit behavior to actual behavior. The
paper concludes with a discussion of some immediate problems that lie ahead.
::::::::::::::
1994-010
::::::::::::::
Title: On the effectiveness of genetic search in combinatorial optimization
Author: Bob Carter and Kihong Park, Computer Science Dept, Boston University
Date: November 10, 1994
Abstract:
In this paper, we study the efficacy of genetic algorithms in the context
of combinatorial optimization. In particular, we isolate the effects of
cross-over, treated as the central component of genetic search. We show that
for problems of nontrivial size and difficulty, the contribution of cross-over
search is marginal, both synergistically when run in conjunction with mutation
and selection, and when run with selection alone, the reference point being
the search procedure consisting of just mutation and selection. The latter can
be viewed as another manifestation of the Metropolis process. Considering the
high computational cost of maintaining a population to facilitate cross-over
search, its marginal benefit renders genetic search inferior to its
singleton-population counterpart, the Metropolis process, and by extension,
simulated annealing. This is further compounded by the fact that many problems
arising in practice may inherently require a large number of state transitions
for a near-optimal solution to be found, making genetic search infeasible given
the high cost of computing a single iteration in the enlarged state-space.
::::::::::::::
1994-011
::::::::::::::
Title: Concurrency Control Protocols for Real-Time Databases, PhD Thesis
Author: Spyridon Braoudakis (Major Advisor: Azer Bestavros)
Date: November 12, 1994
Abstract:
Concurrency control methods developed for traditional database systems
are not appropriate for real-time database systems (RTDBS), where, in
addition to database consistency requirements, satisfying timing
constraints is an integral part of the correctness criterion. Most
real-time concurrency control protocols considered in the literature
combine time-critical scheduling with traditional concurrency control
methods to conform to transaction timing constraints. These methods
rely on either transaction {\em blocking} or {\em restarts}, both of
which are inappropriate for real-time concurrency control because of
the {\em unpredictability} they introduce. Moreover, RTDBS
performance objectives differ from those of conventional database
systems in that maximizing the number of transactions that complete
before their deadlines becomes the decisive performance objective,
rather than merely maximizing concurrency (or throughput). Recently,
Speculative Concurrency Control (SCC) was proposed as a categorically
different approach to concurrency control for RTDBS. SCC relies on
the use of {\em redundant} processes ({\em shadows}), which {\em
speculate} on alternative schedules, once conflicts that threaten the
consistency of the database are detected. SCC algorithms utilize added
system resources to ensure that correct (serializable) executions are
discovered and adopted as early as possible, thus increasing the
likelihood of the timely commitment of transactions.
This dissertation starts by reviewing the Order-Based SCC (SCC-OB)
algorithm which associates almost as many shadows as there are
serialization orders of transactions. After demonstrating SCC-OB's
excessive use of redundancy, a host of novel SCC-based protocols is
introduced. Conflict-Based SCC (SCC-CB) reduces the number of shadows
that a running transaction needs to keep by maintaining one shadow per
uncommitted conflicting transaction. It is shown that the quadratic
number of shadows maintained by SCC-CB is optimal, covering {\em all}
serialization orders produced by SCC-OB. SCC-CB's correctness is
established by showing that it admits only serializable histories.
Next, the trade-off between the number of shadows and timeliness is
considered. A generic SCC algorithm (SCC-kS) that operates under a
limited redundancy assumption is presented; it allows no more than a
constant number $k$ of shadows to coexist on behalf of any uncommitted
transaction. Next, a novel technique is proposed that incorporates
additional information such as {\em deadline}, {\em priority} and {\em
criticalness} within the SCC methodology. SCC with Deferred Commit
(SCC-DC) utilizes this additional information to improve the
timeliness through the controlled {\em deferment} of transaction
commitments. A probabilistic Value Induced Shadow Allocation (VISA)
policy is developed which aims at preserving the most {\em valuable}
shadows for each system transaction. The thesis of this dissertation
is that SCC-based algorithms offer a new dimension, {\em redundancy},
to improve the {\em timeliness} of RTDBS. SCC-based algorithms are
efficient (a quadratic number of shadows is optimal), scalable
(redundancy can be traded off for timeliness), and easily amendable
(deadline and priority information can be incorporated).
::::::::::::::
1994-012
::::::::::::::
Title: OS Support for Portable Bulk Synchronous Parallel Programs
Author: Abdelsalam Heddaya and Amr F. Fahmy (Harvard)
Date: December 5, 1994
Abstract:
For parallel programs to become portable, they must be executable with
uniform efficiency on a variety of hardware platforms, which is not
the case at present. In 1990, Valiant proposed Bulk-Synchronous
Parallelism (BSP) as a model on which portable parallel programs can
be built. We argue that shared-memory BSP is efficiently
implementable on a wide variety of parallel hardware, and that BSP
forms a useful basis for providing an even higher level programming
interface based on Sequential Consistency (SC). A list of memory and
thread management features needed to support BSP and SC parallel
programs is given, under the assumption that the parallel computer is
space-shared among multiple parallel tasks, rather than time-shared.
Known techniques to realize efficiently the most important of these
features are sketched.
::::::::::::::
1994-013
::::::::::::::
Title: An Algorithm for Inferring Quasi-Static Types
Author: Alberto Oliart
Date: November 1994
Abstract:
This report presents an algorithm, and its implementation, for doing type
inference in the context of Quasi-Static Typing (QST) ["Quasi-static
Typing." Satish Thatte, Proc. ACM Symp. on Principles of Programming
Languages, 1988]. The package infers types a la ``QST'' for the simply
typed lambda-calculus.
::::::::::::::
1994-014
::::::::::::::
Title: New Notions of Reduction and Non-Semantic Proofs of Beta-Strong
Normalization in Typed Lambda-Calculi
Author: A. J. Kfoury and J. B. Wells
Date: December 19, 1994
Abstract:
Two new notions of reduction for terms of the lambda-calculus are
introduced and the question of whether a lambda-term is beta-strongly
normalizing is reduced to the question of whether a lambda-term is merely
normalizing under one of the new notions of reduction. This leads to a
new way to prove beta-strong normalization for typed lambda-calculi.
Instead of the usual semantic proof style based on Girard's ``candidats de
r\'eductibilit\'e'', termination can be proved using a decreasing metric
over a well-founded ordering in a style more common in the field of term
rewriting. This new proof method is applied to the simply-typed
lambda-calculus and the system of intersection types.
::::::::::::::
1994-015
::::::::::::::
Title: Search by Shape Examples: Modeling Nonrigid Deformation
Author: S. Sclaroff and A. P. Pentland
Date: October, 1994
Abstract:
We describe our work on shape-based image database search using the
technique of modal matching. Modal matching employs a deformable shape
decomposition that allows users to select example objects and have the
computer efficiently sort the set of objects based on the similarity
of their shape. Shapes are compared in terms of the types of nonrigid
deformations (differences) that relate them. The modal decomposition
provides deformation ``control knobs'' for flexible matching and thus
allows for selecting weighted subsets of shape parameters that are
deemed significant for a particular category or context. We
demonstrate the utility of this approach for shape comparison in 2-D
image databases; however, the general formulation is applicable to
signals of any dimensionality.
::::::::::::::
1994-016
::::::::::::::
Title: Physically-Based Combinations of Views: Representing Rigid and Nonrigid Motion
Author: S. Sclaroff and A. P. Pentland
Date: November, 1994
Abstract:
Nonrigid motion can be described as morphing or blending between
extremal shapes, e.g., heart motion can be described as transitioning
between the systole and diastole states. Using physically-based
modeling techniques, shape similarity can be measured in terms of
forces and strain. This provides a physically-based coordinate system
in which motion is characterized in terms of physical similarity to a
set of extremal shapes. Having such a low-dimensional
characterization of nonrigid motion allows for the recognition and the
comparison of different types of nonrigid motion.
::::::::::::::
1995-001
::::::::::::::
Title: Proceedings of the Workshop on Versioning in Hypertext Systems
Author: David Durand, Anja Haake (GMD-IPSI, Germany),
David Hicks (Texas A&M), Fabio Vitali (CIRFID - University of Bologna)
Date: February 7, 1995
Abstract:
This report (ftp://cs-ftp.bu.edu/techreports/misc/95-001/Home.html)
contains 9 papers presented at a workshop on version management and
hypertext, as well as a summary introduction by the organizers. These
papers address requirements, solutions, and research issues related to
the management of hypertext databases. Version management is not only
a key application requirement in some domains (like design journals
and electronic manuals) but provides a way to preserve the integrity
of links in a changing hyperbase.
::::::::::::::
1995-002
::::::::::::::
Title: Application-Level Document Caching in the Internet
Author: Azer Bestavros, Robert L. Carter, Mark E. Crovella, Carlos R. Cunha, Abdelsalam Heddaya and Sulaiman A. Mirdad
Date: February 15, 1995
Abstract:
With the increasing demand for document transfer services such as the World
Wide Web comes a need for better resource management to reduce the latency
of documents in these systems. To address this need, we report on the
potential for document caching at the application level in document transfer
services. We collected traces of over 250 executions of Mosaic, reflecting
actual user requests for WWW documents. Using those traces, we study the
tradeoffs between caching at three levels in the system, and the potential
for use of application-level information in the caching system. Our traces
show that while a high hit rate in terms of URLs is achievable, a much lower
hit rate is possible in terms of bytes, because most profitably-cached
documents are small. We considered the performance of caching when applied
at the level of individual user sessions, at the level of individual hosts,
and at the level of a collection of hosts on a single LAN. We show that the
performance gain achievable by caching at the session level (which is
straightforward to implement) is nearly all of that achievable at the LAN
level (where caching is more difficult to implement). However, when
resource requirements are considered, LAN level caching becomes much more
desirable, since it can achieve a given level of caching performance using a
much smaller amount of cache space. Finally, we consider the use of
organizational boundary information as an example of the potential for use
of application-level information in caching. We show that while it is
desirable to cache local documents at the LAN level, the opposite is true at
the session level, where remote documents are more profitably cached.
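The gap between URL hit rate and byte hit rate noted above can be
illustrated with a minimal sketch. It is not the authors' simulator: the
trace format (pairs of URL and size in bytes), the function name, and the
assumption of an unbounded cache are all hypothetical choices made for
illustration.

```python
def hit_rates(trace):
    """Return (url_hit_rate, byte_hit_rate) for a request trace.

    trace: iterable of (url, size_in_bytes) pairs (hypothetical format).
    Assumes an unbounded cache, so every repeat request is a hit.
    """
    seen = set()
    hits = hit_bytes = total = total_bytes = 0
    for url, size in trace:
        total += 1
        total_bytes += size
        if url in seen:
            # Repeat request: served from the cache.
            hits += 1
            hit_bytes += size
        else:
            # First request: fetched remotely, then cached.
            seen.add(url)
    if total == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / total, hit_bytes / total_bytes


# A small document requested repeatedly and a large one requested once.
example_trace = [("a", 100), ("b", 10000), ("a", 100), ("a", 100)]
url_rate, byte_rate = hit_rates(example_trace)
```

In this toy trace half of the requests are hits, yet the hits account for
only a small fraction of the bytes transferred, because the repeatedly
requested document is the small one.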
::::::::::::::
1995-003
::::::::::::::
Title: Demand-based Document Dissemination for the World-Wide Web
Author: Azer Bestavros
Date: February 15, 1995
Abstract:
We analyzed the logs of our departmental HTTP server {\tt
http://cs-www.bu.edu} as well as the logs of the more popular Rolling
Stones HTTP server {\tt http://www.stones.com}. These servers have
very different purposes; the former caters primarily to local clients,
whereas the latter caters exclusively to remote clients all over the
world. In both cases, our analysis showed that remote HTTP accesses
were confined to a very small subset of documents. Using an
analytical model of server popularity and file access profiles, we
show that by disseminating the most popular documents on servers
(proxies) closer to the clients, network traffic could be reduced
considerably, while server loads are balanced. We argue that this
process could be generalized so as to provide for an automated
demand-based duplication of documents. In that respect, we sketch the
DDD-WWW protocol to implement this Demand-based Document Dissemination
on the WWW. We believe that such server-based information
dissemination protocols will be more effective at reducing {\em both}
network bandwidth and document retrieval times than client-based
caching protocols \cite{bestavros:95c}.
::::::::::::::
1995-004
::::::::::::::
Title: Equational Axiomatization of Bicoercibility for Polymorphic Types
Author: Jerzy Tiuryn, Institute of Informatics, Warsaw University
Date: February 16, 1995
Abstract:
Two polymorphic types $\sigma$ and $\tau$ are said to be bicoercible if
there is a coercion from $\sigma$ to $\tau$ and conversely. We give a
complete equational axiomatization of bicoercible types and prove that the
relation of bicoercibility is decidable.
::::::::::::::
1995-005
::::::::::::::
Title: Speculative Concurrency Control with Deferred Commitment for Real-Time Databases
Author: Azer Bestavros and Spyridon Braoudakis
Date: February 20, 1995
Abstract:
A problem with Speculative Concurrency Control algorithms and other
common concurrency control schemes using forward validation is that
committing a transaction as soon as it finishes validating may result
in a value loss to the system. Haritsa showed that by making a lower
priority transaction wait after it is validated, the number of
transactions meeting their deadlines is increased, which may result in
a higher value-added to the system. SCC-based protocols can benefit
from the introduction of such delays by giving optimistic shadows with
high value-added to the system more time to execute and commit instead
of being aborted in favor of other validating transactions, whose
value-added to the system is lower. In this paper we present and
evaluate an extension to SCC algorithms that allows for commit
deferments.
::::::::::::::
1995-006
::::::::::::::
Title: Using Speculation to Reduce Server Load and Service Time on the WWW
Author: Azer Bestavros
Date: February 21, 1995
Abstract:
Speculative service implies that a client's request for a document is
serviced by sending, in addition to the document requested, a number
of other documents that the server speculates will be requested by the
client in the near future. This speculation is based on statistical
information that the server maintains for each document it serves. The
notion of speculative service is analogous to prefetching, which is
used to improve cache performance in distributed/parallel shared
memory systems, with the exception that servers (not clients) control
when and what to prefetch. Using trace simulations based on the logs
of our departmental HTTP server http://cs-www.bu.edu, we show that
both server load and service time could be reduced considerably, if
speculative service is used. This is above and beyond what is
currently achievable using client-side caching and server-side
dissemination. We identify a number of parameters that could be used
to fine-tune the level of speculation performed by the server.
::::::::::::::
1995-007
::::::::::::::
Title: Addendum to ``New Notions of Reduction and Non-Semantic Proofs of Beta Strong Normalization in Typed Lambda Calculi''
Author: A. J. Kfoury and J. B. Wells
Date: March 1995
Abstract:
This is an addendum to our technical report BUCS TR-94-014 of December
19, 1994. It clarifies some statements, adds information on some
related research, includes a comparison with research by de Groote, and
fixes two minor mistakes in a proof.
::::::::::::::
1995-008
::::::::::::::
Title: Modal Matching for Correspondence and Recognition
Author: S. Sclaroff and A. P. Pentland
Date: March 1995
Abstract:
Modal matching is a new method for establishing correspondences and
computing canonical descriptions. The method is based on the idea of
describing objects in terms of generalized symmetries, as defined by
each object's eigenmodes. The resulting modal description is
used for object recognition and categorization, where shape
similarities are expressed as the amounts of modal deformation energy
needed to align the two objects. In general, modes provide a
global-to-local ordering of shape deformation and thus allow for
selecting which types of deformations are used in object alignment and
comparison. In contrast to previous techniques, which required
correspondence to be computed with an initial or prototype shape,
modal matching utilizes a new type of finite element formulation that
allows for an object's eigenmodes to be computed directly from
available image information. This improved formulation provides
greater generality and accuracy, and is applicable to data of any
dimensionality. Correspondence results with 2-D contour and point
feature data are shown, and recognition experiments with 2-D images of
hand tools and airplanes are described.
::::::::::::::
1995-009
::::::::::::::
Title: A New Version of Toom's Proof
Author: Peter Gacs
Date: March 27, 1995
Abstract:
There are several proofs now for the stability of Toom's example of a
two-dimensional stable cellular automaton and its application to
fault-tolerant computation. Simon and Berman simplified and
strengthened Toom's original proof: the present report is a simplified
exposition of their proof.
::::::::::::::
1995-010
::::::::::::::
Title: Characteristics of WWW Client-based Traces
Author: Carlos Cunha, Azer Bestavros, and Mark Crovella
Date: April 1, 1995 (modified July 18, 1995)
Abstract:
The explosion of WWW traffic necessitates an accurate picture of WWW
use, and in particular requires a good understanding of client
requests for WWW documents. To address this need, we have collected
traces of actual executions of NCSA Mosaic, reflecting over half a
million user requests for WWW documents. In this paper we present a
descriptive statistical summary of the traces we collected, which
identifies a number of trends and reference patterns in WWW use. In
particular, we show that many characteristics of WWW use can be
modelled using power-law distributions, including the distribution of
document sizes, the popularity of documents as a function of size, the
distribution of user requests for documents, and the number of
references to documents as a function of their overall rank in
popularity (Zipf's law). In addition, we show how the power-law
distributions derived from our traces can be used to guide system
designers interested in caching WWW documents.
---
Our client-based traces are available via FTP from
ftp://cs-ftp.bu.edu/techreports/1995-010-www-client-traces.tar.gz
::::::::::::::
1995-011
::::::::::::::
Title: A Prefetching Protocol Using Client Speculation for the WWW
Author: Azer Bestavros and Carlos Cunha
Date: April 28, 1995
Abstract:
The growing traffic of WWW related services requires the development of
efficient protocols for reducing traffic, balancing load, and improving
service time. One way of achieving these effects is via caching or
replication. Studies like [Bestavros et al, 1995] show that simple
demand-driven caching is not enough, and that aggressive caching
policies have to be adopted. One such policy is prefetching.
In an earlier paper, the potential of speculation (server-initiated
prefetching) in distributed information systems (such as the WWW) was
investigated and shown to be effective in reducing service time and
server load. This speculation was based on statistical information
that the server maintains for each document it serves. In this paper
we study the performance of a client-initiated prefetching protocol,
whereby speculation is based on past user-specific access patterns.
We propose a technique whereby the history of a user is analyzed to
predict his/her future accesses. Our technique does not make a
distinction between embedded links and traversal links. In
particular, embedded links are treated as a special case of
traversal links with the probability of traversal being 1. We show
that performance gains are possible to obtain by identifying common
access patterns. Our study was conducted using client-based traces
obtained from our departmental labs over a period of 100 days
[Cunha et al, 1995].
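The idea of predicting accesses from a user's history can be sketched as
follows. This is only an illustration under stated assumptions, not the
protocol evaluated in the report: the history format (an ordered list of
document names), the first-order successor-count model, the function names,
and the threshold parameter are all hypothetical.

```python
from collections import Counter, defaultdict


def build_model(history):
    """Estimate P(next document | current document) from one user's history.

    history: ordered list of document names (hypothetical format).
    Returns {doc: {successor: probability}}.
    """
    visits = Counter()
    follows = defaultdict(Counter)
    for cur, nxt in zip(history, history[1:]):
        visits[cur] += 1
        follows[cur][nxt] += 1
    return {
        doc: {nxt: count / visits[doc] for nxt, count in successors.items()}
        for doc, successors in follows.items()
    }


def prefetch_candidates(model, doc, threshold=0.5):
    """Documents whose estimated traversal probability meets the threshold."""
    return sorted(n for n, p in model.get(doc, {}).items() if p >= threshold)


# logo.gif always follows index.html, as an embedded link would.
history = ["index.html", "logo.gif", "a.html",
           "index.html", "logo.gif", "b.html"]
model = build_model(history)
```

Under this model a document that always follows another, as an embedded
image does, simply gets an estimated probability of 1, so it needs no
special handling.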
::::::::::::::
1995-012
::::::::::::::
Title: Object-Oriented Animation on the World Wide Web
Author: Patrick Cai and Azer Bestavros
Date: May 8, 1995
Abstract:
We propose that video/audio animation be considered as a first-class
object on the World Wide Web. Animation is a very "bandwidth-efficient"
alternative to using video streams, especially for presentations
involving mathematical objects and interactions. We present an
object-oriented model that supports drawing-based and frame-based
animation. Based on that model, we describe an extension of the HyperText
Markup Language to support these capabilities. BU-NCSA Mosanim, a
modified version of NCSA Mosaic for X (v2.5), was developed and is
available for distribution via anonymous FTP to demonstrate the concepts
and potentials of animation in presentations and interactive game playing
over the web.
::::::::::::::
1995-013
::::::::::::::
Title: Simulation of Hardware Dynamic Scheduling on the DLX Architecture
Author: Azer Bestavros and Yueh-Lin Liu
Date: June 6, 1995
Abstract:
We describe our extension of the existing DLX simulator (DLXsim),
available from the University of California at Berkeley, which allows
the simulation of two hardware dynamic scheduling techniques. There
are two DLXsim-like interactive simulators developed as part of this
project. DLXscore simulates the operation of a DLX architecture
equipped with scoreboarding hardware. DLXscore provides the status of
instructions, scoreboard tables, and statistics. DLXtomasulo simulates
the operation of a DLX architecture equipped with a hardware
implementation of Tomasulo's algorithm. DLXtomasulo provides the
status of instructions, reservation stations, and statistics. Both
programs allow the user to configure the number of functional units
and the latency of floating point operations.
::::::::::::::
1995-014
::::::::::::::
Title: Dynamic Server Selection in the Internet
Author: Mark E. Crovella and Robert L. Carter
Date: June 30, 1995
Abstract:
As distributed information services like the World Wide Web become
increasingly popular on the Internet, problems of scale are clearly
evident. A promising technique that addresses many of these problems is
service (or document) replication. However, when a service is
replicated, clients then need the additional ability to find a ``good''
provider of that service. In this paper we report on techniques for
finding good service providers without a priori knowledge of server
location or network topology. We consider the use of two principal
metrics for measuring distance in the Internet: hops, and round-trip
latency. We show that these two metrics yield very different results in
practice. Surprisingly, we show data indicating that the number of hops
between two hosts in the Internet is {\em not\/} strongly correlated to
round-trip latency. Thus, the distance in hops between two hosts is not
necessarily a good predictor of the expected latency of a document
transfer. Instead of using known or measured distances in hops, we show
that the extra cost at runtime incurred by dynamic latency measurement
is well justified based on the resulting improved performance. In
addition we show that selection based on dynamic latency measurement
performs much better in practice than any static selection scheme.
Finally, the difference between the distribution of hops and latencies
is fundamental enough to suggest differences in algorithms for server
replication. We show that conclusions drawn about service replication
based on the distribution of hops need to be revised when the
distribution of latencies is considered instead.
::::::::::::::
1995-015
::::::::::::::
Title: Explaining World Wide Web Traffic Self-Similarity
Author: Mark E. Crovella and Azer Bestavros
Date: August 29, 1995
Abstract:
Recently the notion of self-similarity has been shown to apply to
wide-area and local-area network traffic. In this paper we examine the
mechanisms that give rise to self-similar network traffic. We present
an explanation for traffic self-similarity by using a particular subset
of wide area traffic: traffic due to the World Wide Web (WWW). Using an
extensive set of traces of actual user executions of NCSA Mosaic,
reflecting over half a million requests for WWW documents, we show
evidence that WWW traffic is self-similar. Then we show that the
self-similarity in such traffic can be explained based on the underlying
distributions of WWW document sizes, the effects of caching and user
preference in file transfer, the effect of user ``think time'', and the
superimposition of many such transfers in a local area network. To do
this we rely on empirically measured distributions both from our traces
and from data independently collected at over thirty WWW sites.
::::::::::::::
1995-016
::::::::::::::
Title: World Wide Web Image Search Engines
Author: Stan Sclaroff
Date: 27 May 1995
Abstract:
We propose the development of a world wide web image search engine
that crawls the web collecting information about the images it finds,
computes the appropriate image decompositions and indices, and stores
this extracted information for searches based on image content.
Indexing and searching images need not require solving the image
understanding problem. Instead, the general approach should be to
provide an arsenal of image decompositions and discriminants that can
be precomputed for images. At search time, users can select a
weighted subset of these decompositions to be used for computing image
similarity measurements. While this approach avoids the
search-time-dependent problem of labeling what is important in images,
it still holds several important problems that require further
research in the area of query by image content. We briefly explore
some of these problems as they pertain to shape.
(white paper presented at the NSF Workshop on Visual Information
Management, MIT, June 1995)
::::::::::::::
1995-017
::::::::::::::
Title: Deformable Prototypes for Encoding Shape Categories in Image Databases
Author: Stan Sclaroff
Date: Sept 12, 1995
Abstract:
We describe a method for shape-based image database search that uses
deformable prototypes to represent categories. Rather than directly
comparing a candidate shape with all shape entries in the database,
shapes are compared in terms of the types of nonrigid deformations
(differences) that relate them to a small subset of representative
prototypes. To solve the shape correspondence and alignment problem,
we employ the technique of {\em modal matching}, an
information-preserving shape decomposition for matching, describing,
and comparing shapes despite sensor variations and nonrigid
deformations. In modal matching, shape is decomposed into an ordered
basis of orthogonal principal components. We demonstrate the utility
of this approach for shape comparison in 2-D image databases.
::::::::::::::
1995-018
::::::::::::::
Title: Deterministic Computations Whose History is Independent of
the Order of Updating
Author: Peter Gacs
Date: November 18, 1995
Abstract:
Consider a network of processors (sites) in which each site has
finitely many neighbors. Each site has some transition function
computing its next state from the states of the neighbors. These
transitions (updates) are applied in arbitrary order, one or many at a
time.
If the state of site x at time t is r(x,t) then let us define the
sequence r'(x,0),r'(x,1),... by taking the sequence
r(x,0),r(x,1),... and deleting each repetition, i.e. each element
equal to the preceding one.
The system of transition functions is said to support asynchrony if
the sequence r'(x,i), (while it lasts, in case it is finite) depends
only on the initial configuration, not on the order of updates.
This paper gives a simple characterization of transition functions
supporting asynchrony. The characterization says that it is
equivalent to the following seemingly weaker commutativity condition:
For any configuration, for any pair x,y of neighbors, if the updating
would change both s(x) and s(y) then the result of updating first x
and then y is the same as the result of doing this in the reverse
order.
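
The commutativity condition above can be checked by brute force on a
small example. The sketch below is only an illustration under stated
assumptions, not the paper's construction: it assumes a ring of n sites
in which each site's transition function f(left, own, right) reads its
two ring neighbors, and the names update and satisfies_commutativity
are hypothetical.

```python
from itertools import product

def update(config, x, f):
    """Return a new configuration in which only site x has been updated.

    Assumption: sites form a ring, and f(left, own, right) gives the
    next state of a site from its own state and its two neighbors.
    """
    n = len(config)
    new = list(config)
    new[x] = f(config[(x - 1) % n], config[x], config[(x + 1) % n])
    return tuple(new)

def satisfies_commutativity(f, states, n):
    """Check the pairwise condition over all configurations of the ring.

    For every configuration and every pair of neighbors x, y: if an
    update would change both sites, then updating x then y must give
    the same configuration as updating y then x.
    """
    for config in product(states, repeat=n):
        for x in range(n):
            y = (x + 1) % n  # every neighbor pair on the ring
            changes_x = update(config, x, f) != config
            changes_y = update(config, y, f) != config
            if changes_x and changes_y:
                xy = update(update(config, x, f), y, f)
                yx = update(update(config, y, f), x, f)
                if xy != yx:
                    return False
    return True

# A monotone "spread the maximum" rule passes the check.
spread_max = lambda left, own, right: max(left, own, right)
# "Negate the left neighbor" fails it: the order of updates matters.
negate_left = lambda left, own, right: 1 - left
```

Here satisfies_commutativity(spread_max, (0, 1), 4) holds, while
satisfies_commutativity(negate_left, (0, 1), 4) does not.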
::::::::::::::
1995-019
::::::::::::::
Title: The Undecidability of Mitchell's Subtyping Relationship
Author: J. B. Wells
Date: December 10, 1995
Abstract:
Mitchell defined and axiomatized a subtyping relationship (also known as
containment, coercibility, or subsumption) over the types of System F
(with "arrow" and "forall"). This subtyping relationship is quite simple
and does not involve bounded quantification. Tiuryn and Urzyczyn quite
recently proved this subtyping relationship to be undecidable. This paper
supplies a new undecidability proof for this subtyping relationship.
First, a new syntax-directed axiomatization of the subtyping relationship
is defined. Then, this axiomatization is used to prove a reduction from
the undecidable problem of semi-unification to subtyping. The
undecidability of subtyping implies the undecidability of type checking
for System F extended with Mitchell's subtyping, also known as F plus eta.
::::::::::::::
1996-001
::::::::::::::
Title: AIDA-based Real-Time Fault-Tolerant Broadcast Disks
Author: Azer Bestavros
Date: January 6, 1996
Abstract:
The proliferation of mobile computers and wireless networks requires
the design of future distributed real-time applications to recognize
and deal with the significant asymmetry between downstream and
upstream communication capacities, and the significant disparity
between server and client storage capacities. Recent research work
proposed the use of Broadcast Disks as a scalable mechanism to deal
with this problem. In this paper, we propose a new broadcast disks
protocol, based on our Adaptive Information Dispersal Algorithm
(AIDA). Our protocol is different from previous broadcast disks
protocols in that it improves communication timeliness,
fault-tolerance, and security, while allowing for a finer control of
multiplexing of prioritized data (broadcast frequencies). We start
with a general introduction of broadcast disks. Next, we propose
broadcast disk organizations that are suitable for real-time
applications. Next, we present AIDA and show its fault-tolerance and
security properties. We conclude the paper with the description and
analysis of AIDA-based broadcast disks organizations that achieve both
timeliness and fault-tolerance, while preserving downstream
communication capacity.
::::::::::::::
1996-002
::::::::::::::
Title: An Admission Control Paradigm for Real-Time Databases
Author: Azer Bestavros and Sue Nagy
Date: January 11, 1996
Abstract:
We propose and evaluate an admission control paradigm for RTDBS, in
which a transaction is submitted to the system as a pair of processes:
a primary task, and a recovery block. The execution requirements of
the primary task are not known a priori, whereas those of the recovery
block are known a priori. Upon the submission of a transaction, an
Admission Control Mechanism is employed to decide whether to admit or
reject that transaction. Once admitted, a transaction is guaranteed to
finish executing before its deadline. A transaction is considered to
have finished executing if exactly one of two things occurs: Either its
primary task is completed (successful commitment), or its recovery
block is completed (safe termination). Committed transactions bring a
profit to the system, whereas a terminated transaction brings no
profit. The goal of the admission control and scheduling protocols
(e.g., concurrency control, I/O scheduling, memory management)
employed in the system is to maximize system profit. We describe a
number of admission control strategies and contrast (through
simulations) their relative performance.
::::::::::::::
1996-003
::::::::::::::
Title: Advances in Real-Time Database Systems Research:
Special Section on RTDBS of ACM SIGMOD Record 25(1), March 1996.
Author: Azer Bestavros
Date: January 15, 1996
Abstract:
A Real-Time DataBase System (RTDBS) can be viewed as an
amalgamation of a conventional DataBase Management System (DBMS) and a
real-time system. Like a DBMS, it has to process transactions and
guarantee ACID database properties. Furthermore, it has to operate in
real-time, satisfying time constraints imposed on transaction
commitments. A RTDBS may exist as a stand-alone system or as an
embedded component in a larger multidatabase system. The publication
in 1988 of a special issue of ACM SIGMOD Record on Real-Time DataBases
signaled the birth of the RTDBS research area---an area that brings
together researchers from both the database and real-time systems
communities. Today, almost eight years later, I am pleased to present
in this special section of ACM SIGMOD Record a review of recent
advances in RTDBS research. There were 18 submissions to this special
section, of which eight papers were selected for inclusion to provide
the readers of ACM SIGMOD Record with an overview of current and
future research directions within the RTDBS community. In this paper,
I will summarize these directions and provide the reader with pointers
to other publications for further information.
::::::::::::::
1996-004
::::::::::::::
Title: On the Fractal Nature of WWW and Its Application to Cache Modeling
Author: Virgilio A. F. Almeida and Adriana Oliveira
Date: February 5, 1996
Abstract:
The World Wide Web (WWW or Web) is growing rapidly on the Internet.
Web users want fast response time and easy access to an enormous
variety of information across the world. Thus, performance is becoming
a main issue in the Web. Fractals have been used to study fluctuating
phenomena in many different disciplines, from the distribution of
galaxies in astronomy to complex physiological control systems. The
Web is also a complex, irregular, and random system. In this paper,
we look at the document reference
pattern at Internet Web servers and use fractal-based models
to understand aspects (e.g. caching schemes) that affect the
Web performance.
::::::::::::::
1996-005
::::::::::::::
Title: Distributed Parallel Computing in Mermera:
Mixing Noncoherent Shared Memories
Author: A. Heddaya and H.S. Sinha (GTE Labs)
Date: March 7, 1996
Abstract:
Programmers of parallel processes that communicate through shared globally
distributed data structures (DDS) face a difficult choice. Either they must
explicitly program DDS management, by partitioning or replicating it over
multiple distributed memory modules, or be content with a high latency
coherent (sequentially consistent) memory abstraction that hides the DDS'
distribution. We present Mermera, a formalism and system that enables a
smooth spectrum of noncoherent shared memory behaviors to coexist between the
above two extremes. Our approach allows us to define known noncoherent
memories in a new simple way, to identify new memory behaviors, and to
characterize generic mixed-behavior computations. The latter are useful for
programming using multiple behaviors that complement each other's advantages,
and for programming by step-wise refinement.
On the practical side, we show that the large class of programs that use
asynchronous iterative methods (AIM) can run correctly on slow memory, one of
the weakest, and hence most efficient and fault-tolerant, noncoherence
conditions. An example AIM program to solve linear equations is developed to
illustrate the need for concurrently mixing memory behaviors, and the
performance gains attainable via noncoherence. Other program classes tolerate
weak memory consistency by synchronizing in such a way as to yield executions
indistinguishable from coherent ones. AIM computations on noncoherent memory
yield noncoherent, yet correct, computations. We present performance data
that illustrate the benefits of noncoherence, in terms of raw memory
performance, as well as application speed.
Keywords: Distributed parallel computing, noncoherent shared memory,
asynchronous iterative algorithms, network of workstations.
::::::::::::::
1996-006
::::::::::::::
Title: Measuring Bottleneck Link Speed in Packet-Switched Networks
Authors: Robert L. Carter and Mark E. Crovella
Date: March 15, 1996
Abstract:
The quality of available network connections can often have a large
impact on the performance of distributed applications. For example,
document transfer applications such as FTP, Gopher and the World Wide
Web suffer increased response times as a result of network
congestion. For these applications, the document transfer time is
directly related to the available bandwidth of the connection.
Available bandwidth depends on two things: 1) the underlying capacity
of the path from client to server, which is limited by the
bottleneck link; and 2) the amount of other traffic competing for
links on the path. If measurements of these quantities were available
to the application, the current utilization of connections could be
calculated. Network utilization could then be used as a basis for
selection from a set of alternative connections or servers, thus
providing reduced response time. Such a dynamic server selection
scheme would be especially important in a mobile computing environment
in which the set of available servers is frequently changing.
In order to provide these measurements at the application level, we
introduce two tools: bprobe, which provides an estimate of the
uncongested bandwidth of a path; and cprobe, which gives an
estimate of the current congestion along a path. These two measures
may be used in combination to provide the application with an estimate
of available bandwidth between server and client thereby enabling
application-level congestion avoidance.
In this paper we discuss the design and implementation of our probe
tools, specifically illustrating the techniques used to achieve
accuracy and robustness. We present validation studies for both tools
which demonstrate their reliability in the face of actual Internet
conditions; and we give results of a survey of available bandwidth to
a random set of WWW servers as a sample application of our probe
technique. We conclude with descriptions of other applications of our
measurement tools, several of which are currently under development.
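
As a rough illustration of how the two measurements might be combined,
the sketch below takes an uncongested-bandwidth estimate (the kind of
value bprobe reports) and an estimate of competing traffic (the kind of
value cprobe reports) and derives available bandwidth and utilization.
The formulas, function names, and units are assumptions made for this
example; the abstract does not specify how the tools combine their
results.

```python
def available_bandwidth(capacity_bps, competing_bps):
    """Assumed model: bandwidth left over after competing traffic."""
    return max(0.0, capacity_bps - competing_bps)

def utilization(capacity_bps, competing_bps):
    """Assumed model: fraction of the bottleneck capacity in use."""
    if capacity_bps <= 0:
        raise ValueError("capacity must be positive")
    return min(1.0, competing_bps / capacity_bps)

def pick_server(estimates):
    """Choose the server with the most estimated available bandwidth.

    estimates maps a (hypothetical) server name to a pair
    (capacity_bps, competing_bps) measured along the path to it.
    """
    return max(estimates, key=lambda s: available_bandwidth(*estimates[s]))

# Example: a fast but congested path versus a slower, idle one.
paths = {"a": (10e6, 9e6), "b": (2e6, 0.5e6)}
best = pick_server(paths)
```

In this example the slower path "b" wins, because its estimated
available bandwidth (1.5 Mb/s) exceeds that of the congested path "a"
(1 Mb/s).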
::::::::::::::
1996-007
::::::::::::::
Title: Dynamic Server Selection using Bandwidth Probing in Wide-Area Networks
Authors: Robert L. Carter and Mark E. Crovella
Date: February 2, 1996
Abstract:
Replication is a commonly proposed solution to problems of scale
associated with distributed services. However, when a service is
replicated, each client must be assigned a server. Prior work has
generally assumed that assignment to be static. In contrast, we propose
dynamic server selection, and show that it enables
application-level congestion avoidance.
To make dynamic server selection practical, we demonstrate the use
of three tools. In addition to direct measurements of round-trip latency,
we introduce and validate two new tools: bprobe, which estimates
the maximum possible bandwidth along a given path; and cprobe, which
estimates the current congestion along a path.
Using these tools we demonstrate dynamic server selection and compare it
to previous static approaches. We show that dynamic server selection
consistently outperforms static policies by as much as 50%. Furthermore,
we demonstrate the importance of each of our tools in performing dynamic
server selection.
::::::::::::::
1996-008
::::::::::::::
Title: Responsive Web Computing: Resource Management, Protocol Techniques, and Applications (A research statement)
Authors: Azer Bestavros, Marina Chen, Mark Crovella, Abdelsalam Heddaya, Stan Sclaroff, and James Cowie (Cooperating Systems Corporation)
Date: March 21, 1996
Abstract:
The exploding demand for services like the World Wide Web reflects the
potential that is presented by globally distributed information systems.
The number of WWW servers world-wide has doubled every 3 to 5 months since
1993, outstripping even the growth of the Internet. At each of these
self-managed sites, the Common Gateway Interface (CGI) and Hypertext
Transfer Protocol (HTTP) already constitute a rudimentary basis for
contributing local resources to remote collaborations.
However, the Web has serious deficiencies that make it unsuited for use
as a true medium for metacomputing --- the process of bringing
hardware, software, and expertise from many geographically dispersed
sources to bear on large scale problems. These deficiencies are,
paradoxically, the direct result of the very simple design principles
that enabled its exponential growth.
There are many symptoms of the problems exhibited by the Web: disk and
network resources are consumed extravagantly; information search and
discovery are difficult; protocols are aimed at data movement rather than
task migration, and ignore the potential for distributing computation.
However, all of these can be seen as aspects of a single
problem: as a distributed system for metacomputing, the Web offers
unpredictable performance and unreliable results.
The goal of our project is to use the Web as a medium (within either
the global Internet or an enterprise intranet) for metacomputing in a
reliable way with performance guarantees. We attack this problem on
four levels:
(1) Resource Management Services:
Globally distributed computing allows novel approaches to the old
problems of performance guarantees and reliability. Our first set of
ideas involve setting up a family of real-time resource management
models organized by the Web Computing Framework with a standard
Resource Management Interface (RMI), a Resource Registry, a Task
Registry, and resource management protocols to allow resource needs
and availability information to be collected and disseminated so that a
family of algorithms with varying computational precision and accuracy
of representations can be chosen to meet realtime and reliability constraints.
(2) Middleware Services:
Complementary to techniques for allocating and scheduling available
resources to serve application needs under realtime and reliability
constraints, the second set of ideas aims at reducing communication
latency, traffic congestion, server workload, etc. We develop
customizable middleware services to exploit application
characteristics in traffic analysis to drive new server/browser design
strategies (e.g., exploit self-similarity of Web traffic), derive
document access patterns via multiserver cooperation, and use them in
speculative prefetching, document caching, and aggressive replication
to reduce server load and bandwidth requirements.
(3) Communication Infrastructure:
Finally, to achieve any guarantee of quality of service or
performance, one must get at the network layer that can provide the
basic guarantees of bandwidth, latency, and reliability. Therefore,
the third area is a set of new techniques in network service and
protocol designs.
(4) Object-Oriented Web Computing Framework:
A useful resource management system must deal with job priority,
fault-tolerance, quality of service, complex resources such as ATM
channels, probabilistic models, etc., and models must be tailored to
represent the best tradeoff for a particular setting. This requires a
family of models, organized within an object-oriented framework,
because no one-size-fits-all approach is appropriate. This presents a
software engineering challenge requiring integration of solutions at
all levels: algorithms, models, protocols, and profiling and
monitoring tools. The framework captures the abstract class
interfaces of the collection of cooperating components, but allows the
concretization of each component to be driven by the requirements of a
specific approach and environment.
::::::::::::::
1996-009
::::::::::::::
Title: Proceedings of the ECSCW'95: Workshop on the Role of Version Control in CSCW Applications
Editors: David Hicks, Anja Haake, David Durand, and Fabio Vitali
Date: April 26, 1996
Abstract:
The workshop entitled "The Role of Version Control in Computer Supported
Cooperative Work Applications" was held on September 10, 1995 in Stockholm,
Sweden in conjunction with the ECSCW'95 conference. Version control, the
ability to manage relationships between successive instances of artifacts,
organize those instances into meaningful structures, and support navigation
and other operations on those structures, is an important problem in CSCW
applications. It has long been recognized as a critical issue for
inherently cooperative tasks such as software engineering, technical
documentation, and authoring. The primary challenge for versioning in these
areas is to support opportunistic, open-ended design processes requiring
the preservation of historical perspectives in the design process, the
reuse of previous designs, and the exploitation of alternative designs.
This report contains a summary in which the workshop organizers report the
major results of the workshop. The summary is followed by a section that
contains the position papers that were accepted to the workshop. The
position papers provide more detailed information describing recent
research efforts of the workshop participants as well as current challenges
that are being encountered in the development of CSCW applications. A list
of workshop participants is provided at the end of the report.
::::::::::::::
1996-010
::::::::::::::
Title: Client-Based Logging: A New Paradigm For Distributed Transaction Management
Authors: Thimios Panagos
Date: June 13, 1996
Abstract:
The proliferation of inexpensive workstations and networks has
created a new era in distributed computing. At the same time,
non-traditional applications such as computer-aided design (CAD),
computer-aided software engineering (CASE), geographic-
information systems (GIS), and office-information systems (OIS)
have placed increased demands for high-performance transaction
processing on database systems. The combination of these factors
gives rise to significant challenges in the design of modern
database systems. In this thesis, we propose novel techniques
whose aim is to improve the performance and scalability of these
new database systems. These techniques exploit client resources
through client-based transaction management.
Client-based transaction management is realized by providing
logging facilities locally even when data is shared in a global
environment. This thesis presents several recovery algorithms
which utilize client disks for storing recovery related information
(i.e., log records). Our algorithms work with both coarse
and fine-granularity locking and they do not require the merging
of client logs at any time. Moreover, our algorithms support
fine-granularity locking with multiple clients permitted to
concurrently update different portions of the same database page.
The database state is recovered correctly when there is a complex
crash as well as when the updates performed by different clients
on a page are not present on the disk version of the page, even
though some of the updating transactions have committed.
This thesis also presents the implementation of the proposed
algorithms in a memory-mapped storage manager as well as a
detailed performance study of these algorithms using the OO1
database benchmark. The performance results show that
client-based logging is superior to traditional server-based logging.
This is because client-based logging is an effective way to
reduce dependencies on server CPU and disk resources and, thus,
prevents the server from becoming a performance bottleneck as
quickly when the number of clients accessing the database
increases.
::::::::::::::
1996-011
::::::::::::::
Title: Characterizing Reference Locality in the WWW
Authors: Virgilio Almeida, Azer Bestavros, Mark Crovella, and Adriana de Oliveira
Date: June 21, 1996
Abstract:
As the World Wide Web (Web) is increasingly adopted as the
infrastructure for large-scale distributed information systems, issues
of performance modeling become ever more critical. In particular,
locality of reference is an important property in the performance
modeling of distributed information systems. In the case of the Web,
understanding the nature of reference locality will help improve the
design of middleware, such as caching, prefetching, and document
dissemination systems. For example, good measurements of reference
locality would allow us to generate synthetic reference streams with
accurate performance characteristics, would allow us to compare
empirically measured streams to explain differences, and would allow
us to predict expected performance for system design and capacity
planning.
In this paper we propose models for both temporal and spatial locality
of reference in streams of requests arriving at Web servers.
We show that simple models based only on document popularity (likelihood
of reference) are insufficient for capturing either temporal or spatial
locality. Instead, we rely on an equivalent, but numerical,
representation of a reference stream: a stack distance trace.
We show that temporal locality can be
characterized by the marginal distribution of the stack distance trace,
and we propose models for typical distributions and compare their cache
performance to our traces.
We also show that spatial locality in a reference stream can be
characterized using the notion of self-similarity. Self-similarity
describes long-range correlations in the dataset, which is a property
that previous researchers have found hard to incorporate into synthetic
reference strings. We show that stack distance strings appear to be
strongly self-similar, and we provide measurements of the degree of
self-similarity in our traces. Finally, we discuss methods for
generating synthetic Web traces that exhibit the properties of temporal
and spatial locality that we measured in our data.
Keywords: Self-similarity; Long-range dependence; Distance strings;
Reference locality; Caching; Performance modeling.
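
For readers unfamiliar with stack distance traces, the following is a
minimal sketch of the standard LRU stack distance computation, not
necessarily the exact procedure used in the paper. The function names
are hypothetical, and the choice to record None for a first reference
is an assumption of this example.

```python
def stack_distances(refs):
    """Convert a reference stream into a stack distance trace.

    An LRU stack is kept with the most recently referenced document at
    the top. Each reference is replaced by the 1-based depth at which
    the document was found, or None if it has not been seen before.
    """
    stack = []  # index 0 is the top of the stack
    trace = []
    for doc in refs:
        if doc in stack:
            depth = stack.index(doc) + 1
            stack.remove(doc)
        else:
            depth = None
        stack.insert(0, doc)  # the referenced document moves to the top
        trace.append(depth)
    return trace

def lru_hits(trace, cache_size):
    """Count hits in an LRU cache holding cache_size documents.

    A reference hits exactly when its stack distance is at most the
    cache size, which is why the marginal distribution of the trace
    determines LRU cache performance.
    """
    return sum(1 for d in trace if d is not None and d <= cache_size)

requests = ["a", "b", "a", "c", "b", "b"]
distances = stack_distances(requests)  # [None, None, 2, None, 3, 1]
```

Small distances in the trace indicate strong temporal locality; in the
example, an LRU cache of two documents would serve two of the six
requests.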
::::::::::::::
1996-012
::::::::::::::
Title: Management of Communicable Memory and Lazy Barriers for Bulk Synchronous Parallelism in BSPk
Authors: Amr Fahmy (Harvard University) and Abdelsalam Heddaya
Date: July 2, 1996
Abstract:
Communication and synchronization stand as the dual bottlenecks in the
performance of parallel systems, and especially those that attempt to
alleviate the programming burden by incurring overhead in these two
domains. We formulate the notions of communicable memory and lazy
barriers to help achieve efficient communication and synchronization.
These concepts are developed in the context of BSPk, a toolkit library
for programming networks of workstations---and other distributed
memory architectures in general---based on the Bulk Synchronous
Parallel (BSP) model. BSPk, whose design is the subject of this
paper, emphasizes efficiency in communication by minimizing local
memory-to-memory copying, and in barrier synchronization by not
forcing a process to wait unless it needs remote data. Both the
message passing (MP) and distributed shared memory (DSM) programming
styles are supported in BSPk, for the former helps processes exchange
short-lived unnamed data values, while the latter permits
communication through long-lived named variables.
::::::::::::::
1996-013
::::::::::::::
Title: Real-Time Databases: Issues and Applications (RTDB'96 Workshop Report)
Authors: Azer Bestavros, Kwei-Jay Lin (University of California Irvine), and Sang Son (University of Virginia)
Date: July 3, 1996
Abstract:
This report summarizes the technical presentations and discussions that
took place during RTDB'96: the First International Workshop on
Real-Time Databases, which was held on March 7 and 8, 1996 in Newport
Beach, California. The main goals of this project were (1) to review
recent advances in real-time database systems research, (2) to promote
interaction among real-time database researchers and practitioners,
and (3) to evaluate the maturity and directions of real-time database
technology.
::::::::::::::
1996-014
::::::::::::::
Title: TCP Boston --- A Fragmentation-tolerant TCP Protocol for ATM Networks
Authors: Azer Bestavros and Gitae Kim
Date: July 15, 1996
Abstract:
The popularity of TCP/IP coupled with the promise of high speed
communication using Asynchronous Transfer Mode (ATM) technology have
prompted the network research community to propose a number of
techniques to adapt TCP/IP to ATM network environments. ATM offers
Available Bit Rate (ABR) and Unspecified Bit Rate (UBR) services for
best-effort traffic, such as conventional file transfer. However,
recent studies have shown that TCP/IP, when implemented using ABR or
UBR, leads to serious performance degradations, especially when the
utilization of network resources (such as switch buffers) is
high. Proposed techniques---switch-level enhancements, for
example---that attempt to patch up TCP/IP over ATMs have had limited
success in alleviating this problem. The major reason for TCP/IP's
poor performance over ATMs has been consistently attributed to packet
fragmentation, which is the result of ATM's 53-byte cell-oriented
switching architecture.
In this paper, we present a new transport protocol, TCP Boston, that
turns ATM's 53-byte cell-oriented switching architecture into an
advantage for TCP/IP. At the core of TCP Boston is the Adaptive
Information Dispersal Algorithm (AIDA), an efficient encoding
technique that allows for dynamic redundancy control. AIDA makes
TCP/IP's performance less sensitive to cell losses, thus ensuring a
graceful degradation of TCP/IP's performance when faced with congested
resources. In this paper, we introduce AIDA and overview the main
features of TCP Boston. We present detailed simulation results that
show the superiority of our protocol when compared to other
adaptations of TCP/IP over ATMs. In particular, we show that TCP
Boston improves TCP/IP's performance over ATMs for both
network-centric metrics (e.g., effective throughput) and
application-centric metrics (e.g., response time).
Keywords: ATM networks; TCP/IP; Adaptive Information Dispersal
Algorithm; congestion control; performance evaluation.
::::::::::::::
1996-015
::::::::::::::
Title: Ergodicity and mixing rate of one-dimensional cellular automata
Author: Kihong Park, Computer Science Department, Boston University
Date: July 22, 1996
Abstract:
One- and two-dimensional cellular automata which are known to be
fault-tolerant are very complex. On the other hand, only very simple
cellular automata have actually been proven to lack fault-tolerance,
i.e., to be mixing. The latter either have large noise probability
$\eps$ or belong to the small family of two-state nearest-neighbor
monotonic rules which includes local majority voting.
For a certain simple automaton $L$ called the soldiers rule, this problem
has intrigued researchers for the last two decades since $L$ is clearly
more robust than local voting: in the absence of noise, $L$ eliminates any
finite island of perturbation from an initial configuration of all 0's or
all 1's. The same holds for a 4-state monotonic variant of $L$, $K$,
called two-line voting. We will prove that the probabilistic cellular
automata $K_\eps$ and $L_\eps$ asymptotically lose all information about
their initial state when subject to small, strongly biased noise. The
mixing property trivially implies that the systems are ergodic.
The finite-time information-retaining quality of a mixing system can be
represented by its relaxation time $\Relax(\cdot)$, which measures the time
before the onset of significant information loss. This is known to grow
as $(1/\eps)^c$ for noisy local voting. The impressive error-correction
ability of $L$ has prompted some researchers to conjecture that
$\Relax(L_\eps)=2^{c/\eps}$. We prove the tight bound
$2^{c_1\log^2 1/\eps} < \Relax(L_\eps) < 2^{c_2\log^2 1/\eps}$ for a biased
error model. The same holds for $K_\eps$. Moreover, the lower bound is
independent of the bias assumption.
The strong bias assumption makes it possible to apply sparsity/renormalization
techniques, the main tools of our investigation, used earlier in the
opposite context of proving fault-tolerance.
::::::::::::::
1996-016
::::::::::::::
Title: On the relationship between file sizes, transport protocols, and self-similar network traffic
Author: Kihong Park, Gitae Kim, and Mark Crovella, Computer Science Department, Boston University
Date: July 30, 1996
Abstract:
Recent measurements of local-area and wide-area traffic have shown
that network traffic exhibits variability at a wide range of
scales---self-similarity. In this paper, we examine a mechanism that
gives rise to self-similar network traffic and present some of its
performance implications. The mechanism we study is the transfer of
files or messages whose size is drawn from a heavy-tailed distribution.
We examine its effects through detailed transport-level simulations
of multiple TCP streams in an internetwork.
First, we show that in a ``realistic'' client/server network
environment---i.e., one with bounded resources and coupling among traffic
sources competing for resources---the degree to which file sizes are
heavy-tailed can directly determine the degree of traffic self-similarity
at the link level. We show that this causal relationship is not
significantly affected by changes in network resources (bottleneck
bandwidth and buffer capacity), network topology, the influence of
cross-traffic, or the distribution of interarrival times.
Second, we show that properties of the transport layer play an
important role in preserving and modulating this relationship. In
particular, the reliable transmission and flow control mechanisms
of TCP (Reno, Tahoe, or Vegas) serve to maintain the long-range
dependency structure induced by heavy-tailed file size distributions.
In contrast, if a non-flow-controlled and unreliable (UDP-based)
transport protocol is used, the resulting traffic shows little
self-similar characteristics: although still bursty at short time scales,
it has little long-range dependence. If flow-controlled, unreliable
transport is employed, the degree of traffic self-similarity is
positively correlated with the degree of throttling at the source.
Third, in exploring the relationship between file sizes, transport
protocols, and self-similarity, we are also able to show some of the
performance implications of self-similarity. We present data on
the relationship between traffic self-similarity and network performance
as captured by performance measures including packet loss rate,
retransmission rate, and queueing delay. Increased self-similarity,
as expected, results in degradation of performance. Queueing delay,
in particular, exhibits a drastic increase with increasing
self-similarity. Throughput-related measures such as packet loss and
retransmission rate, however, increase only gradually with increasing
traffic self-similarity as long as a reliable, flow-controlled transport
protocol is used.
::::::::::::::
1996-017
::::::::::::::
Title: Load Profiling in Distributed Real-Time Systems
Author: Azer Bestavros
Date: August 1, 1996
Abstract:
Load balancing is often used to ensure that nodes in a distributed
system are equally loaded. In this paper, we show that for real-time
systems, load balancing is not desirable. In particular, we propose a
new load-profiling strategy that allows the nodes of a distributed
system to be unequally loaded. Using load profiling, the system
attempts to distribute the load amongst its nodes so as to maximize
the chances of finding a node that would satisfy the computational
needs of incoming real-time tasks. To that end, we describe and
evaluate a distributed load-profiling protocol for dynamically
scheduling time-constrained tasks in a loosely-coupled distributed
environment. When a task is submitted to a node, the scheduling
software tries to schedule the task locally so as to meet its
deadline. If that is not feasible, it tries to locate another node
where this could be done with a high probability of success, while
attempting to maintain an overall load profile for the system. Nodes
in the system inform each other about their state using a combination
of multicasting and gossiping. The performance of the proposed
protocol is evaluated via simulation, and is contrasted to other
dynamic scheduling protocols for real-time distributed systems. Based
on our findings, we argue that keeping a diverse availability
profile and using passive bidding (through gossiping) are both
advantageous to distributed scheduling for real-time systems.
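
The local-first placement step described above can be illustrated with a
minimal sketch. It assumes each node advertises a single "free capacity"
number and that a remote node is chosen by a best-fit rule, which is one
simple way to keep large free blocks available for later tasks. The
function name, the capacity model, and the best-fit rule are illustrative
assumptions, not the protocol evaluated in the report.

```python
from typing import List, Optional

def place_task(free: List[float], local: int, demand: float) -> Optional[int]:
    """Return the index of the node chosen to run a task, or None.

    free[i] is the (hypothetical) spare capacity advertised by node i,
    local is the node where the task was submitted, and demand is the
    capacity the task needs in order to meet its deadline.
    """
    # Step 1: try to schedule the task where it was submitted.
    if free[local] >= demand:
        return local
    # Step 2: otherwise look for remote nodes that could accept it.
    candidates = [i for i, f in enumerate(free) if i != local and f >= demand]
    if not candidates:
        return None
    # Best fit (an assumption): take the tightest node that still fits,
    # so lightly loaded nodes stay available for more demanding tasks.
    return min(candidates, key=lambda i: free[i])
```

With free = [0.1, 0.9, 0.4, 0.3] and a task submitted at node 0, a demand
of 0.35 goes to node 2 rather than to the emptiest node 1, which leaves
node 1 able to accept a later task that needs, say, 0.8.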
::::::::::::::
1996-018
::::::::::::::
Title: Performance Analysis of a WWW Server
Author: Virgilio Almeida (UFMG and BU), Jussara Almeida (UFMG), and Cristina Murta (UFMG)
Date: August 5, 1996
Abstract:
The WWW has experienced a phenomenal growth and has become the most
popular Internet application. As a consequence of its large
popularity, the Internet has suffered from various performance
problems, such as network congestion and overloaded servers. These
days, it is not uncommon to find servers refusing connections because
they are overloaded.
Performance has always been a key issue in the design and operation of
on-line systems. With regard to the Internet, performance is also
critical, because users want fast and easy access to all objects
(e.g., documents, graphics, audio, and video) available on the
net. Thus, it is important to understand WWW performance issues. This
paper focuses on the performance analysis of Web servers. Using a
synthetic benchmark (WebStone) and standard operating systems
monitoring tools, it analyzes three different Web server software
packages running on top of a Windows NT platform and performing some
typical WWW tasks. It also discusses the main steps needed to carry out
a WWW performance analysis effort and shows relations between the
workload characteristics and system resource usage.
::::::::::::::
1996-019
::::::::::::::
Title: Beta-Reduction as Unification
Author: A.J. Kfoury, Computer Science, Boston University
Date: July 8, 1996
Abstract:
We define a unification problem ^UP with the property that,
given a pure lambda-term M, we can derive an instance Gamma(M)
of ^UP from M such that Gamma(M) has a solution if and only if
M is beta-strongly normalizable. There is a type discipline for
pure lambda-terms that characterizes beta-strong normalization;
this is the system of intersection types (without a ``top'' type
that can be assigned to every lambda-term). In this report, we
use a lean version LAMBDA of the usual system of intersection types.
Hence, ^UP is also an appropriate unification problem to characterize
typability of lambda-terms in LAMBDA. It also follows that ^UP is
an undecidable problem, which can in turn be related to semi-unification
and second-order unification (both known to be undecidable).
::::::::::::::
1996-020
::::::::::::::
Title: An Infinite Pebble Game and Applications
Authors: A.J. Kfoury, Computer Science, Boston University and A.P. Stolboushkin, Mathematics, UCLA
Date: August 15, 1996
Abstract:
We generalize the well-known pebble game to infinite dag's, and we
use this generalization to give new and shorter proofs of results in
different areas of computer science (as diverse as ``logic of programs''
and ``formal language theory''). Our applications here include a proof
of a theorem due to Salomaa, asserting the existence of a context-free
language with infinite index, and a proof of a theorem due to Tiuryn
and Erimbetov, asserting that unbounded memory increases the power of
logics of programs. The original proofs by Salomaa, Tiuryn, and Erimbetov
are fairly technical. The proofs by Tiuryn and Erimbetov also involve
advanced techniques of model theory, namely, back-and-forth constructions
based on a variant of Ehrenfeucht-Fraisse games. By contrast, our proofs
are not only shorter, but also elementary. All we need is essentially
finite induction and, in the case of the Tiuryn-Erimbetov result, the
compactness and completeness of first-order logic.
::::::::::::::
1996-021
::::::::::::::
Title: A Linearization of the Lambda Calculus and Consequences
Author: A.J. Kfoury, Computer Science, Boston University
Date: August 19, 1996
Abstract:
If every lambda-abstraction in a lambda-term M binds at most one
variable occurrence, then M is said to be "linear". Many questions
about linear lambda-terms are relatively easy to answer, e.g.
they all are beta-strongly normalizing and all are simply-typable.
We extend the syntax of the standard lambda-calculus L to a non-standard
lambda-calculus L^ satisfying a linearity condition generalizing the
notion in the standard case. Specifically, in L^ a subterm Q of a term
M can be applied to several subterms R1,...,Rk in parallel, which we
write as (Q. R1 \wedge ... \wedge Rk). The appropriate notion of beta-
reduction beta^ for the calculus L^ is such that, if Q is the lambda-
abstraction (\lambda x.P) with m\geq 0 bound occurrences of x, the
reduction can be carried out provided k = max(m,1). Every M in L^ is
thus beta^-SN. We relate standard beta-reduction and non-standard
beta^-reduction in several different ways, and draw several consequences,
e.g. a new simple proof for the fact that a standard term M is beta-SN
iff M can be assigned a so-called ``intersection'' type (``top'' type
disallowed).
::::::::::::::
1996-022
::::::::::::::
Title: Typability is Undecidable for F+Eta
Author: J. B. Wells
Date: March 9, 1996
Abstract:
System F is the well-known polymorphically-typed lambda calculus with
universal quantifiers. F+eta is System F extended with the eta rule,
which says that if term M can be given type tau and M eta-reduces to N,
then N can also be given the type tau. Adding the eta rule to System F is
equivalent to adding the subsumption rule using the subtyping
(containment) relation that Mitchell defined and axiomatized [Mit88]. The
subsumption rule says that if M can be given type tau and tau is a subtype
of type sigma, then M can be given type sigma. Mitchell's subtyping
relation involves no extensions to the syntax of types, i.e., no bounded
polymorphism and no supertype of all types, and is thus unrelated to the
system "F-sub".
Typability for F+eta is the problem of determining for any term M whether
there is any type tau that can be given to it using the type inference
rules of F+eta. Typability has been proven undecidable for System F
[Wel94] (without the eta rule), but the decidability of typability has
been an open problem for F+eta. Mitchell's subtyping relation has
recently been proven undecidable [TU95,Wel95b], implying the
undecidability of "type checking" for F+eta. This paper reduces the
problem of subtyping to the problem of typability for F+eta, thus proving
the undecidability of typability. The proof methods are similar in
outline to those used to prove the undecidability of typability for System
F, but the fine details differ greatly.
::::::::::::::
1996-023
::::::::::::::
Title: Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems
Author: Sanjoy Baruah (U of Vermont) and Azer Bestavros (Boston U)
Date: August 22, 1996
Abstract:
The design of programs for broadcast disks which incorporate real-time
and fault-tolerance requirements is considered. A generalized model
for real-time fault-tolerant broadcast disks is defined. It is shown
that designing programs for broadcast disks specified in this model is
closely related to the scheduling of pinwheel task systems. Some new
results in pinwheel scheduling theory are derived, which facilitate
the efficient generation of real-time fault-tolerant broadcast disk
programs.
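
As a rough illustration of the pinwheel connection, the sketch below
checks a cyclic broadcast program against pinwheel-style constraints. It
assumes the generalized pinwheel condition in which a task (a, b) must
receive at least a slots in every window of b consecutive slots; the
function name and the example program are hypothetical and are not taken
from the report.

```python
from typing import Dict, List, Tuple

def satisfies_pinwheel(schedule: List[str],
                       tasks: Dict[str, Tuple[int, int]]) -> bool:
    """Check a cyclic schedule against pinwheel constraints.

    schedule is one period of the broadcast program, repeated forever.
    tasks maps an item name to (a, b): the item must appear at least
    a times in every window of b consecutive slots.
    """
    n = len(schedule)
    for name, (a, b) in tasks.items():
        # Because the program is cyclic, it is enough to test the
        # windows that start at each of the n slots of one period.
        for start in range(n):
            window = [schedule[(start + k) % n] for k in range(b)]
            if window.count(name) < a:
                return False
    return True
```

For the period ["A", "B", "A", "C"], the constraints A:(1, 2), B:(1, 4),
C:(1, 4) are all met, whereas tightening B to (1, 3) fails because the
window "A C A" contains no B.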
::::::::::::::
1996-024
::::::::::::::
Title: WebWave: Globally Load Balanced Fully Distributed
Caching of Hot Published Documents
Author: Abdelsalam Heddaya and Sulaiman Mirdad
Date: October 10, 1996
Abstract:
Document publication service over such a large network as the Internet
challenges us to harness available server and network resources to meet
fast growing demand. In this paper, we show that large-scale dynamic
caching can be employed to globally minimize server idle time, and hence
maximize the aggregate throughput of the whole service. Given the
distributed nature of the system, a successful caching mechanism must
satisfy three properties: (1) that it maximize the global throughput of the
system, (2) that it be completely distributed in the sense of operating
only on the basis of local information, and (3) that it require no naming
service that introduces a scalability bottleneck.
In this paper, we develop a precise definition, which we call "tree
load-balance", of what it means for a mechanism to satisfy these three
goals, and present two algorithms that achieve them. Both algorithms
compute the request rate that should be allocated to each cache server, so
that global throughput is maximized. The first algorithm, WebFold, is a
centralized one that is provably optimal with respect to throughput. The
second algorithm, WebWave, whose optimality is evidenced by simulation, is
a fully distributed diffusion-based protocol. Both algorithms assume that
cache copies are placed on the routing tree that connects the cached
document's home server with its clients. As a consequence, document
requests can find cache copies without resorting to a cache directory of
any kind. The results herein apply only to immutable documents; we do not
consider the cache consistency problem.
::::::::::::::
1996-025
::::::::::::::
Title: Measuring the Behavior of a World-Wide Web Server
Author: Jussara Almeida, Virgilio Almeida, and David Yates
Date: October 29, 1996
Abstract:
Server performance has become a crucial issue for improving the overall
performance of the World-Wide Web. This paper describes Webmonitor, a tool
for evaluating and understanding server performance, and presents new
results for a realistic workload.
Webmonitor measures activity and resource consumption, both within
the kernel and in HTTP processes running in user space. Webmonitor is
implemented using an efficient combination of sampling and event-driven
techniques that exhibit low overhead. Our initial implementation is for
the Apache World-Wide Web server running on the Linux operating system. We
demonstrate the utility of Webmonitor by measuring and understanding the
performance of a Pentium-based PC acting as a dedicated WWW server. Our
workload uses a file size distribution with a heavy tail. This captures
the fact that Web servers must concurrently handle some requests for large
audio and video files, and a large number of requests for small documents,
containing text or images.
Our results show that in a Web server saturated by client requests,
over 90% of the time spent handling HTTP requests is spent in the kernel.
Furthermore, keeping TCP connections open, as required by TCP, causes a
factor of 2-9 increase in the elapsed time required to service an HTTP
request. Data gathered from Webmonitor provide insight into the causes of
this performance penalty. Specifically, we observe a significant increase
in resource consumption along three dimensions: the number of HTTP
processes running at the same time, CPU utilization, and memory
utilization. These results emphasize the important role of operating
system and network protocol implementation in determining Web server
performance.
::::::::::::::
1996-026
::::::::::::::
Title: Blocking Java Applets at the Firewall
Authors: David M. Martin Jr. (BU), Sivaramakrishnan Rajagopalan (Bellcore), and Aviel D. Rubin (Bellcore)
Date: November 14, 1996
Abstract:
This paper explores the problem of protecting a site on the Internet
against hostile external Java applets while allowing trusted internal
applets to run. With careful implementation, a site can be
made resistant to current Java security weaknesses as well as those yet to
be discovered. In addition, we describe a new attack on certain
sophisticated firewalls that is most effectively realized
as a Java applet.
::::::::::::::
1996-027
::::::::::::::
Title: Proceedings of the Real-Time Systems Symposium WIP Session
Author: Azer Bestavros (Editor)
Date: December 4, 1996
Abstract:
This technical report includes 14 short papers presented during the
WIP session of the 17th Real-Time Systems Symposium, held in
Washington DC on December 4-6, 1996. The title and authors are
included below.
------
(1) A Specialized Specification and Verification System for Timed Automata
Myla Archer and Constance Heitmeyer
Naval Research Laboratory, USA
Abstract: Assuring the correctness of specifications of
real-time systems can involve significant human effort. The use
of a mechanical theorem prover to encode such specifications and
to verify their properties could significantly reduce this
effort. A barrier to routinely encoding and mechanically
verifying specifications has been the need first to master the
specification language and logic of a general theorem proving
system. Our approach to overcoming this barrier is to provide
mechanical support for producing specifications and verifying
proofs, specialized for particular mathematical models and proof
techniques. We are currently developing a mechanical
verification system called TAME (Timed Automata Modeling
Environment), which provides this specialized support using
SRI's Prototype Verification System (PVS). Our system is
intended to permit steps in reasoning similar to those in hand
proofs that use model-specific techniques. TAME has recently
been used to detect errors in a realistic example.
------
(2) Scheduling Slack in MetaH
Pam Binns
Honeywell Technology Center, USA
Abstract: A real-time implementation for allocating slack to
aperiodic processes in MetaH is nearing completion. The slack
scheduling algorithm is based on the slack stealer originally
proposed in "An Optimal Algorithm for Scheduling Soft-Aperiodic
Tasks in Fixed-Priority Preemptive Systems" with practical
extensions to allow for support of process criticalities,
multiple process streams (of different criticalities) competing
for pooled slack and inclusion of run-time overheads in the
slack functions. Areas in need of future work are also
identified.
------
(3) AFTER: A case tool to assist in Fine-tuning of embedded real-time systems
Gaurav Arora and David Stewart
University of Maryland, USA
Abstract: AFTER (Assist in Fine-Tuning of Embedded Real-time
systems) is an interactive analysis and predictor tool for
embedded systems. It helps designers quickly identify timing
problems and systematically fine-tune an application during and
after the implementation phase of a product's lifecycle. The
tool begins with raw timing data collected from an embedded
system. It analyzes the data to provide a temporal image of the
current implementation, highlighting actual and potential
problems. The user then interacts with AFTER to obtain
predictions on what overall effect can be expected if small
adjustments are made to configuration parameters or to the
timing properties of specific software components. The tool
integrates and extends prior research in scheduling, task
monitoring, and operating system design for real-time systems.
------
(4) Genericity and Upgradability in Ultra-Dependable Real-Time Architectures
Andy Wellings, Ljerka Beus-Dukis, Alan Burns, and David Powell
LAAS-CNRS, France and University of York, UK
Abstract: We report on the ideas currently being developed
within the European GUARDS project to develop a generic
upgradable architecture for real-time dependable systems. After
a brief introduction and overview of the architecture, we
outline the GUARDS approach for scheduling real-time replicated
computation.
------
(5) Challenges in Engineering Distributed Shipboard Control System
L.Welch, B.Ravindran, R.Harrison, L.Madden, M.W.Masters and W.Mills
Naval Surface Warfare Center and University of Texas at Arlington, USA
Abstract: In response to the need to develop high capacity,
scalable computer systems for shipboard use, a program called
the High Performance Distributed Computing Program (HiPer-D),
was created. HiPer-D is intended to provide the technical
design concepts and engineering data needed to enable the Navy
to capitalize on commercial computing products. The program,
conducted jointly by the Defense Advanced Research Projects
Agency (DARPA) and the Aegis Shipbuilding Program, consists of
simultaneous top down engineering studies and large-scale
critical experiments using new computer technology.
------
(6) Issues for realizing a scalable Real Time Kernel for
function-distributed Multiprocessors
Hiroaki Takada, Cai-Dong Wang, and Ken Sakamura
University of Tokyo, Japan
Abstract: In multiprocessor systems, the worst-case execution
time of a task that exclusively accesses a shared resource is
unavoidably prolonged as the number of contending processors is
increased. In the case of function-distributed multiprocessors,
because many of the tasks can be processed within a processor,
it is advantageous that their worst-case behavior be
independent of the number of processors in the system. This
paper summarizes the required properties of scalable real-time
kernels and discusses techniques for realizing them. The problems
solved so far are described, and the remaining open problem is
presented.
------
(7) The design and implementation of the CPU power regulator for
multimedia operating systems
Giun-Haur Huang, Shie-Kai Ni, and Tei-Wei Kuo
National Chung Cheng University, Taiwan
Abstract: This paper describes a Windows NT/95 utility, the CPU
Power Regulator (CPR), which improves the capability of Windows
NT/95 in servicing time-critical applications. CPR considers a
distance model [4] to service time-critical applications such as
multimedia software and electronic games in a timely
fashion. Unlike past work [7, 8, 9], CPR adopts a
user-level control mechanism to manage the resource allocations
on Windows NT/95 and makes no modifications to the operating
system and application software. The performance of CPR was
verified by a collection of simulation experiments of randomly
generated and realistic workloads. CPR not only introduces very
low system overheads but also largely reduces the phenomenon of
non-timely resource allocation for applications. The
experimental results also demonstrate the capability and
flexibility of CPR in multiplexing CPU cycles to provide
different degrees of quality-of-service to time-critical
applications. The results of this work present a low-cost
software solution to transform an ordinary operating system into
a multimedia operating system.
------
(8) An approach for monitoring intrusion removal in Real Time Systems
Vishal Jain, Madalene Spezialetti, and Rajiv Gupta
University of Pittsburgh and Trinity College, USA
Abstract: To assist in the development of a real-time
application, monitoring is used to collect execution timing
information for the application. In this paper we propose a
strategy that accurately reports timing information by
accounting for intrusion introduced by monitoring. In addition,
by allowing processes that miss deadlines to run to completion,
our approach provides the user with times by which the execution
of these processes exceeds their deadlines. This information can
be used to guide the user in restructuring the application to
meet timing requirements.
------
(9) Empirical Evaluation of Task and Resource Scheduling in Dynamic
Real-Time Systems
Ken Tew and Panos Chrysantis and Daniel Mosse
University of Pittsburgh, USA
Abstract: This work-in-progress reports on our on-going
empirical evaluation of a two-tiered resource allocation scheme
assuming independent jobs, that is, jobs have no precedence
constraints. The first tier extends the temporal density
approach, while the second tier uses an Earliest Deadline First
(EDF) approach to schedule jobs at a site. However, job
scheduling at sites is constrained by the precedence relation
between the loading and execution of a job. In addition to CPU
scheduling, we also take care of the time it takes to load a
task onto memory from a disk (or from another processor over the
network). We assume that loading (i.e., disk scheduling)
follows an EDF non-preemptive discipline whereas the execution
(i.e., CPU scheduling) follows a preemptive EDF.
------
(10) Scalability based admission control of real-time channels
Ramesh Yerraballi and Ravi Mukkamala
Midwestern State University and Old Dominion University, USA
Abstract: This paper reports our continuing efforts and initial
results with the problem of admission control in real-time
networks. This problem was first addressed by the Tenet group,
and their approach was based on the assumption that the link
level scheduling was EDD (Earliest Due Date) based. Our work
departs from this assumption by addressing the problem in the
context of any arbitrary dynamic/fixed priority link level
scheduling. Our approach is based on extending a result we have
derived in a different context, viz., Task Scalability. It
involves assessing the current capacity of a link in terms of
its ability to accommodate (scale to) new channels. This
assessment (called the admittance measure) is then heuristically
compared against the traffic requirements of the newly requested
channel to decide its admissibility. A simulation study was
performed to study the effectiveness of our approach in
improving both utilization of the link and admissibility of
channels. Further, we demonstrate the relevance of our heuristic
by observing that it reduces to the Tenet schedulability test,
for the case of EDD.
------
(11) Optimization of scheduling on real-time parallel computer systems
Leyuan Shi and Philip Q. Hwang
University of Wisconsin and Defense Mapping Agency, USA
Abstract: We describe our ongoing work in the field of optimal
scheduling for real-time systems. We are primarily concerned
with optimal task allocation and job scheduling for parallel
computer systems. Many real-time task allocation and job
scheduling problems are proven to be NP-hard. Recently, we
proposed a randomized optimization framework for efficiently
solving such NP-hard problems. The proposed method, the Nested
Partitions (NP) method, has been proved to converge to global
optimal solutions and it is also highly matched to emerging
massively parallel processing capabilities.
------
(12) Dynamic Scheduling of Hard Real-Time Applications in Open System
Environment
Z. Deng, J. W.-S. Liu, and J. Sun
University of Illinois at Urbana Champaign, USA
Abstract: This paper focuses on the problem of providing
run-time support to real-time applications and non-real-time
applications in an open system. It describes a two-level
hierarchical priority-driven scheme for scheduling independently
developed applications. The scheme allows the developer of each
real-time application to validate the schedulability of the
application independently of other applications. Once a
real-time application is created and accepted by the open
system, its schedulability is guaranteed regardless of the
behaviors of other applications that execute concurrently in the
system.
------
(13) In Search for an efficient Real-Time Atomic Commit Protocol
Yousef Al-Houmaily and Panos Chrysantis
University of Pittsburgh, USA
Abstract: The purpose of this paper is to report on the first
step in our quest for an efficient atomic commit protocol in
real-time databases. This includes the development of RT-IYV
(real-time implicit yes-vote), a new real-time atomic commit
protocol. In contrast to other real-time commit protocols that
provide for semantic atomicity, RT-IYV is designed to ensure the
traditional notion of transaction atomicity. RT-IYV (1)
eliminates the voting phase from 2PC, hence reducing the number
of sequential coordination messages and forced log writes during
normal processing, and (2) supports transactions' forward
recovery, hence enabling partially executed transactions to
resume their execution after a failure. To illustrate its
performance advantages, we compare RT-IYV with the recently
proposed OPT (optimistic commit protocol) which is also designed
to support the standard transaction atomicity in real-time
databases.
------
(14) Distributed Real-Time Dataflow: An Execution Paradigm for Image
Processing and Anti-Submarine Warfare Applications
Steve Goddard and Kevin Jeffay
University of North Carolina, USA
::::::::::::::
1997-001
::::::::::::::
Title: Exploiting Redundancy for Timeliness in TCP Boston
Author: Azer Bestavros and Gitae Kim
Date: January 24, 1997
Abstract:
While ATM bandwidth-reservation techniques are able to offer the
guarantees necessary for the delivery of real-time streams in many
applications (e.g. live audio and video), they suffer from many
disadvantages that make them unattractive (or impractical) for many
others. These limitations coupled with the flexibility and popularity
of TCP/IP as a best-effort transport protocol have prompted the
network research community to propose and implement a number of
techniques that adapt TCP/IP to the Available Bit Rate (ABR) and
Unspecified Bit Rate (UBR) services in ATM network environments. This
allows these environments to smoothly integrate (and make use of)
currently available TCP-based applications and services without many
(if any) modifications. However, recent studies have shown that
TCP/IP, when implemented over ATM networks, is susceptible to serious
performance limitations. In a recently completed study, we have
unveiled a new transport protocol, TCP Boston, that turns ATM's
53-byte cell-oriented switching architecture into an advantage for
TCP/IP.
In this paper, we demonstrate the real-time features of TCP Boston
that allow communication bandwidth to be traded off for timeliness. We
start with an overview of the protocol. Next, we analytically
characterize the dynamic redundancy control features of TCP
Boston. Finally, we present detailed simulation results that show the
superiority of our protocol when compared to other adaptations of
TCP/IP over ATMs. In particular, we show that TCP Boston improves
TCP/IP's performance over ATMs for both network-centric metrics
(e.g., effective throughput and percent of missed deadlines) and
real-time application-centric metrics (e.g., response time and
jitter).
::::::::::::::
1997-002
::::::::::::::
Title: The Network Effects of Prefetching
Author: Mark Crovella and Paul Barford
Date: February 7, 1997
Abstract:
Prefetching has been shown to be an effective technique for reducing
user perceived latency in distributed systems. In this paper we show
that even when prefetching adds no extra traffic to the network, it can
have serious negative performance effects. Straightforward approaches
to prefetching increase the burstiness of individual sources, leading to
increased average queue sizes in network switches. However, we
also show that applications can avoid the undesirable queueing effects
of prefetching. In fact, we show that applications employing
prefetching can significantly improve network performance, to a level
much better than that obtained without any prefetching at all. This is
because prefetching offers increased opportunities for traffic shaping
that are not available in the absence of prefetching. Using a simple
transport rate control mechanism, a prefetching application can modify
its behavior from a distinctly ON/OFF entity to one whose data transfer
rate changes less abruptly, while still delivering all data in advance
of the user's actual requests.
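
The rate-control idea can be sketched as follows, under simplifying
assumptions that are not taken from the report: each prefetched document
has a known size and a known lead time before the user asks for it, and
the sender simply spreads the transfer over that lead time instead of
sending at full link speed. The function name and parameters are
hypothetical.

```python
from typing import List

def shaped_prefetch_rates(sizes: List[float],
                          lead_times: List[float],
                          link_rate: float) -> List[float]:
    """Return a sending rate (bytes/s) for each prefetched document.

    sizes[i] is the document size in bytes, lead_times[i] is the time in
    seconds before the user is expected to request it, and link_rate is
    the maximum rate an unshaped (ON/OFF) transfer would use.
    """
    rates = []
    for size, lead in zip(sizes, lead_times):
        if lead <= 0:
            # No slack: the document is needed now, so send at full rate.
            rates.append(link_rate)
        else:
            # Spread the transfer over the lead time, never above link rate,
            # so the data still arrives before the request when feasible.
            rates.append(min(link_rate, size / lead))
    return rates
```

For documents of 1000 and 4000 bytes with lead times of 10 and 20
seconds on a 1000 bytes/s link, the shaped rates are 100 and 200 bytes/s,
compared with bursts at 1000 bytes/s when each transfer runs at full
speed.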
::::::::::::::
1997-003
::::::::::::::
Title: Visible Volume: A Robust Measure for Protein Structure Characterization.
Author: Loredana Lo Conte and Temple F. Smith
Date: March 20, 1997
Abstract:
We propose a new characterization of protein structure based on the natural
tetrahedral geometry of the beta carbon and a new geometric measure of
structural similarity, called visible volume. In our model, the
side-chains are replaced by an ideal tetrahedron, the orientation of which is
fixed with respect to the backbone and corresponds to the preferred rotamer
directions. Visible volume is a measure of the non-occluded empty space
surrounding each residue position after the side-chains have been removed. It
is a robust, parameter-free, locally-computed quantity that accounts for all
spatial constraints that are of relevance to the corresponding position in
the native structure. When computing visible volume, we ignore the nature of
both the residue observed at each site and the ones surrounding it. We focus
instead on the space that, together, these residues could occupy. By doing
so, we are able to quantify a new kind of invariance beyond the apparent
variations in a protein family, namely, the conservation of the physical
space that is available at structurally equivalent positions for 3-D
side-chain packing. Visible volume has the unique property of estimating how
much space can be used at each site for different combinations of side-chains
to fit in. This property, and the relation of visible volume to the degree of
exposure of a residue position, qualify it as a powerful tool in a variety of
applications, from the detailed analysis of protein structure to the
definition of better scoring functions for threading purposes.
::::::::::::::
1997-004
::::::::::::::
Title: Determining WWW User's Next Access and Its Application to Pre-fetching
Authors: Carlos R. Cunha and Carlos F. B. Jaccoud
Date: March 24, 1997
Abstract:
World-Wide Web (WWW) services have grown to levels where significant
delays are expected to happen. Techniques like pre-fetching are likely
to help users to personalize their needs, reducing their waiting times.
However, pre-fetching is only effective if the right documents are
identified and if the user's next move is correctly predicted. Otherwise,
pre-fetching will only waste bandwidth. Therefore, it is productive to
determine whether a revisit will occur or not, before starting
pre-fetching.
In this paper we develop two user models that help determine the user's
next move. One model uses Random Walk approximation and the other is
based on Digital Signal Processing techniques. We also give hints on how
to use such models with a simple pre-fetching technique that we are
developing.
This is an extended version of the article with the same title
presented at the International Symposium on Computers and
Communication'97, Alexandria, Egypt, 1-3 July, 1997.
::::::::::::::
1997-005
::::::::::::::
Title: ImageRover: A Content-Based Image Browser for the World Wide Web
Authors: Stan Sclaroff, Leonid Taycher, and Marco La Cascia
Date: March 31, 1997
Abstract:
ImageRover is a search by image content navigation tool for the world
wide web. To gather images expediently, the image collection subsystem
utilizes a distributed fleet of WWW robots running on different
computers. The image robots gather information about the images they
find, computing the appropriate image decompositions and indices, and
store this extracted information in vector form for searches based on
image content. At search time, users can iteratively guide the search
through the selection of relevant examples. Search performance is made
efficient through the use of an approximate, optimized k-d tree
algorithm. The system employs a novel relevance feedback algorithm that
selects the Lm distance metrics appropriate for a particular query.
::::::::::::::
1997-006
::::::::::::::
Title: Generating Representative Web Workloads for Network and Server Performance Evaluation
Authors: Barford, Paul and Crovella, Mark
Date: May 5, 1997 (revised November 4, 1997)
Abstract:
One role for workload generation is as a means for understanding how
servers and networks respond to variation in load. This enables
management and capacity planning based on current and projected usage.
This paper applies a number of observations of Web server usage to
create a realistic Web workload generation tool which mimics a set of
real users accessing a server. The tool, called SURGE (Scalable URL
Reference Generator), generates references matching empirical
measurements of 1) server file size distribution; 2) request size
distribution; 3) relative file popularity; 4) embedded file
references; 5) temporal locality of reference; and 6) idle periods of
individual users. This paper reviews the essential elements required
in the generation of a representative Web workload. It also addresses
the technical challenges to satisfying this large set of simultaneous
constraints on the properties of the reference stream, the solutions
we adopted, and their associated accuracy. Finally, we present
evidence that SURGE exercises servers in a manner significantly
different from other Web server benchmarks.
::::::::::::::
1997-007
::::::::::::::
Title: Real-Time Mutable Broadcast Disks
Authors: Sanjoy Baruah and Azer Bestavros
Date: May 5, 1997
Abstract:
There is an increased interest in using broadcast disks to support
mobile access to real-time databases. However, previous work has only
considered the design of real-time immutable broadcast disks, the
contents of which do not change over time. This paper considers the
design of programs for real-time mutable broadcast disks --- broadcast
disks whose contents are occasionally updated. Recent
scheduling-theoretic results relating to pinwheel scheduling and pfair
scheduling are used to design algorithms for the efficient generation
of real-time mutable broadcast disk programs.
::::::::::::::
1997-008
::::::::::::::
Title: Active Blobs
Authors: Sclaroff, Stan and Isidoro, John
Date: April 30, 1997
Abstract:
Active blobs, a new region-based approach to nonrigid motion tracking, is
described. Active blobs employ a view-based representation; each
object is defined in terms of a deformable, active blob of color pixels.
Shape is defined in terms of a triangulated finite element model that
captures object shape plus a color texture map that captures
object appearance. Active blobs also provide normalization with respect to
some photometric variations. Nonrigid shape registration and motion
recovery are achieved by posing the problem as an energy-based, robust
minimization procedure. The active blob formulation is robust to
occlusions, shadows, and specular highlights.
::::::::::::::
1997-009
::::::::::::::
Title: Load Profiling for Efficient Route Selection in Multi-Class Networks
Authors: Azer Bestavros and Ibrahim Matta
Date: May 14, 1997
Abstract:
High-speed networks, such as ATM networks, are expected to
support diverse Quality of Service (QoS) constraints, including
real-time QoS guarantees. Real-time QoS is required by many
applications such as those that involve voice and video communication.
To support such services, routing algorithms that allow applications
to reserve the needed bandwidth over a Virtual Circuit (VC) have been
proposed. Commonly, these bandwidth-reservation algorithms assign VCs
to routes using the least-loaded concept, and thus result in balancing
the load over the set of all candidate routes.
In this paper, we show that for such reservation-based
protocols---which allow for the exclusive use of a preset fraction of a
resource's bandwidth for an extended period of time---load balancing
is not desirable as it results in resource fragmentation, which
adversely affects the likelihood of accepting new reservations. In
particular, we show that load-balancing VC routing algorithms are not
appropriate when the main objective of the routing protocol is to
increase the probability of finding routes that satisfy incoming VC requests,
as opposed to equalizing the bandwidth utilization along the various
routes. We present an on-line VC routing scheme that is based on the
concept of ``load profiling'', which allows a distribution of
``available'' bandwidth across a set of candidate routes to match the
characteristics of incoming VC QoS requests. We show the
effectiveness of our load-profiling approach when compared to
traditional load-balancing and load-packing VC routing schemes.
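The fragmentation effect described above can be illustrated with a minimal
sketch. It is not the load-profiling scheme of the paper: it only contrasts
a least-loaded (load-balancing) choice with a most-loaded (load-packing)
choice on a toy example. The function names, the two-route topology, and
the demand values are all assumptions made for illustration.

```python
def route_least_loaded(available, demand):
    """Load balancing: pick the feasible route with the MOST free bandwidth."""
    feasible = [i for i, free in enumerate(available) if free >= demand]
    return max(feasible, key=lambda i: available[i]) if feasible else None


def route_most_loaded(available, demand):
    """Load packing: pick the feasible route with the LEAST free bandwidth."""
    feasible = [i for i, free in enumerate(available) if free >= demand]
    return min(feasible, key=lambda i: available[i]) if feasible else None


def admit_all(policy, capacities, demands):
    """Offer each demand in order; return accepted demands and leftover bandwidth."""
    available = list(capacities)
    accepted = []
    for demand in demands:
        route = policy(available, demand)
        if route is not None:  # reject the request if no route can carry it
            available[route] -= demand
            accepted.append(demand)
    return accepted, available


# Hypothetical workload: four small requests followed by one large request.
capacities = [10, 10]
demands = [2, 2, 2, 2, 8]
balanced = admit_all(route_least_loaded, capacities, demands)
packed = admit_all(route_most_loaded, capacities, demands)
```

In this toy case, balancing spreads the small requests over both routes, so
neither route has 8 units left for the large request, while packing keeps
one route free for it.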
::::::::::::::
1997-010
::::::::::::::
Title: Concurrency Admission Control Management in ACCORD
Authors: Susan Nagy and Azer Bestavros
Date: May 15, 1997
Abstract:
We propose and evaluate admission control mechanisms for ACCORD, an
Admission Control and Capacity Overload management Real-time Database
framework---an architecture and a transaction model---for hard
deadline RTDB systems. The system architecture consists of admission
control and scheduling components which provide early notification of
failure to submitted transactions that are deemed not valuable or
incapable of completing on time. In this paper, we focus on our
Concurrency Admission Control Manager (CACM), which ensures that
admitted transactions do not overburden the system by requiring a
level of concurrency that is not sustainable. The transaction model
consists of two components: a primary task and a compensating task.
The execution requirements of the primary task are not known a priori,
whereas those of the compensating task are known a priori. Upon the
submission of a transaction, the Admission Control Mechanisms are
employed to decide whether to admit or reject that transaction. Once
admitted, a transaction is guaranteed to finish executing before its
deadline. A transaction is considered to have finished executing if
exactly one of two things occurs: Either its primary task is completed
(successful commitment), or its compensating task is completed (safe
termination). Committed transactions bring a profit to the system,
whereas a terminated transaction brings no profit. The goal of the
admission control and scheduling protocols (e.g., concurrency control,
I/O scheduling, memory management) employed in the system is to
maximize system profit. In that respect, we describe a number of
concurrency admission control strategies and contrast (through
simulations) their relative performance.
::::::::::::::
1997-011
::::::::::::::
Title: Reliability, Availability, Dependability and Performability: A User-centered View
Authors: Abdelsalam Heddaya (BU) and Abdelsalam Helal (MCC)
Date: May 22, 1997
Abstract:
Reliability and availability have long been considered twin system
properties that could be enhanced by distribution. Paradoxically, the
traditional definitions of these properties do not recognize the
positive impact of recovery---as distinct from simple repair and
restart---on reliability, nor the negative effect of recovery, and of
internetworking of clients and servers, on availability. As a result
of employing the standard definitions, reliability would tend to be
underestimated, and availability overestimated.
We offer revised definitions of these two critical metrics, which we
call service reliability and service availability, that improve the
match between their formal expression, and intuitive meaning. A
fortuitous advantage of our approach is that the product of our two
metrics yields a highly meaningful figure of merit for the overall
dependability of a system. But techniques that enhance system
dependability exact a performance cost, so we conclude with a cohesive
definition of performability that rewards the system for performance
that is delivered to its client applications, after discounting the
following consequences of failure: service denial and interruption,
lost work, and recovery cost.
::::::::::::::
1997-012
::::::::::::::
Title: On the Interaction Between an Operating System and Web Server
Author: David J. Yates, Virgilio Almeida, and Jussara M. Almeida
Date: July 16, 1997
Abstract:
This paper examines how and why web server performance changes as the
workload at the server varies. We measure the performance of a PC acting
as a standalone web server, running Apache on top of Linux. We use two
important tools to understand what aspects of software architecture and
implementation determine performance at the server. The first is a tool
that we developed, called WebMonitor, which measures activity and resource
consumption, both in the operating system and in the web server. The
second is the kernel profiling facility distributed as part of Linux. We
vary the workload at the server along two important dimensions: the number
of clients concurrently accessing the server, and the size of the documents
stored on the server. Our results quantify and show how more clients and
larger files stress the web server and operating system in different and
surprising ways. Our results also show the importance of fixed costs
(i.e., opening and closing TCP connections, and updating the server log) in
determining web server performance.
::::::::::::::
1997-013
::::::::::::::
Title: Evaluation of a Load Profiling Approach to Routing Guaranteed Bandwidth Flows
Authors: Ibrahim Matta and Azer Bestavros
Date: July 30, 1997
Abstract:
To support the diverse Quality of Service (QoS) requirements of
real-time (e.g. audio/video) applications in integrated services
networks, several routing algorithms that allow for the reservation of
the needed bandwidth over a Virtual Circuit (VC) established on one of
several candidate routes have been proposed. Traditionally, such
routing is done using the least-loaded concept, and thus results in
balancing the load across the set of candidate routes. In a recent
study, we have established the inadequacy of this load balancing
practice and proposed the use of load profiling as an alternative.
Load profiling techniques allow the distribution of ``available''
bandwidth across a set of candidate routes to match the
characteristics of incoming VC QoS requests.
In this paper we thoroughly characterize the performance of VC routing
using load profiling and contrast it to routing using load balancing
and load packing. We do so both analytically and via extensive
simulations of multi-class traffic routing in Virtual Path (VP) based
networks. Our findings confirm that for routing guaranteed bandwidth
flows in VP networks, load balancing is not desirable as it results in
VP bandwidth fragmentation, which adversely affects the likelihood of
accepting new VC requests. This fragmentation is more pronounced when
the granularity of VC requests is large. Typically, this occurs when a
common VC is established to carry the aggregate traffic flow of many
high-bandwidth real-time sources. For VP-based networks, our
simulation results show that our load-profiling VC routing scheme
performs as well as or better than the traditional load-balancing VC
routing in terms of revenue under both skewed and uniform workloads.
Furthermore, load-profiling routing improves routing fairness by
proactively increasing the chances of admitting high-bandwidth
connections.
::::::::::::::
1997-014
::::::::::::::
Title: Image Digestion and Relevance Feedback in the ImageRover WWW Search Engine
Authors: Leonid Taycher, Marco La Cascia, and Stan Sclaroff
Date: August 14, 1997
Abstract:
ImageRover is a search by image content navigation tool for the
world wide web. The staggering size of the WWW dictates certain
strategies and algorithms for image collection, digestion, indexing,
and user interface. This paper describes two key components of the
ImageRover strategy: image digestion and relevance feedback. Image
digestion occurs during image collection; robots digest the images
they find, computing image decompositions and indices, and storing
this extracted information in vector form for searches based on image
content. Relevance feedback occurs during index search; users can
iteratively guide the search through the selection of relevant
examples. ImageRover employs a novel relevance feedback algorithm to
determine the weighted combination of image similarity metrics
appropriate for a particular query. ImageRover is
available and running on the web site.
::::::::::::::
1997-015
::::::::::::::
Title: Admission Control and Scheduling for High-Performance WWW Servers
Authors: Azer Bestavros and Naomi Katagai
Date: August 21, 1997
Abstract:
In this paper we examine a number of admission control and scheduling
protocols for high-performance web servers based on a 2-phase policy
for serving HTTP requests. The first ``registration'' phase involves
establishing the TCP connection for the HTTP request and
parsing/interpreting its arguments, whereas the second ``service''
phase involves the service/transmission of data in response to the
HTTP request. By introducing a delay between these two phases, we show
that the performance of a web server could be potentially improved
through the adoption of a number of scheduling policies that optimize
the utilization of various system components (e.g. memory cache and
I/O). In addition to its promise for improving the performance of a
single web server, the delineation between the registration and
service phases of an HTTP request may be useful for load balancing
purposes on clusters of web servers. We are investigating the use of
such a mechanism as part of the Commonwealth testbed being developed
at Boston University.
::::::::::::::
1997-016
::::::::::::::
Title: Discovering Spatial Locality in WWW Access Patterns using Data Mining of Document Clusters in Server Logs
Authors: Azer Bestavros
Date: September 10, 1997
Abstract:
In this paper, we introduce the notion of a ``document cluster'' in
WWW space as a generalization of the notion of a ``cache line'' in
linear memory address space. Through the analysis of Web server logs,
we show evidence of the spatial locality of reference in WWW access
patterns and present an implementation of an efficient data mining
algorithm that discovers document clusters. We show preliminary
simulation results that quantify the benefits of using document
clusters for file allocation on server disks, as well as for purposes
of prefetching into server cache/main memory.
::::::::::::::
1997-017
::::::::::::::
Title: To queue or not to queue?: When FCFS is better than PS in a distributed system
Authors: Mor Harchol-Balter, Mark E. Crovella, and Cristina D. Murta
Date: October 31, 1997
Abstract:
We examine the question of whether to employ the first-come-first-served
(FCFS) discipline or the processor-sharing (PS) discipline at the nodes
in a distributed server system. We are interested in the case in which
service times are drawn from a heavy-tailed distribution, and so have
very high variability. Traditional wisdom in such a situation would
prefer the PS discipline, because it allows small tasks to avoid being
delayed behind large tasks in a queue. However, we show that system
performance can actually be significantly better under FCFS queueing, if
a particular kind of task assignment is used. By task assignment, we
mean an algorithm that inspects incoming tasks and assigns them to hosts
for service. The policy we propose is called SITA-E: Size Interval Task
Assignment with Equal Load; it is a static policy that does not
incorporate feedback knowledge of the state of the hosts. Surprisingly,
under SITA-E, FCFS queueing typically outperforms the PS discipline by a
factor of about two, as measured by mean waiting time and mean slowdown
(waiting time of task divided by its service time). We analyze the
FCFS/SITA-E policy and compare it to the processor-sharing case; in
addition we compare it in simulation to a number of other policies. We
show that the benefits of SITA-E are present even in small-scale
distributed systems (four or more hosts), and that SITA-E can in many
cases be more effective than a dynamic policy that takes into account
the current load at each host. Finally we discuss issues in employing
this policy in distributed Web servers.
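The size-interval idea described above can be sketched as follows. This is
a minimal illustration, not the analysis from the paper: it assumes the
cutoffs are estimated from an empirical sample of task sizes so that each
size interval carries roughly the same total work, and the function names
are hypothetical.

```python
from bisect import bisect_left


def sita_e_cutoffs(sample_sizes, num_hosts):
    """Estimate size cutoffs so each interval holds about 1/num_hosts of the work."""
    sizes = sorted(sample_sizes)
    target = sum(sizes) / num_hosts  # total work each host should receive
    cutoffs = []
    running = 0.0
    for size in sizes:
        running += size
        # Close the current interval once its share of the work is reached.
        if len(cutoffs) < num_hosts - 1 and running >= target * (len(cutoffs) + 1):
            cutoffs.append(size)
    return cutoffs


def assign_host(size, cutoffs):
    """Static assignment: host i serves sizes in (cutoffs[i-1], cutoffs[i]]."""
    return bisect_left(cutoffs, size)


# Hypothetical sample of task sizes and a two-host system.
sample = [1, 1, 1, 1, 2, 2, 4, 4, 8, 8]
cutoffs = sita_e_cutoffs(sample, num_hosts=2)
load = [0, 0]
for s in sample:
    load[assign_host(s, cutoffs)] += s
```

The assignment uses only the task size, so no feedback from the hosts is
needed; each host would then serve its own queue in FCFS order.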
::::::::::::::
1997-018
::::::::::::::
Title: Task Assignment in a Distributed System: Improving Performance by Unbalancing Load
Authors: Mark Crovella, Mor Harchol-Balter, Cristina Murta
Date: October 29, 1997
Abstract:
We consider the problem of task assignment in a distributed system (such as
a distributed Web server) in which task sizes are drawn from a heavy-tailed
distribution. Many task assignment algorithms are based on the heuristic
that balancing the load at the server hosts will result in optimal
performance. We show this conventional wisdom is less true when the task
size distribution is heavy-tailed (as is the case for Web file sizes). We
introduce a new task assignment policy, called Size Interval Task
Assignment with Variable Load (SITA-V). SITA-V purposely operates the
server hosts at different loads, and directs smaller tasks to the
lighter-loaded hosts. The result is that SITA-V provably decreases the
mean task slowdown by significant factors (up to 1000 or more) where the
more heavy-tailed the workload, the greater the improvement factor. We
evaluate the tradeoff between improvement in slowdown and increase in
waiting time in a system using SITA-V, and show conditions under which
SITA-V represents a particularly appealing policy. We conclude with a
discussion of the use of SITA-V in a distributed Web server, and show that
it is attractive because it has a simple implementation which requires no
communication from the server hosts back to the task router.
::::::::::::::
1997-019
::::::::::::::
Title: Color Region Grouping and Shape Recognition with Deformable Models
Authors: Lifeng Liu and Stan Sclaroff
Date: November 24, 1997
Abstract:
A new deformable shape-based method for color region segmentation is
described. The method includes two stages: over-segmentation using
a traditional color region segmentation algorithm, followed by
deformable model-based region merging via grouping and hypothesis
selection. During the second stage, region merging and object
identification are executed simultaneously. A statistical shape model is
used to estimate the likelihood of region groupings and model
hypotheses. The prior distribution on deformation parameters is
precomputed using principal component analysis over a training set of
region groupings. Once trained, the system autonomously segments
deformed shapes from the background, while not merging them with
similarly colored adjacent objects. Furthermore, the recovered
parametric shape model can be used directly in object recognition and
comparison. Experiments in segmentation and image retrieval are
reported.
::::::::::::::
1997-020
::::::::::::::
Title: Head Tracking via Robust Registration in Texture Map Images
Authors: Marco La Cascia, John Isidoro, and Stan Sclaroff
Date: November 24, 1997
Abstract:
A novel method for 3D head tracking in the presence of large head
rotations and facial expression changes is described. Tracking is
formulated in terms of color image registration in the texture map of a
3D surface model. Model appearance is recursively updated via image
mosaicking in the texture map as the head orientation varies. The
resulting dynamic texture map provides a stabilized view of the face
that can be used as input to many existing 2D techniques for face
recognition, facial expressions analysis, lip reading, and eye tracking.
Parameters are estimated via a robust minimization procedure; this
provides robustness to occlusions, wrinkles, shadows, and specular
highlights. The system was tested on a variety of sequences taken with
low quality, uncalibrated video cameras. Experimental results are
reported.
::::::::::::::
1997-021
::::::::::::::
Title: Proceedings of the 18th Real-Time Systems Symposium WIP Session
Author: Bestavros, Azer (Editor)
Date: December 1, 1997
Abstract:
This technical report includes 10 short papers presented during the
WIP session of the 18th Real-Time Systems Symposium, held in
Washington DC on December 3-5, 1997. The title and authors are
included below.
------
(1) CPU Reservations and Time Constraints:
Efficient, Predictable Scheduling of Independent Activities
Michael B. Jones, Microsoft Research, Microsoft Corporation
Daniela Rosu and Marcel-Catalin Rosu, Georgia Institute of Technology
Abstract:
Workstations and personal computers are increasingly being used
for applications with real-time characteristics such as speech
understanding and synthesis, media computations and I/O, and
animation, often concurrently executed with traditional
non-real-time workloads. This paper presents a system that can
schedule multiple independent activities so that:
- activities can obtain minimum guaranteed execution rates with
application-specified reservation granularities via CPU
Reservations,
- CPU Reservations, which are of the form "reserve X units of
time out of every Y units", provide not just an average case
execution rate of X/Y over long periods of time, but the
stronger guarantee that from any instant of time, by Y time
units later, the activity will have executed for at least X
time units,
- applications can use Time Constraints to schedule tasks by
deadlines, with on-time completion guaranteed for tasks with
accepted constraints, and
- both CPU Reservations and Time Constraints are implemented very
efficiently. In particular,
- CPU scheduling overhead is bounded by a constant and is not a
function of the number of schedulable tasks.
Other key scheduler properties are:
- activities cannot violate other activities' guarantees,
- time constraints and CPU reservations may be used together,
separately, or not at all (which gives a round-robin
schedule), with well-defined interactions between all
combinations, and
- spare CPU time is fairly shared among all activities.
The Rialto operating system, developed at Microsoft Research,
achieves these goals by using a precomputed schedule, which is
the fundamental basis of this work.
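The difference between an average-rate guarantee and the stronger
"X out of every Y" guarantee described above can be illustrated with a
small check over a discrete schedule. This is a sketch under assumptions:
time is divided into unit ticks, the schedule is a list of 0/1 flags, and
the function name is hypothetical. It does not reproduce the Rialto
scheduler.

```python
def meets_reservation(ran, x, y):
    """Return True if every window of y consecutive ticks contains >= x run ticks.

    ran[t] is 1 when the activity executed during tick t, else 0.
    Schedules shorter than y ticks contain no full window and pass trivially.
    """
    if y <= 0 or x > y:
        raise ValueError("need 0 < y and x <= y")
    window = sum(ran[:y])  # run ticks in the first window
    if len(ran) >= y and window < x:
        return False
    for t in range(y, len(ran)):
        window += ran[t] - ran[t - y]  # slide the window forward by one tick
        if window < x:
            return False
    return True


# Both hypothetical schedules run for one third of the time on average.
periodic = [1, 0, 0] * 4
bursty = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
periodic_ok = meets_reservation(periodic, x=1, y=3)
bursty_ok = meets_reservation(bursty, x=2, y=6)
```

The bursty schedule has the same long-run rate as a 2-out-of-6 reservation
but contains a window of six idle ticks, so it fails the sliding-window
check.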
------
(2) Characterizing Group Communication Middleware for a Real-time
Distributed System
L. M. Feeney, P. Bernadat, F. Travostino
The Open Group Research Institute
Abstract:
This paper presents our current work in characterizing the
behavior of a real-time dependable distributed system, which
must exhibit predictable behavior under load and in the presence
of partial failures. We focus on measuring the end-to-end
properties of the middleware which implements the real-time
process group service, specifically its membership and message
latency. The paper also describes the tools and techniques we
have developed, along with some of the practical issues that
arise in instrumenting a real-time distributed system.
------
(3) Real-Time Monitoring of the EIVIS Distributed Video-Server on Windows NT
M. Gergeleit and M. Mock
GMD - German National Research Center for Information Technology
Abstract:
JewelNT is a fine-grained, trace-based real-time monitoring tool
for Windows NT. It hooks into the NT kernel and provides full
information about NT's thread scheduling combined with
application-level timing information. JewelNT allows monitoring
a number of NT machines remotely controlled from one central
desktop. JewelNT has been initially developed for the evaluation
and performance tuning of the distributed EIVIS video server, a
European ESPRIT project.
------
(4) Achieving Predictability and Responsiveness of Fault Recovery
Operations in Real-Time Systems.
Pedro Mejia-Alvarez, CINVESTAV-IPN, Seccion de Computacion, Mexico
Juan A. de la Puente, Universidad Politecnica de Madrid, Spain.
Abstract:
The dependability of real-time software can be improved by
enhancing the robustness of the scheduler in predicting and
controlling the occurrence of timing failures during recovery.
This may be achieved by developing strategies which allow the
scheduler to dynamically control the manner in which real-time
application tasks and their time-critical recovery operations are
handled in time.
In this paper, a scheme is presented to provide scheduling
guarantees for a variety of fault-tolerant techniques. Bounds on
execution are developed and a case study is examined to analyze
the ability of these techniques to recover from transient
faults. A criterion for providing responsiveness in a
fault-tolerant scheduler is discussed and some approaches are
developed.
A responsiveness table, RTAB, has been developed for assisting
the scheduler during recovery from transient faults. This table is
based on different criteria for responsiveness of recovery. An
analytical characterization of the table for supporting on-line
scheduling has been developed. Some of the issues involved in
using this table to support run-time scheduling decisions are
illustrated with a hypothetical application example.
The advantages of the RTAB approach over previously proposed
scheduling policies for aperiodic tasks include the support for
run-time customization and guaranteed scheduling stability
during recovery.
------
(5) Compositional Reasoning about Real-Time Asynchronous
Communication with Time-Outs
D. Peticolas and F.A. Stomp, University of California, Davis
Abstract:
This paper describes ongoing work in developing a compositional
trace-based semantics and proof system for a real-time
language. The semantics models distributed processes
communicating over asynchronous FIFO communication
channels. Sending processes can specify time-out periods for
individual messages. Messages not received within their time-out
period are `lost'. Program behavior is modeled as traces of
events, including events (such as asynchronous messages) which
occur after termination. The proof system uses specification
triples with explicit variables for time and program traces.
------
(6) Exploring Consistency of Read-Only Transactions in Real-Time Systems
Kwok-Wa Lam, Sang H. Son* and Sheung-Lun Hung
City University of Hong Kong, Hong Kong.
University of Virginia, U.S.A.
Abstract:
In this paper, we describe our current work on exploring the
consistency of read-only transactions (ROT) in real-time
systems. A ROT is a transaction that only reads, but does not
update any data items. Since there is a significant proportion
of ROTs in several real-time systems, it is important to
investigate how to process ROTs efficiently with separate
algorithms. We identify three different consistency
requirements for ROTs. Particularly, we define a weaker form of
consistency, view consistency, which allows ROTs to perceive
different serialization orders of update transactions, thus
permitting non-serializable execution of transactions. However,
ROTs are still ensured to see consistent data. Based on view
consistency, we present two algorithms which let ROTs read the
most recent and consistent data without interfering with update
transactions. The recency of data read by a ROT could be
important in some real-time applications.
------
(7) Dynamic Timing Constraints - Relaxing Overconstraining
Specifications of Real-Time Systems
Gerhard Fohler
Malardalens University, S-72123 Vasteras, Sweden
Abstract:
Standard timing constraints, such as deadlines and periods, can
overconstrain specifications and lack expressive power. Only a few
tasks have "natural" periods and deadlines. Most are artifacts,
derived during system design. Knowledge of more flexibility is
abandoned in the process, thus overconstraining the
specification.
In this paper, we propose dynamic timing constraints, which
represent conditions for the temporal correctness rather than
fixed values for constraints such as period and deadline. This
is achieved by so-called timing entities, which combine a
functional unit, such as a task, with a feasibility function for
testing the feasibility of the timing of the unit. This
representation allows the system specification to provide
information about feasibility and various options of time
related design decisions.
We outline how dynamic timing constraints can be used with
standard scheduling algorithms, indicate modifications to these
algorithms, and discuss novel approaches that fully utilize the
benefits of dynamic timing constraints.
------
(8) Exploring the Importance of Preprocessing Operations in
Real-Time Multiprocessor Scheduling
Jan Jonsson, Chalmers University of Technology, Sweden
Abstract:
Recent real-time scheduling research has mainly been focused on
generating mature scheduling theories. Therefore, the important
field of preprocessing operations has been left fairly
unexplored. Most real-time scheduling techniques in use today
assume that the constraints (e.g. local task deadlines, degree
of task replication, or task clustering) on the constituent
tasks are entirely known beforehand. In such cases, no
preprocessing is typically applied. However, when the
constraints are relaxed, preprocessing operations can be applied
for increasing the likelihood of succeeding with a scheduling
attempt. In addition, preprocessing operations are vital in
quality-of-service negotiations for adaptive real-time systems
since changing some of the task constraints may result in a
higher system reward.
In this paper, we define a set of preprocessing operations that
we believe is representative of real-time multiprocessor
scheduling. We also give a rationale for using these operations,
and present results from some preliminary work that corroborate
our conjecture. In conjunction with this, we present an evaluation
framework for objective studies of different preprocessing
operations.
------
(9) Compiler Support for Non-intrusive Monitoring and Debugging of
Real-Time Systems in the CRL Environment
P. V. Petrov, A. D. Stoyen
New Jersey Institute of Technology
Abstract:
In this work we approach the problem of monitoring and debugging
real-time distributed systems by performing static analysis and
transformations to eliminate obtrusion to the monitored system.
Our work extends the CRL testbed compiler and run-time
environment to support monitoring and logging for the purpose of
post-mortem debugging. The main contribution of this work is
the innovative use of compiler transformations and idle slots
for monitoring and logging.
------
(10) Optimization of Real-Time MRL Rule-Based Systems with the EQL Optimizer
Albert Mo Kim Cheng
University of Houston--University Park, Houston, Texas, USA
Abstract:
In our earlier work, we developed an efficient algorithm for
optimizing a class of EQL rule-based systems so that they can
meet specified response time constraints. In this paper, we
show that this EQL optimizer with minor modifications can be
used to optimize a class of real-time MRL rule-based systems.
As a more expressive superset of EQL, MRL allows existentially
quantified as well as universally quantified variables (simple
or macro), making its expressive power comparable to that of
OPS5 and CLIPS (two of the most popular commercially available
rule-based system languages) while maintaining predictable
response time behavior.
::::::::::::::
1997-022
::::::::::::::
Title: A Framework for Local Anonymity in the Internet
Author: David M. Martin Jr.
Date: December 1997
Abstract:
We describe and evaluate options for providing anonymous IP service,
argue for the further investigation of local anonymity, and sketch a
framework for the implementation of locally anonymous networks.
::::::::::::::
1998-001
::::::::::::::
Title: Aggregating Congestion Information Over Sequences of TCP Connections
Authors: Azer Bestavros and Olivier Hartmann
Date: January 5, 1998
Abstract:
In this paper we present an extension of the TCP stack that allows a
sequence of TCP connections between the same machines to
share the congestion window. Our Linux implementation of this
scenario shows significant improvement in performance,
particularly when the individual connections are short-lived. Such a
behavior is common on the web, due to the nature of the HTTP protocol
and the distribution of file sizes.
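
The sharing idea described above can be illustrated with a minimal sketch.
It is not the authors' Linux implementation: the class name CwndCache, the
per-host keying, and the INITIAL_CWND default are all assumptions made for
illustration. A new connection to a known host starts from the congestion
window left behind by the previous connection instead of the cold-start
value.

```python
from dataclasses import dataclass, field

# Hypothetical cold-start window, in segments (an assumption, not from the report).
INITIAL_CWND = 1


@dataclass
class CwndCache:
    """Remembers the last congestion window observed for each remote host."""

    last_cwnd: dict = field(default_factory=dict)

    def open_connection(self, peer: str) -> int:
        # Reuse the window of the previous connection to this peer, if any;
        # otherwise fall back to the cold-start value.
        return self.last_cwnd.get(peer, INITIAL_CWND)

    def close_connection(self, peer: str, cwnd: int) -> None:
        # Record the window reached by the connection that just ended.
        self.last_cwnd[peer] = cwnd
```

With short-lived connections, the second and later transfers to the same
host would skip most of the ramp-up, which is where the abstract reports
the largest gains.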
::::::::::::::
1998-002
::::::::::::::
Author: Peter Gacs
Title: Reliable Cellular Automata with Self-Organization
Date: Jan. 15, 1998
Abstract:
In a probabilistic cellular automaton in which all local transitions
have positive probability, the problem of keeping a bit of information
for more than a constant number of steps is nontrivial, even in an
infinite automaton.
Still, there is a solution in 2 dimensions, and this solution can be
used to construct a simple 3-dimensional discrete-time universal
fault-tolerant cellular automaton.
This technique does not help much to solve the following problems:
remembering a bit of information in 1 dimension; computing in
dimensions lower than 3; computing in any dimension with
non-synchronized transitions.
Our more complex technique organizes the cells in blocks that
perform a reliable simulation of a second (generalized) cellular
automaton.
The cells of the latter automaton are also organized in blocks,
simulating even more reliably a third automaton, etc.
Since all this (a possibly infinite hierarchy) is organized in
``software'', it must be under repair all the time from damage caused
by errors.
A large part of the problem is essentially self-stabilization:
recovering from a mess of arbitrary size and content caused by the
faults.
The present paper constructs an asynchronous one-dimensional
fault-tolerant cellular automaton, with the further feature of
``self-organization''.
The latter means that unless a large amount of input information must be
given, the initial configuration can be chosen to be periodical with a
small period.
::::::::::::::
1998-003
::::::::::::::
Title: Distributed Packet Rewriting and its Application to Scalable Server Architectures
Date: Feb 1, 1998
Authors: Azer Bestavros, Mark Crovella, Jun Liu, and David Martin
Abstract:
To construct high performance Web servers, system builders are
increasingly turning to distributed designs. An important challenge
that arises in distributed Web servers is the need to direct incoming
connections to individual hosts. Previous methods for connection
routing have employed a centralized node which handles all incoming
requests. In contrast, we propose a distributed approach, called
Distributed Packet Rewriting (DPR), in which all hosts of the
distributed system participate in connection routing. We argue that
this approach promises better scalability and fault-tolerance than the
centralized approach. We describe our implementation of four variants
of DPR and compare their performance. We show that DPR provides
performance comparable to centralized alternatives, measured in terms
of throughput and delay under the SPECweb96 benchmark. Finally, we
argue that DPR is particularly attractive both for small scale systems
and for systems following the emerging trend toward increasingly
intelligent I/O subsystems.
::::::::::::::
1998-004
::::::::::::::
Title: Combining Textual and Visual Cues for Content-based Image Retrieval on the World Wide Web
Authors: Marco La Cascia, Sarathendu Sethi, and Stan Sclaroff
Date: February 9, 1998
Abstract:
Some WWW image engines allow the user to form a query in terms of text
keywords. To build the image index, keywords are extracted
heuristically from HTML documents containing each image, and/or from
the image URL and file headers. Unfortunately, text-based image
engines have merely retro-fitted standard SQL database query methods,
and it is difficult to include image cues within such a framework. On
the other hand, visual statistics ({\em e.g.}, color histograms) are
often insufficient for helping users find desired images in a vast WWW
index. By truly unifying textual and visual statistics, one would
expect to get better results than either used separately.
In this paper, we propose an approach that allows the combination of
visual statistics with textual statistics in the vector space
representation commonly used in query by image content systems. Text
statistics are captured in vector form using latent semantic indexing
(LSI). The LSI index for an HTML document is then associated with
each of the images contained therein. Visual statistics ({\em e.g.},
color, orientedness) are also computed for each image. The LSI and
visual statistic vectors are then combined into a single index vector
that can be used for content-based search of the resulting image
database. By using an integrated approach, we are able to take
advantage of possible statistical couplings between the topic of the
document (latent semantic content) and the contents of images (visual
statistics). This allows improved performance in conducting
content-based search. This approach has been implemented in a WWW
image search engine prototype.
::::::::::::::
1998-005
::::::::::::::
Title: Preserving Bandwidth Through A Lazy Packet Discard Policy in ATM Networks
Authors: Gitae Kim and Azer Bestavros
Date: February 9, 1998
Abstract:
A number of recent studies have pointed out that TCP's performance
over ATM networks tends to suffer, especially under congestion and
switch buffer limitations. Switch-level enhancements and link-level
flow control have been proposed to improve TCP's performance in ATM
networks. Selective Cell Discard (SCD) and Early Packet Discard (EPD)
ensure that partial packets are discarded from the network "as early
as possible", thus reducing wasted bandwidth. While such techniques
improve the achievable throughput, their effectiveness tends to
degrade in multi-hop networks.
In this paper, we introduce Lazy Packet Discard (LPD), an AAL-level
enhancement that improves effective throughput, reduces response time,
and minimizes wasted bandwidth for TCP/IP over ATM. In contrast to the
SCD and EPD policies, LPD delays as much as possible the removal from
the network of cells belonging to a partially communicated packet. We
outline the implementation of LPD and show the performance advantage
of TCP/LPD, compared to plain TCP and TCP/EPD through analysis and
simulations.
::::::::::::::
1998-006
::::::::::::::
Title: Active Voodoo Dolls: A Vision Based Input Device for Non-rigid Control
Authors: John Isidoro and Stan Sclaroff
Date: 16 February 1998
Abstract:
A vision based technique for non-rigid control is presented that can be
used for animation and video game applications. The user grasps a soft,
squishable object in front of a camera that can be moved and deformed in
order to specify motion. Active Blobs, a non-rigid tracking technique
is used to recover the position, rotation and non-rigid deformations of
the object. The resulting transformations can be applied to a texture
mapped mesh, thus allowing the user to control it interactively. Our
use of texture mapping hardware allows us to make the system responsive
enough for interactive animation and video game character control.
::::::::::::::
1998-007
::::::::::::::
Title: Improved Tracking of Multiple Humans with Trajectory Prediction and Occlusion Modeling
Authors: Romer Rosales and Stan Sclaroff
Date: March 2, 1998
Abstract:
A combined 2D, 3D approach is presented that allows for robust tracking
of moving bodies in a given environment as observed via a single,
uncalibrated video camera. Tracking is robust even in the presence of
occlusions. Low-level features are often insufficient for detection,
segmentation, and tracking of non-rigid moving objects. Therefore, an
improved mechanism is proposed that combines low-level (image
processing) and mid-level (recursive trajectory estimation) information
obtained during the tracking process. The resulting system can segment
and maintain the tracking of moving objects before, during, and after
occlusion. At each frame, the system also extracts a stabilized
coordinate frame of the moving objects. This stabilized frame is used
to resize and resample the moving blob so that it can be used as input
to motion recognition modules. The approach enables robust tracking
without constraining the system to know the shape of the objects being
tracked beforehand, although some assumptions are made about the
characteristics of the shape of the objects, and how they evolve with
time. Experiments in tracking moving people are described.
::::::::::::::
1998-008
::::::::::::::
Title: Determining Acceptance Possibility for a Quantum Computation is Hard for PH
Author: Stephen Fenner, University of Southern Maine
Frederic Green, Clark University
Steven Homer, Boston University
Randall Pruim, Boston University
Date: April 2, 1998
Abstract:
It is shown that determining whether a quantum computation has a
non-zero probability of accepting is at least as hard as the
polynomial time hierarchy. This hardness result also applies to
determining in general whether a given quantum basis state appears
with nonzero amplitude in a superposition, or whether a given quantum
bit has positive expectation value at the end of a quantum
computation.
::::::::::::::
1998-009
::::::::::::::
Title: Slack Stealing Job Admission Control Scheduling
Authors: Alia Atlas and Azer Bestavros
Date: May 2, 1998
Abstract:
In this paper, we present Slack Stealing Job Admission Control
(SSJAC)---a methodology for scheduling periodic firm-deadline tasks
with variable resource requirements, subject to controllable Quality
of Service (QoS) constraints. In a system that uses Rate Monotonic
Scheduling, SSJAC augments the slack stealing algorithm of Thuel et al.
with an admission control policy to manage the variability in the
resource requirements of the periodic tasks. This enables SSJAC to
take advantage of the 31\% of utilization that RMS cannot use, as well
as any utilization unclaimed by jobs that are not admitted into the
system.
Using SSJAC, each task in the system is assigned a resource
utilization threshold that guarantees the minimal acceptable QoS for
that task (expressed as an upper bound on the rate of missed
deadlines). Job admission control is used to ensure that (1) only
those jobs that will complete by their deadlines are admitted, and (2)
tasks do not interfere with each other; thus a job can only monopolize
the slack in the system, but not the time guaranteed to jobs of other
tasks.
We have evaluated SSJAC against RMS and Statistical RMS (SRMS).
Ignoring overhead issues, SSJAC consistently provides better
performance than RMS in overload, and, in certain conditions, better
performance than SRMS. In addition, to evaluate optimality of SSJAC
in an absolute sense, we have characterized the performance of SSJAC
by comparing it to an inefficient, yet optimal scheduler for task sets
with harmonic periods.
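
The 31\% figure quoted above matches the classical Liu and Layland
sufficient bound for RMS, n(2^(1/n) - 1), which approaches ln 2 (about
0.693) as the number of tasks grows. The short sketch below computes that
bound; the function name is chosen here for illustration and is not taken
from the report.

```python
import math


def rms_utilization_bound(n: int) -> float:
    """Liu and Layland's sufficient utilization bound for n periodic tasks."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * (2 ** (1 / n) - 1)


def unused_fraction_in_limit() -> float:
    """Utilization the bound leaves unclaimed as n grows: 1 - ln 2."""
    return 1 - math.log(2)


if __name__ == "__main__":
    for n in (1, 2, 5, 50):
        print(n, round(rms_utilization_bound(n), 4))
    print("unclaimed in the limit:", round(unused_fraction_in_limit(), 4))
```

The bound is sufficient rather than necessary, so particular task sets
(for example, those with harmonic periods) can be schedulable at higher
utilization.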
::::::::::::::
1998-010
::::::::::::::
Title: Statistical Rate Monotonic Scheduling
Authors: Alia Atlas and Azer Bestavros
Date: May 2, 1998
Abstract:
In this paper we present Statistical Rate Monotonic Scheduling (SRMS),
a generalization of the classical RMS results of Liu and Layland
that allows scheduling periodic tasks with highly variable execution
times and statistical QoS requirements. Similar to RMS, SRMS
has two components: a feasibility test and a scheduling
algorithm. The feasibility test for SRMS ensures that using SRMS'
scheduling algorithms, it is possible for a given periodic task set
to share a given resource (e.g. a processor, communication
medium, switching device, etc.) in such a way that such sharing does
not result in the violation of any of the periodic tasks' QoS
constraints.
The SRMS scheduling algorithm incorporates a number of unique
features. First, it allows for fixed priority scheduling that keeps
the tasks' value (or importance) independent of their
periods. Second, it allows for job admission control, which allows
the rejection of jobs that are not guaranteed to finish by their
deadlines as soon as they are released, thus enabling the system to
take necessary compensating actions. Also, admission control allows
the preservation of resources since no time is spent on jobs that
will miss their deadlines anyway. Third, SRMS integrates
reservation-based and best-effort resource scheduling seamlessly.
Reservation-based scheduling ensures the delivery of the minimal
requested QoS; best-effort scheduling ensures that unused, reserved
bandwidth is not wasted, but rather used to improve QoS
further. Fourth, SRMS allows a system to deal gracefully with
overload conditions by ensuring a fair deterioration in QoS across
all tasks---as opposed to penalizing tasks with longer periods, for
example. Finally, SRMS has the added advantage that its
schedulability test is simple and its scheduling algorithm has a
constant overhead in the sense that the complexity of the scheduler
is not dependent on the number of the tasks in the system.
We have evaluated SRMS against a number of alternative scheduling
algorithms suggested in the literature (e.g. RMS and slack
stealing), as well as refinements thereof, which we describe in this
paper. Consistently throughout our experiments, SRMS provided the
best performance. In addition, to evaluate the optimality of SRMS,
we have compared it to an inefficient, yet optimal scheduler
for task sets with harmonic periods.
::::::::::::::
1998-011
::::::::::::::
Title: Multiplexing VBR Traffic Flows with Guaranteed
Application-level QoS Using Statistical Rate Monotonic Scheduling
Authors: Alia Atlas and Azer Bestavros
Date: May 2, 1998
Abstract:
Quality of Service (QoS) guarantees are required by an increasing
number of applications to ensure a minimal level of fidelity in the
delivery of application data units through the network.
Application-level QoS does not necessarily follow from any
transport-level QoS guarantees regarding the delivery of the
individual cells (e.g. ATM cells) which comprise the application's
data units. The distinction between application-level and
transport-level QoS guarantees is due primarily to the fragmentation
that occurs when transmitting large application data units (e.g. IP
packets, or video frames) using much smaller network cells, whereby
the partial delivery of a data unit is useless; and, bandwidth spent
to partially transmit the data unit is wasted.
The data units transmitted by an application may vary in size while
being constant in rate, which results in a variable bit rate (VBR)
data flow. That data flow requires QoS guarantees. Statistical
multiplexing is inadequate, because no guarantees can be made and no
firewall property exists between different data flows. In this
paper, we present a novel resource management paradigm for the
maintenance of application-level QoS for VBR flows. Our paradigm is
based on Statistical Rate Monotonic Scheduling (SRMS), in which (1)
each application generates its variable-size data units at a fixed
rate, (2) the partial delivery of data units is of no value to the
application, and (3) the QoS guarantee extended to the application is
the probability that an arbitrary data unit will be successfully
transmitted through the network to/from the application.
::::::::::::::
1998-012
::::::::::::::
Title: The Statistical Rate Monotonic Scheduling Workbench
Authors: Alia Atlas and Azer Bestavros
Date: May 2, 1998
Abstract:
The SRMS Workbench is a software system developed to demonstrate the
notion of Statistical QoS employed in SRMS [AtlasBestavros:1998]. The
SRMS Workbench includes: (1) the SRMS schedulability analyzer (QoS
negotiator), and (2) an SRMS simulator (Basic SRMS + all
extensions). These two components are packaged into a Java Applet that
can be executed remotely on any Java-capable Internet browser. For
comparison, other scheduling algorithms, including RMS
[LiuLayland:1973] and SSJAC [AtlasBestavros:1998] are included.
Through a simple GUI, the SRMS Workbench allows users to specify a set
of periodic tasks, each with (a) its own period, (b) the
distributional characteristics of its periodic resource requirements
(e.g. Poisson, Pareto, Normal, Exponential, Gamma, etc.), (c) its
desired QoS as a lower bound on the percentage of deadlines to be met,
and (d) a criticality/importance index indicating the value of the
task (relative to other tasks in the task set). Once the task set is
specified, the SRMS Workbench allows the user to check for
schedulability under SRMS. If the task set is schedulable, the SRMS
Workbench generates the appropriate allowance for each task and allows
the user to create an animated simulation of the task system, which
can be executed and profiled. If the task set is not schedulable, the
SRMS Workbench informs the user of that fact and suggests (as part of
the QoS negotiation) an alternative set of feasible QoS requirements
that reflects the specified criticality/importance index of the tasks
in the task set.
The SRMS Workbench is available on the Web at
http://www.cs.bu.edu/groups/realtime/SRMSworkbench
::::::::::::::
1998-013
::::::::::::::
Title: Design and Implementation of SRMS in Kurt Linux
Authors: Alia Atlas and Azer Bestavros
Date: September 2, 1998
Abstract:
Statistical Rate Monotonic Scheduling (SRMS) is a generalization of
the classical RMS results of Liu and Layland \cite{ll:sched} for
periodic tasks with highly variable execution times and statistical
QoS requirements. The main tenet of SRMS is that the variability in
task resource requirements could be smoothed through aggregation to
yield guaranteed QoS. This aggregation is done over time for a given
task and across multiple tasks for a given period of time. Similar
to RMS, SRMS has two components: a feasibility test and a scheduling
algorithm. The SRMS feasibility test ensures that it is possible for a
given periodic task set to share a given resource without violating
any of the statistical QoS constraints imposed on each task in the
set. The SRMS scheduling algorithm consists of two parts: a job
admission controller and a scheduler. The SRMS scheduler is a
simple, preemptive, fixed-priority scheduler. The SRMS job admission
controller manages the QoS delivered to the various tasks through
admit/reject and priority assignment decisions. In particular, it
ensures the important property of task isolation, whereby tasks do
not infringe on each other.
In this paper we present the design and implementation of SRMS within
the KURT Linux Operating System \cite{hspn:kurt,sphan:cots,srin:kurt}.
KURT Linux supports conventional tasks as well as real-time tasks.
It provides a mechanism for transitioning from normal
Linux scheduling to a mixed scheduling of conventional and real-time
tasks, and to a focused mode where only real-time tasks are scheduled.
We overview the technical issues that we had to overcome in order to
integrate SRMS into KURT Linux and present the API we have developed
for scheduling periodic real-time tasks using SRMS.
::::::::::::::
1998-014
::::::::::::::
Title: An Omniscient Scheduling Oracle for Systems with Harmonic Periods
Authors: Alia Atlas and Azer Bestavros
Date: September 2, 1998
Abstract:
Most real-time scheduling problems are known to be NP-complete.
To enable accurate comparison between the schedules of heuristic
algorithms and the optimal schedule, we introduce an omniscient
oracle. This oracle provides schedules for periodic task sets with
harmonic periods and variable resource requirements. Three different
job value functions are described and implemented. Each corresponds
to a different system goal.
The oracle is used to examine the performance of different on-line
schedulers under varying loads, including overload. We have compared
the oracle against Rate Monotonic Scheduling, Statistical Rate
Monotonic Scheduling, and Slack Stealing Job Admission Control
Scheduling. Consistently, the oracle provides an upper bound on
performance for the metric under consideration.
::::::::::::::
1998-015
::::::::::::::
Title: Principality and Decidable Type Inference
for Finite-Rank Intersection Types
Authors: A. J. Kfoury and J. B. Wells
Date: 6 November 1998
Abstract: Principality of typings is the property that for each
typable term, there is a typing from which all other typings are
obtained via some set of operations. Type inference is the problem
of finding a typing for a given term, if possible. We define an
intersection type system which has principal typings and types
exactly the strongly normalizable $\lambda$-terms. More interestingly,
every finite-rank restriction of this system (using Leivant's first
notion of rank) has principal typings and also has decidable type
inference. This is in contrast to System~F where the finite rank
restriction for every finite rank at 3 and above has neither principal
typings nor decidable type inference. This is also in contrast to
earlier presentations of intersection types where the status of these
properties is not known for the finite-rank restrictions at 3 and above.
Furthermore, the notion of principal typings for our system involves
only one operation, substitution, rather than several operations
(not all substitution-based) as in earlier presentations of
principality for intersection types (of unrestricted rank).
A unification-based type inference algorithm is presented using a
new form of unification, $\beta$-unification.
::::::::::::::
1998-016
::::::::::::::
Title: A Performance Evaluation of Hyper Text Transfer Protocols
Authors: Paul Barford and Mark Crovella
Date: 10/23/98
Abstract:
Version 1.1 of the Hyper Text Transfer Protocol (HTTP) was principally
developed as a means for reducing both document transfer latency and
network traffic. The rationale for the performance enhancements in HTTP/1.1
is based on the assumption that the network is the bottleneck in Web
transactions. In practice, however, the Web server can be the primary
source of document transfer latency. In this paper, we characterize and
compare the performance of HTTP/1.0 and HTTP/1.1 in terms of throughput at
the server and transfer latency at the client. Our approach
is based on considering a broader set of bottlenecks in an HTTP transfer;
we examine how bottlenecks in the network, CPU, and in the disk system
affect the relative performance of HTTP/1.0 versus HTTP/1.1. We show that
the network demands under HTTP/1.1 are somewhat lower than HTTP/1.0, and we
quantify those differences in terms of packets transferred, server
congestion window size and data bytes per packet. We show that when the
CPU is the bottleneck, there is relatively little difference in performance
between HTTP/1.0 and HTTP/1.1. Surprisingly, we show that when the disk
system is the bottleneck, performance using HTTP/1.1 can be much worse
than with HTTP/1.0. Based on these observations, we suggest a connection
management policy for HTTP/1.1 that can improve throughput, decrease
latency, and keep network traffic low when the disk system is the bottleneck.
::::::::::::::
1998-017
::::::::::::::
Title: Deformable Shape Detection and Description via Model-Based Region
Grouping
Authors: Lifeng Liu and Stan Sclaroff
Date: December 4, 1998
Abstract:
A method for deformable shape detection and recognition is described.
Deformable shape templates are used to partition the image into a
globally consistent interpretation, determined in part by the minimum
description length principle. Statistical shape models enforce the
prior probabilities on global, parametric deformations for each object
class. Once trained, the system autonomously segments deformed shapes
from the background, while not merging them with adjacent objects or
shadows. The formulation can be used to group image regions based on
any image homogeneity predicate; e.g., texture, color, or motion. The
recovered shape models can be used directly in object recognition.
Experiments with color imagery are reported.
Note: This TR supersedes BUCS-TR-1997-019
::::::::::::::
1998-018
::::::::::::::
Title: Fast, Reliable Head Tracking under Varying Illumination
Date: December 4, 1998
Authors: Marco La Cascia and Stan Sclaroff
Abstract:
An improved technique for 3D head tracking under varying illumination
conditions is proposed. The head is modeled as a texture mapped
cylinder. Tracking is formulated as an image registration problem in the
cylinder's texture map image. To solve the registration problem in the
presence of lighting variation and head motion, the residual error of
registration is modeled as a linear combination of texture warping
templates and orthogonal illumination templates. Fast and stable on-line
tracking is then achieved via regularized, weighted least squares
minimization of the registration error. The regularization term tends to
limit potential ambiguities that arise in the warping and illumination
templates. It enables stable tracking over extended sequences. Tracking
does not require a precise initial fit of the model; the system is
initialized automatically using a simple 2-D face detector. The only
assumption is that the target is facing the camera in the first frame of
the sequence. The warping templates are computed at the first frame of
the sequence. Illumination templates are precomputed off-line over a
training set of face images collected under varying lighting conditions.
Experiments in tracking are reported.
::::::::::::::
1998-019
::::::::::::::
Title: 3D Trajectory Recovery for Tracking Multiple Objects and
Trajectory Guided Recognition of Actions
Date: December 4, 1998
Authors: Romer Rosales and Stan Sclaroff
Abstract:
A mechanism is proposed that integrates low-level (image
processing), mid-level (recursive 3D trajectory estimation), and
high-level (action recognition) processes. It is assumed that the
system observes multiple moving objects via a single, uncalibrated
video camera. A novel extended Kalman filter formulation is used in
estimating the relative 3D motion trajectories up to a scale
factor. The recursive estimation process provides a prediction and
error measure that is exploited in higher-level stages of action
recognition. Conversely, higher-level mechanisms provide feedback
that allows the system to reliably segment and maintain the tracking
of moving objects before, during, and after occlusion. The 3D
trajectory, occlusion, and segmentation information are utilized in
extracting stabilized views of the moving object. Trajectory-guided
recognition (TGR) is proposed as a new and efficient method for
adaptive classification of action. The TGR approach is demonstrated
using ``motion history images'' that are then recognized via a
mixture-of-Gaussians classifier. The system was tested in recognizing
various dynamic human outdoor activities; e.g., running, walking,
roller blading, and cycling. Experiments with synthetic data sets are
used to evaluate stability of the trajectory estimator with respect to
noise.
::::::::::::::
1998-020
::::::::::::::
Title: Recognition of Human Action Using Moment-Based Features
Author: Romer Rosales
Date: December 4, 1998
Abstract:
The performance of different classification approaches is evaluated
using a view-based approach for motion representation. The view-based
approach uses computer vision and image processing techniques to
register and process the video sequence. Two motion representations
called Motion Energy Images and Motion History Images are then
constructed. These representations collapse the temporal component in a
way that no explicit temporal analysis or sequence matching is needed.
Statistical descriptions are then computed using moment-based features
and dimensionality reduction techniques. For these tests, we used 7 Hu
moments, which are invariant to scale and translation. Principal
Components Analysis is used to reduce the dimensionality of this
representation. The system is trained using different subjects
performing a set of examples of every action to be recognized. Given
these samples, K-nearest neighbor, Gaussian, and Gaussian mixture
classifiers are used to recognize new actions. Experiments are conducted
using instances of eight human actions (i.e., eight classes) performed
by seven different subjects. Comparisons in the performance among these
classifiers under different conditions are analyzed and reported. Our
main goals are to test this dimensionality-reduced representation of
actions, and more importantly to use this representation to compare the
advantages of different classification approaches in this recognition
task.
::::::::::::::
1998-021
::::::::::::::
Title: Estimation and Prediction of Evolving Color Distributions for Skin Segmentation Under Varying Illumination
Authors: Leonid Sigal, Vassilis Athitsos, and Stan Sclaroff
Abstract:
A novel approach for real-time skin segmentation in video sequences is
described. The approach enables reliable skin segmentation despite
wide variation in illumination during a tracking sequence. An
explicit second order Markov model is used to predict the evolution of
the skin color (HSV) histogram over time. Histograms are dynamically
updated based on feedback from the current segmentation and based on
predictions of the Markov model. The evolution of the skin color
distribution at each frame is parameterized by translation, scaling
and rotation in color space. Consequent changes in the geometric
parameterization of the distribution are propagated by warping and
resampling of the histogram. The parameters of the discrete-time
dynamic Markov model are estimated using Maximum Likelihood
Estimation, and also evolve over time. The likelihood of each pixel
being skin or background can be measured directly from the probability
density function of the histogram. Connected components analysis and
size filtering are used to extract the patches of skin from the
segmented image. Segmentation accuracy is evaluated using labeled
ground-truth video sequences taken from popular movies, and results
are encouraging.
::::::::::::::
1998-022
::::::::::::::
Title: Active Blobs: Region-Based, Deformable Appearance Models
Authors: Stan Sclaroff and John Isidoro
::::::::::::::
1998-023
::::::::::::::
Title: Changes in Web Client Access Patterns: Characteristics and Caching Implications
Authors: Paul Barford, Azer Bestavros, Adam Bradley, and Mark Crovella
Date: December 4, 1998
Abstract:
Understanding the nature of the workloads and system demands created by
users of the World Wide Web is crucial to properly designing and
provisioning Web services. Previous measurements of Web client
workloads have been shown to exhibit a number of characteristic
features; however, it is not clear how those features may be changing
with time. In this study we compare two measurements of Web client
workloads separated in time by three years, both captured from the same
computing facility at Boston University. The older dataset, obtained in
1995, is well-known in the research literature and has been the basis
for a wide variety of studies. The newer dataset was captured in 1998
and is comparable in size to the older dataset. The new dataset has the
drawback that the collection of users measured may no longer be
representative of general Web users; however, using it has the advantage
that many comparisons can be drawn more clearly than would be possible
using a new, different source of measurement. Our results fall into two
categories. First we compare the statistical and distributional
properties of Web requests across the two datasets. This serves to
reinforce and deepen our understanding of the characteristic statistical
properties of Web client requests. We find that the kinds of
distributions that best describe document sizes have not changed between
1995 and 1998, although specific values of the distributional parameters
are different. Second, we explore the question of how the observed
differences in the properties of Web client requests, particularly the
popularity and temporal locality properties, affect the potential for
Web file caching in the network. We find that for the computing
facility represented by our traces between 1995 and 1998, (1) the
benefits of using size-based caching policies have diminished; and (2)
the potential for caching requested files in the network has declined.
::::::::::::::
1999-001
::::::::::::::
Title: Load Balancing a Cluster of Web Servers using Distributed Packet Rewriting
Authors: Luis Aversa and Azer Bestavros
Date: January 6, 1999
Abstract:
In this paper, we propose and evaluate an implementation of a
prototype scalable web server. The prototype consists of a
load-balanced cluster of hosts that collectively accept and service
TCP connections. The host IP addresses are advertised using the Round
Robin DNS technique, allowing any host to receive requests from any
client. Once a client attempts to establish a TCP connection with one
of the hosts, a decision is made as to whether or not the connection
should be redirected to a different host---namely, the host with the
lowest number of established connections. We use the low-overhead
Distributed Packet Rewriting (DPR) technique to redirect TCP
connections. In our prototype, each host keeps information about
connections in hash tables and linked lists. Every time a packet
arrives, it is examined to see if it has to be redirected or not. Load
information is maintained using periodic broadcasts amongst the
cluster hosts.
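
The redirection decision described above can be sketched as follows. This
is a minimal illustration, not the prototype's code: the function name,
the dictionary of connection counts, and the tie-breaking rule (prefer the
local host) are assumptions introduced here; the abstract states only that
the connection goes to the host with the lowest number of established
connections.

```python
def choose_host(local_host, connection_counts):
    """Return the host that should service a new TCP connection.

    connection_counts maps a host name to its number of established
    connections, as last learned from the periodic load broadcasts.
    """
    # Pick the least-loaded host. Ties are broken in favor of the local
    # host (an assumption), so a connection is not redirected for no gain.
    return min(
        connection_counts,
        key=lambda host: (connection_counts[host], host != local_host),
    )


loads = {"hostA": 12, "hostB": 3, "hostC": 7}
print(choose_host("hostA", loads))  # hostB
```

If the returned host differs from the local one, the connection would be
handed to the packet-rewriting layer, which is not modeled here.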
::::::::::::::
1999-002
::::::::::::::
Title: Trajectory Guided Tracking and Recognition of Actions
Authors: Romer Rosales and Stan Sclaroff
Date: March 9, 1999
Abstract:
A combined 2D, 3D approach is presented that allows for robust
tracking of moving people and recognition of actions. It is assumed
that the system observes multiple moving objects via a single,
uncalibrated video camera. Low-level features are often insufficient
for detection, segmentation, and tracking of non-rigid moving
objects. Therefore, an improved mechanism is proposed that integrates
low-level (image processing), mid-level (recursive 3D trajectory
estimation), and high-level (action recognition) processes. A novel
extended Kalman filter formulation is used in estimating the relative
3D motion trajectories up to a scale factor. The recursive estimation
process provides a prediction and error measure that is exploited in
higher-level stages of action recognition. Conversely, higher-level
mechanisms provide feedback that allows the system to reliably segment
and maintain the tracking of moving objects before, during, and after
occlusion. The 3D trajectory, occlusion, and segmentation information
are utilized in extracting stabilized views of the moving object that
are then used as input to action recognition modules.
Trajectory-guided recognition (TGR) is proposed as a new and efficient
method for adaptive classification of action. The TGR approach is
demonstrated using ``motion history images'' that are then recognized
via a mixture-of-Gaussians classifier. The system was tested in
recognizing various dynamic human outdoor activities: running,
walking, roller blading, and cycling. Experiments with real and
synthetic data sets are used to evaluate stability of the trajectory
estimator with respect to noise.
(This technical report supersedes TRs [98-020] and [98-007])
::::::::::::::
1999-003
::::::::::::::
Title: Connection Scheduling in Web Servers
Authors: Mark E. Crovella, Robert Frangioso, and Mor Harchol-Balter
Date: March 31, 1999
Abstract:
Under high loads, a Web server may be servicing many hundreds of
connections concurrently. In traditional Web servers, the question of
the order in which concurrent connections are serviced has been left to
the operating system. In this paper we ask whether servers might
provide better service by using non-traditional service ordering. In
particular, for the case when a Web server is serving static files, we
examine the costs and benefits of a policy that gives preferential
service to short connections. We start by assessing the scheduling
behavior of a commonly used server (Apache running on Linux) with
respect to connection size and show that it does not appear to provide
preferential service to short connections. We then examine the
potential performance improvements of a policy that does favor short
connections (shortest-connection-first). We show that
mean response time can be improved by factors of four or five under
shortest-connection-first, as compared to an (Apache-like)
size-independent policy. Finally we assess the costs of
shortest-connection-first scheduling in terms of unfairness
(i.e., the degree to which long connections suffer). We show
that under shortest-connection-first scheduling, long connections pay
very little penalty. This surprising result can be understood as a
consequence of heavy-tailed Web server workloads, in which most connections
are small, but most server load is due to the few large connections.
We support this explanation using analysis.
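
The intuition behind this result can be illustrated with a toy
calculation. The sketch below is not the paper's experiment: it assumes
that all connections arrive at the same instant, models the
size-independent policy as ideal processor sharing, and uses a made-up
workload of nine small connections and one large one. It therefore does
not reproduce the reported factors of four or five.

```python
def processor_sharing_completions(sizes):
    """Completion times, in ascending size order, when the server is
    shared equally among all unfinished connections."""
    ordered = sorted(sizes)
    n = len(ordered)
    now = 0
    previous = 0
    completions = []
    for i, size in enumerate(ordered):
        # n - i connections are still active; each must receive
        # (size - previous) more units before this one finishes.
        now += (size - previous) * (n - i)
        previous = size
        completions.append(now)
    return completions


def shortest_first_completions(sizes):
    """Completion times, in ascending size order, when connections are
    served one at a time, shortest first."""
    now = 0
    completions = []
    for size in sorted(sizes):
        now += size
        completions.append(now)
    return completions


sizes = [1] * 9 + [1000]  # hypothetical workload: mostly small transfers
ps = processor_sharing_completions(sizes)
scf = shortest_first_completions(sizes)

print(sum(ps) / len(ps))    # 109.9
print(sum(scf) / len(scf))  # 105.4
print(ps[-1], scf[-1])      # 1009 1009
```

In this toy case the large connection finishes at the same time under
both policies, while every small connection finishes sooner under
shortest-first.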
::::::::::::::
1999-004
::::::::::::::
Title: Measuring Web Performance in the Wide Area
Authors: Paul Barford and Mark Crovella
Date: April 23, 1999
Abstract:
One of the most vexing questions facing researchers interested in
the World Wide Web is why users often experience long delays in
document retrieval. The Internet's size, complexity, and continued
growth make this a difficult question to answer. We describe the Wide
Area Web Measurement project (WAWM) which uses an infrastructure
distributed across the Internet to study Web performance. The
infrastructure enables simultaneous measurements of Web client
performance, network performance and Web server performance. The
infrastructure uses a Web traffic generator to create representative
workloads on servers, and both active and passive tools to measure
performance characteristics. Initial results based on a prototype
installation of the infrastructure are presented in this paper.
::::::::::::::
1999-005
::::::::::::::
Title: Fast, Reliable Head Tracking under Varying Illumination: An Approach
Based on Registration of Texture-Mapped 3D Models
Authors: Marco La Cascia, Stan Sclaroff, and Vassilis Athitsos
supersedes: BU-TR-98-018 and BU-TR-97-020
Abstract:
An improved technique for 3D head tracking under varying
illumination conditions is proposed. The head is modeled as a texture
mapped cylinder. Tracking is formulated as an image registration
problem in the cylinder's texture map image. The resulting dynamic
texture map provides a stabilized view of the face that can be used as
input to many existing 2D techniques for face recognition, facial
expression analysis, lip reading, and eye tracking. To solve the
registration problem in the presence of lighting variation and head
motion, the residual error of registration is modeled as a linear
combination of texture warping templates and orthogonal illumination
templates. Fast and stable on-line tracking is achieved via
regularized, weighted least squares minimization of the registration
error. The regularization term tends to limit potential ambiguities
that arise in the warping and illumination templates. It enables
stable tracking over extended sequences. Tracking does not require a
precise initial fit of the model; the system is initialized
automatically using a simple 2D face detector. The only assumption is
that the target is facing the camera in the first frame of the
sequence. The formulation is tailored to take advantage of texture
mapping hardware available in many workstations, PCs, and game
consoles. The non-optimized implementation runs at about 15 frames per
second on an SGI O2 graphics workstation. Extensive experiments
evaluating the effectiveness of the formulation are reported. The
sensitivity of the technique to illumination, regularization
parameters, errors in the initial positioning, and internal camera
parameters is analyzed. Examples and applications of tracking are
reported.
::::::::::::::
1999-006
::::::::::::::
Title: Non-Rigid Shape from Image Streams
Authors: Stan Sclaroff and Jonathan Alon
Date: July 27, 1999
Abstract:
We present a framework for estimating 3D relative structure
(shape) and motion given objects undergoing nonrigid deformation as
observed from a fixed camera, under perspective projection. Deforming
surfaces are approximated as piece-wise planar, and piece-wise rigid.
Robust registration methods allow tracking of corresponding image
patches from view to view and recovery of 3D shape despite occlusions,
discontinuities, and varying illumination conditions. Many relatively
small planar/rigid image patch trackers are scattered throughout the
image; resulting estimates of structure and motion at each patch are
combined over local neighborhoods via an oriented particle systems
formulation. Preliminary experiments have been conducted on real
image sequences of deforming objects and on synthetic sequences where
ground truth is known.
::::::::::::::
1999-007
::::::::::::::
Title: Combinations of Deformable Shape Prototypes
Authors: Saratendu Sethi and Stan Sclaroff
Date: July 27, 1999
Abstract:
We propose to investigate a model-based technique for encoding
non-rigid object classes in terms of object prototypes. Objects from
the same class can be parameterized by identifying shape and appearance
invariants of the class to devise low-level representations. The
approach presented here creates a flexible model for an object class
from a set of prototypes. This model is then used to estimate the
parameters of the low-level representation of novel objects as
combinations of the prototype parameters. Variations in the object
shape are modeled as non-rigid deformations. Appearance variations are
modeled as intensity variations. In the training phase, the system is
presented with several example prototype images. These prototype
images are registered to a reference image by a finite element-based
technique called Active Blobs. The deformations of the finite
element model to register a prototype image with the reference image
provide the shape description or shape vector for the
prototype. The shape vector for each prototype is then used to warp
the prototype image onto the reference image and obtain the
corresponding texture vector. The prototype texture vectors,
being warped onto the same reference image, have a pixel-by-pixel
correspondence with each other and hence are ``shape normalized''.
Given a sufficient number of prototypes that exhibit appropriate
in-class variations, the shape and the texture vectors define a linear
prototype subspace that spans the object class. Each prototype is a
vector in this subspace. The matching phase involves the estimation of
a set of combination parameters for synthesis of the novel object by
combining the prototype shape and texture vectors. The strengths of
this technique lie in the combined estimation of both shape and
appearance parameters. This is in contrast with the previous
approaches where shape and appearance parameters were estimated
separately.
::::::::::::::
1999-008
::::::::::::::
Title: Optimal Scheduling of Secondary Content for Aggregation in Video-on-Demand Systems
Authors: P. Basu, A. Narayanan, W. Ke, T.D.C. Little, and A. Bestavros
Date: July 27, 1999
Abstract:
Dynamic service aggregation techniques can exploit skewed access
popularity patterns to reduce the costs of building interactive VoD
systems. These schemes seek to cluster and merge users into single
streams by bridging the temporal skew between them, thus improving
server and network utilization. Rate adaptation and secondary content
insertion are two such schemes.
In this paper, we present and evaluate an optimal scheduling algorithm
for inserting secondary content in this scenario. The algorithm runs
in polynomial time, and is optimal with respect to the total bandwidth
usage over the merging interval. We present constraints on content
insertion which make the overall QoS of the delivered stream
acceptable, and show how our algorithm can satisfy these
constraints. We report simulation results which quantify the excellent
gains due to content insertion. We discuss dynamic scenarios with user
arrivals and interactions, and show that content insertion reduces the
channel bandwidth requirement to almost half. We also discuss
differentiated service techniques, such as N-VoD and premium
no-advertisement service, and show how our algorithm can support these
as well.
(This report is cross listed as BU ECE Department Technical
Report: TR-12-16-98)
::::::::::::::
1999-009
::::::::::::::
Title: Popularity-Aware GreedyDual-Size Web Proxy Caching Algorithms
Authors: Shudong Jin and Azer Bestavros
Date: August 21, 1999
Abstract:
Web caching aims to reduce network traffic, server load, and
user-perceived retrieval delays by replicating ``popular'' content
on proxy caches that are strategically placed within the
network. While key to effective cache utilization, popularity
information (e.g. relative access frequencies of objects
requested through a proxy) is seldom incorporated directly in
cache replacement algorithms. Rather, other properties of the
request stream (e.g. temporal locality and content size), which are
easier to capture in an on-line fashion, are used to
indirectly infer popularity information, and hence drive cache
replacement policies. Recent studies suggest that the correlation
between these secondary properties and popularity is weakening due
in part to the prevalence of efficient client and proxy caches
(which tend to mask these correlations). This trend points to the
need for proxy cache replacement algorithms that directly capture
and use popularity information.
In this paper, we (1) present an on-line algorithm that effectively
captures and maintains an accurate popularity profile of Web objects
requested through a caching proxy, (2) propose a novel cache
replacement policy that uses such information to generalize the
well-known GreedyDual-Size algorithm, and (3) show the superiority
of our proposed algorithm by comparing it to a host of
recently-proposed and widely-used algorithms using extensive
trace-driven simulations and a variety of performance metrics.
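
For orientation, the baseline GreedyDual-Size policy that the paper
generalizes can be sketched as below. This follows the commonly described
form of GreedyDual-Size (priority H = L + cost/size, with L raised to the
priority of each evicted object); it is not the popularity-aware
algorithm proposed in the report, whose details are not given in this
abstract. The class and method names are illustrative.

```python
class GreedyDualSize:
    """Baseline GreedyDual-Size cache (sketch, not the report's variant)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0  # the running value L
        self.entries = {}     # key -> (priority H, size)

    def access(self, key, size, cost=1.0):
        """Process one request; return True on a hit, False on a miss."""
        if key in self.entries:
            _, cached_size = self.entries[key]
            # A hit restores the object's priority relative to current L.
            self.entries[key] = (self.inflation + cost / cached_size, cached_size)
            return True
        if size > self.capacity:
            return False  # object cannot fit; serve without caching
        while self.used + size > self.capacity:
            # Evict the object with the lowest priority and raise L to it.
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            priority, victim_size = self.entries.pop(victim)
            self.inflation = priority
            self.used -= victim_size
        self.entries[key] = (self.inflation + cost / size, size)
        self.used += size
        return False


cache = GreedyDualSize(capacity=10)
cache.access("big", 8)
cache.access("small", 2)
cache.access("new", 4)        # evicts "big", the lowest-priority object
print(sorted(cache.entries))  # ['new', 'small']
```

With unit cost, large objects receive low priority and are evicted first;
a popularity-aware variant would additionally weight the priority by some
estimate of access frequency.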
::::::::::::::
1999-010
::::::::::::::
Title: A Fully Distributed Location Management Scheme for Large PCS
Authors: Karunaharan Ratnam (Northeastern University), Ibrahim Matta (Boston University), and Sampath Rangarajan (Bell Laboratories)
Date: August 24, 1999
Abstract:
In [previous papers] we presented the design, specification and proof
of correctness of a fully distributed location management scheme for
PCS networks and argued that fully replicating location information is
both appropriate and efficient for small PCS networks. In this paper,
we analyze the performance of this scheme. Then, we extend the scheme
in a hierarchical environment so as to scale to large PCS networks.
Through extensive numerical results, we show the superiority of our
scheme compared to the current IS-41 standard.
::::::::::::::
1999-011
::::::::::::::
Title: BU Computer Science 1998 Proxy Trace
Author: Adam D. Bradley
Date: September 7, 1999
Abstract:
In a recent paper (Changes in Web Client Access Patterns:
Characteristics and Caching Implications by Barford, Bestavros,
Bradley, and Crovella) we performed a variety of analyses upon user
traces collected in the Boston University Computer Science department
in 1995 and 1998. A sanitized version of the 1995 trace has been
publicly available for some time; the 1998 trace has now been
sanitized, and is available from:
http://www.cs.bu.edu/techreports/1999-011-usertrace-98.gz
ftp://ftp.cs.bu.edu/techreports/1999-011-usertrace-98.gz
This memo discusses the format of this public version of the log,
and includes additional discussion of how the data was collected,
how the log was sanitized, what this log is and is not useful for,
and areas of potential future research interest.
::::::::::::::
1999-012
::::::::::::::
Title: Adaptive Reliable Multicast
Authors: Jaehee Yoon, Azer Bestavros, and Ibrahim Matta
Date: September 15, 1999
Abstract:
An increasing number of applications, such as distributed interactive
simulation, live auctions, distributed games and collaborative
systems, require the network to provide a reliable multicast
service. This service enables one sender to reliably transmit data
to multiple receivers. Reliability is traditionally achieved by
having receivers send negative acknowledgments (NACKs) to request
from the sender the retransmission of lost (or missing) data
packets. However, this Automatic Repeat reQuest (ARQ) approach
results in the well-known NACK implosion problem at the
sender. Many reliable multicast protocols have been recently
proposed to reduce NACK implosion, but the message overhead due to
NACK requests remains significant. Another approach, based on
Forward Error Correction (FEC), requires the sender to encode
additional redundant information so that a receiver can
independently recover from losses. However, due to the lack of
feedback from receivers, it is impossible for the sender to
determine how much redundancy is needed.
In this paper, we propose a new reliable multicast protocol, called
ARM for Adaptive Reliable Multicast. Our protocol integrates
ARQ and FEC techniques. The objectives of ARM are to (1) reduce the
message overhead due to NACK requests, (2) reduce the amount of data
transmission, and (3) reduce the time it takes for all receivers to
receive the data intact (without loss). During data transmission,
the sender periodically informs the receivers of the number of
packets that are yet to be transmitted. Based on this information,
each receiver predicts whether this amount is enough to recover its
losses. Only if it is not enough does the receiver request the
sender to encode additional redundant packets. Using ns
simulations, we show the superiority of our hybrid ARQ-FEC protocol
over the well-known Scalable Reliable Multicast (SRM) protocol.
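
The receiver-side decision described above might look like the following
sketch. The abstract does not specify how a receiver makes its
prediction, so the rule used here (compare the packets still needed with
the packets expected to arrive, given an estimated loss rate) and all
names are hypothetical.

```python
def needs_extra_redundancy(packets_missing, packets_remaining,
                           expected_loss_rate=0.0):
    """Return True if the receiver should ask the sender to encode more
    redundant packets (hypothetical prediction rule).

    packets_missing:    packets this receiver still needs to recover its losses
    packets_remaining:  packets the sender has announced are yet to be sent
    expected_loss_rate: receiver's own loss estimate, between 0 and 1
    """
    expected_arrivals = packets_remaining * (1.0 - expected_loss_rate)
    return expected_arrivals < packets_missing


print(needs_extra_redundancy(3, 10))       # False: enough packets are coming
print(needs_extra_redundancy(3, 2))        # True: too few packets remain
print(needs_extra_redundancy(3, 4, 0.5))   # True: half are expected to be lost
```

A receiver that returns False stays silent, which is how a scheme of this
kind keeps NACK traffic low.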
::::::::::::::
1999-013
::::::::::::::
Title: Search Space Reduction in QoS Routing
Authors: Liang Guo and Ibrahim Matta
Date: October 8, 1999
Abstract:
To provide real-time service or engineer constraint-based paths,
networks require the underlying routing algorithm to be able to find
low-cost paths that satisfy given Quality-of-Service (QoS)
constraints. However, the problem of constrained shortest (least-cost)
path routing is known to be NP-hard, and some heuristics have been
proposed to find a near-optimal solution. These heuristics
either impose relationships among the link metrics to reduce the
complexity of the problem, which may limit the general applicability of
the heuristic, or are too costly in terms of execution time to be
applicable to large networks. In this paper, we focus on solving the
delay-constrained minimum-cost path problem, and present a fast
algorithm to find a near-optimal solution. This algorithm, called
DCCR (for Delay-Cost-Constrained Routing), is a variant of the
k-shortest path algorithm. DCCR uses a new adaptive path weight
function together with an additional constraint imposed on the path
cost, to restrict the search space. Thus, DCCR can return a
near-optimal solution in a very short time. Furthermore, we use the
method proposed by Blokh and Gutin to further reduce the search space
by using a tighter bound on path cost. This makes our algorithm more
accurate and even faster. We call this improved algorithm SSR+DCCR
(for Search Space Reduction+DCCR). Through extensive simulations, we
confirm that SSR+DCCR performs very well compared to the optimal but
very expensive solution.
* This technical report revises TR NU-CCS-98-09.
::::::::::::::
1999-014
::::::::::::::
Title: Temporal Locality in Web Request Streams: Sources,
Characteristics, and Caching Implications
Authors: Shudong Jin and Azer Bestavros
Date: October 10, 1999
Abstract:
Temporal locality of reference in Web request streams emerges from two
distinct phenomena: the popularity of Web objects and the temporal
correlation of requests. Capturing these two elements of
temporal locality is important because it enables cache replacement
policies to adjust how they capitalize on temporal locality based
on the relative prevalence of these phenomena. In this paper, we
show that temporal locality metrics proposed in the literature are
unable to delineate between these two sources of temporal
locality. In particular, we show that the commonly-used
distribution of reference interarrival times is predominantly
determined by the power law governing the popularity of documents
in a request stream.
To capture (and more importantly quantify) both sources of temporal
locality in a request stream, we propose a new and robust metric
that enables accurate delineation between locality due to
popularity and that due to temporal correlation. Using this metric,
we characterize the locality of reference in a number of
representative proxy cache traces. Our findings show that there are
measurable differences between the degrees (and sources) of
temporal locality across these traces, and that these differences
are effectively captured using our proposed metric. We illustrate
the significance of our findings by summarizing the performance of
a novel Web cache replacement policy---called GreedyDual*---which
exploits both long-term popularity and short-term temporal
correlation in an adaptive fashion. Our trace-driven simulation
experiments (which are detailed in an accompanying Technical
Report) show the superior performance of GreedyDual* when compared
to other Web cache replacement policies.
::::::::::::::
1999-015
::::::::::::::
Title: Estimation and Prediction of Evolving Color Distributions for Skin Segmentation Under Varying Illumination
Authors: Leonid Sigal and Stan Sclaroff
Date: December 1, 1999
Abstract:
A novel approach for real-time skin segmentation in video sequences is
described. The approach enables reliable skin segmentation despite
wide variation in illumination during tracking. An explicit second
order Markov model is used to predict evolution of the skin color
(HSV) histogram over time. Histograms are dynamically updated based
on feedback from the current segmentation and based on predictions of
the Markov model. The evolution of the skin color distribution at
each frame is parameterized by translation, scaling and rotation in
color space. Consequent changes in geometric parameterization of the
distribution are propagated by warping and re-sampling the
histogram. The parameters of the discrete-time dynamic Markov model
are estimated using Maximum Likelihood Estimation, and also evolve
over time. Quantitative evaluation of the method was conducted on
labeled ground-truth video sequences taken from popular movies.
::::::::::::::
1999-016
::::::::::::::
Title: Recursive Estimation of Motion and Planar Structure
Authors: Jonathan Alon and Stan Sclaroff
Date: December 1, 1999
Abstract:
A specialized formulation of Azarbayejani and Pentland's framework for
recursive recovery of motion, structure and focal length from feature
correspondences tracked through an image sequence is presented. The
specialized formulation addresses the case where all tracked points
lie on a plane. This planarity constraint reduces the dimension of the
original state vector, and consequently the number of feature points
needed to estimate the state. Experiments with synthetic data and real
imagery illustrate the system performance. The experiments confirm
that the specialized formulation provides improved accuracy, stability
to observation noise, and rate of convergence in estimation for the
case where the tracked points lie on a plane.
::::::::::::::
1999-017
::::::::::::::
Title: Inferring Body Pose without Tracking Body Parts
Authors: Romer Rosales and Stan Sclaroff
Date: December 1, 1999
Abstract:
A novel approach for estimating articulated body posture and motion
from monocular video sequences is proposed. Human pose is defined as
the instantaneous two dimensional configuration (i.e., the projection
onto the image plane) of a single articulated body in terms of the
position of a predetermined set of joints. First, statistical
segmentation of the human bodies from the background is performed and
low-level visual features are found given the segmented body
shape. The goal is to be able to map these, generally low level,
visual features to body configurations. The system estimates
different mappings, each one with a specific cluster in the visual
feature space. Given a set of body motion sequences for training,
unsupervised clustering is obtained via the Expectation Maximization
algorithm. Then, for each of the clusters, a function is estimated to
build the mapping from low-level features to 3D pose. Currently
this mapping is modeled by a neural network. Given new visual
features, a mapping from each cluster is performed to yield a set of
possible poses. From this set, the system selects the most likely pose
given the learned probability distribution and the visual feature
similarity between hypothesis and input. Performance of the proposed
approach is characterized using a new set of known body postures,
showing promising results.
::::::::::::::
1999-018
::::::::::::::
Title: SomeCast: A Paradigm for Real-Time Adaptive Reliable Multicast
Authors: Jaehee Yoon, Azer Bestavros, and Ibrahim Matta
Date: December 10, 1999
Abstract:
SomeCast is a novel paradigm for the reliable multicast of real-time
data to a large set of receivers over the Internet. SomeCast is
receiver-initiated and thus scalable in
the number of receivers, the diverse characteristics of paths
between senders and receivers (e.g. maximum bandwidth and
round-trip-time), and the dynamic conditions of such paths
(e.g. congestion-induced delays and losses). SomeCast enables
receivers to dynamically adjust the rate at which they receive
multicast information to enable the satisfaction of real-time QoS
constraints (e.g. rate, deadlines, or jitter). This is done by
enabling a receiver to join SOME number of concurrent
multiCAST sessions, whereby each session delivers a portion of
an encoding of the real-time data. By adjusting the number of such
sessions dynamically, client-specific QoS constraints can be met
independently. The SomeCast paradigm can be thought of as a
generalization of the AnyCast (e.g. Dynamic Server Selection) and
ManyCast (e.g. Digital Fountain) paradigms,
which have been proposed in the literature to address issues of
scalability of UniCast and MultiCast environments, respectively.
In this paper we overview the SomeCast paradigm, describe an instance
of a SomeCast protocol, and present simulation results that quantify
the significant advantages gained from adopting such a protocol for
the reliable multicast of data to a diverse set of receivers subject
to real-time QoS constraints.
::::::::::::::
1999-019
::::::::::::::
Title: BU/NSF Workshop on Internet Measurement Instrumentation and Characterization
Authors: Azer Bestavros, John Byers, and Mark Crovella (PIs and co-organizers)
Paul Barford, Ibrahim Matta, and Michael Mitzenmacher (co-organizers)
Date: December 15, 1999
Abstract:
Because of its growth in size, scope, and complexity---as well as its
increasingly central role in society---the Internet has become an
important object of study and evaluation. Many significant innovations
in the networking community in recent years have been directed at
obtaining a more accurate understanding of the fundamental behavior of
the complex system that is the Internet. These innovations have come
in the form of better models of components of the system, better tools
which enable us to measure the performance of the system more
accurately, and new techniques coupled with performance evaluation
which have delivered better system utilization. The continued
development and improvement of our understanding of the properties of
the Internet is essential to guide designers of hardware, protocols,
and applications for the next decade of Internet growth.
As a research community, an important next step involves a
comprehensive look at the challenges that lie ahead in this area. This
includes an evaluation of both the current unsolved challenges and
the upcoming challenges the Internet will present us with in the near
future, and a discussion of the promising new techniques that
innovators in the field are currently developing. To this end, the
Networking Research Group at Boston University, with support from the
National Science Foundation, organized a one-day workshop which was
held at Boston University on Monday, August 30, 1999. This report
summarizes the technical presentations and discussions that took place
during that workshop.
::::::::::::::
2000-001
::::::::::::::
Title: Faithful Translations between Polyvariant Flows and Polymorphic Types
Authors: Torben Amtoft (Boston University) and Franklyn Turbak (Wellesley College)
Abstract:
Recent work has shown equivalences between various type systems and
flow logics. Ideally, the translations upon which such equivalences
are based should be faithful in the sense that information is not lost
in round-trip translations from flows to types and back or from types
to flows and back. Building on the work of Nielson & Nielson and of
Palsberg & Pavlopoulou, we present the first faithful translations
between a class of finitary polyvariant flow analyses and a type
system supporting polymorphism in the form of intersection and union
types. Additionally, our flow/type correspondence solves several open
problems posed by Palsberg & Pavlopoulou: (1) it expresses call-string
based polyvariance (such as k-CFA) as well as argument based
polyvariance; (2) it enjoys a subject reduction property for flows as
well as for types; and (3) it supports a flow-oriented perspective
rather than a type-oriented one.
::::::::::::::
2000-002
::::::::::::::
Title: Determining Acceptance Possibility for a Quantum Computation is Hard for the Polynomial Hierarchy
Authors: Fenner, Stephen; Green, Frederic; Homer, Steven and Pruim, Randall
Date: Jan 20, 2000
Abstract:
It is shown that determining whether a quantum computation has a
non-zero probability of accepting is at least as hard as the
polynomial time hierarchy. This hardness result also applies to
determining in general whether a given quantum basis state appears
with nonzero amplitude in a superposition, or whether a given quantum
bit has positive expectation value at the end of a quantum
computation. This result is achieved by showing that the complexity
class NQP of Adleman, Demarrais, and Huang, a quantum analog of NP, is
equal to the counting class $\mathrm{coC_{=}P}$.
::::::::::::::
2000-003
::::::::::::::
Title: On the Complexity of Quantum ACC
Authors: Green, Frederic; Homer, Steven; and Pollett, Christopher
Date: Jan 20, 2000
Abstract:
For any q > 1, let MOD_q be a quantum gate that determines if the
number of 1's in the input is divisible by q. We show that for any
q,t > 1, MOD_q is equivalent to MOD_t (up to constant depth). Based
on the case q=2, Moore has shown that quantum analogs of AC^(0),
ACC[q], and ACC, denoted QAC^(0)_wf, QACC[2], QACC respectively,
define the same class of operators, leaving q > 2 as an open
question. Our result resolves this question, implying that QAC^(0)_wf
= QACC[q] = QACC for all q. We also prove the first upper bounds for
QACC in terms of related language classes. We define classes of
languages EQACC, NQACC (both for arbitrary complex amplitudes) and
BQACC (for rational number amplitudes) and show that they are all
contained in TC^(0). To do this, we show that a TC^(0) circuit can
keep track of the amplitudes of the state resulting from the
application of a QACC operator using a constant width polynomial size
tensor sum. In order to accomplish this, we also show that TC^(0) can
perform iterated addition and multiplication in certain field
extensions.
::::::::::::::
2000-004
::::::::::::::
Title: On the Origin of Power Laws in Internet Topologies
Authors: Alberto Medina, Ibrahim Matta, and John Byers
Date: January 21, 2000
Abstract:
Recent empirical studies have shown that Internet topologies exhibit power
laws of the form $y = x^\alpha$ for the following relationships: (P1)
outdegree of node (domain or router) versus rank; (P2) number of nodes
versus outdegree; (P3) number of node pairs within a neighborhood versus
neighborhood size (in hops); and (P4) eigenvalues of the adjacency matrix
versus rank. However, causes for the appearance of such power laws have
not been convincingly given. In this paper, we examine four factors in the
formation of Internet topologies. These factors are (F1) preferential
connectivity of a new node to existing nodes; (F2) incremental growth of
the network; (F3) distribution of nodes in space; and (F4) locality of edge
connections. In synthetically generated network topologies, we study the
relevance of each factor in causing the aforementioned power laws as well
as other properties, namely diameter, average path length and clustering
coefficient. Different kinds of network topologies are generated: (T1)
topologies generated using our parametrized generator, which we call BRITE; (T2)
random topologies generated using the well-known Waxman model; (T3)
Transit-Stub topologies generated using the GT-ITM tool; and (T4) regular grid
topologies. We observe that some generated topologies may not obey power
laws P1 and P2. Thus, the existence of these power laws can be used to
validate the accuracy of a given tool in generating representative Internet
topologies. Power laws P3 and P4 were observed in nearly all considered
topologies, but different topologies showed different values of the power
exponent $\alpha$. Thus, while the presence of power laws P3 and P4 does not
give strong evidence for the representativeness of a generated topology,
the value of $\alpha$ in P3 and P4 can be used as a litmus test for the
representativeness of a generated topology. We also find that factors F1
and F2 are the key factors in our study that make our generated
topologies resemble that of the Internet.
* BRITE (Boston university Representative Internet Topology gEnerator) is
available at http://www.cs.bu.edu/fac/matta/software.html
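
As an illustration of how the exponent $\alpha$ in $y = x^\alpha$ might be
estimated from measured data, the sketch below fits a straight line to
(log x, log y) by ordinary least squares and returns its slope. This is an
assumed, generic estimator, not necessarily the procedure used in the report;
the function name and the sample data are hypothetical.

```python
import math


def fit_power_law_exponent(xs, ys):
    """Estimate alpha in y = c * x**alpha by least squares on log-log data.

    Hypothetical helper for illustration only; all xs and ys must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line through the log-log points.
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    return sxy / sxx


# Synthetic rank/outdegree-style data that follows y = 50 * x**-0.8 exactly.
ranks = [1, 2, 4, 8, 16]
degrees = [50 * r ** -0.8 for r in ranks]
alpha = fit_power_law_exponent(ranks, degrees)
```

On data that follows a power law exactly, the fitted slope recovers the
exponent; on measured topologies the quality of the linear fit would also
need to be checked.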
::::::::::::::
2000-005
::::::::::::::
Title: BRITE: A Flexible Generator of Internet Topologies
Authors: Alberto Medina, Ibrahim Matta, and John Byers
Date: January 21, 2000 (Revised January 15, 2001)
Abstract:
BRITE is a parameterized topology generation tool, which can be used to
flexibly control various parameters (such as connectivity and growth
models) and study various properties of generated network topologies (such as
power laws, path length and clustering coefficient).
BRITE can be used to study the relevance of possible causes for properties
recently observed in Internet topologies. Different combinations of
possible causes can be tested. In this version, we consider four of them:
(1) preferential connectivity of a new node to existing nodes; (2)
incremental growth of the network; (3) geographical distribution of nodes;
and (4) locality of edge connections. We use BRITE in [BU-CS-TR-2000-0004]
to study the origin of power laws and other metrics in Internet topologies.
BRITE (Boston university Representative Internet Topology gEnerator) is
available on the Web at http://www.cs.bu.edu/faculty/matta/Research/BRITE/
::::::::::::::
2000-006
::::::::::::::
Title: Efficient Hash-Consing of Recursive Types
Author: Jeffrey Considine
Date: January 29, 2000
Abstract:
Efficient storage of types within a compiler is necessary to avoid large
blowups in space during compilation. Recursive types in particular are
important to consider, as naive representations of recursive types may be
arbitrarily larger than necessary through unfolding. Hash-consing has been
used to efficiently store non-recursive types. Deterministic finite automata
techniques have been used to efficiently perform various operations on
recursive types. We present a new system for storing recursive types combining
hash-consing and deterministic finite automata techniques. The space
requirements are linear in the number of distinct types. Both update and
lookup operations take polynomial time and linear space, and type equality can
be checked in constant time once both types are in the system.
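
The baseline technique mentioned above, hash-consing of non-recursive types,
can be sketched as follows. This is a minimal illustration under stated
assumptions: the `TypeTable` class and its `intern` method are hypothetical
names, and the sketch does not handle recursive types, which is the
contribution of the report itself.

```python
class TypeTable:
    """Interning table: structurally equal types share a single node id.

    Hypothetical illustration of hash-consing for non-recursive types only.
    """

    def __init__(self):
        self._ids = {}    # structural key -> node id
        self._nodes = []  # node id -> structural key

    def intern(self, constructor, *children):
        # Children are already-interned node ids, so the key is small and
        # hashing it does not traverse the whole type.
        key = (constructor, children)
        node = self._ids.get(key)
        if node is None:
            node = len(self._nodes)
            self._ids[key] = node
            self._nodes.append(key)
        return node

    def __len__(self):
        return len(self._nodes)


table = TypeTable()
int_t = table.intern("int")
# Build int -> (int * int) twice; the second build reuses existing nodes.
first = table.intern("->", int_t, table.intern("*", int_t, int_t))
second = table.intern("->", int_t, table.intern("*", int_t, int_t))
```

Because both constructions return the same node id, equality of two types
already in the table reduces to an integer comparison, and the table grows
only with the number of distinct types.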
::::::::::::::
2000-007
::::::::::::::
Title: Type Inference For Recursive Definitions
Author: Assaf J. Kfoury (Boston University) and Santiago M. Pericas-Geertsen (Boston University)
Date: March 6, 2000
Abstract:
We consider type systems that combine universal types, recursive
types, and object types. We study type inference in these systems
under a rank restriction, following Leivant's notion of rank. To
motivate our work, we present several examples showing how our systems
can be used to type programs encountered in practice. We show that
type inference in the rank-k system is decidable for k <= 2 and
undecidable for k >= 3. (Similar results based on different
techniques are known to hold for System F, without recursive types and
object types.) Our undecidability result is obtained by a reduction
from a particular adaptation (which we call ``regular'') of the
semi-unification problem, whose undecidability is, interestingly,
obtained by methods totally different from those used in the case of
standard (or finite) semi-unification.
::::::::::::::
2000-008
::::::::::::::
Title: QoS Controllers for the Internet
Authors: Ibrahim Matta and Azer Bestavros
Date: March 12, 2000
Abstract:
In this position paper, we review basic control strategies that
machines acting as "traffic controllers" could deploy in order to
improve the management of Internet services. Such traffic controllers
are likely to spur the widespread emergence of advanced applications,
which have (so far) been hindered by the inability of the networking
infrastructure to deliver on the promise of Quality-of-Service (QoS).
::::::::::::::
2000-009
::::::::::::::
Title: Index trees for efficient deformable shape-based retrieval
Author: Lifeng Liu and Stan Sclaroff
Image and Vision Computing Group,
Computer Science Department, Boston University
Date: March 22, 2000
Abstract:
An improved method for deformable shape-based image indexing
and retrieval is described. A pre-computed index tree is
used to improve the speed of our previously reported on-line
model fitting method; simple shape features are used as keys
in a pre-generated index tree of model instances. In
addition, a coarse to fine indexing scheme is used at
different levels of the tree to further improve speed while
maintaining matching accuracy. Experimental results show
that the speedup is significant, while accuracy of
shape-based indexing is maintained. A method for shape
population-based retrieval is also described. The method
allows query formulation based on the population
distributions of shapes in each image. Results of
population-based image queries for a database of blood cell
micrographs are shown.
::::::::::::::
2000-010
::::::::::::::
Title: Deciding Isomorphisms of Simple Types in Polynomial Time
Author: Jeffrey Considine
Date: April 2, 2000
Abstract:
The isomorphisms holding in all models of the simply typed lambda calculus
with surjective pairing and terminal objects are well studied - these models are
exactly the Cartesian closed categories. Isomorphism of two simple types in
such a model is decidable by reduction to a normal form and comparison under a
finite number of permutations (Bruce, Di Cosmo, and Longo 1992).
Unfortunately, these normal forms may be exponentially larger than the
original types so this construction decides isomorphism in exponential
time. We show how space-sharing/hash-consing techniques and memoization
can be used to decide isomorphism in practical polynomial time (low degree,
small hidden constant).
Other researchers have investigated simple type isomorphism in relation to,
among other potential applications, type-based retrieval of software modules
from libraries and automatic generation of bridge code for multi-language
systems. Our result makes such potential applications practically feasible.
::::::::::::::
2000-011
::::::::::::::
Title: GreedyDual* Web Caching Algorithm: Exploiting the Two Sources
of Temporal Locality in Web Request Streams
Authors: Shudong Jin and Azer Bestavros
Date: April 4, 2000
Abstract:
The relative importance of long-term popularity and short-term
temporal correlation of references for Web cache replacement policies
has not been studied thoroughly. This is partially due to the lack of
accurate characterization of temporal locality that enables the
identification of the relative strengths of these two sources of
temporal locality in a reference stream. In [JB99], we have proposed
such a metric and have shown that Web reference streams differ
significantly in the prevalence of these two sources of temporal
locality. These findings underscore the importance of a Web caching
strategy that can adapt in a dynamic fashion to the prevalence of
these two sources of temporal locality. In this paper, we propose a
novel cache replacement algorithm, GreedyDual*, which is a
generalization of GreedyDual-Size. GreedyDual* uses the metrics
proposed in [JB99] to adjust the relative worth of long-term
popularity versus short-term temporal correlation of references. Our
trace-driven simulation experiments show the superior performance of
GreedyDual* when compared to other Web cache replacement policies
proposed in the literature.
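
For context, the sketch below illustrates GreedyDual-Size, the policy that
GreedyDual* generalizes, not GreedyDual* itself. Each cached object carries a
value H = L + cost/size, where L is an inflation value raised to the H of each
evicted object; the object with the smallest H is evicted first. The class
and method names are hypothetical, and the linear scan for the victim is a
simplification (a priority queue would be used in practice).

```python
class GreedyDualSize:
    """Minimal GreedyDual-Size cache sketch (hypothetical names)."""

    def __init__(self, capacity):
        self.capacity = capacity  # total cache size, e.g. in bytes
        self.used = 0
        self.inflation = 0.0      # the running value L
        self.entries = {}         # key -> (H, size)

    def access(self, key, size, cost=1.0):
        """Record a request; return True on a hit, False on a miss."""
        if key in self.entries:
            _, stored_size = self.entries[key]
            # A hit restores the object's value relative to the current L.
            self.entries[key] = (self.inflation + cost / stored_size, stored_size)
            return True
        if size > self.capacity:
            return False  # object can never fit; do not cache it
        while self.used + size > self.capacity:
            # Evict the object with the smallest H and raise L to that value.
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            h_value, victim_size = self.entries.pop(victim)
            self.inflation = h_value
            self.used -= victim_size
        self.entries[key] = (self.inflation + cost / size, size)
        self.used += size
        return False


cache = GreedyDualSize(capacity=10)
cache.access("big", 8)      # miss, H = 1/8
cache.access("small", 2)    # miss, H = 1/2
hit = cache.access("small", 2)
cache.access("new", 4)      # forces eviction of "big", the lowest-H object
```

With uniform cost, large objects receive low H values and are evicted first,
while the rising inflation value L ages out objects that are not referenced
again.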
::::::::::::::
2000-012
::::::::::::::
Title: Differentiated Predictive Fair Service for TCP Flows
Author: Ibrahim Matta and Liang Guo
Computer Science Department, Boston University
Date: May 17, 2000
Abstract:
The majority of the traffic (bytes) flowing over the Internet today
has been attributed to the Transmission Control Protocol (TCP). This
strong presence of TCP has recently spurred further investigations
into its congestion avoidance mechanism and its effect on the
performance of short and long data transfers. At the same time, the
rising interest in enhancing Internet services while keeping the
implementation cost low has led to several service-differentiation
proposals. In such service-differentiation architectures, much of the
complexity is placed only in access routers, which classify and mark
packets from different flows. Core routers can then allocate enough
resources to each class of packets so as to satisfy delivery
requirements, such as predictable (consistent) and fair service.
In this paper, we investigate the interaction among short and long TCP
flows, and how TCP service can be improved by employing a low-cost
service-differentiation scheme. Through control-theoretic arguments
and extensive simulations, we show the utility of isolating TCP flows
into two classes based on their lifetime/size, namely one class of
short flows and another of long flows. With such class-based
isolation, short and long TCP flows have separate service queues at
routers. This protects each class of flows from the other as they
possess different characteristics, such as burstiness of
arrivals/departures and congestion/sending window dynamics. We show
the benefits of isolation, in terms of better predictability and
fairness, over traditional shared queueing systems with both tail-drop
and Random-Early-Drop (RED) packet dropping policies. The proposed
class-based isolation of TCP flows has several advantages: (1) the
implementation cost is low since it only requires core routers to
maintain per-class (rather than per-flow) state; (2) it promises to be
an effective traffic engineering tool for improved predictability and
fairness for both short and long TCP flows; and (3) stringent delay
requirements of short interactive transfers can be met by increasing
the amount of resources allocated to the class of short flows.
::::::::::::::
2000-013
::::::::::::::
Title: Robust Identification of Shared Losses Using End-to-End Unicast Probes
Authors: Khaled Harfoush, Azer Bestavros, and John Byers
Date: May 30, 2000
Abstract:
Current Internet transport protocols make end-to-end measurements and
maintain per-connection state to regulate the use of shared network
resources. When two or more such connections share a common endpoint,
there is an opportunity to correlate the end-to-end measurements made
by these protocols to better diagnose and control the use of shared
resources. We develop packet probing techniques to determine whether
a pair of connections experience shared congestion. Correct,
efficient diagnoses could enable new techniques for aggregate
congestion control, QoS admission control, connection scheduling and
mirror site selection. Our extensive simulation results demonstrate
that the conditional (Bayesian) probing approach we employ provides
superior accuracy, converges faster, and tolerates a wider range of
network conditions than recently proposed memoryless (Markovian)
probing approaches for addressing this opportunity.
::::::::::::::
2000-014
::::::::::::::
Title: Utility-Based Decision-Making in Wireless Sensor Networks
Author: John Byers and Gabriel Nasser (B.U.)
Date: June 1, 2000
Abstract:
We consider challenges associated with application domains in which
a large number of distributed, networked sensors must perform a sensing
task repeatedly over time. For the tasks we consider, there are three
significant challenges to address. First, nodes have resource constraints
imposed by their finite power supply, which motivates computations that
are energy-conserving. Second, for the applications we describe, the utility
derived from a sensing task may vary depending on the placement and size of
the set of nodes that participate, which often involves complex objective
functions for nodes to target. Finally, nodes must attempt to realize these
global objectives with only local information.
We present a model for such applications, in which we define appropriate global
objectives based on utility functions and specify a cost model for energy
consumption. Then, for an important class of utility functions, we present
distributed algorithms which attempt to maximize the utility derived from the
sensor network over its lifetime. The algorithms and experimental results we
present enable nodes to adaptively change their roles over time and use dynamic
reconfiguration of routes to load balance energy consumption in the network.
::::::::::::::
2000-015
::::::::::::::
Title: Estimating Human Body Pose from a Single Image via the Specialized Mappings Architecture
Authors: Romer Rosales and Stan Sclaroff
Date: June 10, 2000
Abstract:
A non-linear supervised learning architecture, the Specialized Mappings
Architecture (SMA), and its application to articulated body pose
reconstruction from single monocular images are described. The
architecture is formed by a number of specialized mapping functions,
each of them with the purpose of mapping certain portions (connected
or not) of the input space, and a feedback matching process. A
probabilistic model for the architecture is described along with a
mechanism for learning its parameters. The learning problem is
approached using a maximum likelihood estimation framework; we present
Expectation Maximization (EM) algorithms for two different instances
of the likelihood probability. Performance is characterized by
estimating human body postures from low level visual features, showing
promising results.
::::::::::::::
2000-016
::::::::::::::
Title: Unicast-based Characterization of Network Loss Topologies
Authors: Khaled Harfoush, Azer Bestavros, and John Byers
Date: July 3, 2000
Abstract:
Current Internet transport protocols make end-to-end measurements and
maintain per-connection state to regulate the use of shared network
resources. When a number of such connections share a common endpoint,
that endpoint has the opportunity to correlate these end-to-end
measurements to better diagnose and control the use of shared
resources. A valuable characterization of such shared resources is the
``loss topology''. From the perspective of a server with concurrent
connections to multiple clients, the loss topology is a logical tree
rooted at the server in which edges represent lossy paths between a
pair of internal network nodes. We develop an end-to-end unicast
packet probing technique and an associated analytical framework to:
(1) infer loss topologies, (2) identify loss rates of links in an
existing loss topology, and (3) augment a topology to incorporate the
arrival of a new connection. Correct, efficient inference of loss
topology information enables new techniques for aggregate congestion
control, QoS admission control, connection scheduling and mirror site
selection. Our extensive simulation results demonstrate that our
approach is robust in terms of its accuracy and convergence over a
wide range of network conditions.
::::::::::::::
2000-017
::::::::::::::
Title: TCP Congestion Control and Heavy Tails
Author: Liang Guo, Mark Crovella, and Ibrahim Matta
Computer Science Department, Boston University
Date: July 3, 2000
Abstract:
Long-range dependence has been observed in many recent Internet
traffic measurements. Previous studies have shown that there is a
close relationship between heavy-tailed distribution of various
traffic parameters and the long-range dependent property. In this
paper, we use a simple Markov chain model to argue that when the loss
rate is relatively high, TCP's adaptive congestion control mechanism
indeed generates traffic with heavy-tailed OFF, or idle, periods, and
therefore introduces long-range dependence into the overall traffic.
Moreover, the degree of such long-range dependence, measured by the
Hurst parameter, increases as the loss rate increases, agreeing with
many previous measurement-based studies. In addition, we observe that
more variable initial retransmission timeout values for different
packets introduce more variable packet inter-arrival times, which
increase the burstiness of the overall traffic. Finally, we show
that high loss conditions can lead to a heavy-tailed distribution of
transmission times even for constant-sized files. This means that
file size variability need not be the only cause of heavy-tailed
variability in transmission durations.
::::::::::::::
2000-018
::::::::::::::
Title: On the Marginal Utility of Deploying Measurement Infrastructure
Authors: Paul Barford, Azer Bestavros, John Byers, and Mark Crovella
Computer Science Department, Boston University
Date: July 3, 2000
Abstract:
The cost and complexity of deploying measurement infrastructure in the
Internet for the purpose of analyzing its structure and behavior is
considerable. Basic questions about the {\em utility} of increasing
the number of measurements and/or measurement sites have not yet been
addressed, which has led to a ``more is better'' approach to wide-area
measurements. In this paper, we quantify the marginal utility of
performing wide-area measurements in the context of Internet topology
discovery. We characterize topology in terms of nodes, links, node
degree distribution, and end-to-end flows using statistical and
information-theoretic techniques. We classify nodes discovered on
the routes between a set of 8 sources and 1277 destinations to differentiate
nodes which make up the so-called ``backbone'' from those which border the
backbone and those on links between the border nodes and destination nodes.
This process includes reducing nodes that advertise multiple interfaces to
single IP addresses. We show that the utility of adding sources goes down
significantly after 2 from the perspective of interface, node, link and
node degree discovery. We show that the utility of adding destinations is
constant for interfaces, nodes, links and node degree, indicating that it is
more important to add destinations than sources. Finally, we analyze
paths through the backbone and show that shared link distributions
approximate a power law, indicating that a small number of backbone links in
our study are very heavily utilized.
::::::::::::::
2000-019
::::::::::::::
Title: Cachability of Web Objects
Author: Xiaohui Zhang
Computer Science Department, Boston University
Date: August 8, 2000
Abstract:
Much work on the performance of Web proxy caching has focused on
high-level metrics such as hit rate and byte hit rate, but has ignored
all the information related to the cachability of Web
objects. Uncachable objects include those fetched by dynamic requests,
objects with an uncachable HTTP status code, objects with an uncachable
HTTP header, objects with an HTTP 1.0 cookie, and objects without a
last-modified header. Although some researchers filter the Web traces
before they use them for analysis or simulation, many do not have a
comprehensive understanding of the cachability of Web objects. In this
paper we evaluate all the reasons that a Web object might be
uncachable. We use traces from NLANR. Since these traces do not
contain HTTP header information, we replay them using a request
generator to get the response header information. We find that between
15% and 40% of Web objects in our traces cannot be cached by a Web
proxy server. We use an LRU simulator to show the performance gap between
simulations that do and do not take cachability into account. We show the
characteristics of the cachable data set and find that they are
fairly similar to those of the total data set. Finally,
we present some additional results for the cachable and total data
sets: (1) The main reasons for uncachability are dynamic requests,
responses without a last-modified header, responses with the HTTP "302 Moved
Temporarily" status code, and responses with an HTTP/1.0 cookie. (2)
The cachability of Web objects cannot be ignored in simulation
because uncachable objects comprise a huge percentage of the total
trace. Simulations without cachability consideration will be
misleading.
::::::::::::::
2000-020
::::::::::::::
Title: Type Inference for Variant Object Types
Author: Michele Bugliesi and Santiago M. Pericas-Geertsen
Date: October 16, 2000
Abstract:
Existing type systems for object calculi are based on invariant subtyping.
Subtyping invariance is required for soundness of static typing in the
presence of method overrides, but it is often in the way of the expressive
power of the type system. Flexibility of static typing can be recovered in
different ways: in first-order systems, by the adoption of object types with
variance annotations; in second-order systems, by resorting to Self types.
Type inference is known to be P-complete for first-order systems of finite
and recursive object types, and NP-complete for a restricted version of Self
types. The complexity of type inference for systems with variance annotations
is as yet unknown.
This paper presents a new object type system based on the notion of Split
types, a form of object types where every method is assigned two types,
namely, an update type and a select type. The subtyping relation that arises
for Split types is variant and, as a result, subtyping can be performed
both in width and in depth.
The new type system generalizes all the existing first-order type systems
for objects, including systems based on variance annotations. Interestingly,
the additional expressive power does not affect the complexity of the type
inference problem, as we show by presenting an O(n^3) inference algorithm.
::::::::::::::
2000-021
::::::::::::::
Title: What are polymorphically-typed ambients?
Authors: Torben Amtoft, Assaf Kfoury, Santiago Pericas-Geertsen
Date: October 19, 2000
Abstract: The Ambient Calculus was developed by Cardelli and
Gordon as a formal framework to study issues of mobility and
migrant code. We consider an Ambient Calculus where ambients
transport and exchange programs rather than just inert data. We
propose different senses in which such a calculus can be said to be
polymorphically typed, and design accordingly a polymorphic
type system for it. Our type system assigns types to embedded
programs and what we call behaviors to processes; a denotational
semantics of behaviors is then proposed, here called trace
semantics, underlying much of the remaining analysis. We state
and prove a Subject Reduction property for our polymorphically
typed calculus. Based on techniques borrowed from finite automata
theory, type-checking of fully type-annotated processes is shown
to be decidable; the time complexity of our decision procedure is
exponential (this is a worst-case in theory, arguably not encountered
in practice). Our polymorphically-typed calculus is a conservative
extension of the typed Ambient Calculus originally proposed by
Cardelli and Gordon.
::::::::::::::
2000-022
::::::::::::::
Title: 3D Hand Pose Reconstruction Using Specialized Mappings
Authors: Romer Rosales, Vassilis Athitsos, and Stan Sclaroff
Date: December 4, 2000
Abstract:
A system for recovering 3D hand pose from monocular color
sequences is proposed. The system employs a non-linear
supervised learning framework, the specialized mappings
architecture (SMA), to map image features to likely 3D hand
poses. The SMA's fundamental components are a set of
specialized forward mapping functions, and a single feedback
matching function. The forward functions are estimated
directly from training data, which in our case are examples
of hand joint configurations and their corresponding visual
features. The joint angle data in the training set is
obtained via a CyberGlove, a glove with 22 sensors that
monitor the angular motions of the palm
and fingers. In training, the visual features are generated
using a computer graphics module that renders the hand from
arbitrary viewpoints given the 22 joint angles. We test our
system both on synthetic sequences and on sequences taken
with a color camera. The system automatically detects and
tracks both hands of the user, calculates the appropriate
features, and estimates the 3D hand joint angles from those
features. Results are encouraging given the complexity of
the task.
::::::::::::::
2000-023
::::::::::::::
Title: An Integrated Approach for Segmentation and Estimation of Planar Structures
Authors: Joni Alon and Stan Sclaroff
Date: December 4, 2000
Abstract:
Standard structure from motion algorithms recover 3D
structure of points. If a surface representation is desired,
for example a piece-wise planar representation, then a
two-step procedure typically follows: in the first step the
plane-membership of points is determined manually, and
in a subsequent step planes are fitted to the sets of points
thus determined, and their parameters are recovered. This
paper presents an approach for automatically segmenting
planar structures from a sequence of images, and
simultaneously estimating their parameters. In the proposed
approach the plane-membership of points is determined
automatically, and the planar structure parameters are
recovered directly in the algorithm rather than indirectly
in a post-processing stage. Simulated and real experimental
results show the efficacy of this approach.
::::::::::::::
2000-024
::::::::::::::
Title: Region Segmentation via Deformable Model-Guided Split and Merge
Authors: Lifeng Liu and Stan Sclaroff
Date: December 4, 2000
Abstract:
An improved method for deformable shape-based image
segmentation is described. Image regions are merged
together and/or split apart, based on their agreement with
an a priori distribution on the global deformation
parameters for a shape template. The quality of a candidate
region merging is evaluated by a cost measure that includes:
homogeneity of image properties within the combined region,
degree of overlap with a deformed shape model, and a
deformation likelihood term. Perceptually-motivated
criteria are used to determine where/how to split regions,
based on the local shape properties of the region group's
bounding contour. A globally consistent interpretation is
determined in part by the minimum description length
principle. Experiments show that the model-based splitting
strategy yields a significant improvement in segmentation over
a method that uses merging alone.
::::::::::::::
2000-025
::::::::::::::
Title: The Cyclone Server Architecture: Streamlining Delivery of Popular Content
Authors: Stan Rost, John Byers, Azer Bestavros (Boston University)
Date: 12/15/2000
Abstract:
We propose a new technique for efficiently delivering popular content
from information repositories with bounded file caches. Our strategy
relies on the use of fast erasure codes (a.k.a. forward error
correcting codes) to generate encodings of popular files, of which
only a small sliding window is cached at any time instant, even to satisfy
an unbounded number of asynchronous requests for the file. Our approach
capitalizes on concurrency to maximize sharing of state across different
request threads while minimizing cache memory utilization. Additional
reduction in resource requirements arises from providing for a
lightweight version of the network stack.
In this paper, we describe the design and implementation of our
Cyclone server as a Linux kernel subsystem.
::::::::::::::
2000-026
::::::::::::::
Title: Fine-Grained Layered Multicast
Authors: John Byers, Boston University
Michael Luby, Digital Fountain, Inc.
Michael Mitzenmacher, Harvard University
Date: 1/12/01
Abstract:
Traditional approaches to receiver-driven layered multicast have advocated
the benefits of cumulative layering, which can enable coarse-grained
congestion control that complies with TCP-friendliness equations over large
time scales. In this paper, we quantify the costs and benefits of using
non-cumulative layering and present a new, scalable multicast congestion
control scheme which provides a fine-grained approximation to the behavior of
TCP additive increase / multiplicative decrease (AIMD). In contrast to
the conventional wisdom, we demonstrate that fine-grained rate adjustment
can be achieved with only modest increases in the number of layers and
aggregate bandwidth consumption, while using only a small constant number
of control messages to perform either additive increase or multiplicative
decrease.
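
The AIMD behavior that the scheme approximates can be sketched as a simple
rate-update rule. This is plain AIMD on a single rate, not the layered
multicast scheme of the report; the function names, the default increase and
decrease parameters, and the congestion signals are assumptions chosen for
illustration.

```python
def aimd_step(rate, congested, increase=1.0, decrease=0.5, floor=1.0):
    """One AIMD update: add `increase` when clear, multiply by `decrease` on congestion.

    Hypothetical helper; parameter defaults are illustrative assumptions.
    """
    if congested:
        return max(floor, rate * decrease)
    return rate + increase


def simulate(signals, rate=1.0):
    """Apply aimd_step over a sequence of congestion signals and record the rate."""
    trace = []
    for congested in signals:
        rate = aimd_step(rate, congested)
        trace.append(rate)
    return trace


# Three uncongested rounds, one congestion event, then recovery.
trace = simulate([False, False, False, True, False])
```

The resulting sawtooth (linear growth, then halving) is the behavior a
receiver would have to emulate by adding or dropping layers.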
::::::::::::::
2000-027
::::::::::::::
Title: An Infrastructure for the Dynamic Distribution of Web Applications and Services
Authors: Enrique Duvos and Azer Bestavros
Abstract:
This paper presents the design and implementation of an infrastructure
that enables any Web application, regardless of its current state, to
be stopped and uninstalled from a particular server, transferred to a
new server, then installed, loaded, and resumed, with all these events
occurring "on the fly" and totally transparent to clients. Such
functionalities allow entire applications to fluidly move from server
to server, reducing the overhead required to administer the system,
and increasing its performance in a number of ways: (1) Dynamic
replication of new instances of applications to several servers to
raise throughput for scalability purposes, (2) Moving applications to
servers to achieve load balancing or other resource management goals,
(3) Caching entire applications on servers located closer to clients.
::::::::::::::
2000-028
::::::::::::::
Title: Scaling Phenomena in Small-World Networks
Authors: Byers, Crovella, Ye
::::::::::::::
2001-001
::::::::::::::
Title: Robust Identification of Shared Losses Using End-to-End Unicast Probes (ERRATA)
Authors: Khaled Harfoush, Azer Bestavros and John Byers
Date: January 8, 2001
Abstract:
We present corrections to Fact 3 and (as a consequence) to Lemma 1 of
BUCS Technical Report BUCS-TR-2000-013 (also published in IEEE
ICNP'2000). These corrections result in slight changes to the formulae
used for the identification of shared losses, which we quantify.
::::::::::::::
2001-002
::::::::::::::
Title: Program representation size in an intermediate language with intersection and union types
Authors: Allyn Dimock, Ian Westmacott, Robert Muller, Franklyn Turbak, J. B. Wells, and Jeffrey Considine
Date: March 15, 2001
Abstract:
The CIL compiler for core Standard ML compiles whole programs using a
novel typed intermediate language (TIL) with intersection and union
types and flow labels on both terms and types. The CIL term
representation duplicates portions of the program where intersection
types are introduced and union types are eliminated. This duplication
makes it easier to represent type information and to introduce
customized data representations. However, duplication incurs
compile-time space costs that are potentially much greater than are
incurred in TILs employing type-level abstraction or quantification.
In this paper, we present empirical data on the compile-time space
costs of using CIL as an intermediate language. The data shows that
these costs can be made tractable by using sufficiently fine-grained
flow analyses together with standard hash-consing techniques. The
data also suggests that non-duplicating formulations of intersection
(and union) types would not achieve significantly better space
complexity.
::::::::::::::
2001-003
::::::::::::::
Title: BRITE: Universal Topology Generation from a User's Perspective
Authors: Alberto Medina, Anukool Lakhina, Ibrahim Matta, John Byers
Date: April 1, 2001
Abstract:
Effective engineering of the Internet is predicated upon a detailed
understanding of issues such as the large-scale structure of its
underlying physical topology, the manner in which it evolves over
time, and the way in which its constituent components contribute to
its overall function. Unfortunately, developing a deep understanding
of these issues has proven to be a challenging task, since it in turn
involves solving difficult problems such as mapping the actual
topology, characterizing it, and developing models that capture its
emergent behavior. Consequently, even though there are a number of
topology models, it is an open question as to how representative the
topologies they generate are of the actual Internet. Our goal is to
produce a topology generation framework which improves the state of
the art and is based on design principles which include
representativeness, inclusiveness, and interoperability.
Representativeness leads to synthetic topologies that accurately
reflect many aspects of the actual Internet topology (e.g.,
hierarchical structure and degree distribution). Inclusiveness
combines the strengths of as many generation models as possible in a
single generation tool. Interoperability provides interfaces to
widely-used simulation applications such as ns and SSF as well as
visualization applications. We call such a tool a "universal topology
generator".
In this paper we discuss the design, implementation and usage of the
BRITE universal topology generation tool that we have built. We also
describe the BRITE Analysis Engine, BRIANA, which is an independent
piece of software designed and built upon BRITE's design goals of
flexibility and extensibility. The purpose of BRIANA is to act as a
repository of analysis routines along with a user-friendly interface
that allows its use on different topology formats.
KEYWORDS:
Topology generation, graph models, network topology, growth models,
annotated topologies, simulation environments.
::::::::::::::
2001-004
::::::::::::::
Title: Automatic 3D Registration of Lung Surfaces in Computed Tomography Scans
Authors: Margrit Betke, Harrison Hong, Jane P. Ko
Date: April 24, 2001
Abstract:
We developed an automated system that registers chest CT scans
temporally. Our registration method matches corresponding anatomical
landmarks to obtain initial registration parameters. The initial
point-to-point registration is then generalized to an iterative
surface-to-surface registration method. Our "goodness-of-fit"
measure is evaluated at each step in the iterative scheme until the
registration performance is sufficient. We applied our method to
register the 3D lung surfaces of 11 pairs of chest CT scans and report
promising registration performance.
::::::::::::::
2001-005
::::::::::::::
Title: The War Between Mice and Elephants
Authors: Liang Guo, Ibrahim Matta
Date: May 7, 2001
Abstract:
Recent measurement-based studies reveal that most Internet
connections are short in terms of the amount of traffic they carry
(mice), while a small fraction of the connections are carrying a large
portion of the traffic (elephants). A careful study of the TCP
protocol shows that without help from an Active Queue Management (AQM)
policy, short connections tend to lose to long connections in their
competition for bandwidth. This is because short connections do not
gain detailed knowledge of the network state, and therefore they are
doomed to be less competitive due to the conservative nature of the
TCP congestion control algorithm.
Inspired by the Differentiated Services (Diffserv) architecture, we
propose to give preferential treatment to short connections inside the
bottleneck queue, so that short connections experience a lower packet
drop rate than long connections. This is done by employing the RIO
(RED with In and Out) queue management policy which uses different
drop functions for different classes of traffic.
Our simulation results show that: (1) in a highly loaded network,
preferential treatment is necessary to provide short TCP connections
with better response time and fairness without hurting the performance
of long TCP connections; (2) the proposed scheme still delivers
packets in a FIFO manner at each link, and thus maintains statistical
multiplexing gain and does not misorder packets; (3) choosing a
smaller default initial timeout value for TCP can help enhance the
performance of short TCP flows, though not as effectively as our
scheme and with the risk of congestion collapse; (4) in the worst
case, our proposal works as well as a regular RED scheme, in terms of
response time and goodput.
Keywords:
Traffic Engineering, Congestion Control, TCP Performance, Fairness.
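Illustrative sketch (not part of the report): the idea of using different
drop functions for different classes of traffic can be pictured with two
RED-style drop curves, one lenient curve for short connections and one
aggressive curve for long connections. All names, thresholds, and maximum
drop probabilities below are hypothetical. For simplicity, both curves are
evaluated against a single average queue length, whereas RIO as originally
proposed keeps separate averages for In and Out packets.

```python
def red_drop_prob(avg_queue, min_th, max_th, max_p):
    """RED-style curve: 0 below min_th, linear ramp up to max_p, 1 from max_th on."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical parameters (queue lengths in packets).
# Short connections get the more lenient curve.
SHORT_PARAMS = {"min_th": 40, "max_th": 70, "max_p": 0.02}
LONG_PARAMS = {"min_th": 10, "max_th": 30, "max_p": 0.10}


def rio_drop_prob(avg_queue, is_short_flow):
    """Pick the drop curve according to the class of the arriving packet."""
    params = SHORT_PARAMS if is_short_flow else LONG_PARAMS
    return red_drop_prob(avg_queue, **params)


if __name__ == "__main__":
    for q in (5, 20, 50, 80):
        print(q, rio_drop_prob(q, True), rio_drop_prob(q, False))
```

With these assumed parameters, a packet from a short connection is never
more likely to be dropped than a packet from a long connection at the same
average queue length.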
::::::::::::::
2001-006
::::::::::::::
Title: TCP-friendly SIMD Congestion Control and Its Convergence Behavior
Authors: Shudong Jin, Liang Guo, Ibrahim Matta, and Azer Bestavros
Date: May 8, 2001
Abstract:
The increased diversity of Internet application requirements has
spurred recent interest in flexible congestion control
mechanisms. Window-based congestion control schemes use increase rules
to probe available bandwidth, and decrease rules to back off when
congestion is detected. The parameterization of these control rules is
done so as to ensure that the resulting protocol is TCP-friendly in
terms of the relationship between throughput and packet loss rate. In
this paper, we propose a novel window-based congestion control
algorithm called SIMD (Square-Increase/Multiplicative-Decrease).
Contrary to previous memory-less controls, SIMD utilizes history
information in its control rules. It uses multiplicative decrease but
the increase in window size is in proportion to the square of
the time elapsed since the detection of the last loss event. Thus,
SIMD can efficiently probe available bandwidth. Nevertheless, SIMD is
TCP-friendly as well as TCP-compatible under RED, and it has much
better convergence behavior than TCP-friendly AIMD and binomial
algorithms proposed recently.
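Illustrative sketch (not part of the report): one possible reading of the
control rules described above is that the window grows by an amount
proportional to the square of the time elapsed since the last loss, and is
cut multiplicatively when a loss is detected. The constants alpha and beta,
the fixed capacity used to trigger losses, and all function names are
hypothetical. The parameterization that makes SIMD TCP-friendly is not
reproduced here.

```python
def simd_window(window_after_loss, elapsed_rtts, alpha=0.5):
    """Window after elapsed_rtts round trips: total growth is alpha * t^2."""
    return window_after_loss + alpha * elapsed_rtts * elapsed_rtts


def simd_on_loss(window, beta=0.5):
    """Multiplicative decrease, never dropping below one packet."""
    return max(1.0, window * (1.0 - beta))


def simulate(capacity, rtts, w0=1.0, alpha=0.5, beta=0.5):
    """Toy trace: a loss is assumed whenever the window would exceed capacity."""
    trace = []
    base, t = w0, 0
    for _ in range(rtts):
        t += 1
        w = simd_window(base, t, alpha)
        if w > capacity:
            base = simd_on_loss(w, beta)  # back off from the overshooting window
            t = 0                         # restart the clock at the loss event
            w = base
        trace.append(w)
    return trace


if __name__ == "__main__":
    print(simulate(capacity=20, rtts=10))
```

In this toy model the per-round-trip increments grow as time passes without
a loss, which is the sense in which the increase is super-linear.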
::::::::::::::
2001-007
::::::::::::::
Title: Retrieval by Shape Population: An Index Tree Approach
Authors: Lifeng Liu and Stan Sclaroff
Date: June 5, 2001
Abstract:
Based on our previous work in deformable shape model-based
object detection, a new method is proposed that uses index
trees for organizing shape features to support content-based
retrieval applications. In the proposed strategy, different
shape feature sets can be used in index trees constructed
for object detection and shape similarity comparison
respectively. There is a direct correspondence between the
two shape feature sets. As a result, application-specific
features can be obtained efficiently for shape-based
retrieval after object detection. A novel approach is
proposed that allows retrieval of images based on the
population distribution of deformed shapes in each image.
Experiments testing these new approaches have been conducted
using an image database that contains blood cell
micrographs. The precision vs. recall performance measure
shows that our method is superior to previous methods.
::::::::::::::
2001-008
::::::::::::::
Title: Estimating 3D Body Pose using Uncalibrated Cameras
Authors: Romer Rosales, Matheen Siddiqui, Jonathan Alon and Stan Sclaroff
Date: June 5, 2001
Abstract:
An approach for estimating 3D body pose from multiple,
uncalibrated views is proposed. First, a mapping from image
features to 2D body joint locations is computed using a
statistical framework that yields a set of several body pose
hypotheses. The concept of a "virtual camera" is
introduced that makes this mapping invariant to translation,
image-plane rotation, and scaling of the input. As a
consequence, the calibration matrices (intrinsics) of the
virtual cameras can be considered completely known, and
their poses are known up to a single angular displacement
parameter. Given pose hypotheses obtained in the multiple
virtual camera views, the recovery of 3D body pose and
camera relative orientations is formulated as a stochastic
optimization problem. An Expectation-Maximization algorithm
is derived that can obtain the most likely (self-consistent)
combination of body pose hypotheses. Performance of the
approach is evaluated with synthetic sequences as well as
real video sequences of human motion.
::::::::::::::
2001-009
::::::::::::::
Title: Surface Reconstruction from Multiple Views using Rational B-Splines
Authors: Matheen Siddiqui and Stan Sclaroff
Date: June 5, 2001
Abstract:
A method for reconstructing 3D rational B-spline surfaces
from multiple views is proposed. The method takes advantage
of the projective invariance properties of rational
B-splines. Given feature correspondences in multiple views,
the 3D surface is reconstructed via a four-step framework.
First, corresponding features in each view are given an
initial surface parameter value (s,t), and a 2D B-spline is
fitted in each view. After this initialization, an iterative
minimization procedure alternates between updating the 2D
B-spline control points and re-estimating each feature's
(s,t). Next, a non-linear minimization method is used to
upgrade the 2D B-splines to 2D rational B-splines, and
obtain a better fit. Finally, a factorization method is used
to reconstruct the 3D B-spline surface given 2D B-splines in
each view. This surface recovery method can be applied in
both the perspective and orthographic cases. The
orthographic case allows the use of additional constraints
in the recovery. Experiments with real and synthetic
imagery demonstrate the efficacy of the approach for the
orthographic case.
::::::::::::::
2001-010
::::::::::::::
Title: Inference and Labeling of Metric-Induced Network Topologies
Authors: Azer Bestavros, John Byers, and Khaled Harfoush
Date: June 5, 2001
Abstract:
The deployment of distributed network-aware applications over the
Internet requires an accurate representation of the conditions of
underlying network resources. To be effective, this representation
must be possible at multiple resolutions relative to a metric of
interest. In this paper, we propose an approach for the construction
of such representations using end-to-end measurements.
We instantiate our approach by considering packet loss rates
as an example metric. To that end, we present an analytical
framework for the inference of Internet loss topologies. From the
perspective of a server, the loss topology is a logical tree rooted
at the server with clients at its leaves, in which edges represent
lossy paths (paths exhibiting observable loss rates higher than a
specified resolution) between a pair of internal network nodes.
We show how end-to-end unicast packet probing techniques could be used
to (1) infer a loss topology, and (2) identify the loss rates of
links in an existing loss topology. We report on simulation,
implementation, and Internet deployment results that show the
effectiveness of our approach and its robustness in terms of its
accuracy and convergence.
::::::::::::::
2001-011
::::::::::::::
Title: On Class-based Isolation of UDP, Short-lived and
Long-lived TCP Flows
Authors: Selma Yilmaz and Ibrahim Matta
Date: June 12, 2001
Abstract:
The congestion control mechanisms of TCP make it vulnerable in an
environment where flows with different congestion-sensitivity compete
for scarce resources. With the increasing amount of unresponsive UDP
traffic in today's Internet, new mechanisms are needed to enforce
fairness in the core of the network. We propose a scalable
Diffserv-like architecture, where flows with different characteristics
are classified into separate service queues at the routers. Such
class-based isolation provides protection so that flows with different
characteristics do not negatively impact one another. In this study,
we examine different aspects of UDP and TCP interaction and possible
gains from segregating UDP and TCP into different classes. We also
investigate the utility of further segregating TCP flows into two
classes: short flows and long flows. Results are obtained
analytically for both Tail-drop and Random Early Drop (RED)
routers. Class-based isolation has the following salient
features: (1) better fairness, (2) improved predictability for all
kinds of flows, (3) lower transmission delay for delay-sensitive
flows, and (4) better control over Quality of Service (QoS) of a
particular traffic type.
::::::::::::::
2001-012
::::::::::::::
Title: DNS-based Internet Client Clustering and Characterization
Authors: Azer Bestavros and Sumit Mehrotra
Date: June 5, 2001
Abstract:
This paper proposes a novel protocol which uses the Internet Domain
Name System (DNS) to partition Web clients into disjoint sets, each of
which is associated with a single DNS server. We define an L-DNS
cluster to be a grouping of Web Clients that use the same Local DNS
server to resolve Internet host names. We identify such clusters in
real time using data obtained from a Web Server in conjunction with
that server's Authoritative DNS, both instrumented with an
implementation of our clustering algorithm. Using these clusters, we
perform measurements from four distinct Internet locations. Our
results show that L-DNS clustering enables a better estimation of
proximity of a Web Client to a Web Server than previously proposed
techniques. Thus, in a Content Distribution Network, a DNS-based
scheme that redirects a request from a web client to one of many
servers based on the client's name server coordinates (e.g.,
hops/latency/loss-rates between the client and servers) would perform
better with our algorithm.
::::::::::::::
2001-013
::::::::::::::
Title: Open Issues on TCP for Mobile Computing
Authors: Vassilis Tsaoussidis (Northeastern University) and
Ibrahim Matta (Boston University)
Date: July 3, 2001
Abstract:
We discuss the design principles of TCP within the context of
heterogeneous wired/wireless networks and mobile networking. We
identify three shortcomings in TCP's behavior: (i) the protocol's
error detection mechanism, which does not distinguish different types
of errors and thus does not suffice for heterogeneous wired/wireless
environments, (ii) the error recovery, which is not responsive to the
distinctive characteristics of wireless networks such as transient or
burst errors due to handoffs and fading channels, and (iii) the
protocol strategy, which does not control the tradeoff between
performance measures such as goodput and energy consumption, and often
entails a wasteful effort of retransmission and energy expenditure.
We discuss a solution-framework based on selected research proposals
and the associated evaluation criteria for the suggested
modifications. We highlight an important angle that has not attracted
the required attention so far: the need for new performance metrics,
appropriate for evaluating the impact of protocol strategies on
battery-powered devices.
Keywords: TCP, congestion control, wireless links, mobile computing,
energy efficiency.
::::::::::::::
2001-014
::::::::::::::
Title: How does TCP generate Pseudo-self-similarity?
Authors: Liang Guo, Mark Crovella, and Ibrahim Matta
Computer Science Department, Boston University
Date: July 12, 2001
Abstract:
Long-range dependence has been observed in many recent Internet
traffic measurements. In addition, some recent studies have shown
that under certain network conditions, TCP itself can produce traffic
that exhibits dependence over limited timescales, even in the absence
of higher-level variability. In this paper, we use a simple Markovian
model to argue that when the loss rate is relatively high, TCP's
adaptive congestion control mechanism indeed generates traffic with
OFF periods exhibiting power-law shape over several timescales and
thus introduces pseudo-long-range dependence into the overall traffic.
Moreover, we observe that more variable initial retransmission timeout
values for different packets introduce more variable packet
inter-arrival times, which increases the burstiness of the overall
traffic. We can thus explain why a single TCP connection can produce
a time-series that can be misidentified as self-similar using standard
tests.
Keywords: Congestion Control, Long-Range Dependence, Self-Similarity.
Revises Technical Report BUCS-TR-2000-017.
::::::::::::::
2001-015
::::::::::::::
Title: A Spectrum of TCP-friendly Window-based Congestion Control Algorithms
Authors: Shudong Jin, Liang Guo, Ibrahim Matta, and Azer Bestavros
{jins, guol, matta, best}@cs.bu.edu
Computer Science Department
Boston University
Date: February 2, 2001
Revised on April 27, 2001
Posted on July 12, 2001
Abstract:
The increased diversity of Internet application requirements has
spurred recent interest in transport protocols with flexible
transmission controls. In window-based congestion control schemes,
increase rules determine how to probe available bandwidth, whereas
decrease rules determine how to back off when losses due to
congestion are detected. The parameterization of these control rules
is done so as to ensure that the resulting protocol is TCP-friendly
in terms of the relationship between throughput and loss rate.
In this paper, we define a new spectrum of window-based congestion
control algorithms that are TCP-friendly as well as TCP-compatible
under RED. Contrary to previous memory-less controls, our algorithms
utilize history information in their control rules. Our proposed
algorithms have two salient features: (1) They enable a wider region
of TCP-friendliness, and thus more flexibility in trading off among
smoothness, aggressiveness, and responsiveness; and (2) they ensure a
faster convergence to fairness under a wide range of system
conditions. SIMD is one instance of this spectrum of algorithms, in
which the congestion window is increased super-linearly with time
since the detection of the last loss. Compared to recently proposed
TCP-friendly AIMD and binomial algorithms, we demonstrate the
superiority of SIMD in: (1) adapting to sudden increases in available
bandwidth, while maintaining competitive smoothness and
responsiveness; and (2) rapidly converging to fairness and
efficiency.
Keywords:
Congestion Control, TCP-friendly, Fairness, Convergence.
::::::::::::::
2001-016
::::::::::::::
Title: Measuring Bottleneck Bandwidth of Targeted Path Segments
Authors: Khaled Harfoush, Azer Bestavros, and John Byers
{harfoush, best, byers}@cs.bu.edu
Computer Science Department
Boston University
Date: July 31, 2001
Abstract:
Accurate measurement of network bandwidth is crucial for flexible
Internet applications and protocols which actively manage and
dynamically adapt to changing utilization of network resources.
These applications must do so to perform tasks such as distributing
and delivering high-bandwidth media, scheduling service requests and
performing admission control. Extensive work has focused on two
approaches to measuring bandwidth: measuring it hop-by-hop, and
measuring it end-to-end along a path. Unfortunately, best-practice
techniques for the former are inefficient and techniques for the
latter are only able to observe bottlenecks visible at end-to-end
scope. In this paper, we develop and simulate end-to-end probing
methods which can measure bottleneck bandwidth along arbitrary,
targeted subpaths of a path in the network, including subpaths
shared by a set of flows. As another important contribution, we
describe a number of practical applications which we foresee as
standing to benefit from solutions to this problem, especially in
emerging, flexible network architectures such as overlay networks,
ad-hoc networks, peer-to-peer architectures and massively accessed
content servers.
::::::::::::::
2001-017
::::::::::::::
Title: Proceedings of the Sixth International Web Content Caching and
Distribution Workshop (WCW'01)
Authors: Azer Bestavros and Michael Rabinovich
Date: June 20-22, 2001
Posted August 2, 2001
Abstract:
The International Web Content Caching and Distribution Workshop (WCW)
is a premier technical meeting for researchers and practitioners
interested in all aspects of content caching, distribution and
delivery on the Internet. This year's meeting will be held on the
Boston University Campus and will build on the successes of the five
previous WCW meetings. This technical report includes all the
technical papers presented at WCW'01.
::::::::::::::
2001-018
::::::::::::::
Title: STAIR: Practical AIMD Multirate Multicast Congestion Control
Authors: John Byers and Gu-In Kwon
Date: September 3, 2001
Abstract:
Existing approaches for multirate multicast congestion control
are either friendly to TCP only over large time scales or introduce
unfortunate side effects, such as significant control traffic, wasted
bandwidth, or the need for modifications to existing routers. We
advocate a layered multicast approach in which steady-state receiver
reception rates emulate the classical TCP sawtooth derived from
additive-increase, multiplicative-decrease (AIMD) principles. Our
approach introduces the concept of dynamic stair layers to
simulate various rates of additive increase for receivers with
heterogeneous round-trip times (RTTs), facilitated by a minimal
amount of IGMP control traffic. We employ a mix of cumulative and
non-cumulative layering to minimize the amount of excess bandwidth
consumed by receivers operating asynchronously behind a shared bottleneck.
We integrate these techniques together into a congestion control scheme
called STAIR which is amenable to those multicast applications which can
make effective use of arbitrary and time-varying subscription levels.
::::::::::::::
2001-019
::::::::::::::
Title: Generating Good Degree Distributions for Sparse Parity Check Codes using Oracles
Author: Jeffrey Considine
Date: October 1, 2001
Abstract:
Fast forward error correction codes are becoming an important
component in bulk content delivery. They fit in naturally with
multicast scenarios as a way to deal with losses and are now seeing
use in peer-to-peer networks as a basis for distributing load. In
particular, new irregular sparse parity check codes have been
developed with provable average linear time performance, a significant
improvement over previous codes. In this paper, we present a new
heuristic for generating codes with similar performance based on
observing a server with an oracle for client state. This heuristic is
easy to implement and provides further intuition into the need for an
irregular heavy-tailed distribution.
::::::::::::::
2001-020
::::::::::::::
Title: Gismo: A Generator of Internet Streaming Media Objects and Workloads
Authors: Shudong Jin and Azer Bestavros
Date: October 10, 2001
Abstract:
This paper presents a tool called GISMO (Generator of Internet
Streaming Media Objects and workloads). GISMO enables the
specification of a number of streaming media access characteristics,
including object popularity, temporal correlation of requests,
seasonal access patterns, user session durations, user inter-activity
times, and variable bit-rate (VBR) self-similarity and marginal
distributions. The embodiment of these characteristics in GISMO
enables the generation of realistic and scalable request streams for
use in the benchmarking and comparative evaluation of Internet
streaming media delivery techniques. To demonstrate the usefulness of
GISMO, we present a case study that shows the importance of various
workload characteristics in determining the effectiveness of proxy
caching and server patching techniques in reducing bandwidth
requirements.
::::::::::::::
2001-021
::::::::::::::
Title: 3D Hand Pose Estimation by Finding Appearance-Based Matches in a Large Database of Training Views
Authors: Vassilis Athitsos and Stan Sclaroff
Date: October 22, 2001
Abstract:
Ongoing work towards appearance-based 3D hand pose estimation from a
single image is presented. Using a 3D hand model and computer
graphics, a large database of synthetic views is generated. The views
display different hand shapes as seen from arbitrary viewpoints. Each
synthetic view is automatically labeled with parameters describing its
hand shape and viewing parameters. Given an input image, the system
retrieves the most similar database views, and uses the shape and
viewing parameters of those views as candidate estimates for the
parameters of the input image. Preliminary results are presented, in
which appearance-based similarity is defined in terms of the chamfer
distance between edge images.
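Illustrative sketch (not part of the report): the chamfer distance between
two edge images is commonly computed by averaging, over the edge pixels of
one image, the distance to the nearest edge pixel of the other image. The
brute-force version below uses that common textbook form; the symmetric sum
of the two directed distances, the helper names, and the binary-image input
format are assumptions, since the abstract does not state the exact variant
used.

```python
import math


def edge_points(image):
    """Collect (row, col) coordinates of nonzero pixels in a binary edge image."""
    return [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]


def directed_chamfer(a, b):
    """Mean distance from each point in a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def chamfer(a, b):
    """Symmetric chamfer distance: sum of the two directed distances."""
    return directed_chamfer(a, b) + directed_chamfer(b, a)


if __name__ == "__main__":
    img1 = [[1, 1, 0],
            [0, 0, 0],
            [0, 0, 0]]
    img2 = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(chamfer(edge_points(img1), edge_points(img2)))
```

The brute-force nearest-point search is quadratic in the number of edge
pixels. For a large database of views, a precomputed distance transform of
each edge image is the usual way to make the lookup cheap.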
::::::::::::::
2001-022
::::::::::::::
Title: An Appearance-Based Framework for 3D Hand Shape Classification and Camera Viewpoint Estimation
Authors: Vassilis Athitsos and Stan Sclaroff
Date: October 22, 2001
Abstract:
An appearance-based framework for 3D hand shape classification
and simultaneous camera viewpoint estimation is presented. Given an
input image of a segmented hand, the most similar matches from a large
database of synthetic hand images are retrieved. The ground truth
labels of those matches, containing hand shape and camera viewpoint
information, are returned by the system as estimates for the input
image. Database retrieval is done hierarchically, by first quickly
rejecting the vast majority of all database views, and then ranking
the remaining candidates in order of similarity to the input. Four
different similarity measures are employed, based on edge location,
edge orientation, finger location and geometric moments.
::::::::::::::
2001-023
::::::::::::::
Title: Accelerating Internet Streaming Media Delivery using Network-Aware Partial Caching
Authors: Shudong Jin and Azer Bestavros
Date: October 30, 2001
Abstract:
Internet streaming applications are adversely affected by network
conditions such as high packet loss rates and long delays. This paper
aims at mitigating such effects by leveraging the availability of
client-side caching proxies. We present a novel caching architecture
(and associated cache management algorithms) that turns edge caches
into accelerators of streaming media delivery. A salient feature of
our caching algorithms is that they allow partial caching of streaming
media objects and joint delivery of content from caches and origin
servers. The caching algorithms we propose are both network-aware and
stream-aware; they take into account the popularity of streaming media
objects, their bit-rate requirements, and the available bandwidth
between clients and servers. Using realistic models of Internet
bandwidth (derived from proxy cache logs and measured over real
Internet paths), we have conducted extensive simulations to evaluate
the performance of various cache management alternatives. Our
experiments demonstrate that network-aware caching algorithms can
significantly reduce service delay and improve overall stream
quality. Also, our experiments show that partial caching is
particularly effective when bandwidth variability is not very high.
::::::::::::::
2001-024
::::::::::::::
Title: Basis Token Consistency: A Practical Mechanism for Strong Web Cache Consistency
Authors: Adam D. Bradley and Azer Bestavros
Date: November 1, 2001
Abstract:
With web caching and cache-related services like CDNs and edge
services playing an increasingly significant role in the modern
internet, the problem of the weak consistency and coherence provisions
in current web protocols is becoming increasingly significant and
drawing the attention of the standards community. Toward this end, we
present definitions of consistency and coherence for web-like
environments, that is, distributed client-server information systems
where the semantics of interactions with resources are more general
than the read/write operations found in memory hierarchies and
distributed file systems. We then present a brief review of proposed
mechanisms which strengthen the consistency of caches in the web,
focusing upon their conceptual contributions and their weaknesses in
real-world practice. These insights motivate a new mechanism, which
we call "Basis Token Consistency" or BTC; when implemented at the
server, this mechanism allows any client (independent of the presence
and conformity of any intermediaries) to maintain a self-consistent
view of the server's state. This is accomplished by annotating
responses with additional per-resource application information which
allows client caches to recognize the obsolescence of currently cached
entities and identify responses from other caches which are already
stale in light of what has already been seen. The mechanism requires
no deviation from the existing client-server communication model, and
does not require servers to maintain any additional per-client state.
We discuss how our mechanism could be integrated into a
fragment-assembling Content Management System (CMS), and present a
simulation-driven performance comparison between the BTC algorithm and
the use of the Time-To-Live (TTL) heuristic.
::::::::::::::
2001-025
::::::::::::::
Title: Scalability of Multicast Delivery for Non-sequential Streaming Access
Authors: Shudong Jin and Azer Bestavros
Date: October 30, 2001
Abstract:
Multicast is considered a panacea for scalable streaming media
delivery over the Internet. To enable asynchronous service over a
multicast infrastructure, two categories of techniques have been
proposed: stream merging and periodic broadcasting. The scalability of
these techniques stems from the fact that for sequential streaming
access, the required server bandwidth grows logarithmically with
request arrival rates for stream merging techniques, and
logarithmically with the inverse of start-up delay for periodic
broadcasting techniques. Recent studies raise doubts as to the
appropriateness of the sequential access model (in which access to a
stream proceeds uninterrupted from beginning to end). A non-sequential
access model (allowing access to start at random points in the stream)
is more accurate as it allows the modeling of partial access and
client inter-activity. In this paper, we analytically and
experimentally (re-)evaluate the scalability of multicast delivery
under a non-sequential access model. We show that under such a
realistic model, the required server bandwidth for any protocol
providing immediate service grows at least as fast as the square
root of the request arrival rate, and that the required server
bandwidth for any protocol providing delayed service grows
linearly with the inverse of the start-up delay. We also investigate
the impact of limited client bandwidth on scalability. We present
practical protocols, which provide immediate service to non-sequential
requests (subject to limited client bandwidth), and which are
near-optimal in that the required server bandwidth is very close to
its lower bound.
::::::::::::::
2001-026
::::::::::::::
Title: How does TCP generate Pseudo-self-similarity? (ERRATA)
Authors: Liang Guo, Mark Crovella, and Ibrahim Matta
Computer Science Department, Boston University
Date: November 7, 2001
Abstract:
In this note we clarify and amend a number of points made in BUCS
Technical Report BUCS-TR-2001-014 (also published in MASCOTS 2001).
We address the relationship to Technical Report UMass-CMPSC-00-55 by
Figueiredo, Liu, Misra, and Towsley.
::::::::::::::
2002-001
::::::::::::::
Title: Specialized Mappings Architecture with Applications to Vision-Based Estimation of Articulated Body Pose
Author: Romer Rosales
Abstract:
A fundamental task of vision systems is to infer the state of the
world given some form of visual observations. From a computational
perspective, this often involves facing an ill-posed problem; e.g.,
information is lost via projection of the 3D world into a 2D image.
Solution of an ill-posed problem requires additional information,
usually provided as a model of the underlying process. It is
important that the model be both computationally feasible and
theoretically well-founded. In this thesis, a probabilistic, nonlinear
supervised computational learning model is proposed: the Specialized
Mappings Architecture (SMA). The SMA framework is demonstrated in a
computer vision system that can estimate the articulated pose
parameters of a human body or human hands, given images obtained via
one or more uncalibrated cameras.
The SMA consists of several specialized forward mapping functions that
are estimated automatically from training data, and a possibly known
feedback function. Each specialized function maps certain domains of
the input space (e.g., image features) onto the output space (e.g.,
articulated body parameters). A probabilistic model for the
architecture is first formalized. Solutions to key algorithmic
problems are then derived: simultaneous learning of the specialized
domains along with the mapping functions, as well as performing
inference given inputs and a feedback function. The SMA employs a
variant of the Expectation-Maximization algorithm and approximate
inference. The approach allows the use of alternative conditional
independence assumptions for learning and inference, which are derived
from a forward model and a feedback model.
Experimental validation of the proposed approach is conducted in the
task of estimating articulated body pose from image
silhouettes. Accuracy and stability of the SMA framework are tested
using artificial data sets, as well as synthetic and real video
sequences of human bodies and hands.
::::::::::::::
2002-002
::::::::::::::
Title: Securing Bulk Content Almost for Free
Authors: John Byers, Mei Chin Cheng, Jeffrey Considine, Gene Itkis, Alex Yeung
Date: January 22, 2002
Abstract:
Content providers often consider the costs of security to be greater
than the losses they might incur without it; many view ``casual
piracy'' as their main concern. Our goal is to provide a low cost
defense against such attacks while maintaining rigorous security
guarantees. Our defense is integrated with and leverages fast forward
error correcting codes, such as Tornado codes, which are widely used
to facilitate reliable delivery of rich content. We tune one such
family of codes - while preserving their original desirable properties
- to guarantee that none of the original content can be recovered
whenever a key subset of encoded packets is missing. Ultimately we
encrypt only these key codewords (only 4% of all transmissions),
making the security overhead negligible.
::::::::::::::
2002-003
::::::::::::::
Title: Deanonymizing Users of the SafeWeb Anonymizing Service
Authors: David Martin, Andrew Schulman
Date: Feb 11, 2002
Abstract:
The SafeWeb anonymizing system has been lauded by the press and loved
by its users; self-described as "the most widely used online privacy
service in the world," it served over 3,000,000 page views per day at
its peak. SafeWeb was designed to defeat content blocking by
firewalls and to defeat Web server attempts to identify users, all
without degrading Web site behavior or requiring users to install
specialized software. In this article we describe how these
fundamentally incompatible requirements were realized in SafeWeb's
architecture, resulting in spectacular failure modes under simple
JavaScript attacks. These exploits allow adversaries to turn SafeWeb
into a weapon against its users, inflicting more damage on them than
would have been possible if they had never relied on SafeWeb
technology. By bringing these problems to light, we hope to remind
readers of the chasm that continues to separate popular and technical
notions of security.
::::::::::::::
2002-004
::::::::::::::
Title: Small-World Internet Topologies: Possible Causes and Implications on Scalability of End-System Multicast
Author: Shudong Jin and Azer Bestavros
Date: January 30, 2002
Abstract:
Recent work has shown the prevalence of small-world phenomena in many
networks. Small-world graphs exhibit a high degree of clustering, yet
have typically short path lengths between arbitrary vertices. Internet
AS-level graphs have been shown to exhibit small-world behaviors. In
this paper, we show that both Internet AS-level and router-level
graphs exhibit small-world behavior. We attribute such behavior to two
possible causes--namely the high variability of vertex degree
distributions (which were found to follow approximately a power law)
and the preference of vertices to have local connections. We show that
both factors contribute with different relative degrees to the
small-world behavior of AS-level and router-level topologies. Our
findings underscore the inefficacy of the Barabasi-Albert model in
explaining the growth process of the Internet, and provide a basis for
more promising approaches to the development of Internet topology
generators. We present such a generator and show the resemblance of
the synthetic graphs it generates to real Internet AS-level and
router-level graphs. Using these graphs, we have examined how
small-world behaviors affect the scalability of end-system
multicast. Our findings indicate that lower variability of vertex
degree and stronger preference for local connectivity in small-world
graphs result in slower network neighborhood expansion and in longer
average path length between two arbitrary vertices, which in turn
results in better scaling of end-system multicast.
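The two quantities that characterize small-world behavior in this abstract, clustering and path length, can be illustrated with a minimal sketch. It assumes an undirected graph stored as a dictionary mapping each vertex to a set of neighbors. The function names are hypothetical and are not taken from the report, and vertices of degree below two are counted as having zero clustering, which is only one of several conventions.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.

    Assumption: vertices with fewer than two neighbors contribute 0.
    """
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbors of v (each unordered pair once).
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean hop count over all ordered pairs of mutually reachable vertices."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(clustering_coefficient(triangle), average_path_length(triangle))
print(clustering_coefficient(path), average_path_length(path))
```

A small-world graph in the sense used above would show a clustering coefficient well above that of a random graph of the same size and density, together with a comparably short average path length.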
::::::::::::::
2002-005
::::::::::::::
Title: PeriScope: An Active Internet Probing and Measurement API
Author: Khaled Harfoush, Azer Bestavros, and John Byers
{harfoush, best, byers}@cs.bu.edu
Computer Science Department
Boston University
Date: January 30, 2002
Abstract:
Growing interest in inference and prediction of network
characteristics is justified by its importance for a variety of
network-aware applications. One widely adopted strategy to
characterize network conditions relies on active, end-to-end probing
of the network. Active end-to-end probing techniques differ in (1)
the structural composition of the probes they use (e.g., number and
size of packets, the destination of various packets, the protocols
used, etc.), (2) the entity making the measurements (e.g. sender
vs. receiver), and (3) the techniques used to combine measurements
in order to infer specific metrics of interest. In this paper, we
present PeriScope, a Linux API that enables the definition of new
probing structures and inference techniques from user space through
a flexible interface. PeriScope requires no support from clients
beyond the ability to respond to ICMP ECHO REQUESTs and is designed
to minimize user/kernel crossings and to ensure various constraints
(e.g., back-to-back packet transmissions, fine-grained timing
measurements). We show how to use PeriScope for two different probing
purposes, namely the measurement of shared packet losses between
pairs of endpoints and the measurement of subpath
bandwidth. Results from Internet experiments for both of these goals
are also presented.
::::::::::::::
2002-006
::::::::::::::
::::::::::::::
2002-007
::::::::::::::
Title: Informed Content Delivery Across Adaptive Overlay Networks
Authors: John Byers, Jeffrey Considine, Michael Mitzenmacher, Stanislav
Rost
Date: 3/4/2002
Abstract:
Overlay networks have emerged as a powerful and highly flexible method
for delivering content. We study how to optimize throughput of large,
multipoint transfers across richly connected overlay networks,
focusing on the question of what to put in each transmitted packet.
We first make the case for transmitting encoded content in this
scenario, arguing for the digital fountain approach which enables
end-hosts to efficiently reconstitute the original content of size n from
a subset of any n symbols from a large universe of encoded symbols.
Such an approach affords reliability and a substantial degree of
application-level flexibility, as it seamlessly tolerates packet loss,
connection migration, and parallel transfers. However, since the sets
of symbols acquired by peers are likely to overlap substantially, care
must be taken to enable them to collaborate effectively. We provide a
collection of useful algorithmic tools for efficient estimation,
summarization, and approximate reconciliation of sets of symbols
between pairs of collaborating peers, all of which keep messaging
complexity and computation to a minimum. Through simulations and
experiments on a prototype implementation, we demonstrate the
performance benefits of our informed content delivery mechanisms and
how they complement existing overlay network architectures.
::::::::::::::
2002-008
::::::::::::::
Title: End-to-End Inference of Loss Nature in a Hybrid Wired/Wireless Environment
Authors: Jun Liu, Ibrahim Matta, and Mark Crovella (Boston University)
Date: March 14, 2002
Abstract:
End-to-End differentiation between wireless and congestion loss can
equip TCP control so it operates effectively in a hybrid
wired/wireless environment. Our approach integrates two techniques:
packet loss pairs (PLP) and Hidden Markov Modeling (HMM). A packet
loss pair is formed by two back-to-back packets, where one packet is
lost while the second packet is successfully received. The purpose is
for the second packet to carry the state of the network path, namely
the round trip time (RTT), at the time the other packet is lost. Under
realistic conditions, PLP provides strong differentiation between
congestion and wireless losses based on distinguishable RTT
distributions. An HMM is then trained so observed RTTs can be mapped
to model states that represent either congestion loss or wireless
loss. Extensive simulations confirm the accuracy of our HMM-based
technique in classifying the cause of a packet loss. We also show the
superiority of our technique over the Vegas predictor, which was
recently found to perform best and which exemplifies other existing
loss labeling techniques.
::::::::::::::
2002-009
::::::::::::::
Title: Scheduling Flows with Unknown Sizes: Approximate Analysis
Author: Liang Guo and Ibrahim Matta
Date: March 21, 2002
Abstract:
Previous studies have shown that giving preferential treatment to
short jobs helps reduce the average system response time, especially
when the job size distribution possesses the heavy-tailed
property. Since it has been shown that the TCP flow length
distribution also has the same property, it is natural to let short
TCP flows enjoy better service inside the network. Analyzing such a
discriminatory system requires modifications to traditional job
scheduling models, since network traffic managers usually do not have
detailed knowledge about individual flows, such as their lengths. The
Multi-Level (ML) queue, proposed by Kleinrock, can be used to
characterize such a system. In an ML queueing system, the priority of a
flow is reduced as the flow stays longer in the system. We present an approximate
analysis of the ML queueing system to obtain a closed-form solution of
the average system response time function. We show that the response
time of short flows can be significantly reduced without penalizing
long flows.
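The multi-level discipline described above can be illustrated with a small sketch. It is not the report's analytical model. It assumes that all jobs arrive at time 0, that a job is demoted once its attained service reaches a level's threshold, and that lower-numbered levels have strict priority with FIFO order within a level. The function name and the thresholds are hypothetical.

```python
from collections import deque


def ml_queue_finish_times(sizes, thresholds):
    """Finish times under a multi-level queue, all jobs arriving at time 0.

    thresholds must be strictly increasing cumulative service amounts;
    a job whose attained service reaches thresholds[i] moves to level i + 1.
    """
    levels = [deque() for _ in range(len(thresholds) + 1)]
    levels[0].extend(range(len(sizes)))
    attained = [0] * len(sizes)
    finish = [None] * len(sizes)
    clock = 0
    while any(levels):
        # Serve the head of the highest-priority non-empty level.
        level = next(i for i, q in enumerate(levels) if q)
        job = levels[level].popleft()
        limit = thresholds[level] if level < len(thresholds) else sizes[job]
        work = min(sizes[job], limit) - attained[job]
        clock += work
        attained[job] += work
        if attained[job] >= sizes[job]:
            finish[job] = clock
        else:
            levels[level + 1].append(job)  # demote the long job
    return finish


def fifo_finish_times(sizes):
    """Finish times when jobs run to completion in arrival order."""
    finish, clock = [], 0
    for size in sizes:
        clock += size
        finish.append(clock)
    return finish


sizes = [10, 1, 1, 1]  # one long job queued ahead of three short ones
print(ml_queue_finish_times(sizes, thresholds=[2]))  # [13, 3, 4, 5]
print(fifo_finish_times(sizes))                      # [10, 11, 12, 13]
```

In this toy instance the mean finish time drops from 11.5 under FIFO to 6.25 under the two-level queue, while the long job finishes at 13 instead of 10. The example only illustrates the mechanism and does not reproduce the closed-form results of the report.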
::::::::::::::
2002-010
::::::::::::::
Title: Surface Reconstruction from Multiple Views using Rational B-Splines and Knot Insertion
Authors: Matheen Siddiqui and Stan Sclaroff
Date: March 25, 2002
Abstract:
A method for reconstruction of 3D rational B-spline surfaces from
multiple views is proposed. Given corresponding features in multiple
views, though not necessarily visible in all views, the surface is
reconstructed. First, 2D B-spline patches are fitted to each view.
The 3D B-splines and projection matrices can then be extracted from
the 2D B-splines using factorization methods. The surface fit is then
further refined via an iterative procedure. Finally, a hierarchical
fitting scheme is proposed to allow modeling of complex surfaces by
means of knot insertion. Experiments with real imagery demonstrate the
efficacy of the approach.
::::::::::::::
2002-011
::::::::::::::
Title: Automatic Detection of Relevant Head Gestures in American Sign Language Communication
Authors: Ugur Murat Erdem and Stan Sclaroff
Date: May 3, 2002
Abstract:
An automated system for detection of head movements is described. The goal
is to label relevant head gestures in video of American Sign Language (ASL)
communication. In the system, a 3D head tracker recovers head rotation and
translation parameters from monocular video. Relevant head gestures are
then detected by analyzing the length and frequency of the motion signal's
peaks and valleys. Each parameter is analyzed independently, due to the
fact that a number of relevant head movements in ASL are associated with
major changes around one rotational axis. No explicit training of the
system is necessary. Currently, the system can detect ``head shakes.'' In
experimental evaluation, classification performance is compared against
ground-truth labels obtained from ASL linguists. Initial results are
promising, as the system matches the linguists' labels in a significant
number of cases.
Keywords: Computer human interaction, gesture classification, visual
motion, image and video indexing.
::::::::::::::
2002-012
::::::::::::::
Title: Differentiated Control of Web Traffic: A Numerical Analysis
Authors: Liang Guo and Ibrahim Matta
Computer Science Department
Boston University
Date: May 10, 2002
Abstract:
Internet measurements show that the size distribution of Web-based
transactions is usually very skewed; a few large requests constitute
most of the total traffic. Motivated by the advantages of scheduling
algorithms which favor short jobs, we propose to perform
differentiated control over Web-based transactions to give
preferential service to short web requests. The control is realized
through service semantics provided by Internet Traffic Managers, a
Diffserv-like architecture. To evaluate the performance of such a
control system, it is necessary to have a fast but accurate analytical
method. To this end, we model the Internet as a time-shared system
and propose a numerical approach which utilizes Kleinrock's
conservation law to solve the model. The numerical results are shown
to match well those obtained by packet-level simulation, which runs
orders of magnitude slower than our numerical method.
Keywords: Heavy-tailed Distributions, TCP Congestion Control,
Traffic Engineering.
::::::::::::::
2002-013
::::::::::::::
Title: On the Scalability-Performance Tradeoffs in MPLS and IP Routing
Authors: Selma Yilmaz and Ibrahim Matta
Computer Science Department
Boston University
Date: May 10, 2002
Abstract:
MPLS (Multi-Protocol Label Switching) has recently emerged to facilitate
the engineering of network traffic. This can be achieved by directing
packet flows over paths that satisfy multiple requirements. MPLS has been
regarded as an enhancement to traditional IP routing, which has the
following problems: (1) all packets with the same IP destination address
have to follow the same path through the network; and (2) paths have often
been computed based on static and single link metrics. These problems may
cause traffic concentration, and thus degradation in quality of service.
In this paper, we investigate by simulations a range of routing solutions
and examine the tradeoff between scalability and performance. At one
extreme, IP packet routing using dynamic link metrics provides a stateless
solution but may lead to routing oscillations. At the other extreme, we
consider a recently proposed Profile-based Routing (PBR), which uses
knowledge of potential ingress-egress pairs as well as the traffic profile
among them. Minimum Interference Routing (MIRA) is another recently
proposed MPLS-based scheme, which only exploits knowledge of potential
ingress-egress pairs but not their traffic profile. MIRA and the more
conventional widest-shortest path (WSP) routing represent alternative
MPLS-based approaches on the spectrum of routing solutions. We compare
these solutions in terms of utility, bandwidth acceptance ratio as well as
their scalability (routing state and computational overhead) and load
balancing capability. Although WSP is the simplest of the per-flow
algorithms we consider, its performance is close to that of dynamic
per-packet routing, without the potential instabilities of dynamic routing.
Keywords:
Multi-Protocol Label Switching, IP Routing, Constraint-Based Routing,
Multicommodity Flow Algorithms, Simulation.
::::::::::::::
2002-014
::::::::::::::
Title: A Hierarchical Characterization of a Live Streaming Media Workload
Authors: Eveline Veloso, Virgilio Almeida, Wagner Meira
(Federal University of Minas Gerais, Brazil), and
	 Azer Bestavros, Shudong Jin
(Boston University, Massachusetts, USA)
Date: May 10, 2002
Abstract:
We present what we believe to be the first thorough characterization
of live streaming media content delivered over the Internet. Our
characterization of over five million requests spanning a 28-day
period is done at three increasingly granular levels, corresponding
to clients, sessions, and transfers. Our findings support two
important conclusions. First, we show that the nature of
interactions between users and objects is fundamentally different
for live versus stored objects. Access to stored objects is user
driven, whereas access to live objects is object driven. This
reversal of active/passive roles of users and objects leads to
interesting dualities. For instance, our analysis underscores a
Zipf-like profile for user interest in a given object, which is to
be contrasted to the classic Zipf-like popularity of objects for a
given user. Also, our analysis reveals that transfer lengths are
highly variable and that this variability is due to the stickiness
of clients to a particular live object, as opposed to structural
(size) properties of objects. Second, based on observations we
make, we conjecture that the particular characteristics of live
media access workloads are likely to be highly dependent on the
nature of the live content being accessed. In our study, this
dependence is clear from the strong temporal correlations we
observed in the traces, which we attribute to the synchronizing
impact of live content on access characteristics. Based on our
analyses, we present a model for live media workload generation that
incorporates many of our findings, and which we implement in GISMO.
Keywords: Live streaming content delivery; streaming media
characterization; synthetic workload generation.
::::::::::::::
2002-015
::::::::::::::
Title: On the Geographic Location of Internet Resources
Authors: Anukool Lakhina, John Byers, Mark Crovella, and Ibrahim Matta
Date: May 21, 2002
Abstract:
One relatively unexplored question about the Internet's physical
structure concerns the geographical location of its components:
routers, links and autonomous systems (ASes). We study this question
using two large inventories of Internet routers and links, collected
by different methods and about two years apart. We first map each
router to its geographical location using two different
state-of-the-art tools. We then study the relationship between router
location and population density; between geographic distance and link
density; and between the size and geographic extent of ASes. Our
findings are consistent across the two datasets and both mapping
methods. First, as expected, router density per person varies widely
over different economic regions; however, in economically homogeneous
regions, router density shows a strong superlinear relationship to
population density. Second, the probability that two routers are
directly connected is strongly dependent on distance; our data is
consistent with a model in which a majority (up to 75-95%) of link
formation is based on geographical distance (as in the Waxman topology
generation method). Finally, we find that ASes show high variability
in geographic size, which is correlated with other measures of AS size
(degree and number of interfaces). Among small to medium ASes, ASes
show wide variability in their geographic dispersal; however, all ASes
exceeding a certain threshold in size are maximally dispersed
geographically. These findings have many implications for the next
generation of topology generators, which we envisage as producing
router-level graphs annotated with attributes such as link latencies,
AS identifiers and geographical locations.
::::::::::::::
2002-016
::::::::::::::
Title: Effectiveness of Loss Labeling in Improving TCP Performance
in Wired/Wireless Networks
Authors: Dhiman Barman and Ibrahim Matta
Computer Science Department
Boston University
Date: May 22, 2002
Abstract:
The current congestion-oriented design of TCP hinders its ability to
perform well in hybrid wireless/wired networks. We propose a new
improvement on TCP NewReno (NewReno-FF) using a new loss labeling
technique to discriminate wireless from congestion losses. The
proposed technique is based on the estimation of the average and
variance of the round trip time using a filter, called the Flip Flop
filter, that is augmented with history information. We show the comparative
performance of TCP NewReno, NewReno-FF, and TCP Westwood through
extensive simulations. We study the fundamental gains and limits
using TCP NewReno with varying Loss Labeling accuracy (NewReno-LL) as
a benchmark. Lastly, our investigation opens up important research
directions. First, there is a need for a finer grained classification
of losses (even within congestion and wireless losses) for TCP in
heterogeneous networks. Second, it is essential to develop an
appropriate control strategy for recovery after the correct
classification of a packet loss.
Keywords:
TCP; Congestion Control; Error Control; Loss Labeling (Classification);
Wireless Links; Simulation.
::::::::::::::
2002-017
::::::::::::::
Title: Safe Composition of Web Communication Protocols for Extensible Edge Services
Authors: Adam Bradley, Azer Bestavros, and Assaf Kfoury
Date: May 22, 2002
Abstract:
As new multi-party edge services are deployed on the Internet,
application-layer protocols with complex communication models and
event dependencies are increasingly being specified and adopted. To
ensure that such protocols (and compositions thereof with existing
protocols) do not result in undesirable behaviors (e.g., livelocks)
there needs to be a methodology for the automated checking of the
``safety'' of these protocols. In this paper, we present ingredients
of such a methodology. Specifically, we show how SPIN, a tool from the
formal systems verification community, can be used to quickly identify
problematic behaviors of application-layer protocols with non-trivial
communication models---such as HTTP with the addition of the ``100
Continue'' mechanism. As a case study, we examine several versions of
the specification for the Continue mechanism; our experiments
mechanically uncovered multi-version interoperability problems,
including some which motivated revisions of HTTP/1.1 and some which
persist even with the current version of the protocol. One such
problem resembles a classic degradation-of-service attack, but can
arise between well-meaning peers. We also discuss how the methods we
employ can be used to make explicit the requirements for hardening a
protocol's implementation against potentially malicious peers, and for
verifying an implementation's interoperability with the full range of
allowable peer behaviors.
Keywords: Formal verification, HTTP, Interoperability, Model checking, Protocol composition.
::::::::::::::
2002-018
::::::::::::::
Title: Unicast Routing: Cost-Performance Tradeoffs
Author: Selma Yilmaz and Ibrahim Matta
Date: July 5, 2002
Abstract:
The objective of unicast routing is to find a path from a source to a
destination. Conventional routing has been used mainly to provide
connectivity. It lacks the ability to provide any kind of service
guarantees and smart usage of network resources. Improving performance
is possible by being aware of both traffic characteristics and current
available resources. This paper surveys a range of routing solutions,
which can be categorized depending on the degree of the awareness of
the algorithm: (1) QoS/Constraint-based routing solutions are aware of
traffic requirements of individual connection requests; (2)
Traffic-aware routing solutions assume knowledge of the location of
communicating ingress-egress pairs and possibly the traffic demands
among them; (3) Routing solutions that are both QoS-aware as in (1) and
traffic-aware as (2); (4) Best-effort solutions are oblivious to both
traffic and QoS requirements, but are adaptive only to current
resource availability. The best performance can be achieved by having
all possible knowledge so that while finding a path for an individual
flow, one can make a smart choice among feasible paths to increase the
chances of supporting future requests. However, this usually comes at
the cost of increased complexity and decreased scalability. In this
paper, we discuss such cost-performance tradeoffs by surveying
proposed heuristic solutions and hybrid approaches.
::::::::::::::
2002-019
::::::::::::::
Title: Fast Approximate Reconciliation of Set Differences
Authors: John W. Byers, Jeffrey Considine and Michael Mitzenmacher
Date: 7/11/02
Abstract:
We present new, simple, efficient data structures for approximate
reconciliation of set differences, a useful standalone primitive for
peer-to-peer networks and a natural subroutine in methods for exact
reconciliation. In the approximate reconciliation problem, peers A and
B respectively have subsets of elements S(A) and S(B) of a large
universe U. Peer A wishes to send a short message M to peer B with
the goal that B should use M to determine as many elements in the set
S(B) - S(A) as possible. To avoid the expense of round trip
communication times, we focus on the situation where a single message
M is sent.
We motivate the performance tradeoffs between message size, accuracy
and computation time for this problem with a straightforward approach
using Bloom filters. We then introduce approximate reconciliation
trees, a more computationally efficient solution that combines
techniques from Patricia tries, Merkle trees, and Bloom filters. We
present an analysis of approximate reconciliation trees and provide
experimental results comparing the various methods proposed for
approximate reconciliation.
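The straightforward Bloom filter approach mentioned above can be sketched as follows. Peer A sends a Bloom filter of S(A) as the single message M, and peer B keeps every element of S(B) that the filter rejects. This is a minimal illustration, not the authors' implementation. The class, the function names, the SHA-256-based hashing, and the filter parameters are all assumptions.

```python
import hashlib


class BloomFilter:
    """A simple Bloom filter (hypothetical helper for this sketch)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_message(s_a, num_bits=1024, num_hashes=4):
    """Peer A: summarize S(A) in a single message M."""
    message = BloomFilter(num_bits, num_hashes)
    for element in s_a:
        message.add(element)
    return message


def approximate_difference(s_b, message):
    """Peer B: elements of S(B) that are certainly not in S(A)."""
    return {element for element in s_b if element not in message}


s_a = set(range(100))
s_b = set(range(50, 150))
found = approximate_difference(s_b, build_message(s_a))
print(len(found), "of", len(s_b - s_a), "differing elements identified")
```

Because a Bloom filter has no false negatives, every element B reports is truly in S(B) - S(A). False positives cause B to miss some differing elements, which is the accuracy side of the tradeoff against message size.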
::::::::::::::
2002-020
::::::::::::::
Title: Graph Wavelets for Spatial Traffic Analysis
Authors: Mark Crovella and Eric Kolaczyk
Date: July 15, 2002
Abstract:
A number of problems in network operations and engineering call
for new methods of traffic analysis. While most existing traffic
analysis methods are fundamentally temporal, there is a clear need
for the analysis of traffic across multiple network links --- that is, for
spatial traffic analysis. In this paper we give examples of
problems that can be addressed via spatial traffic analysis. We then
propose a formal approach to spatial traffic analysis based on the
wavelet transform. Our approach generalizes the traditional wavelet
transform so that it can be applied to data elements connected via an
arbitrary topology. We explore the necessary and desirable properties
of this approach (graph wavelets) and consider some of its
possible realizations. We then apply graph wavelets to measurements
from an operating network. Our results show that graph wavelets are
very useful for our motivating problems; for example, they can be used
to form highly summarized views of an entire network's traffic load, to
gain insight into a network's global traffic response to a link failure,
and to localize the extent of a failure event within the network.
::::::::::::::
2002-021
::::::::::::::
Title: Sampling Biases in IP Topology Measurements
Authors: Anukool Lakhina, John W. Byers, Mark Crovella, and Peng Xie
Date: July 15, 2002
Abstract:
Considerable attention has been focused on the properties of
graphs derived from Internet measurements. Router-level topologies
collected via traceroute studies have led some authors to conclude that
the router graph of the Internet is a scale-free graph, or more
generally a power-law random graph. In such a graph, the degree
distribution of nodes follows a distribution with a power-law tail.
In this paper we argue that the evidence to date for this conclusion is at
best insufficient. We show that graphs appearing to have power-law degree
distributions can arise surprisingly easily, when sampling graphs whose
true degree distribution is not at all like a power-law. For example,
given a classical Erdos-Renyi sparse, random graph, the subgraph
formed by a collection of shortest paths from a small set of random
sources to a larger set of random destinations can easily appear to
show a degree distribution remarkably like a power-law.
We explore how this effect arises, and show that in such
a setting, edges are sampled in a highly biased manner. This insight
allows us to distinguish measurements taken from the Erdos-Renyi
graphs from those taken from power-law random graphs. When we apply
this distinction to a number of well-known datasets, we find that the
evidence for sampling bias in these datasets is strong.
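The sampling experiment described above can be sketched in a few lines. The sketch generates a sparse Erdos-Renyi graph, takes the union of breadth-first shortest paths from a few sources to many destinations, and tabulates the degrees observed in the sampled subgraph. The graph size, the edge probability, the seed, and the function names are illustrative assumptions, not the parameters used in the report.

```python
import random
from collections import Counter, deque


def erdos_renyi(n, p, rng):
    """Undirected G(n, p) random graph as a dict of neighbor sets."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def bfs_parents(adj, source):
    """Parent pointers of one shortest-path tree rooted at source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def sample_shortest_paths(adj, sources, destinations):
    """Subgraph formed by one shortest path per (source, destination)."""
    sampled = {}
    for s in sources:
        parent = bfs_parents(adj, s)
        for d in destinations:
            if d not in parent:
                continue  # destination unreachable from this source
            node = d
            while parent[node] is not None:
                prev = parent[node]
                sampled.setdefault(node, set()).add(prev)
                sampled.setdefault(prev, set()).add(node)
                node = prev
    return sampled


def degree_histogram(adj):
    """Map each degree value to the number of vertices having it."""
    return Counter(len(nbrs) for nbrs in adj.values())


rng = random.Random(0)
graph = erdos_renyi(300, 0.03, rng)
observed = sample_shortest_paths(graph, sources=[0, 1, 2],
                                 destinations=range(50, 300))
print(sorted(degree_histogram(graph).items()))
print(sorted(degree_histogram(observed).items()))
```

Comparing the two histograms shows how the sampled subgraph can under-report the degree of vertices far from the sources, since only edges lying on the chosen shortest paths are observed.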
::::::::::::::
2002-022
::::::::::::::
Title: On the Intrinsic Locality Properties of Web Reference Streams
Authors: Rodrigo Fonseca, Virgilio Almeida, Mark Crovella, and Bruno Abrahao
Date: July 15, 2002
Abstract:
There has been considerable work done in the study of Web reference
streams: sequences of requests for Web objects. In particular, many
studies have looked at the locality properties of such streams,
because of the impact of locality on the design and performance of
caching and prefetching systems. However, a general framework for
understanding why reference streams exhibit given locality properties
has not yet emerged. In this paper we take a first step in this
direction. We propose a framework for describing how reference
streams are transformed as they pass through the Internet, based on
three operations: aggregation, disaggregation, and filtering. We also
propose metrics to capture the temporal locality of reference streams
in this framework. We argue that these metrics (marginal entropy and
interreference coefficient of variation) are more natural and more
useful than previously proposed metrics for temporal locality; and we
show that these metrics provide insight into the nature of reference
stream transformations in the Web.
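The two metrics named above can be illustrated with a short sketch. It assumes that marginal entropy is the Shannon entropy of the empirical object popularity distribution, and that the interreference coefficient of variation is the standard deviation divided by the mean of the gaps between successive references to the same object. The report's exact definitions may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def marginal_entropy(stream):
    """Shannon entropy (bits) of the empirical object popularity."""
    counts = Counter(stream)
    total = len(stream)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def interreference_cov(stream):
    """Coefficient of variation of gaps between repeated references.

    Returns None when no object is referenced more than once.
    """
    last_seen = {}
    gaps = []
    for position, obj in enumerate(stream):
        if obj in last_seen:
            gaps.append(position - last_seen[obj])
        last_seen[obj] = position
    if not gaps:
        return None
    mean = sum(gaps) / len(gaps)
    variance = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return math.sqrt(variance) / mean


stream = ["a", "b", "a", "b"]
print(marginal_entropy(stream))    # 1.0 (two equally popular objects)
print(interreference_cov(stream))  # 0.0 (every gap equals 2)
```

Under these assumptions, a stream that concentrates on a few objects has lower entropy, and a stream whose repeated references arrive in bursts has a higher coefficient of variation.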
::::::::::::::
2002-023
::::::::::::::
Title: A Self-initializing Eyebrow Tracker for Binary Switch Emulation
Authors: Jonathan Lombardi and Margrit Betke, Boston University
Date: September 20, 2002
Abstract:
We designed the "Eyebrow-Clicker," a camera-based human computer
interface system that implements a new form of binary switch. When
the user raises his or her eyebrows, the binary switch is activated
and a selection command is issued. The Eyebrow-Clicker thus replaces
the "click" functionality of a mouse. The system initializes itself
by detecting the user's eyes and eyebrows, tracks these features at
frame rate, and recovers in the event of errors. The initialization
uses the natural blinking of the human eye to select suitable
templates for tracking. Once execution has begun, a user therefore
never has to restart the program or even touch the computer. In our
experiments with human-computer interaction software, the system
correctly detected a user's raised eyebrows 93% of the time.
::::::::::::::
2002-024
::::::::::::::
Title: Cache-and-Relay Streaming Media Delivery for Asynchronous Clients
Authors: Shudong Jin and Azer Bestavros
Date: September 20, 2002
Abstract:
We consider the problem of delivering popular streaming media to a
large number of asynchronous clients. We propose and evaluate a
cache-and-relay end-system multicast approach, whereby a client
joining a multicast session caches the stream, and if needed, relays
that stream to neighboring clients which may join the multicast
session at some later time. This cache-and-relay approach is fully
distributed, scalable, and efficient in terms of network link cost. In
this paper we analytically derive bounds on the network link cost of
our cache-and-relay approach, and we evaluate its performance under
assumptions of limited client bandwidth and limited client cache
capacity. When client bandwidth is limited, we show that although
finding an optimal solution is NP-hard, a simple greedy algorithm
performs surprisingly well in that it incurs network link costs that
are very close to a theoretical lower bound. When client cache
capacity is limited, we show that our cache-and-relay approach can
still significantly reduce network link cost. We have evaluated our
cache-and-relay approach using simulations over large, synthetic
random networks, power-law degree networks, and small-world networks,
as well as over large real router-level Internet maps.
::::::::::::::
2002-025
::::::::::::::
Title: Smooth Multirate Multicast Congestion Control
Authors: Gu-In Kwon and John W. Byers
Date: September 25, 2002
Abstract:
A significant impediment to deployment of multicast services is the
daunting technical complexity of developing, testing and validating
congestion control protocols fit for wide-area deployment. Protocols such
as pgmcc and TFMCC have recently made considerable progress on the single
rate case, i.e. where one dynamic reception rate is maintained for all
receivers in the session. However, these protocols have limited
applicability, since scaling to session sizes beyond tens of participants
necessitates the use of multiple rate protocols. Unfortunately, while
existing multiple rate protocols exhibit better scalability, they are
both less mature than single rate protocols and suffer from high
complexity.
We propose a new approach to multiple rate congestion control that
leverages proven single rate congestion control methods by orchestrating
an ensemble of independently controlled single rate sessions. We describe
SMCC, a new multiple rate equation-based congestion control algorithm for
layered multicast sessions that employs TFMCC as the primary underlying
control mechanism for each layer. SMCC combines the benefits of TFMCC
(smooth rate control, equation-based TCP friendliness) with the
scalability and flexibility of multiple rates to provide a sound multiple
rate multicast congestion control policy.
::::::::::::::
2002-026
::::::::::::::
Title: Scalable Peer-to-Peer Indexing with Constant State
Authors: Jeffrey Considine and Thomas Florio
Date: September 25, 2002
Abstract:
We present a distributed indexing scheme for peer-to-peer
networks. Past work on distributed indexing traded off fast search times
with non-constant degree topologies or network-unfriendly behavior such as
flooding. In contrast, the scheme we present optimizes all three of these
performance measures. That is, we provide logarithmic round searches while
maintaining connections to a fixed number of peers and avoiding network
flooding. In comparison to the well-known Chord scheme, we provide
competitive constant factors. Finally, we observe that arbitrary linear
speedups are possible and discuss both a general brute force approach and
specific economical optimizations.
::::::::::::::
2002-027
::::::::::::::
Title: A Spectrum of TCP-friendly Window-based Congestion Control Algorithms
Author: Shudong Jin, Liang Guo, Ibrahim Matta, and Azer Bestavros
{jins, guol, matta, best}@cs.bu.edu
Computer Science Department
Boston University
Date: February 2, 2001
Revised on April 27, 2001
Posted on July 12, 2001
Abstract:
This technical report revises BUCS-TR-2001-015 and is a longer version
of a paper to appear in IEEE/ACM Transactions on Networking.
The increased diversity of Internet application requirements has
spurred recent interest in transport protocols with flexible
transmission controls. In window-based congestion control schemes,
increase rules determine how to probe available bandwidth, whereas
decrease rules determine how to back off when losses due to
congestion are detected. The parameterization of these control rules
is done so as to ensure that the resulting protocol is TCP-friendly
in terms of the relationship between throughput and loss rate.
In this paper, we define a new spectrum of window-based congestion
control algorithms that are TCP-friendly as well as TCP-compatible
under RED. Contrary to previous memory-less controls, our algorithms
utilize history information in their control rules. Our proposed
algorithms have two salient features: (1) They enable a wider region
of TCP-friendliness, and thus more flexibility in trading off among
smoothness, aggressiveness, and responsiveness; and (2) they ensure a
faster convergence to fairness under a wide range of system
conditions. SIMD is one instance of this spectrum of algorithms, in
which the congestion window is increased super-linearly with time
since the detection of the last loss. Compared to recently proposed
TCP-friendly AIMD and binomial algorithms, we demonstrate the
superiority of SIMD in: (1) adapting to sudden increases in available
bandwidth, while maintaining competitive smoothness and
responsiveness; and (2) rapidly converging to fairness and
efficiency.
Keywords:
Congestion Control, TCP-friendly, Fairness, Convergence.
::::::::::::::
2002-028
::::::::::::::
Title: A Short History of Computational Complexity
Author: Lance Fortnow, NEC Research and Steven Homer, BU
Date: October 30, 2002
Abstract:
A brief history of the major issues and developments in computational
complexity theory over the past 30 years is presented.
This paper will appear in the volume entitled,
"A History of Mathematical Logic", edited by D. van Dalen, J. Dawson
and A. Kanamori, and published by Elsevier.
::::::::::::::
2002-029
::::::::::::::
Title: Simple Load Balancing for Distributed Hash Tables
Authors: John Byers, Jeffrey Considine, and Michael Mitzenmacher
Date: November 1, 2002
Abstract:
Distributed hash tables have recently become a useful building
block for a variety of distributed applications. However, current schemes
based upon consistent hashing require both considerable implementation
complexity and substantial storage overhead to achieve desired load
balancing goals. We argue in this paper that these goals can be achieved
more simply and more cost-effectively. First, we suggest the direct
application of the ``power of two choices'' paradigm, whereby an item is
stored at the less loaded of two (or more) random alternatives. We then
consider how associating a small constant number of hash values with a key
can naturally be extended to support other load balancing methods,
including load-stealing or load-shedding schemes, as well as providing
natural fault-tolerance mechanisms.
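
The ``power of two choices'' placement rule described above can be sketched as follows. This is a minimal balls-into-bins simulation, not the authors' distributed hash table design: it assumes every node is equally likely to be chosen as a candidate, and the function name and parameters are hypothetical.

```python
import random

def max_load(n_items, n_nodes, d, seed=0):
    """Place n_items on n_nodes; each item samples d candidate nodes
    uniformly at random and is stored at the least loaded candidate.
    Returns the largest number of items held by any single node."""
    rng = random.Random(seed)
    load = [0] * n_nodes
    for _ in range(n_items):
        candidates = [rng.randrange(n_nodes) for _ in range(d)]
        best = min(candidates, key=lambda node: load[node])
        load[best] += 1
    return max(load)

if __name__ == "__main__":
    n = 10000
    for d in (1, 2, 3):
        print("d =", d, "max load =", max_load(n, n, d))
```

Setting d=1 corresponds to plain random placement, which gives a baseline against which two or more choices can be compared.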
::::::::::::::
2002-030
::::::::::::::
Title: Validating Arbitrarily Large Network Protocol Compositions with Finite Computation
Authors: Adam D. Bradley, Azer Bestavros, and Assaf J. Kfoury
Date: November 1, 2002
Abstract:
Formal tools like finite-state model checkers have proven useful in
verifying the correctness of systems of bounded size and for hardening
single system components against arbitrary inputs. However,
conventional applications of these techniques are not well suited to
characterizing emergent behaviors of large compositions of processes.
In this paper, we present a methodology by which arbitrarily large
compositions of components can, if sufficient conditions are proven
concerning properties of small compositions, be modeled and completely
verified by performing formal verifications upon only a finite set of
compositions. The sufficient conditions take the form of reductions,
which are claims that particular sequences of components will be
causally indistinguishable from other shorter sequences of components.
We show how this methodology can be applied to a variety of network
protocol applications, including two features of the HTTP protocol, a
simple active networking applet, and a proposed web cache consistency
algorithm. We also discuss its applicability to framing protocol
design goals and to representing systems which employ non-model-checking
verification methodologies. Finally, we briefly discuss how we hope to
broaden this methodology to more general topological compositions of
network applications.
Keywords: Protocol Verification, Formal Methods, Model Checking,
Language Reduction, Protocol Design
::::::::::::::
2002-031
::::::::::::::
Title: Cluster-based Optimizations for Distributed Hash Tables
Author: Jeffrey Considine
Date: November 1, 2002
Abstract:
We consider the problem of performing topological optimizations of
distributed hash tables. Such hash tables include Chord and Tapestry
and are a popular building block for distributed applications.
Optimizing topologies over one-dimensional hash spaces is particularly
difficult as the higher dimensionality of the underlying network makes
close fits unlikely. Instead, current schemes are limited to
heuristically performing local optimizations, finding the best of a small
random set of peers. We propose a new class of topology optimizations
based on the existence of clusters of close overlay members within the
underlying network. By constructing additional overlays for each
cluster, a significant portion of the search procedure can be
performed within the local cluster with a corresponding reduction in
the search time. Finally, we discuss the effects of these additional
overlays on spatial locality and other load balancing schemes.
::::::::::::::
2003-001
::::::::::::::
Title: On the Size Distribution of Autonomous Systems
Authors: Marwan Fayed, Paul Krapivsky, John Byers, Mark Crovella, David Finkel (WPI), Sid Redner
Date: January 17, 2003
Abstract:
This paper explores reasons for the high degree of variability in the
sizes of ASes that have recently been observed, and the processes by
which this variable distribution develops. AS size distribution is
important for a number of reasons. First, when modeling network
topologies, an AS size distribution assists in labeling routers with
an associated AS. Second, AS size has been found to be positively
correlated with the degree of the AS (number of peering links), so
understanding the distribution of AS sizes has implications for AS
connectivity properties. Our model accounts for AS births, growth, and
mergers. We analyze two models: one incorporates only the growth of
hosts and ASes, and a second extends that model to include mergers of
ASes. We show analytically that, given reasonable assumptions about
the nature of mergers, the resulting size distribution exhibits a
power law tail with the exponent independent of the details of the
merging process. We estimate parameters of the models from
measurements obtained from Internet registries and from BGP tables.
We then compare the models' solutions to empirical AS size distributions
taken from Mercator and Skitter datasets, and find that the simple
growth-based model yields general agreement with empirical data. Our
analysis of the model in which mergers occur in a manner independent
of the size of the merging ASes suggests that more detailed analysis
of merger processes is needed.
::::::::::::::
2003-002
::::::::::::::
Title: Geometric Generalizations of the Power of Two Choices
Authors: John Byers, Jef Considine, and Michael Mitzenmacher
Abstract:
A well-known paradigm for load balancing in distributed systems is the
``power of two choices,'' whereby an item is stored at the less loaded of
two (or more) random alternative servers. We investigate the power of two
choices in natural settings for distributed computing where items and
servers reside in a geometric space and each item is associated with the
server that is its nearest neighbor. This is in fact the backdrop for
distributed hash tables such as Chord, where the geometric space is
determined by clockwise distance on a one-dimensional ring.
Theoretically, we consider the following load balancing problem. Suppose
that servers are initially hashed uniformly at random to points in the
space. Sequentially, each item then considers d candidate insertion
points also chosen uniformly at random from the space, and selects the
insertion point whose associated server has the least load. For the
one-dimensional ring, and for Euclidean distance on the two-dimensional
torus, we demonstrate that when n data items are hashed to n servers, the
maximum load at any server is log log n / log d + O(1) with high
probability. While our results match the well-known bounds in the
standard setting in which each server is selected equiprobably, our
applications do not have this feature, since the sizes of the
nearest-neighbor regions around servers are non-uniform. Therefore, the
novelty in our methods lies in developing appropriate tail bounds on the
distribution of nearest-neighbor region sizes and in adapting previous
arguments to this more general setting. In addition, we provide
simulation results demonstrating the load balance that results as the
system size scales into the millions.
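
The insertion process on the one-dimensional ring can be sketched as below. This is an illustrative simulation under stated assumptions, not the authors' experimental code: the ring is modeled as the interval [0, 1), and each point is assigned to the first server clockwise from it, as in Chord. The function names are hypothetical.

```python
import bisect
import random

def ring_max_load(n_servers, n_items, d, seed=0):
    """Hash servers to random points on a ring, then insert items one at a
    time: each item draws d random points and goes to the least loaded of
    the servers owning those points. Returns the maximum server load."""
    rng = random.Random(seed)
    servers = sorted(rng.random() for _ in range(n_servers))
    load = [0] * n_servers

    def owner(point):
        # Index of the first server clockwise from `point`, wrapping around.
        return bisect.bisect_left(servers, point) % n_servers

    for _ in range(n_items):
        candidates = [owner(rng.random()) for _ in range(d)]
        best = min(candidates, key=lambda s: load[s])
        load[best] += 1
    return max(load)

if __name__ == "__main__":
    n = 5000
    for d in (1, 2, 4):
        print("d =", d, "max load =", ring_max_load(n, n, d))
```

Because the arcs owned by servers have unequal lengths, servers are not selected equiprobably here, which is the feature that distinguishes this setting from the standard one.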
::::::::::::::
2003-003
::::::::::::::
Title: On the Convergence of Statistical Techniques for
Inferring Network Traffic Demands
Author: Alberto Medina, Kave Salamatian, Nina Taft, Ibrahim Matta, Yolanda Tsang, Christophe Diot
Date: February 6, 2003
Abstract:
Accurate knowledge of traffic demands in a communication network
enables or enhances a variety of traffic engineering and network
management tasks of paramount importance for operational networks.
Directly measuring a complete set of these demands is prohibitively
expensive because of the huge amounts of data that must be collected
and the performance impact that such measurements would impose on the
regular behavior of the network. As a consequence, we must rely on
statistical techniques to produce estimates of actual traffic demands
from partial information. The performance of such techniques is,
however, constrained by their reliance on incomplete information and the
large amount of computation they incur, which limits their convergence
behavior. In this paper we study strategies to improve the convergence
of a powerful statistical technique based on an
Expectation-Maximization iterative algorithm. First we analyze
modeling approaches to generating starting points. We call these
starting points "informed priors" since they are obtained using
actual network information such as packet traces and SNMP link counts.
Second we provide a very fast variant of the EM algorithm which
extends its computation range, increasing its accuracy and decreasing
its dependence on the quality of the starting point. Finally, we
study the convergence characteristics of our EM algorithm and compare
it against a recently proposed Weighted Least Squares approach.
::::::::::::::
2003-004
::::::::::::::
Title: Cryptographic Tamper Evidence
Author: Gene Itkis, Boston University
Date: February 11, 2003
Abstract:
We propose a new notion of cryptographic tamper evidence. A
tamper-evident signature scheme provides an additional procedure Div
which detects tampering: given two signatures, Div can determine
whether one of them was generated by the forger. Surprisingly, this is
possible even after the adversary has inconspicuously learned some (or
even all) of the secrets in the system. In this case, it might be
impossible to tell which signature is generated by the legitimate
signer and which by the forger. But at least the fact of the
tampering will be made evident. We define several variants of
tamper-evidence, differing in their power to detect tampering. In all
of these, we assume an equally powerful adversary: she adaptively
controls all the inputs to the legitimate signer (i.e., all messages
to be signed and their timing), and observes all his outputs; she can
also adaptively expose all the secrets at arbitrary times. We provide
tamper-evident schemes for all the variants and prove their
optimality. We stress that our mechanisms are purely cryptographic:
the tamper-detection algorithm Div is stateless and takes no inputs
except the two signatures (in particular, it keeps no logs), we use no
infrastructure (or other ways to conceal additional secrets), and we
use no hardware properties (except those implied by the standard
cryptographic assumptions, such as random number generators). Our
constructions are based on arbitrary ordinary signature schemes and do
not require random oracles.
::::::::::::::
2003-005
::::::::::::::
Title: On the Emergence of Highly Variable Distributions in the Autonomous System Topology
Author: Fayed, Marwan; Krapivsky, Paul; Byers, John; Crovella, Mark; Finkel, David; Redner, Sid
Date: March 1, 2003
Abstract:
Recent studies have noted that vertex degree in the autonomous system
(AS) graph exhibits a highly variable distribution [fff,MP01].
The most prominent explanatory model for this phenomenon is the
Barabasi-Albert (B-A) model [BA99,AB00]. A central feature of
the B-A model is preferential connectivity --- meaning that the
likelihood a new node in a growing graph will connect to an existing
node is proportional to the existing node's degree. In this paper we
ask whether a more general explanation than the B-A model, and absent
the assumption of preferential connectivity, is consistent with
empirical data. We are motivated by two observations: first, AS
degree and AS size are highly correlated [CHEN02]; and second,
highly variable AS size can arise simply through exponential growth.
We construct a model incorporating exponential growth in the size of
the Internet, and in the number of ASes. We then show via analysis
that such a model yields a size distribution exhibiting a power-law
tail. In such a model, if an AS's link formation is roughly
proportional to its size, then AS degree will also show high
variability. We instantiate such a model with empirically derived
estimates of growth rates and show that the resulting degree
distribution is in good agreement with that of real AS graphs.
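
A toy calculation can illustrate why exponential growth alone may produce a power-law tail. The sketch below is a deterministic simplification, not the model analyzed in the paper: it assumes the number of ASes grows as exp(alpha * t), that each AS starts at size 1 and grows as exp(beta * age), and the parameters alpha and beta are hypothetical. Under these assumptions, the fraction of ASes larger than s works out to s ** (-alpha / beta).

```python
import math

def as_sizes(n_as, alpha, beta):
    """Sizes of all ASes at the moment the n_as-th AS is born (toy model)."""
    now = math.log(n_as) / alpha          # time at which the AS count hits n_as
    sizes = []
    for k in range(1, n_as + 1):
        birth = math.log(k) / alpha       # k-th AS appears when exp(alpha*t) = k
        sizes.append(math.exp(beta * (now - birth)))
    return sizes

def tail_fraction(sizes, s):
    """Empirical fraction of ASes whose size exceeds s."""
    return sum(1 for x in sizes if x > s) / len(sizes)

if __name__ == "__main__":
    alpha, beta = 0.5, 0.25               # hypothetical growth rates
    sizes = as_sizes(100000, alpha, beta)
    for s in (2, 10, 50):
        print(s, tail_fraction(sizes, s), s ** (-alpha / beta))
```

The two printed columns compare the empirical tail with the power law predicted for this toy model; the exponent depends only on the ratio of the two growth rates.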
::::::::::::::
2003-006
::::::::::::::
Title: Skin Color-Based Video Segmentation under Time-Varying Illumination
Authors: Leonid Sigal and Stan Sclaroff
Date: March 28, 2003
Abstract:
A novel approach for real-time skin segmentation in video
sequences is described. The approach enables reliable skin
segmentation despite wide variation in illumination during
tracking. An explicit second order Markov model is used to
predict evolution of the skin-color (HSV) histogram over time.
Histograms are dynamically updated based on feedback from the
current segmentation and predictions of the Markov model. The
evolution of the skin-color distribution at each frame is
parameterized by translation, scaling and rotation in color space.
Consequent changes in geometric parameterization of the
distribution are propagated by warping and re-sampling the
histogram. The parameters of the discrete-time dynamic Markov
model are estimated using Maximum Likelihood Estimation, and also
evolve over time. The accuracy of the new dynamic skin color
segmentation algorithm is compared to that obtained via a static
color model. Segmentation accuracy is evaluated using labeled
ground-truth video sequences taken from staged experiments and
popular movies. An overall increase in segmentation accuracy of up
to 24% is observed in 17 out of 21 test sequences. In all but one
case the skin-color classification rates for our system were
higher, with background classification rates comparable to those
of the static segmentation.
::::::::::::::
2003-007
::::::::::::::
Title: The Specialized Mappings Architecture
Authors: Romer Rosales and Stan Sclaroff
Date: March 28, 2003
Abstract:
A probabilistic, nonlinear supervised learning model is proposed:
the Specialized Mappings Architecture (SMA). The SMA employs a
set of several mapping functions that are estimated automatically
from training data. Each specialized function maps certain domains
of the input space (e.g., image features) onto the output space
(e.g., articulated body parameters). One important advantage of
the SMA is that it can model ambiguous, one-to-many mappings that
may yield multiple valid output hypotheses. Once learned, the
mapping functions generate a set of output hypotheses for a given
input via a statistical inference procedure. The SMA inference
procedure incorporates an inverse mapping or feedback function,
which enables the SMA to evaluate the likelihood of each
hypothesis. Possible feedback functions include computer graphics
rendering routines that can generate images for given hypotheses.
The SMA employs a variant of the Expectation-Maximization
algorithm for simultaneous learning of the specialized domains
along with the mapping functions, and approximate strategies for
inference. The framework is demonstrated in a computer vision
system that can estimate the articulated pose parameters of a
human body or human hands, given image silhouettes. The accuracy
and stability of the SMA are also tested using synthetic images of
human bodies and hands, where ground truth is known.
::::::::::::::
2003-008
::::::::::::::
Title: Discovering Clusters in Motion Time-Series Data
Authors: Jonathan Alon, Stan Sclaroff, George Kollios, and Vladimir Pavlovic
Date: March 28, 2003
Abstract:
A new approach is proposed for clustering time-series
data. The approach can be used to discover groupings of
similar object motions that were observed in a video collection.
A finite mixture of hidden Markov models (HMMs) is
fitted to the motion data using the expectation-maximization
(EM) framework. Previous approaches for HMM-based
clustering employ a k-means formulation, where each sequence
is assigned to only a single HMM. In contrast, the
formulation presented in this paper allows each sequence to
belong to more than a single HMM with some probability,
and the hard decision about the sequence class membership
can be deferred until a later time when such a decision
is required. Experiments with simulated data demonstrate
the benefit of using this EM-based approach when there is
more "overlap" in the processes generating the data. Experiments
with real data show the promising potential of
HMM-based motion clustering in a number of applications.
::::::::::::::
2003-009
::::::::::::::
Title: Estimating 3D Hand Pose from a Cluttered Image
Authors: Vassilis Athitsos and Stan Sclaroff
Date: April 1, 2003
Abstract:
A method is proposed that can generate a ranked list of plausible
three-dimensional hand configurations that best match an input image.
Hand pose estimation is formulated as an image database indexing
problem, where the closest matches for an input hand image are
retrieved from a large database of synthetic hand images. In contrast
to previous approaches, the system can function in the presence of
clutter, thanks to two novel clutter-tolerant indexing methods. First,
a computationally efficient approximation of the image-to-model
chamfer distance is obtained by embedding binary edge images into a
high-dimensional Euclidean space. Second, a general-purpose,
probabilistic line matching method identifies those line segment
correspondences between model and input images that are the least
likely to have occurred by chance. The performance of this
clutter-tolerant approach is demonstrated in quantitative experiments
with hundreds of real hand images.
::::::::::::::
2003-010
::::::::::::::
Title: Database Indexing Methods for 3D Hand Pose Estimation
Authors: Vassilis Athitsos and Stan Sclaroff
Date: April 1, 2003
Abstract:
Estimation of 3D hand pose is useful in many gesture recognition
applications, ranging from human-computer interaction to automated
recognition of sign languages. In this paper, 3D hand pose estimation
is treated as a database indexing problem. Given an input image of a
hand, the most similar images in a large database of hand images are
retrieved. The hand pose parameters of the retrieved images are used
as estimates for the hand pose in the input image. Lipschitz
embeddings of edge images into a Euclidean space are used to improve
the efficiency of database retrieval. In order to achieve interactive
retrieval times, similarity queries are initially performed in this
Euclidean space. The paper describes ongoing work that focuses on how
to best choose reference images, in order to improve retrieval
accuracy.
::::::::::::::
2003-011
::::::::::::::
Title: How well can TCP infer network state?
Authors: Dhiman Barman and Ibrahim Matta
Date: May 16, 2003
Abstract:
The Transmission Control Protocol (TCP) has been the protocol of
choice for many Internet applications requiring reliable connections.
The design of TCP has been challenged by the extension of connections
over wireless links. We ask a fundamental question: What is the
basic predictive power of TCP of network state, including wireless
error conditions? The goal is to improve or readily exploit this
predictive power to enable TCP (or variants) to perform well in
generalized network settings.
To that end, we use Maximum Likelihood Ratio tests to evaluate TCP as
a detector/estimator. We quantify how well network state can be
estimated, given network response such as distributions of packet
delays or TCP throughput that are conditioned on the type of packet
loss. Using our model-based approach and extensive simulations, we
demonstrate that congestion-induced losses and losses due to wireless
transmission errors produce sufficiently different statistics upon
which an efficient detector can be built; distributions of network
loads can provide effective means for estimating packet loss type; and
packet delay is a better signal of network state than short-term
throughput. We demonstrate how estimation accuracy is influenced by
different proportions of congestion versus wireless losses and
penalties on incorrect estimation.
Keywords: TCP; Congestion Control; Error Control; Binary Hypothesis
Testing; Maximum Likelihood Ratio Test; Gaussian Distribution;
Wireless Links; Simulation.
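
The binary hypothesis test named in the keywords can be sketched as a likelihood ratio test on observed packet delay. This is a minimal illustration, not the authors' detector: it assumes the delay seen around a loss is Gaussian under each hypothesis, and the means, standard deviations, and threshold below are hypothetical values chosen only for the example.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution with the given mean and std."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def classify_loss(delay, congestion, wireless, threshold=0.0):
    """Return 'congestion' if the log-likelihood ratio of the congestion
    hypothesis to the wireless hypothesis exceeds the threshold.
    `congestion` and `wireless` are (mean, std) pairs in seconds."""
    llr = gaussian_logpdf(delay, *congestion) - gaussian_logpdf(delay, *wireless)
    return "congestion" if llr > threshold else "wireless"

if __name__ == "__main__":
    congestion = (0.30, 0.05)   # hypothetical: queues are full, delay is high
    wireless = (0.10, 0.05)     # hypothetical: loss without queue build-up
    for delay in (0.08, 0.19, 0.21, 0.35):
        print(delay, classify_loss(delay, congestion, wireless))
```

Raising or lowering the threshold is one way to reflect unequal penalties for the two kinds of misclassification.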
::::::::::::::
2003-012
::::::::::::::
Title: Systematic Verification of Safety Properties of Arbitrary Network Protocol Compositions Using CHAIN
Authors: Adam Bradley, Azer Bestavros, and Assaf Kfoury
Date: May 16, 2003
Abstract:
Formal correctness of complex multi-party network protocols can be
difficult to verify. While models of specific fixed compositions of agents
can be checked against design constraints, protocols which lend themselves
to arbitrarily many compositions of agents--such as the chaining of
proxies or the peering of routers--are more difficult to verify because
they represent potentially infinite state spaces and may exhibit emergent
behaviors which may not materialize under particular fixed compositions.
We address this challenge by developing an algebraic approach that enables
us to reduce arbitrary compositions of network agents into a
behaviorally-equivalent (with respect to some correctness property)
compact, canonical representation, which is amenable to mechanical
verification. Our approach consists of an algebra and a set of
property-preserving rewrite rules for the Canonical Homomorphic
Abstraction of Infinite Network protocol compositions (CHAIN). Using
CHAIN, an expression over our algebra (i.e., a set of configurations of
network protocol agents) can be reduced to another behaviorally-equivalent
expression (i.e., a smaller set of configurations). Repeated application
of such rewrite rules produces a canonical expression which can be checked
mechanically. We demonstrate our approach by characterizing deadlock-prone
configurations of HTTP agents, as well as establishing useful properties
of an overlay protocol for scheduling MPEG frames, and of a protocol for
Web intra-cache consistency.
::::::::::::::
2003-013
::::::::::::::
Title: On the Efficiency and Fairness of Transmission Control Loops: A Case for Exogenous Losses
Authors: Mina Guirguis, Azer Bestavros, and Ibrahim Matta
Date: May 16, 2003
Abstract:
We postulate that exogenous losses--which are typically regarded as
introducing undesirable ``noise'' that needs to be filtered out or hidden
from end points--can be surprisingly beneficial. In this paper we evaluate
the effects of exogenous losses on transmission control loops, focusing
primarily on efficiency and convergence to fairness properties. By
analytically capturing the effects of exogenous losses, we are able to
characterize the transient behavior of TCP. Our numerical results suggest
that ``noise'' resulting from exogenous losses should not be filtered out
blindly, and that a careful examination of the parameter space leads to
better strategies regarding the treatment of exogenous losses inside the
network. Specifically, we show that while low levels of exogenous losses do
help connections converge to their fair share, higher levels of losses lead
to inefficient network utilization. We draw the line between these two
cases by determining whether or not it is advantageous to hide, or more
interestingly introduce, exogenous losses. Our proposed approach is based
on classifying the effects of exogenous losses into long-term and
short-term effects. Such classification informs the extent to which we
control exogenous losses, so as to operate in an efficient and fair
region. We validate our results through simulations.
::::::::::::::
2003-014
::::::::::::::
Title: User-Level Sandboxing: a Safe and Efficient Mechanism for Extensibility
Author: Richard West and Jason Gloudon
Date: June 1, 2003
Abstract:
Extensible systems allow services to be configured and deployed for
the specific needs of individual applications. This paper describes a
safe and efficient method for user-level extensibility that requires
only minimal changes to the kernel. A sandboxing technique is
described that supports multiple logical protection domains within the
same address space at user-level. This approach allows applications to
register sandboxed code with the system, which may be executed in the
context of any process. Our approach differs from other
implementations that require special hardware support, such as
segmentation or tagged translation look-aside buffers (TLBs), to
either implement multiple protection domains in a single address
space, or to support fast switching between address spaces. Likewise,
we do not require the entire system to be written in a type-safe
language, to provide fine-grained protection domains. Instead, our
user-level sandboxing technique requires only page-based virtual
memory support, and the requirement that extension code is written
either in a type-safe language, or by a trusted source.
Using a fast method of upcalls, we show how our sandboxing technique
for implementing logical protection domains provides significant
performance improvements over traditional methods of invoking
user-level services. Experimental results show our approach to be an
efficient method for extensibility, with inter-protection domain
communication costs close to those of hardware-based solutions
leveraging segmentation.
::::::::::::::
2003-015
::::::::::::::
Title: ROMA: Reliable Overlay Multicast with Loosely Coupled TCP Connections
Author: Gu-In Kwon and John Byers
Date: July 1, 2003
Abstract:
We consider the problem of architecting a reliable content delivery
system across an overlay network using TCP connections as the
transport primitive. We first argue that natural designs based on
store-and-forward principles that tightly couple TCP connections at
intermediate end-systems impose fundamental performance limitations, such
as dragging down all transfer rates in the system to the rate of the
slowest receiver. In contrast, the ROMA architecture we propose
incorporates the use of loosely coupled TCP connections together with
fast forward error correction techniques to deliver a scalable solution
that better accommodates a set of heterogeneous receivers. The methods we
develop establish chains of TCP connections, whose expected performance we
analyze through equation-based methods. We validate our analytical
findings and evaluate the performance of our ROMA architecture using a
prototype implementation via extensive Internet experimentation across the
PlanetLab distributed testbed.
::::::::::::::
2003-016
::::::::::::::
Title: Stochastic Mesh-Based Multiview Reconstruction
Authors: John Isidoro and Stan Sclaroff
Date: July 1, 2003
Abstract:
A method for reconstruction of 3D polygonal models from multiple views
is presented. The method uses sampling techniques to construct a
texture-mapped semi-regular polygonal mesh of the object in question.
Given a set of views and segmentation of the object in each view,
constructive solid geometry is used to build a visual hull from
silhouette prisms. The resulting polygonal mesh is simplified and
subdivided to produce a semi-regular mesh. Regions of model fit
inaccuracy are found by projecting the reference images onto the mesh
from different views. The resulting error images for each view are
used to compute a probability density function, and several points are
sampled from it. Along the epipolar lines corresponding to these
sampled points, photometric consistency is evaluated. The mesh
surface is then pulled towards the regions of higher photometric
consistency using free-form deformations. This sampling-based
approach produces a photometrically consistent solution in much less
time than possible with previous multi-view algorithms given arbitrary
camera placement.
::::::::::::::
2003-017
::::::::::::::
Title: Stochastic Refinement of the Visual Hull to Satisfy Photometric and Silhouette Consistency Constraints
Authors: John Isidoro and Stan Sclaroff
Date: July 18, 2003
Abstract:
An iterative method for reconstructing a 3D polygonal mesh and color
texture map from multiple views of an object is presented. In each
iteration, the method first estimates a texture map given the current
shape estimate. The texture map and its associated residual error image
are obtained via maximum a posteriori estimation and reprojection of
the multiple views into texture space. Next, the surface shape is adjusted
to minimize residual error in texture space. The surface is deformed
towards a photometrically-consistent solution via a series of 1D epipolar
searches at randomly selected surface points. The texture space
formulation has improved computational complexity over standard image-based
error approaches, and allows computation of the reprojection error and
uncertainty for any point on the surface. Moreover, shape adjustments can
be constrained such that the recovered model's silhouette matches those of
the input images. Experiments with real world imagery demonstrate the
validity of the approach.
::::::::::::::
2003-018
::::::::::::::
Title: Segmenting Foreground Objects from a Dynamic Textured Background
via a Robust Kalman Filter
Authors: Jing Zhong and Stan Sclaroff
Date: July 18, 2003
Abstract:
The algorithm presented in this paper aims to segment the foreground
objects in video (e.g., people) given time-varying, textured
backgrounds. Examples of time-varying backgrounds include waves on water,
clouds moving, trees waving in the wind, automobile traffic, moving crowds,
escalators, etc. We have developed a novel foreground-background
segmentation algorithm that explicitly accounts for the non-stationary
nature and clutter-like appearance of many dynamic textures. The dynamic
texture is modeled by an Autoregressive Moving Average Model (ARMA). A
robust Kalman filter algorithm iteratively estimates the intrinsic
appearance of the dynamic texture, as well as the regions of the foreground
objects. Preliminary experiments with this method have demonstrated
promising results.
::::::::::::::
2003-019
::::::::::::::
Title: Dynamic Window-Constrained Scheduling for Real-Time Media Streaming
Authors: Richard West, Karsten Schwan, and Christian Poellabauer
Date: August 29, 2003
Abstract:
This paper describes an algorithm for scheduling packets in real-time
multimedia data streams. Common to these classes of data streams are
service constraints in terms of bandwidth and delay. However, it is
typical for real-time multimedia streams to tolerate bounded delay
variations and, in some cases, finite losses of packets. We have
therefore developed a scheduling algorithm that assumes streams have
window-constraints on groups of consecutive packet deadlines. A
window-constraint defines the number of packet deadlines that can be
missed in a window of deadlines for consecutive packets in a stream.
Our algorithm, called Dynamic Window-Constrained Scheduling (DWCS),
attempts to guarantee no more than x out of a window of y
deadlines are missed for consecutive packets in real-time and
multimedia streams. Using DWCS, the delay of service to real-time
streams is bounded even when the scheduler is overloaded. Moreover,
DWCS is capable of ensuring independent delay bounds on streams, while
at the same time guaranteeing minimum bandwidth utilizations over
tunable and finite windows of time.
We show the conditions under which the total demand for link bandwidth
by a set of real-time (i.e., window-constrained) streams can exceed
100% and still ensure all window-constraints are met. In fact, we
show how it is possible to guarantee worst-case per-stream bandwidth
and delay constraints while utilizing all available link
capacity. Finally, we show how best-effort packets can be serviced
with fast response time, in the presence of window-constrained
traffic.
Keywords: Real-time (window-constrained) scheduling and communications
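Illustration (not from the report): the (x, y) window-constraint described above can be checked against a recorded history of deadline outcomes. The minimal Python sketch below assumes a sliding window of y consecutive deadlines; the abstract does not say whether windows slide or are fixed, and the function name is hypothetical. It only tests a history against the constraint and is not the DWCS scheduler itself.

```python
def violates_window_constraint(missed, x, y):
    """Return True if any y consecutive deadlines contain more than x misses.

    missed: sequence of booleans, True where a packet deadline was missed.
    Assumption: the window slides by one deadline at a time.
    """
    if y <= 0 or x < 0:
        raise ValueError("need y > 0 and x >= 0")
    misses_in_window = 0
    for i, was_missed in enumerate(missed):
        misses_in_window += int(was_missed)
        if i >= y:
            # Drop the deadline that just left the window.
            misses_in_window -= int(missed[i - y])
        if misses_in_window > x:
            return True
    return False
```

For example, with x = 1 and y = 3, a history with two misses in a row violates the constraint, while one miss in every three deadlines does not.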
::::::::::::::
2003-020
::::::::::::::
Title: Adaptive Routing of QoS-constrained Media Streams over Scalable Overlay
Topologies
Authors: Gerald Fry and Richard West
Abstract:
Current research on Internet-based distributed systems emphasizes the
scalability of overlay topologies for efficient search and retrieval of
data items, as well as routing amongst peers. However, most existing
approaches fail to address the transport of data across these logical
networks in accordance with quality of service (QoS) constraints.
Consequently, this paper investigates the use of scalable overlay
topologies for routing real-time media streams between publishers and
potentially many thousands of subscribers. Specifically, we analyze the
costs of using k-ary n-cubes for QoS-constrained routing. Given a number
of nodes in a distributed system, we calculate the optimal k-ary n-cube
structure for minimizing the average distance between any pair of nodes.
Using this structure, we describe a greedy algorithm that selects paths
between nodes in accordance with the real-time delays along physical
links. We show this method improves the routing latencies by as much as
67%, compared to approaches that do not consider physical link costs.
We are in the process of developing a method for adaptive node placement
in the overlay topology, based upon the locations of publishers,
subscribers, physical link costs and per-subscriber QoS constraints. One
such method for repositioning nodes in logical space is discussed, to
improve the likelihood of meeting service requirements on data routed
between publishers and subscribers. Future work will evaluate the benefits
of such techniques more thoroughly.
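Illustration (not from the report): one way to compare k-ary n-cube shapes is to compute the average hop distance directly. The sketch below assumes wraparound links in every dimension, uniform choice of node pairs (self-pairs included), and the smallest n such that k**n covers the node count; the report's own cost model may differ, and all function names are hypothetical.

```python
def ring_avg_distance(k):
    """Average hop count between two uniformly chosen positions on a k-node ring."""
    return sum(min(d, k - d) for d in range(k)) / k


def cube_avg_distance(k, n):
    """Average hop distance in a k-ary n-cube with wraparound links.

    The distance is the sum of n independent per-dimension ring distances.
    """
    return n * ring_avg_distance(k)


def best_cube(num_nodes, max_k=16):
    """Search k in [2, max_k]; for each k use the smallest n with k**n >= num_nodes.

    Returns (k, n, average_distance) for the lowest average distance found.
    """
    best = None
    for k in range(2, max_k + 1):
        n = 1
        while k ** n < num_nodes:
            n += 1
        cost = cube_avg_distance(k, n)
        if best is None or cost < best[2]:
            best = (k, n, cost)
    return best
```

Under these assumptions, `best_cube(81)` selects k = 3 and n = 4. The greedy path selection over physical link delays described in the abstract is not modeled here.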
::::::::::::::
2003-021
::::::::::::::
Title: Structural Analysis of Network Traffic Flows
Authors: Anukool Lakhina, Konstantina Papagiannaki, Mark Crovella, Christophe Diot, Eric Kolaczyk, and Nina Taft
Abstract:
Network traffic arises from the superposition of Origin-Destination
(OD) flows. Hence, a thorough understanding of OD flows is essential
for modeling network traffic, and for addressing a wide variety of
problems including traffic engineering, traffic matrix
estimation, capacity planning, forecasting and anomaly detection.
However, to date, OD flows have not been closely studied, and there is
very little known about their properties.
We present the first analysis of complete sets of OD flow timeseries,
taken from two different backbone networks (Abilene and
Sprint-Europe). Using Principal Component Analysis (PCA), we find that
the set of OD flows has small intrinsic dimension. In fact, even in a
network with over a hundred OD flows, these flows can be accurately
modeled in time using a small number (10 or fewer) of independent
components or dimensions.
We also show how to use PCA to systematically decompose the structure
of OD flow timeseries into three main constituents: common periodic
trends, short-lived bursts, and noise. We provide insight into how
the various constituents contribute to the overall structure of OD
flows and explore the extent to which this decomposition varies over
time.
::::::::::::::
2003-022
::::::::::::::
Title: Analysis of OD Flows (Raw Data)
Authors: Anukool Lakhina, Konstantina Papagiannaki, Mark Crovella, Christophe Diot, Eric D. Kolaczyk, and Nina Taft
Abstract:
In a recent paper, Structural Analysis of Network Traffic Flows, we
analyzed the set of Origin Destination traffic flows from the Sprint-Europe
and Abilene backbone networks. This report presents the complete set of
results from analyzing data from both networks. The results in this report
are specific to the Sprint-1 and Abilene datasets studied in the above
paper. The following results are presented here:
1 Rows of Principal Matrix ($V$)
1.1 Sprint-1 Dataset
1.2 Abilene Dataset
2 Set of Eigenflows
2.1 Sprint-1 Dataset
2.2 Abilene Dataset
3 Classifying Eigenflows
3.1 Sprint-1 Dataset
3.2 Abilene Dataset
::::::::::::::
2003-023
::::::::::::::
Title: BoostMap: A Method for Efficient Approximate Similarity Rankings
Authors: Vassilis Athitsos, Jonathan Alon, Stan Sclaroff, George Kollios
Date: November 24, 2003
Abstract:
This paper introduces BoostMap, a method that can significantly
reduce retrieval time in image and video database systems that employ
computationally expensive distance measures, metric or
non-metric. Database and query objects are embedded into a Euclidean
space, in which similarities can be rapidly measured using a weighted
Manhattan distance. Embedding construction is formulated as a machine
learning task, where AdaBoost is used to combine many simple, 1D
embeddings into a multidimensional embedding that preserves a
significant amount of the proximity structure in the original
space. Performance is evaluated in a hand pose estimation system, and
a dynamic gesture recognition system, where the proposed method is
used to retrieve approximate nearest neighbors under expensive image
and video similarity measures. In both systems, BoostMap significantly
increases efficiency, with minimal losses in accuracy. Moreover, the
experiments indicate that BoostMap compares favorably with existing
embedding methods that have been employed in computer vision and
database applications, i.e., FastMap and Bourgain embeddings.
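Illustration (not from the report): the retrieval step described above, embedding objects with several 1D embeddings and comparing them under a weighted Manhattan distance, can be sketched as follows. The sketch assumes each 1D embedding maps an object to its distance from a fixed reference object, and that the weights are already given; in BoostMap the embeddings and weights are chosen by AdaBoost, which is not shown. The Euclidean distance here is only a stand-in for an expensive distance measure, and all names are hypothetical.

```python
import math


def reference_embedding(distance, reference):
    """1D embedding (assumed form): map x to its distance from a reference object."""
    return lambda x: distance(x, reference)


def embed(x, embeddings):
    """Combine several 1D embeddings into one multidimensional vector."""
    return [f(x) for f in embeddings]


def weighted_manhattan(u, v, weights):
    """Weighted L1 distance between two embedded vectors."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


def rank_database(query, database, embeddings, weights):
    """Rank database objects by embedded distance to the query (approximate ranking)."""
    q = embed(query, embeddings)
    return sorted(
        database,
        key=lambda obj: weighted_manhattan(q, embed(obj, embeddings), weights),
    )


# Stand-in for a costly distance; a real system would use an expensive measure.
expensive_distance = math.dist
embeddings = [
    reference_embedding(expensive_distance, (0.0, 0.0)),
    reference_embedding(expensive_distance, (10.0, 0.0)),
]
ranking = rank_database(
    (1.0, 0.0), [(9.0, 0.0), (5.0, 5.0), (2.0, 0.0)], embeddings, [1.0, 1.0]
)
```

In practice the database vectors would be precomputed offline, so that a query needs only one expensive distance computation per reference object.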
::::::::::::::
2003-024
::::::::::::::
Title: A Pragmatic Approach to DHT Adoption
Authors: Jeffrey Considine, Michael Walfish, David G. Andersen
Date: December 1, 2003
Abstract:
Despite the peer-to-peer community's obvious wish to have its systems
adopted, specific mechanisms to facilitate incremental adoption have not
yet received the same level of attention as the many other practical
concerns associated with these systems. This paper argues that ease of
adoption should be elevated to a first-class concern and accordingly
presents HOLD, a front-end to existing DHTs that is optimized for
incremental adoption. Specifically, HOLD is backwards-compatible: it
leverages DNS to provide a key-based routing service to existing Internet
hosts without requiring them to install any software. This paper also
presents applications that could benefit from HOLD as well as the
trade-offs that accompany HOLD. Early implementation experience suggests
that HOLD is practical.
::::::::::::::
2003-025
::::::::::::::
Title: Contour Generator Points for Threshold Selection and a Novel Photo-Consistency Measure for Space Carving
Authors: John Isidoro and Stan Sclaroff
Date: December 2, 2003
Abstract:
Space carving has emerged as a powerful method for multiview scene
reconstruction. Although a wide variety of methods have been
proposed, the quality of the reconstruction remains highly dependent
on the photometric consistency measure, and the threshold used to
carve away voxels. In this paper, we present a novel
photo-consistency measure that is motivated by a multiset variant of
the chamfer distance. The new measure is robust to high amounts of
within-view color variance and also takes into account the projection
angles of back-projected pixels.
Another critical issue in space carving is the selection of the
photo-consistency threshold used to determine what surface voxels are
kept or carved away. In this paper, a reliable threshold selection
technique is proposed that examines the photo-consistency values at
contour generator points. Contour generators are points that lie on
both the surface of the object and the visual hull. To determine the
threshold, a percentile ranking of the photo-consistency values of
these generator points is used. This improved technique is applicable
to a wide variety of photo-consistency measures, including the new
measure presented in this paper. Also presented in this paper is a
method to choose between photo-consistency measures, and voxel array
resolutions prior to carving using receiver operating characteristic
(ROC) curves.
::::::::::::::
2003-026
::::::::::::::
Title: Exogenous-Loss Awareness in Queue Management: Toward Global Fairness
Authors: Mina Guirguis, Azer Bestavros, and Ibrahim Matta
Date: December 2, 2003
Abstract:
For a given TCP flow, exogenous losses are those occurring on links
other than the flow's bottleneck link. Exogenous losses are typically
viewed as introducing undesirable ``noise'' into TCP's feedback
control loop, leading to inefficient network utilization and
potentially severe global unfairness. This has prompted much research
on mechanisms for hiding such losses from end-points. In this paper,
we show through analysis and simulations that low levels of exogenous
losses are surprisingly beneficial in that they improve stability and
convergence, without sacrificing efficiency. Based on this, we argue
that exogenous loss awareness should be taken into account in any AQM
design that aims to achieve global fairness. To that end, we propose
an eXogenous-loss aware Queue Management (XQM) that actively accounts
for and leverages exogenous losses. We use an equation based approach
to derive the quiescent loss rate for a connection based on the
connection's profile and its global fair share. In contrast to other
queue management techniques, XQM ensures that a connection sees its
quiescent loss rate, not only by complementing already existing
exogenous losses, but also by actively hiding exogenous losses, if
necessary, to achieve global fairness. We establish the advantages of
exogenous-loss awareness using extensive simulations in which we
contrast the performance of XQM to that of a host of traditional
exogenous-loss unaware AQM techniques.
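Illustration (not from the report): an equation-based derivation of a quiescent loss rate can be sketched by inverting a TCP throughput equation. The sketch below assumes the simplified square-root approximation, rate = (MSS / RTT) * sqrt(3 / (2p)); the report may use a different equation. The rule for complementing or hiding exogenous losses is likewise an assumption, and all names are hypothetical.

```python
def quiescent_loss_rate(mss_bytes, rtt_s, fair_share_bytes_per_s):
    """Loss rate at which the assumed throughput equation yields the fair share.

    Inverts rate = (MSS / RTT) * sqrt(3 / (2 * p)) for p, capped at 1.0.
    """
    p = 1.5 * (mss_bytes / (rtt_s * fair_share_bytes_per_s)) ** 2
    return min(p, 1.0)


def loss_adjustment(p_quiescent, p_exogenous):
    """Decide how a queue could steer the loss a connection sees toward p_quiescent.

    Returns ("add", q): drop with probability q at the queue, so that
        1 - (1 - p_exogenous) * (1 - q) == p_quiescent.
    Returns ("hide", h): mask a fraction h of exogenous losses, so that
        p_exogenous * (1 - h) == p_quiescent.
    Returns ("none", 0.0) when the two rates already match.
    """
    if p_exogenous < p_quiescent:
        return ("add", (p_quiescent - p_exogenous) / (1.0 - p_exogenous))
    if p_exogenous > p_quiescent:
        return ("hide", 1.0 - p_quiescent / p_exogenous)
    return ("none", 0.0)
```

For a connection with a 1500-byte MSS, a 100 ms RTT, and a fair share of 150,000 bytes per second, the assumed equation gives a quiescent loss rate of about 1.5%.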
::::::::::::::
2003-027
::::::::::::::
Title: Efficiently and Fairly Allocating Bandwidth at a Highly Congested
Link
Authors: Tao Wang, Ibrahim Matta, and Azer Bestavros
Date: December 2, 2003
Abstract:
We consider the problem of efficiently and fairly allocating bandwidth
at a highly congested link to a diverse set of flows, including TCP
flows with various Round Trip Times (RTT), non-TCP-friendly flows such
as Constant-Bit-Rate (CBR) applications using UDP, and misbehaving or
malicious flows. Though simple, FIFO queue management is
vulnerable. Fair Queueing (FQ) can guarantee max-min fairness but
fails at efficiency. RED-PD exploits the history of RED's actions in
preferentially dropping packets from higher-rate flows. Thus, RED-PD
attempts to achieve fairness at low cost. By relying on RED's
actions, RED-PD turns out not to be effective in dealing with
non-adaptive flows in settings with a highly heterogeneous mix of
flows. In this paper, we propose a new approach we call RED-NB (RED
with No Bias). RED-NB does not rely on RED's actions. Rather it
explicitly maintains its own history for the few high-rate flows.
RED-NB then adaptively adjusts flow dropping probabilities to achieve
max-min fairness. In addition, RED-NB helps RED itself at very high
loads by tuning RED's dropping behavior to the flow characteristics
(restricted in this paper to RTTs) to eliminate its bias against
long-RTT TCP flows while still taking advantage of RED's features at
low loads. Through extensive simulations, we confirm the fairness of
RED-NB and show that it outperforms RED, RED-PD, and CHOKe in all
scenarios.
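Illustration (not from the report): max-min fairness, the allocation target named above, can be computed with the standard water-filling procedure. The sketch below shows only that target allocation for known flow demands; it does not model how RED-NB approaches the target by adjusting dropping probabilities. The function name is hypothetical.

```python
def max_min_fair(capacity, demands):
    """Max-min fair allocation of a link's capacity among flows with given demands.

    Flows are visited in order of increasing demand. A flow whose demand fits
    within an equal split of the remaining capacity is fully satisfied; once a
    demand exceeds the equal split, all remaining flows receive that split.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    while active:
        share = remaining / len(active)
        smallest = active[0]
        if demands[smallest] <= share:
            alloc[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            active.pop(0)
        else:
            for i in active:
                alloc[i] = share
            break
    return alloc
```

For a link of capacity 10 and demands 2, 2.6, 4, and 5, the two small flows are fully served and the two large flows each receive 2.7.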
::::::::::::::
2003-028
::::::::::::::
Title: Providing Soft Bandwidth Guarantees Using Elastic TCP-based Tunnels
Authors: Mina Guirguis, Azer Bestavros, Ibrahim Matta, Niky Riga, Gali Diamant, and Yuting Zhang
Date: December 2, 2003
Abstract:
The best-effort nature of the Internet poses a significant obstacle to
the deployment of many applications that require guaranteed
bandwidth. In this paper, we present a novel approach that enables two
edge/border routers---which we call Internet Traffic Managers
(ITM)---to use an adaptive number of TCP connections to set up a
tunnel of desirable bandwidth between them. The number of TCP
connections that comprise this tunnel is elastic in the sense that it
increases/decreases in tandem with competing cross traffic to maintain
a target bandwidth. An origin ITM would then schedule incoming
packets from an application requiring guaranteed bandwidth over that
elastic tunnel. Unlike many proposed solutions that aim to deliver
soft QoS guarantees, our elastic-tunnel approach does not
require any support from core routers (as with IntServ and DiffServ);
it is scalable in the sense that core routers do not have to maintain
per-flow state (as with IntServ); and it is readily deployable within
a single ISP or across multiple ISPs. To evaluate our approach, we
develop a flow-level control-theoretic model to study the transient
behavior of established elastic TCP-based tunnels. The model captures
the effect of cross-traffic connections on our bandwidth allocation
policies. Through extensive simulations, we confirm the effectiveness
of our approach in providing soft bandwidth guarantees. We also
outline our kernel-level ITM prototype implementation.
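Illustration (not from the report): the elasticity described above can be seen in an idealized static model. The sketch below assumes a single bottleneck whose capacity is split equally among all TCP connections, which is a strong simplification; the report itself uses a flow-level control-theoretic model of transient behavior. All names are hypothetical.

```python
import math


def tunnel_share(n_tunnel, n_cross, capacity):
    """Bandwidth obtained by the tunnel under the assumed equal per-connection split."""
    return capacity * n_tunnel / (n_tunnel + n_cross)


def connections_needed(target_bw, capacity, n_cross):
    """Smallest number of tunnel connections whose combined share meets the target.

    Solves n / (n + n_cross) * capacity >= target_bw for the smallest integer n >= 1.
    """
    if not 0 < target_bw < capacity:
        raise ValueError("target must be positive and below the link capacity")
    if n_cross == 0:
        return 1
    return max(1, math.ceil(target_bw * n_cross / (capacity - target_bw)))
```

With a target of 4 units on a 10-unit link, the tunnel needs 4 connections against 6 cross-traffic connections and 6 connections against 9, so the count grows and shrinks with the competing traffic.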
::::::::::::::
2003-029
::::::::::::::
Title: TCP Optimization through FEC, ARQ and Transmission Power Tradeoffs
Authors: Dhiman Barman, Ibrahim Matta, Eitan Altman, and Rachid El Azouzi
Date: December 3, 2003
Abstract:
TCP performance degrades when end-to-end connections extend over
wireless connections --- links which are characterized by high bit
error rate and intermittent connectivity. Such link characteristics
can significantly degrade TCP performance as the TCP sender assumes
wireless losses to be congestion losses resulting in unnecessary
congestion control actions. Link errors can be reduced by increasing
transmission power, code redundancy (FEC) or number of retransmissions
(ARQ). But increasing power costs resources, increasing code
redundancy reduces available channel bandwidth and increasing
persistency increases end-to-end delay. The paper proposes a TCP
optimization through proper tuning of power management, FEC and ARQ in
wireless environments (WLAN and WWAN). In particular, we conduct
analytical and numerical analysis taking into account the three
aforementioned factors, and evaluate TCP (and ``wireless-aware'' TCP)
performance under different settings. Our results show that
increasing power, redundancy and/or retransmission levels always
improves TCP performance by reducing link-layer losses. However, such
improvements are often associated with cost and arbitrary improvement
cannot be realized without paying a lot in return. It is therefore
important to consider some kind of net utility function that should be
optimized, thus maximizing throughput at the least possible cost.
::::::::::::::
2003-030
::::::::::::::
Title: A Bayesian Approach for TCP to Distinguish Congestion from Wireless Losses
Authors: Dhiman Barman and Ibrahim Matta
Date: December 3, 2003
Abstract:
(This Technical Report revises TR-BUCS-2003-011) The Transmission
Control Protocol (TCP) has been the protocol of choice for many
Internet applications requiring reliable connections. The design of
TCP has been challenged by the extension of connections over wireless
links. In this paper, we investigate a Bayesian approach to infer at
the source host the reason for a packet loss, whether congestion or
wireless transmission error. Our approach is ``mostly'' end-to-end
since it requires only one {\em long-term average} quantity (namely,
long-term average packet loss probability over the wireless segment)
that may be best obtained with help from the network (e.g.\ wireless
access agent).
Specifically, we use Maximum Likelihood Ratio tests to evaluate TCP as a
classifier of the type of packet loss. We study the effectiveness of {\em
short-term} classification of packet errors (congestion vs. wireless),
given stationary prior error probabilities and distributions of packet
delays conditioned on the type of packet loss (measured over a larger time
scale). Using our Bayesian-based approach and extensive simulations, we
demonstrate that congestion-induced losses and losses due to wireless
transmission errors produce sufficiently different statistics upon which an
efficient online error classifier can be built. We introduce a simple
queueing model to underline the conditional delay distributions arising
from different kinds of packet losses over a heterogeneous wired/wireless
path. We show how Hidden Markov Models (HMMs) can be used by a TCP
connection to infer efficiently conditional delay distributions. We
demonstrate how estimation accuracy is influenced by different proportions
of congestion versus wireless losses and penalties on incorrect
classification.
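Illustration (not from the report): a likelihood-ratio classifier of the kind described above can be sketched for a single observed delay. The sketch assumes Gaussian conditional delay distributions with known parameters, a known prior probability that a loss is wireless, and explicit misclassification penalties; the report derives its distributions from a queueing model and HMMs, which are not reproduced here. All names and numbers are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as an assumed delay model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def classify_loss(delay, p_wireless, congestion_model, wireless_model,
                  cost_missed_congestion=1.0, cost_missed_wireless=1.0):
    """Minimum-risk decision between "congestion" and "wireless".

    congestion_model, wireless_model: (mean, std) of delay given the loss type.
    cost_missed_congestion: penalty for calling a congestion loss wireless.
    cost_missed_wireless: penalty for calling a wireless loss congestion.
    """
    risk_if_wireless = (cost_missed_congestion * (1.0 - p_wireless)
                        * gaussian_pdf(delay, *congestion_model))
    risk_if_congestion = (cost_missed_wireless * p_wireless
                          * gaussian_pdf(delay, *wireless_model))
    # Choose the label whose expected penalty is lower.
    if risk_if_wireless > risk_if_congestion:
        return "congestion"
    return "wireless"
```

With equal priors and penalties, the decision reduces to comparing the two likelihoods; raising the penalty for a missed congestion loss shifts borderline delays toward the "congestion" label.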
::::::::::::::
2003-031
::::::::::::::
Title: Automated Placement of Cameras in a Floorplan to Satisfy Task-Specific
Constraints
Authors: Ugur Murat Erdem and Stan Sclaroff
Abstract:
In many multi-camera vision systems the effect of camera locations on the
task-specific quality of service is ignored. Researchers in Computational
Geometry have proposed elegant solutions for some sensor location problem
classes. Unfortunately, these solutions utilize unrealistic assumptions
about the cameras' capabilities that make these algorithms unsuitable for
many real-world computer vision applications: unlimited field of view,
infinite depth of field, and/or infinite servo precision and speed. In
this paper, the general camera placement problem is first defined with
assumptions that are more consistent with the capabilities of real-world
cameras. The region to be observed by cameras may be volumetric, static or
dynamic, and may include holes that are caused, for instance, by columns or
furniture in a room that can occlude potential camera views. A subclass of
this general problem can be formulated in terms of planar regions that are
typical of building floorplans. Given a floorplan to be observed, the
problem is then to efficiently compute a camera layout such that certain
task-specific constraints are met. A solution to this problem is obtained
via binary optimization over a discrete problem space. In preliminary
experiments the performance of the resulting system is demonstrated with
different real floorplans.
::::::::::::::
2003-032
::::::::::::::
Title: itmBench: Generalized API for Internet Traffic Managers
Authors: Gali Diamant, Leonid Veytser, Ibrahim Matta, Azer Bestavros, Mina Guirguis, Liang Guo, Yuting Zhang, Sean Chen
Date: December 16, 2003
Abstract:
Internet Traffic Managers (ITMs) are special machines placed at
strategic places in the Internet. itmBench is an interface that allows
users (e.g. network managers, service providers, or experimental
researchers) to register different traffic control functionalities to
run on one ITM or an overlay of ITMs. Thus {\em itmBench} offers a
tool that is extensible and powerful yet easy to maintain. ITM
traffic control applications could be developed either using a kernel
API so they run in kernel space, or using a user-space API so they run
in user space. We demonstrate the flexibility of {\em itmBench} by
showing the implementation of both a kernel module that provides a
differentiated network service, and a user-space module that provides
an overlay routing service. Our itmBench Linux-based prototype is free
software and can be obtained from http://www.cs.bu.edu/groups/itm/.
::::::::::::::
2004-001
::::::::::::::
Title: Integrated Chest Image Analysis System ``BU-MIA''
Authors: Margrit Betke, Boston University
Jingbin Wang, Boston University
Jane P. Ko, New York University
Date: January 7, 2004
Abstract:
We introduce ``BU-MIA,'' a Medical Image Analysis system that
integrates various advanced chest image analysis methods for
detection, estimation, segmentation, and registration. BU-MIA
evaluates repeated computed tomography (CT) scans of the same patient
to facilitate identification and evaluation of pulmonary nodules for
interval growth. It provides a user-friendly graphical user interface
with a number of interaction tools for development, evaluation, and
validation of chest image analysis methods. The structures that BU-MIA
processes include the thorax, lungs, and trachea, pulmonary
structures, such as lobes, fissures, nodules, and vessels, and bones,
such as sternum, vertebrae, and ribs.
::::::::::::::
2004-002
::::::::::::::
Title: Quantum Lower Bounds for Fanout
Authors: M. Fang, S. Fenner, F. Green, S. Homer, Y. Zhang
Date: January 12, 2004
Abstract:
We prove several new lower bounds for constant
depth quantum circuits. The main result is that parity (and hence
fanout) requires log depth circuits, when the circuits are composed
of single qubit and arbitrary size Toffoli gates, and when they
use only constantly many ancillae. Under this constraint, this
bound is close to optimal. In the case of a non-constant number
of ancillae, we give a tradeoff between the number of ancillae
and the required depth.
::::::::::::::
2004-003
::::::::::::::
Title: Bounds on the Power of Constant-Depth Quantum Circuits
Authors: S. Fenner, F. Green, S. Homer, Y. Zhang
Date: January 12, 2004
Abstract:
We show that if a language is recognized within certain error bounds
by constant-depth quantum circuits over a finite family of gates, then
it is computable in (classical) polynomial time. In particular, our
results imply EQNC^0 is contained in P, where EQNC^0 is the
constant-depth analog of the class EQP. On the other hand, we adapt
and extend ideas of Terhal and DiVincenzo (quant-ph/0205133) to show
that, for any family F of quantum gates including Hadamard and CNOT
gates, computing the acceptance probabilities of depth-five circuits
over F is just as hard as computing these probabilities for circuits
over F. In particular, this implies that NQNC^0 is hard for the
polynomial time hierarchy, where NQNC^0 is the constant-depth analog
of the class NQP. This essentially refutes a conjecture of Green et
al. that NQACC is contained in TC^0 (quant-ph/0106017).
::::::::::::::
2004-004
::::::::::::::
Title: Programming Examples Needing Polymorphic Recursion
Authors: J. J. Hallett and A. J. Kfoury
Date: January 22, 2004
Abstract:
Inferring types for polymorphic recursive function definitions
(abbreviated to polymorphic recursion) is a recurring topic on the
mailing lists of popular typed programming languages. This is despite the
fact that type inference for polymorphic recursion using forall-types
has been proved undecidable. This report presents several programming
examples involving polymorphic recursion and determines their typability
under various type systems, including the Hindley-Milner system, an
intersection-type system, and extensions of these two. The goal of this
report is to show that many of these examples are typable using a system
of intersection types as an alternative form of polymorphism. By
accomplishing this, we hope to lay the foundation for future research into
a decidable intersection-type inference algorithm.
We do not provide a comprehensive survey of type systems appropriate
for polymorphic recursion, with or without type annotations inserted in
the source language. Rather, we focus on examples for which types may be
inferred without type annotations.
::::::::::::::
2004-005
::::::::::::::
Title: Exploiting the Transients of Adaptation for RoQ Attacks on Internet Resources
Authors: Mina Guirguis, Azer Bestavros, and Ibrahim Matta
Date: January 30, 2004
Abstract:
In this paper, we expose an unorthodox adversarial attack that
exploits the transients of a system's adaptive behavior, as opposed to
its limited steady-state capacity. We show that a well-orchestrated
attack could introduce significant inefficiencies that could
potentially deprive a network element of much of its capacity, or
significantly reduce its service quality, while evading detection by
consuming an unsuspicious, small fraction of that element's hijacked
capacity. This type of attack stands in sharp contrast to traditional
brute-force, sustained high-rate DoS attacks, as well as recently
proposed attacks that exploit specific protocol settings such as TCP
timeouts. We exemplify what we term Reduction of Quality (RoQ)
attacks by exposing the vulnerabilities of common adaptation
mechanisms. We develop control-theoretic models and associated metrics
to quantify these vulnerabilities. We present numerical and simulation
results, which we validate with observations from real Internet
experiments. Our findings motivate the need for the development of
adaptation mechanisms that are resilient to these new forms of
attacks.
::::::::::::::
2004-006
::::::::::::::
Title: Boosting Nearest Neighbor Classifiers for Multiclass Recognition
Authors: Vassilis Athitsos and Stan Sclaroff
Date: February 13, 2004
Abstract:
This paper introduces an algorithm that uses boosting to learn a
distance measure for multiclass k-nearest neighbor
classification. Given a family of distance measures as input, AdaBoost
is used to learn a weighted distance measure that is a linear
combination of the input measures. The proposed method can be seen
both as a novel way to learn a distance measure from data, and as a
novel way to apply boosting to multiclass recognition problems that
does not require output codes. In our approach, multiclass recognition
of objects is reduced to a single binary recognition task, defined
on triples of objects. Preliminary experiments with eight UCI datasets
yield no clear winner among our method, boosting using output codes,
and k-nn classification using an unoptimized distance measure. Our
algorithm did achieve lower error rates in some of the datasets, which
indicates that, in some domains, it may lead to better results than
existing methods.
::::::::::::::
2004-007
::::::::::::::
Title: StaXML: Static Typing of XML Document Fragments for Imperative Web Scripting Languages
Authors: Adam Bradley, Assaf Kfoury, and Azer Bestavros
Date: February 13, 2004
Abstract:
We present a type system, StaXML, which employs the stacked type syntax
to represent essential aspects of the potential roles of XML fragments
in the structure of complete XML documents. The simplest application of
this system is to enforce well-formedness upon the construction of XML
documents without requiring the use of templates or balanced "gap
plugging" operators; this allows it to be applied to programs written
according to common imperative web scripting idioms, particularly the
echoing of unbalanced XML fragments to an output buffer. The system can
be extended to verify particular XML applications such as XHTML and
to identify individual XML tags constructed from their lexical
components. We also present StaXML for PHP, a prototype precompiler for
the PHP4 scripting language which infers StaXML types for expressions
without assistance from the programmer.
::::::::::::::
2004-008
::::::::::::::
Title: Diagnosing Network-Wide Traffic Anomalies
Authors: Anukool Lakhina, Mark Crovella and Christophe Diot
Date: February 24, 2004
Abstract:
Anomalies are unusual and significant changes in a network's traffic
levels, which can often involve multiple links. Diagnosing anomalies
is critical for both network operators and end users. It is a
difficult problem because one must extract and interpret anomalous
patterns from large amounts of high-dimensional, noisy data. In this
paper we propose a general method to diagnose anomalies. This method
is based on a separation of the high-dimensional space occupied by a
set of network traffic measurements into disjoint subspaces
corresponding to normal and anomalous network conditions. We show
that this separation can be performed effectively using Principal
Component Analysis. Using only simple traffic measurements from
links, we study volume anomalies and show that the method can: (1)
accurately detect when a volume anomaly is occurring; (2) correctly
identify the underlying origin-destination (OD) flow which is the
source of the anomaly; and (3) accurately estimate the amount of
traffic involved in the anomalous OD flow. We evaluate the method's
ability to diagnose (i.e., detect, identify, and quantify) both
existing and synthetically injected volume anomalies in real traffic
from two backbone networks. Our method consistently diagnoses the
largest volume anomalies, and does so with a very low false alarm
rate.
::::::::::::::
2004-009
::::::::::::::
Title: Efficient End-Host Architecture for High Performance Communication
Using User-level Sandboxing
Authors: Xin Qi, Gabriel Parmer, Richard West, Jason Gloudon, Luis Hernandez
Date: March 1, 2004
Abstract:
Current low-level networking abstractions on modern operating systems are
commonly implemented in the kernel to provide sufficient performance for
general purpose applications. However, it is desirable for high performance
applications to have more control over the networking subsystem to support
optimizations for their specific needs. One approach is to allow networking
services to be implemented at user-level. Unfortunately, this typically
incurs costs due to scheduling overheads and unnecessary data copying via
the kernel. In this paper, we describe a method to implement efficient
application-specific network service extensions at user-level, that removes
the cost of scheduling and provides protected access to lower-level system
abstractions. We present a networking implementation that, with minor
modifications to the Linux kernel, passes data between "sandboxed"
extensions and the Ethernet device without copying or processing in the
kernel. Using this mechanism, we put a customizable networking stack into a
user-level sandbox and show how it can be used to efficiently process and
forward data via proxies, or intermediate hosts, in the communication path
of high performance data streams. Unlike other user-level networking
implementations, our method makes no special hardware requirements to avoid
unnecessary data copies. Results show that we achieve a substantial
increase in throughput over comparable user-space methods using our
networking stack implementation.
::::::::::::::
2004-010
::::::::::::::
Title: A Randomized Solution to BGP Divergence
Author: Selma Yilmaz and Ibrahim Matta
Date: March 1, 2004
Abstract:
The Border Gateway Protocol (BGP) is an interdomain routing protocol
that allows each Autonomous System (AS) to define its own routing
policies independently and use them to select the best routes. By
means of policies, ASes are able to prevent some traffic from
accessing their resources, or direct their traffic to a preferred
route. However, this flexibility comes at the expense of possible
divergent behavior caused by mutually conflicting policies.
Since BGP is not guaranteed to converge even in the absence of network
topology changes, it is not *safe*. In this paper, we propose a
randomized approach to providing safety in BGP. The proposed
algorithm dynamically detects policy conflicts, and tries to eliminate
the conflict by changing the local preference of the paths involved.
Both the detection and elimination of policy conflicts are performed
locally, i.e., by using only local information. Randomization
is introduced to prevent synchronous updates of the local preferences
of the paths involved in the same conflict.
::::::::::::::
2004-011
::::::::::::::
Title: A Two-step Statistical Approach for Inferring Network Traffic
Demands (Revises Technical Report BUCS-2003-003)
Author: Alberto Medina, Kave Salamatian, Nina Taft, Ibrahim Matta, and Christophe Diot
Date: March 1, 2004
Abstract:
Accurate knowledge of traffic demands in a communication network
enables or enhances a variety of traffic engineering and network
management tasks of paramount importance for operational
networks. Directly measuring a complete set of these demands is
prohibitively expensive because of the huge amounts of data that must be
collected and the performance impact that such measurements would
impose on the regular behavior of the network. As a consequence, we
must rely on statistical techniques to produce estimates of actual
traffic demands from partial information. The performance of such
techniques is, however, limited by their reliance on limited
information and by the large amount of computation they incur, which
limits their convergence behavior. In this paper we study a two-step
approach for inferring network traffic demands. First we elaborate
and evaluate a modeling approach for generating good starting points
to be fed to iterative statistical inference techniques. We call these
starting points *informed priors* since they are obtained using
actual network information such as packet traces and SNMP link
counts. Second we provide a very fast variant of the EM algorithm
which extends its computation range, increasing its accuracy and
decreasing its dependence on the quality of the starting point.
Finally, we evaluate and compare alternative mechanisms for generating
starting points and the convergence characteristics of our EM
algorithm against a recently proposed Weighted Least Squares approach.
::::::::::::::
2004-012
::::::::::::::
Title: Simultaneous Localization and Recognition of Dynamic Hand Gestures
Authors: Jonathan Alon, Vassilis Athitsos, Quan Yuan, and Stan Sclaroff
Date: March 8, 2004
Abstract:
A framework for the simultaneous localization and recognition of
dynamic hand gestures is proposed. At the core of this framework is a
dynamic space-time warping (DSTW) algorithm that aligns a pair of
query and model gestures in both space and time. For every frame of
the query sequence, feature detectors generate multiple hand region
candidates. Dynamic programming is then used to compute both a global
matching cost, which is used to recognize the query gesture, and a
warping path, which aligns the query and model sequences in time, and
also finds the best hand candidate region in every query frame. The
proposed framework includes translation invariant recognition of
gestures, a desirable property for many HCI systems. The performance
of the approach is evaluated on a dataset of hand signed digits
gestured by people wearing short sleeve shirts, in front of a
background containing other non-hand skin-colored objects. The
algorithm simultaneously localizes the gesturing hand and recognizes
the hand-signed digit. Although DSTW is illustrated in a gesture
recognition setting, the proposed algorithm is a general method for
matching time series that allows for multiple candidate feature
vectors to be extracted at each time step.
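
The dynamic programming step described above can be sketched in Python
as follows. This is a minimal illustration, not the report's
implementation: the function name `dstw_cost`, the Euclidean local
cost, and the exact set of allowed predecessor cells are assumptions.
In particular, the sketch assumes that when the path stays on the same
query frame it must keep the same hand candidate, and that every query
frame has at least one candidate.

```python
import math

def dstw_cost(model, query_candidates):
    """Global matching cost between a model gesture and a query gesture.

    model: list of feature vectors, one per model frame.
    query_candidates: one list per query frame, each holding the feature
        vectors of that frame's candidate hand regions (non-empty).
    cost[i][j][k] is the cheapest warping path that ends by matching
    model frame i with candidate k of query frame j.
    """
    m, n = len(model), len(query_candidates)
    inf = math.inf
    cost = [[[inf] * len(query_candidates[j]) for j in range(n)]
            for i in range(m)]
    for i in range(m):
        for j in range(n):
            for k, cand in enumerate(query_candidates[j]):
                local = math.dist(model[i], cand)  # assumed local cost
                if i == 0 and j == 0:
                    best_prev = 0.0
                else:
                    best_prev = inf
                    if i > 0:
                        # Same query frame: keep the same candidate k.
                        best_prev = min(best_prev, cost[i - 1][j][k])
                    if j > 0:
                        # New query frame: any earlier candidate is allowed.
                        best_prev = min(best_prev, min(cost[i][j - 1]))
                    if i > 0 and j > 0:
                        best_prev = min(best_prev, min(cost[i - 1][j - 1]))
                cost[i][j][k] = local + best_prev
    return min(cost[m - 1][n - 1])
```

Storing a back-pointer alongside each cell would recover the warping
path and, with it, the selected hand candidate in every query frame.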
::::::::::::::
2004-013
::::::::::::::
Title: A Virtual Deadline Scheduler for Window-Constrained Service Guarantees
Authors: Yuting Zhang, Richard West and Xin Qi
Date: March 23, 2004
Abstract:
This paper presents a new approach to window-constrained scheduling,
suitable for multimedia and weakly-hard real-time systems. We
originally developed an algorithm, called Dynamic Window-Constrained
Scheduling (DWCS), that attempts to guarantee no more than x out of y
deadlines are missed for real-time jobs such as periodic CPU tasks, or
delay-constrained packet streams. While DWCS is capable of generating
a feasible window-constrained schedule that utilizes 100% of
resources, it requires all jobs to have the same request periods (or
intervals between successive service requests). We describe a new
algorithm, called Virtual Deadline Scheduling (VDS), that provides
window-constrained service guarantees to jobs with potentially
different request periods, while still maximizing resource
utilization.
VDS attempts to service m out of k job instances by their virtual
deadlines, which may be some finite time after the corresponding
real-time deadlines. Notwithstanding, VDS is capable of outperforming
DWCS and similar algorithms, when servicing jobs with potentially
different request periods. Additionally, VDS is able to limit the
extent to which a fraction of all job instances are serviced
late. Results from simulations show that VDS can provide better
window-constrained service guarantees than other related algorithms,
while still having as good or better delay bounds for all scheduled
jobs. Finally, an implementation of VDS in the Linux kernel compares
favorably against DWCS for a range of scheduling loads.
::::::::::::::
2004-014
::::::::::::::
Title: Learning Euclidean Embeddings for Indexing and Classification
Authors: Vassilis Athitsos, Joni Alon, Stan Sclaroff, George Kollios
Date: April 7, 2004
Abstract:
BoostMap is a recently proposed method for efficient approximate
nearest neighbor retrieval in arbitrary non-Euclidean
spaces with computationally expensive and possibly
non-metric distance measures. Database and query objects
are embedded into a Euclidean space, in which similarities
can be rapidly measured using a weighted Manhattan distance.
The key idea is formulating embedding construction
as a machine learning task, where AdaBoost is used
to combine simple, 1D embeddings into a multidimensional
embedding that preserves a large amount of the proximity
structure of the original space. This paper demonstrates
that, using the machine learning formulation of BoostMap,
we can optimize embeddings for indexing and classification,
in ways that are not possible with existing alternatives for
constructive embeddings, and without additional costs in retrieval
time. First, we show how to construct embeddings
that are query-sensitive, in the sense that they yield a different
distance measure for different queries, so as to improve nearest
neighbor retrieval accuracy for each query. Second, we
show how to optimize embeddings for nearest neighbor classification
tasks, by tuning them to approximate a parameter
space distance measure, instead of the original feature-based distance
measure.
::::::::::::::
2004-015
::::::::::::::
Title: Automated Camera Layout to Satisfy Task-Specific and Floorplan-Specific Coverage Requirements
Authors: Ugur Murat Erdem, Stan Sclaroff
Date: April 15, 2004
Abstract:
In many multi-camera vision systems the effect of camera locations on
the task-specific quality of service is ignored. Researchers in
Computational Geometry have proposed elegant solutions for some sensor
location problem classes. Unfortunately, these solutions utilize
unrealistic assumptions about the cameras' capabilities that make
these algorithms unsuitable for many real-world computer vision
applications: unlimited field of view, infinite depth of field, and/or
infinite servo precision and speed. In this paper, the general camera
placement problem is first defined with assumptions that are more
consistent with the capabilities of real-world cameras. The region to
be observed by cameras may be volumetric, static or dynamic, and may
include holes that are caused, for instance, by columns or furniture
in a room that can occlude potential camera views. A subclass of this
general problem can be formulated in terms of planar regions that are
typical of building floorplans. Given a floorplan to be observed, the
problem is then to efficiently compute a camera layout such that
certain task-specific constraints are met. A solution to this problem
is obtained via binary optimization over a discrete problem space. In
experiments the performance of the resulting system is demonstrated
with different real floorplans.
::::::::::::::
2004-016
::::::::::::::
Title: Robust Tracking of Human Motion
Author: Dan Buzan, Boston University
Date: 04/23/2004
Abstract:
This technical report presents a combined solution for two problems:
first, tracking objects in 3D space and estimating their trajectories; and
second, computing the similarity between previously estimated trajectories
and clustering them using the computed similarities. For the
first problem, trajectories are estimated using an EKF formulation that
provides the 3D trajectory up to a constant. To improve accuracy when
occlusions appear, multiple hypotheses are followed. For the second
problem, we compute the distances between trajectories using a similarity
measure based on an LCSS formulation. Similarities are computed between projections
of trajectories on coordinate axes. Finally we group trajectories together
based on previously computed distances, using a clustering algorithm. To
check the validity of our approach, several experiments using real data
were performed.
::::::::::::::
2004-017
::::::::::::::
Title: Extraction and Clustering of Motion Trajectories in Video
Author: Dan Buzan, Boston University
Stan Sclaroff, Boston University
George Kollios, Boston University
Date: 04/23/2004
Abstract:
A system is described that tracks moving objects in a video dataset so as
to extract a representation of the objects' 3D trajectories. The system
then finds hierarchical clusters of similar trajectories in the video
dataset. Objects' motion trajectories are extracted via an EKF formulation
that provides each object's 3D trajectory up to a constant factor. To
increase accuracy when occlusions occur, multiple tracking hypotheses are
followed. For trajectory-based clustering and retrieval, a modified
version of edit distance, called longest common subsequence (LCSS), is
employed. Similarities are computed between projections of trajectories on
coordinate axes. Trajectories are grouped based on these similarities, using an agglomerative
clustering algorithm. To check the validity of the approach, experiments
using real data were performed.
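
As an illustration of the LCSS-based similarity mentioned above, the
following Python sketch compares two 1D sequences, such as the
projections of two trajectories on one coordinate axis. The matching
threshold `eps`, the normalization by the shorter sequence length, and
the function name are assumptions made for this example; the abstract
does not specify them.

```python
def lcss_similarity(a, b, eps=0.5):
    """LCSS similarity of two 1D sequences, in the range [0, 1].

    Two samples are considered a match when they differ by at most eps
    (an assumed threshold). The result is the length of the longest
    common subsequence divided by the length of the shorter sequence.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    # table[i][j] holds the LCSS length of a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m] / min(n, m)
```

One plausible way to obtain a distance for clustering is to take
1 minus this similarity, averaged over the coordinate axes.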
::::::::::::::
2004-018
::::::::::::::
Title: Group Key Manager on a Smart Card
Authors: Hani Hamandi (Geotrust) and Gene Itkis (BU)
Date: 4/27/04
Abstract:
Group communication is an important functionality, which needs to be
supported by various communication technologies. Applications of group
communication include IP (or application-level) multicast, wireless
and/or ad-hoc networks, broadcast, conference calling, pay-per-view, and
even areas seemingly unrelated to networks, such as copy protection. For
many, if not all, of these applications, security and trust play an
important role. Securing group communication typically requires
confidentiality and authentication, which typically rely on secret keys.
Thus key management issues must be addressed.
This paper describes an implementation of one approach to dynamic group
key management, which is based on the Logical Key Hierarchy or Subset-Cover
approach [1,2].
Our approach achieves a dramatic reduction of the storage requirements
for the Group Key Manager, and in particular allows all the secret key
data to be stored on a smart-card. It also allows a number of subsequent
improvements.
::::::::::::::
2004-019
::::::::::::::
Title: Interactive Password Schemes
Authors: Gene Itkis (BU) and Arwa Maiss (BU)
Date: 4/27/04
Abstract:
Usual password schemes suffer from the flaw that they are easy to steal.
An attacker who has correctly observed a login session (by peeping,
wiretapping and/or by launching a "man-in-the-middle" attack, etc.) can
easily impersonate the corresponding user.
Available protection techniques require computations on integers with
hundreds of digits, which are so complex that they require special software and/or
hardware.
This project tries to combine the simplicity of the conventional
password schemes with a protection technique that results in a different
password being typed each session, but only requires simple computation
performed in the user's head.
::::::::::::::
2004-020
::::::::::::::
Authors: Anukool Lakhina, Mark Crovella and Christophe Diot
Title: Characterization of Network-Wide Anomalies in Traffic Flows
Date: May 14, 2004
Abstract:
Detecting and understanding anomalies in IP networks is an open and
ill-defined problem. Toward this end, we have recently proposed the
subspace method for anomaly diagnosis. In this paper we present the
first large-scale exploration of the power of the subspace method when
applied to flow traffic. An important aspect of this approach
is that it fuses information from flow measurements taken
throughout a network. We apply the subspace method to three different
types of sampled flow traffic in a large academic network: multivariate
timeseries of byte counts, packet counts, and IP-flow counts. We show
that each traffic type brings into focus a different set of
anomalies via the subspace method. We illustrate and classify the set
of anomalies detected. We find that almost all of the anomalies
detected represent events of interest to network operators.
Furthermore, the anomalies span a remarkably wide spectrum of event
types, including denial of service attacks (single-source and
distributed), flash crowds, port scanning, downstream traffic
engineering, high-rate flows, worm propagation, and network outage.
::::::::::::::
2004-021
::::::::::::::
Title: Safe Compositional Specification of Networking Systems
Authors: Azer Bestavros, Adam D. Bradley, Assaf J. Kfoury, and Ibrahim Matta
Date: May 14, 2004
Abstract:
The Science of Network Service Composition has clearly emerged
as one of the grand themes driving many of our research questions in
the networking field today [NeXtworking 2003]. This driving
force stems from the rise of sophisticated applications and new
networking paradigms. By "service composition" we mean that the
performance and correctness properties local to the various
constituent components of a service can be readily composed into
global (end-to-end) properties without re-analyzing any of the
constituent components in isolation, or as part of the whole
composite service. The set of laws that would govern such
composition is what will constitute that new science of composition.
The combined heterogeneity and dynamic open nature of network systems
make composition quite challenging, and thus programming network
services has been largely inaccessible to the average user. We
identify (and outline) a research agenda in which we aim to develop
a specification language that is expressive enough to describe
different components of a network service, and that will include
type hierarchies inspired by type systems in general
programming languages that enable the safe composition of software
components. We envision this new science of composition to be built
upon several theories (e.g., control theory, game theory,
network calculus, percolation theory, economics, queuing theory). In
essence, different theories may provide different languages by
which certain properties of system components can be expressed and
composed into larger systems. We then seek to lift these
lower-level specifications to a higher level by abstracting away
details that are irrelevant for safe composition at the higher
level, thus making theories scalable and useful to the average user.
In this paper we focus on services built upon an overlay management
architecture, and we use control theory and QoS theory as example
theories from which we lift up compositional specifications.
::::::::::::::
2004-022
::::::::::::::
Title: SEP: A Stable Election Protocol for clustered heterogeneous
wireless sensor networks
Authors: Georgios Smaragdakis, Ibrahim Matta, and Azer Bestavros
Date: May 31, 2004
Abstract:
We study the impact of heterogeneity of nodes, in terms of their
energy, in wireless sensor networks that are hierarchically
clustered. In these networks some of the nodes become cluster heads,
aggregate the data of their cluster members and transmit it to the
sink. We assume that a percentage of the population of sensor nodes is
equipped with additional energy resources---this is a source of
heterogeneity which may result from the initial setting or as the
operation of the network evolves. We also assume that the sensors are
randomly (uniformly) distributed and are not mobile, and that the coordinates
of the sink and the dimensions of the sensor field are known. We show
that the behavior of such sensor networks becomes very unstable once
the first node dies, especially in the presence of node heterogeneity.
Classical clustering protocols assume that all the nodes are equipped
with the same amount of energy and as a result, they cannot take full
advantage of the presence of node heterogeneity. We propose SEP, a
heterogeneous-aware protocol to prolong the time interval before the
death of the first node (which we refer to as the *stability period*), which
is crucial for many applications where the feedback from the sensor
network must be reliable. SEP is based on weighted election
probabilities of each node to become cluster head according to the
remaining energy in each node. We show by simulation that SEP always
prolongs the stability period compared to (and that the average
throughput is greater than) the one obtained using current clustering
protocols. We conclude by studying the sensitivity of our SEP
protocol to heterogeneity parameters capturing energy imbalance in the
network. We found that SEP yields a longer stability region for higher
values of extra energy brought by more powerful nodes.
::::::::::::::
2004-023
::::::::::::::
Title: DIP: Density Inference Protocol for wireless sensor networks and its application to density-unbiased statistics
Authors: Niky Riga, Ibrahim Matta, and Azer Bestavros
Date: May 31, 2004
Abstract:
Wireless sensor networks have recently emerged as enablers of
important applications such as environmental, chemical and nuclear
sensing systems. Such applications have sophisticated
spatial-temporal semantics that set them apart from traditional
wireless networks. For example, the computation of temperature
averaged over the sensor field must take into account local densities.
This is crucial since otherwise the estimated average temperature can
be biased by over-sampling areas where a lot more sensors exist.
Thus, we envision that a fundamental service that a wireless sensor
network should provide is that of estimating local densities.
In this paper, we propose a lightweight probabilistic density
inference protocol, which we call DIP, that allows each sensor node to
*implicitly* estimate its neighborhood size without the explicit
exchange of node identifiers as in existing density discovery schemes.
The theoretical basis of DIP is a probabilistic analysis which gives
the relationship between the number of sensor nodes contending in the
neighborhood of a node and the level of contention measured by that
node. Extensive simulations confirm the premise of DIP: it can
provide statistically reliable and accurate estimates of local density
at a very low energy cost and constant running time. We demonstrate
how applications could be built on top of our DIP-based service by
computing density-unbiased statistics from estimated local densities.
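
To make the density bias concrete, the Python sketch below computes an
average in which each reading is weighted by the inverse of the local
density estimated at its node. Inverse-density weighting is one simple
choice assumed here for illustration; the abstract does not state which
estimator the report uses, and the function name is hypothetical.

```python
def density_unbiased_mean(readings, densities):
    """Average of sensor readings, weighted by inverse local density.

    readings: one value per sensor node (e.g. temperature).
    densities: estimated local density at each node (must be positive),
        for example a neighborhood-size estimate.
    Nodes in crowded areas get proportionally smaller weights, so
    over-sampled regions do not dominate the average.
    """
    if len(readings) != len(densities):
        raise ValueError("readings and densities must have equal length")
    if not readings:
        raise ValueError("at least one reading is required")
    if any(d <= 0 for d in densities):
        raise ValueError("densities must be positive")
    weights = [1.0 / d for d in densities]
    weighted_sum = sum(w * r for w, r in zip(weights, readings))
    return weighted_sum / sum(weights)
```

For example, three co-located sensors reading 10 and one isolated
sensor reading 20 give a plain mean of 12.5, whereas the weighted
estimate treats the two areas equally and gives 15.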
::::::::::::::
2004-024
::::::::::::::
Title: On the Interaction between Data Aggregation and Topology Control in Wireless Sensor Networks
Authors: Vijay Erramilli, Ibrahim Matta, and Azer Bestavros
Date: June 18, 2004
Abstract:
Wireless sensor networks are characterized by limited energy resources.
To conserve energy, application-specific aggregation (fusion) of data
reports from multiple sensors can be beneficial in reducing the amount
of data flowing over the network. Furthermore, controlling the topology
by scheduling the activity of nodes between active and sleep modes has
often been used to uniformly distribute the energy consumption among all
nodes by de-synchronizing their activities. We present an integrated
analytical model to study the joint performance of in-network
aggregation and topology control. We define performance metrics that
capture the tradeoffs among delay, energy, and fidelity of the
aggregation. Our results indicate that to achieve high fidelity levels
under medium to high event reporting load, shorter and fatter
aggregation/routing trees (toward the sink) offer the best delay-energy
tradeoff as long as topology control is well coordinated with routing.
::::::::::::::
2004-025
::::::::::::::
Title: Bayesian Packet Loss Detection for TCP
Author: Nahur Fonseca and Mark Crovella
Date: July 6, 2004
Abstract:
One of TCP's critical tasks is to determine which packets are lost in
the network, as a basis for control actions (flow control and packet
retransmission). Modern TCP implementations use two mechanisms:
timeout, and fast retransmit. Detection via timeout is necessarily a
time-consuming operation; fast retransmit, while much quicker, is only
effective for a small fraction of packet losses. In this paper we
consider the problem of packet loss detection in TCP more generally. We
concentrate on the fact that TCP's control actions are necessarily
triggered by *inference* of packet loss, rather than conclusive
knowledge. This suggests that one might analyze TCP's packet loss
detection in a standard inferencing framework based on probability of
detection and probability of false alarm. This paper makes two
contributions to that end: First, we study an example of more general
packet loss inference, namely optimal Bayesian packet loss detection
based on round trip time. We show that for long-lived flows, it is
frequently possible to achieve high detection probability and low false
alarm probability based on measured round trip time. Second, we
construct an analytic performance model that incorporates general packet
loss inference into TCP. We show that for realistic detection and
false alarm probabilities (as are achievable via our Bayesian detector)
and for moderate packet loss rates, the use of more general packet loss
inference in TCP can improve throughput by as much as 25%.
::::::::::::::
2004-026
::::::::::::::
Title: dPAM: A Distributed Prefetching Protocol for Scalable Asynchronous Multicast in P2P Systems
Authors: Abhishek Sharma, Azer Bestavros, and Ibrahim Matta
Date: July 7, 2004
Abstract:
We leverage the buffering capabilities of end-systems to achieve
scalable, asynchronous delivery of streams in a peer-to-peer
environment. Unlike existing cache-and-relay schemes, we propose a
distributed *prefetching* protocol where peers prefetch and store
portions of the streaming media ahead of their playout time, thus not
only turning themselves into possible sources for other peers but also
using their prefetched data to overcome the departure of their
source-peer. This stands in sharp contrast to existing
cache-and-relay schemes where the departure of the source-peer forces
its peer children to go to the original server, thus disrupting their
service and increasing server and network load. Through mathematical
analysis and simulations, we show the effectiveness of maintaining
such asynchronous multicasts from several source-peers to other
children peers, and the efficacy of prefetching in the face of peer
departures. We confirm the scalability of our dPAM protocol as it is
shown to significantly reduce server load.
::::::::::::::
2004-027
::::::::::::::
Title: On trip planning queries in spatial databases
Author: Feifei Li, Dihan Cheng
Date: July 1, 2004
Abstract:
In this paper we discuss a new type of query in Spatial
Databases, called Trip Planning Query (TPQ). Given a set
of points P in space, where each point belongs to a category,
and given two points s and e, TPQ asks for the best trip that
starts at s, passes through exactly one point from each category,
and ends at e. An example of a TPQ is when a user
wants to visit a set of different places and at the same time
minimize the total travelling cost, e.g. what is the shortest
travelling plan for me to visit an automobile shop, a CVS
pharmacy outlet, and a Best Buy shop along my trip from A to
B? The trip planning query is an extension of the well-known
TSP problem and therefore is NP-hard. The difficulty of this
query lies in the existence of multiple choices for each category.
In this paper, we first study fast approximation algorithms
for the trip planning query in a metric space, assuming
that the data set fits in main memory, and give a theoretical
analysis of their approximation bounds. Then, the trip planning
query is examined for data sets that do not fit in main
memory and must be stored on disk. For the disk-resident
data, we consider two cases. In one case, we assume that the
points are located in Euclidean space and indexed with an R-tree.
In the other case, we consider the problem of points that
lie on the edges of a spatial network (e.g. road network) and
the distance between two points is defined using the shortest
distance over the network. Finally, we give an experimental
evaluation of the proposed algorithms using synthetic data
sets generated on real road networks.
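
A trip planning query can be illustrated with a simple greedy
nearest-neighbor heuristic for in-memory data, sketched below in
Python. This is only an example of the kind of fast approximation the
abstract refers to, not necessarily one of the report's algorithms;
the function name, the Euclidean distance, and the input layout are
assumptions.

```python
import math

def greedy_trip(start, end, points):
    """Greedy approximate answer to a trip planning query.

    start, end: coordinate tuples for s and e.
    points: list of (location, category) pairs.
    From the current position, repeatedly move to the nearest point
    whose category has not been visited yet, then finish at `end`.
    Returns (trip, length), where trip is the list of visited locations.
    """
    remaining = {category for _, category in points}
    current = start
    trip = [start]
    length = 0.0
    while remaining:
        location, category = min(
            ((p, c) for p, c in points if c in remaining),
            key=lambda pc: math.dist(current, pc[0]),
        )
        length += math.dist(current, location)
        trip.append(location)
        remaining.discard(category)
        current = location
    length += math.dist(current, end)
    trip.append(end)
    return trip, length
```

The heuristic visits exactly one point per category but gives no
guarantee of finding the shortest trip.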
::::::::::::::
2004-028
::::::::::::::
Title: GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams
Author: Ching Chang, Feifei Li, Azer Bestavros, and George Kollios
Date: July 1, 2004
Abstract:
We investigate adaptive buffer management techniques for approximate
evaluation of sliding window joins over multiple data streams. In many
applications, data stream processing systems have limited memory or
have to deal with very high speed data streams. In both cases,
computing the exact results of joins between these streams may not be
feasible, mainly because the buffers used to compute the joins contain
a much smaller number of tuples than the tuples contained in the sliding
windows. Therefore, a stream buffer management policy is needed in that
case. We show that the buffer replacement policy is an important
determinant of the quality of the produced results. To that end, we
propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering
technique for managing these buffers. GDJ exploits the temporal
correlations (at both long and short time scales), which we found to
be prevalent in many real data streams. We note that our algorithm is
readily applicable to multiple data streams and multiple joins and
requires almost no additional system resources. We report results of
an experimental study using both synthetic and real-world data
sets. Our results demonstrate the superiority and flexibility of our
approach when contrasted to other recently proposed techniques.
::::::::::::::
2004-029
::::::::::::::
Title: M2RC: Multiplicative-increase/additive-decrease Multipath Routing
Control for Wireless Sensor Networks
Authors: Hany Morcos, Ibrahim Matta, and Azer Bestavros
Date: July 14, 2004
Abstract:
Routing protocols in wireless sensor networks (WSN) face
two main challenges: first, the challenging environments in
which WSNs are deployed negatively affect the quality of
the routing process. Therefore, routing protocols for WSNs
should recognize and react to node failures and packet losses.
Second, sensor nodes are battery-powered, which makes
power a scarce resource. Routing protocols should optimize
power consumption to prolong the lifetime of the WSN. In
this paper, we present a new adaptive routing protocol for
WSNs, which we call M2RC. M2RC has two phases: a mesh establishment
phase and a data forwarding phase. In the first
phase, M2RC establishes the routing state to enable multipath
data forwarding. In the second phase, M2RC forwards data
packets from the source to the sink. Targeting hop-by-hop
reliability, an M2RC forwarding node waits for an acknowledgement
(ACK) that its packets were correctly received at
the next neighbor. Based on this feedback, an M2RC node
applies multiplicative-increase/additive-decrease (MIAD) to
control the number of neighbors targeted by its packet broadcast.
We simulated M2RC in the ns-2 simulator and
compared it to GRAB, Max-power, and Min-power routing
schemes. Our simulations show that M2RC achieves
the highest throughput with at least 10-30 percent less consumed
power per delivered report in scenarios where a certain number
of nodes unexpectedly fail.
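As an illustrative sketch of the MIAD rule mentioned above (not the
authors' implementation), the following Python assumes that a missing
ACK multiplies the number of targeted neighbors and a received ACK
reduces it by one. The names `miad_update` and `run`, the parameters
`factor` and `step`, and the bounds are all hypothetical.

```python
def miad_update(targets, ack_received, factor=2, step=1,
                min_targets=1, max_targets=8):
    """Return the next number of neighbors to target (hypothetical MIAD rule)."""
    if ack_received:
        # Additive decrease: delivery succeeded, so try to spend less power.
        nxt = targets - step
    else:
        # Multiplicative increase: react quickly to loss or node failure.
        nxt = targets * factor
    # Clamp to the assumed bounds on the broadcast fan-out.
    return max(min_targets, min(max_targets, nxt))


def run(feedback, start=1):
    """Apply the rule to a sequence of ACK outcomes and return the trajectory."""
    history = [start]
    for ack in feedback:
        history.append(miad_update(history[-1], ack))
    return history
```

Under these assumptions, two consecutive losses raise the fan-out from
1 to 4, and each subsequent ACK lowers it by one.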
::::::::::::::
2004-030
::::::::::::::
Title: Friendly Virtual Machine: Leveraging a Feedback-Control Model
for Application Adaptation
Authors: Yuting Zhang, Azer Bestavros, Mina Guirguis, Ibrahim Matta, and Richard West
Date: July 2004
Abstract:
With the increased use of ``Virtual Machines'' (VMs) as vehicles that
isolate applications running on the same host, it is necessary to
devise techniques that enable multiple VMs to share underlying
resources both fairly and efficiently. To that end, one common
approach is to deploy complex resource management techniques in the
hosting infrastructure. Alternately, in this paper, we
advocate the use of self-adaptation in the VMs themselves based on
feedback about resource usage and availability. Consequently, we
define a ``Friendly'' VM (FVM) to be a virtual machine that adjusts
its demand for system resources, so that they are both efficiently and
fairly allocated to competing FVMs. Such properties are ensured using
one of many provably convergent control rules, such as AIMD. By
adopting this distributed application-based approach to resource
management, it is not necessary to make assumptions about the
underlying resources nor about the requirements of FVMs competing for
these resources. To demonstrate the elegance and simplicity of our
approach, we present a prototype implementation of our FVM framework
in User-Mode Linux (UML)---an implementation that consists of less
than 500 lines of code changes to UML. We present an analytic,
control-theoretic model of FVM adaptation, which establishes
convergence and fairness properties. These properties are also backed
up with experimental results using our prototype FVM implementation.
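The AIMD control rule named above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the FVM prototype:
all consumers receive the same binary overload signal in synchronous
rounds, and the names `aimd_step`, `simulate`, `alpha`, and `beta` are
hypothetical.

```python
def aimd_step(demands, capacity, alpha=1.0, beta=0.5):
    """One synchronous AIMD round for all competing consumers."""
    if sum(demands) > capacity:
        # Overload feedback: every consumer backs off multiplicatively.
        return [d * beta for d in demands]
    # No overload: every consumer probes for more by a fixed increment.
    return [d + alpha for d in demands]


def simulate(demands, capacity, rounds):
    """Run the rule for a number of rounds and return the final demands."""
    for _ in range(rounds):
        demands = aimd_step(demands, capacity)
    return demands
```

The additive step preserves the gap between two consumers while the
multiplicative step halves it, so unequal starting demands drift toward
an equal share in this simplified model.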
::::::::::::::
2004-031
::::::::::::::
Title: Approximately Uniform Random Sampling in Sensor Networks
Authors: Boulat A. Bash, John W. Byers and Jeffrey Considine
Date: July 19, 2004
Abstract:
Recent work in sensor databases has focused extensively on distributed
query problems, notably distributed computation of aggregates. Existing
methods for computing aggregates broadcast queries to all sensors and use
in-network aggregation of responses to minimize messaging costs. In this
work, we focus on uniform random sampling across nodes, which can serve
both as an alternative building block for aggregation and as an integral
component of many other useful randomized algorithms. Prior to our work,
the best existing proposals for uniform random sampling of sensors involve
contacting all nodes in the network. We propose a practical method which
is only approximately uniform, but contacts a number of sensors
proportional to the diameter of the network instead of its size. The
approximation achieved is tunably close to exact uniform sampling, and
only relies on well-known existing primitives, namely geographic routing,
distributed computation of Voronoi regions and von Neumann's rejection
method. Ultimately, our sampling algorithm has the same worst-case
asymptotic cost as routing a point-to-point message, and thus it is
asymptotically optimal among request/reply-based sampling methods. We
provide experimental results demonstrating the effectiveness of our
algorithm on both synthetic and real sensor topologies.
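To illustrate how Voronoi regions and rejection sampling can be
combined, here is a simplified one-dimensional sketch. It is not the
authors' algorithm: it assumes distinct node positions on the unit
interval, stands in for geographic routing with a local lookup, and
uses hypothetical names (`voronoi_cuts`, `sample_node`, `threshold`).

```python
import bisect
import random


def voronoi_cuts(positions):
    """Return sorted node positions and the boundaries of their 1-D cells."""
    pts = sorted(positions)
    cuts = [0.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [1.0]
    return pts, cuts


def sample_node(positions, rng, threshold=None):
    """Pick a node by drawing a random point and rejecting by cell size."""
    pts, cuts = voronoi_cuts(positions)
    sizes = [hi - lo for lo, hi in zip(cuts, cuts[1:])]
    # With the default threshold (smallest cell) the result is exactly uniform;
    # a larger threshold accepts more often but undersamples small cells.
    tau = min(sizes) if threshold is None else threshold
    while True:
        x = rng.random()                        # random location in [0, 1)
        i = bisect.bisect_right(cuts, x) - 1    # node whose cell contains x
        if rng.random() < min(1.0, tau / sizes[i]):
            return pts[i]
```

A node with a large cell is reached more often but accepted less often,
which is what cancels the bias toward sparsely populated regions.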
::::::::::::::
2004-032
::::::::::::::
Title: A Note On the Statistical Difference of Small Direct Products
Author: Leonid Reyzin
Date: 9/21/04
Abstract: We demonstrate that if two probability distributions D and E of
sufficiently small min-entropy have statistical difference \epsilon, then
the direct-product distributions D^l and E^l have statistical difference
at least roughly \epsilon\sqrt{l}, provided that l is sufficiently small,
smaller than roughly \epsilon^{-4/3}. Previously known bounds did not
work for few repetitions l, requiring l>\epsilon^{-2}.
::::::::::::::
2004-033
::::::::::::::
Title: Periodic Motion Detection and Estimation via Space-Time Sampling
Authors: Ashwin Thangali and Stan Sclaroff
Date: November 2, 2004
Abstract:
A novel technique to detect and localize periodic movements in video is
presented. The distinctive feature of the technique
is that it requires neither feature tracking nor object
segmentation. Intensity patterns along linear sample paths in
space-time are used in estimation of period of object motion in a given
sequence of frames. Sample paths are obtained by
connecting (in space-time) sample points from regions of high motion
magnitude in the first and last frames. Oscillations in
intensity values are induced at time instants when an object intersects
the sample path. The locations of peaks in intensity are
determined by parameters of both cyclic object motion and orientation of
the sample path with respect to object motion. The
information about peaks is used in a least squares framework to obtain an
initial estimate of these parameters. The estimate is
further refined using the full intensity profile. The best estimate for
the period of cyclic object motion is obtained by looking
for consensus among estimates from many sample paths. The proposed
technique is evaluated with synthetic videos where
ground-truth is known, and with American Sign Language videos where the
goal is to detect periodic hand motions.
::::::::::::::
2004-034
::::::::::::::
Title: Multi-scale 3D Scene Flow from Binocular Stereo Sequences
Authors: Rui Li and Stan Sclaroff
Date: November 2, 2004
Abstract:
Scene flow methods estimate the three-dimensional motion field for
points in the world, using multi-camera video data. Such methods
combine multi-view reconstruction with motion estimation approaches.
This paper describes an alternative formulation for dense scene flow
estimation that provides convincing results using only two cameras by
fusing stereo and optical flow estimation into a single coherent
framework. To handle the aperture problems inherent in the estimation
task, a multi-scale method along with a novel adaptive smoothing
technique is used to gain a regularized solution. This combined approach
both preserves discontinuities and prevents over-regularization -- two
problems commonly associated with basic multi-scale approaches.
Internally, the framework generates probability distributions for
optical flow and disparity. Taking into account the uncertainty in the
intermediate stages allows for more reliable estimation of the 3D scene
flow than standard stereo and optical flow methods allow. Experiments
with synthetic and real test data demonstrate the effectiveness of
the approach.
::::::::::::::
2004-035
::::::::::::::
Title: Automatic 2D Hand Tracking in Video Sequences
Author: Quan Yuan, Stan Sclaroff and Vassilis Athitsos
Date: November 2, 2004
Abstract:
In gesture and sign language video sequences, hand motion tends to be
rapid, and hands frequently appear in front of each other or in front
of the face. Thus, hand location is often ambiguous, and naive
color-based hand tracking is insufficient. To improve tracking
accuracy, some methods employ a prediction-update framework, but such
methods require careful initialization of model parameters, and tend
to drift and lose track in extended sequences. In this paper, a
temporal filtering framework for hand tracking is proposed that can
initialize and reset itself without human intervention. In each frame,
simple features like color and motion residue are exploited to
identify multiple candidate hand locations. The temporal filter then
uses the Viterbi algorithm to select among the candidates from frame
to frame. The resulting tracking system can automatically identify
video trajectories of unambiguous hand motion, and detect frames where
tracking becomes ambiguous because of occlusions or
overlaps. Experiments on video sequences of several hundred frames in
duration demonstrate the system's ability to track hands robustly, to
detect and handle tracking ambiguities, and to extract the
trajectories of unambiguous hand motion.
::::::::::::::
2004-036
::::::::::::::
Title: Handsignals Recognition From Video Using 3D Motion Capture Data
Authors: Tai-Peng Tian and Stan Sclaroff
Date: November 4, 2004
Abstract:
Hand signals are commonly used in applications such as giving
instructions to a pilot for airplane take off or direction of a crane
operator by a foreman on the ground. A new algorithm for recognizing
hand signals from a single camera is proposed. Typically, tracked 2D
feature positions of hand signals are matched to 2D training
images. In contrast, our approach matches the 2D feature positions to
an archive of 3D motion capture sequences. The method avoids explicit
reconstruction of the 3D articulated motion from 2D image
features. Instead, the matching between the 2D and 3D sequence is done
by backprojecting the 3D motion capture data onto 2D. Experiments
demonstrate the effectiveness of the approach in an example
application: recognizing six classes of basketball referee hand signals
in video.
::::::::::::::
2005-001
::::::::::::::
Title: Scalable Coordination Techniques for Distributed Network Monitoring
Authors: Manish Sharma and John Byers
Date: January 20, 2005
Abstract:
Emerging network monitoring infrastructures capture packet-level
traces or keep per-flow statistics at a set of distributed vantage
points. Today, distributed monitors in such an infrastructure do not
coordinate monitoring effort, which both can lead to duplication of
effort and can complicate subsequent data analysis. We argue that
nodes in such a monitoring infrastructure, whether across the
wide-area Internet, or across a sensor network, should coordinate
effort to minimize resource consumption. We propose space-efficient
data structures for use in gossip-based protocols to approximately
summarize sets of monitored flows. With some fine-tuning of our
methods, we can ensure that all flows observed by at least one monitor
are monitored, and only a tiny fraction are monitored redundantly. Our
preliminary results over a realistic ISP topology demonstrate the
effectiveness of our techniques on monitoring tens of thousands of
point-of-presence (PoP) level network flows. Our methods are
competitive with optimal off-line coordination, but require
significantly less space and network overhead than naive approaches.
::::::::::::::
2005-002
::::::::::::::
Title: Mining Anomalies Using Traffic Distributions
Authors: Anukool Lakhina, Mark Crovella and Christophe Diot
Date: February 10, 2005
Abstract:
The increasing practicality of large-scale flow capture makes it possible
to conceive of traffic analysis methods that detect and identify a large
and diverse set of anomalies. However, the challenge of effectively
analyzing this massive data source for anomaly diagnosis is as yet
unmet. We argue that the distributions of packet features (IP addresses
and ports) observed in flow traces reveal both the presence and the
structure of a wide range of anomalies. Using entropy as a summarization
tool, we show that the analysis of feature distributions leads to
significant advances on two fronts: (1) it enables highly sensitive
detection of a wide range of anomalies, augmenting detections by
volume-based methods, and (2) it enables automatic classification of
anomalies via unsupervised learning. We show that using feature
distributions, anomalies naturally fall into distinct and meaningful
clusters. These clusters can be used to automatically classify anomalies
and to uncover new anomaly types. We validate our claims on data from two
backbone networks (Abilene and Geant) and conclude that feature
distributions show promise as a key element of a fairly general network
anomaly diagnosis framework.
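As a minimal sketch of using entropy to summarize a feature
distribution, the following Python computes the sample entropy of a
list of observed feature values (for instance, destination ports in one
time bin). The function name `sample_entropy` and the choice of base-2
logarithm are assumptions made for illustration.

```python
import math
from collections import Counter


def sample_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    # Sum p * log2(p) over the distinct values, then negate.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A distribution concentrated on a single value has entropy 0, while one
spread evenly over four values has entropy 2 bits, so a sudden shift in
this number signals a change in how dispersed the feature is.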
::::::::::::::
2005-003
::::::::::::::
Title: Applied Type System with Stateful Views
Authors:
Hongwei Xi, Boston University
Dengping Zhu, Boston University
Yanka Li, Carnegie Mellon University
Date: Feb. 10, 2005
Abstract:
We present a type system that can effectively facilitate the use of
types in capturing invariants in stateful programs that may involve
(sophisticated) pointer manipulation. With its root in a recently
developed framework Applied Type System (ATS), the type system imposes
a level of abstraction on program states by introducing a novel
notion of recursive stateful views and then relies on a form of
linear logic to reason about such views. We consider the design and
then the formalization of the type system to constitute the primary
contribution of the paper. In addition, we mention a prototype
implementation of the type system and then give a variety of
examples that attest to the practicality of programming with
recursive stateful views.
::::::::::::::
2005-004
::::::::::::::
Authors: Richard West, Gerald Fry and Gary Wong
Title: Comparison of k-ary n-cube and de Bruijn Overlays in QoS-constrained Multicast Applications
Date: February 23, 2005
Abstract:
Research on the construction of logical overlay networks has
gained significance in recent times. This is partly due to
work on peer-to-peer (P2P) systems for locating and
retrieving distributed data objects, and also scalable
content distribution using end-system multicast techniques.
However, there are emerging applications that require the
real-time transport of data from various sources to
potentially many thousands of subscribers, each having their
own quality-of-service (QoS) constraints. This paper
primarily focuses on the properties of two popular
topologies found in interconnection networks, namely k-ary
n-cubes and de Bruijn graphs. The regular structure of these
graph topologies makes them easier to analyze and determine
possible routes for real-time data than complete or
irregular graphs. We show how these overlay topologies
compare in their ability to deliver data according to the
QoS constraints of many subscribers, each receiving data
from specific publishing hosts. Comparisons are drawn on
the ability of each topology to route data in the presence
of dynamic system effects, due to end-hosts joining and
departing the system. Finally, experimental results show the
service guarantees and physical link stress resulting from
efficient multicast trees constructed over both kinds of
overlay networks.
::::::::::::::
2005-005
::::::::::::::
Title: An Efficient User-Level Shared Memory Mechanism for Application-Specific
Extensions (revised and extended version of BUCS-TR-2003-014)
Authors: Richard West, Jason Gloudon, Xin Qi and Gabriel Parmer
Date: February 23, 2005
Abstract:
This paper focuses on an efficient user-level method for the deployment of
application-specific extensions, using commodity operating systems and
hardware. A sandboxing technique is described that supports multiple
extensions within a shared virtual address space. Applications can
register sandboxed code with the system, so that it may be executed in the
context of any process. Such code may be used to implement generic
routines and handlers for a class of applications, or system service
extensions that complement the functionality of the core kernel. Using our
approach, application-specific extensions can be written like conventional
user-level code, utilizing libraries and system calls, with the advantage
that they may be executed without the traditional costs of scheduling and
context-switching between process-level protection domains. No special
hardware support such as segmentation or tagged translation look-aside
buffers (TLBs) is required. Instead, our ``user-level sandboxing''
mechanism requires only page-based virtual memory support, given that
sandboxed extensions are either written by a trusted source or are
guaranteed to be memory-safe (e.g., using type-safe languages).
Using a fast method of upcalls, we show how our mechanism provides
significant performance improvements over traditional methods of
invoking user-level services. As an application of our approach, we
have implemented a user-level network subsystem that avoids data
copying via the kernel and, in many cases, yields far greater network
throughput than kernel-level approaches.
::::::::::::::
2005-006
::::::::::::::
Title: Cuckoo: a Language for Implementing Memory- and Thread-safe System Services
Authors: Richard West and Gary Wong
Date: February 23, 2005
Abstract:
This paper is centered around the design of a thread- and memory-safe
language, primarily for the compilation of application-specific services
for extensible operating systems. We describe various issues that have
influenced the design of our language, called Cuckoo, that guarantees
safety of programs with potentially asynchronous flows of control.
Comparisons are drawn between Cuckoo and related software safety
techniques, including Cyclone and software-based fault isolation (SFI),
and performance results suggest our prototype compiler is capable of
generating safe code that executes with low runtime overheads, even
without potential code optimizations. Compared to Cyclone, Cuckoo is able
to safely guard accesses to memory when programs are multithreaded.
Similarly, Cuckoo is capable of enforcing memory safety in situations that
are potentially troublesome for techniques such as SFI.
::::::::::::::
2005-007
::::::::::::::
Title: SymbolDesign: A User-centered Method to Design Pen-based
Interfaces and Extend the Functionality of Pointer Input Devices
Author: Margrit Betke, Oleg Gusyatin, and Mikhail Urinson
Date: February 25, 2005
Abstract:
A method called ``SymbolDesign'' is proposed that can be used to
design user-centered interfaces for pen-based input devices. It can
also extend the functionality of pointer input devices such as the
traditional computer mouse or the Camera Mouse, a camera-based
computer interface. Users can create their own interfaces by choosing
single-stroke movement patterns that are convenient to draw with the
selected input device and by mapping them to a desired set of
commands. A pattern could be the trace of a moving finger detected
with the Camera Mouse or a symbol drawn with an optical pen. The core
of the SymbolDesign system is a dynamically created classifier, in the
current implementation an artificial neural network. The architecture
of the neural network automatically adjusts according to the
complexity of the classification task. In experiments, subjects used
the SymbolDesign method to design and test the interfaces they
created, for example, to browse the web. The experiments demonstrated
good recognition accuracy and responsiveness of the user interfaces.
The method provided an easily-designed and easily-used computer input
mechanism for people without physical limitations, and, with some
modifications, has the potential to become a computer access tool for
people with severe paralysis.
::::::::::::::
2005-008
::::::::::::::
Title: MosaicShape: Stochastic Region Grouping with Shape Prior
Author: Jingbin Wang, Erdan Gu, and Margrit Betke
Date: February 25, 2005
Abstract:
A novel method that combines shape-based object recognition and
image segmentation is proposed for shape retrieval from
images. Given a shape prior represented in a multi-scale curvature
form, the proposed method identifies the target objects in images
by grouping oversegmented image regions. The problem is formulated
in a unified probabilistic framework and solved by a stochastic
Markov Chain Monte Carlo (MCMC) mechanism. By this means, object
segmentation and recognition are accomplished simultaneously.
Within each sampling move during the simulation process,
probabilistic region grouping operations are influenced by both the
image information and the shape similarity constraint. The latter
constraint is measured by a partial shape matching process. A
generalized parallel algorithm by Barbu and Zhu, combined with a
large sampling jump and other implementation improvements, greatly
speeds up the overall stochastic process. The proposed method
supports the segmentation and recognition of multiple occluded
objects in images. Experimental results are provided for both
synthetic and real images.
::::::::::::::
2005-009
::::::::::::::
Authors: Vassilis Athitsos, Jonathan Alon, and Stan Sclaroff
Title: Efficient Nearest Neighbor Classification Using a Cascade of
Approximate Similarity Measures
Date: March 16, 2005
Abstract:
Nearest neighbor classification using shape context can yield highly
accurate results in a number of recognition problems. Unfortunately,
the approach can be too slow for practical applications, and thus
approximation strategies are needed to make shape context
practical. This paper proposes a method for efficient and accurate
nearest neighbor classification in non-Euclidean spaces, such as the
space induced by the shape context measure. First, a method is
introduced for constructing a Euclidean embedding that is optimized
for nearest neighbor classification accuracy. Using that embedding,
multiple approximations of the underlying non-Euclidean similarity
measure are obtained, at different levels of accuracy and
efficiency. The approximations are automatically combined to form a
cascade classifier, which applies the slower approximations only to
the hardest cases. Unlike typical cascade-of-classifiers approaches,
which are applied to binary classification problems, our method
constructs a cascade for a multiclass problem. Experiments with a
standard shape data set indicate that a speedup of two to three orders
of magnitude is gained over the standard shape context
classifier, with minimal losses in classification accuracy.
::::::::::::::
2005-010
::::::::::::::
Authors: Vassilis Athitsos, Marios Hadjieleftheriou, George Kollios, and Stan Sclaroff
Title: Query-Sensitive Embeddings
Date: March 16, 2005
Abstract:
A common problem in many types of databases is retrieving the most
similar matches to a query object. Finding those matches in a large
database can be too slow to be practical, especially in domains where
objects are compared using computationally expensive similarity (or
distance) measures. This paper proposes a novel method for approximate
nearest neighbor retrieval in such spaces. Our method is
embedding-based, meaning that it constructs a function that maps
objects into a real vector space. The mapping preserves a large amount
of the proximity structure of the original space, and it can be used
to rapidly obtain a short list of likely matches to the query. The
main novelty of our method is that it constructs, together with the
embedding, a query-sensitive distance measure that should be used when
measuring distances in the vector space. The term ``query-sensitive''
means that the distance measure changes depending on the current query
object. We report experiments with an image database of handwritten
digits, and a time-series database. In both cases, the proposed method
outperforms existing state-of-the-art embedding methods, meaning that
it provides significantly better trade-offs between efficiency and
retrieval accuracy.
::::::::::::::
2005-011
::::::::::::::
Title: Robust Sketching and Aggregation of Distributed Data Streams
Author: Marios Hadjieleftheriou, John W. Byers, George Kollios
Date: March 16, 2005
Abstract:
The data streaming model provides an attractive
framework for one-pass summarization
of massive data sets at a single observation
point. However, in an environment where
multiple data streams arrive at a set of distributed
observation points, sketches must be
computed remotely and then must be aggregated
through a hierarchy before queries may
be conducted. As a result, many sketch-based
methods for the single stream case do not apply
directly, as either the error introduced becomes
large, or because the methods assume
that the streams are non-overlapping. These
limitations hinder the application of these
techniques to practical problems in network
traffic monitoring and aggregation in sensor
networks. To address this, we develop a general
framework for evaluating and enabling robust
computation of duplicate-sensitive aggregate
functions (e.g., SUM and QUANTILE),
over data produced by distributed sources.
We instantiate our approach by augmenting
the Count-Min and Quantile-Digest sketches
to apply in this distributed setting, and analyze
their performance. We conclude with
experimental evaluation to validate our analysis.
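To show why overlapping streams are a problem for duplicate-sensitive
aggregates, here is a sketch of a plain Count-Min sketch with naive
merging. It is the standard single-stream structure, not the augmented
version proposed in the report; the class name `CountMin`, its default
sizes, and the SHA-256-based hashing are assumptions for illustration.

```python
import hashlib


class CountMin:
    """Plain Count-Min sketch: depth rows of width counters each."""

    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive one hash function per row by salting the key with the row id.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        # The minimum over rows never underestimates the true count.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))

    def merge(self, other):
        # Naive aggregation: add counters cell by cell.
        merged = CountMin(self.width, self.depth)
        merged.rows = [[a + b for a, b in zip(x, y)]
                       for x, y in zip(self.rows, other.rows)]
        return merged
```

If two observation points both record the same five packets of one
flow, the merged sketch reports ten, which is the double counting that
a duplicate-robust scheme has to avoid.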
::::::::::::::
2005-012
::::::::::::::
Title: Real Time Eye Tracking and Blink Detection with USB Cameras
Authors: Michael Chau and Margrit Betke
Date: April 28, 2005
Abstract:
A human-computer interface (HCI) system designed for use by people
with severe disabilities is presented. People who are severely
paralyzed or afflicted with diseases such as ALS (Lou Gehrig's
disease) or multiple sclerosis are unable to move or control any parts
of their bodies except for their eyes. The system presented here
detects the user's eye blinks and analyzes the pattern and duration of
the blinks, using them to provide input to the computer in the form of
a mouse click. After the automatic initialization of the system occurs
from the processing of the user's involuntary eye blinks in the first
few seconds of use, the eye is tracked in real time using correlation
with an online template. If the user's depth changes significantly or
rapid head movement occurs, the system is automatically
reinitialized. Neither special lighting nor offline
templates are needed for the proper functioning of the system. The system
works with inexpensive USB cameras and runs at a frame rate of 30
frames per second. Extensive experiments were conducted to determine
both the system's accuracy in classifying voluntary and involuntary
blinks and the system's fitness in varying environment
conditions, such as alternative camera placements and different
lighting conditions. These experiments on eight test subjects yielded
an overall detection accuracy of 95.3%.
::::::::::::::
2005-013
::::::::::::::
Title: An Invariant Representation for Matching Trajectories across Uncalibrated Video Streams
Authors: Walter Nunziati (University of Florence), Stan Sclaroff, Alberto Del Bimbo (University of Florence)
Abstract:
We introduce a viewpoint invariant representation of moving object
trajectories that can be used in video database applications. It is
assumed that trajectories lie on a surface that can be locally
approximated with a plane. Raw trajectory data is first locally
approximated with a cubic spline via least squares fitting. For each
sampled point of the obtained curve, a projective invariant feature is
computed using a small number of points in its neighborhood. The
resulting sequence of invariant features computed along the entire
trajectory forms the view invariant descriptor of the trajectory
itself. Time parametrization has been exploited to compute cross
ratios without ambiguity due to point ordering. Similarity between
descriptors of different trajectories is measured with a distance that
takes into account the statistical properties of the cross ratio, and
its symmetry with respect to the point at infinity. In experiments, an
overall correct classification rate of about 95% has been obtained on
a dataset of 58 trajectories of players in soccer video, and an
overall correct classification rate of about 80% has been obtained on
matching partial segments of trajectories collected from two
overlapping views of outdoor scenes with moving people and cars.
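The projective invariance that the descriptor relies on can be
illustrated with the classical cross ratio of four collinear points.
This is a simplified one-dimensional sketch, not the feature used in
the report; the function names `cross_ratio` and `projective_map_1d`
and the example coefficients are hypothetical.

```python
def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) of four distinct collinear points (1-D coords)."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map_1d(x, h):
    """Apply the 1-D projective map x -> (h0*x + h1) / (h2*x + h3)."""
    h0, h1, h2, h3 = h
    return (h0 * x + h1) / (h2 * x + h3)
```

Mapping the points 0, 1, 2, 4 through any non-degenerate projective map
(for example with coefficients 2, 1, 0.5, 3) changes their spacing but
leaves the cross ratio at 1.5, which is why such quantities can be
compared across different camera views.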
::::::::::::::
2005-014
::::::::::::::
Title: Typed Abstraction of Complex Network Compositions
Authors: Azer Bestavros and Adam D. Bradley and Assaf J. Kfoury and Ibrahim Matta
Date: May 1, 2005
Abstract:
The heterogeneity and open nature of network systems make analysis of
compositions of components quite challenging, making the design and
implementation of robust network services largely inaccessible to the
average programmer. We propose the development of a novel type system
and practical type spaces which reflect simplified representations of
the results and conclusions which can be derived from complex
compositional theories in more accessible ways, essentially allowing
the system architect or programmer to be exposed only to the inputs
and outputs of compositional analysis without having to be familiar
with the ins and outs of its internals. Toward this end we present
the TRAFFIC (Typed Representation and Analysis of Flows For
Interoperability Checks) framework, a simple flow-composition and
typing language with corresponding type system. We then discuss and
demonstrate the expressive power of a type space for TRAFFIC derived
from the network calculus, allowing us to reason about and infer such
properties as data arrival, transit, and loss rates in large composite
network applications.
::::::::::::::
2005-015
::::::::::::::
Title: Safe Compositional Specification of Networking Systems: TRAFFIC The Language and Its Type Checking
Authors: Likai Liu, Assaf Kfoury, Azer Bestavros, Adam D. Bradley, Yarom Gabay, and Ibrahim Matta
Date: May 12, 2005
Abstract:
This paper formally defines the operational semantics for TRAFFIC, a
specification language for flow composition applications proposed in
BUCS-TR-2005-014, and presents a type system based on desired safety
assurance. We provide proofs on reduction (weak-confluence,
strong-normalization and unique normal form), on soundness and
completeness of the type system with respect to reduction, and on
equivalence classes of flow specifications. Finally, we provide a
pseudo-code listing of a syntax-directed type checking algorithm
implementing rules of the type system capable of inferring the type of
a closed flow specification.
::::::::::::::
2005-016
::::::::::::::
Title: An Invariant Representation for Matching Trajectories across uncalibrated video streams
Authors: Walter Nunziati, Stan Sclaroff, and Alberto Del Bimbo
Date: May 19, 2005
Abstract:
We introduce a view-point invariant representation of moving
object trajectories that can be used in video database applications. It
is assumed that trajectories lie on a surface that can be locally approximated
with a plane. Raw trajectory data is first locally approximated
with a cubic spline via least squares fitting. For each sampled point of
the obtained curve, a projective invariant feature is computed using a
small number of points in its neighborhood. The resulting sequence of
invariant features computed along the entire trajectory forms the
view-invariant descriptor of the trajectory itself. Time parametrization has
been exploited to compute cross ratios without ambiguity due to point
ordering. Similarity between descriptors of different trajectories is measured
with a distance that takes into account the statistical properties of
the cross ratio, and its symmetry with respect to the point at infinity. In
experiments, an overall correct classification rate of about 95% has been
obtained on a dataset of 58 trajectories of players in soccer video, and
an overall correct classification rate of about 80% has been obtained on
matching partial segments of trajectories collected from two overlapping
views of outdoor scenes with moving people and cars.
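
The projective invariance that the descriptor relies on can be illustrated with a minimal sketch. The abstract does not spell out the exact invariant or point configuration used, so the example below assumes the classic cross ratio of four collinear points; the helper names `apply_homography` and `cross_ratio`, the sample points, and the matrix `H` are all hypothetical.

```python
# Minimal sketch (assumption: classic four-point cross ratio on a line,
# not necessarily the exact feature used in the report).

def apply_homography(H, p):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = p
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)


def cross_ratio(a, b, c, d):
    """Cross ratio (AC * BD) / (BC * AD) of four collinear points.

    Signed positions along the line are obtained by projecting each
    point onto the direction from a to d.
    """
    ux, uy = d[0] - a[0], d[1] - a[1]

    def s(p):
        return (p[0] - a[0]) * ux + (p[1] - a[1]) * uy

    sa, sb, sc, sd = s(a), s(b), s(c), s(d)
    return ((sc - sa) * (sd - sb)) / ((sc - sb) * (sd - sa))


# Four points on the line y = x / 2 (hypothetical trajectory samples).
points = [(0.0, 0.0), (1.0, 0.5), (3.0, 1.5), (4.0, 2.0)]

# An arbitrary view change, modeled as a homography (hypothetical values).
H = [[1.2, 0.1, 3.0],
     [-0.2, 0.9, 1.0],
     [0.001, 0.002, 1.0]]

before = cross_ratio(*points)
after = cross_ratio(*[apply_homography(H, p) for p in points])
print(before, after)  # the two values agree up to floating-point error
```

The point ordering matters: permuting the four points changes the value of the cross ratio, which is why the abstract notes that time parametrization is used to fix the ordering.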
::::::::::::::
2005-017
::::::::::::::
Title: View registration using interesting segments of planar trajectories
Authors: Walter Nunziati, Jonathan Alon, Stan Sclaroff, and Alberto Del Bimbo
Date: May 19, 2005
Abstract:
We introduce a method for recovering the spatial and temporal
alignment between two or more views of objects moving over a ground
plane. Existing approaches either assume that the streams are globally
synchronized, so that only solving the spatial alignment is needed, or
that the temporal misalignment is small enough so that exhaustive
search can be performed. In contrast, our approach can recover both
the spatial and temporal alignment. We compute for each trajectory a
number of interesting segments, and we use their description to form
putative matches between trajectories. Each pair of corresponding
interesting segments induces a temporal alignment, and defines an
interval of common support across two views of an object that is used
to recover the spatial alignment. Interesting segments and their
descriptors are defined using algebraic projective invariants measured
along the trajectories. Similarity between interesting segments is
computed taking into account the statistics of such
invariants. Candidate alignment parameters are verified checking the
consistency, in terms of the symmetric transfer error, of all the
putative pairs of corresponding interesting segments. Experiments are
conducted with two different sets of data, one with two views of an
outdoor scene featuring moving people and cars, and one with four
views of a laboratory sequence featuring moving radio-controlled cars.
::::::::::::::
2005-018
::::::::::::::
Title: Foreground Object Segmentation from Binocular Stereo Video
Authors: Kevin Law and Stan Sclaroff
Date: May 19, 2005
Abstract:
Moving cameras are needed for a wide range of applications in
robotics, vehicle systems, surveillance, etc. However, many foreground
object segmentation methods reported in the literature are unsuitable
for such settings; these methods assume that the camera is fixed and
the background changes slowly, and are inadequate for segmenting
objects in video if there is significant motion of the camera or
background. To address this shortcoming, a new method for segmenting
foreground objects is proposed that utilizes binocular video. The
method is demonstrated in the application of tracking and segmenting
people in video who are approximately facing the binocular camera
rig. Given a stereo image pair, the system first tries to find
faces. Starting at each face, the region containing the person is
grown by merging regions from an over-segmented color image. The
disparity map is used to guide this merging process. The system has
been implemented on a consumer-grade PC, and tested on video sequences
of people indoors obtained from a moving camera rig. As can be
expected, the proposed method works well in situations where other
foreground-background segmentation methods typically fail. We believe
that this superior performance is partly due to the use of object
detection to guide region merging in disparity/color foreground
segmentation, and partly due to the use of disparity information
available with a binocular rig, in contrast with most previous methods
that assumed monocular sequences.
::::::::::::::
2005-019
::::::::::::::
Title: Online and Offline Character Recognition Using Alignment to Prototypes
Authors: Jonathan Alon, Vassilis Athitsos, and Stan Sclaroff
Date: June 3, 2005
Abstract:
Nearest neighbor classifiers are simple to implement, yet they can
model complex non-parametric distributions, and provide
state-of-the-art recognition accuracy in OCR databases. At the
same time, they may be too slow for practical character
recognition, especially when they rely on similarity measures that
require computationally expensive pairwise alignments between
characters. This paper proposes an efficient method for computing
an approximate similarity score between two characters based on
their exact alignment to a small number of prototypes. The
proposed method is applied to both online and offline character
recognition, where similarity is based on widely used and
computationally expensive alignment methods, i.e., Dynamic Time
Warping and the Hungarian method respectively. In both cases
significant recognition speedup is obtained at the expense of only
a minor increase in recognition error.
::::::::::::::
2005-020
::::::::::::::
Title: Fast and Accurate Gesture Spotting using Subgesture Reasoning and
Pruning of Unlikely Dynamic Programming Paths
Authors: Jonathan Alon, Vassilis Athitsos, and Stan Sclaroff
Date: June 3, 2005
Abstract:
Vision-based recognition of gestures in continuous video
streams can facilitate more natural human-computer interaction.
Gesture spotting is the challenging task of locating
the start and end frames of the video stream that correspond
to a gesture of interest, while at the same time rejecting
non-gesture motion patterns. This paper proposes a new
gesture spotting and recognition algorithm that is based on
the widely used continuous dynamic programming (CDP)
algorithm. Our first contribution is a pruning method that
allows the system to evaluate a relatively small number of
hypotheses compared to CDP. Pruning is implemented by
a set of model-dependent classifiers that are learned from
training examples. In our experiments, the proposed CDP
with pruning was an order of magnitude faster compared
to the original CDP algorithm, and recognition accuracy
improved by 7%. The second contribution of the proposed
spotting algorithm is a subgesture reasoning process that
models the fact that some gesture models can falsely match
parts of other longer gestures. In our experiments, using the
proposed subgesture modeling improved recognition accuracy
by an additional 12%.
::::::::::::::
2005-021
::::::::::::::
Title: Detecting Instances of Shape Classes That Exhibit Variable Structure
Authors: Vassilis Athitsos, Jingbin Wang, Stan Sclaroff, Margrit Betke
Date: June 8, 2005
Abstract:
This paper proposes a method for detecting shapes of variable
structure in images with clutter. The term ``variable structure''
means that some shape parts can be repeated an arbitrary number of
times, some parts can be optional, and some parts can have several
alternative appearances. The particular variation of the shape
structure that occurs in a given image is not known a priori. Existing
computer vision methods, including deformable model methods, were not
designed to detect shapes of variable structure; they may only be used
to detect shapes that can be decomposed into a fixed, a priori known,
number of parts. The proposed method can handle both variations in
shape structure and variations in the appearance of individual shape
parts. A new class of shape models is introduced, called Hidden State
Shape Models, that can naturally represent shapes of variable
structure. A detection algorithm is described that finds instances of
such shapes in images with large amounts of clutter by finding
globally optimal correspondences between image features and shape
models. Experiments with real images demonstrate that our method can
localize plant branches that consist of an a priori unknown number of
leaves and can detect hands more accurately than a hand detector based
on the chamfer distance.
::::::::::::::
2005-022
::::::::::::::
Title: Face identification by a cascade of rejection classifiers
Authors: Quan Yuan, Ashwin Thangali, and Stan Sclaroff
Date: June 10, 2005
Abstract:
Nearest neighbor search is commonly employed in face
recognition but it does not scale well to large dataset sizes.
A strategy to combine rejection classifiers into a cascade
for face identification is proposed in this paper. A rejection
classifier for a pair of classes is defined to reject at
least one of the classes with high confidence. These rejection
classifiers are able to share discriminants in feature
space and at the same time have high confidence in the
rejection decision. In the face identification problem, it is
possible that a pair of known individual faces are very dissimilar.
It is very unlikely that both of them are close to an
unknown face in the feature space. Hence, only one of them
needs to be considered. Using a cascade structure of rejection
classifiers, the scope of nearest neighbor search can
be reduced significantly. Experiments on Face Recognition
Grand Challenge (FRGC) version 1 data demonstrate that
the proposed method achieves significant speed up and an
accuracy comparable with the brute force Nearest Neighbor
method. In addition, a graph cut based clustering technique
is employed to demonstrate that the pairwise separability of
these rejection classifiers is capable of semantic grouping.
::::::::::::::
2005-023
::::::::::::::
Title: Fast Head Tilt Detection for Human-Computer Interaction
Authors: Benjamin N. Waber, John J. Magee, and Margrit Betke
Boston University
Date: July 7, 2005
Abstract:
Accurate head tilt detection has a large potential to aid people with
disabilities in the use of human-computer interfaces and provide
universal access to communication software. We show how it can be
utilized to tab through links on a web page or control a video game
with head motions. It may also be useful as a correction method for
currently available video-based assistive technology that requires
upright facial poses. Few of the existing computer vision methods that
detect head rotations in and out of the image plane with reasonable
accuracy can operate within the context of a real-time communication
interface because the computational expense that they incur is too
great. Our method uses a variety of metrics to obtain a robust head
tilt estimate without incurring the computational cost of previous
methods. Our system runs in real time on a computer with a 2.53 GHz
processor, 256 MB of RAM and an inexpensive webcam, using only 55% of
the processor cycles.
::::::::::::::
2005-024
::::::::::::::
Title: Facial Feature Tracking and Occlusion Recovery in American Sign Language
Authors: Thomas J. Castelli, Margrit Betke, and Carol Neidle,
Boston University
Date: July 7, 2005
Abstract:
Facial features play an important role in expressing grammatical
information in signed languages, including American Sign Language
(ASL). Gestures such as raising or furrowing the eyebrows are key
indicators of constructions such as yes-no questions. Periodic
head movements (nods and shakes) are also an essential part of the
expression of syntactic information, such as negation (associated
with a side-to-side headshake). Therefore, identification of these
facial gestures is essential to sign language recognition. One
problem with detection of such grammatical indicators is occlusion
recovery. If the signer's hand blocks his/her eyebrows during
production of a sign, it becomes difficult to track the eyebrows.
We have developed a system to detect such grammatical markers in
ASL that recovers promptly from occlusion. Our system detects and
tracks evolving templates of facial features, which are based on an
anthropometric face model, and interprets the geometric
relationships of these templates to identify grammatical markers.
It was tested on a variety of ASL sentences signed by various Deaf
native signers and detected facial gestures used to express
grammatical information, such as raised and furrowed eyebrows as
well as headshakes.
::::::::::::::
2005-025
::::::::::::::
Title: Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions
Authors: Tai-Peng Tian, Rui Li, Stan Sclaroff
Abstract:
A learning based framework is proposed for estimating human body pose
from a single image. Given a differentiable function that maps from
pose space to image feature space, the goal is to invert the process:
estimate the pose given only image features. The inversion is an
ill-posed problem as the inverse mapping is a one-to-many
process. Hence multiple solutions exist, and it is desirable to
restrict the solution space to a smaller subset of feasible
solutions. For example, not all human body poses are feasible due to
anthropometric constraints. Since the space of feasible solutions may
not admit a closed form description, the proposed framework seeks to
exploit machine learning techniques to learn an approximation that is
smoothly parameterized over such a space. One such technique is
Gaussian Process Latent Variable Modelling. Scaled conjugate gradient
is then used to find the best matching pose in the space of feasible
solutions when given an input image. The formulation allows easy
incorporation of various constraints, e.g. temporal consistency and
anthropometric constraints. The performance of the proposed approach
is evaluated in the task of upper-body pose estimation from
silhouettes and compared with the Specialized Mapping
Architecture. The estimation accuracy of the Specialized Mapping
Architecture is at least one standard deviation worse than the
proposed approach in the experiments with synthetic data. In
experiments with real video of humans performing gestures, the
proposed approach produces qualitatively better estimation results.
::::::::::::::
2005-026
::::::::::::::
Title: Mistreatment in Distributed Caching Groups: Causes and Implications
Authors: Nikolaos Laoutaris, Georgios Smaragdakis, Azer Bestavros, Ioannis Stavrakakis
Date: July 7, 2005
Abstract:
Although cooperation generally increases the amount of resources
available to a community of nodes, thus improving individual and
collective performance, it also allows for the appearance of potential
mistreatment problems through the exposition of one node's resources
to others. We study such concerns by considering a group of
independent, rational, self-aware nodes that cooperate using on-line
caching algorithms, where the exposed resource is the storage of each
node. Motivated by content networking applications -- including web
caching, CDNs, and P2P -- this paper extends our previous work on the
off-line version of the problem, which was limited to object
replication and was conducted under a game-theoretic framework. We
identify and investigate two causes of mistreatment: (1) cache state
interactions (due to the cooperative servicing of requests) and (2)
the adoption of a common scheme for cache
replacement/redirection/admission policies. Using analytic models,
numerical solutions of these models, as well as simulation
experiments, we show that on-line cooperation schemes using caching
are fairly robust to mistreatment caused by state interactions. When
this becomes possible, the interaction through the exchange of
miss-streams has to be very intense, making it feasible for the
mistreated nodes to detect and react to the exploitation. This
robustness ceases to exist when nodes fetch and store objects in
response to remote requests, i.e., when they operate as Level-2 caches
(or proxies) for other nodes. Regarding mistreatment due to a common
scheme, we show that this can easily take place when the ``outlier''
characteristics of some of the nodes get overlooked. This finding
underscores the importance of allowing cooperative caching nodes the
flexibility of choosing from a diverse set of schemes to fit the
peculiarities of individual nodes. To that end, we outline an
emulation-based framework for the development of
mistreatment-resilient distributed selfish caching schemes.
::::::::::::::
2005-027
::::::::::::::
Title: Computing a Uniform Scaling Parameter for 3D Registration of Lung Surfaces
Authors: Vladimir Rodeski, William Mullally, Carissa Bellardine, Kenneth Lutchen, and Margrit Betke
Date: July 7, 2005
Abstract:
A difficulty in lung image registration is accounting for changes in
the size of the lungs due to inspiration. We propose two methods for
computing a uniform scale parameter for use in lung image
registration that account for size change. A scaled rigid-body
transformation allows analysis of corresponding lung CT scans taken
at different times and can serve as a good low-order transformation
to initialize non-rigid registration approaches. Two different
features are used to compute the scale parameter. The first method
uses lung surfaces. The second uses lung volumes. Both approaches are
computationally inexpensive and improve the alignment of lung images
over rigid registration. The two methods produce different scale
parameters and may highlight different functional information about
the lungs.
::::::::::::::
2005-028
::::::::::::::
Title: An Adaptive Policy Management Approach to BGP Convergence
Author: Selma Yilmaz and Ibrahim Matta
Date: July 13, 2005
Abstract:
The Border Gateway Protocol (BGP) is the current inter-domain routing
protocol used to exchange reachability information between Autonomous
Systems (ASes) in the Internet. BGP supports policy-based routing
which allows each AS to independently adopt a set of local policies
that specify which routes it accepts and advertises from/to other
networks, as well as which route it prefers when more than one route
becomes available. However, independently chosen local policies may
cause global conflicts, which result in protocol divergence. In this
paper, we propose a new algorithm, called Adaptive Policy Management
Scheme (APMS), to resolve policy conflicts in a distributed manner.
Akin to distributed feedback control systems, each AS independently
classifies the state of the network as either conflict-free or
potentially-conflicting by observing its local history only (namely,
route flaps). Based on the degree of measured conflicts, each AS
dynamically adjusts its own path preferences---increasing its
preference for observably stable paths over flapping paths. APMS also
includes a mechanism to distinguish route flaps due to topology
changes, so as not to confuse them with those due to policy
conflicts. A correctness and convergence analysis of APMS based on the
sub-stability property of chosen paths is presented. Implementation in
the SSF network simulator is performed, and simulation results for
different performance metrics are presented. The metrics capture the
dynamic performance (in terms of instantaneous throughput, delay,
routing load, etc.) of APMS and other competing solutions, thus
exposing the often neglected aspects of performance.
::::::::::::::
2005-029
::::::::::::::
Title: Tracking Human Body Pose on a Learned Smooth Space
Authors: Tai-Peng Tian, Rui Li and Stan Sclaroff
Date: July 28, 2005
Abstract:
Particle filtering is a popular method used in systems for tracking
human body pose in video. One key difficulty in using particle
filtering is caused by the curse of dimensionality: generally a very
large number of particles is required to adequately approximate the
underlying pose distribution in a high-dimensional state
space. Although the number of degrees of freedom in the human body is
quite large, in reality, the subset of allowable configurations in
state space is generally restricted by human biomechanics, and the
trajectories in this allowable subspace tend to be smooth. Therefore,
a framework is proposed to learn a low-dimensional representation of
the high-dimensional human pose state space. This mapping can be
learned using a Gaussian Process Latent Variable Model (GPLVM)
framework. One important advantage of the GPLVM framework is that both
the mapping to and the mapping from the embedded space are smooth; this
facilitates sampling in the low-dimensional space, and samples
generated in the low-dimensional embedded space are easily mapped back
into the original high-dimensional space. Moreover, human body poses
that are similar in the original space tend to be mapped close to each
other in the embedded space; this property can be exploited when
sampling in the embedded space. The proposed framework is tested in
tracking 2D human body pose using a Scaled Prismatic
Model. Experiments on real life video sequences demonstrate the
strength of the approach. In comparison with the Multiple Hypothesis
Tracking and the standard Condensation algorithm, the proposed
algorithm is able to maintain tracking reliably throughout the long
test sequences. It also handles singularity and self occlusion
robustly.
::::::::::::::
2005-030
::::::::::::::
Title: Some Considerations on a Calculus with Weak References
Author: Kevin Donnelly and Assaf J. Kfoury
Date: July 27, 2005
Abstract:
Weak references are references that do not prevent the object they point
to from being garbage collected. Most realistic languages, including
Java, SML/NJ, and OCaml to name a few, have some facility for
programming with weak references. Weak references are used in
implementing idioms like memoizing functions and hash-consing in order
to avoid potential memory leaks.
However, the semantics of weak references in many languages are not
clearly specified. Without a formal semantics for weak references it
becomes impossible to prove the correctness of implementations making
use of this feature. Previous work by Hallett and Kfoury extends $\lambda_{gc}$,
a language for modeling garbage collection, to $\lambda_{weak}$, a similar
language with weak references.
Using this previously formalized semantics for weak references, we
consider two issues related to well-behavedness of programs. Firstly,
we provide a new, simpler proof of the well-behavedness of the
syntactically restricted fragment of $\lambda_{weak}$ defined previously.
Secondly, we give a natural semantic criterion for well-behavedness much
broader than the syntactic restriction, which is useful as a principle for
programming with weak references.
Furthermore, we extend the result, proved previously for $\lambda_{gc}$, which
allows one to use type-inference to collect some reachable objects that
are never used. We prove that this result holds for our language, and we
extend this result to allow the collection of weakly-referenced
reachable garbage without incurring the computational overhead sometimes
associated with collecting weak bindings (e.g. the need to recompute a
memoized function).
Lastly, we extend the semantic framework to model the key/value weak
references found in Haskell, and we prove that the Haskell semantics is
equivalent to a simpler semantics due to the lack of side-effects in our
language.
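
As an informal illustration of the memoization idiom and the recomputation cost mentioned above, here is a minimal sketch using Python's standard `weakref` module rather than the calculus studied in the report. The names `Result`, `memo_square`, and `_cache` are hypothetical, and the prompt reclamation shown relies on CPython's reference counting (other implementations may collect later).

```python
import gc
import weakref


class Result:
    """Wrapper object; plain ints cannot be weakly referenced in Python."""

    def __init__(self, value):
        self.value = value


# Values are held weakly: an entry disappears once no strong
# reference to its Result remains anywhere else in the program.
_cache = weakref.WeakValueDictionary()
calls = 0  # counts how often the "expensive" computation actually runs


def memo_square(n):
    global calls
    hit = _cache.get(n)
    if hit is not None:
        return hit            # cache hit: no recomputation
    calls += 1
    result = Result(n * n)    # stands in for an expensive computation
    _cache[n] = result
    return result


first = memo_square(12)
again = memo_square(12)
same_object = first is again          # served from the cache
calls_while_alive = calls

# Drop every strong reference; the weak cache entry may now be collected.
del first, again
gc.collect()
entry_survived = 12 in _cache

recomputed = memo_square(12)          # must be computed again
calls_after_collection = calls
print(same_object, calls_while_alive, entry_survived, calls_after_collection)
```

The cache never causes a memory leak, but the program's observable work (here, the call count) depends on when collection happens, which is the kind of behavior a formal semantics has to account for.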
::::::::::::::
2005-031
::::::::::::::
Title: A Formal Semantics for Weak References
Author: Joseph J. Hallett and Assaf J. Kfoury
Date: August 8, 2005
Abstract:
A weak reference is a reference to an object that is not followed by the
pointer tracer when garbage collection is called. That is, a weak
reference cannot prevent the object it references from being garbage
collected. Weak references remain a troublesome programming feature
largely because there is not an accepted, precise semantics that describes
their behavior (in fact, we are not aware of any formalization of their
semantics). The trouble is that weak references allow reachable objects to
be garbage collected, therefore allowing garbage collection to influence
the result of a program. Despite this difficulty, weak references continue
to be used in practice for reasons related to efficient storage
management, and are included in many popular programming languages
(Standard ML, Haskell, OCaml, and Java).
We give a formal semantics for a calculus called $\lambda_{weak}$ that includes weak
references and is derived from Morrisett, Felleisen, and Harper's $\lambda_{gc}$.
$\lambda_{gc}$ formalizes the notion of garbage collection by means of a rewrite rule.
Such a formalization is required to precisely characterize the semantics
of weak references. However, the inclusion of a garbage-collection
rewrite-rule in a language with weak references introduces
non-deterministic evaluation, even if the parameter-passing mechanism is
deterministic (call-by-value in our case). This raises the question of
confluence for our rewrite system. We discuss natural restrictions under
which our rewrite system is confluent, thus guaranteeing uniqueness of
program result. We define conditions that allow other garbage collection
algorithms to co-exist with our semantics of weak references. We also
introduce a polymorphic type system to prove the absence of erroneous
program behavior (i.e., the absence of "stuck evaluation") and a
corresponding type inference algorithm. We prove the type system sound and
the inference algorithm sound and complete.
::::::::::::::
2005-032
::::::::::::::
Title: MusicCamera -- A Camera-based Music Making Tool for Physical Rehabilitation
Date: December 8, 2005
Authors: Mikhail Gorman, Margrit Betke, Elliot Saltzman, and Amir Lahav
Abstract:
The therapeutic effects of playing music are being recognized
increasingly in the field of rehabilitation medicine. People with
physical disabilities, however, often do not have the motor dexterity
needed to play an instrument. We developed a camera-based
human-computer interface called ``MusicCamera'' to provide such people
with a means to make music by performing therapeutic exercises.
MusicCamera uses computer vision techniques to convert the movements
of a patient's body part, for example, a finger, hand, or foot, into
musical and visual feedback using the open software platform EyesWeb.
It can be adjusted to a patient's particular therapeutic needs and
provides quantitative tools for monitoring the recovery process and
assessing therapeutic outcomes. We tested the potential of
MusicCamera as a rehabilitation tool with six subjects who responded
to or created music in various movement exercises. In these
proof-of-concept experiments, MusicCamera has performed reliably and
shown its promise as a therapeutic device.
::::::::::::::
2005-033
::::::::::::::
Title: Safe Compositional Specification of Networking Systems: A Compositional Analysis Approach
Authors: Likai Liu, Assaf Kfoury, Azer Bestavros, Adam D. Bradley, Yarom Gabay, and Ibrahim Matta
Date: December 28, 2005
Abstract:
We present a type inference algorithm, in the style of compositional
analysis, for the language TRAFFIC---a specification language for flow
composition applications proposed in BUCS-TR-2005-014---and prove that
this algorithm is correct: the typings it infers are principal
typings, and the typings agree with syntax-directed type checking on
closed flow specifications. This algorithm is capable of verifying
partial flow specifications, which is a significant improvement over
the syntax-directed type checking algorithm presented in BUCS-TR-2005-015.
We also show that this algorithm runs efficiently, i.e., in low-degree
polynomial time.
::::::::::::::
2005-034
::::::::::::::
Title: Type Systems for a Network Specification Language With Multiple-Choice Let
Authors: Yarom Gabay, Assaf J. Kfoury, Likai Liu, Azer Bestavros, Adam D. Bradley, and Ibrahim Matta
Abstract:
When analysing the behavior of complex networked systems, it is often
the case that some components within that network are only
known to the extent that they belong to one of a set of possible
"implementations" -- e.g., versions of a specific protocol, class of
schedulers, etc. In this report we augment the specification language
considered in BUCS-TR-2004-021, BUCS-TR-2005-014, BUCS-TR-2005-015,
and BUCS-TR-2005-033, to include a non-deterministic multiple-choice
let-binding, which allows us to consider compositions of networking
subsystems that allow for looser component specifications.
::::::::::::::
2005-035
::::::::::::::
Title: Inferring Intersection Typings that Are Equivalent to Call-by-Name and Call-by-Value Evaluations
Authors: Adam Bakewell, Sebastien Carlier, A.J. Kfoury, J.B. Wells
Date: April 9, 2005
Abstract:
We present a procedure to infer a typing for an arbitrary lambda-term
M in an intersection-type system that translates into exactly the
call-by-name (resp., call-by-value) evaluation of M. Our framework is
the recently developed System E which augments intersection types with
expansion variables. The inferred typing for M is obtained by setting
up a unification problem involving both type variables and expansion
variables, which we solve with a confluent rewrite system. The
inference procedure is compositional in the sense that typings for
different program components can be inferred in any order, and without
knowledge of the definition of other program components. Using
expansion variables lets us achieve a compositional inference
procedure easily. Termination of the procedure is generally
undecidable. The procedure terminates and returns a typing iff the
input M is normalizing according to call-by-name (resp.,
call-by-value). The inferred typing is exact in the sense that the
exact call-by-name (resp., call-by-value) behaviour of M can be
obtained by a (polynomial) transformation of the typing. The inferred
typing is also principal in the sense that any other typing that
translates the call-by-name (resp., call-by-value) evaluation of M can
be obtained from the inferred typing for M using a substitution-based
transformation.
::::::::::::::
2006-001
::::::::::::::
Title: Computational Properties of SNAFU
Authors: Yarom Gabay, Michael J. Ocean, Assaf J. Kfoury, and Likai Liu
Date: February 6, 2006
Abstract:
Sensor applications in Sensoria [BBKO:basenets05] are expressed using
STEP (Sensorium Task Execution Plan). SNAFU (SensorNet Applications as
Functional Units) serves as a high-level sensor-programming language,
which is compiled into STEP. In SNAFU's current form, its differences
with STEP are relatively minor, as they are limited to shorthands and
macros not available in STEP. We show that, however restrictive it may
seem, SNAFU has in fact universal power; technically, it is a
Turing-complete language, i.e., any Turing program can be written in
SNAFU (though not always conveniently). Although STEP may be allowed
to have universal power, as a low-level language not directly
available to Sensorium users, SNAFU programmers may use this power for
malicious purposes or inadvertently introduce errors with destructive
consequences. In future developments of SNAFU, we plan to introduce
restrictions and high-level features with safety guards, such as those
provided by a type system, which will make SNAFU programming safer.
::::::::::::::
2006-002
::::::::::::::
Title: On the Impact of Low-Rate Attacks
Authors: Mina Guirguis, Azer Bestavros, and Ibrahim Matta
Date: February 6, 2006
Abstract:
Recent research has exposed new breeds of attacks that are capable of
denying service or inflicting significant damage to TCP flows,
without sustaining the attack traffic. Such attacks are often referred
to as ``low-rate'' attacks and they stand in sharp contrast to
traditional Denial of Service (DoS) attacks that can completely shut
off TCP flows by flooding an Internet link. In this paper, we study
the impact of these new breeds of attacks and the extent to which
defense mechanisms are capable of mitigating the attack's
impact. Through adopting a simple discrete-time model with a single
TCP flow and a non-oblivious adversary, we were able to expose new
variants of these low-rate attacks that could potentially have high
attack potency per attack burst. Our analysis is focused on
worst-case scenarios, thus our results should be regarded as upper
bounds on the impact of low-rate attacks rather than a real assessment
under a specific attack scenario.
::::::::::::::
2006-003
::::::::::::::
Title: Distributed Selfish Caching
Authors: Nikolaos Laoutaris, Georgios Smaragdakis, Azer Bestavros, Ibrahim Matta, and Ioannis Stavrakakis
Date: February 7, 2006
Abstract:
Although cooperation generally increases the amount of resources
available to a community of nodes, thus improving individual and
collective performance, it also allows for the appearance of potential
mistreatment problems through the exposure of one node's resources
to others. We study such concerns by considering a group of
independent, rational, self-aware nodes that cooperate using on-line
caching algorithms, where the exposed resource is the storage at each
node. Motivated by content networking applications -- including web
caching, CDNs, and P2P -- this paper extends our previous work on the
on-line version of the problem, which was conducted under a
game-theoretic framework, and limited to object replication. We
identify and investigate two causes of mistreatment: (1) cache state
interactions (due to the cooperative servicing of requests) and (2)
the adoption of a common scheme for cache management policies. Using
analytic models, numerical solutions of these models, as well as
simulation experiments, we show that on-line cooperation schemes using
caching are fairly robust to mistreatment caused by state
interactions. To appear in a substantial manner, the interaction
through the exchange of miss-streams has to be very intense, making it
feasible for the mistreated nodes to detect and react to
exploitation. This robustness ceases to exist when nodes fetch and
store objects in response to remote requests, i.e., when they operate
as Level-2 caches (or proxies) for other nodes. Regarding mistreatment
due to a common scheme, we show that this can easily take place when
the "outlier" characteristics of some of the nodes get
overlooked. This finding underscores the importance of allowing
cooperative caching nodes the flexibility of choosing from a diverse
set of schemes to fit the peculiarities of individual nodes. To that
end, we outline an emulation-based framework for the development of
mistreatment-resilient distributed selfish caching schemes. Our
framework utilizes a simple control-theoretic approach to dynamically
parameterize the cache management scheme. We show performance
evaluation results that quantify the benefits from instantiating such a
framework, which could be substantial under skewed demand profiles.
::::::::::::::
2006-004
::::::::::::::
Title: Authenticated Index Structures for Outsourced Database Systems
Authors: Feifei Li, Marios Hadjieleftheriou, George Kollios, Leonid Reyzin
Date: April 1, 2006
Abstract:
In outsourced database (ODB) systems the database owner publishes its
data through a number of remote servers, with the goal of enabling
clients at the edge of the network to access and query the data more
efficiently. As servers might be untrusted or can be compromised,
query authentication becomes an essential component of ODB
systems. Existing solutions for this problem concentrate mostly on
static scenarios and are based on idealistic properties for certain
cryptographic primitives, looking at the problem mostly from a
theoretical perspective. In this work, first we define a variety of
essential and practical cost metrics associated with ODB systems.
Then we analytically evaluate a number of different approaches, in
search for a solution that best leverages all metrics. Most
importantly, we look at solutions that can handle dynamic scenarios,
where owners periodically update the data residing at the
servers. Finally, we discuss query freshness, a new dimension in data
authentication that has not been explored before. A comprehensive
experimental evaluation of the proposed and existing approaches is
used to validate the analytical models and verify our claims. Our
findings show that the proposed solutions improve performance
substantially over existing approaches, both for static and dynamic
environments.
::::::::::::::
2006-005
::::::::::::::
Title: Amorphous Placement and Retrieval of Sensory Data in Sparse Mobile Ad-Hoc Networks
Authors: Hany Morcos, Azer Bestavros, and Ibrahim Matta
Date: April 4, 2006
Abstract:
Personal communication devices are increasingly being equipped with
sensors that are able to passively collect information from their
surroundings -- information that could be stored in fairly small
local caches. We envision a system in which users of such devices
use their collective sensing, storage, and communication resources
to query the state of (possibly remote) neighborhoods. The goal of
such a system is to achieve the highest query success ratio using
the least communication overhead (power). We show that the use of
Data Centric Storage (DCS), or directed placement, is a viable
approach for achieving this goal, but only when the underlying
network is well connected. Alternatively, we propose amorphous
placement, in which sensory samples are cached locally and
informed exchanges of cached samples are used to diffuse the sensory
data throughout the whole network. In handling queries, the local
cache is searched first for potential answers. If unsuccessful, the
query is forwarded to one or more direct neighbors for answers. This
technique leverages node mobility and caching capabilities to avoid
the multi-hop communication overhead of directed placement. Using a
simplified mobility model, we provide analytical lower and upper
bounds on the ability of amorphous placement to achieve uniform
field coverage in one and two dimensions. We show that combining
informed shuffling of cached samples upon an encounter between two
nodes, with the querying of direct neighbors could lead to
significant performance improvements. For instance, under realistic
mobility models, our simulation experiments show that amorphous
placement achieves 10% to 40% better query answering ratio at a
25% to 35% savings in consumed power over directed placement.
::::::::::::::
2006-006
::::::::::::::
Title: A customizable camera-based human computer interaction system allowing people with disabilities autonomous hands free navigation of multiple computing tasks
Authors: Wajeeha Akram, Laura R. Tiberii and Margrit Betke
Date: May 11, 2006
Abstract:
Many people suffer from conditions that lead to deterioration of motor
control and make access to the computer using traditional input devices
difficult. In particular, they may lose control of hand movement to the
extent that the standard mouse cannot be used as a pointing device. Most
current alternatives use markers or specialized hardware to track and
translate a user's movement to pointer movement. These approaches, for
example wearable devices, may be perceived as intrusive. Camera-based
assistive systems that use visual tracking of features on the user's
body often require cumbersome manual adjustment. This paper introduces
an enhanced computer vision based strategy where features, for example
on a user's face, viewed through an inexpensive USB camera, are tracked
and translated to pointer movement. The main contributions of this paper
are (1) enhancing a video based interface with a mechanism for mapping
feature movement to pointer movement, which allows users to navigate to
all areas of the screen even with very limited physical movement, and
(2) providing a customizable, hierarchical navigation framework for
human computer interaction (HCI). This framework provides effective use
of the vision-based interface system for accessing multiple applications
in an autonomous setting. Experiments with several users show the
effectiveness of the mapping strategy and its usage within the
application framework as a practical tool for desktop users with
disabilities.
::::::::::::::
2006-007
::::::::::::::
Title: Web Mediators for Accessible Browsing
Authors: Benjamin N. Waber, John J. Magee, and Margrit Betke
Date: May 11, 2006
Abstract:
We present a highly accurate method for classifying web pages based on
link percentage, which is the percentage of text characters that are
parts of links normalized by the number of all text characters on a
web page. K-means clustering is used to create unique thresholds to
differentiate index pages and article pages on individual web sites.
Index pages contain mostly links to articles and other indices, while
article pages contain mostly text. We also present a novel link
grouping algorithm using agglomerative hierarchical clustering that
groups links in the same spatial neighborhood together while
preserving link structure. Grouping allows users with severe
disabilities to use a scan-based mechanism to tab through a web page
and select items. In experiments, we saw up to a 40-fold reduction in
the number of commands needed to click on a link with a scan-based
interface, which shows that we can vastly improve the rate of
communication for users with disabilities. We used web page
classification and link grouping to alter web page display on an
accessible web browser that we developed to make a usable browsing
interface for users with disabilities. Our classification method
consistently outperformed a baseline classifier even when using
minimal data to generate article and index clusters, and achieved
classification accuracy of 94.0% on web sites with well-formed or
slightly malformed HTML, compared with 80.1% accuracy for the baseline
classifier.
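
As a rough illustration of the link-percentage feature described above, the
following Python sketch computes the share of text characters that fall
inside links. It is not the authors' implementation: the names
LinkTextCounter and link_percentage are hypothetical, and the choice to
ignore whitespace characters is an assumption made for this example. Text
inside script or style elements is counted like any other text here, which
a real classifier would probably want to exclude.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts text characters inside and outside <a> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0   # greater than 0 while inside an <a> element
        self.link_chars = 0   # non-whitespace characters that are part of links
        self.total_chars = 0  # all non-whitespace text characters on the page

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        # Assumption: whitespace is not counted as a text character.
        n = sum(1 for ch in data if not ch.isspace())
        self.total_chars += n
        if self.link_depth > 0:
            self.link_chars += n


def link_percentage(html):
    """Return link characters as a percentage of all text characters."""
    parser = LinkTextCounter()
    parser.feed(html)
    parser.close()
    if parser.total_chars == 0:
        return 0.0
    return 100.0 * parser.link_chars / parser.total_chars
```

Under this definition an index page would be expected to score high and an
article page low; the per-site thresholds mentioned in the abstract would
then be learned from such scores.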
::::::::::::::
2006-008
::::::::::::::
Title: An Adaptive Management Approach to Resolving Policy Conflicts
Authors: Selma Yilmaz and Ibrahim Matta
Date: May 25, 2006
Abstract:
The Border Gateway Protocol (BGP) is the current inter-domain routing
protocol used to exchange reachability information between Autonomous
Systems (ASes) in the Internet. BGP supports policy-based routing
which allows each AS to independently define a set of local policies
on which routes it accepts and advertises from/to other networks, as
well as on which route it prefers when more than one route becomes
available. However, independently chosen local policies may cause
global conflicts, which result in protocol divergence. In this paper,
we propose a new algorithm, called Adaptive Policy Management Scheme
(APMS), to resolve policy conflicts in a distributed manner. Akin to
distributed feedback control systems, each AS independently classifies
the state of the network as either conflict-free or potentially
conflicting by observing its local history only (namely, route
flaps). Based on the degree of measured conflicts, each AS dynamically
adjusts its own path preferences---increasing its preference for
observably stable paths over flapping paths. APMS also includes a
mechanism to distinguish route flaps due to topology changes, so as
not to confuse them with those due to policy conflicts. A correctness
and convergence analysis of APMS based on the sub-stability property
of chosen paths is presented. Implementation in the SSF network
simulator is performed, and simulation results for different
performance metrics are presented. The metrics capture the dynamic
performance (in terms of instantaneous throughput, delay, etc.) of
APMS and other competing solutions, thus exposing the often neglected
aspects of performance.
::::::::::::::
2006-009
::::::::::::::
Title: On the Interaction between TCP and the Wireless Channel in CDMA2000 Networks
Authors: Karim Mattar, Ashwin Sridharan, Hui Zang, Ibrahim Matta and Azer Bestavros
Date: June 6, 2006
Abstract:
In this work, we conducted extensive active measurements on a large
nationwide CDMA2000 1xRTT network in order to characterize the impact
of both the Radio Link Protocol and, more importantly, the wireless
scheduler, on TCP. Our measurements include standard TCP/UDP logs, as
well as detailed RF layer statistics that allow observability into RF
dynamics. With the help of a robust correlation measure, normalized
mutual information, we were able to quantify the impact of these two
RF factors on TCP performance metrics such as the round trip time,
packet loss rate, instantaneous throughput, etc. We show that the
variable channel rate has the larger impact on TCP behavior when
compared to the Radio Link Protocol. Furthermore, we expose and rank
the factors that influence the assigned channel rate itself and in
particular, demonstrate the sensitivity of the wireless scheduler to
the data sending rate. Thus, TCP is adapting its rate to match the
available network capacity, while the rate allocated by the wireless
scheduler is influenced by the sender's behavior. Such a system is
best described as a closed loop system with two feedback controllers,
the TCP controller and the wireless scheduler, each
one affecting the other's decisions. In this work, we take the first
steps in characterizing such a system in a realistic environment.
::::::::::::::
2006-010
::::::::::::::
Author: Vassilis Athitsos
Title: Learning Embeddings for Indexing, Retrieval, and Classification, with Applications to Object and Shape Recognition in Image Databases
Date: June 14, 2006
Abstract:
Nearest neighbor retrieval is the task of identifying, given a
database of objects and a query object, the objects in the database
that are the most similar to the query. Retrieving nearest neighbors
is a necessary component of many practical applications, in fields as
diverse as computer vision, pattern recognition, multimedia databases,
bioinformatics, and computer networks. At the same time, finding
nearest neighbors accurately and efficiently can be challenging,
especially when the database contains a large number of objects, and
when the underlying distance measure is computationally expensive.
This thesis proposes new methods for improving the efficiency and
accuracy of nearest neighbor retrieval and classification in spaces
with computationally expensive distance measures. The proposed methods
are domain-independent, and can be applied in arbitrary spaces,
including non-Euclidean and non-metric spaces. In this thesis
particular emphasis is given to computer vision applications related
to object and shape recognition, where expensive non-Euclidean
distance measures are often needed to achieve high accuracy.
The first contribution of this thesis is the BoostMap algorithm for
embedding arbitrary spaces into a vector space with a computationally
efficient distance measure. Using this approach, an approximate set of
nearest neighbors can be retrieved efficiently - often orders of
magnitude faster than retrieval using the exact distance measure in
the original space. The BoostMap algorithm has two key distinguishing
features with respect to existing embedding methods. First, embedding
construction explicitly maximizes the amount of nearest neighbor
information preserved by the embedding. Second, embedding construction
is treated as a machine learning problem, in contrast to existing
methods that are based on geometric considerations.
The second contribution is a method for constructing query-sensitive
distance measures for the purposes of nearest neighbor retrieval and
classification. In high-dimensional spaces, query-sensitive distance
measures allow for automatic selection of the dimensions that are the
most informative for each specific query object. It is shown
theoretically and experimentally that query-sensitivity increases the
modeling power of embeddings, allowing embeddings to capture a larger
amount of the nearest neighbor structure of the original space.
The third contribution is a method for speeding up nearest neighbor
classification by combining multiple embedding-based nearest neighbor
classifiers in a cascade. In a cascade, computationally efficient
classifiers are used to quickly classify easy cases, and classifiers
that are more computationally expensive and also more accurate are
only applied to objects that are harder to classify. An interesting
property of the proposed cascade method is that, under certain
conditions, classification time actually decreases as the size of the
database increases, a behavior that is in stark contrast to the
behavior of typical nearest neighbor classification systems.
The proposed methods are evaluated experimentally in several different
applications: hand shape recognition, off-line character recognition,
online character recognition, and efficient retrieval of time series.
In all datasets, the proposed methods lead to significant improvements
in accuracy and efficiency compared to existing state-of-the-art
methods. In some datasets, the general-purpose methods introduced in
this thesis even outperform domain-specific methods that have been
custom-designed for such datasets.
::::::::::::::
2006-011
::::::::::::::
Title: Authenticated Index Structures for Aggregation Queries in Outsourced Databases
Authors: Feifei Li, Marios Hadjieleftheriou, George Kollios, and Leonid Reyzin
Date: July 10, 2006
Abstract:
In an outsourced database system the data owner publishes
information through a number of remote, untrusted servers
with the goal of enabling clients to access and query the
data more efficiently. As clients cannot trust servers, query
authentication is an essential component in any outsourced
database system. Clients should be given the capability to
verify that the answers provided by the servers are correct
with respect to the actual data published by the owner.
While existing work provides authentication techniques for
selection and projection queries, there is a lack of techniques
for authenticating aggregation queries. This article introduces
the first known authenticated index structures for aggregation
queries. First, we design an index that features
good performance characteristics for static environments,
where few or no updates occur to the data. Then, we extend
these ideas and propose more involved structures for the dynamic
case, where the database owner is allowed to update
the data arbitrarily. Our structures feature excellent average
case performance for authenticating queries with multiple
aggregate attributes and multiple selection predicates.
We also implement working prototypes of the proposed techniques
and experimentally validate the correctness of our ideas.
::::::::::::::
2006-012
::::::::::::::
Title: Extending snBench to Support Hierarchical and Configurable
Scheduling
Authors: Gabriel Parmer, Georgios Zervas, Angshuman Bagchi
Date: July 14, 2006
Abstract:
It is useful in systems that must support multiple applications with
various temporal requirements to allow application-specific policies to
manage resources accordingly. However, there is a tension between this
goal and the desire to control and police possibly malicious programs.
The Java-based Sensor Execution Environment (SXE) in snBench presents a
situation where such considerations add value to the system. Multiple
applications can be run by multiple users with varied temporal
requirements, some Real-Time and others best effort.
This paper outlines and documents an implementation of a hierarchical
and configurable scheduling system with which different applications can
be executed using application-specific scheduling policies. Concurrently,
the system administrator can define fairness policies between
applications that are imposed upon the system. Additionally, to ensure
forward progress of system execution in the face of malicious or
malformed user programs, an infrastructure for execution using multiple
threads is described.
::::::::::::::
2006-013
::::::::::::::
Title: Extending snBench to Provide Concurrency Support in the Sensorium Execution Environment (SXE)
Authors: Jorge Londono, Sowmya Manjanatha, and Zhinan Han
Date: July 14, 2006
Abstract:
The SNBENCH is a general-purpose programming environment and run-time
system targeted towards a variety of Sensor applications such as
environmental sensing, location sensing, video sensing, etc. In its
current structure, the run-time engine of the SNBENCH, namely the
Sensorium Execution Environment (SXE), processes the entities of
execution in a single thread of operation. In order to effectively
support applications that are time-sensitive and need priority, it is
imperative to process the tasks discretely so that specific policies can
be applied at a much more granular level. The goal of this project was to
modify the SXE to enable efficient use of system resources by way
of multi-tasking the individual components. Additionally, the
transformed SXE offers the ability to classify and employ different
schemes of processing to the individual tasks.
::::::::::::::
2006-014
::::::::::::::
Title: Extending snBench to Support a Graphical Programming Interface for a Sensor Network Tasking Language (STEP)
Authors: Ching Chang, Raymond Sweha, Panagiotis Papapetrou
Date: July 14, 2006
Abstract:
We report on our development and implementation of a graphical
"programming" interface for a sensor network tasking language called
STEP. The graphical interface allows the user to specify a program
execution graphically from an extensible palette of functionalities and
save the results as a properly formatted STEP file. Moreover, the
software is able to load a file in STEP format and convert it into the
corresponding graphical representation. During both phases a
type-checker is running in the background to ensure that both the
graphical representation and the STEP file are syntactically correct.
This project has been motivated by the Sensorium project at Boston
University. In this technical report we present the basic features of
the software and the process followed during its design and
implementation. Finally, we describe the approach used to test and
validate our software.
::::::::::::::
2006-015
::::::::::::::
Title: Extending snBench to Support a Video-Based Intrusion Detection and Alerting System with a Centralized Hash Table
Authors: Dustin Burke, Dave Cecere, and Ben Freiberg
Date: July 14, 2006
Abstract:
In this project we design and implement a centralized hashing table
in the snBench sensor network environment. We discuss the feasibility
of this approach and compare and contrast with the distributed hashing
architecture, with particular discussion regarding the conditions under
which a centralized architecture makes sense.
There are numerous computational tasks that require persistence of data
in a sensor network environment. To help motivate the need for data
storage in snBench we demonstrate a practical application of the
technology whereby a video camera can monitor a room to detect the
presence of a person and send an alert to the appropriate authorities.
::::::::::::::
2006-016
::::::::::::::
Title: Integrating Sensor-Network Research and Development into a Software Engineering Curriculum
Authors: Michael J. Ocean, Assaf J. Kfoury, and Azer Bestavros
Date: July 14, 2006
Abstract:
The emergence of a sensor-networked world produces a clear and urgent
need for well-planned, safe and secure software engineering. It is the
role of universities to prepare graduates with the knowledge and
experience to enter the work-force with a clear understanding of
software design and its application to the future safety of computing.
The snBench (Sensor Network WorkBench) project aims to provide support
to the programming and deployment of Sensor Network Applications,
enabling shared sensor embedded spaces to be easily tasked with
various sensory applications by different users for simultaneous
execution. In this report we discuss our experience using the snBench
research project as the foundation for a semester-long project in a
graduate level software engineering class at Boston University
(CS511).
::::::::::::::
2006-017
::::::::::::::
Title: The Cache Inference Problem and its Application to Content and Request Routing
Authors: Nikolaos Laoutaris, Georgios Zervas, Azer Bestavros, and George Kollios
Date: July 14, 2006
Abstract:
In many networked applications, independent caching agents cooperate
by servicing each other's miss streams, without revealing the
operational details of the caching mechanisms they employ. Inference
of such details could be instrumental for many other processes. For
example, it could be used for optimized forwarding (or routing) of
one's own miss stream (or content) to available proxy caches, or for
making cache-aware resource management decisions. In this paper, we
introduce the ``Cache Inference Problem'' (CIP) as that of
inferring the characteristics of a caching agent, given the miss
stream of that agent. While CIP is insolvable in its most general
form, there are special cases of practical importance in which it is,
including when the request stream follows an Independent Reference
Model (IRM) with generalized power-law (GPL) demand distribution. To
that end, we design two basic ``litmus'' tests that are able to detect
LFU and LRU replacement policies, the effective size of the cache and
of the object universe, and the skewness of the GPL demand for
objects. Using extensive experiments under synthetic as well as real
traces, we show that our methods infer such characteristics accurately
and quite efficiently, and that they remain robust even when the
IRM/GPL assumptions do not hold, and even when the underlying
replacement policies are not ``pure'' LFU or LRU. We exemplify the
value of our inference framework by considering example applications.
::::::::::::::
2006-018
::::::::::::::
Title: Distributed Placement of Service Facilities in Large-Scale Networks
Authors: Nikolaos Laoutaris, Georgios Smaragdakis, Konstantinos Oikonomou, Ioannis Stavrakakis, and Azer Bestavros
Date: July 14, 2006
Abstract:
The effectiveness of service provisioning in large-scale networks is
highly dependent on the number and location of service facilities
deployed at various hosts. The classical, centralized approach to
determining the latter would amount to formulating and solving the
``uncapacitated k-median'' (UKM) problem (if the requested number of
facilities is fixed), or the ``uncapacitated facility location'' (UFL)
problem (if the number of facilities is also to be optimized).
Clearly, such centralized approaches require knowledge of global
topological and demand information, and thus do not scale and are not
practical for large networks. The key question posed and answered in
this paper is the following: ``How can we determine in a distributed
and scalable manner the number and location of service facilities?''
We propose an innovative approach in which topology and demand
information is limited to neighborhoods, or ``balls'' of small radius
around selected facilities, whereas demand information is captured
implicitly for the remaining (remote) clients outside these
neighborhoods, by mapping them to clients on the edge of the
neighborhood; the ball radius regulates the trade-off between
scalability and performance. We develop a scalable, distributed
approach that answers our key question through an iterative
re-optimization of the location and the number of facilities within
such balls. We show that even for small values of the radius (1 or 2),
our distributed approach achieves performance under various synthetic
and real Internet topologies that is comparable to that of optimal,
centralized approaches requiring full topology and demand information.
::::::::::::::
2006-019
::::::::::::::
Title: Implications of Selfish Neighbor Selection in Overlay Networks
Authors: Nikolaos Laoutaris, Georgios Smaragdakis, Azer Bestavros, and John Byers
Date: July 14, 2006
Abstract:
In a typical overlay network for routing or content sharing, each node
must select a fixed number of immediate overlay neighbors for routing
traffic or content queries. A selfish node entering such a network
would select neighbors so as to minimize the weighted sum of expected
access costs to all its destinations. Previous work on selfish
neighbor selection has built intuition with simple models where edges
are undirected, access costs are modeled by hop-counts, and nodes have
potentially unbounded degrees. However, in practice, important
constraints not captured by these models lead to richer games with
substantively and fundamentally different outcomes. Our work models
neighbor selection as a game involving directed links, constraints on
the number of allowed neighbors, and costs reflecting both network
latency and node preference. We express a node's ``best response''
wiring strategy as a k-median problem on asymmetric distance, and
use this formulation to obtain pure Nash equilibria. We experimentally
examine the properties of such stable wirings on synthetic topologies,
as well as on real topologies and maps constructed from PlanetLab and
the AS-level Internet measurements. Our results indicate that selfish
nodes can reap substantial performance benefits when connecting to
overlay networks composed of non-selfish nodes. On the other hand, in
overlays that are dominated by selfish nodes, the resulting stable
wirings are optimized to such great extent that even non-selfish
newcomers can extract near-optimal performance through naive wiring
strategies.
::::::::::::::
2006-020
::::::::::::::
Title: Scalable Overlay Multicast Tree Construction for QoS-Constrained Media Streaming
Authors: Gabriel Parmer, Richard West, and Gerald Fry
Date: July 14, 2006
Abstract:
Overlay networks have become popular in recent times for content
distribution and end-system multicasting of media streams. In the latter
case, the motivation is based on the lack of widespread deployment of
IP multicast and the ability to perform end-host processing. However,
constructing routes between various end-hosts, so that data can be
streamed from content publishers to many thousands of subscribers,
each having their own QoS constraints, is still a challenging
problem. First, any routes between end-hosts using trees built on top
of overlay networks can increase stress on the underlying physical
network, due to multiple instances of the same data traversing a given
physical link. Second, because overlay routes between end-hosts may
traverse physical network links more than once, they increase the
end-to-end latency compared to IP-level routing. Third, algorithms for
constructing efficient, large-scale trees that reduce link stress and
latency are typically more complex.
This paper therefore compares various methods to construct multicast
trees between end-systems, that vary in terms of implementation costs
and their ability to support per-subscriber QoS constraints. We
describe several algorithms that make trade-offs between algorithmic
complexity, physical link stress and latency. While no algorithm is
best in all three cases we show how it is possible to efficiently
build trees for several thousand subscribers with latencies within a
factor of two of the optimal, and link stresses comparable to, or
better than, existing technologies.
::::::::::::::
2006-021
::::::::::::::
Mina's Technical Note
::::::::::::::
2006-022
::::::::::::::
Title: An Independent-Connection Model for Traffic Matrices
Authors: Vijay Erramilli and Mark Crovella (Dept. of Computer Science, Boston Univ.), Nina Taft (Intel Research, Berkeley)
Date: September 6, 2006
Abstract:
The `gravity' model has been used both for traffic matrix (TM)
estimation and for generating synthetic TMs. It is based on the
assumption that a packet's network egress is independent of its ingress.
We argue that in real IP networks, this assumption should not and does
not hold. The fact that most traffic consists of two-way exchanges of
packets means that traffic streams flowing in opposite directions at any
point in the network are not independent. In this paper we
propose a model for traffic matrices based on independence of
connections rather than packets. We argue that the
independent-connection (IC) model is simpler, more intuitive, and has a
more direct connection to underlying network phenomena than the gravity
model. Using publicly available TMs, we show that the IC model fits
real data better than the gravity model. We then
characterize the parameters involved in the IC model based on our
datasets; these results can be used to construct synthetic TMs.
Finally, we turn to the well-studied problem of choosing a prior for TM
estimation. Assuming that certain parameters of the model can be measured
in advance and remain constant in time, we show that the IC model yields
a better prior for TM estimation than the gravity model.
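For reference, the independence assumption behind the gravity model can be sketched as follows: each entry of the TM is the product of the ingress total and the egress total, divided by the overall traffic. This is the standard textbook form of the gravity model, not code from the report; the function name gravity_tm and the sample volumes are hypothetical. The IC model itself is not reproduced here because the abstract does not give its formula.

```python
def gravity_tm(ingress, egress):
    """Build a gravity-model traffic matrix.

    ingress[i]: total traffic entering the network at node i.
    egress[j]:  total traffic leaving the network at node j.
    Entry (i, j) is ingress[i] * egress[j] / total, i.e. a packet's
    egress point is assumed independent of its ingress point.
    """
    total = sum(ingress)
    if total != sum(egress):
        raise ValueError("ingress and egress totals must match")
    return [[i_in * j_out / total for j_out in egress] for i_in in ingress]

# Hypothetical two-node example (arbitrary traffic units).
tm = gravity_tm([60, 40], [30, 70])
row_sums = [sum(row) for row in tm]
col_sums = [sum(col) for col in zip(*tm)]
```

The row and column sums of the result reproduce the ingress and egress totals, which is why measured totals alone suffice to build a gravity-model prior.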
::::::::::::::
2006-023
::::::::::::::
Title: Notes on the Effect of Different Access Patterns on the Intensity of Mistreatment in Distributed Caching Groups
Author: Georgios Smaragdakis
Date: September 18, 2006
Abstract:
In this report, we extend our study of the intensity of mistreatment in
distributed caching groups due to state interaction. In our earlier
work (published as BUCS-TR-2006-003), we analytically showed how this type
of mistreatment may appear under homogeneous demand distributions. We
provided a simple setting where mistreatment due to state interaction
may occur. According to this setting, one or more ``overactive'' nodes
generate disproportionately more requests than the other nodes. In this
report, we extend our experimental evaluation of the intensity of
mistreatment to which non-overactive nodes are subjected, when the
demand distributions are not homogeneous.
::::::::::::::
2006-024
::::::::::::::
Title: Spatiotemporal Gesture Segmentation
Author: Jonathan Alon
Date: September 18, 2006
Abstract:
Spotting patterns of interest in an input signal is a very useful
task in many different