%R 1993-001
%T Performance Evaluation of Two-Shadow Speculative Concurrency Control
%A Bestavros, Azer
%A Braoudakis, Spyridon
%A Panagos, Euthimios
%D February 1993
%U http://www.cs.bu.edu/techreports/1993-001-scc-2s-perf.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Speculative Concurrency Control (SCC) is a new concurrency control
approach especially suited for real-time database applications. It
relies on the use of redundancy to ensure that serializable schedules
are discovered and adopted as early as possible, thus increasing the
likelihood of the timely commitment of transactions with strict timing
constraints. In a recent publication by two of the authors, SCC-nS, a
generic algorithm that characterizes a family of SCC-based algorithms,
was described, and its correctness established by showing that it only
admits serializable histories. In this paper, we evaluate the
performance of the Two-Shadow SCC algorithm (SCC-2S), a member of the
SCC-nS family, which is notable for its minimal use of redundancy. In
particular, we show that SCC-2S (as a representative of SCC-based
algorithms) provides significant performance gains over the widely
used Optimistic Concurrency Control with Broadcast Commit (OCC-BC),
under a variety of operating conditions and workloads.
%R 1993-002
%A Bestavros, Azer
%T Speculative Concurrency Control for Real-Time Databases
%D January 1993
%U http://www.cs.bu.edu/techreports/1993-002-scc.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper we examine a number of admission control and scheduling
protocols for high-performance web servers based on a 2-phase policy
for serving HTTP requests. The first ``registration'' phase involves
establishing the TCP connection for the HTTP request and
parsing/interpreting its arguments, whereas the second ``service''
phase involves the service/transmission of data in response to the
HTTP request. By introducing a delay between these two phases, we show
that the performance of a web server could be improved significantly
through the adoption of a number of scheduling policies that optimize
the utilization of various system components (e.g. memory cache and
I/O). In addition to its promise for improving the performance of a
single web server, the delineation between the registration and
service phases of an HTTP request may be useful for load balancing
purposes on clusters of web servers. We are investigating the use of
such a mechanism as part of the Commonwealth testbed being developed
at Boston University.
%R 1993-003
%T Quadsim Student Manual
%A Shaban, Marwan
%D April 1993
%U http://www.cs.bu.edu/techreports/1993-003-quadsim.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Quadsim is an intermediate code simulator. It allows you to "run"
programs that your compiler generates in intermediate code format.
Its user interface is similar to most debuggers in that you can step
through your program, instruction by instruction, set breakpoints,
examine variable values, and so on. The intermediate code format used
by Quadsim is that described in [Aho 86]. If your compiler generates
intermediate code in this format, you will be able to take
intermediate-code files generated by your compiler, load them into the
simulator, and watch them "run." You are provided with functions that
hide the internal representation of intermediate code. You can use
these functions within your compiler to generate intermediate code
files that can be read by the simulator. Quadsim was inspired and
greatly influenced by [Aho 86]. The material in chapter 8
(Intermediate Code Generation) of [Aho 86] should be considered
background material for users of Quadsim.
%R 1993-004
%T Proceedings of Sixth International Workshop on Unification
%A Snyder, Wayne
%D April 1993
%U http://www.cs.bu.edu/techreports/1993-004-unif93proc.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Proceedings of the Sixth International Workshop on
Unification contains short papers presented at the workshop
which took place at the Dagstuhl conference center in
Germany, in June 1992.
%R 1993-005
%T Mermera: Non-coherent Distributed Shared Memory for Parallel Computing (PhD Thesis)
%A Sinha, Himanshu
%D May 1993
%U http://www.cs.bu.edu/techreports/1993-005-mermera.hss.thesis.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The proliferation of inexpensive workstations and networks has
prompted several researchers to use such distributed systems for
parallel computing. Attempts have been made to offer a shared-memory
programming model on such distributed memory computers. Most systems
provide a shared-memory that is {\em coherent} in that all processes
that use it agree on the order of all memory events. This
dissertation explores the possibility of a significant improvement in
the performance of some applications when they use {\em non-coherent}
memory. First, a new formal model to describe existing non-coherent
memories is developed. I use this model to prove that certain
problems can be solved using asynchronous iterative algorithms on
shared-memory in which the coherence constraints are substantially
relaxed. In the course of the development of the model I discovered a
new type of non-coherent behavior called {\em Local Consistency}.
Second, a programming model, {\sc Mermera}, is proposed. It provides
programmers with a choice of hierarchically related non-coherent
behaviors along with one coherent behavior. Thus, one can trade off
the ease of programming with coherent memory for improved performance
with non-coherent memory. As an example, I present a program to solve
a linear system of equations using an asynchronous iterative
algorithm. This program uses all the behaviors offered by {\sc
Mermera}. Third, I describe the implementation of {\sc Mermera} on a
BBN Butterfly TC2000 and on a network of workstations. The
performance of a version of the equation solving program that uses all
the behaviors of {\sc Mermera} is compared with that of a version that
uses coherent behavior only. For a system of 1000 equations the
former exhibits at least a 5-fold improvement in convergence time over
the latter. The version using coherent behavior only does not benefit
from employing more than one workstation to solve the problem while
the program using non-coherent behavior continues to achieve improved
performance as the number of workstations is increased from 1 to 6.
This measurement corroborates our belief that non-coherent shared
memory can be a performance boon for some applications.
%R 1993-006
%T An Implementation of Mermera: A Shared Memory System that Mixes Coherence with Non-coherence
%A Heddaya, Abdelsalam
%A Sinha, Himanshu
%D June 1993
%U http://www.cs.bu.edu/techreports/1993-006-mermera3.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Coherent shared memory is a convenient, but inefficient, method of
inter-process communication for parallel programs. By contrast,
message passing can be less convenient, but more efficient. To get
the benefits of both models, several non-coherent memory behaviors
have recently been proposed in the literature. We present an
implementation of Mermera, a shared memory system that supports both
coherent and non-coherent behaviors in a manner that enables
programmers to mix multiple behaviors in the same
program~\cite{HeddayaS93}. A programmer can debug a Mermera program
using coherent memory, and then improve its performance by selectively
reducing the level of coherence in the parts that are critical to
performance. Mermera permits a trade-off of coherence for
performance. We analyze this trade-off through measurements of our
implementation, and by an example that illustrates the style of
programming needed to exploit non-coherence. We find that, even on a
small network of workstations, the performance advantage of
non-coherence is compelling. Raw non-coherent memory operations
perform 20-40~times better than coherent memory operations. An
example application program is shown to run 5-11~times faster when
permitted to exploit non-coherence. We conclude by commenting on our
use of the Isis Toolkit of multicast protocols in implementing
Mermera.
%R 1993-007
%T Using Warp to Control Network Contention in Mermera
%A Heddaya, Abdelsalam
%A Park, Kihong
%A Sinha, Himanshu
%D June 1993
%U http://www.cs.bu.edu/techreports/1993-007-mermera-warp.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Parallel computing on a distributed system, such as a network of
workstations, can saturate the communication network, leading to
excessive message delays and consequently poor application
performance. Current operating systems offer only partial support for
flow control protocols that can help insulate application performance
from extraneous traffic on the shared network. We examine empirically
the consequences of integrating one such protocol, called Warp
control~\cite{Park93}, into Mermera, a software shared memory system
that supports parallel computing on distributed
systems~\cite{HeddayaS93hicss}. Preliminary performance measurements
are reported for an asynchronous iterative program to solve a system
of linear equations, under varying levels of network contention. The
experiments were conducted on a network of seven Sun Sparc~1+
workstations, using an auxiliary traffic generator. These
measurements show that Warp succeeds in stabilizing the network
behavior when there is high contention, increasing the effective
throughput available to the application, and consequently decreasing
its completion time. In some cases, however, Warp control does not
achieve the performance attainable by fixed size buffering when using
a statically optimal buffer size. Based on the nature of Warp and the
underlying communication layers, we offer explanations for our
results. Our use of Warp to regulate the allocation of network
bandwidth emphasizes the possibility for integrating it with the
allocation of other resources, such as CPU cycles and disk bandwidth,
so as to optimize overall system throughput, and enable fully-shared
execution of parallel programs.
%R 1993-008
%T Fixed Point vs. First-Order Logic on Finite Ordered Structures with Unary Relations
%A Kfoury, A.J.
%A Wymann-Boeni, M.
%D August 1993
%U http://www.cs.bu.edu/techreports/1993-008-monadic-fo-vs-fp.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We prove that first-order logic is strictly weaker than fixed point
logic over every infinite class of finite ordered structures with
additional unary relations: Over these classes there is always an
inductive unary relation which cannot be defined by a first-order
formula, even when every inductive sentence (i.e., closed formula) can
be expressed in first-order over this particular class. Our proof
first establishes a property valid for every unary relation definable
by first-order logic over these classes which is peculiar to classes
of ordered structures with unary relations. In a second step we show
that this property itself can be expressed in fixed point logic and
can be used to construct a non-elementary unary relation.
%R 1993-009
%T A Characterization of First-Order Definable Subsets on Classes of Finite Total Orders
%A Kfoury, A.J.
%A Wymann-Boeni, M.
%D August 1993
%U http://www.cs.bu.edu/techreports/1993-009-fo-subsets.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We give an explicit and easy-to-verify characterization for subsets in
finite total orders (infinitely many of them in general) to be
definable by the same first-order formula over any class of finite
total orders. From this characterization we derive immediately that
Beth's definability theorem does not hold in any class of finite total
orders, as well as that McColm's first conjecture is true for all
classes of finite total orders. Another consequence is a natural 0-1
law for definable subsets on finite total orders expressed as a
statement about the possible densities of first-order definable
subsets.
%R 1993-010
%T Learning Unions of Rectangles with Queries
%A Chen, Zhixiang
%A Homer, Steve
%D September 1993
%U http://www.cs.bu.edu/techreports/1993-010-learning-rect.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We investigate the efficient learnability of unions of $k$ rectangles
in the discrete plane $\{1,\ldots,n\}^{2}$ with equivalence and
membership queries. We exhibit a learning algorithm that learns any
union of $k$ rectangles with $O(k^{3}\log n)$ queries, while the time
complexity of this algorithm is bounded by $O(k^{5}\log n)$. We
design our learning algorithm by finding ``corners'' and ``edges'' for
rectangles contained in the target concept and then constructing the
target concept from those ``corners'' and ``edges''. Our result
provides a first approach to on-line learning of nontrivial subclasses
of unions of intersections of halfspaces with equivalence and
membership queries.
%R 1993-011
%T Typability and Type Checking in the Second-Order Lambda-Calculus Are Equivalent and Undecidable
%A Wells, J.B.
%D September 1993
%U http://www.cs.bu.edu/techreports/1993-011-f-undecidable.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider the problems of typability and type checking in the
Girard/Reynolds second-order polymorphic typed lambda calculus, for
which we use the short name ``System F'' and which we use in the
``Curry style'' where types are assigned to pure lambda terms. These
problems have been considered and proven to be decidable or
undecidable for various restrictions and extensions of System F and
other related systems, and lower-bound complexity results for System F
have been achieved, but they have remained ``embarrassing open
problems'' for System F itself. We first prove that type checking in
System F is undecidable by a reduction from semi-unification. We then
prove typability in System F is undecidable by a reduction from type
checking. Since the reverse reduction is already known, this implies
the two problems are equivalent. The second reduction uses a novel
method of constructing lambda terms such that in all type derivations,
specific bound variables must always be assigned a specific type.
Using this technique, we can require that specific subterms must be
typable using a specific, fixed type assignment in order for the
entire term to be typable at all. Any desired type assignment may be
simulated. We develop this method, which we call ``constants for
free'', for both the lambda-K and lambda-I calculi.
%R 1993-012
%T Building Responsive Systems from Physically-correct Specifications
%A Bestavros, Azer
%D October 1993
%U http://www.cs.bu.edu/techreports/1993-012-tra-responsive.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Predictability -- the ability to foretell that an implementation will
not violate a set of specified reliability and timeliness requirements
-- is a crucial, highly desirable property of responsive embedded
systems. This paper overviews a development methodology for responsive
systems, which enhances predictability by eliminating potential
hazards resulting from physically-unsound specifications.
The backbone of our methodology is the Time-constrained Reactive
Automaton (TRA) formalism, which adopts a fundamental notion of space
and time that restricts expressiveness in a way that allows the
specification of only reactive, spontaneous, and causal computation.
Using the TRA model, unrealistic systems -- possessing properties such
as clairvoyance, caprice, infinite capacity, or perfect timing --
cannot even be specified. We argue that this ``ounce of prevention''
at the specification level is likely to spare a lot of time and energy
in the development cycle of responsive systems -- not to mention the
elimination of potential hazards that would have gone, otherwise,
unnoticed.
The TRA model is presented to system developers through the Cleopatra
programming language. Cleopatra features a C-like imperative syntax
for the description of computation, which makes it easier to
incorporate in applications already using C. It is event-driven, and
thus appropriate for embedded process control applications. It is
object-oriented and compositional, thus advocating modularity and
reusability. Cleopatra is semantically sound; its objects can be
transformed, mechanically and unambiguously, into formal TRA automata
for verification purposes, which can be pursued using model-checking
or theorem proving techniques. Since 1989, an ancestor of Cleopatra
has been in use as a specification and simulation language for
embedded time-critical robotic processes.
%R 1993-013
%T A Minimal GB Parser
%A Shaban, Marwan
%D October 1993
%U http://www.cs.bu.edu/techreports/1993-013-gb-parser.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We describe a GB parser implemented along the lines of those written
by Fong [Fong91] and Dorr [Dorr87]. The phrase structure recovery
component is an implementation of Tomita's generalized LR parsing
algorithm (described in [Tomi86]), with recursive control flow
(similar to Fong's implementation). The major principles implemented
are government, binding, bounding, trace theory, case theory,
theta-theory, and barriers. The particular version of GB theory we
use is that described by Haegeman [Haeg91].
The parser is minimal in the sense that it implements the major
principles needed in a GB parser, and has fairly good coverage of
linguistically interesting portions of the English language.
%R 1993-014
%T Multi-version Speculative Concurrency Control with Delayed Commit
%A Bestavros, Azer
%A Wang, Biao
%D October 1993
%U http://www.cs.bu.edu/techreports/1993-014-scc-delayed-commit.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper presents an algorithm which extends the relatively new
notion of speculative concurrency control by delaying the commitment
of transactions, thus allowing other conflicting transactions to
continue execution and commit rather than restart. This algorithm
propagates uncommitted data to other outstanding transactions thus
allowing more speculative schedules to be considered. The algorithm is
shown always to find a serializable schedule, and to avoid cascading
aborts. Like speculative concurrency control, it considers strictly
more schedules than traditional concurrency control algorithms.
Further work is needed to determine which of these speculative methods
performs better on actual transaction loads.
%R 1993-015
%T How good are genetic algorithms at finding large cliques: an experimental
%A Carter, Robert
%A Park, Kihong
%D November 1993
%U http://www.cs.bu.edu/techreports/1993-015-ga-clique.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper investigates the power of genetic algorithms at solving
the MAX-CLIQUE problem. We measure
the performance of a standard genetic algorithm on an elementary set
of problem instances consisting of embedded cliques in random graphs.
We indicate the need for improvement, and
introduce a new genetic algorithm, the {\em multi-phase annealed
GA}, which exhibits superior performance on the same
problem set.
As we scale up the problem size and test on ``hard'' benchmark
instances, we notice a degraded performance in the algorithm caused by
premature convergence to local minima. To alleviate this problem, a
sequence of modifications is implemented ranging from changes in
input representation to systematic local search. The most recent
version, called {\em union GA}, incorporates the features of union
cross-over, greedy replacement, and diversity enhancement. It shows a
marked speed-up in the number of iterations required to find a given
solution, as well as some improvement in the clique size found.
We discuss issues related to the SIMD implementation of the
genetic algorithms on a Thinking Machines CM-5,
which was necessitated by the intrinsically
high time complexity ($O(n^3)$) of the serial algorithm for computing
one iteration.
Our preliminary conclusions are: (1) a genetic algorithm
needs to be heavily customized to work ``well'' for the clique problem;
(2) a GA is computationally very expensive, and its use is
only recommended if it is known to find larger cliques than other
algorithms; (3) although our customization effort is bringing forth
continued improvements, there is no clear evidence, at this time, that a
GA will have better success in circumventing local minima.
%R 1993-016
%T An Algebraic Characterization of First-Order Definability
%A Kfoury, A.J.
%A Wymann-Boeni, M.
%D November 1993
%U http://www.cs.bu.edu/techreports/1993-016-fo-definability.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We give a variable-free relational calculus which defines exactly
all first-order definable relations in an arbitrary structure.
We then show that, over an arbitrary class $\C$ of finite ordered
structures with signature $\{ \LE, R_1, \ldots, R_\alpha \}$,
the unary relations uniformly defined by this calculus over $\C$
are characterized by another simplified variable-free calculus which we
call $\Q$. $\Q$ is the least set of formal expressions such that:
\begin{eqnarray*}
\Q &\supseteq&\ \{ \varnothing, R_1,\ldots, R_\alpha \}\ \cup\\
& &\ \{ (Q\PLUS x)\ |\ Q\in\Q, x\in\omega \cup \{\infty\} \}\ \cup
\ \{ (Q\MINUS x)\ |\ Q\in\Q, x\in\omega \cup \{\infty\} \}\ \cup \\
& &\ \{ (\NOT Q)\ |\ Q\in\Q\}\ \cup
\ \{ (Q_1\AND Q_2)\ |\ Q_1,Q_2\in\Q\}\ \cup
\ \{ (Q_1\OR Q_2)\ |\ Q_1,Q_2\in\Q\}\ .\
\end{eqnarray*}
where $\PLUS$ and $\MINUS$ are ``shift'' operators defined in Section 3.
%R 1993-017
%T A Direct Algorithm for Type Inference in the Rank 2 Fragment of the Second-Order Lambda-Calculus
%A Wells, Joe
%D November 1993
%U http://www.cs.bu.edu/techreports/1993-017-finite-rank.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We study the problem of type inference for a family of polymorphic type
disciplines containing the power of Core-ML. This family comprises all
levels of the stratification of the second-order lambda-calculus by
``rank'' of types. We show that typability is an undecidable problem at
every rank k >= 3 of this stratification. While it was already known that
typability is decidable at rank <= 2, no direct and easy-to-implement
algorithm was available. To design such an algorithm, we develop a new
notion of reduction and show how to use it to reduce the problem of
typability at rank 2 to the problem of acyclic semi-unification. A
by-product of our analysis is the publication of a simple solution
procedure for acyclic semi-unification.
%R 1993-018
%T A General Theory of Semi-Unification
%A Jahama, Said
%A Kfoury, A.J.
%D December 1993
%U http://www.cs.bu.edu/techreports/1993-018-gsureport.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Various restrictions on the terms allowed for substitution give rise
to different cases of semi-unification. Semi-unification on finite
and regular terms has already been considered in the literature. We
introduce a general case of semi-unification where substitutions are
allowed on non-regular terms, and we prove the equivalence of this
general case to a well-known undecidable data base dependency problem,
thus establishing the undecidability of general semi-unification.
We present a unified way of looking at the various problems of
semi-unification. We give some properties that are common to all the
cases of semi-unification. We also consider the principality property and the
solution set for those problems. We prove that semi-unification on
general terms has the principality property. Finally, we present a
recursive inseparability result between semi-unification on regular
terms and semi-unification on general terms.
%R 1993-019
%T Type Reconstruction in the Presence of Polymorphic Recursion and Recursive Types
%A Jahama, Said
%D December 1993
%U http://www.cs.bu.edu/techreports/1993-019-recursivetype.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We establish that type reconstruction with polymorphic recursion
and recursive types is equivalent to regular semi-unification, which proves
the undecidability of the corresponding type reconstruction problem. We also
establish the equivalence of type reconstruction with polymorphic recursion
and positive recursive types to a special case of regular semi-unification
which we call positive regular semi-unification. The decidability of positive
regular semi-unification is an open problem.
%R 1993-020
%T AIDA-based Distributed File System
%A Bestavros, Azer
%A Makarechian, Mohammad
%D December 1993
%U http://www.cs.bu.edu/techreports/1993-020-aida-dfs.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper describes a prototype implementation of a Distributed File
System (DFS) based on the Adaptive Information Dispersal Algorithm
(AIDA). Using AIDA, a file block is encoded and dispersed into smaller
blocks stored on a number of DFS nodes distributed over a network. The
implementation devises file creation, read, and write operations. In
particular, when reading a file, the DFS accepts an optional timing
constraint, which it uses to determine the level of redundancy needed
for the read operation. The tighter the timing constraint, the more
nodes in the DFS are queried for encoded blocks. Write operations
update all blocks in all DFS nodes--with future implementations
possibly including the use of read and write quorums. This work was
conducted under the supervision of Professor Azer Bestavros
(best@cs.bu.edu) in the Computer Science Department as part of
Mohammad Makarechian's Master's project.
%R 1994-001
%T On the Performance of Polynomial-time CLIQUE Algorithms on Very Large Graphs
%A Homer, Steve
%A Peinado, Marcus
%D January 1994
%U http://www.cs.bu.edu/techreports/1994-001-maxclique.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The performance of a randomized version of the subgraph-exclusion
algorithm (called Ramsey) for CLIQUE by Boppana and Halld\'{o}rsson is
studied on very large graphs. We compare the performance of this
algorithm with the performance of two common heuristic algorithms, the
greedy heuristic and a version of simulated annealing. These
algorithms are tested on graphs with up to 10,000 vertices on a
workstation and graphs as large as 70,000 vertices on a Connection
Machine. Our implementations establish the ability to run clique
approximation algorithms on very large graphs. We test our
implementations on a variety of different graphs. Our conclusions
indicate that on randomly generated graphs minor changes to the
distribution can cause dramatic changes in the performance of the
heuristic algorithms. The Ramsey algorithm, while not as good as the
others for the most common distributions, seems more robust and
provides a more even overall performance. In general, and especially
on deterministically generated graphs, a combination of simulated
annealing with either the Ramsey algorithm or the greedy heuristic
seems to perform best. This combined algorithm works particularly
well on large Keller and Hamming graphs and has a competitive overall
performance on the DIMACS benchmark graphs.
%R 1994-002
%T On Learning Counting Functions With Queries
%A Chen, Zhixiang
%A Homer, Steven
%D February 1994
%U http://www.cs.bu.edu/techreports/1994-002-counting.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We investigate the problem of learning disjunctions of counting
functions, which are general cases of parity and modulo functions,
with equivalence and membership queries. We prove that, for any prime
number $p$, the class of disjunctions of integer-weighted counting
functions with modulus $p$ over the domain $Z^{n}_{q}$ (or $Z^{n}$)
for any given integer $q \ge 2$ is polynomial time learnable using at
most $n+1$ equivalence queries, where the hypotheses issued by the
learner are disjunctions of at most $n$ counting functions with
weights from $Z_{p}$. The result is obtained through learning linear
systems over an arbitrary field. In general a counting function may
have a composite modulus. We prove that, for any given integer $q \ge
2$, over the domain $Z_{2}^{n}$, the class of read-once disjunctions
of Boolean-weighted counting functions with modulus $q$ is polynomial
time learnable with only one equivalence query, and the class of
disjunctions of $\log \log n$ Boolean-weighted counting functions with
modulus $q$ is polynomial time learnable. Finally, we present an
algorithm for learning graph-based counting functions.
%R 1994-003
%T Mapping parallel iterative algorithms onto workstation networks
%A Heddaya, Abdelsalam
%A Park, Kihong
%D February 1994
%U http://www.cs.bu.edu/techreports/1994-003-parallel-comm.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
For communication-intensive parallel applications, the maximum degree
of concurrency achievable is limited by the communication throughput
made available by the network. In previous work, we showed
experimentally that the performance of certain parallel applications
running on a workstation network can be enhanced significantly if a
congestion control protocol is used to enhance network performance.
In this paper, we characterize and analyze the communication
requirements of a large class of supercomputing applications that fall
under the category of fixed-point problems, amenable to solution by
parallel iterative methods. This results in a set of interface and
architectural features sufficient for the efficient implementation of
the application over a large-scale distributed system. In particular,
we propose a direct link between the application and network layer,
supporting congestion control actions at both ends. This in turn
enhances the system's responsiveness to network congestion, improving
performance.
Preliminary results of a prototype system are summarized showing the
efficacy of our scheme to support large-scale parallel computations.
We conclude with a description of a full implementation in progress.
%R 1994-004
%T A Hybrid GLR Algorithm for Parsing with Epsilon Grammars
%A Shaban, Marwan
%D March 22, 1994
%U http://www.cs.bu.edu/techreports/1994-004-e-grammar-parser.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We give a hybrid algorithm for parsing $\epsilon$-grammars based on
Tomita's non-$\epsilon$-grammar parsing algorithm (\cite{tomi86}) and
Nozohoor-Farshi's $\epsilon$-grammar recognition algorithm
(\cite{fars91}). The hybrid parser handles the same set of grammars
handled by Nozohoor-Farshi's recognizer. The algorithm's details and
an example of its use are given. We also discuss the deployment of
the hybrid algorithm within a GB parser, and the reason an
$\epsilon$-grammar parser is needed in our GB parser.
%R 1994-005
%T Structure Sharing and Parallelization in a GB Parser
%A Shaban, Marwan
%D March 22, 1994
%U http://www.cs.bu.edu/techreports/1994-005-structure-sharing.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
By utilizing structure sharing among its parse trees, a GB parser can
increase its efficiency dramatically. Using a GB parser which has as
its phrase structure recovery component an implementation of Tomita's
algorithm (as described in \cite{tomi86}), we investigate how a GB
parser can preserve the structure sharing output by Tomita's
algorithm. In this report, we discuss the implications of using
Tomita's algorithm in GB parsing, and we give some details of the
structure-sharing parser currently under construction. We also
discuss a method of parallelizing a GB parser, and relate it to the
existing literature on parallel GB parsing. Our approach to
preserving sharing within a shared-packed forest is applicable not
only to GB parsing, but also whenever structure sharing must be
preserved in a parse forest in the presence of features.
%R 1994-006
%T Adding Polymorphic Abstraction to ML (Detailed Abstract)
%A Kfoury, A.J.
%A Wells, J.B.
%D May 1994
%U http://www.cs.bu.edu/techreports/1994-006-polymorphic-abstraction.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The ML programming language restricts type polymorphism to occur only
in the ``let-in'' construct and requires every occurrence of a formal
parameter of a function (a lambda abstraction) to have the same type.
Milner in 1978 refers to this restriction (which was adopted to help
ML achieve automatic type inference) as a serious limitation. We show
that this restriction can be relaxed enough to allow universal
polymorphic abstraction without losing automatic type inference. This
extension is equivalent to the rank-2 fragment of system F. We
precisely characterize the additional program phrases (lambda terms)
that can be typed with this extension and we describe typing anomalies
both before and after the extension. We discuss how macros may be
used to gain some of the power of rank-3 types without losing
automatic type inference. We also discuss user-interface problems in
how to inform the programmer of the possible types a program phrase
may have.
%R 1994-007
%T Timeliness via Speculation for Real-Time Databases
%A Bestavros, Azer
%A Braoudakis, Spyridon
%D May 1994
%U http://www.cs.bu.edu/techreports/1994-007-rtdbs-timeliness.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Various concurrency control algorithms differ in the time when
conflicts are detected, and in the way they are resolved. In that
respect, the Pessimistic and Optimistic Concurrency Control (PCC and
OCC) alternatives represent two extremes. PCC locking protocols detect
conflicts as soon as they occur and resolve them using {\em
blocking}. OCC protocols detect conflicts at transaction commit time
and resolve them using {\em rollbacks} (restarts). For real-time
databases, blockages and rollbacks are hazards that increase the
likelihood of transactions missing their deadlines. We propose a {\em
Speculative} Concurrency Control (SCC) technique that minimizes the
impact of blockages and rollbacks. SCC relies on the use of added
system resources to {\em speculate} on potential serialization orders
and to ensure that if such serialization orders materialize, the
hazards of blockages and rollbacks are minimized. We present a number
of SCC-based algorithms that differ in the level of speculation they
introduce, and the amount of system resources (mainly memory) they
require. We show the performance gains (in terms of number of
satisfied timing constraints) to be expected when a representative SCC
algorithm (SCC-2S) is adopted.
%R 1994-008
%T Towards Physically-Correct Specifications of Embedded Real-Time Systems
%A Bestavros, Azer
%D May 1994
%U http://www.cs.bu.edu/techreports/1994-008-physical-correctness.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Predictability (the ability to foretell that an implementation will
not violate a set of specified reliability and timeliness
requirements) is a crucial, highly desirable property of responsive
embedded systems. This paper overviews a development methodology for
responsive systems, which enhances predictability by eliminating
potential hazards resulting from physically-unsound specifications.
The backbone of our methodology is a formalism that restricts
expressiveness in a way that allows the specification of only
reactive, spontaneous, and causal computation. Unrealistic systems
(possessing properties such as clairvoyance, caprice, infinite
capacity, or perfect timing) cannot even be specified. We argue that
this ``ounce of prevention'' at the specification level is likely to
spare a lot of time and energy in the development cycle of responsive
systems -- not to mention the elimination of potential hazards that
would otherwise have gone unnoticed.
%R 1994-009
%T A lower-bound result on the power of a genetic algorithm
%A Park, Kihong
%D October 12, 1994
%U http://www.cs.bu.edu/techreports/1994-009-lowerbound-ga.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper presents a lower-bound result on the computational power of
a genetic algorithm in the context of combinatorial optimization. We describe
a new genetic algorithm, the merged genetic algorithm, and prove that
for the class of monotonic functions, the algorithm finds the optimal solution,
and does so with an exponential convergence rate. The analysis pertains to the
ideal behavior of the algorithm where the main task reduces to showing
convergence of probability distributions over the search space of combinatorial
structures to the optimal one. We take exponential convergence to be indicative
of efficient solvability for the sample-bounded algorithm, although a sampling
theory is needed to better relate the limit behavior to actual behavior. The
paper concludes with a discussion of some immediate problems that lie ahead.
%R 1994-010
%T On the effectiveness of genetic search in combinatorial optimization
%A Carter, Robert
%A Park, Kihong
%D November 10, 1994
%U http://www.cs.bu.edu/techreports/1994-010-genetic_search.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper, we study the efficacy of genetic algorithms in the
context of combinatorial optimization. In particular, we isolate the
effects of cross-over, treated as the central component of genetic
search. We show that for problems of nontrivial size and difficulty,
the contribution of cross-over search is marginal, both
synergistically when run in conjunction with mutation and selection,
and when run with selection alone, the reference point being the search
procedure consisting of just mutation and selection. The latter can be
viewed as another manifestation of the Metropolis process. Considering
the high computational cost of maintaining a population to facilitate
cross-over search, its marginal benefit renders genetic search
inferior to its singleton-population counterpart, the Metropolis
process, and by extension, simulated annealing. This is further
compounded by the fact that many problems arising in practice may
inherently require a large number of state transitions for a
near-optimal solution to be found, making genetic search infeasible
given the high cost of computing a single iteration in the enlarged
state-space.
%R 1994-011
%T Concurrency Control Protocols for Real-Time Databases (PhD Thesis)
%A Braoudakis, Spyridon
%D November 12, 1994
%U http://www.cs.bu.edu/techreports/1994-011-realtime-databases.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Concurrency control methods developed for traditional database systems
are not appropriate for real-time database systems (RTDBS), where, in
addition to database consistency requirements, satisfying timing
constraints is an integral part of the correctness criterion. Most
real-time concurrency control protocols considered in the literature
combine time-critical scheduling with traditional concurrency control
methods to conform to transaction timing constraints. These methods
rely on either transaction blocking or restarts, both of
which are inappropriate for real-time concurrency control because of
the unpredictability they introduce. Moreover, RTDBS
performance objectives differ from those of conventional database
systems in that maximizing the number of transactions that complete
before their deadlines becomes the decisive performance objective,
rather than merely maximizing concurrency (or throughput). Recently,
Speculative Concurrency Control (SCC) was proposed as a categorically
different approach to concurrency control for RTDBS. SCC relies on
the use of redundant processes (shadows), which
speculate on alternative schedules, once conflicts that threaten the
consistency of the database are detected. SCC algorithms utilize added
system resources to ensure that correct (serializable) executions are
discovered and adopted as early as possible, thus increasing the
likelihood of the timely commitment of transactions.
This dissertation starts by reviewing the Order-Based SCC (SCC-OB)
algorithm which associates almost as many shadows as there are
serialization orders of transactions. After demonstrating SCC-OB's
excessive use of redundancy, a host of novel SCC-based protocols is
introduced. Conflict-Based SCC (SCC-CB) reduces the number of shadows
that a running transaction needs to keep by maintaining one shadow per
uncommitted conflicting transaction. It is shown that the quadratic
number of shadows maintained by SCC-CB is optimal, covering all
serialization orders produced by SCC-OB. SCC-CB's correctness is
established by showing that it admits only serializable histories.
Next, the trade-off between the number of shadows and timeliness is
considered. A generic SCC algorithm (SCC-kS) that operates under a
limited redundancy assumption is presented; it allows no more than a
constant number $k$ of shadows to coexist on behalf of any uncommitted
transaction. Next, a novel technique is proposed that incorporates
additional information such as deadline, priority and
criticalness within the SCC methodology. SCC with Deferred Commit
(SCC-DC) utilizes this additional information to improve the
timeliness through the controlled deferment of transaction
commitments. A probabilistic Value Induced Shadow Allocation (VISA)
policy is developed which aims at preserving the most valuable
shadows for each system transaction. The thesis of this dissertation
is that SCC-based algorithms offer a new dimension, redundancy,
to improve the timeliness of RTDBS. SCC-based algorithms are
efficient (a quadratic number of shadows is optimal), scalable
(redundancy can be traded off for timeliness), and easily amendable
(deadline and priority information can be incorporated).
(Major Advisor: Azer Bestavros)
%R 1994-012
%T OS Support for Portable Bulk Synchronous Parallel Programs
%A Heddaya, Abdelsalam
%A Fahmy, Amr
%D December 5, 1994
%U http://www.cs.bu.edu/techreports/1994-012-bsp-os.ps.Z
%I Computer Science Department, Boston University
%I Computer Science Department, Harvard University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
For parallel programs to become portable, they must be executable with
uniform efficiency on a variety of hardware platforms, which is not
the case at present. In 1990, Valiant proposed Bulk-Synchronous
Parallelism (BSP) as a model on which portable parallel programs can
be built. We argue that shared-memory BSP is efficiently
implementable on a wide variety of parallel hardware, and that BSP
forms a useful basis for providing an even higher level programming
interface based on Sequential Consistency (SC). A list of memory and
thread management features needed to support BSP and SC parallel
programs is given, under the assumption that the parallel computer is
space-shared among multiple parallel tasks, rather than time-shared.
Known techniques to realize efficiently the most important of these
features are sketched.
%R 1994-013
%T An Algorithm for Inferring Quasi-Static Types
%A Oliart, Alberto
%D November 1994
%U http://www.cs.bu.edu/techreports/1994-013-quasi-static-types.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This report presents an algorithm, and its implementation, for doing type
inference in the context of Quasi-Static Typing (QST) ["Quasi-static
Typing." Satish Thatte, Proc. ACM Symp. on Principles of Programming
Languages, 1988]. The package infers types a la ``QST'' for the simply
typed lambda-calculus.
%R 1994-014
%T New Notions of Reduction and Non-Semantic Proofs of Beta-Strong Normalization in Typed Lambda-Calculi
%A Kfoury, A.J.
%A Wells, J.B.
%D December 19, 1994
%U http://www.cs.bu.edu/techreports/1994-014-strong-normalization.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Two new notions of reduction for terms of the lambda-calculus are
introduced and the question of whether a lambda-term is beta-strongly
normalizing is reduced to the question of whether a lambda-term is merely
normalizing under one of the new notions of reduction. This leads to a
new way to prove beta-strong normalization for typed lambda-calculi.
Instead of the usual semantic proof style based on Girard's ``candidats de
r\'eductibilit\'e'', termination can be proved using a decreasing metric
over a well-founded ordering in a style more common in the field of term
rewriting. This new proof method is applied to the simply-typed
lambda-calculus and the system of intersection types.
%R 1994-015
%T Search by Shape Examples: Modeling Nonrigid Deformation
%A Sclaroff, S.
%A Pentland, A.P.
%D October, 1994
%U http://www.cs.bu.edu/techreports/1994-015-search-by-shape-example.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We describe our work on shape-based image database search using the
technique of modal matching. Modal matching employs a deformable shape
decomposition that allows users to select example objects and have the
computer efficiently sort the set of objects based on the similarity
of their shape. Shapes are compared in terms of the types of nonrigid
deformations (differences) that relate them. The modal decomposition
provides deformation ``control knobs'' for flexible matching and thus
allows for selecting weighted subsets of shape parameters that are
deemed significant for a particular category or context. We
demonstrate the utility of this approach for shape comparison in 2-D
image databases; however, the general formulation is applicable to
signals of any dimensionality.
%R 1994-016
%T Physically-Based Combinations of Views: Representing Rigid and Nonrigid Motion
%A Sclaroff, S.
%A Pentland, A.P.
%D November, 1994
%U http://www.cs.bu.edu/techreports/1994-016-phys-based-comb-of-views.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Nonrigid motion can be described as morphing or blending between
extremal shapes, e.g., heart motion can be described as transitioning
between the systole and diastole states. Using physically-based
modeling techniques, shape similarity can be measured in terms of
forces and strain. This provides a physically-based coordinate system
in which motion is characterized in terms of physical similarity to a
set of extremal shapes. Having such a low-dimensional
characterization of nonrigid motion allows for the recognition and the
comparison of different types of nonrigid motion.
%R 1995-001
%T Proceedings of the Workshop on Versioning in Hypertext Systems
%A Durand, David
%A Haake, Anja
%A Hicks, David
%A Vitali, Fabio
%D February 7, 1995
%U http://www.cs.bu.edu/techreports/1995-001-hypertext-versioning-workshop
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This report contains 9 papers presented at a workshop on version
management and hypertext, as well as a summary introduction by the
organizers. These papers address requirements, solutions, and research
issues related to the management of hypertext databases. Version management
is not only a key application requirement in some domains (like design
journals and electronic manuals) but provides a way to preserve the
integrity of links in a changing hyperbase.
%R 1995-002
%T Application-Level Document Caching in the Internet
%A Bestavros, Azer
%A Carter, Robert
%A Crovella, Mark
%A Cunha, Carlos
%A Heddaya, Abdelsalam
%A Mirdad, Sulaiman
%D February 15, 1995
%U http://www.cs.bu.edu/techreports/1995-002-web-client-caching.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
With the increasing demand for document transfer services such as the
World Wide Web comes a need for better resource management to reduce
the latency of documents in these systems. To address this need, we
report on the potential for document caching at the application level
in document transfer services. We collected traces of over 250
executions of Mosaic, reflecting actual user requests for WWW
documents. Using those traces, we study the tradeoffs between caching
at three levels in the system, and the potential for use of
application-level information in the caching system. Our traces show
that while a high hit rate in terms of URLs is achievable, a much
lower hit rate is possible in terms of bytes, because most
profitably-cached documents are small. We considered the performance
of caching when applied at the level of individual user sessions, at
the level of individual hosts, and at the level of a collection of
hosts on a single LAN. We show that the performance gain achievable
by caching at the session level (which is straightforward to
implement) is nearly all of that achievable at the LAN level (where
caching is more difficult to implement). However, when resource
requirements are considered, LAN level caching becomes much more
desirable, since it can achieve a given level of caching performance
using a much smaller amount of cache space. Finally, we consider the
use of organizational boundary information as an example of the
potential for use of application-level information in caching. We
show that while it is desirable to cache local documents at the LAN
level, the opposite is true at the session level, where remote
documents are more profitably cached.
%R 1995-003
%T Demand-based Document Dissemination for the World-Wide Web
%A Bestavros, Azer
%D February 15, 1995
%U http://www.cs.bu.edu/techreports/1995-003-web-server-dissemination.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We analyzed the logs of the cs-www.bu.edu HTTP server for the month of
January 1995. Our analysis showed that remote HTTP accesses were
confined to a small subset of documents. Using an analytical model of
server popularity and file access profiles, we show that by
disseminating the most popular documents on servers (proxies) closer
to the clients, network traffic could be reduced considerably, while
server loads are balanced. We argue that this process could be
generalized so as to provide for an automated demand-based duplication
of documents. We believe that such server-based information
dissemination protocols will be more effective at reducing both
network bandwidth and document retrieval times than client-based
caching protocols.
%R 1995-004
%T Equational Axiomatization of Bicoercibility for Polymorphic Types
%A Tiuryn, Jerzy
%D February 16, 1995
%U http://www.cs.bu.edu/techreports/1995-004-coercibility.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Two polymorphic types $\sigma$ and $\tau$ are said to be bicoercible if
there is a coercion from $\sigma$ to $\tau$ and conversely. We give a
complete equational axiomatization of bicoercible types and prove that
the relation of bicoercibility is decidable.
%R 1995-005
%T Speculative Concurrency Control with Deferred Commitment for Real-Time Databases
%A Bestavros, Azer
%A Braoudakis, Spyridon
%D February 20, 1995
%U http://www.cs.bu.edu/techreports/1995-005-scc-dc.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A problem with Speculative Concurrency Control algorithms and other
common concurrency control schemes using forward validation is that
committing a transaction as soon as it finishes validating may result
in a value loss to the system. Haritsa showed that by making a lower
priority transaction wait after it is validated, the number of
transactions meeting their deadlines is increased, which may result in
a higher value-added to the system. SCC-based protocols can benefit
from the introduction of such delays by giving optimistic shadows with
high value-added to the system more time to execute and commit instead
of being aborted in favor of other validating transactions, whose
value-added to the system is lower. In this paper we present and
evaluate an extension to SCC algorithms that allows for commit
deferments.
%R 1995-006
%T Using Speculation to Reduce Server Load and Service Time on the WWW
%A Bestavros, Azer
%D February 21, 1995
%U http://www.cs.bu.edu/techreports/1995-006-speculative-service.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Speculative service implies that a client's request for a document is
serviced by sending, in addition to the document requested, a number
of other documents that the server speculates will be requested by the
client in the near future. This speculation is based on statistical
information that the server maintains for each document it serves. The
notion of speculative service is analogous to prefetching, which is
used to improve cache performance in distributed/parallel shared
memory systems, with the exception that servers (not clients) control
when and what to prefetch. Using trace simulations based on the logs
of our departmental HTTP server http://cs-www.bu.edu, we show that
both server load and service time could be reduced considerably, if
speculative service is used. This is above and beyond what is
currently achievable using client-side caching and server-side
dissemination. We identify a number of parameters that could be used
to fine-tune the level of speculation performed by the server based on
the level of lookahead, the state of the network, the tradeoffs
between bulk and individual transmission of documents, and the
relative popularity of documents, among other factors.
%R 1995-007
%T Addendum to ``New Notions of Reduction and Non-Semantic Proofs of Beta Strong Normalization in Typed Lambda Calculi''
%A Kfoury, A.J.
%A Wells, J.B.
%D March 1995
%U http://www.cs.bu.edu/techreports/1995-007-strong-normalization-addendum.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This is an addendum to our technical report BUCS TR-94-014 of December
19, 1994. It clarifies some statements, adds information on some
related research, includes a comparison with research by de Groote, and
fixes two minor mistakes in a proof.
%R 1995-008
%T Modal Matching for Correspondence and Recognition
%A Sclaroff, S.
%A Pentland, A.P.
%D March 1995
%U http://www.cs.bu.edu/techreports/1995-008-modal-matching.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Modal matching is a new method for establishing correspondences and
computing canonical descriptions. The method is based on the idea of
describing objects in terms of generalized symmetries, as defined by
each object's eigenmodes. The resulting modal description is
used for object recognition and categorization, where shape
similarities are expressed as the amounts of modal deformation energy
needed to align the two objects. In general, modes provide a
global-to-local ordering of shape deformation and thus allow for
selecting which types of deformations are used in object alignment and
comparison. In contrast to previous techniques, which required
correspondence to be computed with an initial or prototype shape,
modal matching utilizes a new type of finite element formulation that
allows for an object's eigenmodes to be computed directly from
available image information. This improved formulation provides
greater generality and accuracy, and is applicable to data of any
dimensionality. Correspondence results with 2-D contour and point
feature data are shown, and recognition experiments with 2-D images of
hand tools and airplanes are described.
%R 1995-009
%T A New Version of Toom's Proof
%A Gacs, Peter
%D March 27, 1995
%U http://www.cs.bu.edu/techreports/1995-009-toom-proof.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
There are several proofs now for the stability of Toom's example of a
two-dimensional stable cellular automaton and its application to
fault-tolerant computation. Simon and Berman simplified and
strengthened Toom's original proof; the present report is a simplified
exposition of their proof.
%R 1995-010
%T Characteristics of WWW Client-based Traces
%A Cunha, Carlos
%A Bestavros, Azer
%A Crovella, Mark
%D April 1, 1995 (modified July 18, 1995)
%U http://www.cs.bu.edu/techreports/1995-010-www-client-traces.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The explosion of WWW traffic necessitates an accurate picture of WWW
use, and in particular requires a good understanding of client
requests for WWW documents. To address this need, we have collected
traces of actual executions of NCSA Mosaic, reflecting over half a
million user requests for WWW documents. In this paper we present a
descriptive statistical summary of the traces we collected, which
identifies a number of trends and reference patterns in WWW use. In
particular, we show that many characteristics of WWW use can be
modelled using power-law distributions, including the distribution of
document sizes, the popularity of documents as a function of size, the
distribution of user requests for documents, and the number of
references to documents as a function of their overall rank in
popularity (Zipf's law). In addition, we show how the power-law
distributions derived from our traces can be used to guide system
designers interested in caching WWW documents.
---
Our client-based traces are available via FTP from
http://www.cs.bu.edu/techreports/1995-010-www-client-traces.tar.gz
http://www.cs.bu.edu/techreports/1995-010-www-client-traces.a.tar.gz
%R 1995-011
%T A Prefetching Protocol Using Client Speculation for the WWW
%A Bestavros, Azer
%A Cunha, Carlos
%D May 8, 1995
%U http://www.cs.bu.edu/techreports/1995-011-prefetching-via-client-speculation.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In an earlier paper, the potential of speculation (server-initiated
prefetching) in distributed information systems (such as the WWW) was
investigated and shown to be effective in reducing service time and
server load. This speculation was based on statistical information
that the server maintains for each document it serves. In this paper
we study the performance of a client-initiated prefetching protocol,
whereby speculation is based on past user-specific access patterns.
We present results of trace-driven simulation
experiments we performed using extensive user traces.
%R 1995-012
%T Object-Oriented Animation on the World Wide Web
%A Cai, Patrick
%A Bestavros, Azer
%D May 8, 1995
%U http://www.cs.bu.edu/techreports/1995-012-mosaic-animation.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose that video/audio animation be considered as a first-class
object on the World Wide Web. Animation is a very "bandwidth-efficient"
alternative to using video streams, especially for presentations
involving mathematical objects and interactions. We present an
object-oriented model that supports drawing-based and frame-based
animation. Based on that model, we describe an extension of the HyperText
Markup Language to support these capabilities. BU-NCSA Mosanim, a
modified version of the NCSA Mosaic for X (v2.5), was developed and is
available for distribution via anonymous FTP to demonstrate the concepts
and potentials of animation in presentations and interactive game playing
over the web.
%R 1995-013
%T Simulation of Hardware Dynamic Scheduling on the DLX Architecture
%A Bestavros, Azer
%A Liu, Yueh-Lin
%D June 6, 1995
%U http://www.cs.bu.edu/techreports/1995-013-dynamic-scheduling-dlxsim.html
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We describe our extension of the existing DLX simulator (DLXsim),
available from the University of California at Berkeley, which allows
the simulation of two hardware dynamic scheduling techniques. There
are two DLXsim-like interactive simulators developed as part of this
project. DLXscore simulates the operation of a DLX architecture
equipped with scoreboarding hardware. DLXscore provides the status of
instructions, scoreboard tables, and statistics. DLXtomasulo simulates
the operation of a DLX architecture equipped with a hardware
implementation of Tomasulo's algorithm. DLXtomasulo provides the
status of instructions, reservation stations, and statistics. Both
programs allow the user to configure the number of functional units
and the latency of floating point operations.
%R 1995-014
%T Dynamic Server Selection in the Internet
%A Crovella, Mark
%A Carter, Robert
%D June 30, 1995
%U http://www.cs.bu.edu/techreports/1995-014-dynamic-server-selection.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
As distributed information services like the World Wide Web become
increasingly popular on the Internet, problems of scale are clearly
evident. A promising technique that addresses many of these problems is
service (or document) replication. However, when a service is
replicated, clients then need the additional ability to find a ``good''
provider of that service. In this paper we report on techniques for
finding good service providers without a priori knowledge of server
location or network topology. We consider the use of two principal
metrics for measuring distance in the Internet: hops, and round-trip
latency. We show that these two metrics yield very different results in
practice. Surprisingly, we show data indicating that the number of hops
between two hosts in the Internet is {\em not\/} strongly correlated with
round-trip latency. Thus, the distance in hops between two hosts is not
necessarily a good predictor of the expected latency of a document
transfer. Instead of using known or measured distances in hops, we show
that the extra cost at runtime incurred by dynamic latency measurement
is well justified based on the resulting improved performance. In
addition we show that selection based on dynamic latency measurement
performs much better in practice than any static selection scheme.
Finally, the difference between the distribution of hops and latencies
is fundamental enough to suggest differences in algorithms for server
replication. We show that conclusions drawn about service replication
based on the distribution of hops need to be revised when the
distribution of latencies is considered instead.
%R 1995-015
%T Explaining World Wide Web Traffic Self-Similarity
%A Crovella, Mark
%A Bestavros, Azer
%D August 29, 1995
%U http://www.cs.bu.edu/techreports/1995-015-explaining-web-self-similarity.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recently the notion of self-similarity has been shown to apply to
wide-area and local-area network traffic. In this paper we examine the
mechanisms that give rise to self-similar network traffic. We present
an explanation for traffic self-similarity by using a particular subset
of wide area traffic: traffic due to the World Wide Web (WWW). Using an
extensive set of traces of actual user executions of NCSA Mosaic,
reflecting over half a million requests for WWW documents, we show
evidence that WWW traffic is self-similar. Then we show that the
self-similarity in such traffic can be explained based on the underlying
distributions of WWW document sizes, the effects of caching and user
preference in file transfer, the effect of user ``think time'', and the
superimposition of many such transfers in a local area network. To do
this we rely on empirically measured distributions both from our traces
and from data independently collected at over thirty WWW sites.
%R 1995-016
%T World Wide Web Image Search Engines
%A Sclaroff, Stan
%D May 27, 1995
%U http://www.cs.bu.edu/techreports/1995-016-www-image-search-engines.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
(white paper presented at the NSF Workshop on Visual Information
Management, MIT, June 1995)
We propose the development of a world wide web image search engine
that crawls the web collecting information about the images it finds,
computes the appropriate image decompositions and indices, and stores
this extracted information for searches based on image content.
Indexing and searching images need not require solving the image
understanding problem. Instead, the general approach should be to
provide an arsenal of image decompositions and discriminants that can
be precomputed for images. At search time, users can select a
weighted subset of these decompositions to be used for computing image
similarity measurements. While this approach avoids the
search-time-dependent problem of labeling what is important in images,
it still holds several important problems that require further
research in the area of query by image content. We briefly explore
some of these problems as they pertain to shape.
%R 1995-017
%T Deformable Prototypes for Encoding Shape Categories in Image Databases
%A Sclaroff, Stan
%D September 12, 1995
%U http://www.cs.bu.edu/techreports/1995-017-deformable-prototypes.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We describe a method for shape-based image database search that uses
deformable prototypes to represent categories. Rather than directly
comparing a candidate shape with all shape entries in the database,
shapes are compared in terms of the types of nonrigid deformations
(differences) that relate them to a small subset of representative
prototypes. To solve the shape correspondence and alignment problem,
we employ the technique of {\em modal matching}, an
information-preserving shape decomposition for matching, describing,
and comparing shapes despite sensor variations and nonrigid
deformations. In modal matching, shape is decomposed into an ordered
basis of orthogonal principal components. We demonstrate the utility
of this approach for shape comparison in 2-D image databases.
%R 1995-018
%T Deterministic Computations Whose History is Independent of the Order of Updating
%A Gacs, Peter
%D November 18, 1995
%U http://www.cs.bu.edu/techreports/1995-018-commut.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Consider a network of processors (sites) in which each site has
finitely many neighbors. Each site has some transition function
computing its next state from the states of the neighbors. These
transitions (updates) are applied in arbitrary order, one or many at a
time.
If the state of site x at time t is r(x,t) then let us define the
sequence r'(x,0),r'(x,1),... by taking the sequence
r(x,0),r(x,1),... and deleting each repetition, i.e. each element
equal to the preceding one.
The system of transition functions is said to support asynchrony if
the sequence r'(x,i) (while it lasts, in case it is finite) depends
only on the initial configuration, not on the order of updates.
This paper gives a simple characterization of transition functions
supporting asynchrony. The characterization says that it is
equivalent to the following seemingly weaker commutativity condition:
For any configuration, for any pair x,y of neighbors, if the updating
would change both s(x) and s(y) then the result of updating first x
and then y is the same as the result of doing this in the reverse
order.
%R 1995-019
%T The Undecidability of Mitchell's Subtyping Relationship
%A Wells, J.B.
%D December 10, 1995
%U http://www.cs.bu.edu/techreports/1995-019-subtyping-undecidable.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Mitchell defined and axiomatized a subtyping relationship (also known as
containment, coercibility, or subsumption) over the types of System F
(with "arrow" and "forall"). This subtyping relationship is quite simple
and does not involve bounded quantification. Tiuryn and Urzyczyn quite
recently proved this subtyping relationship to be undecidable. This paper
supplies a new undecidability proof for this subtyping relationship.
First, a new syntax-directed axiomatization of the subtyping relationship
is defined. Then, this axiomatization is used to prove a reduction from
the undecidable problem of semi-unification to subtyping. The
undecidability of subtyping implies the undecidability of type checking
for System F extended with Mitchell's subtyping, also known as F plus eta.
%R 1996-001
%T AIDA-based Real-Time Fault-Tolerant Broadcast Disks
%A Bestavros, Azer
%D January 5, 1996
%U http://www.cs.bu.edu/techreports/1996-001-aida-broadcast-disks.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The proliferation of mobile computers and wireless networks requires
the design of future distributed real-time applications to recognize
and deal with the significant asymmetry between downstream and
upstream communication capacities, and the significant disparity
between server and client storage capacities. Recent research work
proposed the use of Broadcast Disks as a scalable mechanism to deal
with this problem. In this paper, we propose a new broadcast disks
protocol, based on our Adaptive Information Dispersal Algorithm
(AIDA). Our protocol is different from previous broadcast disks
protocols in that it improves communication timeliness,
fault-tolerance, and security, while allowing for a finer control of
multiplexing of prioritized data (broadcast frequencies). We start
with a general introduction of broadcast disks. Next, we propose
broadcast disk organizations that are suitable for real-time
applications. Next, we present AIDA and show its fault-tolerance and
security properties. We conclude the paper with the description and
analysis of AIDA-based broadcast disks organizations that achieve both
timeliness and fault-tolerance, while preserving downstream
communication capacity.
%R 1996-002
%T An Admission Control Paradigm for Real-Time Databases
%A Bestavros, Azer
%A Nagy, Sue
%D January 15, 1996
%U http://www.cs.bu.edu/techreports/1996-002-rtdbs-admission-control.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose and evaluate an admission control paradigm for RTDBS, in
which a transaction is submitted to the system as a pair of processes:
a primary task, and a recovery block. The execution requirements of
the primary task are not known a priori, whereas those of the recovery
block are known a priori. Upon the submission of a transaction, an
Admission Control Mechanism is employed to decide whether to admit or
reject that transaction. Once admitted, a transaction is guaranteed to
finish executing before its deadline. A transaction is considered to
have finished executing if exactly one of two things occurs: either its
primary task is completed (successful commitment), or its recovery
block is completed (safe termination). Committed transactions bring a
profit to the system, whereas a terminated transaction brings no
profit. The goal of the admission control and scheduling protocols
(e.g., concurrency control, I/O scheduling, memory management)
employed in the system is to maximize system profit. We describe a
number of admission control strategies and contrast (through
simulations) their relative performance.
%R 1996-003
%T Advances in Real-Time Database Systems Research: Special Section on RTDBS of ACM SIGMOD Record 25(1), March 1996.
%A Bestavros, Azer
%D January 15, 1996
%U http://www.cs.bu.edu/techreports/1996-003-rtdbs-sigmod-record
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A Real-Time DataBase System (RTDBS) can be viewed as an
amalgamation of a conventional DataBase Management System (DBMS) and a
real-time system. Like a DBMS, it has to process transactions and
guarantee ACID database properties. Furthermore, it has to operate in
real-time, satisfying time constraints imposed on transaction
commitments. An RTDBS may exist as a stand-alone system or as an
embedded component in a larger multidatabase system. The publication
in 1988 of a special issue of ACM SIGMOD Record on Real-Time DataBases
signaled the birth of the RTDBS research area---an area that brings
together researchers from both the database and real-time systems
communities. Today, almost eight years later, I am pleased to present
in this special section of ACM SIGMOD Record a review of recent
advances in RTDBS research. There were 18 submissions to this special
section, of which eight papers were selected for inclusion to provide
the readers of ACM SIGMOD Record with an overview of current and
future research directions within the RTDBS community. In this paper,
I will summarize these directions and provide the reader with pointers
to other publications for further information.
%R 1996-004
%T On the Fractal Nature of WWW and Its Application to Cache Modeling
%A Almeida, Virgilio
%A Oliveira, Adriana
%D February 5, 1996
%U http://www.cs.bu.edu/techreports/1996-004-www-caching-fractals.ps.Z
%I Computer Science Department, Boston University
%I (Depto. de Ciencia da Computacao da UFMG, Brazil)
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The World Wide Web (WWW or Web) is growing rapidly on the Internet.
Web users want fast response time and easy access to an enormous
variety of information across the world. Thus, performance is becoming
a main issue in the Web. Fractals have been used to study fluctuating
phenomena in many different disciplines, from the distribution of
galaxies in astronomy to complex physiological control systems. The
Web is also a complex, irregular, and random system. In this paper,
we look at the document reference pattern at Internet Web servers and
use fractal-based models to understand aspects (e.g. caching schemes)
that affect the Web performance.
%R 1996-005
%T Distributed Parallel Computing in Mermera: Mixing Noncoherent Shared Memories
%A Heddaya, Abdelsalam
%A Sinha, Himanshu
%D March 7, 1996
%U http://www.cs.bu.edu/techreports/1996-005-mermera-model-system.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Programmers of parallel processes that communicate through shared globally
distributed data structures (DDS) face a difficult choice. Either they must
explicitly program DDS management, by partitioning or replicating it over
multiple distributed memory modules, or be content with a high latency
coherent (sequentially consistent) memory abstraction that hides the DDS'
distribution. We present Mermera, a formalism and system that enables a
smooth spectrum of noncoherent shared memory behaviors to coexist between the
above two extremes. Our approach allows us to define known noncoherent
memories in a new simple way, to identify new memory behaviors, and to
characterize generic mixed-behavior computations. The latter are useful for
programming using multiple behaviors that complement each other's advantages,
and for programming by step-wise refinement.
On the practical side, we show that the large class of programs that use
asynchronous iterative methods (AIM) can run correctly on slow memory, one of
the weakest, and hence most efficient and fault-tolerant, noncoherence
conditions. An example AIM program to solve linear equations is developed to
illustrate the need for concurrently mixing memory behaviors, and the
performance gains attainable via noncoherence. Other program classes tolerate
weak memory consistency by synchronizing in such a way as to yield executions
indistinguishable from coherent ones. AIM computations on noncoherent memory
yield noncoherent, yet correct, computations. We present performance data
that illustrate the benefits of noncoherence, in terms of raw memory
performance, as well as application speed.
%R 1996-006
%T Measuring Bottleneck Link Speed in Packet-Switched Networks
%A Carter, Robert
%A Crovella, Mark
%D March 15, 1996
%U http://www.cs.bu.edu/techreports/1996-006-measuring-bottleneck-link.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The quality of available network connections can often have a large
impact on the performance of distributed applications. For example,
document transfer applications such as FTP, Gopher and the World Wide
Web suffer increased response times as a result of network
congestion. For these applications, the document transfer time is
directly related to the available bandwidth of the connection.
Available bandwidth depends on two things: 1) the underlying capacity
of the path from client to server, which is limited by the
bottleneck link; and 2) the amount of other traffic competing for
links on the path. If measurements of these quantities were available
to the application, the current utilization of connections could be
calculated. Network utilization could then be used as a basis for
selection from a set of alternative connections or servers, thus
providing reduced response time. Such a dynamic server selection
scheme would be especially important in a mobile computing environment
in which the set of available servers is frequently changing.
In order to provide these measurements at the application level, we
introduce two tools: bprobe, which provides an estimate of the
uncongested bandwidth of a path; and cprobe, which gives an
estimate of the current congestion along a path. These two measures
may be used in combination to provide the application with an estimate
of available bandwidth between server and client thereby enabling
application-level congestion avoidance.
In this paper we discuss the design and implementation of our probe
tools, specifically illustrating the techniques used to achieve
accuracy and robustness. We present validation studies for both tools
which demonstrate their reliability in the face of actual Internet
conditions; and we give results of a survey of available bandwidth to
a random set of WWW servers as a sample application of our probe
technique. We conclude with descriptions of other applications of our
measurement tools, several of which are currently under development.
%R 1996-007
%T Dynamic Server Selection using Bandwidth Probing in Wide-Area Networks
%A Carter, Robert
%A Crovella, Mark
%D March 18, 1996
%U http://www.cs.bu.edu/techreports/1996-007-dss-using-bandwidth.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Replication is a commonly proposed solution to problems of scale
associated with distributed services. However, when a service is
replicated, each client must be assigned a server. Prior work has
generally assumed that assignment to be static. In contrast, we propose
dynamic server selection, and show that it enables
application-level congestion avoidance.
To make dynamic server selection practical, we demonstrate the use
of three tools. In addition to direct measurements of round-trip latency,
we introduce and validate two new tools: bprobe, which estimates
the maximum possible bandwidth along a given path; and cprobe, which
estimates the current congestion along a path.
Using these tools we demonstrate dynamic server selection and compare it
to previous static approaches. We show that dynamic server selection
consistently outperforms static policies by as much as 50%. Furthermore,
we demonstrate the importance of each of our tools in performing dynamic
server selection.
%R 1996-008
%T Responsive Web Computing: Resource Management, Protocol Techniques, and Applications (A research statement)
%A Bestavros, Azer
%A Chen, Marina
%A Crovella, Mark
%A Heddaya, Abdelsalam
%A Sclaroff, Stan
%A Cowie, James
%D March 21, 1996
%U http://www.cs.bu.edu/techreports/1996-008-rwc.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The exploding demand for services like the World Wide Web reflects the
potential that is presented by globally distributed information systems.
The number of WWW servers world-wide has doubled every 3 to 5 months since
1993, outstripping even the growth of the Internet. At each of these
self-managed sites, the Common Gateway Interface (CGI) and Hypertext
Transfer Protocol (HTTP) already constitute a rudimentary basis for
contributing local resources to remote collaborations.
However, the Web has serious deficiencies that make it unsuited for use
as a true medium for metacomputing --- the process of bringing
hardware, software, and expertise from many geographically dispersed
sources to bear on large scale problems. These deficiencies are,
paradoxically, the direct result of the very simple design principles
that enabled its exponential growth.
There are many symptoms of the problems exhibited by the Web: disk and
network resources are consumed extravagantly; information search and
discovery are difficult; protocols are aimed at data movement rather than
task migration, and ignore the potential for distributing computation.
However, all of these can be seen as aspects of a single
problem: as a distributed system for metacomputing, the Web offers
unpredictable performance and unreliable results.
The goal of our project is to use the Web as a medium (within either
the global Internet or an enterprise intranet) for metacomputing in a
reliable way with performance guarantees. We attack this problem on
four levels:
(1) Resource Management Services:
Globally distributed computing allows novel approaches to the old
problems of performance guarantees and reliability. Our first set of
ideas involves setting up a family of real-time resource management
models organized by the Web Computing Framework with a standard
Resource Management Interface (RMI), a Resource Registry, a Task
Registry, and resource management protocols to allow resource needs
and availability information to be collected and disseminated so that a
family of algorithms with varying computational precision and accuracy
of representations can be chosen to meet realtime and reliability constraints.
(2) Middleware Services:
Complementary to techniques for allocating and scheduling available
resources to serve application needs under realtime and reliability
constraints, the second set of ideas aims at reducing communication
latency, traffic congestion, server workload, etc. We develop
customizable middleware services to exploit application
characteristics in traffic analysis to drive new server/browser design
strategies (e.g., exploit self-similarity of Web traffic), derive
document access patterns via multiserver cooperation, and use them in
speculative prefetching, document caching, and aggressive replication
to reduce server load and bandwidth requirements.
(3) Communication Infrastructure:
Finally, to achieve any guarantee of quality of service or
performance, one must get at the network layer that can provide the
basic guarantees of bandwidth, latency, and reliability. Therefore,
the third area is a set of new techniques in network service and
protocol designs.
(4) Object-Oriented Web Computing Framework:
A useful resource management system must deal with job priority,
fault-tolerance, quality of service, complex resources such as ATM
channels, probabilistic models, etc., and models must be tailored to
represent the best tradeoff for a particular setting. This requires a
family of models, organized within an object-oriented framework,
because no one-size-fits-all approach is appropriate. This presents a
software engineering challenge requiring integration of solutions at
all levels: algorithms, models, protocols, and profiling and
monitoring tools. The framework captures the abstract class
interfaces of the collection of cooperating components, but allows the
concretization of each component to be driven by the requirements of a
specific approach and environment.
%R 1996-009
%T Proceedings of the ECSCW'95: Workshop on the Role of Version Control in CSCW Applications
%A Hicks, David
%A Haake, Anja
%A Durand, David
%A Vitali, Fabio
%D April 26, 1996
%U http://www.cs.bu.edu/techreports/1996-009-ecscw95-proceedings
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The workshop entitled "The Role of Version Control in Computer Supported
Cooperative Work Applications" was held on September 10, 1995 in Stockholm,
Sweden in conjunction with the ECSCW'95 conference. Version control, the
ability to manage relationships between successive instances of artifacts,
organize those instances into meaningful structures, and support navigation
and other operations on those structures, is an important problem in CSCW
applications. It has long been recognized as a critical issue for
inherently cooperative tasks such as software engineering, technical
documentation, and authoring. The primary challenge for versioning in these
areas is to support opportunistic, open-ended design processes requiring
the preservation of historical perspectives in the design process, the
reuse of previous designs, and the exploitation of alternative designs.
This report contains a summary in which the workshop organizers report the
major results of the workshop. The summary is followed by a section that
contains the position papers that were accepted to the workshop. The
position papers provide more detailed information describing recent
research efforts of the workshop participants as well as current challenges
that are being encountered in the development of CSCW applications. A list
of workshop participants is provided at the end of the report.
%R 1996-010
%T Client-Based Logging: A New Paradigm For Distributed Transaction Management (PhD Thesis)
%A Panagos, Euthimios
%D June 13, 1996
%U http://www.cs.bu.edu/techreports/1996-010-client-based-logging.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The proliferation of inexpensive workstations and networks has created
a new era in distributed computing. At the same time, non-traditional
applications such as computer-aided design (CAD), computer-aided
software engineering (CASE), geographic-information systems (GIS),
and office-information systems (OIS) have placed increased demands for
high-performance transaction processing on database systems. The
combination of these factors gives rise to significant challenges in
the design of modern database systems. In this thesis, we propose
novel techniques whose aim is to improve the performance and
scalability of these new database systems. These techniques exploit
client resources through client-based transaction management.
Client-based transaction management is realized by providing logging
facilities locally even when data is shared in a global environment.
This thesis presents several recovery algorithms which utilize client
disks for storing recovery related information (i.e., log records).
Our algorithms work with both coarse and fine-granularity locking and
they do not require the merging of client logs at any time. Moreover,
our algorithms support fine-granularity locking with multiple clients
permitted to concurrently update different portions of the same
database page. The database state is recovered correctly when there
is a complex crash as well as when the updates performed by different
clients on a page are not present on the disk version of the page,
even though some of the updating transactions have committed.
This thesis also presents the implementation of the proposed
algorithms in a memory-mapped storage manager as well as a detailed
performance study of these algorithms using the OO1 database
benchmark. The performance results show that client-based logging is
superior to traditional server-based logging. This is because
client-based logging is an effective way to reduce dependencies on
server CPU and disk resources and, thus, prevents the server from
becoming a performance bottleneck as quickly when the number of
clients accessing the database increases.
%R 1996-011
%T Characterizing Reference Locality in the WWW
%A Almeida, Virgilio
%A Bestavros, Azer
%A Crovella, Mark
%A deOliveira, Adriana
%D June 21, 1996
%U http://www.cs.bu.edu/techreports/1996-011-www-reference-locality.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
As the World Wide Web (Web) is increasingly adopted as the
infrastructure for large-scale distributed information systems, issues
of performance modeling become ever more critical. In particular,
locality of reference is an important property in the performance
modeling of distributed information systems. In the case of the Web,
understanding the nature of reference locality will help improve the
design of middleware, such as caching, prefetching, and document
dissemination systems. For example, good measurements of reference
locality would allow us to generate synthetic reference streams with
accurate performance characteristics, would allow us to compare
empirically measured streams to explain differences, and would allow
us to predict expected performance for system design and capacity
planning.
In this paper we propose models for both temporal and spatial locality
of reference in streams of requests arriving at Web servers.
We show that simple models based only on document popularity (likelihood
of reference) are insufficient for capturing either temporal or spatial
locality. Instead, we rely on an equivalent, but numerical,
representation of a reference stream: a stack distance trace.
We show that temporal locality can be
characterized by the marginal distribution of the stack distance trace,
and we propose models for typical distributions and compare their cache
performance to our traces.
We also show that spatial locality in a reference stream can be
characterized using the notion of self-similarity. Self-similarity
describes long-range correlations in the dataset, which is a property
that previous researchers have found hard to incorporate into synthetic
reference strings. We show that stack distance strings appear to be
strongly self-similar, and we provide measurements of the degree of
self-similarity in our traces. Finally, we discuss methods for
generating synthetic Web traces that exhibit the properties of temporal
and spatial locality that we measured in our data.
Keywords: Self-similarity; Long-range dependence; Distance strings;
Reference locality; Caching; Performance modeling.
%R 1996-012
%T Management of Communicable Memory and Lazy Barriers for Bulk Synchronous Parallelism in BSPk
%A Fahmy, Amr
%A Heddaya, Abdelsalam
%D July 2, 1996
%U http://www.cs.bu.edu/techreports/1996-012-bspk-design.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Communication and synchronization stand as the dual bottlenecks in the
performance of parallel systems, and especially those that attempt to
alleviate the programming burden by incurring overhead in these two
domains. We formulate the notions of communicable memory and lazy
barriers to help achieve efficient communication and synchronization.
These concepts are developed in the context of BSPk, a toolkit library
for programming networks of workstations---and other distributed
memory architectures in general---based on the Bulk Synchronous
Parallel (BSP) model. BSPk, whose design is the subject of this
paper, emphasizes efficiency in communication by minimizing local
memory-to-memory copying, and in barrier synchronization by not
forcing a process to wait unless it needs remote data. Both the
message passing (MP) and distributed shared memory (DSM) programming
styles are supported in BSPk, for the former helps processes exchange
short-lived unnamed data values, while the latter permits
communication through long-lived named variables.
%R 1996-013
%T Real-Time Databases: Issues and Applications (RTDB'96 Workshop Report)
%A Bestavros, Azer
%A Lin, Kwei-Jay
%A Son, Sang
%D July 3, 1996
%U http://www.cs.bu.edu/techreports/1996-013-rtdb96-report.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This report summarizes the technical presentations and discussions that
took place during RTDB'96: the First International Workshop on
Real-Time Databases, which was held on March 7 and 8, 1996 in Newport
Beach, California. The main goals of this project were (1) to review
recent advances in real-time database systems research, (2) to promote
interaction among real-time database researchers and practitioners,
and (3) to evaluate the maturity and directions of real-time database
technology.
%R 1996-014
%T TCP Boston: A Fragmentation-tolerant TCP Protocol for ATM Networks
%A Bestavros, Azer
%A Kim, Gitae
%D July 15, 1996
%U http://www.cs.bu.edu/techreports/1996-014-tcp-boston.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The popularity of TCP/IP coupled with the promise of high speed
communication using Asynchronous Transfer Mode (ATM) technology
has prompted the network research community to propose a number of
techniques to adapt TCP/IP to ATM network environments. ATM offers
Available Bit Rate (ABR) and Unspecified Bit Rate (UBR) services
for best-effort traffic, such as conventional file transfer.
However, recent studies have shown that TCP/IP, when implemented
using ABR or UBR, leads to serious performance degradations,
especially when the utilization of network resources (such as
switch buffers) is high. Proposed techniques---switch-level
enhancements, for example---that attempt to patch up TCP/IP over
ATMs have had limited success in alleviating this problem. The
major reason for TCP/IP's poor performance over ATMs has been
consistently attributed to packet fragmentation, which is the
result of ATM's 53-byte cell-oriented switching architecture.
In this paper, we present a new transport protocol, TCP Boston, that
turns ATM's 53-byte cell-oriented switching architecture into an
advantage for TCP/IP. At the core of TCP Boston is the Adaptive
Information Dispersal Algorithm (AIDA), an efficient encoding
technique that allows for dynamic redundancy control. AIDA makes
TCP/IP's performance less sensitive to cell losses, thus ensuring a
graceful degradation of TCP/IP's performance when faced with congested
resources. In this paper, we introduce AIDA and overview the main
features of TCP Boston. We present detailed simulation results that
show the superiority of our protocol when compared to other
adaptations of TCP/IP over ATMs. In particular, we show that TCP
Boston improves TCP/IP's performance over ATMs for both
network-centric metrics (e.g., effective throughput) and
application-centric metrics (e.g., response time).
%R 1996-015
%T Ergodicity and mixing rate of one-dimensional cellular automata (PhD Thesis)
%A Park, Kihong
%D July 22, 1996
%U http://www.cs.bu.edu/techreports/1996-015-park-phdthesis.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
One- and two-dimensional cellular automata which are known to be
fault-tolerant are very complex. On the other hand, only very simple
cellular automata have actually been proven to lack fault-tolerance,
i.e., to be mixing. The latter either have large noise probability
$\eps$ or belong to the small family of two-state nearest-neighbor
monotonic rules which includes local majority voting.
For a certain simple automaton $L$ called the soldiers rule, this
problem has intrigued researchers for the last two decades since $L$
is clearly more robust than local voting: in the absence of noise, $L$
eliminates any finite island of perturbation from an initial
configuration of all 0's or all 1's. The same holds for a 4-state
monotonic variant of $L$, $K$, called two-line voting. We will prove
that the probabilistic cellular automata $K_\eps$ and $L_\eps$
asymptotically lose all information about their initial state when
subject to small, strongly biased noise. The mixing property
trivially implies that the systems are ergodic.
The finite-time information-retaining quality of a mixing system can
be represented by its relaxation time $\Relax(\cdot)$, which measures
the time before the onset of significant information loss. This is
known to grow as $(1/\eps)^c$ for noisy local voting. The impressive
error-correction ability of $L$ has prompted some researchers to
conjecture that $\Relax(L_\eps)=2^{c/\eps}$. We prove the tight bound
$2^{c_1\log^2 1/\eps} < \Relax(L_\eps) < 2^{c_2\log^2 1/\eps}$ for a
biased error model. The same holds for $K_\eps$. Moreover, the lower
bound is independent of the bias assumption.
The strong bias assumption makes it possible to apply
sparsity/renormalization techniques, the main tools of our
investigation, used earlier in the opposite context of proving
fault-tolerance.
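To place the bound in context, the following worked identity may help. It is a sketch under two assumptions not fixed by the abstract: logarithms are taken base 2, and $\log^2 x$ denotes $(\log x)^2$. Under those assumptions the proven relaxation time is quasi-polynomial in $1/\eps$, which sits strictly between the polynomial growth known for noisy local voting and the exponential growth that had been conjectured.

```latex
% Assumptions: \log is base 2, \log^2 x means (\log x)^2, and c > 0 is a
% fixed constant (the constants in the three expressions need not be equal).
\[
  2^{c\log^2(1/\eps)}
  \;=\; \bigl(2^{\log(1/\eps)}\bigr)^{c\log(1/\eps)}
  \;=\; (1/\eps)^{c\log(1/\eps)} .
\]
% Ordering of growth rates as \eps \to 0:
\[
  \underbrace{(1/\eps)^{c}}_{\text{noisy local voting}}
  \;\ll\;
  \underbrace{(1/\eps)^{c\log(1/\eps)}}_{\text{proven for } L_\eps,\ K_\eps}
  \;\ll\;
  \underbrace{2^{c/\eps}}_{\text{conjectured}} .
\]
```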
%R 1996-016
%T On the relationship between file sizes, transport protocols, and self-similar network traffic
%A Park, Kihong
%A Kim, Gitae
%A Crovella, Mark
%D July 30, 1996
%U http://www.cs.bu.edu/techreports/1996-016-self-similar-cause.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recent measurements of local-area and wide-area traffic have shown
that network traffic exhibits variability at a wide range of
scales---self-similarity. In this paper, we examine a mechanism that
gives rise to self-similar network traffic and present some of its
performance implications. The mechanism we study is the transfer of
files or messages whose size is drawn from a heavy-tailed distribution.
We examine its effects through detailed transport-level simulations
of multiple TCP streams in an internetwork.
First, we show that in a ``realistic'' client/server network
environment---i.e., one with bounded resources and coupling among traffic
sources competing for resources---the degree to which file sizes are
heavy-tailed can directly determine the degree of traffic self-similarity
at the link level. We show that this causal relationship is not
significantly affected by changes in network resources (bottleneck
bandwidth and buffer capacity), network topology, the influence of
cross-traffic, or the distribution of interarrival times.
Second, we show that properties of the transport layer play an
important role in preserving and modulating this relationship. In
particular, the reliable transmission and flow control mechanisms
of TCP (Reno, Tahoe, or Vegas) serve to maintain the long-range
dependency structure induced by heavy-tailed file size distributions.
In contrast, if a non-flow-controlled and unreliable (UDP-based)
transport protocol is used, the resulting traffic shows few
self-similar characteristics: although still bursty at short time scales,
it has little long-range dependence. If flow-controlled, unreliable
transport is employed, the degree of traffic self-similarity is
positively correlated with the degree of throttling at the source.
Third, in exploring the relationship between file sizes, transport
protocols, and self-similarity, we are also able to show some of the
performance implications of self-similarity. We present data on
the relationship between traffic self-similarity and network performance
as captured by performance measures including packet loss rate,
retransmission rate, and queueing delay. Increased self-similarity,
as expected, results in degradation of performance. Queueing delay,
in particular, exhibits a drastic increase with increasing
self-similarity. Throughput-related measures such as packet loss and
retransmission rate, however, increase only gradually with increasing
traffic self-similarity as long as a reliable, flow-controlled transport
protocol is used.
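The workload ingredient described above, file sizes drawn from a heavy-tailed distribution, can be illustrated with a minimal sketch. It assumes a Pareto distribution as the heavy-tailed family (the abstract does not name one); the function names, the minimum size, and the seed are hypothetical choices made for illustration and are not taken from the paper's simulator.

```python
import random


def pareto_file_sizes(n, alpha, min_size=1000.0, seed=0):
    """Draw n file sizes (in bytes) from a Pareto distribution.

    alpha is the tail index: smaller alpha means a heavier tail.
    min_size and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    # paretovariate(alpha) returns values >= 1 with P(X > x) = x ** -alpha.
    return [min_size * rng.paretovariate(alpha) for _ in range(n)]


def tail_fraction(sizes, threshold):
    """Fraction of files strictly larger than threshold."""
    return sum(1 for s in sizes if s > threshold) / len(sizes)


if __name__ == "__main__":
    for alpha in (1.05, 1.95):
        sizes = pareto_file_sizes(20000, alpha)
        frac = tail_fraction(sizes, 10 * 1000.0)
        print(f"alpha={alpha}: fraction of files over 10x minimum = {frac:.4f}")
```

Lowering alpha makes very large files more frequent, which is the sense in which the degree of heavy-tailedness can be varied in such an experiment.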
%R 1996-017
%T Load Profiling in Distributed Real-Time Systems
%A Bestavros, Azer
%D August 1, 1996
%U http://www.cs.bu.edu/techreports/1996-017-load-profiling.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Load balancing is often used to ensure that nodes in a distributed
system are equally loaded. In this paper, we show that for real-time
systems, load balancing is not desirable. In particular, we propose a
new load-profiling strategy that allows the nodes of a distributed
system to be unequally loaded. Using load profiling, the system
attempts to distribute the load amongst its nodes so as to maximize
the chances of finding a node that would satisfy the computational
needs of incoming real-time tasks. To that end, we describe and
evaluate a distributed load-profiling protocol for dynamically
scheduling time-constrained tasks in a loosely-coupled distributed
environment. When a task is submitted to a node, the scheduling
software tries to schedule the task locally so as to meet its
deadline. If that is not feasible, it tries to locate another node
where this could be done with a high probability of success, while
attempting to maintain an overall load profile for the system. Nodes
in the system inform each other about their state using a combination
of multicasting and gossiping. The performance of the proposed
protocol is evaluated via simulation, and is contrasted to other
dynamic scheduling protocols for real-time distributed systems. Based
on our findings, we argue that keeping a diverse availability
profile and using passive bidding (through gossiping) are both
advantageous to distributed scheduling for real-time systems.
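The contrast between equal and unequal loading can be illustrated with a toy placement sketch. It is not the protocol evaluated in the report: it assumes, purely for illustration, that load balancing behaves like a worst-fit choice (send the task to the node with the most spare capacity) and that load profiling behaves like a best-fit choice (send it to the feasible node with the least spare capacity). The function names, node capacities, and task demands are all hypothetical.

```python
def place(spare, demand, policy):
    """Place one task; return the chosen node index, or None if rejected.

    spare  -- list of spare capacities per node (mutated on success)
    policy -- "balance" (worst fit, assumed stand-in for load balancing) or
              "profile" (best fit, assumed stand-in for load profiling)
    """
    feasible = [i for i, s in enumerate(spare) if s >= demand]
    if not feasible:
        return None
    if policy == "balance":
        chosen = max(feasible, key=lambda i: spare[i])
    else:
        chosen = min(feasible, key=lambda i: spare[i])
    spare[chosen] -= demand
    return chosen


def run(policy, demands, nodes=2, capacity=1.0):
    """Place a sequence of task demands on identical, initially idle nodes."""
    spare = [capacity] * nodes
    return [place(spare, d, policy) for d in demands]


if __name__ == "__main__":
    demands = [0.5, 0.5, 0.75]
    print("balance:", run("balance", demands))
    print("profile:", run("profile", demands))
```

In this toy run the balanced system spreads the two small tasks over both nodes and then has no node with enough spare capacity for the larger task, whereas the unequally loaded system still has an idle node available for it.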
%R 1996-018
%T Performance Analysis of a WWW Server
%A Almeida, Virgilio
%A Almeida, Jussara
%A Murta, Cristina
%D August 5, 1996
%U http://www.cs.bu.edu/techreports/1996-018-www-performance-analysis.ps.Z
%I Computer Science Department, Boston University and UFMG
%X
The WWW has experienced phenomenal growth and has become the most
popular Internet application. As a consequence of its great
popularity, the Internet has suffered from various performance
problems, such as network congestion and overloaded servers. These
days, it is not uncommon to find servers refusing connections because
they are overloaded.
Performance has always been a key issue in the design and operation of
on-line systems. With regard to the Internet, performance is also
critical, because users want fast and easy access to all objects
(e.g., documents, graphics, audio, and video) available on the
net. Thus, it is important to understand WWW performance issues. This
paper focuses on the performance analysis of Web servers. Using a
synthetic benchmark (WebStone) and standard operating systems
monitoring tools, it analyzes three different Web server software
packages running on top of a Windows NT platform and performing some typical
WWW tasks. It also discusses the main steps needed to carry out a WWW
performance analysis effort and shows relations between the workload
characteristics and system resource usage.
%R 1996-019
%T Beta-Reduction as Unification
%A Kfoury, A.J.
%D July 8, 1996
%U http://www.cs.bu.edu/techreports/1996-019-beta-unification.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We define a unification problem ^UP with the property that,
given a pure lambda-term M, we can derive an instance Gamma(M)
of ^UP from M such that Gamma(M) has a solution if and only if
M is beta-strongly normalizable. There is a type discipline for
pure lambda-terms that characterizes beta-strong normalization;
this is the system of intersection types (without a ``top'' type
that can be assigned to every lambda-term). In this report, we
use a lean version LAMBDA of the usual system of intersection types.
Hence, ^UP is also an appropriate unification problem to characterize
typability of lambda-terms in LAMBDA. It also follows that ^UP is
an undecidable problem, which can in turn be related to semi-unification
and second-order unification (both known to be undecidable).
%R 1996-020
%T An Infinite Pebble Game and Applications
%A Kfoury, A.J.
%A Stolboushkin, A.P.
%D August 15, 1996
%U http://www.cs.bu.edu/techreports/1996-020-infinite-pebble-game.ps.Z
%I Computer Science Department, Boston University and Mathematics Department, UCLA
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We generalize the well-known pebble game to infinite dag's, and we
use this generalization to give new and shorter proofs of results in
different areas of computer science (as diverse as ``logic of programs''
and ``formal language theory''). Our applications here include a proof
of a theorem due to Salomaa, asserting the existence of a context-free
language with infinite index, and a proof of a theorem due to Tiuryn
and Erimbetov, asserting that unbounded memory increases the power of
logics of programs. The original proofs by Salomaa, Tiuryn, and Erimbetov
are fairly technical. The proofs by Tiuryn and Erimbetov also involve
advanced techniques of model theory, namely, back-and-forth constructions
based on a variant of Ehrenfeucht-Fraisse games. By contrast, our proofs
are not only shorter, but also elementary. All we need is essentially
finite induction and, in the case of the Tiuryn-Erimbetov result, the
compactness and completeness of first-order logic.
%R 1996-021
%T A Linearization of the Lambda Calculus and Consequences
%A Kfoury, A.J.
%D August 19, 1996
%U http://www.cs.bu.edu/techreports/1996-021-linearization-lambda-calculus.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
If every lambda-abstraction in a lambda-term M binds at most one
variable occurrence, then M is said to be "linear". Many questions
about linear lambda-terms are relatively easy to answer, e.g.
they all are beta-strongly normalizing and all are simply-typable.
We extend the syntax of the standard lambda-calculus L to a non-standard
lambda-calculus L^ satisfying a linearity condition generalizing the
notion in the standard case. Specifically, in L^ a subterm Q of a term
M can be applied to several subterms R1,...,Rk in parallel, which we
write as (Q. R1 \wedge ... \wedge Rk). The appropriate notion of beta-
reduction beta^ for the calculus L^ is such that, if Q is the lambda-
abstraction (\lambda x.P) with m\geq 0 bound occurrences of x, the
reduction can be carried out provided k = max(m,1). Every M in L^ is
thus beta^-SN. We relate standard beta-reduction and non-standard
beta^-reduction in several different ways, and draw several consequences,
e.g. a new simple proof for the fact that a standard term M is beta-SN
iff M can be assigned a so-called ``intersection'' type (``top'' type
disallowed).
%R 1996-022
%T Typability is Undecidable for F+Eta
%A Wells, J.B.
%D March 9, 1996
%U http://www.cs.bu.edu/techreports/1996-022-f+eta-typability-undecidable.ps.Z
%I Computer Science Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
System F is the well-known polymorphically-typed lambda calculus with
universal quantifiers. F+eta is System F extended with the eta rule,
which says that if term M can be given type tau and M eta-reduces to N,
then N can also be given the type tau. Adding the eta rule to System F is
equivalent to adding the subsumption rule using the subtyping
(containment) relation that Mitchell defined and axiomatized [Mit88]. The
subsumption rule says that if M can be given type tau and tau is a subtype
of type sigma, then M can be given type sigma. Mitchell's subtyping
relation involves no extensions to the syntax of types, i.e., no bounded
polymorphism and no supertype of all types, and is thus unrelated to the
system "F-sub".
Typability for F+eta is the problem of determining for any term M whether
there is any type tau that can be given to it using the type inference
rules of F+eta. Typability has been proven undecidable for System F
[Wel94] (without the eta rule), but the decidability of typability has
been an open problem for F+eta. Mitchell's subtyping relation has
recently been proven undecidable [TU95,Wel95b], implying the
undecidability of "type checking" for F+eta. This paper reduces the
problem of subtyping to the problem of typability for F+eta, thus proving
the undecidability of typability. The proof methods are similar in
outline to those used to prove the undecidability of typability for System
F, but the fine details differ greatly.
%R 1996-023
%T Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems
%A Baruah, Sanjoy
%A Bestavros, Azer
%D August 22, 1996
%U http://www.cs.bu.edu/techreports/1996-023-pinwheel-bdisks.ps.Z
%I EE/CS Department, University of Vermont; CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The design of programs for broadcast disks which incorporate real-time
and fault-tolerance requirements is considered. A generalized model
for real-time fault-tolerant broadcast disks is defined. It is shown
that designing programs for broadcast disks specified in this model is
closely related to the scheduling of pinwheel task systems. Some new
results in pinwheel scheduling theory are derived, which facilitate
the efficient generation of real-time fault-tolerant broadcast disk
programs.
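The pinwheel condition referred to above can be made concrete with a small checker. This is a sketch under stated assumptions: it uses the generalized pinwheel formulation in which a task (a, b) must receive at least a slots in every window of b consecutive slots, and it treats the broadcast program as a cyclic schedule repeated forever. The function name, the task set, and the two example schedules are hypothetical; how the report maps files and fault-tolerance requirements to (a, b) pairs is not shown here.

```python
def satisfies_pinwheel(schedule, tasks):
    """Check a cyclic schedule against pinwheel constraints.

    schedule -- list of task ids, one per slot, repeated forever
    tasks    -- dict mapping task id to (a, b): at least a slots are
                required in every window of b consecutive slots
    """
    period = len(schedule)
    for task_id, (a, b) in tasks.items():
        # Because the schedule is cyclic, windows starting at each slot of
        # one period (with wrap-around) cover every possible window.
        for start in range(period):
            hits = sum(
                1 for k in range(b)
                if schedule[(start + k) % period] == task_id
            )
            if hits < a:
                return False
    return True


if __name__ == "__main__":
    tasks = {"A": (1, 2), "B": (1, 4), "C": (1, 4)}
    print(satisfies_pinwheel(["A", "B", "A", "C"], tasks))
    print(satisfies_pinwheel(["A", "A", "B", "C"], tasks))
```

The first schedule interleaves A with B and C and meets every window constraint; the second leaves a two-slot window without A and is rejected.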
%R 1996-024
%T WebWave: Globally Load Balanced Fully Distributed Caching of Hot Published Documents
%A Heddaya, Abdelsalam
%A Mirdad, Sulaiman
%D October 10, 1996
%U http://www.cs.bu.edu/techreports/1996-024-webwave-theory.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Document publication service over such a large network as the Internet
challenges us to harness available server and network resources to
meet fast growing demand. In this paper, we show that large-scale
dynamic caching can be employed to globally minimize server idle time,
and hence maximize the aggregate throughput of the whole service.
Given the distributed nature of the system, a successful caching
mechanism must satisfy three properties: (1) that it maximize the
global throughput of the system, (2) that it be completely distributed
in the sense of operating only on the basis of local information, and
(3) that it require no naming service that introduces a scalability
bottleneck.
In this paper, we develop a precise definition, which we call "tree
load-balance", of what it means for a mechanism to satisfy these three
goals, and present two algorithms that achieve them. Both algorithms
compute the request rate that should be allocated to each cache server, so
that global throughput is maximized. The first algorithm, WebFold, is a
centralized one that is provably optimal with respect to throughput. The
second algorithm, WebWave, whose optimality is evidenced by simulation, is
a fully distributed diffusion-based protocol. Both algorithms assume that
cache copies are placed on the routing tree that connects the cached
document's home server with its clients. As a consequence, document
requests can find cache copies without resorting to a cache directory of
any kind. The results herein apply only to immutable documents; we do not
consider the cache consistency problem.
%R 1996-025
%T Measuring the Behavior of a World-Wide Web Server
%A Almeida, Jussara
%A Almeida, Virgilio
%A Yates, David
%D October 29, 1996
%U http://www.cs.bu.edu/techreports/1996-025-web-server-measurements.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Server performance has become a crucial issue for improving the overall
performance of the World-Wide Web. This paper describes Webmonitor, a tool
for evaluating and understanding server performance, and presents new
results for a realistic workload.
Webmonitor measures activity and resource consumption, both within
the kernel and in HTTP processes running in user space. Webmonitor is
implemented using an efficient combination of sampling and event-driven
techniques that exhibit low overhead. Our initial implementation is for
the Apache World-Wide Web server running on the Linux operating system. We
demonstrate the utility of Webmonitor by measuring and understanding the
performance of a Pentium-based PC acting as a dedicated WWW server. Our
workload uses a file size distribution with a heavy tail. This captures
the fact that Web servers must concurrently handle some requests for large
audio and video files, and a large number of requests for small documents,
containing text or images.
Our results show that in a Web server saturated by client requests,
over 90% of the time spent handling HTTP requests is spent in the kernel.
Furthermore, keeping TCP connections open, as required by TCP, causes a
factor of 2-9 increase in the elapsed time required to service an HTTP
request. Data gathered from Webmonitor provide insight into the causes of
this performance penalty. Specifically, we observe a significant increase
in resource consumption along three dimensions: the number of HTTP
processes running at the same time, CPU utilization, and memory
utilization. These results emphasize the important role of operating
system and network protocol implementation in determining Web server
performance.
%R 1996-026
%T Blocking Java Applets at the Firewall
%A Martin, David M.
%A Rajagopalan, Sivaramakrishnan
%A Rubin, Aviel D.
%D November 14, 1996
%U http://www.cs.bu.edu/techreports/1996-026-java-firewalls.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper explores the problem of protecting a site on the Internet
against hostile external Java applets while allowing trusted internal
applets to run. With careful implementation, a site can be
made resistant to current Java security weaknesses as well as those yet to
be discovered. In addition, we describe a new attack on certain
sophisticated firewalls that is most effectively realized
as a Java applet.
%R 1996-027
%T Proceedings of the 17th Real-Time Systems Symposium WIP Session
%A Bestavros, Azer
%D December 4, 1996
%U http://www.cs.bu.edu/techreports/1996-027-ieee-rtss96-wip
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This technical report includes 14 short papers presented during the
WIP session of the 17th Real-Time Systems Symposium, held in
Washington DC on December 4-6, 1996. The titles, authors, and abstracts are
listed below.
------
(1) A Specialized Specification and Verification System for Timed Automata
Myla Archer and Constance Heitmeyer
Naval Research Laboratory, USA
Abstract: Assuring the correctness of specifications of
real-time systems can involve significant human effort. The use
of a mechanical theorem prover to encode such specifications and
to verify their properties could significantly reduce this
effort. A barrier to routinely encoding and mechanically
verifying specifications has been the need first to master the
specification language and logic of a general theorem proving
system. Our approach to overcoming this barrier is to provide
mechanical support for producing specifications and verifying
proofs, specialized for particular mathematical models and proof
techniques. We are currently developing a mechanical
verification system called TAME (Timed Automata Modeling
Environment), which provides this specialized support using
SRI's Prototype Verification System (PVS). Our system is
intended to permit steps in reasoning similar to those in hand
proofs that use model-specific techniques. TAME has recently
been used to detect errors in a realistic example.
------
(2) Scheduling Slack in MetaH
Pam Binns
Honeywell Technology Center, USA
Abstract: A real-time implementation for allocating slack to
aperiodic processes in MetaH is nearing completion. The slack
scheduling algorithm is based on the slack stealer originally
proposed in "An Optimal Algorithm for Scheduling Soft-Aperiodic
Tasks in Fixed-Priority Preemptive Systems" with practical
extensions to allow for support of process criticalities,
multiple process streams (of different criticalities) competing
for pooled slack and inclusion of run-time overheads in the
slack functions. Areas in need of future work are also
identified.
------
(3) AFTER: A case tool to assist in Fine-tuning of embedded real-time systems
Gaurav Arora and David Stewart
University of Maryland, USA
Abstract: AFTER (Assist in Fine-Tuning of Embedded Real-time
systems) is an interactive analysis and predictor tool for
embedded systems. It helps designers quickly identify timing
problems and systematically fine-tune an application during and
after the implementation phase of a product's lifecycle. The
tool begins with raw timing data collected from an embedded
system. It analyzes the data to provide a temporal image of the
current implementation, highlighting actual and potential
problems. The user then interacts with AFTER to obtain
predictions on what overall effect can be expected if small
adjustments are made to configuration parameters or to the
timing properties of specific software components. The tool
integrates and extends prior research in scheduling, task
monitoring, and operating system design for real-time systems.
------
(4) Genericity and Upgradability in Ultra-Dependable Real-Time Architectures
Andy Wellings, Ljerka Beus-Dukis, Alan Burns, and David Powell
LAAS-CNRS, France and University of York, UK
Abstract: We report on the ideas currently being developed
within the European GUARDS project to develop a generic
upgradable architecture for real-time dependable systems. After
a brief introduction and overview of the architecture, we
outline the GUARDS approach for scheduling real-time replicated
computation.
------
(5) Challenges in Engineering Distributed Shipboard Control System
L. Welch, B. Ravindran, R. Harrison, L. Madden, M. W. Masters, and W. Mills
Naval Surface Warfare Center and University of Texas at Arlington, USA
Abstract: In response to the need to develop high capacity,
scalable computer systems for shipboard use, a program called
the High Performance Distributed Computing Program (HiPer-D),
was created. HiPer-D is intended to provide the technical
design concepts and engineering data needed to enable the Navy
to capitalize on commercial computing products. The program,
conducted jointly by the Defense Advanced Research Projects
Agency (DARPA) and the Aegis Shipbuilding Program, consists of
simultaneous top down engineering studies and large-scale
critical experiments using new computer technology.
------
(6) Issues for realizing a scalable Real Time Kernel for
function-distributed Multiprocessors
Hiroaki Takada, Cai-Dong Wang, and Ken Sakamura
University of Tokyo, Japan
Abstract: In multiprocessor systems, the worst-case execution
time of a task that exclusively accesses a shared resource is
unavoidably prolonged as the number of contending processors is
increased. In the case of function-distributed multiprocessors,
because many of the tasks can be processed within a processor,
it is advantageous that their worst-case behavior be
independent of the number of processors in the system. This
paper summarizes the properties required of scalable real-time
kernels and discusses techniques for realizing them. The problems
we have solved so far are described, and the remaining problem
is presented.
------
(7) The design and implementation of the CPU power regulator for
multimedia operating systems
Giun-Haur Huang, Shie-Kai Ni, and Tei-Wei Kuo
National Chung Cheng University, Taiwan
Abstract: This paper describes a Windows NT/95 utility, the CPU
Power Regulator (CPR), which improves the capability of Windows
NT/95 in servicing time-critical applications. CPR considers a
distance model [4] to service time-critical applications such as
multimedia software and electronic games in a timely
fashion. Distinct from the past work [7, 8, 9], CPR adopts a
user-level control mechanism to manage the resource allocations
on Windows NT/95 and makes no modifications to the operating
system and application software. The performance of CPR was
verified by a collection of simulation experiments of randomly
generated and realistic workloads. CPR not only introduces very
low system overheads but also largely reduces the phenomenon of
non-timely resource allocation for applications. The
experimental results also demonstrate the capability and
flexibility of CPR in multiplexing CPU cycles to provide
different degrees of quality-of-service to time-critical
applications. The results of this work present a low-cost
software solution to transform an ordinary operating system into
a multimedia operating system.
------
(8) An approach for monitoring intrusion removal in Real Time Systems
Vishal Jain, Madalene Spezialetti, and Rajiv Gupta
University of Pittsburgh and Trinity College, USA
Abstract: To assist in the development of a real-time
application, monitoring is used to collect execution timing
information for the application. In this paper we propose a
strategy that accurately reports timing information by
accounting for intrusion introduced by monitoring. In addition,
by allowing processes that miss deadlines to run to completion,
our approach provides the user with times by which the execution
of these processes exceeds their deadlines. This information can
be used to guide the user in restructuring the application to
meet timing requirements.
------
(9) Empirical Evaluation of Task and Resource Scheduling in Dynamic
Real-Time Systems
Ken Tew, Panos Chrysantis, and Daniel Mosse
University of Pittsburgh, USA
Abstract: This work-in-progress reports on our on-going
empirical evaluation of a two-tiered resource allocation scheme
assuming independent jobs, that is, jobs have no precedence
constraints. The first tier extends the temporal density
approach, while the second tier uses an Earliest Deadline First
(EDF) approach to schedule jobs at a site. However, job
scheduling at sites is constrained by the precedence relation
between the loading and execution of a job. In addition to CPU
scheduling, we also take care of the time it takes to load a
task onto memory from a disk (or from another processor over the
network). We assume that loading (i.e., disk scheduling)
follows an EDF non-preemptive discipline whereas the execution
(i.e., CPU scheduling) follows a preemptive EDF.
------
(10) Scalability based admission control of real-time channels
Ramesh Yerraballi and Ravi Mukkamala
Midwestern State University and Old Dominion University, USA
Abstract: This paper reports our continuing efforts and initial
results with the problem of admission control in real-time
networks. This problem was first addressed by the Tenet group,
and their approach was based on the assumption that the link
level scheduling was EDD (Earliest Due Date) based. Our work
departs from this assumption by addressing the problem in the
context of any arbitrary dynamic/fixed priority link level
scheduling. Our approach is based on extending a result we have
derived in a different context, viz., Task Scalability. It
involves assessing the current capacity of a link in terms of
its ability to accommodate (scale to) new channels. This
assessment (called the admittance measure) is then heuristically
compared against the traffic requirements of the newly requested
channel to decide its admissibility. A simulation study was
performed to study the effectiveness of our approach in
improving both utilization of the link and admissibility of
channels. Further, we demonstrate the relevance of our heuristic
by observing that it reduces to the Tenet schedulability test,
for the case of EDD.
------
(11) Optimization of scheduling on real-time parallel computer systems
Leyuan Shi and Philip Q. Hwang
University of Wisconsin and Defense Mapping Agency, USA
Abstract: We describe our ongoing work in the field of optimal
scheduling for real-time systems. We are primarily concerned
with optimal task allocation and job scheduling for parallel
computer systems. Many real-time task allocation and job
scheduling problems are proven to be NP-hard. Recently, we
proposed a randomized optimization framework for efficiently
solving such NP-hard problems. The proposed method, the Nested
Partitions (NP) method, has been proved to converge to globally
optimal solutions and is also well matched to emerging
massively parallel processing capabilities.
------
(12) Dynamic Scheduling of Hard Real-Time Applications in Open System
Environment
Z. Deng, J. W.-S. Liu, and J. Sun
University of Illinois at Urbana Champaign, USA
Abstract: This paper focuses on the problem of providing
run-time support to real-time applications and non-real-time
applications in an open system. It describes a two-level
hierarchical priority-driven scheme for scheduling independently
developed applications. The scheme allows the developer of each
real-time application to validate the schedulability of the
application independently of other applications. Once a
real-time application is created and accepted by the open
system, its schedulability is guaranteed regardless of the
behaviors of other applications that execute concurrently in the
system.
------
(13) In Search for an efficient Real-Time Atomic Commit Protocol
Yousef Al-Houmaily and Panos Chrysantis
University of Pittsburgh, USA
Abstract: The purpose of this paper is to report on the first
step in our quest for an efficient atomic commit protocol in
real-time databases. This includes the development of RT-IYV
(real-time implicit yes-vote), a new real-time atomic commit
protocol. In contrast to other real-time commit protocols that
provide for semantic atomicity, RT-IYV is designed to ensure the
traditional notion of transaction atomicity. RT-IYV (1)
eliminates the voting phase from 2PC, hence reducing the number
of sequential coordination messages and forced log writes during
normal processing, and (2) supports transactions' forward
recovery, hence enabling partially executed transactions to
resume their execution after a failure. To illustrate its
performance advantages, we compare RT-IYV with the recently
proposed OPT (optimistic commit protocol) which is also designed
to support the standard transaction atomicity in real-time
databases.
------
(14) Distributed Real-Time Dataflow: An Execution Paradigm for Image
Processing and Anti-Submarine Warfare Applications
Steve Goddard and Kevin Jeffay
University of North Carolina, USA
%R 1997-001
%T Exploiting Redundancy for Timeliness in TCP Boston
%A Bestavros, Azer
%A Kim, Gitae
%D January 24, 1997
%U http://www.cs.bu.edu/techreports/1997-001-tcp-boston-realtime.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
While ATM bandwidth-reservation techniques are able to offer the
guarantees necessary for the delivery of real-time streams in many
applications (e.g. live audio and video), they suffer from many
disadvantages that make them unattractive (or impractical) for many
others. These limitations coupled with the flexibility and popularity
of TCP/IP as a best-effort transport protocol have prompted the
network research community to propose and implement a number of
techniques that adapt TCP/IP to the Available Bit Rate (ABR) and
Unspecified Bit Rate (UBR) services in ATM network environments. This
allows these environments to smoothly integrate (and make use of)
currently available TCP-based applications and services with few
(if any) modifications. However, recent studies have shown that
TCP/IP, when implemented over ATM networks, is susceptible to serious
performance limitations. In a recently completed study, we have
unveiled a new transport protocol, TCP Boston, that turns ATM's
53-byte cell-oriented switching architecture into an advantage for
TCP/IP.
In this paper, we demonstrate the real-time features of TCP Boston
that allow communication bandwidth to be traded off for timeliness. We
start with an overview of the protocol. Next, we analytically
characterize the dynamic redundancy control features of TCP
Boston. Next, we present detailed simulation results that show the
superiority of our protocol when compared to other adaptations of
TCP/IP over ATMs. In particular, we show that TCP Boston improves
TCP/IP's performance over ATMs for both network-centric metrics (e.g.,
effective throughput and percent of missed deadlines) and
real-time application-centric metrics (e.g., response time and
jitter).
%R 1997-002
%T The Network Effects of Prefetching
%A Crovella, Mark
%A Barford, Paul
%D February 7, 1997
%U http://www.cs.bu.edu/techreports/1997-002-prefetcheff.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Prefetching has been shown to be an effective technique for reducing
user-perceived latency in distributed systems. In this paper we show
that even when prefetching adds no extra traffic to the network, it can
have serious negative performance effects. Straightforward approaches
to prefetching increase the burstiness of individual sources, leading to
increased average queue sizes in network switches. However, we
also show that applications can avoid the undesirable queueing effects
of prefetching. In fact, we show that applications employing
prefetching can significantly improve network performance, to a level
much better than that obtained without any prefetching at all. This is
because prefetching offers increased opportunities for traffic shaping
that are not available in the absence of prefetching. Using a simple
transport rate control mechanism, a prefetching application can modify
its behavior from a distinctly ON/OFF entity to one whose data transfer
rate changes less abruptly, while still delivering all data in advance
of the user's actual requests.
%R 1997-003
%T Visible Volume: A Robust Measure for Protein Structure Characterization
%A LoConte, Loredana
%A Smith, Temple F.
%D March 20, 1997
%U http://www.cs.bu.edu/techreports/1997-003-visiblevolume.ps.Z
%I CS Department and BioMolecular Eng. Research Center, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose a new characterization of protein structure based on the
natural tetrahedral geometry of the beta carbon and a new geometric
measure of structural similarity, called visible volume. In our model,
the side-chains are replaced by an ideal tetrahedron, the orientation
of which is fixed with respect to the backbone and corresponds to the
preferred rotamer directions. Visible volume is a measure of the
non-occluded empty space surrounding each residue position after the
side-chains have been removed. It is a robust, parameter-free,
locally-computed quantity that accounts for all spatial constraints
that are of relevance to the corresponding position in the native
structure. When computing visible volume, we ignore the nature of both
the residue observed at each site and the ones surrounding it. We
focus instead on the space that, together, these residues could
occupy. By doing so, we are able to quantify a new kind of invariance
beyond the apparent variations in a protein family, namely, the
conservation of the physical space that is available at structurally
equivalent positions for 3-D side-chain packing. Visible volume has
the unique property of estimating how much space can be used at each
site for different combinations of side-chains to fit in. This
property, and the relation of visible volume to the degree of exposure
of a residue position, qualify it as a powerful tool in a variety of
applications, from the detailed analysis of protein structure to the
definition of better scoring functions for threading purposes.
%R 1997-004
%T Determining WWW User's Next Access and Its Application to Pre-fetching
%A Cunha, Carlos R.
%A Jaccoud, Carlos F.B.
%D March 24, 1997
%U http://www.cs.bu.edu/techreports/1997-004-userbehaviorprediction.ps.Z
%I CS Department, Boston University and Embratel, Brazil
%Z Wed, 16 May 2012 14:43:22 GMT
%X
World-Wide Web (WWW) services have grown to levels where significant
delays are expected to happen. Techniques like pre-fetching are likely
to help users to personalize their needs, reducing their waiting times.
However, pre-fetching is only effective if the right documents are
identified and if the user's next move is correctly predicted. Otherwise,
pre-fetching will only waste bandwidth. Therefore, it is productive to
determine whether a revisit will occur or not, before starting
pre-fetching. In this paper we develop two user models that help
determine the user's next move. One model uses a Random Walk
approximation and the other is based on Digital Signal Processing
techniques. We also give hints on how to use such models with a simple
pre-fetching technique that we are developing.
This is an extended version of the article with the same title
presented at the International Symposium on Computers and
Communication'97, Alexandria, Egypt, 1-3 July, 1997.
%R 1997-005
%T ImageRover: A Content-Based Image Browser for the World Wide Web
%A Sclaroff, Stan
%A Taycher, Leonid
%A LaCascia, Marco
%D March 24, 1997
%U http://www.cs.bu.edu/techreports/1997-005-imagerover.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
ImageRover is a search-by-image-content navigation tool for the World
Wide Web. To gather images expediently, the image collection subsystem
utilizes a distributed fleet of WWW robots running on different
computers. The image robots gather information about the images they
find, computing the appropriate image decompositions and indices, and
store this extracted information in vector form for searches based on
image content. At search time, users can iteratively guide the search
through the selection of relevant examples. Search performance is made
efficient through the use of an approximate, optimized k-d tree
algorithm. The system employs a novel relevance feedback algorithm that
selects the Lm distance metrics appropriate for a particular query.
%R 1997-006
%T Generating Representative Web Workloads for Network and Server Performance Evaluation
%A Barford, Paul
%A Crovella, Mark
%D May 5, 1997 (revised November 4, 1997)
%U http://www.cs.bu.edu/techreports/1997-006-surge.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
One role for workload generation is as a means for understanding how
servers and networks respond to variation in load. This enables
management and capacity planning based on current and projected usage.
This paper applies a number of observations of Web server usage to
create a realistic Web workload generation tool which mimics a set of
real users accessing a server. The tool, called SURGE (Scalable URL
Reference Generator), generates references matching empirical
measurements of 1) server file size distribution; 2) request size
distribution; 3) relative file popularity; 4) embedded file
references; 5) temporal locality of reference; and 6) idle periods of
individual users. This paper reviews the essential elements required
in the generation of a representative Web workload. It also addresses
the technical challenges to satisfying this large set of simultaneous
constraints on the properties of the reference stream, the solutions
we adopted, and their associated accuracy. Finally, we present
evidence that SURGE exercises servers in a manner significantly
different from other Web server benchmarks.
%R 1997-007
%T Real-Time Mutable Broadcast Disks
%A Baruah, Sanjoy
%A Bestavros, Azer
%D May 5, 1997
%U http://www.cs.bu.edu/techreports/1997-007-mutable-bdisks.ps.Z
%I EE/CS Department, University of Vermont; CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
There is an increased interest in using broadcast disks to support
mobile access to real-time databases. However, previous work has only
considered the design of real-time immutable broadcast disks, the
contents of which do not change over time. This paper considers the
design of programs for real-time mutable broadcast disks --- broadcast
disks whose contents are occasionally updated. Recent
scheduling-theoretic results relating to pinwheel scheduling and pfair
scheduling are used to design algorithms for the efficient generation
of real-time mutable broadcast disk programs.
%R 1997-008
%T Active Blobs
%A Sclaroff, Stan
%A Isidoro, John
%D May 5, 1997
%U http://www.cs.bu.edu/techreports/1997-008-activeblobs.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Active blobs, a new region-based approach to nonrigid motion tracking, is
described. Active blobs employ a view-based representation; each
object is defined in terms of a deformable, active blob of color pixels.
Shape is defined in terms of a triangulated finite element model that
captures object shape plus a color texture map that captures
object appearance. Active blobs also provide normalization with respect to
some photometric variations. Nonrigid shape registration and motion
recovery are achieved by posing the problem as an energy-based, robust
minimization procedure. The active blob formulation is robust to
occlusions, shadows, and specular highlights.
%R 1997-009
%T Load Profiling for Efficient Route Selection in Multi-Class Networks
%A Bestavros, Azer
%A Matta, Ibrahim
%D May 14, 1997
%U http://www.cs.bu.edu/techreports/1997-009-route-profiling.ps.Z
%I CS Department, Boston University and CS Department, Northeastern University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
High-speed networks, such as ATM networks, are expected to
support diverse Quality of Service (QoS) constraints, including
real-time QoS guarantees. Real-time QoS is required by many
applications such as those that involve voice and video communication.
To support such services, routing algorithms that allow applications
to reserve the needed bandwidth over a Virtual Circuit (VC) have been
proposed. Commonly, these bandwidth-reservation algorithms assign VCs
to routes using the least-loaded concept, and thus result in balancing
the load over the set of all candidate routes.
In this paper, we show that for such reservation-based
protocols---which allow for the exclusive use of a preset fraction of
a resource's bandwidth for an extended period of time---load balancing
is not desirable as it results in resource fragmentation, which
adversely affects the likelihood of accepting new reservations. In
particular, we show that load-balancing VC routing algorithms are not
appropriate when the main objective of the routing protocol is to
increase the probability of finding routes that satisfy incoming VC
requests, as opposed to equalizing the bandwidth utilization along the
various routes. We present an on-line VC routing scheme that is based
on the concept of ``load profiling'', which allows a distribution of
``available'' bandwidth across a set of candidate routes to match the
characteristics of incoming VC QoS requests. We show the
effectiveness of our load-profiling approach when compared to
traditional load-balancing and load-packing VC routing schemes.
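The fragmentation effect described above can be illustrated with a small sketch. This is not the report's algorithm or its simulation setup: the two-route topology, the capacities, and the request sizes are all made-up values, and only the least-loaded (balancing) and most-loaded (packing) heuristics are shown, not load profiling itself.

```python
def route_requests(capacities, requests, choose):
    """Admit each bandwidth request on a route picked by `choose`.

    Returns (accepted, rejected) lists of request sizes.
    """
    free = list(capacities)  # available bandwidth per candidate route
    accepted, rejected = [], []
    for bw in requests:
        candidates = [i for i, f in enumerate(free) if f >= bw]
        if not candidates:
            rejected.append(bw)
            continue
        i = choose(candidates, free)
        free[i] -= bw
        accepted.append(bw)
    return accepted, rejected


def least_loaded(candidates, free):
    # Load balancing: pick the feasible route with the most free bandwidth.
    return max(candidates, key=lambda i: free[i])


def most_loaded(candidates, free):
    # Load packing: pick the feasible route with the least free bandwidth.
    return min(candidates, key=lambda i: free[i])


# Hypothetical workload: ten small reservations, then one large one.
capacities = [10, 10]
requests = [1] * 10 + [6]

balanced = route_requests(capacities, requests, least_loaded)
packed = route_requests(capacities, requests, most_loaded)
```

Under balancing, the ten small reservations leave 5 units free on each route, so the 6-unit request is rejected even though 10 units are free in total; under packing, one route stays empty and the request is admitted.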
%R 1997-010
%T Concurrency Admission Control Management in ACCORD
%A Nagy, Sue
%A Bestavros, Azer
%D May 15, 1997
%U http://www.cs.bu.edu/techreports/1997-010-accord-cacm.ps.Z
%I CS Department, Boston University and OSF
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose and evaluate admission control mechanisms for ACCORD, an
Admission Control and Capacity Overload management Real-time Database
framework---an architecture and a transaction model---for hard
deadline RTDB systems. The system architecture consists of admission
control and scheduling components which provide early notification of
failure to submitted transactions that are deemed not valuable or
incapable of completing on time. In this paper, we focus on our
Concurrency Admission Control Manager (CACM), which ensures that
admitted transactions do not overburden the system by requiring a
level of concurrency that is not sustainable. The transaction model
consists of two components: a primary task and a compensating task.
The execution requirements of the primary task are not known a priori,
whereas those of the compensating task are known a priori. Upon the
submission of a transaction, the Admission Control Mechanisms are
employed to decide whether to admit or reject that transaction. Once
admitted, a transaction is guaranteed to finish executing before its
deadline. A transaction is considered to have finished executing if
exactly one of two things occurs: Either its primary task is completed
(successful commitment), or its compensating task is completed (safe
termination). Committed transactions bring a profit to the system,
whereas a terminated transaction brings no profit. The goal of the
admission control and scheduling protocols (e.g., concurrency control,
I/O scheduling, memory management) employed in the system is to
maximize system profit. In that respect, we describe a number of
concurrency admission control strategies and contrast (through
simulations) their relative performance.
%R 1997-011
%T Reliability, Availability, Dependability and Performability: A User-centered View
%A Heddaya, Abdelsalam
%A Helal, Abdelsalam
%D May 15, 1997
%U http://www.cs.bu.edu/techreports/1997-011-reliability-def.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Reliability and availability have long been considered twin system
properties that could be enhanced by distribution. Paradoxically, the
traditional definitions of these properties do not recognize the
positive impact of recovery---as distinct from simple repair and
restart---on reliability, nor the negative effect of recovery, and of
internetworking of clients and servers, on availability. As a result
of employing the standard definitions, reliability would tend to be
underestimated, and availability overestimated.
We offer revised definitions of these two critical metrics, which we
call service reliability and service availability, that improve the
match between their formal expression and intuitive meaning. A
fortuitous advantage of our approach is that the product of our two
metrics yields a highly meaningful figure of merit for the overall
dependability of a system. But techniques that enhance system
dependability exact a performance cost, so we conclude with a cohesive
definition of performability that rewards the system for performance
that is delivered to its client applications, after discounting the
following consequences of failure: service denial and interruption,
lost work, and recovery cost.
%R 1997-012
%T On the Interaction Between an Operating System and Web Server
%A Yates, David J.
%A Almeida, Virgilio
%A Almeida, Jussara M.
%D July 16, 1997
%U http://www.cs.bu.edu/techreports/1997-012-interaction-os-webserver.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper examines how and why web server performance changes as the
workload at the server varies. We measure the performance of a PC acting
as a standalone web server, running Apache on top of Linux. We use two
important tools to understand what aspects of software architecture and
implementation determine performance at the server. The first is a tool
that we developed, called WebMonitor, which measures activity and resource
consumption, both in the operating system and in the web server. The
second is the kernel profiling facility distributed as part of Linux. We
vary the workload at the server along two important dimensions: the number
of clients concurrently accessing the server, and the size of the documents
stored on the server. Our results quantify and show how more clients and
larger files stress the web server and operating system in different and
surprising ways. Our results also show the importance of fixed costs
(i.e., opening and closing TCP connections, and updating the server log) in
determining web server performance.
%R 1997-013
%T Evaluation of a Load Profiling Approach to Routing Guaranteed Bandwidth Flows
%A Matta, Ibrahim
%A Bestavros, Azer
%D July 30, 1997
%U http://www.cs.bu.edu/techreports/1997-013-route-profiling-evaluation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
To support the diverse Quality of Service (QoS) requirements of
real-time (e.g. audio/video) applications in integrated services
networks, several routing algorithms that allow for the reservation of
the needed bandwidth over a Virtual Circuit (VC) established on one of
several candidate routes have been proposed. Traditionally, such
routing is done using the least-loaded concept, and thus results in
balancing the load across the set of candidate routes. In a recent
study, we have established the inadequacy of this load balancing
practice and proposed the use of load profiling as an alternative.
Load profiling techniques allow the distribution of ``available''
bandwidth across a set of candidate routes to match the
characteristics of incoming VC QoS requests.
In this paper we thoroughly characterize the performance of VC routing
using load profiling and contrast it to routing using load balancing
and load packing. We do so both analytically and via extensive
simulations of multi-class traffic routing in Virtual Path (VP) based
networks. Our findings confirm that for routing guaranteed bandwidth
flows in VP networks, load balancing is not desirable as it results in
VP bandwidth fragmentation, which adversely affects the likelihood of
accepting new VC requests. This fragmentation is more pronounced when
the granularity of VC requests is large. Typically, this occurs when a
common VC is established to carry the aggregate traffic flow of many
high-bandwidth real-time sources. For VP-based networks, our
simulation results show that our load-profiling VC routing scheme
performs as well as or better than the traditional load-balancing VC
routing in terms of revenue under both skewed and uniform workloads.
Furthermore, load-profiling routing improves routing fairness by
proactively increasing the chances of admitting high-bandwidth
connections.
%R 1997-014
%T Image Digestion and Relevance Feedback in the ImageRover WWW Search Engine
%A Taycher, Leonid
%A La Cascia, Marco
%A Sclaroff, Stan
%D August 14, 1997
%U http://www.cs.bu.edu/techreports/1997-014-imagedigestion.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
ImageRover is a search-by-image-content navigation tool for the
World Wide Web. The staggering size of the WWW dictates certain
strategies and algorithms for image collection, digestion, indexing,
and user interface. This paper describes two key components of the
ImageRover strategy: image digestion and relevance feedback. Image
digestion occurs during image collection; robots digest the images
they find, computing image decompositions and indices, and storing
this extracted information in vector form for searches based on image
content. Relevance feedback occurs during index search; users can
iteratively guide the search through the selection of relevant
examples. ImageRover employs a novel relevance feedback algorithm to
determine the weighted combination of image similarity metrics
appropriate for a particular query. ImageRover is
available and running on the web site.
%R 1997-015
%T Admission Control and Scheduling for High Performance WWW Servers
%A Bestavros, Azer
%A Katagai, Naomi
%A Londono, Jorge
%D August 21, 1997
%U http://www.cs.bu.edu/techreports/1997-015-web-admission-control-and-scheduling
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper we examine a number of admission control and scheduling
protocols for high-performance web servers. In particular, we propose
the use of a 2-phase policy for serving HTTP requests. The first
``registration'' phase involves establishing the TCP connection for
the HTTP request and parsing/interpreting its arguments, whereas the
second ``service'' phase involves the service/transmission of data in
response to the HTTP request. By introducing a delay between these two
phases, we show that the performance of a web server could be improved
significantly through the adoption of a number of scheduling policies
that optimize the utilization of various system components
(e.g. memory cache and I/O). In addition to its promise for
improving the performance of a single web server, the delineation
between the registration and service phases of an HTTP request may be
useful for load balancing purposes on clusters of web servers. We are
investigating the use of such a mechanism as part of the Commonwealth
testbed being developed at Boston University.
%R 1997-016
%T Discovering Spatial Locality in WWW Access Patterns using Data Mining of Document Clusters in Server Logs
%A Bestavros, Azer
%D August 28, 1997
%U http://www.cs.bu.edu/techreports/1997-016-www-spacial-locality-mining.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper, we introduce the notion of a ``document cluster'' in
WWW space as a generalization of the notion of a ``cache line'' in
linear memory address space. Through the analysis of Web server logs,
we show evidence of the spatial locality of reference in WWW access
patterns and present an implementation of an efficient data mining
algorithm that discovers document clusters. We show preliminary
simulation results that quantify the benefits of using document
clusters for file allocation on server disks, as well as for purposes
of prefetching into server cache/main memory.
%R 1997-017
%T To queue or not to queue?: When FCFS is better than PS in a distributed system
%A Harchol-Balter, Mor
%A Crovella, Mark
%A Murta, Cristina
%D October 31, 1997
%U http://www.cs.bu.edu/techreports/1997-017-queue-or-not.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We examine the question of whether to employ the first-come-first-served
(FCFS) discipline or the processor-sharing (PS) discipline at the nodes
in a distributed server system. We are interested in the case in which
service times are drawn from a heavy-tailed distribution, and so have
very high variability. Traditional wisdom in such a situation would
prefer the PS discipline, because it allows small tasks to avoid being
delayed behind large tasks in a queue. However, we show that system
performance can actually be significantly better under FCFS queueing, if
a particular kind of task assignment is used. By task assignment, we
mean an algorithm that inspects incoming tasks and assigns them to hosts
for service. The policy we propose is called SITA-E: Size Interval Task
Assignment with Equal Load; it is a static policy that does not
incorporate feedback knowledge of the state of the hosts. Surprisingly,
under SITA-E, FCFS queueing typically outperforms the PS discipline by a
factor of about two, as measured by mean waiting time and mean slowdown
(waiting time of task divided by its service time). We analyze the
FCFS/SITA-E policy and compare it to the processor-sharing case; in
addition we compare it in simulation to a number of other policies. We
show that the benefits of SITA-E are present even in small-scale
distributed systems (four or more hosts), and that SITA-E can in many
cases be more effective than a dynamic policy that takes into account
the current load at each host. Finally we discuss issues in employing
this policy in distributed Web servers.
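A minimal sketch of size-interval assignment with equal load follows. The abstract does not say how the size cutoffs are derived, so this sketch assumes they are estimated from a sample of observed task sizes; the function names `equal_load_cutoffs` and `assign_host` and the sample data are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def equal_load_cutoffs(sizes, hosts):
    """Pick size cutoffs so each size interval carries roughly equal total work.

    Assumption: cutoffs come from an empirical sample of task sizes.
    """
    ordered = sorted(sizes)
    running = list(accumulate(ordered))  # cumulative work, smallest tasks first
    total = running[-1]
    cutoffs = []
    for k in range(1, hosts):
        # First position at which cumulative work reaches k/hosts of the total.
        idx = bisect_left(running, total * k / hosts)
        cutoffs.append(ordered[idx])
    return cutoffs


def assign_host(size, cutoffs):
    """Static assignment: host i serves sizes up to and including cutoffs[i]."""
    return bisect_left(cutoffs, size)


# Hypothetical sample: ten unit-size tasks and one task of size 10.
sample = [1] * 10 + [10]
cutoffs = equal_load_cutoffs(sample, hosts=2)
small_host = assign_host(1, cutoffs)
large_host = assign_host(10, cutoffs)
```

With this sample, the ten small tasks and the single large task each account for half of the total work, so they land on different hosts. The assignment uses only the task size, never the current state of the hosts, which matches the static character of the policy.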
%R 1997-018
%T Task Assignment in a Distributed System: Improving Performance by Unbalancing Load
%A Crovella, Mark
%A Harchol-Balter, Mor
%A Murta, Cristina
%D October 31, 1997
%U http://www.cs.bu.edu/techreports/1997-018-unbalancing-load.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider the problem of task assignment in a distributed system (such as
a distributed Web server) in which task sizes are drawn from a heavy-tailed
distribution. Many task assignment algorithms are based on the heuristic
that balancing the load at the server hosts will result in optimal
performance. We show this conventional wisdom is less true when the task
size distribution is heavy-tailed (as is the case for Web file sizes). We
introduce a new task assignment policy, called Size Interval Task
Assignment with Variable Load (SITA-V). SITA-V purposely operates the
server hosts at different loads, and directs smaller tasks to the
lighter-loaded hosts. The result is that SITA-V provably decreases the
mean task slowdown by significant factors (up to 1000 or more) where the
more heavy-tailed the workload, the greater the improvement factor. We
evaluate the tradeoff between improvement in slowdown and increase in
waiting time in a system using SITA-V, and show conditions under which
SITA-V represents a particularly appealing policy. We conclude with a
discussion of the use of SITA-V in a distributed Web server, and show that
it is attractive because it has a simple implementation which requires no
communication from the server hosts back to the task router.
%R 1997-019
%T Color Region Grouping and Shape Recognition with Deformable Models
%A Liu, Lifeng
%A Sclaroff, Stan
%D November 24, 1997
%U http://www.cs.bu.edu/techreports/1997-019-deformable-color-region-grouping.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A new deformable shape-based method for color region segmentation is
described. The method includes two stages: over-segmentation using
a traditional color region segmentation algorithm, followed by
deformable model-based region merging via grouping and hypothesis
selection. During the second stage, region merging and object
identification are executed simultaneously. A statistical shape model is
used to estimate the likelihood of region groupings and model
hypotheses. The prior distribution on deformation parameters is
precomputed using principal component analysis over a training set of
region groupings. Once trained, the system autonomously segments
deformed shapes from the background, while not merging them with
similarly colored adjacent objects. Furthermore, the recovered
parametric shape model can be used directly in object recognition and
comparison. Experiments in segmentation and image retrieval are
reported.
%R 1997-020
%T Head Tracking via Robust Registration in Texture Map Images
%A La Cascia, Marco
%A Isidoro, John
%A Sclaroff, Stan
%D November 24, 1997
%U http://www.cs.bu.edu/techreports/1997-020-head-tracking.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A novel method for 3D head tracking in the presence of large head
rotations and facial expression changes is described. Tracking is
formulated in terms of color image registration in the texture map of a
3D surface model. Model appearance is recursively updated via image
mosaicking in the texture map as the head orientation varies. The
resulting dynamic texture map provides a stabilized view of the face
that can be used as input to many existing 2D techniques for face
recognition, facial expression analysis, lip reading, and eye tracking.
Parameters are estimated via a robust minimization procedure; this
provides robustness to occlusions, wrinkles, shadows, and specular
highlights. The system was tested on a variety of sequences taken with
low quality, uncalibrated video cameras. Experimental results are
reported.
%R 1997-021
%T Proceedings of the 18th Real-Time Systems Symposium WIP Session
%A Bestavros, Azer
%D December 1, 1997
%U http://www.cs.bu.edu/techreports/1997-021-ieee-rtss97-wip
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This technical report includes 10 short papers presented during the
WIP session of the 18th Real-Time Systems Symposium, held in
Washington DC on December 3-5, 1997. The titles and authors are
included below.
------
(1) CPU Reservations and Time Constraints:
Efficient, Predictable Scheduling of Independent Activities
Michael B. Jones, Microsoft Research, Microsoft Corporation
Daniela Rosu and Marcel-Catalin Rosu, Georgia Institute of Technology
Abstract:
Workstations and personal computers are increasingly being used
for applications with real-time characteristics such as speech
understanding and synthesis, media computations and I/O, and
animation, often concurrently executed with traditional
non-real-time workloads. This paper presents a system that can
schedule multiple independent activities so that:
- activities can obtain minimum guaranteed execution rates with
application-specified reservation granularities via CPU
Reservations,
- CPU Reservations, which are of the form "reserve X units of
time out of every Y units", provide not just an average case
execution rate of X/Y over long periods of time, but the
stronger guarantee that from any instant of time, by Y time
units later, the activity will have executed for at least X
time units,
- applications can use Time Constraints to schedule tasks by
deadlines, with on-time completion guaranteed for tasks with
accepted constraints, and
- both CPU Reservations and Time Constraints are implemented very
efficiently; in particular, CPU scheduling overhead is bounded by
a constant and is not a function of the number of schedulable
tasks.
Other key scheduler properties are:
- activities cannot violate other activities' guarantees,
- time constraints and CPU reservations may be used together,
separately, or not at all (which gives a round-robin
schedule), with well-defined interactions between all
combinations, and
- spare CPU time is fairly shared among all activities.
The Rialto operating system, developed at Microsoft Research,
achieves these goals by using a precomputed schedule, which is
the fundamental basis of this work.
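The difference between an average-case rate of X/Y and the stronger guarantee stated above can be checked with a short sketch. It assumes discrete time units and a recorded trace of when the activity ran; the function name `meets_reservation` and the traces are hypothetical, and this is not Rialto's scheduler.

```python
def meets_reservation(ran, x, y):
    """True if every window of y consecutive time units has at least x units run.

    `ran` is a list of 0/1 flags, one per time unit (1 = the activity ran).
    """
    if len(ran) < y:
        return True  # no complete window to check
    window = sum(ran[:y])
    if window < x:
        return False
    for t in range(y, len(ran)):
        # Slide the window one unit: add the new slot, drop the oldest.
        window += ran[t] - ran[t - y]
        if window < x:
            return False
    return True


# Both traces run 4 of 8 units, an average rate of 2/4.
bursty = [1, 1, 0, 0, 0, 0, 1, 1]
steady = [1, 1, 0, 0, 1, 1, 0, 0]

bursty_ok = meets_reservation(bursty, x=2, y=4)
steady_ok = meets_reservation(steady, x=2, y=4)
```

The bursty trace meets the average rate but contains a window of four units in which the activity runs only once, so it fails the stronger guarantee; the steady trace satisfies it in every window.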
------
(2) Characterizing Group Communication Middleware for a Real-time
Distributed System
L. M. Feeney, P. Bernadat, F. Travostino
The Open Group Research Institute
Abstract:
This paper presents our current work in characterizing the
behavior of a real-time dependable distributed system, which
must exhibit predictable behavior under load and in the presence
of partial failures. We focus on measuring the end-to-end
properties of the middleware which implements the real-time
process group service, specifically its membership and message
latency. The paper also describes the tools and techniques we
have developed, along with some of the practical issues that
arise in instrumenting a real-time distributed system.
------
(3) Real-Time Monitoring of the EIVIS Distributed Video-Server on Windows NT
M. Gergeleit and M. Mock
GMD - German National Research Center for Information Technology
Abstract:
JewelNT is a fine-grained, trace-based real-time monitoring tool
for Windows NT. It hooks into the NT kernel and provides full
information about NT's thread scheduling combined with
application-level timing information. JewelNT allows monitoring
a number of NT machines remotely controlled from one central
desktop. JewelNT was initially developed for the evaluation
and performance tuning of the distributed EIVIS video server, a
European ESPRIT project.
------
(4) Achieving Predictability and Responsiveness of Fault Recovery
Operations in Real-Time Systems.
Pedro Mejia-Alvarez, CINVESTAV-IPN, Seccion de Computacion, Mexico
Juan A. de la Puente, Universidad Politecnica de Madrid, Spain.
Abstract:
The dependability of real-time software can be improved by
enhancing the robustness of the scheduler in predicting and
controlling the occurrence of timing failures during recovery.
This may be achieved by developing strategies which allow the
scheduler to dynamically control the manner in which real-time
application tasks and their time-critical recovery operations are
handled in time.
In this paper, a scheme is presented to provide scheduling
guarantees for a variety of fault-tolerant techniques. Bounds of
execution are developed and a case study is examined to analyze
the ability of these techniques to recover from transient
faults. A criterion for providing responsiveness in a
fault-tolerant scheduler is discussed and some approaches are
developed.
A responsiveness table, RTAB, has been developed for assisting
the scheduler during recovery from transient faults. This table is
based on different criteria for responsiveness of recovery. An
analytical characterization of the table for supporting on-line
scheduling has been developed. Some of the issues involved in
using this table to support run-time scheduling decisions are
illustrated with a hypothetical application example.
The advantages of the RTAB approach over previously proposed
scheduling policies for aperiodic tasks include the support for
run-time customization and guaranteed scheduling stability
during recovery.
------
(5) Compositional Reasoning about Real-Time Asynchronous
Communication with Time-Outs
D. Peticolas and F.A. Stomp, University of California, Davis
Abstract:
This paper describes ongoing work in developing a compositional
trace-based semantics and proof system for a real-time
language. The semantics models distributed processes
communicating over asynchronous FIFO communication
channels. Sending processes can specify time-out periods for
individual messages. Messages not received within their time-out
period are `lost'. Program behavior is modeled as traces of
events, including events (such as asynchronous messages) which
occur after termination. The proof system uses specification
triples with explicit variables for time and program traces.
------
(6) Exploring Consistency of Read-Only Transactions in Real-Time Systems
Kwok-Wa Lam, Sang H. Son* and Sheung-Lun Hung
City University of Hong Kong, Hong Kong.
University of Virginia, U.S.A.
Abstract:
In this paper, we describe our current work on exploring the
consistency of read-only transactions (ROT) in real-time
systems. A ROT is a transaction that only reads, but does not
update any data items. Since there is a significant proportion
of ROTs in several real-time systems, it is important to
investigate how to process ROTs efficiently with separate
algorithms. We identify three different consistency
requirements for ROTs. Particularly, we define a weaker form of
consistency, view consistency, which allows ROTs to perceive
different serialization orders of update transactions, thus
permitting non-serializable execution of transactions. However,
ROTs are still ensured to see consistent data. Based on view
consistency, we present two algorithms which let ROTs read the
most recent and consistent data without interfering with update
transactions. The recency of data read by a ROT could be
important in some real-time applications.
------
(7) Dynamic Timing Constraints - Relaxing Overconstraining
Specifications of Real-Time Systems
Gerhard Fohler
Malardalens University, S-72123 Vasteras, Sweden
Abstract:
Standard timing constraints, such as deadlines and periods, can
overconstrain specifications and lack expressive power. Only a few
tasks have "natural" periods and deadlines. Most are artifacts,
derived during system design. Knowledge of more flexibility is
abandoned in the process, thus overconstraining the
specification.
In this paper, we propose dynamic timing constraints, which
represent conditions for the temporal correctness rather than
fixed values for constraints such as period and deadline. This
is achieved by so-called timing entities, which combine a
functional unit, such as a task, with a feasibility function for
testing the feasibility of the timing of the unit. This
representation allows the system specification to provide
information about feasibility and various options of time
related design decisions.
We outline how dynamic timing constraints can be used with
standard scheduling algorithms, indicate modifications to these
algorithms, and novel approaches fully utilizing the benefits of
dynamic timing constraints.
------
(8) Exploring the Importance of Preprocessing Operations in
Real-Time Multiprocessor Scheduling
Jan Jonsson, Chalmers University of Technology, Sweden
Abstract:
Recent real-time scheduling research has mainly been focused on
generating mature scheduling theories. Therefore, the important
field of preprocessing operations has been left fairly
unexplored. Most real-time scheduling techniques in use today
assume that the constraints (e.g. local task deadlines, degree
of task replication, or task clustering) on the constituent
tasks are entirely known beforehand. In such cases, no
preprocessing is typically applied. However, when the
constraints are relaxed, preprocessing operations can be applied
for increasing the likelihood of succeeding with a scheduling
attempt. In addition, preprocessing operations are vital in
quality-of-service negotiations for adaptive real-time systems
since changing some of the task constraints may result in a
higher system reward.
In this paper, we define a set of preprocessing operations that
we believe is representative of real-time multiprocessor
scheduling. We also give a rationale for using these operations,
and present results from some preliminary work that corroborate
our conjecture. In conjunction with this, we present an evaluation
framework for objective studies of different preprocessing
operations.
------
(9) Compiler Support for Non-intrusive Monitoring and Debugging of
Real-Time Systems in the CRL Environment
P. V. Petrov, A. D. Stoyen
New Jersey Institute of Technology
Abstract:
In this work we approach the problem of monitoring and debugging
real-time distributed systems by performing static analysis and
transformations to eliminate obtrusion to the monitored system.
Our work extends the CRL testbed compiler and run-time
environment to support monitoring and logging for the purpose of
post-mortem debugging. The main contribution of this work is
the innovative use of compiler transformations and idle slots
for monitoring and logging.
------
(10) Optimization of Real-Time MRL Rule-Based Systems with the EQL Optimizer
Albert Mo Kim Cheng
University of Houston--University Park, Houston, Texas, USA
Abstract:
In our earlier work, we developed an efficient algorithm for
optimizing a class of EQL rule-based systems so that they can
meet specified response time constraints. In this paper, we
show that this EQL optimizer with minor modifications can be
used to optimize a class of real-time MRL rule-based systems.
As a more expressive superset of EQL, MRL allows existentially
quantified as well as universally quantified variables (simple
or macro), making its expressive power comparable to that of
OPS5 and CLIPS (two of the most popular commercially available
rule-based system languages) while maintaining predictable
response time behavior.
%R 1997-022
%T A Framework for Local Anonymity in the Internet
%A Martin, David M.
%D December 23, 1997
%U http://www.cs.bu.edu/techreports/1997-022-lanon.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We describe and evaluate options for providing anonymous IP service,
argue for the further investigation of local anonymity, and sketch a
framework for the implementation of locally anonymous networks.
%R 1998-001
%T Aggregating Congestion Information Over Sequences of TCP Connections
%A Bestavros, Azer
%A Hartmann, Olivier
%D January 5, 1998
%U http://www.cs.bu.edu/techreports/1998-001-aggregate-tcp-sequences
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper we present an extension of the TCP stack that allows a
sequence of TCP connections between the same machines to
share the congestion window. Our Linux implementation of this
scenario shows significant improvement in performance,
particularly when the individual connections are short-lived. Such
behavior is common on the web, due to the nature of the HTTP protocol
and the distribution of file sizes.
%R 1998-002
%T Reliable Cellular Automata with Self-Organization
%A Gacs, Peter
%D January 15, 1998
%U http://www.cs.bu.edu/techreports/1998-002-long-ca-ms.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In a probabilistic cellular automaton in which all local transitions
have positive probability, the problem of keeping a bit of information
for more than a constant number of steps is nontrivial, even in an
infinite automaton.
Still, there is a solution in 2 dimensions, and this solution can be
used to construct a simple 3-dimensional discrete-time universal
fault-tolerant cellular automaton.
This technique does not help much to solve the following problems:
remembering a bit of information in 1 dimension; computing in
dimensions lower than 3; computing in any dimension with
non-synchronized transitions.
Our more complex technique organizes the cells in blocks that
perform a reliable simulation of a second (generalized) cellular
automaton.
The cells of the latter automaton are also organized in blocks,
simulating even more reliably a third automaton, etc.
Since all this (a possibly infinite hierarchy) is organized in
``software'', it must be under repair all the time from damage caused
by errors.
A large part of the problem is essentially self-stabilization:
recovering from a mess of arbitrary size and content caused by the
faults.
The present paper constructs an asynchronous one-dimensional
fault-tolerant cellular automaton, with the further feature of
``self-organization''.
The latter means that unless a large amount of input information must be
given, the initial configuration can be chosen to be periodic with a
small period.
%R 1998-003
%T Distributed Packet Rewriting and its Application to Scalable Server Architectures
%A Bestavros, Azer
%A Crovella, Mark
%A Liu, Jun
%A Martin, David
%D February 1, 1998
%U http://www.cs.bu.edu/techreports/1998-003-dpr.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
To construct high performance Web servers, system builders are
increasingly turning to distributed designs. An important challenge
that arises in distributed Web servers is the need to direct incoming
connections to individual hosts. Previous methods for connection
routing have employed a centralized node which handles all incoming
requests. In contrast, we propose a distributed approach, called
Distributed Packet Rewriting (DPR), in which all hosts of the
distributed system participate in connection routing. We argue that
this approach promises better scalability and fault-tolerance than the
centralized approach. We describe our implementation of four variants
of DPR and compare their performance. We show that DPR provides
performance comparable to centralized alternatives, measured in terms
of throughput and delay under the SPECweb96 benchmark. Finally, we
argue that DPR is particularly attractive both for small scale systems
and for systems following the emerging trend toward increasingly
intelligent I/O subsystems.
%R 1998-004
%T Combining Textual and Visual Cues for Content-based Image Retrieval on the World Wide Web
%A La Cascia, Marco
%A Sethi, Sarathendu
%A Sclaroff, Stan
%D February 9, 1998
%U http://www.cs.bu.edu/techreports/1998-004-combining-text-and-vis-cues.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Some WWW image engines allow the user to form a query in terms of text
keywords. To build the image index, keywords are extracted
heuristically from HTML documents containing each image, and/or from
the image URL and file headers. Unfortunately, text-based image
engines have merely retro-fitted standard SQL database query methods,
and it is difficult to include image cues within such a framework. On
the other hand, visual statistics ({\em e.g.}, color histograms) are
often insufficient for helping users find desired images in a vast WWW
index. By truly unifying textual and visual statistics, one would
expect to get better results than either used separately.
In this paper, we propose an approach that allows the combination of
visual statistics with textual statistics in the vector space
representation commonly used in query by image content systems. Text
statistics are captured in vector form using latent semantic indexing
(LSI). The LSI index for an HTML document is then associated with
each of the images contained therein. Visual statistics ({\em e.g.},
color, orientedness) are also computed for each image. The LSI and
visual statistic vectors are then combined into a single index vector
that can be used for content-based search of the resulting image
database. By using an integrated approach, we are able to take
advantage of possible statistical couplings between the topic of the
document (latent semantic content) and the contents of images (visual
statistics). This allows improved performance in conducting
content-based search. This approach has been implemented in a WWW
image search engine prototype.
%R 1998-005
%T Preserving Bandwidth Through A Lazy Packet Discard Policy in ATM Networks
%A Kim, Gitae
%A Bestavros, Azer
%D February 9, 1998
%U http://www.cs.bu.edu/techreports/1998-005-lpd.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A number of recent studies have pointed out that TCP's performance
over ATM networks tends to suffer, especially under congestion and
switch buffer limitations. Switch-level enhancements and link-level
flow control have been proposed to improve TCP's performance in ATM
networks. Selective Cell Discard (SCD) and Early Packet Discard (EPD)
ensure that partial packets are discarded from the network "as early
as possible", thus reducing wasted bandwidth. While such techniques
improve the achievable throughput, their effectiveness tends to
degrade in multi-hop networks.
In this paper, we introduce Lazy Packet Discard (LPD), an AAL-level
enhancement that improves effective throughput, reduces response time,
and minimizes wasted bandwidth for TCP/IP over ATM. In contrast to the
SCD and EPD policies, LPD delays as much as possible the removal from
the network of cells belonging to a partially communicated packet. We
outline the implementation of LPD and show the performance advantage
of TCP/LPD, compared to plain TCP and TCP/EPD, through analysis and
simulations.
%R 1998-006
%T Active Voodoo Dolls: A Vision Based Input Device for Non-rigid Control
%A Isidoro, John
%A Sclaroff, Stan
%D February 16, 1998
%U http://www.cs.bu.edu/techreports/1998-006-active-voodoo-dolls.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A vision based technique for non-rigid control is presented that can be
used for animation and video game applications. In front of a camera,
the user grasps a soft, squishable object that can be moved and deformed
in order to specify motion. Active Blobs, a non-rigid tracking technique,
is used to recover the position, rotation and non-rigid deformations of
the object. The resulting transformations can be applied to a texture
mapped mesh, thus allowing the user to control it interactively. Our
use of texture mapping hardware allows us to make the system responsive
enough for interactive animation and video game character control.
%R 1998-007
%T Improved Tracking of Multiple Humans with Trajectory Prediction and Occlusion Modeling
%A Rosales, Romer
%A Sclaroff, Stan
%D March 2, 1998
%U http://www.cs.bu.edu/techreports/1998-007-tracking-multiple-humans.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A combined 2D, 3D approach is presented that allows for robust tracking
of moving bodies in a given environment as observed via a single,
uncalibrated video camera. Tracking is robust even in the presence of
occlusions. Low-level features are often insufficient for detection,
segmentation, and tracking of non-rigid moving objects. Therefore, an
improved mechanism is proposed that combines low-level (image
processing) and mid-level (recursive trajectory estimation) information
obtained during the tracking process. The resulting system can segment
and maintain the tracking of moving objects before, during, and after
occlusion. At each frame, the system also extracts a stabilized
coordinate frame of the moving objects. This stabilized frame is used
to resize and resample the moving blob so that it can be used as input
to motion recognition modules. The approach enables robust tracking
without constraining the system to know the shape of the objects being
tracked beforehand, although some assumptions are made about the
characteristics of the shape of the objects, and how they evolve with
time. Experiments in tracking moving people are described.
%R 1998-008
%T Determining Acceptance Possibility for a Quantum Computation is Hard for PH
%A Fenner, Stephen
%A Green, Frederic
%A Homer, Steven
%A Pruim, Randall
%D April 2, 1998
%U http://www.cs.bu.edu/techreports/1998-008-quant.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
It is shown that determining whether a quantum computation has a
non-zero probability of accepting is at least as hard as the
polynomial time hierarchy. This hardness result also applies to
determining in general whether a given quantum basis state appears
with nonzero amplitude in a superposition, or whether a given quantum
bit has positive expectation value at the end of a quantum
computation.
%R 1998-009
%T Slack Stealing Job Admission Control Scheduling
%A Atlas, Alia
%A Bestavros, Azer
%D May 2, 1998
%U http://www.cs.bu.edu/techreports/1998-009-ssjac.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper, we present Slack Stealing Job Admission Control
(SSJAC)---a methodology for scheduling periodic firm-deadline tasks
with variable resource requirements, subject to controllable Quality
of Service (QoS) constraints. In a system that uses Rate Monotonic
Scheduling, SSJAC augments the slack stealing algorithm of Thuel et al.
with an admission control policy to manage the variability in the
resource requirements of the periodic tasks. This enables SSJAC to
take advantage of the 31% of utilization that RMS cannot use, as well
as any utilization unclaimed by jobs that are not admitted into the
system.
Using SSJAC, each task in the system is assigned a resource
utilization threshold that guarantees the minimal acceptable QoS for
that task (expressed as an upper bound on the rate of missed
deadlines). Job admission control is used to ensure that (1) only
those jobs that will complete by their deadlines are admitted, and (2)
tasks do not interfere with each other; thus, a job can monopolize only
the slack in the system, but not the time guaranteed to jobs of other
tasks.
We have evaluated SSJAC against RMS and Statistical RMS (SRMS).
Ignoring overhead issues, SSJAC consistently provides better
performance than RMS in overload, and, in certain conditions, better
performance than SRMS. In addition, to evaluate optimality of SSJAC
in an absolute sense, we have characterized the performance of SSJAC
by comparing it to an inefficient, yet optimal scheduler for task sets
with harmonic periods.
%R 1998-010
%T Statistical Rate Monotonic Scheduling
%A Atlas, Alia
%A Bestavros, Azer
%D May 2, 1998
%U http://www.cs.bu.edu/techreports/1998-010-srms.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper we present Statistical Rate Monotonic Scheduling (SRMS),
a generalization of the classical RMS results of Liu and Layland
that allows scheduling periodic tasks with highly variable execution
times and statistical QoS requirements. Similar to RMS, SRMS
has two components: a feasibility test and a scheduling
algorithm. The feasibility test for SRMS ensures that using SRMS'
scheduling algorithms, it is possible for a given periodic task set
to share a given resource (e.g. a processor, communication
medium, switching device, etc.) in such a way that such sharing does
not result in the violation of any of the periodic tasks' QoS
constraints.
The SRMS scheduling algorithm incorporates a number of unique
features. First, it allows for fixed priority scheduling that keeps
the tasks' value (or importance) independent of their
periods. Second, it allows for job admission control, which allows
the rejection of jobs that are not guaranteed to finish by their
deadlines as soon as they are released, thus enabling the system to
take necessary compensating actions. Also, admission control allows
the preservation of resources since no time is spent on jobs that
will miss their deadlines anyway. Third, SRMS integrates
reservation-based and best-effort resource scheduling seamlessly.
Reservation-based scheduling ensures the delivery of the minimal
requested QoS; best-effort scheduling ensures that unused, reserved
bandwidth is not wasted, but rather used to improve QoS
further. Fourth, SRMS allows a system to deal gracefully with
overload conditions by ensuring a fair deterioration in QoS across
all tasks---as opposed to penalizing tasks with longer periods, for
example. Finally, SRMS has the added advantage that its
schedulability test is simple and its scheduling algorithm has a
constant overhead in the sense that the complexity of the scheduler
is not dependent on the number of the tasks in the system.
We have evaluated SRMS against a number of alternative scheduling
algorithms suggested in the literature (e.g. RMS and slack
stealing), as well as refinements thereof, which we describe in this
paper. Consistently throughout our experiments, SRMS provided the
best performance. In addition, to evaluate the optimality of SRMS,
we have compared it to an inefficient, yet optimal scheduler
for task sets with harmonic periods.
%R 1998-011
%T Multiplexing VBR Traffic Flows with Guaranteed Application-level QoS Using Statistical Rate Monotonic Scheduling
%A Atlas, Alia
%A Bestavros, Azer
%D May 2, 1998
%U http://www.cs.bu.edu/techreports/1998-011-srms-qos.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Quality of Service (QoS) guarantees are required by an increasing
number of applications to ensure a minimal level of fidelity in the
delivery of application data units through the network.
Application-level QoS does not necessarily follow from any
transport-level QoS guarantees regarding the delivery of the
individual cells (e.g. ATM cells) which comprise the application's
data units. The distinction between application-level and
transport-level QoS guarantees is due primarily to the fragmentation
that occurs when transmitting large application data units (e.g. IP
packets, or video frames) using much smaller network cells, whereby
the partial delivery of a data unit is useless; and, bandwidth spent
to partially transmit the data unit is wasted.
The data units transmitted by an application may vary in size while
being constant in rate, which results in a variable bit rate (VBR)
data flow. That data flow requires QoS guarantees. Statistical
multiplexing is inadequate, because no guarantees can be made and no
firewall property exists between different data flows. In this
paper, we present a novel resource management paradigm for the
maintenance of application-level QoS for VBR flows. Our paradigm is
based on Statistical Rate Monotonic Scheduling (SRMS), in which (1)
each application generates its variable-size data units at a fixed
rate, (2) the partial delivery of data units is of no value to the
application, and (3) the QoS guarantee extended to the application is
the probability that an arbitrary data unit will be successfully
transmitted through the network to/from the application.
%R 1998-012
%T The Statistical Rate Monotonic Scheduling Workbench
%A Atlas, Alia
%A Bestavros, Azer
%D May 2, 1998
%U http://www.cs.bu.edu/techreports/1998-012-srms-workbench
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The SRMS Workbench is a software system developed to demonstrate the
notion of Statistical QoS employed in SRMS [AtlasBestavros:1998]. The
SRMS Workbench includes: (1) the SRMS schedulability analyzer (QoS
negotiator), and (2) an SRMS simulator (Basic SRMS + all
extensions). These two components are packaged into a Java Applet that
can be executed remotely on any Java-capable Internet browser. For
comparison, other scheduling algorithms, including RMS
[LiuLayland:1973] and SSJAC [AtlasBestavros:1998] are included.
Through a simple GUI, the SRMS Workbench allows users to specify a set
of periodic tasks, each with (a) its own period, (b) the
distributional characteristics of its periodic resource requirements
(e.g. Poisson, Pareto, Normal, Exponential, Gamma, etc.), (c) its
desired QoS as a lower bound on the percentage of deadlines to be met,
and (d) a criticality/importance index indicating the value of the
task (relative to other tasks in the task set). Once the task set is
specified, the SRMS Workbench allows the user to check for
schedulability under SRMS. If the task set is schedulable, the SRMS
Workbench generates the appropriate allowance for each task and allows
the user to create an animated simulation of the task system, which
can be executed and profiled. If the task set is not schedulable, the
SRMS Workbench informs the user of that fact and suggests (as part of
the QoS negotiation) an alternative set of feasible QoS requirements
that reflects the specified criticality/importance index of the tasks
in the task set.
The SRMS Workbench is available on the Web at
http://www.cs.bu.edu/groups/realtime/SRMSworkbench
%R 1998-013
%T Design and Implementation of SRMS in Kurt Linux
%A Atlas, Alia
%A Bestavros, Azer
%D September 2, 1998
%U http://www.cs.bu.edu/techreports/1998-013-srms-linux-implementation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Statistical Rate Monotonic Scheduling (SRMS) is a generalization of
the classical RMS results of Liu and Layland \cite{ll:sched} for
periodic tasks with highly variable execution times and statistical
QoS requirements. The main tenet of SRMS is that the variability in
task resource requirements could be smoothed through aggregation to
yield guaranteed QoS. This aggregation is done over time for a given
task and across multiple tasks for a given period of time. Similar
to RMS, SRMS has two components: a feasibility test and a scheduling
algorithm. The SRMS feasibility test ensures that it is possible for a
given periodic task set to share a given resource without violating
any of the statistical QoS constraints imposed on each task in the
set. The SRMS scheduling algorithm consists of two parts: a job
admission controller and a scheduler. The SRMS scheduler is a
simple, preemptive, fixed-priority scheduler. The SRMS job admission
controller manages the QoS delivered to the various tasks through
admit/reject and priority assignment decisions. In particular, it
ensures the important property of task isolation, whereby tasks do
not infringe on each other.
In this paper we present the design and implementation of SRMS within
the KURT Linux Operating System. KURT Linux supports conventional
tasks as well as real-time tasks.
It provides a mechanism for transitioning from normal
Linux scheduling to a mixed scheduling of conventional and real-time
tasks, and to a focused mode where only real-time tasks are scheduled.
We overview the technical issues that we had to overcome in order to
integrate SRMS into KURT Linux and present the API we have developed
for scheduling periodic real-time tasks using SRMS.
%R 1998-014
%T An Omniscient Scheduling Oracle for Systems with Harmonic Periods
%A Atlas, Alia
%A Bestavros, Azer
%D September 2, 1998
%U http://www.cs.bu.edu/techreports/1998-014-omniscient-harmonic-scheduling.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Most real-time scheduling problems are known to be NP-complete.
To enable accurate comparison between the schedules of heuristic
algorithms and the optimal schedule, we introduce an omniscient
oracle. This oracle provides schedules for periodic task sets with
harmonic periods and variable resource requirements. Three different
job value functions are described and implemented. Each corresponds
to a different system goal.
The oracle is used to examine the performance of different on-line
schedulers under varying loads, including overload. We have compared
the oracle against Rate Monotonic Scheduling, Statistical Rate
Monotonic Scheduling, and Slack Stealing Job Admission Control
Scheduling. Consistently, the oracle provides an upper bound on
performance for the metric under consideration.
%R 1998-015
%T Principality and Decidable Type Inference for Finite-Rank Intersection Types
%A Kfoury, Assaf J.
%A Wells, Joe B.
%D November 6, 1998
%U http://www.cs.bu.edu/techreports/1998-015-finite-rank-intersection-types.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Principality of typings is the property that for each
typable term, there is a typing from which all other typings are
obtained via some set of operations. Type inference is the problem
of finding a typing for a given term, if possible. We define an
intersection type system which has principal typings and types
exactly the strongly normalizable $\lambda$-terms. More interestingly,
every finite-rank restriction of this system (using Leivant's first
notion of rank) has principal typings and also has decidable type
inference. This is in contrast to System~F where the finite rank
restriction for every finite rank at 3 and above has neither principal
typings nor decidable type inference. This is also in contrast to
earlier presentations of intersection types where the status of these
properties is not known for the finite-rank restrictions at 3 and above.
Furthermore, the notion of principal typings for our system involves
only one operation, substitution, rather than several operations
(not all substitution-based) as in earlier presentations of
principality for intersection types (of unrestricted rank).
A unification-based type inference algorithm is presented using a
new form of unification, $\beta$-unification.
%R 1998-016
%T A Performance Evaluation of Hyper Text Transfer Protocols
%A Barford, Paul
%A Crovella, Mark
%D October 23, 1998
%U http://www.cs.bu.edu/techreports/1998-016-http-protocols-evaluation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Version 1.1 of the Hyper Text Transfer Protocol (HTTP) was principally
developed as a means for reducing both document transfer latency and
network traffic. The rationale for the performance enhancements in HTTP/1.1
is based on the assumption that the network is the bottleneck in Web
transactions. In practice, however, the Web server can be the primary
source of document transfer latency. In this paper, we characterize and
compare the performance of HTTP/1.0 and HTTP/1.1 in terms of throughput at
the server and transfer latency at the client. Our approach
is based on considering a broader set of bottlenecks in an HTTP transfer;
we examine how bottlenecks in the network, the CPU, and the disk system
affect the relative performance of HTTP/1.0 versus HTTP/1.1. We show that
the network demands under HTTP/1.1 are somewhat lower than under HTTP/1.0, and we
quantify those differences in terms of packets transferred, server
congestion window size and data bytes per packet. We show that when the
CPU is the bottleneck, there is relatively little difference in performance
between HTTP/1.0 and HTTP/1.1. Surprisingly, we show that when the disk
system is the bottleneck, performance using HTTP/1.1 can be much worse
than with HTTP/1.0. Based on these observations, we suggest a connection
management policy for HTTP/1.1 that can improve throughput, decrease
latency, and keep network traffic low when the disk system is the bottleneck.
%R 1998-017
%T Deformable Shape Detection and Description via Model-Based Region Grouping
%A Liu, Lifeng
%A Sclaroff, Stan
%D December 4, 1998
%U http://www.cs.bu.edu/techreports/1998-017-deformable-shape-detection-and-description.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A method for deformable shape detection and recognition is described.
Deformable shape templates are used to partition the image into a
globally consistent interpretation, determined in part by the minimum
description length principle. Statistical shape models enforce the
prior probabilities on global, parametric deformations for each object
class. Once trained, the system autonomously segments deformed shapes
from the background, while not merging them with adjacent objects or
shadows. The formulation can be used to group image regions based on
any image homogeneity predicate; e.g., texture, color, or motion. The
recovered shape models can be used directly in object recognition.
Experiments with color imagery are reported.
Note: This TR supersedes BUCS-TR-1997-019
%R 1998-018
%T Fast, Reliable Head Tracking under Varying Illumination
%A La Cascia, Marco
%A Sclaroff, Stan
%D December 4, 1998
%U http://www.cs.bu.edu/techreports/1998-018-fast-reliable-head-tracking.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
An improved technique for 3D head tracking under varying illumination
conditions is proposed. The head is modeled as a texture-mapped
cylinder. Tracking is formulated as an image registration problem in the
cylinder's texture map image. To solve the registration problem in the
presence of lighting variation and head motion, the residual error of
registration is modeled as a linear combination of texture warping
templates and orthogonal illumination templates. Fast and stable on-line
tracking is then achieved via regularized, weighted least squares
minimization of the registration error. The regularization term tends to
limit potential ambiguities that arise in the warping and illumination
templates. It enables stable tracking over extended sequences. Tracking
does not require a precise initial fit of the model; the system is
initialized automatically using a simple 2-D face detector. The only
assumption is that the target is facing the camera in the first frame of
the sequence. The warping templates are computed at the first frame of
the sequence. Illumination templates are precomputed off-line over a
training set of face images collected under varying lighting conditions.
Experiments in tracking are reported.
%R 1998-019
%T 3D Trajectory Recovery for Tracking Multiple Objects and Trajectory Guided Recognition of Actions
%A Rosales, Romer
%A Sclaroff, Stan
%D December 4, 1998
%U http://www.cs.bu.edu/techreports/1998-019-3D-trajectory-guided-action-recog.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A mechanism is proposed that integrates low-level (image
processing), mid-level (recursive 3D trajectory estimation), and
high-level (action recognition) processes. It is assumed that the
system observes multiple moving objects via a single, uncalibrated
video camera. A novel extended Kalman filter formulation is used in
estimating the relative 3D motion trajectories up to a scale
factor. The recursive estimation process provides a prediction and
error measure that is exploited in higher-level stages of action
recognition. Conversely, higher-level mechanisms provide feedback
that allows the system to reliably segment and maintain the tracking
of moving objects before, during, and after occlusion. The 3D
trajectory, occlusion, and segmentation information are utilized in
extracting stabilized views of the moving object. Trajectory-guided
recognition (TGR) is proposed as a new and efficient method for
adaptive classification of action. The TGR approach is demonstrated
using ``motion history images'' that are then recognized via a
mixture-of-Gaussians classifier. The system was tested in recognizing
various dynamic human outdoor activities; e.g., running, walking,
roller blading, and cycling. Experiments with synthetic data sets are
used to evaluate stability of the trajectory estimator with respect to
noise.
%R 1998-020
%T Recognition of Human Action Using Moment-Based Features
%A Rosales, Romer
%D December 4, 1998
%U http://www.cs.bu.edu/techreports/1998-020-human-action-classification.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The performance of different classification approaches is evaluated
using a view-based approach for motion representation. The view-based
approach uses computer vision and image processing techniques to
register and process the video sequence. Two motion representations
called Motion Energy Images and Motion History Images are then
constructed. These representations collapse the temporal component in such
a way that no explicit temporal analysis or sequence matching is needed.
Statistical descriptions are then computed using moment-based features
and dimensionality reduction techniques. For these tests, we used 7 Hu
moments, which are invariant to scale and translation. Principal
Components Analysis is used to reduce the dimensionality of this
representation. The system is trained using different subjects
performing a set of examples of every action to be recognized. Given
these samples, K-nearest neighbor, Gaussian, and Gaussian mixture
classifiers are used to recognize new actions. Experiments are conducted
using instances of eight human actions (i.e., eight classes) performed
by seven different subjects. Comparisons of the performance of these
classifiers under different conditions are analyzed and reported. Our
main goals are to test this dimensionality-reduced representation of
actions, and more importantly to use this representation to compare the
advantages of different classification approaches in this recognition
task.
%R 1998-023
%T Changes in Web Client Access Patterns: Characteristics and Caching Implications
%A Barford, Paul
%A Bestavros, Azer
%A Bradley, Adam
%A Crovella, Mark
%D December 4, 1998
%U http://www.cs.bu.edu/techreports/1998-023-web-client-trace-changes-and-implications.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Understanding the nature of the workloads and system demands created by
users of the World Wide Web is crucial to properly designing and
provisioning Web services. Previous measurements of Web client
workloads have been shown to exhibit a number of characteristic
features; however, it is not clear how those features may be changing
with time. In this study we compare two measurements of Web client
workloads separated in time by three years, both captured from the same
computing facility at Boston University. The older dataset, obtained in
1995, is well-known in the research literature and has been the basis
for a wide variety of studies. The newer dataset was captured in 1998
and is comparable in size to the older dataset. The new dataset has the
drawback that the collection of users measured may no longer be
representative of general Web users; however, using it has the advantage
that many comparisons can be drawn more clearly than would be possible
using a new, different source of measurement. Our results fall into two
categories. First we compare the statistical and distributional
properties of Web requests across the two datasets. This serves to
reinforce and deepen our understanding of the characteristic statistical
properties of Web client requests. We find that the kinds of
distributions that best describe document sizes have not changed between
1995 and 1998, although specific values of the distributional parameters
are different. Second, we explore the question of how the observed
differences in the properties of Web client requests, particularly the
popularity and temporal locality properties, affect the potential for
Web file caching in the network. We find that for the computing
facility represented by our traces between 1995 and 1998, (1) the
benefits of using size-based caching policies have diminished; and (2)
the potential for caching requested files in the network has declined.
%R 1999-001
%T Load Balancing a Cluster of Web Servers using Distributed Packet Rewriting
%A Aversa, Luis
%A Bestavros, Azer
%D January 6, 1999
%U http://www.cs.bu.edu/techreports/1999-001-dpr-cluster-load-balancing
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper, we propose and evaluate an implementation of a
prototype scalable web server. The prototype consists of a
load-balanced cluster of hosts that collectively accept and service
TCP connections. The host IP addresses are advertised using the Round
Robin DNS technique, allowing any host to receive requests from any
client. Once a client attempts to establish a TCP connection with one
of the hosts, a decision is made as to whether or not the connection
should be redirected to a different host---namely, the host with the
lowest number of established connections. We use the low-overhead
Distributed Packet Rewriting (DPR) technique to redirect TCP
connections. In our prototype, each host keeps information about
connections in hash tables and linked lists. Every time a packet
arrives, it is examined to see if it has to be redirected or not. Load
information is maintained using periodic broadcasts amongst the
cluster hosts.
%R 1999-002
%T Trajectory Guided Tracking and Recognition of Actions
%A Rosales, Romer
%A Sclaroff, Stan
%D March 9, 1999
%U http://www.cs.bu.edu/techreports/1999-002-trajector-guided-tracking-and-action-recognition.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A combined 2D, 3D approach is presented that allows for robust
tracking of moving people and recognition of actions. It is assumed
that the system observes multiple moving objects via a single,
uncalibrated video camera. Low-level features are often insufficient
for detection, segmentation, and tracking of non-rigid moving
objects. Therefore, an improved mechanism is proposed that integrates
low-level (image processing), mid-level (recursive 3D trajectory
estimation), and high-level (action recognition) processes. A novel
extended Kalman filter formulation is used in estimating the relative
3D motion trajectories up to a scale factor. The recursive estimation
process provides a prediction and error measure that is exploited in
higher-level stages of action recognition. Conversely, higher-level
mechanisms provide feedback that allows the system to reliably segment
and maintain the tracking of moving objects before, during, and after
occlusion. The 3D trajectory, occlusion, and segmentation information
are utilized in extracting stabilized views of the moving object that
are then used as input to action recognition modules.
Trajectory-guided recognition (TGR) is proposed as a new and efficient
method for adaptive classification of action. The TGR approach is
demonstrated using ``motion history images'' that are then recognized
via a mixture-of-Gaussians classifier. The system was tested in
recognizing various dynamic human outdoor activities: running,
walking, roller blading, and cycling. Experiments with real and
synthetic data sets are used to evaluate stability of the trajectory
estimator with respect to noise.
(This technical report supersedes TRs [98-020] and [98-007])
%R 1999-003
%T Connection Scheduling in Web Servers
%A Crovella, Mark
%A Frangioso, Robert
%A Harchol-Balter, Mor
%D March 31, 1999
%U http://www.cs.bu.edu/techreports/1999-003-connection-scheduling-in-web-servers.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Under high loads, a Web server may be servicing many hundreds of
connections concurrently. In traditional Web servers, the question of
the order in which concurrent connections are serviced has been left to
the operating system. In this paper we ask whether servers might
provide better service by using non-traditional service ordering. In
particular, for the case when a Web server is serving static files, we
examine the costs and benefits of a policy that gives preferential
service to short connections. We start by assessing the scheduling
behavior of a commonly used server (Apache running on Linux) with
respect to connection size and show that it does not appear to provide
preferential service to short connections. We then examine the
potential performance improvements of a policy that does favor short
connections (shortest-connection-first). We show that
mean response time can be improved by factors of four or five under
shortest-connection-first, as compared to an (Apache-like)
size-independent policy. Finally we assess the costs of
shortest-connection-first scheduling in terms of unfairness (i.e., the
degree to which long connections suffer). We show
that under shortest-connection-first scheduling, long connections pay
very little penalty. This surprising result can be understood as a
consequence of heavy-tailed Web server workloads, in which most connections
are small, but most server load is due to the few large connections.
We support this explanation using analysis.
%R 1999-004
%T Measuring Web Performance in the Wide Area
%A Barford, Paul
%A Crovella, Mark
%D April 23, 1999
%U http://www.cs.bu.edu/techreports/1999-004-wide-area-web-measurement.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
One of the most vexing questions facing researchers interested in
the World Wide Web is why users often experience long delays in
document retrieval. The Internet's size, complexity, and continued
growth make this a difficult question to answer. We describe the Wide
Area Web Measurement project (WAWM) which uses an infrastructure
distributed across the Internet to study Web performance. The
infrastructure enables simultaneous measurements of Web client
performance, network performance and Web server performance. The
infrastructure uses a Web traffic generator to create representative
workloads on servers, and both active and passive tools to measure
performance characteristics. Initial results based on a prototype
installation of the infrastructure are presented in this paper.
%R 1999-005
%T Fast, Reliable Head Tracking under Varying Illumination: An Approach Based on Registration of Texture-Mapped 3D Models
%A La Cascia, Marco
%A Sclaroff, Stan
%A Athitsos, Vassilis
%D April 23, 1999
%U http://www.cs.bu.edu/techreports/1999-005-HeadTrack.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
An improved technique for 3D head tracking under varying
illumination conditions is proposed. The head is modeled as a
texture-mapped cylinder. Tracking is formulated as an image registration
problem in the cylinder's texture map image. The resulting dynamic
texture map provides a stabilized view of the face that can be used as
input to many existing 2D techniques for face recognition, facial
expressions analysis, lip reading, and eye tracking. To solve the
registration problem in the presence of lighting variation and head
motion, the residual error of registration is modeled as a linear
combination of texture warping templates and orthogonal illumination
templates. Fast and stable on-line tracking is achieved via
regularized, weighted least squares minimization of the registration
error. The regularization term tends to limit potential ambiguities
that arise in the warping and illumination templates. It enables
stable tracking over extended sequences. Tracking does not require a
precise initial fit of the model; the system is initialized
automatically using a simple 2D face detector. The only assumption is
that the target is facing the camera in the first frame of the
sequence. The formulation is tailored to take advantage of texture
mapping hardware available in many workstations, PCs, and game
consoles. The non-optimized implementation runs at about 15 frames per
second on an SGI O2 graphics workstation. Extensive experiments
evaluating the effectiveness of the formulation are reported. The
sensitivity of the technique to illumination, regularization
parameters, errors in the initial positioning and internal camera
parameters is analyzed. Examples and applications of tracking are
reported.
%R 1999-006
%T Non-Rigid Shape from Image Streams
%A Sclaroff, Stan
%A Alon, Jonathan
%D July 27, 1999
%U http://www.cs.bu.edu/techreports/1999-006-NonrigidShape.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a framework for estimating 3D relative structure
(shape) and motion given objects undergoing nonrigid deformation as
observed from a fixed camera, under perspective projection. Deforming
surfaces are approximated as piece-wise planar, and piece-wise rigid.
Robust registration methods allow tracking of corresponding image
patches from view to view and recovery of 3D shape despite occlusions,
discontinuities, and varying illumination conditions. Many relatively
small planar/rigid image patch trackers are scattered throughout the
image; resulting estimates of structure and motion at each patch are
combined over local neighborhoods via an oriented particle systems
formulation. Preliminary experiments have been conducted on real
image sequences of deforming objects and on synthetic sequences where
ground truth is known.
%R 1999-007
%T Combinations of Deformable Shape Prototypes
%A Sethi, Saratendu
%A Sclaroff, Stan
%D July 27, 1999
%U http://www.cs.bu.edu/techreports/1999-007-PrototypeCombinations.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose to investigate a model-based technique for encoding
non-rigid object classes in terms of object prototypes. Objects from
the same class can be parameterized by identifying shape and appearance
invariants of the class to devise low-level representations. The
approach presented here creates a flexible model for an object class
from a set of prototypes. This model is then used to estimate the
parameters of the low-level representation of novel objects as
combinations of the prototype parameters. Variations in the object
shape are modeled as non-rigid deformations. Appearance variations are
modeled as intensity variations. In the training phase, the system is
presented with several example prototype images. These prototype
images are registered to a reference image by a finite element-based
technique called Active Blobs. The deformations of the finite
element model to register a prototype image with the reference image
provide the shape description or shape vector for the
prototype. The shape vector for each prototype is then used to warp
the prototype image onto the reference image and obtain the
corresponding texture vector. The prototype texture vectors,
being warped onto the same reference image, have a pixel-by-pixel
correspondence with each other and hence are ``shape normalized''.
Given a sufficient number of prototypes that exhibit appropriate
in-class variations, the shape and the texture vectors define a linear
prototype subspace that spans the object class. Each prototype is a
vector in this subspace. The matching phase involves the estimation of
a set of combination parameters for synthesis of the novel object by
combining the prototype shape and texture vectors. The strengths of
this technique lie in the combined estimation of both shape and
appearance parameters. This is in contrast with previous
approaches where shape and appearance parameters were estimated
separately.
%R 1999-008
%T Optimal Scheduling of Secondary Content for Aggregation in Video-on-Demand Systems
%A Basu, Prithwish
%A Narayanan, Ashok
%A Ke, Wang
%A Little, Tom
%A Bestavros, Azer
%D July 27, 1999
%U http://www.cs.bu.edu/techreports/1999-008-vod-ad-scheduling.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Dynamic service aggregation techniques can exploit skewed access
popularity patterns to reduce the costs of building interactive VoD
systems. These schemes seek to cluster and merge users into single
streams by bridging the temporal skew between them, thus improving
server and network utilization. Rate adaptation and secondary content
insertion are two such schemes.
In this paper, we present and evaluate an optimal scheduling algorithm
for inserting secondary content in this scenario. The algorithm runs
in polynomial time, and is optimal with respect to the total bandwidth
usage over the merging interval. We present constraints on content
insertion which make the overall QoS of the delivered stream
acceptable, and show how our algorithm can satisfy these
constraints. We report simulation results which quantify the excellent
gains due to content insertion. We discuss dynamic scenarios with user
arrivals and interactions, and show that content insertion reduces the
channel bandwidth requirement to almost half. We also discuss
differentiated service techniques, such as N-VoD and premium
no-advertisement service, and show how our algorithm can support these
as well. (This report is cross listed as BU ECE Department Technical
Report: TR-12-16-98)
%R 1999-009
%T Popularity-Aware GreedyDual-Size Web Proxy Caching Algorithms
%A Jin, Shudong
%A Bestavros, Azer
%D August 21, 1999
%U http://www.cs.bu.edu/techreports/1999-009-gdsp-web-proxy-caching.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Web caching aims to reduce network traffic, server load, and
user-perceived retrieval delays by replicating ``popular'' content
on proxy caches that are strategically placed within the
network. While key to effective cache utilization, popularity
information (e.g. relative access frequencies of objects
requested through a proxy) is seldom incorporated directly in
cache replacement algorithms. Rather, other properties of the
request stream (e.g. temporal locality and content size), which are
easier to capture in an on-line fashion, are used to
indirectly infer popularity information, and hence drive cache
replacement policies. Recent studies suggest that the correlation
between these secondary properties and popularity is weakening due
in part to the prevalence of efficient client and proxy caches
(which tend to mask these correlations). This trend points to the
need for proxy cache replacement algorithms that directly capture
and use popularity information.
In this paper, we (1) present an on-line algorithm that effectively
captures and maintains an accurate popularity profile of Web objects
requested through a caching proxy, (2) propose a novel cache
replacement policy that uses such information to generalize the
well-known GreedyDual-Size algorithm, and (3) show the superiority
of our proposed algorithm by comparing it to a host of
recently-proposed and widely-used algorithms using extensive
trace-driven simulations and a variety of performance metrics.
%R 1999-010
%T A Fully Distributed Location Management Scheme for Large PCS
%A Ratnam, Karunaharan
%A Matta, Ibrahim
%A Rangarajan, Sampath
%D August 24, 1999
%U http://www.cs.bu.edu/techreports/1999-010-dist-location-mgmt-pcs.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In [previous papers] we presented the design, specification and proof
of correctness of a fully distributed location management scheme for
PCS networks and argued that fully replicating location information is
both appropriate and efficient for small PCS networks. In this paper,
we analyze the performance of this scheme. Then, we extend the scheme
to a hierarchical environment so as to scale to large PCS networks.
Through extensive numerical results, we show the superiority of our
scheme compared to the current IS-41 standard.
%R 1999-011
%T Boston University, Computer Science 1998 Proxy Trace
%A Bradley, Adam
%D September 7, 1999
%U http://www.cs.bu.edu/techreports/1999-011-usertrace-98-release-notes.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In a recent paper (Changes in Web Client Access Patterns:
Characteristics and Caching Implications by Barford, Bestavros,
Bradley, and Crovella) we performed a variety of analyses upon user
traces collected in the Boston University Computer Science department
in 1995 and 1998. A sanitized version of the 1995 trace has been
publicly available for some time; the 1998 trace has now been
sanitized, and is available from:
http://www.cs.bu.edu/techreports/1999-011-usertrace-98.gz
ftp://ftp.cs.bu.edu/techreports/1999-011-usertrace-98.gz
This memo discusses the format of this public version of the log,
and includes additional discussion of how the data was collected,
how the log was sanitized, what this log is and is not useful for,
and areas of potential future research interest.
%R 1999-012
%T Adaptive Reliable Multicast
%A Yoon, Jaehee
%A Bestavros, Azer
%A Matta, Ibrahim
%D September 15, 1999
%U http://www.cs.bu.edu/techreports/1999-012-arm.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
An increasing number of applications, such as distributed interactive
simulation, live auctions, distributed games and collaborative
systems, require the network to provide a reliable multicast
service. This service enables one sender to reliably transmit data
to multiple receivers. Reliability is traditionally achieved by
having receivers send negative acknowledgments (NACKs) to request
from the sender the retransmission of lost (or missing) data
packets. However, this Automatic Repeat reQuest (ARQ) approach
results in the well-known NACK implosion problem at the
sender. Many reliable multicast protocols have been recently
proposed to reduce NACK implosion, but the message overhead due to
NACK requests remains significant. Another approach, based on
Forward Error Correction (FEC), requires the sender to encode
additional redundant information so that a receiver can
independently recover from losses. However, due to the lack of
feedback from receivers, it is impossible for the sender to
determine how much redundancy is needed.
In this paper, we propose a new reliable multicast protocol, called
ARM (for Adaptive Reliable Multicast). Our protocol integrates
ARQ and FEC techniques. The objectives of ARM are to (1) reduce the
message overhead due to NACK requests, (2) reduce the amount of data
transmission, and (3) reduce the time it takes for all receivers to
receive the data intact (without loss). During data transmission,
the sender periodically informs the receivers of the number of
packets that are yet to be transmitted. Based on this information,
each receiver predicts whether this amount is enough to recover its
losses. Only if it is not enough does the receiver request the
sender to encode additional redundant packets. Using ns
simulations, we show the superiority of our hybrid ARQ-FEC protocol
over the well-known Scalable Reliable Multicast (SRM) protocol.
%R 1999-013
%T Search Space Reduction in QoS Routing
%A Guo, Liang
%A Matta, Ibrahim
%D October 8, 1999
%U http://www.cs.bu.edu/techreports/1999-013-search-qos-routing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
To provide real-time service or engineer constraint-based paths,
networks require the underlying routing algorithm to be able to find
low-cost paths that satisfy given Quality-of-Service (QoS)
constraints. However, the problem of constrained shortest (least-cost)
path routing is known to be NP-hard, and some heuristics have been
proposed to find a near-optimal solution. These heuristics
either impose relationships among the link metrics to reduce the
complexity of the problem, which may limit the general applicability of
the heuristic, or are too costly in terms of execution time to be
applicable to large networks. In this paper, we focus on solving the
delay-constrained minimum-cost path problem, and present a fast
algorithm to find a near-optimal solution. This algorithm, called
DCCR (for Delay-Cost-Constrained Routing), is a variant of the
k-shortest path algorithm. DCCR uses a new adaptive path weight
function together with an additional constraint imposed on the path
cost, to restrict the search space. Thus, DCCR can return a
near-optimal solution in a very short time. Furthermore, we use the
method proposed by Blokh and Gutin to further reduce the search space
by using a tighter bound on path cost. This makes our algorithm more
accurate and even faster. We call this improved algorithm SSR+DCCR
(for Search Space Reduction+DCCR). Through extensive simulations, we
confirm that SSR+DCCR performs very well compared to the optimal but
very expensive solution.
(This technical report revises TR NU-CCS-98-09.)
%R 1999-014
%T Temporal Locality in Web Request Streams: Sources, Characteristics, and Caching Implications
%A Jin, Shudong
%A Bestavros, Azer
%D October 10, 1999
%U http://www.cs.bu.edu/techreports/1999-014-web-temporal-locality.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Temporal locality of reference in Web request streams emerges from two
distinct phenomena: the popularity of Web objects and the {\em
temporal correlation} of requests. Capturing these two elements of
temporal locality is important because it enables cache replacement
policies to adjust how they capitalize on temporal locality based on
the relative prevalence of these phenomena. In this paper, we show
that temporal locality metrics proposed in the literature are unable
to delineate between these two sources of temporal locality. In
particular, we show that the commonly-used distribution of reference
interarrival times is predominantly determined by the power law
governing the popularity of documents in a request stream.
To capture (and more importantly quantify) both sources of temporal
locality in a request stream, we propose a new and robust metric that
enables accurate delineation between locality due to popularity and
that due to temporal correlation. Using this metric, we characterize
the locality of reference in a number of representative proxy cache
traces. Our findings show that there are measurable differences
between the degrees (and sources) of temporal locality across these
traces, and that these differences are effectively captured using our
proposed metric. We illustrate the significance of our findings by
summarizing the performance of a novel Web cache replacement
policy---called GreedyDual*---which exploits both long-term popularity
and short-term temporal correlation in an adaptive fashion. Our
trace-driven simulation experiments (which are detailed in an
accompanying Technical Report) show the superior performance of
GreedyDual* when compared to other Web cache replacement policies.
%R 1999-015
%T Estimation and Prediction of Evolving Color Distributions for Skin Segmentation Under Varying Illumination
%A Sigal, Leonid
%A Sclaroff, Stan
%D December 1, 1999
%U http://www.cs.bu.edu/techreports/1999-015-ColorSkinTracker.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A novel approach for real-time skin segmentation in video sequences is
described. The approach enables reliable skin segmentation despite
wide variation in illumination during tracking. An explicit second
order Markov model is used to predict evolution of the skin color
(HSV) histogram over time. Histograms are dynamically updated based
on feedback from the current segmentation and based on predictions of
the Markov model. The evolution of the skin color distribution at
each frame is parameterized by translation, scaling and rotation in
color space. Consequent changes in geometric parameterization of the
distribution are propagated by warping and re-sampling the
histogram. The parameters of the discrete-time dynamic Markov model
are estimated using Maximum Likelihood Estimation, and also evolve
over time. Quantitative evaluation of the method was conducted on
labeled ground-truth video sequences taken from popular movies.
%R 1999-016
%T Recursive Estimation of Motion and Planar Structure
%A Alon, Jonathan
%A Sclaroff, Stan
%D December 1, 1999
%U http://www.cs.bu.edu/techreports/1999-016-PlanarStructureFromMotion.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A specialized formulation of Azarbayejani and Pentland's framework for
recursive recovery of motion, structure and focal length from feature
correspondences tracked through an image sequence is presented. The
specialized formulation addresses the case where all tracked points
lie on a plane. This planarity constraint reduces the dimension of the
original state vector, and consequently the number of feature points
needed to estimate the state. Experiments with synthetic data and real
imagery illustrate the system performance. The experiments confirm
that the specialized formulation provides improved accuracy, stability
to observation noise, and rate of convergence in estimation for the
case where the tracked points lie on a plane.
%R 1999-017
%T Inferring Body Pose without Tracking Body Parts
%A Rosales, Romer
%A Sclaroff, Stan
%D December 1, 1999
%U http://www.cs.bu.edu/techreports/1999-017-BodyPose.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A novel approach for estimating articulated body posture and motion
from monocular video sequences is proposed. Human pose is defined as
the instantaneous two dimensional configuration (i.e., the projection
onto the image plane) of a single articulated body in terms of the
position of a predetermined set of joints. First, statistical
segmentation of the human bodies from the background is performed and
low-level visual features are found given the segmented body
shape. The goal is to be able to map these, generally low level,
visual features to body configurations. The system estimates
different mappings, each one with a specific cluster in the visual
feature space. Given a set of body motion sequences for training,
unsupervised clustering is obtained via the Expectation Maximization
algorithm. Then, for each of the clusters, a function is estimated to
build the mapping from low-level features to 3D pose. Currently
this mapping is modeled by a neural network. Given new visual
features, a mapping from each cluster is performed to yield a set of
possible poses. From this set, the system selects the most likely pose
given the learned probability distribution and the visual feature
similarity between hypothesis and input. Performance of the proposed
approach is characterized using a new set of known body postures,
showing promising results.
%R 1999-018
%T SomeCast: A Paradigm for Real-Time Adaptive Reliable Multicast
%A Yoon, Jaehee
%A Bestavros, Azer
%A Matta, Ibrahim
%D December 10, 1999
%U http://www.cs.bu.edu/techreports/1999-018-SomeCast.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
SomeCast is a novel paradigm for the reliable multicast of real-time
data to a large set of receivers over the Internet. SomeCast is
receiver-initiated and thus scalable in the number of receivers, the
diverse characteristics of paths between senders and receivers
(e.g. maximum bandwidth and round-trip-time), and the dynamic
conditions of such paths (e.g. congestion-induced delays and
losses). SomeCast enables receivers to dynamically adjust the rate at
which they receive multicast information to enable the satisfaction of
real-time QoS constraints (e.g. rate, deadlines, or jitter). This is
done by enabling a receiver to join SOME number of concurrent
multiCAST sessions, whereby each session delivers a portion of an
encoding of the real-time data. By adjusting the number of such
sessions dynamically, client-specific QoS constraints can be met
independently. The SomeCast paradigm can be thought of as a
generalization of the AnyCast (e.g. Dynamic Server Selection) and
ManyCast (e.g. Digital Fountain) paradigms, which have been proposed
in the literature to address issues of scalability of UniCast and
MultiCast environments, respectively.
In this paper we overview the SomeCast paradigm, describe an
instance of a SomeCast protocol, and present simulation results that
quantify the significant advantages gained from adopting such a
protocol for the reliable multicast of data to a diverse set of
receivers subject to real-time QoS constraints.
%R 1999-019
%T BU/NSF Workshop on Internet Measurement Instrumentation and Characterization
%A Bestavros, Azer
%A Byers, John
%A Crovella, Mark
%A Barford, Paul
%A Matta, Ibrahim
%A Mitzenmacher, Michael
%D December 15, 1999
%U http://www.cs.bu.edu/techreports/1999-019-imic-final-report
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Because of its growth in size, scope, and complexity---as well as its
increasingly central role in society---the Internet has become an
important object of study and evaluation. Many significant innovations
in the networking community in recent years have been directed at
obtaining a more accurate understanding of the fundamental behavior of
the complex system that is the Internet. These innovations have come
in the form of better models of components of the system, better tools
which enable us to measure the performance of the system more
accurately, and new techniques coupled with performance evaluation
which have delivered better system utilization. The continued
development and improvement of our understanding of the properties of
the Internet is essential to guide designers of hardware, protocols,
and applications for the next decade of Internet growth.
As a research community, an important next step involves a
comprehensive look at the challenges that lie ahead in this area. This
includes an evaluation of both the current unsolved challenges and
the upcoming challenges the Internet will present us with in the near
future, and a discussion of the promising new techniques that
innovators in the field are currently developing. To this end, the
Networking Research Group at Boston University, with support from the
National Science Foundation, organized a one-day workshop which was
held at Boston University on Monday, August 30, 1999. This report
summarizes the technical presentations and discussions that took place
during that workshop.
%R 2000-001
%T Faithful Translations between Polyvariant Flows and Polymorphic Types
%A Amtoft, Torben
%A Turbak, Franklyn
%D January 10, 2000
%U http://www.cs.bu.edu/techreports/2000-001-polyvariant-flows-to-polymorphic-types.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recent work has shown equivalences between various type systems and
flow logics. Ideally, the translations upon which such equivalences
are based should be faithful in the sense that information is not lost
in round-trip translations from flows to types and back or from types
to flows and back. Building on the work of Nielson & Nielson and of
Palsberg & Pavlopoulou, we present the first faithful translations
between a class of finitary polyvariant flow analyses and a type
system supporting polymorphism in the form of intersection and union
types. Additionally, our flow/type correspondence solves several open
problems posed by Palsberg & Pavlopoulou: (1) it expresses call-string
based polyvariance (such as k-CFA) as well as argument based
polyvariance; (2) it enjoys a subject reduction property for flows as
well as for types; and (3) it supports a flow-oriented perspective
rather than a type-oriented one.
%R 2000-002
%T Determining Acceptance Possibility for a Quantum Computation is Hard for the Polynomial Hierarchy
%A Fenner, Stephen
%A Green, Frederic
%A Homer, Steven
%A Pruim, Randall
%D January 20, 2000
%U http://www.cs.bu.edu/techreports/2000-002-quant1.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
It is shown that determining whether a quantum computation has a
non-zero probability of accepting is at least as hard as the
polynomial time hierarchy. This hardness result also applies to
determining in general whether a given quantum basis state appears
with nonzero amplitude in a superposition, or whether a given quantum
bit has positive expectation value at the end of a quantum
computation. This result is achieved by showing that the complexity
class NQP of Adleman, Demarrais, and Huang, a quantum analog of NP, is
equal to the counting class $\mbox{co-C}_{=}\mbox{P}$.
%R 2000-003
%T On the Complexity of Quantum ACC
%A Green, Frederic
%A Homer, Steven
%A Pollett, Christopher
%D January 20, 2000
%U http://www.cs.bu.edu/techreports/2000-003-quant2.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
For any q > 1, let MOD_q be a quantum gate that determines if the
number of 1's in the input is divisible by q. We show that for any
q,t > 1, MOD_q is equivalent to MOD_t (up to constant depth). Based
on the case q=2, Moore has shown that quantum analogs of AC^(0),
ACC[q], and ACC, denoted QAC^(0)_wf, QACC[2], QACC respectively,
define the same class of operators, leaving q > 2 as an open
question. Our result resolves this question, implying that QAC^(0)_wf
= QACC[q] = QACC for all q. We also prove the first upper bounds for
QACC in terms of related language classes. We define classes of
languages EQACC, NQACC (both for arbitrary complex amplitudes) and
BQACC (for rational number amplitudes) and show that they are all
contained in TC^(0). To do this, we show that a TC^(0) circuit can
keep track of the amplitudes of the state resulting from the
application of a QACC operator using a constant width polynomial size
tensor sum. In order to accomplish this, we also show that TC^(0) can
perform iterated addition and multiplication in certain field
extensions.
%R 2000-004
%T On the Origin of Power Laws in Internet Topologies
%A Medina, Alberto
%A Matta, Ibrahim
%A Byers, John
%D January 20, 2000
%U http://www.cs.bu.edu/techreports/2000-004-power-laws-Internet-topology.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recent empirical studies have shown that Internet topologies exhibit
power laws of the form $y = x^\alpha$ for the following relationships:
(P1) outdegree of node (domain or router) versus rank; (P2) number of
nodes versus outdegree; (P3) number of node pairs within a
neighborhood versus neighborhood size (in hops); and (P4) eigenvalues
of the adjacency matrix versus rank. However, causes for the
appearance of such power laws have not been convincingly given. In
this paper, we examine four factors in the formation of Internet
topologies. These factors are (F1) preferential connectivity of a new
node to existing nodes; (F2) incremental growth of the network; (F3)
distribution of nodes in space; and (F4) locality of edge connections.
In synthetically generated network topologies, we study the relevance
of each factor in causing the aforementioned power laws as well as
other properties, namely diameter, average path length and clustering
coefficient. Different kinds of network topologies are generated:
(T1) topologies generated using our parametrized generator, which we
call BRITE; (T2) random topologies generated using the well-known Waxman
model; (T3) Transit-Stub topologies generated using the GT-ITM tool; and
(T4) regular grid topologies. We observe that some generated
topologies may not obey power laws P1 and P2. Thus, the existence of
these power laws can be used to validate the accuracy of a given tool
in generating representative Internet topologies. Power laws P3 and
P4 were observed in nearly all considered topologies, but different
topologies showed different values of the power exponent $\alpha$.
Thus, while the presence of power laws P3 and P4 does not give strong
evidence for the representativeness of a generated topology, the value
of $\alpha$ in P3 and P4 can be used as a litmus test for the
representativeness of a generated topology. We also find that factors
F1 and F2 are the key contributors, in our study, to the
resemblance of our generated topologies to that of the Internet.
Note: BRITE (Boston university Representative Internet Topology gEnerator) is
available at http://www.cs.bu.edu/fac/matta/software.html
%R 2000-005
%T BRITE: A Flexible Generator of Internet Topologies
%A Medina, Alberto
%A Matta, Ibrahim
%A Byers, John
%D January 21, 2000 (Revised January 15, 2001)
%U http://www.cs.bu.edu/techreports/2000-005-brite
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
BRITE is a parameterized topology generation tool, which can be used to
flexibly control various parameters (such as connectivity and growth
models) and study various properties of generated network topologies (such
as power laws, path length and clustering coefficient).
BRITE can be used to study the relevance of possible causes for properties
recently observed in Internet topologies. Different combinations of
possible causes can be tested. In this version, we consider four of them:
(1) preferential connectivity of a new node to existing nodes; (2)
incremental growth of the network; (3) geographical distribution of nodes;
and (4) locality of edge connections. We use BRITE in [BU-CS-TR-2000-004]
to study the origin of power laws and other metrics in Internet topologies.
BRITE (Boston university Representative Internet Topology gEnerator) is
available on the Web at http://www.cs.bu.edu/faculty/matta/Research/BRITE/
%R 2000-006
%T Efficient Hash-Consing of Recursive Types
%A Considine, Jeffrey
%D January 29, 2000
%U http://www.cs.bu.edu/techreports/2000-006-hashconsing-recursive-types.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Efficient storage of types within a compiler is necessary to avoid large
blowups in space during compilation. Recursive types in particular are
important to consider, as naive representations of recursive types may be
arbitrarily larger than necessary through unfolding. Hash-consing has been
used to efficiently store non-recursive types. Deterministic finite automata
techniques have been used to efficiently perform various operations on
recursive types. We present a new system for storing recursive types combining
hash-consing and deterministic finite automata techniques. The space
requirements are linear in the number of distinct types. Both update and
lookup operations take polynomial time and linear space and type equality can
be checked in constant time once both types are in the system.
%R 2000-007
%T Type Inference For Recursive Definitions
%A Kfoury, Assaf
%A Pericas-Geertsen, Santiago M.
%D March 6, 2000
%U http://www.cs.bu.edu/techreports/2000-007-type-inference-recursive-types.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider type systems that combine universal types, recursive
types, and object types. We study type inference in these systems
under a rank restriction, following Leivant's notion of rank. To
motivate our work, we present several examples showing how our systems
can be used to type programs encountered in practice. We show that
type inference in the rank-k system is decidable for k <= 2 and
undecidable for k >= 3. (Similar results based on different
techniques are known to hold for System F, without recursive types and
object types.) Our undecidability result is obtained by a reduction
from a particular adaptation (which we call ``regular'') of the
semi-unification problem, whose undecidability is, interestingly,
obtained by methods totally different from those used in the case of
standard (or finite) semi-unification.
%R 2000-008
%T QoS Controllers for the Internet
%A Matta, Ibrahim
%A Bestavros, Azer
%D March 12, 2000
%U http://www.cs.bu.edu/techreports/2000-008-QoScontrollers.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this position paper, we review basic control strategies that
machines acting as "traffic controllers" could deploy in order to
improve the management of Internet services. Such traffic controllers
are likely to spur the widespread emergence of advanced applications,
which have (so far) been hindered by the inability of the networking
infrastructure to deliver on the promise of Quality-of-Service (QoS).
%R 2000-009
%T Index trees for efficient deformable shape-based retrieval
%A Liu, Lifeng
%A Sclaroff, Stan
%D March 22, 2000
%U http://www.cs.bu.edu/techreports/2000-009-index-trees-shape-retrieval.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
An improved method for deformable shape-based image indexing
and retrieval is described. A pre-computed index tree is
used to improve the speed of our previously reported on-line
model fitting method; simple shape features are used as keys
in a pre-generated index tree of model instances. In
addition, a coarse to fine indexing scheme is used at
different levels of the tree to further improve speed while
maintaining matching accuracy. Experimental results show
that the speedup is significant, while accuracy of
shape-based indexing is maintained. A method for shape
population-based retrieval is also described. The method
allows query formulation based on the population
distributions of shapes in each image. Results of
population-based image queries for a database of blood cell
micrographs are shown.
%R 2000-010
%T Deciding Isomorphisms of Simple Types in Polynomial Time
%A Considine, Jeffrey
%D April 2, 2000
%U http://www.cs.bu.edu/techreports/2000-010-deciding-simple-type-isomorphisms.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The isomorphisms holding in all models of the simply typed lambda calculus
with surjective pairing and terminal objects are well studied; these models
are exactly the Cartesian closed categories. Isomorphism of two simple types in
such a model is decidable by reduction to a normal form and comparison under a
finite number of permutations (Bruce, Di Cosmo, and Longo 1992).
Unfortunately, these normal forms may be exponentially larger than the
original types so this construction decides isomorphism in exponential
time. We show how space-sharing/hash-consing techniques and memoization
can be used to decide isomorphism in practical polynomial time (low degree,
small hidden constant).
Other researchers have investigated simple type isomorphism in relation to,
among other potential applications, type-based retrieval of software modules
from libraries and automatic generation of bridge code for multi-language
systems. Our result makes such potential applications practically feasible.
%R 2000-011
%T GreedyDual* Web Caching Algorithm: Exploiting the Two Sources of Temporal Locality in Web Request Streams
%A Jin, Shudong
%A Bestavros, Azer
%D April 4, 2000
%U http://www.cs.bu.edu/techreports/2000-011-gdstar-web-caching.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The relative importance of long-term popularity and short-term
temporal correlation of references for Web cache replacement policies
has not been studied thoroughly. This is partially due to the lack of
accurate characterization of temporal locality that enables the
identification of the relative strengths of these two sources of
temporal locality in a reference stream. In [JB99], we have proposed
such a metric and have shown that Web reference streams differ
significantly in the prevalence of these two sources of temporal
locality. These findings underscore the importance of a Web caching
strategy that can adapt in a dynamic fashion to the prevalence of
these two sources of temporal locality. In this paper, we propose a
novel cache replacement algorithm, GreedyDual*, which is a
generalization of GreedyDual-Size. GreedyDual* uses the metrics
proposed in [JB99] to adjust the relative worth of long-term
popularity versus short-term temporal correlation of references. Our
trace-driven simulation experiments show the superior performance of
GreedyDual* when compared to other Web cache replacement policies
proposed in the literature.
%R 2000-012
%T Differentiated Predictive Fair Service for TCP Flows
%A Matta, Ibrahim
%A Guo, Liang
%D May 17, 2000
%U http://www.cs.bu.edu/techreports/2000-012-diffserv-tcp.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The majority of the traffic (bytes) flowing over the Internet today
has been attributed to the Transmission Control Protocol (TCP). This
strong presence of TCP has recently spurred further investigations
into its congestion avoidance mechanism and its effect on the
performance of short and long data transfers. At the same time, the
rising interest in enhancing Internet services while keeping the
implementation cost low has led to several service-differentiation
proposals. In such service-differentiation architectures, much of the
complexity is placed only in access routers, which classify and mark
packets from different flows. Core routers can then allocate enough
resources to each class of packets so as to satisfy delivery
requirements, such as predictable (consistent) and fair service.
In this paper, we investigate the interaction among short and long TCP
flows, and how TCP service can be improved by employing a low-cost
service-differentiation scheme. Through control-theoretic arguments
and extensive simulations, we show the utility of isolating TCP flows
into two classes based on their lifetime/size, namely one class of
short flows and another of long flows. With such class-based
isolation, short and long TCP flows have separate service queues at
routers. This protects each class of flows from the other as they
possess different characteristics, such as burstiness of
arrivals/departures and congestion/sending window dynamics. We show
the benefits of isolation, in terms of better predictability and
fairness, over traditional shared queueing systems with both tail-drop
and Random-Early-Drop (RED) packet dropping policies. The proposed
class-based isolation of TCP flows has several advantages: (1) the
implementation cost is low since it only requires core routers to
maintain per-class (rather than per-flow) state; (2) it promises to be
an effective traffic engineering tool for improved predictability and
fairness for both short and long TCP flows; and (3) stringent delay
requirements of short interactive transfers can be met by increasing
the amount of resources allocated to the class of short flows.
%R 2000-013
%T Robust Identification of Shared Losses Using End-to-End Unicast Probes
%A Harfoush, Khaled
%A Bestavros, Azer
%A Byers, John
%D May 30, 2000
%U http://www.cs.bu.edu/techreports/2000-013-unicast-shared-loss-identification.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Current Internet transport protocols make end-to-end measurements and
maintain per-connection state to regulate the use of shared network
resources. When two or more such connections share a common endpoint,
there is an opportunity to correlate the end-to-end measurements made
by these protocols to better diagnose and control the use of shared
resources. We develop packet probing techniques to determine whether
a pair of connections experience shared congestion. Correct,
efficient diagnoses could enable new techniques for aggregate
congestion control, QoS admission control, connection scheduling and
mirror site selection. Our extensive simulation results demonstrate
that the conditional (Bayesian) probing approach we employ provides
superior accuracy, converges faster, and tolerates a wider range of
network conditions than recently proposed memoryless (Markovian)
probing approaches for addressing this opportunity.
%R 2000-014
%T Utility-Based Decision-Making in Wireless Sensor Networks
%A Byers, John
%A Nasser, Gabriel
%D June 1, 2000
%U http://www.cs.bu.edu/techreports/2000-014-sensor-networks-utility.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider challenges associated with application domains in which
a large number of distributed, networked sensors must perform a sensing
task repeatedly over time. For the tasks we consider, there are three
significant challenges to address. First, nodes have resource constraints
imposed by their finite power supply, which motivates computations that
are energy-conserving. Second, for the applications we describe, the utility
derived from a sensing task may vary depending on the placement and size of
the set of nodes that participate, which often involves complex objective
functions for nodes to target. Finally, nodes must attempt to realize these
global objectives with only local information.
We present a model for such applications, in which we define
appropriate global objectives based on utility functions and specify a
cost model for energy consumption. Then, for an important class of
utility functions, we present distributed algorithms which attempt to
maximize the utility derived from the sensor network over its
lifetime. The algorithms and experimental results we present enable
nodes to adaptively change their roles over time and use dynamic
reconfiguration of routes to load balance energy consumption in the
network.
%R 2000-015
%T Estimating Human Body Pose from a Single Image via the Specialized Mappings Architecture
%A Rosales, Romer
%A Sclaroff, Stan
%D June 10, 2000
%U http://www.cs.bu.edu/techreports/2000-015-specialized-mappings-and-human-pose.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A non-linear supervised learning architecture, the Specialized Mapping
Architecture (SMA), and its application to articulated body pose
reconstruction from single monocular images are described. The
architecture is formed by a number of specialized mapping functions,
each of them with the purpose of mapping certain portions (connected
or not) of the input space, and a feedback matching process. A
probabilistic model for the architecture is described along with a
mechanism for learning its parameters. The learning problem is
approached using a maximum likelihood estimation framework; we present
Expectation Maximization (EM) algorithms for two different instances
of the likelihood probability. Performance is characterized by
estimating human body postures from low level visual features, showing
promising results.
%R 2000-016
%T Unicast-based Characterization of Network Loss Topologies
%A Harfoush, Khaled
%A Bestavros, Azer
%A Byers, John
%D July 3, 2000
%U http://www.cs.bu.edu/techreports/2000-016-unicast-loss-topology-characterization.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Current Internet transport protocols make end-to-end measurements and
maintain per-connection state to regulate the use of shared network
resources. When a number of such connections share a common endpoint,
that endpoint has the opportunity to correlate these end-to-end
measurements to better diagnose and control the use of shared
resources. A valuable characterization of such shared resources is the
``loss topology''. From the perspective of a server with concurrent
connections to multiple clients, the loss topology is a logical tree
rooted at the server in which edges represent lossy paths between a
pair of internal network nodes. We develop an end-to-end unicast
packet probing technique and an associated analytical framework to:
(1) infer loss topologies, (2) identify loss rates of links in an
existing loss topology, and (3) augment a topology to incorporate the
arrival of a new connection. Correct, efficient inference of loss
topology information enables new techniques for aggregate congestion
control, QoS admission control, connection scheduling and mirror site
selection. Our extensive simulation results demonstrate that our
approach is robust in terms of its accuracy and convergence over a
wide range of network conditions.
%R 2000-017
%T TCP Congestion Control and Heavy Tails
%A Guo, Liang
%A Crovella, Mark
%A Matta, Ibrahim
%D July 3, 2000
%U http://www.cs.bu.edu/techreports/2000-017-tcp-heavy-tails.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Long-range dependence has been observed in many recent Internet
traffic measurements. Previous studies have shown that there is a
close relationship between heavy-tailed distribution of various
traffic parameters and the long-range dependent property. In this
paper, we use a simple Markov chain model to argue that when the loss
rate is relatively high, TCP's adaptive congestion control mechanism
indeed generates traffic with heavy-tailed OFF, or idle, periods, and
therefore introduces long-range dependence into the overall traffic.
Moreover, the degree of such long-range dependence, measured by the
Hurst parameter, increases as the loss rate increases, agreeing with
many previous measurement-based studies. In addition, we observe that
more variable initial retransmission timeout values for different
packets introduce more variable packet inter-arrival times, which
increases the burstiness of the overall traffic. Finally, we show
that high loss conditions can lead to a heavy-tailed distribution of
transmission times even for constant-sized files. This means that
file size variability need not be the only cause of heavy-tailed
variability in transmission durations.
%R 2000-018
%T On the Marginal Utility of Deploying Measurement Infrastructure
%A Barford, Paul
%A Bestavros, Azer
%A Byers, John
%A Crovella, Mark
%D July 3, 2000
%U http://www.cs.bu.edu/techreports/2000-018-internet-measurement-utility.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The cost and complexity of deploying measurement infrastructure in the
Internet for the purpose of analyzing its structure and behavior is
considerable. Basic questions about the {\em utility} of increasing
the number of measurements and/or measurement sites have not yet been
addressed, which has led to a ``more is better'' approach to wide-area
measurements. In this paper, we quantify the marginal utility of
performing wide-area measurements in the context of Internet topology
discovery. We characterize topology in terms of nodes, links, node
degree distribution, and end-to-end flows using statistical and
information-theoretic techniques. We classify nodes discovered on
the routes between a set of 8 sources and 1277 destinations to differentiate
nodes which make up the so-called ``backbone'' from those which border the
backbone and those on links between the border nodes and destination nodes.
This process includes reducing nodes that advertise multiple interfaces to
single IP addresses. We show that the utility of adding sources goes down
significantly after 2 from the perspective of interface, node, link and
node degree discovery. We show that the utility of adding destinations is
constant for interfaces, nodes, links and node degree indicating that it is
more important to add destinations than sources. Finally, we analyze
paths through the backbone and show that shared link distributions
approximate a power law indicating that a small number of backbone links in
our study are very heavily utilized.
%R 2000-019
%T Cachability of Web Objects
%A Zhang, Xiaohui
%D August 8, 2000
%U http://www.cs.bu.edu/techreports/2000-019-web-cachability.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Much work on the performance of Web proxy caching has focused on
high-level metrics such as hit rate and byte hit rate, but has ignored
all the information related to the cachability of Web
objects. Uncachable objects include those fetched by dynamic requests,
objects with an uncachable HTTP status code, objects with an uncachable
HTTP header, objects with an HTTP 1.0 cookie, and objects without a
last-modified header. Although some researchers filter the Web traces
before they use them for analysis or simulation, many do not have a
comprehensive understanding of the cachability of Web objects. In this
paper we evaluate all the reasons that a Web object might be
uncachable. We use traces from NLANR. Since these traces do not
contain HTTP header information, we replay them using a request
generator to get the response header information. We find that between
15% and 40% of Web objects in our traces cannot be cached by a Web
proxy server. We use an LRU simulator to show the performance gap when
the cachability is either considered or not. We show the
characteristics of the cachable data set and find that all its
characteristics are fairly similar to those of the total data set. Finally,
we present some additional results for the cachable and total data
sets: (1) The main reasons for uncachability are dynamic requests,
responses without a last-modified header, responses with an HTTP "302 Moved
Temporarily" status code, and responses with an HTTP/1.0 cookie. (2)
The cachability of Web objects cannot be ignored in simulation
because uncachable objects comprise a huge percentage of the total
trace. Simulations without cachability consideration will be
misleading.
%R 2000-020
%T Type Inference for Variant Object Types
%A Bugliesi, Michele
%A Pericas-Geertsen, Santiago M.
%D October 16, 2000
%U http://www.cs.bu.edu/techreports/2000-020-inference-for-variant-types.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Existing type systems for object calculi are based on invariant subtyping.
Subtyping invariance is required for soundness of static typing in the
presence of method overrides, but it is often in the way of the expressive
power of the type system. Flexibility of static typing can be recovered in
different ways: in first-order systems, by the adoption of object types with
variance annotations, in second-order systems by resorting
to Self types.
Type inference is known to be P-complete for first-order systems of finite
and recursive object types, and NP-complete for a restricted version of Self
types. The complexity of type inference for systems with variance annotations
is as yet unknown.
This paper presents a new object type system based on the notion of Split
types, a form of object types where every method is assigned two types,
namely, an update type and a select type. The subtyping relation that arises
for Split types is variant and, as a result, subtyping can be performed
both in width and in depth.
The new type system generalizes all the existing first-order type systems
for objects, including systems based on variance annotations. Interestingly,
the additional expressive power does not affect the complexity of the type
inference problem, as we show by presenting an O(n^3) inference algorithm.
%R 2000-021
%T What are polymorphically-typed ambients?
%A Amtoft, Torben
%A Kfoury, Assaf
%A Pericas-Geertsen, Santiago
%D October 19, 2000
%U http://www.cs.bu.edu/techreports/2000-021-what-are-polymorphically-typed-ambients.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Ambient Calculus was developed by Cardelli and
Gordon as a formal framework to study issues of mobility and
migrant code. We consider an Ambient Calculus where ambients
transport and exchange programs rather than just inert data. We
propose different senses in which such a calculus can be said to be
polymorphically typed, and design accordingly a polymorphic
type system for it. Our type system assigns types to embedded
programs and what we call behaviors to processes; a denotational
semantics of behaviors is then proposed, here called trace
semantics, underlying much of the remaining analysis. We state
and prove a Subject Reduction property for our polymorphically
typed calculus. Based on techniques borrowed from finite automata
theory, type-checking of fully type-annotated processes is shown
to be decidable; the time complexity of our decision procedure is
exponential (this is a worst-case in theory, arguably not encountered
in practice). Our polymorphically-typed calculus is a conservative
extension of the typed Ambient Calculus originally proposed by
Cardelli and Gordon.
%R 2000-022
%T 3D Hand Pose Reconstruction Using Specialized Mappings
%A Rosales, Romer
%A Athitsos, Vassilis
%A Sclaroff, Stan
%D December 4, 2000
%U http://www.cs.bu.edu/techreports/2000-022-hand-pose-estimation-with-sma.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A system for recovering 3D hand pose from monocular color
sequences is proposed. The system employs a non-linear
supervised learning framework, the specialized mappings
architecture (SMA), to map image features to likely 3D hand
poses. The SMA's fundamental components are a set of
specialized forward mapping functions, and a single feedback
matching function. The forward functions are estimated
directly from training data, which in our case are examples
of hand joint configurations and their corresponding visual
features. The joint angle data in the training set is
obtained via a CyberGlove, a glove with 22 sensors that
monitor the angular motions of the palm
and fingers. In training, the visual features are generated
using a computer graphics module that renders the hand from
arbitrary viewpoints given the 22 joint angles. We test our
system both on synthetic sequences and on sequences taken
with a color camera. The system automatically detects and
tracks both hands of the user, calculates the appropriate
features, and estimates the 3D hand joint angles from those
features. Results are encouraging given the complexity of
the task.
%R 2000-023
%T An Integrated Approach for Segmentation and Estimation of Planar Structures
%A Alon, Joni
%A Sclaroff, Stan
%D December 4, 2000
%U http://www.cs.bu.edu/techreports/2000-023-planar-struct-segmentation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Standard structure from motion algorithms recover 3D
structure of points. If a surface representation is desired,
for example a piece-wise planar representation, then a
two-step procedure typically follows: in the first step the
plane-membership of points is determined manually, and
in a subsequent step planes are fitted to the sets of points
thus determined, and their parameters are recovered. This
paper presents an approach for automatically segmenting
planar structures from a sequence of images, and
simultaneously estimating their parameters. In the proposed
approach the plane-membership of points is determined
automatically, and the planar structure parameters are
recovered directly in the algorithm rather than indirectly
in a post-processing stage. Simulated and real experimental
results show the efficacy of this approach.
%R 2000-024
%T Region Segmentation via Deformable Model-Guided Split and Merge
%A Liu, Lifeng
%A Sclaroff, Stan
%D December 4, 2000
%U http://www.cs.bu.edu/techreports/2000-024-deform-shape-based-split-merge.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
An improved method for deformable shape-based image
segmentation is described. Image regions are merged
together and/or split apart, based on their agreement with
an a priori distribution on the global deformation
parameters for a shape template. The quality of a candidate
region merging is evaluated by a cost measure that includes:
homogeneity of image properties within the combined region,
degree of overlap with a deformed shape model, and a
deformation likelihood term. Perceptually-motivated
criteria are used to determine where/how to split regions,
based on the local shape properties of the region group's
bounding contour. A globally consistent interpretation is
determined in part by the minimum description length
principle. Experiments show that the model-based splitting
strategy yields a significant improvement in segmentation over
a method that uses merging alone.
%R 2000-025
%T The Cyclone Server Architecture: Streamlining Delivery of Popular Content
%A Rost, Stan
%A Byers, John
%A Bestavros, Azer
%D December 15, 2000
%U http://www.cs.bu.edu/techreports/2000-025-cyclone.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose a new technique for efficiently delivering popular content
from information repositories with bounded file caches. Our strategy
relies on the use of fast erasure codes (a.k.a. forward error
correcting codes) to generate encodings of popular files, of which
only a small sliding window is cached at any time instant, even to satisfy
an unbounded number of asynchronous requests for the file. Our approach
capitalizes on concurrency to maximize sharing of state across different
request threads while minimizing cache memory utilization. Additional
reduction in resource requirements arises from providing for a
lightweight version of the network stack.
In this paper, we describe the design and implementation of our
Cyclone server as a Linux kernel subsystem.
%R 2000-026
%T Fine-Grained Layered Multicast
%A Byers, John
%A Luby, Michael
%A Mitzenmacher, Michael
%D December 15, 2000
%U http://www.cs.bu.edu/techreports/2000-026-fine-grained-layered-multicast.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Traditional approaches to receiver-driven layered multicast have advocated
the benefits of cumulative layering, which can enable coarse-grained
congestion control that complies with TCP-friendliness equations over large
time scales. In this paper, we quantify the costs and benefits of using
non-cumulative layering and present a new, scalable multicast congestion
control scheme which provides a fine-grained approximation to the behavior of
TCP additive increase / multiplicative decrease (AIMD). In contrast to
the conventional wisdom, we demonstrate that fine-grained rate adjustment
can be achieved with only modest increases in the number of layers and
aggregate bandwidth consumption, while using only a small constant number
of control messages to perform either additive increase or multiplicative
decrease.
%R 2000-027
%T An Infrastructure for the Dynamic Distribution of Web Applications and Services
%A Duvos, Enrique
%A Bestavros, Azer
%D December 15, 2000
%U http://www.cs.bu.edu/techreports/2000-027-web-apps-dynamic-distribution.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper presents the design and implementation of an infrastructure
that enables any Web application, regardless of its current state, to
be stopped and uninstalled from a particular server, transferred to a
new server, then installed, loaded, and resumed, with all these events
occurring "on the fly" and totally transparent to clients. Such
functionalities allow entire applications to fluidly move from server
to server, reducing the overhead required to administer the system,
and increasing its performance in a number of ways: (1) Dynamic
replication of new instances of applications to several servers to
raise throughput for scalability purposes, (2) Moving applications to
servers to achieve load balancing or other resource management goals,
(3) Caching entire applications on servers located closer to clients.
%R 2000-028
%T TCP Control Groups: Aggregated Congestion Control for TCP (MA Thesis)
%A Gschwendter, Thomas
%D May 12, 2000
%U http://www.cs.bu.edu/techreports/2000-028-MA-Thesis-Thomas-Gschwendter.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This thesis presents a framework for aggregated congestion
management for TCP flows and shows how to integrate such an
approach in an existing TCP protocol stack. The thesis presents
an initial implementation of this congestion management scheme
in Linux, with performance evaluation in ns as well.
%R 2001-001
%T Robust Identification of Shared Losses Using End-to-End Unicast Probes (ERRATA)
%A Harfoush, Khaled
%A Bestavros, Azer
%A Byers, John
%D January 8, 2001
%U http://www.cs.bu.edu/techreports/2001-001-unicast-shared-loss-identification-errata.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present corrections to Fact 3 and (as a consequence) to Lemma 1 of
BUCS Technical Report BUCS-TR-2000-013 (also published in IEEE
ICNP'2000). These corrections result in slight changes to the formulae
used for the identifications of shared losses, which we quantify.
%R 2001-002
%T Program representation size in an intermediate language with intersection and union types
%A Dimock, Allyn
%A Westmacott, Ian
%A Muller, Robert
%A Turbak, Franklyn
%A Wells, J.B.
%A Considine, Jeffrey
%D March 15, 2001
%U http://www.cs.bu.edu/techreports/2001-002-program-representation-size.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The CIL compiler for core Standard ML compiles whole programs using a
novel typed intermediate language (TIL) with intersection and union
types and flow labels on both terms and types. The CIL term
representation duplicates portions of the program where intersection
types are introduced and union types are eliminated. This duplication
makes it easier to represent type information and to introduce
customized data representations. However, duplication incurs
compile-time space costs that are potentially much greater than are
incurred in TILs employing type-level abstraction or quantification.
In this paper, we present empirical data on the compile-time space
costs of using CIL as an intermediate language. The data shows that
these costs can be made tractable by using sufficiently fine-grained
flow analyses together with standard hash-consing techniques. The
data also suggests that non-duplicating formulations of intersection
(and union) types would not achieve significantly better space
complexity.
%R 2001-003
%T BRITE: Universal Topology Generation from a User's Perspective
%A Medina, Alberto
%A Lakhina, Anukool
%A Matta, Ibrahim
%A Byers, John
%D April 1, 2001
%U http://www.cs.bu.edu/techreports/2001-003-brite-user-manual.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Effective engineering of the Internet is predicated upon a detailed
understanding of issues such as the large-scale structure of its
underlying physical topology, the manner in which it evolves over
time, and the way in which its constituent components contribute to
its overall function. Unfortunately, developing a deep understanding
of these issues has proven to be a challenging task, since it in turn
involves solving difficult problems such as mapping the actual
topology, characterizing it, and developing models that capture its
emergent behavior. Consequently, even though there are a number of
topology models, it is an open question as to how representative the
topologies they generate are of the actual Internet. Our goal is to
produce a topology generation framework which improves the state of
the art and is based on design principles which include
representativeness, inclusiveness, and interoperability.
Representativeness leads to synthetic topologies that accurately
reflect many aspects of the actual Internet topology (e.g.,
hierarchical structure and degree distribution). Inclusiveness
combines the strengths of as many generation models as possible in a
single generation tool. Interoperability provides interfaces to
widely-used simulation applications such as ns and SSF as well as
visualization applications. We call such a tool a "universal topology
generator".
In this paper we discuss the design, implementation and usage of the
BRITE universal topology generation tool that we have built. We also
describe the BRITE Analysis Engine, BRIANA, which is an independent
piece of software designed and built upon BRITE design goals of
flexibility and extensibility. The purpose of BRIANA is to act as a
repository of analysis routines along with a user-friendly interface
that allows its use on different topology formats.
%R 2001-004
%T Automatic 3D Registration of Lung Surfaces in Computed Tomography Scans
%A Betke, Margrit
%A Hong, Harrison
%A Ko, Jane
%D April 24, 2001
%U http://www.cs.bu.edu/techreports/2001-004-betke-hong-ko.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We developed an automated system that registers chest CT scans
temporally. Our registration method matches corresponding anatomical
landmarks to obtain initial registration parameters. The initial
point-to-point registration is then generalized to an iterative
surface-to-surface registration method. Our ``goodness-of-fit''
measure is evaluated at each step in the iterative scheme until the
registration performance is sufficient. We applied our method to
register the 3D lung surfaces of 11 pairs of chest CT scans and report
promising registration performance.
%R 2001-005
%T The War Between Mice and Elephants
%A Guo, Liang
%A Matta, Ibrahim
%D May 7, 2001
%U http://www.cs.bu.edu/techreports/2001-005-war-tcp-rio.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recent measurement-based studies reveal that most of the Internet
connections are short in terms of the amount of traffic they carry
(mice), while a small fraction of the connections are carrying a large
portion of the traffic (elephants). A careful study of the TCP
protocol shows that without help from an Active Queue Management (AQM)
policy, short connections tend to lose to long connections in their
competition for bandwidth. This is because short connections do not
gain detailed knowledge of the network state, and therefore they are
doomed to be less competitive due to the conservative nature of the
TCP congestion control algorithm.
Inspired by the Differentiated Services (Diffserv) architecture, we
propose to give preferential treatment to short connections inside the
bottleneck queue, so that short connections experience a lower packet
drop rate than long connections. This is done by employing the RIO
(RED with In and Out) queue management policy which uses different
drop functions for different classes of traffic.
Our simulation results show that: (1) in a highly loaded network,
preferential treatment is necessary to provide short TCP connections
with better response time and fairness without hurting the performance
of long TCP connections; (2) the proposed scheme still delivers
packets in FIFO manner at each link, thus it maintains statistical
multiplexing gain and does not misorder packets; (3) choosing a
smaller default initial timeout value for TCP can help enhance the
performance of short TCP flows, however not as effectively as our
scheme and with the risk of congestion collapse; (4) in the worst
case, our proposal works as well as a regular RED scheme, in terms of
response time and goodput.
%R 2001-006
%T TCP-friendly SIMD Congestion Control and Its Convergence Behavior
%A Jin, Shudong
%A Guo, Liang
%A Matta, Ibrahim
%A Bestavros, Azer
%D May 8, 2001
%U http://www.cs.bu.edu/techreports/2001-006-simd.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The increased diversity of Internet application requirements has
spurred recent interest in flexible congestion control
mechanisms. Window-based congestion control schemes use increase rules
to probe available bandwidth, and decrease rules to back off when
congestion is detected. The parameterization of these control rules is
done so as to ensure that the resulting protocol is TCP-friendly in
terms of the relationship between throughput and packet loss rate. In
this paper, we propose a novel window-based congestion control
algorithm called SIMD (Square-Increase/Multiplicative-Decrease).
Contrary to previous memory-less controls, SIMD utilizes history
information in its control rules. It uses multiplicative decrease but
the increase in window size is in proportion to the {\em square} of
the time elapsed since the detection of the last loss event. Thus,
SIMD can efficiently probe available bandwidth. Nevertheless, SIMD is
TCP-friendly as well as TCP-compatible under RED, and it has much
better convergence behavior than TCP-friendly AIMD and binomial
algorithms proposed recently.
%R 2001-007
%T Retrieval by Shape Population: An Index Tree Approach
%A Liu, Lifeng
%A Sclaroff, Stan
%D June 5, 2001
%U http://www.cs.bu.edu/techreports/2001-007-shape-population-retrieval.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Based on our previous work in deformable shape model-based
object detection, a new method is proposed that uses index
trees for organizing shape features to support content-based
retrieval applications. In the proposed strategy, different
shape feature sets can be used in index trees constructed
for object detection and shape similarity comparison
respectively. There is a direct correspondence between the
two shape feature sets. As a result, application-specific
features can be obtained efficiently for shape-based
retrieval after object detection. A novel approach is
proposed that allows retrieval of images based on the
population distribution of deformed shapes in each image.
Experiments testing these new approaches have been conducted
using an image database that contains blood cell
micrographs. The precision vs. recall performance measure
shows that our method is superior to previous methods.
%R 2001-008
%T Estimating 3D Body Pose using Uncalibrated Cameras
%A Rosales, Romer
%A Siddiqui, Matheen
%A Alon, Jonathan
%A Sclaroff, Stan
%D June 5, 2001
%U http://www.cs.bu.edu/techreports/2001-008-estimating-3D-body-pose.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
An approach for estimating 3D body pose from multiple,
uncalibrated views is proposed. First, a mapping from image
features to 2D body joint locations is computed using a
statistical framework that yields a set of several body pose
hypotheses. The concept of a ``virtual camera'' is
introduced that makes this mapping invariant to translation,
image-plane rotation, and scaling of the input. As a
consequence, the calibration matrices (intrinsics) of the
virtual cameras can be considered completely known, and
their poses are known up to a single angular displacement
parameter. Given pose hypotheses obtained in the multiple
virtual camera views, the recovery of 3D body pose and
camera relative orientations is formulated as a stochastic
optimization problem. An Expectation-Maximization algorithm
is derived that can obtain the most likely (self-consistent)
combination of body pose hypotheses. Performance of the
approach is evaluated with synthetic sequences as well as
real video sequences of human motion.
%R 2001-009
%T Surface Reconstruction from Multiple Views using Rational B-Splines
%A Siddiqui, Matheen
%A Sclaroff, Stan
%D June 5, 2001
%U http://www.cs.bu.edu/techreports/2001-009-rational-bspline-surface.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A method for reconstructing 3D rational B-spline surfaces
from multiple views is proposed. The method takes advantage
of the projective invariance properties of rational
B-splines. Given feature correspondences in multiple views,
the 3D surface is reconstructed via a four-step framework.
First, corresponding features in each view are given an
initial surface parameter value (s,t), and a 2D B-spline is
fitted in each view. After this initialization, an iterative
minimization procedure alternates between updating the 2D
B-spline control points and re-estimating each feature's
(s,t). Next, a non-linear minimization method is used to
upgrade the 2D B-splines to 2D rational B-splines, and
obtain a better fit. Finally, a factorization method is used
to reconstruct the 3D B-spline surface given 2D B-splines in
each view. This surface recovery method can be applied in
both the perspective and orthographic cases. The
orthographic case allows the use of additional constraints
in the recovery. Experiments with real and synthetic
imagery demonstrate the efficacy of the approach for the
orthographic case.
%R 2001-010
%T Inference and Labeling of Metric-Induced Network Topologies
%A Bestavros, Azer
%A Byers, John
%A Harfoush, Khaled
%D June 5, 2001
%U http://www.cs.bu.edu/techreports/2001-010-mint.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The deployment of distributed network-aware applications over the
Internet requires an accurate representation of the conditions of
underlying network resources. To be effective, this representation
must be possible at multiple resolutions relative to a metric of
interest. In this paper, we propose an approach for the construction
of such representations using end-to-end measurements.
We instantiate our approach by considering packet loss rates as an
example metric. To that end, we present an analytical framework for
the inference of Internet loss topologies. From the perspective of a
server the loss topology is a logical tree rooted at the server with
clients at its leaves, in which edges represent lossy paths---paths
exhibiting observable loss rates higher than a specified
resolution---between a pair of internal network nodes.
We show how end-to-end unicast packet probing techniques could be
used to (1) infer a loss topology, and (2) identify the loss rates of
links in an existing loss topology. We report on simulation,
implementation, and Internet deployment results that show the
effectiveness of our approach and its robustness in terms of its
accuracy and convergence.
%R 2001-011
%T On Class-based Isolation of UDP, Short-lived and Long-lived TCP Flows
%A Yilmaz, Selma
%A Matta, Ibrahim
%D June 5, 2001
%U http://www.cs.bu.edu/techreports/2001-011-cbi.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The congestion control mechanisms of TCP make it vulnerable in an
environment where flows with different congestion-sensitivity compete
for scarce resources. With the increasing amount of unresponsive UDP
traffic in today's Internet, new mechanisms are needed to enforce
fairness in the core of the network. We propose a scalable
Diffserv-like architecture, where flows with different characteristics
are classified into separate service queues at the routers. Such
class-based isolation provides protection so that flows with different
characteristics do not negatively impact one another. In this study,
we examine different aspects of UDP and TCP interaction and possible
gains from segregating UDP and TCP into different classes. We also
investigate the utility of further segregating TCP flows into two
classes, one for short flows and one for long flows. Results
are obtained analytically for both Tail-drop and Random Early Drop
(RED) routers. Class-based isolation has the following salient
features: (1) better fairness, (2) improved predictability for all
kinds of flows, (3) lower transmission delay for delay-sensitive
flows, and (4) better control over Quality of Service (QoS) of a
particular traffic type.
%R 2001-012
%T DNS-based Internet Client Clustering and Characterization
%A Bestavros, Azer
%A Mehrotra, Sumit
%D June 5, 2001
%U http://www.cs.bu.edu/techreports/2001-012-dns-clustering.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper proposes a novel protocol which uses the Internet Domain
Name System (DNS) to partition Web clients into disjoint sets, each of
which is associated with a single DNS server. We define an L-DNS
cluster to be a grouping of Web Clients that use the same Local DNS
server to resolve Internet host names. We identify such clusters in
real-time using data obtained from a Web Server in conjunction with
that server's Authoritative DNS---both instrumented with an
implementation of our clustering algorithm. Using these clusters, we
perform measurements from four distinct Internet locations. Our
results show that L-DNS clustering enables a better estimation of
proximity of a Web Client to a Web Server than previously proposed
techniques. Thus, in a Content Distribution Network, a DNS-based
scheme that redirects a request from a web client to one of many
servers based on the client's name server coordinates (e.g.,
hops/latency/loss-rates between the client and servers) would perform
better with our algorithm.
%R 2001-013
%T Open Issues on TCP for Mobile Computing
%A Tsaoussidis, Vassilis
%A Matta, Ibrahim
%D July 3, 2001
%U http://www.cs.bu.edu/techreports/2001-013-open-issues-tcp-wireless.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We discuss the design principles of TCP within the context of
heterogeneous wired/wireless networks and mobile networking. We
identify three shortcomings in TCP's behavior: (i) the protocol's
error detection mechanism, which does not distinguish different types
of errors and thus does not suffice for heterogeneous wired/wireless
environments, (ii) the error recovery, which is not responsive to the
distinctive characteristics of wireless networks such as transient or
burst errors due to handoffs and fading channels, and (iii) the
protocol strategy, which does not control the tradeoff between
performance measures such as goodput and energy consumption, and often
entails a wasteful effort of retransmission and energy expenditure.
We discuss a solution-framework based on selected research proposals
and the associated evaluation criteria for the suggested
modifications. We highlight an important angle that has not attracted
the required attention so far: the need for new performance metrics,
appropriate for evaluating the impact of protocol strategies on
battery-powered devices.
%R 2001-014
%T How does TCP generate Pseudo-self-similarity?
%A Guo, Liang
%A Crovella, Mark
%A Matta, Ibrahim
%D July 12, 2001
%U http://www.cs.bu.edu/techreports/2001-014-tcp-pseudo-ss.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Long-range dependence has been observed in many recent Internet
traffic measurements. In addition, some recent studies have shown
that under certain network conditions, TCP itself can produce traffic
that exhibits dependence over limited timescales, even in the absence
of higher-level variability. In this paper, we use a simple Markovian
model to argue that when the loss rate is relatively high, TCP's
adaptive congestion control mechanism indeed generates traffic with
OFF periods exhibiting power-law shape over several timescales and
thus introduces pseudo-long-range dependence into the overall traffic.
Moreover, we observe that more variable initial retransmission timeout
values for different packets introduce more variable packet
inter-arrival times, which increases the burstiness of the overall
traffic. We can thus explain why a single TCP connection can produce
a time-series that can be misidentified as self-similar using standard
tests.
%R 2001-015
%T A Spectrum of TCP-friendly Window-based Congestion Control Algorithms
%A Jin, Shudong
%A Guo, Liang
%A Matta, Ibrahim
%A Bestavros, Azer
%D July 12, 2001
%U http://www.cs.bu.edu/techreports/2001-015-spectrum-tcp-friendly.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The increased diversity of Internet application requirements has
spurred recent interest in transport protocols with flexible
transmission controls. In window-based congestion control schemes,
increase rules determine how to probe available bandwidth, whereas
decrease rules determine how to back off when losses due to congestion
are detected. The parameterization of these control rules is done so
as to ensure that the resulting protocol is TCP-friendly in terms of
the relationship between throughput and loss rate.
In this paper, we define a new spectrum of window-based congestion
control algorithms that are TCP-friendly as well as TCP-compatible
under RED. Contrary to previous memory-less controls, our algorithms
utilize history information in their control rules. Our proposed
algorithms have two salient features: (1) They enable a wider region
of TCP-friendliness, and thus more flexibility in trading off among
smoothness, aggressiveness, and responsiveness; and (2) they ensure a
faster convergence to fairness under a wide range of system
conditions. SIMD is one instance of this spectrum of algorithms, in
which the congestion window is increased super-linearly with time
since the detection of the last loss. Compared to recently proposed
TCP-friendly AIMD and binomial algorithms, we demonstrate the
superiority of SIMD in: (1) adapting to sudden increases in available
bandwidth, while maintaining competitive smoothness and
responsiveness; and (2) rapidly converging to fairness and efficiency.
%R 2001-016
%T Measuring Bottleneck Bandwidth of Targeted Path Segments
%A Harfoush, Khaled
%A Bestavros, Azer
%A Byers, John
%D July 31, 2001
%U http://www.cs.bu.edu/techreports/2001-016-segment-bottleneck-bandwidth.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Accurate measurement of network bandwidth is crucial for flexible
Internet applications and protocols which actively manage and
dynamically adapt to changing utilization of network resources. These
applications must do so to perform tasks such as distributing and
delivering high-bandwidth media, scheduling service requests and
performing admission control. Extensive work has focused on two
approaches to measuring bandwidth: measuring it hop-by-hop, and
measuring it end-to-end along a path. Unfortunately, best-practice
techniques for the former are inefficient and techniques for the
latter are only able to observe bottlenecks visible at end-to-end
scope. In this paper, we develop and simulate end-to-end probing
methods which can measure bottleneck bandwidth along arbitrary,
targeted subpaths of a path in the network, including subpaths shared
by a set of flows. As another important contribution, we describe a
number of practical applications which we foresee as standing to
benefit from solutions to this problem, especially in emerging,
flexible network architectures such as overlay networks, ad-hoc
networks, peer-to-peer architectures and massively accessed content
servers.
%R 2001-017
%T Proceedings of the Sixth International Web Content Caching and Distribution Workshop (WCW'01)
%A Bestavros, Azer
%A Rabinovich, Michael
%D August 2, 2001
%U http://www.cs.bu.edu/techreports/2001-017-wcw01-proceedings
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The International Web Content Caching and Distribution Workshop (WCW)
is a premier technical meeting for researchers and practitioners
interested in all aspects of content caching, distribution and
delivery on the Internet. This year's meeting will be held on the
Boston University Campus and will build on the successes of the five
previous WCW meetings. This technical report includes all the
technical papers presented at WCW'01.
%R 2001-018
%T STAIR: Practical AIMD Multirate Multicast Congestion Control
%A Byers, John
%A Kwon, Gu-In
%D September 3, 2001
%U http://www.cs.bu.edu/techreports/2001-018-stair.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Existing approaches for multirate multicast congestion control
are either friendly to TCP only over large time scales or introduce
unfortunate side effects, such as significant control traffic, wasted
bandwidth, or the need for modifications to existing routers. We
advocate a layered multicast approach in which steady-state receiver
reception rates emulate the classical TCP sawtooth derived from
additive-increase, multiplicative-decrease (AIMD) principles. Our
approach introduces the concept of dynamic {\em stair} layers to
simulate various rates of additive increase for receivers with
heterogeneous round-trip times (RTTs), facilitated by a minimal
amount of IGMP control traffic. We employ a mix of cumulative and
non-cumulative layering to minimize the amount of excess bandwidth
consumed by receivers operating asynchronously behind a shared bottleneck.
We integrate these techniques together into a congestion control scheme
called STAIR which is amenable to those multicast applications which can
make effective use of arbitrary and time-varying subscription levels.
%R 2001-019
%T Generating Good Degree Distributions for Sparse Parity Check Codes using Oracles
%A Considine, Jeffrey
%D October 1, 2001
%U http://www.cs.bu.edu/techreports/2001-019-oracle-distribution.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Fast forward error correction codes are becoming an important
component in bulk content delivery. They fit in naturally with
multicast scenarios as a way to deal with losses and are now seeing
use in peer to peer networks as a basis for distributing load. In
particular, new irregular sparse parity check codes have been
developed with provable average linear time performance, a significant
improvement over previous codes. In this paper, we present a new
heuristic for generating codes with similar performance based on
observing a server with an oracle for client state. This heuristic is
easy to implement and provides further intuition into the need for an
irregular heavy tailed distribution.
%R 2001-020
%T GISMO: A Generator of Internet Streaming Media Objects and Workloads
%A Jin, Shudong
%A Bestavros, Azer
%D October 10, 2001
%U http://www.cs.bu.edu/techreports/2001-020-gismo.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper presents a tool called GISMO (Generator of Internet
Streaming Media Objects and workloads). GISMO enables the
specification of a number of streaming media access characteristics,
including object popularity, temporal correlation of requests,
seasonal access patterns, user session durations, user inter-activity
times, and variable bit-rate (VBR) self-similarity and marginal
distributions. The embodiment of these characteristics in GISMO
enables the generation of realistic and scalable request streams for
use in the benchmarking and comparative evaluation of Internet
streaming media delivery techniques. To demonstrate the usefulness of
GISMO, we present a case study that shows the importance of various
workload characteristics in determining the effectiveness of proxy
caching and server patching techniques in reducing bandwidth
requirements.
%R 2001-021
%T 3D Hand Pose Estimation by Finding Appearance-Based Matches in a Large Database of Training Views
%A Athitsos, Vassilis
%A Sclaroff, Stan
%D October 22, 2001
%U http://www.cs.bu.edu/techreports/2001-021-handpose-estimation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Ongoing work towards appearance-based 3D hand pose estimation from a
single image is presented. Using a 3D hand model and computer
graphics, a large database of synthetic views is generated. The views
display different hand shapes as seen from arbitrary viewpoints. Each
synthetic view is automatically labeled with parameters describing its
hand shape and viewing parameters. Given an input image, the system
retrieves the most similar database views, and uses the shape and
viewing parameters of those views as candidate estimates for the
parameters of the input image. Preliminary results are presented, in
which appearance-based similarity is defined in terms of the chamfer
distance between edge images.
%R 2001-022
%T An Appearance-Based Framework for 3D Hand Shape Classification and Camera Viewpoint Estimation
%A Athitsos, Vassilis
%A Sclaroff, Stan
%D October 22, 2001
%U http://www.cs.bu.edu/techreports/2001-022-handshape-classification.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
An appearance-based framework for 3D hand shape classification
and simultaneous camera viewpoint estimation is presented. Given an
input image of a segmented hand, the most similar matches from a large
database of synthetic hand images are retrieved. The ground truth
labels of those matches, containing hand shape and camera viewpoint
information, are returned by the system as estimates for the input
image. Database retrieval is done hierarchically, by first quickly
rejecting the vast majority of all database views, and then ranking
the remaining candidates in order of similarity to the input. Four
different similarity measures are employed, based on edge location,
edge orientation, finger location and geometric moments.
%R 2001-023
%T Accelerating Internet Streaming Media Delivery using Network-Aware Partial Caching
%A Jin, Shudong
%A Bestavros, Azer
%A Iyengar, Arun
%D October 30, 2001
%U http://www.cs.bu.edu/techreports/2001-023-pcaching.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Internet streaming applications are adversely affected by network
conditions such as high packet loss rates and long delays. This paper
aims at mitigating such effects by leveraging the availability of
client-side caching proxies. We present a novel caching architecture
(and associated cache management algorithms) that turn edge caches
into accelerators of streaming media delivery. A salient feature of
our caching algorithms is that they allow partial caching of streaming
media objects and joint delivery of content from caches and origin
servers. The caching algorithms we propose are both network-aware and
stream-aware; they take into account the popularity of streaming media
objects, their bit-rate requirements, and the available bandwidth
between clients and servers. Using realistic models of Internet
bandwidth (derived from proxy cache logs and measured over real
Internet paths), we have conducted extensive simulations to evaluate
the performance of various caching management alternatives. Our
experiments demonstrate that network-aware caching algorithms can
significantly reduce service delay and improve overall stream
quality. Also, our experiments show that partial caching is
particularly effective when bandwidth variability is not very high.
%R 2001-024
%T Basis Token Consistency: A Practical Mechanism for Strong Web Cache Consistency
%A Bradley, Adam
%A Bestavros, Azer
%D October 30, 2001
%U http://www.cs.bu.edu/techreports/2001-024-btc.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
With web caching and cache-related services like CDNs and edge
services playing an increasingly significant role in the modern
internet, the problem of the weak consistency and coherence provisions
in current web protocols is becoming increasingly significant and
drawing the attention of the standards community. Toward this end, we
present definitions of consistency and coherence for web-like
environments, that is, distributed client-server information systems
where the semantics of interactions with resources are more general
than the read/write operations found in memory hierarchies and
distributed file systems. We then present a brief review of proposed
mechanisms which strengthen the consistency of caches in the web,
focusing upon their conceptual contributions and their weaknesses in
real-world practice. These insights motivate a new mechanism, which
we call ``Basis Token Consistency'' or BTC; when implemented at the
server, this mechanism allows any client (independent of the presence
and conformity of any intermediaries) to maintain a self-consistent
view of the server's state. This is accomplished by annotating
responses with additional per-resource application information which
allows client caches to recognize the obsolescence of currently cached
entities and identify responses from other caches which are already
stale in light of what has already been seen. The mechanism requires
no deviation from the existing client-server communication model, and
does not require servers to maintain any additional per-client state.
We discuss how our mechanism could be integrated into a
fragment-assembling Content Management System (CMS), and present a
simulation-driven performance comparison between the BTC algorithm and
the use of the Time-To-Live (TTL) heuristic.
%R 2001-025
%T Scalability of Multicast Delivery for Non-sequential Streaming Access
%A Jin, Shudong
%A Bestavros, Azer
%D October 30, 2001
%U http://www.cs.bu.edu/techreports/2001-025-multicast-scalability.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Multicast is considered a panacea for scalable streaming media
delivery over the Internet. To enable asynchronous service over a
multicast infrastructure, two categories of techniques have been
proposed: stream merging and periodic broadcasting. The scalability of
these techniques stems from the fact that for sequential streaming
access, the required server bandwidth grows {\em logarithmically} with
request arrival rates for stream merging techniques, and {\em
logarithmically} with the inverse of start-up delay for periodic
broadcasting techniques. Recent studies raise doubts as to the
appropriateness of the sequential access model (in which access to a
stream proceeds uninterrupted from beginning to end). A non-sequential
access model (allowing access to start at random points in the stream)
is more accurate as it allows the modeling of partial access and
client inter-activity. In this paper, we analytically and
experimentally (re-)evaluate the scalability of multicast delivery
under a non-sequential access model. We show that under such a
realistic model, the required server bandwidth for any protocol
providing immediate service grows at least as fast as the {\em square
root} of the request arrival rate, and that the required server
bandwidth for any protocol providing delayed service grows {\em
linearly} with the inverse of the start-up delay. We also investigate
the impact of limited client bandwidth on scalability. We present
practical protocols, which provide immediate service to non-sequential
requests (subject to limited client bandwidth), and which are
near-optimal in that the required server bandwidth is very close to
its lower bound.
%R 2001-026
%T How does TCP generate Pseudo-self-similarity? (ERRATA)
%A Guo, Liang
%A Crovella, Mark
%A Matta, Ibrahim
%D November 7, 2001
%U http://www.cs.bu.edu/techreports/2001-026-tcp-pseudo-ss-errata.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this note we clarify and amend a number of points made in BUCS
Technical Report BUCS-TR-2001-014 (also published in MASCOTS 2001).
We address the relationship to Technical Report UMass-CMPSC-00-55 by
Figueiredo, Liu, Misra, and Towsley.
%R 2002-001
%T Specialized Mappings Architecture with Applications to Vision-Based Estimation of Articulated Body Pose (PhD Thesis)
%A Rosales, Romer
%D January 15, 2002
%U http://www.cs.bu.edu/techreports/2002-001-rosales-phd.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A fundamental task of vision systems is to infer the state of the
world given some form of visual observations. From a computational
perspective, this often involves facing an ill-posed problem; e.g.,
information is lost via projection of the 3D world into a 2D image.
Solution of an ill-posed problem requires additional information,
usually provided as a model of the underlying process. It is
important that the model be both computationally feasible and
theoretically well-founded. In this thesis, a probabilistic, nonlinear
supervised computational learning model is proposed: the Specialized
Mappings Architecture (SMA). The SMA framework is demonstrated in a
computer vision system that can estimate the articulated pose
parameters of a human body or human hands, given images obtained via
one or more uncalibrated cameras.
The SMA consists of several specialized forward mapping functions that
are estimated automatically from training data, and a possibly known
feedback function. Each specialized function maps certain domains of
the input space (e.g., image features) onto the output space (e.g.,
articulated body parameters). A probabilistic model for the
architecture is first formalized. Solutions to key algorithmic
problems are then derived: simultaneous learning of the specialized
domains along with the mapping functions, as well as performing
inference given inputs and a feedback function. The SMA employs a
variant of the Expectation-Maximization algorithm and approximate
inference. The approach allows the use of alternative conditional
independence assumptions for learning and inference, which are derived
from a forward model and a feedback model.
Experimental validation of the proposed approach is conducted in the
task of estimating articulated body pose from image
silhouettes. Accuracy and stability of the SMA framework are tested
using artificial data sets, as well as synthetic and real video
sequences of human bodies and hands.
%R 2002-002
%T Securing Bulk Content Almost for Free
%A Byers, John
%A Cheng, Mei Chin
%A Considine, Jeffrey
%A Itkis, Gene
%A Yeung, Alex
%D January 22, 2002
%U http://www.cs.bu.edu/techreports/2002-002-securecodes.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Content providers often consider the costs of security to be greater
than the losses they might incur without it; many view ``casual
piracy'' as their main concern. Our goal is to provide a low cost
defense against such attacks while maintaining rigorous security
guarantees. Our defense is integrated with and leverages fast forward
error correcting codes, such as Tornado codes, which are widely used
to facilitate reliable delivery of rich content. We tune one such
family of codes - while preserving their original desirable properties
- to guarantee that none of the original content can be recovered
whenever a key subset of encoded packets is missing. Ultimately we
encrypt only these key codewords (only 4% of all transmissions),
making the security overhead negligible.
%R 2002-003
%T Deanonymizing Users of the SafeWeb Anonymizing Service
%A Martin, David
%A Schulman, Andrew
%D February 11, 2002
%U http://www.cs.bu.edu/techreports/2002-003-deanonymizing-safeweb.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The SafeWeb anonymizing system has been lauded by the press and loved
by its users; self-described as "the most widely used online privacy
service in the world," it served over 3,000,000 page views per day at
its peak. SafeWeb was designed to defeat content blocking by
firewalls and to defeat Web server attempts to identify users, all
without degrading Web site behavior or requiring users to install
specialized software. In this article we describe how these
fundamentally incompatible requirements were realized in SafeWeb's
architecture, resulting in spectacular failure modes under simple
JavaScript attacks. These exploits allow adversaries to turn SafeWeb
into a weapon against its users, inflicting more damage on them than
would have been possible if they had never relied on SafeWeb
technology. By bringing these problems to light, we hope to remind
readers of the chasm that continues to separate popular and technical
notions of security.
%R 2002-004
%T Small-World Internet Topologies: Possible Causes and Implications on Scalability of End-System Multicast
%A Jin, Shudong
%A Bestavros, Azer
%D January 30, 2002
%U http://www.cs.bu.edu/techreports/2002-004-internet-topology-smallworld-sources.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recent work has shown the prevalence of small-world graphs in many
networks. Small-world graphs exhibit a high degree of clustering, yet
have typically short path lengths between arbitrary vertices. Internet
AS-level maps have been shown to exhibit small-world behavior. In this
paper, we show that indeed both Internet AS-level and router-level
maps exhibit small-world behavior. We attribute such behavior to two
possible causes--namely the high variability of vertex degree
distributions (which are known to follow approximately a power law) and
the preference of vertices to have local connections. We show that
both causes contribute with different relative degrees to the
small-world behavior of AS-level and router-level topologies. Our
findings underscore the inefficacy of the Barabasi-Albert model in
explaining the growth process of the Internet, and provide a basis for
more promising approaches to the development of Internet topology
generators. We present such a generator and show the resemblance of
the synthetic maps it generates to real Internet AS-level and
router-level maps. Using these maps, we have examined how small-world
behavior affects the scalability of end system multicast. Our findings
indicate that a higher degree of clustering in small-world graphs
results in slower network neighborhood expansion, and in longer
average path length between two arbitrary vertices, which in turn
results in better scaling of end system multicast.
%R 2002-005
%T PeriScope: An Active Internet Probing and Measurement API
%A Harfoush, Khaled
%A Bestavros, Azer
%A Byers, John
%D January 30, 2002
%U http://www.cs.bu.edu/techreports/2002-005-periscope.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Growing interest in inference and prediction of network
characteristics is justified by its importance for a variety of
network-aware applications. One widely adopted strategy to
characterize network conditions relies on active, end-to-end probing
of the network. Active end-to-end probing techniques differ in (1) the
structural composition of the probes they use (e.g., number and size
of packets, the destination of various packets, the protocols used,
etc.), (2) the entity making the measurements (e.g. sender
vs. receiver), and (3) the techniques used to combine measurements in
order to infer specific metrics of interest. In this paper, we
present PeriScope, a Linux API that enables the definition of new
probing structures and inference techniques from user space through a
flexible interface. PeriScope requires no support from clients beyond
the ability to respond to ICMP ECHO REQUESTs and is designed to
minimize user/kernel crossings and to ensure various constraints
(e.g., back-to-back packet transmissions, fine-grained timing
measurements). We show how to use PeriScope for two different probing
purposes, namely the measurement of shared packet losses between pairs
of endpoints and the measurement of subpath bandwidth. Results
from Internet experiments for both of these goals are also presented.
%R 2002-007
%T Informed Content Delivery Across Adaptive Overlay Networks
%A Byers, John
%A Considine, Jeffrey
%A Mitzenmacher, Michael
%A Rost, Stanislav
%D March 4, 2002
%U http://www.cs.bu.edu/techreports/2002-007-informed-overlay-delivery.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Overlay networks have emerged as a powerful and highly flexible method
for delivering content. We study how to optimize throughput of large,
multipoint transfers across richly connected overlay networks,
focusing on the question of what to put in each transmitted packet.
We first make the case for transmitting encoded content in this
scenario, arguing for the digital fountain approach which enables
end-hosts to efficiently restitute the original content of size n from
a subset of any n symbols from a large universe of encoded symbols.
Such an approach affords reliability and a substantial degree of
application-level flexibility, as it seamlessly tolerates packet loss,
connection migration, and parallel transfers. However, since the sets
of symbols acquired by peers are likely to overlap substantially, care
must be taken to enable them to collaborate effectively. We provide a
collection of useful algorithmic tools for efficient estimation,
summarization, and approximate reconciliation of sets of symbols
between pairs of collaborating peers, all of which keep messaging
complexity and computation to a minimum. Through simulations and
experiments on a prototype implementation, we demonstrate the
performance benefits of our informed content delivery mechanisms and
how they complement existing overlay network architectures.
%R 2002-008
%T End-to-End Inference of Loss Nature in a Hybrid Wired/Wireless Environment
%A Liu, Jun
%A Matta, Ibrahim
%A Crovella, Mark
%D March 14, 2002
%U http://www.cs.bu.edu/techreports/2002-008-loss-hmm.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
End-to-End differentiation between wireless and congestion loss can
equip TCP control so it operates effectively in a hybrid
wired/wireless environment. Our approach integrates two techniques:
packet loss pairs (PLP) and Hidden Markov Modeling (HMM). A packet
loss pair is formed by two back-to-back packets, where one packet is
lost while the second packet is successfully received. The purpose is
for the second packet to carry the state of the network path, namely
the round trip time (RTT), at the time the other packet is lost. Under
realistic conditions, PLP provides strong differentiation between
congestion and wireless type of loss based on distinguishable RTT
distributions. An HMM is then trained so observed RTTs can be mapped
to model states that represent either congestion loss or wireless
loss. Extensive simulations confirm the accuracy of our HMM-based
technique in classifying the cause of a packet loss. We also show the
superiority of our technique over the Vegas predictor, which was
recently found to perform best and which exemplifies other existing
loss labeling techniques.
%R 2002-009
%T Scheduling Flows with Unknown Sizes: Approximate Analysis
%A Guo, Liang
%A Matta, Ibrahim
%D March 21, 2002
%U http://www.cs.bu.edu/techreports/2002-009-multi-class-processor-sharing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Previous studies have shown that giving preferential treatment to
short jobs helps reduce the average system response time, especially
when the job size distribution possesses the heavy-tailed
property. Since it has been shown that the TCP flow length
distribution also has the same property, it is natural to let short
TCP flows enjoy better service inside the network. Analyzing such a
discriminatory system requires modification to traditional job
scheduling models since usually network traffic managers do not have
detailed knowledge about individual flows such as their lengths. The
Multi-Level (ML) queue, proposed by Kleinrock, can be used to
characterize such a system. In an ML queueing system, the priority of a
flow is reduced as the flow stays longer. We present an approximate
analysis of the ML queueing system to obtain a closed-form solution of
the average system response time function. We show that the response
time of short flows can be significantly reduced without penalizing
long flows.
%R 2002-010
%T Surface Reconstruction from Multiple Views using Rational B-Splines and Knot Insertion
%A Siddiqui, Matheen
%A Sclaroff, Stan
%D March 25, 2002
%U http://www.cs.bu.edu/techreports/2002-010-Bspline.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A method for reconstruction of 3D rational B-spline surfaces from
multiple views is proposed. Given corresponding features in multiple
views, though not necessarily visible in all views, the surface is
reconstructed. First, 2D B-spline patches are fitted to each view.
The 3D B-splines and projection matrices can then be extracted from
the 2D B-splines using factorization methods. The surface fit is then
further refined via an iterative procedure. Finally, a hierarchical
fitting scheme is proposed to allow modeling of complex surfaces by
means of knot insertion. Experiments with real imagery demonstrate the
efficacy of the approach.
%R 2002-011
%T Automatic Detection of Relevant Head Gestures in American Sign Language Communication
%A Erdem, Ugur Murat
%A Sclaroff, Stan
%D May 3, 2002
%U http://www.cs.bu.edu/techreports/2002-011-head-motion-detection.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
An automated system for detection of head movements is described. The goal
is to label relevant head gestures in video of American Sign Language (ASL)
communication. In the system, a 3D head tracker recovers head rotation and
translation parameters from monocular video. Relevant head gestures are
then detected by analyzing the length and frequency of the motion signal's
peaks and valleys. Each parameter is analyzed independently, due to the
fact that a number of relevant head movements in ASL are associated with
major changes around one rotational axis. No explicit training of the
system is necessary. Currently, the system can detect ``head shakes.'' In
experimental evaluation, classification performance is compared against
ground-truth labels obtained from ASL linguists. Initial results are
promising, as the system matches the linguists' labels in a significant
number of cases.
%R 2002-012
%T Differentiated Control of Web Traffic: A Numerical Analysis
%A Guo, Liang
%A Matta, Ibrahim
%D May 10, 2002
%U http://www.cs.bu.edu/techreports/2002-012-diff-web-numerical.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Internet measurements show that the size distribution of Web-based
transactions is usually very skewed; a few large requests constitute
most of the total traffic. Motivated by the advantages of scheduling
algorithms which favor short jobs, we propose to perform
differentiated control over Web-based transactions to give
preferential service to short web requests. The control is realized
through service semantics provided by Internet Traffic Managers, a
Diffserv-like architecture. To evaluate the performance of such a
control system, it is necessary to have a fast but accurate analytical
method. To this end, we model the Internet as a time-shared system
and propose a numerical approach which utilizes Kleinrock's
conservation law to solve the model. The numerical results are shown
to match well those obtained by packet-level simulation, which runs
orders of magnitude slower than our numerical method.
%R 2002-013
%T On the Scalability-Performance Tradeoffs in MPLS and IP Routing
%A Yilmaz, Selma
%A Matta, Ibrahim
%D May 10, 2002
%U http://www.cs.bu.edu/techreports/2002-013-tradeoff-mpls-ip.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
MPLS (Multi-Protocol Label Switching) has recently emerged to
facilitate the engineering of network traffic. This can be achieved
by directing packet flows over paths that satisfy multiple
requirements. MPLS has been regarded as an enhancement to traditional
IP routing, which has the following problems: (1) all packets with the
same IP destination address have to follow the same path through the
network; and (2) paths have often been computed based on static and
single link metrics. These problems may cause traffic concentration,
and thus degradation in quality of service. In this paper, we
investigate by simulations a range of routing solutions and examine
the tradeoff between scalability and performance. At one extreme, IP
packet routing using dynamic link metrics provides a stateless
solution but may lead to routing oscillations. At the other extreme,
we consider a recently proposed Profile-based Routing (PBR), which
uses knowledge of potential ingress-egress pairs as well as the
traffic profile among them. Minimum Interference Routing (MIRA) is
another recently proposed MPLS-based scheme, which only exploits
knowledge of potential ingress-egress pairs but not their traffic
profile. MIRA and the more conventional widest-shortest path (WSP)
routing represent alternative MPLS-based approaches on the spectrum of
routing solutions. We compare these solutions in terms of utility,
bandwidth acceptance ratio as well as their scalability (routing state
and computational overhead) and load balancing capability. Although
WSP is the simplest of the per-flow algorithms we consider, its
performance is close to that of dynamic per-packet routing, without
the potential instabilities of dynamic routing.
%R 2002-014
%T A Hierarchical Characterization of a Live Streaming Media Workload
%A Veloso, Eveline
%A Almeida, Virgilio
%A Meira, Wagner
%A Bestavros, Azer
%A Jin, Shudong
%D May 10, 2002
%U http://www.cs.bu.edu/techreports/2002-014-internet-live-streaming-characterization.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present what we believe to be the first thorough characterization
of live streaming media content delivered over the Internet. Our
characterization of over five million requests spanning a 28-day
period is done at three increasingly granular levels, corresponding to
clients, sessions, and transfers. Our findings support two important
conclusions. First, we show that the nature of interactions between
users and objects is fundamentally different for live versus stored
objects. Access to stored objects is user driven, whereas access to
live objects is object driven. This reversal of active/passive roles
of users and objects leads to interesting dualities. For instance, our
analysis underscores a Zipf-like profile for user interest in a given
object, which is to be contrasted to the classic Zipf-like popularity
of objects for a given user. Also, our analysis reveals that transfer
lengths are highly variable and that this variability is due to the
stickiness of clients to a particular live object, as opposed to
structural (size) properties of objects. Second, based on
observations we make, we conjecture that the particular
characteristics of live media access workloads are likely to be highly
dependent on the nature of the live content being accessed. In our
study, this dependence is clear from the strong temporal correlations
we observed in the traces, which we attribute to the synchronizing
impact of live content on access characteristics. Based on our
analyses, we present a model for live media workload generation that
incorporates many of our findings, and which we implement in GISMO.
%R 2002-015
%T On the Geographic Location of Internet Resources
%A Lakhina, Anukool
%A Byers, John
%A Crovella, Mark
%A Matta, Ibrahim
%D May 21, 2002
%U http://www.cs.bu.edu/techreports/2002-015-internet-geography.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
One relatively unexplored question about the Internet's physical
structure concerns the geographical location of its components:
routers, links and autonomous systems (ASes). We study this question
using two large inventories of Internet routers and links, collected
by different methods and about two years apart. We first map each
router to its geographical location using two different
state-of-the-art tools. We then study the relationship between router
location and population density; between geographic distance and link
density; and between the size and geographic extent of ASes. Our
findings are consistent across the two datasets and both mapping
methods. First, as expected, router density per person varies widely
over different economic regions; however, in economically homogeneous
regions, router density shows a strong superlinear relationship to
population density. Second, the probability that two routers are
directly connected is strongly dependent on distance; our data is
consistent with a model in which a majority (up to 75-95%) of link
formation is based on geographical distance (as in the Waxman topology
generation method). Finally, we find that ASes show high variability
in geographic size, which is correlated with other measures of AS size
(degree and number of interfaces). Small to medium ASes
show wide variability in their geographic dispersal; however, all ASes
exceeding a certain threshold in size are maximally dispersed
geographically. These findings have many implications for the next
generation of topology generators, which we envisage as producing
router-level graphs annotated with attributes such as link latencies,
AS identifiers and geographical locations.
%R 2002-016
%T Effectiveness of Loss Labeling in Improving TCP Performance in Wired/Wireless Networks
%A Barman, Dhiman
%A Matta, Ibrahim
%D May 22, 2002
%U http://www.cs.bu.edu/techreports/2002-016-loss-labeling-tcp-flip-flop.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The current congestion-oriented design of TCP hinders its ability to
perform well in hybrid wireless/wired networks. We propose a new
improvement on TCP NewReno (NewReno-FF) using a new loss labeling
technique to discriminate wireless from congestion losses. The
proposed technique is based on estimating the average and variance
of the round trip time using a Flip Flop filter that is
augmented with history information. We show the comparative
performance of TCP NewReno, NewReno-FF, and TCP Westwood through
extensive simulations. We study the fundamental gains and limits
using TCP NewReno with varying Loss Labeling accuracy (NewReno-LL) as
a benchmark. Lastly, our investigation opens up important research
directions. First, there is a need for a finer grained classification
of losses (even within congestion and wireless losses) for TCP in
heterogeneous networks. Second, it is essential to develop an
appropriate control strategy for recovery after the correct
classification of a packet loss.
%R 2002-017
%T Safe Composition of Web Communication Protocols for Extensible Edge Services
%A Bradley, Adam
%A Bestavros, Azer
%A Kfoury, Assaf
%D May 22, 2002
%U http://www.cs.bu.edu/techreports/2002-017-http-safe-compositions.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
As new multi-party edge services are deployed on the Internet,
application-layer protocols with complex communication models and
event dependencies are increasingly being specified and adopted. To
ensure that such protocols (and compositions thereof with existing
protocols) do not result in undesirable behaviors (e.g., livelocks)
there needs to be a methodology for the automated checking of the
``safety'' of these protocols. In this paper, we present ingredients
of such a methodology. Specifically, we show how SPIN, a tool from the
formal systems verification community, can be used to quickly identify
problematic behaviors of application-layer protocols with non-trivial
communication models---such as HTTP with the addition of the ``100
Continue'' mechanism. As a case study, we examine several versions of
the specification for the Continue mechanism; our experiments
mechanically uncovered multi-version interoperability problems,
including some which motivated revisions of HTTP/1.1 and some which
persist even with the current version of the protocol. One such
problem resembles a classic degradation-of-service attack, but can
arise between well-meaning peers. We also discuss how the methods we
employ can be used to make explicit the requirements for hardening a
protocol's implementation against potentially malicious peers, and for
verifying an implementation's interoperability with the full range of
allowable peer behaviors.
%R 2002-018
%T Unicast Routing: Cost-Performance Tradeoffs
%A Yilmaz, Selma
%A Matta, Ibrahim
%D July 5, 2002
%U http://www.cs.bu.edu/techreports/2002-018-routing-tradeoffs.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The objective of unicast routing is to find a path from a source to a
destination. Conventional routing has been used mainly to provide
connectivity. It lacks the ability to provide any kind of service
guarantees and smart usage of network resources. Improving performance
is possible by being aware of both traffic characteristics and current
available resources. This paper surveys a range of routing solutions,
which can be categorized depending on the degree of the awareness of
the algorithm: (1) QoS/Constraint-based routing solutions are aware of
traffic requirements of individual connection requests; (2)
Traffic-aware routing solutions assume knowledge of the location of
communicating ingress-egress pairs and possibly the traffic demands
among them; (3) Routing solutions that are both QoS-aware as in (1) and
traffic-aware as in (2); (4) Best-effort solutions are oblivious to both
traffic and QoS requirements, but are adaptive only to current
resource availability. The best performance can be achieved by having
all possible knowledge so that while finding a path for an individual
flow, one can make a smart choice among feasible paths to increase the
chances of supporting future requests. However, this usually comes at
the cost of increased complexity and decreased scalability. In this
paper, we discuss such cost-performance tradeoffs by surveying
proposed heuristic solutions and hybrid approaches.
%R 2002-019
%T Fast Approximate Reconciliation of Set Differences
%A Byers, John
%A Considine, Jeffrey
%A Mitzenmacher, Michael
%D July 11, 2002
%U http://www.cs.bu.edu/techreports/2002-019-approx-reconciliation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present new, simple, efficient data structures for approximate
reconciliation of set differences, a useful standalone primitive for
peer-to-peer networks and a natural subroutine in methods for exact
reconciliation. In the approximate reconciliation problem, peers A and
B respectively have subsets of elements S(A) and S(B) of a large
universe U. Peer A wishes to send a short message M to peer B with
the goal that B should use M to determine as many elements in the set
S(B) - S(A) as possible. To avoid the expense of round trip
communication times, we focus on the situation where a single message
M is sent. We motivate the performance tradeoffs between message size,
accuracy and computation time for this problem with a straightforward
approach using Bloom filters. We then introduce approximate
reconciliation trees, a more computationally efficient solution that
combines techniques from Patricia tries, Merkle trees, and Bloom
filters. We present an analysis of approximate reconciliation trees
and provide experimental results comparing the various methods
proposed for approximate reconciliation.
%R 2002-020
%T Graph Wavelets for Spatial Traffic Analysis
%A Crovella, Mark
%A Kolaczyk, Eric
%D July 15, 2002
%U http://www.cs.bu.edu/techreports/2002-020-graph-wavelets.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A number of problems in network operations and engineering call for
new methods of traffic analysis. While most existing traffic analysis
methods are fundamentally temporal, there is a clear need for the
analysis of traffic across multiple network links --- that is, for
spatial traffic analysis. In this paper we give examples of problems
that can be addressed via spatial traffic analysis. We then propose a
formal approach to spatial traffic analysis based on the wavelet
transform. Our approach generalizes the traditional wavelet transform
so that it can be applied to data elements connected via an arbitrary
topology. We explore the necessary and desirable properties of this
approach (graph wavelets) and consider some of its possible
realizations. We then apply graph wavelets to measurements from an
operating network. Our results show that graph wavelets are very
useful for our motivating problems; for example, they can be used to
form highly summarized views of an entire network's traffic load, to
gain insight into a network's global traffic response to a link
failure, and to localize the extent of a failure event within the
network.
%R 2002-021
%T Sampling Biases in IP Topology Measurements
%A Lakhina, Anukool
%A Byers, John
%A Crovella, Mark
%A Xie, Peng
%D July 15, 2002
%U http://www.cs.bu.edu/techreports/2002-021-topology-sampling-bias.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Considerable attention has been focused on the properties of graphs
derived from Internet measurements. Router-level topologies collected
via traceroute studies have led some authors to conclude that the
router graph of the Internet is a scale-free graph, or more generally
a power-law random graph. In such a graph, the degree distribution of
nodes follows a distribution with a power-law tail. In this paper we
argue that the evidence to date for this conclusion is at best
insufficient. We show that graphs appearing to have power-law degree
distributions can arise surprisingly easily, when sampling graphs
whose true degree distribution is not at all like a power-law. For
example, given a classical Erdos-Renyi sparse, random graph, the
subgraph formed by a collection of shortest paths from a small set of
random sources to a larger set of random destinations can easily
appear to show a degree distribution remarkably like a power-law. We
explore how this effect arises, and show that in such
a setting, edges are sampled in a highly biased manner. This insight
allows us to distinguish measurements taken from the Erdos-Renyi
graphs from those taken from power-law random graphs. When we apply
this distinction to a number of well-known datasets, we find that the
evidence for sampling bias in these datasets is strong.
%R 2002-022
%T On the Intrinsic Locality Properties of Web Reference Streams
%A Fonseca, Rodrigo
%A Almeida, Virgilio
%A Crovella, Mark
%A Abrahao, Bruno
%D July 15, 2002
%U http://www.cs.bu.edu/techreports/2002-022-web-stream-locality.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
There has been considerable work done in the study of Web reference
streams: sequences of requests for Web objects. In particular, many
studies have looked at the locality properties of such streams,
because of the impact of locality on the design and performance of
caching and prefetching systems. However, a general framework for
understanding why reference streams exhibit given locality properties
has not yet emerged. In this paper we take a first step in this
direction. We propose a framework for describing how reference
streams are transformed as they pass through the Internet, based on
three operations: aggregation, disaggregation, and filtering. We also
propose metrics to capture the temporal locality of reference streams
in this framework. We argue that these metrics (marginal entropy and
interreference coefficient of variation) are more natural and more
useful than previously proposed metrics for temporal locality; and we
show that these metrics provide insight into the nature of reference
stream transformations in the Web.
%R 2002-023
%T A Self-initializing Eyebrow Tracker for Binary Switch Emulation
%A Lombardi, Jonathan
%A Betke, Margrit
%D September 15, 2002
%U http://www.cs.bu.edu/techreports/2002-023-eyebrow-tracker.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We designed the "Eyebrow-Clicker," a camera-based human-computer
interface system that implements a new form of binary switch. When
the user raises his or her eyebrows, the binary switch is activated
and a selection command is issued. The Eyebrow-Clicker thus replaces
the "click" functionality of a mouse. The system initializes itself
by detecting the user's eyes and eyebrows, tracks these features at
frame rate, and recovers in the event of errors. The initialization
uses the natural blinking of the human eye to select suitable
templates for tracking. Once execution has begun, a user therefore
never has to restart the program or even touch the computer. In our
experiments with human-computer interaction software, the system
successfully determined 93% of the time when a user raised his
eyebrows.
%R 2002-024
%T Cache-and-Relay Streaming Media Delivery for Asynchronous Clients
%A Jin, Shudong
%A Bestavros, Azer
%D September 20, 2002
%U http://www.cs.bu.edu/techreports/2002-024-cache-relay.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider the problem of delivering popular streaming media to a
large number of asynchronous clients. We propose and evaluate a
cache-and-relay end-system multicast approach, whereby a client
joining a multicast session caches the stream, and if needed, relays
that stream to neighboring clients which may join the multicast
session at some later time. This cache-and-relay approach is fully
distributed, scalable, and efficient in terms of network link cost. In
this paper we analytically derive bounds on the network link cost of
our cache-and-relay approach, and we evaluate its performance under
assumptions of limited client bandwidth and limited client cache
capacity. When client bandwidth is limited, we show that although
finding an optimal solution is NP-hard, a simple greedy algorithm
performs surprisingly well in that it incurs network link costs that
are very close to a theoretical lower bound. When client cache
capacity is limited, we show that our cache-and-relay approach can
still significantly reduce network link cost. We have evaluated our
cache-and-relay approach using simulations over large, synthetic
random networks, power-law degree networks, and small-world networks,
as well as over large real router-level Internet maps.
%R 2002-025
%T Smooth Multirate Multicast Congestion Control
%A Kwon, Gu-In
%A Byers, John
%D September 27, 2002
%U http://www.cs.bu.edu/techreports/2002-025-smcc.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A significant impediment to deployment of multicast services is the
daunting technical complexity of developing, testing and validating
congestion control protocols fit for wide-area deployment. Protocols such
as pgmcc and TFMCC have recently made considerable progress on the single
rate case, i.e. where one dynamic reception rate is maintained for all
receivers in the session. However, these protocols have limited
applicability, since scaling to session sizes beyond tens of participants
necessitates the use of multiple rate protocols. Unfortunately, while
existing multiple rate protocols exhibit better scalability, they are
both less mature than single rate protocols and suffer from high
complexity.
We propose a new approach to multiple rate congestion control that
leverages proven single rate congestion control methods by orchestrating
an ensemble of independently controlled single rate sessions. We describe
SMCC, a new multiple rate equation-based congestion control algorithm for
layered multicast sessions that employs TFMCC as the primary underlying
control mechanism for each layer. SMCC combines the benefits of TFMCC
(smooth rate control, equation-based TCP friendliness) with the
scalability and flexibility of multiple rates to provide a sound multiple
rate multicast congestion control policy.
%R 2002-026
%T Scalable Peer-to-Peer Indexing with Constant State
%A Considine, Jeffrey
%A Florio, Thomas
%D September 27, 2002
%U http://www.cs.bu.edu/techreports/2002-026-scalable-p2p-indexing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a distributed indexing scheme for peer-to-peer
networks. Past work on distributed indexing traded off fast search times
with non-constant degree topologies or network-unfriendly behavior such as
flooding. In contrast, the scheme we present optimizes all three of these
performance measures. That is, we provide logarithmic round searches while
maintaining connections to a fixed number of peers and avoiding network
flooding. In comparison to the well known scheme Chord, we provide
competitive constant factors. Finally, we observe that arbitrary linear
speedups are possible and discuss both a general brute force approach and
specific economical optimizations.
%R 2002-027
%T A Spectrum of TCP-friendly Window-based Congestion Control Algorithms
%A Jin, Shudong
%A Guo, Liang
%A Matta, Ibrahim
%A Bestavros, Azer
%D October 15, 2002
%U http://www.cs.bu.edu/techreports/2002-027-spectrum-tcp-friendly.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
[[This technical report revises BUCS-TR-2001-015 and is a longer
version of a paper to appear in IEEE/ACM Transactions on Networking.]]
The increased diversity of Internet application requirements has
spurred recent interest in transport protocols with flexible
transmission controls. In window-based congestion control schemes,
increase rules determine how to probe available bandwidth, whereas
decrease rules determine how to back off when losses due to congestion
are detected. The parameterization of these control rules is done so
as to ensure that the resulting protocol is TCP-friendly in terms of
the relationship between throughput and loss rate. In this paper, we
define a new spectrum of window-based congestion control algorithms
that are TCP-friendly as well as TCP-compatible under RED. Contrary
to previous memory-less controls, our algorithms utilize history
information in their control rules. Our proposed algorithms have two
salient features: (1) They enable a wider region of TCP-friendliness,
and thus more flexibility in trading off among smoothness,
aggressiveness, and responsiveness; and (2) they ensure a faster
convergence to fairness under a wide range of system conditions. SIMD
is one instance of this spectrum of algorithms, in which the
congestion window is increased super-linearly with time since the
detection of the last loss. Compared to recently proposed TCP-friendly
AIMD and binomial algorithms, we demonstrate the superiority of SIMD
in: (1) adapting to sudden increases in available bandwidth, while
maintaining competitive smoothness and responsiveness; and (2) rapidly
converging to fairness and efficiency.
%R 2002-028
%T A Short History of Computational Complexity
%A Fortnow, Lance
%A Homer, Steven
%D October 30, 2002
%U http://www.cs.bu.edu/techreports/2002-028-computational-complexity-history.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A brief history of the major issues and developments in computational
complexity theory over the past 30 years is presented. This paper will
appear in the volume entitled, "A History of Mathematical Logic",
edited by D. van Dalen, J. Dawson and A. Kanamori, and published by
Elsevier.
%R 2002-029
%T Simple Load Balancing for Distributed Hash Tables
%A Byers, John
%A Considine, Jeffrey
%A Mitzenmacher, Michael
%D November 1, 2002
%U http://www.cs.bu.edu/techreports/2002-029-simple-load-balancing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Distributed hash tables have recently become a useful building
block for a variety of distributed applications. However, current schemes
based upon consistent hashing require both considerable implementation
complexity and substantial storage overhead to achieve desired load
balancing goals. We argue in this paper that these goals can be achieved
more simply and more cost-effectively. First, we suggest the direct
application of the ``power of two choices'' paradigm, whereby an item is
stored at the less loaded of two (or more) random alternatives. We then
consider how associating a small constant number of hash values with a key
can naturally be extended to support other load balancing methods,
including load-stealing or load-shedding schemes, as well as providing
natural fault-tolerance mechanisms.
%R 2002-030
%T Validating Arbitrarily Large Network Protocol Compositions with Finite Computation
%A Bradley, Adam
%A Bestavros, Azer
%A Kfoury, Assaf
%D October 31, 2002
%U http://www.cs.bu.edu/techreports/2002-030-finite-net-protocol-compositions.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Formal tools like finite-state model checkers have proven useful in
verifying the correctness of systems of bounded size and for hardening
single system components against arbitrary inputs. However,
conventional applications of these techniques are not well suited to
characterizing emergent behaviors of large compositions of processes.
In this paper, we present a methodology by which arbitrarily large
compositions of components can, if sufficient conditions are proven
concerning properties of small compositions, be modeled and completely
verified by performing formal verifications upon only a finite set of
compositions. The sufficient conditions take the form of reductions,
which are claims that particular sequences of components will be
causally indistinguishable from other shorter sequences of components.
We show how this methodology can be applied to a variety of network
protocol applications, including two features of the HTTP protocol, a
simple active networking applet, and a proposed web cache consistency
algorithm. We also discuss its applicability to framing protocol
design goals and to representing systems which employ non-model-checking
verification methodologies. Finally, we briefly discuss how we hope to
broaden this methodology to more general topological compositions of
network applications.
%R 2002-031
%T Cluster-based Optimizations for Distributed Hash Tables
%A Considine, Jeffrey
%D November 1, 2002
%U http://www.cs.bu.edu/techreports/2002-031-DHT-cluster-optimization.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider the problem of performing topological optimizations of
distributed hash tables. Such hash tables include Chord and Tapestry
and are a popular building block for distributed applications.
Optimizing topologies over one dimensional hash spaces is particularly
difficult as the higher dimensionality of the underlying network makes
close fits unlikely. Instead, current schemes are limited to
heuristically performing local optimizations by finding the best of a
small random set of peers. We propose a new class of topology optimizations
based on the existence of clusters of close overlay members within the
underlying network. By constructing additional overlays for each
cluster, a significant portion of the search procedure can be
performed within the local cluster with a corresponding reduction in
the search time. Finally, we discuss the effects of these additional
overlays on spatial locality and other load balancing schemes.
%R 2003-001
%T On the Size Distribution of Autonomous Systems
%A Fayed, Marwan
%A Krapivsky, Paul
%A Byers, John
%A Crovella, Mark
%A Finkel, David
%A Redner, Sid
%D January 17, 2003
%U http://www.cs.bu.edu/techreports/2003-001-AS-size-distribution.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper explores reasons for the high degree of variability in the
sizes of ASes that have recently been observed, and the processes by
which this variable distribution develops. AS size distribution is
important for a number of reasons. First, when modeling network
topologies, an AS size distribution assists in labeling routers with
an associated AS. Second, AS size has been found to be positively
correlated with the degree of the AS (number of peering links), so
understanding the distribution of AS sizes has implications for AS
connectivity properties. Our model accounts for AS births, growth, and
mergers. We analyze two models: one incorporates only the growth of
hosts and ASes, and a second extends that model to include mergers of
ASes. We show analytically that, given reasonable assumptions about
the nature of mergers, the resulting size distribution exhibits a
power law tail with the exponent independent of the details of the
merging process. We estimate parameters of the models from
measurements obtained from Internet registries and from BGP tables.
We then compare the models' solutions to empirical AS size distributions
taken from Mercator and Skitter datasets, and find that the simple
growth-based model yields general agreement with empirical data. Our
analysis of the model in which mergers occur in a manner independent
of the size of the merging ASes suggests that more detailed analysis
of merger processes is needed.
%R 2003-002
%T Geometric Generalizations of the Power of Two Choices
%A Byers, John
%A Considine, Jeffrey
%A Mitzenmacher, Michael
%D February 6, 2003
%U http://www.cs.bu.edu/techreports/2003-002-power-of-two-generalizations.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A well-known paradigm for load balancing in distributed systems is the
``power of two choices,'' whereby an item is stored at the less loaded of
two (or more) random alternative servers. We investigate the power of two
choices in natural settings for distributed computing where items and
servers reside in a geometric space and each item is associated with the
server that is its nearest neighbor. This is in fact the backdrop for
distributed hash tables such as Chord, where the geometric space is
determined by clockwise distance on a one-dimensional ring.
Theoretically, we consider the following load balancing problem. Suppose
that servers are initially hashed uniformly at random to points in the
space. Sequentially, each item then considers d candidate insertion
points also chosen uniformly at random from the space, and selects the
insertion point whose associated server has the least load. For the
one-dimensional ring, and for Euclidean distance on the two-dimensional
torus, we demonstrate that when n data items are hashed to n servers, the
maximum load at any server is log log n / log d + O(1) with high
probability. While our results match the well-known bounds in the
standard setting in which each server is selected equiprobably, our
applications do not have this feature, since the sizes of the
nearest-neighbor regions around servers are non-uniform. Therefore, the
novelty in our methods lies in developing appropriate tail bounds on the
distribution of nearest-neighbor region sizes and in adapting previous
arguments to this more general setting. In addition, we provide
simulation results demonstrating the load balance that results as the
system size scales into the millions.
%R 2003-003
%T On the Convergence of Statistical Techniques for Inferring Network Traffic Demands
%A Medina, Alberto
%A Salamatian, Kave
%A Taft, Nina
%A Matta, Ibrahim
%A Tsang, Yolanda
%A Diot, Christophe
%D February 6, 2003
%U http://www.cs.bu.edu/techreports/2003-003-convergence-TM.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Accurate knowledge of traffic demands in a communication network
enables or enhances a variety of traffic engineering and network
management tasks of paramount importance for operational networks.
Directly measuring a complete set of these demands is prohibitively
expensive because of the huge amounts of data that must be collected
and the performance impact that such measurements would impose on the
regular behavior of the network. As a consequence, we must rely on
statistical techniques to produce estimates of actual traffic demands
from partial information. The performance of such techniques is,
however, constrained by their reliance on limited information and the
large amount of computation they incur, which limits their convergence
behavior. In this paper we study strategies to improve the convergence
of a powerful statistical technique based on an
Expectation-Maximization iterative algorithm. First we analyze
modeling approaches to generating starting points. We call these
starting points {\it informed priors} since they are obtained using
actual network information such as packet traces and SNMP link counts.
Second we provide a very fast variant of the EM algorithm which
extends its computation range, increasing its accuracy and decreasing
its dependence on the quality of the starting point. Finally, we
study the convergence characteristics of our EM algorithm and compare
it against a recently proposed Weighted Least Squares approach.
%R 2003-004
%T Cryptographic Tamper Evidence
%A Itkis, Gene
%D February 11, 2003
%U http://www.cs.bu.edu/techreports/2003-004-tamper-evidence.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose a new notion of cryptographic tamper evidence. A
tamper-evident signature scheme provides an additional procedure Div
which detects tampering: given two signatures, Div can determine
whether one of them was generated by the forger. Surprisingly, this is
possible even after the adversary has inconspicuously learned some ---
or even all --- of the secrets in the system. In this case, it might be
impossible to tell which signature is generated by the legitimate
signer and which by the forger. But at least the fact of the
tampering will be made evident. We define several variants of
tamper-evidence, differing in their power to detect tampering. In all
of these, we assume an equally powerful adversary: she adaptively
controls all the inputs to the legitimate signer (i.e., all messages
to be signed and their timing), and observes all his outputs; she can
also adaptively expose all the secrets at arbitrary times. We provide
tamper-evident schemes for all the variants and prove their
optimality. We stress that our mechanisms are purely cryptographic:
the tamper-detection algorithm Div is stateless and takes no inputs
except the two signatures (in particular, it keeps no logs), we use no
infrastructure (or other ways to conceal additional secrets), and we
use no hardware properties (except those implied by the standard
cryptographic assumptions, such as random number generators). Our
constructions are based on arbitrary ordinary signature schemes and do
not require random oracles.
%R 2003-005
%T On the Emergence of Highly Variable Distributions in the Autonomous System Topology
%A Fayed, Marwan
%A Krapivsky, Paul
%A Byers, John
%A Crovella, Mark
%A Finkel, David
%A Redner, Sid
%D March 1, 2003
%U http://www.cs.bu.edu/techreports/2003-005-AS-degree-distribution.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recent studies have noted that vertex degree in the autonomous
system (AS) graph exhibits a highly variable distribution [fff,MP01].
The most prominent explanatory model for this phenomenon is the
Barabasi-Albert (B-A) model [BA99,AB00]. A central feature of the
B-A model is preferential connectivity --- meaning that the likelihood a
new node in a growing graph will connect to an existing node is
proportional to the existing node's degree. In this paper we ask whether
a more general explanation than the B-A model, and absent the assumption
of preferential connectivity, is consistent with empirical data. We are
motivated by two observations: first, AS degree and AS size are highly
correlated [CHEN02]; and second, highly variable AS size can arise
simply through exponential growth. We construct a model incorporating
exponential growth in the size of the Internet, and in the number of ASes.
We then show via analysis that such a model yields a size distribution
exhibiting a power-law tail. In such a model, if an AS's link formation
is roughly proportional to its size, then AS degree will also show high
variability. We instantiate such a model with empirically derived
estimates of growth rates and show that the resulting degree distribution
is in good agreement with that of real AS graphs.
%R 2003-006
%T Skin Color-Based Video Segmentation under Time-Varying Illumination
%A Sigal, Leonid
%A Sclaroff, Stan
%D March 28, 2003
%U http://www.cs.bu.edu/techreports/2003-006.SkinColor.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A novel approach for real-time skin segmentation in video
sequences is described. The approach enables reliable skin
segmentation despite wide variation in illumination during
tracking. An explicit second order Markov model is used to
predict evolution of the skin-color (HSV) histogram over time.
Histograms are dynamically updated based on feedback from the
current segmentation and predictions of the Markov model. The
evolution of the skin-color distribution at each frame is
parameterized by translation, scaling and rotation in color space.
Consequent changes in geometric parameterization of the
distribution are propagated by warping and re-sampling the
histogram. The parameters of the discrete-time dynamic Markov
model are estimated using Maximum Likelihood Estimation, and also
evolve over time. The accuracy of the new dynamic skin color
segmentation algorithm is compared to that obtained via a static
color model. Segmentation accuracy is evaluated using labeled
ground-truth video sequences taken from staged experiments and
popular movies. An overall increase in segmentation accuracy of up
to 24% is observed in 17 out of 21 test sequences. In all but one
case the skin-color classification rates for our system were
higher, with background classification rates comparable to those
of the static segmentation.
%R 2003-007
%T The Specialized Mappings Architecture
%A Rosales, Romer
%A Sclaroff, Stan
%D March 28, 2003
%U http://www.cs.bu.edu/techreports/2003-007.SMA.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A probabilistic, nonlinear supervised learning model is proposed:
the Specialized Mappings Architecture (SMA). The SMA employs a
set of several mapping functions that are estimated automatically
from training data. Each specialized function maps certain domains
of the input space (e.g., image features) onto the output space
(e.g., articulated body parameters). One important advantage of
the SMA is that it can model ambiguous, one-to-many mappings that
may yield multiple valid output hypotheses. Once learned, the
mapping functions generate a set of output hypotheses for a given
input via a statistical inference procedure. The SMA inference
procedure incorporates an inverse mapping or feedback function,
which enables the SMA to evaluate the likelihood of each
hypothesis. Possible feedback functions include computer graphics
rendering routines that can generate images for given hypotheses.
The SMA employs a variant of the Expectation-Maximization
algorithm for simultaneous learning of the specialized domains
along with the mapping functions, and approximate strategies for
inference. The framework is demonstrated in a computer vision
system that can estimate the articulated pose parameters of a
human body or human hands, given image silhouettes. The accuracy
and stability of the SMA are also tested using synthetic images of
human bodies and hands, where ground truth is known.
%R 2003-008
%T Discovering Clusters in Motion Time-Series Data
%A Alon, Jonathan
%A Sclaroff, Stan
%A Kollios, George
%A Pavlovic, Vladimir
%D March 28, 2003
%U http://www.cs.bu.edu/techreports/2003-008-discovering-clusters.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A new approach is proposed for clustering time-series
data. The approach can be used to discover groupings of
similar object motions that were observed in a video collection.
A finite mixture of hidden Markov models (HMMs) is
fitted to the motion data using the expectation-maximization
(EM) framework. Previous approaches for HMM-based
clustering employ a k-means formulation, where each sequence
is assigned to only a single HMM. In contrast, the
formulation presented in this paper allows each sequence to
belong to more than a single HMM with some probability,
and the hard decision about the sequence class membership
can be deferred until a later time when such a decision
is required. Experiments with simulated data demonstrate
the benefit of using this EM-based approach when there is
more "overlap" in the processes generating the data. Experiments
with real data show the promising potential of
HMM-based motion clustering in a number of applications.
%R 2003-009
%T Estimating 3D Hand Pose from a Cluttered Image
%A Athitsos, Vassilis
%A Sclaroff, Stan
%D April 1, 2003
%U http://www.cs.bu.edu/techreports/2003-009-3D-hand-pose-from-cluttered.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A method is proposed that can generate a ranked list of plausible
three-dimensional hand configurations that best match an input image.
Hand pose estimation is formulated as an image database indexing
problem, where the closest matches for an input hand image are
retrieved from a large database of synthetic hand images. In contrast
to previous approaches, the system can function in the presence of
clutter, thanks to two novel clutter-tolerant indexing methods. First,
a computationally efficient approximation of the image-to-model
chamfer distance is obtained by embedding binary edge images into a
high-dimensional Euclidean space. Second, a general-purpose,
probabilistic line matching method identifies those line segment
correspondences between model and input images that are the least
likely to have occurred by chance. The performance of this
clutter-tolerant approach is demonstrated in quantitative experiments
with hundreds of real hand images.
%R 2003-010
%T Database Indexing Methods for 3D Hand Pose Estimation
%A Athitsos, Vassilis
%A Sclaroff, Stan
%D April 1, 2003
%U http://www.cs.bu.edu/techreports/2003-010-3D-hand-pose-db-indexing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Estimation of 3D hand pose is useful in many gesture recognition
applications, ranging from human-computer interaction to automated
recognition of sign languages. In this paper, 3D hand pose estimation
is treated as a database indexing problem. Given an input image of a
hand, the most similar images in a large database of hand images are
retrieved. The hand pose parameters of the retrieved images are used
as estimates for the hand pose in the input image. Lipschitz
embeddings of edge images into a Euclidean space are used to improve
the efficiency of database retrieval. In order to achieve interactive
retrieval times, similarity queries are initially performed in this
Euclidean space. The paper describes ongoing work that focuses on how
to best choose reference images, in order to improve retrieval
accuracy.
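
The retrieval scheme outlined above can be illustrated with a minimal
sketch. It assumes a generic, caller-supplied distance function in place
of the report's image distance, and the names embed, euclidean and
retrieve are hypothetical. The final re-ranking with the original
distance is also an assumption of this example; the abstract states only
that queries are initially performed in the Euclidean space.

```python
import math

def embed(obj, references, distance):
    """Lipschitz-style embedding: one coordinate per reference object."""
    return [distance(obj, r) for r in references]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve(query, database, references, distance, num_candidates, num_results):
    """Filter in the embedded space, then re-rank with the original distance."""
    q_vec = embed(query, references, distance)
    # A real system would precompute the database embeddings offline.
    db_vecs = [embed(obj, references, distance) for obj in database]
    order = sorted(range(len(database)),
                   key=lambda i: euclidean(q_vec, db_vecs[i]))
    candidates = order[:num_candidates]
    # Assumed refine step: apply the expensive distance to the short list only.
    candidates.sort(key=lambda i: distance(query, database[i]))
    return [database[i] for i in candidates[:num_results]]
```

With numbers standing in for images and absolute difference standing in
for the expensive distance, retrieve(8, [1, 5, 9, 12], [0, 10],
lambda a, b: abs(a - b), 2, 1) evaluates the original distance on only
two candidates.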
%R 2003-011
%T How well can TCP infer network state?
%A Barman, Dhiman
%A Matta, Ibrahim
%D May 16, 2003
%U http://www.cs.bu.edu/techreports/2003-011-TCP-Bayesian.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Transmission Control Protocol (TCP) has been the protocol of
choice for many Internet applications requiring reliable connections.
The design of TCP has been challenged by the extension of connections
over wireless links. We ask a fundamental question: What is TCP's
basic power to predict network state, including wireless
error conditions? The goal is to improve or readily exploit this
predictive power to enable TCP (or variants) to perform well in
generalized network settings. To that end, we use Maximum Likelihood
Ratio tests to evaluate TCP as a detector/estimator. We quantify how
well network state can be estimated, given network response such as
distributions of packet delays or TCP throughput that are conditioned
on the type of packet loss. Using our model-based approach and
extensive simulations, we demonstrate that congestion-induced losses
and losses due to wireless transmission errors produce sufficiently
different statistics upon which an efficient detector can be built;
distributions of network loads can provide effective means for
estimating packet loss type; and packet delay is a better signal of
network state than short-term throughput. We demonstrate how
estimation accuracy is influenced by different proportions of
congestion versus wireless losses and penalties on incorrect
estimation.
%R 2003-012
%T Systematic Verification of Safety Properties of Arbitrary Network Protocol Compositions Using CHAIN
%A Bradley, Adam
%A Bestavros, Azer
%A Kfoury, Assaf
%D May 16, 2003
%U http://www.cs.bu.edu/techreports/2003-012-chain-safety-verification.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Formal correctness of complex multi-party network protocols can be
difficult to verify. While models of specific fixed compositions of agents
can be checked against design constraints, protocols which lend themselves
to arbitrarily many compositions of agents--such as the chaining of
proxies or the peering of routers--are more difficult to verify because
they represent potentially infinite state spaces and may exhibit emergent
behaviors which may not materialize under particular fixed compositions.
We address this challenge by developing an algebraic approach that enables
us to reduce arbitrary compositions of network agents into a
behaviorally-equivalent (with respect to some correctness property)
compact, canonical representation, which is amenable to mechanical
verification. Our approach consists of an algebra and a set of
property-preserving rewrite rules for the Canonical Homomorphic
Abstraction of Infinite Network protocol compositions (CHAIN). Using
CHAIN, an expression over our algebra (i.e., a set of configurations of
network protocol agents) can be reduced to another behaviorally-equivalent
expression (i.e., a smaller set of configurations). Repeated application
of such rewrite rules produces a canonical expression which can be checked
mechanically. We demonstrate our approach by characterizing deadlock-prone
configurations of HTTP agents, as well as establishing useful properties
of an overlay protocol for scheduling MPEG frames, and of a protocol for
Web intra-cache consistency.
%R 2003-013
%T On the Efficiency and Fairness of Transmission Control Loops: A Case for Exogenous Losses
%A Guirguis, Mina
%A Bestavros, Azer
%A Matta, Ibrahim
%D May 16, 2003
%U http://www.cs.bu.edu/techreports/2003-013-exogenous-loss-effect.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We postulate that exogenous losses--which are typically regarded as
introducing undesirable ``noise'' that needs to be filtered out or hidden
from end points--can be surprisingly beneficial. In this paper we evaluate
the effects of exogenous losses on transmission control loops, focusing
primarily on efficiency and convergence to fairness properties. By
analytically capturing the effects of exogenous losses, we are able to
characterize the transient behavior of TCP. Our numerical results suggest
that ``noise'' resulting from exogenous losses should not be filtered out
blindly, and that a careful examination of the parameter space leads to
better strategies regarding the treatment of exogenous losses inside the
network. Specifically, we show that while low levels of exogenous losses do
help connections converge to their fair share, higher levels of losses lead
to inefficient network utilization. We draw the line between these two
cases by determining whether or not it is advantageous to hide, or more
interestingly introduce, exogenous losses. Our proposed approach is based
on classifying the effects of exogenous losses into long-term and
short-term effects. Such classification informs the extent to which we
control exogenous losses, so as to operate in an efficient and fair
region. We validate our results through simulations.
%R 2003-014
%T User-Level Sandboxing: a Safe and Efficient Mechanism for Extensibility
%A West, Richard
%A Gloudon, Jason
%D June 1, 2003
%U http://www.cs.bu.edu/techreports/2003-014-user-level-sandboxing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Extensible systems allow services to be configured and deployed for
the specific needs of individual applications. This paper describes a
safe and efficient method for user-level extensibility that requires
only minimal changes to the kernel. A sandboxing technique is
described that supports multiple logical protection domains within the
same address space at user-level. This approach allows applications to
register with the system sandboxed code that may be executed in the
context of any process. Our approach differs from other
implementations that require special hardware support, such as
segmentation or tagged translation look-aside buffers (TLBs), to
either implement multiple protection domains in a single address
space, or to support fast switching between address spaces. Likewise,
we do not require the entire system to be written in a type-safe
language, to provide fine-grained protection domains. Instead, our
user-level sandboxing technique requires only page-based virtual
memory support and that extension code be written
either in a type-safe language, or by a trusted source.
Using a fast method of upcalls, we show how our sandboxing technique
for implementing logical protection domains provides significant
performance improvements over traditional methods of invoking
user-level services. Experimental results show our approach to be an
efficient method for extensibility, with inter-protection domain
communication costs close to those of hardware-based solutions
leveraging segmentation.
%R 2003-015
%T ROMA: Reliable Overlay Multicast with Loosely Coupled TCP Connections
%A Kwon, Gu-In
%A Byers, John
%D July 1, 2003
%U http://www.cs.bu.edu/techreports/2003-015-roma.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider the problem of architecting a reliable content delivery
system across an overlay network using TCP connections as the
transport primitive. We first argue that natural designs based on
store-and-forward principles that tightly couple TCP connections at
intermediate end-systems impose fundamental performance limitations, such
as dragging down all transfer rates in the system to the rate of the
slowest receiver. In contrast, the ROMA architecture we propose
incorporates the use of loosely coupled TCP connections together with
fast forward error correction techniques to deliver a scalable solution
that better accommodates a set of heterogeneous receivers. The methods we
develop establish chains of TCP connections, whose expected performance we
analyze through equation-based methods. We validate our analytical
findings and evaluate the performance of our ROMA architecture using a
prototype implementation via extensive Internet experimentation across the
PlanetLab distributed testbed.
%R 2003-016
%T Stochastic Mesh-Based Multiview Reconstruction
%A Isidoro, John
%A Sclaroff, Stan
%D July 1, 2003
%U http://www.cs.bu.edu/techreports/2003-016-stochastic-mesh-based-reconstruction.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A method for reconstruction of 3D polygonal models from multiple views
is presented. The method uses sampling techniques to construct a
texture-mapped semi-regular polygonal mesh of the object in question.
Given a set of views and segmentation of the object in each view,
constructive solid geometry is used to build a visual hull from
silhouette prisms. The resulting polygonal mesh is simplified and
subdivided to produce a semi-regular mesh. Regions of model fit
inaccuracy are found by projecting the reference images onto the mesh
from different views. The resulting error images for each view are
used to compute a probability density function, and several points are
sampled from it. Along the epipolar lines corresponding to these
sampled points, photometric consistency is evaluated. The mesh
surface is then pulled towards the regions of higher photometric
consistency using free-form deformations. This sampling-based
approach produces a photometrically consistent solution in much less
time than possible with previous multi-view algorithms given arbitrary
camera placement.
%R 2003-017
%T Stochastic Refinement of the Visual Hull to Satisfy Photometric and Silhouette Consistency Constraints
%A Isidoro, John
%A Sclaroff, Stan
%D July 18, 2003
%U http://www.cs.bu.edu/techreports/2003-017-visual-hull-stochastic-refinement.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
An iterative method for reconstructing a 3D polygonal mesh and color
texture map from multiple views of an object is presented. In each
iteration, the method first estimates a texture map given the current
shape estimate. The texture map and its associated residual error image
are obtained via maximum a posteriori estimation and reprojection of
the multiple views into texture space. Next, the surface shape is adjusted
to minimize residual error in texture space. The surface is deformed
towards a photometrically-consistent solution via a series of 1D epipolar
searches at randomly selected surface points. The texture space
formulation has improved computational complexity over standard image-based
error approaches, and allows computation of the reprojection error and
uncertainty for any point on the surface. Moreover, shape adjustments can
be constrained such that the recovered model's silhouettes match those of
the input images. Experiments with real world imagery demonstrate the
validity of the approach.
%R 2003-018
%T Segmenting Foreground Objects from a Dynamic Textured Background via a Robust Kalman Filter
%A Zhong, Jing
%A Sclaroff, Stan
%D July 18, 2003
%U http://www.cs.bu.edu/techreports/2003-018-dynamic-background-segmentation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The algorithm presented in this paper aims to segment the foreground
objects in video (e.g., people) given time-varying, textured
backgrounds. Examples of time-varying backgrounds include waves on water,
clouds moving, trees waving in the wind, automobile traffic, moving crowds,
escalators, etc. We have developed a novel foreground-background
segmentation algorithm that explicitly accounts for the non-stationary
nature and clutter-like appearance of many dynamic textures. The dynamic
texture is modeled by an Autoregressive Moving Average Model (ARMA). A
robust Kalman filter algorithm iteratively estimates the intrinsic
appearance of the dynamic texture, as well as the regions of the foreground
objects. Preliminary experiments with this method have demonstrated
promising results.
%R 2003-019
%T Dynamic Window-Constrained Scheduling for Real-Time Media Streaming
%A West, Richard
%A Schwan, Karsten
%A Poellabauer, Christian
%D August 29, 2003
%U http://www.cs.bu.edu/techreports/2003-019-dwcs.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper describes an algorithm for scheduling packets in real-time
multimedia data streams. Common to these classes of data streams are
service constraints in terms of bandwidth and delay. However, it is
typical for real-time multimedia streams to tolerate bounded delay
variations and, in some cases, finite losses of packets. We have
therefore developed a scheduling algorithm that assumes streams have
window-constraints on groups of consecutive packet deadlines. A
window-constraint defines the number of packet deadlines that can be
missed in a window of deadlines for consecutive packets in a stream.
Our algorithm, called Dynamic Window-Constrained Scheduling (DWCS),
attempts to guarantee no more than x out of a window of y
deadlines are missed for consecutive packets in real-time and
multimedia streams. Using DWCS, the delay of service to real-time
streams is bounded even when the scheduler is overloaded. Moreover,
DWCS is capable of ensuring independent delay bounds on streams, while
at the same time guaranteeing minimum bandwidth utilizations over
tunable and finite windows of time.
We show the conditions under which the total demand for link bandwidth
by a set of real-time (i.e., window-constrained) streams can exceed
100% and still ensure all window-constraints are met. In fact, we
show how it is possible to guarantee worst-case per-stream bandwidth
and delay constraints while utilizing all available link
capacity. Finally, we show how best-effort packets can be serviced
with fast response time, in the presence of window-constrained
traffic.
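
As an illustration only (this is not the DWCS scheduler itself), the
x-out-of-y window-constraint described above can be checked against a
recorded trace of deadline outcomes. The function name, the boolean
trace encoding, and the use of sliding (overlapping) windows are
assumptions made for this sketch.

```python
def meets_window_constraint(missed, x, y):
    """Return True if no window of y consecutive deadlines has more than x misses.

    missed: list of booleans, True where a packet deadline was missed
            (hypothetical trace encoding, assumed for this sketch).
    """
    if y <= 0 or x < 0:
        raise ValueError("need y > 0 and x >= 0")
    # Count misses in the first window, then slide it one deadline at a time.
    window_misses = sum(missed[:y])
    if window_misses > x:
        return False
    for i in range(y, len(missed)):
        window_misses += missed[i] - missed[i - y]
        if window_misses > x:
            return False
    return True
```

For example, a stream with constraint x=1, y=3 tolerates a miss on every
third packet, but not two misses among any three consecutive deadlines.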
%R 2003-020
%T Adaptive Routing of QoS-constrained Media Streams over Scalable Overlay Topologies
%A Fry, Gerald
%A West, Richard
%D November 7, 2003
%U http://www.cs.bu.edu/techreports/2003-020-qos-overlay-stream-routing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Current research on Internet-based distributed systems emphasizes the
scalability of overlay topologies for efficient search and retrieval of
data items, as well as routing amongst peers. However, most existing
approaches fail to address the transport of data across these logical
networks in accordance with quality of service (QoS) constraints.
Consequently, this paper investigates the use of scalable overlay
topologies for routing real-time media streams between publishers and
potentially many thousands of subscribers. Specifically, we analyze the
costs of using k-ary n-cubes for QoS-constrained routing. Given a number
of nodes in a distributed system, we calculate the optimal k-ary n-cube
structure for minimizing the average distance between any pair of nodes.
Using this structure, we describe a greedy algorithm that selects paths
between nodes in accordance with the real-time delays along physical
links. We show this method improves the routing latencies by as much as
67%, compared to approaches that do not consider physical link costs.
We are in the process of developing a method for adaptive node placement
in the overlay topology, based upon the locations of publishers,
subscribers, physical link costs and per-subscriber QoS constraints. One
such method for repositioning nodes in logical space is discussed, to
improve the likelihood of meeting service requirements on data routed
between publishers and subscribers. Future work will evaluate the benefits
of such techniques more thoroughly.
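
The choice of a k-ary n-cube that minimizes average distance, mentioned
above, can be sketched by brute force. This is a minimal illustration
under stated assumptions, not the report's derivation: hop distance is
measured on a wrap-around (torus) cube, averages include the zero
distance from a node to itself, the cube may hold more positions than
there are nodes, and the names ring_avg_distance, avg_distance,
best_cube and the max_k bound are hypothetical.

```python
def ring_avg_distance(k):
    """Average hop distance from one node to all k nodes of a k-node ring."""
    return sum(min(i, k - i) for i in range(k)) / k

def avg_distance(k, n):
    """Average hop distance in a k-ary n-cube: n independent ring dimensions."""
    return n * ring_avg_distance(k)

def best_cube(num_nodes, max_k=64):
    """Return (average distance, k, n) for the best cube with k**n >= num_nodes."""
    best = None
    for k in range(2, max_k + 1):
        # Smallest dimension n whose cube can hold every node (assumption).
        n = 1
        while k ** n < num_nodes:
            n += 1
        candidate = (avg_distance(k, n), k, n)
        if best is None or candidate < best:
            best = candidate
    return best
```

Under these assumptions, best_cube(8) prefers a 3-ary 2-cube over the
binary 3-cube, because two short rings of three nodes give a lower
average hop count than three dimensions of two nodes each.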
%R 2003-021
%T Structural Analysis of Network Traffic Flows
%A Lakhina, Anukool
%A Papagiannaki, Konstantina
%A Crovella, Mark
%A Diot, Christophe
%A Kolaczyk, Eric
%A Taft, Nina
%D November 20, 2003
%U http://www.cs.bu.edu/techreports/2003-021-odflows.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Network traffic arises from the superposition of Origin-Destination
(OD) flows. Hence, a thorough understanding of OD flows is essential
for modeling network traffic, and for addressing a wide variety of
problems including traffic engineering, traffic matrix
estimation, capacity planning, forecasting and anomaly detection.
However, to date, OD flows have not been closely studied, and there is
very little known about their properties.
We present the first analysis of complete sets of OD flow timeseries,
taken from two different backbone networks (Abilene and
Sprint-Europe). Using Principal Component Analysis (PCA), we find that
the set of OD flows has small intrinsic dimension. In fact, even in a
network with over a hundred OD flows, these flows can be accurately
modeled in time using a small number (10 or fewer) of independent
components or dimensions.
We also show how to use PCA to systematically decompose the structure
of OD flow timeseries into three main constituents: common periodic
trends, short-lived bursts, and noise. We provide insight into how
the various constituents contribute to the overall structure of OD
flows and explore the extent to which this decomposition varies over
time.
%R 2003-022
%T Data Logs for Structural Analysis of Network Traffic Flows
%A Lakhina, Anukool
%A Papagiannaki, Konstantina
%A Crovella, Mark
%A Diot, Christophe
%A Kolaczyk, Eric
%A Taft, Nina
%D November 20, 2003
%U http://www.cs.bu.edu/techreports/2003-022-odflows-data.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In a recent paper, Structural Analysis of Network Traffic Flows, we
analyzed the set of Origin-Destination traffic flows from the Sprint-Europe
and Abilene backbone networks. This report presents the complete set of
results from analyzing data from both networks. The results in this report
are specific to the Sprint-1 and Abilene datasets studied in the above
paper.
%R 2003-023
%T BoostMap: A Method for Efficient Approximate Similarity Rankings
%A Athitsos, Vassilis
%A Alon, Jonathan
%A Sclaroff, Stan
%A Kollios, George
%D November 24, 2003
%U http://www.cs.bu.edu/techreports/2003-023-boostmap.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper introduces BoostMap, a method that can significantly
reduce retrieval time in image and video database systems that employ
computationally expensive distance measures, metric or
non-metric. Database and query objects are embedded into a Euclidean
space, in which similarities can be rapidly measured using a weighted
Manhattan distance. Embedding construction is formulated as a machine
learning task, where AdaBoost is used to combine many simple, 1D
embeddings into a multidimensional embedding that preserves a
significant amount of the proximity structure in the original
space. Performance is evaluated in a hand pose estimation system, and
a dynamic gesture recognition system, where the proposed method is
used to retrieve approximate nearest neighbors under expensive image
and video similarity measures. In both systems, BoostMap significantly
increases efficiency, with minimal losses in accuracy. Moreover, the
experiments indicate that BoostMap compares favorably with existing
embedding methods that have been employed in computer vision and
database applications, i.e., FastMap and Bourgain embeddings.
%R 2003-024
%T A Pragmatic Approach to DHT Adoption
%A Considine, Jeffrey
%A Walfish, Michael
%A Andersen, David G.
%D December 1, 2003
%U http://www.cs.bu.edu/techreports/2003-024-DHT-pragmatic-approach.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Despite the peer-to-peer community's obvious wish to have its systems
adopted, specific mechanisms to facilitate incremental adoption have not
yet received the same level of attention as the many other practical
concerns associated with these systems. This paper argues that ease of
adoption should be elevated to a first-class concern and accordingly
presents HOLD, a front-end to existing DHTs that is optimized for
incremental adoption. Specifically, HOLD is backwards-compatible: it
leverages DNS to provide a key-based routing service to existing Internet
hosts without requiring them to install any software. This paper also
presents applications that could benefit from HOLD as well as the
trade-offs that accompany HOLD. Early implementation experience suggests
that HOLD is practical.
%R 2003-025
%T Contour Generator Points for Threshold Selection and a Novel Photo-Consistency Measure for Space Carving
%A Isidoro, John
%A Sclaroff, Stan
%D December 2, 2003
%U http://www.cs.bu.edu/techreports/2003-025-space-carving.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Space carving has emerged as a powerful method for multiview scene
reconstruction. Although a wide variety of methods have been
proposed, the quality of the reconstruction remains highly dependent
on the photometric consistency measure, and the threshold used to
carve away voxels. In this paper, we present a novel
photo-consistency measure that is motivated by a multiset variant of
the chamfer distance. The new measure is robust to high amounts of
within-view color variance and also takes into account the projection
angles of back-projected pixels.
Another critical issue in space carving is the selection of the
photo-consistency threshold used to determine what surface voxels are
kept or carved away. In this paper, a reliable threshold selection
technique is proposed that examines the photo-consistency values at
contour generator points. Contour generators are points that lie on
both the surface of the object and the visual hull. To determine the
threshold, a percentile ranking of the photo-consistency values of
these generator points is used. This improved technique is applicable
to a wide variety of photo-consistency measures, including the new
measure presented in this paper. Also presented in this paper is a
method to choose between photo-consistency measures and voxel array
resolutions prior to carving using receiver operating characteristic
(ROC) curves.
%R 2003-026
%T Exogenous-Loss Awareness in Queue Management: Toward Global Fairness
%A Guirguis, Mina
%A Bestavros, Azer
%A Matta, Ibrahim
%D December 2, 2003
%U http://www.cs.bu.edu/techreports/2003-026-xqm.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
For a given TCP flow, exogenous losses are those occurring on links
other than the flow's bottleneck link. Exogenous losses are typically
viewed as introducing undesirable ``noise'' into TCP's feedback
control loop, leading to inefficient network utilization and
potentially severe global unfairness. This has prompted much research
on mechanisms for hiding such losses from end-points. In this paper,
we show through analysis and simulations that low levels of exogenous
losses are surprisingly beneficial in that they improve stability and
convergence, without sacrificing efficiency. Based on this, we argue
that exogenous loss awareness should be taken into account in any AQM
design that aims to achieve global fairness. To that end, we propose
an eXogenous-loss aware Queue Management (XQM) that actively accounts
for and leverages exogenous losses. We use an equation-based approach
to derive the quiescent loss rate for a connection based on the
connection's profile and its global fair share. In contrast to other
queue management techniques, XQM ensures that a connection sees its
quiescent loss rate, not only by complementing already existing
exogenous losses, but also by actively hiding exogenous losses, if
necessary, to achieve global fairness. We establish the advantages of
exogenous-loss awareness using extensive simulations in which we
contrast the performance of XQM to that of a host of traditional
exogenous-loss-unaware AQM techniques.
%R 2003-027
%T Efficiently and Fairly Allocating Bandwidth at a Highly Congested Link
%A Wang, Tao
%A Matta, Ibrahim
%A Bestavros, Azer
%D December 2, 2003
%U http://www.cs.bu.edu/techreports/2003-027-red-nb.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider the problem of efficiently and fairly allocating bandwidth
at a highly congested link to a diverse set of flows, including TCP
flows with various Round Trip Times (RTT), non-TCP-friendly flows such
as Constant-Bit-Rate (CBR) applications using UDP, and misbehaving or
malicious flows. Though simple, FIFO queue management is
vulnerable. Fair Queueing (FQ) can guarantee max-min fairness but
fails at efficiency. RED-PD exploits the history of RED's actions in
preferentially dropping packets from higher-rate flows. Thus, RED-PD
attempts to achieve fairness at low cost. By relying on RED's
actions, RED-PD turns out not to be effective in dealing with
non-adaptive flows in settings with a highly heterogeneous mix of
flows. In this paper, we propose a new approach we call RED-NB (RED
with No Bias). RED-NB does not rely on RED's actions. Rather it
explicitly maintains its own history for the few high-rate flows.
RED-NB then adaptively adjusts flow dropping probabilities to achieve
max-min fairness. In addition, RED-NB helps RED itself at very high
loads by tuning RED's dropping behavior to the flow characteristics
(restricted in this paper to RTTs) to eliminate its bias against
long-RTT TCP flows while still taking advantage of RED's features at
low loads. Through extensive simulations, we confirm the fairness of
RED-NB and show that it outperforms RED, RED-PD, and CHOKe in all
scenarios.
%R 2003-028
%T Providing Soft Bandwidth Guarantees Using Elastic TCP-based Tunnels
%A Guirguis, Mina
%A Bestavros, Azer
%A Matta, Ibrahim
%A Riga, Niky
%A Diamant, Gali
%A Zhang, Yuting
%D December 2, 2003
%U http://www.cs.bu.edu/techreports/2003-028-elastic-tunneling.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The best-effort nature of the Internet poses a significant obstacle to
the deployment of many applications that require guaranteed
bandwidth. In this paper, we present a novel approach that enables two
edge/border routers---which we call Internet Traffic Managers
(ITM)---to use an adaptive number of TCP connections to set up a
tunnel of desirable bandwidth between them. The number of TCP
connections that comprise this tunnel is elastic in the sense that it
increases/decreases in tandem with competing cross traffic to maintain
a target bandwidth. An origin ITM would then schedule incoming
packets from an application requiring guaranteed bandwidth over that
elastic tunnel. Unlike many proposed solutions that aim to deliver
soft QoS guarantees, our elastic-tunnel approach does not require any
support from core routers (as with IntServ and DiffServ); it is
scalable in the sense that core routers do not have to maintain
per-flow state (as with IntServ); and it is readily deployable within
a single ISP or across multiple ISPs. To evaluate our approach, we
develop a flow-level control-theoretic model to study the transient
behavior of established elastic TCP-based tunnels. The model captures
the effect of cross-traffic connections on our bandwidth allocation
policies. Through extensive simulations, we confirm the effectiveness
of our approach in providing soft bandwidth guarantees. We also
outline our kernel-level ITM prototype implementation.
%R 2003-029
%T TCP Optimization through FEC, ARQ and Transmission Power Tradeoffs
%A Barman, Dhiman
%A Matta, Ibrahim
%A Altman, Eitan
%A Azouzi, Rachid
%D December 3, 2003
%U http://www.cs.bu.edu/techreports/2003-029-tcp-tradeoffs.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
TCP performance degrades when end-to-end connections extend over
wireless links, which are characterized by high bit
error rates and intermittent connectivity. Such link characteristics
can significantly degrade TCP performance as the TCP sender assumes
wireless losses to be congestion losses, resulting in unnecessary
congestion control actions. Link errors can be reduced by increasing
transmission power, code redundancy (FEC) or number of retransmissions
(ARQ). But increasing power costs resources, increasing code
redundancy reduces available channel bandwidth and increasing
persistency increases end-to-end delay. The paper proposes a TCP
optimization through proper tuning of power management, FEC and ARQ in
wireless environments (WLAN and WWAN). In particular, we conduct
analytical and numerical analysis taking into account the three
aforementioned factors, and evaluate TCP (and ``wireless-aware'' TCP)
performance under different settings. Our results show that
increasing power, redundancy and/or retransmission levels always
improves TCP performance by reducing link-layer losses. However, such
improvements are often associated with cost and arbitrary improvement
cannot be realized without paying a lot in return. It is therefore
important to consider some kind of net utility function that should be
optimized, thus maximizing throughput at the least possible cost.
%R 2003-030
%T A Bayesian Approach for TCP to Distinguish Congestion from Wireless Losses
%A Barman, Dhiman
%A Matta, Ibrahim
%D December 3, 2003
%U http://www.cs.bu.edu/techreports/2003-030-TCP-bayesian-loss-distinction.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
(This Technical Report revises TR-BUCS-2003-011) The Transmission
Control Protocol (TCP) has been the protocol of choice for many
Internet applications requiring reliable connections. The design of
TCP has been challenged by the extension of connections over wireless
links. In this paper, we investigate a Bayesian approach to infer at
the source host the reason for a packet loss, whether congestion or
wireless transmission error. Our approach is ``mostly'' end-to-end
since it requires only one long-term average quantity (namely,
long-term average packet loss probability over the wireless segment)
that may be best obtained with help from the network (e.g. wireless
access agent).
Specifically, we use Maximum Likelihood Ratio tests to evaluate TCP as a
classifier of the type of packet loss. We study the effectiveness of
short-term classification of packet errors (congestion vs. wireless),
given stationary prior error probabilities and distributions of packet
delays conditioned on the type of packet loss (measured over a larger time
scale). Using our Bayesian-based approach and extensive simulations, we
demonstrate that congestion-induced losses and losses due to wireless
transmission errors produce sufficiently different statistics upon which an
efficient online error classifier can be built. We introduce a simple
queueing model to underline the conditional delay distributions arising
from different kinds of packet losses over a heterogeneous wired/wireless
path. We show how Hidden Markov Models (HMMs) can be used by a TCP
connection to infer efficiently conditional delay distributions. We
demonstrate how estimation accuracy is influenced by different proportions
of congestion versus wireless losses and penalties on incorrect
classification.
%R 2003-031
%T Automated Placement of Cameras in a Floorplan to Satisfy Task-Specific Constraints
%A Erdem, Ugur Murat
%A Sclaroff, Stan
%D December 8, 2003
%U http://www.cs.bu.edu/techreports/2003-031-camera-placement.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In many multi-camera vision systems the effect of camera locations on the
task-specific quality of service is ignored. Researchers in Computational
Geometry have proposed elegant solutions for some sensor location problem
classes. Unfortunately, these solutions utilize unrealistic assumptions
about the cameras' capabilities that make these algorithms unsuitable for
many real-world computer vision applications: unlimited field of view,
infinite depth of field, and/or infinite servo precision and speed. In
this paper, the general camera placement problem is first defined with
assumptions that are more consistent with the capabilities of real-world
cameras. The region to be observed by cameras may be volumetric, static or
dynamic, and may include holes that are caused, for instance, by columns or
furniture in a room that can occlude potential camera views. A subclass of
this general problem can be formulated in terms of planar regions that are
typical of building floorplans. Given a floorplan to be observed, the
problem is then to efficiently compute a camera layout such that certain
task-specific constraints are met. A solution to this problem is obtained
via binary optimization over a discrete problem space. In preliminary
experiments the performance of the resulting system is demonstrated with
different real floorplans.
%R 2003-032
%T itmBench: Generalized API for Internet Traffic Managers
%A Diamant, Gali
%A Veytser, Leonid
%A Matta, Ibrahim
%A Bestavros, Azer
%A Guirguis, Mina
%A Guo, Liang
%A Zhang, Yuting
%A Chen, Sean
%D December 16, 2003
%U http://www.cs.bu.edu/techreports/2003-032-itmbench.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Internet Traffic Managers (ITMs) are special machines placed at
strategic places in the Internet. itmBench is an interface that allows
users (e.g. network managers, service providers, or experimental
researchers) to register different traffic control functionalities to
run on one ITM or an overlay of ITMs. Thus itmBench offers a
tool that is extensible and powerful yet easy to maintain. ITM
traffic control applications could be developed either using a kernel
API so they run in kernel space, or using a user-space API so they run
in user space. We demonstrate the flexibility of itmBench by
showing the implementation of both a kernel module that provides a
differentiated network service, and a user-space module that provides
an overlay routing service. Our itmBench Linux-based prototype is free
software and can be obtained from http://www.cs.bu.edu/groups/itm/.
%R 2004-001
%T Integrated Chest Image Analysis System ``BU-MIA''
%A Betke, Margrit
%A Wang, Jingbin
%A Ko, Jane
%D January 7, 2004
%U http://www.cs.bu.edu/techreports/2004-001-bu-mia.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We introduce ``BU-MIA,'' a Medical Image Analysis system that
integrates various advanced chest image analysis methods for
detection, estimation, segmentation, and registration. BU-MIA
evaluates repeated computed tomography (CT) scans of the same patient
to facilitate identification and evaluation of pulmonary nodules for
interval growth. It provides a user-friendly graphical user interface
with a number of interaction tools for development, evaluation, and
validation of chest image analysis methods. The structures that BU-MIA
processes include the thorax, lungs, and trachea; pulmonary
structures, such as lobes, fissures, nodules, and vessels; and bones,
such as the sternum, vertebrae, and ribs.
%R 2004-002
%T Quantum Lower Bounds for Fanout
%A Fang, M.
%A Fenner, S.
%A Green, F.
%A Homer, S.
%A Zhang, Y.
%D January 12, 2004
%U http://www.cs.bu.edu/techreports/2004-002-quantum-fanout-lower-bounds.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We prove several new lower bounds for constant-depth quantum
circuits. The main result is that parity (and hence fanout) requires
log-depth circuits, when the circuits are composed of single-qubit and
arbitrary-size Toffoli gates, and when they use only constantly many
ancillae. Under this constraint, this bound is close to optimal. In
the case of a non-constant number of ancillae, we give a tradeoff
between the number of ancillae and the required depth.
%R 2004-003
%T Bounds on the Power of Constant-Depth Quantum Circuits
%A Fenner, S.
%A Green, F.
%A Homer, S.
%A Zhang, Y.
%D January 12, 2004
%U http://www.cs.bu.edu/techreports/2004-003-constant-depth-quantum-circuit-power.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We show that if a language is recognized within certain error bounds
by constant-depth quantum circuits over a finite family of gates, then
it is computable in (classical) polynomial time. In particular, our
results imply EQNC^0 is contained in P, where EQNC^0 is the
constant-depth analog of the class EQP. On the other hand, we adapt
and extend ideas of Terhal and DiVincenzo (quant-ph/0205133) to show
that, for any family F of quantum gates including Hadamard and CNOT
gates, computing the acceptance probabilities of depth-five circuits
over F is just as hard as computing these probabilities for arbitrary
circuits over F. In particular, this implies that NQNC^0 is hard for the
polynomial time hierarchy, where NQNC^0 is the constant-depth analog
of the class NQP. This essentially refutes a conjecture of Green et
al. that NQACC is contained in TC^0 (quant-ph/0106017).
%R 2004-004
%T Programming Examples Needing Polymorphic Recursion
%A Hallett, Joseph
%A Kfoury, Assaf
%D January 22, 2004
%U http://www.cs.bu.edu/techreports/2004-004-polymorphic-recursion-progams.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Inferring types for polymorphic recursive function definitions
(abbreviated to polymorphic recursion) is a recurring topic on the
mailing lists of popular typed programming languages. This is despite the
fact that type inference for polymorphic recursion using forall-types
has been proved undecidable. This report presents several programming
examples involving polymorphic recursion and determines their typability
under various type systems, including the Hindley-Milner system, an
intersection-type system, and extensions of these two. The goal of this
report is to show that many of these examples are typable using a system
of intersection types as an alternative form of polymorphism. By
accomplishing this, we hope to lay the foundation for future research into
a decidable intersection-type inference algorithm.
We do not provide a comprehensive survey of type systems appropriate
for polymorphic recursion, with or without type annotations inserted in
the source language. Rather, we focus on examples for which types may be
inferred without type annotations.
%R 2004-005
%T Exploiting the Transients of Adaptation for RoQ Attacks on Internet Resources
%A Guirguis, Mina
%A Bestavros, Azer
%A Matta, Ibrahim
%D January 30, 2004
%U http://www.cs.bu.edu/techreports/2004-005-roq.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper, we expose an unorthodox adversarial attack that
exploits the transients of a system's adaptive behavior, as opposed to
its limited steady-state capacity. We show that a well-orchestrated
attack could introduce significant inefficiencies that could
potentially deprive a network element of much of its capacity, or
significantly reduce its service quality, while evading detection by
consuming an unsuspicious, small fraction of that element's hijacked
capacity. This type of attack stands in sharp contrast to traditional
brute-force, sustained high-rate DoS attacks, as well as recently
proposed attacks that exploit specific protocol settings such as TCP
timeouts. We exemplify what we term Reduction of Quality (RoQ)
attacks by exposing the vulnerabilities of common adaptation
mechanisms. We develop control-theoretic models and associated metrics
to quantify these vulnerabilities. We present numerical and simulation
results, which we validate with observations from real Internet
experiments. Our findings motivate the need for the development of
adaptation mechanisms that are resilient to these new forms of
attacks.
%R 2004-006
%T Boosting Nearest Neighbor Classifiers for Multiclass Recognition
%A Athitsos, Vassilis
%A Sclaroff, Stan
%D February 13, 2005
%U http://www.cs.bu.edu/techreports/2004-006-nearest-neighbor-classifiers.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper introduces an algorithm that uses boosting to learn a
distance measure for multiclass k-nearest neighbor
classification. Given a family of distance measures as input, AdaBoost
is used to learn a weighted distance measure that is a linear
combination of the input measures. The proposed method can be seen
both as a novel way to learn a distance measure from data, and as a
novel way to apply boosting to multiclass recognition problems that
does not require output codes. In our approach, multiclass recognition
of objects is reduced to a single binary recognition task, defined
on triples of objects. Preliminary experiments with eight UCI datasets
yield no clear winner among our method, boosting using output codes,
and k-nn classification using an unoptimized distance measure. Our
algorithm did achieve lower error rates in some of the datasets, which
indicates that, in some domains, it may lead to better results than
existing methods.
%R 2004-007
%T StaXML: Static Typing of XML Document Fragments for Imperative Web Scripting Languages
%A Bradley, Adam
%A Kfoury, Assaf
%A Bestavros, Azer
%D February 13, 2005
%U http://www.cs.bu.edu/techreports/2004-007-staxml.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a type system, StaXML, which employs the stacked type syntax
to represent essential aspects of the potential roles of XML fragments
in the structure of complete XML documents. The simplest application of
this system is to enforce well-formedness upon the construction of XML
documents without requiring the use of templates or balanced "gap
plugging" operators; this allows it to be applied to programs written
according to common imperative web scripting idioms, particularly the
echoing of unbalanced XML fragments to an output buffer. The system can
be extended to verify particular XML applications such as XHTML and
to identify individual XML tags constructed from their lexical
components. We also present StaXML for PHP, a prototype precompiler for
the PHP4 scripting language which infers StaXML types for expressions
without assistance from the programmer.
%R 2004-008
%T Diagnosing Network-Wide Traffic Anomalies
%A Lakhina, Anukool
%A Crovella, Mark
%A Diot, Christophe
%D February 24, 2004
%U http://www.cs.bu.edu/techreports/2004-008-whole-network-anomalies.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Anomalies are unusual and significant changes in a network's traffic
levels, which can often involve multiple links. Diagnosing anomalies
is critical for both network operators and end users. It is a
difficult problem because one must extract and interpret anomalous
patterns from large amounts of high-dimensional, noisy data. In this
paper we propose a general method to diagnose anomalies. This method
is based on a separation of the high-dimensional space occupied by a
set of network traffic measurements into disjoint subspaces
corresponding to normal and anomalous network conditions. We show
that this separation can be performed effectively using Principal
Component Analysis. Using only simple traffic measurements from
links, we study volume anomalies and show that the method can: (1)
accurately detect when a volume anomaly is occurring; (2) correctly
identify the underlying origin-destination (OD) flow which is the
source of the anomaly; and (3) accurately estimate the amount of
traffic involved in the anomalous OD flow. We evaluate the method's
ability to diagnose (i.e., detect, identify, and quantify) both
existing and synthetically injected volume anomalies in real traffic
from two backbone networks. Our method consistently diagnoses the
largest volume anomalies, and does so with a very low false alarm
rate.
%R 2004-009
%T Efficient End-Host Architecture for High Performance Communication Using User-level Sandboxing
%A Qi, Xin
%A Parmer, Gabriel
%A West, Richard
%A Gloudon, Jason
%A Hernandez, Luis
%D March 1, 2004
%U http://www.cs.bu.edu/techreports/2004-009-endhost-architecture.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Current low-level networking abstractions on modern operating systems are
commonly implemented in the kernel to provide sufficient performance for
general purpose applications. However, it is desirable for high performance
applications to have more control over the networking subsystem to support
optimizations for their specific needs. One approach is to allow networking
services to be implemented at user-level. Unfortunately, this typically
incurs costs due to scheduling overheads and unnecessary data copying via
the kernel. In this paper, we describe a method to implement efficient
application-specific network service extensions at user-level that removes
the cost of scheduling and provides protected access to lower-level system
abstractions. We present a networking implementation that, with minor
modifications to the Linux kernel, passes data between ``sandboxed''
extensions and the Ethernet device without copying or processing in the
kernel. Using this mechanism, we put a customizable networking stack into a
user-level sandbox and show how it can be used to efficiently process and
forward data via proxies, or intermediate hosts, in the communication path
of high performance data streams. Unlike other user-level networking
implementations, our method makes no special hardware requirements to avoid
unnecessary data copies. Results show that we achieve a substantial
increase in throughput over comparable user-space methods using our
networking stack implementation.
%R 2004-010
%T A Randomized Solution to BGP Divergence
%A Yilmaz, Selma
%A Matta, Ibrahim
%D March 1, 2004
%U http://www.cs.bu.edu/techreports/2004-010-randomized-bgp.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Border Gateway Protocol (BGP) is an interdomain routing protocol
that allows each Autonomous System (AS) to define its own routing
policies independently and use them to select the best routes. By
means of policies, ASes are able to prevent some traffic from
accessing their resources, or direct their traffic to a preferred
route. However, this flexibility comes at the expense of a possibility
of divergent behavior because of mutually conflicting policies.
Since BGP is not guaranteed to converge even in the absence of network
topology changes, it is not safe. In this paper, we propose a
randomized approach to providing safety in BGP. The proposed
algorithm dynamically detects policy conflicts, and tries to eliminate
the conflict by changing the local preference of the paths involved.
Both the detection and elimination of policy conflicts are performed
locally, i.e., by using only local information. Randomization
is introduced to prevent synchronous updates of the local preferences
of the paths involved in the same conflict.
%R 2004-011
%T A Two-step Statistical Approach for Inferring Network Traffic Demands (Revises Technical Report BUCS-TR-2003-003)
%A Medina, Alberto
%A Salamatian, Kave
%A Taft, Nina
%A Matta, Ibrahim
%A Diot, Christophe
%D March 1, 2004
%U http://www.cs.bu.edu/techreports/2004-011-two-step-tm-inference.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Accurate knowledge of traffic demands in a communication network
enables or enhances a variety of traffic engineering and network
management tasks of paramount importance for operational
networks. Directly measuring a complete set of these demands is
prohibitively expensive because of the huge amounts of data that must be
collected and the performance impact that such measurements would
impose on the regular behavior of the network. As a consequence, we
must rely on statistical techniques to produce estimates of actual
traffic demands from partial information. The performance of such
techniques is, however, limited due to their reliance on limited
information and the high amount of computation they incur, which
limits their convergence behavior. In this paper, we study a two-step
approach for inferring network traffic demands. First, we elaborate
and evaluate a modeling approach for generating good starting points
to be fed to iterative statistical inference techniques. We call these
starting points {\it informed priors} since they are obtained using
actual network information such as packet traces and SNMP link
counts. Second, we provide a very fast variant of the EM algorithm
which extends its computation range, increasing its accuracy and
decreasing its dependence on the quality of the starting point.
Finally, we evaluate and compare alternative mechanisms for generating
starting points and the convergence characteristics of our EM
algorithm against a recently proposed Weighted Least Squares approach.
%R 2004-012
%T Simultaneous Localization and Recognition of Dynamic Hand Gestures
%A Alon, Jonathan
%A Athitsos, Vassilis
%A Yuan, Quan
%A Sclaroff, Stan
%D March 8, 2004
%U http://www.cs.bu.edu/techreports/2004-012-dstw.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A framework for the simultaneous localization and recognition of
dynamic hand gestures is proposed. At the core of this framework is a
dynamic space-time warping (DSTW) algorithm that aligns a pair of
query and model gestures in both space and time. For every frame of
the query sequence, feature detectors generate multiple hand region
candidates. Dynamic programming is then used to compute both a global
matching cost, which is used to recognize the query gesture, and a
warping path, which aligns the query and model sequences in time, and
also finds the best hand candidate region in every query frame. The
proposed framework includes translation-invariant recognition of
gestures, a desirable property for many HCI systems. The performance
of the approach is evaluated on a dataset of hand-signed digits
gestured by people wearing short-sleeve shirts, in front of a
background containing other non-hand skin-colored objects. The
algorithm simultaneously localizes the gesturing hand and recognizes
the hand-signed digit. Although DSTW is illustrated in a gesture
recognition setting, the proposed algorithm is a general method for
matching time series that allows for multiple candidate feature
vectors to be extracted at each time step.
%R 2004-013
%T A Virtual Deadline Scheduler for Window-Constrained Service Guarantees
%A Zhang, Yuting
%A West, Richard
%A Qi, Xin
%D March 22, 2004
%U http://www.cs.bu.edu/techreports/2004-013-vds.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper presents a new approach to window-constrained scheduling,
suitable for multimedia and weakly-hard real-time systems. We
originally developed an algorithm, called Dynamic Window-Constrained
Scheduling (DWCS), that attempts to guarantee no more than x out of y
deadlines are missed for real-time jobs such as periodic CPU tasks, or
delay-constrained packet streams. While DWCS is capable of generating
a feasible window-constrained schedule that utilizes 100% of
resources, it requires all jobs to have the same request periods (or
intervals between successive service requests). We describe a new
algorithm, called Virtual Deadline Scheduling (VDS), that provides
window-constrained service guarantees to jobs with potentially
different request periods, while still maximizing resource
utilization.
VDS attempts to service m out of k job instances by their virtual
deadlines, which may be some finite time after the corresponding
real-time deadlines. Notwithstanding, VDS is capable of outperforming
DWCS and similar algorithms, when servicing jobs with potentially
different request periods. Additionally, VDS is able to limit the
extent to which a fraction of all job instances are serviced
late. Results from simulations show that VDS can provide better
window-constrained service guarantees than other related algorithms,
while still having as good or better delay bounds for all scheduled
jobs. Finally, an implementation of VDS in the Linux kernel compares
favorably against DWCS for a range of scheduling loads.
%R 2004-014
%T Learning Euclidean Embeddings for Indexing and Classification
%A Athitsos, Vassilis
%A Alon, Joni
%A Sclaroff, Stan
%A Kollios, George
%D April 7, 2004
%U http://www.cs.bu.edu/techreports/2004-014-boostmap-learning.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
BoostMap is a recently proposed method for efficient approximate
nearest neighbor retrieval in arbitrary non-Euclidean
spaces with computationally expensive and possibly
non-metric distance measures. Database and query objects
are embedded into a Euclidean space, in which similarities
can be rapidly measured using a weighted Manhattan distance.
The key idea is formulating embedding construction
as a machine learning task, where AdaBoost is used
to combine simple, 1D embeddings into a multidimensional
embedding that preserves a large amount of the proximity
structure of the original space. This paper demonstrates
that, using the machine learning formulation of BoostMap,
we can optimize embeddings for indexing and classification,
in ways that are not possible with existing alternatives for
constructing embeddings, and without additional costs in retrieval
time. First, we show how to construct embeddings
that are query-sensitive, in the sense that they yield a different
distance measure for different queries, so as to improve nearest
neighbor retrieval accuracy for each query. Second, we
show how to optimize embeddings for nearest neighbor classification
tasks, by tuning them to approximate a parameter
space distance measure, instead of the original feature-based distance
measure.
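As an illustration of the retrieval step described above, the following
minimal Python sketch computes a weighted Manhattan (L1) distance between
already-embedded vectors and uses it for a brute-force nearest neighbor
search. The function names and the idea of passing the weights in as a
plain list are assumptions made for this example; how the weights and the
embedding itself are learned is not shown.

```python
from typing import Sequence


def weighted_manhattan(x: Sequence[float], y: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted L1 distance between two embedded vectors of equal length."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))


def nearest_neighbor(query: Sequence[float],
                     database: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> int:
    """Index of the database vector closest to the query (brute force)."""
    if not database:
        raise ValueError("database must not be empty")
    return min(range(len(database)),
               key=lambda i: weighted_manhattan(query, database[i], weights))
```

A query-sensitive variant would, under the same assumptions, simply pass
a different weights list for each query.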
%R 2004-015
%T Automated Camera Layout to Satisfy Task-Specific and Floorplan-Specific Coverage Requirements
%A Erdem, Murat
%A Sclaroff, Stan
%D April 15, 2004
%U http://www.cs.bu.edu/techreports/2004-015-camera-layout.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In many multi-camera vision systems the effect of camera locations on
the task-specific quality of service is ignored. Researchers in
Computational Geometry have proposed elegant solutions for some sensor
location problem classes. Unfortunately, these solutions utilize
unrealistic assumptions about the cameras' capabilities that make
these algorithms unsuitable for many real-world computer vision
applications: unlimited field of view, infinite depth of field, and/or
infinite servo precision and speed. In this paper, the general camera
placement problem is first defined with assumptions that are more
consistent with the capabilities of real-world cameras. The region to
be observed by cameras may be volumetric, static or dynamic, and may
include holes that are caused, for instance, by columns or furniture
in a room that can occlude potential camera views. A subclass of this
general problem can be formulated in terms of planar regions that are
typical of building floorplans. Given a floorplan to be observed, the
problem is then to efficiently compute a camera layout such that
certain task-specific constraints are met. A solution to this problem
is obtained via binary optimization over a discrete problem space. In
experiments the performance of the resulting system is demonstrated
with different real floorplans.
%R 2004-016
%T Robust Tracking of Human Motion
%A Buzan, Dan
%D April 23, 2004
%U http://www.cs.bu.edu/techreports/2004-016-robust-tracking.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This technical report presents a combined solution for two problems:
first, tracking objects in 3D space and estimating their trajectories;
and second, computing the similarity between previously estimated
trajectories and clustering them using those similarities. For the
first part, trajectories are estimated using an EKF formulation that
provides the 3D trajectory up to a constant. To improve accuracy when
occlusions appear, multiple hypotheses are followed. For the second
problem, we compute the distances between trajectories using a similarity
measure based on the LCSS formulation. Similarities are computed between
projections of trajectories on coordinate axes. Finally, we group
trajectories together based on the previously computed distances, using a
clustering algorithm. To
check the validity of our approach, several experiments using real data
were performed.
%R 2004-017
%T Extraction and Clustering of Motion Trajectories in Video
%A Buzan, Dan
%A Sclaroff, Stan
%A Kollios, George
%D April 23, 2004
%U http://www.cs.bu.edu/techreports/2004-017-trajectory-clustering.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A system is described that tracks moving objects in a video dataset so as
to extract a representation of the objects' 3D trajectories. The system
then finds hierarchical clusters of similar trajectories in the video
dataset. Objects' motion trajectories are extracted via an EKF formulation
that provides each object's 3D trajectory up to a constant factor. To
increase accuracy when occlusions occur, multiple tracking hypotheses are
followed. For trajectory-based clustering and retrieval, a modified
version of edit distance, called longest common subsequence (LCSS), is
employed. Similarities are computed between projections of trajectories on
coordinate axes. Trajectories are grouped using an agglomerative
clustering algorithm. To check the validity of the approach, experiments
using real data were performed.
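To make the LCSS-based similarity concrete, here is a minimal Python
sketch for one coordinate-axis projection of two trajectories. It uses
the common formulation in which two samples match when they differ by at
most a threshold eps; that threshold, the normalization by the shorter
length, and the function names are assumptions of this example and are
not taken from the report.

```python
from typing import Sequence


def lcss_length(a: Sequence[float], b: Sequence[float], eps: float) -> int:
    """Length of the longest common subsequence of a and b, where two
    samples match if their absolute difference is at most eps."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                cur[j] = prev[j - 1] + 1          # extend a common subsequence
            else:
                cur[j] = max(prev[j], cur[j - 1])  # skip a sample in a or b
        prev = cur
    return prev[len(b)]


def lcss_similarity(a: Sequence[float], b: Sequence[float],
                    eps: float) -> float:
    """LCSS length normalized to [0, 1] by the shorter sequence length."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps) / min(len(a), len(b))
```

Applying lcss_similarity separately to the x, y, and z projections gives
one score per axis; how those scores are combined before clustering is
left open here.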
%R 2004-018
%T Group Key Manager on a Smart Card
%A Hamandi, Hani
%A Itkis, Gene
%D April 27, 2004
%U http://www.cs.bu.edu/techreports/2004-018-group-key-manager.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Group communication is an important functionality, which needs to
be supported by various communication technologies. Applications of
group communication include IP (or application-level) multicast,
wireless and/or ad-hoc networks, broadcast, conference calling,
pay-per-view, and even areas seemingly unrelated to networks, such as
copy protection. For many, if not all, of these applications, security
and trust play an important role. Securing group communication
typically requires confidentiality and authentication, which typically
rely on secret keys. Thus key management issues must be addressed.
This paper describes an implementation of one approach to dynamic
group key management, which is based on the Logical Key Hierarchy or
Subset-Cover approach [1,2]. Our approach achieves a dramatic
reduction of the storage requirements for the Group Key Manager, and
in particular allows all the secret key data to be stored on a
smart-card. It also allows a number of subsequent improvements.
%R 2004-019
%T Interactive Password Schemes
%A Itkis, Gene
%A Maiss, Arwa
%D April 27, 2004
%U http://www.cs.bu.edu/techreports/2004-019-interactive-password.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Usual password schemes suffer from the flaw that they are easy to
steal. An attacker who has correctly observed a login session (by
peeping, wiretapping and/or by launching a "man-in-the-middle" attack,
etc.) can easily impersonate the corresponding user. Available
protection techniques require computations on integers hundreds of
digits long, computations so complex that they require special software
and/or hardware. This project tries to combine the simplicity of the
conventional password schemes with a protection technique that results
in a different password being typed each session, but only requires
simple computation performed in the user's head.
%R 2004-020
%T Characterization of Network-Wide Anomalies in Traffic Flows
%A Lakhina, Anukool
%A Crovella, Mark
%A Diot, Christophe
%D May 14, 2004
%U http://www.cs.bu.edu/techreports/2004-020-traffic-flow-anomalies.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Detecting and understanding anomalies in IP networks is an open and
ill-defined problem. Toward this end, we have recently proposed the
subspace method for anomaly diagnosis. In this paper we present the
first large-scale exploration of the power of the subspace method when
applied to flow traffic. An important aspect of this approach
is that it fuses information from flow measurements taken
throughout a network. We apply the subspace method to three different
types of sampled flow traffic in a large academic network: multivariate
timeseries of byte counts, packet counts, and IP-flow counts. We show
that each traffic type brings into focus a different set of
anomalies via the subspace method. We illustrate and classify the set
of anomalies detected. We find that almost all of the anomalies
detected represent events of interest to network operators.
Furthermore, the anomalies span a remarkably wide spectrum of event
types, including denial of service attacks (single-source and
distributed), flash crowds, port scanning, downstream traffic
engineering, high-rate flows, worm propagation, and network outage.
%R 2004-021
%T Safe Compositional Specification of Networking Systems
%A Bestavros, Azer
%A Bradley, Adam
%A Kfoury, Assaf
%A Matta, Ibrahim
%D May 14, 2004
%U http://www.cs.bu.edu/techreports/2004-021-compositional-net-specs.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Science of Network Service Composition has clearly emerged as one
of the grand themes driving many of our research questions in the
networking field today [NeXtworking 2003]. This driving force stems
from the rise of sophisticated applications and new networking
paradigms. By ``service composition'' we mean that the performance
and correctness properties local to the various constituent components
of a service can be readily composed into global (end-to-end)
properties without re-analyzing any of the constituent components in
isolation, or as part of the whole composite service. The set of laws
that would govern such composition is what will constitute that new
science of composition.
The combined heterogeneity and dynamic open nature of network
systems makes composition quite challenging, and thus programming
network services has been largely inaccessible to the average user.
We identify (and outline) a research agenda in which we aim to develop
a specification language that is expressive enough to describe
different components of a network service, and that will include type
hierarchies inspired by type systems in general programming languages
that enable the safe composition of software components. We envision
this new science of composition to be built upon several theories
(e.g., control theory, game theory, network calculus, percolation
theory, economics, queuing theory). In essence, different theories may
provide different languages by which certain properties of system
components can be expressed and composed into larger systems. We then
seek to lift these lower-level specifications to a higher level by
abstracting away details that are irrelevant for safe composition at
the higher level, thus making theories scalable and useful to the
average user.
In this paper we focus on services built upon an overlay management
architecture, and we use control theory and QoS theory as example
theories from which we lift up compositional specifications.
%R 2004-022
%T SEP: A Stable Election Protocol for clustered heterogeneous wireless sensor networks
%A Smaragdakis, Georgios
%A Matta, Ibrahim
%A Bestavros, Azer
%D May 31, 2004
%U http://www.cs.bu.edu/techreports/2004-022-sep.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We study the impact of heterogeneity of nodes, in terms of their
energy, in wireless sensor networks that are hierarchically
clustered. In these networks some of the nodes become cluster heads,
aggregate the data of their cluster members and transmit it to the
sink. We assume that a percentage of the population of sensor nodes is
equipped with additional energy resources---this is a source of
heterogeneity which may result from the initial setting or as the
operation of the network evolves. We also assume that the sensors are
randomly (uniformly) distributed and are not mobile, and that the
coordinates of the sink and the dimensions of the sensor field are known. We show
that the behavior of such sensor networks becomes very unstable once
the first node dies, especially in the presence of node heterogeneity.
Classical clustering protocols assume that all the nodes are equipped
with the same amount of energy and, as a result, they cannot take full
advantage of the presence of node heterogeneity. We propose SEP, a
heterogeneous-aware protocol to prolong the time interval before the
death of the first node (which we refer to as the stability period), which
is crucial for many applications where the feedback from the sensor
network must be reliable. SEP is based on weighted election
probabilities of each node to become cluster head according to the
remaining energy in each node. We show by simulation that SEP always
prolongs the stability period compared to (and that the average
throughput is greater than) the one obtained using current clustering
protocols. We conclude by studying the sensitivity of our SEP
protocol to heterogeneity parameters capturing energy imbalance in the
network. We found that SEP yields a longer stability region for higher
values of extra energy brought by more powerful nodes.
%R 2004-023
%T DIP: Density Inference Protocol for wireless sensor networks and its application to density-unbiased statistics
%A Riga, Niky
%A Matta, Ibrahim
%A Bestavros, Azer
%D May 31, 2004
%U http://www.cs.bu.edu/techreports/2004-023-dip.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Wireless sensor networks have recently emerged as enablers of
important applications such as environmental, chemical and nuclear
sensing systems. Such applications have sophisticated
spatial-temporal semantics that set them aside from traditional
wireless networks. For example, the computation of temperature
averaged over the sensor field must take into account local densities.
This is crucial since otherwise the estimated average temperature can
be biased by over-sampling areas where a lot more sensors exist.
Thus, we envision that a fundamental service that a wireless sensor
network should provide is that of estimating local densities.
In this paper, we propose a lightweight probabilistic density
inference protocol, which we call DIP, that allows each sensor node to
implicitly estimate its neighborhood size without the explicit
exchange of node identifiers as in existing density discovery schemes.
The theoretical basis of DIP is a probabilistic analysis which gives
the relationship between the number of sensor nodes contending in the
neighborhood of a node and the level of contention measured by that
node. Extensive simulations confirm the premise of DIP: it can
provide statistically reliable and accurate estimates of local density
at a very low energy cost and constant running time. We demonstrate
how applications could be built on top of our DIP-based service by
computing density-unbiased statistics from estimated local densities.
%R 2004-024
%T On the Interaction between Data Aggregation and Topology Control in Wireless Sensor Networks
%A Erramilli, Vijay
%A Matta, Ibrahim
%A Bestavros, Azer
%D June 18, 2004
%U http://www.cs.bu.edu/techreports/2004-024-aggregation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Wireless sensor networks are characterized by limited energy
resources. To conserve energy, application-specific aggregation
(fusion) of data reports from multiple sensors can be beneficial in
reducing the amount of data flowing over the network. Furthermore,
controlling the topology by scheduling the activity of nodes between
active and sleep modes has often been used to uniformly distribute the
energy consumption among all nodes by de-synchronizing their
activities. We present an integrated analytical model to study the
joint performance of in-network aggregation and topology control. We
define performance metrics that capture the tradeoffs among delay,
energy, and fidelity of the aggregation. Our results indicate that to
achieve high fidelity levels under medium to high event reporting
load, shorter and fatter aggregation/routing trees (toward the sink)
offer the best delay-energy tradeoff as long as topology control is
well coordinated with routing.
%R 2004-025
%T Bayesian Packet Loss Detection for TCP
%A Fonseca, Nahur
%A Crovella, Mark
%D July 1, 2004
%U http://www.cs.bu.edu/techreports/2004-025-tcpbayes.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
One of TCP's critical tasks is to determine which packets are lost in
the network, as a basis for control actions (flow control and packet
retransmission). Modern TCP implementations use two mechanisms:
timeout, and fast retransmit. Detection via timeout is necessarily a
time-consuming operation; fast retransmit, while much quicker, is only
effective for a small fraction of packet losses. In this paper we
consider the problem of packet loss detection in TCP more generally. We
concentrate on the fact that TCP's control actions are necessarily
triggered by *inference* of packet loss, rather than conclusive
knowledge. This suggests that one might analyze TCP's packet loss
detection in a standard inferencing framework based on probability of
detection and probability of false alarm. This paper makes two
contributions to that end: First, we study an example of more general
packet loss inference, namely optimal Bayesian packet loss detection
based on round trip time. We show that for long-lived flows, it is
frequently possible to achieve high detection probability and low false
alarm probability based on measured round trip time. Second, we
construct an analytic performance model that incorporates general packet
loss inference into TCP. We show that for realistic detection and
false alarm probabilities (as are achievable via our Bayesian detector)
and for moderate packet loss rates, the use of more general packet loss
inference in TCP can improve throughput by as much as 25%.
%R 2004-026
%T dPAM: A Distributed Prefetching Protocol for Scalable Asynchronous Multicast in P2P Systems
%A Sharma, Abhishek
%A Bestavros, Azer
%A Matta, Ibrahim
%D July 1, 2004
%U http://www.cs.bu.edu/techreports/2004-026-dPAM.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We leverage the buffering capabilities of end-systems to achieve
scalable, asynchronous delivery of streams in a peer-to-peer
environment. Unlike existing cache-and-relay schemes, we propose a
distributed prefetching protocol where peers prefetch and store
portions of the streaming media ahead of their playout time, thus not
only turning themselves into possible sources for other peers but also
using their prefetched data to overcome the departure of their
source-peer. This stands in sharp contrast to existing
cache-and-relay schemes where the departure of the source-peer forces
its peer children to go to the original server, thus disrupting their
service and increasing server and network load. Through mathematical
analysis and simulations, we show the effectiveness of maintaining
such asynchronous multicasts from several source-peers to other
children peers, and the efficacy of prefetching in the face of peer
departures. We confirm the scalability of our dPAM protocol as it is
shown to significantly reduce server load.
%R 2004-027
%T On Trip Planning Queries in Spatial Databases
%A Li, Feifei
%A Cheng, Dihan
%D July 1, 2004
%U http://www.cs.bu.edu/techreports/2004-027-spatial-db-trip-planning-queries.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper we discuss a new type of query in Spatial
Databases, called Trip Planning Query (TPQ). Given a set
of points P in space, where each point belongs to a category,
and given two points s and e, TPQ asks for the best trip that
starts at s, passes through exactly one point from each category,
and ends at e. An example of a TPQ is when a user
wants to visit a set of different places and at the same time
minimize the total travelling cost, e.g. what is the shortest
travelling plan for me to visit an automobile shop, a CVS
pharmacy outlet, and a Best Buy shop along my trip from A to
B? The trip planning query is an extension of the well-known
TSP problem and therefore is NP-hard. The difficulty of this
query lies in the existence of multiple choices for each category.
In this paper, we first study fast approximation algorithms
for the trip planning query in a metric space, assuming
that the data set fits in main memory, and give a theoretical
analysis of their approximation bounds. Then, the trip planning
query is examined for data sets that do not fit in main
memory and must be stored on disk. For the disk-resident
data, we consider two cases. In one case, we assume that the
points are located in Euclidean space and indexed with an R-tree.
In the other case, we consider the problem of points that
lie on the edges of a spatial network (e.g. road network) and
the distance between two points is defined using the shortest
distance over the network. Finally, we give an experimental
evaluation of the proposed algorithms using synthetic data
sets generated on real road networks.
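The query definition above can be illustrated with a simple greedy
heuristic for the in-memory Euclidean case: starting at s, repeatedly
move to the nearest point that belongs to a category not yet visited,
then finish at e. This Python sketch is only an illustration of the
problem statement; it is not claimed to be one of the algorithms
analyzed in the report, and the function name and the dict-of-lists
input layout are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def greedy_trip(start: Point, end: Point,
                points_by_category: Dict[str, Sequence[Point]]
                ) -> Tuple[List[Point], float]:
    """Visit exactly one point per category, always choosing the nearest
    point of any unvisited category, then go to the end point.
    Returns the route and its total Euclidean length."""
    if any(len(pts) == 0 for pts in points_by_category.values()):
        raise ValueError("every category needs at least one point")
    remaining = dict(points_by_category)
    route, total, current = [start], 0.0, start
    while remaining:
        # Nearest candidate over all categories that are still unvisited.
        dist, category, point = min(
            ((math.dist(current, p), cat, p)
             for cat, pts in remaining.items() for p in pts),
            key=lambda item: item[0])
        route.append(point)
        total += dist
        current = point
        del remaining[category]
    total += math.dist(current, end)
    route.append(end)
    return route, total
```

A greedy choice like this can be far from optimal when the nearest point
of a category leads away from e, which is why approximation bounds
matter for this query.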
%R 2004-028
%T GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams
%A Chang, Ching
%A Li, Feifei
%A Bestavros, Azer
%A Kollios, George
%D July 1, 2004
%U http://www.cs.bu.edu/techreports/2004-028-gdj.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We investigate adaptive buffer management techniques for approximate
evaluation of sliding window joins over multiple data streams. In many
applications, data stream processing systems have limited memory or
have to deal with very high speed data streams. In both cases,
computing the exact results of joins between these streams may not be
feasible, mainly because the buffers used to compute the joins contain
a much smaller number of tuples than are contained in the sliding
windows. Therefore, a stream buffer management policy is needed in that
case. We show that the buffer replacement policy is an important
determinant of the quality of the produced results. To that end, we
propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering
technique for managing these buffers. GDJ exploits the temporal
correlations (at both long and short time scales), which we found to
be prevalent in many real data streams. We note that our algorithm is
readily applicable to multiple data streams and multiple joins and
requires almost no additional system resources. We report results of
an experimental study using both synthetic and real-world data
sets. Our results demonstrate the superiority and flexibility of our
approach when contrasted to other recently proposed techniques.
%R 2004-029
%T M2RC: Multiplicative-increase/additive-decrease Multipath Routing Control for Wireless Sensor Networks
%A Morcos, Hany
%A Matta, Ibrahim
%A Bestavros, Azer
%D July 14, 2004
%U http://www.cs.bu.edu/techreports/2004-029-m2rc.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Routing protocols in wireless sensor networks (WSN) face
two main challenges: first, the challenging environments in
which WSNs are deployed negatively affect the quality of
the routing process. Therefore, routing protocols for WSNs
should recognize and react to node failures and packet losses.
Second, sensor nodes are battery-powered, which makes
power a scarce resource. Routing protocols should optimize
power consumption to prolong the lifetime of the WSN. In
this paper, we present a new adaptive routing protocol for
WSNs, which we call M2RC. M2RC has two phases: a mesh establishment
phase and a data forwarding phase. In the first
phase, M2RC establishes the routing state to enable multipath
data forwarding. In the second phase, M2RC forwards data
packets from the source to the sink. Targeting hop-by-hop
reliability, an M2RC forwarding node waits for an acknowledgement
(ACK) that its packets were correctly received at
the next neighbor. Based on this feedback, an M2RC node
applies multiplicative-increase/additive-decrease (MIAD) to
control the number of neighbors targeted by its packet broadcast.
We simulated M2RC in the ns-2 simulator and
compared it to GRAB, Max-power, and Min-power routing
schemes. Our simulations show that M2RC achieves
the highest throughput with at least 10-30 percent less consumed
power per delivered report in scenarios where a certain number
of nodes unexpectedly fail.
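The MIAD control rule mentioned above can be sketched in a few lines of
Python. The abstract does not say which event triggers which adjustment,
so this example assumes that a missing ACK multiplies the number of
targeted neighbors and a received ACK decreases it by a fixed step. The
function name, the factor, the step, and the bounds are all hypothetical
values chosen for illustration.

```python
import math


def miad_update(targets: int, ack_received: bool, *,
                factor: float = 2.0, step: int = 1,
                min_targets: int = 1, max_targets: int = 8) -> int:
    """Return the next number of neighbors to target.

    Assumed policy: multiplicative increase when no ACK arrived (try more
    neighbors to regain reliability), additive decrease after an ACK
    (save power while delivery succeeds)."""
    if ack_received:
        proposed = targets - step
    else:
        proposed = math.ceil(targets * factor)
    # Keep the result within the allowed range of neighbors.
    return max(min_targets, min(max_targets, proposed))
```

Under these assumptions, a node that loses a packet reacts quickly by
widening its broadcast, and then narrows it gradually while ACKs keep
arriving.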
%R 2004-030
%T Friendly Virtual Machine: Leveraging a Feedback-Control Model for Application Adaptation
%A Zhang, Yuting
%A Bestavros, Azer
%A Guirguis, Mina
%A Matta, Ibrahim
%A West, Richard
%D July 19, 2004
%U http://www.cs.bu.edu/techreports/2004-030-FVM.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
With the increased use of ``Virtual Machines'' (VMs) as vehicles that
isolate applications running on the same host, it is necessary to
devise techniques that enable multiple VMs to share underlying
resources both fairly and efficiently. To that end, one common
approach is to deploy complex resource management techniques in the
hosting infrastructure. Alternatively, in this paper, we
advocate the use of self-adaptation in the VMs themselves based on
feedback about resource usage and availability. Consequently, we
define a ``Friendly'' VM (FVM) to be a virtual machine that adjusts
its demand for system resources, so that they are both efficiently and
fairly allocated to competing FVMs. Such properties are ensured using
one of many provably convergent control rules, such as AIMD. By
adopting this distributed application-based approach to resource
management, it is not necessary to make assumptions about the
underlying resources nor about the requirements of FVMs competing for
these resources. To demonstrate the elegance and simplicity of our
approach, we present a prototype implementation of our FVM framework
in User-Mode Linux (UML)---an implementation that consists of fewer
than 500 lines of code changes to UML. We present an analytic,
control-theoretic model of FVM adaptation, which establishes
convergence and fairness properties. These properties are also backed
up with experimental results using our prototype FVM implementation.
%R 2004-031
%T Approximately Uniform Random Sampling in Sensor Networks
%A Bash, Boulat
%A Byers, John
%A Considine, Jeffrey
%D July 19, 2004
%U http://www.cs.bu.edu/techreports/2004-031-random-sampling.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recent work in sensor databases has focused extensively on distributed
query problems, notably distributed computation of aggregates. Existing
methods for computing aggregates broadcast queries to all sensors and use
in-network aggregation of responses to minimize messaging costs. In this
work, we focus on uniform random sampling across nodes, which can serve
both as an alternative building block for aggregation and as an integral
component of many other useful randomized algorithms. Prior to our work,
the best existing proposals for uniform random sampling of sensors involve
contacting all nodes in the network. We propose a practical method which
is only approximately uniform, but contacts a number of sensors
proportional to the diameter of the network instead of its size. The
approximation achieved is tunably close to exact uniform sampling, and
only relies on well-known existing primitives, namely geographic routing,
distributed computation of Voronoi regions and von Neumann's rejection
method. Ultimately, our sampling algorithm has the same worst-case
asymptotic cost as routing a point-to-point message, and thus it is
asymptotically optimal among request/reply-based sampling methods. We
provide experimental results demonstrating the effectiveness of our
algorithm on both synthetic and real sensor topologies.
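The role of the rejection method can be sketched as follows, under
simplifying assumptions that are not taken from the paper: routing to a
uniformly random location is modeled as picking a node with probability
proportional to its Voronoi cell area, and the candidate then accepts
with probability min(1, tau / area). The names acceptance_probability,
per_trial_selection, sample_node, and the threshold tau are hypothetical.

```python
import random
from typing import List, Sequence


def acceptance_probability(cell_area: float, tau: float) -> float:
    """Chance that a candidate with the given cell area accepts."""
    return min(1.0, tau / cell_area)


def per_trial_selection(areas: Sequence[float], tau: float) -> List[float]:
    """Probability that each node is both reached and accepted in one trial."""
    total = sum(areas)
    return [(a / total) * acceptance_probability(a, tau) for a in areas]


def sample_node(areas: Sequence[float], tau: float, rng: random.Random,
                max_trials: int = 10_000) -> int:
    """Repeat trials until some node accepts; return its index."""
    indices = list(range(len(areas)))
    for _ in range(max_trials):
        # Stand-in for routing a request to a random point in the field.
        candidate = rng.choices(indices, weights=areas, k=1)[0]
        if rng.random() < acceptance_probability(areas[candidate], tau):
            return candidate
    raise RuntimeError("no node accepted within max_trials")
```

With tau equal to the smallest cell area, every node has the same
per-trial selection probability, so the sample is exactly uniform in
this model; a larger tau reduces the number of rejected trials at the
price of some bias toward nodes with small cells.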
%R 2004-032
%T A Note On the Statistical Difference of Small Direct Products
%A Reyzin, Leonid
%D September 21, 2004
%U http://www.cs.bu.edu/techreports/2004-032-statdiff.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We demonstrate that if two probability distributions D and E of
sufficiently small min-entropy have statistical difference \epsilon, then
the direct-product distributions D^l and E^l have statistical difference
at least roughly \epsilon\sqrt{l}, provided that l is sufficiently small,
smaller than roughly \epsilon^{-4/3}. Previously known bounds did not
work for few repetitions l, requiring l>\epsilon^{-2}.
%R 2004-033
%T Periodic Motion Detection and Estimation via Space-Time Sampling
%A Thangali, Ashwin
%A Sclaroff, Stan
%D November 2, 2004
%U http://www.cs.bu.edu/techreports/2004-033-periodic-motion-detection.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A novel technique to detect and localize periodic movements in video is
presented. The distinctive feature of the technique
is that it requires neither feature tracking nor object
segmentation. Intensity patterns along linear sample paths in
space-time are used in estimation of period of object motion in a given
sequence of frames. Sample paths are obtained by
connecting (in space-time) sample points from regions of high motion
magnitude in the first and last frames. Oscillations in
intensity values are induced at time instants when an object intersects
the sample path. The locations of peaks in intensity are
determined by parameters of both cyclic object motion and orientation of
the sample path with respect to object motion. The
information about peaks is used in a least squares framework to obtain an
initial estimate of these parameters. The estimate is
further refined using the full intensity profile. The best estimate for
the period of cyclic object motion is obtained by looking
for consensus among estimates from many sample paths. The proposed
technique is evaluated with synthetic videos where
ground-truth is known, and with American Sign Language videos where the
goal is to detect periodic hand motions.
%R 2004-034
%T Multi-scale 3D Scene Flow from Binocular Stereo Sequences
%A Li, Rui
%A Sclaroff, Stan
%D November 2, 2004
%U http://www.cs.bu.edu/techreports/2004-034-multiscale3d-flow.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Scene flow methods estimate the three-dimensional motion field for
points in the world, using multi-camera video data. Such methods
combine multi-view reconstruction with motion estimation approaches.
This paper describes an alternative formulation for dense scene flow
estimation that provides convincing results using only two cameras by
fusing stereo and optical flow estimation into a single coherent
framework. To handle the aperture problems inherent in the estimation
task, a multi-scale method along with a novel adaptive smoothing
technique is used to gain a regularized solution. This combined approach
both preserves discontinuities and prevents over-regularization -- two
problems commonly associated with basic multi-scale approaches.
Internally, the framework generates probability distributions for
optical flow and disparity. Taking into account the uncertainty in the
intermediate stages allows for more reliable estimation of the 3D scene
flow than standard stereo and optical flow methods allow. Experiments
with synthetic and real test data demonstrate the effectiveness of
the approach.
%R 2004-035
%T Automatic 2D Hand Tracking in Video Sequences
%A Yuan, Quan
%A Sclaroff, Stan
%A Athitsos, Vassilis
%D November 2, 2004
%U http://www.cs.bu.edu/techreports/2004-035-auto-hand-tracking.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In gesture and sign language video sequences, hand motion tends to be
rapid, and hands frequently appear in front of each other or in front
of the face. Thus, hand location is often ambiguous, and naive
color-based hand tracking is insufficient. To improve tracking
accuracy, some methods employ a prediction-update framework, but such
methods require careful initialization of model parameters, and tend
to drift and lose track in extended sequences. In this paper, a
temporal filtering framework for hand tracking is proposed that can
initialize and reset itself without human intervention. In each frame,
simple features like color and motion residue are exploited to
identify multiple candidate hand locations. The temporal filter then
uses the Viterbi algorithm to select among the candidates from frame
to frame. The resulting tracking system can automatically identify
video trajectories of unambiguous hand motion, and detect frames where
tracking becomes ambiguous because of occlusions or
overlaps. Experiments on video sequences of several hundred frames in
duration demonstrate the system's ability to track hands robustly, to
detect and handle tracking ambiguities, and to extract the
trajectories of unambiguous hand motion.
%R 2004-036
%T Handsignals Recognition From Video Using 3D Motion Capture Data
%A Tian, Tai-Peng
%A Sclaroff, Stan
%D November 4, 2004
%U http://www.cs.bu.edu/techreports/2004-036-handsignals-recognition.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Hand signals are commonly used in applications such as giving
instructions to a pilot for airplane take off or direction of a crane
operator by a foreman on the ground. A new algorithm for recognizing
hand signals from a single camera is proposed. Typically, tracked 2D
feature positions of hand signals are matched to 2D training
images. In contrast, our approach matches the 2D feature positions to
an archive of 3D motion capture sequences. The method avoids explicit
reconstruction of the 3D articulated motion from 2D image
features. Instead, the matching between the 2D and 3D sequence is done
by backprojecting the 3D motion capture data onto 2D. Experiments
demonstrate the effectiveness of the approach in an example
application: recognizing six classes of basketball referee hand signals
in video.
%R 2005-001
%T Scalable Coordination Techniques for Distributed Network Monitoring
%A Sharma, Manish
%A Byers, John
%D January 20, 2005
%U http://www.cs.bu.edu/techreports/2005-001-coordinated-monitoring.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Emerging network monitoring infrastructures capture packet-level
traces or keep per-flow statistics at a set of distributed vantage
points. Today, distributed monitors in such an infrastructure do not
coordinate monitoring effort, which both can lead to duplication of
effort and can complicate subsequent data analysis. We argue that
nodes in such a monitoring infrastructure, whether across the
wide-area Internet, or across a sensor network, should coordinate
effort to minimize resource consumption. We propose space-efficient
data structures for use in gossip-based protocols to approximately
summarize sets of monitored flows. With some fine-tuning of our
methods, we can ensure that all flows observed by at least one monitor
are monitored, and only a tiny fraction are monitored redundantly. Our
preliminary results over a realistic ISP topology demonstrate the
effectiveness of our techniques on monitoring tens of thousands of
point-of-presence (PoP) level network flows. Our methods are
competitive with optimal off-line coordination, but require
significantly less space and network overhead than naive approaches.
%R 2005-002
%T Mining Anomalies Using Traffic Distributions
%A Lakhina, Anukool
%A Crovella, Mark
%A Diot, Christophe
%D February 10, 2005
%U http://www.cs.bu.edu/techreports/2005-002-anomaly-mining.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The increasing practicality of large-scale flow capture makes it possible
to conceive of traffic analysis methods that detect and identify a large
and diverse set of anomalies. However, the challenge of effectively
analyzing this massive data source for anomaly diagnosis is as yet
unmet. We argue that the distributions of packet features (IP addresses
and ports) observed in flow traces reveal both the presence and the
structure of a wide range of anomalies. Using entropy as a summarization
tool, we show that the analysis of feature distributions leads to
significant advances on two fronts: (1) it enables highly sensitive
detection of a wide range of anomalies, augmenting detections by
volume-based methods, and (2) it enables automatic classification of
anomalies via unsupervised learning. We show that using feature
distributions, anomalies naturally fall into distinct and meaningful
clusters. These clusters can be used to automatically classify anomalies
and to uncover new anomaly types. We validate our claims on data from two
backbone networks (Abilene and Geant) and conclude that feature
distributions show promise as a key element of a fairly general network
anomaly diagnosis framework.
%R 2005-003
%T Applied Type System with Stateful Views
%A Xi, Hongwei
%A Zhu, Dengping
%A Li, Yanka
%D February 10, 2005
%U http://www.cs.bu.edu/techreports/2005-003-ATSwSV.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a type system that can effectively facilitate the use of
types in capturing invariants in stateful programs that may involve
(sophisticated) pointer manipulation. With its root in a recently
developed framework Applied Type System (ATS), the type system imposes
a level of abstraction on program states by introducing a novel
notion of recursive stateful views and then relies on a form of
linear logic to reason about such views. We consider the design and
then the formalization of the type system to constitute the primary
contribution of the paper. In addition, we mention a prototype
implementation of the type system and then give a variety of
examples that attest to the practicality of programming with
recursive stateful views.
%R 2005-004
%T Comparison of k-ary n-cube and de Bruijn Overlays in QoS-constrained Multicast Applications
%A West, Richard
%A Fry, Gerald
%A Wong, Gary
%D February 23, 2005
%U http://www.cs.bu.edu/techreports/2005-004-overlay.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Research on the construction of logical overlay networks has gained
significance in recent times. This is partly due to work on
peer-to-peer (P2P) systems for locating and retrieving distributed
data objects, and also scalable content distribution using end-system
multicast techniques. However, there are emerging applications that
require the real-time transport of data from various sources to
potentially many thousands of subscribers, each having their own
quality-of-service (QoS) constraints. This paper primarily focuses on
the properties of two popular topologies found in interconnection
networks, namely k-ary n-cubes and de Bruijn graphs. The regular
structure of these graph topologies makes them easier to analyze and
determine possible routes for real-time data than complete or
irregular graphs. We show how these overlay topologies compare in
their ability to deliver data according to the QoS constraints of many
subscribers, each receiving data from specific publishing hosts.
Comparisons are drawn on the ability of each topology to route data in
the presence of dynamic system effects, due to end-hosts joining and
departing the system. Finally, experimental results show the service
guarantees and physical link stress resulting from efficient multicast
trees constructed over both kinds of overlay networks.
%R 2005-005
%T An Efficient User-Level Shared Memory Mechanism for Application-Specific Extensions
%A West, Richard
%A Gloudon, Jason
%A Qi, Xin
%A Parmer, Gabriel
%D February 23, 2005
%U http://www.cs.bu.edu/techreports/2005-005-sandboxing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper focuses on an efficient user-level method for the
deployment of application-specific extensions, using commodity
operating systems and hardware. A sandboxing technique is described
that supports multiple extensions within a shared virtual address
space. Applications can register sandboxed code with the system, so
that it may be executed in the context of any process. Such code may
be used to implement generic routines and handlers for a class of
applications, or system service extensions that complement the
functionality of the core kernel. Using our approach,
application-specific extensions can be written like conventional
user-level code, utilizing libraries and system calls, with the
advantage that they may be executed without the traditional costs of
scheduling and context-switching between process-level protection
domains. No special hardware support such as segmentation or tagged
translation look-aside buffers (TLBs) is required. Instead, our
``user-level sandboxing'' mechanism requires only page-based virtual
memory support, given that sandboxed extensions are either written by
a trusted source or are guaranteed to be memory-safe (e.g., using
type-safe languages). Using a fast method of upcalls, we show how our
mechanism provides significant performance improvements over
traditional methods of invoking user-level services. As an application
of our approach, we have implemented a user-level network subsystem
that avoids data copying via the kernel and, in many cases, yields far
greater network throughput than kernel-level approaches. This is a
revised and extended version of BUCS-TR-2003-014.
%R 2005-006
%T Cuckoo: a Language for Implementing Memory- and Thread-safe System Services
%A West, Richard
%A Wong, Gary
%D February 23, 2005
%U http://www.cs.bu.edu/techreports/2005-006-cuckoo.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper is centered around the design of a thread- and memory-safe
language, primarily for the compilation of application-specific services
for extensible operating systems. We describe various issues that have
influenced the design of our language, called Cuckoo, that guarantees
safety of programs with potentially asynchronous flows of control.
Comparisons are drawn between Cuckoo and related software safety
techniques, including Cyclone and software-based fault isolation (SFI),
and performance results suggest our prototype compiler is capable of
generating safe code that executes with low runtime overheads, even
without potential code optimizations. Compared to Cyclone, Cuckoo is able
to safely guard accesses to memory when programs are multithreaded.
Similarly, Cuckoo is capable of enforcing memory safety in situations that
are potentially troublesome for techniques such as SFI.
%R 2005-007
%T SymbolDesign: A User-centered Method to Design Pen-based Interfaces and Extend the Functionality of Pointer Input Devices
%A Betke, Margrit
%A Gusyatin, Oleg
%A Urinson, Mikhail
%D February 27, 2005
%U http://www.cs.bu.edu/techreports/2005-007-symbol-design.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A method called ``SymbolDesign'' is proposed that can be used to
design user-centered interfaces for pen-based input devices. It can
also extend the functionality of pointer input devices such as the
traditional computer mouse or the Camera Mouse, a camera-based
computer interface. Users can create their own interfaces by choosing
single-stroke movement patterns that are convenient to draw with the
selected input device and by mapping them to a desired set of
commands. A pattern could be the trace of a moving finger detected
with the Camera Mouse or a symbol drawn with an optical pen. The core
of the SymbolDesign system is a dynamically created classifier, in the
current implementation an artificial neural network. The architecture
of the neural network automatically adjusts according to the
complexity of the classification task. In experiments, subjects used
the SymbolDesign method to design and test the interfaces they
created, for example, to browse the web. The experiments demonstrated
good recognition accuracy and responsiveness of the user interfaces.
The method provided an easily-designed and easily-used computer input
mechanism for people without physical limitations, and, with some
modifications, has the potential to become a computer access tool for
people with severe paralysis.
%R 2005-008
%T MosaicShape: Stochastic Region Grouping with Shape Prior
%A Wang, Jingbin
%A Gu, Erdan
%A Betke, Margrit
%D February 27, 2005
%U http://www.cs.bu.edu/techreports/2005-008-mosaic-shape.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A novel method that combines shape-based object recognition and image
segmentation is proposed for shape retrieval from images. Given a
shape prior represented in a multi-scale curvature form, the proposed
method identifies the target objects in images by grouping
oversegmented image regions. The problem is formulated in a unified
probabilistic framework and solved by a stochastic Markov Chain Monte
Carlo (MCMC) mechanism. By this means, object segmentation and
recognition are accomplished simultaneously. Within each sampling
move during the simulation process, probabilistic region grouping
operations are influenced by both the image information and the shape
similarity constraint. The latter constraint is measured by a partial
shape matching process. A generalized parallel algorithm by Barbu and
Zhu, combined with a large sampling jump and other implementation
improvements, greatly speeds up the overall stochastic process. The
proposed method supports the segmentation and recognition of multiple
occluded objects in images. Experimental results are provided for
both synthetic and real images.
%R 2005-009
%T Efficient Nearest Neighbor Classification Using a Cascade of Approximate Similarity Measures
%A Athitsos, Vassilis
%A Alon, Jonathan
%A Sclaroff, Stan
%D March 16, 2005
%U http://www.cs.bu.edu/techreports/2005-009-nearest-neighbor-classification.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Nearest neighbor classification using shape context can yield highly
accurate results in a number of recognition problems. Unfortunately,
the approach can be too slow for practical applications, and thus
approximation strategies are needed to make shape context
practical. This paper proposes a method for efficient and accurate
nearest neighbor classification in non-Euclidean spaces, such as the
space induced by the shape context measure. First, a method is
introduced for constructing a Euclidean embedding that is optimized
for nearest neighbor classification accuracy. Using that embedding,
multiple approximations of the underlying non-Euclidean similarity
measure are obtained, at different levels of accuracy and
efficiency. The approximations are automatically combined to form a
cascade classifier, which applies the slower approximations only to
the hardest cases. Unlike typical cascade-of-classifiers approaches,
which are applied to binary classification problems, our method
constructs a cascade for a multiclass problem. Experiments with a
standard shape data set indicate that a speedup of two to three orders
of magnitude is gained over the standard shape context
classifier, with minimal losses in classification accuracy.
%R 2005-010
%T Query-Sensitive Embeddings
%A Athitsos, Vassilis
%A Hadjieleftheriou, Marios
%A Kollios, George
%A Sclaroff, Stan
%D March 16, 2005
%U http://www.cs.bu.edu/techreports/2005-010-query-sesnsitive-embeddings.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A common problem in many types of databases is retrieving the most
similar matches to a query object. Finding those matches in a large
database can be too slow to be practical, especially in domains where
objects are compared using computationally expensive similarity (or
distance) measures. This paper proposes a novel method for approximate
nearest neighbor retrieval in such spaces. Our method is
embedding-based, meaning that it constructs a function that maps
objects into a real vector space. The mapping preserves a large amount
of the proximity structure of the original space, and it can be used
to rapidly obtain a short list of likely matches to the query. The
main novelty of our method is that it constructs, together with the
embedding, a query-sensitive distance measure that should be used when
measuring distances in the vector space. The term ``query-sensitive''
means that the distance measure changes depending on the current query
object. We report experiments with an image database of handwritten
digits, and a time-series database. In both cases, the proposed method
outperforms existing state-of-the-art embedding methods, meaning that
it provides significantly better trade-offs between efficiency and
retrieval accuracy.
%R 2005-011
%T Robust Sketching and Aggregation of Distributed Data Streams
%A Hadjieleftheriou, Marios
%A Byers, John
%A Kollios, George
%D March 16, 2005
%U http://www.cs.bu.edu/techreports/2005-011-robust-stream-sketching.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The data streaming model provides an attractive framework for one-pass
summarization of massive data sets at a single observation
point. However, in an environment where multiple data streams arrive
at a set of distributed observation points, sketches must be computed
remotely and then must be aggregated through a hierarchy before
queries may be conducted. As a result, many sketch-based methods for
the single stream case do not apply directly, as either the error
introduced becomes large, or because the methods assume that the
streams are non-overlapping. These limitations hinder the application
of these techniques to practical problems in network traffic
monitoring and aggregation in sensor networks. To address this, we
develop a general framework for evaluating and enabling robust
computation of duplicate-sensitive aggregate functions (e.g., SUM and
QUANTILE), over data produced by distributed sources. We instantiate
our approach by augmenting the Count-Min and Quantile-Digest sketches
to apply in this distributed setting, and analyze their
performance. We conclude with experimental evaluation to validate our
analysis.
%R 2005-012
%T Real Time Eye Tracking and Blink Detection with USB Cameras
%A Chau, Michael
%A Betke, Margrit
%D April 28, 2005
%U http://www.cs.bu.edu/techreports/2005-012-blink-detection.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A human-computer interface (HCI) system designed for use by people
with severe disabilities is presented. People that are severely
paralyzed or afflicted with diseases such as ALS (Lou Gehrig's
disease) or multiple sclerosis are unable to move or control any parts
of their bodies except for their eyes. The system presented here
detects the user's eye blinks and analyzes the pattern and duration of
the blinks, using them to provide input to the computer in the form of
a mouse click. After the automatic initialization of the system occurs
from the processing of the user's involuntary eye blinks in the first
few seconds of use, the eye is tracked in real time using correlation
with an online template. If the user's depth changes significantly or
rapid head movement occurs, the system is automatically
reinitialized. The system needs neither special lighting nor offline
templates to function properly. The system
works with inexpensive USB cameras and runs at a frame rate of 30
frames per second. Extensive experiments were conducted to determine
both the system's accuracy in classifying voluntary and involuntary
blinks and the system's fitness in varying environment
conditions, such as alternative camera placements and different
lighting conditions. These experiments on eight test subjects yielded
an overall detection accuracy of 95.3%.
%R 2005-013
%T An Invariant Representation for Matching Trajectories across Uncalibrated Video Streams
%A Nunziati, Walter
%A Sclaroff, Stan
%A Del Bimbo, Alberto
%D May 1, 2005
%U http://www.cs.bu.edu/techreports/2005-016-matching-trajectories.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We introduce a viewpoint invariant representation of moving object
trajectories that can be used in video database applications. It is
assumed that trajectories lie on a surface that can be locally
approximated with a plane. Raw trajectory data is first locally
approximated with a cubic spline via least squares fitting. For each
sampled point of the obtained curve, a projective invariant feature is
computed using a small number of points in its neighborhood. The
resulting sequence of invariant features computed along the entire
trajectory forms the view invariant descriptor of the trajectory
itself. Time parametrization has been exploited to compute cross
ratios without ambiguity due to point ordering. Similarity between
descriptors of different trajectories is measured with a distance that
takes into account the statistical properties of the cross ratio, and
its symmetry with respect to the point at infinity. In experiments, an
overall correct classification rate of about 95% has been obtained on
a dataset of 58 trajectories of players in soccer video, and an
overall correct classification rate of about 80% has been obtained on
matching partial segments of trajectories collected from two
overlapping views of outdoor scenes with moving people and cars.
%R 2005-014
%T Typed Abstraction of Complex Network Compositions
%A Bestavros, Azer
%A Bradley, Adam
%A Kfoury, Assaf
%A Matta, Ibrahim
%D May 1, 2005
%U http://www.cs.bu.edu/techreports/2005-014-traffic-framework.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The heterogeneity and open nature of network systems make analysis of
compositions of components quite challenging, making the design and
implementation of robust network services largely inaccessible to the
average programmer. We propose the development of a novel type system
and practical type spaces which reflect simplified representations of
the results and conclusions which can be derived from complex
compositional theories in more accessible ways, essentially allowing
the system architect or programmer to be exposed only to the inputs
and output of compositional analysis without having to be familiar
with the ins and outs of its internals. Toward this end we present
the TRAFFIC (Typed Representation and Analysis of Flows For
Interoperability Checks) framework, a simple flow-composition and
typing language with corresponding type system. We then discuss and
demonstrate the expressive power of a type space for TRAFFIC derived
from the network calculus, allowing us to reason about and infer such
properties as data arrival, transit, and loss rates in large composite
network applications.
%R 2005-015
%T Safe Compositional Specification of Networking Systems: TRAFFIC The Language and Its Type Checking
%A Liu, Likai
%A Kfoury, Assaf
%A Bestavros, Azer
%A Bradley, Adam
%A Gabay, Yarom
%A Matta, Ibrahim
%D May 12, 2005
%U http://www.cs.bu.edu/techreports/2005-015-traffic-types.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper formally defines the operational semantics for TRAFFIC, a
specification language for flow composition applications proposed in
BUCS-TR-2005-014, and presents a type system based on desired
safety assurance. We provide proofs on reduction (weak-confluence,
strong-normalization and unique normal form), on soundness and
completeness of the type system with respect to reduction, and on
equivalence classes of flow specifications. Finally, we provide a
pseudo-code listing of a syntax-directed type checking algorithm
implementing rules of the type system capable of inferring the type of
a closed flow specification.
%R 2005-016
%T An Invariant Representation for Matching Trajectories across uncalibrated video streams
%A Nunziati, Walter
%A Sclaroff, Stan
%A Del Bimbo, Alberto
%D May 19, 2005
%U http://www.cs.bu.edu/techreports/2005-016-matching-trajectories.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We introduce a view-point invariant representation of moving
object trajectories that can be used in video database applications. It
is assumed that trajectories lie on a surface that can be locally approximated
with a plane. Raw trajectory data is first locally approximated
with a cubic spline via least squares fitting. For each sampled point of
the obtained curve, a projective invariant feature is computed using a
small number of points in its neighborhood. The resulting sequence of
invariant features computed along the entire trajectory forms the
view-invariant descriptor of the trajectory itself. Time parametrization has
been exploited to compute cross ratios without ambiguity due to point
ordering. Similarity between descriptors of different trajectories is measured
with a distance that takes into account the statistical properties of
the cross ratio, and its symmetry with respect to the point at infinity. In
experiments, an overall correct classification rate of about 95% has been
obtained on a dataset of 58 trajectories of players in soccer video, and
an overall correct classification rate of about 80% has been obtained on
matching partial segments of trajectories collected from two overlapping
views of outdoor scenes with moving people and cars.
%R 2005-017
%T View registration using interesting segments of planar trajectories
%A Nunziati, Walter
%A Alon, Jonathan
%A Sclaroff, Stan
%A Del Bimbo, Alberto
%D May 19, 2005
%U http://www.cs.bu.edu/techreports/2005-017-view-registration.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We introduce a method for recovering the spatial and temporal
alignment between two or more views of objects moving over a ground
plane. Existing approaches either assume that the streams are globally
synchronized, so that only solving the spatial alignment is needed, or
that the temporal misalignment is small enough so that exhaustive
search can be performed. In contrast, our approach can recover both
the spatial and temporal alignment. We compute for each trajectory a
number of interesting segments, and we use their description to form
putative matches between trajectories. Each pair of corresponding
interesting segments induces a temporal alignment, and defines an
interval of common support across two views of an object that is used
to recover the spatial alignment. Interesting segments and their
descriptors are defined using algebraic projective invariants measured
along the trajectories. Similarity between interesting segments is
computed taking into account the statistics of such
invariants. Candidate alignment parameters are verified checking the
consistency, in terms of the symmetric transfer error, of all the
putative pairs of corresponding interesting segments. Experiments are
conducted with two different sets of data, one with two views of an
outdoor scene featuring moving people and cars, and one with four
views of a laboratory sequence featuring moving radio-controlled cars.
%R 2005-018
%T Foreground Object Segmentation from Binocular Stereo Video
%A Law, Kevin
%A Sclaroff, Stan
%D May 19, 2005
%U http://www.cs.bu.edu/techreports/2005-018-stereo-foreground-background.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Moving cameras are needed for a wide range of applications in
robotics, vehicle systems, surveillance, etc. However, many foreground
object segmentation methods reported in the literature are unsuitable
for such settings; these methods assume that the camera is fixed and
the background changes slowly, and are inadequate for segmenting
objects in video if there is significant motion of the camera or
background. To address this shortcoming, a new method for segmenting
foreground objects is proposed that utilizes binocular video. The
method is demonstrated in the application of tracking and segmenting
people in video who are approximately facing the binocular camera
rig. Given a stereo image pair, the system first tries to find
faces. Starting at each face, the region containing the person is
grown by merging regions from an over-segmented color image. The
disparity map is used to guide this merging process. The system has
been implemented on a consumer-grade PC, and tested on video sequences
of people indoors obtained from a moving camera rig. As can be
expected, the proposed method works well in situations where other
foreground-background segmentation methods typically fail. We believe
that this superior performance is partly due to the use of object
detection to guide region merging in disparity/color foreground
segmentation, and partly due to the use of disparity information
available with a binocular rig, in contrast with most previous methods
that assumed monocular sequences.
%R 2005-019
%T Online and Offline Character Recognition Using Alignment to Prototypes
%A Alon, Jonathan
%A Athitsos, Vassilis
%A Sclaroff, Stan
%D June 3, 2005
%U http://www.cs.bu.edu/techreports/2005-019-alignment-to-prototypes.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Nearest neighbor classifiers are simple to implement, yet they can
model complex non-parametric distributions, and provide
state-of-the-art recognition accuracy in OCR databases. At the
same time, they may be too slow for practical character
recognition, especially when they rely on similarity measures that
require computationally expensive pairwise alignments between
characters. This paper proposes an efficient method for computing
an approximate similarity score between two characters based on
their exact alignment to a small number of prototypes. The
proposed method is applied to both online and offline character
recognition, where similarity is based on widely used and
computationally expensive alignment methods, namely Dynamic Time
Warping and the Hungarian method, respectively. In both cases,
significant recognition speedup is obtained at the expense of only
a minor increase in recognition error.
%R 2005-020
%T Fast and Accurate Gesture Spotting using Subgesture Reasoning and Pruning of Unlikely Dynamic Programming Paths
%A Alon, Jonathan
%A Athitsos, Vassilis
%A Sclaroff, Stan
%D June 3, 2005
%U http://www.cs.bu.edu/techreports/2005-020-gesture-spotting.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Vision-based recognition of gestures in continuous video
streams can facilitate more natural human-computer interaction.
Gesture spotting is the challenging task of locating
the start and end frames of the video stream that correspond
to a gesture of interest, while at the same time rejecting
non-gesture motion patterns. This paper proposes a new
gesture spotting and recognition algorithm that is based on
the widely used continuous dynamic programming (CDP)
algorithm. Our first contribution is a pruning method that
allows the system to evaluate a relatively small number of
hypotheses compared to CDP. Pruning is implemented by
a set of model-dependent classifiers that are learned from
training examples. In our experiments, the proposed CDP
with pruning was an order of magnitude faster than
the original CDP algorithm, and recognition accuracy
improved by 7%. The second contribution of the proposed
spotting algorithm is a subgesture reasoning process that
models the fact that some gesture models can falsely match
parts of other longer gestures. In our experiments, using the
proposed subgesture modeling improved recognition accuracy
by an additional 12%.
%R 2005-021
%T Detecting Instances of Shape Classes That Exhibit Variable Structure
%A Athitsos, Vassilis
%A Wang, Jingbin
%A Sclaroff, Stan
%A Betke, Margrit
%D June 8, 2005
%U http://www.cs.bu.edu/techreports/2005-021-variable-shape-structure-detection.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper proposes a method for detecting shapes of variable
structure in images with clutter. The term ``variable structure''
means that some shape parts can be repeated an arbitrary number of
times, some parts can be optional, and some parts can have several
alternative appearances. The particular variation of the shape
structure that occurs in a given image is not known a priori. Existing
computer vision methods, including deformable model methods, were not
designed to detect shapes of variable structure; they may only be used
to detect shapes that can be decomposed into a fixed, a priori known,
number of parts. The proposed method can handle both variations in
shape structure and variations in the appearance of individual shape
parts. A new class of shape models is introduced, called Hidden State
Shape Models, that can naturally represent shapes of variable
structure. A detection algorithm is described that finds instances of
such shapes in images with large amounts of clutter by finding
globally optimal correspondences between image features and shape
models. Experiments with real images demonstrate that our method can
localize plant branches that consist of an a priori unknown number of
leaves and can detect hands more accurately than a hand detector based
on the chamfer distance.
%R 2005-022
%T Face identification by a cascade of rejection classifiers
%A Yuan, Quan
%A Thangali, Ashwin
%A Sclaroff, Stan
%D June 10, 2005
%U http://www.cs.bu.edu/techreports/2005-022-face-id-by-classifiers-rejection.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Nearest neighbor search is commonly employed in face recognition but
it does not scale well to large dataset sizes. A strategy to combine
rejection classifiers into a cascade for face identification is
proposed in this paper. A rejection classifier for a pair of classes
is defined to reject at least one of the classes with high
confidence. These rejection classifiers are able to share
discriminants in feature space and at the same time have high
confidence in the rejection decision. In the face identification
problem, it is possible that a pair of known individual faces are very
dissimilar. It is very unlikely that both of them are close to an
unknown face in the feature space. Hence, only one of them needs to be
considered. Using a cascade structure of rejection classifiers, the
scope of nearest neighbor search can be reduced
significantly. Experiments on Face Recognition Grand Challenge (FRGC)
version 1 data demonstrate that the proposed method achieves
significant speedup and an accuracy comparable with the brute-force
Nearest Neighbor method. In addition, a graph cut based clustering
technique is employed to demonstrate that the pairwise separability of
these rejection classifiers is capable of semantic grouping.
%R 2005-023
%T Fast Head Tilt Detection for Human-Computer Interaction
%A Waber, Benjamin
%A Magee, John
%A Betke, Margrit
%D July 7, 2005
%U http://www.cs.bu.edu/techreports/2005-023-head-tilt.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Accurate head tilt detection has a large potential to aid people with
disabilities in the use of human-computer interfaces and provide
universal access to communication software. We show how it can be
utilized to tab through links on a web page or control a video game
with head motions. It may also be useful as a correction method for
currently available video-based assistive technology that requires
upright facial poses. Few of the existing computer vision methods that
detect head rotations in and out of the image plane with reasonable
accuracy can operate within the context of a real-time communication
interface because the computational expense that they incur is too
great. Our method uses a variety of metrics to obtain a robust head
tilt estimate without incurring the computational cost of previous
methods. Our system runs in real time on a computer with a 2.53 GHz
processor, 256 MB of RAM and an inexpensive webcam, using only 55% of
the processor cycles.
%R 2005-024
%T Facial Feature Tracking and Occlusion Recovery in American Sign Language
%A Castelli, Thomas
%A Betke, Margrit
%A Neidle, Carol
%D July 7, 2005
%U http://www.cs.bu.edu/techreports/2005-024-tracking-occlusion-ASL.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Facial features play an important role in expressing grammatical
information in signed languages, including American Sign Language
(ASL). Gestures such as raising or furrowing the eyebrows are key
indicators of constructions such as yes-no questions. Periodic head
movements (nods and shakes) are also an essential part of the
expression of syntactic information, such as negation (associated with
a side-to-side headshake). Therefore, identification of these facial
gestures is essential to sign language recognition. One problem with
detection of such grammatical indicators is occlusion recovery. If
the signer's hand blocks his/her eyebrows during production of a sign,
it becomes difficult to track the eyebrows. We have developed a
system to detect such grammatical markers in ASL that recovers
promptly from occlusion. Our system detects and tracks evolving
templates of facial features, which are based on an anthropometric
face model, and interprets the geometric relationships of these
templates to identify grammatical markers. It was tested on a variety
of ASL sentences signed by various Deaf native signers and detected
facial gestures used to express grammatical information, such as
raised and furrowed eyebrows as well as headshakes.
%R 2005-025
%T Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions
%A Tian, Tai-Peng
%A Li, Rui
%A Sclaroff, Stan
%D July 7, 2005
%U http://www.cs.bu.edu/techreports/2005-025-learned-articulated-pose-estimation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A learning based framework is proposed for estimating human body pose
from a single image. Given a differentiable function that maps from
pose space to image feature space, the goal is to invert the process:
estimate the pose given only image features. The inversion is an
ill-posed problem as the inverse mapping is a one-to-many
process. Hence multiple solutions exist, and it is desirable to
restrict the solution space to a smaller subset of feasible
solutions. For example, not all human body poses are feasible due to
anthropometric constraints. Since the space of feasible solutions may
not admit a closed form description, the proposed framework seeks to
exploit machine learning techniques to learn an approximation that is
smoothly parameterized over such a space. One such technique is
Gaussian Process Latent Variable Modelling. Scaled conjugate gradient
is then used to find the best matching pose in the space of feasible
solutions when given an input image. The formulation allows easy
incorporation of various constraints, e.g. temporal consistency and
anthropometric constraints. The performance of the proposed approach
is evaluated in the task of upper-body pose estimation from
silhouettes and compared with the Specialized Mapping
Architecture. The estimation accuracy of the Specialized Mapping
Architecture is at least one standard deviation worse than the
proposed approach in the experiments with synthetic data. In
experiments with real video of humans performing gestures, the
proposed approach produces qualitatively better estimation results.
%R 2005-026
%T Mistreatment in Distributed Caching Groups: Causes and Implications
%A Laoutaris, Nikolaos
%A Smaragdakis, Georgios
%A Bestavros, Azer
%A Stavrakakis, Ioannis
%D July 7, 2005
%U http://www.cs.bu.edu/techreports/2005-026-distributed-caching-mistreatment.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Although cooperation generally increases the amount of resources
available to a community of nodes, thus improving individual and
collective performance, it also allows for the appearance of potential
mistreatment problems through the exposition of one node's resources
to others. We study such concerns by considering a group of
independent, rational, self-aware nodes that cooperate using on-line
caching algorithms, where the exposed resource is the storage of each
node. Motivated by content networking applications -- including web
caching, CDNs, and P2P -- this paper extends our previous work on the
off-line version of the problem, which was limited to object
replication and was conducted under a game-theoretic framework. We
identify and investigate two causes of mistreatment: (1) cache state
interactions (due to the cooperative servicing of requests) and (2)
the adoption of a common scheme for cache
replacement/redirection/admission policies. Using analytic models,
numerical solutions of these models, as well as simulation
experiments, we show that on-line cooperation schemes using caching
are fairly robust to mistreatment caused by state interactions. When
this becomes possible, the interaction through the exchange of
miss-streams has to be very intense, making it feasible for the
mistreated nodes to detect and react to the exploitation. This
robustness ceases to exist when nodes fetch and store objects in
response to remote requests, i.e., when they operate as Level-2 caches
(or proxies) for other nodes. Regarding mistreatment due to a common
scheme, we show that this can easily take place when the ``outlier''
characteristics of some of the nodes get overlooked. This finding
underscores the importance of allowing cooperative caching nodes the
flexibility of choosing from a diverse set of schemes to fit the
peculiarities of individual nodes. To that end, we outline an
emulation-based framework for the development of
mistreatment-resilient distributed selfish caching schemes.
%R 2005-027
%T Computing a Uniform Scaling Parameter for 3D Registration of Lung Surfaces
%A Rodeski, Vladimir
%A Mullally, William
%A Bellardine, Carissa
%A Lutchen, Kenneth
%A Betke, Margrit
%D July 7, 2005
%U http://www.cs.bu.edu/techreports/2005-027-scale-based-lung-registration.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A difficulty in lung image registration is accounting for changes in
the size of the lungs due to inspiration. We propose two methods for
computing a uniform scale parameter for use in lung image registration
that account for size change. A scaled rigid-body transformation
allows analysis of corresponding lung CT scans taken at different
times and can serve as a good low-order transformation to initialize
non-rigid registration approaches. Two different features are used to
compute the scale parameter. The first method uses lung surfaces. The
second uses lung volumes. Both approaches are computationally
inexpensive and improve the alignment of lung images over rigid
registration. The two methods produce different scale parameters and
may highlight different functional information about the lungs.
%R 2005-028
%T An Adaptive Policy Management Approach to BGP Convergence
%A Yilmaz, Selma
%A Matta, Ibrahim
%D July 7, 2005
%U http://www.cs.bu.edu/techreports/2005-028-BGP-convergence.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Border Gateway Protocol (BGP) is the current inter-domain routing
protocol used to exchange reachability information between Autonomous
Systems (ASes) in the Internet. BGP supports policy-based routing
which allows each AS to independently adopt a set of local policies
that specify which routes it accepts and advertises from/to other
networks, as well as which route it prefers when more than one route
becomes available. However, independently chosen local policies may
cause global conflicts, which result in protocol divergence. In this
paper, we propose a new algorithm, called Adaptive Policy Management
Scheme (APMS), to resolve policy conflicts in a distributed manner.
Akin to distributed feedback control systems, each AS independently
classifies the state of the network as either conflict-free or
potentially-conflicting by observing its local history only (namely,
route flaps). Based on the degree of measured conflicts, each AS
dynamically adjusts its own path preferences---increasing its
preference for observably stable paths over flapping paths. APMS also
includes a mechanism to distinguish route flaps due to topology
changes, so as not to confuse them with those due to policy
conflicts. A correctness and convergence analysis of APMS based on the
sub-stability property of chosen paths is presented. Implementation in
the SSF network simulator is performed, and simulation results for
different performance metrics are presented. The metrics capture the
dynamic performance (in terms of instantaneous throughput, delay,
routing load, etc.) of APMS and other competing solutions, thus
exposing the often neglected aspects of performance.
%R 2005-029
%T Tracking Human Body Pose on a Learned Smooth Space
%A Tian, Tai-Peng
%A Li, Rui
%A Sclaroff, Stan
%D July 28, 2005
%U http://www.cs.bu.edu/techreports/2005-029-learned-tracking.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Particle filtering is a popular method used in systems for tracking
human body pose in video. One key difficulty in using particle
filtering is caused by the curse of dimensionality: generally a very
large number of particles is required to adequately approximate the
underlying pose distribution in a high-dimensional state
space. Although the number of degrees of freedom in the human body is
quite large, in reality, the subset of allowable configurations in
state space is generally restricted by human biomechanics, and the
trajectories in this allowable subspace tend to be smooth. Therefore,
a framework is proposed to learn a low-dimensional representation of
the high-dimensional human pose state space. This mapping can be
learned using a Gaussian Process Latent Variable Model (GPLVM)
framework. One important advantage of the GPLVM framework is that both
the mapping to and the mapping from the embedded space are smooth; this
facilitates sampling in the low-dimensional space, and samples
generated in the low-dimensional embedded space are easily mapped back
into the original high-dimensional space. Moreover, human body poses
that are similar in the original space tend to be mapped close to each
other in the embedded space; this property can be exploited when
sampling in the embedded space. The proposed framework is tested in
tracking 2D human body pose using a Scaled Prismatic
Model. Experiments on real life video sequences demonstrate the
strength of the approach. In comparison with the Multiple Hypothesis
Tracking and the standard Condensation algorithm, the proposed
algorithm is able to maintain tracking reliably throughout the long
test sequences. It also handles singularity and self occlusion
robustly.
%R 2005-030
%T Some Considerations on a Calculus with Weak References
%A Donnelly, Kevin
%A Kfoury, Assaf
%D July 27, 2005
%U http://www.cs.bu.edu/techreports/2005-030-considerations-on-weak-references.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Weak references are references that do not prevent the object they point
to from being garbage collected. Most realistic languages, including
Java, SML/NJ, and OCaml to name a few, have some facility for
programming with weak references. Weak references are used in
implementing idioms like memoizing functions and hash-consing in order
to avoid potential memory leaks.
However, the semantics of weak references in many languages are not
clearly specified. Without a formal semantics for weak references it
becomes impossible to prove the correctness of implementations making
use of this feature. Previous work by Hallett and Kfoury extends $\gc$,
a language for modeling garbage collection, to $\weak$, a similar
language with weak references.
Using this previously formalized semantics for weak references, we
consider two issues related to well-behavedness of programs. Firstly,
we provide a new, simpler proof of the well-behavedness of the
syntactically restricted fragment of $\weak$ defined previously.
Secondly, we give a natural semantic criterion for well-behavedness much
broader than the syntactic restriction, which is useful as a principle for
programming with weak references.
Furthermore, we extend the result, previously proved for $\gc$, which
allows one to use type inference to collect some reachable objects that
are never used. We prove that this result holds for our language, and we
extend this result to allow the collection of weakly-referenced
reachable garbage without incurring the computational overhead sometimes
associated with collecting weak bindings (e.g. the need to recompute a
memoized function).
Lastly, we extend the semantic framework to model the key/value weak
references found in Haskell, and we prove that the Haskell semantics is
equivalent to a simpler semantics due to the lack of side-effects in our
language.
%R 2005-031
%T A Formal Semantics for Weak References
%A Hallett, Joseph
%A Kfoury, Assaf
%D August 8, 2005
%U http://www.cs.bu.edu/techreports/2005-031-weak-refs.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A weak reference is a reference to an object that is not followed by the
pointer tracer when garbage collection is called. That is, a weak
reference cannot prevent the object it references from being garbage
collected. Weak references remain a troublesome programming feature
largely because there is not an accepted, precise semantics that describes
their behavior (in fact, we are not aware of any formalization of their
semantics). The trouble is that weak references allow reachable objects to
be garbage collected, therefore allowing garbage collection to influence
the result of a program. Despite this difficulty, weak references continue
to be used in practice for reasons related to efficient storage
management, and are included in many popular programming languages
(Standard ML, Haskell, OCaml, and Java).
We give a formal semantics for a calculus called $\weak$ that includes weak
references and is derived from Morrisett, Felleisen, and Harper's $\gc$.
$\gc$ formalizes the notion of garbage collection by means of a rewrite
rule. Such a formalization is required to precisely characterize the
semantics of weak references. However, the inclusion of a
garbage-collection rewrite-rule in a language with weak references
introduces non-deterministic evaluation, even if the parameter-passing
mechanism is deterministic (call-by-value in our case). This raises
the question of confluence for our rewrite system. We discuss natural
restrictions under which our rewrite system is confluent, thus
guaranteeing uniqueness of program result. We define conditions that
allow other garbage collection algorithms to co-exist with our
semantics of weak references. We also introduce a polymorphic type
system to prove the absence of erroneous program behavior (i.e., the
absence of "stuck evaluation") and a corresponding type inference
algorithm. We prove the type system sound and the inference algorithm
sound and complete.
%R 2005-032
%T MusicMaker -- A Camera-based Music Making Tool for Physical Rehabilitation
%A Gorman, Mikhail
%A Betke, Margrit
%A Saltzman, Elliot
%A Lahav, Amir
%D December 8, 2005
%U http://www.cs.bu.edu/techreports/2005-032-music-maker.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The therapeutic effects of playing music are being recognized
increasingly in the field of rehabilitation medicine. People with
physical disabilities, however, often do not have the motor dexterity
needed to play an instrument. We developed a camera-based
human-computer interface called ``MusicCamera'' to provide such people
with a means to make music by performing therapeutic exercises.
MusicCamera uses computer vision techniques to convert the movements
of a patient's body part, for example, a finger, hand, or foot, into
musical and visual feedback using the open software platform EyesWeb.
It can be adjusted to a patient's particular therapeutic needs and
provides quantitative tools for monitoring the recovery process and
assessing therapeutic outcomes. We tested the potential of
MusicCamera as a rehabilitation tool with six subjects who responded
to or created music in various movement exercises. In these
proof-of-concept experiments, MusicCamera has performed reliably and
shown its promise as a therapeutic device.
%R 2005-033
%T Safe Compositional Specification of Networking Systems: A Compositional Analysis Approach
%A Liu, Likai
%A Kfoury, Assaf
%A Bestavros, Azer
%A Gabay, Yarom
%A Bradley, Adam
%A Matta, Ibrahim
%D December 28, 2005
%U http://www.cs.bu.edu/techreports/2005-033-traffic-compositional-analysis.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a type inference algorithm, in the style of compositional
analysis, for the language TRAFFIC---a specification language for flow
composition applications proposed in BUCS-TR-2005-014---and prove that
this algorithm is correct: the typings it infers are principal
typings, and the typings agree with syntax-directed type checking on
closed flow specifications. This algorithm is capable of verifying
partial flow specifications, which is a significant improvement over
the syntax-directed type checking algorithm presented in BUCS-TR-2005-015.
We also show that this algorithm runs efficiently, i.e., in low-degree
polynomial time.
%R 2005-034
%T Type Systems for a Network Specification Language With Multiple-Choice Let
%A Gabay, Yarom
%A Kfoury, Assaf
%A Liu, Likai
%A Bestavros, Azer
%A Bradley, Adam
%A Matta, Ibrahim
%D December 28, 2005
%U http://www.cs.bu.edu/techreports/2005-034-traffic-multiple-let-binding.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
When analysing the behavior of complex networked systems, it is often
the case that some components within that network are only
known to the extent that they belong to one of a set of possible
"implementations" -- e.g., versions of a specific protocol, class of
schedulers, etc. In this report we augment the specification language
considered in BUCS-TR-2004-021, BUCS-TR-2005-014, BUCS-TR-2005-015,
and BUCS-TR-2005-033, to include a non-deterministic multiple-choice
let-binding, which allows us to consider compositions of networking
subsystems that allow for looser component specifications.
%R 2005-035
%T Inferring Intersection Typings that Are Equivalent to Call-by-Name and Call-by-Value Evaluations
%A Bakewell, Adam
%A Carlier, Sebastien
%A Kfoury, Assaf
%A Wells, J. B.
%D December 30, 2005
%U http://www.cs.bu.edu/techreports/2005-035-nameval.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a procedure to infer a typing for an arbitrary lambda-term
M in an intersection-type system that translates into exactly the
call-by-name (resp., call-by-value) evaluation of M. Our framework is
the recently developed System E which augments intersection types with
expansion variables. The inferred typing for M is obtained by setting
up a unification problem involving both type variables and expansion
variables, which we solve with a confluent rewrite system. The
inference procedure is compositional in the sense that typings for
different program components can be inferred in any order, and without
knowledge of the definition of other program components. Using
expansion variables lets us achieve a compositional inference
procedure easily. Termination of the procedure is generally
undecidable. The procedure terminates and returns a typing iff the
input M is normalizing according to call-by-name (resp.,
call-by-value). The inferred typing is exact in the sense that the
exact call-by-name (resp., call-by-value) behaviour of M can be
obtained by a (polynomial) transformation of the typing. The inferred
typing is also principal in the sense that any other typing that
translates the call-by-name (resp., call-by-value) evaluation of M can
be obtained from the inferred typing for M using a substitution-based
transformation.
%R 2006-001
%T Computational Properties of SNAFU
%A Gabay, Yarom
%A Ocean, Michael
%A Kfoury, Assaf
%A Liu, Likai
%D February 6, 2006
%U http://www.cs.bu.edu/techreports/2006-001-snafu-properties.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Sensor applications in Sensoria
\cite{BestavrosBradleyKfouryOcean:basenets05} are expressed using STEP
(Sensorium Task Execution Plan). SNAFU (SensorNet
Applications as Functional Units) serves as a high-level
sensor-programming language, which is compiled into STEP. In SNAFU's
current form, its differences with STEP are relatively minor, as they
are limited to shorthands and macros not available in STEP. We show
that, however restrictive it may seem, SNAFU has in fact universal
power; technically, it is a Turing-complete language, i.e., any Turing
program can be written in SNAFU (though not always conveniently).
Although STEP may be allowed to have universal power, as a low-level
language not directly available to Sensorium users, SNAFU programmers
may use this power for malicious purposes or inadvertently introduce
errors with destructive consequences. In future developments of SNAFU,
we plan to introduce restrictions and high-level features with safety
guards, such as those provided by a type system, which will make SNAFU
programming safer.
%R 2006-002
%T On the Impact of Low-Rate Attacks
%A Guirguis, Mina
%A Bestavros, Azer
%A Matta, Ibrahim
%D February 6, 2006
%U http://www.cs.bu.edu/techreports/2006-002-low-rate-attack-impact.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recent research has exposed new breeds of attacks that are capable of
denying service or inflicting significant damage to TCP flows,
without sustaining the attack traffic. Such attacks are often referred
to as ``low-rate'' attacks and they stand in sharp contrast to
traditional Denial of Service (DoS) attacks that can completely shut
off TCP flows by flooding an Internet link. In this paper, we study
the impact of these new breeds of attacks and the extent to which
defense mechanisms are capable of mitigating the attack's
impact. Through adopting a simple discrete-time model with a single
TCP flow and a non-oblivious adversary, we were able to expose new
variants of these low-rate attacks that could potentially have high
attack potency per attack burst. Our analysis is focused on
worst-case scenarios, thus our results should be regarded as upper
bounds on the impact of low-rate attacks rather than a real assessment
under a specific attack scenario.
%R 2006-003
%T Distributed Selfish Caching
%A Laoutaris, Nikolaos
%A Smaragdakis, Georgios
%A Bestavros, Azer
%A Matta, Ibrahim
%A Stavrakakis, Ioannis
%D February 7, 2006
%U http://www.cs.bu.edu/techreports/2006-003-distributed-selfish-caching.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Although cooperation generally increases the amount of resources
available to a community of nodes, thus improving individual and
collective performance, it also allows for the appearance of potential
mistreatment problems through the exposition of one node's resources
to others. We study such concerns by considering a group of
independent, rational, self-aware nodes that cooperate using on-line
caching algorithms, where the exposed resource is the storage at each
node. Motivated by content networking applications -- including web
caching, CDNs, and P2P -- this paper extends our previous work on the
off-line version of the problem, which was conducted under a
game-theoretic framework, and limited to object replication. We
identify and investigate two causes of mistreatment: (1) cache state
interactions (due to the cooperative servicing of requests) and (2)
the adoption of a common scheme for cache management policies. Using
analytic models, numerical solutions of these models, as well as
simulation experiments, we show that on-line cooperation schemes using
caching are fairly robust to mistreatment caused by state
interactions. To appear in a substantial manner, the interaction
through the exchange of miss-streams has to be very intense, making it
feasible for the mistreated nodes to detect and react to
exploitation. This robustness ceases to exist when nodes fetch and
store objects in response to remote requests, i.e., when they operate
as Level-2 caches (or proxies) for other nodes. Regarding mistreatment
due to a common scheme, we show that this can easily take place when
the "outlier" characteristics of some of the nodes get
overlooked. This finding underscores the importance of allowing
cooperative caching nodes the flexibility of choosing from a diverse
set of schemes to fit the peculiarities of individual nodes. To that
end, we outline an emulation-based framework for the development of
mistreatment-resilient distributed selfish caching schemes. Our
framework utilizes a simple control-theoretic approach to dynamically
parameterize the cache management scheme. We show performance
evaluation results that quantify the benefits from instantiating such a
framework, which could be substantial under skewed demand profiles.
%R 2006-004
%T Authenticated Index Structures for Outsourced Database Systems
%A Li, Feifei
%A Hadjieleftheriou, Marios
%A Kollios, George
%A Reyzin, Leonid
%D April 1, 2006
%U http://www.cs.bu.edu/techreports/2006-004-authentication-btree.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In outsourced database (ODB) systems the database owner publishes its
data through a number of remote servers, with the goal of enabling
clients at the edge of the network to access and query the data more
efficiently. As servers might be untrusted or can be compromised,
query authentication becomes an essential component of ODB
systems. Existing solutions for this problem concentrate mostly on
static scenarios and are based on idealistic properties for certain
cryptographic primitives, looking at the problem mostly from a
theoretical perspective. In this work, first we define a variety of
essential and practical cost metrics associated with ODB systems.
Then we analytically evaluate a number of different approaches, in
search of a solution that best leverages all metrics. Most
importantly, we look at solutions that can handle dynamic scenarios,
where owners periodically update the data residing at the
servers. Finally, we discuss query freshness, a new dimension in data
authentication that has not been explored before. A comprehensive
experimental evaluation of the proposed and existing approaches is
used to validate the analytical models and verify our claims. Our
findings show that the proposed solutions improve performance
substantially over existing approaches, both for static and dynamic
environments.
%R 2006-005
%T Amorphous Placement and Retrieval of Sensory Data in Sparse Mobile Ad-Hoc Networks
%A Morcos, Hany
%A Bestavros, Azer
%A Matta, Ibrahim
%D April 4, 2006
%U http://www.cs.bu.edu/techreports/2006-005-amorphous-placement-and-retrieval.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Personal communication devices are increasingly being equipped with
sensors that are able to passively collect information from their
surroundings -- information that could be stored in fairly small local
caches. We envision a system in which users of such devices use their
collective sensing, storage, and communication resources to query the
state of (possibly remote) neighborhoods. The goal of such a system is
to achieve the highest query success ratio using the least
communication overhead (power). We show that the use of Data Centric
Storage (DCS), or directed placement, is a viable approach for
achieving this goal, but only when the underlying network is well
connected. Alternatively, we propose amorphous placement, in which
sensory samples are cached locally and informed exchanges of cached
samples are used to diffuse the sensory data throughout the whole
network. In handling queries, the local cache is searched first for
potential answers. If unsuccessful, the query is forwarded to one or
more direct neighbors for answers. This technique leverages node
mobility and caching capabilities to avoid the multi-hop communication
overhead of directed placement. Using a simplified mobility model, we
provide analytical lower and upper bounds on the ability of amorphous
placement to achieve uniform field coverage in one and two
dimensions. We show that combining informed shuffling of cached
samples upon an encounter between two nodes, with the querying of
direct neighbors could lead to significant performance
improvements. For instance, under realistic mobility models, our
simulation experiments show that amorphous placement achieves a 10% to
40% better query answering ratio at a 25% to 35% savings in consumed
power over directed placement.
%R 2006-006
%T A customizable camera-based human computer interaction system allowing people with disabilities autonomous hands free navigation of multiple computing tasks
%A Akram, Wajeeha
%A Tiberii, Laura
%A Betke, Margrit
%D May 11, 2006
%U http://www.cs.bu.edu/techreports/2006-006-handsfree-navigation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Many people suffer from conditions that lead to deterioration of motor
control and make access to the computer using traditional input devices
difficult. In particular, they may lose control of hand movement to the
extent that the standard mouse cannot be used as a pointing device. Most
current alternatives use markers or specialized hardware to track and
translate a user's movement to pointer movement. These approaches, for
example wearable devices, may be perceived as intrusive. Camera-based
assistive systems that use visual tracking of features on the user's
body often require cumbersome manual adjustment. This paper introduces
an enhanced computer vision based strategy where features, for example
on a user's face, viewed through an inexpensive USB camera, are tracked
and translated to pointer movement. The main contributions of this paper
are (1) enhancing a video based interface with a mechanism for mapping
feature movement to pointer movement, which allows users to navigate to
all areas of the screen even with very limited physical movement, and
(2) providing a customizable, hierarchical navigation framework for
human computer interaction (HCI). This framework provides effective use
of the vision-based interface system for accessing multiple applications
in an autonomous setting. Experiments with several users show the
effectiveness of the mapping strategy and its usage within the
application framework as a practical tool for desktop users with
disabilities.
%R 2006-007
%T Web Mediators for Accessible Browsing
%A Waber, Benjamin
%A Magee, John
%A Betke, Margrit
%D May 11, 2006
%U http://www.cs.bu.edu/techreports/2006-007-webcontext.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a highly accurate method for classifying web pages based on
link percentage, which is the percentage of text characters that are
parts of links normalized by the number of all text characters on a
web page. K-means clustering is used to create unique thresholds to
differentiate index pages and article pages on individual web sites.
Index pages contain mostly links to articles and other indices, while
article pages contain mostly text. We also present a novel link
grouping algorithm using agglomerative hierarchical clustering that
groups links in the same spatial neighborhood together while
preserving link structure. Grouping allows users with severe
disabilities to use a scan-based mechanism to tab through a web page
and select items. In experiments, we saw up to a 40-fold reduction in
the number of commands needed to click on a link with a scan-based
interface, which shows that we can vastly improve the rate of
communication for users with disabilities. We used web page
classification and link grouping to alter web page display on an
accessible web browser that we developed to make a usable browsing
interface for users with disabilities. Our classification method
consistently outperformed a baseline classifier even when using
minimal data to generate article and index clusters, and achieved
classification accuracy of 94.0% on web sites with well-formed or
slightly malformed HTML, compared with 80.1% accuracy for the baseline
classifier.
%R 2006-008
%T An Adaptive Management Approach to Resolving Policy Conflicts
%A Yilmaz, Selma
%A Matta, Ibrahim
%D May 25, 2006
%U http://www.cs.bu.edu/techreports/2006-008-adapt-mgmt-policies.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Border Gateway Protocol (BGP) is the current inter-domain routing
protocol used to exchange reachability information between Autonomous
Systems (ASes) in the Internet. BGP supports policy-based routing
which allows each AS to independently define a set of local policies
on which routes it accepts and advertises from/to other networks, as
well as on which route it prefers when more than one route becomes
available. However, independently chosen local policies may cause
global conflicts, which result in protocol divergence. In this paper,
we propose a new algorithm, called Adaptive Policy Management Scheme
(APMS), to resolve policy conflicts in a distributed manner. Akin to
distributed feedback control systems, each AS independently classifies
the state of the network as either conflict-free or potentially
conflicting by observing its local history only (namely, route
flaps). Based on the degree of measured conflicts, each AS dynamically
adjusts its own path preferences---increasing its preference for
observably stable paths over flapping paths. APMS also includes a
mechanism to distinguish route flaps due to topology changes, so as
not to confuse them with those due to policy conflicts. A correctness
and convergence analysis of APMS based on the sub-stability property
of chosen paths is presented. Implementation in the SSF network
simulator is performed, and simulation results for different
performance metrics are presented. The metrics capture the dynamic
performance (in terms of instantaneous throughput, delay, etc.) of
APMS and other competing solutions, thus exposing the often neglected
aspects of performance.
%R 2006-009
%T On the Interaction between TCP and the Wireless Channel in CDMA2000 Networks
%A Mattar, Karim
%A Sridharan, Ashwin
%A Zang, Hui
%A Matta, Ibrahim
%A Bestavros, Azer
%D June 6, 2006
%U http://www.cs.bu.edu/techreports/2006-009-tcp-cdma-interactions.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this work, we conducted extensive active measurements on a large
nationwide CDMA2000 1xRTT network in order to characterize the impact
of both the Radio Link Protocol and, more importantly, the wireless
scheduler, on TCP. Our measurements include standard TCP/UDP logs, as
well as detailed RF layer statistics that allow observability into RF
dynamics. With the help of a robust correlation measure, normalized
mutual information, we were able to quantify the impact of these two
RF factors on TCP performance metrics such as the round trip time,
packet loss rate, instantaneous throughput, etc. We show that the
variable channel rate has the larger impact on TCP behavior when
compared to the Radio Link Protocol. Furthermore, we expose and rank
the factors that influence the assigned channel rate itself and in
particular, demonstrate the sensitivity of the wireless scheduler to
the data sending rate. Thus, TCP is adapting its rate to match the
available network capacity, while the rate allocated by the wireless
scheduler is influenced by the sender's behavior. Such a system is
best described as a closed loop system with two feedback controllers,
the TCP controller and the wireless scheduler, each one affecting the
other's decisions. In this work, we take the first steps in
characterizing such a system in a realistic environment.
%R 2006-010
%T Learning Embeddings for Indexing, Retrieval, and Classification, with Applications to Object and Shape Recognition in Image Databases (PhD Thesis)
%A Athitsos, Vassilis
%D June 14, 2006
%U http://www.cs.bu.edu/techreports/2006-010-learning-embeddings.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Nearest neighbor retrieval is the task of identifying, given a
database of objects and a query object, the objects in the database
that are the most similar to the query. Retrieving nearest neighbors
is a necessary component of many practical applications, in fields as
diverse as computer vision, pattern recognition, multimedia databases,
bioinformatics, and computer networks. At the same time, finding
nearest neighbors accurately and efficiently can be challenging,
especially when the database contains a large number of objects, and
when the underlying distance measure is computationally expensive.
This thesis proposes new methods for improving the efficiency and
accuracy of nearest neighbor retrieval and classification in spaces
with computationally expensive distance measures. The proposed methods
are domain-independent, and can be applied in arbitrary spaces,
including non-Euclidean and non-metric spaces. In this thesis
particular emphasis is given to computer vision applications related
to object and shape recognition, where expensive non-Euclidean
distance measures are often needed to achieve high accuracy.
The first contribution of this thesis is the BoostMap algorithm for
embedding arbitrary spaces into a vector space with a computationally
efficient distance measure. Using this approach, an approximate set of
nearest neighbors can be retrieved efficiently - often orders of
magnitude faster than retrieval using the exact distance measure in
the original space. The BoostMap algorithm has two key distinguishing
features with respect to existing embedding methods. First, embedding
construction explicitly maximizes the amount of nearest neighbor
information preserved by the embedding. Second, embedding construction
is treated as a machine learning problem, in contrast to existing
methods that are based on geometric considerations.
The second contribution is a method for constructing query-sensitive
distance measures for the purposes of nearest neighbor retrieval and
classification. In high-dimensional spaces, query-sensitive distance
measures allow for automatic selection of the dimensions that are the
most informative for each specific query object. It is shown
theoretically and experimentally that query-sensitivity increases the
modeling power of embeddings, allowing embeddings to capture a larger
amount of the nearest neighbor structure of the original space.
The third contribution is a method for speeding up nearest neighbor
classification by combining multiple embedding-based nearest neighbor
classifiers in a cascade. In a cascade, computationally efficient
classifiers are used to quickly classify easy cases, and classifiers
that are more computationally expensive and also more accurate are
only applied to objects that are harder to classify. An interesting
property of the proposed cascade method is that, under certain
conditions, classification time actually decreases as the size of the
database increases, a behavior that is in stark contrast to the
behavior of typical nearest neighbor classification systems.
The proposed methods are evaluated experimentally in several different
applications: hand shape recognition, off-line character recognition,
online character recognition, and efficient retrieval of time series.
In all datasets, the proposed methods lead to significant improvements
in accuracy and efficiency compared to existing state-of-the-art
methods. In some datasets, the general-purpose methods introduced in
this thesis even outperform domain-specific methods that have been
custom-designed for such datasets.
%R 2006-011
%T Authenticated Index Structures for Aggregation Queries in Outsourced Databases
%A Li, Feifei
%A Hadjieleftheriou, Marios
%A Kollios, George
%A Reyzin, Leonid
%D July 10, 2006
%U http://www.cs.bu.edu/techreports/2006-011-outsourced-databases-authentication.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In an outsourced database system the data owner publishes
information through a number of remote, untrusted servers
with the goal of enabling clients to access and query the
data more efficiently. As clients cannot trust servers, query
authentication is an essential component in any outsourced
database system. Clients should be given the capability to
verify that the answers provided by the servers are correct
with respect to the actual data published by the owner.
While existing work provides authentication techniques for
selection and projection queries, there is a lack of techniques
for authenticating aggregation queries. This article introduces
the first known authenticated index structures for aggregation
queries. First, we design an index that features
good performance characteristics for static environments,
where few or no updates occur to the data. Then, we extend
these ideas and propose more involved structures for the dynamic
case, where the database owner is allowed to update
the data arbitrarily. Our structures feature excellent average
case performance for authenticating queries with multiple
aggregate attributes and multiple selection predicates.
We also implement working prototypes of the proposed techniques
and experimentally validate the correctness of our ideas.
%R 2006-012
%T Extending snBench to Support Hierarchical and Configurable Scheduling
%A Parmer, Gabriel
%A Zervas, Georgios
%A Bagchi, Angshuman
%D July 14, 2006
%U http://www.cs.bu.edu/techreports/2006-012-snBench-hierarchical-scheduling.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
It is useful in systems that must support multiple applications with
various temporal requirements to allow application-specific policies to
manage resources accordingly. However, there is a tension between this
goal and the desire to control and police possibly malicious programs.
The Java-based Sensor Execution Environment (SXE) in snBench presents a
situation where such considerations add value to the system. Multiple
applications can be run by multiple users with varied temporal
requirements, some Real-Time and others best effort. This paper
outlines and documents an implementation of a hierarchical
and configurable scheduling system with which different applications can
be executed using application-specific scheduling policies. Concurrently,
the system administrator can define fairness policies between
applications that are imposed upon the system. Additionally, to ensure
forward progress of system execution in the face of malicious or
malformed user programs, an infrastructure for execution using multiple
threads is described.
%R 2006-013
%T Extending snBench to Provide Concurrency Support in the Sensorium Execution Environment (SXE)
%A Londono, Jorge
%A Manjanatha, Sowmya
%A Han, Zhinan
%D July 14, 2006
%U http://www.cs.bu.edu/techreports/2006-013-snBench-sxe-concurrency.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The SNBENCH is a general-purpose programming environment and run-time
system targeted towards a variety of Sensor applications such as
environmental sensing, location sensing, video sensing, etc. In its
current structure, the run-time engine of the SNBENCH, namely the
Sensorium Execution Environment (SXE), processes the entities of
execution in a single thread of operation. In order to effectively
support applications that are time-sensitive and need priority, it is
imperative to process the tasks discretely so that specific policies can
be applied at a much more granular level. The goal of this project was to
modify the SXE to enable efficient use of system resources by way
of multi-tasking the individual components. Additionally, the
transformed SXE offers the ability to classify and employ different
schemes of processing to the individual tasks.
%R 2006-014
%T Extending snBench to Support a Graphical Programming Interface for a Sensor Network Tasking Language (STEP)
%A Chang, Ching
%A Sweha, Raymond
%A Papapetrou, Panagiotis
%D July 14, 2006
%U http://www.cs.bu.edu/techreports/2006-014-snBench-programming-gui.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We report on our development and implementation of a graphical
"programming" interface for a sensor network tasking language called
STEP. The graphical interface allows the user to specify a program
execution graphically from an extensible palette of functionalities and
save the results as a properly formatted STEP file. Moreover, the
software is able to load a file in STEP format and convert it into the
corresponding graphical representation. During both phases a
type-checker is running in the background to ensure that both the
graphical representation and the STEP file are syntactically correct.
This project has been motivated by the Sensorium project at Boston
University. In this technical report we present the basic features of
the software and the process that has been followed during the design and
implementation. Finally, we describe the approach used to test and
validate our software.
%R 2006-015
%T Extending snBench to Support a Video-Based Intrusion Detection and Alerting System with a Centralized Hash Table
%A Burke, Dustin
%A Cecere, Dave
%A Freiberg, Ben
%D July 14, 2006
%U http://www.cs.bu.edu/techreports/2006-015-snbench-centralized-hash.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this project we design and implement a centralized hashing table
in the snBench sensor network environment. We discuss the feasibility
of this approach and compare and contrast with the distributed hashing
architecture, with particular discussion regarding the conditions under
which a centralized architecture makes sense.
There are numerous computational tasks that require persistence of data
in a sensor network environment. To help motivate the need for data
storage in snBench we demonstrate a practical application of the
technology whereby a video camera can monitor a room to detect the
presence of a person and send an alert to the appropriate authorities.
%R 2006-016
%T Integrating Sensor-Network Research and Development into a Software Engineering Curriculum
%A Ocean, Michael
%A Kfoury, Assaf
%A Bestavros, Azer
%D July 14, 2006
%U http://www.cs.bu.edu/techreports/2006-016-snbench-cs511-curriculum.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The emergence of a sensor-networked world produces a clear and urgent
need for well-planned, safe and secure software engineering. It is the
role of universities to prepare graduates with the knowledge and
experience to enter the work-force with a clear understanding of
software design and its application to the future safety of computing.
The snBench (Sensor Network WorkBench) project aims to provide support
to the programming and deployment of Sensor Network Applications,
enabling shared sensor embedded spaces to be easily tasked with
various sensory applications by different users for simultaneous
execution. In this report we discuss our experience using the snBench
research project as the foundation for a semester-long project in a
graduate level software engineering class at Boston University (CS511).
%R 2006-017
%T The Cache Inference Problem and its Application to Content and Request Routing
%A Laoutaris, Nikolaos
%A Zervas, Georgios
%A Bestavros, Azer
%A Kollios, George
%D July 14, 2006
%U http://www.cs.bu.edu/techreports/2006-017-cache-inference-and-applications.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In many networked applications, independent caching agents cooperate
by servicing each other's miss streams, without revealing the
operational details of the caching mechanisms they employ. Inference
of such details could be instrumental for many other processes. For
example, it could be used for optimized forwarding (or routing) of
one's own miss stream (or content) to available proxy caches, or for
making cache-aware resource management decisions. In this paper, we
introduce the ``Cache Inference Problem'' (CIP) as that of
inferring the characteristics of a caching agent, given the miss
stream of that agent. While CIP is insolvable in its most general
form, there are special cases of practical importance in which it is,
including when the request stream follows an Independent Reference
Model (IRM) with generalized power-law (GPL) demand distribution. To
that end, we design two basic ``litmus'' tests that are able to detect
LFU and LRU replacement policies, the effective size of the cache and
of the object universe, and the skewness of the GPL demand for
objects. Using extensive experiments under synthetic as well as real
traces, we show that our methods infer such characteristics accurately
and quite efficiently, and that they remain robust even when the
IRM/GPL assumptions do not hold, and even when the underlying
replacement policies are not ``pure'' LFU or LRU. We exemplify the
value of our inference framework by considering example applications.
%R 2006-018
%T Distributed Placement of Service Facilities in Large-Scale Networks
%A Laoutaris, Nikolaos
%A Smaragdakis, Georgios
%A Oikonomou, Konstantinos
%A Stavrakakis, Ioannis
%A Bestavros, Azer
%D July 14, 2006
%U http://www.cs.bu.edu/techreports/2006-018-distributed-facility-location.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The effectiveness of service provisioning in large-scale networks is
highly dependent on the number and location of service facilities
deployed at various hosts. The classical, centralized approach to
determining the latter would amount to formulating and solving the
``uncapacitated k-median'' (UKM) problem (if the requested number of
facilities is fixed), or the ``uncapacitated facility location'' (UFL)
problem (if the number of facilities is also to be optimized).
Clearly, such centralized approaches require knowledge of global
topological and demand information, and thus do not scale and are not
practical for large networks. The key question posed and answered in
this paper is the following: ``How can we determine in a distributed
and scalable manner the number and location of service facilities?''
We propose an innovative approach in which topology and demand
information is limited to neighborhoods, or ``balls'' of small radius
around selected facilities, whereas demand information is captured
implicitly for the remaining (remote) clients outside these
neighborhoods, by mapping them to clients on the edge of the
neighborhood; the ball radius regulates the trade-off between
scalability and performance. We develop a scalable, distributed
approach that answers our key question through an iterative
re-optimization of the location and the number of facilities within
such balls. We show that even for small values of the radius (1 or 2),
our distributed approach achieves performance under various synthetic
and real Internet topologies that is comparable to that of optimal,
centralized approaches requiring full topology and demand information.
%R 2006-019
%T Implications of Selfish Neighbor Selection in Overlay Networks
%A Laoutaris, Nikolaos
%A Smaragdakis, Georgios
%A Bestavros, Azer
%A Byers, John
%D July 14, 2006
%U http://www.cs.bu.edu/techreports/2006-019-selfish-neighbor-selection.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In a typical overlay network for routing or content sharing, each node
must select a fixed number of immediate overlay neighbors for routing
traffic or content queries. A selfish node entering such a network
would select neighbors so as to minimize the weighted sum of expected
access costs to all its destinations. Previous work on selfish
neighbor selection has built intuition with simple models where edges
are undirected, access costs are modeled by hop-counts, and nodes have
potentially unbounded degrees. However, in practice, important
constraints not captured by these models lead to richer games with
substantively and fundamentally different outcomes. Our work models
neighbor selection as a game involving directed links, constraints on
the number of allowed neighbors, and costs reflecting both network
latency and node preference. We express a node's ``best response''
wiring strategy as a $k$-median problem on asymmetric distance, and
use this formulation to obtain pure Nash equilibria. We experimentally
examine the properties of such stable wirings on synthetic topologies,
as well as on real topologies and maps constructed from PlanetLab and
the AS-level Internet measurements. Our results indicate that selfish
nodes can reap substantial performance benefits when connecting to
overlay networks composed of non-selfish nodes. On the other hand, in
overlays that are dominated by selfish nodes, the resulting stable
wirings are optimized to such a great extent that even non-selfish
newcomers can extract near-optimal performance through naive wiring
strategies.
%R 2006-020
%T Scalable Overlay Multicast Tree Construction for QoS-Constrained Media Streaming
%A Parmer, Gabriel
%A West, Richard
%A Fry, Gerald
%D July 14, 2006
%U http://www.cs.bu.edu/techreports/2006-020-multicast-for-qos-media-streaming.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Overlay networks have become popular in recent times for content
distribution and end-system multicasting of media streams. In the latter
case, the motivation is based on the lack of widespread deployment of
IP multicast and the ability to perform end-host processing. However,
constructing routes between various end-hosts, so that data can be
streamed from content publishers to many thousands of subscribers,
each having their own QoS constraints, is still a challenging
problem. First, any routes between end-hosts using trees built on top
of overlay networks can increase stress on the underlying physical
network, due to multiple instances of the same data traversing a given
physical link. Second, because overlay routes between end-hosts may
traverse physical network links more than once, they increase the
end-to-end latency compared to IP-level routing. Third, algorithms for
constructing efficient, large-scale trees that reduce link stress and
latency are typically more complex.
This paper therefore compares various methods to construct multicast
trees between end-systems, that vary in terms of implementation costs
and their ability to support per-subscriber QoS constraints. We
describe several algorithms that make trade-offs between algorithmic
complexity, physical link stress and latency. While no algorithm is
best in all three cases, we show how it is possible to efficiently
build trees for several thousand subscribers with latencies within a
factor of two of the optimal, and link stresses comparable to, or
better than, existing technologies.
%R 2006-022
%T An Independent-Connection Model for Traffic Matrices
%A Erramilli, Vijay
%A Crovella, Mark
%A Taft, Nina
%D September 6, 2006
%U http://www.cs.bu.edu/techreports/2006-022-icmodel.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The `gravity' model has been used both for traffic matrix (TM)
estimation and for generating synthetic TMs. It is based on the
assumption that a packet's network egress is independent of its
ingress. We argue that in real IP networks, this assumption should
not and does not hold. The fact that most traffic consists of two-way
exchanges of packets means that traffic streams flowing in opposite
directions at any point in the network are not independent.
In this paper we propose a model for traffic matrices based on
independence of connections rather than packets. We argue that the
independent-connection (IC) model is simpler, more intuitive, and has
a more direct connection to underlying network phenomena than the
gravity model. Using publicly available TMs, we show that the IC
model fits real data better than the gravity model. We then
characterize the parameters involved in the IC model based on our
datasets; these results can be used to construct synthetic TMs.
Finally, we turn to the well-studied problem of choosing a prior for
TM estimation. Assuming that certain parameters of the model can be
measured in advance and remain constant in time, we show that the IC
model yields a better prior for TM estimation than the gravity model.
%R 2006-023
%T Notes on the Effect of Different Access Patterns on the Intensity of Mistreatment in Distributed Caching Groups
%A Smaragdakis, Georgios
%D September 18, 2006
%U http://www.cs.bu.edu/techreports/2006-023-effect-different-patterns.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this report, we extend our study of the intensity of mistreatment in
distributed caching groups due to state interaction. In our earlier
work (published as BUCS-TR-2006-003), we analytically showed how this type
of mistreatment may appear under homogeneous demand distributions. We
provided a simple setting where mistreatment due to state interaction
may occur. According to this setting, one or more ``overactive'' nodes
generate disproportionately more requests than the other nodes. In this
report, we extend our experimental evaluation of the intensity of
mistreatment to which non-overactive nodes are subjected, when the
demand distributions are not homogeneous.
%R 2006-024
%T Spatiotemporal Gesture Segmentation (PhD Thesis)
%A Alon, Jonathan
%D September 18, 2006
%U http://www.cs.bu.edu/techreports/2006-024-alon-thesis.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Spotting patterns of interest in an input signal is a very useful
task in many different fields including medicine, bioinformatics,
economics, speech recognition and computer vision. Example instances
of this problem include spotting an object of interest in an image
(e.g., a tumor), a pattern of interest in a time-varying signal
(e.g., audio analysis), or an object of interest moving in a
specific way (e.g., a human's body gesture). Traditional spotting
methods, which are based on Dynamic Time Warping or hidden Markov
models, use some variant of dynamic programming to register the
pattern and the input while accounting for temporal variation
between them. At the same time, those methods often suffer from
several shortcomings: they may give meaningless solutions when input
observations are unreliable or ambiguous, they require a high
complexity search across the whole input signal, and they may give
incorrect solutions if some patterns appear as smaller parts within
other patterns. In this thesis, we develop a framework that
addresses these three problems, and evaluate the framework's
performance in spotting and recognizing hand gestures in video.
The first contribution is a spatiotemporal matching algorithm that
extends the dynamic programming formulation to accommodate
multiple candidate hand detections in every video frame. The
algorithm finds the best alignment between the gesture model and
the input, and simultaneously locates the best candidate hand
detection in every frame. This allows for a gesture to be
recognized even when the hand location is highly ambiguous.
The second contribution is a pruning method that uses
model-specific classifiers to reject dynamic programming
hypotheses with a poor match between the input and model. Pruning
improves the efficiency of the spatiotemporal matching algorithm,
and in some cases may improve the recognition accuracy. The
pruning classifiers are learned from training data, and
cross-validation is used to reduce the chance of overpruning.
The third contribution is a subgesture reasoning process that
models the fact that some gesture models can falsely match parts
of other, longer gestures. By integrating subgesture reasoning the
spotting algorithm can avoid the premature detection of a
subgesture when the longer gesture is actually being performed.
Subgesture relations between pairs of gestures are automatically
learned from training data.
The performance of the approach is evaluated on two challenging
video datasets: hand-signed digits gestured by users wearing
short-sleeved shirts, in front of a cluttered background, and American
Sign Language (ASL) utterances gestured by ASL native signers. The
experiments demonstrate that the proposed method is more accurate
and efficient than competing approaches. The proposed approach can
be generally applied to alignment or search problems with multiple
input observations that use dynamic programming to find a
solution.
%R 2006-025
%T JTP: An Energy-conscious Transport Protocol for Wireless Ad Hoc Networks
%A Riga, Niky
%A Matta, Ibrahim
%A Medina, Alberto
%A Redi, Jason
%A Partridge, Craig
%D September 18, 2006
%U http://www.cs.bu.edu/techreports/2006-025-jtp.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Within a recently developed low-power ad hoc network system, we
present a transport protocol (JTP) whose goal is to reduce power
consumption without trading off delivery requirements of
applications. JTP has the following features: it is lightweight
whereby end-nodes control in-network actions by encoding delivery
requirements in packet headers; JTP enables applications to specify a
range of reliability requirements, thus allocating the right energy
budget to packets; JTP minimizes feedback control traffic from the
destination by varying its frequency based on delivery requirements
and stability of the network; JTP minimizes energy consumption by
implementing in-network caching and increasing the chances that data
retransmission requests from destinations "hit" these caches, thus
avoiding costly source retransmissions; and JTP fairly allocates
bandwidth among flows by backing off the sending rate of a source to
account for in-network retransmissions on its behalf. Analysis and
extensive simulations demonstrate the energy gains of JTP over
one-size-fits-all transport protocols.
%R 2006-026
%T Object Detection at the Optimal Scale with Hidden State Shape Models
%A Wang, Jingbin
%A Athitsos, Vassilis
%A Sclaroff, Stan
%A Betke, Margrit
%D October 2, 2006
%U http://www.cs.bu.edu/techreports/2006-026-scale-hssm.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Hidden State Shape Models (HSSMs), a variant of Hidden Markov Models
(HMMs), were proposed to detect shape classes of variable structure in
cluttered images. In this paper, we formulate a probabilistic
framework for HSSMs that solves two scale-related problems in
comparison to the original method. First, while HSSMs required the
scale of the object to be passed as an input, the method proposed here
estimates the scale of the object automatically. This is achieved by
introducing a new term for the observation probability that is based
on an object-clutter feature model. Second, a segmental HMM is applied
to model the duration probability of each HMM state, which is learned
from the shape statistics in a training set and helps obtain
meaningful registration results. Using a segmental HMM provides a
principled way to model dependencies between the scales of different
parts of the object. In object localization experiments on a dataset
of real hand images, the proposed method significantly outperforms the
original HSSMs, reducing the incorrect localization rate from 40% to
15%. The improvement in accuracy becomes more significant if we
consider that the method proposed here is scale-independent, whereas
the previous method takes as input the scale of the object we want to
localize.
%R 2006-027
%T Discovering Frequent Poly-Regions of DNA Sequences
%A Papapetrou, Panagiotis
%A Benson, Gary
%A Kollios, George
%D October 15, 2006
%U http://www.cs.bu.edu/techreports/2006-027-dna-frequent-poly-regions.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The problem of discovering frequent arrangements of regions of high
occurrence of one or more items of a given alphabet in a sequence is
studied, and two efficient approaches are proposed to solve it. The
first approach is entropy-based and uses an existing recursive
segmentation technique to split the input sequence into a set of
homogeneous segments. The key idea of the second approach is to use a
set of sliding windows over the sequence. Each sliding window keeps a
set of statistics of a sequence segment that mainly includes the
number of occurrences of each item in that segment. Combining these
statistics efficiently yields the complete set of regions of high
occurrence of the items of the given alphabet. After identifying these
regions, the sequence is converted to a sequence of labeled intervals
(each one corresponding to a region). An efficient algorithm for
mining frequent arrangements of temporal intervals on a single
sequence is applied on the converted sequence to discover frequently
occurring arrangements of these regions. The proposed algorithms are
tested on various DNA sequences producing results with significant
biological meaning.
%R 2006-028
%T Real-Time Spatio-Temporal Query Processing in Mobile Ad-Hoc Sensor Networks
%A Morcos, Hany
%A Bestavros, Azer
%A Matta, Ibrahim
%D October 15, 2006
%U http://www.cs.bu.edu/techreports/2006-028-ad-hoc-spatiotemporal-queries.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Personal communication devices are increasingly equipped with sensors
that are able to collect and locally store information from their
environs. The mobility of users carrying such devices, and hence the
mobility of sensor readings in space and time, opens new horizons for
interesting applications. In particular, we envision a system in which
the collective sensing, storage and communication resources, and
mobility of these devices could be leveraged to query the state of
(possibly remote) neighborhoods. Such queries would have
spatio-temporal constraints which must be met for the query answers to
be useful. Using a simplified mobility model, we analytically quantify
the benefits from cooperation (in terms of the system's ability to
satisfy spatio-temporal constraints), which we show to go beyond
simple space-time tradeoffs. In managing the limited storage resources
of such cooperative systems, the goal should be to minimize the number
of unsatisfiable spatio-temporal constraints. We show that Data
Centric Storage (DCS), or ``directed placement'', is a viable approach
for achieving this goal, but only when the underlying network is well
connected. Alternatively, we propose ``amorphous placement'', in
which sensory samples are cached locally, and shuffling of cached
samples is used to diffuse the sensory data throughout the whole
network. We evaluate conditions under which directed versus amorphous
placement strategies would be more efficient. These results lead us to
propose a hybrid placement strategy, in which the spatio-temporal
constraints associated with a sensory data type determine the most
appropriate placement strategy for that data type. We perform an
extensive simulation study to evaluate the performance of directed,
amorphous, and hybrid placement protocols when applied to queries that
are subject to timing constraints. Our results show that directed
placement is better for queries with moderately tight deadlines,
whereas amorphous placement is better for queries with looser
deadlines, and that under most operational conditions, the hybrid
technique gives the best compromise.
%R 2006-029
%T Safe Compositional Specification of Network Systems With Polymorphic, Constrained Types
%A Liu, Likai
%A Kfoury, Assaf
%D October 25, 2006
%U http://www.cs.bu.edu/techreports/2006-029-poly-constrained-traffic.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In the framework of the iBench research project, our previous work
created a domain-specific language, TRAFFIC [BUCS-TR-2005-015], that
facilitates the specification, programming, and maintenance of
distributed applications over a network. It allows safety properties
to be formalized in terms of types and subtyping relations. Extending
our previous work, we add Hindley-Milner style polymorphism
[Milner:JCSS-1978-v17] with constraints [Odersky:TPOS-1999-v5] to the
type system of TRAFFIC. This allows a programmer to use the for-all
quantifier to describe the types of network components, raising the
power and expressiveness of types to a level that was not possible
before with propositional subtyping relations. Furthermore, we design
our type system with a pluggable constraint system, so it can adapt to
different application needs while maintaining soundness.
In this paper, we show the soundness of the type system, which is not
syntax-directed but makes typing derivations easier. We show that
there is an equivalent syntax-directed type system, which is what a
type checker program would implement to verify the safety of a network
flow. This is followed by a discussion of several constraint systems:
polymorphism with subtyping constraints, Linear Programming, and
Constraint Handling Rules (CHR) [Fruhwirth:JLP-1998-v37]. Finally, we
provide some examples to illustrate the workings of these constraint
systems.
%R 2006-030
%T TCP over CDMA2000 Networks: A Cross-Layer Measurement Study
%A Mattar, Karim
%A Sridharan, Ashwin
%A Zang, Hui
%A Matta, Ibrahim
%A Bestavros, Azer
%D October 25, 2006
%U http://www.cs.bu.edu/techreports/2006-030-tcp-over-cdma2000-measurement.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Modern cellular channels in 3G networks incorporate sophisticated
power control and dynamic rate adaptation which can have significant
impact on adaptive transport layer protocols, such as TCP. Though
there exist studies that have evaluated the performance of TCP over
such networks, they are based solely on observations at the transport
layer and hence have no visibility into the impact of lower layer
dynamics, which are a key characteristic of these networks. In this
work, we present a detailed characterization of TCP behavior based on
cross-layer measurement of transport layer, as well as RF and MAC
layer parameters. In particular, through a series of active TCP/UDP
experiments and measurement of the relevant variables at all three
layers, we characterize both the wireless scheduler and the radio
link protocol in a commercial CDMA2000 network and assess their impact
on TCP dynamics. Somewhat surprisingly, our findings indicate that the
wireless scheduler is mostly insensitive to channel quality and sector
load over short timescales and is mainly affected by the transport
layer data rate. Furthermore, with the help of a robust correlation
measure, Normalized Mutual Information, we were able to quantify the
impact of the wireless scheduler and the radio link protocol on
various TCP parameters such as the round trip time, throughput and
packet loss rate.
%R 2006-031
%T Towards Formalizing Java's Weak References
%A Gabay, Yarom
%A Kfoury, Assaf
%D December 15, 2006
%U http://www.cs.bu.edu/techreports/2006-031-java-weak-references.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Weak references provide the programmer with limited control over the
process of memory management. By using them, a programmer can make
decisions based on previous actions that are taken by the garbage
collector. Although this is often helpful, the outcome of a program
using weak references is less predictable due to the nondeterminism
they introduce in program evaluation. It is therefore desirable to
have a framework of formal tools to reason about weak references and
programs that use them.
We present several calculi that formalize various aspects of weak
references, inspired by their implementation in Java. We provide a
calculus to model multiple levels of non-strong references, where a
different garbage collection policy is applied to each level. We
consider different collection policies such as eager collection and
lazy collection. Similar to the way they are implemented in Java, we
give the semantics of eager collection to weak references and
the semantics of lazy collection to soft references. Moreover,
we condition garbage collection on the availability of time and space
resources. While time constraints are used in order to restrict
garbage collection, space constraints are used in order to trigger it.
Finalizers are a problematic feature in Java, especially when they
interact with weak references. We provide a calculus to model
finalizer evaluation. Since finalizers have little meaning in a
language without side effects, we introduce a limited form of side
effect into the calculus. We discuss determinism and the separate
notion of uniqueness of (evaluation) outcome. We show that in our
calculus, finalizer evaluation does not affect uniqueness of outcome.
%R 2006-032
%T netEmbed: A Network Resource Mapping Service for Distributed Applications
%A Londono, Jorge
%A Bestavros, Azer
%D December 15, 2006
%U http://www.cs.bu.edu/techreports/2006-032-netembed.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Emerging configurable infrastructures such as large-scale overlays and
grids, distributed testbeds, and sensor networks comprise diverse sets
of available computing resources (e.g., CPU and OS capabilities and
memory constraints) and network conditions (e.g., link delay,
bandwidth, loss rate, and jitter) whose characteristics are both
complex and time-varying. At the same time, distributed applications
to be deployed on these infrastructures exhibit increasingly complex
constraints and requirements on resources they wish to utilize.
Examples include selecting nodes and links to schedule an overlay
multicast file transfer across the Grid, or embedding a network
experiment with specific resource constraints in a distributed testbed
such as PlanetLab. Thus, a common problem facing the efficient
deployment of distributed applications on these infrastructures is
that of ``mapping'' application-level requirements onto the network in
such a manner that the requirements of the application are realized,
assuming that the underlying characteristics of the network are known.
We refer to this problem as the network embedding problem. In this
paper, we propose a new approach to tackle this combinatorially-hard
problem. Thanks to a number of heuristics, our approach greatly
improves performance and scalability over previously existing
techniques. It does so by pruning large portions of the search space
without overlooking any valid embedding. We present a construction
that allows a compact representation of candidate embeddings, which is
maintained by carefully controlling the order in which candidate
mappings are inserted and invalid mappings are removed. We present an
implementation of our proposed technique, which we call netEmbed -- a
service that identifies feasible mappings of a virtual network
configuration (the query network) to an existing real infrastructure
or testbed (the hosting network). We present results of extensive
performance evaluation experiments of netEmbed using several
combinations of real and synthetic network topologies. Our results
show that our netEmbed service is quite effective in identifying one
(or all) possible embeddings for quite sizable queries and hosting
networks -- much larger than what any of the existing techniques or
services are able to handle.
%R 2006-033
%T Traffic Characteristics and Communication Patterns in Blogosphere
%A Duarte, Fernando
%A Mattos, Bernardo
%A Bestavros, Azer
%A Almeida, Virgilio
%A Almeida, Jussara
%D December 15, 2006
%U http://www.cs.bu.edu/techreports/2006-033-blog-characterization.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a thorough characterization of the access patterns in
blogspace -- a fast-growing constituent of the content available
through the Internet -- which comprises a rich interconnected web of
blog postings and comments by an increasingly prominent user community
that collectively define what has become known as the blogosphere. Our
characterization of over 35 million read, write, and administrative
requests spanning a 28-day period is done from three different
blogosphere perspectives. The server view characterizes the aggregate
access patterns of all users to all blogs; the user view characterizes
how individual users interact with blogosphere objects (blogs); the
object view characterizes how individual blogs are accessed. Our
findings support two important conclusions. First, we show that the
nature of interactions between users and objects is fundamentally
different in blogspace than that observed in traditional web content.
Access to objects in blogspace could be conceived as part of an
interaction between an author and its readership. As we show in our
work, such interactions range from one-to-many ``broadcast-type'' and
many-to-one ``registration-type'' communication between an author and
its readers, to multi-way, iterative ``parlor-type'' dialogues among
members of an interest group. This more-interactive nature of the
blogosphere leads to interesting traffic and communication patterns,
which are different from those observed in traditional web
content. Second, we identify and characterize novel features of the
blogosphere workload, and we investigate the similarities and
differences between typical web server workloads and blogosphere
server workloads. Given the increasing share of blogspace traffic,
understanding such differences is important for capacity planning and
traffic engineering purposes, for example.
%R 2006-034
%T Constraint-based Mining of Frequent Arrangements of Temporal Intervals (MA Thesis)
%A Papapetrou, Panagiotis
%D December 30, 2006
%U http://www.cs.bu.edu/techreports/2006-034-MA-Thesis-Panagiotis-Papapetrou.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The problem of discovering frequent arrangements of temporal intervals
is studied. It is assumed that the database consists of sequences of
events, where an event occurs during a time-interval. The goal is to
mine temporal arrangements of event intervals that appear frequently
in the database. The motivation of this work is the observation that
in practice most events are not instantaneous but occur over a period
of time and different events may occur concurrently. Thus, there are
many practical applications that require mining such temporal
correlations between intervals including the linguistic analysis of
annotated data from American Sign Language as well as network and
biological data. Two efficient methods to find frequent arrangements
of temporal intervals are described; the first one is tree-based and
uses depth-first search to mine the set of frequent arrangements,
whereas the second one is prefix-based. The above methods apply
efficient pruning techniques that include a set of constraints
consisting of regular expressions and gap constraints that add
user-controlled focus into the mining process. Moreover, based on the
extracted patterns a standard method for mining association rules is
employed that applies different interestingness measures to evaluate
the significance of the discovered patterns and rules. The performance
of the proposed algorithms is evaluated and compared with other
approaches on real (American Sign Language annotations and network
data) and large synthetic datasets.
%R 2007-001
%T Generating Representative ISP Topologies From First-Principles
%A Wang, Chong
%A Byers, John
%D March 15, 2007
%U http://www.cs.bu.edu/techreports/2007-001-ISP-topology-from-first-principles.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Understanding and modeling the factors that underlie the growth and
evolution of network topologies are basic questions that impinge upon
capacity planning, forecasting, and protocol research. Early topology
generation work focused on generating network-wide connectivity maps,
either at the AS-level or the router-level, typically with an eye
towards reproducing abstract properties of observed topologies. But
recently, advocates of an alternative ``first-principles'' approach
question the feasibility of realizing representative topologies with
simple generative models that do not explicitly incorporate real-world
constraints, such as the relative costs of router configurations, into
the model. Our work synthesizes these two lines by designing a
topology generation mechanism that incorporates first-principles
constraints. Our goal is more modest than that of constructing an
Internet-wide topology: we aim to generate representative topologies
for single ISPs. However, our methods also go well beyond previous
work, as we annotate these topologies with representative capacity and
latency information. Taking only demand for network services over a
given region as input, we propose a natural cost model for building
and interconnecting PoPs and formulate the resulting optimization
problem faced by an ISP. We devise hill-climbing heuristics for this
problem and demonstrate that the solutions we obtain are
quantitatively similar to those in measured router-level ISP
topologies, with respect to both topological properties and
fault-tolerance.
%R 2007-002
%T A Geometric Approach to Slot Alignment in Wireless Sensor Networks
%A Riga, Niky
%A Matta, Ibrahim
%A Bestavros, Azer
%D March 26, 2007
%U http://www.cs.bu.edu/techreports/2007-002-geometric-slot-align.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Traditionally, slotted communication protocols have employed guard
times to delineate and align slots. These guard times may expand the
slot duration significantly, especially when clocks are allowed to
drift for a longer time to reduce clock synchronization overhead.
Recently, a new class of lightweight protocols for statistical
estimation in wireless sensor networks has been proposed. This new
class requires very short transmission durations (jam signals), thus
the traditional approach of using guard times would impose significant
overhead. We propose a new, more efficient algorithm to align
slots. Based on geometrical properties of space, we prove that our
approach bounds the slot duration by only a constant factor of what is
needed. Furthermore, we show by simulation that this bound is loose
and an even smaller slot duration is required, making our approach
even more efficient.
%R 2007-003
%T Parameter Sensitive Detectors
%A Yuan, Quan
%A Thangali, Ashwin
%A Ablavsky, Vitaly
%A Sclaroff, Stan
%D April 30, 2007
%U http://www.cs.bu.edu/techreports/2007-003-parameter-sensitive-detectors.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Object detection can be challenging when the object class exhibits
large variations. One commonly-used strategy is to first partition
the space of possible object variations and then train separate
classifiers for each portion. However, with continuous spaces the
partitions tend to be arbitrary since there are no natural boundaries
(for example, consider the continuous range of human body poses). In
this paper, a new formulation is proposed, where the detectors
themselves are associated with continuous parameters, and reside in a
parameterized function space. There are two advantages of this
strategy. First, a-priori partitioning of the parameter space is not
needed; the detectors themselves are in a parameterized space.
Second, the underlying parameters for object variations can be
learned from training data in an unsupervised manner. For profile
face detection, our detection rate outperforms the Viola-Jones method
by 5%, for 90 false alarms. On a hand shape data set, our method
improves the detection rate from 98% to 99.5% at a false positive rate
of 0.1%, compared with partition-based methods. On a pedestrian data
set, our method reduces the miss detection rate by a factor of three
at a false positive rate of 1%, compared with the Dalal-Triggs method.
%R 2007-004
%T Multi-scale 3D Scene Flow from Binocular Stereo Sequences
%A Li, Rui
%A Sclaroff, Stan
%D May 10, 2007
%U http://www.cs.bu.edu/techreports/2007-004-3d-scene-flow.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Scene flow methods estimate the three-dimensional motion field for
points in the world, using multi-camera video data. Such methods
combine multi-view reconstruction with motion estimation. This
paper describes an alternative formulation for dense scene flow
estimation that provides reliable results using only two cameras
by fusing stereo and optical flow estimation into a single
coherent framework. Internally, the proposed algorithm generates
probability distributions for optical flow and disparity. Taking
into account the uncertainty in the intermediate stages allows for
more reliable estimation of the 3D scene flow than previous
methods allow. To handle the aperture problems inherent in the
estimation of optical flow and disparity, a multi-scale method
along with a novel region-based technique is used within a
regularized solution. This combined approach both preserves
discontinuities and prevents over-regularization -- two problems
commonly associated with the basic multi-scale approaches.
Experiments with synthetic and real test data demonstrate the
strength of the proposed approach.
%R 2007-005
%T Diversity of Forwarding Paths in Pocket Switched Networks
%A Erramilli, Vijay
%A Chaintreau, Augustin
%A Crovella, Mark
%A Diot, Christophe
%D May 13, 2007
%U http://www.cs.bu.edu/techreports/2007-005-diversity-paths-psn.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Forwarding in DTNs is a challenging problem. We focus on the specific
issue of forwarding in an environment where mobile devices are carried
by people in a restricted physical space (a conference) and contact
patterns are not predictable. We show for the first time a path
explosion phenomenon between most pairs of nodes. This means that,
once the first path reaches the destination, the number of subsequent
paths grows rapidly with time, so there usually exist many
near-optimal paths. We study the path explosion phenomenon both
analytically and empirically. Our results highlight the importance of
unequal contact rates across nodes for understanding the performance
of forwarding algorithms. We also find that a variety of well-known
forwarding algorithms show surprisingly similar performance in our
setting and we interpret this fact in light of the path explosion
phenomenon.
%R 2007-006
%T Improving the Performance of Overlay Routing and P2P File Sharing using Selfish Neighbor Selection
%A Smaragdakis, Georgios
%A Laoutaris, Nikolaos
%A Bestavros, Azer
%A Byers, John
%A Roussopoulos, Mema
%D May 15, 2007
%U http://www.cs.bu.edu/techreports/2007-006-sns-overlay-routing-and-file-sharing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A foundational issue underlying many overlay network applications
ranging from routing to P2P file sharing is that of connectivity
management, i.e., folding new arrivals into the existing mesh and
re-wiring to cope with changing network conditions. Previous work has
considered the problem from two perspectives: devising practical
heuristics for specific applications designed to work well in real
deployments, and providing abstractions for the underlying problem
that are tractable to address via theoretical analyses, especially
game-theoretic analysis. Our work unifies these two thrusts first by
distilling insights gleaned from clean theoretical models, notably
that under natural resource constraints, selfish players can select
neighbors so as to efficiently reach near-equilibria that also provide
high global performance. Using Egoist, a prototype overlay routing
system we implemented on PlanetLab, we demonstrate that our neighbor
selection primitives significantly outperform existing heuristics on a
variety of performance metrics; that Egoist is competitive with an
optimal, but unscalable full-mesh approach; and that it remains highly
effective under significant churn. We also describe variants of
Egoist's current design that would enable it to scale to overlays of
much larger scale and allow it to cater effectively to applications,
such as P2P file sharing in unstructured overlays, based on the use of
primitives such as scoped-flooding rather than routing.
%R 2007-007
%T Small Depth Quantum Circuits
%A Bera, Debajyoti
%A Green, Frederic
%A Homer, Steve
%D May 24, 2007
%U http://www.cs.bu.edu/techreports/2007-007-quantum-circuit-survey.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Quantum circuits are a general and universal formulation of quantum
computation, and small quantum circuits are likely to be the model for
the first implementations of quantum computing. Although the quantum
circuit model is quite different from the classical one, it has
nonetheless proven to be quite fruitful to look to classical circuit
models for insight and comparison. Furthermore, very small (i.e.,
constant) depth classical circuits present us with computational
models for which we can prove interesting lower bounds. In this survey
we explore the computational power and limits of small depth quantum
circuits. We prove that several quantum circuit classes are
unexpectedly powerful and can perform computations strictly stronger
than their classical counterparts. We exhibit lower bounds for the
circuit depth of quantum circuits which compute parity or fanout.
%R 2007-008
%T Amorphous Placement and Informed Diffusion for Timely Field Monitoring by Autonomous, Resource-Constrained Mobile Sensors
%A Morcos, Hany
%A Bestavros, Azer
%A Matta, Ibrahim
%D June 6, 2007
%U http://www.cs.bu.edu/techreports/2007-008-amorphous-placement-and-informed-diffusion.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Personal communication devices are increasingly equipped with sensors
which are able to passively sample their surroundings. We envision a
service that enables a community of users carrying such memory-limited
devices to query the condition of various locations in the field in
which they collectively roam. We show that existing techniques that
rely on directed placement and retrieval (DPR) are viable approaches
to implementing such a service, but only when the underlying network
is well connected. Alternatively, we propose the use of amorphous
placement and retrieval (APR), in which a cache management scheme is
employed to store sensory samples locally, and an informed exchange of
cached samples is used to diffuse the sensory data throughout the
network, in such a way that the answer to any query (targeting an
arbitrary location in the field) is likely to be found close to the
query origin. A salient characteristic in such a setting is the
relationship between the probability of roaming a location in the
field and the probability of querying that location. If roaming and
query probability distributions do not match---which is the case in
many settings---then an important determinant of the performance of
APR is the manner in which cached field samples are collectively
shared and managed. In that regard, we argue that knowledge of the
distribution of query targets could be used effectively by an informed
cache management policy to maximize the utility of collective storage
of all devices. Using a simple analytical model, we show that the use
of informed cache management is particularly important when the
mobility model results in a non-uniform distribution of users over the
field. We present results from extensive simulations which show that
in sparsely-connected networks, APR is more cost-effective than DPR,
that it provides extra resilience to node failure and packet losses,
and that its use of informed cache management yields superior
performance.
%R 2007-009
%T Swarming on optimized graphs for n-way broadcast
%A Smaragdakis, Georgios
%A Laoutaris, Nikolaos
%A Michiardi, Pietro
%A Bestavros, Azer
%A Byers, John
%A Roussopoulos, Mema
%D July 5, 2007
%U http://www.cs.bu.edu/techreports/2007-009-n-way-swarming.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In an n-way broadcast application each one of n overlay nodes wants to
push its own distinct large data file to all other n-1 destinations as
well as download their respective data files. BitTorrent-like swarming
protocols are ideal choices for handling such massive data volume
transfers. The original BitTorrent targets one-to-many broadcasts of a
single file to a very large number of receivers and thus, by
necessity, employs an almost random overlay topology. n-way broadcast
applications on the other hand, owing to their inherent n-squared
nature, are realizable only in small to medium scale networks. In this
paper, we show that we can leverage this scale constraint to construct
optimized overlay topologies that take into consideration the
end-to-end characteristics of the network and as a consequence deliver
far superior performance compared to random and myopic (local)
approaches. We present the Max-Min and Max-Sum peer-selection
policies used by individual nodes to select their neighbors. The first
one strives to maximize the available bandwidth to the slowest
destination, while the second maximizes the aggregate output rate. We
design a swarming protocol suitable for n-way broadcast and operate it
on top of overlay graphs formed by nodes that employ Max-Min or
Max-Sum policies. Using trace-driven simulation and measurements from
a PlanetLab prototype implementation, we demonstrate that the
performance of swarming on top of our constructed topologies is far
superior to the performance of random and myopic overlays. Moreover,
we show how to modify our swarming protocol to allow it to accommodate
selfish nodes.
%R 2007-010
%T Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series
%A Li, Rui
%A Tian, Tai-Peng
%A Sclaroff, Stan
%D August 21, 2007
%U http://www.cs.bu.edu/techreports/2007-010-simultaneous-learning-of-nonlinear-manifold.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The goal of this work is to learn a parsimonious and informative
representation for high-dimensional time series. Conceptually, this
comprises two distinct yet tightly coupled tasks: learning a
low-dimensional manifold and modeling the dynamical process. These two
tasks have a complementary relationship as the temporal constraints
provide valuable neighborhood information for dimensionality reduction
and conversely, the low-dimensional space allows dynamics to be learnt
efficiently. Solving these two tasks simultaneously allows important
information to be exchanged mutually. If nonlinear models are required
to capture the rich complexity of time series, then the learning
problem becomes harder as the nonlinearities in both tasks are
coupled. The proposed solution approximates the nonlinear manifold and
dynamics using piecewise linear models. The interactions among the
linear models are captured in a graphical model. By exploiting the
model structure, efficient inference and learning algorithms are
obtained without oversimplifying the model of the underlying dynamical
process. Evaluation of the proposed framework with competing
approaches is conducted in three sets of experiments: dimensionality
reduction and reconstruction using synthetic time series, video
synthesis using a dynamic texture database, and human motion
synthesis, classification and tracking on a benchmark data set. In all
experiments, the proposed approach provides superior performance.
%R 2007-011
%T Entropy Loss is Maximal for Uniform Inputs
%A Reyzin, Leonid
%D September 20, 2007
%U http://www.cs.bu.edu/techreports/2007-011-entropy-note.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A secure sketch (defined by Dodis et al.) is an algorithm that on an
input w produces an output s such that w can be reconstructed given
its noisy version w' and s. Security is defined in terms of two
parameters m and m': if w comes from a distribution of entropy m,
then a secure sketch guarantees that the distribution of w conditioned
on s has entropy at least m', where l = m-m' is called the entropy
loss. In this note we show that the entropy loss of any secure sketch
(or, more generally, any randomized algorithm) on any distribution is
no more than it is on the uniform distribution.
%R 2007-012
%T Hidden Type Variables and Conditional Extension for More Expressive Generic Programs (PhD Thesis)
%A Hallett, Joseph
%D October 2, 2007
%U http://www.cs.bu.edu/techreports/2007-012-PhD-Thesis-Joseph-Hallett.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Generic object-oriented programming languages combine parametric
polymorphism and nominal subtype polymorphism, thereby providing
better data abstraction, greater code reuse, and fewer run-time
errors. However, most generic object-oriented languages provide a
straightforward combination of the two kinds of polymorphism, which
prevents the expression of advanced type relationships. Furthermore,
most generic object-oriented languages have a type-erasure semantics:
instantiations of type parameters are not available at run time, and
thus may not be used by type-dependent operations. This dissertation
shows that two features, which allow the expression of many advanced
type relationships, can be added to a generic object-oriented
programming language without type erasure: 1. type variables that are
not parameters of the class that declares them, and 2. extension that
is dependent on the satisfiability of one or more constraints. We
refer to the first feature as hidden type variables and the second
feature as conditional extension. Hidden type variables allow:
covariance and contravariance without variance annotations or special
type arguments such as wildcards; a single type to extend, and inherit
methods from, infinitely many instantiations of another type; a
limited capacity to augment the set of superclasses after that class
is defined; and the omission of redundant type arguments. Conditional
extension allows the properties of a collection type to be dependent
on the properties of its element type. This dissertation describes
the semantics and implementation of hidden type variables and
conditional extension. A sound type system is presented. In
addition, a sound and terminating type checking algorithm is
presented. Although designed for the Fortress programming language,
hidden type variables and conditional extension can be incorporated
into other generic object-oriented languages. Many of the same
problems would arise, and solutions analogous to those we present
would apply.
%R 2007-013
%T EGOIST: Overlay Routing using Selfish Neighbor Selection
%A Smaragdakis, Georgios
%A Laoutaris, Nikolaos
%A Bestavros, Azer
%A Byers, John
%A Roussopoulos, Mema
%D October 9, 2007
%U http://www.cs.bu.edu/techreports/2007-013-egoist.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A foundational issue underlying many overlay network applications
ranging from routing to P2P file sharing is that of connectivity
management, i.e., folding new arrivals into an existing overlay, and
re-wiring to cope with changing network conditions. Previous work has
considered the problem from two perspectives: devising practical
heuristics for specific applications designed to work well in real
deployments, and providing abstractions for the underlying problem
that are analytically tractable, especially via game-theoretic
analysis. In this paper, we unify these two thrusts by using insights
gleaned from novel, realistic theoretic models in the design of
EGOIST -- a prototype overlay routing system that we implemented,
deployed, and evaluated on PlanetLab. Using measurements on PlanetLab
and trace-based simulations, we demonstrate that EGOIST's neighbor
selection primitives significantly outperform existing heuristics on a
variety of performance metrics, including delay, available bandwidth,
and node utilization. Moreover, we demonstrate that EGOIST is
competitive with an optimal, but unscalable full-mesh approach,
remains highly effective under significant churn, is robust to
cheating, and incurs minimal overhead. Finally, we discuss some of the
potential benefits EGOIST may offer to applications.
%R 2007-014
%T An Energy-conscious Transport Protocol for Multi-hop Wireless Networks
%A Riga, Niky
%A Matta, Ibrahim
%A Medina, Alberto
%A Redi, Jason
%D October 17, 2007
%U http://www.cs.bu.edu/techreports/2007-014-jtp2.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a transport protocol whose goal is to reduce power
consumption without compromising delivery requirements of
applications. To meet its goal of energy efficiency, our transport
protocol (1) contains mechanisms to balance end-to-end vs. local
retransmissions; (2) minimizes acknowledgment traffic using receiver
regulated rate-based flow control combined with selected
acknowledgements and in-network caching of packets; and (3)
aggressively seeks to avoid any congestion-based packet loss. Within
a recently developed ultra low-power multi-hop wireless network
system, extensive simulations and experimental results demonstrate
that our transport protocol meets its goal of preserving the energy
efficiency of the underlying network. This technical report revises
the technical report BU-TR 2006-025.
%R 2007-015
%T System F with Constraint Types (MA Thesis)
%A Donnelly, Kevin
%D December 1, 2007
%U http://www.cs.bu.edu/techreports/2007-015-MA-Thesis-Kevin-Donnelly.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
System F is a type system that can be seen as both a proof system for
second-order propositional logic and as a polymorphic programming
language. In this work we explore several extensions of System F by
types which express subtyping constraints. These systems include
terms which represent proofs of subtyping relationships between types.
Given a proof that one type is a subtype of another, one may use a
coercion term constructor to coerce terms from the first type to the
second. The ability to manipulate type constraints as first-class
entities gives these systems a lot of expressive power, including the
ability to encode generalized algebraic data types and intensional
type analysis. The main contributions of this work are in the
formulation of constraint types and proofs of type soundness and
strong normalization for an extension of System F with constraint
types.
%R 2007-016
%T TCP over CDMA2000 Networks: A Cross-Layer Measurement Study (MA Thesis)
%A Mattar, Karim
%D December 14, 2007
%U http://www.cs.bu.edu/techreports/2007-016-MA-Thesis-Karim-Mattar.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Modern cellular channels in 3G networks incorporate sophisticated
power control and dynamic rate adaptation which can have a significant
impact on adaptive transport layer protocols, such as TCP. Though
there exist studies that have evaluated the performance of TCP over
such networks, they are based solely on observations at the transport
layer and hence have no visibility into the impact of lower layer
dynamics, which are a key characteristic of these networks. In this
work, we present a detailed characterization of TCP behavior based on
cross-layer measurement of transport, as well as RF and MAC layer
parameters. In particular, through a series of active TCP/UDP
experiments and measurement of the relevant variables at all three
layers, we characterize both the wireless scheduler in a commercial
CDMA2000 network and its impact on TCP dynamics. Somewhat
surprisingly, our findings indicate that the wireless scheduler is
mostly insensitive to channel quality and sector load over short
timescales and is mainly affected by the transport layer data rate.
Furthermore, we empirically demonstrate the impact of the wireless
scheduler on various TCP parameters such as the round trip time,
throughput and packet loss rate.
%R 2007-017
%T Examples of Network Flow Verification Using TRAFFIC(X)
%A Liu, Likai
%A Kfoury, Assaf
%D January 10, 2008
%U http://www.cs.bu.edu/techreports/2007-017-traffic-x.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In our previous work, we developed TRAFFIC(X), a
specification language for modeling bi-directional network flows
featuring a type system with constrained polymorphism. In this paper,
we present two ways to customize the constraint system: (1) when using
linear inequality constraints for the constraint system, TRAFFIC(X)
can describe flows with numeric properties such as MTU (maximum
transmission unit), RTT (round trip time), traversal order, and
bandwidth allocation over parallel paths; (2) when using Boolean
predicate constraints for the constraint system, TRAFFIC(X) can
describe routing policies of an IP network. These examples illustrate
how to use the customized type system.
%R 2008-001
%T On the Stable Paths Problem and a Restricted Variant
%A Donnelly, Kevin
%A Kfoury, Assaf
%D January 10, 2008
%U http://www.cs.bu.edu/techreports/2008-001-stable-paths.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Interdomain routing on the Internet is performed using route
preference policies specified independently and arbitrarily by each
Autonomous System in the network. These policies are used in the
border gateway protocol (BGP) by each AS when selecting next-hop
choices for routes to each destination. Conflicts between policies
used by different ASs can lead to routing instabilities that,
potentially, cannot be resolved no matter how long BGP is run. The
Stable Paths Problem (SPP) is an abstract graph theoretic model of the
problem of selecting next-hop routes for a destination. A stable
solution to the problem is a set of next-hop choices, one for each AS,
that is compatible with the policies of each AS. In a stable solution
each AS has selected its best next-hop given that the next-hop choices
of all neighbors are fixed. BGP can be viewed as a distributed
algorithm for solving SPP. In this report we consider the stable
paths problem, as well as a family of restricted variants of the
stable paths problem, which we call F stable paths problems. We show
that two very simple variants of the stable paths problem are also
NP-complete. In addition we show that for networks with a DAG
topology, there is an efficient centralized algorithm to solve the
stable paths problem, and that BGP always efficiently converges to a
stable solution on such networks.
%R 2008-002
%T Wireless and Physical Security via Embedded Sensor Networks
%A Ocean, Michael
%A Bestavros, Azer
%D January 15, 2008
%U http://www.cs.bu.edu/techreports/2008-002-snbench-wireless-intrusion-detection.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Wireless Intrusion Detection Systems (WIDS) monitor 802.11 wireless
frames (Layer-2) in an attempt to detect misuse. What distinguishes
a WIDS from a traditional Network IDS is the ability
to utilize the broadcast nature of the medium to reconstruct the
physical location of the offending party, as opposed to its possibly
spoofed identity (MAC address) in cyber space. Traditional
Wireless Network Security Systems are still heavily anchored in the
digital plane of "cyber space" and hence cannot be used reliably or
effectively to derive the physical identity of an intruder in order to
prevent further malicious wireless broadcasts, for example by escorting
an intruder off the premises based on physical evidence. In
this paper, we argue that Embedded Sensor Networks could be used
effectively to bridge the gap between digital and physical security
planes, and thus could be leveraged to provide reciprocal benefit to
surveillance and security tasks on both planes. Toward that end, we
present our recent experience integrating wireless networking security
services into the SNBENCH (Sensor Network workBench). The
SNBENCH provides an extensible framework that enables the rapid
development and automated deployment of Sensor Network applications
on a shared, embedded sensing and actuation infrastructure.
The SNBENCH's extensible architecture allows an engineer
to quickly integrate new sensing and response capabilities into the
SNBENCH framework, while high-level languages and compilers
allow novice SN programmers to compose SN service logic, unaware
of the lower-level implementation details of tools on which
their services rely. In this paper we convey the simplicity of the
service composition through concrete examples that illustrate the
power and potential of Wireless Security Services that span both
the physical and digital planes.
%R 2008-003
%T An Information Theoretic Framework for Field Monitoring Using Autonomously Mobile Sensors
%A Morcos, Hany
%A Atia, George
%A Bestavros, Azer
%A Matta, Ibrahim
%D February 10, 2008
%U http://www.cs.bu.edu/techreports/2008-003-information-theoretic-mobile-field-monitoring.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider a mobile sensor network monitoring a spatio-temporal
field. Given limited cache sizes at the sensor nodes, the goal is to
develop a distributed cache management algorithm to efficiently answer
queries with a known probability distribution over the spatial
dimension. First, we propose a novel distributed information
theoretic approach in which the nodes locally update their caches
based on full knowledge of the space-time distribution of the
monitored phenomenon. At each time instant, local decisions are made
at the mobile nodes concerning which samples to keep and whether or
not a new sample should be acquired at the current location. These
decisions aim to minimize an entropic utility function that
captures the average amount of uncertainty in queries given the
probability distribution of query locations. Second, we propose a
different correlation-based technique, which only requires knowledge
of the second-order statistics, thus relaxing the stringent constraint
of having a priori knowledge of the query distribution, while
significantly reducing the computational overhead. It is shown that
the proposed approaches considerably improve the average field
estimation error by maintaining efficient cache content. It is further
shown that the correlation-based technique is robust to model mismatch
in case of imperfect knowledge of the underlying generative
correlation structure.
%R 2008-004
%T Detour-Based Mobility Coordination in DTNs
%A Morcos, Hany
%A Bestavros, Azer
%A Matta, Ibrahim
%D February 10, 2008
%U http://www.cs.bu.edu/techreports/2008-004-DTN-detour-coordination.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Commonly, research work in routing for delay tolerant networks (DTN)
assumes that node encounters are predestined, in the sense that they
are the result of unknown, exogenous processes that control the
mobility of these nodes. In this paper, we argue that for many
applications such an assumption is too restrictive: while the
spatio-temporal coordinates of the start and end points of a node's
journey are determined by exogenous processes, the specific path that
a node may take in space-time, and hence the set of nodes it may
encounter, could be controlled so as to improve the
performance of DTN routing. To that end, we consider a setting in
which each mobile node is governed by a {\em schedule} consisting of a
list of locations that the node must visit at particular
times. Typically, such schedules exhibit some level of slack, which
could be leveraged for DTN message delivery purposes. We define the
Mobility Coordination Problem (MCP) for DTNs as follows: Given a set
of nodes, each with its own schedule, and a set of messages to be
exchanged between these nodes, devise a set of node encounters that
minimize message delivery delays while satisfying all node schedules.
The MCP for DTNs is general enough that it allows us to model and
evaluate some of the existing DTN schemes, including data mules and
message ferries. In this paper, we show that MCP for DTNs is NP-hard
and propose two detour-based approaches to solve the problem. The
first (DMD) is a centralized heuristic that leverages knowledge of the
message workload to suggest specific detours to optimize message
delivery. The second (DNE) is a distributed heuristic that is
oblivious to the message workload, and which selects detours so as to
maximize node encounters. We evaluate the performance of these
detour-based approaches using extensive simulations based on synthetic
workloads as well as real schedules obtained from taxi logs in a major
metropolitan area. Our evaluation shows that our centralized,
workload-aware DMD approach yields the best performance, in terms of
message delay and delivery success ratio, and that our distributed,
workload-oblivious DNE approach yields favorable performance when
compared to approaches that require the use of data mules and message
ferries.
%R 2008-005
%T Universal Quantum Circuits
%A Bera, Debajyoti
%A Fenner, Stephen
%A Green, Fred
%A Homer, Steve
%D February 15, 2008
%U http://www.cs.bu.edu/techreports/2008-005-univcirc.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We define and construct efficient depth universal and almost size universal
quantum circuits. Such circuits can be viewed as general purpose simulators for
central classes of quantum circuits and can be used to capture the
computational power of the circuit class being simulated. For depth we
construct universal circuits whose depth is the same order as the circuits
being simulated. For size, there is a log factor blow-up in the universal
circuits constructed here. We prove that this construction is nearly optimal.
Our results apply to a number of well-studied quantum circuit classes.
%R 2008-006
%T On Clustering Images Using Compression
%A Hescott, Benjamin
%A Koulomzin, Daniel
%D February 15, 2008
%U http://www.cs.bu.edu/techreports/2008-006-cluster.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The need for the ability to cluster unknown data to better understand its
relationship to known data is prevalent throughout science. Besides a
better understanding of the data itself or learning about a new unknown
object, cluster analysis can help with processing data, data
standardization, and outlier detection. Most clustering algorithms are
based on known features or expectations, such as the popular
partition-based, hierarchical, density-based, grid-based, and model-based
algorithms. The choice of algorithm depends on many factors, including
the type of data and the reason for clustering, but nearly all rely on some
known properties of the data being analyzed. Recently, Li et al.
proposed a new universal similarity metric that needs no
prior knowledge about the object. Their similarity metric is based on the
Kolmogorov Complexity of objects, i.e., each object's minimal description. While
the Kolmogorov Complexity of an object is not computable, in "Clustering
by Compression," Cilibrasi and Vitanyi use common compression algorithms
to approximate the universal similarity metric and cluster objects with
high success. Unfortunately, clustering using compression does not
trivially extend to higher dimensions. Here we outline a method to adapt
their procedure to images. We test these techniques on images of letters
of the alphabet.
%R 2008-007
%T Non-Uniform Reductions
%A Buhrman, Harry
%A Hescott, Benjamin
%A Homer, Steve
%A Torrenvliet, Lane
%D February 15, 2008
%U http://www.cs.bu.edu/techreports/2008-007-nonuniform.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We study properties of non-uniform reductions and related completeness
notions. We strengthen several results of Hitchcock and Pavan and give
a trade-off between the amount of advice needed for a reduction and
its honesty on NEXP. We construct an oracle relative to which this
trade-off is optimal. We show, in a more systematic study of non-uniform
reductions, that among other things non-uniformity can be removed at the
cost of more queries. In line with Post's program for complexity
theory we connect such `uniformization' properties to the separation of
complexity classes.
%R 2008-008
%T Layered graphical models for tracking partially-occluded objects
%A Ablavsky, Vitaly
%A Thangali, Ashwin
%A Sclaroff, Stan
%D March 27, 2008
%U http://www.cs.bu.edu/techreports/2008-008-layered-graphical-models.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Partial occlusions are commonplace in a variety of real-world
computer vision applications: surveillance, intelligent
environments, assistive robotics, autonomous navigation, etc. While
occlusion handling methods have been proposed, most methods tend to
break down when confronted with numerous occluders in a scene. In this
paper, a layered image-plane representation for tracking
people through substantial occlusions is proposed. An image-plane
representation of motion around an object is associated with a
pre-computed graphical model, which can be instantiated
efficiently during online tracking. A global state and
observation space is obtained by linking transitions between
layers. A Reversible Jump Markov Chain Monte Carlo approach is
used to infer the number of people and track them online. The
method outperforms two state-of-the-art methods for tracking over
extended occlusions, given videos of a parking lot with numerous
vehicles and a laboratory with many desks and workstations.
%R 2008-009
%T Multiplicative Kernels: Object Detection, Segmentation and Pose Estimation
%A Yuan, Quan
%A Thangali, Ashwin
%A Ablavsky, Vitaly
%A Sclaroff, Stan
%D March 27, 2008
%U http://www.cs.bu.edu/techreports/2008-009-multiplicative-kernels.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Object detection is challenging when the object class exhibits
large within-class variations. In this work, we show that
foreground-background classification~(detection) and within-class
classification of the foreground class~(pose estimation) can be
jointly learned in a multiplicative form of two kernel functions.
One kernel measures similarity for foreground-background
classification. The other kernel accounts for latent factors that
control within-class variation and implicitly enables feature
sharing among foreground training samples. Detector training can
be accomplished via standard SVM learning. The resulting detectors
are tuned to specific variations in the foreground class. They
also serve to evaluate hypotheses of the foreground state. When
the foreground parameters are provided in training, the detectors
can also produce parameter estimates. When the foreground object
masks are provided in training, the detectors can also produce
object segmentation. The advantages of our method over past
methods are demonstrated on data sets of human hands and vehicles.
%R 2008-010
%T Camera Canvas: Image Editor for People with Severe Disabilities
%A Kim, Won-Beom
%A Kwan, Christopher
%A Fedyuk, Igor
%A Betke, Margrit
%D June 14, 2008
%U http://www.cs.bu.edu/techreports/2008-010-camera-canvas.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Camera Canvas is an image editing software package for users with
severe disabilities that limit their mobility. It is specially
designed for Camera Mouse, a camera-based mouse-substitute input
system. Users can manipulate images through various head movements,
tracked by Camera Mouse. The system is also fully usable with
traditional mouse or touch-pad input. Designing the system, we studied
the requirements and solutions for image editing and content creation
using Camera Mouse. Experiments with 20 subjects, each testing Camera
Canvas with Camera Mouse as the input mechanism, showed that users
found the software easy to understand and operate. User feedback was
taken into account to make the software more usable and the interface
more intuitive. We suggest that the Camera Canvas software makes
important progress in providing a new medium of utility and creativity
in computing for users with severe disabilities.
%R 2008-011
%T A Type System For Safe SN Resource Allocation
%A Ocean, Michael
%A Kfoury, Assaf
%A Bestavros, Azer
%D June 14, 2008
%U http://www.cs.bu.edu/techreports/2008-011-snBench-type-system.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
snBench is a platform on which novice users compose and deploy
distributed sense and respond programs for simultaneous execution on a
shared infrastructure. It is a natural imperative that we have the
ability to (1) verify the safety/correctness of newly submitted tasks
and (2) derive the resource requirements for these tasks such that
allocation may occur. Other works provide static analysis techniques
(i.e., type systems) to bound the size of data in order to establish
an upper bound on a program's required computational resources (i.e.,
storage and CPU time). We recognize a significant shortcoming in these
approaches: The upper-bound approach breaks down in many scenarios in
which data has other relationships that must be tracked. This is
especially true when images are considered a first-class data type, as
some image manipulation functions may require a particular minimum or
maximum resolution to operate correctly. In fact, the results here
have benefit beyond the application to image data, and may be extended
to other data types that require tracking multiple dimensions (e.g.,
image "quality", video frame-rate or aspect ratio, and audio sampling
rate). In this technical report we present our work to provide a sized
type system for our functional-style Domain Specific Language (DSL)
called Sensor Task Execution Plan, or STEP. Our sized type system
provides an estimate of computational resource requirements as well as
resource requirements derived from implicit constraints in the static
analysis of STEP instances. We can use these constraints to
statically verify program safety, guide resource allocation, or
enforce run-time checks that ensure the run-time behavior adheres to
the static analysis. We present the syntax and semantics of our
functional language, our type system that builds costs and
resource/data constraints, and (through both formalism and specific
details of our implementation) provide concrete examples of how the
constraints and sizing information are used in practice.
%R 2008-012
%T A Two-Tiered On-Line Server-Side Bandwidth Reservation Framework for the Real-Time Delivery of Multiple Video Streams
%A Londono, Jorge
%A Bestavros, Azer
%D July 1, 2008
%U http://www.cs.bu.edu/techreports/2008-012-2tiered-multiple-video-reservation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The advent of virtualization and cloud computing technologies
necessitates the development of effective mechanisms for the
estimation and reservation of resources needed by content providers to
deliver large numbers of video-on-demand (VOD) streams through the
cloud. Unfortunately, capacity planning for the QoS-constrained
delivery of a large number of VOD streams is inherently difficult as
VBR encoding schemes exhibit significant bandwidth variability. In
this paper, we present a novel resource management scheme to make such
allocation decisions using a mixture of per-stream reservations and an
aggregate reservation, shared across all streams to accommodate peak
demands. The shared reservation provides capacity {\em slack} that
enables statistical multiplexing of peak rates, while assuring
analytically bounded frame-drop probabilities, which can be adjusted
by trading off buffer space (and consequently delay) and bandwidth.
Our two-tiered bandwidth allocation scheme enables the delivery of any
set of streams with less bandwidth (or equivalently with higher link
utilization) than state-of-the-art deterministic smoothing
approaches. The algorithm underlying our proposed framework uses three
per-stream parameters and is linear in the number of servers, making
it particularly well suited for use in an on-line setting. We present
results from extensive trace-driven simulations, which confirm the
efficiency of our scheme especially for small buffer sizes and delay
bounds, and which underscore the significant realizable bandwidth
savings, typically yielding losses that are an order of magnitude or
more below our analytically derived bounds.
%R 2008-013
%T Supporting Predicate Routing in DTN over MANET
%A Aggradi, Gabriele
%A Esposito, Flavio
%A Matta, Ibrahim
%D July 10, 2008
%U http://www.cs.bu.edu/techreports/2008-013-preda.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider a Delay Tolerant Network (DTN) whose users (nodes) are
connected by an underlying Mobile Ad hoc Network (MANET)
substrate. Users can declaratively express high-level policy
constraints on how ``content'' should be routed. For example, content
may be diverted through an intermediary DTN node for the purposes of
preprocessing, authentication, etc. To support such capability, we
implement Predicate Routing where high-level constraints of DTN nodes
are mapped into low-level routing predicates at the MANET level. Our
testbed uses a Linux system architecture and leverages User Mode Linux
to emulate every node running a DTN Reference Implementation code. In
our initial prototype, we use the Ad hoc On-Demand Distance Vector (AODV)
MANET routing protocol. We use the network simulator ns-2
(ns-emulation version) to simulate the mobility and wireless
connectivity of both DTN and MANET nodes. We present preliminary
throughput results showing the efficient and correct operation of
propagating routing predicates, and as a side effect, the performance
benefit of content re-routing that dynamically (on-demand) breaks the
underlying end-to-end TCP connection into shorter-length TCP
connections.
%R 2008-014
%T Declarative Transport: No more transport protocols to design, only policies to specify
%A Mattar, Karim
%A Matta, Ibrahim
%A Day, John
%A Ishakian, Vatche
%A Gursun, Gonca
%D July 12, 2008
%U http://www.cs.bu.edu/techreports/2008-014-declarative-transport.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Transport protocols are an integral part of the inter-process
communication (IPC) service used by application processes to
communicate over the network infrastructure. With almost 30 years of
research on transport, one would have hoped that we have a good handle
on the problem. Unfortunately, that is not true. As the Internet
continues to grow, new network technologies and new applications
continue to emerge, putting transport protocols in a never-ending flux
as they are continuously adapted for these new environments. We
propose a clean-slate transport architecture that renders all possible
transport solutions degenerate forms of a single structure. We
identify a minimal set of mechanisms that once instantiated with the
appropriate policies allows any transport solution to be
realized. Given our proposed architecture, we contend that there are
no more transport protocols to design---only policies to implement.
We implement our transport architecture in a declarative language,
Network Datalog (NDlog), making the specification of different
transport policies easy, compact, reusable, dynamically configurable
and potentially verifiable. In NDlog, transport state is represented
as database relations, state is updated/queried using database
operations, and transport policies are specified using declarative
rules. We identify limitations with NDlog that could potentially
threaten the correctness of our specification. We propose several
language extensions to NDlog that would significantly improve the
programmability of transport policies.
%R 2008-015
%T A New Lower Bound Technique for Quantum Circuits without Ancillae
%A Bera, Debajyoti
%D July 22, 2008
%U http://www.cs.bu.edu/techreports/2008-015-quantum-circuits-without-ancillae-lower-bound.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a technique to derive depth lower bounds for quantum circuits.
The technique is based on the observation that in circuits without
ancillae, only a few input states can set all the control qubits of a
Toffoli gate to $1$. This can be used to selectively remove large Toffoli
gates from a quantum circuit while keeping the cumulative error low. We
use the technique to give another proof that parity cannot be computed by
constant depth quantum circuits without ancillae.
%R 2008-016
%T The EGOIST Overlay Routing System
%A Smaragdakis, Georgios
%A Lekakis, Vassilis
%A Laoutaris, Nikolaos
%A Bestavros, Azer
%A Byers, John
%A Roussopoulos, Mema
%D July 22, 2008
%U http://www.cs.bu.edu/techreports/2008-016-egoist-overlay-routing-system.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A foundational issue underlying many overlay network applications
ranging from routing to peer-to-peer file sharing is that of
connectivity management, i.e., folding new arrivals into an existing
overlay, and re-wiring to cope with changing network conditions.
Previous work has considered the problem from two perspectives:
devising practical heuristics for specific applications designed to
work well in real deployments, and providing abstractions for the
underlying problem that are analytically tractable, especially via
game-theoretic analysis. In this paper, we unify these two thrusts by
using insights gleaned from novel, realistic theoretic models in the
design of EGOIST -- a distributed overlay routing system that we
implemented, deployed, and evaluated on PlanetLab. Using extensive
measurements of paths between nodes, we demonstrate that EGOIST's
neighbor selection primitives significantly outperform existing
heuristics on a variety of performance metrics, including delay,
available bandwidth, and node utilization. Moreover, we demonstrate
that EGOIST is competitive with an optimal, but unscalable full-mesh
approach, remains highly effective under significant churn, is robust
to cheating, and incurs minimal overhead. Finally, we use a
multiplayer peer-to-peer game to demonstrate the value of EGOIST to
end-user applications. (Note: This technical report supersedes
BUCS-TR-2007-013).
%R 2008-017
%T An Online Distributed Algorithm for Inferring Policy Routing Configurations
%A Epstein, Samuel
%A Matta, Ibrahim
%A Mattar, Karim
%D August 15, 2008
%U http://www.cs.bu.edu/techreports/2008-017-distributed-policy-routing-inference.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present an online distributed algorithm, the Causation Logging
Algorithm (CLA), in which Autonomous Systems (ASes) in the Internet
individually report route oscillations/flaps they experience to a central
Internet Routing Registry (IRR). The IRR aggregates these reports and may
observe what we call causation chains where each node on the chain caused
a route flap at the next node along the chain. A chain may also have a
causation cycle. The type of an observed causation chain/cycle allows the
IRR to infer the underlying policy routing configuration (i.e., the system
of economic relationships and constraints on route/path preferences).
Our algorithm is based on a formal policy routing model that captures the
propagation dynamics of route flaps under arbitrary changes in topology or
path preferences. We derive invariant properties of causation
chains/cycles for ASes which conform to economic relationships based on
the popular Gao-Rexford model. The Gao-Rexford model is known to be safe
in the sense that the system always converges to a stable set of paths
under static conditions. CLA recovers the type/property of
an observed causation chain of an underlying system and determines whether
it conforms to the safe economic Gao-Rexford model. Causes for
nonconformity can be diagnosed by comparing the properties of the
causation chains with those predicted from different variants of the
Gao-Rexford model.
%R 2008-018
%T Indexing Methods For Efficient Multiclass Recognition (MA Thesis)
%A Stefan, Alexandra
%D August 15, 2008
%U http://www.cs.bu.edu/techreports/2008-018-MA-Thesis-Alexandra-Stephan.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Many real world image analysis problems, such as face recognition and
hand pose estimation, involve recognizing a large number of classes of
objects or shapes. Large margin methods, such as AdaBoost and Support
Vector Machines (SVMs), often provide competitive accuracy rates, but
at the cost of evaluating a large number of binary classifiers, thus
making it difficult to apply such methods when thousands or millions
of classes need to be recognized. This thesis proposes a
filter-and-refine framework, whereby, given a test pattern, a small
number of candidate classes can be identified efficiently at the
filter step, and computationally expensive large margin classifiers
are used to evaluate these candidates at the refine step. Two
different filtering methods are proposed, ClassMap and OVA-VS
(One-vs.-All classification using Vector Search).
ClassMap is an embedding-based method, works for both boosted
classifiers and SVMs, and tends to map the patterns and their
associated classes close to each other in a vector space. OVA-VS maps
OVA classifiers and test patterns to vectors based on the weights and
outputs of weak classifiers of the boosting scheme. At runtime,
finding the strongest-responding OVA classifier becomes a classical
vector search problem, where well-known methods can be used to gain
efficiency.
%R 2008-019
%T "Networking is IPC": A Guiding Principle to a Better Internet
%A Day, John
%A Matta, Ibrahim
%A Mattar, Karim
%D August 15, 2008
%U http://www.cs.bu.edu/techreports/2008-019-IPC-arch.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This position paper outlines a new network architecture, i.e., a style
of construction that identifies the objects and how they relate. We do
not specify particular protocol implementations or specific interfaces
and policies. After all, it should be possible to change protocols in
an architecture without changing the architecture. Rather we outline
the repeating patterns and structures, and how the proposed model
would cope with the challenges faced by today's Internet (and that of
the future). Our new architecture is based on the following principle:
"Application processes communicate via a distributed inter-process
communication (IPC) facility. The application processes that make up
this facility provide a protocol that implements an IPC mechanism, and
a protocol for managing distributed IPC (routing, security and other
management tasks)." Existing implementation strategies, algorithms,
and protocols can be cast and used within our proposed new structure.
%R 2008-020
%T The Sensor Network Workbench: Towards Functional Specification, Verification And Deployment Of Constrained Distributed Systems (PhD Thesis)
%A Ocean, Michael
%D September 10, 2008
%U http://www.cs.bu.edu/techreports/2008-020-PhD-Thesis-Michael-Ocean.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
As the commoditization of sensing, actuation and communication
hardware increases, so does the potential for dynamically tasked sense
and respond networked systems (i.e., Sensor Networks or SNs) to
replace existing disjoint and inflexible special-purpose deployments
(closed-circuit security video, anti-theft sensors, etc.). While
various solutions have emerged to many individual SN-centric
challenges (e.g., power management, communication protocols, role
assignment), perhaps the largest remaining obstacle to widespread SN
deployment is that those who wish to deploy, utilize, and maintain a
programmable Sensor Network lack the programming and systems expertise
to do so. The contributions of this thesis center on the design,
development and deployment of the SN Workbench (snBench). snBench
embodies an accessible, modular programming platform coupled with a
flexible and extensible run-time system that, together, support the
entire life-cycle of distributed sensory services. As it is impossible
to find a one-size-fits-all programming interface, this work advocates
the use of tiered layers of abstraction that enable a variety of
high-level, domain-specific languages to be compiled to a common
(thin-waist) tasking language; this common tasking language is
statically verified and can be subsequently re-translated, if needed,
for execution on a wide variety of hardware platforms. snBench
provides: (1) a common sensory tasking language (Instruction Set
Architecture) powerful enough to express complex SN services, yet
simple enough to be executed by highly constrained resources with
soft, real-time constraints, (2) a prototype high-level language (and
corresponding compiler) to illustrate the utility of the common
tasking language and the tiered programming approach in this domain,
(3) an execution environment and a run-time support infrastructure
that abstract a collection of heterogeneous resources into a single
virtual Sensor Network, tasked via this common tasking language, and
(4) novel formal methods (i.e., static analysis techniques) that
verify safety properties and infer implicit resource constraints to
facilitate resource allocation for new services. This thesis presents
these components in detail, as well as two specific case studies: the
use of snBench to integrate physical and wireless network security,
and the use of snBench as the foundation for semester-long student
projects in a graduate-level Software Engineering course.
%R 2008-021
%T Extracting location from Contact Traces (MA Thesis)
%A Vasconcelos, Marisa
%D September 11, 2008
%U http://www.cs.bu.edu/techreports/2008-021-MA-Thesis-Marisa-Vasconcelos.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Localization is an essential feature for many mobile wireless
applications. Data collected from applications such as environmental
monitoring, package tracking or position tracking has no meaning
without knowing the location of this data. Other applications use
location information as a building block, for example, geographic
routing protocols, data dissemination protocols and location-based
services such as sensing coverage. Many localization techniques trade
off among features such as deployment of special hardware,
level of accuracy and computation power. In this paper, we present an
algorithm that extracts location constraints from the connectivity
information. Our solution, which requires no special hardware
and only a small number of landmark nodes, uses two types of location
constraints. The spatial constraints derive the estimated locations
by observing which nodes are within communication range of each other.
The temporal constraints refine the areas, computed by the spatial
constraints, using properties of time and space extracted from a
contact trace. The intuition behind the temporal constraints is to limit
the possible locations of a node using its previous and
future locations. To quantify the improvement gained by refining
the nodes' estimated areas with temporal information, we performed
simulations using synthetic and real contact traces. The results show
this improvement and also the difficulties of using real traces.
%R 2008-022
%T Overlay Network Creation And Maintenance With Selfish Users (PhD Thesis)
%A Smaragdakis, Georgios
%D September 12, 2008
%U http://www.cs.bu.edu/techreports/2008-022-PhD-Thesis-Georgios-Smaragdakis.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Overlay networks have been used for adding and enhancing functionality
to the end-users without requiring modifications in the Internet core
mechanisms. Overlay networks have been used for a variety of popular
applications including routing, file sharing, content distribution,
and server deployment. Previous work has focused on devising practical
neighbor selection heuristics under the assumption that users conform
to a specific wiring protocol. This is not a valid assumption in
highly decentralized systems like overlay networks. Overlay users may
act selfishly and deviate from the default wiring protocols by
utilizing knowledge they have about the network when selecting
neighbors to improve the performance they receive from the overlay.
This thesis goes against the conventional thinking that overlay users
conform to a specific protocol. The contributions of this thesis are
threefold. It provides a systematic evaluation of the design space of
selfish neighbor selection strategies in real overlays, evaluates the
performance of overlay networks that consist of users that select
their neighbors selfishly, and examines the implications of selfish
neighbor and server selection for overlay protocol design and service
provisioning respectively. This thesis develops a game-theoretic
framework that provides a unified approach to modeling Selfish
Neighbor Selection (SNS) wiring procedures on behalf of selfish
users. The model is general, and takes into consideration costs
reflecting network latency and user preference profiles, the inherent
directionality in overlay maintenance protocols, and connectivity
constraints imposed on the system designer. Within this framework the
notion of a user's "best response" wiring strategy is formalized as a
k-median problem on asymmetric distance and is used to obtain overlay
structures in which no node can re-wire to improve the performance it
receives from the overlay. Evaluation results presented in this
thesis indicate that selfish users can reap substantial performance
benefits when connecting to overlay networks composed of non-selfish
users. In addition, in overlays that are dominated by selfish users,
the resulting stable wirings are optimized to such a great extent that
even non-selfish newcomers can extract near-optimal performance
through naive wiring strategies. To capitalize on the performance
advantages of optimal neighbor selection strategies and the emergent
global wirings that result, this thesis presents EGOIST: an
SNS-inspired overlay network creation and maintenance routing
system. Through an extensive measurement study on the deployed
prototype, results presented in this thesis show that EGOIST's
neighbor selection primitives outperform existing heuristics on a
variety of performance metrics, including delay, available bandwidth,
and node utilization. Moreover, these results demonstrate that EGOIST
is competitive with an optimal but unscalable full-mesh approach,
remains highly effective under significant churn, is robust to
cheating, and incurs minimal overheads. This thesis also studies
selfish neighbor selection strategies for swarming applications. The
main focus is on n-way broadcast applications where each of n overlay
users wants to push its own distinct file to all other destinations as
well as download their respective data files. Results presented in
this thesis demonstrate that the performance of our swarming protocol
for n-way broadcast on top of overlays of selfish users is far
superior to the performance on top of existing overlays. In the
context of service provisioning, this thesis examines the use of
distributed approaches that enable a provider to determine the number
and location of servers for optimal delivery of content or services to
its selfish end-users. To leverage recent advances in virtualization
technologies, this thesis develops and evaluates a distributed
protocol to migrate servers based on end-user demand and only on
local topological knowledge. Results under a range of network
topologies and workloads suggest that the performance of the
distributed deployment is comparable to that of the optimal but
unscalable centralized deployment.
%R 2008-023
%T An Improved Robust Fuzzy Extractor (MA Thesis)
%A Kanukurthi, Bhavana
%D September 12, 2008
%U http://www.cs.bu.edu/techreports/2008-023-MA-Thesis-Bhavana-Kanukurthi.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider the problem of building robust fuzzy extractors, which
allow two parties holding similar random variables W, W' to agree on a
secret key R in the presence of an active adversary. Robust fuzzy
extractors were defined by Dodis et al. in Crypto 2006 to be
noninteractive, i.e., only one message P, which can be modified by an
unbounded adversary, can pass from one party to the other. This
allows them to be used by a single party at different points in time
(e.g., for key recovery or biometric authentication), but also
presents an additional challenge: what if R is used, and thus possibly
observed by the adversary, before the adversary has a chance to modify
P? Fuzzy extractors secure against such a strong attack are called
post-application robust. We construct a fuzzy extractor with
post-application robustness that extracts a shared secret key of up to
(2m-n)/2 bits (depending on error-tolerance and security parameters),
where n is the bit-length and m is the entropy of W. The previously
best known result, also of Dodis et al., extracted up to (2m-n)/3 bits
(depending on the same parameters).
%R 2008-024
%T Service Provisioning In Mobile Networks Through Distributed Coordinated Resource Management (PhD Thesis)
%A Morcos, Hany
%D September 12, 2008
%U http://www.cs.bu.edu/techreports/2008-024-PhD-Thesis-Hany-Morcos.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The pervasiveness of personal computing platforms offers an
unprecedented opportunity to deploy large-scale services that are
distributed over wide physical spaces. Two major challenges face the
deployment of such services: the often resource-limited nature of
these platforms, and the necessity of preserving the autonomy of the
owner of these devices. These challenges preclude using centralized
control and preclude considering services that are subject to
performance guarantees. To that end, this thesis advances a number of
new distributed resource management techniques that are shown to be
effective in such settings, focusing on two application domains:
distributed Field Monitoring Applications (FMAs), and Message Delivery
Applications (MDAs). In the context of FMAs, this thesis presents two
techniques that are well-suited to the fairly limited storage and
power resources of autonomously mobile sensor nodes. The first
technique relies on amorphous placement of sensory data through the
use of novel storage management and sample diffusion techniques. The
second approach relies on an information-theoretic framework to
optimize local resource management decisions. Both approaches are
proactive in that they aim to provide nodes with a view of the
monitored field that reflects the characteristics of queries over that
field, enabling them to handle more queries locally, and thus reduce
communication overheads. Then, this thesis recognizes node mobility
as a resource to be leveraged, and in that respect proposes novel
mobility coordination techniques for FMAs and MDAs. Assuming that node
mobility is governed by a spatio-temporal schedule featuring some
slack, this thesis presents novel algorithms of various computational
complexities to orchestrate the use of this slack to improve the
performance of supported applications. The findings in this thesis,
which are supported by analysis and extensive simulations, highlight
the importance of two general design principles for distributed
systems. First, a priori knowledge (e.g., about the target phenomena of
FMAs and/or the workload of either FMAs or MDAs) could be used
effectively for local resource management. Second, judicious leverage
and coordination of node mobility could lead to significant
performance gains for distributed applications deployed over
resource-impoverished infrastructures.
%R 2008-025
%T Camera-based Interfaces and Assistive Software for People with Severe Motion Impairments
%A Betke, Margrit
%D October 1, 2008
%U http://www.cs.bu.edu/techreports/2008-025-assistive-software.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Intelligent assistive technology can greatly improve the daily lives
of people with severe paralysis, who have limited communication
abilities. People with motion impairments often prefer camera-based
communication interfaces, because these are customizable, comfortable,
and do not require user-borne accessories that could draw attention to
their disability. This technical report gives an overview of
assistive software that was specifically designed for camera-based
interfaces such as the Camera Mouse, which serves as a
mouse-replacement input system. The applications include software for
text-entry, web browsing, image editing, animation, and music therapy.
Using this software, people with severe motion impairments can
communicate with friends and family and have a medium to explore their
creativity.
%R 2008-026
%T A Typed Language for Truthful One-Dimensional Mechanism Design
%A Lapets, Andrei
%A Levin, Alex
%A Parkes, David
%D October 9, 2008
%U http://www.cs.bu.edu/techreports/2008-026-language-for-truthful-mechanism-design.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We first introduce a very simple typed language for expressing
allocation algorithms that allows automatic verification that an
algorithm is monotonic and therefore truthful. The analysis of
truthfulness is accomplished using a syntax-directed transformation
which constructs a proof of monotonicity based on an exhaustive
critical-value analysis of the algorithm. We then define a more
high-level, general-purpose programming language with typical
constructs, such as those for defining recursive functions, along with
primitives that match allocation algorithm combinators found in the
work of Mu'alem and Nisan. We demonstrate how this language can be
used to combine both primitive and user-defined combinators, allowing
it to capture a collection of basic truthful allocation algorithms. In
addition to demonstrating the value of programming language design
techniques in application to a specific domain, this work suggests a
blueprint for interactive tools that can be used to teach the simple
principles of truthful mechanism design.
%R 2008-027
%T Generalized Methods for Discovering Frequent Poly-Regions in DNA
%A Papapetrou, Panagiotis
%A Benson, Gary
%A Kollios, George
%D October 17, 2008
%U http://www.cs.bu.edu/techreports/2008-027-dna-poly-regions.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The problem of discovering frequent poly-regions (i.e., regions of high
occurrence of a set of items or patterns of a given alphabet) in a
sequence is studied, and three efficient approaches are proposed to
solve it. The first one is entropy-based and applies a recursive
segmentation technique that produces a set of candidate segments which
may potentially lead to a poly-region. The key idea of the second
approach is the use of a set of sliding windows over the sequence. Each
sliding window covers a sequence segment and keeps a set of statistics
that mainly include the number of occurrences of each item or pattern
in that segment. Combining these statistics efficiently yields the
complete set of poly-regions in the given sequence. The third approach
applies a technique based on the majority vote, achieving linear
running time with a minimal number of false negatives. After
identifying the poly-regions, the sequence is converted to a sequence
of labeled intervals (each one corresponding to a
poly-region). An efficient algorithm for mining frequent arrangements
of intervals is applied to the converted sequence to discover
frequently occurring arrangements of poly-regions in different parts of
DNA, including coding regions. The proposed algorithms are tested
on various DNA sequences producing results of significant biological
meaning.
%R 2008-028
%T Hierarchical Characterization and Generation of Blogosphere Workloads
%A Duarte, Fernando
%A Mattos, Bernardo
%A Almeida, Jussara
%A Almeida, Virgilio
%A Curiel, Mariela
%A Bestavros, Azer
%D October 17, 2008
%U http://www.cs.bu.edu/techreports/2008-028-blog-workload-generation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a thorough characterization of the access patterns in
blogspace, which comprises a rich interconnected web of blog postings
and comments by an increasingly prominent user community that
collectively define what has become known as the blogosphere. Our
characterization of over 35 million read, write, and management
requests spanning a 28-day period is done at three different levels. The
user view characterizes how individual users interact with blogosphere
objects (blogs); the object view characterizes how individual blogs
are accessed; the server view characterizes the aggregate access
patterns of all users to all blogs. The more-interactive nature of the
blogosphere leads to interesting traffic and communication patterns,
which are different from those observed for traditional web content. We
identify and characterize novel features of the blogosphere workload,
and we show the similarities and differences between typical web server
workloads and blogosphere server workloads. Finally, based on our main
characterization results, we build a new synthetic blogosphere
workload generator called GBLOT, which aims at mimicking closely a
stream of requests originating from a population of blog users. Given
the increasing share of blogspace traffic, realistic workload models and
tools are important for capacity planning and traffic engineering
purposes.
%R 2008-029
%T Indexing Distances In Large Graphs And Applications In Search Tasks (MA Thesis)
%A Potamias, Michalis
%D December 1, 2008
%U http://www.cs.bu.edu/techreports/2008-029-MA-Thesis-Michalis-Potamias.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This thesis elaborates on the problem of preprocessing a large graph
so that single-pair shortest-path queries can be answered quickly at
runtime. Computing shortest paths is a well studied problem, but
exact algorithms do not scale well to real-world huge graphs in
applications that require very short response time. The focus is on
approximate methods for distance estimation, in particular in
landmarks-based distance indexing. This approach involves choosing
some nodes as landmarks and computing (offline), for each node in the
graph, its embedding, i.e., the vector of its distances from all the
landmarks. At runtime, when the distance between a pair of nodes is
queried, it can be quickly estimated by combining the embeddings of
the two nodes. Choosing optimal landmarks is shown to be hard and thus
heuristic solutions are employed. Given a budget of memory for the
index, which translates directly into a budget of landmarks, different
landmark selection strategies can yield dramatically different results
in terms of accuracy. A number of simple methods that scale well to
large graphs are therefore developed and experimentally compared. The
simplest methods choose central nodes of the graph, while the more
elaborate ones select central nodes that are also far away from one
another. The efficiency of the techniques presented in this thesis is
tested experimentally using five different real world graphs with
millions of edges; for a given accuracy, they require as much as 250
times less space than the current approach which considers selecting
landmarks at random. Finally, they are applied in two important
problems arising naturally in large-scale graphs, namely social search
and community detection.
%R 2008-030
%T Forwarding in Mobile Opportunistic Networks (PhD Thesis)
%A Erramilli, Vijay
%D December 19, 2008
%U http://www.cs.bu.edu/techreports/2008-030-PhD-Thesis-Vijay-Erramilli.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Recent advances in processor speeds, mobile communications and battery
life have enabled computers to evolve from completely wired to
completely mobile. In the most extreme case, all nodes are mobile and
communication takes place at available opportunities -- using both
traditional communication infrastructure as well as the mobility of
intermediate nodes. These are \emph{mobile opportunistic} networks.
Data communication in such networks is a difficult problem, because of
the dynamic underlying topology, the scarcity of network resources and
the lack of global information. Establishing end-to-end routes in
such networks is usually not feasible. Instead, a store-and-carry
forwarding paradigm is better suited for such networks. This
dissertation describes and analyzes algorithms for forwarding of
messages in such networks. In order to design effective forwarding
algorithms for mobile opportunistic networks, we start by
building an understanding of the set of all paths between nodes, which
represent the available opportunities for any forwarding
algorithm. Relying on real measurements, we enumerate paths between
nodes and uncover what we refer to as the \emph{path explosion}
effect. The term path explosion refers to the fact that the number of
paths between a randomly selected pair of nodes increases
exponentially with time. We draw from the theory of epidemics to model
and explain the path explosion effect. This is the first contribution
of the thesis, and is a key observation that underlies subsequent
results. Our second contribution is the study of forwarding
algorithms. For this, we rely on trace driven simulations of different
algorithms that span a range of design dimensions. We compare the
performance (success rate and average delay) of these algorithms. We
make the surprising observation that most algorithms we consider have
roughly similar performance. We explain this result in light of the
path explosion phenomenon. While the performance of most algorithms
we studied was roughly the same, these algorithms differed in terms of
cost. This prompted us to focus on designing algorithms with the
explicit intent of reducing costs. For this, we cast the problem of
forwarding as an optimal stopping problem. Our third main
contribution is the design of strategies based on optimal stopping
principles which we refer to as \emph{Delegation} schemes. Our
analysis shows that using a delegation scheme reduces cost over naive
forwarding by a factor of $O(\sqrt N)$, where $N$ is the number of
nodes in the network. We further validate this result on real traces,
where the cost reduction observed is even greater. Our results so far
include a key assumption, which is unbounded buffers on nodes. Next,
we relax this assumption, so that the problem shifts to one of
prioritization of messages for transmission and dropping. Our fourth
contribution is the study of message prioritization schemes, combined
with forwarding. Our main result is that one achieves higher
performance by assigning higher priorities to young messages in the
network. We again interpret this result in light of the path explosion
effect.
%R 2008-031
%T Lightweight Modeling of Java Virtual Machine Security Constraints using Alloy
%A Reynolds, Mark C.
%D December 30, 2008
%U http://www.cs.bu.edu/techreports/2008-031-jvm-alloy.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Java programming language has been widely described as secure by
design. Nevertheless, a number of serious security vulnerabilities have
been discovered in Java, particularly in the component known as the
Bytecode Verifier. This paper describes a method for representing Java
security constraints using the Alloy modeling language. It further
describes a system for performing a security analysis on any block of Java
bytecodes by converting the bytes into relation initializers in Alloy. Any
counterexamples found by the Alloy analyzer correspond directly to
insecure code. Analysis of a real-world malicious applet is given
to demonstrate the efficacy of the approach.
%R 2009-001
%T Foundational Theory for Understanding Policy Routing Dynamics
%A Mattar, Karim
%A Epstein, Sam
%A Matta, Ibrahim
%D January 30, 2009
%U http://www.cs.bu.edu/techreports/2009-001-DPR-theory.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper we introduce a theory of policy routing dynamics based
on fundamental axioms of routing update mechanisms. We develop a
dynamic policy routing model (DPR) that extends the static formalism
of the stable paths problem (introduced by Griffin et al.) with
discrete synchronous time. DPR captures the propagation of path
changes in any dynamic network irrespective of its time-varying
topology. We introduce several novel structures such as causation
chains, dispute fences and policy digraphs that model different
aspects of routing dynamics and provide insight into how these
dynamics manifest in a network. We exercise the practicality of the
theoretical foundation provided by DPR with two fundamental problems:
routing dynamics minimization and policy conflict detection. The
dynamics minimization problem utilizes policy digraphs, which capture
the dependencies in routing policies irrespective of underlying
topology dynamics, to solve a graph optimization problem. This
optimization problem explicitly minimizes the number of routing update
messages in a dynamic network by optimally changing the path
preferences of a minimal subset of nodes. The conflict detection
problem, on the other hand, utilizes a theoretical result of DPR where
the root cause of a causation cycle (i.e., cycle of routing update
messages) can be precisely inferred as either a transient route flap
or a dispute wheel (i.e., policy conflict). Using this result we
develop SafetyPulse, a token-based distributed algorithm to detect
policy conflicts in a dynamic network. SafetyPulse is privacy
preserving, computationally efficient, and provably correct.
%R 2009-002
%T Collocation Games And Their Application to Distributed Resource Management
%A Londono, Jorge
%A Bestavros, Azer
%A Teng, Shang-Hua
%D February 7, 2009
%U http://www.cs.bu.edu/techreports/2009-002-collocation-games.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We introduce Collocation Games as the basis of a general framework for
modeling, analyzing, and facilitating the interactions between the
various stakeholders in distributed systems in general, and in cloud
computing environments in particular. Cloud computing enables
fixed-capacity (processing, communication, and storage) resources to
be offered by infrastructure providers as commodities for sale at a
fixed cost in an open marketplace to independent, rational parties
(players) interested in setting up their own applications over the
Internet. Virtualization technologies enable the partitioning of such
fixed-capacity resources so as to allow each player to dynamically
acquire appropriate fractions of the resources for unencumbered
use. In such a paradigm, the resource management problem reduces to
that of partitioning the entire set of applications (players) into
subsets, each of which is assigned to fixed-capacity cloud resources.
If the infrastructure and the various applications are under a single
administrative domain, this partitioning reduces to an optimization
problem whose objective is to minimize the overall deployment cost. In
a marketplace, in which the infrastructure provider is interested in
maximizing its own profit, and in which each player is interested in
minimizing its own cost, it should be evident that a global
optimization is precisely the wrong framework. Rather, in this paper
we use a game-theoretic framework in which the assignment of players
to fixed-capacity resources is the outcome of a strategic "Collocation
Game". Although we show that determining the existence of an
equilibrium for collocation games in general is NP-hard, we present a
number of simplified, practically-motivated variants of the
collocation game for which we establish convergence to a Nash
Equilibrium, and for which we derive convergence and price of anarchy
bounds. In addition to these analytical results, we present an
experimental evaluation of implementations of some of these variants
for cloud infrastructures consisting of a collection of
multidimensional resources of homogeneous or heterogeneous
capacities. Experimental results using trace-driven simulations and
synthetically generated datasets corroborate our analytical results
and also illustrate how collocation games offer a feasible distributed
resource management alternative for autonomic/self-organizing systems,
in which the adoption of a global optimization approach (centralized
or distributed) would be neither practical nor justifiable.
%R 2009-003
%T Angels -- In-Network Support for Minimum Distribution Time in P2P Overlays
%A Sweha, Raymond
%A Bestavros, Azer
%A Byers, John
%D February 10, 2009
%U http://www.cs.bu.edu/techreports/2009-003-angels.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper proposes the use of in-network caches (which we call
Angels) to reduce the Minimum Distribution Time (MDT) of a file from a
seeder to a set of leechers. An Angel is not a leecher in the sense
that it is not interested in receiving the entire file, but rather it
is interested in minimizing the MDT to all leechers, and as such uses
its storage and up/down-link capacity to cache and forward parts of
the file to other peers. We extend the analytical results by Kumar
and Ross [1] to allow for the presence of angels by deriving a new
lower bound for MDT. We show that this new lower bound is tight by
proposing a distribution strategy under assumptions of a fluid
model. We present a GroupTree heuristic that addresses the
impracticalities of the fluid model. We evaluate our designs through
simulations that show that our GroupTree heuristic outperforms other
heuristics, that it scales well, and that it operates near the optimal
theoretical bounds.
%R 2009-004
%T Fast shortest path distance estimation in large networks
%A Potamias, Michalis
%A Bonchi, Francesco
%A Castillo, Carlos
%A Gionis, Aristides
%D March 6, 2009
%U http://www.cs.bu.edu/techreports/2009-004-shortest-distance-estimation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We study the problem of preprocessing a large graph so that
point-to-point shortest-path queries can be answered very fast.
Computing shortest paths is a well studied problem, but exact
algorithms do not scale to huge graphs encountered on the web, social
networks, and other applications. In this paper we focus on
approximate methods for distance estimation, in particular using
landmark-based distance indexing. This approach involves selecting a
subset of nodes as landmarks and computing (offline) the distances
from each node in the graph to those landmarks. At run time, when the
distance between a pair of nodes is needed, we can estimate it quickly
by combining the precomputed distances of the two nodes to the
landmarks. We prove that selecting the optimal set of landmarks is an
NP-hard problem, and thus heuristic solutions need to be employed.
Given a budget of memory for the index, which translates directly into
a budget of landmarks, different landmark selection strategies can
yield dramatically different results in terms of accuracy. A number of
simple methods that scale well to large graphs are therefore developed
and experimentally compared. The simplest methods choose central nodes
of the graph, while the more elaborate ones select central nodes that
are also far away from one another. The efficiency of the suggested
techniques is tested experimentally using five different real world
graphs with millions of edges; for a given accuracy, they require as
much as 250 times less space than the current approach in the
literature which considers selecting landmarks at random. Finally, we
study applications of our method in two problems arising naturally in
large-scale networks, namely, social search and community detection.
%R 2009-005
%T Tracking a Large Number of Objects from Multiple Views
%A Wu, Zheng
%A Hristov, Nickolay
%A Hedrick, Tyson
%A Kunz, Thomas
%A Betke, Margrit
%D March 10, 2009
%U http://www.cs.bu.edu/techreports/2009-005-multiview-tracking.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose a multi-object multi-view tracking framework for tracking
large numbers of tightly-spaced objects that rapidly move in three
dimensions. We formulate the problem of finding correspondences
across multiple views as a multidimensional assignment problem and use
a greedy randomized adaptive search procedure to solve this NP-hard
problem efficiently. To account for occlusions, we relax the
one-to-one constraint that one measurement corresponds to one object
and iteratively solve the relaxed assignment problem. After
correspondences are established, object trajectories are estimated by
stereoscopic reconstruction using an epipolar-neighborhood search. We
embedded our method into a tracker-to-tracker multi-view fusion system
that not only obtains the three-dimensional trajectories of
closely-moving objects but also accurately settles track uncertainties
that could not be resolved from single views due to occlusion. We
conducted experiments to validate our greedy assignment procedure and
our technique to recover from occlusions. We successfully track
hundreds of flying bats and provide an analysis of their group
behavior based on 150 reconstructed 3D trajectories.
%R 2009-006
%T Active Hidden Models for Tracking with Kernel Projections
%A Epstein, Samuel
%A Betke, Margrit
%D March 10, 2009
%U http://www.cs.bu.edu/techreports/2009-006-AHM-tracking.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We introduce Active Hidden Models (AHM) that utilize kernel methods
traditionally associated with classification. We use AHMs to track
deformable objects in video sequences by leveraging kernel
projections. We introduce the ``subset projection'' method which
improves the efficiency of our tracking approach by a factor of
ten. We successfully tested our method on facial tracking with
extreme head movements (including full 180-degree head rotation),
facial expressions, and deformable objects. Given a kernel and a set of
training observations, we derive unbiased estimates of the accuracy
of the AHM tracker. Kernels are generally used in classification
methods to make training data linearly separable. We prove that the
optimal (minimum variance) tracking kernels are those that make the
training observations linearly dependent.
%R 2009-007
%T Example-Based Image Registration via Boosted Classifiers
%A Mullally, William
%A Sclaroff, Stan
%A Betke, Margrit
%D March 11, 2009
%U http://www.cs.bu.edu/techreports/2009-007-boosted-classifiers-for-image-registration.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose a novel image registration framework which uses classifiers
trained from examples of aligned images to achieve registration. Our
approach is designed to register images of medical data where the
physical condition of the patient has changed significantly and image
intensities are drastically different. We use two boosted classifiers
for each degree of freedom of image transformation. These two
classifiers can both identify when two images are correctly aligned
and provide an efficient means of moving towards correct registration
for misaligned images. The classifiers capture local alignment
information using multi-pixel comparisons and can therefore achieve
correct alignments where approaches like correlation and
mutual information, which rely only on pixel-to-pixel comparisons,
fail. We test our approach using images from CT scans acquired in a
study of acute respiratory distress syndrome. We show a significant
increase in registration accuracy in comparison to an approach using
mutual information.
%R 2009-008
%T RefLink: An Interface that Enables People with Motion Impairments to Analyze Web Content and Dynamically Link to References
%A Deshpande, Smita
%A Betke, Margrit
%D March 11, 2009
%U http://www.cs.bu.edu/techreports/2009-008-reflink.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present RefLink, an interface that allows users to analyze the
content of a web page by dynamically linking to an online encyclopedia
such as Wikipedia. Upon opening a web page, RefLink instantly provides
a list of terms extracted from the web page and annotates each term by
the number of its occurrences in the page. RefLink uses a
text-to-speech interface to read out the list of terms. The user can
select a term of interest and follow its link to the
encyclopedia. RefLink can thus help the users to perform an informed
and efficient contextual analysis. Initial user testing suggests that
RefLink is a valuable web browsing tool, in particular for people with
motion impairments, because it greatly simplifies the process of
obtaining reference material and performing contextual analysis.
%R 2009-009
%T An alignment based similarity measure for hand detection in cluttered sign language video
%A Thangali, Ashwin
%A Sclaroff, Stan
%D March 11, 2009
%U http://www.cs.bu.edu/techreports/2009-009-hand-detection-similarity-measure.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Locating hands in sign language video is challenging due to a number
of factors. Hand appearance varies widely across signers due to
anthropometric variations and varying levels of signer
proficiency. Video can be captured under varying illumination, camera
resolutions, and levels of scene clutter, e.g., high-res video
captured in a studio vs. low-res video gathered by a web cam in a
user's home. Moreover, the signers' clothing varies, e.g., skin-toned
clothing vs. contrasting clothing, short-sleeved vs. long-sleeved
shirts, etc. In this work, the hand detection problem is addressed in
an appearance matching framework. The Histogram of Oriented Gradient
(HOG) based matching score function is reformulated to allow non-rigid
alignment between pairs of images to account for hand shape
variation. The resulting alignment score is used within a Support
Vector Machine hand/not-hand classifier for hand detection. The new
matching score function yields improved performance (in ROC area and
hand detection rate) over the Vocabulary Guided Pyramid Match Kernel
(VGPMK) and the traditional, rigid HOG distance on American Sign
Language video gestured by expert signers. The proposed match score
function is computationally less expensive (for training and testing),
has fewer parameters and is less sensitive to parameter settings than
VGPMK. The proposed detector works well on test sequences from an
inexpert signer in a non-studio setting with cluttered background.
%R 2009-010
%T Preferential Field Coverage Through Detour-Based Mobility Coordination
%A Morcos, Hany
%A Bestavros, Azer
%A Matta, Ibrahim
%D March 30, 2009
%U http://www.cs.bu.edu/techreports/2009-010-detour-coverage.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Controlling the mobility pattern of mobile nodes (e.g., robots) to
monitor a given field is a well-studied problem in sensor networks. In
this setup, absolute control over the nodes' mobility is
assumed. Apart from the physical ones, no other constraints are
imposed on planning mobility of these nodes. In this paper, we address
a more general version of the problem. Specifically, we consider a
setting in which mobility of each node is externally constrained by a
schedule consisting of a list of locations that the node must visit at
particular times. Typically, such schedules exhibit some level of
slack, which could be leveraged to achieve a specific coverage
distribution of a field. Such a distribution defines the relative
importance of different field locations. We define the Constrained
Mobility Coordination problem for Preferential Coverage (CMC-PC) as
follows: given a field with a desired monitoring distribution, and a
number of nodes n, each with its own schedule, we need to coordinate
the mobility of the nodes in order to achieve the following two goals:
1) satisfy the schedules of all nodes, and 2) attain the required
coverage of the given field. We show that the CMC-PC problem is
NP-complete (by reduction from the Hamiltonian Cycle problem). Then we
propose TFM, a distributed heuristic to achieve field coverage that is
as close as possible to the required coverage distribution. We verify
the premise of TFM using extensive simulations, as well as taxi logs
from a major metropolitan area. We compare TFM to the random mobility
strategy; the latter provides a lower bound on performance. Our results
show that TFM is very successful in matching the required field
coverage distribution, and that it provides at least a two-fold query
success ratio for queries that follow the target coverage distribution
of the field.
%R 2009-011
%T Seed Scheduling for Peer-to-Peer Networks
%A Esposito, Flavio
%A Matta, Ibrahim
%A Michiardi, Pietro
%A Mitsutake, Nobuyuki
%A Carra, Damiano
%D April 3, 2009
%U http://www.cs.bu.edu/techreports/2009-011-p2p-seed-scheduling.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The initial phase in a content distribution (file sharing) scenario
is a delicate phase due to the lack of global knowledge and the
dynamics of the overlay. An unwise distribution of the pieces in this
phase can cause delays in reaching steady state, thus increasing file
download times. We devise a scheduling algorithm at the seed (source
peer with full content), based on a proportional fair approach, and
we implement it on a real file sharing client [1]. In dynamic
overlays, our solution improves the average download time of a
standard BitTorrent-like protocol by up to 25%.
%R 2009-012
%T PreDA: Predicate Routing for DTN Architectures over MANET
%A Esposito, Flavio
%A Matta, Ibrahim
%D April 3, 2009
%U http://www.cs.bu.edu/techreports/2009-012-preda.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider a Delay Tolerant Network (DTN) whose users (nodes) are
connected by an underlying Mobile Ad hoc Network (MANET) substrate.
Users can declaratively express high-level policy constraints on how
"content" should be routed. For example, content can be directed through
an intermediary DTN node for the purposes of preprocessing,
authentication, etc., or content from a malicious MANET node can be
dropped. To support such content routing at the DTN level, we implement
Predicate Routing [1] where high-level constraints of DTN nodes are
mapped into low-level routing predicates within the MANET nodes. Our
testbed [2] uses a Linux system architecture with User Mode Linux [3] to
emulate every DTN node with a DTN Reference Implementation code [4]. In
our initial architecture prototype, we use the Ad hoc On-Demand Distance
Vector (AODV) routing protocol at the MANET level. We use the network
simulator ns-2 (nsemulation version) to simulate the wireless
connectivity of both
DTN and MANET nodes. Preliminary results show the efficient and correct
operation of propagating routing predicates. For the application of
content re-routing through an intermediary, as a side effect, results
demonstrate the performance benefit of content re-routing that
dynamically (on-demand) breaks the underlying end-to-end TCP connections
into shorter-length TCP connections.
%R 2009-013
%T Principles of Safe Policy Routing Dynamics
%A Epstein, Sam
%A Mattar, Karim
%A Matta, Ibrahim
%D April 21, 2009
%U http://www.cs.bu.edu/techreports/2009-013-safe-routing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We introduce the Dynamic Policy Routing (DPR) model that captures the
propagation of route updates under arbitrary changes in topology or
path preferences. DPR introduces the notion of causation chains where
the route flap at one node causes a flap at the next node along the
chain. Using DPR, we model the Gao-Rexford (economic) guidelines that
guarantee the safety (i.e., convergence) of policy routing. We
establish three principles of safe policy routing dynamics. The
non-interference principle provides insight into which ASes can
directly induce route changes in one another. The single cycle
principle and the multi-tiered cycle principle provide insight into
how cycles of routing updates can manifest in any network. We develop
INTERFERENCEBEAT, a distributed algorithm that propagates a small
token along causation chains to check adherence to these
principles. To enhance the diagnosis power of INTERFERENCEBEAT, we
model four violations of the Gao-Rexford guidelines (e.g., transiting
between peers) and characterize the resulting dynamics.
%R 2009-014
%T On the Performance and Robustness of Managing Reliable Transport Connections
%A Gursun, Gonca
%A Matta, Ibrahim
%A Mattar, Karim
%D April 21, 2009
%U http://www.cs.bu.edu/techreports/2009-014-reliable-conn-mgmt.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We revisit the problem of connection management for reliable
transport. At one extreme, a pure soft-state (SS) approach (as in
Delta-t \cite{Watson81}) safely removes the state of a connection at
the sender and receiver once the state timers expire, without the need
for explicit removal messages, and new connections are established
without an explicit handshaking phase. On the other hand, a hybrid
hard-state/soft-state (HS+SS) approach (as in TCP) uses both explicit
handshaking as well as timer-based management of the connection's
state. In this paper, we consider the worst-case scenario of reliable
single-message communication, and develop a {\em common} analytical
model that can be instantiated to capture either the SS approach or
the HS+SS approach. We compare the two approaches in terms of goodput,
message and state overhead. We also use simulations to compare against
other approaches, and evaluate them in terms of correctness (with
respect to data loss and duplication) and robustness to bad network
conditions (high message loss rate and variable channel delays). Our
results show that the SS approach is more robust, and has lower
message overhead. On the other hand, SS requires more memory to keep
connection states, which reduces goodput. Given that memories are
getting bigger and cheaper, SS presents the best choice over
bandwidth-constrained, error-prone networks.
%R 2009-015
%T Improving the accessibility of lightweight formal verification systems
%A Lapets, Andrei
%D April 30, 2009
%U http://www.cs.bu.edu/techreports/2009-015-formal-verification-accessibility.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In research areas involving mathematical rigor, there are numerous
benefits to adopting a formal representation of models and arguments:
reusability, automatic evaluation of examples, and verification of
consistency and correctness. However, broad accessibility has not been
a priority in the design of formal verification tools that can provide
these benefits. We propose a few design criteria to address these
issues: a simple, familiar, and conventional concrete syntax that is
independent of any environment, application, or verification strategy,
and the possibility of reducing workload and entry costs by employing
features selectively. We demonstrate the feasibility of satisfying
such criteria by presenting our own formal representation and
verification system. Our system's concrete syntax overlaps with
English, LaTeX, and MediaWiki markup wherever possible, and its
verifier relies on heuristic search techniques that make the formal
authoring process more manageable and consistent with prevailing
practices. We employ techniques and algorithms that ensure a simple,
uniform, and flexible definition and design for the system, so that it
is easy to augment, extend, and improve.
%R 2009-016
%T Using Markets and Spam to Combat Malware (MA Thesis)
%A Zatko, Sarah
%D May 11, 2009
%U http://www.cs.bu.edu/techreports/2009-016-MA-Thesis-Sarah-Zatko.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose an economic mechanism to reduce the incidence of malware that
delivers spam. Earlier research proposed attention markets as a solution
for unwanted messages, and showed they could provide more net benefit than
alternatives such as filtering and taxes. Because it uses a currency
system, Attention Bonds faces a challenge. Zombies, botnets, and various
forms of malware might steal valuable currency instead of stealing unused
CPU cycles. We resolve this problem by taking advantage of the fact that
the spambot problem has been reduced to financial fraud. As such, the
large body of existing work in that realm can be brought to bear. By
drawing an analogy between sending and spending, we show how a market
mechanism can detect and prevent spam malware. We prove that by using a
currency (i) each instance of spam increases the probability of detecting
infections, and (ii) the value of eradicating infections can justify
insuring users against fraud. This approach attacks spam at the source, a
virtue missing from filters that attack spam at the destination.
Additionally, the exchange of currency provides signals of interest that
can improve the targeting of ads. ISPs benefit from data management
services and consumers benefit from the higher average value of messages
they receive. We explore these and other secondary effects of attention
markets, and find them to offer, on the whole, attractive economic
benefits for all, including consumers, advertisers, and ISPs.
%R 2009-017
%T Object matching in distributed video surveillance systems by LDA-based appearance descriptors
%A Lo Presti, Liliana
%A Sclaroff, Stan
%A La Cascia, Marco
%D May 18, 2009
%U http://www.cs.bu.edu/techreports/2009-017-lda-based-matching.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Establishing correspondences among object instances is still challenging
in multi-camera surveillance systems, especially when the cameras' fields
of view are non-overlapping. Spatiotemporal constraints can help in
solving the correspondence problem but still leave a wide margin of
uncertainty. One way to reduce this uncertainty is to use appearance
information about the moving objects in the site. In this paper we present
the preliminary results of a new method that can capture salient
appearance characteristics at each camera node in the network. A Latent
Dirichlet Allocation (LDA) model is created and maintained at each node in
the camera network. Each object is encoded in terms of the LDA
bag-of-words model for appearance. The encoded appearance is then used to
establish probable matching across cameras. Preliminary experiments are
conducted on a dataset of 20 individuals and a comparison against Madden's
I-MCHR is reported.
%R 2009-018
%T CSR: Constrained Selfish Routing in Ad-hoc Networks
%A Bassem, Christine
%A Bestavros, Azer
%D May 28, 2009
%U http://www.cs.bu.edu/techreports/2009-018-adhoc-selfish-routing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Routing protocols for ad-hoc networks assume that the nodes forming
the network are either under a single authority, or else that they
would be altruistically forwarding data for other nodes with no
expectation of a return. These assumptions are unrealistic since in
ad-hoc networks, nodes are likely to be autonomous and rational
(selfish), and thus unwilling to help unless they have an incentive to
do so. Providing such incentives is an important aspect that should be
considered when designing ad-hoc routing protocols. In this paper, we
propose a dynamic, decentralized routing protocol for ad-hoc networks
that provides incentives in the form of payments to intermediate nodes
used to forward data for others. In our Constrained Selfish Routing
(CSR) protocol, game-theoretic approaches are used to calculate
payments (incentives) that ensure both the truthfulness of
participating nodes and the fairness of the CSR protocol. We show
through simulations that CSR is an energy-efficient protocol and that
it provides lower communication overhead in the best and average cases
compared to existing approaches.
%R 2009-019
%T On Finding Sensitivity of Quantum and Classical Gates
%A Bera, Debajyoti
%A Homer, Steve
%D June 5, 2009
%U http://www.cs.bu.edu/techreports/2009-019-quantum-sensitivity.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We consider a fault model of Boolean gates, both classical and quantum,
where some of the inputs may not be connected to the actual gate hardware.
This model is somewhat similar to the stuck-at model, which is a very
popular model in testing Boolean circuits. We consider the problem of
detecting such faults; the detection algorithm can query the faulty gate
and its complexity is the number of such queries. This problem is related
to determining the sensitivity of Boolean functions.
We show how quantum parallelism can be used to detect such faults.
Specifically, we show that a quantum algorithm can detect such faults more
efficiently than a classical algorithm for a Parity gate and an AND gate.
We give explicit constructions of quantum detector algorithms and show
lower bounds for classical algorithms. We show that the model for
detecting such faults is similar to algebraic decision trees and extend
some known results from quantum query complexity to prove some of our
results.
%R 2009-020
%T On the Cost of Supporting Multihoming and Mobility
%A Ishakian, Vatche
%A Akinwumi, Joseph
%A Matta, Ibrahim
%D June 19, 2009
%U http://www.cs.bu.edu/techreports/2009-020-multihoming-mobility-cost.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
As the Internet has evolved and grown, an increasing number of nodes
(hosts or autonomous systems) have become multihomed, i.e., a node is
connected to more than one network. Mobility can be viewed as a
special case of multihoming---as a node moves, it unsubscribes from
one network and subscribes to another, which is akin to one interface
becoming inactive and another active. The current Internet
architecture has been facing significant challenges in effectively
dealing with multihoming (and consequently mobility). The Recursive
INternet Architecture (RINA) was recently proposed as a clean-slate
solution to the current problems of the Internet. In this paper, we
perform an average-case cost analysis to compare the multihoming/mobility
support of RINA against that of other approaches such as
LISP and Mobile-IP. We also validate our analysis using trace-driven
simulation.
%R 2009-021
%T Assessing the Security of a Clean-Slate Internet Architecture
%A Boddapati, Gowtham
%A Day, John
%A Matta, Ibrahim
%A Chitkushev, Lou
%D June 22, 2009
%U http://www.cs.bu.edu/techreports/2009-021-clean-slate-internet-security.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The TCP/IP architecture was originally designed without taking
security measures into consideration. Over the years, it has been
subjected to many attacks, which have led to many patches to counter
them. Our investigations into the fundamental principles of networking
have shown that carefully following an abstract model of Interprocess
Communication (IPC) addresses many problems. Guided by this IPC
principle, we designed a clean-slate Recursive INternet Architecture
(RINA). In this paper, we show how, without the aid of cryptographic
techniques, the bare-bones architecture of RINA can resist most of the
security attacks faced by TCP/IP. We also show how hard it is for an
intruder to compromise RINA. Then, we show how RINA inherently
supports security policies in a more manageable, on-demand basis, in
contrast to the rigid, piecemeal approach of TCP/IP.
%R 2009-022
%T Learning A Family Of Detectors (PhD Thesis)
%A Yuan, Quan
%D June 30, 2009
%U http://www.cs.bu.edu/techreports/2009-022-PhD-Thesis-Quan-Yuan.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Object detection and recognition are important problems in computer
vision. The challenges of these problems come from the presence of
noise, background clutter, large within-class variations of the object
class and limited training data. In addition, the computational
complexity in the recognition process is also a concern in
practice. In this thesis, we propose one approach to handle the
problem of detecting an object class that exhibits large within-class
variations, and a second approach to speed up the classification
processes. In the first approach, we show that foreground-background
classification (detection) and within-class classification of the
foreground class (pose estimation) can be jointly solved in a
multiplicative form of two kernel functions. One kernel measures
similarity for foreground-background classification. The other kernel
accounts for latent factors that control within-class variation and
implicitly enables feature sharing among foreground training
samples. For applications where explicit parameterization of the
within-class states is unavailable, a nonparametric formulation of the
kernel can be constructed with a proper foreground distance/similarity
measure. Detector training is accomplished via standard Support Vector
Machine learning. The resulting detectors are tuned to specific
variations in the foreground class. They also serve to evaluate
hypotheses of the foreground state. When the foreground object masks
are provided in training, the detectors can also produce object
segmentation. Methods for generating a representative sample set of
detectors are proposed that can enable efficient detection and
tracking. In addition, because individual detectors verify hypotheses
of foreground state, they can also be incorporated in a
tracking-by-detection framework to recover foreground state in image
sequences. To run the detectors efficiently at the online stage, an
input-sensitive speedup strategy is proposed to select the most
relevant detectors quickly. The proposed approach is tested on data
sets of human hands, vehicles and human faces. In the second part of
the thesis, we formulate a filter-and-refine scheme to speed up
recognition processes. The binary outputs of the weak classifiers in a
boosted detector are used to identify a small number of candidate
foreground state hypotheses quickly via Hamming distance or weighted
Hamming distance. The approach is evaluated in three applications:
face recognition on the FRGC V2 data set, hand shape detection and
parameter estimation on a hand data set, and vehicle detection and
view angle estimation on a multi-view vehicle data set. On all data
sets, our approach is at least five times faster than simply
evaluating all foreground state hypotheses with virtually no loss in
classification accuracy.
%R 2009-023
%T Is a Detector Only Good for Detection?
%A Yuan, Quan
%A Sclaroff, Stan
%D July 12, 2009
%U http://www.cs.bu.edu/techreports/2009-023-what-detectors-are-good-for.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A common design of an object recognition system has two steps, a
detection step followed by a foreground within-class classification
step. For example, consider face detection by a boosted cascade of
detectors followed by face ID recognition via one-vs-all (OVA)
classifiers. Another example is human detection followed by pose
recognition. Although the detection step can be quite fast, the
foreground within-class classification process can be slow and becomes
a bottleneck. In this work, we formulate a filter-and-refine scheme,
where the binary outputs of the weak classifiers in a boosted detector
are used to identify a small number of candidate foreground state
hypotheses quickly via Hamming distance or weighted Hamming
distance. The approach is evaluated in three applications: face
recognition on the FRGC V2 data set, hand shape detection and
parameter estimation on a hand data set, and vehicle detection and view
angle estimation on a multi-view vehicle data set. On all data sets,
our approach has comparable accuracy and is at least five times faster
than the brute force approach.
%R 2009-024
%T Nearest-neighbor Queries in Probabilistic Graphs
%A Potamias, Michalis
%A Bonchi, Francesco
%A Gionis, Aristides
%A Kollios, George
%D July 14, 2009
%U http://www.cs.bu.edu/techreports/2009-024-probabilistic-graph-queries.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Large probabilistic graphs arise in various domains spanning from
social networks to biological and communication networks. An important
query in these graphs is the k nearest-neighbor query, which involves
finding and reporting the k closest nodes to a specific node. This
query assumes the existence of a measure of the proximity or the
distance between any two nodes in the graph. To that end, we propose
various novel distance functions that extend well known notions of
classical graph theory, such as shortest paths and random walks. We
argue that many meaningful distance functions are intractable to
compute exactly. Thus, in order to process
nearest-neighbor queries, we resort to Monte Carlo sampling and
exploit novel graph-transformation ideas and pruning opportunities. In
our extensive experimental analysis, we explore the trade-offs of our
approximation algorithms and demonstrate that they scale well on
real-world probabilistic graphs with tens of millions of edges.
%R 2009-025
%T Trade and Cap: A Customer-Managed, Market-Based System for Trading Bandwidth Allowances at a Shared Link
%A Londono, Jorge
%A Bestavros, Azer
%A Laoutaris, Nikolaos
%D July 29, 2009
%U http://www.cs.bu.edu/techreports/2009-025-trade-and-cap.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose "Trade and Cap" (TC), an economics-inspired mechanism that
incentivizes users to voluntarily coordinate their consumption of the
bandwidth of a shared resource (e.g., a DSLAM link) so as to converge
on what they perceive to be an equitable allocation, while ensuring
efficient resource utilization. Under TC, rather than acting as an
arbiter, an ISP acts as an enforcer of what the community of rational
users sharing the resource decides is a fair allocation of that
resource. Our TC mechanism proceeds in two phases. In the first,
users engage in a strategic trading game in which each user agent
selfishly chooses bandwidth slots to reserve in support of primary,
interactive network usage activities. In the second phase, each user
is allowed to acquire additional bandwidth slots in support of
presumed open-ended need for fluid bandwidth, catering to secondary
applications. The acquisition of this fluid bandwidth is subject to
the remaining "buying power" of each user and to prevalent "market
prices" -- both of which are determined by the results of the trading
phase and a desirable aggregate cap on link utilization. We present
analytical results that establish the underpinnings of our TC
mechanism, including game-theoretic results pertaining to the trading
phase, and pricing of fluid bandwidth allocation pertaining to the
capping phase. Using real network traces, we present extensive
experimental results that demonstrate the benefits of our scheme,
which we also show to be practical by highlighting the salient
features of an efficient implementation architecture. While our focus
in this paper is on the rational coordination of the shared use of a
DSLAM link, we also establish the generality of our TC mechanism by
presenting a number of other direct applications, ranging from
coordination of energy-aware task schedules to coordination of ISP
uplink bandwidth consumption.
%R 2009-026
%T Angels: In-Network Support For Minimum Distribution Time in P2P Overlays (MA Thesis)
%A Sweha, Raymond
%D August 6, 2009
%U http://www.cs.bu.edu/techreports/2009-026-MA-Thesis-Raymond-Sweha.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This thesis proposes the use of in-network caches (which we call
Angels) to reduce the Minimum Distribution Time (MDT) of a file from a
seeder (a node that possesses the file) to a set of leechers (nodes that
are interested in downloading the file). An Angel is not a leecher in
the sense that it is not interested in receiving the entire file, but
rather it is interested in minimizing the MDT to all leechers, and as
such uses its storage and up/down-link capacity to cache and forward
parts of the file to other peers. We extend the analytical results by
Kumar and Ross (Kumar and Ross, 2006) to account for the presence of
Angels by deriving a new lower bound for the MDT. We show that this
newly derived lower bound is tight by proposing a distribution
strategy under assumptions of a fluid model. We present a GroupTree
heuristic that addresses the impracticalities of the fluid model. We
evaluate our designs through simulations that show that our GroupTree
heuristic outperforms other heuristics, that it scales well with the
increase of the number of leechers, and that it closely approaches the
optimal theoretical bounds.
%R 2009-027
%T Simultaneous Learning Of Non-Linear Manifold And Dynamical Models For High-Dimensional Time Series (PhD Thesis)
%A Li, Rui
%D August 10, 2009
%U http://www.cs.bu.edu/techreports/2009-027-PhD-Thesis-Rui-Li.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The goal of this work is to learn a parsimonious and informative
representation for high-dimensional time series. Conceptually, this
comprises two distinct yet tightly coupled tasks: learning a
low-dimensional manifold and modeling the dynamical process. These two
tasks have a complementary relationship as the temporal constraints
provide valuable neighborhood information for dimensionality reduction
and conversely, the low-dimensional space allows dynamics to be learnt
efficiently. Solving these two tasks simultaneously allows important
information to be exchanged mutually. If nonlinear models are required
to capture the rich complexity of time series, then the learning
problem becomes harder as the nonlinearities in both tasks are
coupled. The proposed solution approximates the nonlinear manifold and
dynamics using piecewise linear models. The interactions among the
linear models are captured in a graphical model. The model structure
setup and parameter learning are done using a variational Bayesian
approach, which enables automatic Bayesian model structure selection,
hence solving the problem of over-fitting. By exploiting the model
structure, efficient inference and learning algorithms are obtained
without oversimplifying the model of the underlying dynamical process.
Evaluation of the proposed framework against competing approaches is
conducted in three sets of experiments: dimensionality reduction and
reconstruction using synthetic time series, video synthesis using a
dynamic texture database, and human motion synthesis, classification
and tracking on a benchmark data set. In all experiments, the proposed
approach provides superior performance.
%R 2009-028
%T Safe Compositional Network Sketches: Tool and Use Cases
%A Bestavros, Azer
%A Kfoury, Assaf
%A Lapets, Andrei
%A Ocean, Michael
%D October 1, 2009
%U http://www.cs.bu.edu/techreports/2009-028-netsketch-tool.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
NetSketch is a tool that enables the specification of network-flow
applications and the certification of desirable safety properties
imposed thereon. NetSketch is conceived to assist system integrators
in two types of activities: modeling and design. As a modeling tool,
it enables the abstraction of an existing system while retaining
sufficient information about it to enable future analysis of safety
properties. As a design tool, NetSketch enables the exploration of
alternative safe designs as well as the identification of minimal
requirements for outsourced subsystems. NetSketch embodies a
lightweight formal verification philosophy, whereby the power (but not
the heavy machinery) of a rigorous formalism is made accessible to
users via a friendly interface. NetSketch does so by exposing
tradeoffs between exactness of analysis and scalability, and by
combining traditional whole-system analysis with a more flexible
compositional analysis. The compositional analysis is based on a
strongly-typed Domain-Specific Language (DSL) for describing network
configurations at various levels of sketchiness along with invariants
that need to be enforced thereupon. In this paper, we overview
NetSketch, highlight its salient features, and illustrate how it could
be used in two applications: the management/shaping of traffic flows
in a vehicular network (as a proxy for CPS applications) and in a
streaming media network (as a proxy for Internet applications). In a
companion paper, we define the formal system underlying the operation
of NetSketch, in particular the DSL behind NetSketch's user-interface
when used in ``sketch mode'', and prove its soundness relative to
appropriately-defined notions of validity.
%R 2009-029
%T Safe Compositional Network Sketches: The Formal Framework
%A Bestavros, Azer
%A Kfoury, Assaf
%A Lapets, Andrei
%A Ocean, Michael
%D October 1, 2009
%U http://www.cs.bu.edu/techreports/2009-029-netsketch-formalism.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
NetSketch is a tool for the specification of constrained-flow
applications and the certification of desirable safety
properties imposed thereon. NetSketch is conceived to assist
system integrators in two types of activities: modeling and
design. As a modeling tool, it enables the abstraction of an
existing system while retaining sufficient information about it
to carry out future analysis of safety properties. As a design
tool, NetSketch enables the exploration of alternative safe
designs as well as the identification of minimal requirements
for outsourced subsystems. NetSketch embodies a lightweight
formal verification philosophy, whereby the power (but not the
heavy machinery) of a rigorous formalism is made accessible to
users via a friendly interface. NetSketch does so by exposing
tradeoffs between exactness of analysis and scalability, and by
combining traditional whole-system analysis with a more flexible
compositional analysis. The compositional analysis is based on a
strongly-typed Domain-Specific Language (DSL) for describing and
reasoning about constrained-flow networks at various levels of
sketchiness along with invariants that need to be enforced
thereupon. In this paper, we define the formal system
underlying the operation of NetSketch, in particular the DSL
behind NetSketch's user-interface when used in ``sketch mode'',
and prove its soundness relative to appropriately-defined
notions of validity. In a companion paper [BUCS-TR-2009-028],
we overview NetSketch, highlight its salient features, and
illustrate how it could be used in two applications: the
management/shaping of traffic flows in a vehicular network (as a
proxy for CPS applications) and in a streaming media network (as
a proxy for Internet applications).
%R 2009-030
%T Verification with Natural Contexts: Soundness of Safe Compositional Network Sketches
%A Lapets, Andrei
%A Kfoury, Assaf
%D October 1, 2009
%U http://www.cs.bu.edu/techreports/2009-030-verified-netsketch-soundness.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In research areas involving mathematical rigor, there are numerous
benefits to adopting a formal representation of models and
arguments: reusability, automatic evaluation of examples, and
verification of consistency and correctness. However,
accessibility has not been a priority in the design of formal
verification tools that can provide these benefits. In earlier
work [BUCS-TR-2009-015] we attempt to address this broad problem
by proposing several specific design criteria organized around the
notion of a natural context: the sphere of awareness a working
human user maintains of the relevant constructs, arguments,
experiences, and background materials necessary to accomplish the
task at hand. In this report we evaluate our proposed design
criteria by utilizing within the context of novel research a
formal reasoning system that is designed according to these
criteria. In particular, we consider how the design and
capabilities of the formal reasoning system that we employ
influence, aid, or hinder our ability to accomplish a formal
reasoning task -- the assembly of a machine-verifiable proof
pertaining to the NetSketch formalism. NetSketch is a tool for
the specification of constrained-flow applications and the
certification of desirable safety properties imposed
thereon. NetSketch is conceived to assist system integrators in
two types of activities: modeling and design. It provides
capabilities for compositional analysis based on a
strongly-typed domain-specific language (DSL) for describing and
reasoning about constrained-flow networks and invariants that
need to be enforced thereupon. In a companion paper
[BUCS-TR-2009-028] we overview NetSketch, highlight its salient
features, and illustrate how it could be used in actual
applications. In this paper, we define using a machine-readable
syntax major parts of the formal system underlying the operation
of NetSketch, along with its semantics and a corresponding
notion of validity. We then provide a proof of soundness for the
formalism that can be partially verified using a lightweight
formal reasoning system that simulates natural contexts. A
traditional presentation of these definitions and arguments can
be found in the full report on the NetSketch formalism
[BUCS-TR-2009-029].
%R 2009-031
%T Adaptive Weighing Designs for Keyword Value Computation
%A Byers, John
%A Mitzenmacher, Michael
%A Zervas, Georgios
%D October 10, 2009
%U http://www.cs.bu.edu/techreports/2009-031-channelization.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Attributing a dollar value to a keyword is an essential part of
running any profitable search engine advertising campaign. When an
advertiser has complete control over the interaction with and
monetization of each user arriving on a given keyword, the value of
that term can be accurately tracked. However, in many instances, the
advertiser may monetize arrivals indirectly through one or more third
parties. In such cases, it is typical for the third party to provide
only coarse-grained reporting: rather than report each monetization
event, users are aggregated into larger channels and the third party
reports aggregate information such as total daily revenue for each
channel. Examples of third parties that use channels include Amazon
and Google AdSense. In such scenarios, the number of channels is
generally much smaller than the number of keywords whose value per
click (VPC) we wish to learn. However, the advertiser has flexibility
as to how to assign keywords to channels over time. We introduce the
channelization problem: how do we adaptively assign keywords to
channels over the course of multiple days to quickly obtain accurate
VPC estimates of all keywords? We relate this problem to classical
results in weighing design, devise new adaptive algorithms for this
problem, and quantify the performance of these algorithms
experimentally. Our results demonstrate that adaptive weighing designs
that exploit statistics of term frequency, variability in VPCs across
keywords, and flexible channel assignments over time provide the best
estimators of keyword VPCs.
%R 2009-032
%T Lightweight Formal Verification in Classroom Instruction of Reasoning about Functional Code
%A Lapets, Andrei
%D November 6, 2009
%U http://www.cs.bu.edu/techreports/2009-032-classroom-verification-functional.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In college courses dealing with material that requires mathematical rigor,
the adoption of a machine-readable representation for formal arguments can
be advantageous. Students can focus on a specific collection of constructs
that are represented consistently. Examples and counterexamples can be
evaluated. Assignments can be assembled and checked with the help of an
automated formal reasoning system. However, usability and accessibility do
not have a high priority and are not addressed sufficiently well in the
design of many existing machine-readable representations and corresponding
formal reasoning systems. In earlier work [BUCS-TR-2009-015], we attempt to
address this broad problem by proposing several specific design criteria
organized around the notion of a natural context: the sphere of awareness a
working human user maintains of the relevant constructs, arguments,
experiences, and background materials necessary to accomplish the task at
hand. We report on our attempt to evaluate our proposed design criteria by
deploying within the classroom a lightweight formal verification system
designed according to these criteria. The lightweight formal verification
system was used within the instruction of a common application of formal
reasoning: proving by induction formal propositions about functional code.
We present all of the formal reasoning examples and assignments considered
during this deployment, most of which are drawn directly from an
introductory text on functional programming. We demonstrate how the design
of the system improves the effectiveness and understandability of the
examples, and how it aids in the instruction of basic formal reasoning
techniques. We make brief remarks about the practical and administrative
implications of the system's design from the perspectives of the student,
the instructor, and the grader.
%R 2009-033
%T Efficient Support for Common Relations in Lightweight Formal Reasoning Systems
%A Lapets, Andrei
%A House, David
%D November 6, 2009
%U http://www.cs.bu.edu/techreports/2009-033-efficient-verifier-relations.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In work that involves mathematical rigor, there are numerous benefits to
adopting a representation of models and arguments that can be supplied to a
formal reasoning or verification system: reusability, automatic evaluation
of examples, and verification of consistency and correctness. However,
accessibility has not been a priority in the design of formal verification
tools that can provide these benefits. In earlier work [BUCS-TR-2009-015],
we attempt to address this broad problem by proposing several specific
design criteria organized around the notion of a natural context: the sphere
of awareness a working human user maintains of the relevant constructs,
arguments, experiences, and background materials necessary to accomplish the
task at hand. This work expands one aspect of our earlier work by further
developing an essential capability for any formal reasoning system whose
design is oriented around simulating the natural context: native support for
a collection of mathematical relations that deal with common constructs in
arithmetic and set theory. We provide a formal definition for a context of
relations that can be used to both validate and assist formal reasoning
activities. We provide a proof that any data structure that faithfully
implements this formal notion has an update algorithm that necessarily
converges. Finally, we present and prove the efficiency of an implementation
of such a data structure that leverages modular implementations of two
existing general-purpose data structures: balanced search trees and
transitive closures of hypergraphs.
%R 2009-034
%T GreenCoop: Cooperative Green Routing with Energy-efficient Servers
%A Chiaraviglio, Luca
%A Matta, Ibrahim
%D November 24, 2009
%U http://www.cs.bu.edu/techreports/2009-034-greencoop.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Energy-efficient communication has recently become a key
challenge for both researchers and industries. In this paper, we
propose a new model in which a Content Provider and an Internet
Service Provider cooperate to reduce the total power
consumption. We solve the problem optimally and compare it with
a classic formulation, whose aim is to minimize user
delay. Results, although preliminary, show that power savings
can be huge: up to 71% on real ISP topologies. We also show how
the degree of cooperation impacts overall power
consumption. Finally, we consider the impact of the Content
Provider location on the total power savings.
%R 2009-035
%T Lightweight Formal Methods for the Development of High-Assurance Networking Systems
%A Kfoury, Assaf
%D December 1, 2009
%U http://www.cs.bu.edu/techreports/2009-035-lightweight-formal-methods.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We survey several of the research efforts pursued by the iBench and
snBench projects in the CS Department at Boston University over the
last half dozen years. These activities use ideas and methodologies
inspired by recent developments in other parts of computer science --
particularly in formal methods and in the foundations of programming
languages -- but now specifically applied to the certification of
safety-critical networking systems. This is research jointly led by
Azer Bestavros and Assaf Kfoury with the participation of Adam
Bradley, Andrei Lapets, and Michael Ocean.
%R 2009-036
%T Modeling the Java Bytecode Verifier
%A Reynolds, Mark
%D December 30, 2009
%U http://www.cs.bu.edu/techreports/2009-036-bytecode-verifier.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Java programming language has been widely described as secure by
design. Nevertheless, a number of serious security vulnerabilities
have been discovered in Java, particularly in the Bytecode Verifier, a
critical component used to verify class semantics before loading is
complete. This paper describes a method for representing Java security
constraints using the Alloy modeling language. It further describes a
system for performing a security analysis on any block of Java
bytecodes by converting the bytes into relation initializers in Alloy.
Any counterexamples found by the Alloy analyzer correspond directly to
insecure code. Analysis of the approach in the context of known
security exploits is provided. This type of analysis represents a
significant departure from standard malware analysis methods based on
signatures or anomaly detection.
%R 2010-001
%T Safe Compositional Network Sketches: Reasoning with Automated Assistance
%A Lapets, Andrei
%A Kfoury, Assaf
%A Bestavros, Azer
%D January 19, 2010
%U http://www.cs.bu.edu/techreports/2010-001-netsketch-aa.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
NetSketch is a tool for the specification of constrained-flow
networks (CFNs) and the certification of desirable safety
properties imposed thereon, conceived to assist system
integrators in modeling and design. It provides compositional
analysis capabilities based on a strongly-typed domain-specific
language (DSL) for describing and reasoning about CFNs and
relevant invariants. Users can model or design individual
network components and perform manual or automated whole-system
analysis of the properties thereof. Users can also assemble many
instances of these components into larger networks, relying on
NetSketch's less precise but more tractable compositional
analysis capabilities. This ability to trade ``precision of
analysis'' for ``feasibility of analysis'' according to available
resources is among the novel features of NetSketch. In earlier
work we illustrated how NetSketch is applied to actual domains,
and provided a formal definition of its underlying formalism.
While the NetSketch DSL provides automatic compositional
analysis capabilities for modeling and designing entire
networks, users may need to employ a wider variety of tools and
techniques when modeling and designing individual network
components. These can include common tools for reasoning about
systems of constraints of various classes (such as linear
constraints, quadratic constraints, and so on), as well as
logical systems and ontologies that deal with concepts relevant
to the application domain. We integrate the ``aartifact''
lightweight automated assistant for formal reasoning (which has
also been applied in proving the soundness of the NetSketch
formalism) as a tool for modeling and designing individual
network components. We present several use cases within an
example application of the NetSketch DSL to demonstrate how the
automated assistant provides NetSketch users with both an
interface for reasoning formally about constraints, and a
straightforward way to implicitly employ a rich domain-specific
ontology of logical propositions. This allows users to verify
common properties of constraints and constraint sets, and to
reason about constraint relationships using automatically
verifiable algebraic manipulations.
%R 2010-002
%T A Type-Theoretic Framework for Efficient and Safe Colocation of Periodic Real-time Systems
%A Ishakian, Vatche
%A Bestavros, Azer
%A Kfoury, Assaf
%D January 24, 2010
%U http://www.cs.bu.edu/techreports/2010-002-safecolocation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Desirable application performance is typically guaranteed
through the use of Service Level Agreements (SLAs) that specify
fixed fractions of resource capacities that must be allocated
for {\em unencumbered} use by the application. The mapping
between what constitutes desirable performance and SLAs is not
unique: {\em multiple} SLA expressions might be functionally
equivalent. Having the flexibility to transform SLAs from one
form to another in a manner that is provably {\em safe} would
enable hosting solutions to achieve significant
efficiencies. This paper demonstrates the promise of such an
approach by proposing a {\em type-theoretic} framework for the
representation and safe transformation of SLAs. Based on that
framework, the paper describes a methodical approach for the
inference of efficient and safe mappings of periodic, real-time
tasks to the physical and virtual hosts that constitute a
hierarchical scheduler. Extensive experimental results support
the conclusion that the flexibility afforded by safe SLA
transformations has the potential to yield significant savings.
%R 2010-003
%T Colocation as a Service: Strategic and Operational Services for Cloud Colocation
%A Ishakian, Vatche
%A Sweha, Raymond
%A Londono, Jorge
%A Bestavros, Azer
%D March 1, 2010
%U http://www.cs.bu.edu/techreports/2010-003-caas.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
By colocating with other tenants of an Infrastructure as a Service
(IaaS) offering, IaaS users could reap significant cost savings by
judiciously sharing their use of the fixed-size instances offered by
IaaS providers. This paper presents the blueprints of a Colocation as
a Service (CaaS) framework. CaaS strategic services identify
coalitions of self-interested users that would benefit from colocation
on shared instances. CaaS operational services provide the information
necessary for, and carry out the reconfigurations mandated by,
strategic services. CaaS could be incorporated into an IaaS offering
by providers; it could be implemented as a value-added proposition by
IaaS resellers; or it could be directly leveraged in a peer-to-peer
fashion by IaaS users. To establish the practicality of such
offerings, this paper presents XCS -- a prototype implementation of
CaaS on top of the Xen hypervisor. XCS makes specific choices with
respect to the various elements of the CaaS framework: it implements
strategic services based on a game-theoretic formulation of
colocation; it features novel concurrent migration heuristics which
are shown to be efficient; and it offers monitoring and accounting
services at both the hypervisor and VM layers. Extensive experimental
results obtained by running PlanetLab trace-driven workloads on the
XCS prototype confirm the premise of CaaS -- by demonstrating the
efficiency and scalability of XCS, and by quantifying the potential
cost savings accrued through the use of XCS.
%R 2010-004
%T The NP-completeness of the Restricted Stable Paths Problem with Three Aggregating Functions
%A Lapets, Andrei
%A Kfoury, Assaf
%D March 14, 2010
%U http://www.cs.bu.edu/techreports/2010-004-spp-three-aggregate.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Interdomain routing on the Internet is performed using route preference
policies specified independently and arbitrarily by each autonomous system
(AS) in the network. These policies are used in the border gateway protocol
(BGP) by each AS when selecting next-hop choices for routes to each
destination. Conflicts between policies used by different ASs can lead to
routing instabilities that, potentially, cannot be resolved regardless of
how long BGP runs. The stable paths problem (SPP) is an abstract graph
theoretic model of the problem of selecting next-hop routes for a
destination. A solution to this problem is a set of next-hop choices, one
for each AS, that is compatible with the policies of each AS. In a stable
solution each AS has selected its best next-hop if the next-hop choices of
all neighbors are fixed. BGP can be viewed as a distributed algorithm for
finding a stable solution to an SPP instance. In this report we consider a
particular restricted variant of SPP, which we call (f,g,h)-SPP, in which
there exist three or more different node policies based on aggregation of
edge weights. We show that this variant is NP-complete.
%R 2010-005
%T The complexity of natural extensions of efficiently solvable problems
%A Lapets, Andrei
%D March 15, 2010
%U http://www.cs.bu.edu/techreports/2010-005-complexity-extensions.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A problem is in the class NP when it is possible to verify in
polynomial time that a given solution corresponds to a given
problem instance. Those problems for which it is possible to
compute in polynomial time a solution for any problem instance are
also in the class P. We consider a natural conjunction
operation for problems that can be computed in polynomial time. We
introduce the notion of an ``abundant'' problem in P, and
specify conditions under which the conjunction of two abundant
problems in P produces a problem that is NP-complete. We
discuss how this is related to multi-dimensional variants of
common, efficiently computable graph problems.
%R 2010-006
%T The Complexity of Restricted Variants of the Stable Paths Problem
%A Donnelly, Kevin
%A Kfoury, Assaf
%A Lapets, Andrei
%D March 15, 2010
%U http://www.cs.bu.edu/techreports/2010-006-spp-two-monotonic.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Interdomain routing on the Internet is performed using route preference
policies specified independently and arbitrarily by each autonomous system
(AS) in the network. These policies are used in the border gateway protocol
(BGP) by each AS when selecting next-hop choices for routes to each
destination. Conflicts between policies used by different ASs can lead to
routing instabilities that, potentially, cannot be resolved regardless of
how long BGP runs.
The stable paths problem (SPP) is an abstract graph theoretic model of the
problem of selecting next-hop routes for a destination. A solution to this
problem is a set of next-hop choices, one for each AS, that is compatible
with the policies of each AS. In a stable solution each AS has selected its
best next-hop if the next-hop choices of all neighbors are fixed. BGP can be
viewed as a distributed algorithm for solving an SPP instance.
In this report we consider a family of restricted variants of SPP, which we
call f-SPP. We show that several natural variants of f-SPP are NP-complete.
This includes a variant in which each AS is restricted to one of only two
policies, and each of these two policies is based on a monotonic path weight
aggregation function. Furthermore, we show that for networks with particular
topologies and edge weight distributions, there exist efficient centralized
algorithms for solving f-SPP.
%R 2010-007
%T Fast Globally Optimal 2D Human Detection with Loopy Graph Models
%A Tian, Tai-Peng
%A Sclaroff, Stan
%D March 31, 2010
%U http://www.cs.bu.edu/techreports/2010-007-detection-with-loopy-graphs.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper presents an algorithm for recovering the globally optimal 2D human
figure detection using a loopy graph model. This is computationally
challenging because the time complexity scales exponentially in the size of
the largest clique in the graph. The proposed algorithm uses Branch and
Bound (BB) to search for the globally optimal solution. The algorithm
converges rapidly in practice and this is due to a novel method for quickly
computing tree based lower bounds. The key idea is to recycle the dynamic
programming (DP) tables associated with the tree model to look up the tree
based lower bound rather than recomputing the lower bound from scratch. This
technique is further sped up using Range Minimum Query data structures to
provide $O(1)$ cost for computing the lower bound for most iterations of the
BB algorithm. The algorithm is evaluated on the Iterative Parsing dataset
and it is shown to run fast empirically.
%R 2010-008
%T A Green Distributed Cooperation for Network and Content Management
%A Chiaraviglio, Luca
%A Matta, Ibrahim
%D March 31, 2010
%U http://www.cs.bu.edu/techreports/2010-008-distgreencoop.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose a distributed approach in which an Internet Service
Provider (ISP) and a Content Provider (CP) \textit{cooperate} to
minimize total power consumption. Our solution is distributed
between the ISP and the CP to limit shared information, such as
network topology and server load. In particular, we develop
different algorithms adopting dual decomposition and Benders
decomposition techniques. We investigate the performance of the
proposed solutions on realistic case-studies. We compare our
algorithms with a centralized model, whose aim is to minimize
total power consumption. We first adopt convex functions to
model power consumption of devices: all the distributed
algorithms find optimal solutions in this scenario. We then
introduce the possibility of powering off devices. Results show
that in this case the distributed algorithms are close to the
optimal solution, with a power efficiency loss of less than 18%.
For the proposed algorithms we speculate on the trade-off
between the complexity of cooperation and that of the
implementation. In particular, with the dual decomposition
approach only the Lagrange multipliers associated with the
traffic demands and user delay are shared between the ISP and
CP, but a real implementation requires a trusted third-party
server and careful tuning of parameters. In contrast, with
a Benders decomposition technique both the traffic demands and
the ISP power consumption need to be shared, but this
information is exchanged directly. Moreover, the parameters are
easy to set, but the computational time grows linearly with the
number of iterations. Finally, we investigate improvements to
balance the power savings between the ISP and the CP.
%R 2010-009
%T On the Detection of Policy Conflicts in Interdomain Routing
%A Mattar, Karim
%A Epstein, Samuel
%A Matta, Ibrahim
%D April 27, 2010
%U http://www.cs.bu.edu/techreports/2010-009-conflict-detection.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The dynamic policy routing model (DPR) was recently introduced
to explicitly model the dynamics of policy routing. DPR extends
the formalism of the stable paths problem with discrete
synchronous time to capture the propagation of path changes in
any dynamic network using a structure called the causation
chain. In this work, we extend DPR by introducing several novel
structures, namely, causation fences and policy digraphs that
provide further insight into how the dynamics of policy routing
manifest in the network. Using our extensions to DPR, we solve a
fundamental problem: policy conflict detection. We show how the
root cause of any cycle of routing update messages, under any
routing policy configuration, can be precisely inferred as
either a transient route flap or a policy conflict. We also
develop SafetyPulse, a token-based distributed algorithm to
detect policy conflicts in any dynamic network. SafetyPulse has
several novel characteristics, namely, it is privacy-preserving,
computationally efficient and provably correct.
%R 2010-010
%T User-friendly Support for Common Concepts in a Lightweight Verifier
%A Lapets, Andrei
%D May 14, 2010
%U http://www.cs.bu.edu/techreports/2010-010-aartifact-discussion.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Machine verification of formal arguments can only increase our confidence in
the correctness of those arguments, but the costs of employing machine
verification still outweigh the benefits for some common kinds of formal
reasoning activities. As a result, usability is becoming increasingly
important in the design of formal verification tools. We describe the
``aartifact'' lightweight verification system, designed for processing formal
arguments involving basic, ubiquitous mathematical concepts. The system is a
prototype for investigating potential techniques for improving the usability
of formal verification systems. It leverages techniques drawn both from
existing work and from our own efforts. In addition to a parser for a
familiar concrete syntax and a mechanism for automated syntax lookup, the
system integrates (1) a basic logical inference algorithm, (2) a database of
propositions governing common mathematical concepts, and (3) a data
structure that computes congruence closures of expressions involving
relations found in this database. Together, these components allow the
system to better accommodate the expectations of users interested in
verifying formal arguments involving algebraic and logical manipulations of
numbers, sets, vectors, and related operators and predicates. We demonstrate
the reasonable performance of this system on typical formal arguments and
briefly discuss how the system's design contributed to its usability in two
case studies.
%R 2010-011
%T A User-friendly Interface for a Lightweight Verification System
%A Lapets, Andrei
%A Kfoury, Assaf
%D May 14, 2010
%U http://www.cs.bu.edu/techreports/2010-011-aartifact-interface.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
User-friendly interfaces can play an important role in bringing the benefits
of a machine-readable representation of formal arguments to a wider
audience. The ``aartifact'' system is an easy-to-use lightweight verifier for
formal arguments that involve logical and algebraic manipulations of common
mathematical concepts. The system provides validation capabilities by
utilizing a database of propositions governing common mathematical concepts.
The ``aartifact'' system's multi-faceted interactive user interface combines
several approaches to user-friendly interface design: (1) a familiar and
natural syntax based on existing conventions in mathematical practice, (2) a
real-time keyword-based lookup mechanism for interactive, context-sensitive
discovery of the syntactic idioms and semantic concepts found in the
system's database of propositions, and (3) immediate validation feedback in
the form of reformatted raw input. The system's natural syntax and database
of propositions allow it to meet a user's expectations in the formal
reasoning scenarios for which it is intended. The real-time keyword-based
lookup mechanism and validation feedback allow the system to teach the user
about its capabilities and limitations in an immediate, interactive, and
context-aware manner.
%R 2010-012
%T Ontology Support for a Lightweight Formal Verification System
%A Lapets, Andrei
%A Lalwani, Prakash
%A Kfoury, Assaf
%D May 14, 2010
%U http://www.cs.bu.edu/techreports/2010-012-aartifact-ontology.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The usability of verification systems is becoming increasingly
important, and the effective integration of ontologies of formal
facts (definitions, propositions, and syntactic idioms) into
machine verification systems will likely play a role in improving
the usability of such systems. The ``aartifact'' lightweight
verification system utilizes an ontology of formal propositions in
order to support lightweight verification of formal arguments that
involve common mathematical concepts. The ontology is stored
within a relational database, and can be assembled and extended
using a simple web interface by contributors who are domain
experts. The database can be compiled into two separate components
of the ``aartifact'' system: a verifier component that computes
congruence closures of expressions containing relations and
predicates found in the ontology, and a JavaScript application
that interactively presents to users information about the
constants, operators, relations, predicates, syntactic constructs,
and idioms found in the ontology (and, thus, supported by the
verifier). In this way, the database serves both to improve the
verification system's capacity to infer implicit applications of
logical propositions within a user's formal argument and to
inform users in a context-aware and structured manner of the
verification system's capabilities and limitations.
%R 2010-014
%T On the Universal Generation of Mobility Models
%A Medina, Alberto
%A Gursun, Gonca
%A Basu, Prithwish
%A Matta, Ibrahim
%D May 14, 2010
%U http://www.cs.bu.edu/techreports/2010-014-ummf.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Mobility models have traditionally been tailored to specific
application domains such as human, military, or ad hoc
transportation scenarios. This tailored approach often renders a
mobility model useless when the application domain
changes. Furthermore, the failure to adapt the mobility model to
accurately match the new domain naturally leads to wrong
conclusions about the performance of protocols and applications
running atop it. In this paper, we propose a mobility modeling
framework based on the observation that the mobility
characteristics of most mobility-based applications can be
captured in terms of a few fundamental factors: (1) Targets; (2)
Obstacles; (3) Dynamic Events; (4) Navigation; (5) Steering
behaviors; and (6) Dynamic Behaviors. We have designed and
implemented a Universal Mobility Modeling Framework (UMMF),
which enables the instantiation of a mobility model from a wide
universe of possibilities defined by the aforementioned factors.
We describe the mapping from application-domain specifics to
UMMF elements, demonstrating the power and flexibility of our
approach by capturing representative mobility models with good
accuracy in terms of a large number of topological metrics. We
also describe several specific mobility scenarios and their
UMMF-based model representations.
%R 2010-015
%T Online Cache Modeling for Commodity Multicore Processors
%A West, Rich
%A Zaroo, Puneet
%A Waldspurger, Carl
%A Zhang, Xiao
%D July 2, 2010
%U http://www.cs.bu.edu/techreports/2010-015-multicore-online-cache-modeling.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Modern chip-level multiprocessors (CMPs) contain multiple
processor cores sharing a common last-level cache, memory
interconnects, and other hardware resources. Workloads running
on separate cores compete for these resources, often resulting
in highly variable performance. It is generally desirable to
co-schedule workloads that have minimal resource contention, in
order to improve both performance and fairness. Unfortunately,
commodity processors expose only limited information about the
state of shared resources such as caches to the software
responsible for scheduling workloads that execute concurrently.
To make informed resource-management decisions, it is important
to obtain accurate measurements of per-workload cache
occupancies and their impact on performance, often summarized by
utility functions such as miss-ratio curves (MRCs). In this
paper, we first introduce an efficient online technique for
estimating the cache occupancy of individual software threads
using only commonly available hardware performance counters. We
derive an analytical model as the basis of our occupancy
estimation, and extend it for improved accuracy on modern cache
configurations, considering the impact of set-associativity,
line replacement policy, and memory locality effects. We
demonstrate the effectiveness of occupancy estimation with a
series of CMP simulations in which SPEC benchmarks execute
concurrently on multiple cores. Leveraging our occupancy
estimation technique, we also introduce a lightweight approach
for online MRC construction, and demonstrate its effectiveness
using a prototype implementation in the VMware ESX Server
hypervisor. We present a series of experiments involving SPEC
benchmarks, comparing the MRCs we construct online with MRCs
generated offline in which various cache sizes are enforced via
static page coloring.
%R 2010-016
%T Fast Multi-Aspect 2D Human Detection
%A Tian, Tai-Peng
%A Sclaroff, Stan
%D July 2, 2010
%U http://www.cs.bu.edu/techreports/2010-016-multiaspect-2d-detection.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We address the problem of detecting human figures in images, taking
into account that the image of the human figure may be taken from a
range of viewpoints. We capture the geometric deformations of the 2D
human figure using an extension of the Common Factor Model (CFM) of
Lan and Huttenlocher. The key contribution of the paper is an improved
iterative message passing inference algorithm that runs faster than
the original CFM algorithm. This is based on the insight that
messages created using the distance transform are shift invariant and
therefore messages can be created once and then shifted for subsequent
iterations. Since shifting ($O(1)$ complexity) is faster than
computing a distance transform ($O(n)$ complexity), a significant
speedup is observed in the experiments. We demonstrate the
effectiveness of the new model for the human parsing problem using the
Iterative Parsing data set, and results are competitive with the
state-of-the-art detection algorithm of Andriluka et al.
%R 2010-017
%T Learning Actions From the Web
%A Ikizler-Cinbis, Nazli
%A Cinbis, Gokberk
%A Sclaroff, Stan
%D July 6, 2010
%U http://www.cs.bu.edu/techreports/2010-017-learning-actions-from-web.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper proposes a generic method for action recognition in
uncontrolled videos. The idea is to use images collected from
the Web to learn representations of actions and use this
knowledge to automatically annotate actions in videos. Our
approach is unsupervised in the sense that it requires no human
intervention other than the text querying. Its benefits are
two-fold: 1) we can improve retrieval of action images, and 2)
we can collect a large generic database of action poses, which
can then be used in tagging videos. We present experimental
evidence that actions in videos can be annotated using action
images collected from the Web.
%R 2010-018
%T Object Recognition and Localization via Spatial Instance Embedding
%A Ikizler-Cinbis, Nazli
%A Sclaroff, Stan
%D July 6, 2010
%U http://www.cs.bu.edu/techreports/2010-018-spatial-instance-embedding.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose an approach for improving object recognition and
localization using spatial kernels together with instance
embedding. Our approach treats each image as a bag of instances
(image features) within a multiple instance learning framework,
where the relative locations of the instances are considered as
well as the appearance similarity of the localized image
features. The introduced spatial kernel augments the recognition
power of the instance embedding in an intuitive and effective
way, providing increased localization performance. We test our
approach over two object datasets and present promising results.
%R 2010-019
%T Object, Scene and Actions: Combining Multiple Features for Human Action Recognition
%A Ikizler-Cinbis, Nazli
%A Sclaroff, Stan
%D July 6, 2010
%U http://www.cs.bu.edu/techreports/2010-019-combining-features.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In many cases, human actions can be identified not only by the
singular observation of the human body in motion, but also
by properties of the surrounding scene and the related objects. In
this paper, we look into this problem and propose an approach
for human action recognition that integrates multiple feature
channels from several entities such as objects, scenes and
people. We formulate the problem in a multiple instance learning
(MIL) framework, based on multiple feature channels. By using a
discriminative approach, we join multiple feature channels
embedded in the MIL space. Our experiments over the large
YouTube dataset show that scene and object information can be
used to complement person features for human action recognition.
%R 2010-020
%T Embedding Games: Distributed Resource Management with Selfish Users (PhD Thesis)
%A Londono, Jorge
%D July 20, 2010
%U http://www.cs.bu.edu/techreports/2010-020-PhD-Thesis-Jorge-Londono.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Large-scale distributed computing infrastructures pose
challenging resource management problems, which could be
addressed by adopting one of two perspectives. On the one hand,
the problem could be framed as a global optimization that aims
to minimize some notion of system-wide (social) cost. On the
other hand, the problem could be framed in a game-theoretic
setting whereby rational, selfish users compete for a share of
the resources so as to maximize their private utilities with
little or no regard for system-wide objectives. This
game-theoretic setting is particularly applicable to emerging
cloud and grid environments, testbed platforms, and many
networking applications.
By adopting the first, global optimization perspective, this
thesis presents NetEmbed: a framework, associated mechanisms,
and implementations that enable the mapping of requested
configurations to available infrastructure resources.
By adopting the second, game-theoretic perspective, this thesis
defines and establishes the premises of two resource acquisition
mechanisms: Colocation Games and Trade and Cap. Colocation Games
enable the modeling and analysis of the dynamics that result
when rational, selfish parties interact in an attempt to
minimize the individual costs they incur to secure shared
resources necessary to support their application QoS or SLA
requirements. Trade and Cap is a market-based scheduling and
load-balancing mechanism that facilitates the trading of
resources when users have a mixture of rigid and fluid jobs, and
incentivizes users to behave in ways that result in better
load-balancing of shared resources.
In addition to developing their analytical underpinnings, this
thesis establishes the viability of NetEmbed, Colocation Games,
and Trade and Cap by presenting implementation blueprints and
experimental results for many variants of these mechanisms.
The results presented in this thesis pave the way for the
development of economically-sound resource acquisition and
management solutions in two emerging, and increasingly important
settings. In pay-as-you-go settings, where pricing is based on
usage, this thesis anticipates new service offerings that enable
efficient marketplaces in the presence of non-cooperative,
selfish agents. In settings where pricing is not a function of
usage, this thesis anticipates the development of service
offerings that enable trading of usage rights to maximize the
utility of a shared infrastructure to its tenants.
%R 2010-021
%T Using Lightweight Formal Methods for JavaScript Security
%A Reynolds, Mark
%D July 23, 2010
%U http://www.cs.bu.edu/techreports/2010-021-lightweight-formal-javascript-security.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The goal of this work was to apply lightweight formal methods to the
study of the security of the JavaScript language. Previous work has
shown that lightweight formal methods present a new approach to the
study of security in the context of the Java Virtual Machine (JVM).
The current work has attempted to codify best current practices in the
form of a security model for JavaScript. Such a model is a necessary
component in analyzing browser actions for vulnerabilities, but it is
not sufficient. It is also necessary to capture actual
browser event traces and incorporate these into the model. The work
described herein demonstrates (a) that it is possible to construct a
model for JavaScript security that captures important properties of
current best practices within browsers; and (b) that an event
translator has been written that captures the dynamic properties of
browser site traversal in such a way that model analysis is tractable,
and yields important information about the satisfaction or refutation
of the static security rules.
%R 2010-022
%T Cloud-based Content Distribution on a Budget
%A Albanese, Francesco
%A Carra, Damiano
%A Michiardi, Pietro
%A Bestavros, Azer
%D August 12, 2010
%U http://www.cs.bu.edu/techreports/2010-022-cyclops.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
To leverage the elastic nature of cloud computing, a solution
provider must be able to accurately gauge demand for its
offering. For applications that involve swarm-to-cloud
interactions, gauging such demand is not straightforward. In
this paper, we propose a general framework, analyze a
mathematical model, and present a prototype implementation of
a canonical swarm-to-cloud application, namely peer-assisted
content delivery. Our system -- called Cyclops -- dynamically
adjusts the off-cloud bandwidth consumed by content servers
(which represents the bulk of the provider's cost) to feed a
set of swarming clients, based on a feedback signal that
gauges the real-time health of the swarm. Our extensive
evaluation of Cyclops in a variety of settings -- including
controlled PlanetLab and live Internet experiments involving
thousands of users -- shows a significant reduction in content
distribution costs (by as much as two orders of magnitude)
when compared to non-feedback-based swarming solutions, with
minor impact on content delivery times.
%R 2010-023
%T Customizable Keyboard
%A Missimer, Eric
%A Epstein, Samuel
%A Magee, John
%A Betke, Margrit
%D August 12, 2010
%U http://www.cs.bu.edu/techreports/2010-023-customizable-keyboard.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Customizable Keyboard is an on-screen keyboard designed to be flexible
and expandable. Instead of giving the user a keyboard layout,
Customizable Keyboard allows the user to create a layout that is
accommodating to the user's needs. Customizable Keyboard also allows
the user to select from a variety of ways to interact with the
keyboard including but not limited to using the mouse pointer to
select keys and different types of scan based systems. Customizable
Keyboard provides more functionality than a typical onscreen keyboard
including the ability to control infrared devices such as TVs and send
Twitter Tweets.
%R 2010-024
%T Angels In the Cloud -- A Peer-Assisted Bulk-Synchronous Content Distribution Service
%A Sweha, Raymond
%A Ishakian, Vatche
%A Bestavros, Azer
%D August 12, 2010
%U http://www.cs.bu.edu/techreports/2010-024-cloud-angels.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Leveraging client upload capacity through peer-assisted content
distribution was shown to decrease the load on content
providers, while also improving average distribution
times. These benefits, however, are limited by the disparity
between client upload and download speeds, especially in
scenarios requiring a minimum distribution time (MDT) of a piece
of content to a set of clients. Achieving MDT is crucial for
bulk-synchronous applications, when every client in a set
must wait for all other clients in the set to finish their
downloads before being able to make use of the downloaded
content. In this paper, we propose the use of dedicated servers,
which we call angels, to accelerate peer-assisted content
distribution in general, and to minimize MDT in particular. An
angel is not itself the content origin, nor is it interested in
fully downloading the content; its only purpose is to enable a
peer-assisted content distribution scheme to approach the
theoretical lower-bound for MDT. To overcome scalability issues
inherent in an optimal MDT construction, we propose and evaluate
a content exchange strategy involving angels, which we call
"Group Tree". In addition to simulation results that demonstrate
the near-optimal performance of our proposed approach, we
present the architecture and implementation of CloudAngels
-- a service that allows the elastic, on-the-fly deployment of
angels (in the cloud) to assist a content provider (off the
cloud) in realizing its MDT objective.
%R 2010-025
%T Formal Verification of SLA Transformations
%A Ishakian, Vatche
%A Lapets, Andrei
%A Bestavros, Azer
%A Kfoury, Assaf
%D August 24, 2010
%U http://www.cs.bu.edu/techreports/2010-025-formal-sla-verification.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Desirable application performance is typically guaranteed
through the use of Service Level Agreements (SLAs)
that specify fixed fractions of resource capacities that must be
allocated for unencumbered use by the application. The mapping
between what constitutes desirable performance and SLAs is
not unique: multiple SLA expressions might be functionally
equivalent. Having the flexibility to transform SLAs from one
form to another in a manner that is provably safe would enable
hosting solutions to achieve significant efficiencies. This paper
demonstrates the promise of such an approach by proposing
a type-theoretic framework for the representation and safe
transformation of SLAs. Based on that framework, the paper
describes a methodical approach for the inference of efficient
and safe mappings of periodic, real-time tasks to the physical and
virtual hosts that constitute a hierarchical scheduler. Extensive
experimental results support the conclusion that the flexibility
afforded by safe SLA transformations has the potential to yield
significant savings.
%R 2010-026
%T Set Based Modeling Of Objects And Their Context (MA Thesis)
%A Cinbis, R. Gokberk
%D August 24, 2010
%U http://www.cs.bu.edu/techreports/0000-000-TBA.html
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In computer vision, many image entities can be represented as sets of
high-dimensional items. For example, an object in an image can be
represented as a set of image patches, where each image patch has a
feature vector encoding the local appearance. Set-based
representations can be rich and powerful in computer vision
applications; however, directly training classification models on sets
of unordered items, where each set can have varying cardinality, can
be difficult. In this thesis, the SetBoost supervised learning
algorithm is proposed for building set classifiers. An important
feature of the SetBoost formulation is that it reduces the problem of
learning a set classifier to a series of vector classifier learning
problems. As a result, SetBoost can utilize traditional vector
classification algorithms, such as decision trees, Support Vector
Machines, etc., in order to build set classifiers. In this thesis, the
SetBoost algorithm is demonstrated using decision tree classifiers,
and a formulation for set-based tuning of decision tree classifiers is
proposed for use within the SetBoost learning algorithm.
SetBoost has the potential to be useful in many applications. In the
second part of the thesis, a novel contextual object detection model
that uses SetBoost is proposed. In natural images, objects tend to
appear in certain arrangements with respect to the other objects
(object context) and the scene (scene context). The aim of our
proposed model is to improve localization and recognition accuracy of
object detection algorithms using object context and scene
context. The relationships between objects in an image are represented
in terms of sets, where each item has a feature vector encoding the
relationships between a pair of objects detected in the image. Scene
context is encoded based on the position of the object in the image
and a coarse image shape descriptor. The SetBoost classifiers are
trained to rescore detected objects in the image, based on
object-object and object-scene contextual relations. In the test
phase, for a given input image, first, single object detectors are
applied, and, then, these detections are rescored using the learned
context model. Our approach outperforms existing state-of-the-art
methods in challenging object detection benchmark datasets. In the VOC
2007 dataset, we observe that our context model increases the object
detection average precision score from 26.76 to 30.74, whereas
existing state-of-the-art performance is limited to 28.72.
Similarly, in the SUN dataset, our context model increases the object
detection average precision score from 7.06 to 8.75, whereas
existing state-of-the-art performance is 8.37.
%R 2010-027
%T Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight
%A Theriault, Diane
%A Wu, Zheng
%A Hristov, Nickolay
%A Swartz, Sharon
%A Breuer, Kenneth
%A Kunz, Thomas
%A Betke, Margrit
%D September 6, 2010
%U http://www.cs.bu.edu/techreports/2010-027-3d-bat-trajectories.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The Brazilian free-tailed bat, Tadarida brasiliensis, roosts in
very large colonies, consisting of hundreds of thousands of
individuals. Each night, bats emerge from their day roosts in dense
columns in a highly coordinated manner. We recorded short segments of
an emergence using three spatially-calibrated and
temporally-synchronized thermal infrared cameras. We applied
stereoscopic methods to reconstruct the three-dimensional positions of
these flying bats. We applied a multiple hypothesis tracking
algorithm to obtain 7,016 reconstructed trajectories. Our analysis
includes estimates of the velocities of bats in flight, the distances
between animals within the emergence column, and the angles subtended
by the bats and their nearest neighbors.
%R 2010-028
%T HAIL: hierarchical adaptive interface layout
%A Magee, John
%A Betke, Margrit
%D September 6, 2010
%U http://www.cs.bu.edu/techreports/2010-028-hail.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We present a framework to adapt software to the needs of individuals
with severe motion disabilities who use mouse substitution
interfaces. Typically, users are required to adapt to the interfaces
that they wish to use. We propose interfaces that change and adapt to
the user and their individual abilities. The Hierarchical Adaptive
Interface Layout (HAIL) model is a set of specifications for the
design of user interface applications that adapt to the user. In HAIL
applications, all of the interactive components are placed on
configurable toolbars along the edge of the screen. We show two
HAIL-based applications: a general purpose web browser and a Twitter
client.
%R 2010-029
%T Adaptive mappings for mouse-replacement interfaces
%A Magee, John
%A Epstein, Samuel
%A Missimer, Eric
%A Betke, Margrit
%D September 6, 2010
%U http://www.cs.bu.edu/techreports/2010-029-3d-motion-trajectories.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Users of mouse-replacement interfaces may have difficulty conforming
to the motion requirements of their interface system. We have observed
users with severe motor disabilities who controlled the mouse pointer
with a head tracking interface. Our analysis shows that some users
may be able to move in some directions more easily than in other
directions. We propose several mouse pointer mappings that adapt to
the user's movement abilities. These mappings will take into account
the user's motions in two or three dimensions to move the mouse
pointer in the intended direction.
%R 2010-030
%T Tracking-Reconstruction or Reconstruction-Tracking? Comparison of Two Multiple Hypothesis Tracking Approaches to Interpret 3D Object Motion from Several Camera Views
%A Wu, Zheng
%A Hristov, Nickolay
%A Swartz, Sharon
%A Kunz, Thomas
%A Betke, Margrit
%D September 6, 2010
%U http://www.cs.bu.edu/techreports/2010-030-3d-tracking.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We developed two methods for tracking multiple objects using several
camera views. The methods use the Multiple Hypothesis Tracking (MHT)
framework to solve both the across-view data association problem
(i.e., finding object correspondences across several views) and the
across-time data association problem (i.e., the assignment of current
object measurements to previously established object tracks). The
"tracking-reconstruction method" establishes two-dimensional (2D)
object tracks for each view and then reconstructs their
three-dimensional (3D) motion trajectories. The
"reconstruction-tracking method" assembles 2D object measurements from
all views, reconstructs 3D object positions, and then matches these 3D
positions to previously established 3D object tracks to compute 3D
motion trajectories. For both methods, we propose techniques for
pruning the number of association hypotheses and for gathering track
fragments. We tested and compared the performance of our methods on
thermal infrared video of bats using several performance measures.
Our analysis of video sequences with different levels of densities of
flying bats reveals that the reconstruction-tracking method produces
fewer track fragments than the tracking-reconstruction method but
creates more false positive 3D tracks.
%R 2010-032
%T On Modeling Speed-based Vertical Handovers in Vehicular Networks "Dad, slow down, I am watching the movie"
%A Esposito, Flavio
%A Vegni, Anna Maria
%A Matta, Ibrahim
%A Neri, Alessandro
%D September 7, 2010
%U http://www.cs.bu.edu/techreports/2010-032-vertical-handover-model.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Although vehicular ad hoc networks are emerging as a novel paradigm
for safety services, supporting real-time applications (e.g.,
video-streaming, Internet browsing, online gaming, etc.) while
maintaining ubiquitous connectivity remains a challenge due to both
high vehicle speed and the non-homogeneous nature of the network access
infrastructure. To guarantee acceptable Quality-of-Service and to
support seamless connectivity, vertical handovers across different
access networks are performed. In this work we prove the
counterintuitive result that in vehicular environments, even if a
candidate network has significantly higher bandwidth, it is not always
beneficial to abandon the serving network. To this end, we introduce
an analytical model for a vertical handover algorithm based on vehicle
speed. We argue that the proposed approach may help providers
incentivize safety by forcing vehicular speed reduction to guarantee
acceptable Quality-of-Service for real-time applications.
%R 2010-034
%T On the Impact of Seed Scheduling in Peer-to-Peer Networks
%A Esposito, Flavio
%A Matta, Ibrahim
%A Bera, Debajyoti
%A Michiardi, Pietro
%D October 15, 2010
%U http://www.cs.bu.edu/techreports/2010-034-seed-scheduling-impact.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In a content distribution (file sharing) scenario, the initial phase
is delicate due to the lack of global knowledge and the dynamics of
the overlay. Unwise piece dissemination in this phase can cause
delays in reaching steady state, thus increasing file download times.
After showing that finding the scheduling strategy for optimal
dissemination is computationally hard, even when the offline knowledge
of the overlay is given, we devise a new class of scheduling
algorithms at the seed (source peer with full content), based on a
proportional fair approach, and we implement them on a real file
sharing client. In addition to simulation results, we validated on
our own file sharing client (BUTorrent) that our solution improves the
average download time of a standard file sharing protocol by up to
25%. Moreover, we give theoretical upper bounds on the
improvements that our scheduling strategies may achieve.
%R 2010-035
%T On Supporting Mobility and Multihoming in Recursive Internet Architectures
%A Ishakian, Vatche
%A Akinwumi, Joseph
%A Esposito, Flavio
%A Matta, Ibrahim
%D October 15, 2010
%U http://www.cs.bu.edu/techreports/2010-035-rina-mobility.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
As the Internet has evolved and grown, an increasing number of nodes
(hosts or autonomous systems) have become multihomed, i.e., a node is
connected to more than one network. Mobility can be viewed as a
special case of multihoming --- as a node moves, it unsubscribes from
one network and subscribes to another, which is akin to one interface
becoming inactive and another active. The current Internet
architecture has been facing significant challenges in effectively
dealing with multihoming (and consequently mobility), which has led to
the emergence of several custom point-solutions. The Recursive
InterNetwork Architecture (RINA) was recently proposed as a clean-slate
solution to the current problems of the Internet. In this paper, we
present a specification of the process of ROuting in Recursive
Architectures (RORA). We also perform an average-case cost analysis to
compare the multihoming / mobility support of RINA against that of
other approaches such as LISP and Mobile-IP. Extensive experimental
results confirm the premise that the RINA architecture and its RORA
routing approach are inherently better suited for supporting mobility
and multihoming.
%R 2010-036
%T Virtual-CPU Scheduling in the Quest Operating System
%A Danish, Matthew
%A Li, Ye
%A West, Rich
%D November 10, 2010
%U http://www.cs.bu.edu/techreports/2010-036-quest-virtual-scheduling.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper describes the scheduling framework for a new operating
system called ``Quest''. The three main goals of Quest are to ensure
safety, predictability and efficiency of software execution. For this
paper, we focus on one aspect of predictability, involving the
integrated management of tasks and I/O events such as
interrupts. Quest's scheduling infrastructure is based around the
concept of a virtual CPU (VCPU). Using both Main and I/O VCPUs, we are
able to separate the CPU bandwidth consumed by tasks from that used to
complete I/O processing. We introduce a priority-inheritance
bandwidth-preserving server policy for I/O management, called PIBS. We
show how PIBS operates with lower cost and higher throughput than a
comparable Sporadic Server for managing I/O transfers that require
small bursts of CPU time. Using a hybrid system of Sporadic Servers
for Main VCPUs, and PIBS for I/O VCPUs, we show how to maintain
temporal isolation between multiple tasks and I/O transfers from
different devices. We believe Quest's VCPU scheduling infrastructure
is scalable enough to operate on future multi- and many-core systems
supporting large numbers of threads. For a system of 24 VCPUs, we
observe a CPU scheduling overhead of approximately 0.3% when VCPU
budget is managed in 1ms units.
%R 2010-037
%T Describing and Forecasting Video Access Patterns
%A Gursun, Gonca
%A Crovella, Mark
%A Matta, Ibrahim
%D November 10, 2010
%U http://www.cs.bu.edu/techreports/2010-037-video-access-patterns.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Computer systems are increasingly driven by workloads that
reflect large-scale social behavior, such as rapid changes in the
popularity of media items like videos. Capacity planners and system
designers must plan for rapid, massive changes in workloads when such
social behavior is a factor. In this paper we make two contributions
intended to assist in the design and provisioning of such systems. We
analyze an extensive dataset consisting of the daily access counts of
hundreds of thousands of YouTube videos. In this dataset, we find that
there are two types of videos: those that show rapid changes in
popularity, and those that are consistently popular over long time
periods. We call these two types rarely-accessed and
frequently-accessed videos, respectively. We observe that most of the
videos in our data set clearly fall in one of these two types. For
each type of video we ask two questions: first, are there relatively
simple models that can describe its daily access patterns? And second,
can we use these simple models to predict the number of accesses that
a video will have in the near future, as a tool for capacity planning?
To answer these questions we develop two different frameworks for
characterization and forecasting of access patterns. We show that for
frequently-accessed videos, daily access patterns can be extracted via
principal component analysis, and used efficiently for
forecasting. For rarely-accessed videos, we demonstrate a clustering
method that allows one to classify bursts of popularity and use those
classifications for forecasting.
%R 2011-001
%T Computational Entropy and Information Leakage (MA Thesis)
%A Fuller, Benjamin
%A Reyzin, Leonid
%D January 7, 2011
%U http://www.cs.bu.edu/techreports/2011-001-MA-Thesis-Benjamin-Fuller.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We investigate how information leakage reduces computational entropy
of a random variable X. Recall that HILL and metric computational
entropy are parameterized by quality (how distinguishable is X from a
variable Z that has true entropy) and quantity (how much true entropy
is there in Z). We prove an intuitively natural result: conditioning
on an event of probability p reduces the quality of metric entropy by
a factor of p and the quantity of metric entropy by log 1/p (note that
this means that the reduction in quantity and quality is the same,
because the quantity of entropy is measured on logarithmic scale). Our
result improves previous bounds of Dziembowski and Pietrzak (FOCS
2008), where the loss in the quantity of entropy was related to its
original quality. The use of metric entropy tightens the result of
Reingold et al. (FOCS 2008) and makes it easy to measure entropy
even after conditioning on several events. Further, we simplify
dealing with information leakage by investigating conditional metric
entropy. We show that, conditioned on leakage of L bits, metric
entropy is reduced by a factor of 2^L in quality and by L in quantity. Our
formulation allows us to formulate a chain rule for leakage on
computational entropy. We show that conditioning on L bits of leakage
reduces conditional metric entropy by L bits. This is the same loss as
leaking from unconditional metric entropy. This result makes it easy
to measure entropy even after several rounds of information leakage.
%R 2011-002
%T MorphoSys: Efficient Colocation of QoS-Constrained Workloads in the Cloud
%A Ishakian, Vatche
%A Bestavros, Azer
%D January 25, 2011
%U http://www.cs.bu.edu/techreports/2011-002-morphosys.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In hosting environments such as IaaS clouds, desirable application
performance is usually guaranteed through the use of Service Level
Agreements (SLAs), which specify minimal fractions of resource
capacities that must be allocated for unencumbered use for proper
operation. Arbitrary colocation of applications with different SLAs on
a single host may result in inefficient utilization of the host’s
resources. In this paper, we propose that periodic resource allocation
and consumption models -- often used to characterize real-time
workloads -- be used for a more granular expression of SLAs. Our
proposed SLA model has the salient feature that it exposes
flexibilities that enable the infrastructure provider to safely
transform SLAs from one form to another for the purpose of achieving
more efficient colocation. Towards that goal, we present MorphoSys: a
framework for a service that allows the manipulation of SLAs to enable
efficient colocation of arbitrary workloads in a dynamic setting. We
present results from extensive trace-driven simulations of colocated
Video-on-Demand servers in a cloud setting. These results show that
potentially significant reductions in wasted resources (by as much as
60%) are possible using MorphoSys.
%R 2011-003
%T Let the Market Drive Deployment: A Strategy for Transitioning to BGP Security
%A Gill, Phillipa
%A Schapira, Michael
%A Goldberg, Sharon
%D February 4, 2011
%U http://www.cs.bu.edu/techreports/2011-003-sbgp-transition.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
With a cryptographic root-of-trust for Internet routing (RPKI) on the
horizon, we can finally start planning the deployment of one of the
secure interdomain routing protocols proposed over a decade ago
(Secure BGP, secure origin BGP). However, if experience with IPv6 is
any indicator, this will be no easy task. Security concerns alone seem
unlikely to provide sufficient local incentive to drive the deployment
process forward. Worse yet, the security benefits provided by the
S*BGP protocols do not even kick in until a large number of ASes have
deployed them. Instead, we appeal to ISPs' interest in increasing
revenue-generating traffic. We propose a strategy that governments
and industry groups can use to harness ISPs' local business objectives
and drive global S*BGP deployment. We evaluate our deployment strategy
using theoretical analysis and large-scale simulations on empirical
data. Our results give evidence that the market dynamics created by
our proposal can transition the majority of the Internet to S*BGP.
%R 2011-004
%T Safe Compositional Network Sketches: NetSketch Tool Implementation
%A Soule, Nate
%A Bestavros, Azer
%A Kfoury, Assaf
%A Lapets, Andrei
%D February 8, 2011
%U http://www.cs.bu.edu/techreports/2011-004-netsketch-implementation.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
NetSketch is a tool that enables the specification of network-flow
applications and the certification of desirable safety properties
imposed thereon. NetSketch is conceived to assist system integrators
in two types of activities: modeling and design. As a modeling tool,
it enables the abstraction of an existing system so as to retain
sufficient detail to enable future analysis of safety
properties. As a design tool, NetSketch enables the exploration of
alternative safe designs as well as the identification of minimal
requirements for outsourced subsystems. NetSketch embodies a
lightweight formal verification philosophy, whereby the power (but not
the heavy machinery) of a rigorous formalism is made accessible to
users via a friendly interface. NetSketch does so by exposing
tradeoffs between exactness of analysis and scalability, and by
combining traditional whole-system analysis with a more flexible
compositional analysis approach based on a strongly typed,
Domain-Specific Language (DSL) to specify network configurations at
various levels of sketchiness along with invariants that need to be
enforced thereupon. In this paper we discuss a first implementation of
the NetSketch system. We begin with a brief introduction to the
NetSketch concept and formalism, and then discuss the methodology
behind the type generation process as well as the technical
implementation of the overall system. In a companion paper, we define
the formal system underlying the operation of NetSketch, in particular
the DSL behind NetSketch’s user interface when used in "sketch
mode", and prove its soundness relative to appropriately-defined
notions of validity.
%R 2011-005
%T The Filter-Placement Problem and its Application to Content De-Duplication
%A Bestavros, Azer
%A Erdos, Dora
%A Ishakian, Vatche
%A Lapets, Andrei
%A Terzi, Evimaria
%D February 21, 2011
%U http://www.cs.bu.edu/techreports/2011-005-filter-placement.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In many information networks, data items such as updates
in social networks, news flowing through interconnected RSS
feeds and blogs, measurements in sensor networks, route updates
in ad-hoc networks, etc. propagate in an uncoordinated
manner: nodes often relay information they receive
to neighbors, independent of whether or not these neighbors
received such information from other sources. This uncoordinated
data dissemination may result in significant, yet
unnecessary communication and processing overheads, ultimately
reducing the utility of information networks. To
alleviate the negative impacts of this information multiplicity
phenomenon, we propose that a subset of nodes (selected
at key positions in the network) carry out additional information
de-duplication functionality, namely, the removal
(or significant reduction) of the duplicative data items relayed
through them. We refer to such nodes as filters. We
formally define the Filter Placement problem as a combinatorial
optimization problem, and study its computational
complexity for different types of graphs. We also present
polynomial-time approximation algorithms for the problem.
Our experimental results, which we obtained through extensive
simulations on synthetic and real-world information
flow networks, suggest that in many settings a relatively
small number of filters is fairly effective in removing a large
fraction of duplicative information.
%R 2011-006
%T Efficient Techniques for Recovering 2D Human Body Poses from Images (PhD Thesis)
%A Tian, Tai-Peng
%D February 23, 2011
%U http://www.cs.bu.edu/techreports/2011-006-PhD-Thesis-Tai-Peng-Tian.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Human parsing recovers the 2D spatial layout of a human figure in an
image. First, patches in the image that resemble body parts, i.e.,
head, torso and limbs, are identified, then a coherent human figure is
assembled from these candidate positions. The human model is
represented as a graph where each vertex represents a body part and
each edge represents a relationship between parts. If the graph is a
tree, then the optimal solution can be recovered efficiently using the
Min-Sum (MS) algorithm. Tree models often return incorrect solutions
with the left and right legs stacked on top of one another. To
overcome this problem, we add constraints to the tree model, yielding
a graph that contains loops. Finding the optimal solution for a loopy
graph is computationally intensive. We propose a Branch and Bound
search algorithm to recover the optimal solution. Our algorithm
converges quickly in practice due to a novel tree structured lower
bound and a fast way of evaluating these lower bounds. Naively,
evaluating each lower bound requires $O(nh)$ time for a graph with $n$
vertices and $h$ candidate body part locations. We develop an $O(1)$
time method for evaluating the lower bound (in most iterations of the
algorithm) by reusing messages from the MS algorithm and using a Range
Minimum Query data structure. We also propose a human parsing model
that encodes the viewpoint and walking phase of the human figure using
the Common Factor Model (CFM). The main computational bottleneck of
the CFM human parsing algorithm involves message creation for each
iteration of the MS algorithm. The original CFM inference requires
$O(kn)$ messages to be created for $k$ iterations of the MS algorithm
in a graph with $n$ vertices. Our new algorithm reduces this to
$O(n)$ messages created. This speedup is based on the insight that
the messages are shifted from one iteration to the next and,
therefore, messages can be created once and then shifted in subsequent
iterations (shifting is an efficient operation which requires $O(1)$
time). In our experiments, the two proposed algorithms yield an order
of magnitude computational speedup over competing algorithms.
%R 2011-007
%T Camera Canvas: Image Editing Software for People with Disabilities
%A Kwan, Christopher
%A Betke, Margrit
%D March 2, 2011
%U http://www.cs.bu.edu/techreports/2011-007-camera-canvas.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We developed Camera Canvas, photo editing and picture drawing software
for individuals who cannot use their hands to operate a computer mouse.
Camera Canvas is designed for use with camera-based mouse-replacement
interfaces that allow a user with severe motion impairments to control
the mouse pointer by moving his or her head in front of a web camera.
To make Camera Canvas accessible to as wide a range of movement
abilities as possible, we designed its user interface so that it can be
extensively tailored to meet individual user needs. We conducted
studies with users without disabilities, who used Camera Canvas with
the mouse-replacement input system Camera Mouse. The studies showed
that Camera Canvas is easy to understand and use, even for participants
without prior experience with the Camera Mouse. An experiment with a
participant with severe cerebral palsy and quadriplegia showed that he
was able to use some but not all of the functionality of Camera Canvas.
Ongoing work includes conducting additional user studies and improving
the software based on feedback.
%R 2011-008
%T Adaptive mouse-replacement interface control functions for users with disabilities
%A Magee, John
%A Epstein, Samuel
%A Missimer, Eric
%A Kwan, Christopher
%A Betke, Margrit
%D March 2, 2011
%U http://www.cs.bu.edu/techreports/2011-008-adaptive-mouse-functions.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We discuss experiences employing a video-based mouse-replacement
interface system, the Camera Mouse, at care facilities for individuals
with severe motion impairments and propose adaptations of the system.
Traditional approaches to assistive technology are often inflexible,
requiring users to adapt their limited motions to the requirements of
the system. Such systems may have static or difficult-to-change
configurations that make it challenging for multiple users to share the
same system or for users whose motion abilities slowly degenerate. As
users fatigue, they may experience more limited motion ability or
additional unintended motions. To address these challenges, we propose
adaptive mouse-control functions to be used in our mouse-replacement
system. These functions can be changed to adapt the technology to the
needs of the user, rather than making the user adapt to the technology.
We present observations of an individual with severe cerebral palsy
using our system.
%R 2011-009
%T Menu Controller: Making Existing Software More Accessible for People with Motor Impairments
%A Paquette, Isaac
%A Kwan, Christopher
%A Betke, Margrit
%D March 2, 2011
%U http://www.cs.bu.edu/techreports/2011-009-menu-controller.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Menu Controller was developed to make existing software more accessible
for people with severe motor impairments, especially individuals who
use mouse-replacement input systems. Windows applications have menus
that are difficult to access by users with limited muscle control, due
to the size and placement of the menu entries. The goal of Menu
Controller is to take these entries and generate customizable user
interfaces that can be catered to the individual user. Menu Controller
accomplishes this by harvesting existing menu items without needing to
change any existing code in these applications and then by displaying
them to the user in an external toolbar that is more easily accessible
to people with impairments. The initial challenge in developing Menu
Controller was to find a method for harvesting and re-displaying menu
items by using the Windows API. The rest of the work involved exploring
an appropriate way for displaying the harvested menu entries. We
ultimately chose an approach based on a two-level sliding toolbar.
Experiments with a user with severe motor impairments, who used the
Camera Mouse as a mouse-replacement input system, showed that this
approach was indeed promising. The experiments also exposed areas that
need further research and development. We suggest that Menu Controller
provides a valuable contribution towards making everyday software more
accessible to people with disabilities.
%R 2011-010
%T Layered Graphical Models for Tracking Partially-Occluded Moving Objects in Video (PhD Thesis)
%A Ablavsky, Vitaly
%D March 16, 2011
%U http://www.cs.bu.edu/techreports/2011-010-PhD-Thesis-Vitaly-Ablavsky.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Tracking multiple targets using fixed cameras with non-overlapping
views is a challenging problem. One of the challenges is
predicting and tracking through occlusions caused by other targets
or by fixed objects in the scene. Considerable effort has been
devoted toward developing appearance models that are robust to
partial occlusions, tracking algorithms that cope with short-term
loss of observations, and algorithms that learn static occlusion
maps. In this thesis we consider scenarios where it is impossible
to learn a static occlusion map. This is often the case when the
scene consists of both people and large objects whose position is
not permanently fixed. These objects may enter, leave or relocate
within the scene during a short time span. We call such objects
"relocatable objects" or "relocatable occluders."
We develop a representation for scenes containing relocatable
objects that can cause partial occlusions of people in a camera's
field of view. In many practical applications, relocatable objects
tend to appear often; therefore, models for them can be learned
off-line and stored in a database. We formulate an
occluder-centric representation, called a graphical model layer,
where a person's motion in the ground plane is defined as a
first-order Markov process on activity zones, while image evidence
is aggregated in 2D observation regions that are depth-ordered
with respect to the occlusion mask of the relocatable object. We
represent real-world scenes as a composition of depth-ordered,
interacting graphical model layers, and account for image evidence
in a way that handles mutual overlap of the observation regions
and their occlusions by the relocatable objects. These layers
interact: proximate ground plane zones of different model
instances are linked to allow a person to move between the layers,
and image evidence is shared between the observation regions of
these models.
We demonstrate our formulation in tracking low-resolution,
partially-occluded pedestrians in the vicinity of parked vehicles.
In these scenarios some tracking formulations that rely on
part-based person detectors may fail completely. Our pedestrian
tracker fares well and compares favorably with the
state-of-the-art pedestrian detectors---lowering false positives
by twenty-nine percent and false negatives by forty-two
percent---and a deformable-contour--based tracker.
%R 2011-011
%T A Domain-Specific Language for the Incremental and Modular Design of Large-Scale Verifiably-Safe Flow Networks
%A Kfoury, Assaf
%D May 11, 2011
%U http://www.cs.bu.edu/techreports/2011-011-dsl-for-flow-network-design.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Flow networks are inductively defined, assembled from small
networks or modules to produce arbitrarily large ones, with
interchangeable functionally-equivalent parts. We carry out this
induction formally using a domain-specific language (DSL). Associated
with our DSL is a typing system (or static semantics), a system of
formal annotations that enforce desirable properties of flow networks as
invariants across their interfaces. A prerequisite for a type theory is
a formal semantics, i.e., a rigorous definition of the entities that
qualify as feasible flows through the networks, possibly restricted to
satisfy additional efficiency or safety requirements. We carry this out
in two ways, as a denotational semantics and as an operational (or
reduction) semantics.
%R 2011-012
%T Estimation of Intrinsic Dimension via Clustering
%A Eriksson, Brian
%A Crovella, Mark
%D May 12, 2011
%U http://www.cs.bu.edu/techreports/2011-012-intrinsic-dimension-clustering.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The problem of estimating the intrinsic dimension of a set of points
in high dimensional space is a critical issue for a wide range of
disciplines, including genomics, finance, and networking. Current
estimation techniques are dependent on either the ambient or intrinsic
dimension in terms of computational complexity, which may cause these
methods to become intractable for large data sets. In this paper, we
present a clustering-based methodology that exploits the inherent
self-similarity of data to efficiently estimate the intrinsic
dimension of a set of points. When the data satisfies a specified
general clustering condition, we prove that the estimated dimension
approaches the true Hausdorff dimension. Experiments show that the
clustering-based approach allows for more efficient and accurate
intrinsic dimension estimation compared with all prior techniques,
even when the data does not conform to obvious self-similarity
structure. Finally, we present empirical results which show the
clustering-based estimation allows for a natural partitioning of the
data points that lie on separate manifolds of varying intrinsic
dimension.
%R 2011-014
%T Safe Compositional Equation-based Modeling of Constrained Flow Networks
%A Soule, Nate
%A Bestavros, Azer
%A Kfoury, Assaf
%A Lapets, Andrei
%D May 15, 2011
%U http://www.cs.bu.edu/techreports/2011-014-netsketch-equation-based.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Numerous domains exist in which systems can be modeled as
networks with constraints that regulate the flow of traffic. Smart
grids, vehicular road travel, computer networks, and cloud-based
resource distribution, among others, all have natural representations
in this manner. As these systems grow in size and complexity, analysis
and certification of safety invariants become increasingly
costly. The NetSketch formalism and toolset introduce a lightweight
framework for constraint-based modeling and analysis of such flow
networks. NetSketch offers a processing method based on type-theoretic
notions that enables large scale safety verification by allowing for
compositional, as opposed to whole-system, analysis. Furthermore, by
applying types to the modeled networks, analysis of composite modules
containing incomplete or underspecified components can be
conducted. The NetSketch tool exposes the power of this formalism in
an intuitive web-based graphical user interface. We describe the
NetSketch formalism and tool, a translation from an instantiation of
the NetSketch formalism to the equation-based modeling language
Modelica, and the development of an accompanying Haskell library,
HModelica, that enables the integration of NetSketch and the
OpenModelica modeling platform.
%R 2011-015
%T The Zenith Attack: Vulnerabilities and Countermeasures
%A Skowyra, Richard
%A Bestavros, Azer
%A Goldberg, Sharon
%D May 15, 2011
%U http://www.cs.bu.edu/techreports/2011-015-crypsis.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper we identify and define Zenith attacks, a new class of
attacks on content-distribution systems, which seek to expose the
popularity (i.e., access frequency) of individual items of content. As
the access pattern to most real-world content exhibits Zipf-like
characteristics, there is a small set of dominating items which account
for the majority of accesses. Identifying such items enables an
adversary to perform follow-up adversarial actions targeting these
items, including mounting denial of service attacks, deploying
censorship mechanisms, and eavesdropping on or prosecution of the host
or recipient. We instantiate a Zenith attack on the Kademlia and Chord
structured overlay networks and quantify the cost of such an attack. As
a countermeasure to these attacks we propose Crypsis, a system to
conceal the lookup frequency of individual keys through aggregation
over ranges of the keyspace. Crypsis provides provable security
guarantees for concealment of lookup frequency while maintaining
logarithmic routing and state bounds.
%R 2011-016
%T A Domain Specific Language for Incremental and Modular Design of Large-Scale Verifiably-Safe Flow Networks (Preliminary Report)
%A Bestavros, Azer
%A Kfoury, Assaf
%D July 11, 2011
%U http://www.cs.bu.edu/techreports/2011-016-preliminary-dsl-for-flow-network-design.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We define a domain-specific language (DSL) to inductively assemble flow
networks from small networks or modules to produce arbitrarily large
ones, with interchangeable functionally-equivalent parts. Our small
networks or modules are ``small'' only as the building blocks in this
inductive definition (there is no limit on their size). Associated with
our DSL is a type theory, a system of formal annotations to express
desirable properties of flow networks together with rules that enforce
them as invariants across their interfaces, i.e., the rules guarantee the
properties are preserved as we build larger networks from smaller ones.
A prerequisite for a type theory is a formal semantics, i.e.,
a rigorous definition of the entities that qualify as feasible flows
through the networks, possibly restricted to satisfy additional
efficiency or safety requirements. This can be carried out in one of two
ways, as a denotational semantics or as an operational (or reduction)
semantics; we choose the first in preference to the second, partly to
avoid exponential-growth rewriting in the operational approach. We set
up a typing system and prove its soundness for our DSL.
%R 2011-017
%T The Denotational and Static Semantics of a Domain-Specific Language for Flow-Network Design
%A Kfoury, Assaf
%D July 11, 2011
%U http://www.cs.bu.edu/techreports/2011-017-denotational-and-static-semantics-of-dsl.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Flow networks are inductively defined, assembled from small network
modules to produce arbitrarily large ones, with interchangeable and
expanding functionally-equivalent parts. We carry out this induction
formally using a domain-specific language (DSL). Associated with our
DSL is a typing system (or static semantics), a system of formal
annotations that enforce desirable properties of flow networks as
invariants across their interfaces. A prerequisite for a type theory
is a formal semantics, i.e., a rigorous definition of the entities
that qualify as feasible flows through the networks, possibly
restricted to satisfy additional efficiency or safety requirements. We
carry this out via a denotational semantics.
%R 2011-018
%T Posit: An Adaptive Framework for Lightweight IP Geolocation
%A Eriksson, Brian
%A Barford, Paul
%A Maggs, Bruce
%A Nowak, Robert
%D July 11, 2011
%U http://www.cs.bu.edu/techreports/2011-018-posit.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Location-specific Internet services are predicated on the ability to
identify the geographic position of IP hosts accurately. Fundamental
to prior geolocation techniques is their reliance on landmarks with
known coordinates whose distance from target hosts is intrinsically
tied to the ability to make accurate location estimates. In this
paper, we introduce a new lightweight framework for IP geolocation
that we call Posit. The Posit framework geolocates by automatically
adapting to the geographic distribution of the measurement
infrastructure relative to each target host. This lightweight
framework requires only a small number of Ping measurements conducted
to end host targets in conjunction with a computationally efficient
geographic embedding methodology. We demonstrate that Posit performs
significantly better than all existing geolocation tools across a wide
spectrum of measurement infrastructures with varying geographic
densities. Posit is shown to geolocate hosts with median error
improvements of over 50% with respect to all current measurement-based
IP geolocation methodologies.
%R 2011-019
%T Use Cases for Compositional Modeling and Analysis of Equation-based Constrained Flow Networks
%A Soule, Nate
%A Bestavros, Azer
%A Ishakian, Vatche
%A Kfoury, Assaf
%A Lapets, Andrei
%D July 15, 2011
%U http://www.cs.bu.edu/techreports/2011-019-netsketch-usecases.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Numerous domains exist in which systems can be modeled as networks
with constraints that regulate the flow of traffic. Smart grids, vehicular
road travel, computer networks, and cloud-based resource
distribution, among others, all have natural representations in this
manner. As these systems grow in size and complexity, analysis and
certification of safety invariants become increasingly costly. The
NetSketch formalism and toolset introduce a lightweight framework for
constraint-based modeling and analysis of such flow networks. NetSketch
offers a processing method based on type-theoretic notions that enables
large scale safety verification by allowing for compositional, as
opposed to whole-system, analysis. Furthermore, by applying types to
the modeled networks, analysis of composite modules containing
incomplete or underspecified components can be conducted. Here we
describe various use cases for such modeling tasks, and walk through
the development of appropriate NetSketch models.
%R 2011-020
%T Modeling on Quicksand: Dealing with the Lack of Ground Truth in Interdomain Routing Data
%A Gill, Phillipa
%A Schapira, Michael
%A Goldberg, Sharon
%D September 8, 2011
%U http://www.cs.bu.edu/techreports/2011-020-quicksand.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Researchers studying the interdomain routing system, its properties
and new protocols, face many challenges in performing realistic
evaluations and simulations. Modeling decisions with respect to
AS-level topology, routing policies and traffic matrices are complicated
by a dearth of ground truth for each of these components. Moreover,
scalability issues arise when attempting to simulate over large
(although still incomplete) empirically-derived AS-level topologies.
In this paper, we discuss our approach for analyzing the robustness of
our results to incomplete empirical data. We do this by (1) developing
fast simulation algorithms that enable us to (2) run multiple
simulations with varied parameters that test the sensitivity of our
research results.
%R 2011-022
%T A Framework for the Evaluation and Management of Network Centrality
%A Ishakian, Vatche
%A Erdos, Dora
%A Terzi, Evimaria
%A Bestavros, Azer
%D October 13, 2011
%U http://www.cs.bu.edu/techreports/2011-022-group-centrality.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Network-analysis literature is rich in node-centrality measures that
quantify the centrality of a node as a function of the (shortest)
paths of the network that go through it. Existing work focuses on
defining instances of such measures and designing algorithms for the
specific combinatorial problems that arise for each instance. In this
work, we propose a unifying definition of centrality that subsumes all
path-counting based centrality definitions: e.g., stress, betweenness or
paths centrality. We also define a generic algorithm for computing this
generalized centrality measure for every node and every group of nodes
in the network. Next, we define two optimization problems: k-Group
Centrality Maximization and k-Edge Centrality Boosting.
In the former, the task is to identify the subset of k nodes
that have the largest group centrality. In the latter, the goal
is to identify up to k edges to add to the network so that
the centrality of a node is maximized. We show that both of
these problems can be solved efficiently for arbitrary centrality
definitions using our general framework. In a thorough
experimental evaluation we show the practical utility of our
framework and the efficacy of our algorithms.
%R 2011-023
%T Technology Diffusion in Communication Networks
%A Goldberg, Sharon
%A Liu, Zhenming
%D November 10, 2011
%U http://www.cs.bu.edu/techreports/2011-023-tech-diffusion-in-networks.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The deployment of new technologies in the Internet is notoriously
difficult, as evidenced by the myriad of well-developed networking
technologies that still have not seen widespread adoption (e.g.,
secure routing, IPv6). A key hurdle is the fact that the
Internet lacks a centralized authority that can mandate the deployment
of a new technology. Instead, the Internet consists of thousands of
nodes, each controlled by an autonomous, profit-seeking firm, that
will deploy a new networking technology only if it obtains sufficient
local utility by doing so. For the technologies we study here, local
utility depends on the set of nodes that can be reached by traversing
paths consisting only of nodes that have already deployed the new
technology. To understand technology diffusion in the Internet, we
propose a new model inspired by work on the spread of influence in
social networks. Unlike traditional models, where a node's utility
depends only on its immediate neighbors, in our model, a node can be
influenced by the actions of remote nodes. Specifically, we assume
node v activates (i.e., deploys the new technology) when it is adjacent
to a sufficiently large connected component in the subgraph induced by
the set of active nodes; namely, of size exceeding node v's threshold
value \theta(v). We are interested in the problem of choosing the
right seedset of nodes to activate initially, so that the rest of the
nodes in the network have sufficient local utility to follow suit. We
take the graph and threshold values as input to our problem. We show
that our problem is NP-hard and does not admit a (1-o(1)) ln|V|
approximation on general graphs. Then, we restrict our study to
technology diffusion problems where (a) the maximum distance between any
pair of nodes in the graph is r, and (b) there are at most \ell
possible threshold values. Our set of restrictions is quite natural,
given that (a) the Internet graph has constant diameter, and (b) the
fact that limiting the granularity of the threshold values makes sense
given the difficulty in obtaining empirical data that parameterizes
deployment costs and benefits. We present an algorithm that obtains a
solution with guaranteed approximation rate of O(r^2 \ell \log|V|),
which is asymptotically optimal, given our hardness results. Our
approximation algorithm uses a linear-programming relaxation of a 0-1
integer program along with a novel randomized rounding scheme.
%R 2011-024
%T Dynamic Pricing For Efficient Workload Colocation
%A Ishakian, Vatche
%A Sweha, Raymond
%A Bestavros, Azer
%A Appavoo, Jonathan
%D November 15, 2011
%U http://www.cs.bu.edu/techreports/2011-024-dynamic-colocation-pricing.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Pricing models for virtualized (cloud) resources are meant to reflect
the operational costs and profit margins for providers to deliver
specific resources or services to customers subject to underlying
Service Level Agreements (SLAs). While the operational costs incurred
by cloud providers are dynamic -- they vary over time, depending on
factors such as energy cost, cooling strategies, and overall
utilization -- the pricing models extended to customers are typically
fixed -- they are static over time and independent of aggregate
demand. This disconnect between the cost incurred by a provider and
the price paid by a customer results in an inefficient marketplace. In
particular, it does not provide incentives for customers to express
workload scheduling flexibilities that may benefit them as well as
cloud providers. In this paper, we propose a new dynamic pricing model
that aims to address this marketplace inefficiency by giving customers
the opportunity and incentive to take advantage of any tolerances they
may have regarding the scheduling of their workloads. We present the
architecture and algorithmic blueprints of a framework for workload
colocation, which provides customers with the ability to formally
express workload scheduling flexibilities using Directed Acyclic
Graphs (DAGs), optimizes the use of cloud resources to colocate
clients’ workloads, and utilizes Shapley valuation to rationally -- and
thus fairly in a game-theoretic sense -- attribute costs to customer
workloads. In a thorough experimental evaluation we show the practical
utility of our dynamic pricing mechanism and the efficacy of the
resulting marketplace in terms of cost savings.
%R 2011-025
%T Slice Embedding Solutions for Distributed Service Architectures
%A Esposito, Flavio
%A Matta, Ibrahim
%A Ishakian, Vatche
%D December 13, 2011
%U http://www.cs.bu.edu/techreports/2011-025-slice-embedding.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Network virtualization provides a novel approach to run multiple
concurrent virtual networks over a common physical network
infrastructure. From a research perspective, this enables the
networking community to concurrently experiment with new Internet
architectures and protocols. From a market perspective, on the other
hand, this paradigm is appealing as it enables infrastructure service
providers to experiment with new business models that range from
leasing virtual slices of their infrastructure to hosting multiple
concurrent network services. In this paper, we present the slice
embedding problem and recent developments in the area. A slice is a
set of virtual instances spanning a set of physical resources. The
embedding problem consists of three main tasks: (1) resource
discovery, which involves monitoring the state of the physical
resources, (2) virtual network mapping, which involves matching users’
requests with the available resources, and (3) allocation, which
involves assigning the resources that match the users’ query. We also
outline how these three tasks are tightly connected, and how there
exists a wide spectrum of solutions that either solve a particular
task, or jointly solve multiple tasks along with the interactions
among them. To dissect the space of solutions, we introduce three main
classification criteria, namely, (1) the type of constraints imposed
by the user, (2) the type of dynamics considered in the embedding
process, and (3) the allocation strategy adopted. Finally, we conclude
with a few interesting research directions.
%R 2011-026
%T AngelCast: Cloud-based Peer-Assisted Live Streaming Using Optimized Multi-Tree Construction
%A Sweha, Raymond
%A Ishakian, Vatche
%A Bestavros, Azer
%D December 14, 2011
%U http://www.cs.bu.edu/techreports/2011-026-angelcast.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Increasingly, commercial content providers (CPs) offer streaming and
IPTV solutions that leverage an underlying peer-to-peer (P2P) stream
distribution architecture. The use of P2P protocols promises
significant scalability and cost savings by leveraging the local
resources of clients -- specifically, uplink capacity. A major
limitation of P2P live streaming is that playout rates are constrained
by the uplink capacities of clients, which are typically much lower
than downlink capacities, thus limiting the quality of the delivered
stream. Thus, to leverage P2P architectures without sacrificing the
quality of the delivered stream, CPs must commit additional resources
to complement those available through clients. In this paper, we
propose a cloud-based service -- AngelCast -- that enables CPs to
elastically complement P2P streaming ``as needed''. By subscribing to
AngelCast, a CP is able to deploy extra resources (``angels''),
on-demand from the cloud, to maintain a desirable stream (bit-rate)
quality. Angels need not download the whole stream (they are not
``leechers''), nor are they in possession of it (they are not
``seeders''). Rather, angels only relay (download once and upload as
many times as needed) the minimal possible fraction of the stream that
is necessary to achieve the desirable stream quality, while maximally
utilizing available client resources. We provide a lower bound on the
minimum amount of angel capacity needed to maintain a certain bit-rate
to all clients, and develop a fluid model construction that achieves
this lower bound. Realizing the limitations of the fluid model
construction -- namely, susceptibility to potentially arbitrary
start-up delays and significant degradation due to churn -- we present
a practical multi-tree construction that captures the spirit of the
optimal construction, while avoiding its limitations. In particular,
our AngelCast protocol achieves near-optimal performance (compared to
the fluid-model construction) while ensuring a low startup delay by
maintaining a logarithmic-length path between any client and the
provider, and while gracefully dealing with churn by adopting a
flexible membership management approach. We present the blueprints of
a prototype implementation of AngelCast, along with experimental
results confirming the feasibility and performance potential of our
AngelCast service when deployed on Emulab and PlanetLab.
%R 2011-027
%T Turing and the Development of Computational Complexity
%A Homer, Steve
%A Selman, Alan
%D December 20, 2011
%U http://www.cs.bu.edu/techreports/2011-027-turing-and-computational-complexity.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Turing's beautiful capture of the concept of computability by the
``Turing machine'' linked computability to a device with explicit steps
of operations and use of resources. This invention led in a most
natural way to the foundations of computational complexity.
%R 2011-029
%T Quest-V: A Virtualized Multikernel for High-Confidence Systems
%A Li, Ye
%A Danish, Matthew
%A West, Rich
%D December 20, 2011
%U http://www.cs.bu.edu/techreports/2011-029-quest-v.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
This paper outlines the design of `Quest-V', which is implemented as a
collection of separate kernels operating together as a distributed system
on a chip. Quest-V uses virtualization techniques to isolate kernels and
prevent local faults from affecting remote kernels. This leads to a
high-confidence multikernel approach, where failures of system
subcomponents do not render the entire system inoperable. A virtual
machine monitor for each kernel keeps track of shadow page table mappings
that control immutable memory access capabilities. This ensures a level of
security and fault tolerance in situations where a service in one kernel
fails, or is corrupted by a malicious attack. Communication is supported
between kernels using shared memory regions for message passing.
Similarly, device driver data structures are shareable between kernels to
avoid the need for complex I/O virtualization, or communication with a
dedicated kernel responsible for I/O. In Quest-V, device interrupts are
delivered directly to a kernel, rather than via a monitor that determines
the destination. Apart from bootstrapping each kernel, handling faults and
managing shadow page tables, the monitors are not needed. This differs
from conventional virtual machine systems in which a central monitor, or
hypervisor, is responsible for scheduling and management of host resources
amongst a set of guest kernels. In this paper we show how Quest-V can
implement novel fault isolation and recovery techniques that are not
possible with conventional systems. We also show that using
virtualization for isolation of system services does not add undue
overhead to overall system performance.
%R 2011-030
%T Safe Compositional Modeling And Analysis Of Constrained Flow Networks (MA Thesis)
%A Soule, Nate
%D December 30, 2011
%U http://www.cs.bu.edu/techreports/2011-030-MA-Thesis-Nate-Soule.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Constrained flow network models represent systems where flows exist
between nodes, and constraints exist to regulate those flows. Smart
grids, vehicular road travel, computer networks, and cloud-based
resource distribution, among other domains, all have natural
representations in this manner. As these systems grow in size and
complexity, traditional analysis and certification of safety
invariants become increasingly costly. In addition, today's techniques
require the system to be fully specified in order to perform
meaningful analysis. The NetSketch formalism and toolset introduce a
lightweight framework for modeling and analysis of constrained flow
networks that overcomes these issues. NetSketch offers a processing
method based on type-theoretic notions that enables large scale safety
verification by allowing for compositional, as opposed to
whole-system, analysis. By inferring types for sub-graphs of the
modeled networks, not only can the cost of analysis be greatly reduced,
but analysis of composite modules containing incomplete or
underspecified components can be conducted. The NetSketch tool exposes
the power of this formalism in an intuitive web-based graphical user
interface. This work describes the formalism, a type system, as well
as an implementation. In addition, potential use cases for this type of
modeling and analysis are investigated, and connections are drawn to
existing modeling tools and techniques.
%R 2012-001
%T Seamless Composition and Integration: A Perspective on Formal Methods Research
%A Bestavros, Azer
%A Kfoury, Assaf
%A Lapets, Andrei
%D February 7, 2012
%U http://www.cs.bu.edu/techreports/2012-001-mscs-editorial.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Formal methods are now a central component of computer-science
education and research. However, there will always be advances in
mathematical logic -- a.k.a. `formal methods' among computer
scientists -- leading to advances in reliable, safe and secure
computing. There are many research directions that will promote
the impact of formal methods on computer science in significant
and novel ways. We outline two directions, each associated with
its own research challenges, that are complementary to the current
state-of-the-art: one of composability and one of integration,
each considered in a specific context drawn from our own recent
research and teaching experience. We try to clarify why the study
and ultimate resolution of these two challenges hold the promise
of important breakthroughs in the accessibility of formal methods
and, ultimately, their applicability.
%R 2012-002
%T Mechanism Design for Spatio-Temporal Request Satisfaction in Mobile Networks
%A Bassem, Christine
%A Bestavros, Azer
%D February 10, 2012
%U http://www.cs.bu.edu/techreports/2012-002-gpaas-mechanism-design.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Mobile agents participating in geo-presence-capable crowdsourcing
applications should be presumed rational, competitive, and willing to
deviate from their routes if given the right incentive. In this paper,
we design a mechanism that takes into consideration this rationality
for request satisfaction in such applications. We propose the
Geo-temporal Request Satisfaction (GRS) problem to be that of finding
the optimal assignment of requests with specific spatio-temporal
characteristics to competitive mobile agents subject to
spatio-temporal constraints. The objective of the GRS problem is to
maximize the total profit of the system subject to our rationality
assumptions. We define the problem formally, prove that it is
NP-Complete, and present a practical solution mechanism, which we
prove to be convergent, and which we evaluate experimentally.
%R 2012-003
%T The Syntax and Semantics of a Domain-Specific Language for Flow-Network Design
%A Kfoury, Assaf
%D February 17, 2012
%U http://www.cs.bu.edu/techreports/2012-003-syntax-and-semantics-of-dsl.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Flow networks are inductively defined, assembled from small components
to produce arbitrarily large ones, with interchangeable
functionally-equivalent parts. We carry out this induction formally
using a domain-specific language (DSL). Associated with our DSL are a
semantics and a typing theory. The latter gives rise to a system of
formal annotations that enforce desirable properties of flow networks as
invariants across their interfaces. A prerequisite for a typing theory
is a formal semantics, i.e., a rigorous characterization of flows that
are safe (or just feasible in this report) for the network, possibly
restricted to satisfy additional efficiency or safety requirements. We
give a detailed presentation of a denotational semantics only, but also
point out the elements that an equivalent operational semantics must
include.
%R 2012-004
%T Algebraic Characterizations of Flow-Network Typings
%A Kfoury, Assaf
%D February 17, 2012
%U http://www.cs.bu.edu/techreports/2012-004-flow-network-typings.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
A flow network N is a capacitated finite directed graph, with multiple
input ports/arcs and multiple output ports/arcs. A flow f in N
assigns a non-negative real number to every arc and is feasible if it
satisfies flow conservation at every node and respects
lower-bound/upper-bound capacities at every arc. We develop an
algebraic theory of feasible flows in such networks with several
beneficial consequences. We define algorithms to infer, from a given
flow network N, an algebraic classification, which we call a typing
for N, of all assignments f_0 of values to the input and output arcs
of N that can be extended to a feasible flow f. We then establish
necessary and sufficient conditions on an arbitrary typing T
guaranteeing that T is a valid typing for some flow network N. Based
on these necessary and sufficient conditions, we define operations on
typings that preserve their validity (to be typings for flow
networks), and examine the implications for a typing theory of flow
networks.
%R 2012-005
%T On Traffic Matrix Completion in the Internet
%A Gursun, Gonca
%A Crovella, Mark
%D February 17, 2012
%U http://www.cs.bu.edu/techreports/2012-005-traffic-matrix-completion.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
The ability of an ISP to infer traffic volumes that are not
directly measurable can be useful for research, engineering, and
business intelligence. Previous work has shown that traffic matrix
completion is possible, but there is as yet no clear understanding of
which ASes are likely to be able to perform TM completion, and which
traffic flows can be inferred. In this paper we investigate the
relationship between the AS-level topology of the Internet and the
ability of an individual AS to perform traffic matrix completion. We
first frame the questions through abstract analysis of idealized
topologies, and then use actual routing measurements and topologies to
study the ability of real ASes to infer traffic flows. Our first set
of results identifies which ASes are best-positioned to perform TM
completion. We show, surprisingly, that TM completion ability is not
particularly characteristic of ASes in the ‘core,’ nor does it help
for an AS to have many peering links. Rather, the most important
factor enabling an AS to perform TM completion is the number of direct
customers it has. Our second set of results focuses on which flows can
be inferred. We show that topologically close flows are easier to
infer, and that flows passing through customers are particularly well
suited for inference.
%R 2012-006
%T Why Elasticity Matters
%A Schatzberg, Dan
%A Appavoo, Jonathan
%A Krieger, Orran
%A VanHensbergen, Eric
%D April 15, 2012
%U http://www.cs.bu.edu/techreports/2012-006-elasticity-matters.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper we proposed a new research agenda focused on elasticity.
We argued that elasticity is an important area of research and
hypothesized that research in this area will lead to more efficient
systems with less hoarding, new applications that exploit massive
cloud resources elastically, and system software and libraries that will
simplify the task of developing elastic applications.
We discussed some of our thoughts on a top-to-bottom cloud-scale
system focused on elasticity. We argued that such a system will
require: 1) a HW/IaaS layer that can quickly reallocate resources to
different applications, 2) an event-driven model where resource
demand flows from the high level layers as transparently as
possible to the lowest level of the system, and 3) a model of
modularity that allows layers to be overridden as necessary
and provides applications with a component model that enables the base
elasticity to be exploited by new and advanced applications.
%R 2012-007
%T Programmable Smart Machines
%A Waterland, Amos
%A Appavoo, Jonathan
%A Schatzberg, Dan
%D April 15, 2012
%U http://www.cs.bu.edu/techreports/2012-007-programmable-smart-machines.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
In this paper we conjecture that a system can be constructed that
exploits the general ability to learn through the counting,
correlating, and memorizing of occurrences of events to fast-forward
a programmable computer. In particular, we propose a signal-based
interpretation of a computer's execution that can be used to implement
a form of system state memoization using a predictive associative
memory. Such an approach may some day lead to a system that can
utilize both traditional logic and neuromorphic or other biologically
inspired mechanisms to be both programmable and smart.
%R 2012-008
%T Scalable Elastic Systems Architecture
%A Appavoo, Jonathan
%A Schatzberg, Dan
%D April 15, 2012
%U http://www.cs.bu.edu/techreports/2012-008-sesa-architecture.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Cloud computing has spurred the exploration and exploitation of
elastic access to large scales of computing. To date, the predominant
building blocks by which elasticity has been exploited are
applications and operating systems that are built around traditional
computing infrastructure and programming models that are inelastic or
at best coarsely elastic. What would happen if applications themselves
could express and exploit elasticity in a fine-grained fashion and this
elasticity could be efficiently mapped to the scale and elasticity
offered by modern cloud hardware systems? Would economic and market
models that exploit elasticity pervade even the lowest levels? And
would this enable greater efficiency both globally and individually?
Would novel approaches to traditional problems such as quality of
service arise? Would new applications be enabled both technically and
economically?
%R 2012-009
%T Transistor Scaled HPC Application Performance
%A Appavoo, Jonathan
%A Schatzberg, Dan
%D April 15, 2012
%U http://www.cs.bu.edu/techreports/2012-009-transistor-hpc.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
We propose a radically new, biologically inspired model of an
extreme-scale computer on which application performance automatically
scales with the transistor count, even in the face of component failures.
Today, high-performance computers are massively parallel systems
composed of potentially hundreds of thousands of traditional processor
cores, formed from trillions of transistors, consuming megawatts of
power. Unfortunately, increasing the number of cores in a system,
unlike increasing clock frequencies, does not automatically translate
to application-level improvements. No general auto-parallelization
techniques or tools exist for HPC systems. To obtain application
improvements, HPC application programmers must manually cope with the
challenge of multicore programming and the significant drop in
reliability associated with the sheer number of transistors.
Drawing on biological inspiration, the basic premise behind this work
is that computation can be dramatically accelerated by integrating a
very large-scale, system-wide, predictive associative memory into the
operation of the computer. The memory effectively turns computation
into a form of pattern recognition and prediction whose result can be
used to avoid significant fractions of computation. To be effective,
the memory is expected to require billions of concurrent
devices akin to biological cortical systems, where each device
implements a small amount of storage, computation and localized
communication.
As typified by the recent announcement of the Lyric GP5 Probability
Processor, very efficient, scalable hardware for pattern recognition
and prediction is on the horizon. One class of such devices, called
neuromorphic, was pioneered by Carver Mead in the 1980s to provide a
path for breaking the power, scaling, and reliability barriers
associated with standard digital VLSI technology. Recent neuromorphic
research examples include work at Stanford, MIT, and the
DARPA-sponsored SyNAPSE Project. These devices operate transistors as
unclocked analog devices organized to implement pattern recognition
and prediction several orders of magnitude more efficiently than
functionally equivalent digital counterparts. Abstractly, the devices
can be used to implement modern machine learning or statistical
inference. When exposed to data as a time-varying signal, the devices
learn and store patterns in the data at multiple time scales and
constantly provide predictions about what the signal will do in the
future. This kind of function can be seen as a form of predictive
associative memory.
In this paper we describe our model and initial plans for exploring it.
%R 2012-010
%T Peer and Authority Pressure in Information-Propagation Models (MA Thesis)
%A Brova, George
%D May 9, 2012
%U http://www.cs.bu.edu/techreports/2012-010-MA-Thesis-George-Brova.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
Existing models of information diffusion assume that peer influence is
the main reason for the observed propagation patterns. This work
examines the role of authority pressure on the observed information
cascades. We model this intuition by characterizing some nodes in the
network as "authority" nodes. These are nodes that can influence
a large number of peers, while they themselves cannot be influenced by
peers. We propose a model that associates with every item two
parameters that quantify the impact of peer and authority pressure
on the item’s propagation. Given a network and the observed diffusion
patterns of the item, we learn these parameters from the data and
characterize the item as peer- or authority-propagated. We also
develop a randomization test that evaluates the statistical
significance of our findings and makes our item characterization
robust to noise. Our experiments with real data from online media and
scientific-collaboration networks indicate that there is a strong
signal of authority pressure in these networks.
%R 2012-011
%T Secure Pairing of Mobile Devices (MA Thesis)
%A Hacker, Megan
%A Crovella, Mark
%A Reyzin, Leonid
%D May 16, 2012
%U http://www.cs.bu.edu/techreports/2012-011-MA-Thesis-Megan-Hacker.ps.Z
%I CS Department, Boston University
%Z Wed, 16 May 2012 14:43:22 GMT
%X
As mobile devices become increasingly popular, the necessity
for both user-friendly and secure pairing methods for these devices
also rises. One natural approach to pairing devices is to match them
based on a shared experience. In this work, we define a shared
experience as the act of physically holding two devices together and
shaking them for a short period. The common movement data collected
during the shaking process can subsequently be used to verify the
authenticity of a secret key established via a key exchange
protocol. This paper explores the process of key verification using
two different measures: a coherence measure derived through time
series analysis and a measure based on Hamming distance. Using ROC
curves, we show that both of these measures robustly distinguish
between the case where two devices have been shaken together and the
case where two devices have been shaken separately.