Azer Bestavros
Department of Computer Science
College of Arts & Sciences
Boston University

Office: 111 Cummington Street, MCS-276, Boston, MA 02215
Tel: 617.353.9726 / Fax: 617.353.6457 / Email: best




Active Research Projects

Past Research Projects

Supervised PhD Theses

MASS: Diagnosis and Control of Network Variability by MASS Servers

    PIs: Azer Bestavros, John Byers, and Mark Crovella
    Participants: Khaled Harfoush, Shudong Jin, Jun Liu.

    Massively Accessed Scalable Servers (a.k.a. Mass servers) are popular Internet servers, which produce a substantial fraction of the traffic flowing through the network. Mass servers are uniquely positioned (a) to observe and diagnose network conditions by tracking the flows that they generate, and (b) to manage and control network resources by better regulating and scheduling the traffic they inject into the network.

    The MASS Research Group in the Department of Computer Science at Boston University is pursuing a number of projects to achieve these goals over a wide spectrum of time scales. Over shorter time scales, a Mass server can minimize packet loss by smoothing the (otherwise bursty) process of injecting packets into the network. Over longer time scales, a Mass server can perform aggregate congestion management by bundling like connections to avoid the burstiness that results from competition among flows.
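
    The short-time-scale smoothing idea can be illustrated with a token-bucket pacer. This is only a sketch of the general technique; the class and parameter names are illustrative assumptions, not the project's implementation.

    ```python
    class TokenBucket:
        """Pace packet injection: a burst is spread out at a steady rate."""

        def __init__(self, rate, burst):
            self.rate = rate        # tokens added per unit of time
            self.burst = burst      # maximum bucket depth
            self.tokens = burst
            self.last = 0.0

        def allow(self, now, cost=1):
            # Refill in proportion to elapsed time, capped at the burst depth.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True          # inject the packet now
            return False             # hold the packet, smoothing the burst

    bucket = TokenBucket(rate=10.0, burst=5)
    # A burst of 8 back-to-back packets at t=0: only 5 pass, 3 are held.
    sent = sum(bucket.allow(0.0) for _ in range(8))
    ```

    Held packets would be queued and released as tokens accumulate, turning an otherwise bursty injection process into a near-constant-rate one.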

    A key component of the Mass Servers Research Group work is implementation and prototyping. To that end, members of the group are developing three key elements of Mass Servers, namely: (1) Beacon: A collection of network measurement and diagnosis tools, (2) TurnPike: A collection of network management and control protocols and services, and (3) BackBay: A platform that supports the integration of Beacon and TurnPike functionality into a high-performance web server architecture.

    More information on this project is available from the MASS Project Home Page.

ITM: Internet Traffic Managers

    PIs: Ibrahim Matta, Azer Bestavros, and Mark Crovella
    Participants:  Mina Guirguis, Liang Guo

    The scalability of the Internet hinges on our ability to tame the unpredictability associated with its open architecture. This project investigates the development of basic control strategies for reducing traffic burstiness and improving network utilization. Such strategies can be applied through Traffic Managers (TMs)---special network elements strategically placed in the Internet (e.g., in front of clients/servers or at exchange/peering points between administrative domains). We believe that the incorporation of such control functionalities will be key to the ability of the network infrastructure to sustain its own growth and to nurture the Quality-of-Service (QoS) needs of emerging applications.

    More information on this project is available from the ITM Project Home Page.

Commonwealth: Scalable Web Server Architectures And Protocols

    PIs: Azer Bestavros and Mark Crovella
    Participants: Paul Barford, Jun Liu, Adam Bradley, David Martin (PhD'1999), Robert Frangioso, Jorge Londono (MA'1999), Luis Aversa (MA'1998), Naomi Katagai (MA'1998)

    The phenomenal growth of the WWW is imposing considerable strain on Internet resources and Web servers, prompting numerous concerns about the Web's continued viability. The success of high-performance Web servers in alleviating these performance problems is ultimately limited, unless Web services are designed to be inherently scalable. The Commonwealth project is designing, implementing, and evaluating a prototype architecture and a set of associated protocols for scalable Web services. The Commonwealth architecture for hosting scalable Web services allows scalability through (1) parallel processing on tightly-coupled nodes within a Web site, and (2) load distribution across loosely-coupled Web sites. Commonwealth's underlying philosophy is to achieve a WEALTH of performance through the use of COMMON components, and to do so along an incremental upgrade path.

    The Commonwealth project is pursuing basic research in four main areas: (1) Data placement (caching, prefetching, and dissemination protocols); (2) Resource management (scheduling, load balancing, and admission control protocols); (3) Networking (routing, packet redirection, and connection migration); and (4) Middleware services (replication and resource discovery).

    These problems are being addressed with particular attention to the issue of high workload variability, which---in previous work by the Oceans Group at Boston University---was shown to pose significant problems for the design of Internet-based systems.

    More information on this project is available from The CommonWealth Project Home Page.

ATCP: Aggregating TCP State Across Multiple Connections

    PI: Azer Bestavros
    Participants: Olivier Hartman, Khaled Harfoush, Thomas Gschwendtner

    In this project we investigate the design of a stateful TCP stack that allows multiple connections that share a common congested path to share the same state. In that respect we investigate two possible scenarios. In the first, we investigated an extension of the TCP stack that allows a sequence of TCP connections between the same machines to share the congestion window. Our Linux implementation of this scenario shows significant improvement in performance, particularly when the individual connections are short-lived. Such a behavior is common on the web, due to the nature of the HTTP protocol and the distribution of file sizes. In the second scenario, we investigate an extension of the TCP stack that allows a set of concurrent TCP connections between the same machines to be controlled in an aggregate manner. Work on this scenario is still on-going, but initial results suggest improved performance.
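
    The aggregate-state idea can be sketched roughly as follows. This is not the project's Linux implementation; the field names and the simplified slow-start/AIMD logic are assumptions used only to show why sharing state helps short-lived connections.

    ```python
    class SharedCongestionState:
        """Congestion state shared by TCP connections to the same peer."""

        def __init__(self, init_cwnd=1):
            self.cwnd = init_cwnd      # congestion window, in segments
            self.ssthresh = 64

        def on_ack(self):
            # Slow start below ssthresh, linear growth above it.
            if self.cwnd < self.ssthresh:
                self.cwnd *= 2
            else:
                self.cwnd += 1

        def on_loss(self):
            # Multiplicative decrease, shared by every connection on the path.
            self.ssthresh = max(2, self.cwnd // 2)
            self.cwnd = max(1, self.cwnd // 2)

    state = SharedCongestionState()
    for _ in range(4):
        state.on_ack()          # first connection grows the window: 2, 4, 8, 16
    first = state.cwnd
    # A second, short-lived connection to the same host reuses the learned
    # window instead of slow-starting from scratch.
    second_start = state.cwnd
    ```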

DCR: Distributed Connection Routing for Scalable Internet Servers

    PIs: Azer Bestavros and Mark Crovella
    Participants: Jun Liu, David Martin (PhD'1999), Jorge Londono (MA'1999), Luis Aversa (MA'1998)

    To construct high performance Web servers, system builders are increasingly turning to distributed designs. An important challenge that arises in distributed Web servers is the need to direct incoming connections to individual hosts. Previous methods for connection routing have employed a centralized node which handles all incoming requests. In this project we investigate the use of fully-distributed, all-software techniques for connection routing and distribution amongst cluster hosts. In that respect we have implemented and tested two protocols: DPR and CM. Distributed Packet Rewriting (DPR) operates at the TCP/IP layer, allowing connection routing using packet rewriting. Connection Migration (CM) is a kernel functionality presented to the application layer through an API that allows applications (e.g., web servers) to "migrate" TCP connections. Both of these techniques are elements of the Commonwealth Scalable Web Server Architecture project.
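
    The flavor of fully-distributed connection routing can be sketched as follows. The hash-based ownership rule here is an illustrative assumption (not necessarily DPR's actual mechanism): because every host applies the same deterministic function to the connection identifier, any host can decide locally, without a central dispatcher, whether to handle a packet or rewrite and forward it.

    ```python
    import hashlib

    HOSTS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # cluster members (hypothetical)

    def owner(src_ip, src_port, dst_ip, dst_port):
        # Hash the connection 4-tuple; every host computes the same answer.
        key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
        digest = int(hashlib.sha256(key).hexdigest(), 16)
        return HOSTS[digest % len(HOSTS)]

    def route(packet, local_ip):
        target = owner(*packet)
        if target == local_ip:
            return ("deliver", local_ip)            # we own this connection
        return ("rewrite-and-forward", target)      # rewrite headers, forward

    pkt = ("192.168.1.5", 51000, "10.0.0.9", 80)
    decision = route(pkt, "10.0.0.1")
    ```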

WebHint: Client/Server-based WWW Prefetching Protocols

    PI: Azer Bestavros
    Participants: Carlos Cunha (PhD'1998), Martin Mroz (MA'1998), Chau Anh Nguyen (MA'1998)

    In this project we investigate the potential performance improvements possible through prefetching of Web documents. In that respect we investigate two possible policies. The first policy is server-based. It relies on analysis of reference patterns at the server, constructing Markov models (e.g., first-order and second-order models) to capture the traversal and embedding dependencies that exist between documents, and using these Markov models to speculatively serve documents, or simply provide prefetching hints to clients. The second policy is client-based. It relies on analysis of the client's logs, constructing similar Markov models, and using them to aggressively prefetch documents.
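
    A minimal first-order Markov predictor of the kind described above might look like this. The API and the confidence threshold are assumptions made for illustration, not the project's code.

    ```python
    from collections import defaultdict

    class MarkovPrefetcher:
        """First-order Markov model of document-to-document transitions."""

        def __init__(self, threshold=0.5):
            self.counts = defaultdict(lambda: defaultdict(int))
            self.threshold = threshold

        def train(self, accesses):
            # Count observed transitions in an access log.
            for prev, nxt in zip(accesses, accesses[1:]):
                self.counts[prev][nxt] += 1

        def hint(self, doc):
            # Return the most likely successor if its conditional probability
            # clears the threshold; otherwise make no prediction.
            succ = self.counts.get(doc)
            if not succ:
                return None
            total = sum(succ.values())
            best = max(succ, key=succ.get)
            return best if succ[best] / total >= self.threshold else None

    p = MarkovPrefetcher()
    p.train(["/index", "/a", "/index", "/a", "/index", "/b"])
    ```

    A server would ship `hint(current_doc)` to the client as a prefetching hint; a client would run the same model over its own log and fetch the prediction itself.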

    On the server side, we used extensive trace simulations based on the logs of our departmental HTTP server to show that both server load and service time could be reduced considerably, if speculative service is used. This is above and beyond what is currently achievable using client-side caching, prefetching, and server-side dissemination.

    On the client side, we used extensive trace simulations based on logs we collected from over 400 users in our departmental lab to show that service time could be reduced, if client-initiated prefetching is used. This reduction, however, is quite dependent on what we term as the user "surfing" or "working" behavior, which turns out to be quite predictable.

    As part of this project we have implemented server-assisted prefetching by modifying the NCSA HTTP server to provide hints based on Markov models constructed for the server's local documents, and by modifying the NCSA Mosaic browser to interpret these "hints" and perform the necessary prefetching. Also, as part of this project we have implemented client-initiated prefetching by modifying the NCSA Mosaic browser to construct Markov models of the user access patterns and use these to perform the necessary prefetching. We are working on combining our server-assisted prefetching and client-initiated prefetching in a single framework that allows switching between the two regimes appropriately.

WebSeed: Content Replication Protocols for the WWW

    PI: Azer Bestavros
    Participants: Carlos Cunha (PhD'1998), Shudong Jin

    Research on replication techniques to reduce traffic and minimize the latency of information retrieval in a distributed system has concentrated on client-based caching, whereby recently/frequently accessed information is cached at a client (or at a proxy thereof) in anticipation of future accesses. We believe that such myopic solutions---focussing exclusively on a particular client or set of clients---are likely to have a limited impact. Instead, we offer a solution that allows the replication of information to be done on a global supply/demand basis.

    The primary goal of this project is to develop data dissemination mechanisms that allow information to propagate from its producers to servers that are closer to its consumers. This idea relies on a particular future model of the Internet where, in addition to clients and servers, service proxies offer their storage and bandwidth capacities "for rent".

    Our dissemination techniques reduce network traffic and balance load amongst servers by exploiting geographic and temporal locality of reference properties exhibited in client access patterns to decide on replication and placement strategies. The level of dissemination depends on the relative popularity of documents, and on the expected reduction in traffic that results from such dissemination.
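
    The popularity-driven dissemination decision can be caricatured as a simple cost/benefit test. Everything here (names, the cost model, the constants) is an illustrative assumption, not the project's actual strategy.

    ```python
    def should_replicate(requests_from_region, doc_size, push_cost_factor=2.0):
        # Benefit: each future regional request no longer crosses the wide
        # area. Cost: one push of the document plus storage overhead.
        saved = requests_from_region * doc_size
        cost = push_cost_factor * doc_size
        return saved > cost

    popular = should_replicate(requests_from_region=50, doc_size=10)
    unpopular = should_replicate(requests_from_region=1, doc_size=10)
    ```

    A real policy would estimate `requests_from_region` from observed geographic locality of reference, and re-evaluate the decision as popularity shifts over time.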

AFTER: Adaptive Forward Timely Erasure Recovery in High-Speed Networks

    PI: Azer Bestavros
    Participants: Gitae Kim (PhD'1998), Jaehee Yoon

    A variety of real-time control protocols have been developed to cope with communication failures (namely, an entire or partial packet loss) due to the limitations of network resources in distributed computing systems. These protocols make use of temporal and/or spatial redundancies to meet preset reliability and synchronization constraints---often at an expensive cost, resulting from inefficient or inflexible resource usage.

    Protocols that employ spatial redundancy, such as the techniques based on bandwidth reservation and Forward Error Correction (FEC), fail to boost data reliability when bit-corruptions hit a packet (or cell) in such a way that the protocols cannot mask such corruptions. Furthermore, the protocols introduced so far in the literature are based on static, proactive schemes, whereby the level of redundancy is fixed at the time of protocol initiation or throughout the protocol's lifespan. In order to guarantee the quality of service (QoS) requested by an application, they need to reserve enough redundancy in preparation for a worst-case scenario that happens rarely, if ever. In particular, these methods lack the ability to adapt to the ever-varying data rates and/or communication failure rates, making them inefficient in their use of network bandwidth.

    Protocols that employ temporal redundancy for masking communication failures provide a high degree of reliability, yet tend to suffer from high packet delays, as well as large jitter. These protocols use reactive schemes that rely on packet retransmissions when failures are detected. Automatic Repeat Request (ARQ) techniques uniquely belong to this group of protocols. Because of their high latencies due to recovery time, ARQ methods are not suitable for real-time applications that require stringent delay bounds and a high degree of data integrity. Examples of such applications include on-line financial data feeds and group simulations.

    In this project, we introduce the notion of Adaptive Forward Timely Erasure Recovery (AFTER), which allows the use of a feedback-based dynamic redundancy control scheme to provide an effective transport framework, from which a variety of transport applications can benefit. AFTER uses efficient encoding techniques (e.g., Reed-Solomon or Tornado codes) that make dynamic redundancy control possible. When it is implemented for reliable data transfers, AFTER's dynamic control scheme allows us to optimize the use of communication bandwidth by dynamically balancing the use of spatial redundancy (through FEC) and temporal redundancy (through retransmission) to achieve the level of packet delay and delay variance preset by the application's requested QoS. AFTER can be implemented for both reliable and unreliable data transports, under various network environments ranging from highly reliable high-speed communication channels, such as Constant Bit Rate (CBR) services in ATM networks, to low-cost best-effort data communication channels, such as conventional packet-switched networks. To that end, we have proposed and implemented an AAL-layer Lazy Packet Discard (LPD) Policy for TCP/IP over ATM. Also, we are investigating techniques for using AFTER as a mechanism for reliable multicast.
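
    The feedback loop's flavor might be sketched as follows. This is not the AFTER protocol itself; the smoothing constant, safety margin, and interface are assumptions. The key point is that the parity level tracks the observed loss rate instead of being fixed at connection setup.

    ```python
    import math

    def parity_for_loss(loss_rate, block_size, safety=1.5):
        # Send enough parity packets to cover the expected losses in a block,
        # with a safety margin. An erasure code (e.g., Reed-Solomon) can then
        # rebuild the data from any block_size of the block_size + parity
        # packets that arrive.
        return math.ceil(loss_rate * block_size * safety)

    class AfterSender:
        def __init__(self, block_size=20):
            self.block_size = block_size
            self.loss_estimate = 0.0

        def on_feedback(self, observed_loss, alpha=0.25):
            # Smooth receiver-reported loss with an exponential moving average.
            self.loss_estimate = (1 - alpha) * self.loss_estimate + alpha * observed_loss

        def parity(self):
            return parity_for_loss(self.loss_estimate, self.block_size)

    sender = AfterSender()
    calm = sender.parity()      # no losses reported yet: send no parity
    sender.on_feedback(0.2)     # receiver reports 20% loss
    sender.on_feedback(0.2)
    busy = sender.parity()      # redundancy ramps up with the loss estimate
    ```

    Losses not covered by the parity budget would fall back to retransmission, which is the spatial/temporal balance the paragraph above describes.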

SRMS: Statistical Rate Monotonic Scheduling for Real-time Computing and Communication Systems

    PI: Azer Bestavros
    Participants: Alia Atlas (PhD'1999), Adrian Prezioso (MA'1999)

    Rate Monotonic Scheduling (RMS) is a preemptive static priority-based scheduling algorithm for real-time periodic task systems. Using each task's utilization, RMS determines whether the system can be scheduled so that 100 percent of the deadlines are met. For periodic tasks with non-deterministic resource utilization requirements, RMS must use the peak utilization requirements. This may be extremely wasteful of system resources, especially for applications with highly-variable resource utilization requirements (e.g., MPEG video communication).
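
    The utilization test that RMS relies on is the classical Liu and Layland bound: a set of n periodic tasks is guaranteed to meet all deadlines under RMS if total utilization does not exceed n(2^(1/n) - 1). The test is sufficient but not necessary; task sets above the bound may still be schedulable.

    ```python
    def rms_schedulable(tasks):
        # tasks: list of (worst_case_execution_time, period) pairs
        n = len(tasks)
        utilization = sum(c / p for c, p in tasks)
        bound = n * (2 ** (1 / n) - 1)   # ~0.78 for n=3, ~0.693 as n grows
        return utilization <= bound

    ok = rms_schedulable([(1, 4), (1, 5), (2, 10)])        # U = 0.65
    too_much = rms_schedulable([(2, 4), (2, 5), (3, 10)])  # U = 1.20
    ```

    For tasks with variable demands, plugging peak execution times into `tasks` is exactly the wasteful worst-case provisioning the paragraph above describes.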

    In this project, we consider SRMS---a modified RMS where the tasks have random resource utilization requirements and do not require all of their deadlines to be met. Instead, statistical guarantees are made for the percentage of deadlines which will be met per task. To permit this, admission control is added for each job of a task. The current algorithm works for tasks whose periods are multiples of each other. Each task is given a guaranteed resource utilization allowance for the period of the next lowest task. Using this allowance, each job is examined at its release time to determine if the required resource utilization is available from the allowance. In addition, we allow any unused allowance to be inherited by lower tasks.
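
    A much-simplified sketch of per-job admission against an allowance follows; the names are assumptions, and allowance inheritance across tasks is omitted for brevity. The point is that a job with a randomly varying demand is accepted or rejected at its release time, rather than being allowed to overrun the system.

    ```python
    class TaskBudget:
        """Per-task utilization allowance, refilled once per period."""

        def __init__(self, allowance):
            self.allowance = allowance
            self.remaining = allowance

        def refill(self):
            self.remaining = self.allowance

        def admit(self, demand):
            # Admission control: reject the job at release time if the
            # allowance cannot cover its (randomly varying) demand.
            if demand <= self.remaining:
                self.remaining -= demand
                return True
            return False

    t = TaskBudget(allowance=10)
    a = t.admit(6)    # admitted, 4 units of allowance left
    b = t.admit(6)    # rejected: would exceed the allowance
    t.refill()
    c = t.admit(6)    # admitted again in the next period
    ```

    Over many periods, the fraction of admitted jobs is what yields the per-task statistical guarantee.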

    Our SRMS algorithm allows an efficient use of system resources while providing quality of service guarantees for task systems with random resource utilization requirements. Moreover, it allows us to decouple a task's priority from its period, thus allowing different statistical guarantees to be provided to each task independent of that task's period. These capabilities have been established both analytically and through simulations.

    More information on this project is available from the home pages of The SRMS Workbench and The SRMS NT Service projects.

SCC: Speculative Concurrency Control for Real-time Databases

    PI: Azer Bestavros
    Participants: Spyridon Braoudakis (PhD'1994), Sue Nagy (PhD'1997), Euthimios Panagos (PhD'1996), Benjamin Mandler (MA'1994), Biao Wang (MA'1994)

    In order for multiple transactions to operate concurrently on a shared database, a protocol must be adopted to coordinate their activities. Such a protocol -- called a concurrency control algorithm -- aims at ensuring a consistent state of the database system, while allowing the maximum possible concurrency among transactions.

    Traditional concurrency control algorithms can be broadly classified as either pessimistic or optimistic. Pessimistic Concurrency Control (PCC) algorithms avoid any concurrent execution of transactions as soon as conflicts that may result in future inconsistencies are detected. On the contrary, Optimistic Concurrency Control (OCC) algorithms allow such transactions to proceed at the risk of having to restart them in case these suspected inconsistencies materialize.

    For conventional database management systems with limited resources, performance studies of concurrency control methods have concluded that PCC locking protocols perform better than OCC techniques. The main reason for this good performance is that PCC's blocking-based conflict resolution policy results in resource conservation, whereas OCC with its restart-based conflict resolution policy wastes more resources. For Real-Time DataBase Systems (RTDBS), where transactions execute under strict timing constraints, maximum concurrency (or throughput) ceases to be an expressive measure of performance. Rather, the number of transactions completed before their set deadlines becomes the decisive performance measure. Most real-time concurrency control schemes considered in the literature are based on Two-Phase Locking (2PL), which is a PCC strategy.

    Despite its widespread use in commercial systems, 2PL has some properties such as the possibility of deadlocks and long and unpredictable blocking times that damage its appeal for real-time environments, where the primary performance criterion is meeting time constraints and not just preserving consistency requirements. A recent evaluation of the behavior of both PCC and OCC schemes in a real-time environment concluded that for a RTDBS with firm deadlines (where late transactions are immediately discarded) OCC outperforms PCC, especially when resource contention is low.

    However, a disadvantage of the classical OCC is that when a conflict is detected the transaction being validated is always the one to be aborted, without respect to transactions' priorities or deadlines. An even more serious problem of classical OCC is that transaction conflicts are not detected until the validation phase, at which time it may be too late to restart. PCC two-phase locking algorithms do not suffer from this problem because they detect potential conflicts as they occur. They suffer, however, from the possibility of unnecessarily missing set deadlines as a result of unbounded waiting due to blocking.

    We propose a categorically different approach to concurrency control for RTDBS, which relies on the use of redundant processes that speculate on alternative schedules, once conflicts that threaten the consistency of the database are detected. These alternative schedules are adopted only if the suspected inconsistencies materialize; otherwise, they are abandoned. Due to its nature, we have termed this approach Speculative Concurrency Control (SCC). SCC algorithms use redundancy to combine the advantages of both PCC and OCC algorithms, while avoiding their disadvantages. On the one hand, SCC resembles PCC in that potentially harmful conflicts are detected as early as possible, allowing a head-start for alternative schedules, and thus increasing the chances of meeting the set timing constraints -- should these alternative schedules be needed (due to restart as in OCC). On the other hand, SCC resembles OCC in that it allows conflicting transactions to proceed concurrently, thus avoiding unnecessary delays (due to blocking as in PCC) that may jeopardize their timely commitment.
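
    The speculation idea can be caricatured in a few lines. The data structures are illustrative assumptions, not the actual SCC algorithms; what matters is that the alternative schedule is forked when the conflict is detected, not at validation time.

    ```python
    class Transaction:
        def __init__(self, name, start_step):
            self.name = name
            self.start_step = start_step   # step at which execution began

    def on_conflict(primary, current_step):
        # Fork a shadow the moment a potential conflict is detected: the
        # shadow's head start over a from-scratch restart is what SCC buys.
        return Transaction(primary.name + "-shadow", current_step)

    def on_writer_commit(primary, shadow, conflict_materialized):
        # Adopt the shadow only if the suspected inconsistency materialized;
        # otherwise discard it and let the primary commit.
        return shadow if conflict_materialized else primary

    t = Transaction("T", start_step=0)
    shadow = on_conflict(t, current_step=3)
    winner = on_writer_commit(t, shadow, conflict_materialized=True)
    ```

    Redundant processing resources pay for the shadow; in exchange, a restart (the OCC cost) begins from step 3 rather than step 0, improving the chance of meeting the deadline.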

ACCORD: Admission Control and Capacity Overload management for RTDBs

    PI: Azer Bestavros
    Participant: Susan Nagy (PhD'1998)

    Admission control and overload management techniques are central to the design and implementation of Real-Time Database Systems. In this project, we investigate various such mechanisms using ACCORD: a novel Admission Control and Capacity Overload management framework for Real-time Databases.

    The main challenge involved in scheduling transactions in a Real-Time DataBase (RTDB) management system is that the resources needed to execute a transaction are not known a priori. For example, the set of objects to be read (written) by a transaction may be dependent on user input (as in a stock market application) or dependent on sensory inputs (as in a process control application). Therefore, the a priori reservation of resources (e.g., read/write locks on data objects) to guarantee a particular Worst Case Execution Time (WCET) becomes impossible---and the non-deterministic delays associated with the on-the-fly acquisition of such resources pose the real challenge of integrating real-time scheduling with other database protocols. To deal with this challenge, most RTDB systems make two assumptions: (1) they relax the transaction deadline semantics by allowing only soft and firm (but not hard) deadlines; and (2) they adopt time-cognizant, best-effort algorithms that optimize the system performance in the presence of such flexible deadlines.

    To illustrate this state of affairs, consider the huge body of research on real-time concurrency control, where complex time-cognizant concurrency control techniques are proposed for the sole purpose of maximizing the number of transactions that meet their deadlines (or other metrics thereof). A careful evaluation of these elaborate techniques reveals that their superiority materializes only when the RTDB system is overloaded. However, when the system is not overloaded, the performance of these techniques becomes comparable to that of much simpler techniques (e.g., 2PL-PA). It is important to observe that when a RTDB system is overloaded, a large percentage of transactions end up missing their deadlines. This observation leads to the following question: How much better would the performance of the system be if these same transactions (that ended up missing their deadlines) were not allowed into the system in the first place? The answer is obviously "much better" because, with hindsight, the limited resources in the system would not have been wasted on these transactions to start with. While such a clairvoyant scheduling of transactions is impossible in a real system, admission control and overload management techniques could be used to achieve the same goal. In this project, we introduce and evaluate such techniques.

    At the heart of this project is ACCORD, an Admission Control and Capacity Overload management Real-time Database framework---an architecture and a transaction model---for hard deadline RTDB systems. The system architecture consists of admission control and scheduling components which provide early notification of failure to submitted transactions that are deemed not valuable or incapable of completing on time. The transaction model consists of two components: a primary task and a compensating task. Transactions which are admitted to the system are guaranteed, by the deadline of the transaction, one of two outcomes: either the primary task will successfully commit or the compensating task will safely terminate. Our admission control mechanisms permit transactions to fail at the earliest possible point in time (i.e. at submission time) rather than at a later time. Also as a system becomes overloaded, our admission control techniques allow for the utilization of system resources in the most profitable way.
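
    An illustrative version of the admission test might look like this. The interface and cost model are assumptions; the essential property shown is the ACCORD guarantee that a transaction either runs its primary task to commit or safely terminates via its compensating task, with infeasible transactions rejected at submission time.

    ```python
    def admit(now, deadline, primary_cost, compensating_cost, backlog):
        # backlog: work (in time units) already committed ahead of this
        # transaction. Check the primary first, then the cheaper fallback.
        if now + backlog + primary_cost <= deadline:
            return "run-primary"
        if now + backlog + compensating_cost <= deadline:
            return "run-compensating"
        return "reject"   # fail at the earliest possible point: submission

    early = admit(now=0, deadline=10, primary_cost=5, compensating_cost=1, backlog=2)
    late = admit(now=0, deadline=10, primary_cost=5, compensating_cost=1, backlog=12)
    ```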

CLEOPATRA: Language and Tools for Embedded Real-Time Computing

    PI: Azer Bestavros
    Participants: Robert Popp (MA'1992), Devora Reich (MA'1992)

    Predictability---the ability to foretell that an implementation will not violate a set of specified reliability and timeliness requirements---is a crucial, highly desirable property of responsive embedded systems. In this project we proposed and implemented a development methodology for responsive systems, which enhances predictability by eliminating potential hazards resulting from physically-unsound specifications.

    The backbone of our methodology is the Time-constrained Reactive Automaton (TRA) formalism, which adopts a fundamental notion of space and time that restricts expressiveness in a way that allows the specification of only reactive, spontaneous, and causal computation. Using the TRA model, unrealistic systems---possessing properties such as clairvoyance, caprice, infinite capacity, or perfect timing---cannot even be specified. We argue that this "ounce of prevention" at the specification level is likely to spare a lot of time and energy in the development cycle of responsive systems---not to mention the elimination of potential hazards that would have gone, otherwise, unnoticed.

    The TRA model is presented to system developers through the CLEOPATRA programming language. CLEOPATRA features a C-like imperative syntax for the description of computation, which makes it easier to incorporate in applications already using C. It is event-driven, and thus appropriate for embedded process control applications. It is object-oriented and compositional, thus advocating modularity and reusability. CLEOPATRA is semantically sound; its objects can be transformed, mechanically and unambiguously, into formal TRA automata for verification purposes, which can be pursued using model-checking or theorem proving techniques. Since 1989, an ancestor of CLEOPATRA has been in use as a specification and simulation language for embedded time-critical robotic processes.

PhD Thesis: Reduction of Quality Attacks on Adaptation Mechanisms

    Mina Guirguis, PhD 2006

    One important consideration in realizing dependable computing systems and networks is to uncover vulnerabilities in their designs to adversarial attacks. Currently, the designs of these systems employ different forms of adaptation mechanisms in order to optimize their performance by ensuring that desirable properties, such as stability, efficiency, and fairness, are not compromised. This thesis discovers and studies a new type of adversarial attack that targets such adaptation mechanisms by exploiting their dynamics of operation, i.e., the characteristics of their transient behavior. We coin this new breed of adversarial attacks Reduction of Quality (RoQ) attacks. The premise of RoQ attacks is to keep an adaptive mechanism constantly in a transient state, effectively depriving the system of much of its capacity and significantly reducing its service quality. In this thesis we develop a general control-theoretic framework that provides a unified approach to modeling and vulnerability assessment of the dynamics underlying RoQ exploits. Within this framework, we introduce and formalize the notion of an attack's "Potency" that capitalizes on the attacker's best incentive: maximizing the marginal utility of its attack traffic. Unlike traditional brute-force Denial of Service attacks that aim to take down a system at any cost, RoQ attacks aim to maximize the damage inflicted on a system while consuming only an innocuous, small fraction of that system's hijacked capacity. We instantiate our framework using detailed analytical models and associated metrics on a series of adaptation mechanisms that are commonly used in networking protocols, end-system admission controllers, and load balancers. We assess the impact of RoQ attacks using analysis, simulations, and Internet experiments. We identify key factors that expose the tradeoffs between resilience and susceptibility to RoQ attacks. These factors could be used to harden adaptation mechanisms against RoQ exploits, in addition to developing new forms of countermeasures and defense mechanisms.

PhD Thesis: Embedding Games: Distributed Resource Management with Selfish Users

    Jorge Londono, PhD 2010

    Large scale distributed computing infrastructures pose challenging resource management problems, which could be addressed by adopting one of two perspectives. On the one hand, the problem could be framed as a global optimization that aims to minimize some notion of system-wide (social) cost. On the other hand, the problem could be framed in a game-theoretic setting whereby rational, selfish users compete for a share of the resources so as to maximize their private utilities with little or no regard for system-wide objectives. This game-theoretic setting is particularly applicable to emerging cloud and grid environments, testbed platforms, and many networking applications.

    By adopting the first, global optimization perspective, this thesis presents NetEmbed: a framework, associated mechanisms, and implementations that enable the mapping of requested configurations to available infrastructure resources.

    By adopting the second, game-theoretic perspective, this thesis defines and establishes the premises of two resource acquisition mechanisms: Colocation Games and Trade and Cap. Colocation Games enable the modeling and analysis of the dynamics that result when rational, selfish parties interact in an attempt to minimize the individual costs they incur to secure shared resources necessary to support their application QoS or SLA requirements. Trade and Cap is a market-based scheduling and load-balancing mechanism that facilitates the trading of resources when users have a mixture of rigid and fluid jobs, and incentivizes users to behave in ways that result in better load-balancing of shared resources.

    In addition to developing their analytical underpinnings, this thesis establishes the viability of NetEmbed, Colocation Games, and Trade and Cap by presenting implementation blueprints and experimental results for many variants of these mechanisms.

    The results presented in this thesis pave the way for the development of economically-sound resource acquisition and management solutions in two emerging, and increasingly important settings. In pay-as-you-go settings, where pricing is based on usage, this thesis anticipates new service offerings that enable efficient marketplaces in the presence of non-cooperative, selfish agents. In settings where pricing is not a function of usage, this thesis anticipates the development of service offerings that enable trading of usage rights to maximize the utility of a shared infrastructure to its tenants.



PhD Thesis: Overlay Network Creation and Maintenance with Selfish Users

    Georgios Smaragdakis, PhD 2008

    Overlay networks add and enhance functionality for end users without requiring modifications to the Internet core mechanisms. They have been used for a variety of popular applications including routing, file sharing, content distribution, and server deployment. Previous work has focused on devising practical neighbor selection heuristics under the assumption that users conform to a specific wiring protocol. This is not a valid assumption in highly decentralized systems like overlay networks. Overlay users may act selfishly and deviate from the default wiring protocols, utilizing knowledge they have about the network when selecting neighbors so as to improve the performance they receive from the overlay.

    This thesis goes against the conventional thinking that overlay users conform to a specific protocol. The contributions of this thesis are threefold. It provides a systematic evaluation of the design space of selfish neighbor selection strategies in real overlays, evaluates the performance of overlay networks that consist of users that select their neighbors selfishly, and examines the implications of selfish neighbor and server selection to overlay protocol design and service provisioning respectively.

    This thesis develops a game-theoretic framework that provides a unified approach to modeling Selfish Neighbor Selection (SNS) wiring procedures on behalf of selfish users. The model is general, and takes into consideration costs reflecting network latency and user preference profiles, the inherent directionality in overlay maintenance protocols, and connectivity constraints imposed on the system designer. Within this framework the notion of user's "best response" wiring strategy is formalized as a k-median problem on asymmetric distance and is used to obtain overlay structures in which no node can re-wire to improve the performance it receives from the overlay. Evaluation results presented in this thesis indicate that selfish users can reap substantial performance benefits when connecting to overlay networks composed of non-selfish users. In addition, in overlays that are dominated by selfish users, the resulting stable wirings are optimized to such great extent that even non-selfish newcomers can extract near-optimal performance through naive wiring strategies.
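    The "best response" computation described above can be sketched concretely: a node chooses the k neighbors that minimize its total asymmetric distance to all destinations, a k-median objective. The brute-force search below is an illustrative toy, not the thesis's algorithm; the node labels, link costs, and distances are invented for the example.

    ```python
    import itertools

    def best_response(destinations, candidates, k, link_cost, dist):
        """Choose k neighbors minimizing the sum, over all destinations,
        of the cheapest route through any chosen neighbor: a k-median
        objective on asymmetric distances. Brute force suffices for a
        toy-sized overlay."""
        best_wiring, best_cost = None, float("inf")
        for wiring in itertools.combinations(candidates, k):
            cost = sum(min(link_cost[v] + dist[v][d] for v in wiring)
                       for d in destinations)
            if cost < best_cost:
                best_wiring, best_cost = wiring, cost
        return best_wiring, best_cost

    # Toy overlay: node 0 picks k=1 neighbor among {1, 2} to reach node 3.
    link_cost = {1: 1.0, 2: 5.0}          # cost of the direct link to each candidate
    dist = {1: {3: 10.0}, 2: {3: 1.0}}    # each candidate's distance to destination 3
    wiring, cost = best_response([3], [1, 2], 1, link_cost, dist)
    # Neighbor 2 wins: 5.0 + 1.0 = 6.0 beats 1.0 + 10.0 = 11.0.
    ```

    The same search also illustrates why stable wirings emerge: once every node holds its best response, no single re-wiring can lower that node's cost.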

    To capitalize on the performance advantages of optimal neighbor selection strategies and the emergent global wirings that result, this thesis presents EGOIST: an SNS-inspired overlay network creation and maintenance routing system. Through an extensive measurement study on the deployed prototype, results presented in this thesis show that EGOIST's neighbor selection primitives outperform existing heuristics on a variety of performance metrics, including delay, available bandwidth, and node utilization. Moreover, these results demonstrate that EGOIST is competitive with an optimal but unscalable full-mesh approach, remains highly effective under significant churn, is robust to cheating, and incurs minimal overheads.

    This thesis also studies selfish neighbor selection strategies for swarming applications. The main focus is on n-way broadcast applications, in which each of n overlay users wants to push its own distinct file to all other destinations as well as download their respective data files. Results presented in this thesis demonstrate that the performance of our swarming protocol for n-way broadcast on top of overlays of selfish users is far superior to its performance on top of existing overlays.

    In the context of service provisioning, this thesis examines the use of distributed approaches that enable a provider to determine the number and location of servers for optimal delivery of content or services to its selfish end-users. To leverage recent advances in virtualization technologies, this thesis develops and evaluates a distributed protocol that migrates servers based on end-user demand and only on local topological knowledge. Results under a range of network topologies and workloads suggest that the performance of the distributed deployment is comparable to that of the optimal but unscalable centralized deployment.


PhD Thesis: Service Provisioning in Mobile Networks Through Coordinated Resource Management

    Hany Morcos, PhD 2008

    The pervasiveness of personal computing platforms offers an unprecedented opportunity to deploy large-scale services that are distributed over wide physical spaces. Two major challenges face the deployment of such services: the often resource-limited nature of these platforms, and the necessity of preserving the autonomy of the owners of these devices. These challenges preclude the use of centralized control and preclude considering services that are subject to performance guarantees. To that end, this thesis advances a number of new distributed resource management techniques that are shown to be effective in such settings, focusing on two application domains: distributed Field Monitoring Applications (FMAs) and Message Delivery Applications (MDAs).

    In the context of FMAs, this thesis presents two techniques that are well-suited to the fairly limited storage and power resources of autonomously mobile sensor nodes. The first technique relies on amorphous placement of sensory data through the use of novel storage management and sample diffusion techniques. The second relies on an information-theoretic framework to optimize local resource management decisions. Both approaches are proactive in that they aim to provide nodes with a view of the monitored field that reflects the characteristics of queries over that field, enabling them to handle more queries locally and thus reduce communication overheads. This thesis then recognizes node mobility as a resource to be leveraged, and in that respect proposes novel mobility coordination techniques for FMAs and MDAs. Assuming that node mobility is governed by a spatio-temporal schedule featuring some slack, this thesis presents novel algorithms of various computational complexities to orchestrate the use of this slack to improve the performance of supported applications.

    The findings in this thesis, which are supported by analysis and extensive simulations, highlight the importance of two general design principles for distributed systems. First, a priori knowledge (e.g., about the target phenomena of FMAs and/or the workload of either FMAs or MDAs) can be used effectively for local resource management. Second, judicious leverage and coordination of node mobility can lead to significant performance gains for distributed applications deployed over resource-impoverished infrastructures.

PhD Thesis: The Sensor Network Workbench: Towards Functional Specification, Verification and Deployment of Constrained Distributed Systems

    Michael Ocean, PhD 2008

    As the commoditization of sensing, actuation and communication hardware increases, so does the potential for dynamically tasked sense-and-respond networked systems (i.e., Sensor Networks or SNs) to replace existing disjoint and inflexible special-purpose deployments (closed-circuit security video, anti-theft sensors, etc.). While various solutions have emerged to many individual SN-centric challenges (e.g., power management, communication protocols, role assignment), perhaps the largest remaining obstacle to widespread SN deployment is that those who wish to deploy, utilize, and maintain a programmable Sensor Network lack the programming and systems expertise to do so. The contributions of this thesis center on the design, development and deployment of the SN Workbench (snBench). snBench embodies an accessible, modular programming platform coupled with a flexible and extensible run-time system that, together, support the entire life-cycle of distributed sensory services. As it is impossible to find a one-size-fits-all programming interface, this work advocates the use of tiered layers of abstraction that enable a variety of high-level, domain-specific languages to be compiled to a common (thin-waist) tasking language; this common tasking language is statically verified and can be subsequently re-translated, if needed, for execution on a wide variety of hardware platforms.
    snBench provides: (1) a common sensory tasking language (Instruction Set Architecture) powerful enough to express complex SN services, yet simple enough to be executed by highly constrained resources with soft, real-time constraints, (2) a prototype high-level language (and corresponding compiler) to illustrate the utility of the common tasking language and the tiered programming approach in this domain, (3) an execution environment and a run-time support infrastructure that abstract a collection of heterogeneous resources into a single virtual Sensor Network, tasked via this common tasking language, and (4) novel formal methods (i.e., static analysis techniques) that verify safety properties and infer implicit resource constraints to facilitate resource allocation for new services. This thesis presents these components in detail, as well as two specific case studies: the use of snBench to integrate physical and wireless network security, and the use of snBench as the foundation for semester-long student projects in a graduate-level Software Engineering course.

PhD Thesis: A Type-Disciplined Approach to Developing Resources and Applications for the World-Wide Web

    Adam Bradley, PhD 2004

    Programming of stand-alone computer systems has long benefited from formal methods for system specification, programming, and execution; such formalisms lend themselves to precise mechanical verification of a system's desired properties. Type systems particularly have proven effective at reining in the unbounded expressive power of programming languages without excessively burdening the programmer. It is the position of this thesis that such techniques can be adapted to suit the programming of open networked systems. Thereby, benefits can be realized to the correctness and stability of the individual components of said systems, to the precision with which such systems are specified, and to the correctness, reliability, predictability, and interoperability of networked programs and systems as a whole.

    In support of this concept, five formal methods (the P, B, and X type systems, the L type model, and the CHAIN methodology) are developed which reflect the promotion of "flows" to first-class programming language citizens accessible to type systems and other formal correctness tools. We employ "flow" as a generic abstraction for mechanisms which affect the transaction of state among components of a networked system to achieve its desired end. These five systems largely reflect novel applications of existing technologies and systems to various forms of flows; two also employ a novel type construct, the "stacked type syntax", which captures nesting relationships within data representations.

    The P type system enforces constraints upon the flow of system state changes by restricting program side-effects to predictable classes. The B type systems identify flows of information from server back-end sources (basis data) to representations thus identifying potential representational inconsistencies within the system. The X type systems enforce XML well-formedness upon the output streams (flows) of programs. The L type model imposes structure upon the specification of the HTTP protocol's content model and supports a more precise declaration of the semantics of its syntax (i.e., its interpretive flow). The CHAIN methodology offers a systematic approach to assessing correctness of systems which construct communication channels (flows) by composing arbitrarily long sequences of intermediaries, and are thus not amenable to direct global verification.

PhD Thesis: Scalability of Multicast-based Streaming Delivery Mechanisms on the Internet

    Shudong Jin, PhD 2003

    This thesis examines the scalability of multicast-based streaming delivery through analytical and empirical evaluation methodologies. Scalability is assessed with respect to two metrics: server bandwidth and network cost. The first metric quantifies the amount of server resources needed to serve a large number of concurrent clients. The second metric quantifies the amount of network resources needed to serve these clients over the Internet. Scalability along these metrics is evaluated subject to two types of characteristic properties: access patterns and Internet topology. Access patterns assumed in this thesis allow clients to be synchronous or asynchronous, and allow them to access content sequentially or randomly. Internet topologies considered in this thesis exhibit power-law vertex degree distributions and small-world behavior. The findings of this thesis show that the server bandwidth requirement of multicast-based delivery techniques, such as stream merging and periodic broadcasting, largely depends on client access patterns. In particular, for asynchronous clients, if access is sequential, then the lower bound on server bandwidth grows logarithmically with the request arrival rate, but if client access is random, then the lower bound grows as the square root of the request arrival rate. The thesis also shows that the network cost of multicast-based streaming delivery depends mainly on the Internet's topological properties. Specifically, both the Internet's power-law vertex degree distribution and its small-world behavior affect the scaling behavior of IP multicast and of end-system multicast mechanisms.
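    The gap between the two lower bounds can be seen with a quick numerical comparison. This is an illustration of the scaling behaviors only; the normalization at a rate of 10 requests per unit time is an arbitrary choice, not a constant from the thesis.

    ```python
    import math

    # Server-bandwidth growth relative to a baseline arrival rate of 10:
    # logarithmic growth (sequential access) vs square-root growth (random
    # access), as described for asynchronous clients.
    rates = [10, 100, 1000, 10000]
    seq = [math.log(lam) / math.log(10) for lam in rates]    # logarithmic
    rnd = [math.sqrt(lam) / math.sqrt(10) for lam in rates]  # square root
    for lam, s, r in zip(rates, seq, rnd):
        print(f"lambda={lam:>6}  sequential ~{s:5.2f}x  random ~{r:6.2f}x")
    ```

    At a thousand-fold increase in arrival rate, the logarithmic bound grows by only 4x while the square-root bound grows by roughly 32x, which is why sequential access scales so much more gracefully.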

PhD Thesis: Metric-Induced Network Topologies

    Khaled Harfoush, PhD 2002

    Interest in building compact, accurate and efficient end-to-end network models increases as network-aware applications and services are deployed over the Internet. These models can be used to optimize the utilization of network resources, improve the quality of content delivery, and help analyze network performance. This thesis develops a framework for inferring Metric-Induced Network Topologies from end-to-end measurements.

    The main contributions of this thesis are as follows. First, this thesis presents MINT---a framework for the characterization of Metric-Induced Network Topologies. MINT is a general framework that uses correlations between end-to-end measurements across multiple flows emanating from a single host to model the network properties between this host and the flows' end points. A salient feature of MINT is its ability to compress the representation of the network, subject to specific sensitivity thresholds. Second, this thesis characterizes a broad class of metrics for which MINT is applicable. For these metrics, mechanisms for integrating network models (snapshots) obtained at different points in time and/or from different hosts are presented. Third, this thesis instantiates MINT for a variety of metrics---namely loss, delay, and bandwidth---in the context of unicast messaging. In that regard, it presents novel unicast end-to-end active probing techniques that enable the correlation of observations collected from end-points. The potential of passive probing using feedback from established connections is also evaluated. Extensive simulation experiments show the effectiveness of these novel probing approaches, as well as their robustness in terms of accuracy and convergence over a wide range of network conditions. Fourth, this thesis presents an implementation of the MINT framework through a Linux Application Programming Interface called the NetScope API. The value of the MINT framework and of the NetScope API is demonstrated through Internet deployment.

    Finally, to demonstrate the utility of the MINT framework, this thesis uses the NetScope API to enable the following applications at massively accessed Internet servers: (1) inferring shared congestion among a set of flows to enable shared congestion control, (2) optimizing server selection in end-system multicast, and (3) providing end-to-end statistical QoS (loss and delay) guarantees for a group of flows sharing the same bottleneck links.
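    As a rough illustration of the kind of inference MINT supports, shared congestion between two flows from the same server can be suggested by correlated loss observations. The estimator below is a deliberately simplified stand-in for MINT's actual probing and correlation techniques; the function name and threshold-free output are illustrative assumptions.

    ```python
    def shared_congestion_score(loss_a, loss_b):
        """Pearson correlation of per-probe loss indicators (0/1) for two
        flows emanating from the same host. A score near 1.0 suggests a
        shared bottleneck; near 0.0 suggests independent paths. This is a
        toy sketch, not MINT's estimator."""
        n = len(loss_a)
        ma, mb = sum(loss_a) / n, sum(loss_b) / n
        cov = sum((a - ma) * (b - mb) for a, b in zip(loss_a, loss_b)) / n
        va = sum((a - ma) ** 2 for a in loss_a) / n
        vb = sum((b - mb) ** 2 for b in loss_b) / n
        # Degenerate case: a flow with no loss variation carries no signal.
        return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0
    ```

    Flows that always lose (or keep) probes together score 1.0; flows whose losses are unrelated score near 0.0, which is the distinction a server would use to group flows for shared congestion control.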

PhD Thesis: Statistical Rate Monotonic Scheduling: Quality of Service through Management of Variability in Real-time Systems

    Alia Atlas, PhD 1999

    Interest in real-time scheduling increases as applications with quality of service (QoS) and timeliness constraints proliferate. The classical real-time task model, used in the optimal Rate Monotonic Scheduling (RMS), assumes constant resource requirements and hard deadlines. For the many applications with variable resource requirements, RMS uses pessimistic worst-case values and results in severe resource underutilization. To eliminate such underutilization, this dissertation examines how the variability of resource requirements should be considered in the problem of scheduling periodic tasks with statistical QoS constraints on the percentage of missed deadlines. To solve this problem, two on-line algorithms and an oracle are introduced, simulated and evaluated using two novel metrics. To show applicability, a computer-aided design tool and a design and implementation in KURT Linux are presented.

    The primary contribution, Statistical Rate Monotonic Scheduling (SRMS), is proposed with associated analysis for the calculation of statistical QoS guarantees and, given QoS requirements, proper resource allocation. SRMS assumes that variability can be smoothed through aggregation. It consists of a QoS calculator, a feasibility test, a scheduler and a constant-time job admission controller. Extensions provide time aggregation across tasks and a second chance for rejected jobs.
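    A minimal sketch of a constant-time admission controller in the SRMS spirit, assuming each task is granted a fixed processing budget per superperiod. The class name, parameters, and replenishment rule are illustrative assumptions, not SRMS's exact mechanics.

    ```python
    class SRMSAdmission:
        """Per-task O(1) job admission: a released job is admitted only if
        its demand fits within the task's remaining budget for the current
        superperiod; rejected jobs are known to fail at release time."""

        def __init__(self, budget, superperiod):
            self.budget = budget            # processing allowance per superperiod
            self.superperiod = superperiod
            self.remaining = budget
            self.epoch = 0

        def admit(self, release_time, demand):
            epoch = int(release_time // self.superperiod)
            if epoch != self.epoch:          # replenish at superperiod boundary
                self.epoch, self.remaining = epoch, self.budget
            if demand <= self.remaining:     # constant-time admission test
                self.remaining -= demand
                return True
            return False

    ac = SRMSAdmission(budget=10.0, superperiod=100.0)
    ```

    The budget acts as the statistically derived allocation: if a task's QoS requirement says 90% of its jobs must meet deadlines, the QoS calculator sizes the budget so that, given the job-demand distribution, roughly that fraction of jobs are admitted.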

    Additional algorithms---Slack Stealing Job Admission Control (SSJAC) and an omniscient off-line oracle---are introduced to permit comparison of SRMS with previous research and with theoretical performance bounds. Different value functions enable the oracle to yield solutions optimal according to different metrics---completion count, effective processor utilization (EPU), and job failure rate (JFR). The value function for the latter is introduced to provide a metric which considers all tasks of equal value.

    Via simulation, the performance of the algorithms is examined with JFR, EPU and two novel metrics. DeltaQoS evaluates the accuracy of QoS calculations. Intertask unfairness evaluates how unfair an algorithm is to different priority tasks. Experiments show that SRMS has superior performance during overload when the adjacent period ratio is at least two.

    To facilitate application development, the SRMS Workbench, a computer-aided design tool and simulator, and a design and implementation of SRMS in KURT Linux are provided. An API is introduced to support soft/firm-deadline and design-to-time tasks. The SRMS Workbench implements simple QoS negotiation and calculation of system specifications for requested QoS.

    The Thesis is available as a BUCS Technical Report

PhD Thesis: A Framework for Adaptive Forward Timely Erasure Recovery (AFTER)

    Gitae (Keith) Kim, PhD 1998

    A variety of real-time protocols have been developed to deal with communication failures (an entire or partial packet loss) in networked computing systems. These protocols rely on temporal and/or spatial redundancies to accomplish their goals---often at an expensive cost resulting from inefficient resource usage. Protocols that employ spatial redundancy, such as techniques based on resource reservation and information redundancy (e.g., FEC), improve the level of responsiveness but are prone to low resource utilization, with possible degradation in reliability. On the other hand, protocols that employ temporal redundancy provide a high level of resource utilization and reliability, yet tend to suffer from high levels of message delay and delay variance. Due to the high latencies during failure recovery, these schemes are not suitable for applications that require stringent real-time guarantees. Nevertheless, applications that require a high level of data integrity (e.g., on-line financial data feeds) need to use temporal redundancy, at the expense of increased latencies, in order to provide guaranteed reliability.

    In this dissertation, we introduce the notion of Adaptive Forward Timely Erasure Recovery (AFTER), a scheme that takes a dynamic redundancy control approach to provide an effective transport framework from which a variety of lower layers in the network system can benefit. The main idea behind AFTER is to provide a flexible resource control mechanism for clients that require different levels of reliability and responsiveness, by dynamically balancing the levels of spatial and temporal redundancy used in handling communication failures. AFTER is an elaboration of the adaptive redundancy control scheme introduced in the Adaptive Information Dispersal Algorithm (AIDA). AFTER can be implemented for both reliable and unreliable data transport, under various network environments ranging from high-speed B-ISDN networks to low-cost best-effort data communication channels.
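    The core trade-off between spatial and temporal redundancy can be sketched as follows: given an estimated loss rate, pick the smallest forward-erasure redundancy that meets a reliability target, and fall back to retransmission when no affordable level suffices. This is a hedged illustration under an independent-loss assumption, not the AFTER protocol itself; the function name and cap `r_max` are invented for the sketch.

    ```python
    from math import comb

    def fec_level(m, loss_rate, target, r_max=32):
        """Smallest number r of repair packets such that a block of m data
        packets is decodable (at least m of the m+r packets arrive) with
        probability >= target, assuming independent losses. Returns None
        if even r_max repair packets are insufficient, signaling a fall
        back to temporal redundancy (retransmission)."""
        for r in range(r_max + 1):
            n = m + r
            p_ok = sum(comb(n, k) * (1 - loss_rate) ** k * loss_rate ** (n - k)
                       for k in range(m, n + 1))
            if p_ok >= target:
                return r
        return None
    ```

    Re-running this selection as the loss estimate changes is the "dynamic redundancy control" idea in miniature: more spatial redundancy when the channel degrades, less when it recovers.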

    To demonstrate the flexibility and superiority of AFTER, we implement it in two different scenarios: TCP/IP over ATM (i.e., TCP-Boston) using ATM's ABR service, and multimedia file transfers via ATM's CBR service class. TCP-Boston, a TCP/IP protocol especially suitable for ATM networks, shows AFTER's adaptability to a network with small-sized transfer units, for best-effort traffic using a reliable transport method. AFTER's suitability over a reliable communication channel is tested through experiments with multimedia file transfers. For each application, we present high-level implementation details of our protocol, evaluate its performance using both simulation and analytic methods, and show that AFTER-based protocols outperform their counterparts.

PhD Thesis: Trace Analysis and its Applications to Performance Enhancements of Distributed Information Systems

    Carlos Cunha, PhD 1997

    The increasing importance of large-scale distributed information systems calls for a better understanding of the nature of their use. Such an understanding is critical to enable the design of high-performance and scalable information retrieval protocols.

    Over the last few years, the World-Wide Web (WWW or Web) has emerged as a unifying infrastructure for large-scale distributed information systems. This dissertation has two goals: (1) to perform a detailed analysis and characterization of WWW usage patterns and (2) to explore the performance enhancements that such an analysis makes possible. The analysis of WWW usage can be done both at the client and at the server side, enabling performance enhancements on both sides. On the client side, this dissertation explores prefetching techniques that alleviate the problem of long response times by trading network bandwidth for timeliness. On the server side, this dissertation explores replica allocation techniques that alleviate the problems of server load balancing and network bandwidth consumption.

    The contributions of this dissertation are: (1) the presentation of a large database of actual client traces that has already proven to be crucial for studies of characterizing WWW traffic and client caching algorithms; (2) the identification of relationships between documents that indicate probable document sequences, which can be used to circumvent long retrieval delays through pre-fetching; (3) the identification of user behavior models to help in reducing the amount of bandwidth required for pre-fetching; (4) the establishment of a simplified Internet model based on routing structures for use in problems of resource allocation; (5) the demonstration of the stability of such structures; and (6) the introduction of various replica allocation algorithms, and the evaluation of their performance.
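    Contribution (2) above suggests a simple prefetching sketch: learn document-transition counts from client traces, then prefetch the likely successor whenever its conditional probability clears a threshold. The class below is illustrative; the dissertation's user-behavior models are richer, and the threshold value is an invented parameter.

    ```python
    from collections import defaultdict

    class Prefetcher:
        """Trace-driven prefetching: counts document-to-document
        transitions and prefetches the most likely successor when its
        conditional probability exceeds a threshold."""

        def __init__(self, threshold=0.6):
            self.threshold = threshold
            self.counts = defaultdict(lambda: defaultdict(int))

        def observe(self, trace):
            # Learn from consecutive pairs in a client's access trace.
            for cur, nxt in zip(trace, trace[1:]):
                self.counts[cur][nxt] += 1

        def prefetch(self, current):
            # Return the document to prefetch, or None if no successor
            # is likely enough to justify the extra bandwidth.
            succ = self.counts.get(current)
            if not succ:
                return None
            total = sum(succ.values())
            doc, n = max(succ.items(), key=lambda kv: kv[1])
            return doc if n / total >= self.threshold else None

    p = Prefetcher(threshold=0.6)
    p.observe(["index.html", "a.html", "index.html",
               "a.html", "index.html", "b.html"])
    ```

    The threshold embodies the bandwidth-for-timeliness trade the dissertation describes: raising it prefetches less but wastes less bandwidth on wrong guesses.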

PhD Thesis: Admission Control and Scheduling Strategies for Real-time Database Systems

    Susan Nagy, PhD 1997

    The proliferation of Real-Time DataBase (RTDB) systems as repositories of information used by time-critical applications has been tremendous during the last decade. Many such systems continue to admit transactions to the point of overload which results in degraded performance. By the appropriate use of admission control and overload management techniques, the performance of such systems may be enhanced. Moreover, for some safety-critical applications (such as command and control systems), safety constraints require the early notification of transaction failure. Failure to do so results in wasting precious system resources, which could have been used by other admitted transactions, not to mention wasting precious time which could have been used to attempt alternative options for the failing transaction.

    In this dissertation, we propose ACCORD, an Admission Control and Capacity Overload management Real-time Database framework---an architecture and a transaction model---for hard deadline RTDB systems. The system architecture consists of admission control and scheduling components which provide early notification of failure to submitted transactions that are deemed not valuable or incapable of completing on time. The transaction model consists of two components: a primary task and a compensating task. Transactions which are admitted to the system are guaranteed, by the deadline of the transaction, one of two outcomes: either the primary task will successfully commit or the compensating task will safely terminate. Our admission control mechanisms permit transactions to fail at the earliest possible point in time (i.e. at submission time) rather than at a later time. Also as a system becomes overloaded, our admission control techniques allow for the utilization of system resources in the most profitable way.
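    The early-notification guarantee can be sketched as a single admission predicate, under the (assumed) simplification that admitted work runs sequentially with known worst-case execution times. The function and its parameters are invented for illustration, not ACCORD's actual test.

    ```python
    def admit(now, primary_wcet, compensating_wcet, deadline, committed_load):
        """Admit a transaction only if, on top of already-committed work,
        its primary task can finish early enough that the compensating
        task could still terminate safely by the deadline. A False result
        means the transaction is rejected immediately at submission time,
        rather than failing later after consuming resources."""
        finish_primary = now + committed_load + primary_wcet
        return finish_primary + compensating_wcet <= deadline
    ```

    Rejecting at submission time is the point: the predicate costs almost nothing, and a rejected transaction learns of its failure while there is still time to attempt alternative options.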

    The contributions of this dissertation are: (1) the novel ACCORD framework for RTDB systems including a system architecture and a transaction model, (2) value-cognizant admission control mechanisms based upon workload, (3) value-cognizant admission control mechanisms based upon the level of concurrency conflicts, and (4) new scheduling algorithms suitable for ACCORD. These contributions are validated by an extensive experimental evaluation of ACCORD, through simulation, which confirms the performance benefits of admission control, overload management, and early failure notification.

PhD Thesis: Client-Based Logging: A New Paradigm For Distributed Transaction Management

    Euthimios Panagos, PhD 1996

    The proliferation of inexpensive workstations and networks has created a new era in distributed computing. At the same time, non-traditional applications such as computer-aided design (CAD), computer-aided software engineering (CASE), geographic information systems (GIS), and office information systems (OIS) have placed increased demands for high-performance transaction processing on database systems. The combination of these factors gives rise to significant challenges in the design of modern database systems. In this thesis, we propose novel techniques whose aim is to improve the performance and scalability of these new database systems. These techniques exploit client resources through client-based transaction management.

    Client-based transaction management is realized by providing logging facilities locally even when data is shared in a global environment. This thesis presents several recovery algorithms which utilize client disks for storing recovery-related information (i.e., log records). Our algorithms work with both coarse and fine-granularity locking, and they do not require the merging of client logs at any time. Moreover, our algorithms support fine-granularity locking with multiple clients permitted to concurrently update different portions of the same database page. The database state is recovered correctly after a complex crash, as well as when the updates performed by different clients on a page are not present on the disk version of the page, even though some of the updating transactions have committed.

    This thesis also presents the implementation of the proposed algorithms in a memory-mapped storage manager, as well as a detailed performance study of these algorithms using the OO1 database benchmark. The performance results show that client-based logging is superior to traditional server-based logging. This is because client-based logging is an effective way to reduce dependencies on server CPU and disk resources, thus delaying the point at which the server becomes a performance bottleneck as the number of clients accessing the database increases.

    The Thesis is available as BUCS Technical Report TR-96-010

PhD Thesis: Concurrency Control Protocols for Real-Time Databases

    Spyridon Braoudakis, PhD 1994

    Concurrency control methods developed for traditional database systems are not appropriate for real-time database systems (RTDBS), where, in addition to database consistency requirements, satisfying timing constraints is an integral part of the correctness criterion. Most real-time concurrency control protocols considered in the literature combine time-critical scheduling with traditional concurrency control methods to conform to transaction timing constraints. These methods rely on either transaction blocking or restarts, both of which are inappropriate for real-time concurrency control because of the unpredictability they introduce. Moreover, RTDBS performance objectives differ from those of conventional database systems in that maximizing the number of transactions that complete before their deadlines becomes the decisive performance objective, rather than merely maximizing concurrency (or throughput). Recently, Speculative Concurrency Control (SCC) was proposed as a categorically different approach to concurrency control for RTDBS. SCC relies on the use of redundant processes (shadows), which speculate on alternative schedules, once conflicts that threaten the consistency of the database are detected. SCC algorithms utilize added system resources to ensure that correct (serializable) executions are discovered and adopted as early as possible, thus increasing the likelihood of the timely commitment of transactions.

    This dissertation starts by reviewing the Order-Based SCC (SCC-OB) algorithm, which associates almost as many shadows as there are serialization orders of transactions. After demonstrating SCC-OB's excessive use of redundancy, a host of novel SCC-based protocols is introduced. Conflict-Based SCC (SCC-CB) reduces the number of shadows that a running transaction needs to keep by maintaining one shadow per uncommitted conflicting transaction. It is shown that the quadratic number of shadows maintained by SCC-CB is optimal, covering all serialization orders produced by SCC-OB. SCC-CB's correctness is established by showing that it admits only serializable histories. Next, the trade-off between the number of shadows and timeliness is considered. A generic SCC algorithm (SCC-kS) that operates under a limited redundancy assumption is presented; it allows no more than a constant number k of shadows to coexist on behalf of any uncommitted transaction. Next, a novel technique is proposed that incorporates additional information such as deadline, priority and criticalness within the SCC methodology. SCC with Deferred Commit (SCC-DC) utilizes this additional information to improve timeliness through the controlled deferment of transaction commitments. A probabilistic Value Induced Shadow Allocation (VISA) policy is developed which aims at preserving the most valuable shadows for each system transaction. The thesis of this dissertation is that SCC-based algorithms offer a new dimension, redundancy, to improve the timeliness of RTDBS. SCC-based algorithms are efficient (a quadratic number of shadows is optimal), scalable (redundancy can be traded off for timeliness), and easily extensible (deadline and priority information can be incorporated).
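    A toy sketch of the SCC-kS bookkeeping, under simplifying assumptions: shadows are represented as opaque tokens, conflict detection happens elsewhere, and the cap k bounds the redundancy any one transaction may consume. The class and method names are invented for illustration.

    ```python
    class SpeculativeTxn:
        """Per-transaction shadow management in the SCC-kS style: fork at
        most k shadows, one per uncommitted conflicting transaction, so
        that an alternative serialization order is ready if a conflict
        resolves against the optimistic execution."""

        def __init__(self, tid, k):
            self.tid, self.k = tid, k
            self.shadows = {}                     # conflicting tid -> shadow

        def on_conflict(self, other):
            # Fork a shadow speculating on the alternative order, unless
            # the redundancy budget k is already exhausted.
            if other not in self.shadows and len(self.shadows) < self.k:
                self.shadows[other] = ("fork", other)

        def on_resolution(self, other, adopt):
            # When the conflict resolves: adopt the shadow's schedule if
            # the speculation paid off, otherwise discard it.
            shadow = self.shadows.pop(other, None)
            return shadow if adopt else None

    t = SpeculativeTxn("T1", k=2)
    for other in ["T2", "T3", "T4"]:   # third conflict exceeds the budget
        t.on_conflict(other)
    ```

    With k unbounded this degenerates to SCC-CB's one-shadow-per-conflict policy; shrinking k trades timeliness for resources, which is the dissertation's central dial.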

PhD Thesis: Real-Time Scheduling for Multimedia Services Using Network Delay Estimation

    John Gibbon, PhD 1996

    A multimedia system combines audio, video, graphics, and text into one presentation. Each of these multimedia datatypes has distinct temporal characteristics. For example, video has a specific number of frames that must be displayed per second. There are also temporal relationships that exist between the media. In a movie application, the audio and video streams must be synchronized to achieve a lip-syncing effect. In our system, we manage the set of temporal requirements through the scheduling of the communication channel; multimedia data is retrieved across the network at the appropriate times so that the temporal presentation requirements are met. This real-time scheduling forms the basis for the limited a priori (LAP) scheduler. The scheduler assumes that it knows enough about the system a priori to schedule the next period or limited portion of the presentation. By considering only one period at a time, the scheduler can adapt to dynamic user input or changing communication channel characteristics. A network delay model and retrieval delay estimation are used by the LAP scheduler when scheduling objects so that they arrive before their playout deadlines. This modeling and estimation also allow the LAP scheduler to decide when changes in the communication channel performance require adjustments to the retrieval schedule. Furthermore, they enable the LAP scheduler to lower the resource requirements of a multimedia presentation when there is less than sufficient network bandwidth or buffer space for normal playout. The characteristics of the LAP scheduler are first described by analyzing the delay estimation techniques. Properties of the LAP scheduler are further investigated using performance results from an FDDI network simulation and from an implementation of the LAP scheduler between two Unix workstations interconnected by an Ethernet network. The LAP scheduler was found to satisfy the proposed objectives for multimedia data retrieval. However, its performance is hindered by the difficulty of predicting network traffic patterns, by the normal approximations in the estimation process, and by the lack of scheduling for resources other than the communication channel.
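    The delay-estimation and scheduling loop described above might be sketched as follows. The EWMA smoothing factor and safety slack are illustrative assumptions, not the thesis's estimators.

    ```python
    class LAPScheduler:
        """Limited a priori scheduling in miniature: maintain a smoothed
        retrieval-delay estimate and start each object's retrieval early
        enough that it should arrive before its playout deadline."""

        def __init__(self, alpha=0.25, slack=0.05):
            self.alpha = alpha       # EWMA weight for new delay samples
            self.slack = slack       # safety margin (seconds)
            self.delay_estimate = None

        def record_delay(self, observed):
            # Update the delay model from each completed retrieval.
            if self.delay_estimate is None:
                self.delay_estimate = observed
            else:
                self.delay_estimate = ((1 - self.alpha) * self.delay_estimate
                                       + self.alpha * observed)

        def retrieval_time(self, playout_deadline):
            # Issue the request this long before the playout deadline.
            return playout_deadline - self.delay_estimate - self.slack

    lap = LAPScheduler()
    for d in [0.10, 0.14, 0.12]:     # observed retrieval delays (seconds)
        lap.record_delay(d)
    ```

    Because the estimate is refreshed every period, a sustained rise in observed delays automatically pushes retrieval start times earlier, which is how the scheduler adapts to changing channel conditions one period at a time.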