Collectively, my work involves a number of projects, both past and
present, including:
Composite/MPD
This work focuses on the design of a safe and predictable
component-based system, that adapts to changes in resource availability
and usage patterns. Using "mutable protection domains" (MPDs) we are
able to leverage techniques (currently based on hardware
protection) to adapt the isolation boundaries between component
services. Each component in the system represents a logical function
unit with a well defined interface that is mapped to a specific address
space, which may be partitioned from, or shared with, other components
depending on legitimate isolation constraints. Adaptations are made
dynamically to the degree
of isolation between components according to end-to-end
execution/communication costs. When there is a resource surplus,
isolation between components is increased where it is most desirable,
and when there is a resource deficit the isolation is reduced where it
is most tolerable. Certain critical components, such as those relating
to the core functionality of the system have immutable protection
domains, thereby remaining isolated from other less critical components
at all times. This ensures that, in the case of a component failure
(e.g., memory protection violation, component deadlock) the scope of
impact of such a failure is limited, and the integrity of the base
kernel is guaranteed at all times.
The concept of MPDs allows for a system to converge upon the natural
separation of components that provides maximal levels of isolation
while ensuring predictability. This increases the dependability
of the system, without requiring the system designer to manually
partition services into separate address spaces, in order to balance
isolation and timing requirements.
Quest
Kernel-level design to support application-specific real-time services.
More to follow.
Hijack
Using principles from the work on ULS (see below), along with
interposition techniques, Hijack provides a framework to intercept
system calls and interrupts so they may be passed to user-space
execution environments, where services may be deployed and executed
outside the kernel. Such execution environments allow for the provision
of application-specific services that may over-ride those of the
underlying system.
User-level Sandboxing
This work is motivated by the desire to allow applications to customize
and extend operating systems for their specific needs. The "User-Level
Sandboxing" project is concerned with the development of safe and
efficient methods for user-level extensibility of commercial
off-the-shelf (COTS) systems. As with micro-kernels,
user-level services
and extensions can be deployed without polluting the kernel
address space with potentially unsafe application-specific code.
User-level sandboxing provides a clean separation of core system
functionality from higher-level abstractions. Likewise, user-level code
can leverage libraries and system calls, it can be rapidly prototyped
without causing system failure, and it can be written without knowledge
of kernel internals.
As part of this work, we have implemented a number of sandboxed
services on a Linux x86 platform, that include ptrace interposition
agents, PID-controllers for adaptively managing CPU allocations to
threads, and a zero-copy user-level networking stack. Such services may
be executed without the traditional costs of scheduling and
context-switching between process-level protection domains. No special
hardware support such as segmentation or tagged translation look-aside
buffers (TLBs) is required. Instead, our technique requires only
paged-based virtual memory support, and the requirement that sandboxed
extensions are written either in a type-safe language, or by a trusted
source. Type-safe language support is needed in situations where an
untrusted application developer wishes to deploy an extension that may
execute in another process's address space, and we wish to prevent that
extension from directly accessing another process's private memory area.
Using a fast method of upcalls, we show how our sandboxing technique
for implementing logical protection domains provides significant
performance improvements over traditional methods of invoking
user-level services. Experimental results show our approach to be an
efficient method for supporting application-specific extensions, with
inter-protection domain communication costs close to those of
hardware-based solutions leveraging segmentation.
The flexibility of our page-based mechanism means that we could simply
support extensions in kernel space, that are directly accessed from
"external" kernel control paths. In fact, a question we've often been
asked is, "if we use a type-safe language for application-specific
extensions why then don't we simply allow those extensions to execute
directly in the kernel?". Given the flexibiility of our approach, we
are considering this as one of several possible configurable safety
approaches. However, there are issues with this approach given that we
wish to extend COTS systems, whose kernels may well be written in
unsafe languages such as C. Most non-trivial extensions will require
interaction with the core kernel. Any extension linked with the kernel
address space that makes references to external symbols could
potentially lead to side-effects that corrupt memory in the kernel. In
contrast, any user-level extension that accesses the kernel through
system calls can only corrupt user-level memory via "unsafe" arguments
and return values to those system calls. While it is possible, and
sometimes desired, to perform run-time checks on all references to
symbols external to an extension, this would be difficult if the
run-time checks had to be inserted into a pre-existing kernel.
Moreover, the interfaces to kernel symbols are not yet as well-defined
as the standard of the system call interface. For evidence of this, one
only has to look at the frequent changes to the exported ksyms
available with Linux for kernel module developers. In summary, there is
strong motivation in support of type-safe extensions at user-level,
especially as only the extensions need to be type-safe with our
approach, and the legacy kernel can be written in an unsafe language.
In this manner, it is easier to perform run-time checks on arguments
to, and return values from, standard system calls (e.g., via a trusted
libc interface) than to modify the underlying kernel for the same end
result. For those who wish to extend systems without such protection
features we are contemplating providing a range of safety options, from
support for totally "untrusted" extensions in kernel space to type-safe
extensions in user-space, that may execute in the context of any
process.
Osmosis: a Scalable System for Stream Processing
The goal of this work is to build a scalable and distributed system,
called Osmosis,
that supports the publication, subscription, and on-line processing of
real-time data streams. Osmosis is primarily targeted at the delivery
and
synthesis of real-time data streams generated by one or more publishers
and transported to potentially thousands of subscribers. A logical
overlay network connecting various end-systems is used to transport
data that may be processed at various intermediate hosts along the path
from source to destination. The aim is to manage resources such as CPU
and network bandwidth in a distributed manner (without any central
control), so as to maximize their usage. Current developments on this
project include the implementation of a Java middleware system for
dynamically-configuring a collection of end-systems into a (logical)
k-ary n-cube overlay topology, that supports QoS-constrained routing of
data streams between known publishers and subscribers.
SafeX: Safe Kernel Extensions
General-purpose operating systems are ill-equipped to meet the quality
of service (QoS) requirements of complex real-time applications.
Consequently,
many classes of real-time applications have either been carefully
developed
to compensate for inadequate system support, or they have been
developed
to run on special purpose systems. SafeX is a safe extension
architecture
for general purpose systems, to allow applications to customize the
behavior
of the system for their individual needs.
SafeX is currently being developed for Linux. It supports the
dynamic-linking
of code into the address space of the kernel, to affect service
management
decisions. Using this approach, we have implemented a simple CPU
service
manager, that adapts the scheduling of various real-time tasks, to
ensure
their service constraints are met even when there are run-time changes
in resource demands. Extensions are written in a type-safe language, to
monitor and adapt resource usage on behalf of specific applications. By
embedding code inside the kernel,
finer-grained management of system resources can be achieved.
Experimental
results show that safe kernel extensions can lead to fewer service
violations
(and, hence, better qualities of service) for real-time tasks, compared
to user-level methods that monitor and adapt system resources.
Dionisys: End-System QoS Support for Distributed Applications
Many complex distributed applications require quality of service
guarantees
on the end-to-end transfer of information across a distributed system.
A major problem faced by any system, or infrastructure, providing QoS
guarantees
to such applications is that resource requirements and availability may
change at run-time. Consequently, adaptive resource (and, hence,
service)
management mechanisms are required to guarantee quality of service to
these
applications. This research focuses on different methods of adaptive
quality
of service management, implemented with the event-based mechanisms
offered
by the Dionisys quality of service infrastructure.
Dionisys allows applications to influence: (1) how service
should
be adapted to maintain required quality, (2) when such
adaptations
should occur, and (3) where these adaptations should be
performed.
In Dionisys, service managers execute application-specific functions to
monitor and adapt service (and, hence, resource usage and allocation),
in order to meet the quality of service requirements of adaptable
applications.
This approach allows service managers to provide service in a manner
specific
to the needs of individual applications. Moreover, applications can
monitor
and identify resource bottlenecks, adapt their requirements for
heavily-demanded
resources, or adapt to different requirements of alternative resources,
in order to improve or maintain their overall quality of service.
Likewise,
service managers can cooperate with other service managers and, by
using
knowledge of application-specific resource requirements and adaptation
capabilities, each service manager can make better decisions about
resource
allocation. The research contributions of this work involve a
comparison
of several adaptation strategies implemented using the event-based
mechanisms
in Dionisys, thereby demonstrating the flexibility of the Dionisys
quality
of service infrastructure.
This work shows that for certain applications, `out-of-band'
adaptations
are inadequate, thereby necessitating the use of `in-band' adaptations.
In-band adaptations occur as application-level data is being
processed
and/or transferred, along the logical path of resources leading to the
destination. By contrast, typical out-of-band adaptations are
not
enacted until subsequent data is transferred along the same logical
path.
Furthermore, part of this work shows that by coordinating service
adaptations
between a number of service managers, applications can achieve high
qualities
of service in an efficient manner. Dionisys has been prototyped in
user-space
on a series of Solaris workstations. Current efforts are focused on the
implementation of Dionisys within the Linux kernel, to support
scalable,
real-time applications with dynamically varying QoS requirements.
Window-Constrained Scheduling
Dynamic Window-Constrained Scheduling
(DWCS): Originally, DWCS was designed to be a network packet
scheduler, that
limited
the number of late or lost packets over finite numbers of consecutive
packets
in loss-tolerant and/or delay-constrained, heterogeneous traffic
streams.
For example, many multimedia applications (including video-on-demand
and
streamed audio) can tolerate a certain fraction of late or lost
information,
without any noticeable degradation in the quality of playback at the
receiver.
However, it is important that such applications do not suffer losses at
certain points in time, even though they can tolerate a certain
fraction
of information being lost. For most loss-tolerant applications, there
is
usually a restriction on the number of consecutive packet losses that
are
acceptable. For example, losing a series of consecutive packets from an
audio stream might result in the loss of a complete section of audio,
rather
than merely a reduction in the signal-to-noise ratio. A suitable
performance
measure in this case is a windowed loss-rate, i.e. loss-rate
constrained
over a finite range, or window, of consecutive packets. More precisely,
an application might tolerate x packet losses for every y
arrivals at the various service points across a network. Any service
discipline
attempting to meet these requirements must ensure that the number of
violations
to the loss-tolerance specification is minimized (if not zero) across
the
whole stream.
As a packet scheduler, DWCS attempts to meet windowed loss-rates for
multiple packet streams, each with their own performance objectives.
More
recently, DWCS has been applied to scheduling processes (and threads)
for
execution on available CPUs of a host machine. In its current
incarnation
as a CPU scheduler, DWCS attempts to guarantee that no more than x deadlines
are missed every y deadlines, for a given process (or thread).
To
date, DWCS has been prototyped as a Solaris-based packet scheduler for
an MPEG-1 video server running over ATM and Ethernet. It has also been
implemented as a CPU scheduler in the Linux kernel. A working version
for Linux can be downloaded
from
the
Linux DWCS website.
The Linux DWCS CPU scheduler works at the granularity of one clock
interrupt,
or `jiffy'. Future developments may focus on an implementation with
finer
real-time granularity, possibly using UTIME from Kansas University, in
a manner similar to the `one-shot' timing mode in the Rialto operating
system. A Linux-based DWCS packet scheduler is also being developed, so
that DWCS can feature in both CPU and network service managers, as part
of Linux Dionisys. DWCS has several interesting characteristics,
including
its ability to operate as a fair queueing algorithm. In many cases, it
is possible to support 100% load and still guarantee that no more than
x deadlines are missed every window of y consecutive
deadlines.
However, there is still analysis work that needs to be done, to
determine
exactly how variants of DWCS allocate resources in both under- and
over-load
situations.
Virtual Deadline Scheduling (VDS):
While DWCS is capable of guaranteeing a feasible window-constrained
schedule that demands 100% utilization, there are restrictions on the
constraints that make this possible. Specifically, in the general case,
DWCS requires all jobs (e.g., packets or tasks) to have the same
request periods and, hence, the same relative deadlines to achieve a
feasible schedule under full load conditions. While it is possible to
translate arbitrary window-constraints to meet this requirement, albeit
over a larger window of real-time, we devised an alternative algorithm
called Virtual Deadline Scheduling (VDS). VDS prioritizes each job
based on a function of its request period and window-constraint,
thereby deriving a virtual deadline (with respect to real-time).
The job with the lowest virtual deadline is given precedence. In
contrast, DWCS separately considers each job's deadline and
window-constraint when making scheduling decisions. This fact renders
DWCS unable to achieve the same level of service as VDS for
window-constrained jobs. It should be noted that the derivation of
virtual deadlines using VDS is based on the need to provide
proportional fair service within a given real-time window, for a number
of outstanding instances of a given job that still require service.
Extensions to our work on window-constrained scheduling include
looking at multi-hop service guarantees. We have already investigated
the use of DWCS and VDS along a pipeline of servers for the end-to-end
scheduling of packets in media streams. Further work in this area
involves the statistical analysis of end-to-end service guarantees in
situations when packet arrivals at a server (e.g., end-host or network
switch/router) have some statistical distribution, rather than a
deterministic pattern. Composition of per-hop/server service
requirements to enforce end-to-end service guarantees is also an open
area of study.
S-DSO: Semantic-Distributed Shared Object Run-time Support
Gigabit network technologies have made it possible to combine
workstations
into a distributed, massively-parallel computer system. Middleware,
such
as distributed shared objects (DSO), attempts to improve
programmability
of such systems, by providing globally accessible `object'
abstractions.
S-DSO supports application-specific consistency protocols for
multimedia
and groupware applications. This differs from work by other
researchers,
who have focused on consistency protocols for scientific applications
with
replicated `memory' objects. S-DSO addresses the state sharing needs of
complex distributed applications with: (1) high-frequency symmetric
accesses
to shared objects, (2) poor and unpredictable locality of accesses, (3)
dynamically changing sharing behavior, and (4) potential data races.
S-DSO enables a system exploiting application-level temporal and
spatial
constraints on shared objects to outperform shared object protocols
that
do not exploit application-level constraints. Specifically, high levels
of concurrency and scalability for complex distributed applications may
be attained if programmers can exploit the specific semantics of these
applications when implementing and using shared state abstractions. For
such shared abstractions, application-level semantics are used for
deciding
when
and who should be informed of updates to them. In particular,
this
work focuses on the notions of 'temporal' and 'spatial' consistency,
which
jointly capture a wide range of consistency knowledge and constraints
about
shared state in complex distributed programs. As part of prior research
at Georgia Tech, I designed and built an S-DSO system, along with
several
consistency protocols tailored to the requirements of a distributed
video
game, exhibiting dynamically changing requirements regarding shared
state
information. The actual code and sample video game are available for download.
In this research, it was shown that an S-DSO system offers improved
programmability
for applications using it, without sacrificing performance, compared to
equivalent applications programmed with explicit message passing.