Further details can be found in my Research Statement.
Collectively, my work involves a number of projects, both past and present, including:
QuestA standalone system for x86 architectures (so far) that is designed around three main goals: safety, predictability and efficiency. Quest is an SMP system that works on multicore and multiprocessor platforms, with support for bandwidth-preserving servers acting as "virtual CPUs" (VCPUs). The system features a novel scheduling hierarchy using both Main and IO VCPUs, to ensure temporal isolation between system events (e.g., interrupts) and conventional tasks. Current efforts are focused on VCPU scheduling on physical CPUs (PCPUs, including hyperthreads, CMP cores, or conventional processors), while considering micro-architectural resource contention. In particular, we are investigating methods to address shared cache resource management and memory bus bandwidth usage in the placement of VCPUs on PCPUs, by considering hardware performance counters as part of an integral performance monitoring subsystem. Some of our work on shared cache resource management has been conducted in collaboration with VMware.
Quest-VQuest-V extends Quest by using hardware virtualization capabilities to implement a separation kernel for multicore processors. This leads to a distributed system on a chip, with subsets of machine physical resources (processor cores, memory, and I/O devices) being confined to separate sandboxes that operate like individual hosts. Each sandbox runs its own operating system, or library services, to manage an assigned subset of machine resources without involvement of a hypervisor. Instead, each sandbox performs local CPU scheduling, interrupt handling and all other device management directly on its assigned hardware. Virtualization features are only used to provide logical isolation between sandboxes, and impose almost no overheads on normal system operation. Each sandbox communicates with other sandboxes using well-defined shared memory channels supporting real-time communication protocols. Asynchronous (non-blocking) techniques such as Simpson's four-slot protocol, or semi-asynchronous (buffered) communication channels can be configured as desired.
Currently, we have support to run a Linux sandbox with its own RAM-based filesystem in the presence of other Quest-native real-time sandboxes. This setup is useful for mixed-criticality systems, requiring legacy non-real-time applications and services to co-exist with real-time and safety-critical services. For example, in an automotive system, an infotainment service might be designed to run in a Linux sandbox, leveraging a pre-existing graphical interface, while more critical engine and chassis maangement tasks are confined to Quest native sandboxes. The operation of each sandbox is such that failure of one cannot compromise the behavior of another.
QduinoQduino is a programming environment built on Quest, and targeted at an emerging class of single board computers with GPIOs and more powerful features than the original Arduino (Atmel Mega AVR) platforms. The aim of Qduino is to build upon the ease of programming that has made the Arduino computing platform so successful, while providing support for real-time threads and predictable I/O. Qduino hides the complexity of managing threads and provides an extension to the Arduino loop() construct, so that multiple such time-predictable loops can be specified.
Composite/MPDThis work focuses on the design of a safe and predictable component-based system, that adapts to changes in resource availability and usage patterns. Using "mutable protection domains" (MPDs) we are able to leverage techniques (currently based on hardware protection) to adapt the isolation boundaries between component services. Each component in the system represents a logical function unit with a well defined interface that is mapped to a specific address space, which may be partitioned from, or shared with, other components depending on legitimate isolation constraints. Adaptations are made dynamically to the degree of isolation between components according to end-to-end execution/communication costs. When there is a resource surplus, isolation between components is increased where it is most desirable, and when there is a resource deficit the isolation is reduced where it is most tolerable. Certain critical components, such as those relating to the core functionality of the system have immutable protection domains, thereby remaining isolated from other less critical components at all times. This ensures that, in the case of a component failure (e.g., memory protection violation, component deadlock) the scope of impact of such a failure is limited, and the integrity of the base kernel is guaranteed at all times.
The concept of MPDs allows for a system to converge upon the natural separation of components that provides maximal levels of isolation while ensuring predictability. This increases the dependability of the system, without requiring the system designer to manually partition services into separate address spaces, in order to balance isolation and timing requirements.
HijackUsing principles from the work on ULS (see below), along with interposition techniques, Hijack provides a framework to intercept system calls and interrupts so they may be passed to user-space execution environments, where services may be deployed and executed outside the kernel. Such execution environments allow for the provision of application-specific services that may over-ride those of the underlying system.
This work is motivated by the desire to allow applications to
customize and extend operating systems for their specific needs.
The "User-Level Sandboxing" project is concerned with the
development of safe and efficient methods for user-level
extensibility of commercial off-the-shelf (COTS) systems.
As with micro-kernels, user-level services and
extensions can be deployed without polluting the kernel
address space with potentially unsafe application-specific code.
User-level sandboxing provides a clean separation of core system
functionality from higher-level abstractions. Likewise,
user-level code can leverage libraries and system calls, it can
be rapidly prototyped without causing system failure, and it can
be written without knowledge of kernel internals.
As part of this work, we have implemented a number of sandboxed services on a Linux x86 platform, that include ptrace interposition agents, PID-controllers for adaptively managing CPU allocations to threads, and a zero-copy user-level networking stack. Such services may be executed without the traditional costs of scheduling and context-switching between process-level protection domains. No special hardware support such as segmentation or tagged translation look-aside buffers (TLBs) is required. Instead, our technique requires only paged-based virtual memory support, and the requirement that sandboxed extensions are written either in a type-safe language, or by a trusted source. Type-safe language support is needed in situations where an untrusted application developer wishes to deploy an extension that may execute in another process's address space, and we wish to prevent that extension from directly accessing another process's private memory area.
Using a fast method of upcalls, we show how our sandboxing technique for implementing logical protection domains provides significant performance improvements over traditional methods of invoking user-level services. Experimental results show our approach to be an efficient method for supporting application-specific extensions, with inter-protection domain communication costs close to those of hardware-based solutions leveraging segmentation.
The flexibility of our page-based mechanism means that we could simply support extensions in kernel space, that are directly accessed from "external" kernel control paths. In fact, a question we've often been asked is, "if we use a type-safe language for application-specific extensions why then don't we simply allow those extensions to execute directly in the kernel?". Given the flexibiility of our approach, we are considering this as one of several possible configurable safety approaches. However, there are issues with this approach given that we wish to extend COTS systems, whose kernels may well be written in unsafe languages such as C. Most non-trivial extensions will require interaction with the core kernel. Any extension linked with the kernel address space that makes references to external symbols could potentially lead to side-effects that corrupt memory in the kernel. In contrast, any user-level extension that accesses the kernel through system calls can only corrupt user-level memory via "unsafe" arguments and return values to those system calls. While it is possible, and sometimes desired, to perform run-time checks on all references to symbols external to an extension, this would be difficult if the run-time checks had to be inserted into a pre-existing kernel. Moreover, the interfaces to kernel symbols are not yet as well-defined as the standard of the system call interface. For evidence of this, one only has to look at the frequent changes to the exported ksyms available with Linux for kernel module developers. In summary, there is strong motivation in support of type-safe extensions at user-level, especially as only the extensions need to be type-safe with our approach, and the legacy kernel can be written in an unsafe language. In this manner, it is easier to perform run-time checks on arguments to, and return values from, standard system calls (e.g., via a trusted libc interface) than to modify the underlying kernel for the same end result. For those who wish to extend systems without such protection features we are contemplating providing a range of safety options, from support for totally "untrusted" extensions in kernel space to type-safe extensions in user-space, that may execute in the context of any process.
Osmosis: a Scalable System for Stream Processing
The goal of this work is to build a scalable and distributed
system, called Osmosis, that supports the publication,
subscription, and on-line processing of real-time data streams.
Osmosis is primarily targeted at the delivery and synthesis of
real-time data streams generated by one or more publishers and
transported to potentially thousands of subscribers. A logical
overlay network connecting various end-systems is used to
transport data that may be processed at various intermediate
hosts along the path from source to destination. The aim is to
manage resources such as CPU and network bandwidth in a
distributed manner (without any central control), so as to
maximize their usage. Current developments on this project
include the implementation of a Java middleware system for
dynamically-configuring a collection of end-systems into a
(logical) k-ary n-cube overlay topology, that supports
QoS-constrained routing of data streams between known publishers
SafeX: Safe Kernel ExtensionsGeneral-purpose operating systems are ill-equipped to meet the quality of service (QoS) requirements of complex real-time applications. Consequently, many classes of real-time applications have either been carefully developed to compensate for inadequate system support, or they have been developed to run on special purpose systems. SafeX is a safe extension architecture for general purpose systems, to allow applications to customize the behavior of the system for their individual needs.
SafeX is currently being developed for Linux. It supports the
dynamic-linking of code into the address space of the kernel,
to affect service management decisions. Using this approach,
we have implemented a simple CPU service manager, that adapts
the scheduling of various real-time tasks, to ensure their
service constraints are met even when there are run-time
changes in resource demands. Extensions are written in a
type-safe language, to monitor and adapt resource usage on
behalf of specific applications. By embedding code inside the
finer-grained management of system resources can be achieved. Experimental results show that safe kernel extensions can lead to fewer service violations (and, hence, better qualities of service) for real-time tasks, compared to user-level methods that monitor and adapt system resources.
Dionisys: End-System QoS Support for Distributed ApplicationsMany complex distributed applications require quality of service guarantees on the end-to-end transfer of information across a distributed system. A major problem faced by any system, or infrastructure, providing QoS guarantees to such applications is that resource requirements and availability may change at run-time. Consequently, adaptive resource (and, hence, service) management mechanisms are required to guarantee quality of service to these applications. This research focuses on different methods of adaptive quality of service management, implemented with the event-based mechanisms offered by the Dionisys quality of service infrastructure.
Dionisys allows applications to influence: (1) how service should be adapted to maintain required quality, (2) when such adaptations should occur, and (3) where these adaptations should be performed. In Dionisys, service managers execute application-specific functions to monitor and adapt service (and, hence, resource usage and allocation), in order to meet the quality of service requirements of adaptable applications. This approach allows service managers to provide service in a manner specific to the needs of individual applications. Moreover, applications can monitor and identify resource bottlenecks, adapt their requirements for heavily-demanded resources, or adapt to different requirements of alternative resources, in order to improve or maintain their overall quality of service. Likewise, service managers can cooperate with other service managers and, by using knowledge of application-specific resource requirements and adaptation capabilities, each service manager can make better decisions about resource allocation. The research contributions of this work involve a comparison of several adaptation strategies implemented using the event-based mechanisms in Dionisys, thereby demonstrating the flexibility of the Dionisys quality of service infrastructure.
This work shows that for certain applications, `out-of-band' adaptations are inadequate, thereby necessitating the use of `in-band' adaptations. In-band adaptations occur as application-level data is being processed and/or transferred, along the logical path of resources leading to the destination. By contrast, typical out-of-band adaptations are not enacted until subsequent data is transferred along the same logical path. Furthermore, part of this work shows that by coordinating service adaptations between a number of service managers, applications can achieve high qualities of service in an efficient manner. Dionisys has been prototyped in user-space on a series of Solaris workstations. Current efforts are focused on the implementation of Dionisys within the Linux kernel, to support scalable, real-time applications with dynamically varying QoS requirements.
Scheduling and Resource Management
CAFÉ - Cache-Aware Fair and Efficient Scheduling: Modern chip-level multiprocessors (CMPs) typically contain multiple processor cores sharing a common last-level cache, memory interconnects, and other hardware resources. Workloads running on separate cores compete for these resources, often resulting in highly-variable performance. Unfortunately, commodity processors manage shared hardware resources in a manner that is opaque to higher-level schedulers responsible for multiplexing these resources across workloads with varying demands and importance. As a result, it is extremely challenging to optimize for efficient resource utilization or enforce quality-of-service policies. Effective cache management requires accurate measurement of per-thread cache occupancies and their impact on performance, often summarized by utility functions such as miss-ratio curves (MRCs). This work investigates efficient online techniques for generating MRCs and other cache utility curves, requiring only performance counters available on commodity processors. Building on these monitoring and inference techniques, the work also introduces novel methods to improve the fairness and efficiency of CMP scheduling decisions. Vtime compensation adjusts a thread's scheduling priority to account for cache and memory system interference from co-runners, and cache divvying estimates the performance impact of co-runner placements.Dynamic Window-Constrained Scheduling (DWCS): Originally, DWCS was designed to be a network packet scheduler, that limited the number of late or lost packets over finite numbers of consecutive packets in loss-tolerant and/or delay-constrained, heterogeneous traffic streams. For example, many multimedia applications (including video-on-demand and streamed audio) can tolerate a certain fraction of late or lost information, without any noticeable degradation in the quality of playback at the receiver. However, it is important that such applications do not suffer losses at certain points in time, even though they can tolerate a certain fraction of information being lost. For most loss-tolerant applications, there is usually a restriction on the number of consecutive packet losses that are acceptable. For example, losing a series of consecutive packets from an audio stream might result in the loss of a complete section of audio, rather than merely a reduction in the signal-to-noise ratio. A suitable performance measure in this case is a windowed loss-rate, i.e. loss-rate constrained over a finite range, or window, of consecutive packets. More precisely, an application might tolerate x packet losses for every y arrivals at the various service points across a network. Any service discipline attempting to meet these requirements must ensure that the number of violations to the loss-tolerance specification is minimized (if not zero) across the whole stream.
As a packet scheduler, DWCS attempts to meet windowed loss-rates for multiple packet streams, each with their own performance objectives. More recently, DWCS has been applied to scheduling processes (and threads) for execution on available CPUs of a host machine. In its current incarnation as a CPU scheduler, DWCS attempts to guarantee that no more than x deadlines are missed every y deadlines, for a given process (or thread). To date, DWCS has been prototyped as a Solaris-based packet scheduler for an MPEG-1 video server running over ATM and Ethernet. It has also been implemented as a CPU scheduler in the Linux kernel. A working version for Linux can be downloaded from the Linux DWCS website.
The Linux DWCS CPU scheduler works at the granularity of one
clock interrupt, or `jiffy'. Future developments may focus on
an implementation with finer real-time granularity, possibly
using UTIME from Kansas University, in a manner similar to the
`one-shot' timing mode in the Rialto operating system. A
Linux-based DWCS packet scheduler is also being developed, so
that DWCS can feature in both CPU and network service
managers, as part of Linux Dionisys. DWCS has several
interesting characteristics, including its ability to operate
as a fair queueing algorithm. In many cases, it is possible to
support 100% load and still guarantee that no more than x
deadlines are missed every window of y consecutive
deadlines. However, there is still analysis work that needs to
be done, to determine exactly how variants of DWCS allocate
resources in both under- and over-load situations.
Virtual Deadline Scheduling
(VDS): While DWCS is capable of guaranteeing a
feasible window-constrained schedule that demands 100%
utilization, there are restrictions on the constraints that
make this possible. Specifically, in the general case, DWCS
requires all jobs (e.g., packets or tasks) to have the same
request periods and, hence, the same relative deadlines to
achieve a feasible schedule under full load conditions. While
it is possible to translate arbitrary window-constraints to
meet this requirement, albeit over a larger window of
real-time, we devised an alternative algorithm called Virtual
Deadline Scheduling (VDS). VDS prioritizes each job based on a
function of its request period and window-constraint, thereby
deriving a virtual deadline (with respect to real-time).
The job with the lowest virtual deadline is given precedence.
In contrast, DWCS separately considers each job's deadline and
window-constraint when making scheduling decisions. This fact
renders DWCS unable to achieve the same level of service as
VDS for window-constrained jobs. It should be noted that the
derivation of virtual deadlines using VDS is based on the need
to provide proportional fair service within a given real-time
window, for a number of outstanding instances of a given job
that still require service.
Extensions to our work on window-constrained scheduling include looking at multi-hop service guarantees. We have already investigated the use of DWCS and VDS along a pipeline of servers for the end-to-end scheduling of packets in media streams. Further work in this area involves the statistical analysis of end-to-end service guarantees in situations when packet arrivals at a server (e.g., end-host or network switch/router) have some statistical distribution, rather than a deterministic pattern. Composition of per-hop/server service requirements to enforce end-to-end service guarantees is also an open area of study.
S-DSO: Semantic-Distributed Shared Object Run-time SupportGigabit network technologies have made it possible to combine workstations into a distributed, massively-parallel computer system. Middleware, such as distributed shared objects (DSO), attempts to improve programmability of such systems, by providing globally accessible `object' abstractions. S-DSO supports application-specific consistency protocols for multimedia and groupware applications. This differs from work by other researchers, who have focused on consistency protocols for scientific applications with replicated `memory' objects. S-DSO addresses the state sharing needs of complex distributed applications with: (1) high-frequency symmetric accesses to shared objects, (2) poor and unpredictable locality of accesses, (3) dynamically changing sharing behavior, and (4) potential data races.
S-DSO enables a system exploiting application-level temporal
and spatial constraints on shared objects to outperform shared
object protocols that do not exploit application-level
constraints. Specifically, high levels of concurrency and
scalability for complex distributed applications may be
attained if programmers can exploit the specific semantics of
these applications when implementing and using shared state
abstractions. For such shared abstractions, application-level
semantics are used for deciding when and who
should be informed of updates to them. In particular, this
work focuses on the notions of 'temporal' and 'spatial'
consistency, which jointly capture a wide range of consistency
knowledge and constraints about shared state in complex
distributed programs. As part of prior research at Georgia
Tech, I designed and built an S-DSO system, along with several
consistency protocols tailored to the requirements of a
distributed video game, exhibiting dynamically changing
requirements regarding shared state information. The actual
code and sample video game are available for download.
In this research, it was shown that an S-DSO system offers
improved programmability for applications using it, without
sacrificing performance, compared to equivalent applications
programmed with explicit message passing.