Type-Aware Programming Models for Distributed Applications
Modern applications are distributed: from the simplest interactive web applications to social networks backed by massive datacenters around the world. Even simple distributed applications depend on a complex ecosystem of servers, databases, and caches to operate. To scale services and handle turbulent internet traffic, developers of distributed applications constantly balance fundamental tradeoffs: parallelism against locality, replication against synchronization, consistency against availability. This task is made more difficult by the fact that each component operates independently by design, knowing little about the original intent of the application or its specific performance needs. Layers of abstraction between the application and its data prevent the system from adapting itself to better meet the application's requirements.

Distributed application developers need interfaces that communicate the structure and semantics of their programs to distributed systems that know how to use that information to optimize performance. Programmers should be able to improve data layout without completely re-architecting the system, and to tell the system which data accesses can tolerate reduced accuracy and which are highest priority. The system should be able to find and exploit concurrency, leveraging weaker constraints to improve performance. Programmers should be protected from common mistakes, such as consistency bugs, by the languages and platforms they use.

This dissertation explores new programming models that use type systems and abstract data types to communicate application semantics to distributed systems. The new interfaces place minimal burden on programmers by using the abstract behavior of existing data structures to naturally express high-level properties. New runtime techniques and optimizations are proposed to correspond with each additional piece of information passed down to the underlying system.
These techniques leverage concurrency both in massively data-parallel analytics workloads and in web-service workloads with abundant inter-request parallelism. First, we propose a way to automatically move computation closer to data, statically analyzing remote data accesses and improving locality through compiler-assisted lightweight thread migrations. Next, we present the design of global shared data structures that enable threads to cooperate rather than contend for access, using distributed combining. Then we explore ways of exposing concurrency between transactions in distributed datastores using abstract properties of the datatypes, such as commutativity. Finally, we introduce a programming model, IPA, that makes it safer to trade off consistency for performance. Explicit performance and correctness constraints allow the system to adapt to changing conditions by relaxing the consistency of some operations, secure in the knowledge that the type system will enforce safety by requiring the developer to consider the effects of weak operations.

Together, the programming models and techniques in this work expand the toolkit available to distributed application developers, making their lives easier and their software more robust.
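To illustrate why commutativity exposes concurrency between transactions, consider a replicated counter whose increment operations commute: replicas that apply the same set of increments in different orders still converge, so the datastore need not serialize those operations. The following is a minimal sketch of this idea; the names (`Counter`, `apply_ops`) are illustrative and not taken from the dissertation's actual systems.

```python
# Minimal sketch: increments on a counter commute, so two replicas
# applying the same operations in different orders converge to the
# same value, and the datastore can run them concurrently without
# coordinating an order between them.

class Counter:
    """A counter abstract data type whose increment operations commute."""

    def __init__(self):
        self.value = 0

    def increment(self, amount=1):
        self.value += amount


def apply_ops(replica, ops):
    """Apply a sequence of increment amounts to a replica."""
    for amount in ops:
        replica.increment(amount)
    return replica.value


ops = [1, 5, 2, 7]
a = apply_ops(Counter(), ops)                  # one interleaving
b = apply_ops(Counter(), list(reversed(ops)))  # a different order
assert a == b == 15  # both replicas converge
```

A non-commutative operation such as `set(value)` would break this property, which is why a system must know the abstract semantics of the datatype, not just its reads and writes, to safely reorder operations.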
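The discipline that IPA's type system imposes can be sketched as follows: strongly consistent reads return a plain value, while weakly consistent reads return a wrapped value that the caller must explicitly unwrap before using. This is a hedged sketch only; the class and method names (`Inconsistent`, `endorse`, `read_weak`) are assumptions for illustration, not IPA's actual API, and in a statically typed language the compiler, rather than a runtime wrapper, would reject direct use of the weak value.

```python
# Illustrative sketch of consistency-typed results: weak reads are
# wrapped so the developer must acknowledge possible staleness before
# using the value. Names here are assumptions, not IPA's real API.

class Inconsistent:
    """Wraps a value obtained through a weakly consistent read."""

    def __init__(self, value):
        self._value = value

    def endorse(self):
        # The developer explicitly accepts that the value may be stale.
        return self._value


class Datastore:
    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read_strong(self, key):
        # Strong read: returns a plain value, safe to use directly.
        return self._data[key]

    def read_weak(self, key):
        # Weak read: returns a wrapper that must be endorsed first.
        return Inconsistent(self._data[key])


db = Datastore()
db.write("views", 42)
assert db.read_strong("views") == 42
weak = db.read_weak("views")
assert isinstance(weak, Inconsistent)   # cannot be used as a number directly
assert weak.endorse() == 42             # explicit acknowledgment required
```

The point of the wrapper is not the runtime cost but the contract: code that forgets it is holding a weak result fails to type-check (or, in this sketch, fails at the point of use), forcing the developer to consider the effects of the weak operation.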