Techniques for Integrating Erasure Codes and Model Checkers with Distributed Systems
Authors
Michael, Ellis
Abstract
Distributed systems are vital components of our modern computing infrastructure. They must be correct, guaranteeing that important safety properties hold not just in the common case but in all possible cases. They must also be performant and efficient, providing low latency and high throughput to their users while consuming as few resources as possible. This thesis addresses these broad goals of distributed systems in two ways.

First, this thesis introduces the Amalgam protocol, which uses erasure codes to reduce the storage overhead of fault-tolerant distributed systems. Traditionally, state machine replication protocols have provided strong consistency guarantees along with the ability to execute arbitrary operations on shared state in a single, atomic transaction. However, replication comes at the cost of storage overhead. The Amalgam protocol obviates this trade-off by handling transactions on erasure-coded data with performance characteristics similar to those of a replicated baseline. Amalgam guarantees linearizability while tolerating the complete failure of a configurable number of machines. The protocol also supports the replacement of failed machines and uses a snapshot protocol to ensure that replacement servers correctly rebuild their state and restore the overall health of the system.

Second, this thesis presents the DSLabs framework. Distributed systems are notoriously difficult to implement, and because they are inherently non-deterministic, they are also notoriously difficult to test: adverse network conditions can expose bugs that would otherwise lie dormant. Explicit-state model checking allows for the systematic exploration of all possible executions of a distributed system. However, many distributed systems have infinite state spaces, and the number of states reachable within a given execution depth grows very quickly as that depth increases. DSLabs is a framework for designing and model checking distributed systems.
DSLabs enables the creation of guided searches of the state space of a distributed system, allowing a system designer to circumvent the state explosion problem by focusing on areas of the state space that are likely to be problematic. The DSLabs programming framework is designed to be used by novice programmers (in particular, university students) and provides powerful tools for model checking and visualizing distributed systems, letting students focus on the already difficult task of implementation.
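To make the ideas of explicit-state search and guided search more concrete, here is a minimal sketch in Python. It is not DSLabs's API and does not reproduce how DSLabs implements its searches. It assumes a hypothetical toy system of two processes that each increment a shared counter using a separate read step and write step. It also assumes a hypothetical `prune` predicate as one simple way to express a guided search: pruned states are still checked against the invariant, but their successors are not explored. The names `State`, `step`, `successors`, `invariant`, `search`, and `prune` are all illustrative.

```python
from collections import deque
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class State(NamedTuple):
    pc: Tuple[int, ...]     # per-process program counter: 0 = read, 1 = write, 2 = done
    local: Tuple[int, ...]  # value each process read from the shared counter
    shared: int             # the shared counter both processes increment


def step(s: State, i: int) -> Optional[State]:
    """Run one step of process i; return None if it has already finished."""
    pc, local = list(s.pc), list(s.local)
    if s.pc[i] == 0:  # read the shared counter into a local variable
        pc[i], local[i] = 1, s.shared
        return State(tuple(pc), tuple(local), s.shared)
    if s.pc[i] == 1:  # write local + 1 back; this is not atomic with the read
        pc[i] = 2
        return State(tuple(pc), tuple(local), s.local[i] + 1)
    return None


def successors(s: State) -> List[State]:
    """Every state reachable by letting any one process take a single step."""
    result = []
    for i in range(len(s.pc)):
        nxt = step(s, i)
        if nxt is not None:
            result.append(nxt)
    return result


def invariant(s: State) -> bool:
    """Safety property: once every process is done, no increment was lost."""
    return not all(p == 2 for p in s.pc) or s.shared == len(s.pc)


def search(init: State, prune: Optional[Callable[[State], bool]] = None
           ) -> Tuple[Optional[List[State]], int]:
    """Breadth-first explicit-state search for an invariant violation.

    Returns (trace to the first violating state or None, number of distinct
    states discovered). If prune(s) is true, s is still checked but its
    successors are not explored, which restricts (guides) the search.
    """
    parent: Dict[State, Optional[State]] = {init: None}
    frontier = deque([init])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            trace: List[State] = []
            cur: Optional[State] = s
            while cur is not None:  # walk parent links back to the initial state
                trace.append(cur)
                cur = parent[cur]
            return trace[::-1], len(parent)
        if prune is not None and prune(s):
            continue
        for nxt in successors(s):
            if nxt not in parent:  # explicit-state: never revisit a seen state
                parent[nxt] = s
                frontier.append(nxt)
    return None, len(parent)


init = State(pc=(0, 0), local=(0, 0), shared=0)

# Exhaustive search over all interleavings of the two processes.
trace, visited = search(init)

# Guided search: do not expand states where one process finished before the
# other started, since such sequential runs cannot lose an update.
guided_trace, guided_visited = search(init, prune=lambda s: 2 in s.pc and 0 in s.pc)

print(visited, guided_visited)
print(trace[-1] if trace else "no violation")
```

On this toy system, both searches find the lost-update interleaving in which both processes read the counter before either writes, and the guided search discovers fewer states. In general, pruning trades completeness for the ability to reach deeper states, so a guided search that finds no violation does not by itself show that the system is correct.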
Description
Thesis (Ph.D.)--University of Washington, 2023
