dc.contributor.advisor: Ceze, Luis H [en_US]
dc.contributor.author: Lucia, Brandon M. [en_US]
dc.date.accessioned: 2013-07-25T17:51:39Z
dc.date.available: 2013-07-25T17:51:39Z
dc.date.issued: 2013-07-25
dc.date.submitted: 2013 [en_US]
dc.identifier.other: Lucia_washington_0250E_11953.pdf [en_US]
dc.identifier.uri: http://hdl.handle.net/1773/23483
dc.description: Thesis (Ph.D.)--University of Washington, 2013 [en_US]
dc.description.abstract: Parallel and concurrent software is more complex than sequential code because interactions between concurrent computations and the ordering of program events can vary across executions. This nondeterministic variation is hard to understand and control, introducing the potential for concurrency bugs. This dissertation addresses two challenges related to concurrency bugs, focusing on shared-memory multi-threaded programs. First, concurrency bugs are hard to find, understand, and fix, but debugging is essential to software correctness. Second, concurrency bugs cause schedule-dependent failures that degrade system reliability. Targeting debugging, we develop two new concurrency debugging techniques based on statistical analysis and novel abstractions of inter-thread communication. These techniques isolate communications related to bugs and reconstruct failing executions. We show several hardware and software system designs that efficiently implement these techniques. Targeting the avoidance of schedule-dependent failures, we then develop two techniques for automatically avoiding schedule-dependent failures due to atomicity violations, a common concurrent program failure. We use specialized serializability analyses to identify code that should be atomic and system support to enforce atomicity. We implement these techniques with architecture and system support. Finally, we develop a mechanism for general schedule-dependent failure avoidance. We use a statistical analysis and leverage large communities of deployed systems to learn how to constrain executions to avoid previously seen failures. We show a software-only distributed system implementation that avoids real software failures with overheads low enough for production use. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.language.iso: en_US [en_US]
dc.rights: Copyright is held by the individual authors. [en_US]
dc.subject: Computer Architecture; Concurrency; Debugging; Failure-avoidance; Reliability; System Support [en_US]
dc.subject.other: Computer science [en_US]
dc.subject.other: computer science and engineering [en_US]
dc.title: System Support for Concurrent Software Reliability [en_US]
dc.type: Thesis [en_US]
dc.embargo.terms: No embargo [en_US]