Improving Fault Tolerance and Performance of Data Center Networks

dc.contributor.advisor: Anderson, Thomas E
dc.contributor.advisor: Krishnamurthy, Arvind
dc.contributor.author: Liu, Vincent
dc.date.accessioned: 2017-02-14T22:38:06Z
dc.date.available: 2017-02-14T22:38:06Z
dc.date.issued: 2017-02-14
dc.date.submitted: 2016-09
dc.description: Thesis (Ph.D.)--University of Washington, 2016-09
dc.description.abstract: Data center networks are a key component of the explosive growth of cloud computing, enabling the utilization of tens to hundreds of thousands of co-located servers for large-scale computing and services. As applications and data sets continue to grow rapidly, the challenge for data center networks is to keep pace by providing enough bandwidth while also lowering costs, increasing flexibility, and maintaining reliability. My thesis is that a key part of the answer is the network's wiring topology: topology has foundational cross-layer effects, and a small amount of intentional asymmetry in the topology can help data center networks meet that challenge. I present two complementary innovations that demonstrate this. The first, F10, is a co-design of the network topology and failover protocols to provide efficient, near-instantaneous, fine-grained, and localized recovery and rebalancing for common-case network failures. My results show that following network link and switch failures, F10 has 1/7th the packet loss of current schemes. The second innovation, Subways, proposes and evaluates a new method to add network capacity by connecting multiple network links per server in an overlapping topology. Using a simulation-based methodology, my work shows that Subways offers substantial performance benefits for popular application workloads: up to a 3.1x speedup in MapReduce and a 2.5x throughput improvement in memcache for a fixed average request latency, relative to an equivalent-bandwidth network that differs only in its wiring.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Liu_washington_0250E_16576.pdf
dc.identifier.uri: http://hdl.handle.net/1773/38103
dc.language.iso: en_US
dc.rights:
dc.subject:
dc.subject.other: Computer science
dc.subject.other: computer science and engineering
dc.title: Improving Fault Tolerance and Performance of Data Center Networks
dc.type: Thesis

Files

Original bundle

Name: Liu_washington_0250E_16576.pdf
Size: 8.16 MB
Format: Adobe Portable Document Format