Improving Fault Tolerance and Performance of Data Center Networks
| dc.contributor.advisor | Anderson, Thomas E | |
| dc.contributor.advisor | Krishnamurthy, Arvind | |
| dc.contributor.author | Liu, Vincent | |
| dc.date.accessioned | 2017-02-14T22:38:06Z | |
| dc.date.available | 2017-02-14T22:38:06Z | |
| dc.date.issued | 2017-02-14 | |
| dc.date.submitted | 2016-09 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2016-09 | |
| dc.description.abstract | Data center networks are a key component of the explosive growth of cloud computing, enabling the use of tens to hundreds of thousands of co-located servers for large-scale computing and services. As applications and data sets continue to grow rapidly, the challenge for data center networks is to keep pace by providing enough bandwidth while also lowering costs, increasing flexibility, and maintaining reliability. My thesis is that a key part of the answer is the network's wiring topology: topology has foundational cross-layer effects, and a small amount of intentional asymmetry in the topology can help data center networks meet that challenge. I present two complementary innovations that demonstrate this. The first, F10, is a co-design of the network topology and failover protocols to provide efficient, near-instantaneous, fine-grained, and localized recovery and rebalancing for common-case network failures. My results show that following network link and switch failures, F10 has one-seventh the packet loss of current schemes. The second innovation, Subways, proposes and evaluates a new method to add network capacity by connecting multiple network links per server in an overlapping topology. Using a simulation-based methodology, my work shows that Subways offers substantial performance benefits for popular application workloads: up to a 3.1x speedup in MapReduce and a 2.5x throughput improvement in memcache for a fixed average request latency, relative to an equivalent-bandwidth network that differs only in its wiring. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Liu_washington_0250E_16576.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/38103 | |
| dc.language.iso | en_US | |
| dc.rights | ||
| dc.subject | ||
| dc.subject.other | Computer science | |
| dc.subject.other | computer science and engineering | |
| dc.title | Improving Fault Tolerance and Performance of Data Center Networks | |
| dc.type | Thesis |
Files
Original bundle
- Name: Liu_washington_0250E_16576.pdf
- Size: 8.16 MB
- Format: Adobe Portable Document Format
