Towards More Efficient Communication for Distributed Learning Systems

dc.contributor.advisor: Ceze, Luis
dc.contributor.advisor: Krishnamurthy, Arvind
dc.contributor.author: Luo, Liang
dc.date.accessioned: 2020-10-26T20:41:06Z
dc.date.available: 2020-10-26T20:41:06Z
dc.date.issued: 2020-10-26
dc.date.submitted: 2020
dc.description: Thesis (Ph.D.)--University of Washington, 2020
dc.description.abstract: The explosion of data volume and the ever-increasing speed of accelerators shift the bottleneck of large-scale distributed training tasks from computation to communication. When running such tasks, we observe significant pressure on the communication backends of various mainstream learning systems across multiple environments. Achieving efficient large-scale learning therefore relies on more effective communication planes. We provide a detailed analysis that root-causes the bottlenecks affecting the communication efficiency of these systems in different environments, and we pinpoint these bottlenecks in the software, hardware, and network infrastructure stacks. We show how these obstacles can be overcome through a systematic co-design of a streamlined communication stack and a hardware and cluster configuration balanced with the distributed training workload, together with awareness of network topology and environment. We show that this series of approaches, named Parameter Box, Parameter Hub, and Parameter Link, together with Cloud Collectives, accelerates distributed training from small clusters to datacenters and all the way to commercial clouds, while providing varying degrees of customization to suit different needs.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Luo_washington_0250E_21986.pdf
dc.identifier.uri: http://hdl.handle.net/1773/46428
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Towards More Efficient Communication for Distributed Learning Systems
dc.type: Thesis

Files

Original bundle

Name: Luo_washington_0250E_21986.pdf
Size: 4.89 MB
Format: Adobe Portable Document Format