A Joint Model Provisioning and Request Dispatch Solution for Mobile Inference Serving at the Edge
| dc.contributor.advisor | Peng, Yang | |
| dc.contributor.author | Prasad, Anish Nagendra | |
| dc.date.accessioned | 2021-08-26T18:02:51Z | |
| dc.date.available | 2021-08-26T18:02:51Z | |
| dc.date.issued | 2021-08-26 | |
| dc.date.submitted | 2021 | |
| dc.description | Thesis (Master's)--University of Washington, 2021 | |
| dc.description.abstract | With the advancement of machine learning (ML), a growing number of mobile clients rely on ML inference for making time-sensitive and safety-critical decisions. The demand for high-quality, low-latency inference services at the network edge has therefore become central to modern intelligent systems. This thesis proposes a novel solution that jointly provisions inference models and dispatches inference requests to reduce the latency of mobile inference serving on edge nodes. Unlike existing solutions that either direct inference requests to the nearest edge node or balance the workload between edge nodes, the proposed solution provisions each edge node with the optimal type and number of inference serving instances under a holistic consideration of networking, computing, and memory resources. Mobile clients can thus utilize ML inference services on the edge nodes that offer minimal inference serving latency. In this work, we implement the proposed solution using TensorFlow Serving and Kubernetes on a cluster of edge nodes, including the Nvidia Jetson Nano and Jetson Xavier. We further demonstrate the proposed solution’s effectiveness in reducing overall inference latency under various system parameters and practical system settings through simulation and testbed experiments, respectively. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Prasad_washington_0250O_23263.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/47193 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | Computer science | |
| dc.subject.other | Computing and software systems | |
| dc.title | A Joint Model Provisioning and Request Dispatch Solution for Mobile Inference Serving at the Edge | |
| dc.type | Thesis |
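The abstract describes dispatching each request to the edge node offering the minimal inference serving latency, accounting for networking, computing, and queueing. As a purely illustrative sketch (not the thesis's actual algorithm), the core dispatch decision under a simple additive latency model might look like the following; the node attributes and latency formula here are hypothetical assumptions, not drawn from the thesis.

```python
# Hypothetical sketch of latency-minimizing request dispatch at the edge.
# Each node's end-to-end latency is modeled as network round trip plus
# per-request compute time, scaled by the requests already queued.

def estimated_latency(node):
    """Estimate end-to-end serving latency (ms) for one request."""
    # Queued requests each occupy roughly one compute slot before ours.
    return node["net_ms"] + node["compute_ms"] * (1 + node["queued"])

def dispatch(nodes):
    """Pick the edge node offering the minimal estimated latency."""
    return min(nodes, key=estimated_latency)

# Example node states (values are made up for illustration).
nodes = [
    {"name": "jetson-nano",   "net_ms": 5.0,  "compute_ms": 40.0, "queued": 2},
    {"name": "jetson-xavier", "net_ms": 12.0, "compute_ms": 15.0, "queued": 1},
]

best = dispatch(nodes)
print(best["name"])  # prints "jetson-xavier": 42 ms beats 125 ms
```

Note that under this model the nearest node (lowest `net_ms`) is not chosen, which mirrors the abstract's point that nearest-node dispatch can be suboptimal once compute and queueing are considered.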
Files
Original bundle
- Name: Prasad_washington_0250O_23263.pdf
- Size: 552.28 KB
- Format: Adobe Portable Document Format
