Kernel Mechanisms for Efficient GPU Accelerated Deep Neural Network Inference on Embedded Devices

Authors

Nigam, Hemant

Abstract

Embedded platforms with integrated graphics processing units (GPUs) are popular choices for running deep neural network (DNN) inference workloads in use cases such as autonomous machines. However, as data volumes grow rapidly, DNN inference is becoming ever more computationally intensive and memory sensitive, which calls for a mechanism that improves DNN inference efficiency on existing embedded systems. This Master's thesis investigates the memory sensitivity of DNN inference, specifically the impact of off-chip memory (DRAM) contention on DNN inference performance. It demonstrates a prototype GPU-aware memory isolation mechanism: a locking mechanism in the GPU driver that reduces DRAM contention caused by multicore CPUs, thereby improving DNN inference efficiency. Experiments on a Jetson TX2 board running the Linux4Tegra OS show the benefits of our proposed mechanism, with up to 13.5% speedup on a micro-benchmark and up to 41% and 86% speedups on two object detection benchmarks.
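
To make the idea concrete, below is a minimal user-space C sketch of such a memory-isolation lock. It is not the thesis's actual GPU-driver code; all names (mem_iso_lock, gpu_inference, cpu_memhog) are hypothetical stand-ins. A reader/writer lock lets a simulated GPU inference pass run with the lock held exclusively while memory-intensive CPU threads serialize behind it, mirroring the contention-throttling behavior the driver-level mechanism is described as providing.

/*
 * Conceptual sketch only: a user-space analogue of a GPU-aware
 * memory-isolation lock. While the "GPU inference" thread holds the
 * lock exclusively, CPU memory-hog threads cannot generate DRAM
 * traffic. All identifiers are hypothetical, not from the thesis.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_BYTES (64 * 1024 * 1024)   /* large enough to miss cache and hit DRAM */

static pthread_rwlock_t mem_iso_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for a GPU DNN inference pass: takes the lock exclusively so
 * contending CPU memory traffic must wait, analogous to the driver
 * holding off CPU-side DRAM traffic during real GPU work. */
static void *gpu_inference(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&mem_iso_lock);
    usleep(50 * 1000);                 /* pretend to run a DNN layer */
    pthread_rwlock_unlock(&mem_iso_lock);
    return NULL;
}

/* Memory-intensive CPU task: acquires the lock in shared mode before
 * each pass, so it blocks while inference is in flight but multiple
 * CPU threads may still proceed concurrently with each other. */
static void *cpu_memhog(void *arg)
{
    char *buf = arg;
    for (int pass = 0; pass < 8; pass++) {
        pthread_rwlock_rdlock(&mem_iso_lock);
        memset(buf, pass, BUF_BYTES);  /* heavy DRAM write traffic */
        pthread_rwlock_unlock(&mem_iso_lock);
    }
    return NULL;
}

int main(void)
{
    char *buf = malloc(BUF_BYTES);
    pthread_t gpu, cpu;

    if (!buf)
        return 1;
    pthread_create(&cpu, NULL, cpu_memhog, buf);
    pthread_create(&gpu, NULL, gpu_inference, NULL);
    pthread_join(gpu, NULL);
    pthread_join(cpu, NULL);
    free(buf);
    puts("done");
    return 0;
}

In the thesis's setting the equivalent decision point would live inside the GPU driver rather than in application code, so CPU tasks need no modification; the sketch above only illustrates the mutual-exclusion idea.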

Description

Thesis (Master's)--University of Washington, 2018
