Advisor: Stiber, Michael
Author: Nigam, Hemant
Date accessioned: 2018-04-24
Date available: 2018-04-24
Date issued: 2018
File: Nigam_washington_0250O_18302.pdf
URI: http://hdl.handle.net/1773/41727
Description: Thesis (Master's)--University of Washington, 2018

Abstract: Embedded platforms with integrated graphics processing units (GPUs) are popular choices for running deep neural network (DNN) inference workloads in use cases such as autonomous machines. However, due to a rapid increase in data volume, DNN inference is becoming even more computationally intensive and memory-sensitive, which necessitates a mechanism for improving DNN inference efficiency on existing embedded systems. This Master's thesis investigates the memory sensitivity of DNN inference, specifically the impact of off-chip memory (DRAM) contention on DNN inference performance. It demonstrates a prototype GPU-aware memory isolation mechanism: a lock in the GPU driver that reduces DRAM contention caused by the multicore CPU, thereby improving DNN inference efficiency. Experiments performed on a Jetson TX2 board running the Linux4Tegra OS show the benefits of our proposed mechanism, with up to a 13.5% speedup on a micro-benchmark and up to 41% and 86% speedups on two object detection benchmarks.

Format: application/pdf
Language: en-US
Rights: none
Subjects: Deep Neural Network; DRAM; Edge Computing; GPU Acceleration; Inference; Computer science
Department: To Be Assigned
Title: Kernel Mechanisms for Efficient GPU Accelerated Deep Neural Network Inference on Embedded Devices
Type: Thesis
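The locking idea named in the abstract can be illustrated with a short sketch. The following is a minimal user-space analogue, written under the assumption that the mechanism serializes memory-heavy CPU phases against GPU inference kernels with a single shared lock; the thesis's actual prototype lives inside the GPU driver, and every name below (dram_lock, run_gpu_inference, cpu_memory_heavy_phase) is hypothetical, not taken from the thesis.

/* Conceptual sketch of GPU-aware memory isolation: while a DNN
 * inference kernel is in flight on the GPU, memory-intensive CPU
 * work waits on a shared lock instead of contending for DRAM
 * bandwidth. Hypothetical illustration only, not the in-driver
 * implementation described in the thesis. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t dram_lock = PTHREAD_MUTEX_INITIALIZER;

/* GPU side: hold the lock for the duration of an inference kernel. */
void run_gpu_inference(void (*launch_kernel)(void))
{
    pthread_mutex_lock(&dram_lock);   /* hold off memory-heavy CPU phases */
    launch_kernel();                  /* GPU gets DRAM bandwidth to itself */
    pthread_mutex_unlock(&dram_lock);
}

/* CPU side: wrap memory-intensive phases in the same lock so they
 * block while an inference kernel is running. */
void cpu_memory_heavy_phase(void (*do_work)(void))
{
    pthread_mutex_lock(&dram_lock);
    do_work();
    pthread_mutex_unlock(&dram_lock);
}

/* Stand-in workloads so the sketch compiles and runs. */
static void fake_inference_kernel(void) { puts("GPU: inference kernel running"); }
static void fake_cpu_workload(void)     { puts("CPU: memory-heavy phase running"); }

static void *gpu_thread(void *arg) { (void)arg; run_gpu_inference(fake_inference_kernel); return NULL; }
static void *cpu_thread(void *arg) { (void)arg; cpu_memory_heavy_phase(fake_cpu_workload); return NULL; }

int main(void)
{
    pthread_t g, c;
    pthread_create(&g, NULL, gpu_thread, NULL);
    pthread_create(&c, NULL, cpu_thread, NULL);
    pthread_join(g, NULL);
    pthread_join(c, NULL);
    return 0;
}

Placing the lock inside the GPU driver, as the abstract describes, would make the isolation transparent to applications. A single coarse mutex is only the simplest policy a sketch can show; finer-grained mechanisms such as per-core memory bandwidth throttling would serve the same goal.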