Kernel Mechanisms for Efficient GPU Accelerated Deep Neural Network Inference on Embedded Devices
Embedded platforms with integrated graphics processing units (GPUs) are a popular choice for running deep neural network (DNN) inference workloads in use cases such as autonomous machines. However, as data volumes grow rapidly, DNN inference is becoming ever more computationally intensive and memory sensitive, which calls for mechanisms that improve DNN inference efficiency on existing embedded systems. This Master's thesis investigates the memory sensitivity of DNN inference, specifically the impact of off-chip memory (DRAM) contention on DNN inference performance. It presents a prototype GPU-aware memory isolation mechanism: a locking mechanism in the GPU driver that reduces the DRAM contention caused by the multicore CPU, thereby improving DNN inference efficiency. Experiments on a Jetson TX2 board running the Linux4Tegra OS show the benefits of the proposed mechanism, with speedups of up to 13.5% on a micro-benchmark and up to 41% and 86% on two object detection benchmarks.
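The abstract describes the isolation idea only at a high level: while the GPU is running inference work, memory-intensive CPU activity is held back so that it does not compete for DRAM bandwidth. The thesis places its lock inside the GPU driver. The Python code below is merely a user-space model of that idea under stated assumptions. The class `GpuMemoryGuard` and its methods `gpu_kernel_begin`, `gpu_kernel_end` and `cpu_acquire` are hypothetical names chosen for illustration, and the model assumes that the driver would call the begin/end hooks at kernel launch and completion. It is not the thesis's driver code.

```python
import threading


class GpuMemoryGuard:
    """Hypothetical user-space model of a GPU-aware memory isolation lock.

    While any GPU kernel is active, CPU tasks that call cpu_acquire() are
    blocked, so (in this model) they do not contend for DRAM bandwidth.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active_kernels = 0  # number of GPU kernels currently running

    def gpu_kernel_begin(self) -> None:
        # Assumed hook: would be invoked when the driver launches a kernel.
        with self._cond:
            self._active_kernels += 1

    def gpu_kernel_end(self) -> None:
        # Assumed hook: would be invoked on kernel completion.
        # When the GPU goes idle, wake every throttled CPU task.
        with self._cond:
            self._active_kernels -= 1
            if self._active_kernels == 0:
                self._cond.notify_all()

    def cpu_acquire(self, timeout=None) -> bool:
        # A CPU task calls this before a memory-intensive phase. It blocks
        # until no GPU kernel is running; returns False if it timed out.
        with self._cond:
            return self._cond.wait_for(lambda: self._active_kernels == 0, timeout)

    def gpu_busy(self) -> bool:
        with self._cond:
            return self._active_kernels > 0


def demo() -> list:
    """Show that a CPU memory phase cannot start until the GPU kernel ends."""
    guard = GpuMemoryGuard()
    log = []
    guard.gpu_kernel_begin()  # GPU starts an inference kernel

    def cpu_task() -> None:
        guard.cpu_acquire()  # blocks while the kernel is active
        log.append("cpu memory phase")

    t = threading.Thread(target=cpu_task)
    t.start()
    log.append("gpu kernel done")  # recorded before the guard is released
    guard.gpu_kernel_end()  # releases the waiting CPU task
    t.join()
    return log


if __name__ == "__main__":
    print(demo())
```

Running `demo()` always logs the GPU completion before the CPU memory phase. A blocking condition variable keeps the model short. A real in-driver mechanism would instead have to throttle CPU cores from kernel space, which this sketch does not attempt.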