Automated Parallelization to Improve Usability and Efficiency of Distributed Neural Network Training
Grabaskas, Nathaniel J
The recent success of Deep Neural Networks (DNNs) has triggered a race to build larger and larger DNNs; however, a known limitation is training speed. To solve this speed problem, distributed neural network training has become an increasingly large area of research. Usability, meaning how complex it is for a machine learning scientist or data scientist to implement distributed neural network training, is rarely considered yet critical. There is strong evidence that growing complexity has a direct impact on the development effort, maintainability, and fault proneness of software. We investigated whether automation can greatly reduce the implementation complexity of distributing neural network training across multiple devices without loss of computational efficiency when compared to manual parallelization. Experiments were conducted using Convolutional Neural Networks (CNNs) and Multi-Layer Perceptron (MLP) networks to perform image classification on the CIFAR-10 and MNIST datasets. The hardware consisted of an embedded, four-node NVIDIA Jetson TX1 cluster. Torch Automatic Distributed Neural Network (TorchAD-NN) reduces the implementation complexity of data-parallel neural network training by more than 90% and provides components, with near-zero implementation complexity, that easily parallelize all or only selected fully-connected neural layers.