Building and Accelerating a Declarative Platform for Machine Learning Model Serving
Authors
Lu, Yao
Abstract
Artificial intelligence has become a defining topic of the current and the coming decade. Numerous AI applications in computer vision, natural language processing, and audio are being deployed to change people's lives. Building effective algorithms and highly available systems to meet this exploding demand is a key challenge for both researchers and practitioners, and doing so requires technical advances across machine learning, databases, and distributed systems. This work focuses on providing better big-data platforms for AI applications. One core challenge is mapping diverse machine learning and domain-specific algorithms onto a big-data platform. Well-known systems such as Spark and Hadoop rely on user-defined functions (UDFs) and leave runtime details such as storage and degree of parallelism to the system engineer. Instead, we apply UDFs on top of a relational big-data platform; in this manner, complex machine learning functionality and mature optimizations from the database domain can both come to bear. On this foundation, we developed Optasia, a dataflow system that efficiently processes machine learning inference queries over video feeds from multiple cameras. Optasia's key gains come from modularizing machine learning pipelines in a way that allows relational query optimization to be applied. Specifically, Optasia de-duplicates the work of common modules, auto-parallelizes query plans based on input size, number of inputs, and operation complexity, and offers chunk-level parallelism that lets multiple tasks process the feed of a single camera. We evaluate Optasia on complex vision inference queries over traffic videos from many cameras; it achieves high accuracy with many-fold improvements in query completion time and resource usage relative to existing systems. To this end, Optasia explores basic query optimization for better runtime parallelism.
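As a toy illustration of the chunk-level parallelism described above (this is not Optasia's actual API; the function names and the integer "frames" are stand-ins), a single camera's feed can be split into chunks that independent tasks process before their partial results are merged:

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(frames, chunk_size):
    """Chunk a single camera's feed so multiple tasks can share it."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]

def count_vehicles(chunk):
    """Hypothetical per-chunk operator, standing in for an expensive vision UDF."""
    return sum(1 for frame in chunk if frame % 3 == 0)  # toy "detection"

frames = list(range(100))           # stand-in for the decoded frames of one camera
chunks = split_into_chunks(frames, 25)

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(count_vehicles, chunks))

total = sum(partials)               # merge partial results, as a dataflow engine would
```

The merge step works here because the toy operator is decomposable (a sum of per-chunk counts); a real system must likewise ensure the operator tolerates chunk boundaries.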
However, we find that many other query optimization techniques, including predicate pushdown, are of limited use for machine learning inference queries. The reason is that the UDFs which extract relational columns from unstructured inputs are often very expensive, and query predicates remain stuck behind these UDFs whenever they refer to columns the UDFs generate. In recent work, we construct and apply probabilistic predicates that filter out data blobs which do not satisfy the query predicate; this filtering is parameterized for different target accuracies. To support complex predicates and to avoid per-query training, we augment a cost-based query optimizer to choose plans with appropriate combinations of simpler probabilistic predicates. Experiments with several machine learning workloads on a big-data cluster show that query processing improves by as much as 10x. We also showcase an interactive demonstration system that uses probabilistic predicates to accelerate machine learning inference queries; users can query over various document, image, and video inputs and inspect the modified query plans as well as the results. These are initial steps toward bringing declarative dataflow engines to bear on scalable machine learning model serving. Much work remains in AI + systems; advances on each of the subproblems will benefit end applications, leading to better user experiences at lower cost.
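A minimal sketch of the probabilistic-predicate idea, under assumed simplifications (the helper names `calibrate_threshold` and `probabilistic_filter` are hypothetical, the "blobs" are plain numbers, and the cheap classifier's scores are given directly rather than computed by a model): a score cutoff is calibrated on held-out data to meet a target accuracy, then applied to drop blobs before the expensive UDF runs.

```python
import math

def calibrate_threshold(scores, labels, target_recall):
    """Choose the largest score cutoff that still retains the requested
    fraction of blobs that truly satisfy the predicate (toy calibration)."""
    positives = sorted(s for s, y in zip(scores, labels) if y)
    if not positives:
        return float("-inf")
    keep = math.ceil(target_recall * len(positives))
    return positives[len(positives) - keep]

def probabilistic_filter(blobs, score_fn, threshold):
    """Discard blobs the cheap filter deems unlikely to pass the predicate,
    so the expensive UDF only processes what survives."""
    return [b for b in blobs if score_fn(b) >= threshold]

# Toy held-out data: scores from a cheap classifier, true predicate labels.
val_scores = [0.1, 0.4, 0.6, 0.9]
val_labels = [0, 0, 1, 1]
threshold = calibrate_threshold(val_scores, val_labels, target_recall=1.0)

# At query time, blobs are scored cheaply and filtered before the UDF.
surviving = probabilistic_filter([0.2, 0.7, 0.95], lambda blob: blob, threshold)
```

Lowering `target_recall` raises the cutoff and drops more blobs, trading a little accuracy for less work behind the filter, which mirrors the accuracy parameterization described above.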
Description
Thesis (Ph.D.)--University of Washington, 2018
