Data Storage and Exploration in a Video Data Management System

Daum, Maureen

Files

Daum_washington_0250E_26257.pdf (28.12 MB)

Date

2024-02-12

Abstract

Increasingly many scientific and engineering domains rely on video data, which is information-dense and relatively easy to collect. At the same time, recent advances in computer vision and machine learning have opened the door to automating analysis of this data. As a result, there has been a resurgence of research and innovation in powerful libraries and data management systems to support users in storing, processing, and querying videos. Current video database management systems (VDBMSs) are optimized to efficiently apply machine learning (ML) models over videos to extract their semantic content and generate query results. However, relatively little attention has been directed towards optimizing VDBMS storage managers, despite the fact that videos are stored in compressed formats due to their large size, and accessing the raw pixel information adds non-negligible latency to queries due to the required decoding step. Further, current VDBMSs are limited by assuming that there exist pretrained models relevant to users’ videos and queries; this is not the case for many domain-specific datasets, like those collected for wildlife monitoring. Users who lack a relevant pretrained model are unable to benefit from the data management and query processing capabilities that VDBMSs provide.

This dissertation studies the two challenges identified above. First, we introduce TASM, a tile-based storage manager for VDBMSs. TASM accelerates workloads that operate over spatial subsets of frames by storing videos in a way that makes it possible to decode only particular regions of frames. In contrast, current VDBMS storage managers must decode entire frames, even if only particular regions are requested, which is inefficient. TASM uses the video codec feature of tiles to divide frames into independently decodable regions, and it automatically adjusts the layout of these tiles based on the observed query workload.

Second, we introduce VOCALExplore, a system that supports users in exploring large video datasets and efficiently training domain-specific models that can be used to automatically extract the semantic content from unlabeled videos. VOCALExplore provides a high-level API that does not require ML expertise; it automatically navigates decisions around feature extraction and sample selection that are essential for producing a high-quality model. Importantly, VOCALExplore does not require an extensive preprocessing phase that would force the user to wait a significant amount of time before interacting with the system. Instead, VOCALExplore operates in a pay-as-you-go manner and performs processing in the background as the user interacts with it. While there still exist many open problems surrounding VDBMSs, this thesis takes steps towards making them more efficient and usable by a larger audience.
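
The tile-based idea described for TASM can be illustrated with a minimal sketch. It is not TASM's implementation: it assumes a uniform grid of tiles, whereas the abstract states that TASM adjusts tile layouts based on the query workload. The function name `tiles_for_region` and its parameters are hypothetical. The sketch only computes which tiles a requested region overlaps, which are the tiles a tile-aware decoder would need to decode.

```python
def tiles_for_region(frame_w, frame_h, rows, cols, box):
    """Return the (row, col) indices of the tiles that overlap `box`.

    Assumes a uniform rows x cols tile grid (an illustrative simplification).
    `box` is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    if not (0 <= x0 < x1 <= frame_w and 0 <= y0 < y1 <= frame_h):
        raise ValueError("box must be non-empty and lie inside the frame")

    # Integer arithmetic maps a pixel coordinate to its tile index.
    first_col = (x0 * cols) // frame_w
    last_col = ((x1 - 1) * cols) // frame_w
    first_row = (y0 * rows) // frame_h
    last_row = ((y1 - 1) * rows) // frame_h

    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


# A 1920x1080 frame split into 2 rows and 4 columns of tiles.
needed = tiles_for_region(1920, 1080, rows=2, cols=4, box=(500, 100, 900, 400))
fraction_decoded = len(needed) / (2 * 4)
```

In this example the requested region falls entirely within one of the eight tiles, so only that tile would need decoding rather than the whole frame. A region that straddles tile boundaries touches more tiles, which suggests why the choice of layout matters for a given workload.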