High-Performance Transaction Processing in Disk-based Databases

dc.contributor.advisor: Peter, Simon
dc.contributor.author: Hwang, Deukyeon
dc.date.accessioned: 2026-02-05T19:34:15Z
dc.date.available: 2026-02-05T19:34:15Z
dc.date.issued: 2026-02-05
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: Achieving high-performance transaction processing in disk-based databases has long required system designers to choose between lock-based concurrency control methods, which suffer from CPU overhead and reduced parallelism, and timestamp-based methods, which provide superior concurrency but incur prohibitive I/O overhead when timestamp metadata is stored on disk. Modern high-speed storage devices such as NVMe SSDs exacerbate this trade-off: the CPU becomes the bottleneck for lock-based methods, while disk-based timestamp storage wastes the storage device’s speed on frequent small metadata operations. This dissertation introduces an approach that eliminates this trade-off through approximate timestamp storage. It demonstrates that timestamp-based concurrency control protocols, specifically Strict Timestamp Ordering (STO), Multi-Version Timestamp Ordering (MVTO), and TicToc, can maintain correctness (serializability) even when timestamps are overapproximated for inactive keys, as long as active keys maintain exact timestamps throughout their transaction lifetime. This key insight enables the design of FPSketch, a hybrid data structure that combines a hash table holding exact timestamps of active keys with a probabilistic sketch holding approximate upper bounds for inactive keys. The first contribution is the design, implementation, and evaluation of FPSketch integrated with STO, MVTO, and TicToc in the SplinterDB key-value store. FPSketch achieves nearly the idealized performance while requiring only minimal memory (as little as 32 KiB for an 80 GB database) by eliminating the need to access timestamp metadata on disk during normal operation. Experimental evaluation on modern NVMe SSDs demonstrates that TicToc with FPSketch achieves up to 14× higher goodput than traditional two-phase locking and up to 5.9× higher goodput than disk-based timestamp storage.
The second contribution is a comprehensive analytical and experimental study evaluating FPSketch across the entire storage performance spectrum, from traditional hard disk drives with millisecond latencies to emerging CXL-based storage approaching DRAM-like speeds. The evaluation reveals that FPSketch’s benefits scale with the fundamental gap between local memory and remote storage access, ensuring its continued relevance as storage technology evolves. On slow storage (HDDs and SATA SSDs), FPSketch enables timestamp-based protocols to outperform traditional concurrency control methods: on SATA SSD, TicTocFocus-Sketch achieves up to 6.89× and 2.52× higher goodput than two-phase locking (2PL) and KR-OCC, respectively, while on HDD it reaches up to 1.8× the goodput of KR-OCC. FPSketch also eliminates the prohibitive overhead of timestamp disk accesses, achieving improvements of up to 569% over disk-based timestamp storage. On fast storage (simulated CXL-based SSDs), where systems transition from I/O-bound to CPU-bound, FPSketch continues to provide substantial benefits by keeping timestamp metadata in fast local memory, enabling timestamp-based protocols to significantly outperform traditional approaches. Together, these contributions establish that approximate, in-memory metadata management enables high-performance transaction processing for disk-based databases. FPSketch demonstrates that approximate metadata management can unlock advanced concurrency control designs that would otherwise be impractical, providing a practical solution that enables efficient timestamp-based concurrency control across diverse storage technologies while requiring only minimal memory overhead.
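
The hybrid structure described in the abstract (exact timestamps for active keys, overapproximated upper bounds for inactive keys) can be illustrated with a minimal sketch. This is not the FPSketch implementation from the dissertation: the class name `ApproxTimestampStore`, its methods, the reference counting, and the count-min-style grid that stores a maximum per cell are all assumptions chosen only to show the idea.

```python
import hashlib


class ApproxTimestampStore:
    """Hypothetical illustration: exact timestamps for active keys,
    over-approximated upper bounds for inactive keys."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        # key -> [timestamp, number of transactions currently using the key]
        self.active = {}
        # depth x width grid; each cell keeps the max timestamp folded into it
        self.sketch = [[0] * width for _ in range(depth)]

    def _cells(self, key: bytes):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def _upper_bound(self, key: bytes) -> int:
        # Every cell is >= the key's last timestamp, so the minimum still is.
        return min(self.sketch[r][c] for r, c in self._cells(key))

    def acquire(self, key: bytes) -> int:
        """Mark the key active; seed its timestamp from the sketch if needed."""
        entry = self.active.get(key)
        if entry is None:
            entry = [self._upper_bound(key), 0]
            self.active[key] = entry
        entry[1] += 1
        return entry[0]

    def update(self, key: bytes, ts: int) -> None:
        """Record an exact timestamp for a key that is currently active."""
        entry = self.active[key]
        entry[0] = max(entry[0], ts)

    def release(self, key: bytes) -> None:
        """Drop one user; when none remain, fold the timestamp into the sketch."""
        entry = self.active[key]
        entry[1] -= 1
        if entry[1] == 0:
            for r, c in self._cells(key):
                self.sketch[r][c] = max(self.sketch[r][c], entry[0])
            del self.active[key]
```

In this sketch a key's timestamp is exact while at least one transaction holds it. After the last `release`, only a shared upper bound survives, so a later `acquire` may return a larger timestamp than the key really had, but never a smaller one.
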
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Hwang_washington_0250E_29025.pdf
dc.identifier.uri: https://hdl.handle.net/1773/55187
dc.language.iso: en_US
dc.rights: CC BY-NC-ND
dc.subject: Approximate Data Structures
dc.subject: Concurrency Control
dc.subject: Database Systems
dc.subject: In-Memory Metadata
dc.subject: Modern Storage Systems
dc.subject: Transaction Processing
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: High-Performance Transaction Processing in Disk-based Databases
dc.type: Thesis

Files

Name: Hwang_washington_0250E_29025.pdf
Size: 502.75 KB
Format: Adobe Portable Document Format