Facilitating FPGA Prototyping with Hardware OS Primitives
Abstract
Both datacenter operators and the research community have embraced hardware accelerators because of their potential for significant improvements in performance and energy efficiency.
There have now been several large-scale deployments of accelerators in datacenters from
companies such as Google, Facebook, and Microsoft. FPGAs have become a compelling acceleration
platform because their reconfigurability allows them to be repurposed as the application mix
changes. Both Microsoft and Amazon have deployed FPGAs throughout their datacenters, both to
rent to consumers and to accelerate their own services. Microsoft in particular attaches the
FPGAs it uses to accelerate its own workloads directly to the network. Directly attaching the
FPGA to the network further reduces latency, improves cost-performance, and reduces energy
use relative to mediating network communications with CPUs. However, building accelerated
applications or services for direct-attached FPGAs is challenging, especially with the complex
I/O and multi-accelerator capacity of modern FPGAs.
This thesis argues that direct-attached accelerator systems can be built in a modular manner
that preserves the benefits of a direct-attached accelerator while also reducing the engineering
burden. We first describe a design and prototype for Apiary, a microkernel operating system for
direct-attached FPGA accelerators based on message passing over a network-on-chip (NoC)
architecture. The key idea in Apiary is to raise the level of abstraction for accelerated application
code, with isolation, threaded execution, and interprocess communication provided by a portable
hardware OS layer in order to ease development difficulties. We propose specific hardware OS
primitives to provide these services and abstractions. We then conduct an end-to-end case study
of Apiary by prototyping a selection of these primitives to evaluate how well they serve Apiary’s
design goals. Finally, we describe Beehive, a hardware network stack we designed and prototyped
for Apiary based around message passing over a NoC. We show that our architecture is better
able to support the complexity of a software datacenter network stack by providing replication of
elements and applications, as well as standard TCP and UDP interoperation. At the same time,
direct-attached accelerators using Beehive can achieve a 4x improvement in end-to-end RPC tail
latency for Linux UDP clients versus a CPU-attached accelerator.
Description
Thesis (Ph.D.)--University of Washington, 2025
