Parallel prefetching and caching
High-performance I/O systems depend on prefetching and caching to deliver good performance to applications. These two techniques have generally been considered in isolation, even though there are significant interactions between them: a block prefetched too early may cause a block that is needed soon to be evicted from the cache, thus reducing the effectiveness of the cache, while a block cached too long may reduce the effectiveness of prefetching by denying opportunities to the prefetcher. Using both analytical and experimental methods, we study the problem of integrated prefetching and caching for an I/O system with multiple disks.

In a theoretical analysis, we consider algorithms for integrated prefetching and caching in a model abstracting relevant characteristics of file systems with multiple disks. Previously, the "aggressive" algorithm was shown by Cao, Felten, Karlin, and Li to have near-optimal performance in the single-disk case. We show that the natural extension of the aggressive algorithm to the parallel-disk case is suboptimal by a factor near the number of disks in the worst case. Our main theoretical result is a new algorithm, "reverse aggressive," with provably near-optimal performance in the presence of multiple disks.

Using disk-accurate trace-driven simulation, we explore the performance characteristics of several algorithms in cases in which applications provide full advance knowledge of accesses using hints. The algorithms tested are the two mentioned previously, plus the "fixed horizon" algorithm of Patterson et al., and a new algorithm, "forestall," that combines the desirable characteristics of the others. We find that when performance is limited by I/O stalls, aggressive prefetching helps to alleviate the problem; that more conservative prefetching is appropriate when significant I/O stalls are not present; and that a single, simple strategy is capable of doing both.

We also consider three related problems.
First, we present an optimal algorithm for a restricted version of the single-disk prefetching and caching problem. Next, we propose an approach to integrating prefetching and caching policies with processor and disk scheduling policies. Finally, we show that the problem of ordering requests to maximize locality of reference is NP-hard.
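The abstract leaves the algorithms' details to the body of the paper, but the core decision of the single-disk "aggressive" rule of Cao, Felten, Karlin, and Li can be sketched from its usual description: fetch the next missing block as early as possible, evicting the cached block whose next reference lies furthest in the future, and only if that reference comes after the fetched block's own next use. The helper names below and the simplification of ignoring fetch timing are ours, not the paper's:

```python
INF = float('inf')

def next_use(seq, t, b):
    """Index of the next reference to block b at or after time t (INF if none)."""
    for i in range(t, len(seq)):
        if seq[i] == b:
            return i
    return INF

def aggressive_step(cache, fetch_block, seq, t):
    """One decision of a single-disk aggressive prefetcher with full
    knowledge of the reference string seq: choose as eviction victim the
    cached block referenced furthest in the future, but start the fetch
    only if that reference comes after fetch_block's own next reference.
    Returns the victim to evict, or None if prefetching must wait."""
    victim = max(cache, key=lambda b: next_use(seq, t, b))
    if next_use(seq, t, victim) > next_use(seq, t, fetch_block):
        return victim
    return None  # no safe eviction: stalling is better than evicting
```

For example, on the reference string a b c a b d with cache {a, b, c} at time 3, block c is never referenced again, so it is the safe victim for prefetching d; when every cached block is needed before the missing one, the rule waits. It is this interleaving of safe-eviction decisions across several disks that the parallel-disk extensions in the paper must get right.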