Array restructuring for cache locality
Abstract
Caches are used in almost every modern processor design to reduce the long memory access latency, which is increasingly a bottleneck to program performance. For caches to be effective, programs must exhibit good data locality. Thus, an optimizing compiler may have to restructure programs to enhance their locality. We focus on the class of restructuring techniques that target array accesses in loops.

There are two approaches to enhancing the locality of such accesses: loop restructuring and array restructuring. Under loop restructuring, a compiler adopts a canonical array layout but transforms the order in which loop iterations are performed and thereby reorders the execution of array accesses. Under array restructuring, in contrast, a compiler lays out array elements in an order that matches the access pattern, while preserving the flow of control. While loop restructuring has been studied extensively, array restructuring has received much less attention despite advantages such as its applicability to complicated loop structures that may hamper loop restructuring.

To fill the void, this dissertation investigates how to perform array restructuring effectively, that is, efficiently, automatically, and generally. We present a formal framework for array transformations that meet these objectives. Such transformations are represented by linear transformations of array index vectors. Within this framework, we develop algorithms to solve various problems in array restructuring: selecting transformations based on the access pattern, laying out elements of restructured arrays, and determining which elements are accessed by a loop and thus restructuring only that part of an array.

To evaluate our array restructuring technique, we implemented a prototype compiler and performed a series of experiments with loops commonly used in related loop restructuring studies.
Experimental measurements showed that array restructuring improved performance substantially in many cases, despite a modest runtime overhead in some. The results also indicated that array restructuring complemented loop restructuring in both applicability and performance: it applied where loop restructuring did not; where both applied, it offered comparable and sometimes better performance; and where it did not perform as well, loop restructuring improved performance considerably anyway. This observation points to the potential benefit of integrating the two complementary approaches.
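
As an illustration of what a linear transformation of array index vectors can mean in practice, the following minimal sketch compares the memory strides of one inner loop before and after restructuring. It is not the dissertation's algorithm: the access pattern `A[i + j][j]`, the skewing matrix `SKEW`, the row-major layout, and all function names are assumptions chosen for the example.

```python
# Illustrative sketch only: the names, the access pattern, and the matrix
# are hypothetical and are not taken from the dissertation.

def apply_transform(T, index):
    """Multiply the 2x2 integer matrix T by a 2-D index vector."""
    return (T[0][0] * index[0] + T[0][1] * index[1],
            T[1][0] * index[0] + T[1][1] * index[1])

def linear_offset(index, shape):
    """Row-major offset of a 2-D index in an array of the given shape."""
    row, col = index
    return row * shape[1] + col

def access(i, j):
    """Index vector touched by the loop body: A[i + j][j]."""
    return (i + j, j)

def inner_loop_strides(T, shape, n):
    """Offset differences between consecutive inner-loop (j) iterations at i = 0."""
    offsets = [linear_offset(apply_transform(T, access(0, j)), shape)
               for j in range(n)]
    return [b - a for a, b in zip(offsets, offsets[1:])]

N = 4
IDENTITY = [[1, 0], [0, 1]]   # original layout: element (r, c) stays at (r, c)
SKEW = [[1, -1], [0, 1]]      # restructured layout: element (r, c) moves to (r - c, c)

# The original array needs 2N - 1 rows to hold every A[i + j][j].
# After skewing, the accessed elements occupy an N x N array.
print(inner_loop_strides(IDENTITY, (2 * N - 1, N), N))  # [5, 5, 5]
print(inner_loop_strides(SKEW, (N, N), N))              # [1, 1, 1]
```

The loop order is unchanged in both cases. Only the layout differs, so the inner loop moves from a stride of N + 1 elements to consecutive elements, which is the kind of locality gain array restructuring aims for.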