Interference-Free Regions and Their Application to Compiler Optimization and Data-Race Detection
Programming languages must be defined precisely so that programmers can reason carefully about the behavior of their code and language implementers can provide correct and efficient compilers and interpreters. However, until quite recently, mainstream languages such as Java and C++ did not specify exactly how programs that use shared-memory multithreading should behave (e.g., when do writes by one thread become visible to another thread?). The memory model of a programming language addresses such questions. The recently approved memory model for C++ effectively requires programs to be "data-race-free": all executions of the program must have the property that any conflicting memory accesses in different threads are ordered by synchronization. To meet this requirement, programmers must ensure that threads properly coordinate accesses to shared memory using synchronization mechanisms such as mutual-exclusion locks.

We introduce a new abstraction for reasoning about data-race-free programs: interference-free regions. An interference-free region, or IFR, is a region surrounding a memory access during which no other thread can modify the accessed memory location without causing a data race. Specifically, the interference-free region for a memory access extends from the last acquire call (e.g., mutex lock) before the access to the first release call (e.g., mutex unlock) after the access. Using IFRs, we can reason sequentially about code that contains synchronization operations. IFRs enable entirely thread-local reasoning: we do not need the whole program available in order to make useful inferences.

We develop IFRs as an abstract concept, and also present two practical applications of IFRs. First, IFR-based reasoning can be used to extend the scope of compiler optimizations. Compilers typically optimize within synchronization-free regions, since the data-race-freedom assumption permits sequential reasoning in the absence of synchronization. We observe that this rule of thumb is overly conservative: it is safe to optimize across synchronization calls as long as the calls are interference-free for the variable in question. (We say that a variable is interference-free at a call if the call falls within the interference-free region for an access to that variable.) We have developed two symmetric compiler analyses for determining which variables are interference-free at each synchronization call, thereby allowing later optimization passes to operate over larger regions that may include synchronization.

Second, we have developed an algorithm for dynamic data-race detection based on the concept of IFRs. Data-race detection is an important problem for the programming languages community: programmers need to eliminate data races during software development in order to avoid costly bugs in production systems. Our algorithm monitors the active IFRs of each thread at runtime, reporting a data race if conflicting IFRs in different threads overlap in real time. Conservative approximations of IFRs are inferred using a static instrumentation pass. We compare our algorithm to two precise data-race detectors and find that it catches many data races while providing better performance on most benchmarks.
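To make the IFR definition concrete, the following sketch (our own hypothetical code, not drawn from the dissertation) marks the IFR for a single load protected by a std::mutex:

    #include <mutex>

    void consume(int);     // hypothetical sink for the loaded value

    std::mutex m;
    int shared;

    void reader() {
        m.lock();          // last acquire before the access: the IFR
                           // for the load of `shared` begins here
        int tmp = shared;  // the memory access of interest
        consume(tmp);      // still inside the IFR: no other thread may
                           // write `shared` here without a data race
        m.unlock();        // first release after the access: IFR ends
    }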
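The optimization claim can be illustrated with a similarly hypothetical fragment. Because lock() is an acquire, not a release, it does not end the IFR of a load that precedes it, so in any data-race-free program the second load below must return the same value as the first; a conventional compiler, treating the call as opaque, would not perform this forwarding:

    #include <mutex>

    std::mutex m;
    int x;

    int f() {
        int a = x;     // load of x; its IFR extends forward to the
                       // first release after this point
        m.lock();      // an acquire, not a release: the IFR for x is
                       // not ended, so x cannot have changed here
                       // without a data race
        int b = x;     // a compiler using IFRs may forward: b = a
        m.unlock();    // first release after the loads: IFR ends here
        return a + b;
    }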
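As a rough sketch of the overlap check at the heart of such a detector (the data structures and function names here are our assumptions, not the dissertation's implementation), the instrumentation might maintain a table of currently active IFRs and test each new IFR against it:

    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <unordered_map>

    // One entry per IFR currently active for some monitored access.
    struct ActiveIFR {
        std::thread::id tid;
        bool            is_write;
    };

    std::mutex table_mutex;
    std::unordered_multimap<const void*, ActiveIFR> active_ifrs;

    // Called by instrumentation where an access's IFR begins
    // (at the last acquire before the access).
    void ifr_begin(const void* addr, bool is_write) {
        std::lock_guard<std::mutex> g(table_mutex);
        auto range = active_ifrs.equal_range(addr);
        for (auto it = range.first; it != range.second; ++it) {
            const ActiveIFR& other = it->second;
            // Two IFRs conflict if they cover the same location in
            // different threads and at least one access is a write.
            if (other.tid != std::this_thread::get_id() &&
                (other.is_write || is_write))
                std::fprintf(stderr, "possible data race on %p\n", addr);
        }
        active_ifrs.emplace(addr,
                            ActiveIFR{std::this_thread::get_id(), is_write});
    }

    // Called where the IFR ends (at the first release after the access).
    void ifr_end(const void* addr) {
        std::lock_guard<std::mutex> g(table_mutex);
        auto range = active_ifrs.equal_range(addr);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second.tid == std::this_thread::get_id()) {
                active_ifrs.erase(it);
                break;
            }
        }
    }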
As a final step, we extend the compiler analyses used in both projects to be interprocedural (i.e., analyzing more than one function at a time). Specifically, we classify functions according to their synchronization behavior, making it easier to infer when IFRs propagate through function calls. On the compiler-optimization side, this change means that we can optimize across calls that contain internal synchronization. On the data-race-detection side, we can statically infer longer IFRs, making it more likely that we detect data races.
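For illustration, a hypothetical "acquire-only" callee (our own example and labels, not the dissertation's classification scheme) shows how such a summary lets an IFR survive a call with internal synchronization, assuming a separate analysis establishes that the callee does not write x:

    #include <mutex>

    std::mutex m;
    int x;

    // Hypothetical callee whose only synchronization is an acquire; an
    // interprocedural summary could classify it as "acquire-only".
    void begin_logging() {
        m.lock();
    }

    int g() {
        int a = x;        // forward IFR for x extends to the first
                          // release after this load
        begin_logging();  // acquire-only callee: no release inside, so
                          // the IFR for x survives the call
        int b = x;        // given the summary (and knowing the callee
                          // does not write x), a compiler may forward:
                          // b = a
        m.unlock();       // first release after the loads: IFR ends
        return a + b;
    }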