An analysis of software interface issues for SMT processors
Redstone, Joshua Abram
Simultaneous Multithreading (SMT) has gradually progressed from a research concept to commercial processor technology. This thesis explores three software interface issues on SMT that are important to its real-world applicability: operating system performance on SMT, the impact of spinning on SMT, and register file limitations to scaling SMT. We investigate these issues with a new, detailed simulation infrastructure capable of modeling all operating system activity.

First, we present an analysis of operating system execution on SMT. Many of the applications most amenable to multithreading technologies, such as the Apache web server, spend a significant fraction of their time in kernel code. We compare Apache's user- and kernel-mode behavior to a multiprogrammed SPECInt workload. Overall, our results demonstrate the micro-architectural impact of an OS-intensive workload on an SMT processor. The synergy between the SMT processor and Web and OS software produces a greater throughput gain over superscalar execution than seen on any previously examined workload, including commercial databases.

Second, we study the cost of synchronization on SMT. Spinning can exact a large performance cost on SMT, because all threads share execution resources. We quantify the impact of spinning on SMT and the performance benefit of replacing spinning with SMT-lock-based code. We observe that spinning's degradation of performance ranges widely, from more than 3x on multiprogrammed workloads to a negligible amount on the Apache workload.

Finally, we explore architectural register sharing on SMT. A significant impediment to the construction of SMTs larger than two or four contexts is register file size. We introduce and evaluate mini-threads, a simple extension to SMT that increases thread-level parallelism without the commensurate increase in register hardware.
A mini-threaded SMT CPU equips each hardware context with additional per-thread state; an application executing in a context can create mini-threads that each use their own per-thread state but share the context's architectural register set. Our results quantify the factors affecting performance in detail and demonstrate that mini-threads can improve performance significantly, particularly on small-scale, space-sensitive CPU designs.