Tullsen, Dean Michael
MetadataShow full item record
This dissertation examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar processor's functional units in a single cycle. Simultaneous multithreading significantly increases processor utilization in the face of both long instruction latencies and limited available parallelism per thread.This research presents several models of simultaneous multithreading and compares them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing architectures. The results show that both (single-threaded) superscalar and fine-grain multithreaded architectures are limited in their ability to utilize the resources of a wide-issue super-scalar processor. Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. Simultaneous multithreading is also an attractive alternative to single-chip multiprocessors; simultaneous multithreaded processors with a variety of organizations outperform corresponding conventional multiprocessors with similar execution resources.This dissertation also shows that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. An architecture for simultaneous multithreading is presented that achieves three goals: (1) it minimizes the architectural impact on a conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads which will use the processor most efficiently each cycle, thereby providing the "best" instructions to the processor.An analytic response-time model shows that the benefits of simultaneous multithreading in a multiprogrammed environment are not limited to increased throughput. Those throughput increases lead to significant reductions in queueing time for runnable processes, leading to response-time improvements that in many cases are significantly greater than the throughput improvements themselves.