How to compare the performance of two SMT microarchitectures
Publisher: Institute of Electrical and Electronics Engineers Inc.
Source: 2001 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2001
In this paper we discuss methods and metrics for comparing the performance of two simultaneous multithreading microarchitectures. We identify conditions under which the instructions-per-cycle metric may be misleading when comparing two simultaneous multithreading microarchitectures for the same amount of work. Part of the problem is isolated to the definition of what constitutes the same work. When simulating a mix of independent programs under the same initial conditions on two different simultaneous multithreading microarchitectures, there are two approaches to ensure that the work of the two runs is the same: constant-work-per-thread or variable-work-per-thread. For both approaches the total number of instructions in the run is constant; however, for the first, the number of instructions from each thread is also constant, whereas for the second it is not. We claim that: (a) when simulating two microarchitectures with the constant-work-per-thread approach, the instructions-per-cycle metric is sufficient to establish which microarchitecture has the best performance, and (b) when the variable-work-per-thread approach is used, the instructions-per-cycle metric may be inadequate for comparing performance. We attribute this to the inability of the instructions-per-cycle metric to account for differences in the load-balance of the two runs. A new performance metric, SMT-speedup, is proposed that enables accurate comparison of the performance of two simultaneous multithreading microarchitectures for runs with different load-balance. The new metric considers the load-balance in terms of the size and performance of each thread. In light of the insight gained in this paper, we contend that a simultaneous multithreading microarchitecture may need to trade off throughput and load-balance to achieve the best performance. © 2001 IEEE.
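To make claim (b) concrete, the following is a minimal Python sketch with entirely hypothetical numbers. It compares two variable-work-per-thread runs that execute the same total number of instructions but with a different per-thread split. The abstract does not give the formula for SMT-speedup, so the function `serial_normalized_speedup` below is an assumption: one plausible load-balance-aware normalization that weighs each thread by its size and its stand-alone performance, not necessarily the paper's definition. The thread names, the stand-alone IPC values, and the cycle counts are all invented for illustration.

```python
def aggregate_ipc(instr_per_thread, cycles):
    """Total instructions executed by all threads divided by run length in cycles."""
    return sum(instr_per_thread.values()) / cycles


def serial_normalized_speedup(instr_per_thread, alone_ipc, cycles):
    """Hypothetical load-balance-aware metric (an assumption, not the paper's formula).

    Estimates the cycles needed to execute each thread's share of the run
    one thread at a time, then divides by the cycles the SMT run took.
    """
    serial_cycles = sum(
        count / alone_ipc[thread] for thread, count in instr_per_thread.items()
    )
    return serial_cycles / cycles


# Hypothetical stand-alone IPC of each program when it runs by itself.
ALONE_IPC = {"A": 2.0, "B": 0.5}

# Two runs of 1000 total instructions each, with different load-balance.
RUN_X = {"instr": {"A": 500, "B": 500}, "cycles": 500}
RUN_Y = {"instr": {"A": 800, "B": 200}, "cycles": 450}

if __name__ == "__main__":
    for name, run in (("X", RUN_X), ("Y", RUN_Y)):
        ipc = aggregate_ipc(run["instr"], run["cycles"])
        speedup = serial_normalized_speedup(run["instr"], ALONE_IPC, run["cycles"])
        print(f"run {name}: IPC = {ipc:.3f}, normalized speedup = {speedup:.3f}")
```

With these invented inputs, run Y reports the higher aggregate instructions-per-cycle only because it executes more instructions from the thread that is fast on its own, while the normalized figure ranks run X higher. Under the constant-work-per-thread approach the per-thread counts would be identical in both runs, so the two rankings could not diverge in this way.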