Exploiting instruction- and data-level parallelism
Simultaneous multithreaded vector architectures combine the best of data-level and instruction-level parallelism and perform better than either approach could separately. Our design achieves performance equivalent to executing 15 to 26 scalar instructions/cycle for numerical applications.
doi:10.1109/40.621210 fatcat:5oanmvkc3vfe7lq3w4jcdbkmjy