Exploiting instruction- and data-level parallelism

R. Espasa, M. Valero
1997 IEEE Micro
Simultaneous multithreaded vector architectures combine the best of data-level and instruction-level parallelism and perform better than either approach could separately. Our design achieves performance equivalent to executing 15 to 26 scalar instructions per cycle for numerical applications.
doi:10.1109/40.621210 fatcat:5oanmvkc3vfe7lq3w4jcdbkmjy