Effective usage of vector registers in decoupled vector architectures

L. Villa, R. Espasa, M. Valero
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing - PDP '98
This paper presents a study of the impact of reducing the vector register size in a decoupled vector architecture. In traditional in-order vector architectures, long vector registers have typically been the norm. We start by presenting data that shows that, even for highly vectorizable codes, only a small fraction of all elements of a long vector register are actually used. We also show that reducing the register size in a traditional vector architecture, in an attempt to reduce hardware cost and maximize register utilization, results in a severe performance degradation. However, we combine the decoupling technique with the vector register reduction and show that the resulting architecture tolerates the register size cuts very well. We simulate a selection of Perfect Club and Specfp92 programs using a trace-driven approach, and compare the execution time in a conventional vector architecture with a decoupled vector architecture using different register sizes. Halving the register size and using decoupling provides speedups between 1.04 and 1.49 over a traditional in-order vector machine. Even when reducing the register length to 1/4 the original size (and, in some cases, to 1/8), the performance of the decoupled machine is better than that of a conventional vector model. Moreover, we observe that the resulting decoupled machine with short registers tolerates long memory latencies very well.
doi:10.1109/empdp.1998.647238 dblp:conf/pdp/VillaEV98 fatcat:pan34gs665arje5xohx4kyw45m