Batched matrix computations on hardware accelerators based on GPUs

Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, Jack Dongarra
2015 The International Journal of High Performance Computing Applications
Contractions can often be implemented as index reordering followed by batched GEMM, and can therefore be highly efficient.
doi:10.1177/1094342014567546 fatcat:lb3idu5ksvgdtk3tmtpfd2putq
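
The remark about contractions can be illustrated with a small sketch. It assumes the contractions meant are tensor contractions, and it uses a hypothetical example, C[p][i][j] = sum over l of A[i][p][l] * B[l][j][p], chosen only for illustration. The helper names (permute3, gemm, batched_gemm, contract) are invented here and are not taken from the cited paper or from any library. Plain nested lists stand in for device arrays; an optimized batched GEMM routine would replace the inner loops in practice.

```python
def permute3(t, order):
    """Reorder the axes of a 3-index nested list.

    Output axis n is input axis order[n].
    """
    dims = (len(t), len(t[0]), len(t[0][0]))
    d0, d1, d2 = (dims[ax] for ax in order)
    out = [[[None] * d2 for _ in range(d1)] for _ in range(d0)]
    for x in range(dims[0]):
        for y in range(dims[1]):
            for z in range(dims[2]):
                src = (x, y, z)
                o = tuple(src[ax] for ax in order)
                out[o[0]][o[1]][o[2]] = t[x][y][z]
    return out


def gemm(a, b):
    """Plain matrix product of an m x k and a k x n nested list."""
    k, n = len(b), len(b[0])
    return [[sum(row[l] * b[l][j] for l in range(k)) for j in range(n)]
            for row in a]


def batched_gemm(a_batch, b_batch):
    """Apply gemm independently to each pair of matrices in the batch."""
    return [gemm(a, b) for a, b in zip(a_batch, b_batch)]


def contract(A, B):
    """C[p][i][j] = sum_l A[i][p][l] * B[l][j][p] via reorder + batched GEMM."""
    A_r = permute3(A, (1, 0, 2))  # (i, p, l) -> (p, i, l): batch index first
    B_r = permute3(B, (2, 0, 1))  # (l, j, p) -> (p, l, j): batch index first
    return batched_gemm(A_r, B_r)


def contract_reference(A, B):
    """Direct evaluation of the same contraction, used only as a check."""
    m, nb, k = len(A), len(A[0]), len(A[0][0])
    n = len(B[0])
    return [[[sum(A[i][p][l] * B[l][j][p] for l in range(k))
              for j in range(n)]
             for i in range(m)]
            for p in range(nb)]
```

The reordering moves the shared batch index p to the front of both operands, so each batch entry becomes an ordinary matrix product. Whether the reordering cost is worthwhile depends on the tensor sizes and memory layout, which this sketch does not model.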