Lecture Notes in Computer Science
We present an automatic, static program transformation that schedules and generates efficient memory transfers between a host computer and its hardware accelerator, addressing a well-known performance bottleneck. Our automatic approach uses two simple heuristics: perform transfers to the accelerator as early as possible, and delay transfers back from the accelerator for as long as possible. We implemented this transformation as a middle-end compilation pass in the PIPS/Par4All compiler. In the
doi:10.1007/978-3-642-36036-7_16 fatcat:jpftk6kotjbgtnq2pux3ep7wly
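
The effect of the two heuristics can be illustrated with a small sketch. This is not the PIPS/Par4All pass, which works statically at compile time; it only contrasts the transfer schedule a naive code generator might emit with the schedule the heuristics aim for. All names here (`copy_to_accel`, `copy_from_accel`, `kernel`, `run_naive`, `run_scheduled`, `demo`) are hypothetical, and the accelerator is simulated with a plain array and `memcpy`, so the transfer counter is the only thing being measured.

```c
#include <assert.h>
#include <string.h>

#define N 8       /* array length (arbitrary, for illustration) */
#define STEPS 4   /* number of kernel launches */

static int transfers = 0;   /* counts simulated host<->accelerator copies */
static float dev_buf[N];    /* stands in for accelerator memory (assumption) */

/* Simulated transfers: a real backend would issue device copies here. */
static void copy_to_accel(const float *host) {
    memcpy(dev_buf, host, sizeof dev_buf);
    transfers++;
}
static void copy_from_accel(float *host) {
    memcpy(host, dev_buf, sizeof dev_buf);
    transfers++;
}

/* Simulated kernel: operates only on accelerator memory. */
static void kernel(void) {
    for (int i = 0; i < N; i++) dev_buf[i] += 1.0f;
}

/* Naive schedule: copy in and out around every kernel launch. */
int run_naive(float *a) {
    transfers = 0;
    for (int s = 0; s < STEPS; s++) {
        copy_to_accel(a);
        kernel();
        copy_from_accel(a);
    }
    return transfers;
}

/* Heuristic schedule: copy in as early as possible (once, before the
 * loop) and copy back as late as possible (once, after the loop, just
 * before the host needs the data). */
int run_scheduled(float *a) {
    transfers = 0;
    copy_to_accel(a);
    for (int s = 0; s < STEPS; s++) kernel();
    copy_from_accel(a);
    return transfers;
}

/* Runs both schedules and checks they compute the same values.
 * Returns the number of transfers saved by the heuristic schedule. */
int demo(void) {
    float a[N] = {0}, b[N] = {0};
    int naive = run_naive(a);
    int scheduled = run_scheduled(b);
    for (int i = 0; i < N; i++) assert(a[i] == b[i]);
    return naive - scheduled;
}
```

Calling `demo()` from a `main` function runs both schedules on identical inputs. The hoisting is only valid here because the host never reads or writes the array between kernel launches; establishing that kind of fact is what a static analysis has to do before moving a transfer.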