Improving Data Access Efficiency by Using Context-Aware Loads and Stores
2015
Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2015 CD-ROM - LCTES'15
We describe a set of techniques where the compiler enhances load and store instructions so that they can be executed with fewer stalls and/or enable the L1 DC to be accessed in a more energy-efficient ...
Memory operations have a significant impact on both performance and energy usage even when an access hits in the level-one data cache (L1 DC). ...
This research was supported in part by the Swedish Research Council grant 2009-4566 and the US National Science Foundation grants CNS-0964413 and CNS-0915926. ...
doi:10.1145/2670529.2754960
dblp:conf/lctrts/BardizbanyanSWL15
fatcat:rp5cnyxtjbhjxeohqjyxs5uwnm
Towards a performance- and energy-efficient data filter cache
2013
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems - ODES '13
The proposed design provides an efficient DFC that yields both energy and performance improvements. ...
As CPU data requests to the level-one (L1) data cache (DC) can represent as much as 25% of an embedded processor's total power dissipation, techniques that decrease L1 DC accesses can significantly enhance ...
ARM uses a combination of in-order Cortex A7 and out-of-order Cortex A15 cores in its big.LITTLE [13] systems. Energy efficiency is a very important feature for the always-on cores. ...
doi:10.1145/2443608.2443614
fatcat:i3dgen64izg6hjwykm3mc75q4i
A Super-Pipelined Energy Efficient Subthreshold 240 MS/s FFT Core in 65 nm CMOS
2012
IEEE Journal of Solid-State Circuits
improving performance and energy efficiency. ...
Measurements of super-pipelined multipliers demonstrate 30% energy savings and a 1.6× performance improvement. ...
CONCLUSIONS This paper proposes circuit and architecture techniques to enhance energy efficiency in the subthreshold regime, with application to an FFT module. ...
doi:10.1109/jssc.2011.2169311
fatcat:b6dmqcr5cjhqfd3lhtv5lhbqym
Efficient complex operators for irregular codes
2011
2011 IEEE 17th International Symposium on High Performance Computer Architecture
This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies. ...
That approach emphasized energy savings while matching the performance of a conventional processor, but three factors limited its performance and energy efficiency gains. ...
Acknowledgements This research was funded by the US National Science Foundation under NSF CAREER Awards 06483880 and 0846152, and under NSF CCF Award 0811794. ...
doi:10.1109/hpca.2011.5749754
dblp:conf/hpca/SampsonVGGST11
fatcat:yqjxqk44jba4tjjtwweqcwpypi
Coming challenges in microarchitecture and architecture
2001
Proceedings of the IEEE
Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and, in turn, accelerating additional software and hardware development that fueled this process even ...
New process technology requires more expensive megafabs and new performance levels require larger die, higher power consumption, and enormous design and validation effort. ...
The next step in performance enhancement beyond pipelining calls for executing several instructions in parallel. ...
doi:10.1109/5.915377
fatcat:iywintmu5jeltngkqn6h3ryh7u
Diet SODA
2010
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design - ISLPED '10
A case study was performed on digital still camera (DSC) applications; the results show that Diet SODA achieves ∼130x better performance and ∼340x better energy efficiency than a DSP solution. ...
Power has become the most critical design constraint for embedded handheld devices. This paper proposes a power-efficient SIMD architecture, referred to as Diet SODA, for DSP applications. ...
Acknowledgment Thanks to Yongjun Park for his help and feedback. This work was supported in part by NSF grants CSR 0910699 and CSR 0910851 and by ARM. ...
doi:10.1145/1840845.1840862
dblp:conf/islped/SeoDWCMM10
fatcat:idlrzqr4ofaqroow6kmui6fuhi
An energy and bandwidth efficient ray tracing architecture
2013
Proceedings of the 5th High-Performance Graphics Conference on - HPG '13
We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing while keeping performance high. ...
First, we use a streaming data model and configure part of the L2 cache into a ray stream memory to enable efficient data processing through ray reordering. ...
The Vegetation and Hairball models are from Samuli Laine, and the Sibenik Cathedral model is from Marko Dabrovic. ...
doi:10.1145/2492045.2492058
dblp:conf/egh/KoptaSSBD13
fatcat:tosuv5j7vjg7nnj2ltpvklhqdu
On the Greenness of In-Situ and Post-Processing Visualization Pipelines
2015
2015 IEEE International Parallel and Distributed Processing Symposium Workshop
As an alternative, in-situ pipelines are proposed in order to enhance the knowledge discovery process via "real-time" visualization. ...
Thus, in this work, we study the greenness (i.e., power, energy, and energy efficiency) of the in-situ and the post-processing visualization pipelines, using a proxy heat-transfer simulation as an example ...
Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR) via DE-SC0012637 and by infrastructure provided by Supermicro. ...
doi:10.1109/ipdpsw.2015.132
dblp:conf/ipps/AdhinarayananFW15
fatcat:xp6ctgnn65b6tfzc2zmdrwws3a
A Review: Effective Techniques for Hardware Modelling of Machine Learning Algorithms
2021
Zenodo
In this text, we compare the different techniques for hardware modelling of different machine learning (ML) algorithms, and their hardware-level performance. ...
This text will be useful for any researcher or system designer that needs to first evaluate the optimum techniques for ML design, and then inspired by this, they can further extend it and optimize the ...
To enhance execution, tiled framework augmentation is organized as a pipelined paired snake tree for performing augmentation and producing incomplete wholes. ...
doi:10.5281/zenodo.4769299
fatcat:t6ed5mxqgbbuvftougg5mqgxwy
A Review: Effective Techniques for Hardware Modelling of Machine Learning Algorithms
2021
Zenodo
In this text, we compare the different techniques for hardware modelling of different machine learning (ML) algorithms, and their hardware-level performance. ...
This text will be useful for any researcher or system designer that needs to first evaluate the optimum techniques for ML design, and then inspired by this, they can further extend it and optimize the ...
To enhance execution, tiled framework augmentation is organized as a pipelined paired snake tree for performing augmentation and producing incomplete wholes. ...
doi:10.5281/zenodo.5832190
fatcat:piqcz4amdncgzmzyfvftlca3wi
Addressing instruction fetch bottlenecks by using an instruction register file
2007
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools - LCTES '07
This can improve energy efficiency, in addition to providing additional flexibility for evaluating various design tradeoffs in a pipeline with asymmetric instruction bandwidth. ...
Thus, we show that the IRF is a complementary technique, operating as a buffer tolerating fetch bottlenecks, as well as providing additional fetch bandwidth for an aggressive pipeline backend. ...
Acknowledgments We thank the anonymous reviewers for their constructive comments and suggestions. ...
doi:10.1145/1254766.1254800
dblp:conf/lctrts/HinesTW07
fatcat:fh4qhy3yfza25dunua3l5mrehu
Addressing instruction fetch bottlenecks by using an instruction register file
2007
SIGPLAN notices
This can improve energy efficiency, in addition to providing additional flexibility for evaluating various design tradeoffs in a pipeline with asymmetric instruction bandwidth. ...
Thus, we show that the IRF is a complementary technique, operating as a buffer tolerating fetch bottlenecks, as well as providing additional fetch bandwidth for an aggressive pipeline backend. ...
Acknowledgments We thank the anonymous reviewers for their constructive comments and suggestions. ...
doi:10.1145/1273444.1254800
fatcat:fgvdcnbqifdudgtxjfrcdr7thi
Macro Data Load: An Efficient Mechanism for Enhancing Loaded Data Reuse
2011
IEEE transactions on computers
In a 64-bit processor, for example, a byte load will bring a full 64-bit data from cache and save it in an internal hardware structure, while using for itself only the specified byte out of the 64-bit ...
eliminated, resulting in a related energy reduction of 11.4 percent, 9.0 percent, and 14.3 percent on average, respectively. ...
We expect that just as store-to-load forwarding techniques have become conventional in high-performance processors, an area-efficient and complexity-effective load-to-load forwarding technique, like our ...
doi:10.1109/tc.2010.131
fatcat:bbnofm6whva5tbpvipy7wlixy4
Guaranteeing instruction fetch behavior with a lookahead instruction fetch engine (LIFE)
2009
SIGPLAN notices
Next sequential line prefetch for the data cache can be enhanced by only prefetching when the triggering instruction has been previously accessed in the TH-IC. ...
LIFE enables designers to boost instruction fetch efficiency by reducing energy cost without negatively affecting performance. ...
Acknowledgments We thank the anonymous reviewers for their constructive comments and suggestions. This research was supported in part by NSF grants CCR-0312493, CCF-0444207, and CNS-0615085. ...
doi:10.1145/1543136.1542469
fatcat:pj4y476apff4re2hrtjqsnodze
Guaranteeing instruction fetch behavior with a lookahead instruction fetch engine (LIFE)
2009
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems - LCTES '09
Next sequential line prefetch for the data cache can be enhanced by only prefetching when the triggering instruction has been previously accessed in the TH-IC. ...
LIFE enables designers to boost instruction fetch efficiency by reducing energy cost without negatively affecting performance. ...
Acknowledgments We thank the anonymous reviewers for their constructive comments and suggestions. This research was supported in part by NSF grants CCR-0312493, CCF-0444207, and CNS-0615085. ...
doi:10.1145/1542452.1542469
dblp:conf/lctrts/HinesPGWT09
fatcat:itrmcfgtlvbaldl3iy7becq4fm