Improving Data Access Efficiency by Using Context-Aware Loads and Stores

Alen Bardizbanyan, Magnus Själander, David Whalley, Per Larsson-Edefors
2015 Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2015 CD-ROM - LCTES'15  
We describe a set of techniques where the compiler enhances load and store instructions so that they can be executed with fewer stalls and/or enable the L1 DC to be accessed in a more energy-efficient  ...  Memory operations have a significant impact on both performance and energy usage even when an access hits in the level-one data cache (L1 DC).  ...  This research was supported in part by the Swedish Research Council grant 2009-4566 and the US National Science Foundation grants CNS-0964413 and CNS-0915926.  ... 
doi:10.1145/2670529.2754960 dblp:conf/lctrts/BardizbanyanSWL15 fatcat:rp5cnyxtjbhjxeohqjyxs5uwnm

Towards a performance- and energy-efficient data filter cache

Alen Bardizbanyan, Magnus Själander, David Whalley, Per Larsson-Edefors
2013 Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems - ODES '13  
The proposed design provides an efficient DFC that yields both energy and performance improvements.  ...  As CPU data requests to the level-one (L1) data cache (DC) can represent as much as 25% of an embedded processor's total power dissipation, techniques that decrease L1 DC accesses can significantly enhance  ...  ARM use a combination of an in-order Cortex A7 and out-of-order Cortex A15 cores in their big.LITTLE [13] systems. Energy efficiency is a very important feature for the always-on cores.  ... 
doi:10.1145/2443608.2443614 fatcat:i3dgen64izg6hjwykm3mc75q4i

A Super-Pipelined Energy Efficient Subthreshold 240 MS/s FFT Core in 65 nm CMOS

Dongsuk Jeon, Mingoo Seok, Chaitali Chakrabarti, David Blaauw, Dennis Sylvester
2012 IEEE Journal of Solid-State Circuits  
improving performance and energy efficiency.  ...  Measurements of super-pipelined multipliers demonstrate 30% energy savings and 1.6× performance improvement.  ...  CONCLUSIONS This paper proposes circuit and architecture techniques to enhance energy efficiency in the subthreshold regime, with application to an FFT module.  ... 
doi:10.1109/jssc.2011.2169311 fatcat:b6dmqcr5cjhqfd3lhtv5lhbqym

Efficient complex operators for irregular codes

Jack Sampson, Ganesh Venkatesh, Nathan Goulding-Hotta, Saturnino Garcia, Steven Swanson, Michael Bedford Taylor
2011 2011 IEEE 17th International Symposium on High Performance Computer Architecture  
This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies.  ...  That approach emphasized energy savings while matching the performance of a conventional processor, but three factors limited its performance and energy efficiency gains.  ...  Acknowledgements This research was funded by the US National Science Foundation under NSF CAREER Awards 06483880 and 0846152, and under NSF CCF Award 0811794.  ... 
doi:10.1109/hpca.2011.5749754 dblp:conf/hpca/SampsonVGGST11 fatcat:yqjxqk44jba4tjjtwweqcwpypi

Coming challenges in microarchitecture and architecture

R. Ronen, A. Mendelson, K. Lai, Shih-Lien Lu, F. Pollack, J.P. Shen
2001 Proceedings of the IEEE  
Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and, in turn, accelerating additional software and hardware development that fueled this process even  ...  New process technology requires more expensive megafabs and new performance levels require larger die, higher power consumption, and enormous design and validation effort.  ...  The next step in performance enhancement beyond pipelining calls for executing several instructions in parallel.  ... 
doi:10.1109/5.915377 fatcat:iywintmu5jeltngkqn6h3ryh7u

Diet SODA

Sangwon Seo, Ronald G. Dreslinski, Mark Woh, Chaitali Chakrabarti, Scott Mahlke, Trevor Mudge
2010 Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design - ISLPED '10  
A case study was performed on digital still camera (DSC) applications; the results show that Diet SODA achieves ∼130x better performance and ∼340x better energy efficiency than a DSP solution.  ...  Power has become the most critical design constraint for embedded handheld devices. This paper proposes a power-efficient SIMD architecture, referred to as Diet SODA, for DSP applications.  ...  Acknowledgment Thanks to Yongjun Park for his help and feedback. This work was supported in part by NSF grants CSR 0910699 and CSR 0910851 and by ARM.  ... 
doi:10.1145/1840845.1840862 dblp:conf/islped/SeoDWCMM10 fatcat:idlrzqr4ofaqroow6kmui6fuhi

An energy and bandwidth efficient ray tracing architecture

Daniel Kopta, Konstantin Shkurko, Josef Spjut, Erik Brunvand, Al Davis
2013 Proceedings of the 5th High-Performance Graphics Conference on - HPG '13  
We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing while keeping performance high.  ...  First, we use a streaming data model and configure part of the L2 cache into a ray stream memory to enable efficient data processing through ray reordering.  ...  The Vegetation and Hairball models are from Samuli Laine, and the Sibenik Cathedral model is from Marko Dabrovic.  ... 
doi:10.1145/2492045.2492058 dblp:conf/egh/KoptaSSBD13 fatcat:tosuv5j7vjg7nnj2ltpvklhqdu

On the Greenness of In-Situ and Post-Processing Visualization Pipelines

Vignesh Adhinarayanan, Wu-Chun Feng, Jonathan Woodring, David Rogers, James Ahrens
2015 2015 IEEE International Parallel and Distributed Processing Symposium Workshop  
As an alternative, in-situ pipelines are proposed in order to enhance the knowledge discovery process via "real-time" visualization.  ...  Thus, in this work, we study the greenness (i.e., power, energy, and energy efficiency) of the in-situ and the post-processing visualization pipelines, using a proxy heat-transfer simulation as an example  ...  Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR) via DE-SC0012637 and by infrastructure provided by Supermicro.  ... 
doi:10.1109/ipdpsw.2015.132 dblp:conf/ipps/AdhinarayananFW15 fatcat:xp6ctgnn65b6tfzc2zmdrwws3a

A Review: Effective Techniques for Hardware Modelling of Machine Learning Algorithms

Amita P. Thakare
2021 Zenodo  
In this text, we compare the different techniques for hardware modelling of different machine learning (ML) algorithms, and their hardware-level performance.  ...  This text will be useful for any researcher or system designer that needs to first evaluate the optimum techniques for ML design, and then inspired by this, they can further extend it and optimize the  ...  To enhance execution, tiled framework augmentation is organized as a pipelined paired snake tree for performing augmentation and producing incomplete wholes.  ... 
doi:10.5281/zenodo.4769299 fatcat:t6ed5mxqgbbuvftougg5mqgxwy

A Review: Effective Techniques for Hardware Modelling of Machine Learning Algorithms

Amita P. Thakare, Dr. Sunil Kumar
2021 Zenodo  
In this text, we compare the different techniques for hardware modelling of different machine learning (ML) algorithms, and their hardware-level performance.  ...  This text will be useful for any researcher or system designer that needs to first evaluate the optimum techniques for ML design, and then inspired by this, they can further extend it and optimize the  ...  To enhance execution, tiled framework augmentation is organized as a pipelined paired snake tree for performing augmentation and producing incomplete wholes.  ... 
doi:10.5281/zenodo.5832190 fatcat:piqcz4amdncgzmzyfvftlca3wi

Addressing instruction fetch bottlenecks by using an instruction register file

Stephen Roderick Hines, Gary Tyson, David Whalley
2007 Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools - LCTES '07  
This can improve energy efficiency, in addition to providing additional flexibility for evaluating various design tradeoffs in a pipeline with asymmetric instruction bandwidth.  ...  Thus, we show that the IRF is a complementary technique, operating as a buffer tolerating fetch bottlenecks, as well as providing additional fetch bandwidth for an aggressive pipeline backend.  ...  Acknowledgments We thank the anonymous reviewers for their constructive comments and suggestions.  ... 
doi:10.1145/1254766.1254800 dblp:conf/lctrts/HinesTW07 fatcat:fh4qhy3yfza25dunua3l5mrehu

Addressing instruction fetch bottlenecks by using an instruction register file

Stephen Roderick Hines, Gary Tyson, David Whalley
2007 SIGPLAN notices  
This can improve energy efficiency, in addition to providing additional flexibility for evaluating various design tradeoffs in a pipeline with asymmetric instruction bandwidth.  ...  Thus, we show that the IRF is a complementary technique, operating as a buffer tolerating fetch bottlenecks, as well as providing additional fetch bandwidth for an aggressive pipeline backend.  ...  Acknowledgments We thank the anonymous reviewers for their constructive comments and suggestions.  ... 
doi:10.1145/1273444.1254800 fatcat:fgvdcnbqifdudgtxjfrcdr7thi

Macro Data Load: An Efficient Mechanism for Enhancing Loaded Data Reuse

Lei Jin, Sangyeun Cho
2011 IEEE transactions on computers  
In a 64-bit processor, for example, a byte load will bring a full 64-bit data from cache and save it in an internal hardware structure, while using for itself only the specified byte out of the 64-bit  ...  eliminated, resulting in a related energy reduction of 11.4 percent, 9.0 percent, and 14.3 percent on average, respectively.  ...  We expect that just as store-to-load forwarding techniques have become conventional in high-performance processors, an area-efficient and complexity-effective load-to-load forwarding technique, like our  ... 
doi:10.1109/tc.2010.131 fatcat:bbnofm6whva5tbpvipy7wlixy4

Guaranteeing instruction fetch behavior with a lookahead instruction fetch engine (LIFE)

Stephen Roderick Hines, Yuval Peress, Peter Gavin, David Whalley, Gary Tyson
2009 SIGPLAN notices  
Next sequential line prefetch for the data cache can be enhanced by only prefetching when the triggering instruction has been previously accessed in the TH-IC.  ...  LIFE enables designers to boost instruction fetch efficiency by reducing energy cost without negatively affecting performance.  ...  Acknowledgments We thank the anonymous reviewers for their constructive comments and suggestions. This research was supported in part by NSF grants CCR-0312493, CCF-0444207, and CNS-0615085.  ... 
doi:10.1145/1543136.1542469 fatcat:pj4y476apff4re2hrtjqsnodze

Guaranteeing instruction fetch behavior with a lookahead instruction fetch engine (LIFE)

Stephen Roderick Hines, Yuval Peress, Peter Gavin, David Whalley, Gary Tyson
2009 Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems - LCTES '09  
Next sequential line prefetch for the data cache can be enhanced by only prefetching when the triggering instruction has been previously accessed in the TH-IC.  ...  LIFE enables designers to boost instruction fetch efficiency by reducing energy cost without negatively affecting performance.  ...  Acknowledgments We thank the anonymous reviewers for their constructive comments and suggestions. This research was supported in part by NSF grants CCR-0312493, CCF-0444207, and CNS-0615085.  ... 
doi:10.1145/1542452.1542469 dblp:conf/lctrts/HinesPGWT09 fatcat:itrmcfgtlvbaldl3iy7becq4fm