
Compiler-managed partitioned data caches for low power

Rajiv Ravindran, Michael Chu, Scott Mahlke
2007 Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools - LCTES '07  
However, doing this in hardware alone is difficult due to hardware complexity, high power dissipation, overheads of dynamic discovery of application characteristics, and increased likelihood of making  ...  Using four direct-mapped partitions, we eliminated 25% of the tag checks and recorded an average 15% reduction in the energy-delay product compared to a hardware-managed 4-way set-associative cache.  ...  Zehra Sura of IBM TJ Watson Research Center for the initial discussions on partitioned caches.  ... 
doi:10.1145/1254766.1254809 dblp:conf/lctrts/RavindranCM07 fatcat:wavoh73t6ne27nusrbyznwdxz4

Compiler-managed partitioned data caches for low power

Rajiv Ravindran, Michael Chu, Scott Mahlke
2007 SIGPLAN notices  
However, doing this in hardware alone is difficult due to hardware complexity, high power dissipation, overheads of dynamic discovery of application characteristics, and increased likelihood of making  ...  Using four direct-mapped partitions, we eliminated 25% of the tag checks and recorded an average 15% reduction in the energy-delay product compared to a hardware-managed 4-way set-associative cache.  ...  Zehra Sura of IBM TJ Watson Research Center for the initial discussions on partitioned caches.  ... 
doi:10.1145/1273444.1254809 fatcat:icvlk2szozeaxlbqt4pmwrw77u

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights [article]

Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li
2021 arXiv   pre-print
This paper provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators.  ...  In particular, it discusses enhancement modules in the architecture design and the software support; categorizes different hardware designs and acceleration techniques and analyzes them in terms of hardware  ...  of sparse tensors while leveraging data reuse, load balancing of computations, and compiler support.  ... 
arXiv:2007.00864v2 fatcat:k4o2xboh4vbudadfiriiwjp7uu

Better exploration of region-level value locality with integrated computation reuse and value prediction

Youfeng Wu, Dong-Yuan Chen, Jesse Fang
2001 SIGARCH Computer Architecture News  
In this paper, we propose a speculative multithreading scheme in which the same hardware can be efficiently used for both computation reuse and value prediction.  ...  For example, the integrated approach improves over computation reuse from a speedup of 1.25 to 1.40, and improves over value prediction from 1.28 to 1.40.  ...  Daniel Connors helped us to understand several implementation issues with compiler-directed computation reuse. Brad Calder provided helpful insights into the value-profiling algorithm [3].  ... 
doi:10.1145/384285.379255 fatcat:lhohw6q73jg73nn3lp6ctc22ei

Better exploration of region-level value locality with integrated computation reuse and value prediction

Youfeng Wu, Dong-Yuan Chen, Jesse Fang
2001 Proceedings of the 28th annual international symposium on Computer architecture - ISCA '01  
In this paper, we propose a speculative multithreading scheme in which the same hardware can be efficiently used for both computation reuse and value prediction.  ...  For example, the integrated approach improves over computation reuse from a speedup of 1.25 to 1.40, and improves over value prediction from 1.28 to 1.40.  ...  Daniel Connors helped us to understand several implementation issues with compiler-directed computation reuse. Brad Calder provided helpful insights into the value-profiling algorithm [3].  ... 
doi:10.1145/379240.379255 dblp:conf/isca/WuCF01 fatcat:5jgyvnijuja3bpir2vlsy2h4gq

TOWARDS GREEN COMPUTING, IMPORTANCE, IMPACT AND POSSIBLE SOLUTIONS-A REVIEW

Ambooj Yadav
2018 International Journal of Advanced Research in Computer Science  
resources management: this is a management technique for dynamic resource division and combination at variable sizes, which will optimize the environment for every individual computing task  ...  involves three facts: 2.c.1 Dynamic energy consumption management of operating system: Dynamic energy consumption management implies that the operating system regulates system unit power consumption dynamically  ... 
doi:10.26483/ijarcs.v9i1.5299 fatcat:dsrkxeiumvfz7dyx62ov7bhy6a

PAMS: Pattern Aware Memory System for embedded systems

Tassadaq Hussain, Nehir Sonmez, Oscar Palomar, Osman Unsal, Adrian Cristal, Eduard Ayguade, Mateo Valero, S. A. Gursal
2014 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)  
The benchmarking applications (having static and dynamic data structures) results show that PAMS consumes 20% less hardware resources, 32% less on chip power and achieves a maximum speedup of 52x and 2.9x  ...  for static and dynamic data structures respectively.  ...  The BMS uses Central Direct Memory Access (CDMA) controller to improve the performance of the SDRAM controller by managing complex patterns in hardware.  ... 
doi:10.1109/reconfig.2014.7032544 dblp:conf/reconfig/HussainSPUCAVG14 fatcat:4fytjqvrrzhcnggo4sdnkzueqe

The pressure is on [computer systems research]

K. Kavi, J.C. Browne, A. Tripathi
1999 Computer  
As applications become more demanding, computer systems research must not only redefine traditional roles but also unite diverse disciplines in a common goal: To make quantum leaps  ...  use computing systems. • Identify challenges and directions for building synergistic relationships among historically disjoint computing systems research areas. • Create new paradigms for computer and  ...  Design principles for dynamic and adaptive architectures must consider the entire spectrum. Directions and recommendations: There are three major directions in this area.  ... 
doi:10.1109/2.738301 fatcat:37juh2i5yjhxtdajris5w5ewvq

Input data reuse in compiling window operations onto reconfigurable hardware

Zhi Guo, Betul Buyukkurt, Walid Najjar
2004 Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools - LCTES '04  
It then computes the size of the hardware buffer and defines three sets of data values for each window: the window set, the managed set and the killed set.  ...  Balancing computation with I/O has been considered as a critical factor of the overall performance for embedded systems in general and reconfigurable computing systems in particular.  ...  The Hardware Architecture One of the most important characteristics of window operations is that the compiler can decouple the memory accesses from the computations and thereby can maximize data reuse.  ... 
doi:10.1145/997163.997199 dblp:conf/lctrts/GuoBN04 fatcat:7jebbxsemffmljxm4fayk6oqwy

Input data reuse in compiling window operations onto reconfigurable hardware

Zhi Guo, Betul Buyukkurt, Walid Najjar
2004 SIGPLAN notices  
It then computes the size of the hardware buffer and defines three sets of data values for each window: the window set, the managed set and the killed set.  ...  Balancing computation with I/O has been considered as a critical factor of the overall performance for embedded systems in general and reconfigurable computing systems in particular.  ...  The Hardware Architecture One of the most important characteristics of window operations is that the compiler can decouple the memory accesses from the computations and thereby can maximize data reuse.  ... 
doi:10.1145/998300.997199 fatcat:ek236uypwncmvhowvkkaifarey

The OpenTM Transactional Application Programming Interface

Woongki Baek, Chi Cao Minh, Martin Trautmann, Christos Kozyrakis, Kunle Olukotun
2007 Parallel Architecture and Compilation Techniques (PACT), Proceedings of the International Conference on  
OpenTM extends OpenMP, a widely used API for shared-memory parallel programming, with a set of compiler directives to express non-blocking synchronization and speculative parallelization based on memory  ...  The implementation builds upon the OpenMP support in the GCC compiler and includes a runtime for the C programming language. We evaluate the performance and programmability features of OpenTM.  ...  Acknowledgements We would like to thank the anonymous reviewers for their feedback at various stages of this work. We also want to thank Sun Microsystems for making the TL2 code available.  ... 
doi:10.1109/pact.2007.4336227 fatcat:nn7gbfngvrff5egt4jzfgurptm

Region-based parallelization of irregular reductions on explicitly managed memory hierarchies

Seonggun Kim, Hwansoo Han, Kwang-Moo Choe
2009 Journal of Supercomputing  
Irregular reduction is one of important computation patterns for many complex scientific applications, and it typically requires high performance and large bandwidth of memory.  ...  Our framework employs iteration reordering based on regions of data along with dynamic scheduling of parallel tasks.  ...  Conclusions Irregular reductions are important computation patterns in the computational simulations for fluid dynamics and molecular dynamics.  ... 
doi:10.1007/s11227-009-0340-3 fatcat:tswiy42kw5hvhiclwlqfukuyri

Storageless value prediction using prior register values

Dean M. Tullsen, John S. Seng
1999 SIGARCH Computer Architecture News  
Even without the large buffers, register-value prediction can be made as or more effective than last-value prediction, particularly with the aid of compiler management of values in the register file.  ...  We show an average gain of 12% with dynamic RVP and moderate compiler assistance on a next generation processor, and 15% on a 16-wide processor.  ...  CCR-980869, and a grant from Compaq Computer Corporation.  ... 
doi:10.1145/307338.301002 fatcat:to5qaazdnnbs5o5wfr26q3qoya

Configurable processors for embedded computing

N. Dutt, Kiyoung Choi
2003 Computer  
An ADL supporting design space exploration for embedded SoCs and automatic generation of a retargetable compiler-simulator toolkit.  ...  A project, "Modern Embedded Systems: Compilers, Architecture, and Languages," to develop methodologies, tools, and algorithms for fully programmable platform-based designs in specific application domains  ...  By comparison, reconfigurable processors support both postmanufacturing and dynamic runtime configurability.  ... 
doi:10.1109/mc.2003.1160063 fatcat:43uu5gnfq5fubdrjahvzrblz44

Pushing the Level of Abstraction of Digital System Design: a Survey on How to Program FPGAs

Emanuele Del Sozzo, Davide Conficconi, Alberto Zeni, Mirko Salaris, Donatella Sciuto, Marco D. Santambrogio
2022 ACM Computing Surveys  
They are state-of-the-art for prototyping, telecommunications, embedded, and an emerging alternative for cloud-scale acceleration.  ...  Here, we survey three leading digital design abstractions: Hardware Description Languages (HDLs), High-Level Synthesis (HLS) tools, and Domain-Specific Languages (DSLs).  ...  ACKNOWLEDGEMENTS The authors are grateful for feedbacks from Reviewers and NECSTLab members, with a particular mention to A. Damiani, A. Parravicini, E. D'Arnese, F. Carloni, F. Peverelli, and R.  ... 
doi:10.1145/3532989 fatcat:nsk5lwvt3vba5fbxmaj7sgpwru