3,427 Hits in 11.0 sec

A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness

Henry Cook, Miquel Moreto, Sarah Bird, Khanh Dao, David A. Patterson, Krste Asanovic
2013 SIGARCH Computer Architecture News  
In this paper, we evaluate the potential of hardware cache partitioning mechanisms and policies to improve efficiency by allowing background applications to run simultaneously with interactive foreground  ...  Co-scheduling applications without LLC partitioning leads to a 10% energy improvement and average throughput improvement of 54% compared to running tasks separately, but can result in foreground performance  ...  ACKNOWLEDGEMENTS We would especially like to thank everyone at Intel who made it possible for us to use the cache-partitioning machine in this paper, including Opher Kahn, Andrew Herdrich, Ravi Iyer, Gans  ... 
doi:10.1145/2508148.2485949 fatcat:dbxw7vtev5b3zoazf52dzxkaqe

A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness

Henry Cook, Miquel Moreto, Sarah Bird, Khanh Dao, David A. Patterson, Krste Asanovic
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
In this paper, we evaluate the potential of hardware cache partitioning mechanisms and policies to improve efficiency by allowing background applications to run simultaneously with interactive foreground  ...  Co-scheduling applications without LLC partitioning leads to a 10% energy improvement and average throughput improvement of 54% compared to running tasks separately, but can result in foreground performance  ...  ACKNOWLEDGEMENTS We would especially like to thank everyone at Intel who made it possible for us to use the cache-partitioning machine in this paper, including Opher Kahn, Andrew Herdrich, Ravi Iyer, Gans  ... 
doi:10.1145/2485922.2485949 dblp:conf/isca/CookMBDPA13 fatcat:n3pvdexokbcttgkyoiwfbtdvty
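Note: the two records above are the conference and newsletter versions of the same ISCA 2013 study of hardware last-level-cache partitioning for co-scheduled foreground and background tasks. As a rough, hypothetical illustration of the general idea (not the paper's specific mechanism or policy), the C sketch below models way-based partitioning: each task holds a bitmask of cache ways, and misses may only fill ways allowed by that mask, so background fills cannot displace the foreground task's working set. The sizes and the fill policy are assumptions.

```c
/* Hypothetical software model of way-based LLC partitioning (illustration only).
 * Each co-scheduled task owns a bitmask of cache ways; on a miss, the victim
 * is chosen only from ways the task's mask allows, so a background task
 * cannot evict lines resident in the foreground task's ways. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_SETS 1024
#define NUM_WAYS 16

typedef struct {
    uint64_t tag;
    bool     valid;
} Line;

static Line cache[NUM_SETS][NUM_WAYS];

/* Returns true on a hit; on a miss, fills a way permitted by way_mask. */
static bool access_line(uint64_t addr, uint16_t way_mask)
{
    uint32_t set = (uint32_t)((addr >> 6) % NUM_SETS);  /* 64-byte lines (assumed) */
    uint64_t tag = addr >> 6;

    for (int w = 0; w < NUM_WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;                                /* hit */

    for (int w = 0; w < NUM_WAYS; w++) {                /* miss: fill an allowed way */
        if (way_mask & (1u << w)) {                     /* (a real policy would use LRU) */
            cache[set][w].tag = tag;
            cache[set][w].valid = true;
            break;
        }
    }
    return false;
}

int main(void)
{
    uint16_t fg_mask = 0x0FFF;  /* foreground task: ways 0-11 */
    uint16_t bg_mask = 0xF000;  /* background task: ways 12-15 */

    access_line(0x1000, fg_mask);                       /* foreground fill */
    access_line(0x2000, bg_mask);                       /* background fill stays out of ways 0-11 */
    printf("foreground re-access hit: %d\n", access_line(0x1000, fg_mask));
    return 0;
}
```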

The Mondrian Data Engine

Mario Drumond, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, Javier Picorel, Babak Falsafi, Boris Grot, Dionisios Pnevmatikatos
2017 Proceedings of the 44th Annual International Symposium on Computer Architecture  
Compared to a CPU-centric and a baseline NMP system, the Mondrian Data Engine improves the performance of basic data analytics operators by up to 49× and 5×, and efficiency by up to 28× and 5×, respectively  ...  Near-memory processing (NMP) architectures are reemerging as promising candidates to improve computing efficiency through tight coupling of logic and memory.  ...  This work has been partially funded by a Microsoft Research PhD scholarship and the following projects: Nano-Tera YINS, CHIST-ERA DIVIDEND, and Horizon 2020's dRedBox and CE-EuroLab-4-HPC.  ... 
doi:10.1145/3079856.3080233 fatcat:lctwfx5ilbdazdhjvjtbvgo3fq

The Mondrian Data Engine

Mario Drumond, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, Javier Picorel, Babak Falsafi, Boris Grot, Dionisios Pnevmatikatos
2017 SIGARCH Computer Architecture News  
Compared to a CPU-centric and a baseline NMP system, the Mondrian Data Engine improves the performance of basic data analytics operators by up to 49× and 5×, and efficiency by up to 28× and 5×, respectively  ...  Near-memory processing (NMP) architectures are reemerging as promising candidates to improve computing efficiency through tight coupling of logic and memory.  ...  This work has been partially funded by a Microsoft Research PhD scholarship and the following projects: Nano-Tera YINS, CHIST-ERA DIVIDEND, and Horizon 2020's dRedBox and CE-EuroLab-4-HPC.  ... 
doi:10.1145/3140659.3080233 fatcat:zjfvuprhmnahhmqfnj6ems7fli

Direct address translation for virtual memory in energy-efficient embedded systems

Xiangrong Zhou, Peter Petrov
2008 ACM Transactions on Embedded Computing Systems  
The application information extracted and analyzed by the compiler is utilized dynamically by the microarchitecture and the operating system to perform energy-efficient and, for many memory references,  ...  We demonstrate that by using application information regarding virtual memory layout, an efficient and conflict-free translation process can be implemented through the utilization of a small hardware direct  ...  energy efficiency, multitasking, and real-time response are requirements of utmost importance.  ... 
doi:10.1145/1457246.1457251 fatcat:tcgxwkzpq5hr3dpxjfeus5lrre

Energy-Efficient Stream Compaction Through Filtering and Coalescing Accesses in GPGPU Memory Partitions

Albert Segura, Jose-Maria Arnau, Antonio Gonzalez
2021 IEEE transactions on computers  
Results show that the ISCU achieves a performance speedup of 2.2x and 90% energy savings, derived from a 78% reduction in memory accesses, while incurring an 8.5% area overhead.  ...  Recent work has pointed out the importance of stream compaction operations, and has proposed a Stream Compaction Unit (SCU) to offload them to specialized hardware.  ...  (AEI/FEDER, EU), and the ICREA Academia program.  ... 
doi:10.1109/tc.2021.3104749 fatcat:vu7if2ciljefzab4njku5fyn6a
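Note: for the entry above, "stream compaction" is the operation the proposed unit offloads: gathering the elements of a sparse input that satisfy a predicate into a dense output. The sequential C sketch below shows only the logical operation; the predicate and sizes are made up, and the SCU/ISCU hardware, access filtering, and coalescing described in the paper are not modeled.

```c
/* Minimal sequential sketch of stream compaction (illustration only):
 * copy the elements that satisfy a predicate into a dense output array
 * and return how many were kept. A GPU kernel or a dedicated compaction
 * unit performs the same logical operation, typically with parallel
 * prefix sums and coalesced memory accesses. */
#include <stddef.h>
#include <stdio.h>

/* Keep the non-zero elements of `in` (the predicate is an assumption). */
static size_t compact(const int *in, size_t n, int *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (in[i] != 0)
            out[kept++] = in[i];
    return kept;
}

int main(void)
{
    int in[]  = {0, 7, 0, 0, 3, 0, 9};
    int out[7];
    size_t kept = compact(in, sizeof in / sizeof in[0], out);
    printf("kept %zu of %zu elements\n", kept, sizeof in / sizeof in[0]);
    return 0;
}
```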

On Improving Efficiency and Utilization of Last Level Cache in Multicore Systems

Yumna Zahid, Hina Khurshid, Zulfiqar Ali Memon
2018 Information Technology and Control  
This article provides researchers with a state-of-the-art critical review of approaches that focus on data replication and cache partitioning techniques for the L3 cache.  ...  To increase performance and energy efficiency, various techniques have been proposed.  ...  It is therefore imperative to develop countermeasures to efficiently utilize the LLC for improved performance and energy consumption.  ... 
doi:10.5755/j01.itc.47.3.18433 fatcat:pgrmyliv3ra5vjlkqqv3vhuudu

Energy-efficient address translation for virtual memory support in low-power and real-time embedded processors

Xiangrong Zhou, Peter Petrov
2005 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '05  
The proposed methodology relies on the combined efforts of compiler, operating system, and hardware architecture to achieve a significant power reduction.  ...  The set of virtual pages is partitioned into groups, such that for each group only a few of the least significant bits are used as an index to obtain the physical page number.  ...  energy efficiency, multitasking, and real-time response are requirements of utmost importance.  ... 
doi:10.1145/1084834.1084848 dblp:conf/codes/ZhouP05 fatcat:qxp7sy7jufbtpogoklx5xmcuoe
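Note: the snippet above outlines the translation scheme evaluated in this paper and in its 2008 journal version listed earlier: virtual pages are partitioned into groups so that, within a group, only a few low-order bits of the virtual page number are needed to index a small table of physical page numbers, avoiding an associative TLB search on the common path. The C sketch below is a hypothetical software rendering of that idea; the page size, group size, and table shape are invented for illustration.

```c
/* Hypothetical sketch of group-indexed ("direct") address translation
 * (illustration only; sizes are invented). Virtual pages are assumed to be
 * grouped by their upper VPN bits; within a group, the low VPN bits index a
 * small per-group table of physical page numbers, so no associative TLB
 * search is required. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12            /* 4 KiB pages (assumed) */
#define GROUP_BITS  4            /* 16 pages per group (assumed) */
#define NUM_GROUPS  8

static uint64_t group_table[NUM_GROUPS][1u << GROUP_BITS];  /* physical page numbers */

static uint64_t translate(uint64_t vaddr)
{
    uint64_t vpn    = vaddr >> PAGE_SHIFT;
    uint64_t group  = (vpn >> GROUP_BITS) % NUM_GROUPS;
    uint64_t index  = vpn & ((1u << GROUP_BITS) - 1);        /* low VPN bits only */
    uint64_t ppn    = group_table[group][index];
    uint64_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);
    return (ppn << PAGE_SHIFT) | offset;
}

int main(void)
{
    group_table[0][1] = 0x80;    /* map virtual page 1 to physical page 0x80 */
    printf("0x1234 -> 0x%llx\n", (unsigned long long)translate(0x1234));
    return 0;
}
```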

Customizable embedded processor architectures

P. Petrov, A. Orailoglu
2003 Euromicro Symposium on Digital System Design (DSD'03), Proceedings  
The proposed architecture is capable of utilizing application information to boost the performance and lower the power consumption of the most important microarchitectural components such as instruction  ...  We outline the underlying algorithms for compile-time extraction of the utilized application properties and we present the architectural principles of the hardware support.  ...  The hardware support for the partitioned cache has to resolve the following problems: identification of the mapping between a memory instruction and a particular cache partition; identification of the  ... 
doi:10.1109/dsd.2003.1231986 dblp:conf/dsd/PetrovO03 fatcat:3o5wge3whfgalhez7llde5pukq
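Note: the entry above points out that partitioned-cache hardware must map each memory instruction to its cache partition. One hypothetical way to picture this (not necessarily the authors' interface) is a compiler-emitted table of program-counter ranges, each tagged with a partition number, that the hardware consults on every access; the C sketch below stands in for that lookup.

```c
/* Hypothetical sketch of mapping memory instructions to cache partitions
 * (illustration only; the paper's actual compiler/hardware interface may
 * differ). The compiler is assumed to emit a table of program-counter
 * ranges tagged with the partition their loads and stores should use;
 * this lookup stands in for the hardware that consults the table. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t pc_start;   /* first instruction address in the range */
    uint32_t pc_end;     /* last instruction address in the range  */
    int      partition;  /* cache partition assigned by the compiler */
} PartitionRule;

static const PartitionRule rules[] = {
    { 0x1000, 0x1FFF, 0 },   /* e.g. streaming loop -> partition 0 */
    { 0x2000, 0x2FFF, 1 },   /* e.g. table lookups  -> partition 1 */
};

static int partition_for_pc(uint32_t pc)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (pc >= rules[i].pc_start && pc <= rules[i].pc_end)
            return rules[i].partition;
    return 0;                /* default partition for unlisted code */
}

int main(void)
{
    printf("PC 0x2100 -> partition %d\n", partition_for_pc(0x2100));
    return 0;
}
```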

Adaptive Cache Management for Energy-Efficient GPU Computing

Xuhao Chen, Li-Wen Chang, Christopher I. Rodrigues, Jie Lv, Zhiying Wang, Wen-Mei Hwu
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)  
However, GPU caches exhibit poor efficiency due to the mismatch of the throughput-oriented execution model and its cache hierarchy design, which limits system performance and energy-efficiency.  ...  Experimental results show that cache efficiency is significantly improved and on-chip resources are better utilized for cache-sensitive benchmarks.  ...  We also thank the anonymous reviewers for their insightful comments and suggestions, and Wenhao Jia from Princeton University for generously sharing his source code. This work is partly  ... 
doi:10.1109/micro.2014.11 dblp:conf/micro/ChenCRLWH14 fatcat:vgfsmescnvb7jfwk3vpyjwqdnm

A Survey on Graph Processing Accelerators: Challenges and Opportunities [article]

Chuangyi Gui, Long Zheng, Bingsheng He, Cheng Liu, Xinyu Chen, Xiaofei Liao, Hai Jin
2019 arXiv   pre-print
Despite a wealth of existing efforts on developing graph processing systems for improving the performance and/or energy efficiency on traditional architectures, dedicated hardware solutions, also referred  ...  Graph is a well known data structure to represent the associated relationships in a variety of applications, e.g., data science and machine learning.  ...  There are a large number of studies that attempt to use software solutions to improve the performance and energy efficiency of graph processing.  ... 
arXiv:1902.10130v1 fatcat:p5lzlf3gubckfpu4eowgo4myi4

A survey of architectural techniques for improving cache power efficiency

Sparsh Mittal
2014 Sustainable Computing: Informatics and Systems  
This enables the processor to sustain a higher instruction rate, which improves both performance and energy efficiency.  ...  Tsai and Chen [64] propose a technique for improving energy efficiency of embedded processors by using a memory structure called "Trace Reuse cache".  ... 
doi:10.1016/j.suscom.2013.11.001 fatcat:ovbeupgvizabdiubjzqta7mpba

Eliminating inter-process cache interference through cache reconfigurability for real-time and low-power embedded multi-tasking systems

Rakesh Reddy, Peter Petrov
2007 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems - CASES '07  
We study the effect of multiple tasks on the data cache, and propose a technique which leverages configurable cache architectures to eliminate inter-task cache interference.  ...  Furthermore, dynamic and leakage power are significantly reduced as only a subset of the cache is active at any moment.  ...  approach for real-time and energy-efficient embedded applications.  ... 
doi:10.1145/1289881.1289917 dblp:conf/cases/ReddyP07 fatcat:24zlovxkrfbbjnndewp4htfusa

Energy Discounted Computing on Multicore Smartphones

Meng Zhu, Kai Shen
2016 USENIX Annual Technical Conference  
a deep energy discount.  ...  Experimental results on a multicore smartphone show that we can reach up to 63% energy discount in the best-effort task processing with little performance impact on the interactive applications.  ...  Acknowledgments This work was supported in part by the National Science Foundation grants CNS-1217372, CNS-1239423, and CCF-1255729, and by a Google Research Award.  ... 
dblp:conf/usenix/ZhuS16 fatcat:crtgvu6jtvhfhbsh36jsgbxxvy

Dynamically tuning processor resources with adaptive processing

D.H. Albonesi, R. Balasubramonian, S.G. Dropsho, S. Dwarkadas, E.G. Friedman, M.C. Huang, V. Kursun, G. Magklis, M.L. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu (+2 others)
2003 Computer  
Using adaptive processing to dynamically tune major microprocessor resources, developers can achieve greater energy efficiency with reasonable hardware and software overhead while avoiding undue performance  ...  The adaptive processing approach to improving microprocessor energy efficiency dynamically tunes major microprocessor resources, such as caches and hardware queues, during execution to better match varying  ...  This technique can also be applied to the L1 I-cache, while an approach such as drowsy caches [6] can be used in the L1 D-cache and L2 cache to preserve their state.  ... 
doi:10.1109/mc.2003.1250883 fatcat:qcwpb552pzaaflerlmyrbohndm
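Note: the entry above describes interval-based tuning of structures such as caches and hardware queues. As a rough, hypothetical illustration of one such policy (not the authors' specific controller), the C sketch below shrinks the number of active cache ways while an interval's miss count stays near a full-size baseline and grows it back when misses climb; the tolerance and sizes are assumptions.

```c
/* Hypothetical interval-based cache-way resizing policy (illustration only).
 * After each execution interval the controller compares the interval's miss
 * count with a full-size baseline: if misses stayed within a tolerance it
 * power-gates one more way to save energy, otherwise it re-enables a way. */
#include <stdio.h>

#define MAX_WAYS 8

static int active_ways = MAX_WAYS;

static void tune_ways(long interval_misses, long baseline_misses)
{
    const double tolerance = 1.02;   /* allow 2% extra misses (assumed) */

    if ((double)interval_misses <= (double)baseline_misses * tolerance) {
        if (active_ways > 1)
            active_ways--;           /* downsize: disable one more way */
    } else if (active_ways < MAX_WAYS) {
        active_ways++;               /* upsize: misses grew too much */
    }
}

int main(void)
{
    tune_ways(1010, 1000);           /* within tolerance -> shrink to 7 ways */
    tune_ways(1500, 1000);           /* over tolerance   -> grow back to 8  */
    printf("active ways: %d\n", active_ways);
    return 0;
}
```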
Showing results 1-15 of 3,427