576 Hits in 4.8 sec

Improving the performance and power efficiency of shared helpers in CMPs

Anahita Shayesteh, Glenn Reinman, Norm Jouppi, Tim Sherwood, Suleyman Sair
2006 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems - CASES '06  
In a multicore environment, our intelligent and flexible sharing of helpers provides an average 24% speedup compared to static sharing in conjoined cores.  ...  If there is a single core, these auxiliary structures can be turned on and off dynamically to tune the energy/performance of the machine to the needs of the running application.  ...  in a more power-efficient manner.  ... 
doi:10.1145/1176760.1176802 dblp:conf/cases/ShayestehRJSS06 fatcat:duoakcfkgjgotl6jtq3rxp23bq

High-Performance Energy-Efficient Multicore Embedded Computing

A. Munir, S. Ranka, A. Gordon-Ross
2012 IEEE Transactions on Parallel and Distributed Systems  
This paper outlines typical requirements of embedded applications and discusses state-of-the-art hardware/software high-performance energy-efficient embedded computing (HPEEC) techniques that help meeting  ...  high-performance embedded computing demands in an energy-efficient manner.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSERC and the NSF.  ... 
doi:10.1109/tpds.2011.214 fatcat:vagqmojdsjevvc2u2ewqrcjjpq

A helper thread based EDP reduction scheme for adapting application execution in CMPs

Yang Ding, Mahmut Kandemir, Padma Raghavan, Mary Jane Irwin
2008 Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)  
performance, energy efficiency and CPU availability are becoming increasingly critical.  ...  The helper thread runs parallel to the application execution threads and tries to determine the ideal number of CPUs, threads and voltage/frequency levels to employ at any given point in execution.  ...  Acknowledgements This work is supported in part by NSF grants CCF 0444345, CNS 0720645, CCF 0702519, and a grant from Microsoft Corporation.  ... 
doi:10.1109/ipdps.2008.4536297 dblp:conf/ipps/DingKRI08 fatcat:pkfrvfkjorb5bk24me6qbprkmi
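
The entry above describes a helper thread that runs alongside the application threads and tries to determine the best number of CPUs, threads, and voltage/frequency level at each point in execution. Below is a minimal sketch of that monitoring pattern in C with pthreads; the one-second sampling interval, the throughput-based policy, and the names (progress, helper, worker) are illustrative assumptions, not the paper's EDP-driven adaptation scheme, and no real DVFS interface is touched.

    /* Sketch: a monitoring helper thread sampling application progress.
     * Hypothetical policy and names; the paper's EDP-driven adaptation of
     * CPU count, thread count, and voltage/frequency is not reproduced. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <unistd.h>

    static atomic_long progress = 0;   /* work items completed so far    */
    static atomic_int  done     = 0;   /* set when the application exits */

    static void *helper(void *arg) {
        (void)arg;
        long prev = 0;
        while (!atomic_load(&done)) {
            sleep(1);                             /* sampling interval */
            long cur  = atomic_load(&progress);
            long rate = cur - prev;               /* items per second  */
            prev = cur;
            /* Hypothetical policy: suggest more cores when throughput is
             * high; a real scheme would also weigh energy (EDP). */
            int cpus = rate > 1000 ? 4 : rate > 100 ? 2 : 1;
            printf("helper: rate=%ld/s -> suggest %d CPUs\n", rate, cpus);
        }
        return NULL;
    }

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 5000; i++) {
            usleep(1000);                         /* stand-in for real work */
            atomic_fetch_add(&progress, 1);
        }
        return NULL;
    }

    int main(void) {
        pthread_t h, w;
        pthread_create(&h, NULL, helper, NULL);
        pthread_create(&w, NULL, worker, NULL);
        pthread_join(w, NULL);
        atomic_store(&done, 1);
        pthread_join(h, NULL);
        return 0;
    }

Compile with cc -O2 -pthread. In the paper's scheme the helper's decision would then be applied to the running application (adapting thread count and voltage/frequency) rather than merely reported.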

Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems

Jaejin Lee, Changhee Jung, Daeseob Lim, Yan Solihin
2009 IEEE Transactions on Parallel and Distributed Systems  
In a standard CMP, the scheme achieves an average speedup of 1.33.  ...  Using a real CMP system with a shared L2 cache between two cores, our helper thread prefetching plus hardware L2 prefetching achieves an average speedup of 1.15 over the hardware L2 prefetching for the  ...  Because of that, c8 outperforms helper in mg. C8+helper performs better than other cases on the average, except for CMP.  ... 
doi:10.1109/tpds.2008.224 fatcat:dekoh4fecrgznpn6py3nnzlz3m
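
This entry (and the IPDPS 2006 entry below) describes helper-thread prefetching on a CMP with a shared L2: a thread on a spare core runs ahead of the compute thread and touches the data it will soon need. The following is a minimal sketch of the pattern; the arrays, the DISTANCE run-ahead limit, and the spin throttle are illustrative assumptions rather than the authors' construction, which derives the helper from the application loop itself.

    /* Sketch: a helper thread prefetching an indirectly indexed array a
     * bounded distance ahead of the compute thread, relying on a shared
     * cache to deliver the data.  Sizes and names are illustrative. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N        (1L << 22)
    #define DISTANCE 512              /* max run-ahead of the helper thread */

    static double *data;
    static int    *idx;               /* irregular (indirect) access pattern */
    static atomic_long consumed = 0;  /* compute thread's current position   */

    static void *prefetch_helper(void *arg) {
        (void)arg;
        for (long i = 0; i < N; i++) {
            /* Stay at most DISTANCE iterations ahead of the compute thread,
               so prefetched lines are not evicted before they are used. */
            while (i - atomic_load(&consumed) > DISTANCE)
                ;                     /* simple spin throttle */
            __builtin_prefetch(&data[idx[i]], 0, 1);
        }
        return NULL;
    }

    int main(void) {
        data = malloc(N * sizeof *data);
        idx  = malloc(N * sizeof *idx);
        for (long i = 0; i < N; i++) {
            data[i] = (double)i;
            idx[i]  = rand() % N;
        }

        pthread_t helper;
        pthread_create(&helper, NULL, prefetch_helper, NULL);

        /* Compute thread: with a shared L2, many of these indirect accesses
           should hit in cache because the helper touched them first. */
        double sum = 0.0;
        for (long i = 0; i < N; i++) {
            sum += data[idx[i]];
            atomic_store(&consumed, i);
        }

        pthread_join(helper, NULL);
        printf("sum = %f\n", sum);
        free(data);
        free(idx);
        return 0;
    }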

Dynamically configurable shared CMP helper engines for improved performance

Anahita Shayesteh, Glenn Reinman, Norman Jouppi, Suleyman Sair, Tim Sherwood
2005 SIGARCH Computer Architecture News  
In a multicore environment, our intelligent and flexible sharing of helper engines provides an average 24% speedup over static sharing in conjoined cores.  ...  As more of the processor is broken down into helper engines, and as we add more and more cores onto a single chip which can potentially share helpers, the decisions that are made about these structures  ...  Which resources should be shared, how should they be allocated, and how can we efficiently manage their power?  ... 
doi:10.1145/1105734.1105744 fatcat:6yazv4fkbzckfcyqlnfa65h2de

Helper thread prefetching for loosely-coupled multiprocessor systems

Changhee Jung, Daeseob Lim, Jaejin Lee, Y. Solihin
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
In a standard CMP, the scheme achieves an average speedup of 1.33.  ...  processor in memory executes the helper thread.  ...  We model contention in the system between the application and helper threads on shared resources, such as the L2 cache and the system bus in the CMP configuration, plus memory controller and the DRAM resources  ... 
doi:10.1109/ipdps.2006.1639375 dblp:conf/ipps/JungLLS06 fatcat:yu6tyngi6zdtllclk2slh7fnmy

Efficient emulation of hardware prefetchers via event-driven helper threading

Ilya Ganusov, Martin Burtscher
2006 Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06  
This paper explores the idea of using available general-purpose cores in a CMP as helper engines for individual threads running on the active cores.  ...  The advance of multi-core architectures provides significant benefits for parallel and throughput-oriented computing, but the performance of individual computation threads does not improve and may even  ...  In addition, the communication between the threads must occur via shared memory, increasing the contention for the shared cache ports and wasting dynamic power.  ... 
doi:10.1145/1152154.1152178 dblp:conf/IEEEpact/GanusovB06 fatcat:xbd5prrckjf3vlymoeipmmdcv4
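
The entry above proposes using a spare core to emulate a hardware prefetcher in software, driven by events delivered from the active core. The sketch below imitates the idea entirely in user space: a small ring buffer stands in for the hardware event channel, the compute thread posts each accessed address, and a helper thread runs a naive stride detector and prefetches a fixed distance ahead. The queue, the stride detector, and the AHEAD constant are assumptions for illustration only.

    /* Sketch: software emulation of a stride prefetcher on a spare core.
     * The ring buffer substitutes for the paper's hardware event channel. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define QSIZE 1024                    /* ring-buffer slots (power of 2) */
    #define N     (1 << 22)
    #define AHEAD 16                      /* prefetch distance, in strides  */

    static _Atomic uintptr_t ring[QSIZE];
    static atomic_long head = 0, tail = 0;
    static atomic_int  done = 0;
    static double *data;

    static void post_event(uintptr_t addr) {        /* compute-thread side */
        long h = atomic_load(&head);
        if (h - atomic_load(&tail) < QSIZE) {       /* drop events if full */
            atomic_store(&ring[h & (QSIZE - 1)], addr);
            atomic_store(&head, h + 1);
        }
    }

    static void *prefetch_emulator(void *arg) {     /* helper-thread side  */
        (void)arg;
        uintptr_t last = 0;
        while (!atomic_load(&done)) {
            long t = atomic_load(&tail);
            if (t == atomic_load(&head)) continue;  /* queue empty: spin   */
            uintptr_t addr = atomic_load(&ring[t & (QSIZE - 1)]);
            atomic_store(&tail, t + 1);
            intptr_t stride = (intptr_t)(addr - last);
            if (last != 0 && stride != 0)           /* naive stride detector */
                __builtin_prefetch((void *)(addr + AHEAD * stride), 0, 1);
            last = addr;
        }
        return NULL;
    }

    int main(void) {
        data = malloc(N * sizeof *data);
        for (long i = 0; i < N; i++) data[i] = 1.0;

        pthread_t helper;
        pthread_create(&helper, NULL, prefetch_emulator, NULL);

        double sum = 0.0;
        for (long i = 0; i < N; i++) {              /* strided compute loop */
            sum += data[i];
            post_event((uintptr_t)&data[i]);
        }
        atomic_store(&done, 1);
        pthread_join(helper, NULL);
        printf("sum = %f\n", sum);
        free(data);
        return 0;
    }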

Virtual Aggregated Processor in Multi-core Computers

Zhiyi Huang, Andrew Trotman, Jiaqi Zhang, Xiangfei Jia, Mariusz Nowostawski, Nathan Rountree, Paul Werstein
2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies  
We have proposed and implemented two techniques, helper thread and I/O specialization, to demonstrate the potential effectiveness of the Virtual Aggregated Processor technology.  ...  Parallel computing has been in the spotlight with the advent of multi-core computers.  ...  Acknowledgment The authors would like to thank Stuart Barson for his excellent comments and suggestions on the VAP project.  ... 
doi:10.1109/pdcat.2008.27 dblp:conf/pdcat/HuangTZJNRW08 fatcat:tloo6s7hjrcqzelj42f5odekw4

A Hardware Framework for Yield and Reliability Enhancement in Chip Multiprocessors

Abhisek Pan, Rance Rodrigues, Sandip Kundu
2015 ACM Transactions on Embedded Computing Systems  
The resulting design changes are minimal and impose insignificant cost in terms of area and power.  ...  In mgff, by varying the IC queue depth from 8 to 48, the performance loss for core 3 improved from 40% to less than 1%. The performance loss of core 4 improved from 90% to 4%.  ... 
doi:10.1145/2629688 fatcat:eummtzr6mrh5pny4hyrfzcy6ta

ReMAP: A Reconfigurable Heterogeneous Multicore Architecture

Matthew A. Watkins, David H. Albonesi
2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture  
or more powerful cores. As we explain in Section II, the virtualization of the fabric makes this dynamic division of the fabric transparent to software.  ...  ReMAP demonstrates significantly higher performance and energy efficiency compared to hard-wired communication-only mechanisms, and over what can ideally be achieved by allocating the fabric area to additional  ...  ACKNOWLEDGMENTS This research was supported by an NSF Graduate Research Fellowship; NSF grants CCF-0916821, CCF-0811729, and CNS-0708788; and equipment grants from Intel.  ... 
doi:10.1109/micro.2010.15 dblp:conf/micro/WatkinsA10 fatcat:6konrd23gnfeplyn7invrfqgaq

Scalable memory registration for high performance networks using helper threads

Dong Li, Kirk W. Cameron, Dimitrios S. Nikolopoulos, Bronis R. de Supinski, Martin Schulz
2011 Proceedings of the 8th ACM International Conference on Computing Frontiers - CF '11  
We investigate design policies and performance implications of the helper thread approach.  ...  However, RDMA operations in some high performance networks require communication memory explicitly registered with the network adapter and pinned by the OS.  ...  Acknowledgments The research leading to these results has received funding from the European Community's Seventh Framework Programme  ... 
doi:10.1145/2016604.2016652 dblp:conf/cf/LiCNSS11 fatcat:2tqporroxrf2dbk4jtzxx5xatu

Software data spreading

Md Kamruzzaman, Steven Swanson, Dean M. Tullsen
2010 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation - PLDI '10  
In addition, despite using more cores for the same computation, data spreading actually saves power since it reduces access to DRAM.  ...  Software data spreading is a software-only technique that uses compiler-directed thread migration to aggregate cache capacity across cores and chips and improve performance.  ...  They would also like to thank Jeff Brown for frequent help with the simulation tools, Sajia Akhter for help with some of the graphics, and Nathan Goulding with the writing.  ... 
doi:10.1145/1806596.1806648 dblp:conf/pldi/KamruzzamanST10 fatcat:7jhmhlppubfhpdshh73p2lak4e
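
Software data spreading, as summarized above, migrates a single compute thread across cores at compiler-chosen points so that the combined private caches hold the working set. Below is a minimal hand-written sketch of the idea on Linux (the paper inserts the migrations via the compiler); the core count, chunk size, and the pthread_setaffinity_np-based migrate_to helper are illustrative assumptions.

    /* Sketch: one thread migrates core to core, chunk by chunk, so that
     * repeated passes over the array find each chunk resident in that
     * core's private cache.  Sizes and core count are illustrative. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NCORES 4
    #define CHUNK  (1 << 18)                 /* doubles per core-sized chunk */
    #define N      (NCORES * CHUNK)

    static void migrate_to(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    }

    int main(void) {
        double *a = malloc(N * sizeof *a);
        for (long i = 0; i < N; i++) a[i] = 1.0;

        double sum = 0.0;
        for (int pass = 0; pass < 10; pass++) {
            for (int c = 0; c < NCORES; c++) {
                migrate_to(c);               /* compiler-inserted in the paper */
                for (long i = (long)c * CHUNK; i < (long)(c + 1) * CHUNK; i++)
                    sum += a[i];
            }
        }
        printf("sum = %f\n", sum);
        free(a);
        return 0;
    }

Build with cc -O2 -pthread; pthread_setaffinity_np is a glibc extension, hence the _GNU_SOURCE define.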

Software data spreading

Md Kamruzzaman, Steven Swanson, Dean M. Tullsen
2010 SIGPLAN notices  
In addition, despite using more cores for the same computation, data spreading actually saves power since it reduces access to DRAM.  ...  Software data spreading is a software-only technique that uses compiler-directed thread migration to aggregate cache capacity across cores and chips and improve performance.  ...  They would also like to thank Jeff Brown for frequent help with the simulation tools, Sajia Akhter for help with some of the graphics, and Nathan Goulding with the writing.  ... 
doi:10.1145/1809028.1806648 fatcat:x7xygmokfjg2hjl5oubjyefluy

Inter-core prefetching for multicore processors using migrating helper threads

Md Kamruzzaman, Steven Swanson, Dean M. Tullsen
2011 SIGARCH Computer Architecture News  
The results show that inter-core prefetching improves performance by an average of 31 to 63%, depending on the architecture, and speeds up some applications by as much as 2.8×.  ...  The compute thread then migrates between cores, following the path of the prefetch threads, and finds the data already waiting for it.  ...  Acknowledgments The authors would like to thank the anonymous reviewers and James Laudon for many useful suggestions. They would also like to thank Sajia Akhter for help with some of the graphics.  ... 
doi:10.1145/1961295.1950411 fatcat:yer3rocu45ehdlmksqhvoybtxu
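
Inter-core prefetching, as summarized above, pairs a prefetch thread with the compute thread: the prefetcher warms the cache of the core the compute thread will migrate to next, and the two swap cores at each chunk boundary so the data is already waiting. The two-core sketch below illustrates that handoff with a pthread barrier; the chunk size, core numbering, and barrier-based synchronization are assumptions for illustration, not the authors' implementation.

    /* Sketch: inter-core prefetching on two cores.  While the compute
     * thread works on chunk c on one core, the prefetch thread sits on the
     * other core warming chunk c+1; at the chunk boundary they swap. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNKS 64
    #define CHUNK  (1 << 16)                /* doubles per chunk */
    #define N      ((long)CHUNKS * CHUNK)

    static double *a;
    static pthread_barrier_t bar;           /* chunk-boundary handoff */

    static void pin_to(int core) {
        cpu_set_t s;
        CPU_ZERO(&s);
        CPU_SET(core, &s);
        pthread_setaffinity_np(pthread_self(), sizeof s, &s);
    }

    static void *prefetcher(void *arg) {
        (void)arg;
        for (int c = 0; c < CHUNKS; c++) {
            pin_to((c + 1) & 1);            /* the core the compute thread uses next */
            if (c + 1 < CHUNKS)
                for (long i = (long)(c + 1) * CHUNK; i < (long)(c + 2) * CHUNK; i++)
                    __builtin_prefetch(&a[i], 0, 1);
            pthread_barrier_wait(&bar);     /* chunk boundary: swap cores */
        }
        return NULL;
    }

    int main(void) {
        a = malloc(N * sizeof *a);
        for (long i = 0; i < N; i++) a[i] = 1.0;
        pthread_barrier_init(&bar, NULL, 2);

        pthread_t pf;
        pthread_create(&pf, NULL, prefetcher, NULL);

        double sum = 0.0;
        for (int c = 0; c < CHUNKS; c++) {
            pin_to(c & 1);                  /* chunk c was prefetched into this core */
            for (long i = (long)c * CHUNK; i < (long)(c + 1) * CHUNK; i++)
                sum += a[i];
            pthread_barrier_wait(&bar);     /* swap cores with the prefetcher */
        }

        pthread_join(pf, NULL);
        printf("sum = %f\n", sum);
        free(a);
        return 0;
    }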

Inter-core prefetching for multicore processors using migrating helper threads

Md Kamruzzaman, Steven Swanson, Dean M. Tullsen
2011 SIGPLAN notices  
The results show that inter-core prefetching improves performance by an average of 31 to 63%, depending on the architecture, and speeds up some applications by as much as 2.8×.  ...  The compute thread then migrates between cores, following the path of the prefetch threads, and finds the data already waiting for it.  ...  Acknowledgments The authors would like to thank the anonymous reviewers and James Laudon for many useful suggestions. They would also like to thank Sajia Akhter for help with some of the graphics.  ... 
doi:10.1145/1961296.1950411 fatcat:lxikjxzsrfay3pi7dmdp6mx6ei
Showing results 1 — 15 out of 576 results