The Thrifty Barrier: Energy-Aware Synchronization in Shared-Memory Multiprocessors

Jian Li, J.F. Martinez, M.C. Huang
10th International Symposium on High Performance Computer Architecture (HPCA'04)  
We present the thrifty barrier, a hardware-software approach to saving energy in parallel applications that exhibit barrier synchronization imbalance.  ...  However, little attention has been paid to multiprocessor environments where, due to the co-operative nature of the computation, the most energy-efficient execution in each processor may not translate  ...  ACKNOWLEDGMENTS We thank Evan Speight and the anonymous reviewers for useful feedback. This work was supported in part by gifts from Intel.  ... 
doi:10.1109/hpca.2004.10018 dblp:conf/hpca/LiMH04 fatcat:jxnyt5sm4narzfsnq37j25wiie

Energy reduction in multiprocessor systems using transactional memory

Tali Moreshet, R.I. Bahar, M. Herlihy
2005 ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.  
In this work we focus on new energy consumption issues unique to multiprocessor systems: synchronization of accesses to shared memory.  ...  We investigate and compare different means of providing atomic access to shared memory, including locks and lock-free synchronization (i.e., transactional memory), with respect to energy as well as performance  ...  [10] , energy reduction in the synchronization of shared memory multiprocessors by using a variation of barrier synchronization.  ... 
doi:10.1109/lpe.2005.195542 fatcat:ohnr3nfnijgabgivvyeipmgcfu

Energy reduction in multiprocessor systems using transactional memory

Tali Moreshet, R. Iris Bahar, Maurice Herlihy
2005 Proceedings of the 2005 international symposium on Low power electronics and design - ISLPED '05  
In this work we focus on new energy consumption issues unique to multiprocessor systems: synchronization of accesses to shared memory.  ...  We investigate and compare different means of providing atomic access to shared memory, including locks and lock-free synchronization (i.e., transactional memory), with respect to energy as well as performance  ...  [10] , energy reduction in the synchronization of shared memory multiprocessors by using a variation of barrier synchronization.  ... 
doi:10.1145/1077603.1077683 dblp:conf/islped/MoreshetBH05 fatcat:n2yqcqewbrfxfjoiujxooa6ocy

RegionScout

Andreas Moshovos
2005 SIGARCH Computer Architecture News  
It has been shown that many requests miss in all remote nodes in shared memory multiprocessors. We are motivated by the observation that this behavior extends to much coarser grain areas of memory.  ...  In the second RegionScout is used to avoid snoop induced tag lookups thus reducing energy.  ...  My understanding of multiprocessors and the ideas presented in this paper have benefited significantly from discussions with Angelos Bilas, Babak Falsafi and Dionisios Pnevmatikatos.  ... 
doi:10.1145/1080695.1069990 fatcat:jgb6j47y7zddtkbins5jslk3je
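The RegionScout snippet says snoop-induced tag lookups can be avoided when a request misses in a whole coarse-grain region. As an illustration only, the sketch below assumes a simple per-node counting filter over hashed regions. The class name, the 4 KiB region size and the 256-bucket table are hypothetical choices, not the paper's structures or parameters.

```python
class RegionFilter:
    """Per-node counts of locally cached blocks, indexed by a hash of the
    memory region (a hypothetical, simplified structure)."""

    def __init__(self, region_bits: int = 12, num_buckets: int = 256) -> None:
        self.region_bits = region_bits  # 2**12 bytes = 4 KiB regions (assumed)
        self.counts = [0] * num_buckets

    def _bucket(self, addr: int) -> int:
        # Drop the offset within the region, then hash by modulo.
        return (addr >> self.region_bits) % len(self.counts)

    def on_fill(self, addr: int) -> None:
        """A block from this region was brought into the local cache."""
        self.counts[self._bucket(addr)] += 1

    def on_evict(self, addr: int) -> None:
        """A block from this region left the local cache."""
        self.counts[self._bucket(addr)] -= 1

    def needs_tag_lookup(self, addr: int) -> bool:
        """False means no block from this region can be cached locally, so a
        snoop can skip the tag array. True may be a false positive when
        different regions hash to the same bucket."""
        return self.counts[self._bucket(addr)] != 0
```

After `on_fill(0x1000_0040)`, a snoop for `0x1000_0080` (same 4 KiB region) still needs a tag lookup, while one for `0x2000_1000` can skip it. Counters rather than single bits let evictions clear a region's entry again.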

Booster: Reactive core acceleration for mitigating the effects of process variation and application imbalance in low-voltage chips

Timothy N. Miller, Xiang Pan, Renji Thomas, Naser Sedaghati, Radu Teodorescu
2012 IEEE International Symposium on High-Performance Computer Architecture
effects of imbalance in multithreaded applications.  ...  We present two implementations of Booster: Booster VAR, which virtually eliminates the effects of core-to-core frequency variation in near-threshold CMPs, and Booster SYNC, which additionally reduces the  ...  The authors would like to thank the anonymous reviewers for their valuable feedback and suggestions, most of which have been included in this final version.  ... 
doi:10.1109/hpca.2012.6168942 dblp:conf/hpca/MillerPTST12 fatcat:n4liepekzrfw5mhvuycrfvqmd4

C-Lock: Energy Efficient Synchronization for Embedded Multicore Systems

Seung Hun Kim, Sang Hyong Lee, Minje Jun, Byunghoon Lee, Won Woo Ro, Eui-Young Chung, Jean-Luc Gaudiot
2014 IEEE Transactions on Computers
Also, in order to save more energy, C-Lock disables the clocks of the cores which are blocked for the access to the shared data until the shared data become available.  ...  Data synchronization among multiple cores has been one of the critical issues which must be resolved in order to optimize the parallelism of multicore architectures.  ...  The thrifty barrier [11] is proposed as a hardware-software approach to reduce the energy waste in barrier spin-loops by estimating the wait time and forcing the processor into an appropriate low-power  ... 
doi:10.1109/tc.2013.84 fatcat:kaigds4epfesjcvthg34wo7z3i
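The C-Lock entry summarizes the thrifty barrier as estimating the wait time in a barrier spin-loop and moving the processor into an appropriate low-power state. A minimal sketch of that decision follows, assuming a last-value prediction of the barrier interval and three hypothetical sleep states with made-up latencies and power ratios; neither the predictor nor the values are taken from the HPCA'04 paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepState:
    name: str
    transition_cycles: int  # hypothetical enter + exit latency, in cycles
    relative_power: float   # hypothetical power relative to spinning (1.0)


# Hypothetical low-power states; "spin" means staying in the spin-loop.
STATES = (
    SleepState("spin", 0, 1.0),
    SleepState("halt", 100, 0.5),
    SleepState("deep", 10_000, 0.1),
)


def predict_stall(arrival: int, last_release: int, prev_interval: int) -> int:
    """Predict cycles until barrier release, assuming the next barrier
    interval equals the previous one (last-value prediction)."""
    predicted_release = last_release + prev_interval
    return max(0, predicted_release - arrival)


def choose_state(predicted_stall: int, states=STATES) -> SleepState:
    """Pick the lowest-power state whose transition latency fits in the stall."""
    fitting = [s for s in states if s.transition_cycles <= predicted_stall]
    return min(fitting, key=lambda s: s.relative_power)
```

For example, a thread that arrives at cycle 50,000, when the last release was at cycle 40,000 and the previous interval was 25,000 cycles, has a predicted stall of 15,000 cycles and would pick `deep`; one arriving at cycle 64,950 would keep spinning.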

Complementing user-level coarse-grain parallelism with implicit speculative parallelism

Nikolas Ioannou, Marcelo Cintra
2011 Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-44 '11  
component that augments the local components with information regarding (iii) inter-thread synchronization.  ...  Implicit speculative parallelism frees the programmer from the additional effort to explicitly partition the work into finer and properly synchronized tasks.  ...  In order to obtain energy savings using barrier-aware DVFS, the discrepancy in thread execution times between two barriers, and in turn the expected barrier stall times when all cores run at the maximum  ... 
doi:10.1145/2155620.2155654 dblp:conf/micro/IoannouC11 fatcat:fskfxnf45jcvhm4n3vnw3crvm4

Meeting points

Qiong Cai, José González, Ryan Rakvic, Grigorios Magklis, Pedro Chaparro, Antonio González
2008 Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08  
Thread delaying saves energy consumptions by running the core containing the critical thread at maximum frequency while scaling down the frequency and voltage of the cores containing non-critical threads  ...  We define the critical thread the one with the longest completion time in the parallel region. Knowing the criticality of each thread has many potential applications.  ...  The thrifty barrier uses the idleness at the barrier to move the faster cores to a low power mode. It has been shown that the DVFS approach outperforms the thrifty barrier approach [25] .  ... 
doi:10.1145/1454115.1454149 dblp:conf/IEEEpact/CaiGRMCG08 fatcat:dliao2wrwnb37ibc753op3gqmm
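The Meeting points snippet describes thread delaying: the core running the critical thread stays at maximum frequency while the cores running non-critical threads are scaled down. A rough sketch of the frequency choice follows, assuming that execution time scales inversely with frequency (a compute-bound simplification) and a hypothetical set of discrete frequency levels; it is not the paper's mechanism for identifying the critical thread.

```python
def delayed_frequencies(times_at_fmax: list[float], levels: list[float]) -> list[float]:
    """Return one frequency per thread so that non-critical threads reach the
    barrier no later than the critical (slowest) thread.

    times_at_fmax: predicted time of each thread's work at the top frequency.
    levels: available discrete frequencies (hypothetical), any order.
    Assumes time scales as 1/frequency, i.e. compute-bound work.
    """
    f_max = max(levels)
    t_crit = max(times_at_fmax)  # the critical thread has the longest time
    chosen = []
    for t in times_at_fmax:
        # Frequency at which this thread would finish exactly at t_crit;
        # the clamp guards against floating-point rounding above f_max.
        required = min(f_max, f_max * t / t_crit)
        # Round up to the slowest level that still meets the deadline.
        chosen.append(min(f for f in levels if f >= required))
    return chosen
```

With predicted times of 10, 5 and 8 ms at 2.5 GHz and levels of 1.0, 1.5, 2.0 and 2.5 GHz, the sketch keeps the critical thread at 2.5 GHz and picks 1.5 and 2.0 GHz for the other two. Rounding up to the next available level, rather than down, keeps a delayed thread from finishing after the critical one and becoming the new bottleneck.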

Embracing heterogeneity with dynamic core boosting

Hyoun Kyu Cho, Scott Mahlke
2014 Proceedings of the 11th ACM Conference on Computing Frontiers - CF '14  
Such an imbalance can be significantly exacerbated by performance asymmetry among cores, which is likely to exist in future generations of chip multiprocessors (CMPs) either for energy efficiency or due  ...  Even for embarrassingly parallel programs in the form of SPMD (single program multiple data), the threads are not perfectly balanced due to control flow divergence, non-deterministic memory latencies,  ...  Prior work has suggested using barrier synchronizations for thread criticality prediction for saving energy either by transitioning into low power modes after reaching a barrier or by scaling down the  ... 
doi:10.1145/2597917.2597932 dblp:conf/cf/ChoM14 fatcat:tx22yxw3mnbu7dpe5ccwealwua

Clock gate on abort: Towards energy-efficient hardware Transactional Memory

Sutirtha Sanyal, Sourav Roy, Adrian Cristal, Osman S. Unsal, Mateo Valero
2009 IEEE International Symposium on Parallel & Distributed Processing
Also in the protocol we are proposing a gating-aware contention management policy to set the duration of the clock gating period precisely so that both performance and energy can be improved.  ...  With our proposal we got an average 19% savings in the total consumed energy and even an average speed-up of 4%.  ...  ; by the European Network of Excellence on High-Performance Embedded Architecture and Compilation (HiPEAC) and by the European Commission FP7 project VELOX (216852).  ... 
doi:10.1109/ipdps.2009.5160971 dblp:conf/ipps/SanyalRCUV09 fatcat:57vpigrcqzc23kddfr7qoyxcnq

Power-performance considerations of parallel computing on chip multiprocessors

Jian Li, José F. Martínez
2005 ACM Transactions on Architecture and Code Optimization (TACO)  
On the other hand, our experiments show that, when a limited power budget is in place, power-thrifty memory-bound applications may actually enjoy better scalability than more compute-intensive codes, even  ...  This paper looks at the power-performance implications of running parallel applications on chip multiprocessors (CMPs).  ...  ACKNOWLEDGMENTS We thank David Albonesi, Rajit Manohar, and the anonymous reviewers for useful feedback. This work was supported in part by NSF awards CNS-0509404, CCF-0429922, and gifts from Intel.  ... 
doi:10.1145/1113841.1113844 fatcat:dzyuf4hm4zggxbysjhoghnffwm

RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence

A. Moshovos
32nd International Symposium on Computer Architecture (ISCA'05)  
It has been shown that many requests miss in all remote nodes in shared memory multiprocessors. We are motivated by the observation that this behavior extends to much coarser grain areas of memory.  ...  In the second RegionScout is used to avoid snoop induced tag lookups thus reducing energy.  ...  My understanding of multiprocessors and the ideas presented in this paper have benefited significantly from discussions with Angelos Bilas, Babak Falsafi and Dionisios Pnevmatikatos.  ... 
doi:10.1109/isca.2005.42 dblp:conf/isca/Moshovos05 fatcat:farqa5h66bacxk22skpln6llhe

RA-LPEL: A Resource-Aware Light-Weight Parallel Execution Layer for Reactive Stream Processing Networks on The SCC Many-core Tiled Architecture [article]

Nilesh Karavadara, UH Research Archive
2016
In this thesis, we will focus on the Reactive Stream Program (RSP). In stream processing, the system consists of computing nodes, which are connected via communication streams.  ...  In computing the available computing power has continuously fallen short of the demanded computing performance. As a consequence, performance improvement has been the main focus of processor design.  ...  Thrifty Barrier [90] , exploits barrier synchronisation imbalance in parallel applications to reduce energy consumption.  ... 
doi:10.18745/th.17225 fatcat:lunf25ikfngmlijsjpgycp2mem

Who Should Read This Book [chapter]

2016 Securing the Outdoor Construction Site  
The action and the energy in the Unix community were shifting to Linux and BSD and open-source developers.  ...  But as the cost of compute cycles and memory dropped, the economic reasons for favoring a special-purpose language that was relatively thrifty with both lost their force.  ...  In one notorious example, as late as Release 9 the Mac OS memory manager sometimes required the user to manually deallocate memory by turfing out exited but still-resident programs.  ... 
doi:10.1016/b978-0-12-802383-9.00019-8 fatcat:5cxlhe2k4fbjzfwouwq6qa7iqe

Who should Read this Book? [chapter]

Child Protection  
The action and the energy in the Unix community were shifting to Linux and BSD and open-source developers.  ...  But as the cost of compute cycles and memory dropped, the economic reasons for favoring a special-purpose language that was relatively thrifty with both lost their force.  ...  In one notorious example, as late as Release 9 the Mac OS memory manager sometimes required the user to manually deallocate memory by turfing out exited but still-resident programs.  ... 
doi:10.4135/9781446212677.n4 fatcat:2fda5et7nfd45ms5nihte4wyea