Shared resource aware scheduling on power-constrained tiled many-core processors

Sudhanshu Shekhar Jha, Wim Heirman, Ayose Falcón, Jordi Tubella, Antonio González, Lieven Eeckhout
2016 Proceedings of the ACM International Conference on Computing Frontiers - CF '16  
Highlights
• A low-overhead and highly scalable hierarchical power manager on a tiled many-core architecture with shared LLC and VR.
• Shared DVFS and cache adaptation can degrade performance of co-scheduled threads on a tile.
• DVFS and cache-aware thread migration (DCTM) to ensure optimum per-tile co-scheduling of compatible threads at runtime.
• DCTM-assisted hierarchical power manager improves performance by up to 20% compared to a conventional centralized power manager with per-core ...

Abstract
Power management through dynamic core, cache and frequency adaptation is becoming a necessity in today's power-constrained many-core environments. Unfortunately, as core count grows, the complexity of both the adaptation hardware and the power management algorithms increases exponentially. This calls for hierarchical solutions, such as on-chip voltage regulators per tile rather than per core, along with multi-level power management. As power-driven adaptation of shared resources affects multiple threads at once, the efficiency of a tile-organized many-core processor architecture hinges on the ability to co-schedule compatible threads to tiles in tandem with hardware adaptations per tile and per core. In this paper, we propose a two-tier hierarchical power management methodology to exploit per-tile voltage regulators and clustered last-level caches. In addition, we include a novel thread migration layer that (i) analyzes threads running on the tiled many-core processor for shared resource sensitivity in tandem with core, cache and frequency adaptation, and (ii) co-schedules threads with compatible behavior per tile. On a 256-core setup with 4 cores per tile, we show that adding sensitivity-based thread migration to a two-tier power manager improves system performance by 10% on average (and up to 20%) while using 4× fewer on-chip voltage regulators. It also achieves a performance advantage of 4.2% on average (and up to 12%) over existing solutions that do not take DVFS sensitivity into account.

... (J. Tubella), antonio@ac.upc.edu (A. González), lieven.eeckhout@ugent.be (L. Eeckhout).

... efficient way on par with Moore's law [40]. With continued emphasis on technology scaling for increased circuit densities, controlling chip power consumption has become a first-order design constraint.
Due to the end of Dennard scaling [12] (slowed supply voltage scaling), we may become so power-constrained that we are no longer able to power on all transistors at the same time, a situation known as dark silicon [16]. Runtime factors such as thermal emergencies [7] and power capping [19] further constrain the available chip power. Owing to all of these factors, power budgeting on many-core systems has received considerable attention recently [22, 36, 37, 39, 49, 51].
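
To make the co-scheduling idea from the abstract more concrete, the following Python sketch groups threads with similar sensitivity onto the same tile, so that threads sharing one voltage regulator and one LLC cluster place similar demands on them. It is only an illustration under stated assumptions: the `ThreadProfile` fields, the [0, 1] sensitivity scores, and the sort-and-chunk grouping rule are hypothetical stand-ins, since the abstract does not specify how compatibility is actually measured or how migration decisions are made.

```python
from dataclasses import dataclass
from typing import List

TOTAL_CORES = 256    # setup evaluated in the abstract
CORES_PER_TILE = 4   # cores sharing one voltage regulator and LLC cluster


@dataclass(frozen=True)
class ThreadProfile:
    """Hypothetical per-thread profile; scores are assumed to lie in [0, 1]."""
    tid: int
    dvfs_sensitivity: float   # assumed: slowdown when frequency is lowered
    cache_sensitivity: float  # assumed: slowdown when LLC capacity shrinks


def co_schedule(threads: List[ThreadProfile],
                cores_per_tile: int = CORES_PER_TILE) -> List[List[ThreadProfile]]:
    """Group threads with similar sensitivity onto the same tile.

    Assumption: sorting by (DVFS, cache) sensitivity and cutting the sorted
    list into tile-sized chunks stands in for the paper's compatibility test.
    """
    ordered = sorted(threads,
                     key=lambda t: (t.dvfs_sensitivity, t.cache_sensitivity))
    return [ordered[i:i + cores_per_tile]
            for i in range(0, len(ordered), cores_per_tile)]


def tile_spread(tile: List[ThreadProfile]) -> float:
    """Range of DVFS sensitivity within a tile; smaller means more compatible."""
    values = [t.dvfs_sensitivity for t in tile]
    return max(values) - min(values)


# Eight threads arriving in alternating low/high sensitivity order.
threads = [ThreadProfile(tid=i,
                         dvfs_sensitivity=0.25 if i % 2 == 0 else 0.75,
                         cache_sensitivity=0.5)
           for i in range(8)]

# Naive placement: fill tiles in arrival order, mixing incompatible threads.
naive_tiles = [threads[i:i + CORES_PER_TILE]
               for i in range(0, len(threads), CORES_PER_TILE)]
# Sensitivity-aware placement: similar threads end up on the same tile.
aware_tiles = co_schedule(threads)

# Per-tile regulators: one per tile instead of one per core.
regulators_per_tile_design = TOTAL_CORES // CORES_PER_TILE
```

In this toy input, arrival-order placement mixes low- and high-sensitivity threads on every tile, whereas the sorted placement yields one tile of low-sensitivity threads and one of high-sensitivity threads, so a single shared frequency setting suits every thread on a tile. The last line reproduces the arithmetic behind the "4× fewer" claim: 256 cores at 4 cores per tile need 64 per-tile regulators rather than 256 per-core ones.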
doi:10.1145/2903150.2903490 dblp:conf/cd/JhaHFT0E16 fatcat:v7uohhnrm5cw5ne5o5c3ut2yvu