Shared resource aware scheduling on power-constrained tiled many-core processors
Journal of Parallel and Distributed Computing
Highlights
• A low-overhead and highly scalable hierarchical power manager on a tiled many-core architecture with shared LLC and VR.
• Shared DVFS and cache adaptation can degrade the performance of co-scheduled threads on a tile.
• DVFS and cache-aware thread migration (DCTM) ensures optimum per-tile co-scheduling of compatible threads at runtime.
• A DCTM-assisted hierarchical power manager improves performance by up to 20% compared to a conventional centralized power manager with per-core
... .

Article info
Keywords: Many-core tiled architecture; Thread migration; Power budget; Adaptive microarchitecture

Abstract
Power management through dynamic core, cache and frequency adaptation is becoming a necessity in today's power-constrained many-core environments. Unfortunately, as core count grows, the complexity of both the adaptation hardware and the power management algorithms increases exponentially. This calls for hierarchical solutions, such as on-chip voltage regulators per tile rather than per core, along with multi-level power management. Because power-driven adaptation of shared resources affects multiple threads at once, the efficiency of a tile-organized many-core processor hinges on the ability to co-schedule compatible threads to tiles in tandem with hardware adaptations per tile and per core. In this paper, we propose a two-tier hierarchical power management methodology that exploits per-tile voltage regulators and clustered last-level caches. In addition, we include a novel thread migration layer that (i) analyzes threads running on the tiled many-core processor for shared resource sensitivity in tandem with core, cache and frequency adaptation, and (ii) co-schedules threads with compatible behavior on the same tile. On a 256-core setup with 4 cores per tile, we show that adding sensitivity-based thread migration to a two-tier power manager improves system performance by 10% on average (and up to 20%) while using 4× fewer on-chip voltage regulators. It also achieves a performance advantage of 4.2% on average (and up to 12%) over existing solutions that do not take DVFS sensitivity into account.
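To make the idea of co-scheduling compatible threads concrete, here is a minimal Python sketch. It is not the DCTM algorithm of the paper, whose details are not given in this section. It assumes each thread has a single hypothetical sensitivity score in [0, 1] (for example, how much it slows down when the shared frequency is lowered) and uses a simple sort-and-chunk heuristic to place threads with similar scores on the same tile. The names `co_schedule`, `tile_spread` and the thread ids are illustrative assumptions.

```python
from typing import Dict, List

CORES_PER_TILE = 4  # matches the 4-cores-per-tile setup described in the abstract


def co_schedule(sensitivity: Dict[str, float],
                cores_per_tile: int = CORES_PER_TILE) -> List[List[str]]:
    """Group threads into tiles so that each tile holds similar threads.

    `sensitivity` maps a hypothetical thread id to a hypothetical score in
    [0, 1]. Sorting by score and cutting the list into consecutive chunks is
    an illustrative heuristic, not the method used in the paper.
    """
    # Sort by score; ties are broken by thread id to keep the result stable.
    ordered = sorted(sensitivity, key=lambda t: (sensitivity[t], t))
    return [ordered[i:i + cores_per_tile]
            for i in range(0, len(ordered), cores_per_tile)]


def tile_spread(tile: List[str], sensitivity: Dict[str, float]) -> float:
    """Return the gap between the most and least sensitive thread on a tile."""
    scores = [sensitivity[t] for t in tile]
    return max(scores) - min(scores)


# Hypothetical scores: four frequency-sensitive and four insensitive threads.
scores = {"t0": 0.90, "t1": 0.10, "t2": 0.80, "t3": 0.20,
          "t4": 0.85, "t5": 0.15, "t6": 0.95, "t7": 0.05}
tiles = co_schedule(scores)
for index, tile in enumerate(tiles):
    print(index, tile, round(tile_spread(tile, scores), 2))
```

With these made-up scores the insensitive threads share one tile and the sensitive threads share the other, so a shared per-tile frequency setting can suit all four threads on a tile. The same grouping applied to 256 threads yields 64 tiles, which is where the 4× reduction in voltage regulators comes from when one regulator serves a tile instead of a core.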