HPX: A Task Based Programming Model in a Global Address Space


Hartmut Kaiser, Thomas Heller, Bryce Adelstein-Lelbach, Adrian Serio, Dietmar Fey
2014 Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models - PGAS '14  
management.  ...  We present HPX, a parallel runtime system which extends the C++11/14 standard to facilitate distributed operations, enable fine-grained constraint-based parallelism, and support runtime adaptive resource  ...  We would also like to thank LSU HPC, LONI, XSEDE, ALCF, and the Gauss Center for Supercomputing for granting us allocations for their compute resources.  ... 
doi:10.1145/2676870.2676883 dblp:conf/pgas/KaiserHASF14 fatcat:qnv2cktjufdzvo2jlsmdpw77he

Embracing heterogeneity with dynamic core boosting

Hyoun Kyu Cho, Scott Mahlke
2014 Proceedings of the 11th ACM Conference on Computing Frontiers - CF '14  
and synchronization operations.  ...  Relying on dynamic voltage and frequency scaling to accelerate individual cores at a fine granularity, DCB attempts to balance the workloads by detecting and boosting critical threads.  ...  It also shows that fine-grained (quantum length of 1000 instructions) dynamic per-core performance adaptation is possible.  ... 
doi:10.1145/2597917.2597932 dblp:conf/cf/ChoM14 fatcat:tx22yxw3mnbu7dpe5ccwealwua

The Performance Implication of Task Size for Applications on the HPX Runtime System

Patricia Grubel, Hartmut Kaiser, Jeanine Cook, Adrian Serio
2015 2015 IEEE International Conference on Cluster Computing  
The HPX runtime system [1] employs asynchronous fine-grained task scheduling and incorporates a dynamic performance modeling capability, providing an ideal experimental platform.  ...  As High Performance Computing moves toward Exascale, where parallel applications will be expected to run on millions of cores concurrently, every component of the computational model must perform optimally  ...  We also thank the anonymous reviewers for their insightful recommendations.  ... 
doi:10.1109/cluster.2015.119 dblp:conf/cluster/GrubelKCS15 fatcat:vjxornmsvvdengb36gfcvh4tri

Thread Progress Equalization: Dynamically Adaptive Power-Constrained Performance Optimization of Multi-Threaded Applications

Yatish Turakhia, Guangshuo Liu, Siddharth Garg, Diana Marculescu
2017 IEEE transactions on computers  
Dynamically adaptive multi-core architectures have been proposed as an effective solution to optimize performance for peak power constrained processors.  ...  In such processors, the micro-architectural parameters or voltage/frequency of each core can be changed at run-time, thus providing a range of power/performance operating points for each core.  ...  Like TPEq (and unlike CS), MaxBIPS can be used for fine-grained optimization of core configurations.  ... 
doi:10.1109/tc.2016.2608951 fatcat:qdhcpdihlrdcnkisw4xjis4cru

Exploring parallelization for medium access schemes on many-core software defined radio architecture

Xi Zhang, Junaid Ansari, Manish Arya, Petri Mähönen
2013 Proceedings of the second workshop on Software radio implementation forum - SRIF '13  
An SDR architecture consisting of many-core homogeneous computing elements provides easy protocol implementation, a high level of portability and extension possibilities.  ...  Therefore, in this paper, we explore how a homogeneous SDR architecture is used for efficient realization and execution of Medium Access Control (MAC) protocols.  ...  Homogeneous many-core architecture provides a mid-way between multi-core CPUs and Graphics Processing Units (GPUs) for a balance between programmability and parallelism.  ... 
doi:10.1145/2491246.2491250 dblp:conf/sigcomm/ZhangAAM13 fatcat:dashdootlffg3pnekdpyygzl74

A Survey: Runtime Software Systems for High Performance Computing

2017 Supercomputing Frontiers and Innovations  
Many share common properties such as multi-tasking either preemptive or non-preemptive, message-driven computation such as active messages, sophisticated fine-grain synchronization such as dataflow and  ...  scheduling and resource management for dynamic adaptive control.  ...  TBB: Intel TBB [42] is a C++ template library supporting generation and scheduling of multiple fine-grain tasks on shared memory multi-core platforms.  ... 
doi:10.14529/jsfi170103 fatcat:yqj65kpvhngovcmgrr46vwwr6i

FPGA and ASIC convergence

C. Valderrama, L. Jojczyk, P. DaCunha Possa, J. Dondo Gazzano
2011 2011 VII Southern Conference on Programmable Logic (SPL)  
computational resource for non-graphics applications.  ...  FPGA architectures on the embedded systems arena [3] [4].  ...  The multi-core Teraflops includes a 2D mesh NoC and fine-grain power management (combining frequency scaling and multiple cores activation).  ... 
doi:10.1109/spl.2011.5782660 fatcat:tuym2sakgbhyxmkwjbj6tzfaqq

Booster: Reactive core acceleration for mitigating the effects of process variation and application imbalance in low-voltage chips

Timothy N. Miller, Xiang Pan, Renji Thomas, Naser Sedaghati, Radu Teodorescu
2012 IEEE International Symposium on High-Performance Computer Architecture  
The governor manages a "boost budget" that dictates how many cores can be sped up (depending on the power constraints) at any given time.  ...  Evaluation using PARSEC and SPLASH2 benchmarks running on a simulated 32-core system shows an average performance improvement of 11% for Booster VAR and 23% for Booster SYNC.  ...  The authors would like to thank the anonymous reviewers for their valuable feedback and suggestions, most of which have been included in this final version.  ... 
doi:10.1109/hpca.2012.6168942 dblp:conf/hpca/MillerPTST12 fatcat:n4liepekzrfw5mhvuycrfvqmd4

Programming MPSoC platforms: Road works ahead!

R. Leupers, A. Vajda, M. Bekooij, Soonhoi Ha, R. Domer, A. Nohl
2009 2009 Design, Automation & Test in Europe Conference & Exhibition  
The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture.  ...  On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands for a more pragmatic, gradual tool  ...  Hence, we argue that the frequency at which each core executes shall be modifiable at a fine-grain level during program execution and according to the needs of the executing application(s).  ... 
doi:10.1109/date.2009.5090917 fatcat:dz4ubgggofc3dnfqlnyknucgsa

A simulation framework for rapid prototyping and evaluation of thermal mitigation techniques in many-core architectures

Tanguy Sassolas, Chiara Sandionigi, Alexandre Guerre, Julien Mottin, Pascal Vivet, Hela Boussetta, Nicolas Peltier
2015 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)  
For many-core architectures, this is often managed by online scheduling techniques.  ...  Here, the focus is on tools that allow the evaluation of dynamic thermal management techniques on many-core architectures early in the design flow.  ... 
doi:10.1109/islped.2015.7273485 dblp:conf/islped/SassolasSGMVBP15 fatcat:m6jbyzye6bgjlofir5ucqm6kty

MAESTRO: Orchestrating predictive resource management in future multicore systems

Sangyeun Cho, Socrates Demetriades
2011 2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS)  
Current resource management strategies are mostly reactive and have limited awareness of an application's resource usage and asymmetry in hardware resources.  ...  the learning to relevant program and system control structures, and makes resource management decisions such as task mapping and cache partitioning in a predictive manner.  ...  Instruction-oriented knowledge provides hints about functional unit usages and how trends of fine-grained architectural events like cache misses and synchronization actions change at specific program points  ... 
doi:10.1109/ahs.2011.5963917 dblp:conf/ahs/ChoD11 fatcat:7qfcxjpkdjd33lwzubornspiom

Improving Resilience to Timing Errors by Exposing Variability Effects to Software in Tightly-Coupled Processor Clusters

Abbas Rahimi, Daniele Cesarini, Andrea Marongiu, Rajesh K. Gupta, Luca Benini
2014 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
Further, VOMP achieves energy savings of 2%-46% and 15%-50% for tasks and sections, respectively.  ...  Our results show that VOMP greatly reduces the cost of timing error recovery compared to the baseline schedulers of OpenMP, yielding speedups of 3%-36% for tasks and 26%-49% for sections.  ...  Every cluster has an independent power and clock domain, therefore enabling fine-grained power and variability management [5].  ... 
doi:10.1109/jetcas.2014.2315883 fatcat:emu6fpxpxreyjihdgyxv7xhhde

Multi-core acceleration of chemical kinetics for simulation and prediction

John C. Linford, John Michalakes, Manish Vachharajani, Adrian Sandu
2009 Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09  
The implementation of a three-stage Rosenbrock solver for SIMD architectures is discussed.  ...  A comparative performance analysis for each platform in double and single precision on coarse and fine grids is presented.  ...  This work was partially supported by the National Center for Supercomputing Applications which made available the NCSA's experimental GPU cluster.  ... 
doi:10.1145/1654059.1654067 dblp:conf/sc/LinfordMVS09 fatcat:ym4dm6gxljdkdoxkixsl33a43e

COUNTDOWN: a Run-time Library for Performance-Neutral Energy Saving in MPI Applications [article]

Daniele Cesarini, Andrea Bartolini, Pietro Bonfà, Carlo Cavazzoni, Luca Benini
2019 arXiv   pre-print
In a complete production run of Quantum ESPRESSO on 3.5K cores, COUNTDOWN saves 22.36% energy, with a performance penalty below 3%.  ...  and synchronization.  ...  FRAMEWORK: COUNTDOWN is a simple run-time library for profiling and fine-grain power management written in C.  ... 
arXiv:1806.07258v2 fatcat:zkykp7lbbfdxpgua7woe4vsaxq

Using HPX and LibGeoDecomp for scaling HPC applications on heterogeneous supercomputers

Thomas Heller, Hartmut Kaiser, Andreas Schäfer, Dietmar Fey
2013 Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - ScalA '13  
The presented results are acquired from various homogeneous and heterogeneous runs including up to 1024 nodes (16384 conventional cores) combined with up to 16 Xeon Phi accelerators (3856 hardware threads  ...  Our measurements demonstrate the advantage of using the intrinsically asynchronous and message driven programming model exposed by HPX which enables better latency hiding, fine to medium grain parallelism  ...  This results in taking a very coarse grained function and reducing it into multiple fine grained functions whose parts are executed in parallel.  ... 
doi:10.1145/2530268.2530269 dblp:conf/sc/HellerKSF13 fatcat:oibtueaddzdyjjhiw573e2o4i4