
PaRSEC: Exploiting Heterogeneity to Enhance Scalability

George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Herault, Jack J. Dongarra
2013 Computing in Science & Engineering (Print)  
heterogeneous resources (http://herbsutter.com/welcome-to-the-jungle); the additional complexity hinders all efforts at writing high-performing yet portable applications.  ...  New high-performance computing system designs with steeply escalating processor and core counts, burgeoning heterogeneity and accelerators, and increasingly unpredictable memory access times call for dramatically  ...  In this section, using PaRSEC to illustrate the case, we describe how a shift toward the dataflow programming model from conventional practices can preserve and enhance programmer productivity while achieving  ... 
doi:10.1109/mcse.2013.98 fatcat:qyghrkdwyjhs5porciopjjipi4

Dynamic task discovery in PaRSEC

Reazul Hoque, Thomas Herault, George Bosilca, Jack Dongarra
2017 Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - ScalA '17  
Successfully exploiting distributed collections of heterogeneous many-cores architectures with complex memory hierarchy through a portable programming model is a challenge for application developers.  ...  The solution explored in this paper, PaRSEC, is based on such a programming paradigm, supported by a highly efficient task-based runtime.  ...  To enhance the productivity of the application developers, PaRSEC implicitly infers all the communication from the expression of the tasks, supporting one-to-many and many-to-many types of communications  ... 
doi:10.1145/3148226.3148233 dblp:conf/sc/HoqueHBD17 fatcat:oxppr5z4cjhlpnl67nfbolhmim

Understanding PARSEC performance on contemporary CMPs

Major Bhadauria, Vincent M. Weaver, Sally A. McKee
2009 2009 IEEE International Symposium on Workload Characterization (IISWC)  
No investigation to date has profiled PARSEC on real hardware to better understand scaling properties and bottlenecks.  ...  PARSEC is a reference application suite used in industry and academia to assess new Chip Multiprocessor (CMP) designs.  ...  We also thank Chris Fensch from the University of Cambridge for his patches to enable execution on the SPARC platform.  ... 
doi:10.1109/iiswc.2009.5306793 dblp:conf/iiswc/BhadauriaWM09 fatcat:qx5lefg44ncnxc7ur34f3yvn4q

The PARSEC benchmark suite

Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, Kai Li
2008 Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08  
The benchmark suite has been made available to the public.  ...  PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multi-threaded commercial programs.  ...  ACKNOWLEDGEMENTS First and foremost we would like to acknowledge the many authors of the PARSEC benchmark programs which are too numerous to be listed here.  ... 
doi:10.1145/1454115.1454128 dblp:conf/IEEEpact/BieniaKSL08 fatcat:mskdckm6urbknpxxt5c6hldlli

Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems

Qinglei Cao, Yu Pei, Kadir Akbudak, George Bosilca, Hatem Ltaief, David Keyes, Jack Dongarra
2021 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)  
This requires extending PaRSEC with new features that integrate rank information into the dataflow so that proper decisions can be taken at runtime.  ...  During the last decade, low-rank matrix approximations, where the main idea consists of exploiting data sparsity, typically by compressing off-diagonal tiles up to an application-specific accuracy threshold  ...  Starvation, latency, overhead, and heterogeneity are the four main barriers to algorithm scalability and efficiency that PaRSEC focuses on overcoming.  ... 
doi:10.1109/ipdps49936.2021.00017 fatcat:ivotweqnhvetdba7y3hy74wrba

Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model

Emmanuel Agullo, Olivier Aumage, Mathieu Faverge, Nathalie Furmento, Florent Pruvost, Marc Sergent, Samuel Paul Thibault
2017 IEEE Transactions on Parallel and Distributed Systems  
The authors furthermore thank Cédric Augonnet, David Goudin and Xavier Lacoste from CEA CESTA who strongly contributed to this achievement as well as Luc Giraud and Emmanuel Jeannot for their feedback  ...  PaRSEC exploits this property to ensure excellent scalability.  ...  Indeed, StarPU is optimized for heterogeneous architectures, while PaRSEC was originally designed to efficiently exploit homogeneous clusters.  ... 
doi:10.1109/tpds.2017.2766064 fatcat:ns3cvnpfjzfexcqu3o4mmo6ef4

Hierarchical DAG Scheduling for Hybrid Distributed Systems

Wei Wu, Aurelien Bouteiller, George Bosilca, Mathieu Faverge, Jack Dongarra
2015 2015 IEEE International Parallel and Distributed Processing Symposium  
Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity.  ...  We propose two novel recursive algorithmic variants for one-sided factorizations and describe the changes to the PaRSEC task-scheduling runtime to build a framework where the task granularity is dynamically  ...  It also enhances the scalability of the underlying algorithms, by providing a recursive approach capable of mapping the algorithm on all potential computing resources, with an immediate positive impact  ... 
doi:10.1109/ipdps.2015.56 dblp:conf/ipps/WuBBFD15 fatcat:aj4lyl5kjrd7rckzohdtlmvrmi

The autonomic operating system research project

Davide B. Bartolini, Riccardo Cattaneo, Gianluca C. Durelli, Martina Maggio, Marco D. Santambrogio, Filippo Sironi
2013 Proceedings of the 50th Annual Design Automation Conference on - DAC '13  
With AcOS, we investigate intelligent resource allocation to achieve user-specified service-level objectives on application performance and to respect system-level thresholds on CPU temperature.  ...  This paper describes the Autonomic Operating System (AcOS) project; AcOS enhances commodity operating systems with an autonomic layer that enables self-* properties through adaptive resource allocation  ...  This paper illustrates how AcOS exploits this structure to enhance commodity operating systems with performance-aware resource allocation and temperature management.  ... 
doi:10.1145/2463209.2488828 dblp:conf/dac/BartoliniCDMSS13 fatcat:epd2ggix5vaqjkrsemyb2l3k2m

Assembly Operations for Multicore Architectures Using Task-Based Runtime Systems [chapter]

Damien Genet, Abdou Guermouche, George Bosilca
2014 Lecture Notes in Computer Science  
Several strategies are proposed to decompose the assembly problem while relying on a scheduling middleware to maximize the overlap between stages and increase the parallelism and thus the performance.  ...  This strategy requires the contribution blocks to be large enough to ensure performance. Moreover, the approach suffers from a lack of scalability: only intra-block parallelism is exploited.  ...  adapted to the heterogeneous context).  ... 
doi:10.1007/978-3-319-14313-2_29 fatcat:a3vl4ycfmvhm7opvvf3tdkrmgu

A Survey: Runtime Software Systems for High Performance Computing

2017 Supercomputing Frontiers and Innovations  
To address these challenges, strategies employing runtime software systems are being pursued to exploit information about the status of the application and the system hardware operation throughout the  ...  High Performance Computing system design and operation are challenged by requirements for significant advances in efficiency, scalability, productivity, and portability at the end of Moore's Law with approaching  ...  Therefore, for greater capability, improved performance through enhanced efficiency and higher scalability will have to be derived with the future use of runtime systems as a possible important contributor  ... 
doi:10.14529/jsfi170103 fatcat:yqj65kpvhngovcmgrr46vwwr6i

A taxonomy of computer-based simulations and its mapping to parallel and distributed systems simulation tools

Anthony Sulistio, Chee Shin Yeo, Rajkumar Buyya
2004 Software, Practice & Experience  
Therefore, the aim of this paper is to develop a comprehensive taxonomy for design of computer-based simulations, and apply this taxonomy to categorize and analyze various simulation tools for parallel  ...  In recent years, extensive research has been conducted in the area of simulation to model large complex systems and understand their behavior, especially in parallel and distributed systems.  ...  ACKNOWLEDGEMENTS The authors would like to acknowledge all researchers of the simulation tools described in this paper and thank them for their outstanding work.  ... 
doi:10.1002/spe.585 fatcat:b6z6njmvp5gsjcypyzcynoeb24

Decoupled Control and Data Processing for Approximate Near-Threshold Voltage Computing

Ismail Akturk, Nam Sung Kim, Ulya R. Karpuzcu
2015 IEEE Micro  
Accordingly, the limited parallel scalability of applications is more likely to restrict the use of more cores than the power budget [4]. Even if applications featured perfect parallel scalability, parametric  ...  to reduce the operating voltage, V_DD.  ...  Figure 3 demonstrates a hypothetical NTV chip, clustered to enhance scalability. A few cores with per-core private memories and a shared cluster memory constitute each cluster.  ... 
doi:10.1109/mm.2015.85 fatcat:n6a6657vu5gn5mvccocabgrcui

Notice of Violation of IEEE Publication Principles: An Energy-Efficient Heterogeneous Memory Architecture for Future Dark Silicon Embedded Chip-Multiprocessors

Salman Onsori, Arghavan Asad, Kaamran Raahemifar, Mahmood Fathy
2018 IEEE Transactions on Emerging Topics in Computing  
Our convex model optimizes the number and placement of eDRAM and STT-RAM memory banks on the memory layer to exploit the advantages of both technologies in future eCMPs.  ...  In this article, we present a convex optimization model to design a 3D stacked hybrid memory architecture in order to minimize the energy consumption of future embedded systems in the dark silicon era.  ...  In addition to the aforementioned advantages, 3D ICs also provide heterogeneous integration, on-chip interconnect length reduction, and a modular and scalable design.  ... 
doi:10.1109/tetc.2016.2563323 fatcat:acioedfuo5fyvmkpulhaybadye

Notice of Violation of IEEE Publication Principles: A heterogeneous memory organization with minimum energy consumption in 3D chip-multiprocessors

Arghavan Asad, Salman Onsori, Mahmood Fathy, Mohammad Reza Jahed-Motlagh, Kaamran Raahemifar
2016 2016 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE)  
In this article, we propose a stacked hybrid memory system for 3D chip-multiprocessors to take advantage of both traditional and non-volatile memory technologies.  ...  To reach this target, we present a convex optimization-based model that minimizes the system energy consumption while satisfying an endurance constraint in order to design a reliable memory system.  ...  [10] proposed a wear-leveling technique for a PRAM-based memory system to enhance the lifetime. Wang et al.  ... 
doi:10.1109/ccece.2016.7726817 dblp:conf/ccece/AsadOFMR16 fatcat:ej36y4vsi5e6zimwdh5dmirohe

Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications

Qinglei Cao, Yu Pei, Kadir Akbudak, Aleksandr Mikhalev, George Bosilca, Hatem Ltaief, David Keyes, Jack Dongarra
2020 Proceedings of the Platform for Advanced Scientific Computing Conference  
We propose a novel solution to this problem: at the mathematical level, we reduce the computational requirement by exploiting the data sparsity structure of the matrix off-diagonal tiles by means of low-rank  ...  approximations; and, at the programming-paradigm level, we integrate PaRSEC, a dynamic, task-based runtime, to reach unparalleled levels of efficiency for solving extreme-scale linear algebra matrix operations  ...  This pruning phase limits potential scalability [37]. QUARK has no implicit support for either heterogeneous or distributed architectures, though.  ... 
doi:10.1145/3394277.3401846 dblp:conf/pasc/CaoPAMBLKD20 fatcat:v6sqej7rfvgqbokaihh7uuqcpi