884 Hits in 4.9 sec

NUMA-aware scheduling and memory allocation for data-flow task-parallel applications

Andi Drebes, Antoniu Pop, Karine Heydemann, Nathalie Drach, Albert Cohen
2016 Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '16  
Our run-time algorithms for NUMA-aware task and data placement are fully automatic, application-independent, performance-portable across NUMA machines, and adapt to dynamic changes.  ...  Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability through the elimination of false dependences and enables fine-grained dynamic  ...  This work is supported by EPSRC grant EP/M004880/1 and A. Pop is funded by a Royal Academy of Engineering Research Fellowship.  ... 
doi:10.1145/2851141.2851193 dblp:conf/ppopp/DrebesPHD016 fatcat:g3yvdhdazrespa4o6trdrrtbae

Locality Aware Task Scheduling in Parallel Data Stream Processing [chapter]

Zbyněk Falt, Martin Kruliš, David Bednárek, Jakub Yaghob, Filip Zavoral
2015 Studies in Computational Intelligence  
In addition, we have implemented a NUMA-aware memory allocator that improves data locality in NUMA systems.  ...  scheduler and allocator.  ...  Acknowledgment: This work was supported by the Czech Science Foundation (GACR), projects P103-13-08195S and P103-14-14292P, and by Specific Research project SVV-2014-260100.  ... 
doi:10.1007/978-3-319-10422-5_35 fatcat:56g7pv2m3faybftgshjicfxbrm

Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages

Andi Drebes, Karine Heydemann, Nathalie Drach, Antoniu Pop, Albert Cohen
2014 ACM Transactions on Architecture and Code Optimization (TACO)  
We present a joint scheduling and memory allocation algorithm for efficient execution of task-parallel programs on non-uniform memory architecture (NUMA) systems.  ...  Existing locality-aware scheduling strategies for fine-grained tasks have strong limitations: they are specific to some class of machines or applications, they do not handle task dependences, they require  ...  A NUMA-aware memory allocator associates the amount and location of allocated memory with the allocating thread.  ... 
doi:10.1145/2641764 fatcat:tlu32qcxpnbsxpgikiw5blbg5e
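The TACO entry above describes a NUMA-aware allocator that records how much memory each thread allocates and where. As a rough illustration of the underlying mechanism, and not the authors' implementation, the sketch below uses libnuma's numa_alloc_onnode to place a buffer's pages on a chosen node; the buffer size and node choice are assumptions for illustration.

```c
/* Minimal sketch of node-local allocation with libnuma (not the paper's
 * allocator). Build with: gcc example.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

static double *alloc_on_node(size_t n, int node)
{
    /* numa_alloc_onnode backs the pages on the requested node when
       possible, mirroring "memory follows the allocating thread"
       bookkeeping as described in the entry above. */
    double *buf = numa_alloc_onnode(n * sizeof(double), node);
    if (!buf) {
        perror("numa_alloc_onnode");
        exit(EXIT_FAILURE);
    }
    return buf;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this machine\n");
        return 1;
    }
    size_t n = 1u << 20;
    int node = numa_node_of_cpu(0);      /* node that CPU 0 belongs to */
    double *a = alloc_on_node(n, node);
    a[0] = 42.0;                         /* pages stay on 'node' */
    numa_free(a, n * sizeof(double));
    return 0;
}
```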

Scalable Task Parallelism for NUMA

Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, Nathalie Drach
2016 Proceedings of the 2016 International Conference on Parallel Architectures and Compilation - PACT '16  
We show that using NUMA-aware task and data placement, it is possible to preserve the uniform hardware abstraction of contemporary task-parallel programming models for both computing and memory resources  ...  Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability by eliminating false dependences and enabling fine-grained dynamic control over  ...  Acknowledgments Our work was supported by the grants EU FET-HPC Ex-aNoDe H2020-671578 and UK EPSRC EP/M004880/1. A. Pop is funded by a Royal Academy of Engineering University Research Fellowship.  ... 
doi:10.1145/2967938.2967946 dblp:conf/IEEEpact/DrebesPH0D16 fatcat:wae2cwmvkvf7jcdsni3537vtj4

Using Data Dependencies to Improve Task-Based Scheduling Strategies on NUMA Architectures [chapter]

Philippe Virouleau, François Broquedis, Thierry Gautier, Fabrice Rastello
2016 Lecture Notes in Computer Science  
Data placement and task scheduling strategies have a significant impact on performance on NUMA architectures.  ...  the tasks' data dependencies.  ...  software environment for very high performance computing.  ... 
doi:10.1007/978-3-319-43659-3_39 fatcat:i6ou3fm2efcz5np6rafvfnuvkm

A NUMA-Aware Fine Grain Parallelization Framework for Multi-core Architecture

Corentin Rossignon, Pascal Henon, Olivier Aumage, Samuel Thibault
2013 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum  
suitable for the hardware and minimizing the time penalty due to Non-Uniform Memory Accesses.  ...  In this paper, we present some solutions to handle two problems commonly encountered when dealing with fine-grain parallelization on multi-core architectures: expressing an algorithm using a task grain size  ...  NUMA-aware scheduling: In order to easily experiment with NUMA-aware task scheduling, we implemented our own basic NUMA task scheduler.  ... 
doi:10.1109/ipdpsw.2013.204 dblp:conf/ipps/RossignonHAT13 fatcat:7fkxf74bwjctnppge73xbkbpba
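The IPDPSW entry above mentions a "basic NUMA task scheduler" built for experimentation. The skeleton below is a hypothetical sketch in that spirit, not the authors' code: one task queue per NUMA node, with workers serving their home node first and falling back to other nodes only when the local queue is empty. All names are illustrative.

```c
/* Hypothetical per-node task queues for a basic NUMA-aware scheduler. */
#include <pthread.h>
#include <stddef.h>

typedef struct task {
    void (*fn)(void *);
    void *arg;
    struct task *next;
} task_t;

typedef struct {
    pthread_mutex_t lock;
    task_t *head;                     /* simple LIFO list per NUMA node */
} node_queue_t;

#define MAX_NODES 8
static node_queue_t queues[MAX_NODES];

void queues_init(int nnodes)
{
    for (int i = 0; i < nnodes && i < MAX_NODES; i++) {
        pthread_mutex_init(&queues[i].lock, NULL);
        queues[i].head = NULL;
    }
}

void push_task(int node, task_t *t)   /* producer picks the task's node */
{
    pthread_mutex_lock(&queues[node].lock);
    t->next = queues[node].head;
    queues[node].head = t;
    pthread_mutex_unlock(&queues[node].lock);
}

static task_t *try_pop(int node)
{
    pthread_mutex_lock(&queues[node].lock);
    task_t *t = queues[node].head;
    if (t)
        queues[node].head = t->next;
    pthread_mutex_unlock(&queues[node].lock);
    return t;
}

/* A worker bound to NUMA node 'home' prefers local work, then scans the
 * remaining nodes in round-robin order. */
task_t *next_task(int home, int nnodes)
{
    task_t *t = try_pop(home);
    for (int i = 1; t == NULL && i < nnodes; i++)
        t = try_pop((home + i) % nnodes);
    return t;
}
```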

Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores

Fengguang Song, Jack Dongarra
2014 Proceedings of the 28th ACM international conference on Supercomputing - ICS '14  
To overcome the bottleneck, we have designed NUMA-aware tile algorithms with the help of a dynamic scheduling runtime system to minimize NUMA memory accesses.  ...  The main idea is to identify the data that is either read a number of times or written once by a thread resident on a remote NUMA node, and then utilize the runtime system to conduct data caching and movement  ...  The runtime system uses a data-availability-driven (i.e., data-flow) scheduling method and knows where data are located and which tasks are waiting for the data.  ... 
doi:10.1145/2597652.2597670 dblp:conf/ics/SongD14 fatcat:bs6625utpzaftigfw75sin2zcy
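The ICS '14 entry above describes data-availability-driven (data-flow) scheduling. The sketch below illustrates the general idea rather than that runtime's actual code: each task counts its unsatisfied inputs, and the producer that delivers the last input enqueues the task on a preferred NUMA node. The enqueue hook is a stand-in for a scheduler such as the per-node queues sketched earlier.

```c
/* Sketch of dependence-counter based task release (data-flow style). */
#include <stdatomic.h>
#include <stdio.h>

typedef struct dftask {
    atomic_int missing_inputs;   /* inputs not yet produced */
    int preferred_node;          /* node holding most of the input data */
    void (*run)(struct dftask *);
} dftask_t;

/* Illustrative stub: a real runtime would push onto that node's queue. */
static void enqueue_on_node(dftask_t *t, int node)
{
    printf("task %p ready, scheduling on node %d\n", (void *)t, node);
    t->run(t);
}

/* Called by a producer after writing one of t's inputs. */
void input_ready(dftask_t *t)
{
    if (atomic_fetch_sub(&t->missing_inputs, 1) == 1)
        enqueue_on_node(t, t->preferred_node);   /* last input arrived */
}
```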

MDTM: Optimizing Data Transfer Using Multicore-Aware I/O Scheduling

Liang Zhang, Phil Demar, Bockjoo Kim, Wenji Wu
2017 2017 IEEE 42nd Conference on Local Computer Networks (LCN)  
The MDTM scheduler exploits underlying multicore layouts to optimize throughput by reducing delay and contention for I/O reading and writing operations.  ...  With our evaluations, we show how MDTM successfully avoids NUMA-based congestion and significantly improves end-to-end data transfer rates across high-speed wide area networks.  ...  Current data transfer applications depend on the operating system (OS) to schedule the user threads and allocate memory.  ... 
doi:10.1109/lcn.2017.64 dblp:conf/lcn/ZhangDKW17 fatcat:5fw764fqmjckjpl4su7fl43kom
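The LCN entry above exploits multicore layouts to avoid NUMA-based congestion in data transfers. One common ingredient of such designs, shown below purely as an illustration and not as MDTM's scheduler, is to read the NUMA node the NIC is attached to from sysfs and bind the receiving thread to that node, so packet buffers are consumed on the socket where the device DMAs them. The interface name is a placeholder; build with -lnuma.

```c
/* Bind the calling thread to the NUMA node of a given network interface. */
#include <numa.h>
#include <stdio.h>

int bind_thread_to_nic_node(const char *ifname)   /* e.g. "eth0" */
{
    char path[128];
    snprintf(path, sizeof(path), "/sys/class/net/%s/device/numa_node", ifname);

    FILE *f = fopen(path, "r");
    int node = -1;
    if (f) {
        if (fscanf(f, "%d", &node) != 1)
            node = -1;
        fclose(f);
    }
    if (node < 0)
        return -1;                    /* device has no NUMA information */

    /* Restrict the calling thread to CPUs of the NIC's node. */
    return numa_run_on_node(node);
}
```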

Reducing Cache Coherence Traffic with a NUMA-Aware Runtime Approach

Paul Caheny, Lluc Alvarez, Said Derradji, Mateo Valero, Miquel Moreto, Marc Casas
2018 IEEE Transactions on Parallel and Distributed Systems  
For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol combined with runtime-managed NUMA-aware scheduling and data  ...  The effectiveness of this joint approach is demonstrated by speedups of 3.14x to 9.97x and coherence traffic reductions of up to 99% in comparison to NUMA-oblivious scheduling and data allocation.  ...  Acknowledgments: This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493), by the Spanish Ministry of Science and Innovation (contracts TIN2015-  ... 
doi:10.1109/tpds.2017.2787123 fatcat:kz7gsqv5zvawvh7uulewkoiawe

Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies

Isaac Sánchez Barrera, Miquel Moretó, Eduard Ayguadé, Jesús Labarta, Mateo Valero, Marc Casas
2018 Proceedings of the 2018 International Conference on Supercomputing - ICS '18  
We propose techniques at the runtime system level to further mitigate the impact of NUMA effects on parallel applications' performance.  ...  We leverage runtime system metadata expressed in terms of a task dependency graph, where nodes are pieces of serial code and edges are control or data dependencies between them, to efficiently reduce data  ...  This NUMA-aware scheduling also provides a higher probability for a task to hit its data in the cache of the processor if previous tasks using that data also ran in the same socket.  ... 
doi:10.1145/3205289.3205310 dblp:conf/ics/BarreraMALVC18 fatcat:shndpuchkzagdo5z7ka7vkee6q
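The ICS '18 entry above uses task-dependency metadata to run a task on the socket that already holds its data. A hedged sketch of one way to obtain that placement signal, not the paper's runtime, is to ask the kernel where an input buffer's pages currently live (move_pages with a NULL node list only reports locations) and pick the node holding the majority. Build with -lnuma.

```c
/* Report the NUMA node holding the majority of a buffer's resident pages. */
#define _GNU_SOURCE
#include <numaif.h>
#include <stdint.h>
#include <unistd.h>

#define SAMPLE_PAGES 512
#define MAX_NODES    64

int majority_node(void *buf, size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    char *base = (char *)((uintptr_t)buf & ~(uintptr_t)(page - 1));
    unsigned long npages = (bytes + page - 1) / page;
    if (npages > SAMPLE_PAGES)
        npages = SAMPLE_PAGES;        /* sample large buffers */

    void *pages[SAMPLE_PAGES];
    int status[SAMPLE_PAGES];
    for (unsigned long i = 0; i < npages; i++)
        pages[i] = base + i * page;

    /* nodes == NULL means "report current locations", not migrate. */
    if (move_pages(0, npages, pages, NULL, status, 0) != 0)
        return -1;

    int count[MAX_NODES] = {0};
    int best = -1, best_count = 0;
    for (unsigned long i = 0; i < npages; i++) {
        int n = status[i];
        if (n < 0 || n >= MAX_NODES)
            continue;                 /* page not present or error code */
        if (++count[n] > best_count) {
            best_count = count[n];
            best = n;
        }
    }
    return best;                      /* -1 if nothing could be resolved */
}
```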

Higher-level parallelization for local and distributed asynchronous task-based programming

Hartmut Kaiser, Thomas Heller, Daniel Bourgeois, Dietmar Fey
2015 Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware - ESPM '15  
Among those efforts is the seamless integration of various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous execution flows, continuation style  ...  In this paper, we present the results of developing those higher-level parallelization facilities in HPX, a general-purpose C++ runtime system for applications of any scale.  ...  1339782 (STORM), and the DoE award DE-SC0008714 (XPRESS).  ... 
doi:10.1145/2832241.2832244 dblp:conf/sc/KaiserHBF15 fatcat:5eaakuz6h5dldlnbjyv6k2twuq

User-Level Memory Scheduler for Optimizing Application Performance in NUMA-Based Multicore Systems [article]

Geunsik Lim, Sang-Bum Suh
2021 arXiv   pre-print
This paper presents a user-space memory scheduler that allocates the ideal memory node for tasks by monitoring the characteristics of the non-uniform memory architecture.  ...  Kernel-space techniques that shift tasks to the ideal memory node cannot take the characteristics of user-space applications into account.  ...  Proposal of a user-level NUMA-aware memory scheduler: Our proposed technique maintains ideal memory locality to support high-performance execution of the application by removing the possibility of memory  ... 
arXiv:2101.09284v1 fatcat:uglpzosigzh2nppxte47p345cu
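The arXiv entry above moves a task's memory to the node the scheduler considers ideal, entirely from user space. A minimal sketch of that idea, using libnuma's wrapper around migrate_pages(2) rather than the paper's scheduler, is shown below; the pid and source/target nodes would come from a monitoring component and are assumptions here. Build with -lnuma.

```c
/* Migrate a process's pages from one NUMA node to another from user space. */
#include <numa.h>

int move_task_memory(int pid, int source_node, int target_node)
{
    struct bitmask *from = numa_allocate_nodemask();
    struct bitmask *to   = numa_allocate_nodemask();

    numa_bitmask_setbit(from, source_node);
    numa_bitmask_setbit(to, target_node);

    /* Ask the kernel to move the process's pages between the two nodes. */
    long ret = numa_migrate_pages(pid, from, to);

    numa_free_nodemask(from);
    numa_free_nodemask(to);
    return ret < 0 ? -1 : 0;
}
```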

Dense Matrix Computations on NUMA Architectures with Distance-Aware Work Stealing

2015 Supercomputing Frontiers and Innovations  
Work stealing occurs within an innovative NUMA-aware scheduling policy to reduce data movement between NUMA nodes.  ...  We employ the dynamic runtime system OmpSs to decrease the overhead of data motion in the now ubiquitous non-uniform memory access (NUMA) high concurrency environment of multicore processors.  ...  Drebes [14] presents a scheduling and allocation algorithm for the OpenStream language.  ... 
doi:10.14529/jsfi150103 fatcat:l7mujkltgzh25oy66xnyz63iei
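The entry above steals work with a distance-aware policy to limit cross-node data movement. The sketch below shows the general idea of distance-aware victim selection, not the OmpSs implementation: an idle worker steals from the NUMA node with the smallest SLIT distance to its home node. Build with -lnuma.

```c
/* Pick the closest NUMA node (by SLIT distance) as the stealing victim. */
#include <numa.h>
#include <limits.h>

int pick_victim(int home, int nnodes)
{
    int best = -1, best_dist = INT_MAX;
    for (int n = 0; n < nnodes; n++) {
        if (n == home)
            continue;
        int d = numa_distance(home, n);   /* 10 = local; larger = farther */
        if (d > 0 && d < best_dist) {
            best_dist = d;
            best = n;
        }
    }
    return best;   /* the worker would then pop from this node's queue */
}
```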

Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric Memory

Han Zhao, Quan Chen, Yuxian Qiu, Ming Wu, Yao Shen, Jingwen Leng, Chao Li, Minyi Guo
2018 ACM Transactions on Architecture and Code Optimization (TACO)  
To solve the two problems, we propose a Bandwidth and Locality Aware Task-stealing (BATS) system, which consists of an HBM-aware data allocator, a bandwidth-aware traffic balancer, and a hierarchical task-stealing  ...  Parallel computers are starting to adopt bandwidth-asymmetric memory architectures that combine traditional DRAM with new High Bandwidth Memory (HBM) for higher memory bandwidth.  ...  Like state-of-the-art locality-aware task-stealing schedulers such as LAWS [9] and RELWS [25], BATS targets programs that use parallelized data initialization.  ... 
doi:10.1145/3291058 fatcat:tp6lvtiwhrehhpjdh4m3hb4bja
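The TACO entry above pairs task stealing with an HBM-aware data allocator. A hedged sketch of HBM-preferred placement with the memkind library, standing in for that allocator (BATS itself is not a public API), is shown below: bandwidth-bound arrays prefer the high-bandwidth nodes and fall back to DRAM when none exist. Build with -lmemkind.

```c
/* Allocate bandwidth-critical data in HBM when available, DRAM otherwise. */
#include <memkind.h>
#include <stddef.h>

void *alloc_bandwidth_bound(size_t bytes)
{
    /* MEMKIND_HBW_PREFERRED silently falls back to ordinary memory on
       machines without a high-bandwidth NUMA node. */
    return memkind_malloc(MEMKIND_HBW_PREFERRED, bytes);
}

void free_bandwidth_bound(void *p)
{
    memkind_free(MEMKIND_HBW_PREFERRED, p);
}
```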
Showing results 1 — 15 out of 884 results