Runtime-Aware Architectures: A First Approach

2014 Supercomputing Frontiers and Innovations  
As a consequence, the role of decoupling again applications from the hardware was moved to the runtime system.  ...  In this paper, we introduce a first approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime's perspective.  ...  This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, the HiPEAC Network of Excellence, and by the European Research Council under the European  ... 
doi:10.14529/jsfi140102 fatcat:4bh33566cfbz7iylsf2ufppsfa

Runtime-Assisted Cache Coherence Deactivation in Task Parallel Programs

Paul Caheny, Lluc Alvarez, Mateo Valero, Miquel Moreto, Marc Casas
2018 SC18: International Conference for High Performance Computing, Networking, Storage and Analysis  
To reduce the area and power needs of the directory, recent proposals reduce its size by classifying data as private or shared, and disable coherence for private data.  ...  With increasing core counts, the scalability of directory-based cache coherence has become a challenging problem.  ...  The input and output information allows the runtime system to transparently manage GPUs [47] , [58] , stacked DRAM memories [59] , multi-node clusters [60] , and scratchpad memories [61] .  ... 
doi:10.1109/sc.2018.00038 fatcat:5zfpsxgbxbf7vfg36ng2k5jg6y

FLASH 1.0: A Software Framework for Rapid Parallel Deployment and Enhancing Host Code Portability in Heterogeneous Computing [article]

Michael Riera, Masudul Hassan Quraishi, Erfan Bank Tavakoli, Fengbo Ren
2021 arXiv   pre-print
In this paper, we present FLASH 1.0, a C++-based software framework for rapid parallel deployment and enhancing host code portability in heterogeneous computing.  ...  portability with a normalized framework overhead between 1% - 13% of the total kernel runtime.  ...  transparently scaling up kernel execution across multiple heterogeneous accelerators on a single node with built-in automated synchronization and partitioning. • We develop the device runtimes for supporting  ... 
arXiv:2106.13645v1 fatcat:fuu2rc6abffhnibbv7wuc5hn4q

Message from the Program Co-chairs

2010 2010 19th IEEE Asian Test Symposium  
for the upcoming generation of heterogeneous parallel systems.  ...  many-core architectures • New approaches for leveraging on-die messaging facilities • Traditional and new programming models for novel many-core hardware • Concepts for runtime systems on novel many-core  ...  ACKNOWLEDGMENTS The authors would like to thank Adam Lackorzyński and Alexander Warg for sharing their Fiasco.OC knowledge.  ... 
doi:10.1109/ats.2010.6 fatcat:iadg4ce5pbernnmwnjzkj4tnba

Message from the Program Co-Chairs

2013 2013 International Conference on Computer and Robot Vision  
for the upcoming generation of heterogeneous parallel systems.  ...  many-core architectures • New approaches for leveraging on-die messaging facilities • Traditional and new programming models for novel many-core hardware • Concepts for runtime systems on novel many-core  ...  ACKNOWLEDGMENTS The authors would like to thank Adam Lackorzyński and Alexander Warg for sharing their Fiasco.OC knowledge.  ... 
doi:10.1109/crv.2013.5 fatcat:qahsqru4wbdsjeksn742a4fyiy

A Framework for Efficient Execution of Data Parallel Irregular Applications on Heterogeneous Systems

Roberto Ribeiro, João Barbosa, Luís Paulo Santos
2015 Parallel Processing Letters  
This not being the case, consumer kernels can still be used for the irregular application.  ...  Comparisons with an alternative framework, StarPU, which targets regular workloads, consistently demonstrate significant speedups.  ...  Acknowledgements This work is funded by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) and by ERDF - European Regional Development  ... 
doi:10.1142/s0129626415500048 fatcat:myuixzcz7fbctcy7mbcxulpyiy

Memory-Efficient Object-Oriented Programming on GPUs [article]

Matthias Springer
2019 arXiv   pre-print
Object-oriented programming is often regarded as too inefficient for high-performance computing (HPC), despite the fact that many important HPC problems have an inherent object structure.  ...  We first develop an embedded C++ Structure of Arrays (SOA) data layout DSL for SMMO applications. We then design a lock-free, dynamic memory allocator that stores allocations in SOA layout.  ...  Such weak behaviors are in part due to incoherent L1 caches. For example, the old value of a may still be in t 2 's L1 cache while b is a cache miss.  ... 
arXiv:1908.05845v1 fatcat:5o5fn5jcbbfjhl4ikg65tml7by

Resource Allocation for Software Pipelines in Many-core Systems

Janmartin Jahn
This dissertation tackles a major challenge, resource allocation, for complex, memory-intensive applications.  ...  However, as OpenMP focuses on shared-memory systems, parallel OpenMP programs are not suitable for systems without shared memory and cache coherence.  ...  OpenMP OpenMP is an Application Programming Interface (API) for shared-memory parallel programming in C/C++ and Fortran programs.  ... 
doi:10.5445/ir/1000040472 fatcat:2475netffrarvczqwjttkacmla

Learning Capacity in Simulated Virtual Neurological Procedures

Mattia Samuel Mancosu, Silvester Czanner, Martin Punter
2020 Journal of WSCG  
ACKNOWLEDGMENTS The authors acknowledge the support of the NSERC/Creaform Industrial Research Chair on 3-D Scanning for conducting the work presented in this paper.  ...  ACKNOWLEDGEMENTS The authors would like to thank Oana Rotaru-Orhei for her comments and the three anonymous reviewers for their insightful suggestions.  ...  The significance of our work is that we have found a value of epsilon which works across the image data in slicer3D available for our research.  ... 
doi:10.24132/csrn.2020.3001.13 fatcat:uytlm7nytrhmnk553ellfhl54a

Dynamic task scheduling and binding for many-core systems through stream rewriting

Lars Middendorf
Basically, the active tasks of an application and their dependencies are encoded as a token stream, which is iteratively modified by a set of rewriting rules at runtime.  ...  This thesis proposes a novel model of computation, called stream rewriting, for the specification and implementation of highly concurrent applications.  ...  Christian Haubelt for his guidance  ... 
doi:10.18453/rosdok_id00001530 fatcat:ikxsq7kcuvb7phvraau7c743zm

A multitasking and data-driven architecture for multi-agents simulations

Sébastien Schertenleib
From an historical standpoint, 3DRTS started principally as homebrew developments. The underlined consequences are the lack of standardization for producing such applications.  ...  As an outcome, the next-generation of computer hardware and home consoles are presenting multitasking architectures.  ...  This happens when data are shared across components.  ... 
doi:10.5075/epfl-thesis-3545 fatcat:p7mjymobsvdwdbnftyxefeoqmm

50 Algebra in Computational Complexity (Dagstuhl Seminar 14391) Manindra Agrawal, Valentine Kabanets, Thomas Thierauf, and Christopher Umans 85 Privacy and Security in an Age of Surveillance (Dagstuhl Perspectives Workshop

Maria-Florina Balcan, Bodo Manthey, Heiko Röglin, Tim Roughgarden, Artur D'avila Garcez, Marco Gori, Pascal Hitzler, Luís Lamb, Bart Preneel, Phillip Rogaway, Mark Ryan, Peter (+5 others)
Current HPC-related standards (MPI, OpenMP, OpenACC) do not seem suitable since resilience cuts across concrete runtime environments and may also extend beyond HPC to Clouds and data centers involving  ...  Management of forthcoming exascale clusters requires frequent collection and sharing of runtime information about the health of the nodes, their resources and the running applications.  ...  all-pairs shortest paths problem on n-node graphs with edge weights in [0, n k ] (for arbitrary k) running in n 3 /2 (log n) δ time for an unspecified δ > 0.  ... 

INFN-CNAF Annual Report 2016 Editors Luca dell'Agnello Cover Design

Lucia Morganti, Elisabetta Ronchieri, Francesca Address
In Europe the work is supported by the Italian National Institute for Nuclear Physics (INFN), the Italian University and Research Ministry (MIUR), and the University of Geneva.  ...  Serra from Advansid srl for useful discussions on SiPM problematics.  ...  The fair share works by continuously updating a dynamic priority index for each user: it is increased for those having less runtime than expected, and reduced for those running more.  ... 

Advanced visualization techniques for flow simulations : from higher-order polynomial data to time-dependent topology [article]

Markus Üffinger, Universität Stuttgart, Universität Stuttgart
An indispensable instrument for such analysis is provided by computational flow visualization.  ...  Computational fluid dynamics (CFD) has become an important tool for predicting fluid behavior in research and industry.  ...  An explanation for this behavior might be that newer GPUs with their larger caches are less sensitive to incoherence during computation, which, for example, is induced by the varying step sizes.  ... 
doi:10.18419/opus-6440 fatcat:zphqre7c3zg67coau3dx3vrpvm

Multi-Core Memory Models and Concurrency Theory (Dagstuhl Seminar 11011) Feature-Oriented Software Development (FOSD) (Dagstuhl Seminar 11021) Multimodal Music Processing (Dagstuhl Seminar 11041) Learning from the Past: Implications for the Future Internet and its Management? (Dagstuhl Seminar 11042) Sparse Representations and Efficient Sensing of Data (Dagstuhl Seminar 11051)

Hans Boehm, Ursula Goltz, Holger Hermanns, Peter Sewell, Sven Apel, William Cook, Krzysztof Czarnecki, Oscar, Zhenjiang Hu, Andy Schürr, Perdita Stevens, James (+12 others)
2011 unpublished
The need for self-configuration in access networks, programmable nodes (measurement is an important case on layer 3). 12.  ...  An approach based on rewriting logic for detecting inconsistencies in heterogeneous model-based specifications and to synchronise their constituent models will be presented.  ...  Ongoing work [3] considers lattices as spatial discretization for the hyperbolic cross fast Fourier transform.  ... 