
Preparing sparse solvers for exascale computing

Hartwig Anzt, Erik Boman, Rob Falgout, Pieter Ghysels, Michael Heroux, Xiaoye Li, Lois Curfman McInnes, Richard Tran Mills, Sivasankaran Rajamanickam, Karl Rupp, Barry Smith, Ichitaro Yamazaki (+1 others)
2020 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences  
This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms.  ...  Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms.  ...  in machine learning, etc.  ... 
doi:10.1098/rsta.2019.0053 pmid:31955673 fatcat:bqw6xqixbrabddmxglmtcbw2wa

Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning [article]

Aleksei Petrenko, Zhehui Huang, Tushar Kumar, Gaurav Sukhatme, Vladlen Koltun
2020 arXiv   pre-print
Our architecture combines a highly efficient, asynchronous, GPU-based sampler with off-policy correction techniques, allowing us to achieve throughput higher than 10^5 environment frames/second on non-trivial  ...  We present the "Sample Factory", a high-throughput training system optimized for a single-machine setting.  ...  We associate each computational workload with one of three dedicated types of components. These components communicate with each other using a fast protocol based on FIFO queues and shared memory.  ... 
arXiv:2006.11751v2 fatcat:65ajk4my5jaeljhdyb25vstmxe

Concurrent Computing in the Many-core Era (Dagstuhl Seminar 15021)

Michael Philippsen, Pascal Felber, Michael L. Scott, J. Eliot B. Moss, Marc Herbstritt
2015 Dagstuhl Reports  
and potential uses of emerging hardware support for synchronization extensions, and (3) considering the increasing complexity resulting from the explosion in heterogeneity.  ...  This seminar is a successor to Dagstuhl Seminars 08241 "Transactional memory: From implementation to application" and 12161 "Abstractions for scalable multicore computing", respectively held in June 2008  ...  Can introduce message passage, shared memory, and things like memory models later, in appropriate contexts.  ... 
doi:10.4230/dagrep.5.1.1 dblp:journals/dagstuhl-reports/PhilippsenFSM15 fatcat:owcmta65hzb5vmglwq3dwzbehy

The "MIND" scalable PIM architecture [chapter]

Thomas Sterling, Maciej Brodowicz
2005 Advances in Parallel Computing  
MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components.  ...  MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction  ...  Whittaker of NASA/JPL for lending his unparalleled expertise in the field of logic and VLSI design, and countless hours spent in discussions leading to the refinement of the architectural components of  ... 
doi:10.1016/s0927-5452(05)80010-3 fatcat:m7w6wqjxjrg3zowstdubiknvne

Migration in Hardware Transactional Memory on Asymmetric Multiprocessor

Zivojin Sustran, Jelica Protic
2021 IEEE Access  
Therefore, the proposed solution should be fully implemented in hardware.  ...  The experiments were performed using a significantly upgraded Gem5 simulator and eight parallel applications from the STAMP benchmark suite.  ...  The machine learning is frequently used to achieve better performance in systems with transactional memory [56]. The focus in [11]-[13] is on the fairness of scheduling.  ... 
doi:10.1109/access.2021.3077539 fatcat:hddugctl5vfknpkml4tvi5i5ii

High Performance Computing in Satellite SAR Interferometry: A Critical Perspective

Pasquale Imperatore, Antonio Pepe, Eugenio Sansosti
2021 Remote Sensing  
Existing implementations of the different InSAR stages using diverse parallel strategies and architectures are examined and their performance discussed.  ...  Synthetic aperture radar (SAR) interferometry has rapidly evolved in the last decade and can be considered today as a mature technology, which incorporates computationally intensive and data-intensive  ...  Figure 6. Scheme for multiple threads on a shared memory machine.  ... 
doi:10.3390/rs13234756 fatcat:flsy4h75gzdvzgjpfx4xwivem4

The Paramountcy of Reconfigurable Computing [chapter]

Reiner Hartenstein
2012 Energy-Efficient Distributed Computing Systems  
To obtain the payoff from RC we need a new understanding of computing and supercomputing, as well as of the use of accelerators (section 19.6.3).  ...  But brute force disruptive architectural developments in industry and threatening unaffordable operation cost by excessive power consumption are a massive future survival problem for our existing cyber  ...  We have to cope with a wide classification variety of hardware architectures: multi-threading, homogeneous vs. heterogeneous, message passing, shared memory, UMA or NUMA, symmetric (SMP) vs  ... 
doi:10.1002/9781118342015.ch18 fatcat:shfb4oycu5hu5boizx6oltlgwa

Pico: A Domain-Specific Language For Data Analytics Pipelines

Claudia Misale, Marco Aldinucci, Guy Tremblay
2017 Zenodo  
., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low level with high level aspects, as we are often used to see in state-of-the-art Big Data analytics  ...  Although each tool claims to provide better programming, data and execution models—for which only informal (and often confusing) semantics is generally provided—all share a common underlying model, namely  ...  Acknowledgements Funding This work has been partially supported by the Italian Ministry of Education and Research (MIUR), by the EU-H2020 RIA project "Toreador" (no. 688797), the EU-H2020 RIA project  ... 
doi:10.5281/zenodo.579753 fatcat:aadje57qh5hk3ijmqn4j7vkhpm

Synchronization methods in parallel and distributed discrete-event simulation

Shafagh Jafer, Qi Liu, Gabriel Wainer
2013 Simulation modelling practice and theory  
When LP1 executes event e1 (whose timestamp is 1), it generates and sends a new event message e2 to LP2 (with a timestamp of 2).  ...  Synchronous operation The first techniques developed for solving these problems proposed different centralized and decentralized mechanisms for implementing global clocks, and they used synchronous operations  ...  The need for a common memory pool, nonetheless, makes these two approaches best suited for shared-memory architectures.  ... 
doi:10.1016/j.simpat.2012.08.003 fatcat:azaynxjyingybamdj2whgo5bme

A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks [article]

Sanaa Hamid Mohamed, Taisir E.H. El-Gorashi, Jaafar M.H. Elmirghani
2019 arXiv   pre-print
This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks.  ...  The MapReduce programming model and its widely-used open-source platform; Hadoop, are enabling the development of a large number of cloud-based services and big data applications.  ...  This work was supported by the Engineering and Physical Sciences Research Council, INTERNET (EP/H040536/1), STAR (EP/K016873/1) and TOWS (EP/S016570/1) projects.  ... 
arXiv:1910.00731v1 fatcat:kvi3br4iwzg3bi7fifpgyly7m4

A novel parallel learning algorithm for pattern classification

Yi Wang, Jian Fu, Bingyang Wei
2019 SN Applied Sciences  
Margin setting algorithm (MSA) is a novel machine learning algorithm for pattern classification.  ...  To reduce the execution time during classification, a parallel implementation of MSA, called PMSA is proposed for multicore and multiprocessor system.  ...  There are POSIX thread, Open Multi-Processing (OpenMP) and Message Passing Interface (MPI) for shared and distributed memory architectures [14].  ... 
doi:10.1007/s42452-019-1687-6 fatcat:hb4bor76cnazjgbdks7xorb2cy

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic [article]

Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu (+13 others)
2020 arXiv   pre-print
Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in  ...  As we expect the reader to be familiar with the basics of numerical linear algebra, we refrain from providing a detailed background on the algorithms themselves but focus on how mixed- and multiprecision  ...  Acknowledgments This work was supported by the US Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S.  ... 
arXiv:2007.06674v1 fatcat:o5bkpov6bfd6fborkmcujgrwpu

D6.4: Report on approaches to Petascaling

Mohammad Jowkar, Carlo Cavazzoni, Xu Guo, Giorgos Goumas
2009 Zenodo  
The work done in task 6.4 will together with task 6.5 be used in task 6.3 to create a benchmark set.  ...  Task 6.3 in return, will be used by task 5.4 for evaluating and comparing potential future petaflop/s systems.  ...  The code authors try to keep a uniform format throughout the application. Generally useful comments are found in most parts of the code, but mostly in German.  ... 
doi:10.5281/zenodo.6546112 fatcat:rsmdzoeqbbbdzoe2zkx3czi2ry

Parallel Logic Programming: A Sequel [article]

Agostino Dovier, Andrea Formisano, Gopal Gupta, Manuel V. Hermenegildo, Enrico Pontelli, Ricardo Rocha
2022 arXiv   pre-print
The goal of the survey is to serve not only as a reference for researchers and developers of logic programming systems, but also as engaging reading for anyone interested in logic and as a useful source  ...  Since its inception, logic programming has been recognized as a programming paradigm with great potential for automated exploitation of parallelism.  ...  The approaches for the explicit description of parallelism in logic programming can be largely classified into three categories: (1) message passing; (2) shared memory; and (3) data-flow.  ... 
arXiv:2111.11218v2 fatcat:hek4fidju5fblprut2squ6o3rm

EURETILE D7.3 - Dynamic DAL benchmark coding, measurements on MPI version of DPSNN-STDP (distributed plastic spiking neural net) and improvements to other DAL codes [article]

Pier Stanislao Paolucci, Iuliana Bacivarov, Devendra Rai, Lars Schor, Lothar Thiele, Hoeseok Yang, Elena Pastorelli, Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Francesco Simula (+2 others)
2014 arXiv   pre-print
The project is about the software and hardware architecture of future many-tile distributed fault-tolerant systems.  ...  The EURETILE project required the selection and coding of a set of dedicated benchmarks.  ...  In a hardware implementation, based on several independent memory banks, if all synapses incoming to the same neuron were stored in contiguity, this task could be easily accelerated. If the barrier  ... 
arXiv:1408.4587v1 fatcat:dl6xlqx7cba4hlvfqvhv6blavq