Benchmarking weak memory models

Carl G. Ritson, Scott Owens
2016 Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '16  
To achieve good multi-core performance, modern microprocessors have weak memory models, rather than enforce sequential consistency.  ...  In particular, our technique supports the reasoned selection of macrobenchmarks to use in investigating trade-offs in using weak memory models.  ...  Introduction The complexity of weak memory consistency models (WMMs), as implemented in modern hardware (x86, ARM, POWER, etc.), makes the challenging task of writing correct and efficient concurrent programs  ... 
doi:10.1145/2851141.2851150 dblp:conf/ppopp/RitsonO16 fatcat:sbysez2eyncp7mie25g4vvmx74

Extending the BT NAS Parallel Benchmark to exascale computing

Rob F. Van der Wijngaart, Srinivas Sridharan, Victor W. Lee
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
The NAS Parallel Benchmarks (NPB) are a well-known suite of benchmarks that proxy scientific computing applications.  ...  We discuss how scaling BT would impact computation, memory access, and communications, and highlight the expected bottleneck, which turns out to be not memory or communication bandwidth, but network latency  ...  PRECISE ANALYTICAL MODEL In this section we first describe the precise model of computation, memory usage, and communication for our target benchmark BT.  ... 
doi:10.1109/sc.2012.55 dblp:conf/sc/WijngaartSL12 fatcat:lfnhjnwmdrdsllljgi3klamo6q

GenMC: A Model Checker for Weak Memory Models [article]

Michalis Kokologiannakis, Viktor Vafeiadis
2021 Zenodo  
This is the artifact accompanying our paper "GenMC: A Model Checker for Weak Memory Models", accepted for publication at CAV 2021.  ...  Apart from the claims above, other minor claims regarding GenMC's features are made throughout the paper (e.g., memory model support, spinloop handling, etc).  ...  The artifact (available on Zenodo) consists of a Docker image containing binaries for all the model checking tools used, along with all the benchmarks used in the submitted version of our paper, and GenMC's  ... 
doi:10.5281/zenodo.4884947 fatcat:uso2wcoahrfcbmpx4ti2sb3vda

Petascale Block-Structured AMR Applications without Distributed Meta-data [chapter]

Brian Van Straalen, Phil Colella, Daniel T. Graves, Noel Keen
2011 Lecture Notes in Computer Science  
Both show good weak scaling to 131K processors without any thread-level or SIMD vector parallelism.  ...  Therefore, we focus on a methodology for constructing weak-scaled AMR benchmarks because this methodology models the dominant usecase for scientific problems that employ this computational method.  ...  Figure 3 shows the memory usage for a sample weak scaling run of a gas dynamics solver.  ... 
doi:10.1007/978-3-642-23397-5_37 fatcat:6bkihbqxxbdvdi4xh6xfz64o64

GenMC: A Model Checker for Weak Memory Models [article]

Michalis Kokologiannakis, Viktor Vafeiadis
2021 Zenodo  
This is the artifact accompanying our paper "GenMC: A Model Checker for Weak Memory Models", conditionally accepted for publication at CAV 2021.  ...  Apart from the claims above, other minor claims regarding GenMC's features are made throughout the paper (e.g., memory model support, spinloop handling, etc).  ...  A.7.8 Reproducing Claim 8 (< 5m). ./memory.sh Runs GenMC on a benchmark similar to the one of Section 6.3 in the paper. A memory race is detected.  ... 
doi:10.5281/zenodo.4722967 fatcat:7skmlq2al5gidd6rohx5banarm

TentacleNet: A Pseudo-Ensemble Template for Accurate Binary Convolutional Neural Networks [article]

Luca Mocerino, Andrea Calimera
2019 arXiv   pre-print
Experimental results collected over three realistic benchmarks show TentacleNet fills the gap left by classical binary models, ensuring substantial memory savings w.r.t. state-of-the-art binary ensemble  ...  Despite the unquestionable savings offered, memory footprint above all, it may induce an excessive accuracy loss that prevents a widespread use.  ...  BENN boosting) with large memory savings (44.4%). To be noted that the memory footprint of the smallest BENN (20629 kB) gets bigger than the original FP32 model (19984 kB).  ... 
arXiv:1912.10103v2 fatcat:intkoln2xfc67efz2fvu46nq4a

Performance Modeling of Hybrid MPI/OpenMP Scientific Applications on Large-scale Multicore Cluster Systems

Xingfu Wu, Valerie Taylor
2011 2011 14th IEEE International Conference on Computational Science and Engineering  
We use STREAM memory benchmarks to provide initial performance analysis and model validation of MPI and OpenMP programs on these multicore clusters because the measured sustained memory bandwidth can provide  ...  In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code in magnetic fusion to validate our performance model of  ...  We use the OpenMP models to model the performance of the hybrid GTC because the hybrid GTC is a weak scaling application.  ... 
doi:10.1109/cse.2011.42 dblp:conf/cse/WuT11 fatcat:jdmv3dqdgzd2fji5wqvntnslfe

Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

Xingfu Wu, Valerie Taylor
2013 Journal of computer and system sciences (Print)  
We use STREAM memory benchmarks and Intel's MPI benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers because the measured  ...  In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model  ...  , Intel's MPI benchmarks and the hybrid MPI/OpenMP GTC, and presented a practical performance modeling framework for weak-scaling hybrid MPI/OpenMP applications based on the memory bandwidth contention  ... 
doi:10.1016/j.jcss.2013.02.005 fatcat:endtpctavzgmtpd7qjl52tahse

Cache coherence for GPU architectures

I. Singh, A. Shriraman, W. W. L. Fung, M. O'Connor, T. M. Aamodt
2013 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)  
Moreover, these protocols increase the verification complexity of the GPU memory system.  ...  By providing coherent L1 caches, TC-Weak improves the performance of GPU applications with inter-workgroup communication by 85% over disabling the non-coherent L1 caches in the baseline GPU.  ...  TC-Weak uses timestamps to drive all consistency operations. It implements Release Consistency [19], enabling full support of C++ and Java memory models [58] on GPUs.  ... 
doi:10.1109/hpca.2013.6522351 dblp:conf/hpca/SinghSFOA13 fatcat:jim536rppvbpdkl4zuqvdrpbxy

Regional Consistency: Programmability and Performance for Non-Cache-Coherent Systems [article]

Bharath Ramesh, Calvin J. Ribbens, Srinidhi Varadarajan
2013 arXiv   pre-print
Results on up to 256 processors for representative benchmarks demonstrate the potential of RegC in the context of our prototype distributed shared memory system.  ...  Our primary objective is to define a memory consistency model that presents the familiar thread-based shared memory programming model, but allows good application performance on non-cache-coherent systems  ...  The weak consistency (WC) model, one of the earliest weak models, differentiates shared data into two categories: data that has no effect on concurrent execution, and data that includes synchronization  ... 
arXiv:1301.4490v1 fatcat:6vouqq535fa5ldk6egitb2kzlq

Regional Consistency: Programmability and Performance for Non-cache-coherent Systems

Bharath Ramesh, Calvin J. Ribbens, Srinidhi Varadarajan
2013 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications  
Results on up to 256 processors for representative benchmarks demonstrate the potential of RegC in the context of our prototype distributed shared memory system.  ...  Our primary objective is to define a memory consistency model that presents the familiar thread-based shared memory programming model, but allows good application performance on non-cache-coherent systems  ...  The weak consistency (WC) model, one of the earliest weak models, differentiates shared data into two categories: data that has no effect on concurrent execution, and data that includes synchronization  ... 
doi:10.1109/trustcom.2013.115 dblp:conf/trustcom/RameshRV13 fatcat:bc7egm6ufng4zawzojixfx2z2e

Performance Evaluation of ParalleX Execution model on Arm-based Platforms [article]

Nikunj Gupta, Rohit Ashiwal, Bine Brank, Sateesh K. Peddoju, Dirk Pleiter
2020 arXiv   pre-print
In this paper, we port an Asynchronous Many-Task runtime system based on the ParalleX model, i.e., High Performance ParalleX (HPX), and evaluate it on the Arm ecosystem with a suite of benchmarks.  ...  We wrote these benchmarks with an emphasis on vectorization and distributed scaling.  ...  In this paper, we execute several benchmarks on an AMT based on the ParalleX model, i.e. HPX. We investigate both distributed and shared memory models with a special emphasis on vectorization.  ... 
arXiv:2010.12195v1 fatcat:uofh3nlz7fg43fsiwexjdji6ci

ARM-Powered Numerical Weather Prediction: Running the ECMWF Model on Fugaku

Sam Hatfield
2022 Zenodo  
Taking advantage of a collaboration between the European Centre for Medium-Range Weather Forecasts (ECMWF) and R-CCS, we have been evaluating the IFS global atmospheric model on Fugaku for the purposes  ...  In this poster we will recount our experiences in porting the IFS to Fugaku and provide benchmark comparisons with ECMWF's brand new AMD-powered supercomputer, the Atos BullSequana XH2000.  ...  memory to run 12 MPI ranks per node, and so we only use 6  ...  High-performance and portable spectral transforms with ecTrans  ...  ECMWF has started to make certain components of its model open source  ...  Notably  ... 
doi:10.5281/zenodo.6806031 fatcat:guh6fcinwfgqzl4ieuy6mtav2a

Framework Support for the Efficient Implementation of Multi-version Algorithms [chapter]

Ricardo J. Dias, Tiago M. Vale, João M. Lourenço
2015 Lecture Notes in Computer Science  
In transactional memory multi-version algorithms, several versions of the same memory location may exist.  ...  Software Transactional Memory algorithms associate metadata with the memory locations accessed during a transaction's lifetime.  ...  The adaptation of these algorithms to support a weak-atomicity model is straightforward.  ... 
doi:10.1007/978-3-319-14720-8_8 fatcat:2ng7e2npprhyzg2wcoxpmhefp4

Performance Modeling of Maximal Sharing

Michael J. Steindorfer, Jurgen J. Vinju
2016 Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering - ICPE '16  
We used a combination of new techniques to predict the impact of maximal sharing on existing code: Object Redundancy Profiling (ORP) to model the effect on memory when sharing all immutable objects, and  ...  on applying MASHO to real and complex case: we conclude that ORP and ECP combined can accurately predict gains and losses of maximal sharing, and also that (by isolating variables) a cheap predictive model  ...  Under the assumption of weak immutability, fingerprinting leads to an accurate model efficiently.  ... 
doi:10.1145/2851553.2851566 dblp:conf/wosp/SteindorferV16 fatcat:lfegfjicrvf3jofg7ts4ehrd3i