
Towards dynamic reconfigurable load-balancing for hybrid desktop platforms

Alecio P. D. Binotto, Carlos E. Pereira, Dieter W. Fellner
2010 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)  
It proposes extending OpenCL with a module that schedules and balances the workload over the CPU and GPU for the specific case study at a high level.  ...  Innovation The approach abstracts the PUs using the OpenCL API as the platform-independent programming model.  ...  A. Binotto thanks the support given by DAAD and Alβan, scholarship no. E07D402961BR.  ... 
doi:10.1109/ipdpsw.2010.5470804 dblp:conf/ipps/BinottoPF10 fatcat:yiorju2tondnbfgfsrw5gcf7xm

An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms

Alecio P. D. Binotto, Carlos E. Pereira, Arjan Kuijper, Andre Stork, Dieter W. Fellner
2011 2011 IEEE International Conference on High Performance Computing and Communications  
It has been a significant research and personal challenge, and it is one of the most important steps in my career.  ...  To reach this goal, personal, technical, and financial support were all needed; without any one of them I could not have developed this work.  ...  Based on related work, there is also a need to apply those concepts to solvers for SLEs or general tasks over the asymmetric CPU-GPU platform.  ... 
doi:10.1109/hpcc.2011.20 dblp:conf/hpcc/BinottoPKSF11 fatcat:bjdij42z5fe7dmfykjjj3n7p74

Clock Math — a System for Solving SLEs Exactly

Jakub Hladík, Róbert Lórencz, Ivan Šimeček
2013 Acta Polytechnica  
The system is capable of solving Hilbert's matrix without losing a single bit of precision, and with a significant speedup compared to existing CPU solvers.  ...  In this paper, we present a GPU-accelerated hybrid system that solves ill-conditioned systems of linear equations exactly. Here, "exactly" means without rounding errors, which is achieved by using integer arithmetic.  ...  support; • adjust and run the solver on our university STAR cluster to test AMD's OpenCL CPU implementation.  ... 
doaj:425cbe3e525242bbbf2cb1e386dc195d fatcat:epy6kbbxnbfj5fd7nbgafsmbfy

Benchmarking the first generation of production quality Arm‐based supercomputers

Simon McIntosh‐Smith, James Price, Andrei Poenaru, Tom Deakin
2019 Concurrency and Computation  
The first system is Isambard, a Cray XC50 "Scout" system operated by the GW4 Alliance and the UK Met Office as a Tier-2 national HPC service.  ...  Both systems use Marvell ThunderX2 CPUs, which deliver high core counts and class-leading memory bandwidth.  ...  ACKNOWLEDGMENTS As the world's first production Arm supercomputer, the GW4 Isambard project could not have happened without support from a lot of people.  ... 
doi:10.1002/cpe.5569 fatcat:mc6hueaxsncr7cadxgfmjvowim

A performance analysis of the first generation of HPC‐optimized Arm processors

Simon McIntosh‐Smith, James Price, Tom Deakin, Andrei Poenaru
2019 Concurrency and Computation  
Figure 3 compares the performance of our target platforms over a range of representative mini-apps.  ...  First, the wider vectors in the x86 CPUs give them a significant peak floating-point advantage over ThunderX2.  ... 
doi:10.1002/cpe.5110 fatcat:jzyeber3dve5xf23e5jqphkaxe

Large-scale parallelism for constraint-based local search: the costas array case study

Yves Caniou, Philippe Codognet, Florian Richoux, Daniel Diaz, Salvador Abreu
2014 Constraints  
We present the parallel implementation of a constraint-based Local Search algorithm and investigate its performance on several hardware platforms with several hundreds or thousands of cores.  ...  Performance evaluation on some classical CSP benchmarks shows that speedups are very good for a few tens of cores, and good up to a few hundreds of cores.  ...  In this paper we address the issue of parallelizing constraint solvers for massively parallel architectures, with the aim of tackling platforms with several thousands of CPUs.  ... 
doi:10.1007/s10601-014-9168-4 fatcat:7ih73pvltvdebchs3erjchjhte

D9.3.3: Report on prototypes evaluation

Lennart Johnsson, Gilbert Netzer
2013 Zenodo  
DSPs common for embedded systems and with a TDP about one order of magnitude less than x86 CPUs, the emerging heterogeneous CPUs integrating x86 and GPU cores, and traditional GPUs with a novel direct  ...  Prototype efforts assessed the use of FPGAs for function acceleration, the use of CPUs for the mobile market and with a TDP about two orders of magnitude less than typical x86 CPUs for the HPC market,  ...  The GPU based prototype did not have a significant advantage over x86 CPUs from an energy efficiency perspective.  ... 
doi:10.5281/zenodo.6553033 fatcat:nvxbrlq5jzdfhbkh5fde3kpl4e

D9.2.1: First Report on Multi-Petascale to Exascale Software

Volker Strumpen
2011 Zenodo  
Since the recent end of frequency scaling, we observe a rapid evolution of power-efficient computer architectures.  ...  Tegra 3 contains three building blocks, a quad-core CPU, a 12-core GPU, and a special-purpose H.264 video decoder.  ...  A CUDA program consists of one or more sections that are executed on either the host CPU or a GPU device.  ... 
doi:10.5281/zenodo.6552877 fatcat:2bluglv435ew5chaauw2ppdhl4

Etude de la Distribution de Calculs Creux sur une Grappe Multi-coeurs [article]

Mouadh Ayachi
2019 arXiv   pre-print
Thus, it is necessary in this case to run these applications on parallel architectures that make multiple computers work together and perform over 10^15 floating-point operations per second (or a petaflops  ...  A sparse matrix is a very large matrix that contains a small proportion of non-zero elements.  ...  NVIDIA: Tegra X1 is a processor with an 8-core, 64-bit ARM® CPU and a 256-core NVIDIA Maxwell GPU; Tegra K1 has an NVIDIA 4-Plus-1™ Quad-Core ARM Cortex-A15 "r3" CPU and a GPU with 192 NVIDIA  ... 
arXiv:1902.02156v1 fatcat:gwfikusiznfevpr3ulk2nycpo4

D6.6: Report on petascale software libraries and programming models

Giovanni Erbacci, Carlo Cavazzoni, Filippo Spiga, Iris Christadler
2009 Zenodo  
The work starts from an analysis of the applications mainly identified in D6.1 and D6.2.2, covering a broad range of scientific areas, and representative of the European HPC usage, to assess the state  ...  OpenCL OpenCL (Open Computing Language, [39] ) is a framework for writing programs that execute across platforms consisting of CPUs, GPUs, DSPs and CELL processors.  ...  dense domain 30m 25 1 V3: sparse subdomain, loop over sparse domain 120m 37 1 V4: arrays distributed, workaround iterator, inner reduction; still serial execution 30m 40 1 V5: proper iterator from Chapel  ... 
doi:10.5281/zenodo.6546116 fatcat:ojachtibqfeorbsef6z2lpvbie

RAPTOR: Robust and Perception-aware Trajectory Replanning for Quadrotor Fast Flight [article]

Boyu Zhou, Jie Pan, Fei Gao, Shaojie Shen
2020 arXiv   pre-print
We also introduce a perception-aware planning strategy to actively observe and avoid unknown obstacles.  ...  In this paper, we present RAPTOR, a robust and perception-aware replanning framework to support fast and safe flight.  ...  All simulations run on an Intel Core i7-8700K CPU and GeForce GTX 1080 Ti GPU.  ... 
arXiv:2007.03465v1 fatcat:igrkpoj5qjc2hhif6ba5g27yzy

Recent Developments and Applications of the MMPBSA Method

Changhao Wang, D'Artagnan Greene, Li Xiao, Ruxi Qi, Ray Luo
2018 Frontiers in Molecular Biosciences  
We conclude with a few future directions aimed at making MMPBSA a more robust and efficient method. Bhavaraju, M., and Hansmann, U. H. E. (2015).  ...  Effect of single point mutations in a form of systemic amyloidosis. (2016). Binding of ACE-inhibitors to in vitro and patient-derived amyloid-beta fibril models.  ...  After extensive testing, the optimal GPU performance was observed using the Jacobi-preconditioned CG solver with a significant speedup that was up to 50 times faster than the standard CG solver on CPU.  ... 
doi:10.3389/fmolb.2017.00087 pmid:29367919 pmcid:PMC5768160 fatcat:oyvpseomy5b63ekx2psvi5dusy

A Comprehensive Survey on Secure Outsourced Computation and its Applications

Yang Yang, Xindi Huang, XiMeng Liu, Hongju Cheng, Jian Weng, Xiangyang Luo, Victor Chang
2019 IEEE Access  
In this survey, we provide a technical review and comparison of existing outsourcing schemes using diverse secure computation methods.  ...  The proposed schemes are valid if the SLE problem is solvable. By utilizing AHE scheme, Wang et al. [187] proposed a different iterative approach for securely outsourcing SLE problems.  ...  Using cloud computing technology, resources (e.g., CPU and storage) are available on-demand for users' terminals [1] .  ... 
doi:10.1109/access.2019.2949782 fatcat:ternbyhqezgd5cvhtfqfggqdqq

Acceleration techniques for numerical flow visualization [article]

Simon Stegmaier, Universität Stuttgart
2006
Obviously, there is a strong dependence on the available computing hardware: what is reasonable on one hardware platform might be unbearable on another platform.  ...  This straightforwardly leads to the idea of switching to another (remote) visualization platform while keeping the researcher's workspace untouched.  ...  When iterating over the destination cells, the destination cell is known and the source cell has to be determined.  ... 
doi:10.18419/opus-2589 fatcat:kzfwvoihbfcvnoa455ubhjufpy

Dagstuhl Reports, Volume 7, Issue 8, August 2017, Complete Issue [article]

2018
Acknowledgments Schloss Dagstuhl was the perfect place for hosting a seminar like this.  ...  In this talk, I present experiments, conducted on a variety of platforms including CPUs and GPUs, that showcase the differences that can occur even for randomly selected inputs.  ...  In this talk I will present the work-in-progress design and implementation of a constraint solver called JIT Fuzzing Solver (JFS).  ... 
doi:10.4230/dagrep.7.8 fatcat:gksmijgk5ff6reblxsqnt33aze