
Supporting CUDA for an extended RISC-V GPU architecture [article]

Ruobing Han, Blaise Tine, Jaewon Lee, Jaewoong Sim, Hyesoon Kim
2021 arXiv   pre-print
More specifically, we design and implement a pipeline that can execute CUDA source code on an RISC-V GPU architecture.  ...  We have succeeded in executing CUDA kernels with several important features, like multi-thread and atomic instructions, on an RISC-V GPU architecture.  ...  executes the binary file on an extended RISC-V GPU architecture.  ... 
arXiv:2109.00673v1 fatcat:c5hebaydfrdf7lphewbxcdyeoa

D7.10: E-CAM Software Porting and Benchmarking Data V

Alan O'Cais et al.
2021 Zenodo  
modules related to those developed in the ESDWs to massively parallel machines (STFC); and (b) benchmarking and scaling of at least 8 new modules related to those developed in the ESDWs on a variety of architectures  ...  We intend to make this work for any Linux distribution, and a wide variety of CPU architectures (Intel, AMD, ARM, POWER, RISC-V) and accelerators.  ...  The future CSC resources include the LUMI HPC cluster, which includes a large array of AMD Instinct GPUs. As is well known, CUDA is an NVIDIA specific platform or API for Nvidia GPUs.  ... 
doi:10.5281/zenodo.4720645 fatcat:npm3inkqu5bndjpcoaf4e7znsu

A characterization and analysis of PTX kernels

Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili
2009 2009 IEEE International Symposium on Workload Characterization (IISWC)  
The emulator can execute compiled kernels from the CUDA compiler, currently supports the full PTX 1.4 specification [4], and has been validated against the full CUDA SDK.  ...  This paper proposes a set of metrics for GPU workloads and uses these metrics to analyze the behavior of GPU programs.  ...  We also thank David Kaeli, Hyesoon Kim, and Nagesh Lakshminarayana for their insightful comments on this paper.  ... 
doi:10.1109/iiswc.2009.5306801 dblp:conf/iiswc/KerrDY09 fatcat:mz3sbt3drrdnlbo46nqm5gtwmi

Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics Research [article]

Blaise Tine, Fares Elsabbagh, Krishna Yalamarthy, Hyesoon Kim
2021 arXiv   pre-print
We argue that one of the reasons for the lack of open-source infrastructure for GPUs is rooted in the complexity of their ISA and software stacks. In this work, we first propose an ISA extension to RISC-V  ...  To demonstrate the feasibility of the minimally extended RISC-V ISA, we implemented the complete software and hardware stacks of Vortex on FPGA.  ...  We gratefully acknowledge the support of Intel Corporation and NSF CCRI 2016701, NSF CNS 1815047 for providing FPGA resources.  ... 
arXiv:2110.10857v1 fatcat:bxjizz5hx5dzrft4qb4nhkbdqa

Vortex: OpenCL Compatible RISC-V GPGPU [article]

Fares Elsabbagh, Blaise Tine, Priyadarshini Roshan, Ethan Lyons, Euna Kim, Da Eun Shim, Lingjun Zhu, Sung Kyu Lim, Hyesoon Kim
2020 arXiv   pre-print
In this work, we present Vortex, a RISC-V General-Purpose GPU that supports OpenCL.  ...  Vortex implements a SIMT architecture with a minimal ISA extension to RISC-V that enables the execution of OpenCL programs. We also extended OpenCL runtime framework to use the new ISA.  ...  CONCLUSIONS In this paper we proposed Vortex that supports an extended version of RISC-V for GPGPU applications.  ... 
arXiv:2002.12151v1 fatcat:uvuhcu7hbfbkneh3iph5v7cpvm

Parallel programming models for heterogeneous many-cores: a comprehensive survey

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 CCF Transactions on High Performance Computing  
In this article, we provide a comprehensive survey for parallel programming models for heterogeneous many-core architectures and review the compiling techniques of improving programmability and portability  ...  Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers.  ...  CUDA-x86 includes full support for NVIDIA's CUDA C/C++ language for GPUs.  ... 
doi:10.1007/s42514-020-00039-4 fatcat:nn56xhjm6rcu7kya6gfnyjg66q

Accelerating Compute-Intensive Applications with GPUs and FPGAs

Shuai Che, Jie Li, Jeremy W. Sheaffer, Kevin Skadron, John Lach
2008 2008 Symposium on Application Specific Processors  
This is an inherent problem attributable to architectural design, middleware support and programming style of the target platform.  ...  Based on our results, we present an application characteristic to accelerator platform mapping, which can aid developers in selecting an appropriate target architecture for their chosen application.  ...  We would like to thank David Tarjan for his helpful advice, and the anonymous reviewers for their excellent suggestions on how to improve the paper.  ... 
doi:10.1109/sasp.2008.4570793 dblp:conf/sasp/CheLSSL08 fatcat:4bzbpvrponfatlgxe7lu3buxka

GpuTejas: A parallel simulator for GPU architectures

Geetika Malhotra, Seep Goel, Smruti R. Sarangi
2014 2014 21st International Conference on High Performance Computing (HiPC)  
Secondly, it introduces a novel scheduling and partitioning scheme for parallelizing a GPU simulator. We evaluate the performance of our simulator with a set of Rodinia benchmarks.  ...  As compared to the sequential version of GpuTejas, the parallel version has an error limited to <7.67% for our suite of benchmarks, which is similar to the numbers reported by competing parallel simulators  ...  It does not support CUDA and the GPGPU framework.  ... 
doi:10.1109/hipc.2014.7116897 dblp:conf/hipc/MalhotraGS14 fatcat:r4twex4ed5c3fna2wrj2n2zyaq

Design space explorations for streaming accelerators using Streaming Architectural Simulator

Muhammad Shafiq, M. Pericas, N. Navarro, E. Ayguade
2013 Proceedings of 2013 10th International Bhurban Conference on Applied Sciences & Technology (IBCAST)  
Our design space explorations for different architectural aspects of a GPU like device are with reference to a base line established for NVIDIA's Fermi architecture (GPU Tesla C2050).  ...  In the recent years streaming accelerators like GPUs have been pop-up as an effective step towards parallel computing.  ...  Initially they proposed a GPU performance model [17] and later extended it as integrated performance and power model for GPUs [18]. CuMAPz is a CUDA program analysis tool proposed by Y.  ... 
doi:10.1109/ibcast.2013.6512151 fatcat:s7shhkgqlnb4baucblqnxqrvyi

FlexGrip: A soft GPGPU for FPGAs

Kevin Andryc, Murtaza Merchant, Russell Tessier
2013 2013 International Conference on Field-Programmable Technology (FPT)  
This architecture supports direct CUDA compilation to a binary which is executable on the FPGA-based GPGPU without hardware recompilation.  ...  The benefits of our architecture are evaluated for a collection of five standard CUDA benchmarks which are compiled using standard GPGPU compilation tools.  ...  ACKNOWLEDGMENTS We thank L-3 KEO for their support and contributions. We also thank Xilinx for the donation of the ISE 14.2 toolkit and Modelsim SE 10.1 software.  ... 
doi:10.1109/fpt.2013.6718358 dblp:conf/fpt/AndrycMT13 fatcat:7ey67anaezbj7p7dgz2qtlnzty

gpuMF: a framework for parallel hybrid metaheuristics on GPU with application to the minimisation of harmonics in multilevel inverters

Vincent Roberge, Mohammed Tarbouchi, Francis Okou
2015 International Journal of Process Systems Engineering  
To address this shortcoming, we developed gpuMF, a framework for parallel hybrid metaheuristics on GPUs.  ...  GPU metaheuristic framework (gpuMF) exploits the intrinsic parallelism found in metaheuristics and fully utilises the massively parallel architecture of GPUs.  ...  CUDA is proprietary and only for NVIDIA GPUs while OpenCL is an open standard supported by many vendors including NVIDIA.  ... 
doi:10.1504/ijpse.2015.071426 fatcat:r7uftbaj3ncgblefy4lnniq35a

Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures

Naila Farooqui, Andrew Kerr, Greg Eisenhauer, Karsten Schwan, Sudhakar Yalamanchili
2012 2012 IEEE International Symposium on Performance Analysis of Systems & Software  
Lynx is embedded into the broader GPU Ocelot system, which provides run-time code generation of CUDA programs for heterogeneous architectures.  ...  for running instrumented GPU kernels.  ...  ACKNOWLEDGEMENTS This research was supported by NSF under grants CCF-0905459, OCI-0910735, and IIP-1032032.  ... 
doi:10.1109/ispass.2012.6189206 dblp:conf/ispass/FarooquiKESY12 fatcat:pcfzrlhmorhq7c2i4ppjzfyuy4

Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra [article]

Paul Scheffler, Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini
2020 arXiv   pre-print
In this work, we enhance a memory-streaming RISC-V ISA extension to accelerate sparse-dense products through streaming indirection.  ...  We propose further uses for our indirection hardware, such as scatter-gather operations and codebook decoding, and compare our work to state-of-the-art CPU, GPU, and accelerator approaches, measuring a  ...  We evaluate their CsrMV kernels from CUDA Toolkit 10.0 on a GTX 1080 Ti GPU (Pascal GP104 architecture, FP32 and FP64) and a Jetson AGX Xavier (Volta architecture, FP32 only).  ... 
arXiv:2011.08070v2 fatcat:fmvliell4fc6joyox3invozd4e

Comparison of RISC-V and transport triggered architectures for a post-quantum cryptography application

2020 Turkish Journal of Electrical Engineering and Computer Sciences  
The RISC-V is chosen as it is the most lately version of classical RISC architecture.  ...  In this study, we developed an NTRU public key cryptosystem application and designed several processors to compare them in many aspects. We address two different architectures in this work.  ...  An efficient GPU implementation of NTRU was published by Jens Hermans et al by using the CUDA platform [14] .  ... 
doi:10.3906/elk-2003-27 fatcat:3bkhbeosqnhglmczd337m6h6vq

GPGPU Based Parallelized Client-Server Framework for Providing High Performance Computation Support [article]

Poorna Banerjee, Amit Dave
2015 arXiv   pre-print
Parallelization of user-submitted tasks on the GPGPU has been achieved using NVIDIA Compute Unified Device Architecture (CUDA).  ...  With the advent of General Purpose GPUs (GPGPU), applications not directly associated with graphics operations can also harness the computation capabilities of GPUs.  ...  V.  ... 
arXiv:1505.05655v1 fatcat:komsnr3cgvdlxcxxaqupu4tvji