22,590 Hits in 3.7 sec

A quantitative performance analysis model for GPU architectures

Yao Zhang, John D. Owens
2011 2011 IEEE 17th International Symposium on High Performance Computer Architecture  
We develop a microbenchmark-based performance model for NVIDIA GeForce 200-series GPUs.  ...  Our model identifies GPU program bottlenecks and quantitatively analyzes performance, and thus allows programmers and architects to predict the benefits of potential program optimizations and architectural  ...  Thanks also to our funding agencies, the HP Labs Innovation Research Program, the National Science Foundation (Award 0541448), and the SciDAC Institute for Ultrascale Visualization, and to NVIDIA for  ... 
doi:10.1109/hpca.2011.5749745 dblp:conf/hpca/ZhangO11 fatcat:34wri3iysbhzlgft567adb3afa

The landscape of GPGPU performance modeling tools

Souley Madougou, Ana Varbanescu, Cees de Laat, Rob van Nieuwpoort
2016 Parallel Computing  
While both programmers and architects have clear opinions about the causes of this performance gap, finding and quantifying the real problems remains a topic for performance modeling tools.  ...  In this paper, we sketch the landscape of modern GPUs' performance limiters and optimization opportunities, and dive into details on modeling attempts for GPU-based systems.  ...  Quantitative methods. The methods in this class use measurements to derive a (set of) model(s) for the performance of a kernel on a given architecture.  ... 
doi:10.1016/j.parco.2016.04.002 fatcat:natyqsdamjhcpjqufmvp5i5g5i

Editorial Performance Modelling, Benchmarking and Simulation of High-Performance Computing Systems

S. A. Jarvis
2011 Computer journal  
Technical submissions were encouraged in areas including: performance modelling and analysis of applications and high-performance computing systems; novel techniques and tools for performance evaluation  ...  The call for papers also encouraged submissions that included analysis of power consumption and reliability, and also performance modelling research that made use of analytical methods as well as those  ...  I am grateful to Jutta Mackwell (at the Computer Journal Editorial Office) and Erol Gelenbe (Editor-in-Chief) for assisting with the production of this issue of the Computer Journal.  ... 
doi:10.1093/comjnl/bxr113 fatcat:z2tb7cuqnvci5ffs6mouc3fddy

Exploring the Heterogeneous Design Space for both Performance and Reliability

Rafael Ubal, Dana Schaa, Perhaad Mistry, Xiang Gong, Yash Ukidave, Zhongliang Chen, Gunar Schirner, David Kaeli
2014 Proceedings of the 51st Annual Design Automation Conference - DAC '14  
We describe the design of a framework that supports a range of heterogeneous devices to be evaluated based on different performance/reliability criteria.  ...  Given the multiplicity of different design trade-offs in hardware and software, and the rate of introduction of new architectures and hardware/software features, it becomes difficult to properly model  ...  ACKNOWLEDGMENTS The authors would like to thank AMD, Analog Devices, NVIDIA, Samsung and Qualcomm for supporting this work.  ... 
doi:10.1145/2593069.2596680 dblp:conf/dac/UbalSMGUCSK14 fatcat:i4b2nezub5abbgmmtz32v7mkry

Benchmarking open source deep learning frameworks

Ghadeer Al-Bdour, Raffi Al-Qurran, Mahmoud Al-Ayyoub, Ali Shatnawi
2020 International Journal of Electrical and Computer Engineering (IJECE)  
The purpose of this work is to provide a qualitative and quantitative comparison among three such frameworks: TensorFlow, Theano and CNTK.  ...  For most of our experiments, we find out that CNTK's implementations are superior to the other ones under consideration.  ...  Networks architecture CNN is used for the MNIST, CIFAR-10, IMDB and Self-Driving Car datasets, where a different network architecture is used for each dataset.  ... 
doi:10.11591/ijece.v10i5.pp5479-5486 fatcat:ypbgo5yhybhf5hfj2uoqn7qksq

On the communication complexity of 3D FFTs and its implications for Exascale

Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Kartik Iyer, P.-K. Yeung, Richard Vuduc
2012 Proceedings of the 26th ACM international conference on Supercomputing - ICS '12  
Of particular interest is the performance impact of choosing high-density processors, typified today by graphics co-processors (GPUs), as the base processor for an exascale system.  ...  This paper revisits the communication complexity of large-scale 3D fast Fourier transforms (FFTs) and asks what impact trends in current architectures will have on FFT performance at exascale.  ...  APPENDIX We have released a tech report version of this paper which includes an extended appendix with: (1) a more detailed discussion of the performance model, (2) raw data used to generate the technology  ... 
doi:10.1145/2304576.2304604 dblp:conf/ics/CzechowskiBMIYV12 fatcat:uc5xjoqwojhmjcfqzltaoqshcq

CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA-a Practical Study with Trade-off Analysis [article]

Maohua Zhu, Liu Liu, Chao Wang, Yuan Xie
2016 arXiv   pre-print
Moreover, we analyze the detailed quantitative performance, throughput, power, energy, and performance density for both approaches.  ...  To improve the performance and maintain the scalability, we present CNNLab, a novel deep learning framework using GPU and FPGA-based accelerators.  ...  Performance. Fig. 6 (a) presents the running time for the eight layers. GPU has better performance than FPGA on all the layers, and the speedup can achieve up to 1000x for FC layers.  ... 
arXiv:1606.06234v1 fatcat:en7acoahonb7beqrnxv553g46e

Real-Time Concrete Crack Detection and Instance Segmentation using Deep Transfer Learning

Lasitha Piyathilaka, D.M.G. Preethichandra, U. Izhar, Gayan Kahandawa
2020 Engineering Proceedings  
We evaluated the trained YOLACT model for concrete crack detection with ResNet-50 and ResNet-101 backbone architectures for both precision and speed of detection.  ...  The trained model achieved high mAP results with real-time frame rates when tested on concrete crack images on a single GPU.  ...  Table 1. Quantitative results analysis with different backbone architectures.  ... 
doi:10.3390/ecsa-7-08260 fatcat:crnjb6sitfdvpp2thtmp2nsfvq

A Modeling Approach based on UML/MARTE for GPU Architecture [article]

Antonio Wendell De Oliveira Rodrigues, Frédéric Guyomarc'H, Jean-Luc Dekeyser (INRIA Lille - Nord Europe)
2011 arXiv   pre-print
This paper presents a metamodel extension for MARTE profile and a model for GPU architectures. The main goal is to specify the task and data allocation in the memory hierarchy of these architectures.  ...  The results show that this approach will help to generate code for GPUs based on model transformations using Model Driven Engineering (MDE).  ...  In this paper, we present a metamodel for designing of GPU characteristics, and a model for a specific GPU architecture.  ... 
arXiv:1105.4424v1 fatcat:e2csase72bh2vlrldzwkpam3iu

Mixed Linear Model Approaches of Association Mapping for Complex Traits Based on Omics Variants

Fu-Tao Zhang, Zhi-Hong Zhu, Xiao-Ran Tong, Zhi-Xiang Zhu, Ting Qi, Jun Zhu
2015 Scientific Reports  
We proposed mixed linear model approaches using GPU (Graphic Processing Unit) computation to simultaneously dissect various genetic effects.  ...  Analyses can be performed for estimating genetic main effects, GxG epistasis effects, and GxE environment interaction effects on large-scale omics data for complex traits, and for estimating heritability  ...  Pen Wang for his help in developing GPU-based software, and also thank Drs. Robert Anholt and Jian Yang for reading the manuscript and constructive criticisms.  ... 
doi:10.1038/srep10298 pmid:26223539 pmcid:PMC5155518 fatcat:6qpqwz6hc5gg3ibgmqprdrqaoa

Performance modeling for highly-threaded many-core GPUs

Lin Ma, Roger D. Chamberlain, Kunal Agrawal
2014 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors  
Highly-threaded many-core GPUs can provide high throughput for a wide range of algorithms and applications.  ...  In particular, the model not only helps to explore and reduce the configuration space for tuning kernel execution on GPUs, but also reflects performance bottlenecks and predicts how the runtime will trend  ...  As a general rule, these models fall in two categories: (1) asymptotic models for algorithm analysis at a high level of abstraction that attempt to capture only the essential features of GPU architectures  ... 
doi:10.1109/asap.2014.6868641 dblp:conf/asap/MaCA14 fatcat:u3422gn3wvfexk2ebynu4jdpce

Adaptation of Multidimensional Positive Definite Advection Transport Algorithm to Modern High-Performance Computing Platforms

Bogdan Rosa, Lukasz Szustak, Andrzej A. Wyszogrodzki, Krzysztof Rojek, Damian K. Wójcik, Roman Wyrzykowski
2015 International Journal of Modeling and Optimization  
The new C++ implementations are designed and optimized under modern CPU and GPU based high-performance computing platforms.  ...  Recently, the dynamical core of EULAG has been implemented into COSMO (Consortium for Small-scale Modeling) weather prediction framework and is expected to be in operational use.  ...  Adaptation of MPDATA to modern computing architectures is a step forward in improving performance of the whole EULAG model.  ... 
doi:10.7763/ijmo.2015.v5.456 fatcat:ilnrsepdvrbcfo6azudcz65z4u

Quantitatively driven visualization and analysis on emerging architectures

P McCormick, E Anderson, S Martin, C Brownlee, J Inman, M Maltrud, M Kim, J Ahrens, L Nau
2008 Journal of Physics, Conference Series  
In this paper we explore an approach that exploits these emerging architectures to provide an integrated environment for high-performance data analysis and visualization.  ...  To further complicate matters, the computer architectures that have traditionally provided improved performance are undergoing a revolutionary change as manufacturers transition to building multi- and many-core  ...  In addition, this system combines the operations needed for both qualitative and quantitative visualization and analysis.  ... 
doi:10.1088/1742-6596/125/1/012095 fatcat:nvd6qsstlndyjonm2wjvmumvfm

An Empirical Evaluation of GPGPU Performance Models [chapter]

Souley Madougou, Ana Lucia Varbanescu, Cees de Laat, Rob van Nieuwpoort
2014 Lecture Notes in Computer Science  
In this paper, we sketch the landscape of modern GPUs' performance limiters and optimization opportunities, and dive into details on modeling attempts for GPU-based systems.  ...  While both programmers and architects have clear opinions about the causes of this performance gap, finding and quantifying the real problems remains a topic for performance modeling tools.  ...  We believe our study is a good starting point for a much needed, thorough investigation of the state-of-the-art in performance modeling for GPUs.  ... 
doi:10.1007/978-3-319-14325-5_15 fatcat:knvwnw7ufjglbo4vgs7qpidqte

CARAT-GxG: CUDA-Accelerated Regression Analysis Toolkit for Large-Scale Gene–Gene Interaction with GPU Computing System

Sungyoung Lee, Min-Seok Kwon, Taesung Park
2014 Cancer Informatics  
In order to overcome this limitation, we propose CARAT-GxG, a GPU computing system-oriented toolkit, for performing regression analysis with GGI using CUDA (compute unified device architecture).  ...  We expect that CARAT-GxG will enable large-scale regression analysis with GGI for GWAS data.  ...  Jointly developed the structure and arguments for the paper: SL, MSK, TP. Contributed to the writing of the manuscript: SL, TP. Agree with manuscript results and conclusions: MSK.  ... 
doi:10.4137/cin.s16349 pmid:25574130 pmcid:PMC4263399 fatcat:cvuahf6uibhv3oo7pamnel6wsa