CUDA ARCHITECTURE ANALYSIS AS THE DRIVING FORCE OF PARALLEL CALCULATION ORGANIZATION [chapter]

Andriy Dudnik, Taras Shevchenko National University of Kyiv, Ukraine, Tetiana Domkiv, National Aviation University, Ukraine
2020 Innovative scientific researches: European development trends and regional aspect  
Advancements in the Internet and cloud computing have produced a wealth of multimedia data, and processing these data has become increasingly complex and computationally intensive. With the advent of scalable, low-cost GPUs offering very high computing power, processing such big data has become both cheaper and more efficient. Rapid progress is also taking place in programming languages and in the programming and debugging tools that simplify GPU programming. However, efficient and effective use of GPU resources remains a challenge. The purpose of this article is to provide a brief overview of the NVIDIA CUDA architecture and to survey the programming and optimization strategies adopted by researchers to accelerate GPU computing. The study aims to acquaint researchers with the various programming methods and optimizations available in GPU programming and to motivate them to create highly efficient parallel algorithms that exploit the full capabilities of the graphics processor.

The Graphics Processing Unit (GPU) has been used for general-purpose computing (GPGPU) for over a decade. The growth of clock frequencies in general-purpose processors has been halted by physical limitations and high power consumption, so their performance now increases mainly by placing several cores on one chip. Each core works independently of the others, executing different instructions for different processes. Specialized vector capabilities (SSE2 and SSE3) for four-component (single-precision floating-point) and two-component (double-precision) vectors appeared in general-purpose processors primarily in response to the growing demand from graphics applications. This is why GPUs are more profitable for certain tasks: they were built for exactly this kind of work. For example, in NVIDIA video chips the main unit is a multiprocessor with eight to ten cores and hundreds of ALUs overall, several thousand registers, and a small amount of shared memory. In addition, the video card contains fast global memory accessible to all multiprocessors, local memory in each multiprocessor, and special memory for constants. CPU cores are designed to execute a single stream of consecutive instructions with maximum performance, while GPUs are designed to execute a large number of parallel instruction threads quickly.
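The memory hierarchy described above can be illustrated with a minimal CUDA sketch. The kernel and variable names here are illustrative, not taken from the chapter: each thread block stages its slice of global memory into fast on-chip shared memory, reduces it there, and writes one partial sum back to global memory.

```cuda
#include <cstdio>

#define BLOCK 256

// Sketch: each block sums BLOCK elements of `in` using the multiprocessor's
// shared memory, then writes one partial sum per block to global memory.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float buf[BLOCK];   // small, fast, per-multiprocessor memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // load from slow global memory
    __syncthreads();

    // Tree reduction entirely within shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = buf[0];  // one write back to global memory per block
}
```

The design point is the one the paragraph above makes: shared memory is small but close to the cores, so the many intermediate additions happen there, and global memory is touched only once per element on the way in and once per block on the way out.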
General-purpose processors are optimized to achieve high performance on a single instruction stream, processing both integers and floating-point numbers, with largely random memory access patterns.
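The contrast drawn above, one fast sequential stream on the CPU versus thousands of lightweight parallel threads on the GPU, is visible in even the simplest CUDA program. The following sketch (a standard SAXPY example, not code from the chapter) launches one thread per array element; the hardware schedules the threads across the multiprocessors.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread handles exactly one element; on a CPU this would be
// a single sequential loop over all n elements.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // 3*1 + 2 = 5
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Note that the kernel contains no loop over the data: the parallelism lives in the launch configuration, which is exactly the inversion of control the paragraph above describes.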
doi:10.30525/978-9934-588-38-9-59