Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations [article]

Sian Jin, Pascal Grosset, Christopher M. Biwer, Jesus Pulido, Jiannan Tian, Dingwen Tao, James Ahrens
2020 arXiv   pre-print
higher compression and decompression throughput than CPU-based compressors.  ...  evaluate the practicality of using GPU-based lossy compression on two real-world extreme-scale cosmology simulations, namely HACC and Nyx, based on a series of assessment metrics; and (3) we develop a  ...  We acknowledge the computing resources provided on Darwin, which is operated by the Los Alamos National Laboratory.  ... 
arXiv:2004.00224v1 fatcat:qrp75qnsdndfddzcmfzlhwrbiy

A Survey of Compressed GPU-Based Direct Volume Rendering [article]

Marcos Balsa Rodríguez, Enrico Gobbetti, José A. Iglesias Guitián, Maxim Makhinya, Fabio Marton, Renato Pajarola, Susanne K. Suter
2012 Eurographics State of the Art Reports  
at data production time and decompress on demand at rendering time.  ...  Great advancements in commodity graphics hardware have favored GPU-based volume rendering as the main adopted solution for interactive exploration of rectilinear scalar volumes on commodity platforms.  ...  Introduction GPU accelerated direct volume rendering (DVR) on consumer platforms is nowadays the standard approach for interactively exploring rectilinear scalar volumes.  ... 
doi:10.2312/conf/eg2013/stars/117-136 fatcat:3cadb2miwngrjoaqmrmudww6lq

State-of-the-Art in Compressed GPU-Based Direct Volume Rendering

M. Balsa Rodríguez, E. Gobbetti, J.A. Iglesias Guitián, M. Makhinya, F. Marton, R. Pajarola, S.K. Suter
2014 Computer graphics forum (Print)  
at data production time and decompress on demand at rendering time.  ...  Abstract Great advancements in commodity graphics hardware have favored GPU-based volume rendering as the main adopted solution for interactive exploration of rectilinear scalar volumes on commodity platforms  ...  Introduction GPU accelerated direct volume rendering (DVR) on consumer platforms is nowadays the standard approach for interactively exploring rectilinear scalar volumes.  ... 
doi:10.1111/cgf.12280 fatcat:3stzgmtlw5hwtk6zxiz4kaep6m

User-Defined Functions for HDF5 [article]

Lucas C. Villa Real, Maximilien de Bayser
2021 arXiv   pre-print
Moreover, we describe the built-in security model that limits the system resources a UDF can access.  ...  Some of those processing tasks include filtering, cleansing, aggregation, normalization, and data format translation -- all of which generate even more data.  ...  An alternative solution targeting such systems could leverage a memory-mapped mechanism to load (and possibly decompress) pages of data on demand.  ... 
arXiv:2109.11709v1 fatcat:thjvemkxfjhpniofip6zubx67a

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks [article]

Minsoo Rhu, Mike O'Connor, Niladrish Chatterjee, Jeff Pool, Stephen W. Keckler
2017 arXiv   pre-print
Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory.  ...  We introduce a high-performance virtualization strategy based on a "compressing DMA engine" (cDMA) that drastically reduces the size of the data structures that are targeted for CPU-side allocations.  ...  Figure 9 provides an overview of the cDMA architecture embedded into the memory system of a GPU.  ... 
arXiv:1705.01626v1 fatcat:pdwydyfxiffwtpyemiu3gh6xpa

Fine-Grained Energy and Performance Profiling framework for Deep Convolutional Neural Networks [article]

Crefeda Faviola Rodrigues, Graham Riley, Mikel Lujan
2018 arXiv   pre-print
There is a huge demand for on-device execution of deep learning algorithms on mobile and embedded platforms. These devices present constraints on the application due to limited resources and power.  ...  However, current benchmarks studies in existing deep learning frameworks (for example, Caffe, Tensorflow, Torch and others) are based on performance of these applications on high-end CPUs and GPUs.  ...  Figure 6 shows the performance and energy on our system at different levels (CPU, GPU and System) for an inference executing on the GPU.  ... 
arXiv:1803.11151v2 fatcat:k3h4gnnbdvfk7beac6nhlmreem

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs [article]

Esha Choukse, Michael Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, Steve Keckler
2019 arXiv   pre-print
GPUs offer orders-of-magnitude higher memory bandwidth than traditional CPU-only systems.  ...  Highly-compressible memory entries can thus be accessed completely from device memory, while incompressible entries source their data using both on and off-device accesses.  ...  While domain-specific compression [18, 19] has been explored to help with large workloads on GPUs, hardware memory compression to increase memory capacity for general purpose applications in GPUs remains  ... 
arXiv:1903.02596v2 fatcat:f66tmngn3nalxc77nloqzfwi4e

Hardware accelerator design for data centers

Serif Yesil, Muhammet Mustafa Ozdal, Taemin Kim, Andrey Ayupov, Steven Burns, Ozcan Ozturk
2015 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)  
To overcome this problem, customized application-specific accelerators are becoming integral parts of modern system on chip (SOC) architectures.  ...  As the size of available data is increasing, it is becoming inefficient to scale the computational power of traditional systems.  ...  Furthermore, IBM and NVIDIA are collaborating on integration of GPUs in data centers [10]. OpenPower initiative has allowed many data analytic applications to be ported to GPUs.  ... 
doi:10.1109/iccad.2015.7372648 dblp:conf/iccad/YesilOKABO15 fatcat:swlgxlan55ezhcuzifg2hr2fga

Accelerating Deep Neural Networks implementation: A survey

Meriam Dhouibi, Ahmed Karim Ben Salem, Afef Saidi, Slim Ben Saoud
2021 IET Computers & Digital Techniques  
Deploying such Deep Neural Networks (DNN) on embedded devices is still a challenging task considering the massive requirement of computation and storage.  ...  Then, a detailed description of different optimization techniques used in recent research works is explored.  ...  In this perspective, Xilinx invented PYNQ to design embedded systems with their Zynq SoCs on easier way.  ... 
doi:10.1049/cdt2.12016 fatcat:3kl4j5ztl5eahmgv7vetu2egay

Project CGX: Algorithmic and System Support for Scalable Deep Learning on a Budget [article]

Ilia Markov, Hamidreza Ramezanikebrya, Dan Alistarh
2022 arXiv   pre-print
The main scaling approach is data-parallel GPU-based training, which has been boosted by hardware and software support for highly efficient inter-GPU communication, in particular via bandwidth overprovisioning  ...  The ability to scale out training workloads has been one of the key performance enablers of deep learning.  ...  The NVIDIA GPUDirect technology allows GPUs on the same machine to communicate faster without the need for extra memory copies.  ... 
arXiv:2111.08617v3 fatcat:p5dyekknfrhjfnmstxd77zqkpi

DC-Patch: A Microarchitectural Fault Patching Technique for GPU Register Files

Alejandro Valero, Dario Suarez-Gracia, Ruben Gran-Tejero
2020 IEEE Access  
These benefits are obtained with less than 2 and 6% impact on the system performance and area, respectively.  ...  This work aims to tolerate permanent faults from process variations in large GPU register files operating below the safe supply voltage limit.  ...  systems to high-performance data centers.  ... 
doi:10.1109/access.2020.3025899 fatcat:bvwhzkmssjhodd7ji3zour3cc4

Wavelet encoding of BRDFs for real-time rendering

Luc Claustres, Loïc Barthe, Mathias Paulin
2007 Graphics Interface  
Its integration into most real-time rendering systems requires both data compression and the implementation of the decompression and filtering stages on contemporary graphics processing units (GPUs).  ...  This paper improves the quality of real-time per-pixel lighting on GPUs using a wavelet decomposition of acquired BRDFs.  ...  GPU implementations to speed up the transform, e.g. the JasPer codec [40], do exist but decompression occurs on the whole image and not on a per-pixel basis [12].  ... 
doi:10.1145/1268517.1268546 dblp:conf/graphicsinterface/ClaustresBP07 fatcat:myasozrdunhpbexbp33gza5j6e

On the path to sustainable, scalable, and energy-efficient data analytics: Challenges, promises, and future directions

Sriram Lakshminarasimhan, Prabhat Kumar, Wei-keng Liao, Alok Choudhary, Vipin Kumar, Nagiza F. Samatova
2012 2012 International Green Computing Conference (IGCC)  
We propose a number of future directions that could be pursued on the path to sustainable data analytics at scale.  ...  As scientific data is reaching exascale, scalable and energy efficient data analytics is quickly becoming a top notch priority.  ...  ACKNOWLEDGEMENTS We would like to thank David Boyuka II and John Jenkins for insightful comments and discussions on the paper. This work was supported in part by the U.S.  ... 
doi:10.1109/igcc.2012.6322265 dblp:conf/green/LakshminarasimhanKLCKS12 fatcat:dremvfojkre5vm2n7qhimzxjgm

syGlass: Interactive Exploration of Multidimensional Images Using Virtual Reality Head-mounted Displays [article]

Stanislav Pidhorskyi, Michael Morehead, Quinn Jones, George Spirou, Gianfranco Doretto
2018 arXiv   pre-print
data exploration, annotation, and cataloguing.  ...  Inspecting and manipulating data of this complexity is very challenging in traditional visualization systems.  ...  After decompression, they can be efficiently uploaded to GPU as a 3D texture, because the voxel data they represent is already memory aligned.  ... 
arXiv:1804.08197v4 fatcat:q5de65caijfplfscg7ppmkbmse

Horizontal Review on Video Surveillance for Smart Cities: Edge Devices, Applications, Datasets, and Future Trends

Mostafa Ahmed Ezzat, Mohamed A. Abd El Ghany, Sultan Almotairi, Mohammed A.-M. Salem
2021 Sensors  
The automation strategy of today's smart cities relies on large IoT (internet of Things) systems that collect big data analytics to gain insights.  ...  Namely, the application of video surveillance in smart cities, algorithms, datasets, and embedded systems.  ...  One of the statistical simulation methods used in analytics, data processing, and machine learning is decision tree learning.  ... 
doi:10.3390/s21093222 pmid:34066509 fatcat:27lploodmvdl3k36x4jzw2acly