A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
[article] · 2019 · arXiv · pre-print
We present our vertically integrated hardware/software co-design, which includes a custom DIMM module enhanced with near-data processing cores tailored for DL tensor operations. ...
These custom DIMMs are populated inside a GPU-centric system interconnect as a remote memory pool, allowing GPUs to utilize it for scalable memory bandwidth and capacity expansion. ...
for near-memory tensor operations. ...
arXiv:1908.03072v2
fatcat:yiwl72jovnhkniwtn6cg3owdfy
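The embedding operations TensorDIMM targets follow a "gather-and-reduce" pattern: each lookup reads many rows of a large table and reduces them, so memory bandwidth rather than compute dominates. A minimal sketch of that pattern (illustrative only; not TensorDIMM's actual kernel, and all names here are hypothetical):

```python
import numpy as np

def gather_reduce(table: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Sum-pool the embedding rows selected by `indices`.

    Near-memory cores can execute this reduction next to the DIMM,
    so only the single pooled vector crosses the interconnect instead
    of every gathered row.
    """
    return table[indices].sum(axis=0)

# Toy table: 5 embeddings of dimension 4.
table = np.arange(20, dtype=np.float32).reshape(5, 4)
pooled = gather_reduce(table, np.array([0, 2, 4]))
```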
Application-Driven Near-Data Processing for Similarity Search
[article] · 2017 · arXiv · pre-print
This paper proposes an application-driven near-data processing accelerator for similarity search: the Similarity Search Associative Memory (SSAM). ...
Similarity search is a key to a variety of applications including content-based search for images and video, recommendation systems, data deduplication, natural language processing, computer vision, databases ...
RELATED WORK The concept of near-data processing has been studied in the literature for decades. ...
arXiv:1606.03742v2
fatcat:tgyyr4avubbzjmr7pz7obiqmle
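The data-movement cost that motivates SSAM is easiest to see in brute-force k-nearest neighbors: every query must scan the entire database, so bytes moved scale with the database size per query. A minimal sketch (function name and distance metric are illustrative assumptions, not SSAM's design):

```python
import numpy as np

def knn(queries: np.ndarray, database: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k nearest database rows per query.

    The full database is streamed past each query -- exactly the
    traffic a near-data associative memory aims to keep inside the
    memory device.
    """
    # Pairwise Euclidean distances: (num_queries, num_database).
    dists = np.linalg.norm(database[None, :, :] - queries[:, None, :], axis=2)
    return np.argsort(dists, axis=1)[:, :k]
```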
Memory Management Techniques in Heterogeneous Computing Systems
2019
IET Computers & Digital Techniques
Changes in dynamic random access memory (DRAM) architecture, integration of memory-centric hardware accelerators in heterogeneous systems, and Processing-in-Memory (PIM) are the techniques adopted from ...
The CPU/GPU processes all the data in a computer's memory, and hence the speed of data movement to/from memory and the size of the memory affect computer speed. ...
In this architecture, the memory controller is placed near memory to reduce the data transfer overheads caused by SIMD (single-instruction-multiple-data) processors. ...
doi:10.1049/iet-cdt.2019.0092
fatcat:vnsfc6twxncxdfav4bzhxdxn64
GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing
[article] · 2021 · arXiv · pre-print
Recently, Graph Convolutional Networks (GCNs) have become state-of-the-art algorithms for analyzing non-euclidean graph data. ...
Full-batch training on large graphs even requires hundreds to thousands of gigabytes of memory to buffer the intermediate data for back-propagation. 2) GCN training involves both memory-intensive data ...
DRAM-based Near-Memory Processing Many prior works propose 3D/2.5D-stacked memory based NMP accelerators for graph processing [40]-[44], DNN acceleration [83]-[91], or general-purpose applications ...
arXiv:2111.00680v1
fatcat:3inhvousrvdkbcxtcrxfxiplwm
A Modern Primer on Processing in Memory
[article] · 2022 · arXiv · pre-print
PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between ...
The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, proliferation of different main memory standards and chips, specialized for different ...
Symposium on Advanced Parallel Processing Technology in August 2019 [471] , and as a keynote talk at the 37th IEEE International Conference on Computer Design in November 2019 [222] . ...
arXiv:2012.03112v2
fatcat:z7rsgn54mjbmncv5ezq4u3aqqm
A Case Study of Processing-in-Memory in off-the-Shelf Systems
2021
USENIX Annual Technical Conference
Systems designed to perform computing in or near memory have been proposed for decades to overcome the proverbial memory wall, yet most never made it past blueprints or simulations. ...
This property helps some applications defy the von Neumann bottleneck, while for others, architectural limitations stand in the way of reaching the hardware's potential. Our analysis explains why. ...
A special thank you to Vincent Palatin for his patience and support explaining the details of this new hardware. ...
dblp:conf/usenix/NiderMZRLG0JGCF21
fatcat:oyu474s6zjfm7nxogzsxsifqey
McDRAM v2: In-Dynamic Random Access Memory Systolic Array Accelerator to Address the Large Model Problem in Deep Neural Networks on the Edge
2020
IEEE Access
The server-class Titan RTX GPU [34], which is the fastest GPU at int4 precision, has approximately 30 MB of on-chip SRAM cache/register file/shared memory and a complicated datapath architecture for ...
-We propose a novel dataflow for the judicious utilization of the existing in-DRAM bus to provide the systolic array accelerator with input data for matrix-matrix multiplication. ...
doi:10.1109/access.2020.3011265
fatcat:hmfnggvh7nbglef3dkzkuxjloa
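The dataflow the McDRAM v2 abstract describes keeps one operand resident in the systolic array while the in-DRAM bus streams the other past it. A toy functional model of that weight-stationary tiling (tile size and names are illustrative assumptions, not the paper's parameters):

```python
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray, tile: int = 2) -> np.ndarray:
    """Functional model of weight-stationary tiled matrix multiply.

    Each iteration "loads" one slice of B into the array and streams
    the matching slice of A past it, accumulating partial sums in
    place -- the role the in-DRAM bus plays in feeding the array.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for k0 in range(0, K, tile):
        C += A[:, k0:k0 + tile] @ B[k0:k0 + tile, :]
    return C
```

The result is bitwise-identical to a plain matrix multiply; only the order in which operands are fetched changes, which is the point of the dataflow.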
An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators
2020
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
., in/near-memory processing) for the DNN accelerator design. This paper systematically investigates the interconnection networks in modern DNN accelerator designs. ...
As a result, efficient interconnection and data movement mechanisms for future on-chip artificial intelligence (AI) accelerators are worthy of study. ...
TABLE III: Summary of DNN accelerators based on in/near-memory processing ...
doi:10.1109/jetcas.2020.3022920
fatcat:idqitgwnrnegbd4dhrly3xsxbi
Processing Data Where It Makes Sense: Enabling In-Memory Computation
[article] · 2019 · arXiv · pre-print
We discuss at least two promising directions for processing-in-memory (PIM): (1) performing massively-parallel bulk operations in memory by exploiting the analog operational properties of DRAM, with low-cost changes, (2) exploiting the logic layer in 3D-stacked memory technology to accelerate important data-intensive applications. ...
GPU Applications In the last decade, Graphics Processing Units (GPUs) have become the accelerator of choice for a wide variety of data-parallel applications. ...
arXiv:1903.03988v1
fatcat:l2sl2wqwmrejvfbi3sxrpwasby
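Direction (1) above rests on a known trick: simultaneously activating three DRAM rows produces the bitwise majority of their contents, and fixing one row to all-zeros or all-ones turns that majority into AND or OR. A sketch of the logic (modeling rows as bit vectors; this is the general technique, not this paper's specific circuit):

```python
import numpy as np

def maj3(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Bitwise majority of three rows: MAJ(a,b,0) = a AND b, MAJ(a,b,1) = a OR b."""
    return (a & b) | (b & c) | (a & c)

a = np.array([1, 0, 1, 1], dtype=np.uint8)
b = np.array([1, 1, 0, 1], dtype=np.uint8)
bulk_and = maj3(a, b, np.zeros_like(a))  # equals a & b
bulk_or = maj3(a, b, np.ones_like(a))    # equals a | b
```

Because the majority happens in the analog charge-sharing of the bitlines, an entire 8 KB row is processed per activation with no data leaving the DRAM chip.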
TOP-PIM
2014
Proceedings of the 23rd international symposium on High-performance parallel and distributed computing - HPDC '14
We also introduce a methodology for rapid design space exploration by analytically predicting performance and energy of in-memory processors based on metrics obtained from execution on today's GPU hardware ...
Our results show that, on average, viable PIM configurations show moderate performance losses (27%) in return for significant energy efficiency improvements (76% reduction in EDP) relative to a representative ...
ACKNOWLEDGEMENTS We would like to thank Yasuko Eckert and Wei Huang for their input on modeling memory stack thermals. We appreciate the invaluable comments from the anonymous reviewers. ...
doi:10.1145/2600212.2600213
dblp:conf/hpdc/ZhangJLGXI14
fatcat:gfgw5o2kara6jnft3tcadzhclu
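The trade-off quoted in the TOP-PIM snippet has a useful implication: since EDP is energy times delay, a 27% slowdown combined with a 76% EDP reduction implies roughly an 82% energy saving. A back-of-envelope check, under the assumption that "27% performance loss" means 27% lower throughput (i.e., proportionally longer runtime):

```python
# Normalize the host baseline to energy = delay = 1.
baseline_energy = 1.0
baseline_delay = 1.0

pim_delay = baseline_delay / (1 - 0.27)                   # ~1.37x slower
pim_edp = (1 - 0.76) * baseline_energy * baseline_delay   # 76% EDP reduction
pim_energy = pim_edp / pim_delay                          # EDP = energy * delay
energy_saving = 1 - pim_energy / baseline_energy          # ~82% less energy
```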
Accelerator Architectures — A Ten-Year Retrospective
2018
IEEE Micro
It has been an exciting ten years since we coedited an IEEE MICRO special issue titled "Accelerator Architectures" in 2008, shortly after NVIDIA launched the CUDA architecture for GPU Computing. ...
The article also articulates the importance of education for growing the adoption of accelerators. ...
For example, according to Microsoft, 9 there are more than one million servers with FPGA accelerators in the Azure Cloud data centers. ...
doi:10.1109/mm.2018.2877839
fatcat:dpers46ctva4xhmjwwhtlqoyei
Near-Memory Computing: Past, Present, and Future
[article] · 2019 · arXiv · pre-print
The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. ...
At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory, called near-memory computing (NMC), more viable. ...
ACKNOWLEDGMENT This work was performed in the framework of Horizon 2020 program for the project "Near-Memory Computing (Ne-MeCo)" and is funded by European Commission under Marie Sklodowska-Curie Innovative ...
arXiv:1908.02640v1
fatcat:nvppe5zx2vbb5k4phfi4o5qoqq
An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System
[article] · 2022 · arXiv · pre-print
., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. ...
., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. ...
HBM-PIM features Single Instruction Multiple Data (SIMD) units, which support multiply-add and multiply-accumulate operations, near the banks in HBM layers [171, 172] , and it is designed to accelerate ...
arXiv:2207.07886v1
fatcat:lf3i2nyfznfrhimtns5zjsvkdy
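The near-bank SIMD units described in the snippet perform multiply-accumulate on the slice of data resident in each bank, with a final host-side reduction combining the per-bank partials. A minimal functional sketch of that split (the width and function names are illustrative assumptions, not HBM-PIM's actual lane count or API):

```python
import numpy as np

SIMD_WIDTH = 16  # illustrative per-unit vector width

def banked_dot(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product computed as per-bank multiply-accumulate partials.

    Each slice models one near-bank SIMD unit's multiply-accumulate
    over its local data; only the scalar partial sums travel to the
    host for the final reduction.
    """
    partials = []
    for i in range(0, len(a), SIMD_WIDTH):
        partials.append(np.dot(a[i:i + SIMD_WIDTH], b[i:i + SIMD_WIDTH]))
    return float(sum(partials))
```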
2020 Index IEEE Computer Architecture Letters Vol. 19
2021
IEEE computer architecture letters
SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD. Newton et al., LCA July-Dec. 2020, 151-154 (indexed under "SQL" and "Solid state drives"). ...
doi:10.1109/lca.2020.3048555
fatcat:rpa2p25anzftjljygpm67ytioe
In particular, k-nearest neighbors (kNN), a cornerstone algorithm in these applications, incurs significant data movement. ...
., GPUs). ...
BEYOND DRAM Prior work has primarily focused on commodity DRAMs as the target for near-data processing in main memory, but this presents serious challenges when integrating compute into commodity DRAM memory ...
doi:10.1145/2818950.2818984
dblp:conf/memsys/MundoLCO15
fatcat:klhp577ftncphov7mnt4qvfv5e
Showing results 1 — 15 out of 1,322 results