
TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning [article]

Youngeun Kwon, Yunjae Lee, Minsoo Rhu
2019 arXiv   pre-print
We present our vertically integrated hardware/software co-design, which includes a custom DIMM module enhanced with near-data processing cores tailored for DL tensor operations.  ...  These custom DIMMs are populated inside a GPU-centric system interconnect as a remote memory pool, allowing GPUs to utilize them for scalable memory bandwidth and capacity expansion.  ...  for near-memory tensor operations.  ... 
arXiv:1908.03072v2 fatcat:yiwl72jovnhkniwtn6cg3owdfy

Application-Driven Near-Data Processing for Similarity Search [article]

Vincent T. Lee, Amrita Mazumdar, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin
2017 arXiv   pre-print
This paper proposes an application-driven near-data processing accelerator for similarity search: the Similarity Search Associative Memory (SSAM).  ...  Similarity search is a key to a variety of applications including content-based search for images and video, recommendation systems, data deduplication, natural language processing, computer vision, databases  ...  RELATED WORK The concept of near-data processing has been studied in the literature for decades.  ... 
arXiv:1606.03742v2 fatcat:tgyyr4avubbzjmr7pz7obiqmle

Memory Management Techniques in Heterogeneous Computing Systems

Anakhi Hazarika, Soumyajit Poddar, Hafizur Rahaman
2019 IET Computers & Digital Techniques  
Change in dynamic random access memory (DRAM) architecture, integration of memory-centric hardware accelerator in the heterogeneous system and Processing-in-Memory (PIM) are the techniques adopted from  ...  The CPU/GPU processes all the data on a computer's memory and hence the speed of the data movement to/from memory and the size of the memory affect computer speed.  ...  In this architecture, the memory controller is placed near memory to reduce the data transfer overheads caused by SIMD (single-instruction-multiple-data) processors.  ... 
doi:10.1049/iet-cdt.2019.0092 fatcat:vnsfc6twxncxdfav4bzhxdxn64

GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing [article]

Zhe Zhou and Cong Li and Xuechao Wei and Guangyu Sun
2021 arXiv   pre-print
Recently, Graph Convolutional Networks (GCNs) have become state-of-the-art algorithms for analyzing non-Euclidean graph data.  ...  Full-batch training on large graphs even requires hundreds to thousands of gigabytes of memory to buffer the intermediate data for back-propagation. 2) GCN training involves both memory-intensive data  ...  DRAM-based Near-Memory Processing Many prior works propose 3D/2.5D-stacked memory based NMP accelerators for graph processing [40]-[44], DNN acceleration [83]-[91], or general-purpose applications  ... 
arXiv:2111.00680v1 fatcat:3inhvousrvdkbcxtcrxfxiplwm

A Modern Primer on Processing in Memory [article]

Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun
2022 arXiv   pre-print
PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between  ...  The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, proliferation of different main memory standards and chips, specialized for different  ...  Symposium on Advanced Parallel Processing Technology in August 2019 [471], and as a keynote talk at the 37th IEEE International Conference on Computer Design in November 2019 [222].  ... 
arXiv:2012.03112v2 fatcat:z7rsgn54mjbmncv5ezq4u3aqqm

A Case Study of Processing-in-Memory in off-the-Shelf Systems

Joel Nider, Craig Mustard, Andrada Zoltan, John Ramsden, Larry Liu, Jacob Grossbard, Mohammad Dashti, Romaric Jodin, Alexandre Ghiti, Jordi Chauzi, Alexandra Fedorova
2021 USENIX Annual Technical Conference  
Systems designed to perform computing in or near memory have been proposed for decades to overcome the proverbial memory wall, yet most never made it past blueprints or simulations.  ...  This property helps some applications defy the von-Neumann bottleneck, while for others, architectural limitations stand in the way of reaching the hardware potential. Our analysis explains why.  ...  A special thank you to Vincent Palatin for his patience and support explaining the details of this new hardware.  ... 
dblp:conf/usenix/NiderMZRLG0JGCF21 fatcat:oyu474s6zjfm7nxogzsxsifqey

McDRAM v2: In-Dynamic Random Access Memory Systolic Array Accelerator to Address the Large Model Problem in Deep Neural Networks on the Edge

Seunghwan Cho, Haerang Choi, Eunhyeok Park, Hyunsung Shin, Sungjoo Yoo
2020 IEEE Access  
The server-class Titan RTX GPU [34], which is the fastest GPU at int4 precision, has approximately 30 MB of on-chip SRAM cache/register file/shared memory and a complicated datapath architecture for  ...  -We propose a novel dataflow for the judicious utilization of the existing in-DRAM bus to provide the systolic array accelerator with input data for matrix-matrix multiplication.  ... 
doi:10.1109/access.2020.3011265 fatcat:hmfnggvh7nbglef3dkzkuxjloa

An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators

Seyed Morteza Nabavinejad, Mohammad Baharloo, Kun-Chih Chen, Maurizio Palesi, Tim Kogel, Masoumeh Ebrahimi
2020 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
., in/near-memory processing) for the DNN accelerator design. This paper systematically investigates the interconnection networks in modern DNN accelerator designs.  ...  As a result, efficient interconnection and data movement mechanisms for future on-chip artificial intelligence (AI) accelerators are worthy of study.  ...  TABLE III: SUMMARY OF DNN ACCELERATORS BASED ON IN/NEAR-MEMORY PROCESSING  ... 
doi:10.1109/jetcas.2020.3022920 fatcat:idqitgwnrnegbd4dhrly3xsxbi

Processing Data Where It Makes Sense: Enabling In-Memory Computation [article]

Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun
2019 arXiv   pre-print
changes, (2) exploiting the logic layer in 3D-stacked memory technology to accelerate important data-intensive applications.  ...  We discuss at least two promising directions for processing-in-memory (PIM): (1) performing massively-parallel bulk operations in memory by exploiting the analog operational properties of DRAM, with low-cost  ...  GPU Applications In the last decade, Graphics Processing Units (GPUs) have become the accelerator of choice for a wide variety of data-parallel applications.  ... 
arXiv:1903.03988v1 fatcat:l2sl2wqwmrejvfbi3sxrpwasby


Dongping Zhang, Nuwan Jayasena, Alexander Lyashevsky, Joseph L. Greathouse, Lifan Xu, Michael Ignatowski
2014 Proceedings of the 23rd international symposium on High-performance parallel and distributed computing - HPDC '14  
We also introduce a methodology for rapid design space exploration by analytically predicting performance and energy of in-memory processors based on metrics obtained from execution on today's GPU hardware  ...  Our results show that, on average, viable PIM configurations show moderate performance losses (27%) in return for significant energy efficiency improvements (76% reduction in EDP) relative to a representative  ...  ACKNOWLEDGEMENTS We would like to thank Yasuko Eckert and Wei Huang for their input on modeling memory stack thermals. We appreciate the invaluable comments from the anonymous reviewers.  ... 
doi:10.1145/2600212.2600213 dblp:conf/hpdc/ZhangJLGXI14 fatcat:gfgw5o2kara6jnft3tcadzhclu

Accelerator Architectures —A Ten-Year Retrospective

Wen-mei Hwu, Sanjay Patel
2018 IEEE Micro  
It has been an exciting ten years since we coedited an IEEE MICRO special issue titled "Accelerator Architectures" in 2008, shortly after NVIDIA launched the CUDA architecture for GPU Computing.  ...  The article also articulates the importance of education for growing the adoption of accelerators.  ...  For example, according to Microsoft, 9 there are more than one million servers with FPGA accelerators in the Azure Cloud data centers.  ... 
doi:10.1109/mm.2018.2877839 fatcat:dpers46ctva4xhmjwwhtlqoyei

Near-Memory Computing: Past, Present, and Future [article]

Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra
2019 arXiv   pre-print
The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse.  ...  At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory, called near-memory computing (NMC), more viable.  ...  ACKNOWLEDGMENT This work was performed in the framework of Horizon 2020 program for the project "Near-Memory Computing (Ne-MeCo)" and is funded by European Commission under Marie Sklodowska-Curie Innovative  ... 
arXiv:1908.02640v1 fatcat:nvppe5zx2vbb5k4phfi4o5qoqq

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System [article]

Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu
2022 arXiv   pre-print
., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck.  ...  ., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles.  ...  HBM-PIM features Single Instruction Multiple Data (SIMD) units, which support multiply-add and multiply-accumulate operations, near the banks in HBM layers [171, 172], and it is designed to accelerate  ... 
arXiv:2207.07886v1 fatcat:lf3i2nyfznfrhimtns5zjsvkdy

2020 Index IEEE Computer Architecture Letters Vol. 19

2021 IEEE computer architecture letters  
-June 2020 72-75 SQL SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD.  ...  Newton, ., +, LCA July-Dec. 2020 151-154 Solid state drives SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD.  ... 
doi:10.1109/lca.2020.3048555 fatcat:rpa2p25anzftjljygpm67ytioe


Carlo C. del Mundo, Vincent T. Lee, Luis Ceze, Mark Oskin
2015 Proceedings of the 2015 International Symposium on Memory Systems - MEMSYS '15  
In particular, k-nearest neighbors (kNN), a cornerstone algorithm in these applications, incurs significant data movement.  ...  ., GPUs).  ...  BEYOND DRAM Prior work has primarily focused on commodity DRAMs as the target for near-data processing in main memory but presents serious challenges when integrating compute into commodity DRAM memory  ... 
doi:10.1145/2818950.2818984 dblp:conf/memsys/MundoLCO15 fatcat:klhp577ftncphov7mnt4qvfv5e