Accelerator-level Parallelism [article]

Mark D. Hill, Vijay Janapa Reddi
2021 arXiv   pre-print
Already today, mobile systems concurrently employ multiple accelerators in what we call accelerator-level parallelism (ALP).  ...  A promising approach to further improve computer system performance under energy constraints is to employ hardware accelerators.  ...  In this viewpoint and in Figure 1, we assert that another major parallelism level is emerging: Accelerator-Level Parallelism (ALP).  ... 
arXiv:1907.02064v5 fatcat:kgxkrmv4yjfkhcteg5ttq7zs4u

Exploiting Memory-Level Parallelism in Reconfigurable Accelerators

Shaoyi Cheng, Mingjie Lin, Hao Jun Liu, Simon Scott, John Wawrzynek
2012 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines  
As memory accesses increasingly limit the overall performance of reconfigurable accelerators, it is important for high level synthesis (HLS) flows to discover and exploit memory-level parallelism.  ...  This paper develops 1) a framework where parallelism between memory accesses can be revealed from runtime profile of applications and provided to a high level synthesis flow, and 2) a novel multiaccelerator  ...  Consequently, only limited opportunities for parallelizing memory accesses can be exposed. Memory-level Parallelism Discovery.  ... 
doi:10.1109/fccm.2012.35 dblp:conf/fccm/ChengLLSW12 fatcat:lhi434zzhrckzoyfuhez7vgruy

Parallel Level set algorithm with MPI and accelerated on GPU [article]

Zhenlin Wang
2016 arXiv   pre-print
This work also presents a GPU acceleration for solving level-set PDE using finite difference method.  ...  Due to the unknown evolving interface, narrow band algorithm brings load balance problem for parallelizing computing. This work presents a tool for evenly distributing work loads on CPUs.  ...  Acceleration by GPU numerically solving level-set PDE using finite difference method only needs simple operations but on large grid points. This part of work is suitable for GPU acceleration.  ... 
arXiv:1612.04040v1 fatcat:s7eowx4z5rfbvh7sz633z7t2la

Accelerating Meta Data Checks for Software Correctness and Security

Weihaw Chuang, Satish Narayanasamy, Brad Calder
2007 Journal of Instruction-Level Parallelism  
Patil and Fischer [15] provided bounds and dangling pointers checks using a second (shadow) processor running on a separate co-processor to accelerate checking.  ...  We also consider the performance trade-offs at the micro-architectural level for placing the meta data along with the pointer versus the object.  ... 
dblp:journals/jilp/ChuangNC07 fatcat:6o3qlpin6fhwhnx2wkslaihluy

Trireme: Exploring Hierarchical Multi-Level Parallelism for Domain Specific Hardware Acceleration [article]

Georgios Zacharopoulos, Adel Ejjeh, Ying Jing, En-Yu Yang, Tianyu Jia, Iulian Brumar, Jeremy Intan, Muhammad Huzaifa, Sarita Adve, Vikram Adve, Gu-Yeon Wei, David Brooks
2022 arXiv   pre-print
To assist the design process and expose every possible level of parallelism, we present Trireme, a fully automated tool-chain that explores multiple levels of parallelism and produces domain specific accelerator  ...  Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop level, task level and pipeline parallelism.  ...  major forms of parallelism (loop level, task level and pipeline parallelism) relevant to accelerator design.  ... 
arXiv:2201.08603v1 fatcat:7yfobur46rdpbij72cpjpj2mnu

Two-Level Reorder Buffers: Accelerating Memory-Bound Applications on SMT Architectures

Jason Loew, Dmitry Ponomarev
2008 2008 37th International Conference on Parallel Processing  
We propose a low complexity mechanism for accelerating memory-bound threads on SMT processors without adversely impacting the performance of other concurrently running applications.  ...  Our results demonstrate about 30% improvement over DCRA resource distribution mechanism in terms of "harmonic mean of weighted IPCs" metric.  ...  Parallelism (ILP).  ... 
doi:10.1109/icpp.2008.24 dblp:conf/icpp/LoewP08 fatcat:3pvoy7lnvzgivbnkcqgpch6mxm

Acceleration of first and higher order recurrences on processors with instruction level parallelism [chapter]

Michael Schlansker, Vinod Kathail
1994 Lecture Notes in Computer Science  
This report describes parallelization techniques for accelerating a broad class of recurrences on processors with instruction level parallelism.  ...  for processors with limited instruction level parallelism.  ...  The work focuses primarily on the acceleration of recurrences for multiprocessors. We treat specifically the acceleration of recurrences on uniprocessors with instruction level parallelism.  ... 
doi:10.1007/3-540-57659-2_24 fatcat:entbv5o3ovc3thh6fivabfu4je

Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems

Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabaleswar K. Panda
2009 2009 International Conference on Parallel Processing  
Checkpoint/Restart is becoming increasingly important for large scale parallel jobs.  ...  As a result, deployment of Checkpoint/Restart mechanisms for large scale parallel applications is limited.  ...  Transparent application-level checkpointing may be achieved through compiler techniques [15].  ... 
doi:10.1109/icpp.2009.73 dblp:conf/icpp/OuyangGP09 fatcat:lxvsfr2l35g7zmk4ibyvbfelk4

ZIPPER: Exploiting Tile- and Operator-level Parallelism for General and Scalable Graph Neural Network Acceleration [article]

Zhihui Zhang, Jingwen Leng, Shuwen Lu, Youshan Miao, Yijia Diao, Minyi Guo, Chao Li, Yuhao Zhu
2021 arXiv   pre-print
Besides, the semantics gap between the high-level GNN programming model and efficient hardware makes it difficult in accelerating general-domain GNNs.  ...  To address the challenge, we propose Zipper, an efficient yet general acceleration system for GNNs.  ...  CONCLUSION In this work, we propose ZIPPER, a general and scalable GNN acceleration system that implements the inter-tile pipelining to exploit the tile- and operation-level parallelism.  ... 
arXiv:2107.08709v1 fatcat:2vejfeqgnzanbit3bg6rk5vife

Algorithm-level Feedback-controlled Adaptive data prefetcher: Accelerating data access for high-performance processors

Yong Chen, Huaiyu Zhu, Hui Jin, Xian-He Sun
2012 Parallel Computing  
A general and effective data prefetcher and accelerator must be dynamic in nature.  ...  DAHC is a new cache structure designed for data prefetching and data-access acceleration.  ...  Last, to our knowledge, this study is the first work exploring algorithm-level dynamic adaptation to accelerate data access depending on applications' access pattern.  ... 
doi:10.1016/j.parco.2012.06.002 fatcat:cvphilrl5fb27junstmpma5eaa

Implementation of stereo matching using a high level compiler for parallel computing acceleration

Jinglin Zhang, Jean-Francois Nezan, Jean-Gabriel Cousin, Erwan Raffin
2012 Proceedings of the 27th Conference on Image and Vision Computing New Zealand - IVCNZ '12  
Heterogeneous computing system increases the performance of parallel computing in many domain of general purpose computing with CPU, GPU and other accelerators.  ...  And the HMPP workbench can greatly reduce the time of application development using parallel computing device.  ...  In order to satisfy the demand of real time, it must be with the benefit of hardware acceleration, especially data parallel architectures-GPU.  ... 
doi:10.1145/2425836.2425892 dblp:conf/ivcnz/ZhangNCR12 fatcat:twofvuppxvdqbhmc7ff6a3u33a

High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers

Kamalavasan Kamalakkannan, Gihan R. Mudalige, Istvan Z. Reguly, Suhaib A. Fahmy
2021 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)  
It leverages key characteristics of the application class and its computation-communication pattern and the architectural capabilities of the FPGA to accelerate solvers for high-performance computing applications  ...  However, these tools still require low level modification of code to produce accelerators with optimum performance.  ...  A number of previous works have also utilized high-level frameworks for generating efficient FPGA accelerators.  ... 
doi:10.1109/ipdps49936.2021.00117 fatcat:l72imzolbresvdhtamaxaxhniu

Fast Parallel High-Level Synthesis Design Space Explorer: Targeting FPGAs to accelerate ASIC Exploration

Md Imtiaz Rashid, Benjamin Carrion Schafer
2022 Proceedings of the Great Lakes Symposium on VLSI 2022  
One additional way to accelerate the exploration process is by parallelizing the explorer, creating multi-threaded versions.  ...  To address this, in this work we present a dedicated multi-threaded parallel HLS DSE explorer that is able to accelerate HLS DSE for ASICs by targeting first FPGAs and using machine learning to convert  ...  Finally, (3) considering that FPGA HLS tools are free can we create a dedicated parallel explorer to further accelerate the exploration process?  ... 
doi:10.1145/3526241.3530339 fatcat:fdgduhabtfhctdufsaxosgbalu

GPU-accelerated hydraulic simulations of large-scale natural gas pipeline networks based on a two-level parallel process

Yue Xiang, Peng Wang, Bo Yu, Dongliang Sun
2020 Oil & Gas Science and Technology  
First, based on the Decoupled Implicit Method for Efficient Network Simulation (DIMENS) method, presented in our previous study, a novel two-level parallel simulation process and the corresponding parallel  ...  Then, the implementation of the two-level parallel simulation in GPU is introduced in detail. Finally, some numerical experiments are provided to test the performance of the proposed method.  ...  Therefore, the two-level parallel simulation process is designed to be coarse and fine level processes.  ... 
doi:10.2516/ogst/2020076 fatcat:bpjqlbuldzbynhqkmysaikwnam

High-level Performance Evaluation of Object Detection based on Massively Parallel Focal-plane Acceleration Requiring Minimum Pixel Area Overhead

Eloy Parra-Barrero, Jorge Fernández-Berni, Fernanda D. V. R. Oliveira, Ricardo Carmona-Galán, Ángel Rodríguez-Vázquez
2016 Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications  
Smart CMOS image sensors can leverage the inherent data-level parallelism and regular computational flow of early vision by incorporating elementary processors at pixel level.  ...  A performance evaluation of the proposed scheme in terms of accuracy and acceleration for face detection is reported.  ...  CONCLUSIONS We have described a massively parallel focal-plane processing architecture capable of rendering useful image representations for object detection acceleration.  ... 
doi:10.5220/0005651200790085 dblp:conf/visapp/Parra-BarreroFO16 fatcat:fn5dmuv4kvbjnjkawlafrupte4