644 Hits in 3.9 sec

Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures

Jorge Gonzalez-Dominguez, Sabela Ramos, Juan Tourino, Bertil Schmidt
2016 IEEE Transactions on Parallel and Distributed Systems  
As these studies are time consuming operations, some tools exploit the characteristics of different hardware accelerators (such as GPUs and Xeon Phi coprocessors) to reduce the runtime.  ...  Nevertheless, all these approaches are not able to efficiently exploit the whole computational capacity of modern clusters that contain both GPUs and Xeon Phi coprocessors.  ...  This work was also supported by the Ministry of Economy and Competitiveness of Spain and FEDER funds of the EU (Project TIN2013-42148-P).  ... 
doi:10.1109/tpds.2015.2460247 fatcat:x4xjzvjmarcwncmjpmq26mlwb4

A Meta-Model Assisted Coprocessor Synthesis Framework for Compiler/Architecture Parameters Customization

Sotirios Xydis, Gianluca Palermo, Vittorio Zaccaria, Cristina Silvano
2013 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013  
Hardware coprocessors are extensively used in modern heterogeneous systems-on-chip (SoC) designs to provide efficient implementation of application-specific functions.  ...  Comparative experimental results, over a set of real-life benchmarks, prove the effectiveness of the proposed approach in terms of quality improvements of the design solutions and exploration runtime reductions  ...  Within this context, Shafer and Wakabayashi [6] proposed a combination of parameter clustering along with an adaptive simulated annealer to accelerate the exploration runtime.  ... 
doi:10.7873/date.2013.143 dblp:conf/date/XydisPZS13 fatcat:2sxeze5nnbhg7ppfhpgvth5ntm

Comparing FPGAs to Graphics Accelerators and the Playstation 2 Using a Unified Source Description

Lee W. Howes, Paul Price, Oskar Mencer, Olav Beckmann, Oliver Pell
2006 2006 International Conference on Field Programmable Logic and Applications  
Field programmable gate arrays (FPGAs), graphics processing units (GPUs) and Sony's PlayStation 2 vector units offer scope for hardware acceleration of applications.  ...  MOTIVATION We consider accelerating software with coprocessors and classify them into custom and general purpose coprocessors.  ... 
doi:10.1109/fpl.2006.311203 dblp:conf/fpl/HowesPMBP06 fatcat:7s2cszwrljforcmadgw4skinii

VOCL-FT

Antonio J. Peña, Wesley Bland, Pavan Balaji
2015 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15  
Popular accelerator programming models rely on offloading computation operations and their corresponding data transfers to the coprocessors, leveraging synchronization points where needed.  ...  CCS Concepts •Software and its engineering → Runtime environments;  ...  Department of Energy, Office of Science, Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357.  ... 
doi:10.1145/2807591.2807640 dblp:conf/sc/PenaBB15 fatcat:wwfupzyiwfbuzova7rk6jwfvyy

Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators

Azzam Haidar, Yulu Jia, Piotr Luszczek, Stanimire Tomov, Asim YarKhan, Jack Dongarra
2015 Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - ScalA '15  
A wide variety of heterogeneous compute resources are available to modern computers, including multiple sockets containing multicore CPUs, one-or-more GPUs of varying power, and coprocessors such as the  ...  We propose a productive programming model starting from serial code, which achieves parallelism and scalability by using a task-superscalar runtime environment to adapt the computation to the available  ...  the Department of Energy, and the NVIDIA and Intel Corporations.  ... 
doi:10.1145/2832080.2832085 dblp:conf/sc/HaidarJLTYD15 fatcat:ppxzxzbmyvc4rjc6qiarh4kaly

A small and adaptive coprocessor for information flow tracking in ARM SoCs

Muhammad Abdul Wahab, Pascal Cotret, Mounir Nasr Allah, Guillaume Hiet, Arnab Kumar Biswas, Vianney Lapôtre, Guy Gogniat
2018 2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig)  
These goals are accomplished by taking advantage of a notable feature of ARM CoreSight components (context ID) combined with a custom DIFT coprocessor and RFBlare.  ...  The area overhead of this work is lower than 1% and power overhead is 16.2% on a middle-class Xilinx Zynq SoC.  ...  generating unauthorized behavior.  ... 
doi:10.1109/reconfig.2018.8641695 dblp:conf/reconfig/WahabCAHBLG18 fatcat:zx5elrvskjblhfho6mj3jpk4we

A small and adaptive coprocessor for information flow tracking in ARM SoCs [article]

Muhammad Abdul Wahab, Pascal Cotret, Mounir Nasr Allah, Guillaume Hiet, Arnab Kumar Biswas, Vianney Lapôtre, Guy Gogniat
2018 arXiv   pre-print
These goals are accomplished by taking advantage of a notable feature of ARM CoreSight components (context ID) combined with a custom DIFT coprocessor and RFBlare.  ...  The area overhead of this work is lower than 1% and power overhead is 16.2% on a middle-class Xilinx Zynq SoC.  ...  generating unauthorized behavior.  ... 
arXiv:1812.01541v1 fatcat:re7vqhee7zc7vh5u3mziw5f67a

An interactive codesign environment for domain-specific coprocessors

Patrick Schaumont, Doris Ching, Ingrid Verbauwhede
2006 ACM Transactions on Design Automation of Electronic Systems  
We demonstrate our approach using several designs including an AES encryption coprocessor and a Viterbi decoding coprocessor.  ...  We present a language and design environment called GEZEL that can be used for the design, verification and implementation of such coprocessor-based systems.  ...  INTRODUCTION For reasons of energy-efficiency, modern embedded systems use specialized and distributed processing components.  ... 
doi:10.1145/1124713.1124719 fatcat:rgev32aap5cwzapad53v4sq5wa

Porting Feastflow To The Intel Xeon Phi: Lessons Learned

Georgios Goumas
2014 Zenodo  
Our efforts involved both the evaluation of programming models including OpenCL, POSIX threads and OpenMP and typical optimization strategies like parallelization and vectorization.  ...  and optimization of two core building block kernels for FEASTFLOW: an axpy vector operation and a sparse matrix-vector multiplication (spmv).  ...  However, utilising modern multi- and many-core architectures as well as hardware accelerators such as GPUs has become state-of-the-art recently.  ... 
doi:10.5281/zenodo.822670 fatcat:qmcxfe6z2fhsnltprheuewrwwa

SPIRIT: Spectral-Aware Pareto Iterative Refinement Optimization for Supervised High-Level Synthesis

Sotirios Xydis, Gianluca Palermo, Vittorio Zaccaria, Cristina Silvano
2015 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
Comparative experimental results demonstrate the effectiveness of the proposed approach in terms of quality improvements of the design solutions and exploration runtime reductions.  ...  , and with points found in high variance regions of the design space, thus improving prediction accuracy.  ...  Concluding, to give also a practical view of the comparison between SPIRIT and Zero-Gradient approaches, we add some data about the exploration runtime to better understand the behavior shown in Figure  ... 
doi:10.1109/tcad.2014.2363392 fatcat:3jxlzphvf5ch5odrhx3zem6eiu

Performance Evaluation of Massively Parallel Systems Using SPECOMP Suite

Dheya Mustafa
2022 Computers  
We present an extensive evaluation study of the performance peaks and scalability of these two modern architectures using SPEC OMP benchmarks.  ...  IBM has developed three generations of Blue Gene supercomputers—Blue Gene/L, P, and Q—that use, at a large scale, low-power processors to achieve high performance.  ...  Conflicts of Interest: The author declares no conflict of interest.  ... 
doi:10.3390/computers11050075 fatcat:4lcuefuno5fwbdibxht43taw74

Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study

Kaixi Hou, Hao Wang, Wu-chun Feng
2014 2014 43rd International Conference on Parallel Processing Workshops  
Perhaps nowhere is this more evident than with the Intel Xeon Phi coprocessor.  ...  Furthermore, the process of optimizing the performance on such platforms is complex and requires architectural expertise.  ...  Modern compiler techniques can automatically vectorize and speed up a myriad of applications without much modification of algorithms on modern multicore processors [3] .  ... 
doi:10.1109/icppw.2014.44 dblp:conf/icppw/HouWF14 fatcat:5rn33ekzvncyrpubcrokyy2kpa

Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs

Christopher P. Stone, Andrew T. Alferman, Kyle E. Niemeyer
2018 Computer Physics Communications  
The runtimes for both ODE solvers decreased 2.5-2.7x with the SIMD implementations on the host CPU and 4.7-4.9x with the Xeon Phi coprocessor compared to the baseline parallel code.  ...  Two runtime benchmarks were conducted to clearly determine any performance advantage offered by either method: evaluating the right-hand-side source terms in parallel, and integrating a series of constant-pressure  ...  This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575.  ... 
doi:10.1016/j.cpc.2018.01.015 fatcat:x2ilsi2nsbal3ccgbno7nhxvzy

Benchmarking Modern Edge Devices for AI Applications

Pilsung Kang, Jongmin Jo
2021 IEICE Transactions on Information and Systems  
By comparing the performance with other GPU (graphics processing unit) accelerated systems in different platforms, we assess the computational capability of the modern edge devices featuring a significant  ...  Meanwhile, the paradigm of edge computing has emerged as one of the foremost areas in which applications using the AI technology are being most actively researched, due to its potential benefits and impact  ...  Moreover, most reports examine only a particular subset of modern edge devices, thus lacking a comprehensive comparison between these devices with respect to the AI domain applications.  ... 
doi:10.1587/transinf.2020edp7160 fatcat:4uo7pehd7vbylckgmpoh5s34im

A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics (Extended Version) [article]

Anil Shanbhag, Samuel Madden, Xiangyao Yu
2020 arXiv   pre-print
In this paper, we adopt a model-based approach to understand when and why the performance gains of running queries on GPUs vs on CPUs vary from the bandwidth ratio (which is roughly 16x on modern hardware  ...  There has been a significant amount of excitement and recent work on GPU-based database systems.  ...  A single modern GPU can have up to 32 GB of HBM capable of delivering up to 1.2 TBps of memory bandwidth and 14 Tflops of compute.  ... 
arXiv:2003.01178v1 fatcat:6btcxiccwnfa5crpwgzhl26lhm