Acceleration techniques and evaluation on multi-core CPU, GPU and FPGA for image processing and super-resolution
2016
Journal of Real-Time Image Processing
The FPGA design leads to a scalable architecture performing four times (4x) faster than real time on low-end Xilinx Virtex 5 devices and sixty-nine times (69x) faster than real time on the Virtex ...
The proposed techniques accelerate GPU reconstruction of Ultra-High Definition content, achieving three times (3x) faster than real-time performance on mid-range and previous-generation devices ...
Table 6: Quality performance of the SIL-SEABI implementations on CPU, GPU and FPGA platforms. Columns: Platform (CPU, GPU, FPGA), output size (Ref.) ...
doi:10.1007/s11554-016-0619-6
fatcat:3xkd4eex3bexdgjw4p7sjbbmbe
Mapping a data-flow programming model onto heterogeneous platforms
2012
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems - LCTES '12
We demonstrate a working example that maps a pipeline of medical image-processing algorithms onto a prototype heterogeneous platform that includes CPUs, GPUs and FPGAs. ...
In this paper we explore mapping of a high-level macro data-flow programming model called Concurrent Collections (CnC) onto heterogeneous platforms in order to achieve high performance and low energy consumption ...
Acknowledgments We thank the Center for Domain Specific Computing (NSF Expeditions in Computing Award CCF-0926127) that funded this work. ...
doi:10.1145/2248418.2248428
dblp:conf/lctrts/SbirleaZBCS12
fatcat:pt3s2jlcibehho65hstsw65ahm
Energy-efficient FPGA Implementation of the k-Nearest Neighbors Algorithm Using OpenCL
2016
Position Papers of the 2016 Federated Conference on Computer Science and Information Systems
High-level Synthesis (HLS) simplifies FPGA programming by allowing designers to program FPGAs in several high-level languages e.g. C/C++, OpenCL and SystemC. ...
Furthermore, using an FPGA-specific OpenCL coding style and providing appropriate HLS directives can yield an FPGA implementation comparable to a GPU also in terms of execution time. ...
This work is also supported in part by the European Commission through the ECOSCALE project (H2020-ICT-671632). ...
doi:10.15439/2016f327
dblp:conf/fedcsis/MuslimDMLQ16
fatcat:c7gspjezb5ek3dx2hmudvkedkm
Enabling development of OpenCL applications on FPGA platforms
2013
2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors
The increased development time, level of experience needed by the developers, lower turns per day and difficulty involved in faster iterations over designs affect the time-to-market for many solutions. ...
The flow uses the Xilinx AutoESL tool to obtain the design specification for compute cores. A provided architecture integrates the cores with memory and host interfaces. ...
The compute devices in a platform can be CPU, GPU, DSP, FPGA or any other accelerator. ...
doi:10.1109/asap.2013.6567546
dblp:conf/asap/ShagrithayaKA13
fatcat:5cb6mpbe35htjax7vn5skzbrwa
A Scalable Runtime for the ECOSCALE Heterogeneous Exascale Hardware Platform
2016
Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers - ROSS '16
This position paper presents the design of a new runtime for a new heterogeneous hardware platform being developed to explore energy efficient, high performance computing. ...
In particular, this work explores the use of FPGAs to achieve both the power and performance goals of exascale, as well as utilising the runtime to automatically effect dynamic configuration and reconfiguration ...
An accelerator may be a CPU, GPU, FPGA, or co-processor such as the Xeon Phi [10]. ...
doi:10.1145/2931088.2931090
dblp:conf/hpdc/HarveyBSN16
fatcat:cr5mxbiwpncynfl3kx6fpx2o7a
Pangaea
2008
Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08
Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multicores, extending the current state-of-the-art CPU-GPU integration that physically ...
We implement Pangaea and the current CPU-GPU designs in fully-functional synthesizable RTL based on the production quality RTL of an IA32 CPU and an Intel GMA X4500 GPU. ...
Henry Wong and Tor Aamodt are partly supported by the Natural Sciences and Engineering Research Council of Canada. ...
doi:10.1145/1454115.1454125
dblp:conf/IEEEpact/WongBSACWCGJW08
fatcat:p37zbpaobza7pngzkxogk37fyy
Towards facilities for modeling and synthesis of architectures for resource allocation problem in systems engineering
2020
Proceedings of the 24th ACM Conference on Systems and Software Product Line: Volume A - Volume A
Exploring architectural design space is often beyond human capacity and makes architectural design a difficult task. ...
More specifically, this work reports on the use of the Clafer modeling language and its gateway to the CSP Choco Solver, on an industrial case study of heterogeneous hardware resource allocation (GPP-GPGPU-FPGA ...
This work discusses a possible approach to compute allocation schemes for hardware platforms with CPUs, GPUs and FPGAs nodes. ...
doi:10.1145/3382025.3414963
dblp:conf/splc/CreffNLM20
fatcat:tmwmzafabfadlo32ygqytsemha
Optimizing CNN-based Hyperspectral Image Classification on FPGAs
[article]
2019
arXiv
pre-print
In addition, previous CNN models used in HSI are not specifically designed for efficient implementation on embedded devices such as FPGAs. ...
A customized architecture which enables the proposed algorithm to be mapped effectively onto FPGA resources is then proposed to support real-time on-board classification with low power consumption. ...
In addition, we propose and optimize a hardware architecture to accelerate our proposed network on the FPGA through parallel processing, data pre-fetching and design space exploration. ...
arXiv:1906.11834v1
fatcat:arcbhexooja6hhmm4j5z4sgbei
Programming Heterogeneous Systems from an Image Processing DSL
[article]
2016
arXiv
pre-print
Using its FPGA with two low-power ARM cores, our design achieves up to 6x higher performance and 8x lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 3.5x higher performance with ...
We address this problem by extending the image processing language, Halide, so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler ...
Lime [2] goes a step further by providing a unified language for CPU, GPU, and FPGA, with semantics to delineate boundaries between computation blocks. ...
arXiv:1610.09405v1
fatcat:p2qq2gcifnez7mtrswcl2h2vfy
Comparing performance, productivity and scalability of the TILT overlay processor to OpenCL HLS
2014
2014 International Conference on Field-Programmable Technology (FPT)
FPGA. ...
The time required for initial hardware compilation of these TILT designs and configuration of the target application onto the overlay is roughly comparable to the compile times of the OpenCL HLS designs ...
The host offloads the parallel, compute-intensive second portion, defined within kernels, onto accelerator(s) such as CPUs, GPUs and, recently, FPGAs [7]. ...
doi:10.1109/fpt.2014.7082748
dblp:conf/fpt/RashidSB14
fatcat:4xo72zlg6fgh7nj4v63pc2yqe4
Apps with Hardware: Enabling Run-time Architectural Customization in Smart Phones
2016
USENIX Annual Technical Conference
We present our prototype smart phone using the Zedboard, which pairs a Xilinx Zynq FPGA with an embedded Cortex A9, running an Android-based system which we extended to provide run-time system support ...
We introduce a novel mechanism to enable sharing the FPGA in a practical manner by leveraging the unique deployment model of mobile applications, namely that deployment is via an app store, where we introduce ...
This research was supported in part by NSF SaTC grant number 1406192. ...
dblp:conf/usenix/CoughlinIK16
fatcat:l2327ch37vgppj6hhgbaq3dyyy
Efficient Machine Learning, Compilers, and Optimizations for Embedded Systems
[article]
2022
arXiv
pre-print
Challenges also come from the diverse application-specific requirements, including real-time responses, high-throughput performance, and reliable inference accuracy. ...
Deep Neural Networks (DNNs) have achieved great success in a massive number of artificial intelligence (AI) applications by delivering high-quality computer vision, natural language processing, and virtual ...
There is a large body of hardware-aware work, each of which often adopts a specific hardware device (CPU, GPU, embedded/mobile device) and requires a different hardware-cost metric (e.g., prioritizes ...
arXiv:2206.03326v1
fatcat:th66tbqxibez7hmctl2ytdiroa
Parallel Programming Models for Heterogeneous Many-Cores: A Survey
[article]
2020
arXiv
pre-print
While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to ...
Intel has been developing oneAPI that includes DPC++ (an implementation of SYCL with extensions) for its CPUs, GPUs and FPGAs [27]. ...
Recently, Intel has turned to implementing OpenCL for its CPUs, GPUs and FPGAs, and made its partial implementation open to the public [22]. ...
arXiv:2005.04094v1
fatcat:e2psrdnyajh3hih3znnjjbezae
Dynamic SIMD Parallel Execution on GPU from High-Level Dataflow Synthesis
2022
Journal of Low Power Electronics and Applications
Nonetheless, such a design method might not be enough on its own to achieve the desired performance goals, and supporting tools are useful for efficiently exploring the design space so as to optimize ...
Developing and fine-tuning software programs for heterogeneous hardware such as CPU/GPU processing platforms is a highly complex endeavor that demands considerable time and effort from software engineers ...
doi:10.3390/jlpea12030040
fatcat:2rhk5lszrrcxxdvgaolihqc57m
HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation
[article]
2020
arXiv
pre-print
Novel techniques include a highly flexible and scalable architecture with a hybrid Spatial/Winograd convolution (CONV) Processing Engine (PE), a comprehensive design space exploration tool, and a complete ...
Experimental results show that the accelerators generated by HybridDNN can deliver 3375.7 and 83.3 GOPS on a high-end FPGA (VU9P) and an embedded FPGA (PYNQ-Z1), respectively, which achieve a 1.8x higher ...
ACKNOWLEDGMENTS This work is supported in part by the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) and XMotors.ai. ...
arXiv:2004.03804v1
fatcat:2r7ymftbordw5odrfndowsuxg4
Showing results 1-15 of 360