HipaccVX: Wedding of OpenVX and DSL-based Code Generation [article]

M. Akif Özkan, Burak Ok, Bo Qiao, Jürgen Teich, Frank Hannig
2020 arXiv   pre-print
In this paper, we analyze OpenVX vision functions to find an orthogonal set of computational abstractions.  ...  Yet, OpenVX's algorithm space is constrained to a small set of vision functions. This hinders accelerating computations that are not included in the standard.  ...  A kernel in OpenVX is the abstract representation of a computer vision function [32].  ... 
arXiv:2008.11476v1 fatcat:e4yyu4ei7nayjma5rpt6p3nmei

HipaccVX: wedding of OpenVX and DSL-based code generation

M. Akif Özkan, Burak Ok, Bo Qiao, Jürgen Teich, Frank Hannig
2020 Journal of Real-Time Image Processing  
In this paper, we analyze OpenVX vision functions to find an orthogonal set of computational abstractions.  ...  Yet, OpenVX's algorithm space is constrained to a small set of vision functions. This hinders accelerating computations that are not included in the standard.  ...  In this way, we achieve performance portability not only for OpenVX's CV functions but also for user-defined kernels that are expressed with these computational abstractions.  ... 
doi:10.1007/s11554-020-01015-5 fatcat:iowzgiohnvc3beo4at6aamcb5y

A Low-Memory, Straightforward and Fast Bilateral Filter Through Subsampling in Spatial Domain

Francesco Banterle, Massimiliano Corsini, Paolo Cignoni, Roberto Scopigno
2011 Computer graphics forum (Print)  
We show different applications of the proposed filter, in particular efficient cross-bilateral filtering, real-time edge-aware image editing and fast video denoising.  ...  In this work we present a new algorithm for accelerating the colour bilateral filter based on a subsampling strategy working in the spatial domain.  ...  Acknowledgements We greatly thank Marco Di Benedetto, Andrew Adams and Jiawen Chen for their help with OpenGL code and compiling their filters' implementations.  ... 
doi:10.1111/j.1467-8659.2011.02078.x fatcat:iyotsl35o5eg3nbozmereyk6si

ADRENALINE: An OpenVX Environment to Optimize Embedded Vision Applications on Many-core Accelerators

Giuseppe Tagliavini, Germain Haugou, Andrea Marongiu, Luca Benini
2015 2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip  
The acceleration of Computer Vision algorithms is an important enabler to support the more and more pervasive applications of the embedded vision domain.  ...  In this work we introduce ADRENALINE, a novel framework for fast prototyping and optimization of OpenVX applications for heterogeneous SoCs with many-core accelerators.  ...  OpenVX [18] has been introduced as a cross-platform standard for imaging and vision application domains, with the aim to raise significantly the level of abstraction at which CV applications should be  ... 
doi:10.1109/mcsoc.2015.45 dblp:conf/mcsoc/TagliaviniHMB15 fatcat:cr3vfu7mqvhq5jxigsh3iltfp4

OpenVX-Based Python Framework for Real-time Cross-Platform Acceleration of Embedded Computer Vision Applications

Ori Heimlich, Elishai Ezra Tsur
2016 Frontiers in ICT  
With OpenVX, Vision processing is modeled with coarse-grained data flow graphs, which can be optimized and accelerated by the platform implementer.  ...  OpenVX is a standardized interface, released in late 2014, in an attempt to provide both system and kernel level optimization to vision applications.  ...  to simplify or enrich low level implementation with modern approaches.  ... 
doi:10.3389/fict.2016.00028 fatcat:7srx6qr6hje6nkjel4c7wijkie

PENCIL: Towards a Platform-Neutral Compute Intermediate Language for DSLs [article]

Riyadh Baghdadi, Albert Cohen, Serge Guelton, Sven Verdoolaege, Jun Inoue, Tobias Grosser, Georgia Kouveli, Alexey Kravets, Anton Lokhmotov, Cedric Nugteren, Fraser Waters, Alastair F. Donaldson
2013 arXiv   pre-print
We motivate the design and implementation of a platform-neutral compute intermediate language (PENCIL) for productive and performance-portable accelerator programming.  ...  Second, we will investigate the use of directives and extensions in cross-component optimizations, where dependence information associated with several computational kernels is collectively exploited to  ...  The main computational kernel adds to each cell all data coming in from its edges; we wish to do this for all cells.  ... 
arXiv:1302.5586v1 fatcat:vwbfqcou7ncslinc4rstbq5boy

AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers

Kalhan Koul, Jackson Melchert, Kavya Sreedhar, Leonard Truong, Gedeon Nyengele, Keyi Zhang, Qiaoyi Liu, Jeff Setter, Po-Han Chen, Yuchen Mei, Maxwell Strange, Ross Daly (+21 others)
2022 ACM Transactions on Embedded Computing Systems  
With the slowing of Moore's law, computer architects have turned to domain-specific hardware specialization to continue improving the performance and efficiency of computing systems.  ...  This enables the creation of design-space exploration frameworks that automatically generate accelerator architectures that approach the efficiencies of hand-designed accelerators, with a significantly  ...  Vertices are terminals, and directed edges are wired connections. Vertices can have multiple incoming edges, which abstracts away low-level multiplexers. Each vertex can be annotated with attributes.  ... 
doi:10.1145/3534933 fatcat:atsai6sto5e67cyx7tyg2bl5w4

Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs [article]

Miryeong Kwon, Donghyun Gouk, Sangwon Lee, Myoungsoo Jung
2022 arXiv   pre-print
Graph neural networks (GNNs) process large-scale graphs consisting of a hundred billion edges.  ...  We fabricate HolisticGNN's hardware RTL and implement its software on an FPGA-based computational SSD (CSSD).  ...  Note that it would be possible to use cross-platform abstraction platforms, such as OpenCL [69] or SYCL [60], rather than using RPC.  ... 
arXiv:2201.09189v1 fatcat:pgambrqnjvcenncyvzuyii6gzm

Pushing the Level of Abstraction of Digital System Design: a Survey on How to Program FPGAs

Emanuele Del Sozzo, Davide Conficconi, Alberto Zeni, Mirko Salaris, Donatella Sciuto, Marco D. Santambrogio
2022 ACM Computing Surveys  
Field Programmable Gate Arrays (FPGAs) are spatial architectures with a heterogeneous reconfigurable fabric.  ...  They are state-of-the-art for prototyping, telecommunications, embedded, and an emerging alternative for cloud-scale acceleration.  ...  ACKNOWLEDGEMENTS The authors are grateful for feedback from reviewers and NECSTLab members, with a particular mention to A. Damiani, A. Parravicini, E. D'Arnese, F. Carloni, F. Peverelli, and R.  ... 
doi:10.1145/3532989 fatcat:nsk5lwvt3vba5fbxmaj7sgpwru

NN2CAM: Automated Neural Network Mapping for Multi-Precision Edge Processing on FPGA-Based Cameras [article]

Petar Jokic, Stephane Emery, Luca Benini
2021 arXiv   pre-print
In contrast to prior work, the accelerator is purely logic and thus supports end-to-end processing on FPGAs without on-chip microprocessors.  ...  To present the performance of the system we employ this tool to implement two CNN edge processing networks on an FPGA-based high-speed camera with various precision settings showing computational throughputs  ...  NN MAPPING FRAMEWORK Mapping a trained neural network onto an FPGA platform for on-board inference requires multiple layers of abstraction to be crossed.  ... 
arXiv:2106.12840v1 fatcat:mgbrgszixfdpvajtbkmmga6f2m

The HighPerMeshes framework for numerical algorithms on unstructured grids

Samer Alhaddad, Jens Förstner, Stefan Groth, Daniel Grünewald, Yevgen Grynko, Frank Hannig, Tobias Kenter, Franz‐Josef Pfreundt, Christian Plessl, Merlind Schotte, Thomas Steinke, Jürgen Teich (+2 others)
2021 Concurrency and Computation  
A code generator and a matching back end allow the acceleration of HighPerMeshes code with GPUs. Finally, the achievable performance and scalability are demonstrated for different example problems.  ...  The mapping of the abstract algorithmic description onto parallel hardware, including distributed memory compute clusters, is presented.  ...  The authors gratefully acknowledge the funding of this project by computing time provided by the Paderborn Center for Parallel Computing (PC²).  ... 
doi:10.1002/cpe.6616 fatcat:qhyzqcnqhraf3mhf4bkvp72psm

Modulo scheduling for highly customized datapaths to increase hardware reusability

Kevin Fan, Hyunchul Park, Manjunath Kudlur, Scott Mahlke
2008 Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization - CGO '08  
The scheduler is able to target accelerators with widely varying levels of datapath functional capability and connectivity, and thus, varying degrees of programmability.  ...  This paper proposes a constraint-driven modulo scheduler that maps software-pipelineable loops onto programmable loop accelerator hardware.  ...  ACKNOWLEDGMENTS We thank Yuanyuan Tian for her help with quantifying graph similarity, as well as the anonymous referees who provided excellent feedback.  ... 
doi:10.1145/1356058.1356075 dblp:conf/cgo/FanPKM08 fatcat:3julctovlzfktfhjraym5djny4

Investigating Performance Portability Of A Highly Scalable Particle-In-Cell Simulation Code On Various Multi-Core Architectures

Benjamin Worpitz, Prof. Dr. Wolfgang E. Nagel, Dr. Michael Bussmann, Dr. Guido Juckeland, Dr. Andreas Knüpfer, Dr. Bernd Trenkler
2015 Zenodo  
The alpaka library defines and implements an abstract hierarchical redundant parallelism model.  ...  This makes it possible to achieve portability of performant codes across various types of accelerators by ignoring specific unsupported levels and utilizing only the ones supported on a specific accelerator.  ...  OpenCL does not support dynamic allocation of memory (new, delete) in kernel code. SYCL is a cross-platform abstraction layer based on OpenCL.  ... 
doi:10.5281/zenodo.49768 fatcat:gw53fnzwxfa53n2xqg6dpohqle

Intelligent Detection of Vehicle Driving Safety Based on Deep Learning

Deyun Wang, Deepak Kumar Jain
2022 Wireless Communications and Mobile Computing  
When the image resolution is consistent with the feature extraction model, the average accuracy of Deconv-SSD is compared with the original SSD algorithm in the PASCAL VOC public dataset, from 77.2% to  ...  In the self-made seat belt detection dataset, Squeeze-YOLO can reach 73 FPS when the average accuracy is 99.96%, the semantic segmentation algorithm accelerated by pruning achieves an accuracy of 94.87%  ...  Some researchers have proposed an accelerated way to delete the entire convolution kernel, except that a certain convolution kernel is compared with a certain weight in the deleted convolution kernel,  ... 
doi:10.1155/2022/1095524 fatcat:xdds57pupzgh5ez3pvpgvsjcta

Flexible function-level acceleration of embedded vision applications using the Pipelined Vision Processor

Robert Bushey, Hamed Tabkhi, Gunar Schirner
2013 2013 Asilomar Conference on Signals, Systems and Computers  
To satisfy these requirements innovative solutions are required to deliver high performance pixel processing combined with low energy per pixel execution.  ...  The paper also addresses the benefits and challenges of architecting and programming at the function-level granularity and abstractions.  ...  It has low computational complexity and thus is not an intuitive choice for HW acceleration.  ... 
doi:10.1109/acssc.2013.6810535 dblp:conf/acssc/BusheyTS13 fatcat:3yijp7sxkve3lii6wfr6hlaxiy