Parallel Processing of Sequential Media Algorithms on Heterogeneous Multi-Processor System-on-Chip

Peng Zhao, Dawei Wang, Ming Yan, Sikun Li
2009 Journal of Computers  
Heterogeneous MPSoCs provide more opportunities for accelerating sequential media algorithms through parallelization.  ...  Moreover, the difference between processing elements, reflected in architecture templates, is used to achieve "the maximum" performance and efficiency of heterogeneous MPSoCs.  ...  Tiles are then mapped to the VMP-MPSoC architecture template for parallel processing.  ... 
doi:10.4304/jcp.4.6.477-484 fatcat:bljxfxehrbfl3lfkv3p2rcm6yu

Hardware Compilation of Deep Neural Networks: An Overview

Ruizhe Zhao, Shuanglong Liu, Ho-Cheung Ng, Erwei Wang, James J. Davis, Xinyu Niu, Xiwei Wang, Huifeng Shi, George A. Constantinides, Peter Y. K. Cheung, Wayne Luk
2018 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)  
Design templates for neural network accelerators are studied with a specific focus on their derivation methodologies.  ...  A neural network model has various layer types, connection patterns and data representations, and the corresponding implementation can be customised with different architectural and modular parameters.  ...  An intuitive way to analyse and explore nested loops in hardware terms is through parallelism.  ... 
doi:10.1109/asap.2018.8445088 dblp:conf/asap/ZhaoLNWDNWSCCL18 fatcat:v5txrrsfifa6bah2oksjdlrsgi

Design space exploration in application-specific hardware synthesis for multiple communicating nested loops

Rosilde Corvino, Abdoulaye Gamatie, Marc Geilen, Lech Jozwiak
2012 2012 International Conference on Embedded Computer Systems (SAMOS)  
Behavioral specifications of data-intensive applications are usually given in the form of a loop-based sequential code, which requires parallelization and task scheduling for an efficient MPSoC implementation  ...  This paper proposes a method for a concurrent exploration of data and task parallelism when using loop transformations to optimize data transfer and storage mechanisms for both single and multiple communicating  ...  Fig. 3. An architecture template with three processing tiles.  ... 
doi:10.1109/samos.2012.6404166 dblp:conf/samos/CorvinoGGJ12 fatcat:id7qxcpzibhuvp3x7a7gn5vdqu

A Parallel for Loop Memory Template for a High Level Synthesis Compiler

Craig Moore, Wim Meeus, Harald Devos, Dirk Stroobandt
2010 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools  
The template fits parallel for loops with no loop dependencies and sequential bodies. We found two alternative template implementations using our compiler.  ...  We propose a parametrized memory template for applications with parallel for loops. The template's parameters reflect important trade-offs made during system design.  ...  The authors would like to thank Stephen Neuendorffer of Xilinx for his recommendations and suggestions.  ... 
doi:10.1109/dsd.2010.62 dblp:conf/dsd/MooreMDS10 fatcat:liaciriu25h4xjommpnnus4xyy

Mapping of nomadic multimedia applications on the ADRES reconfigurable array processor

Mladen Berekovic, Andreas Kanstein, Bingfeng Mei, Bjorn De Sutter
2009 Microprocessors and microsystems  
ADRES supports a VLIW-like programming model with a pure VLIW mode for legacy code, and a (coarse-grain reconfigurable) array mode with very high parallelism for the processing of compute-intensive loops  ...  An XML-based architecture description language allows a designer to easily generate different processor instances with full compiler support by specifying different values for the communication topology  ...  Acknowledgements: This research has been performed in the context of IMEC's M4 Research Program, which is partly funded by Samsung and Freescale Semiconductors.  ... 
doi:10.1016/j.micpro.2009.02.008 fatcat:vltv2oj5snay5losjb4cbngemu

A Distributed Shared Memory Model and C++ Templated Meta-Programming Interface for the Epiphany RISC Array Processor [article]

David Richie, James Ross, Jamie Infantolino
2017 arXiv   pre-print
The approach offers an extremely simple parallel programming model well suited for the architecture.  ...  Whereas such a processor offers high computational energy efficiency and parallel scalability, developing effective programming models that address the unique architecture features has presented many challenges  ...  Army Research Laboratory-hosted Department of Defense Supercomputing Resource Center for its support of this work.  ... 
arXiv:1704.08343v1 fatcat:ve6tmei4brdshj5w4x74xij3em

Algorithmic transformation techniques for efficient exploration of alternative application instances

Todor Stefanov, Bart Kienhuis, Ed Deprettere
2002 Proceedings of the tenth international symposium on Hardware/software codesign - CODES '02  
Following the Y-chart paradigm for designing a system, an application and an architecture are modeled separately and mapped onto each other in an explicit design step.  ...  Next, a performance analysis for alternative application instances, architecture instances and mappings has to be done, thereby exploring the design space of the target system.  ...  Third, we use a Y-chart environment to map the KPN onto an architecture template and do performance analysis.  ... 
doi:10.1145/774789.774792 dblp:conf/codes/StefanovKD02 fatcat:6dwtw4mkrfgnjpknbgc5oqa7me

Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures [chapter]

Hamed Fatemi, Henk Corporaal, Twan Basten, Richard Kleihorst, Pieter Jonker
2005 Lecture Notes in Computer Science  
To scrutinize the effect of DLP and ILP in our architecture (template), an area model based on the number of ALUs (ILP) and the number of processing elements (DLP) in the template is defined, as well as  ...  This paper explores the limitations and bottlenecks of increasing support for parallelism along the DLP and ILP axes in isolation and in combination.  ...  Section 2 explains the architecture on which our measurements are based. The area and performance models for this architecture are studied in Sections 3 and 4.  ... 
doi:10.1007/11558484_87 fatcat:ullnx6puq5gs5oezt4s6k3ibh4

Efficient applications in user transparent parallel image processing

F.J. Seinstra, D. Koelma, J.M. Geusebroek, F.C. Verster, A.W.M. Smeulders
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e., sequential) implementation of data parallel imaging applications for execution on homogeneous  ...  Based on experimental results we conclude that our architecture constitutes a powerful and user-friendly tool for obtaining high performance in many important image processing research areas.  ...  Performance Evaluation Because template matching is such an important task in image processing, it is essential for our software architecture to perform well for this application.  ... 
doi:10.1109/ipdps.2002.1016511 dblp:conf/ipps/SeinstraKGVS02 fatcat:mnwcfvnvondkjpfuxmb4rwkbwu

Architecture synthesis of high-performance application-specific processors

Mauricio Breternitz, John Paul Shen
1990 Conference proceedings on 27th ACM/IEEE design automation conference - DAC '90  
ARCHITECTURE SYNTHESIS This section documents an experimental project, called the White Dwarf [2], which explores the feasibility of the architecture synthesis approach.  ...  The results and experiences from this project served as the basis of and the model for the architecture synthesis method presented herein, called Application-Specific Processor Design (ASPD).  ...  The White Dwarf design also indicates that the VLIW-like [4] architectural model can serve as an effective and efficient architecture template for ASP synthesis for scientific, engineering and embedded  ... 
doi:10.1145/123186.123398 dblp:conf/dac/BreternitzS90 fatcat:h5lhg46wtzhijefjimgjwczz5i

Heterogeneous coarse-grained processing elements: A template architecture for embedded processing acceleration

G. Ansaloni, P. Bonzini, L. Pozzi
2009 2009 Design, Automation & Test in Europe Conference & Exhibition  
Reconfigurable Architectures are good candidates for application accelerators that cannot be set in stone at production time.  ...  Just like the integration of hardwired multiplier and memory blocks enabled FPGAs to efficiently implement digital signal processing applications, in this paper we study a customizable architecture template  ...  The evolution of coarse-grained architectures should not happen in isolation.  ... 
doi:10.1109/date.2009.5090723 dblp:conf/date/AnsaloniBP09 fatcat:twqcaz7vyzgqtnemo2ch2rqmqm

Algorithmic transformation techniques for efficient exploration of alternative application instances

Todor Stefanov, Bart Kienhuis, Ed Deprettere
2002 Proceedings of the tenth international symposium on Hardware/software codesign - CODES '02  
Following the Y-chart paradigm for designing a system, an application and an architecture are modeled separately and mapped onto each other in an explicit design step.  ...  Next, a performance analysis for alternative application instances, architecture instances and mappings has to be done, thereby exploring the design space of the target system.  ...  Third, we use a Y-chart environment to map the KPN onto an architecture template and do performance analysis.  ... 
doi:10.1145/774790.774792 fatcat:leoufvt2sjefpkhex2jx6wu23e

Engineering and implementing software architectural patterns based on feedback loops

Dhaminda B. Abeywickrama, Nicklas Hoch, Franco Zambonelli
2015 Scalable Computing : Practice and Experience  
In this paper, we present SimSOTA, an integrated Eclipse plug-in to architect, engineer and implement self-adaptive systems based on our feedback loop-based approach.  ...  A highly decentralized system of autonomous service components consists of multiple and interacting feedback loops which can be organized into a variety of architectural patterns.  ...  Parallel AMs SC pattern). For example, the Java template for the Parallel AMs SC pattern (cf.  ... 
doi:10.12694/scpe.v15i4.1052 fatcat:qdzjbztmfzg75a7esabluijiae

P3L: A structured high-level parallel language, and its structured support

Bruno Bacci, Marco Danelutto, Salvatore Orlando, Susanna Pelagatti, Marco Vanneschi
1995 Concurrency Practice and Experience  
The methodology is based on the definition of a new, high-level, explicitly parallel language, called P3L, and of a set of static tools that automatically adapt the program features for each target architecture  ...  are frequently encountered in parallel applications, and that can efficiently be implemented.  ...  Acknowledgments: We would like to thank Milon Mackey, for implementing the front-end of the P3L compiler, and for the useful discussion about the interface with the host sequential language (C++) of  ... 
doi:10.1002/cpe.4330070305 fatcat:4g3qawn65bgp5nx27mnem626u4

Enabling FPGAs for the Masses [article]

Janarbek Matai, Dustin Richmond, Dajung Lee, Ryan Kastner
2014 arXiv   pre-print
In other words, HLS designers must implement the application using an abstract language in a manner that generates an efficient micro-architecture; we call this process writing restructured code.  ...  To do this, we study methodologies of restructuring software code for HLS tools; we provide examples of designing different kernels in state-of-the-art HLS tools; and we present a list of challenges for  ...  for software programmers using HLS templates (restructured code) and parallel programming patterns VI.  ... 
arXiv:1408.5870v1 fatcat:nlrtru3mznatznvgfu653lrx4e