A Reconfigurable Architecture for Binary Acceleration of Loops with Memory Accesses

Nuno Paulino, João Canas Ferreira, João M. P. Cardoso
2014 ACM Transactions on Reconfigurable Technology and Systems  
This article presents a reconfigurable hardware/software architecture for binary acceleration of embedded applications.  ...  The implementation of Megablocks with memory accesses uses a memory-sharing mechanism to support concurrent accesses to the entire address space of the GPP's data memory.  ...  The RPU shares BRAM access with the GPP through the LMB Multiplexers.  ... 
doi:10.1145/2629468 fatcat:eogxo4p4yrb4tdehmgfxa7yvra

Speculative Loop-Pipelining in Binary Translation for Hardware Acceleration

Sejong Oh, Tag Gon Kim, Jeonghun Cho, Elaheh Bozorgzadeh
2008 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
This paper presents a speculative loop pipelining technique to overcome limitations of binary translation for hardware acceleration.  ...  The experimental results show a promising speedup of up to 2.53 compared with the code in which memory accesses are not optimized in the pipeline fashion due to conservative memory analysis.  ...  Therefore, the binary translator for hardware acceleration translates critical kernels of the software to the accelerator such as very long instruction word (VLIW) DSP, coarse-grained reconfigurable architecture  ... 
doi:10.1109/tcad.2008.915533 fatcat:wamudsm4fzcqzd4xglaiesbb5a

Transparent Trace-Based Binary Acceleration for Reconfigurable HW/SW Systems

Joao Bispo, Nuno Paulino, Cardoso, Ferreira
2013 IEEE Transactions on Industrial Informatics  
This paper presents a novel approach to accelerate program execution by mapping repetitive traces of executed instructions, called Megablocks, to a runtime reconfigurable array of functional units.  ...  A prototype implementation of the system using a cacheless MicroBlaze microprocessor running code located in external memory reaches speedups from to for a set of 14 benchmark kernels.  ...  Load/Store FUs allow for memory access to random memory positions. Paek et al. [24] implement a coarse-grained array of homogeneous FUs with a static interconnection scheme.  ... 
doi:10.1109/tii.2012.2235844 fatcat:vhk62w4htvclvoka7ccovpmuw4

From Instruction Traces to Specialized Reconfigurable Arrays

Joao Bispo, Nuno Paulino, Joao M.P. Cardoso, Joao Canas Ferreira
2011 2011 International Conference on Reconfigurable Computing and FPGAs  
This paper presents an offline tool-chain which automatically extracts loops (Megablocks) from MicroBlaze instruction traces and creates a tailored Reconfigurable Processing Unit (RPU) for those loops.  ...  The system moves loops from the CPU to the RPU transparently, at runtime, and without changing the executable binaries.  ...  In this paper we present a system which can automatically map loops of a MicroBlaze executable binary to a Reconfigurable Processing Unit (RPU).  ... 
doi:10.1109/reconfig.2011.43 dblp:conf/reconfig/BispoPCF11 fatcat:7ylonxm5o5a7feuogqqgaes6j4

A Two-Dimensional Superscalar Processor Architecture

Sascha Uhrig, Basher Shehan, Ralf Jahr, Theo Ungerer
2009 2009 Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns  
This paper proposes a new processor architecture optimized for execution of sequential instruction streams.  ...  In contrast to well-known coarse-grained reconfigurable architectures no special synthesis tools are required and no configuration overhead occurs.  ...  Together with some additional components for branch execution and memory accesses (at the west side and the east side of the array), the GAP is able to execute conventional program binaries within its  ... 
doi:10.1109/computationworld.2009.46 fatcat:53hhjvdovnhp5jvzh5f2xj33gq

Architecture for Transparent Binary Acceleration of Loops with Memory Accesses [chapter]

Nuno Paulino, João Canas Ferreira, João M. P. Cardoso
2013 Lecture Notes in Computer Science  
By using a memory sharing mechanism, the RPU can access the GPP's data memory, allowing the acceleration of Megablocks with load/store operations.  ...  This paper presents an extension to a hardware/software system architecture in which repetitive instruction traces, called Megablocks, are accelerated by a Reconfigurable Processing Unit (RPU).  ...  This work was funded by the European Regional Development Fund through the COMPETE Programme (Operational Programme for Competitiveness) and by national funds from the FCT-Fundação para a Ciência e a Tecnologia  ... 
doi:10.1007/978-3-642-36812-7_12 fatcat:p4kifbrcmngerorcyu3jotyx7q

Dynamic reconfiguration with binary translation: breaking the ILP barrier with software compatibility

A.C.S. Beck, L. Carro
2005 Proceedings. 42nd Design Automation Conference, 2005.  
The proposed approach combines a reconfigurable architecture with a binary translation mechanism, being totally transparent for the software designer.  ...  We present results regarding the impact of area and power, and compare the proposed approach with other Java machines, including a VLIW one.  ...  This avoids unnecessary accesses in the register bank or in the main memory, accelerating the execution and saving power.  ... 
doi:10.1109/dac.2005.193908 fatcat:xx5c4ihlk5b5vjmmhi6urcuz5m

Dynamic reconfiguration with binary translation

Antonio Carlos S. Beck, Luigi Carro
2005 Proceedings of the 42nd annual conference on Design automation - DAC '05  
The proposed approach combines a reconfigurable architecture with a binary translation mechanism, being totally transparent for the software designer.  ...  We present results regarding the impact of area and power, and compare the proposed approach with other Java machines, including a VLIW one.  ...  This avoids unnecessary accesses in the register bank or in the main memory, accelerating the execution and saving power.  ... 
doi:10.1145/1065579.1065771 dblp:conf/dac/BeckC05 fatcat:5y4lnhqofbhahgkqhpubotee5u

A Coarse-Grained Array Accelerator for Software-Defined Radio Baseband Processing

Bruno Bougard, Bjorn De Sutter, Diederik Verkest, Liesbet Van der Perre, Rudy Lauwereins
2008 IEEE Micro  
Memory interface: To access the streaming data handled by the accelerated kernels, the accelerator core shares a level-1 scratchpad memory with the main CPU.  ...  Our accelerator, an instance of the Adres (architecture for dynamically reconfigurable embedded systems) architecture template [10], aims to combine the advantages of all the aforementioned approaches  ...  His compiler research has focused on whole-program optimization, program compaction, binary rewriting, and code generation techniques for reconfigurable architectures.  ... 
doi:10.1109/mm.2008.49 fatcat:j3lcc5uscrfjfegerkmgctinf4

A Dynamic Modulo Scheduling with Binary Translation: Loop optimization with software compatibility

Ricardo Ferreira, Waldir Denver, Monica Pereira, Stephan Wong, Carlos A. Lisbôa, Luigi Carro
2015 Journal of Signal Processing Systems  
In the past years, many works have demonstrated the applicability of Coarse-Grained Reconfigurable Array (CGRA) accelerators to optimize loops by using software pipelining approaches.  ...  In this work, we present a novel run-time translation technique for the modulo scheduling approach that can convert binary code on-the-fly to run on a CGRA.  ...  [4] proposes an approach that combines offline partitioning and mapping with online reconfiguration to accelerate loops in a reconfigurable coprocessor.  ... 
doi:10.1007/s11265-015-0974-8 fatcat:zpk2rzhw5zdorcyalkjkqwmere

A run-time modulo scheduling by using a binary translation mechanism

Ricardo Ferreira, Waldir Denver, Monica Pereira, Jorge Quadros, Luigi Carro, Stephan Wong
2014 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV)  
Moreover, a comparison to the state-of-the-art static compiler-based approaches for inner loop accelerators has been done by using CGRA and VLIW as target architectures.  ...  As proof of concept of scaling, a change in the memory bandwidth has been evaluated (from one memory access per cycle to two memory accesses per cycle).  ...  Normalized ILP for 1 or 2 memory access per clock cycle marks have been used to validate our approach. Table I presents the instruction composition of the detected loops from the binary.  ... 
doi:10.1109/samos.2014.6893197 dblp:conf/samos/FerreiraDPQCW14 fatcat:b6u7duf44ba6xm7dwmt3bjl4i4

GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs [article]

Markus Vogt, Gerald Hempel, Jeronimo Castrillon, Christian Hochberger
2015 arXiv   pre-print
In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular.  ...  These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for.  ...  Furthermore, we define an architecture-specific penalty for memory accesses. Along with these heuristics, we are able to estimate the speedup of the accelerator in question.  ... 
arXiv:1509.00025v2 fatcat:a6oygvn5enc7viygzizjmdmntm

Partitioning and Vectorizing Binary Applications for a Reconfigurable Vector Computer [chapter]

Tobias Kenter, Gavin Vaz, Christian Plessl
2014 Lecture Notes in Computer Science  
In order to leverage the use of reconfigurable architectures in general-purpose computing, quick and automated methods to find suitable accelerator designs are required.  ...  Where applicable, we leverage outer-loop vectorization. We evaluate our tools with a set of characteristic loops, systematically analyzing different dependency and data layout properties.  ...  A distinctive feature of the HC-1 architecture is the availability of a fast multi channel memory interface, which provides the application engines with access to 8 independent memory banks through 8 dedicated  ... 
doi:10.1007/978-3-319-05960-0_13 fatcat:33t272pe7rakrlmfltbypifhwq

Performance estimation framework for automated exploration of CPU-accelerator architectures

Tobias Kenter, Christian Plessl, Marco Platzner, Michael Kauschke
2011 Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '11  
In this paper we present a fast and fully automated approach for studying the design space when interfacing reconfigurable accelerators with a CPU.  ...  Our challenge is that a reasonable evaluation of architecture parameters requires a hardware/software partitioning that makes best use of each given architecture configuration.  ...  Acknowledgment This work is supported by Intel Corporation through a grant for the project "A multimode reconfigurable processing unit (MM-RPU)".  ... 
doi:10.1145/1950413.1950448 dblp:conf/fpga/KenterPPK11 fatcat:uapzuc4i7vbuzptg3bn6wfjxqi

Trace-Based Reconfigurable Acceleration with Data Cache and External Memory Support

Nuno Miguel Cardanha Paulino, Joao Canas Ferreira, Joao Manuel Paiva Cardoso
2014 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications  
This paper presents a binary acceleration approach based on extending a General Purpose Processor (GPP) with a Reconfigurable Processing Unit (RPU), both sharing an external data memory.  ...  A prototype implementation of the architecture on a Spartan-6 FPGA was validated with 12 benchmarks and achieved an overall geometric mean speedup of 1.91x.  ...  Our previous work presented a binary acceleration approach in which the execution of frequently executed loops is transparently migrated at run-time to a Reconfigurable Processing Unit (RPU), a tailored  ... 
doi:10.1109/ispa.2014.29 dblp:conf/ispa/PaulinoFC14 fatcat:u3mcgsqss5cf5lzno7pzbustga