Design Space Exploration for a Custom VLIW Architecture

M. K. Jain, Veena Ramnani
2013 International Journal of Computer Applications  
The objective of this research is to develop a retargetable compiler that can generate efficient code in terms of code size, cycle count and retargetability efforts for a VLIW processor.  ...  Hardware/Software co-design methodology has made it possible to find an optimal architecture for a given application by exploring the design space before building a real hardware prototype.  ...  It consists of parallel executions units, parallel memory pipelines, a large visible register set and an efficient branch architecture.  ... 
doi:10.5120/9951-4598 fatcat:n672wx3e55bi3k2xt4jmsrl5yy

A framework for Compiler Level statistical analysis over customized VLIW architecture

Amir Hossein Ashouri, Vittorio Zaccaria, Sotirios Xydis, Gianluca Palermo, Cristina Silvano
2013 2013 IFIP/IEEE 21st International Conference on Very Large Scale Integration (VLSI-SoC)  
The proposed methodology provides the designer with an integrated framework to automatically (i) generate optimized application-specific VLIW architectural configurations and (ii) analyze compiler level  ...  transformations, enabling application-specific compiler tuning over customized VLIW system architectures.  ...  From a high level point of view, we first generate a set of promising VLIW architectural candidates that tailors to the characteristics of the target application, optimizing on the performance-intensity  ... 
doi:10.1109/vlsi-soc.2013.6673262 dblp:conf/vlsi/AshouriZXPS13 fatcat:ts5qanwymndwhetynb2bwodsga

Exploiting Statically Identified ILP for Network Processor Applications

Byeong Kil Lee
2010 International Journal of Computer and Electrical Engineering  
Network processors with various parallel architectures are appearing in the market, however, a thorough investigation of the implications of static versus dynamic scheduling of this class of emerging workloads  ...  With the large parallelism and the loop nature of network applications, our experimental analysis supports static scheduling as an appropriate strategy for network processor applications.  ...  For a more intuitive comparison, we use a normalized speedup to total execution cycles of the unscheduled VLIW.  ... 
doi:10.7763/ijcee.2010.v2.236 fatcat:ksy2nvipxnbenbrs4xh63a65mu

Mapping of nomadic multimedia applications on the ADRES reconfigurable array processor

Mladen Berekovic, Andreas Kanstein, Bingfeng Mei, Bjorn De Sutter
2009 Microprocessors and microsystems  
ADRES supports a VLIW-like programming model with a pure VLIW mode for legacy code, and a (coarse-grain reconfigurable) array mode with very high parallelism for the processing of compute intensive loops  ...  An XML-based architecture description language allows a designer to easily generate different processor instances with full compiler support by specifying different values for the communication topology  ...  Acknowledgements This research has been performed in the context of IMEC's M4 Research Program, which is partly funded by Samsung and Freescale Semiconductors.  ... 
doi:10.1016/j.micpro.2009.02.008 fatcat:vltv2oj5snay5losjb4cbngemu

Mapping Control-Intensive Video Kernels onto a Coarse-Grain Reconfigurable Architecture: the H.264/AVC Deblocking Filter

C. Arbelo, A. Kanstein, S. Lopez, J. F. Lopez, M. Berekovic, R. Sarmiento, J.-Y. Mignolet
2007 2007 Design, Automation & Test in Europe Conference & Exhibition  
In this sense, the mapping of this decoder's functionality onto a C-programmable coarse-grained reconfigurable architecture named ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) is  ...  compared with an implementation on a Very Long Instruction Word (VLIW) dedicated processor.  ...  ACKNOWLEDGMENT The authors wish to thank IMEC for supporting the student internships which made this work possible, and for providing access to the tools.  ... 
doi:10.1109/date.2007.364587 dblp:conf/date/ArbeloKLLBSM07 fatcat:mfapuuuw65gazgr7cjdogcd7gu

XDSPCORE: a compiler-based configurable digital signal processor

A. Krall, I. Pryanishnikov, U. Hirnschrott, C. Panis
2004 IEEE Micro  
The second problem that hardware specialization exposes is a lack of compilation algorithms for such architectures, and the computational hardness of known algorithms.  ...  These constraints, along with a relatively narrow application domain, have led designers to create special architectural features, as found in the Harvard architecture, VLIW (very long instruction word  ...  Removing the dynamic checks can reduce the code size by up to a factor of 4.5. Software pipelining Software pipelining is an important optimization for pipelined VLIW architectures.  ... 
doi:10.1109/mm.2004.40 fatcat:lclck3wx2zd4zkilqn2oanqpdq

Bioinformatics on Embedded Systems: A Case Study of Computational Biology Applications on VLIW Architecture [chapter]

Yue Li, Tao Li
2005 Lecture Notes in Computer Science  
We investigate the basic characteristics of the benchmarks, impact of function units, the efficiency of VLIW execution, cache behavior and the impact of compiler optimizations.  ...  The architectural implications observed from this study can be applied to the design optimizations. To the best of our knowledge, this is one of the first such studies that have ever been attempted.  ...  Predicated execution can eliminate all non-loop backward branches from a program. Figure 10 shows the speedups of program execution due to superblock and hyperblock optimizations.  ... 
doi:10.1007/11599555_5 fatcat:q6btwnsyabbsdivngrglhmnihu

The superblock: An effective technique for VLIW and superscalar compilation

Wen-Mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Ouellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm, Daniel M. Lavery
1993 Journal of Supercomputing  
However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries.  ...  Superblock optimizations and scheduling are shown to be useful while taking into account a variety of architectural features.  ...  Acknowledgments The authors would like to acknowledge all the members of the IMPACT research group for their support.  ... 
doi:10.1007/bf01205185 fatcat:pvcamk2wbbd3vknrkecwp5mpfu

The Superblock: An Effective Technique for VLIW and Superscalar Compilation [chapter]

Wen-Mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Ouellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm, Daniel M. Lavery
1993 Instruction-Level Parallelism  
However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries.  ...  Superblock optimizations and scheduling are shown to be useful while taking into account a variety of architectural features.  ...  Acknowledgments The authors would like to acknowledge all the members of the IMPACT research group for their support.  ... 
doi:10.1007/978-1-4615-3200-2_7 fatcat:rktyy2dkd5dapokuxhkxa77wqe

Hybrid-DBT: Hardware/Software Dynamic Binary Translation Targeting VLIW

Simon Rokicki, Erven Rohou, Steven Derrien
2018 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
In order to provide dynamic adaptation of the performance/energy trade-off, systems today rely on heterogeneous multi-core architectures (different micro-architectures on a chip).  ...  The impact on the total execution time of applications and the quality of generated binaries are also measured.  ...  This trade-off depends on the application being executed: for data-intensive applications, the use of statically scheduled architecture (e.g.  ... 
doi:10.1109/tcad.2018.2864288 fatcat:a2axpoyilrajlmxujmtwwxll7q

A loop accelerator for low power embedded VLIW processors

Binu Mathew, Al Davis
2004 Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '04  
It is particularly significant for embedded systems where memory and power budgets are limited. A distributed address generation and loop acceleration architecture for VLIW processors is presented.  ...  The idea is evaluated in the context of a fine grain VLIW architecture executing complex perception algorithms such as speech and visual feature recognition.  ...  The address generators work in tandem with a VLIW execution unit called the loop unit. All branch and loop related instructions are dispatched to it.  ... 
doi:10.1145/1016720.1016726 dblp:conf/codes/MathewD04 fatcat:fcnyomdcofgdhjvhhrs2chebri

Supporting runtime reconfigurable VLIWs cores through dynamic binary translation

Simon Rokicki, Erven Rohou, Steven Derrien
2018 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)  
In this work, we propose to enrich these architectures with runtime configurable VLIW cores, which are very efficient at compute-intensive kernels.  ...  They however do not really exploit the characteristics of workloads (compute-intensive vs. control dominated).  ...  VLIW cores help processing compute-intensive workloads for a significantly lower energy budget than for their OoO counterparts.  ... 
doi:10.23919/date.2018.8342160 dblp:conf/date/RokickiRD18 fatcat:g5hf657vx5b3xb736kmwnf4fi4

Still Image Processing on Coarse-Grained Reconfigurable Array Architectures

Matthias Hartmann, Vasileios (Vassilis) Pantazis, Tom Vander Aa, Mladen Berekovic, Christian Hochberger
2008 Journal of Signal Processing Systems  
We investigate the mapping of two image processing algorithms, Wavelet encoding and decoding, and TIFF compression on this novel type of array architectures in a systematic way.  ...  on a state-of-the-art commercial DSP platform, the c64x DSP from Texas Instruments.  ...  Introduction A new branch of programmable processor architectures for demanding DSP applications has emerged in the recent years, such as image processing or video coding: coarse-grained reconfigurable  ... 
doi:10.1007/s11265-008-0309-0 fatcat:3wpnpyuojbagho777csh7gz4bi

Still Image Processing on Coarse-Grained Reconfigurable Array Architectures

Matthias Hartmann, Vassilis Pantazis, Tom Vander Aa, Mladen Berekovic, Christian Hochberger, Bjorn de Sutter
2007 2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia  
We investigate the mapping of two image processing algorithms, Wavelet encoding and decoding, and TIFF compression on this novel type of array architectures in a systematic way.  ...  on a state-of-the-art commercial DSP platform, the c64x DSP from Texas Instruments.  ...  Introduction A new branch of programmable processor architectures for demanding DSP applications has emerged in the recent years, such as image processing or video coding: coarse-grained reconfigurable  ... 
doi:10.1109/estmed.2007.4375805 dblp:conf/estimedia/HartmannPABHS07 fatcat:zebtyha64zdczg2vkvb7b4ytru

A Dynamic Modulo Scheduling with Binary Translation: Loop optimization with software compatibility

Ricardo Ferreira, Waldir Denver, Monica Pereira, Stephan Wong, Carlos A. Lisbôa, Luigi Carro
2015 Journal of Signal Processing Systems  
Our results demonstrate that the greedy run-time algorithm can reach a near-optimal ILP rate, better than an off-line compiler approach for a 16-issue VLIW processor.  ...  In the past years, many works have demonstrated the applicability of Coarse-Grained Reconfigurable Array (CGRA) accelerators to optimize loops by using software pipelining approaches.  ...  Such (computationally intensive) applications are usually characterized by intensive loops and common use of arrays.  ... 
doi:10.1007/s11265-015-0974-8 fatcat:zpk2rzhw5zdorcyalkjkqwmere