Design Space Exploration for a Custom VLIW Architecture
2013
International Journal of Computer Applications
The objective of this research is to develop a retargetable compiler that can generate efficient code in terms of code size, cycle count and retargetability efforts for a VLIW processor. ...
Hardware/Software co-design methodology has made it possible to find an optimal architecture for a given application by exploring the design space before building a real hardware prototype. ...
It consists of parallel execution units, parallel memory pipelines, a large visible register set and an efficient branch architecture. ...
doi:10.5120/9951-4598
fatcat:n672wx3e55bi3k2xt4jmsrl5yy
A framework for Compiler Level statistical analysis over customized VLIW architecture
2013
2013 IFIP/IEEE 21st International Conference on Very Large Scale Integration (VLSI-SoC)
The proposed methodology provides the designer with an integrated framework to automatically (i) generate optimized application-specific VLIW architectural configurations and (ii) analyze compiler level ...
transformations, enabling application-specific compiler tuning over customized VLIW system architectures. ...
From a high level point of view, we first generate a set of promising VLIW architectural candidates that tailors to the characteristics of the target application, optimizing on the performance-intensity ...
doi:10.1109/vlsi-soc.2013.6673262
dblp:conf/vlsi/AshouriZXPS13
fatcat:ts5qanwymndwhetynb2bwodsga
Exploiting Statically Identified ILP for Network Processor Applications
2010
International Journal of Computer and Electrical Engineering
Network processors with various parallel architectures are appearing in the market; however, a thorough investigation of the implications of static versus dynamic scheduling of this class of emerging workloads ...
With the large parallelism and the loop nature of network applications, our experimental analysis supports static scheduling as an appropriate strategy for network processor applications. ...
For a more intuitive comparison, we use a normalized speedup to total execution cycles of the unscheduled VLIW. ...
doi:10.7763/ijcee.2010.v2.236
fatcat:ksy2nvipxnbenbrs4xh63a65mu
Mapping of nomadic multimedia applications on the ADRES reconfigurable array processor
2009
Microprocessors and microsystems
ADRES supports a VLIW-like programming model with a pure VLIW mode for legacy code, and a (coarse-grain reconfigurable) array mode with very high parallelism for the processing of compute intensive loops ...
An XML-based architecture description language allows a designer to easily generate different processor instances with full compiler support by specifying different values for the communication topology ...
Acknowledgements This research has been performed in the context of IMEC's M4 Research Program, which is partly funded by Samsung and Freescale Semiconductors. ...
doi:10.1016/j.micpro.2009.02.008
fatcat:vltv2oj5snay5losjb4cbngemu
Mapping Control-Intensive Video Kernels onto a Coarse-Grain Reconfigurable Architecture: the H.264/AVC Deblocking Filter
2007
2007 Design, Automation & Test in Europe Conference & Exhibition
In this sense, the mapping of this decoder's functionality onto a C-programmable coarse-grained reconfigurable architecture named ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) is ...
compared with an implementation on a Very Long Instruction Word (VLIW) dedicated processor. ...
ACKNOWLEDGMENT The authors wish to thank IMEC for supporting the student internships which made this work possible, and for providing access to the tools. ...
doi:10.1109/date.2007.364587
dblp:conf/date/ArbeloKLLBSM07
fatcat:mfapuuuw65gazgr7cjdogcd7gu
XDSPCORE: a compiler-based configurable digital signal processor
2004
IEEE Micro
The second problem that hardware specialization exposes is a lack of compilation algorithms for such architectures, and the computational hardness of known algorithms. ...
These constraints, along with a relatively narrow application domain, have led designers to create special architectural features, as found in the Harvard architecture, VLIW (very long instruction word ...
Removing the dynamic checks can reduce the code size by up to a factor of 4.5.
Software pipelining. Software pipelining is an important optimization for pipelined VLIW architectures. ...
doi:10.1109/mm.2004.40
fatcat:lclck3wx2zd4zkilqn2oanqpdq
Bioinformatics on Embedded Systems: A Case Study of Computational Biology Applications on VLIW Architecture
[chapter]
2005
Lecture Notes in Computer Science
We investigate the basic characteristics of the benchmarks, impact of function units, the efficiency of VLIW execution, cache behavior and the impact of compiler optimizations. ...
The architectural implications observed from this study can be applied to the design optimizations. To the best of our knowledge, this is one of the first such studies that have ever been attempted. ...
Predicated execution can eliminate all non-loop backward branches from a program. Figure 10 shows the speedups of program execution due to superblock and hyperblock optimizations. ...
doi:10.1007/11599555_5
fatcat:q6btwnsyabbsdivngrglhmnihu
The superblock: An effective technique for VLIW and superscalar compilation
1993
Journal of Supercomputing
However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries. ...
Superblock optimizations and scheduling are shown to be useful while taking into account a variety of architectural features. ...
Acknowledgments The authors would like to acknowledge all the members of the IMPACT research group for their support. ...
doi:10.1007/bf01205185
fatcat:pvcamk2wbbd3vknrkecwp5mpfu
The Superblock: An Effective Technique for VLIW and Superscalar Compilation
[chapter]
1993
Instruction-Level Parallelism
However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries. ...
Superblock optimizations and scheduling are shown to be useful while taking into account a variety of architectural features. ...
Acknowledgments The authors would like to acknowledge all the members of the IMPACT research group for their support. ...
doi:10.1007/978-1-4615-3200-2_7
fatcat:rktyy2dkd5dapokuxhkxa77wqe
Hybrid-DBT: Hardware/Software Dynamic Binary Translation Targeting VLIW
2018
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
In order to provide dynamic adaptation of the performance/energy trade-off, systems today rely on heterogeneous multi-core architectures (different micro-architectures on a chip). ...
The impact on the total execution time of applications and the quality of generated binaries are also measured. ...
This trade-off depends on the application being executed: for data-intensive applications, the use of statically scheduled architecture (e.g. ...
doi:10.1109/tcad.2018.2864288
fatcat:a2axpoyilrajlmxujmtwwxll7q
A loop accelerator for low power embedded VLIW processors
2004
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '04
It is particularly significant for embedded systems where memory and power budgets are limited. A distributed address generation and loop acceleration architecture for VLIW processors is presented. ...
The idea is evaluated in the context of a fine grain VLIW architecture executing complex perception algorithms such as speech and visual feature recognition. ...
The address generators work in tandem with a VLIW execution unit called the loop unit. All branch and loop related instructions are dispatched to it. ...
doi:10.1145/1016720.1016726
dblp:conf/codes/MathewD04
fatcat:fcnyomdcofgdhjvhhrs2chebri
Supporting runtime reconfigurable VLIWs cores through dynamic binary translation
2018
2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)
In this work, we propose to enrich these architectures with runtime configurable VLIW cores, which are very efficient at compute-intensive kernels. ...
They however do not really exploit the characteristics of workloads (compute-intensive vs. control dominated). ...
VLIW cores help processing compute-intensive workloads for a significantly lower energy budget than for their OoO counterparts. ...
doi:10.23919/date.2018.8342160
dblp:conf/date/RokickiRD18
fatcat:g5hf657vx5b3xb736kmwnf4fi4
Still Image Processing on Coarse-Grained Reconfigurable Array Architectures
2008
Journal of Signal Processing Systems
We investigate the mapping of two image processing algorithms, Wavelet encoding and decoding, and TIFF compression on this novel type of array architectures in a systematic way. ...
on a state-of-the-art commercial DSP platform, the c64x DSP from Texas Instruments. ...
Introduction A new branch of programmable processor architectures for demanding DSP applications has emerged in the recent years, such as image processing or video coding: coarse-grained reconfigurable ...
doi:10.1007/s11265-008-0309-0
fatcat:3wpnpyuojbagho777csh7gz4bi
Still Image Processing on Coarse-Grained Reconfigurable Array Architectures
2007
2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia
We investigate the mapping of two image processing algorithms, Wavelet encoding and decoding, and TIFF compression on this novel type of array architectures in a systematic way. ...
on a state-of-the-art commercial DSP platform, the c64x DSP from Texas Instruments. ...
Introduction A new branch of programmable processor architectures for demanding DSP applications has emerged in the recent years, such as image processing or video coding: coarse-grained reconfigurable ...
doi:10.1109/estmed.2007.4375805
dblp:conf/estimedia/HartmannPABHS07
fatcat:zebtyha64zdczg2vkvb7b4ytru
A Dynamic Modulo Scheduling with Binary Translation: Loop optimization with software compatibility
2015
Journal of Signal Processing Systems
Our results demonstrate that the greedy run-time algorithm can reach a near-optimal ILP rate, better than an off-line compiler approach for a 16-issue VLIW processor. ...
In the past years, many works have demonstrated the applicability of Coarse-Grained Reconfigurable Array (CGRA) accelerators to optimize loops by using software pipelining approaches. ...
Such (computationally intensive) applications are usually characterized by intensive loops and common use of arrays. ...
doi:10.1007/s11265-015-0974-8
fatcat:zpk2rzhw5zdorcyalkjkqwmere