Implementing and Optimizing of Entire System Toolkit of VLIW DSP Processors for Embedded Sensor-Based Systems
2015
Scientific Programming
However, the exploiting of VLIW DSPs in sensor-based domain has imposed a heavy challenge on software toolkit design. ...
In this paper, we present our methods and experiences to develop system toolkit flows for a VLIW DSP, which is designed dedicated to sensor-based systems. ...
cluster files and multibank register architectures. ...
doi:10.1155/2015/507896
fatcat:75hbku2xwfbczefag77atw5spy
Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures
2013
The Scientific World Journal
Register-file-connected clustered (RFCC) VLIW architecture uses the mechanism of global register file to accomplish the inter-cluster data communications, thus eliminating the performance and energy consumption ...
Clustering has become a common trend in very long instruction words (VLIW) architecture to solve the problem of area, energy consumption, and design complexity. ...
In a clustered VLIW architecture, the FUs and register files are divided into several smaller groups. Each group is called a cluster. ...
doi:10.1155/2013/913038
pmid:23970841
pmcid:PMC3732635
fatcat:y2erhzf76jgk3dzt7qunz6jnb4
Modeling wire delay, area, power, and performance in a simulation infrastructure
2006
IBM Journal of Research and Development
To illustrate the capabilities of Justice, we simulate a number of VLIW processors and analyze the tradeoffs between power, performance, and wire length in these architectures. ...
It then modifies the architectural specification by adding delay elements on communication paths whose delay is one or more clock cycles. ...
The register files of the clustered architectures are noticeably smaller than those of the non-clustered architectures, as two 16-entry register files are smaller than one 32-entry register file with twice ...
doi:10.1147/rd.502.0311
fatcat:vjfuudrwtzcgtcwszdn5l3xq3u
SLAP: A Split Latency Adaptive VLIW pipeline architecture which enables on-the-fly variable SIMD vector-length
[article]
2021
arXiv
pre-print
The VLIW architecture, the de facto signal processing engine, suffers badly from a breakdown in lockstep execution of scalar and vector instructions. ...
We describe the Split Latency Adaptive Pipeline (SLAP) VLIW architecture, a cache performance improvement technology that requires zero change to object code, while removing smart DMAs and their overhead ...
SLAP based VLIW Architecture tor instruction flow). ...
arXiv:2102.13301v1
fatcat:j6wol5bddbhwddwhac5rll54rq
A low cost split-issue technique to improve performance of SMT clustered VLIW processors
2010
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
This paper proposes cluster-level split-issue, which implements split-issue at a cluster-level boundary for clustered VLIW processors. ...
Split-issue at operation-level is a technique that allows issuing a VLIW instruction in parts without breaking execution semantics. ...
BASE ARCHITECTURE The evaluation done in this paper is based on the VEX clustered architecture [14] modeled upon the commercial HP/ST ST200 [2], [1] VLIW family. ...
doi:10.1109/ipdps.2010.5470351
dblp:conf/ipps/GuptaSL10
fatcat:pzc4vspkz5cizdvmf3qy5s2qwi
Scalable vector processors for embedded systems
2003
IEEE Micro
A separate intercluster network moves vectors between functional units when needed. The local register file block within each cluster provides operands to one data path and one network interface. ...
A superscalar processor, on the other hand, must issue one instruction per functional unit per cycle. The clustered organization in Figure 3b associates a small instruction queue with each cluster. ...
Kozyrakis has a PhD in computer science from the University of California at Berkeley, where he was the architect of the VIRAM processor. He is a member of the IEEE and the ACM. ...
doi:10.1109/mm.2003.1261385
fatcat:arrxeb4uk5ek3ohjheugjmxyji
Optimizing coarse-grain reconfigurable hardware utilization through multiprocessing: an H.264/AVC decoder example
2007
VLSI Circuits and Systems III
We introduce a multi-processing extension to the coarse-grained reconfigurable architecture ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) to deal with this kind of applications, ...
This paper discusses the architecture and an exploration into how to potentially partition a given array for executing an H.264/AVC baseline decoder. ...
Furthermore, the VLIW scheduler now also supports clustered register files, which is a requirement for efficiently utilizing the global data register files when partitioning the ADRES. ...
doi:10.1117/12.722077
fatcat:kg5dvj7p7zerlmckn5uacazsli
A distributed control path architecture for VLIW processors
2005
14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)
In this paper, we propose a distributed control path architecture for VLIW processors (DVLIW) to overcome the scalability problem of VLIW control paths. ...
DVLIW employs a multicluster design where each cluster contains a local instruction memory that provides all intra-cluster control. ...
A single CMPP operation is issued on a cluster and the result is stored in the predicate register file. ...
doi:10.1109/pact.2005.5
dblp:conf/IEEEpact/ZhongFMS05
fatcat:j6lo66yhp5dufhcpnot4s4fziu
Application Specific Cache Simulation Analysis for Application Specific Instructionset Processor
2014
International Journal of Computer Applications
An Efficient Simulation of application specific instruction-set processors (ASIP) is a challenging onus in the area of VLSI design. ...
Each cluster is a collection of register files and a tightly coupled set of functional units. Functional units within a cluster directly access only cluster register files. ...
SimpleScalar is a MIPS-based architecture used in design space exploration to perform two-level cache simulation. VEX defines a 32-bit clustered VLIW ISA that is scalable and customizable to specific application ...
doi:10.5120/15782-4526
fatcat:nei26od2fzc5vewabepqbktnga
Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications
2007
2007 IEEE 13th International Symposium on High Performance Computer Architecture
In coupled mode, the cores execute multiple instruction streams in lock-step to collectively function as a wide-issue VLIW. ...
To attack this mismatch, this paper proposes a multicore architecture, referred to as Voltron, that extends traditional multicore systems in two ways. ...
Much gratitude goes to the anonymous referees who provided helpful feedback on this work. ...
doi:10.1109/hpca.2007.346182
dblp:conf/hpca/ZhongLM07
fatcat:sauqiioqtvfaro65x6xyffqr6m
A Register File Architecture and Compilation Scheme for Clustered ILP Processors
[chapter]
2002
Lecture Notes in Computer Science
We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster ...
Our scheme makes use of a small Caching Register Buffer (CRB) attached to the traditional partitioned local register file, which is used to store copies of remote registers. ...
[20] proposed a partitioned register file scheme in which individually addressable registers are replaced by queues. ...
doi:10.1007/3-540-45706-2_68
fatcat:cw37lzv3svfotfwme3cpyawupy
A Run-Time Task Migration Scheme for an Adjustable Issue-Slots Multi-core Processor
[chapter]
2012
Lecture Notes in Computer Science
In this paper, we present a run-time task migration scheme for an adjustable/reconfigurable issue-slots very long instruction word (VLIW) multi-core processor. ...
With a task migration scheme, a code running on a core can be shifted to a larger or a smaller issue-width core for increasing the performance or reducing the power consumption of the whole system, respectively ...
Acknowledgment This work is supported by the European Commission in the context of the ERA (Embedded Reconfigurable Architectures) collaborative project #249059 (FP7). ...
doi:10.1007/978-3-642-28365-9_9
fatcat:fig2crumcrdqvg7iuxwtyxpkye
Cluster-level simultaneous multithreading for VLIW processors
2007
2007 25th International Conference on Computer Design
For instance, with 4 threads CSMT shows an average speedup of 113% over a single-thread VLIW architecture and 36% over Interleaved MultiThreading (IMT). ...
All bundles belonging to a VLIW instruction from a given thread are issued simultaneously. ...
Clustered VLIW architectures tackle this problem by introducing more than one register file and clustering the FUs according to the register files they are connected to. ...
doi:10.1109/iccd.2007.4601890
dblp:conf/iccd/GuptaSL07
fatcat:5agwdhsovngmjohzlhiixxfs5u
The heterogeneous block architecture
2014
2014 IEEE 32nd International Conference on Computer Design (ICCD)
Based on these observations, we propose a fine-grained heterogeneous core design, called the heterogeneous block architecture (HBA), that combines heterogeneous execution backends into one core. ...
Our extensive evaluations compare this example HBA design to multiple baseline core designs (including monolithic out-of-order, clustered out-of-order, in-order and a state-of-the-art heterogeneous core ...
Local execution cluster: Both the out-of-order and VLIW/in-order execution backends in our design are built around a local execution cluster that contains simple ALUs, a local register file, and a bypass ...
doi:10.1109/iccd.2014.6974710
dblp:conf/iccd/FallinWM14
fatcat:4smrwymodvfgrkmpvucziqn524
Overcoming the limitations of conventional vector processors
2003
Proceedings of the 30th annual international symposium on Computer architecture - ISCA '03
It is designed around a clustered vector register file and uses a separate network for operand transfers across functional units. ...
A renaming table makes the clustered register file transparent at the instruction set level. Renaming also enables precise exceptions for vector instructions at a performance loss of less than 5%. ...
We also thank Suzanne Rivoire, Steve Scott, and the anonymous referees for their insightful comments on earlier versions of this paper. ...
doi:10.1145/859618.859664
fatcat:v6oxqqbqabf2xj5wi7olotxwte
Showing results 1 — 15 out of 446 results