Implementing and Optimizing of Entire System Toolkit of VLIW DSP Processors for Embedded Sensor-Based Systems

Xu Yang, Mingbin Zeng, Yanjun Zhang
2015 Scientific Programming  
However, the exploiting of VLIW DSPs in sensor-based domain has imposed a heavy challenge on software toolkit design.  ...  In this paper, we present our methods and experiences to develop system toolkit flows for a VLIW DSP, which is designed dedicated to sensor-based systems.  ...  cluster files and multibank register architectures.  ... 
doi:10.1155/2015/507896 fatcat:75hbku2xwfbczefag77atw5spy

Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures

Haijing Tang, Xu Yang, Siye Wang, Yanjun Zhang
2013 The Scientific World Journal  
Register-file-connected clustered (RFCC) VLIW architecture uses the mechanism of global register file to accomplish the inter-cluster data communications, thus eliminating the performance and energy consumption  ...  Clustering has become a common trend in very long instruction words (VLIW) architecture to solve the problem of area, energy consumption, and design complexity.  ...  In a clustered VLIW architecture, the FUs and register files are divided into several smaller groups. Each group is called a cluster.  ... 
doi:10.1155/2013/913038 pmid:23970841 pmcid:PMC3732635 fatcat:y2erhzf76jgk3dzt7qunz6jnb4

Modeling wire delay, area, power, and performance in a simulation infrastructure

N. P. Carter, A. Hussain
2006 IBM Journal of Research and Development  
To illustrate the capabilities of Justice, we simulate a number of VLIW processors and analyze the tradeoffs between power, performance, and wire length in these architectures.  ...  It then modifies the architectural specification by adding delay elements on communication paths whose delay is one or more clock cycles.  ...  The register files of the clustered architectures are noticeably smaller than those of the non-clustered architectures, as two 16-entry register files are smaller than one 32-entry register file with twice  ... 
doi:10.1147/rd.502.0311 fatcat:vjfuudrwtzcgtcwszdn5l3xq3u

SLAP: A Split Latency Adaptive VLIW pipeline architecture which enables on-the-fly variable SIMD vector-length [article]

Ashish Shrivastava and Alan Gatherer and Tong Sun and Sushma Wokhlu and Alex Chandra
2021 arXiv   pre-print
The VLIW architecture, the de facto signal processing engine, suffers badly from a breakdown in lockstep execution of scalar and vector instructions.  ...  We describe the Split Latency Adaptive Pipeline (SLAP) VLIW architecture, a cache performance improvement technology that requires zero change to object code, while removing smart DMAs and their overhead  ...  SLAP based VLIW Architecture tor instruction flow).  ... 
arXiv:2102.13301v1 fatcat:j6wol5bddbhwddwhac5rll54rq

A low cost split-issue technique to improve performance of SMT clustered VLIW processors

Manoj Gupta, Fermin Sanchez, Josep Llosa
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
This paper proposes cluster-level split-issue, which implements split-issue at a cluster-level boundary for clustered VLIW processors.  ...  Split-issue at operation-level is a technique that allows issuing a VLIW instruction in parts without breaking execution semantics.  ...  BASE ARCHITECTURE The evaluation done in this paper is based on the VEX clustered architecture [14] modeled upon the commercial HP/ST ST200 [2], [1] VLIW family.  ... 
doi:10.1109/ipdps.2010.5470351 dblp:conf/ipps/GuptaSL10 fatcat:pzc4vspkz5cizdvmf3qy5s2qwi

Scalable vector processors for embedded systems

C.E. Kozyrakis, D.A. Patterson
2003 IEEE Micro  
A separate intercluster network moves vectors between functional units when needed. The local register file block within each cluster provides operands to one data path and one network interface.  ...  A superscalar processor, on the other hand, must issue one instruction per functional unit per cycle. The clustered organization in Figure 3b associates a small instruction queue with each cluster.  ...  Kozyrakis has a PhD in computer science from the University of California at Berkeley, where he was the architect of the VIRAM processor. He is a member of the IEEE and the ACM.  ... 
doi:10.1109/mm.2003.1261385 fatcat:arrxeb4uk5ek3ohjheugjmxyji

Optimizing coarse-grain reconfigurable hardware utilization through multiprocessing: an H.264/AVC decoder example

Andreas Kanstein, Sebastian López Suárez, Bjorn De Sutter, Valentín de Armas Sosa, Kamran Eshraghian, Félix B. Tobajas
2007 VLSI Circuits and Systems III  
We introduce a multi-processing extension to the coarse-grained reconfigurable architecture ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) to deal with this kind of applications,  ...  This paper discusses the architecture and an exploration into how to potentially partition a given array for executing an H.264/AVC baseline decoder.  ...  Furthermore, the VLIW scheduler now also supports clustered register files, which is a requirement for efficiently utilizing the global data register files when partitioning the ADRES.  ... 
doi:10.1117/12.722077 fatcat:kg5dvj7p7zerlmckn5uacazsli

A distributed control path architecture for VLIW processors

Hongtao Zhong, K. Fan, S. Mahlke, M. Schlansker
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
In this paper, we propose a distributed control path architecture for VLIW processors (DVLIW) to overcome the scalability problem of VLIW control paths.  ...  DVLIW employs a multicluster design where each cluster contains a local instruction memory that provides all intra-cluster control.  ...  A single CMPP operation is issued on a cluster and the result is stored in the predicate register file.  ... 
doi:10.1109/pact.2005.5 dblp:conf/IEEEpact/ZhongFMS05 fatcat:j6lo66yhp5dufhcpnot4s4fziu

Application Specific Cache Simulation Analysis for Application Specific Instructionset Processor

Ravi Khatwal, Manoj Kumar Jain
2014 International Journal of Computer Applications  
An Efficient Simulation of application specific instruction-set processors (ASIP) is a challenging onus in the area of VLSI design.  ...  Each cluster is a collection of register files and a tightly coupled set of functional units. Functional units within a cluster directly access only cluster register files.  ...  SimpleScalar is MIPS based architecture used in design space exploration, perform two level cache simulation. VEX defines a 32-bit clustered VLIW ISA that is scalable and customizable to specific application  ... 
doi:10.5120/15782-4526 fatcat:nei26od2fzc5vewabepqbktnga

Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications

Hongtao Zhong, Steven A. Lieberman, Scott A. Mahlke
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
In coupled mode, the cores execute multiple instruction streams in lock-step to collectively function as a wide-issue VLIW.  ...  To attack this mismatch, this paper proposes a multicore architecture, referred to as Voltron, that extends traditional multicore systems in two ways.  ...  Much gratitude goes to the anonymous referees who provided helpful feedback on this work.  ... 
doi:10.1109/hpca.2007.346182 dblp:conf/hpca/ZhongLM07 fatcat:sauqiioqtvfaro65x6xyffqr6m

A Register File Architecture and Compilation Scheme for Clustered ILP Processors [chapter]

Krishnan Kailas, Manoj Franklin, Kemal Ebcioğlu
2002 Lecture Notes in Computer Science  
We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster  ...  Our scheme makes use of a small Caching Register Buffer (CRB) attached to the traditional partitioned local register file, which is used to store copies of remote registers.  ...  [20] proposed a partitioned register file scheme in which individually addressable registers are replaced by queues.  ... 
doi:10.1007/3-540-45706-2_68 fatcat:cw37lzv3svfotfwme3cpyawupy

A Run-Time Task Migration Scheme for an Adjustable Issue-Slots Multi-core Processor [chapter]

Fakhar Anjam, Quan Kong, Roel Seedorf, Stephan Wong
2012 Lecture Notes in Computer Science  
In this paper, we present a run-time task migration scheme for an adjustable/reconfigurable issue-slots very long instruction word (VLIW) multi-core processor.  ...  With a task migration scheme, a code running on a core can be shifted to a larger or a smaller issue-width core for increasing the performance or reducing the power consumption of the whole system, respectively  ...  Acknowledgment This work is supported by the European Commission in the context of the ERA (Embedded Reconfigurable Architectures) collaborative project #249059 (FP7).  ... 
doi:10.1007/978-3-642-28365-9_9 fatcat:fig2crumcrdqvg7iuxwtyxpkye

Cluster-level simultaneous multithreading for VLIW processors

Manoj Gupta, Fermin Sanchez
2007 2007 25th International Conference on Computer Design  
For instance, with 4 threads CSMT shows an average speedup of 113% over a single-thread VLIW architecture and 36% over Interleaved MultiThreading (IMT).  ...  All bundles belonging to a VLIW instruction from a given thread are issued simultaneously.  ...  Clustered VLIW architectures tackle this problem by introducing more than one register file and clustering the FUs according to the register files they are connected to.  ... 
doi:10.1109/iccd.2007.4601890 dblp:conf/iccd/GuptaSL07 fatcat:5agwdhsovngmjohzlhiixxfs5u

The heterogeneous block architecture

Chris Fallin, Chris Wilkerson, Onur Mutlu
2014 2014 IEEE 32nd International Conference on Computer Design (ICCD)  
Based on these observations, we propose a fine-grained heterogeneous core design, called the heterogeneous block architecture (HBA), that combines heterogeneous execution backends into one core.  ...  Our extensive evaluations compare this example HBA design to multiple baseline core designs (including monolithic out-of-order, clustered out-of-order, in-order and a state-of-the-art heterogeneous core  ...  Local execution cluster: Both the out-of-order and VLIW/in-order execution backends in our design are built around a local execution cluster that contains simple ALUs, a local register file, and a bypass  ... 
doi:10.1109/iccd.2014.6974710 dblp:conf/iccd/FallinWM14 fatcat:4smrwymodvfgrkmpvucziqn524

Overcoming the limitations of conventional vector processors

Christos Kozyrakis, David Patterson
2003 Proceedings of the 30th annual international symposium on Computer architecture - ISCA '03  
It is designed around a clustered vector register file and uses a separate network for operand transfers across functional units.  ...  A renaming table makes the clustered register file transparent at the instruction set level. Renaming also enables precise exceptions for vector instructions at a performance loss of less than 5%.  ...  We also thank Suzanne Rivoire, Steve Scott, and the anonymous referees for their insightful comments on earlier versions of this paper.  ... 
doi:10.1145/859618.859664 fatcat:v6oxqqbqabf2xj5wi7olotxwte
Showing results 1–15 out of 446 results