Partitioned schedules for clustered VLIW architectures
Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing
This paper presents results on a new approach to partitioning a modulo-scheduled loop for distributed execution on parallel clusters of functional units organized as a VLIW machine. ...
A partitioning algorithm has been implemented to perform some experiments with the clustered architecture model, an organization widely accepted as being essential for very wide issue machines. ...
Clustered VLIW Architecture: A wide-issue VLIW machine organized in a single cluster would require a highly multiported register file, which needs a disproportionate chip area. ...
doi:10.1109/ipps.1998.669945
dblp:conf/ipps/FernandesLT98
fatcat:afy6wpmcmrahzncrkgwtenwqlq
Compiler driven architecture design space exploration for DSP workloads: A study in software programmability versus hardware acceleration
2009
2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers
The paper uses a research compiler for architectural design space exploration to present comparisons between compiler generated scalable software programmable DSP architectures versus hardware acceleration ...
It shows that scaled up compiler generated software programmable DSP architectures can be attractive alternatives to non-programmable hardware acceleration. ...
In these experiments, a version of the Unified Assign and Schedule algorithm developed by Ozer et al. for performing instruction scheduling and partitioning on a two-way clustered VLIW architecture is ...
doi:10.1109/acssc.2009.5470122
fatcat:6hiqu5mt6jeqvgoxqux56mneqy
Distributed modulo scheduling
1999
Proceedings Fifth International Symposium on High-Performance Computer Architecture
Organizations composed of clusters of a few functional units and small private register files have been proposed to deal with this problem, an approach highly dependent on scheduling and partitioning strategies ...
Experimental results have shown the algorithm is effective for configurations up to 8 clusters, or even more when targeting vectorizable loops. ...
A two-phase approach to partitioning and modulo scheduling for a clustered architecture is proposed in [6]. ...
doi:10.1109/hpca.1999.744349
dblp:conf/hpca/FernandesLT99
fatcat:n7ughiyq4vewbks52777hlhwaa
Partitioned register files for VLIWs
1992
ACM SIGMICRO Newsletter
We analyze a Limited Connectivity VLIW architecture as a realizable alternative that limits the number of ports. We present a fine-grain code partitioning method for this model. ...
An ideal VLIW architecture requires a large multiport register file that is currently not realizable in practice. ...
The Partitioning Methodology: Compiling code for an LC-VLIW architecture is not a trivial task. ...
doi:10.1145/144965.145839
fatcat:rin4hvcqx5evpeegwerh2fy5py
Enabling compiler flow for embedded VLIW DSP processors with distributed register files
2007
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools - LCTES '07
This presents new challenges for devising compiler optimization schemes for such architectures. ...
To reduce power and cost in designs of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports in register ...
It includes the work on partitioning register files to work with instruction scheduling [1], loop partitions for clustered register files [2], and global register allocations for cluster register files ...
doi:10.1145/1254766.1254793
dblp:conf/lctrts/ChenTCLYLL07
fatcat:plonxxe56fdznjhdrludvnahda
Enabling compiler flow for embedded VLIW DSP processors with distributed register files
2007
SIGPLAN notices
This presents new challenges for devising compiler optimization schemes for such architectures. ...
To reduce power and cost in designs of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports in register ...
It includes the work on partitioning register files to work with instruction scheduling [1], loop partitions for clustered register files [2], and global register allocations for cluster register files ...
doi:10.1145/1273444.1254793
fatcat:xfvpaws5azehhi2ywm375nkwfm
Loop fusion for clustered VLIW architectures
2002
Proceedings of the joint conference on Languages, compilers and tools for embedded systems software and compilers for embedded systems - LCTES/SCOPES '02
In this paper, we examine the effects of loop fusion on DSP loops run on four simulated, clustered VLIW architectures and the Texas Instruments TMS320C64x. ...
The port requirement for a register bank can be reduced via hardware by partitioning the register bank into multiple banks connected to disjoint subsets of functional units, called clusters. ...
The authors would like to thank Texas Instruments for providing the benchmark suite used in this experiment and Dr. John Linn for his help in obtaining the benchmarks. ...
doi:10.1145/513829.513850
dblp:conf/lctrts/QianCS02
fatcat:zq7s7vkowzevphsmod7r6if7ym
Loop fusion for clustered VLIW architectures
2002
SIGPLAN notices
In this paper, we examine the effects of loop fusion on DSP loops run on four simulated, clustered VLIW architectures and the Texas Instruments TMS320C64x. ...
The port requirement for a register bank can be reduced via hardware by partitioning the register bank into multiple banks connected to disjoint subsets of functional units, called clusters. ...
The authors would like to thank Texas Instruments for providing the benchmark suite used in this experiment and Dr. John Linn for his help in obtaining the benchmarks. ...
doi:10.1145/566225.513850
fatcat:evuktpmyvjgfhan5rnzibqghqa
Loop fusion for clustered VLIW architectures
2002
Proceedings of the joint conference on Languages, compilers and tools for embedded systems software and compilers for embedded systems - LCTES/SCOPES '02
In this paper, we examine the effects of loop fusion on DSP loops run on four simulated, clustered VLIW architectures and the Texas Instruments TMS320C64x. ...
The port requirement for a register bank can be reduced via hardware by partitioning the register bank into multiple banks connected to disjoint subsets of functional units, called clusters. ...
The authors would like to thank Texas Instruments for providing the benchmark suite used in this experiment and Dr. John Linn for his help in obtaining the benchmarks. ...
doi:10.1145/513848.513850
fatcat:kzo5zvnfrne5njsdor4nppu4qu
Compiler-assisted power optimization for clustered VLIW architectures
2011
Parallel Computing
Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file ...
In this paper, we propose compiler scheduling algorithms targeting two previously ignored power-hungry components in clustered VLIW architectures, viz., instruction decoder and register file. ...
We also propose a scheduling algorithm in the context of VLIW and clustered VLIW architectures. ...
doi:10.1016/j.parco.2010.08.005
fatcat:h3gydwofqzfqxct7talb3ujjuu
Integrated temporal and spatial scheduling for extended operand clustered VLIW processors
2004
Proceedings of the first conference on computing frontiers on Computing frontiers - CF'04
This paper proposes an integrated spatial and temporal scheduling algorithm for extended operand clustered VLIW processors and evaluates its effectiveness in improving the run time performance of the code ...
Scheduling for clustered processors involves spatial concerns (where to schedule) as well as temporal concerns (when to schedule). ...
PREVIOUS WORK: Recently there have been several proposals for scheduling on clustered VLIW architectures. ...
doi:10.1145/977091.977155
dblp:conf/cf/NagpalS04
fatcat:vb77pehtzrfcta325m2ffelie4
Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures
2013
The Scientific World Journal
In this paper, we presented compiler optimization techniques for an RFCC VLIW architecture called Lily, which is designed for encryption systems. ...
penalty caused by explicit inter-cluster data move operations in traditional bus-connected clustered (BCC) VLIW architecture. ...
[12] have presented a graph-partitioning-based instruction scheduling for clustered architecture. ...
doi:10.1155/2013/913038
pmid:23970841
pmcid:PMC3732635
fatcat:y2erhzf76jgk3dzt7qunz6jnb4
A distributed control path architecture for VLIW processors
2005
14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)
In this paper, we propose a distributed control path architecture for VLIW processors (DVLIW) to overcome the scalability problem of VLIW control paths. ...
VLIW architectures are popular in embedded systems because they offer high-performance processing at low cost and energy. ...
We also thank the anonymous referees for their excellent suggestions and feedback. ...
doi:10.1109/pact.2005.5
dblp:conf/IEEEpact/ZhongFMS05
fatcat:j6lo66yhp5dufhcpnot4s4fziu
Distributed Data Cache Designs for Clustered VLIW Processors
2005
IEEE transactions on computers
In this paper, we propose partitioning the L1 data cache among clusters for clustered VLIW processors. We refer to this kind of design as fully distributed processors. ...
For each alternative, instruction scheduling techniques targeted to cyclic code are developed. ...
FLEXIBLE COMPILER-MANAGED L0 BUFFERS FOR A CLUSTERED VLIW PROCESSOR. Architecture: The last distributed cache configuration we have proposed for clustered VLIW processors consists of a slow centralized ...
doi:10.1109/tc.2005.163
fatcat:wdjpa6fgefczpct3plb7yth7hu
UCIFF: Unified Cluster Assignment Instruction Scheduling and Fast Frequency Selection for Heterogeneous Clustered VLIW Cores
[chapter]
2013
Lecture Notes in Computer Science
In this paper we propose UCIFF, a new scheduling algorithm for heterogeneous clustered VLIW processors with software DVFS control, that performs cluster assignment, instruction scheduling and fast frequency ...
Clustered VLIW processors are scalable wide-issue statically scheduled processors. ...
Since the target architecture is a statically scheduled clustered VLIW one, it is the job of the scheduler to find the best frequency for each cluster so that the desired metric (Energy, EDP, ED2, Delay ...
doi:10.1007/978-3-642-37658-0_9
fatcat:rxoojqpeobhpdpaz47b2cf6uty