
Partitioned schedules for clustered VLIW architectures

M.M. Fernandes, J. Llosa, N. Topham
1998 Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing  
This paper presents results on a new approach to partitioning a modulo-scheduled loop for distributed execution on parallel clusters of functional units organized as a VLIW machine.  ...  A partitioning algorithm has been implemented to perform some experiments with the clustered architecture model, an organization widely accepted as being essential for very-wide-issue machines.  ...  Clustered VLIW Architecture: A wide-issue VLIW machine organized in a single cluster would require a highly multiported register file, which needs a disproportionate chip area.  ... 
doi:10.1109/ipps.1998.669945 dblp:conf/ipps/FernandesLT98 fatcat:afy6wpmcmrahzncrkgwtenwqlq
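
The general idea of partitioning a loop's operations across clusters can be illustrated with a minimal Python sketch. It is not the algorithm from this paper: it is a hypothetical greedy heuristic (the names `greedy_partition` and `cut_edges` and the per-cluster `capacity` limit are illustrative assumptions). Each operation of a data-dependence graph is placed on the cluster that already holds most of its predecessors, so fewer values have to cross between clusters. Ties go to the less loaded cluster.

```python
from collections import defaultdict


def greedy_partition(ops, edges, n_clusters, capacity):
    """Hypothetical greedy cluster assignment for a data-dependence graph.

    ops must be in topological order; edges are (producer, consumer) pairs.
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    assignment = {}
    load = [0] * n_clusters
    for op in ops:
        best, best_key = None, None
        for c in range(n_clusters):
            if load[c] >= capacity:
                continue  # this cluster has no free slots left
            # Prefer clusters holding many predecessors, then lighter load.
            local = sum(1 for p in preds[op] if assignment.get(p) == c)
            key = (local, -load[c])
            if best_key is None or key > best_key:
                best, best_key = c, key
        if best is None:
            raise ValueError("not enough cluster capacity")
        assignment[op] = best
        load[best] += 1
    return assignment


def cut_edges(edges, assignment):
    """Number of dependences that need an inter-cluster transfer."""
    return sum(1 for s, d in edges if assignment[s] != assignment[d])
```

For two independent chains `a -> b` and `c -> d` on two clusters of capacity 2, the sketch keeps each chain on its own cluster, so no edge is cut. A production partitioner would also weigh the edges by latency and by their effect on the initiation interval.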

Compiler driven architecture design space exploration for DSP workloads: A study in software programmability versus hardware acceleration

Michael C. Brogioli, Joseph R. Cavallaro
2009 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers  
The paper uses a research compiler for architectural design space exploration to present comparisons between compiler-generated, scalable, software-programmable DSP architectures and hardware acceleration  ...  It shows that scaled-up, compiler-generated, software-programmable DSP architectures can be attractive alternatives to non-programmable hardware acceleration.  ...  In these experiments, a version of the Unified Assign and Schedule algorithm developed by Ozer et al. for performing instruction scheduling and partitioning on a two-way clustered VLIW architecture is  ... 
doi:10.1109/acssc.2009.5470122 fatcat:6hiqu5mt6jeqvgoxqux56mneqy

Distributed modulo scheduling

M.M. Fernandes, J. Llosa, N. Topham
1999 Proceedings Fifth International Symposium on High-Performance Computer Architecture  
Organizations composed of clusters of a few functional units and small private register files have been proposed to deal with this problem, an approach highly dependent on scheduling and partitioning strategies  ...  Experimental results have shown that the algorithm is effective for configurations of up to 8 clusters, or even more when targeting vectorizable loops.  ...  A two-phase approach to partitioning and modulo scheduling for a clustered architecture is proposed in [6].  ... 
doi:10.1109/hpca.1999.744349 dblp:conf/hpca/FernandesLT99 fatcat:n7ughiyq4vewbks52777hlhwaa
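
In modulo scheduling, the initiation interval (II) is bounded below by the larger of a resource bound (ResMII) and a recurrence bound (RecMII). On a clustered machine, the resource bound is set by the most heavily loaded cluster, which is one reason partitioning quality matters. The Python sketch below computes these standard bounds. It assumes identical clusters, ignores any extra operations needed for inter-cluster copies, and uses hypothetical function names.

```python
import math


def res_mii(ops_per_cluster, units_per_cluster):
    """Resource bound on II: the busiest cluster needs this many cycles."""
    return max(math.ceil(n / units_per_cluster) for n in ops_per_cluster)


def rec_mii(cycles):
    """Recurrence bound on II.

    Each dependence cycle is given as (total latency, total iteration distance).
    """
    return max((math.ceil(lat / dist) for lat, dist in cycles), default=1)


def mii(ops_per_cluster, units_per_cluster, cycles):
    """Minimum initiation interval: the larger of the two bounds."""
    return max(res_mii(ops_per_cluster, units_per_cluster), rec_mii(cycles))
```

With 12 operations on 4 clusters of 2 units each, a balanced split `[3, 3, 3, 3]` gives ResMII = 2. A skewed split `[6, 2, 2, 2]` raises it to 3, even though the total work is unchanged.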

Partitioned register files for VLIWs

Andrea Capitanio, Nikil Dutt, Alexandru Nicolau
1992 ACM SIGMICRO Newsletter  
We analyze a Limited Connectivity VLIW architecture as a realizable alternative that limits the number of ports. We present a fine-grain code partitioning method for this model.  ...  An ideal VLIW architecture requires a large multiported register file that is currently not realizable in practice.  ...  The Partitioning Methodology: Compiling code for an LC-VLIW architecture is not a trivial task.  ... 
doi:10.1145/144965.145839 fatcat:rin4hvcqx5evpeegwerh2fy5py
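
The cost of a single multiported register file can be made concrete with some back-of-the-envelope arithmetic. The Python sketch below rests on stated assumptions, not on figures from this paper:
- each functional unit needs 2 read ports and 1 write port;
- registers are split evenly across banks;
- per-register area grows roughly with the square of the port count, a common first-order approximation.

The function names are hypothetical.

```python
def ports_per_bank(total_units, clusters, reads_per_unit=2, writes_per_unit=1):
    """Ports each register bank needs when units are split evenly among clusters."""
    if total_units % clusters:
        raise ValueError("units must divide evenly among clusters")
    return (total_units // clusters) * (reads_per_unit + writes_per_unit)


def relative_regfile_area(total_units, clusters):
    """Total banked register-file area relative to one unified file.

    Assumes area ~ registers * ports**2. With R registers split over k banks,
    the banked total is k * (R / k) * banked**2 = R * banked**2, so the ratio
    reduces to (banked / unified)**2.
    """
    unified = ports_per_bank(total_units, 1)
    banked = ports_per_bank(total_units, clusters)
    return (banked / unified) ** 2
```

Under these assumptions, an 8-unit machine needs 24 ports on a unified file but only 6 per bank with 4 clusters, for roughly 1/16 of the storage area. That estimate ignores the inter-cluster paths that a limited-connectivity design must add back.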

Enabling compiler flow for embedded VLIW DSP processors with distributed register files

Chung-Kai Chen, Ling-Hua Tseng, Shih-Chang Chen, Young-Jia Lin, Yi-Ping You, Chia-Han Lu, Jenq-Kuen Lee
2007 Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools - LCTES '07  
This presents new challenges for devising compiler optimization schemes for such architectures.  ...  For reducing power and cost in designs of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports in register  ...  It includes the work on partitioning register files to work with instruction scheduling [1], loop partitions for clustered register files [2], and global register allocations for cluster register files  ... 
doi:10.1145/1254766.1254793 dblp:conf/lctrts/ChenTCLYLL07 fatcat:plonxxe56fdznjhdrludvnahda

Enabling compiler flow for embedded VLIW DSP processors with distributed register files

Chung-Kai Chen, Ling-Hua Tseng, Shih-Chang Chen, Young-Jia Lin, Yi-Ping You, Chia-Han Lu, Jenq-Kuen Lee
2007 SIGPLAN Notices  
This presents new challenges for devising compiler optimization schemes for such architectures.  ...  For reducing power and cost in designs of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports in register  ...  It includes the work on partitioning register files to work with instruction scheduling [1], loop partitions for clustered register files [2], and global register allocations for cluster register files  ... 
doi:10.1145/1273444.1254793 fatcat:xfvpaws5azehhi2ywm375nkwfm

Loop fusion for clustered VLIW architectures

Yi Qian, Steve Carr, Philip Sweany
2002 Proceedings of the joint conference on Languages, compilers and tools for embedded systems software and compilers for embedded systems - LCTES/SCOPES '02  
In this paper, we examine the effects of loop fusion on DSP loops run on four simulated, clustered VLIW architectures and the Texas Instruments TMS320C64x.  ...  The port requirement for a register bank can be reduced via hardware by partitioning the register bank into multiple banks connected to disjoint subsets of functional units, called clusters.  ...  The authors would like to thank Texas Instruments for providing the benchmark suite used in this experiment and Dr. John Linn for his help in obtaining the benchmarks.  ... 
doi:10.1145/513829.513850 dblp:conf/lctrts/QianCS02 fatcat:zq7s7vkowzevphsmod7r6if7ym
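
Loop fusion merges adjacent loops with the same iteration space into one loop. A larger loop body gives the scheduler more independent operations to spread across clusters. The toy Python example below shows a legal fusion. It is not taken from the benchmark suite, and the function names are illustrative. Fusion is safe here because `d[i]` reads only `c[i]`, which the same iteration of the fused loop has just produced.

```python
def unfused(a, b):
    """Two separate loops over the same index range."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    d = [0] * n
    for i in range(n):
        d[i] = c[i] * 2
    return c, d


def fused(a, b):
    """The same computation after fusion: one loop, two statements per iteration."""
    n = len(a)
    c = [0] * n
    d = [0] * n
    for i in range(n):
        c[i] = a[i] + b[i]  # produced in iteration i ...
        d[i] = c[i] * 2     # ... and consumed in the same iteration
    return c, d
```

Both versions compute the same result. If the second loop instead read `c[i + 1]`, it would consume a value the fused loop has not yet produced, and fusion would be illegal without further transformation.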

Loop fusion for clustered VLIW architectures

Yi Qian, Steve Carr, Philip Sweany
2002 SIGPLAN Notices  
In this paper, we examine the effects of loop fusion on DSP loops run on four simulated, clustered VLIW architectures and the Texas Instruments TMS320C64x.  ...  The port requirement for a register bank can be reduced via hardware by partitioning the register bank into multiple banks connected to disjoint subsets of functional units, called clusters.  ...  The authors would like to thank Texas Instruments for providing the benchmark suite used in this experiment and Dr. John Linn for his help in obtaining the benchmarks.  ... 
doi:10.1145/566225.513850 fatcat:evuktpmyvjgfhan5rnzibqghqa

Loop fusion for clustered VLIW architectures

Yi Qian, Steve Carr, Philip Sweany
2002 Proceedings of the joint conference on Languages, compilers and tools for embedded systems software and compilers for embedded systems - LCTES/SCOPES '02  
In this paper, we examine the effects of loop fusion on DSP loops run on four simulated, clustered VLIW architectures and the Texas Instruments TMS320C64x.  ...  The port requirement for a register bank can be reduced via hardware by partitioning the register bank into multiple banks connected to disjoint subsets of functional units, called clusters.  ...  The authors would like to thank Texas Instruments for providing the benchmark suite used in this experiment and Dr. John Linn for his help in obtaining the benchmarks.  ... 
doi:10.1145/513848.513850 fatcat:kzo5zvnfrne5njsdor4nppu4qu

Compiler-assisted power optimization for clustered VLIW architectures

Rahul Nagpal, Y.N. Srikant
2011 Parallel Computing  
Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file  ...  In this paper, we propose compiler scheduling algorithms targeting two previously ignored power-hungry components in clustered VLIW architectures, namely, the instruction decoder and the register file.  ...  We also propose a scheduling algorithm in the context of VLIW and clustered VLIW architectures.  ... 
doi:10.1016/j.parco.2010.08.005 fatcat:h3gydwofqzfqxct7talb3ujjuu

Integrated temporal and spatial scheduling for extended operand clustered VLIW processors

Rahul Nagpal, Y. N. Srikant
2004 Proceedings of the first conference on computing frontiers on Computing frontiers - CF'04  
This paper proposes an integrated spatial and temporal scheduling algorithm for extended operand clustered VLIW processors and evaluates its effectiveness in improving the run time performance of the code  ...  Scheduling for clustered processors involves spatial concerns (where to schedule) as well as temporal concerns (when to schedule).  ...  Previous Work: Recently, there have been several proposals for scheduling on clustered VLIW architectures.  ... 
doi:10.1145/977091.977155 dblp:conf/cf/NagpalS04 fatcat:vb77pehtzrfcta325m2ffelie4
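
The distinction between spatial concerns (where) and temporal concerns (when) can be seen in a small unified list scheduler that picks the cycle and the cluster together. The Python sketch below is a generic illustration, not the algorithm from this paper. It assumes the following:
- an operand produced on another cluster arrives `move_latency` cycles late but does not consume a functional-unit slot, loosely in the spirit of operands read across clusters;
- each cluster has `units_per_cluster` identical units.

All names are hypothetical.

```python
def schedule(ops, deps, latency, n_clusters, units_per_cluster, move_latency=1):
    """Greedy unified assign-and-schedule.

    ops must be in topological order; deps maps an op to its predecessors.
    Returns op -> (cycle, cluster).
    """
    placed = {}  # op -> (cycle, cluster)
    busy = {}    # (cycle, cluster) -> functional units already used
    for op in ops:
        best = None
        for c in range(n_clusters):
            # Temporal: earliest cycle at which all operands are usable on cluster c,
            # charging move_latency for operands produced on another cluster.
            ready = 0
            for p in deps.get(op, []):
                p_cycle, p_cluster = placed[p]
                arrival = p_cycle + latency[p] + (move_latency if p_cluster != c else 0)
                ready = max(ready, arrival)
            # Resources: slide forward until cluster c has a free unit.
            cycle = ready
            while busy.get((cycle, c), 0) >= units_per_cluster:
                cycle += 1
            # Spatial: keep the cluster that lets the op start earliest.
            if best is None or cycle < best[0]:
                best = (cycle, c)
        placed[op] = best
        busy[best] = busy.get(best, 0) + 1
    return placed
```

Take `c = a + b` with unit latencies on two single-unit clusters. The scheduler issues `a` and `b` in parallel on different clusters, but `c` then waits an extra cycle for the cross-cluster operand and starts at cycle 2. On one cluster with two units, `c` starts at cycle 1. This is exactly the kind of trade-off an integrated scheduler has to weigh.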

Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures

Haijing Tang, Xu Yang, Siye Wang, Yanjun Zhang
2013 The Scientific World Journal  
In this paper, we present compiler optimization techniques for an RFCC VLIW architecture called Lily, which is designed for encryption systems.  ...  penalty caused by explicit inter-cluster data move operations in the traditional bus-connected clustered (BCC) VLIW architecture.  ...  [12] have presented a graph-partitioning-based instruction scheduling for clustered architecture.  ... 
doi:10.1155/2013/913038 pmid:23970841 pmcid:PMC3732635 fatcat:y2erhzf76jgk3dzt7qunz6jnb4

A distributed control path architecture for VLIW processors

Hongtao Zhong, K. Fan, S. Mahlke, M. Schlansker
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
In this paper, we propose a distributed control path architecture for VLIW processors (DVLIW) to overcome the scalability problem of VLIW control paths.  ...  VLIW architectures are popular in embedded systems because they offer high-performance processing at low cost and energy.  ...  We also thank the anonymous referees for their excellent suggestions and feedback.  ... 
doi:10.1109/pact.2005.5 dblp:conf/IEEEpact/ZhongFMS05 fatcat:j6lo66yhp5dufhcpnot4s4fziu

Distributed Data Cache Designs for Clustered VLIW Processors

E. Gibert, J. Sanchez, A. Gonzalez
2005 IEEE Transactions on Computers  
In this paper, we propose partitioning the L1 data cache among clusters for clustered VLIW processors. We refer to this kind of design as fully distributed processors.  ...  For each alternative, instruction scheduling techniques targeted to cyclic code are developed.  ...  Flexible Compiler-Managed L0 Buffers for a Clustered VLIW Processor: Architecture. The last distributed cache configuration we have proposed for clustered VLIW processors consists of a slow centralized  ... 
doi:10.1109/tc.2005.163 fatcat:wdjpa6fgefczpct3plb7yth7hu
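
When the L1 data cache is split among clusters, the compiler benefits from knowing which cluster's module holds a given address, so it can place memory operations near their data. The Python sketch below assumes a simple word-interleaved mapping (4-byte granularity by default). That mapping is an illustrative assumption, not necessarily the one evaluated in this paper, and the function names are hypothetical.

```python
def home_cluster(addr, n_clusters, interleave_bytes=4):
    """Cluster whose cache module holds the byte at addr (interleaved mapping)."""
    return (addr // interleave_bytes) % n_clusters


def is_local(addr, issuing_cluster, n_clusters, interleave_bytes=4):
    """True if a memory op issued on issuing_cluster hits its own module."""
    return home_cluster(addr, n_clusters, interleave_bytes) == issuing_cluster


def stride_homes(base, stride, count, n_clusters, interleave_bytes=4):
    """Set of home clusters touched by a strided access stream, e.g. a loop load."""
    return {
        home_cluster(base + i * stride, n_clusters, interleave_bytes)
        for i in range(count)
    }
```

This matters most for cyclic code. With 4 clusters and 4-byte interleaving, a load striding by 16 bytes from address 0 always maps to cluster 0, so scheduling it there makes every access local. A 4-byte stride visits all four modules in turn, so any fixed placement is local for only a quarter of the accesses.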

UCIFF: Unified Cluster Assignment Instruction Scheduling and Fast Frequency Selection for Heterogeneous Clustered VLIW Cores [chapter]

Vasileios Porpodas, Marcelo Cintra
2013 Lecture Notes in Computer Science  
In this paper we propose UCIFF, a new scheduling algorithm for heterogeneous clustered VLIW processors with software DVFS control, that performs cluster assignment, instruction scheduling and fast frequency  ...  Clustered VLIW processors are scalable wide-issue statically scheduled processors.  ...  Since the target architecture is a statically scheduled clustered VLIW one, it is the job of the scheduler to find the best frequency for each cluster so that the desired metric (Energy, EDP, ED², Delay  ... 
doi:10.1007/978-3-642-37658-0_9 fatcat:rxoojqpeobhpdpaz47b2cf6uty
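
Choosing a frequency for each cluster to optimize a metric such as energy, EDP, or delay can be illustrated by brute force over a deliberately simplified model. The Python sketch below is not UCIFF: it treats the per-cluster cycle counts as fixed, whereas UCIFF integrates frequency selection with cluster assignment and scheduling. It also assumes the following:
- delay is set by the slowest cluster;
- voltage scales linearly with frequency, so dynamic energy per cycle scales with f².

The function name and metric labels are hypothetical.

```python
from itertools import product


def select_frequencies(cycles, freqs, metric="edp"):
    """Brute-force one frequency per cluster under a toy energy/delay model.

    cycles[i] is the work (in cycles) on cluster i; freqs lists allowed levels.
    """
    best, best_key = None, None
    for choice in product(freqs, repeat=len(cycles)):
        delay = max(n / f for n, f in zip(cycles, choice))
        energy = sum(n * f ** 2 for n, f in zip(cycles, choice))  # assumes V ~ f
        if metric == "edp":
            key = (energy * delay, delay)
        elif metric == "delay":
            key = (delay, energy)  # break delay ties by energy
        elif metric == "energy":
            key = (energy, delay)
        else:
            raise ValueError(f"unknown metric: {metric}")
        if best_key is None or key < best_key:
            best, best_key = choice, key
    return best
```

With work `[100, 50]` and frequency levels `[1.0, 0.5]`, minimizing delay keeps the busy cluster at full speed and halves the other. The lightly loaded cluster then finishes at the same time, so energy drops at no cost in delay. Minimizing EDP instead slows both clusters. Brute force grows as (levels)^(clusters), so it only suits small configurations like this one.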
Showing results 1–15 of 914.