Inter-cluster communication models for clustered VLIW processors

A. Terechko, E. Le Thenaff, M. Garg, J. van Eijndhoven, H. Corporaal
The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.  
Clustering is a well-known technique to improve the implementation of single register file VLIW processors.  ...  This paper, however, identifies and evaluates five different inter-cluster communication models, including copy operations, dedicated issue slots, extended operands, extended results, and broadcasting.  ...  Inter-cluster communication models The inter-cluster data transports have to satisfy constraints of the implementation of a clustered VLIW.  ... 
doi:10.1109/hpca.2003.1183552 dblp:conf/hpca/TerechkoTGEC03 fatcat:bo774g5sjjectcyedj72yd3hc4

An efficient heuristic for instruction scheduling on clustered vliw processors

Xuemeng Zhang, Hui Wu, Jingling Xue
2011 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems - CASES '11  
Clustering is a well-known technique for improving the scalability of classical VLIW processors.  ...  This paper presents a novel phase coupled priority-based heuristic for scheduling a set of instructions in a basic block on a clustered VLIW processor.  ...  Instruction scheduling for clustered VLIW processors becomes more challenging as there are additional inter-cluster communication constraints.  ... 
doi:10.1145/2038698.2038707 dblp:conf/cases/ZhangWX11 fatcat:3zjmwsdgz5aydldtvp43jiijje

Instruction scheduling with k-successor tree for clustered VLIW processors

Xuemeng Zhang, Hui Wu, Jingling Xue
2013 Design automation for embedded systems  
Clustering is a well-known technique for improving the scalability of classical VLIW (Very Long Instruction Word) processors. A clustered VLIW processor consists of multiple clusters.  ...  This paper proposes a novel phase coupled, priority-based heuristic for scheduling a set of operations in a basic block on a clustered VLIW processor.  ...  Instruction scheduling for clustered VLIW processors becomes more challenging, as there are additional inter-cluster communication constraints.  ... 
doi:10.1007/s10617-012-9103-0 fatcat:v55x7phnxrgd7gpyiaozmh6xca

A low cost split-issue technique to improve performance of SMT clustered VLIW processors

Manoj Gupta, Fermin Sanchez, Josep Llosa
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
This paper proposes cluster-level split-issue, which implements split-issue at a cluster-level boundary for clustered VLIW processors.  ...  However, implementing split-issue at operation-level requires complex structures and is not practical for an embedded VLIW processor.  ...  For instance, for workload mmhh, using CCSI results into a performance gain of 7.4% on a 2-Thread CSMT machine for 'No split communication' model.  ... 
doi:10.1109/ipdps.2010.5470351 dblp:conf/ipps/GuptaSL10 fatcat:pzc4vspkz5cizdvmf3qy5s2qwi

A distributed control path architecture for VLIW processors

Hongtao Zhong, K. Fan, S. Mahlke, M. Schlansker
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
In this paper, we propose a distributed control path architecture for VLIW processors (DVLIW) to overcome the scalability problem of VLIW control paths.  ...  The major problem with traditional VLIW designs is that they do not scale efficiently due to bottlenecks that result from centralized resources and global communication.  ...  We also thank the anonymous referees for their excellent suggestions and feedback.  ... 
doi:10.1109/pact.2005.5 dblp:conf/IEEEpact/ZhongFMS05 fatcat:j6lo66yhp5dufhcpnot4s4fziu

Reducing the complexity of instruction-level power models for VLIW processors

A. Bona, M. Sami, D. Sciuto, C. Silvano, V. Zaccaria, R. Zafalon
2005 Design automation for embedded systems  
Globally, the proposed approach reduces the complexity of the characterization problem for a K-issue VLIW processor to quadratic (O(K * |C|^2)) with respect to the number of operation clusters.  ...  The proposed model has been further extended to provide early power figures and energy/performance trade-offs for multi-cluster VLIW architectures composed of multiple data-path units and a single instruction  ...  Clusters have a separate Register File and communicate through an inter-cluster communication mechanism that supports send and receive operations.  ... 
doi:10.1007/s10617-006-9045-5 fatcat:lwk4hdzybzaltdwk747dzrjpci

Evaluation of Bus Based Interconnect Mechanisms in Clustered VLIW Architectures

Anup Gangwar, M. Balakrishnan, Preeti Ranjan Panda, Anshul Kumar
2007 International journal of parallel programming  
Clustered VLIW architectures, with a subset of FUs connected to any RF are the solution to this scalability problem.  ...  Recent studies with a wide variety of inter-cluster interconnection mechanisms have presented substantial gains in performance (number of cycles) over the most studied RF-to-RF type interconnections.  ...  In this paper we extend the previous reported work on performance of bus based inter-cluster interconnects in clustered VLIW processors.  ... 
doi:10.1007/s10766-007-0045-2 fatcat:uvnquxln7jawrj36qnbi7lxkj4

Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures

Haijing Tang, Xu Yang, Siye Wang, Yanjun Zhang
2013 The Scientific World Journal  
Register-file-connected clustered (RFCC) VLIW architecture uses the mechanism of global register file to accomplish the inter-cluster data communications, thus eliminating the performance and energy consumption  ...  penalty caused by explicit inter-cluster data move operations in traditional bus-connected clustered (BCC) VLIW architecture.  ...  of concurrent inter-cluster data communications.  ... 
doi:10.1155/2013/913038 pmid:23970841 pmcid:PMC3732635 fatcat:y2erhzf76jgk3dzt7qunz6jnb4

Cluster-level simultaneous multithreading for VLIW processors

Manoj Gupta, Fermin Sanchez
2007 2007 25th International Conference on Computer Design  
Clustered VLIW embedded processors have become widespread due to benefits of simple hardware and low power.  ...  In this paper, we propose CSMT (Cluster-level Simultaneous MultiThreading) to allow some degree of SMT in clustered VLIW processors with minimal hardware cost and complexity.  ...  The VEX architecture is decoupled from the implementation of the inter-cluster communication networks and, for our evaluations, a fully connected point to point communication network between clusters has  ... 
doi:10.1109/iccd.2007.4601890 dblp:conf/iccd/GuptaSL07 fatcat:5agwdhsovngmjohzlhiixxfs5u

A shared reconfigurable VLIW multiprocessor system

Fakhar Anjam, Stephan Wong, Faisal Nadeem
2010 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)  
The results show that we can achieve two times better performance for our dual-processor system (with shared resources) compared to a uni-processor system or a 2-cluster processor system for applications  ...  In this paper, we present the design and implementation of an open-source reconfigurable very long instruction word (VLIW) multiprocessor system.  ...  This performance penalty is due to the inter-cluster communication.  ... 
doi:10.1109/ipdpsw.2010.5470734 dblp:conf/ipps/AnjamWN10 fatcat:kv7gdsryjfhtve52ntylhep4gm

Compiler-assisted energy optimization for clustered VLIW processors

Rahul Nagpal, Y.N. Srikant
2012 Journal of Parallel and Distributed Computing  
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption.  ...  Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication.  ...  model different clustered VLIW configurations and inter-cluster communication models.  ... 
doi:10.1016/j.jpdc.2012.04.005 fatcat:4tr2gyvznbgyvlbxjjbqkvuv5a

Compiler-assisted power optimization for clustered VLIW architectures

Rahul Nagpal, Y.N. Srikant
2011 Parallel Computing  
However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses.  ...  2-Clustered (4-Clustered) VLIW machine.  ...  The contentions for the limited number of slow inter-cluster communication channels in the context of clustered VLIW architectures introduce many short idle cycles.  ... 
doi:10.1016/j.parco.2010.08.005 fatcat:h3gydwofqzfqxct7talb3ujjuu

A Register File Architecture and Compilation Scheme for Clustered ILP Processors [chapter]

Krishnan Kailas, Manoj Franklin, Kemal Ebcioğlu
2002 Lecture Notes in Computer Science  
We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster  ...  communication operations into a single broadcast operation using a new sendb instruction.  ...  not cached in the CRB and the inter-cluster communication bus is available for broadcasting the register value.  ... 
doi:10.1007/3-540-45706-2_68 fatcat:cw37lzv3svfotfwme3cpyawupy

Tradeoff between data-, instruction-, and thread-level parallelism in stream processors

Jung Ho Ahn, Mattan Erez, William J. Dally
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
We develop detailed VLSI-cost and processor-performance models for a multi-threaded Stream Processor and evaluate the tradeoffs, in both functionality and hardware costs, of mechanisms that exploit the  ...  This paper explores the scalability of the Stream Processor architecture along the instruction-, data-, and thread-level parallelism dimensions.  ...  For StreamMD the reason relates to the effective inter-cluster communication enabled by SIMD.  ... 
doi:10.1145/1274971.1274991 dblp:conf/ics/AhnED07 fatcat:ip56nttx25hjvdb4yj6hfrzl7i

Compiler-assisted leakage energy optimization for clustered VLIW architectures

Rahul Nagpal, Y. N. Srikant
2006 Proceedings of the 6th ACM & IEEE International conference on Embedded software - EMSOFT '06  
This underutilization is even more pronounced in the context of clustered VLIW architectures because of the contentions for the limited number of slow inter-cluster communication channels which lead to  ...  The benefits are 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.  ...  Clustering brings along extra contentions for a limited number of slow cross-paths (for inter-cluster communication).  ... 
doi:10.1145/1176887.1176921 dblp:conf/emsoft/NagpalS06 fatcat:23imitsqzbacfbfvjdr5jpwng4