172 Hits in 2.7 sec

Scalar operand networks

M.D. Taylor, W. Lee, S.P. Amarasinghe, A. Agarwal
2005 IEEE Transactions on Parallel and Distributed Systems  
We call interconnects optimized for scalar data transport, whether centralized or distributed, scalar operand networks. … The paper also presents a 5-tuple performance model for SONs and analyzes their performance sensitivity to network properties for ILP workloads. … Mapping ILP to architectures with distributed scalar operand networks is not as straightforward as with early, centralized architectures.
doi:10.1109/tpds.2005.24 fatcat:tmy2ry64yjd7dl65r4jnck7hki
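The 5-tuple performance model mentioned in the snippet above characterizes operand transport cost by send occupancy, send latency, per-hop network latency, receive latency, and receive occupancy. A minimal sketch of one plausible reading of that model (the function name, parameter values, and hop term are illustrative assumptions, not code from the paper):

```python
def operand_transport_cost(so, sl, nhl, rl, ro, hops):
    """End-to-end cycles to move one operand between two ALUs,
    summing the five tuple components plus a distance-dependent
    network term (hops * per-hop latency). Illustrative only."""
    return so + sl + hops * nhl + rl + ro

# Hypothetical mesh parameters: zero-occupancy send/receive,
# 1-cycle send/receive latency, 1 cycle per hop, 3 hops.
print(operand_transport_cost(so=0, sl=1, nhl=1, rl=1, ro=0, hops=3))  # 5
```

The model's point is that occupancy and latency terms at the endpoints can dominate even when per-hop latency is small, which is why the papers compare centralized and distributed SONs on all five components rather than hop count alone.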

Scalar operand networks: on-chip interconnect for ILP in partitioned architectures

M. Bedford Taylor, W. Lee, S. Amarasinghe, A. Agarwal
The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.  
Specifically, a scalar operand network is the interconnection network in microprocessors that provides the means for operand transport between ALUs and register files. … We call interconnects optimized for scalar data transport, whether centralized or distributed, Scalar Operand Networks. … Once the processor resources are partially or fully distributed, and connected via a scalar operand network, the mapping of ILP can take many forms.
doi:10.1109/hpca.2003.1183551 dblp:conf/hpca/TaylorLAA03 fatcat:bnhqjbn6nfagddtqlkc2gr6wyy

Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications

Hongtao Zhong, Steven A. Lieberman, Scott A. Mahlke
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
First, it provides a dual-mode scalar operand network to enable efficient inter-core communication and lightweight synchronization. … Current chip multiprocessors increase throughput by utilizing multiple cores to perform computation in parallel.
doi:10.1109/hpca.2007.346182 dblp:conf/hpca/ZhongLM07 fatcat:sauqiioqtvfaro65x6xyffqr6m

A PRAM-NUMA Model of Computation for Addressing Low-TLP Workloads

Martti Forsell
2011 International Journal of Networking and Computing  
It is possible to implement the parallel random access machine (PRAM) on a chip multiprocessor (CMP) efficiently with an emulated shared memory (ESM) architecture to gain easy parallel programmability … In this paper we show that integrating non-uniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem and provide a natural way for migration of the legacy code … Recently, applying these ideas to architectures designed especially for PRAM implementation on a CMP with the help of network-on-chip technology [3] has led to two very promising research lines: …
doi:10.15803/ijnc.1.1_21 fatcat:kl3qyj2airbcvbuq42fsdarqde

A 16-issue multiple-program-counter microprocessor with point-to-point scalar operand network

M.B. Taylor, J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B. Greenwald, H. Hoffman, P. Johnson, W. Lee, A. Saraf, N. Shnidman, V. Strumpen (+2 others)
2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC.  
The Raw microprocessor was implemented to explore architectural solutions to scalability problems in scalar operand networks [2,3]. … Registers 24..27 are mapped to the four physical networks on the chip.
doi:10.1109/isscc.2003.1234253 fatcat:ocoo3qewcbhntcbmvk2qohsp5u

A New Compiler for Space-Time Scheduling of ILP Processors

Rajendra Kumar, P. K. Singh
2011 International Journal of Computer and Electrical Engineering  
In this paper, we propose a compiler, RPCC, for general-purpose sequential programs on the Raw machine. … Code generation for the parallel register-share architecture involves issues that are not present in sequential code compilation and is inherently complex. … One of the critical goals in code optimization for Multiprocessor-System-on-a-Chip (MPSoC) architectures is to minimize the number of off-chip memory accesses.
doi:10.7763/ijcee.2011.v3.375 fatcat:zkukpxggjvbv5nhacioojp6uma

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor

Karthikeyan Sankaralingam, Ramadass Nagarajan, Robert McDonald, Rajagopalan Desikan, Saurabh Drolia, M.S. Govindan, Paul Gratz, Divya Gulati, Heather Hanson, Changkyu Kim, Haiming Liu, Nitya Ranganathan (+5 others)
2006 Microarchitecture (MICRO), Proceedings of the Annual International Symposium on  
Growing on-chip wire delays will cause many future microarchitectures to be distributed, in which hardware resources within a single processor become nodes on one or more switched micronetworks. … and a distributed 1MB non-uniform (NUCA) on-chip memory system.
doi:10.1109/micro.2006.19 dblp:conf/micro/SankaralingamNMDDGGGHKLRSSSKB06 fatcat:jiw42btzfbaujpk4efkrrhxhxu

Space-time scheduling of instruction-level parallelism on a raw machine

Walter Lee, Rajeev Barua, Matthew Frank, Devabhaktuni Srikrishna, Jonathan Babb, Vivek Sarkar, Saman Amarasinghe
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
In this paper, we describe the compilation techniques used to exploit ILP on the Raw machine, a NURA machine composed of fully replicated processing units connected via a mostly static programmable network … The compiler handles the orchestration by performing spatial and temporal instruction scheduling, as well as data partitioning using a distributed on-chip memory model. … Figure 2 shows the organization of ports for the static network on a single tile.
doi:10.1145/291069.291018 dblp:conf/asplos/LeeBFSBSA98 fatcat:bwsn74jmxbd6baz3y7hwi4p7cy

TRIPS

Karthikeyan Sankaralingam, Charles R. Moore, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Nitya Ranganathan, Doug Burger, Stephen W. Keckler, Robert G. McDonald
2004 ACM Transactions on Architecture and Code Optimization (TACO)  
This EDGE ISA is coupled with hardware mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction, data, or thread-level … Our results show that high performance can be obtained in each of the three modes (ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors. … The back sides of the L1 caches are connected to secondary memory tiles through the two-dimensional, switched on-chip interconnection network (OCN).
doi:10.1145/980152.980156 fatcat:wf63r7g6hnewhfuwjbd46apzna

DataScalar architectures

Doug Burger, Stefanos Kaxiras, James R. Goodman
1997 Proceedings of the 24th annual international symposium on Computer architecture - ISCA '97  
In this execution model, each processor broadcasts operands it loads from its local memory to all other units. … Our intuition and results show that DataScalar architectures work best with codes for which traditional parallelization techniques fail.
doi:10.1145/264107.264215 dblp:conf/isca/BurgerKG97 fatcat:uqpa6bqnoneopjamcnstnmp3fi

DataScalar architectures

Doug Burger, Stefanos Kaxiras, James R. Goodman
1997 SIGARCH Computer Architecture News  
In this execution model, each processor broadcasts operands it loads from its local memory to all other units. … Our intuition and results show that DataScalar architectures work best with codes for which traditional parallelization techniques fail.
doi:10.1145/384286.264215 fatcat:5fuljppb7ncajjaclnc2an6rue

Synergistic Processing in Cell's Multicore Architecture

M. Gschwind, H.P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, T. Yamazaki
2006 IEEE Micro  
… On-Chip Interconnects: Figure 2 is a die photo of the Cell BE.
doi:10.1109/mm.2006.41 fatcat:tt5nh6bppzdnxh6rhwfdcq7gle

Tradeoff between data-, instruction-, and thread-level parallelism in stream processors

Jung Ho Ahn, Mattan Erez, William J. Dally
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
We evaluate the specific effects on performance of scaling along the different parallelism dimensions and explain the limitations of the ILP, DLP, and TLP hardware mechanisms. … We develop detailed VLSI-cost and processor-performance models for a multi-threaded Stream Processor and evaluate the tradeoffs, in both functionality and hardware costs, of mechanisms that exploit the … on-chip structures, and the LRFs that are highly partitioned and tightly connected to the ALUs in order to support their bandwidth demands.
doi:10.1145/1274971.1274991 dblp:conf/ics/AhnED07 fatcat:ip56nttx25hjvdb4yj6hfrzl7i

The "MIND" scalable PIM architecture [chapter]

Thomas Sterling, Maciej Brodowicz
2005 Advances in Parallel Computing  
MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. … It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die.
doi:10.1016/s0927-5452(05)80010-3 fatcat:m7w6wqjxjrg3zowstdubiknvne

Mat-core: A matrix core extension for general-purpose processors

Mostafa I. Soliman
2007 2007 International Conference on Computer Engineering & Systems  
Mat-Core extends a general-purpose scalar processor (for executing scalar instructions) with a matrix unit (for executing vector/matrix instructions). … This paper proposes a new processor architecture to exploit the increasing number of transistors per integrated circuit and improve the performance of many applications on general-purpose processors. … Superscalar architectures have used the growing chip resources to dynamically extract and dispatch more independent scalar instructions in the same clock cycle.
doi:10.1109/icces.2007.4447064 fatcat:d54cn2hyjfaqhbqaeh47nnay5q
Showing results 1 – 15 of 172