84 Hits in 4.2 sec

Compiler-directed Data Partitioning for Multicluster Processors

M.L. Chu, S.A. Mahlke
International Symposium on Code Generation and Optimization (CGO'06)  
This work proposes a compiler-directed approach to synergistically partition both data objects and computation across multiple clusters.  ...  The distribution of data objects is generally ignored. In this work, we examine explicit partitioning of data objects and its effects on operation partitioning.  ...  GLOBAL DATA PARTITIONING This section introduces our compiler-directed Global Data Partitioning (GDP) approach for jointly partitioning data objects and computation across a multicluster architecture.  ... 
doi:10.1109/cgo.2006.9 dblp:conf/cgo/ChuM06 fatcat:hd7fbatnyre75lckc3vdbygo5u

Cost-sensitive partitioning in an architecture synthesis system for multicluster processors

M.L. Chu, K.C. Fan, R.A. Ravindran, S.A. Mahlke
2004 IEEE Micro  
This article focuses on the latter topic: compiler-directed architecture synthesis. More specifically, we examine compiler-directed synthesis of an ASIP's data path architecture.  ...  Hierarchical multicluster data path synthesis system Figure 1 shows our hierarchical system for multicluster architecture synthesis.  ... 
doi:10.1109/mm.2004.7 fatcat:hi3tnsrh4bdblhsrggff4weibu

Code and data partitioning for fine-grain parallelism

Michael L. Chu, Scott A. Mahlke
2007 Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools - LCTES '07  
This paper focuses on an alternative compiler-directed method for program parallelization by exploiting fine-grain instruction-level parallelism (ILP).  ...  Introduction The recent shift to multicore designs for mainstream processors offers the potential to improve the performance of current applications.  ...  Our profile-guided data access partitioning technique was implemented as part of the Trimaran compiler infrastructure, a retargetable compiler for VLIW/EPIC processors.  ... 
doi:10.1145/1254766.1254798 dblp:conf/lctrts/ChuM07 fatcat:emwrcua3ofavdaa6fgsxtemvlm

Code and data partitioning for fine-grain parallelism

Michael L. Chu, Scott A. Mahlke
2007 SIGPLAN notices  
This paper focuses on an alternative compiler-directed method for program parallelization by exploiting fine-grain instruction-level parallelism (ILP).  ...  Introduction The recent shift to multicore designs for mainstream processors offers the potential to improve the performance of current applications.  ...  Our profile-guided data access partitioning technique was implemented as part of the Trimaran compiler infrastructure, a retargetable compiler for VLIW/EPIC processors.  ... 
doi:10.1145/1273444.1254798 fatcat:ergokj7amnghppgaj36vqfknji

Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures

Michael Chu, Rajiv Ravindran, Scott Mahlke
2007 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007)  
We propose a profile-guided method for partitioning memory accesses across distributed data caches.  ...  Overall, our data partitioning reduces stall cycles by up to 51% versus data-incognizant partitioning, and achieves an average overall speedup of 30% over a single-core processor.  ...  RELATED WORK The topic of compiler partitioning for distributed architectures has been studied significantly in the past, especially in the context of multicluster VLIW processors.  ... 
doi:10.1109/micro.2007.15 dblp:conf/micro/ChuRM07 fatcat:z2rxc2rffra3fjb27p67l6ga2m

Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures

Michael Chu, Rajiv Ravindran, Scott Mahlke
2007 Microarchitecture (MICRO), Proceedings of the Annual International Symposium on  
We propose a profile-guided method for partitioning memory accesses across distributed data caches.  ...  Overall, our data partitioning reduces stall cycles by up to 51% versus data-incognizant partitioning, and achieves an average overall speedup of 30% over a single-core processor.  ...  RELATED WORK The topic of compiler partitioning for distributed architectures has been studied significantly in the past, especially in the context of multicluster VLIW processors.  ... 
doi:10.1109/micro.2007.4408269 fatcat:2dypqbgdajampkgtqmgjzx7hoi

Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications

Hongtao Zhong, Steven A. Lieberman, Scott A. Mahlke
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
This paper describes the Voltron architecture and associated compiler support for orchestrating bi-modal execution.  ...  However, general-purpose applications do not provide many opportunities for identifying such threads, due to frequent use of pointers, recursive data structures, if-then-else branches, small function bodies  ...  Acknowledgments We thank Mike Schlansker for his excellent comments and suggestions for this work. Much gratitude goes to the anonymous referees who provided helpful feedback on this work.  ... 
doi:10.1109/hpca.2007.346182 dblp:conf/hpca/ZhongLM07 fatcat:sauqiioqtvfaro65x6xyffqr6m

Experiments with an ocean circulation model on CEDAR

L. DeRose, K. Gallivan, E. Gallopoulos
1992 Proceedings of the 6th international conference on Supercomputing - ICS '92  
The code was parameterized to offer several choices for data partitionings of the computational domain, for placement strategies for the data in the memory hierarchy, and for the number of clusters and  ...  We present the design of the GFDL ocean circulation model as adapted for simulations of the Mediterranean basin for the Cedar multicluster architecture.  ...  For Cedar, we consider partitionings to be of two types, one primary, taking into account data partitioning across clusters, the other secondary, specifying the partitioning across vector processors in  ... 
doi:10.1145/143369.143440 dblp:conf/ics/DeRoseGG92 fatcat:xejyme4nirf2xbkhudgqoik5ue

A distributed control path architecture for VLIW processors

Hongtao Zhong, K. Fan, S. Mahlke, M. Schlansker
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
In this paper, we propose a distributed control path architecture for VLIW processors (DVLIW) to overcome the scalability problem of VLIW control paths.  ...  DVLIW employs a multicluster design where each cluster contains a local instruction memory that provides all intra-cluster control.  ...  We also thank the anonymous referees for their excellent suggestions and feedback.  ... 
doi:10.1109/pact.2005.5 dblp:conf/IEEEpact/ZhongFMS05 fatcat:j6lo66yhp5dufhcpnot4s4fziu

MPIPP

Hu Chen, Wenguang Chen, Jian Huang, Bob Robert, H. Kuhn
2006 Proceedings of the 20th annual international conference on Supercomputing - ICS '06  
algorithms for multiclusters.  ...  It is desired to have a tool to map parallel processes to processors (or cores) automatically.  ...  INTRODUCTION SMP (Symmetric Multi-Processor) clusters and multiclusters are widely used to execute message-passing parallel applications.  ... 
doi:10.1145/1183401.1183451 dblp:conf/ics/ChenCHRK06 fatcat:y2etu5dounefpiygz6l4ucabtq

Parallel hyperspectral image processing on distributed multicluster systems

Fangbin Liu
2011 Journal of Applied Remote Sensing  
Such approaches work well for individual compute clusters, but, due to the inherently large wide-area communication overheads, these are generally not applied in distributed multicluster systems.  ...  As individual cluster computers often cannot satisfy the computational demands of emerging problems in hyperspectral imaging, there is a growing need for distributed supercomputing using multicluster systems  ...  Acknowledgments This work has been supported by the Netherlands Organization for Scientific Research (NWO) under Grant No. 643.000.602 (JADE-MM: Adaptive High-Performance Distributed Multimedia Computing  ... 
doi:10.1117/1.3595292 fatcat:ijiih7lb7naubjhcgg6bmyq3au

Region-based hierarchical operation partitioning for multicluster processors

Michael Chu, Kevin Fan, Scott Mahlke
2003 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation - PLDI '03  
The main challenge associated with clustered architectures is compiler support to effectively partition operations across the available resources on each cluster.  ...  In this work, we present a novel technique for clustering operations based on graph partitioning methods.  ...  Research on partitioning for multiprocessors has many similarities to clustering for multicluster processors.  ... 
doi:10.1145/781163.781165 fatcat:4mha7dlh3reixbibxtvmuywjrm

Region-based hierarchical operation partitioning for multicluster processors

Michael Chu, Kevin Fan, Scott Mahlke
2003 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation - PLDI '03  
The main challenge associated with clustered architectures is compiler support to effectively partition operations across the available resources on each cluster.  ...  In this work, we present a novel technique for clustering operations based on graph partitioning methods.  ...  Research on partitioning for multiprocessors has many similarities to clustering for multicluster processors.  ... 
doi:10.1145/781131.781165 dblp:conf/pldi/ChuFM03 fatcat:j6344vduifawnkfzxrylgpmm3u

Region-based hierarchical operation partitioning for multicluster processors

Michael Chu, Kevin Fan, Scott Mahlke
2003 SIGPLAN notices  
The main challenge associated with clustered architectures is compiler support to effectively partition operations across the available resources on each cluster.  ...  In this work, we present a novel technique for clustering operations based on graph partitioning methods.  ...  Research on partitioning for multiprocessors has many similarities to clustering for multicluster processors.  ... 
doi:10.1145/780822.781165 fatcat:iub2t4orwnb3zo6osvt323wflq

Parallelization and performance of Conjugate Gradient algorithms on the Cedar hierarchical-memory multiprocessor

Ulrike Meier, Rudolf Eigenmann
1991 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming - PPOPP '91  
We describe its parallel implementation on the Cedar hierarchical memory multiprocessor from both angles, explicit manual parallelization and automatic compilation.  ...  The broad application range makes it an interesting object for investigating novel architectures and programming systems.  ...  A much less understood area in parallelizing compilation is entered once we attempt to partition data and distribute them to different processors or processor clusters.  ... 
doi:10.1145/109625.109644 dblp:conf/ppopp/MeierE91 fatcat:eakzchabpvht3dt5fg3njrkswa
Showing results 1–15 of 84