Maximizing multiprocessor performance with the SUIF compiler

M.W. Hall, J.M. Anderson, S.P. Amarasinghe, B.R. Murphy, Shih-Wei Liao, E. Bugnion, M.S. Lam
1996 Computer  
This article describes automatic parallelization techniques in the SUIF (Stanford University Intermediate  ...  To use a multiprocessor effectively, the compiler must exploit coarse-grain parallelism, locating large computations that can execute independently in parallel.  ...  Acknowledgments This research was supported in part by the Air Force Materiel Command and ARPA contracts F30602-95-C-0098, DABT63-95-C-0118, and DABT63-94-C-0054; a Digital Equipment Corporation grant;  ... 
doi:10.1109/2.546613 fatcat:6x7urb56urbrho5ycgavfdxwte

Combining Compile-Time and Run-Time Parallelization

Sungdo Moon, Byoungro So, Mary W. Hall
1999 Scientific Programming  
We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler's automatic parallelization system  ...  We present results of measurements on programs from two benchmark suites – SPECFP95 and NAS sample benchmarks – which identify inherently parallel loops in these programs that are missed by the compiler.  ...  analysis, that have significantly enhanced the effectiveness of automatic parallelization.  ... 
doi:10.1155/1999/490628 fatcat:gb63mgz6brhp3ijzql6nxakpxy

Tulipse: A Visualization Framework for User-Guided Parallelization [chapter]

Yi Wen Wong, Tomasz Dubrownik, Wai Teng Tang, Wen Jun Tan, Rubing Duan, Rick Siow Mong Goh, Shyh-hao Kuo, Stephen John Turner, Weng-Fai Wong
2012 Lecture Notes in Computer Science  
Our paper will demonstrate how these two new perspectives aid in the parallelization of code.  ...  Current options available to the programmer include either automatic parallelization or a complete rewrite in a parallel programming language. However, there are limitations with these options.  ...  This work was supported by the Agency for Science, Technology and Research PSF Grant No. 102-101-0028. We are also grateful to the anonymous reviewers for their suggestions.  ... 
doi:10.1007/978-3-642-32820-6_3 fatcat:6rxr3qnytbglrl4y6qsvmus3oa

Source level transformations to improve I/O data partitioning

Yijian Wang, David Kaeli
2003 Proceedings of the international workshop on Storage network architecture and parallel I/Os - SNAPI '03  
We use the SUIF compiler infrastructure to perform data-flow analysis and recognize LDADs.  ...  The main goal for parallel I/O is to increase I/O parallelism by providing multiple, independent data channels between processors and disks.  ...  In [9], Hall et al. proposed to automatically parallelize and optimize sequential programs for shared-memory multiprocessors using SUIF.  ... 
doi:10.1145/1162618.1162622 fatcat:qsdmohpq6bhkvggdpqa2whi6u4

Factors Influencing the Performance of a CPU-RFU Hybrid Architecture [chapter]

Girish Venkataramani, Suraj Sudhir, Mihai Budiu, Seth Copen Goldstein
2002 Lecture Notes in Computer Science  
Previous efforts at combining a processor core with a reconfigurable fabric are examined in the light of these issues. We also present simulation results that emphasize the impact of these factors.  ...  However, today's superscalar processors are both complex and adept at extracting Instruction Level Parallelism (ILP), which introduces many complex issues to the design of a hybrid CPU-RFU system.  ...  Acknowledgments This research is funded by the National Science Foundation (NSF) under Grant No. CCR-9876248.  ... 
doi:10.1007/3-540-46117-5_98 fatcat:izs4k4djbzcklkiiy5cvoxmtqa

OMPI

Hirotaka Ogawa, Satoshi Matsuoka
1996 Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '96  
To validate the effectiveness of our OMPI system, we performed baseline as well as more extensive benchmarks on a set of application cores with different communication characteristics, on the 64-node  ...  MPI is gaining acceptance as a standard for message-passing in high-performance computing, due to its powerful and flexible support of various communication styles.  ...  Another interesting approach is to implement the core functionality of a subset of MPI, and have more sophisticated functionalities implemented in terms of the core subset, and optimized via our  ... 
doi:10.1145/369028.369106 fatcat:2aqqjayc7ze6palivzspbpbdcu

Use of a Bit-true Data Flow Analysis for Processor-Specific Source Code Optimization

Heiko Falk, Jens Wagner, Andre Schaefer
2006 2006 IEEE/ACM/IFIP Workshop on Embedded Systems for Real Time Multimedia  
Nowadays, key characteristics of a processor's instruction set are only exploited in high-level languages by using inline assembly or compiler intrinsics.  ...  Inserting intrinsics into the source code is up to the programmer, since only few automatic approaches exist. Additionally, these approaches base on simple code pattern matching strategies.  ...  They base on the automatic insertion of compiler intrinsics into the source code in order to exploit specific features of the TI instruction set.  ... 
doi:10.1109/estmed.2006.321286 dblp:conf/estimedia/FalkWS06 fatcat:hgoiy5oog5do7fdodzuhqapede

Program mapping onto network processors by recursive bipartitioning and refining

Jia Yu, Jingnan Yao, Laxmi Bhuyan, Jun Yang
2007 Proceedings - Design Automation Conference  
The performance of our scheme is evaluated with a suite of NP benchmarks using SUIF/Machine SUIF compiler and Intel IXA Architecture Tool.  ...  The bipartition continues until the code of the tasks can be fit into the instruction memory of processing elements.  ...  Pure code duplication is no longer effective in balancing the workload in this case, thus balancing stage time in the hybrid parallel and pipeline model becomes critical.  ... 
doi:10.1145/1278480.1278681 dblp:conf/dac/YuYBY07 fatcat:f2boqepclbauho5twbui4ymjlq

Program Mapping onto Network Processors by Recursive Bipartitioning and Refining

Jia Yu, Jingnan Yao, Laxmi Bhuyan, Jun Yang
2007 Proceedings - Design Automation Conference  
The performance of our scheme is evaluated with a suite of NP benchmarks using SUIF/Machine SUIF compiler and Intel IXA Architecture Tool.  ...  The bipartition continues until the code of the tasks can be fit into the instruction memory of processing elements.  ...  Pure code duplication is no longer effective in balancing the workload in this case, thus balancing stage time in the hybrid parallel and pipeline model becomes critical.  ... 
doi:10.1109/dac.2007.375275 fatcat:dcod5jzhkfffnn3eg7zcpdtf4u

Effet du son de blé et de la nature des lipides du régime sur la digestibilité, l'activité des enzymes digestives et la lipémie des porcelets méditerranéens de la race Alentejana

JPB Freire, J. Peiniau, LF Cunha, JAA Almeida, A. Aumaitre
1996 Annales de Zootechnie  
In addition, the effect of the diet on the serum level of glucose, urea, cholesterol and triglycerides was measured prior and during 5 h after an experimental meal.  ...  The inclusion of wheat bran to the diet decreased the FAD of energy by 3 and 6 percentage units in the presence of olive oil and tallow, respectively.  ... 
doi:10.1051/animres:19960408 fatcat:p3w5a2abujentcwuhdm3rxsgyy

Automatic speculative DOALL for clusters

Hanjun Kim, Nick P. Johnson, Jae W. Lee, Scott A. Mahlke, David I. August
2012 Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12  
However, the programmers still need to analyze the data and control dependences of the program to find effective parallelization strategies.  ...  Automatic parallelization research has a rich history, especially in the scientific computing community.  ...  The primary contributions of this paper are: • The first fully-automatic speculative parallelization system targeting commodity clusters (called Cluster Spec-DOALL) • Highly effective communication optimizations  ... 
doi:10.1145/2259016.2259029 dblp:conf/cgo/KimJLMA12 fatcat:rekpv7ckurahrafuvatz55o2ha

Ace

Mukund Raghavachari, Anne Rogers
1997 SIGPLAN notices  
Introduction Shared memory's popularity as a parallel programming model is due, in part, to the fact that the complexity of communication is concealed from the programmer.  ...  In addition, the lack of separation between application and communication code inhibits the creation of protocol libraries.  ...  We wish to thank the National Center for Supercomputing Applications at the University of Illinois (Urbana-Champaign) for the use of their CM-5. We would like to thank A. Appel, C. Dunworth, J.  ... 
doi:10.1145/263767.263777 fatcat:5b55hnxgrvau7fwt6erepj5p3y

The Stanford Hydra CMP

L. Hammond, B.A. Hubbert, M. Siu, M.K. Prabhu, M. Chen, K. Olukotun
2000 IEEE Micro  
These processors can extract greater amounts of instruction-level parallelism, or ILP, by finding nondependent instructions that occur near each other in the original program code.  ...  Unfortunately, there is only a finite amount of ILP present in any particular sequence of instructions that the processor executes because instructions from the same sequence are typically highly interdependent  ...  Monica Lam, Jeff Oplinger, and David Heine from the SUIF group provided help with compiler support and analysis of early speculation algorithms. Nick Kucharewski, Raymond M.  ... 
doi:10.1109/40.848474 fatcat:hwou4dbdqfhi5clj6o23atuaka

Floating Point to Fixed Point Conversion of C Code [chapter]

Andrea G. M. Cilio, Henk Corporaal
1999 Lecture Notes in Computer Science  
It allows the user to specify the position of the binary point in the source code and let the converter automatically transform floating-point variables and operations.  ...  We demonstrate the validity of our approach on a series of experiments.  ...  The effect of the heuristic is less pronounced when 1 A( 0) is larger.  ... 
doi:10.1007/978-3-540-49051-7_16 fatcat:okgoxwk6pvfv3cudk5j6hls3oe

ISDL

George Hadjiyiannis, Silvina Hanono, Srinivas Devadas
1997 Proceedings of the 34th annual conference on Design automation conference - DAC '97  
The features and flexibility of ISDL enable the description of vastly different architectures, in particular VLIW architectures.  ...  We have written a tool that, given an ISDL description of a processor, automatically generates an assembler for it. Ongoing work includes the development of an automatic code-generator generator.  ...  The second set of brackets in each operation contain an RTL description of the effect of the operation.  ... 
doi:10.1145/266021.266108 dblp:conf/dac/HadjiyiannisHD97 fatcat:pm4nhmoadrhezezgahy3ywor7q

Showing results 1-15 out of 445 results