6,369 Hits in 7.0 sec

Decoupling synchronization and data transfer in message passing systems of parallel computers

T. Stricker, J. Stichnoth, D. O'Hallaron, S. Hinrichs, T. Gross
1995 Proceedings of the 9th international conference on Supercomputing - ICS '95  
The designers of the communication system of future parallel computers are therefore strongly encouraged to provide good synchronization facilities in addition to high throughput data transfers to support  ...  Synchronization is an important issue for the design of a scalable parallel computer, and some systems include special hardware support for control messages or barriers.  ...  Control and data transfer messages The key function of the communication system for a parallel computer is to transfer data, as well as to provide explicit synchronization.  ... 
doi:10.1145/224538.224539 dblp:conf/ics/StrickerSOHG95 fatcat:qgnr6mz5bjf3tbzxcibsoc7y24
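The decoupling this entry argues for can be illustrated independently of any particular machine: depositing data into a pre-agreed buffer is one operation, and the synchronization that tells the receiver the data is valid is a separate one. A minimal Python sketch of that separation (threading-based; all names here are illustrative, not from the paper):

```python
import threading

class DecoupledChannel:
    """Toy channel: data transfer (deposit) is a separate operation
    from synchronization (signal/wait), mirroring put-plus-flag designs."""
    def __init__(self):
        self.buffer = None
        self.ready = threading.Event()

    def deposit(self, data):      # data transfer only, no notification
        self.buffer = data

    def signal(self):             # synchronization only, no data
        self.ready.set()

    def wait_and_read(self):
        self.ready.wait()         # block until the sender signals
        return self.buffer

chan = DecoupledChannel()

def sender():
    chan.deposit([1, 2, 3])       # transfer the data first...
    chan.signal()                 # ...then synchronize separately

t = threading.Thread(target=sender)
t.start()
result = chan.wait_and_read()
t.join()
print(result)                     # [1, 2, 3]
```

Because `deposit` and `signal` are distinct calls, a sender could batch many transfers under a single synchronization, which is the cost structure the abstract contrasts with coupled send/receive.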

Automatic distribution of vision-tasks on computing clusters

Thomas Müller, Binh An Tran, Alois Knoll, John D. Owens, I-Jong Lin, Yu-Jin Zhang, Giordano B. Beretta
2011 Parallel Processing for Imaging Applications  
In this paper a consistent, efficient, yet convenient system for parallel computer vision, and in fact also real-time actuator control, is proposed.  ...  This, in combination with a generic interface for hardware abstraction and integration of external software components, is set up on the basis of the Message Passing Interface (MPI).  ...  ACKNOWLEDGMENTS This work is supported by the German Research Foundation (DFG) within the Collaborative Research Center SFB 453 on High-Fidelity Telepresence and Teleaction.  ... 
doi:10.1117/12.872131 dblp:conf/ppia/0001TK11 fatcat:qcrxk2tfr5galjeig2lmhtnre4

Ten ways to waste a parallel computer

Katherine Yelick
2009 Proceedings of the 36th annual international symposium on Computer architecture - ISCA '09  
a send with a receive to identify memory address to put data - Wildly popular in HPC, but cumbersome in some applications - Couples data transfer with synchronization • Using global address space  ...  decouples synchronization - Pay for what you need!  ...  • Enable programmers to get performance - Expose features for performance - Don't hide them • Go Green - Enable energy-efficient computers and software • Work with experts on software, algorithms, applications  ... 
doi:10.1145/1555754.1555755 dblp:conf/isca/Yelick09 fatcat:n2iyji7a7ncy7ns6f7vkk23h5m

Ten ways to waste a parallel computer

Katherine Yelick
2009 SIGARCH Computer Architecture News  
a send with a receive to identify memory address to put data - Wildly popular in HPC, but cumbersome in some applications - Couples data transfer with synchronization • Using global address space  ...  decouples synchronization - Pay for what you need!  ...  • Enable programmers to get performance - Expose features for performance - Don't hide them • Go Green - Enable energy-efficient computers and software • Work with experts on software, algorithms, applications  ... 
doi:10.1145/1555815.1555755 fatcat:prvqonkgindo7hsk5fw7er4qnq

Supporting sets of arbitrary connections on iWarp through communication context switches

Anja Feldmann, Thomas M. Stricker, Thomas E. Warfel
1993 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures - SPAA '93  
In this paper we introduce the ConSet communication model for distributed memory parallel computers.  ...  Message passing is an alternative programming  ...  Cost of a communication context switch: The algorithm is charged the necessary synchronization cost tsync, which is related to the size of the machine, plus  ...  Shewchuk for assistance with the FEM application, Gary Miller for general guidance, Eric Schwabe for early work on the communication compiler, and Thomas Gross, David O'Hallaron, and the rest of the CMU/  ... 
doi:10.1145/165231.165257 dblp:conf/spaa/FeldmannSW93 fatcat:yzuebwuthjfbbfdnxanozq5sie

Optimizing bandwidth limited problems using one-sided communication and overlap

C. Bell, D. Bonachea, R. Nishtala, K. Yelick
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
Even though the new algorithms require more messages for the same total volume of data, the resulting overlap leads to speedups of over 1.75× and 1.9× for the two-sided and one-sided implementations, respectively  ...  Our UPC results use the Berkeley UPC compiler with the GASNet communication system, and demonstrate the portability and scalability of that language and implementation, with performance approaching 0.5  ...  opportunities to further reduce overhead by decoupling synchronization from data transfer.  ... 
doi:10.1109/ipdps.2006.1639320 dblp:conf/ipps/BellBNY06 fatcat:33xs5aiegvgezkxiw77wn2ob6u
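The speedup this entry reports comes from splitting one large transfer into several smaller messages so that computation on early chunks overlaps communication of later ones. A hedged sketch of that pipelining pattern, with a thread standing in for the nonblocking communication layer (names are illustrative, not the paper's UPC/GASNet code):

```python
import threading, queue

def pipelined_transfer(data, nchunks, process):
    """Split one large transfer into nchunks messages so that
    processing of chunk i overlaps the 'transfer' of chunk i+1."""
    chunks = [data[i::nchunks] for i in range(nchunks)]  # toy partition
    q = queue.Queue()

    def communicate():                  # stands in for nonblocking puts
        for c in chunks:
            q.put(c)
        q.put(None)                     # end-of-stream marker

    t = threading.Thread(target=communicate)
    t.start()
    results = []
    while (c := q.get()) is not None:   # consume chunks as they arrive
        results.append(process(c))      # compute overlaps communication
    t.join()
    return results

out = pipelined_transfer(list(range(8)), 4, sum)
print(out)                              # [4, 6, 8, 10]
```

More messages mean more per-message overhead, but as the abstract notes, the overlap can more than pay for it when the network, not the CPU, is the bottleneck.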

Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems

Danish Shehzad, Zeki Bozkuş
2016 Computational Intelligence and Neuroscience  
In NEURON, the Message Passing Interface (MPI) is used for communication between processors.  ...  Increasing the number of processors achieves concurrency and better performance, but it adversely affects MPI_Allgather, which increases communication time between processors.  ...  Remote Memory Access, unlike two-sided communication, decouples data transfer from the synchronization of systems.  ... 
doi:10.1155/2016/3676582 pmid:27413363 pmcid:PMC4931058 fatcat:hcy75apri5gwfc42z5py4g3de4
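The recursive doubling in this entry's title refers to the classic allgather schedule: with P ranks (P a power of two), each rank exchanges its accumulated block with the partner at distance 2^k in round k, so every rank holds all P values after log2(P) rounds. A pure-Python simulation of that schedule (a conceptual sketch, not the NEURON/MPI implementation):

```python
import math

def allgather_recursive_doubling(values):
    """Simulate recursive-doubling allgather: P ranks (P a power of two)
    each start with one value; in round k, rank r swaps its gathered
    block with partner r XOR 2^k, doubling its data every round."""
    p = len(values)
    assert p & (p - 1) == 0, "P must be a power of two"
    data = {r: {r: values[r]} for r in range(p)}   # rank -> gathered values
    for k in range(int(math.log2(p))):
        dist = 1 << k
        new = {}
        for r in range(p):
            partner = r ^ dist
            merged = dict(data[r])
            merged.update(data[partner])            # exchange blocks
            new[r] = merged
        data = new
    return [[data[r][i] for i in range(p)] for r in range(p)]

result = allgather_recursive_doubling([10, 20, 30, 40])
print(result[0])                                    # [10, 20, 30, 40]
```

The log2(P) round count, rather than the P-1 rounds of a naive ring, is why the schedule matters as processor counts grow.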

The Nexus Approach to Integrating Multithreading and Communication

Ian Foster, Carl Kesselman, Steven Tuecke
1996 Journal of Parallel and Distributed Computing  
In this paper, we address the question of how to integrate threads and communication in high-performance distributed-memory systems.  ...  Lightweight threads have an important role to play in parallel systems: they can be used to exploit shared-memory parallelism, to mask communication and I/O latencies, to implement remote memory access  ...  Acknowledgments We are grateful to Hubertus Franke, John Garnett, Jonathan Geisler, David Kohr, Tal Lancaster, Robert Olson, and James Patton for their input to the Nexus design and implementation.  ... 
doi:10.1006/jpdc.1996.0108 fatcat:kp25veq36fcm5mpv6rzq655xbe

Decoupled hardware support for distributed shared memory

Steven K. Reinhardt, Robert W. Pfile, David A. Wood
1996 SIGARCH Computer Architecture News  
Two benchmarks are hampered by high communication overheads, but selectively replacing shared-memory operations with message passing provides speedups of at least 16 on both decoupled systems.  ...  To demonstrate the feasibility and simplicity of this access control device, we designed and built an FPGA-based version in under one year.  ...  Babak Falsafi and Shubu Mukherjee contributed to the development of the simulator used in this paper. Mark Hill and Jim Larus provided valuable comments on drafts of this paper.  ... 
doi:10.1145/232974.232979 fatcat:xlvt3pco3vakrhow3fr5taaefy

Decoupled hardware support for distributed shared memory

Steven K. Reinhardt, Robert W. Pfile, David A. Wood
1996 Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96  
Two benchmarks are hampered by high communication overheads, but selectively replacing shared-memory operations with message passing provides speedups of at least 16 on both decoupled systems.  ...  To demonstrate the feasibility and simplicity of this access control device, we designed and built an FPGA-based version in under one year.  ...  Babak Falsafi and Shubu Mukherjee contributed to the development of the simulator used in this paper. Mark Hill and Jim Larus provided valuable comments on drafts of this paper.  ... 
doi:10.1145/232973.232979 dblp:conf/isca/ReinhardtPW96 fatcat:3brtvx2ihjcczkrrqbw7qj6wra

Integrated Worst-Case Execution Time Estimation of Multicore Applications

Dumitru Potop-Butucaru, Isabelle Puaut, Marc Herbstritt
2013 Worst-Case Execution Time Analysis  
In this paper we extend a state-of-the-art WCET analysis technique to compute tight WCET estimates of parallel applications running on multicores.  ...  We demonstrate that our analysis produces tighter execution time bounds than classical techniques which first determine the WCET of sequential code regions and then compute the global response time by  ...  Many WCRT estimation methods compute end-to-end response times of distributed applications communicating using message passing, or multiprocessor systems (e.g. [11] ).  ... 
doi:10.4230/oasics.wcet.2013.21 dblp:conf/wcet/Potop-ButucaruP13 fatcat:qkir7t55mfgexczbptxlpcn3fy

Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications

Hongtao Zhong, Steven A. Lieberman, Scott A. Mahlke
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
Current chip multiprocessors increase throughput by utilizing multiple cores to perform computation in parallel.  ...  In decoupled mode, the cores execute a set of fine-grain communicating threads extracted by the compiler.  ...  This research was supported by the National Science Foundation grant CCR-0325898, the MARCO Gigascale Systems Research Center, and equipment donated by Hewlett-Packard and Intel Corporation.  ... 
doi:10.1109/hpca.2007.346182 dblp:conf/hpca/ZhongLM07 fatcat:sauqiioqtvfaro65x6xyffqr6m

Accelerating large-scale DEVS-based simulation on the cell processor

Qi Liu, Gabriel Wainer
2010 Proceedings of the 2010 Spring Simulation Multiconference on - SpringSim '10  
By taking a performance-centered approach, the technique allows for exploitation of multi-dimensional parallelism to overcome the bottlenecks in the simulation process.  ...  This paper presents a new technique for efficient parallel simulation of large-scale DEVS-based models on the IBM Cell processor, which has one Power Processing Element (PPE) and eight Synergistic Processing  ...  Double buffering is used extensively in DMA transfers of event, state and rule data to/from the buffers to tap data-streaming parallelism.  ... 
doi:10.1145/1878537.1878667 fatcat:eexfdkiqnzd6povb3yw3zvdm24
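The double buffering mentioned in this entry hides transfer latency by filling one buffer while the other is being processed. A hedged sketch of the pattern, with a thread standing in for the Cell's asynchronous DMA engine (all names are illustrative):

```python
import threading, queue

def double_buffered(stream, process):
    """Two buffers alternate roles: while the consumer processes one,
    a 'DMA' thread fills the other, so transfer and compute overlap."""
    filled = queue.Queue(maxsize=2)     # at most two buffers in flight

    def dma_fill():                     # stands in for async DMA transfers
        for item in stream:
            filled.put(item)            # blocks while both buffers are busy
        filled.put(None)                # end-of-stream marker

    threading.Thread(target=dma_fill, daemon=True).start()
    out = []
    while (buf := filled.get()) is not None:
        out.append(process(buf))        # overlaps with the next fill
    return out

events = double_buffered([[1, 2], [3, 4], [5, 6]], sum)
print(events)                           # [3, 7, 11]
```

The `maxsize=2` bound is what makes this double rather than unbounded buffering: the producer can run at most one buffer ahead of the consumer.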

Calculation of worst-case execution time for multicore processors using deterministic execution

Hamid Mushtaq, Zaid Al-Ars, Koen Bertels
2015 2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)  
In this paper we extend a state-of-the-art WCET analysis technique to compute tight WCET estimates of parallel applications running on multicores.  ...  We demonstrate that our analysis produces tighter execution time bounds than classical techniques which first determine the WCET of sequential code regions and then compute the global response time by  ...  Many WCRT estimation methods compute end-to-end response times of distributed applications communicating using message passing, or multiprocessor systems (e.g. [11] ).  ... 
doi:10.1109/patmos.2015.7347584 dblp:conf/patmos/MushtaqAB15 fatcat:fl7tbwaeezbcngayfxmhj5m6ge

The Design and Performance Evaluation of the DI-Multicomputer

Lynn Choi, Andrew A. Chien
1996 Journal of Parallel and Distributed Computing  
A preliminary version of some of this work appears in [10].  ...  The DI-multicomputer network interface provides efficient communication for both short and long messages, decoupling the processor from the transmission overhead for long messages while achieving a minimum  ...  Acknowledgements The research described in this paper was supported in part by NSF grants CCR-9209336 and MIP-92-23732, ONR grants N00014-92-J-1961 and N00014-93-1-1086, and NASA grant NAG 1-613.  ... 
doi:10.1006/jpdc.1996.0094 fatcat:xgbqnqa32zf3zolmmqpi3srhau
Showing results 1 — 15 out of 6,369 results