4,419 Hits in 8.9 sec

Performing work with asynchronous processors: Message-delay-sensitive bounds

Dariusz R. Kowalski, Alex A. Shvartsman
2005 Information and Computation  
The first family uses as its basis a shared-memory  ...  Preliminary version of this paper appeared as: Performing Work with Asynchronous Processors: Message-Delay-Sensitive Bounds, Information and Computation  ...  Here, the upper bounds on work and communication are given as functions of p, t, and d, the upper bound on message delays; however, algorithms have no knowledge of d and they cannot rely on the existence  ...  Delay-sensitive lower bound on work for asynchronous algorithms: We now present delay-sensitive lower bounds for asynchronous algorithms for the Do-All problem.  ... 
doi:10.1016/j.ic.2005.08.002 fatcat:p5wdppgfgndq5jyruu4hgsygle

Performing work with asynchronous processors

Dariusz R. Kowalski, Alex A. Shvartsman
2003 Proceedings of the twenty-second annual symposium on Principles of distributed computing - PODC '03  
The first family uses as its basis a shared-memory  ...  Preliminary version of this paper appeared as: Performing Work with Asynchronous Processors: Message-Delay-Sensitive Bounds, Information and Computation  ...  Here, the upper bounds on work and communication are given as functions of p, t, and d, the upper bound on message delays; however, algorithms have no knowledge of d and they cannot rely on the existence  ...  Delay-sensitive lower bound on work for asynchronous algorithms: We now present delay-sensitive lower bounds for asynchronous algorithms for the Do-All problem.  ... 
doi:10.1145/872035.872076 dblp:conf/podc/KowalskiS03 fatcat:mcqqowndlzcsnplje55ec6b2ua

Multiple Instruction Stream Processor

Richard A. Hankins, Gautham N. Chinya, Jamison D. Collins, Perry H. Wang, Ryan Rakvic, Hong Wang, John P. Shen
2006 SIGARCH Computer Architecture News  
Using a research prototype MISP processor built on an IA-32-based multiprocessor system equipped with special firmware, we demonstrate the feasibility of implementing the MISP architecture.  ...  Microprocessor design is undergoing a major paradigm shift towards multi-core designs, in anticipation that future performance gains will come from exploiting thread-level parallelism in the software.  ...  We acknowledge productive collaboration with Dion Rodgers, Baiju Patel, Prashant Sethi, Sanjiv Shah, Paul Petersen, Dave Poulsen, Grant Haab, Shirish Aundhe, Suresh Srinivas, John Reid and Xinmin Tian.  ... 
doi:10.1145/1150019.1136495 fatcat:ibcv5a3h5fhnzjpvcw7jlk5xli

Coordinated Energy Management in Heterogeneous Processors

Indrani Paul, Vignesh Ravi, Srilatha Manne, Manish Arora, Sudhakar Yalamanchili
2014 Scientific Programming  
DynaCo improves measured average energy-delay squared (ED^2) product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.  ...  We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm.  ...  The memory-bounded behavior of Neighbor makes it insensitive to CPU frequency with minimal performance loss at the lower CPU DVFS state of P4.  ... 
doi:10.1155/2014/210762 fatcat:t73zetmjrjfojmlg7dlovv7xl4

Coordinated energy management in heterogeneous processors

Indrani Paul, Vignesh Ravi, Srilatha Manne, Manish Arora, Sudhakar Yalamanchili
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
DynaCo improves measured average energy-delay squared (ED^2) product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.  ...  We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm.  ...  The memory-bounded behavior of Neighbor makes it insensitive to CPU frequency with minimal performance loss at the lower CPU DVFS state of P4.  ... 
doi:10.1145/2503210.2503227 dblp:conf/sc/PaulRMAY13 fatcat:gqy377iu5bdmvm2e2zx64suao4

Analytic evaluation of shared-memory systems with ILP processors

Daniel J. Sorin, Vijay S. Pai, Sarita V. Adve, Mary K. Vernon, David A. Wood
1998 SIGARCH Computer Architecture News  
This paper develops and validates an analytical model for evaluating various types of architectural alternatives for shared-memory systems with processors that aggressively exploit instruction-level parallelism  ...  Compared to simulation, the analytical model is many orders of magnitude faster to solve, yielding highly accurate system performance estimates in seconds.  ...  Acknowledgments We would like to thank Derek Eager for many helpful discussions related to the work in this paper and Aaron Gresch for his work on an early version of the model.  ... 
doi:10.1145/279361.279408 fatcat:jtuze4fjond67e454oavdi3tnm

A priority-based processor sharing model for TDM passive optical networks

Yan Wang, Moshe Zukerman, Ronald Addie, Sammy Chan, Richard Harris
2010 IEEE Journal on Selected Areas in Communications  
The mean message delay is evaluated using a multiqueue processor sharing (MPS) model and an MPS with Heterogeneous Traffic (MPS-HT) model for the two approaches respectively.  ...  We extend the MPS model to a general MPS-HT model that enables the analysis of message delay performance in the case where the service quanta may be different for different services.  ...  [24], packets are scheduled according to their delay bound requirements. Whenever it is not urgent to transmit the delay-sensitive packets, packets of best-effort traffic are scheduled first.  ... 
doi:10.1109/jsac.2010.100811 fatcat:gxua4tgxpndmbf66rfxpkjaizu

Unstructured peer-to-peer networks for sharing processor cycles

Asad Awan, Ronaldo A. Ferreira, Suresh Jagannathan, Ananth Grama
2006 Parallel Computing  
We build our system to work in an environment similar to current file-sharing networks such as Gnutella and Freenet.  ...  We support our claims of robustness and scalability analytically with high probabilistic guarantees.  ...  The delays measured depend on the sizes of the messages transferred. For random walk messages, we used a size of 1KB.  ... 
doi:10.1016/j.parco.2005.09.002 fatcat:f7crjkiu2ja2xek4qfhldamb44

Large-scale parallel programming: experience with BBN butterfly parallel processor

Thomas J. LeBlanc, Michael L. Scott, Christopher M. Brown
1988 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems - PPEALS '88  
In the course of our work with the Butterfly we have ported three compilers, developed five major and several minor library packages, built two different operating systems, and implemented dozens of applications  ...  Both locality and Amdahl's law become increasingly important with a very large number of nodes.  ...  of asynchronous message passing between heavyweight processes.  ... 
doi:10.1145/62115.62131 dblp:conf/ppopp/LeBlancSB88 fatcat:h6n5vcg375avvk63lwokvll74i

Large-scale parallel programming: experience with BBN butterfly parallel processor

Thomas J. LeBlanc, Michael L. Scott, Christopher M. Brown
1988 SIGPLAN notices  
In the course of our work with the Butterfly we have ported three compilers, developed five major and several minor library packages, built two different operating systems, and implemented dozens of applications  ...  Both locality and Amdahl's law become increasingly important with a very large number of nodes.  ...  of asynchronous message passing between heavyweight processes.  ... 
doi:10.1145/62116.62131 fatcat:mtulpjhdabaqdiz2ub5utyneqe

Emulating shared-memory Do-All algorithms in asynchronous message-passing systems

Dariusz R. Kowalski, Mariam Momenzadeh, Alexander A. Shvartsman
2010 Journal of Parallel and Distributed Computing  
This paper studies the efficiency of emulating shared-memory task-performing algorithms on asynchronous message-passing processors with quantifiable message latency.  ...  While prior solutions assumed synchrony and constant delays, the solutions given here yield subquadratic efficiency with asynchronous processors when the delays and failures are suitably constrained.  ...  With this in mind, a delay-sensitive study of Do-All in [13] yields asynchronous algorithms achieving subquadratic work as long as the message delay d is o(N).  ... 
doi:10.1016/j.jpdc.2009.12.002 fatcat:fy3bjknfqndpxdjodsoq7cua3i

Emulating Shared-Memory Do-All Algorithms in Asynchronous Message-Passing Systems [chapter]

Dariusz R. Kowalski, Mariam Momenzadeh, Alexander A. Shvartsman
2004 Lecture Notes in Computer Science  
This paper studies the efficiency of emulating shared-memory task-performing algorithms on asynchronous message-passing processors with quantifiable message latency.  ...  While prior solutions assumed synchrony and constant delays, the solutions given here yield subquadratic efficiency with asynchronous processors when the delays and failures are suitably constrained.  ...  With this in mind, a delay-sensitive study of Do-All in [13] yields asynchronous algorithms achieving subquadratic work as long as the message delay d is o(N).  ... 
doi:10.1007/978-3-540-27860-3_20 fatcat:o3ikfz2bgfbfxo6oamzwjnvuae

Tradeoff between data-, instruction-, and thread-level parallelism in stream processors

Jung Ho Ahn, Mattan Erez, William J. Dally
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
We argue that for stream applications with scalable parallel algorithms the performance is not very sensitive to the control structures used within a large range of area-efficient architectural choices  ...  This paper explores the scalability of the Stream Processor architecture along the instruction-, data-, and thread-level parallelism dimensions.  ...  Second, a Stream Processor is designed to efficiently tolerate latencies by relying on software and the locality hierarchy, thereby reducing the sensitivity of performance to pipeline delays.  ... 
doi:10.1145/1274971.1274991 dblp:conf/ics/AhnED07 fatcat:ip56nttx25hjvdb4yj6hfrzl7i

Mamba: A scalable communication centric multi-threaded processor architecture

Gregory A. Chadwick, Simon W. Moore
2012 2012 IEEE 30th International Conference on Computer Design (ICCD)  
With multi-core architectures now firmly entrenched in many application areas both computer architects and programmers now face new challenges.  ...  Computer architects must increase core count to increase explicit parallelism available to the programmer in order to provide better performance whilst leaving the programming model presented tractable  ...  Microbenchmarks looking at lock, barrier and FIFO performance show that Mamba is not sensitive to thread count; an equal amount of work split amongst more or fewer threads results in similar performance  ... 
doi:10.1109/iccd.2012.6378652 dblp:conf/iccd/ChadwickM12 fatcat:peugqonstvbtlgb64iq67aw344

LoGPC

Csaba Andras Moritz, Matthew I. Frank
1998 Performance Evaluation Review  
Finally, we use the model to identify trade-offs between synchronous and asynchronous message passing styles.  ...  Abstract: In many real applications, for example, those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources  ...  The model is asynchronous, i.e., processors work asynchronously and the latency experienced by any message is unpredictable, but, in an unloaded network, bounded above by v.  ... 
doi:10.1145/277858.277933 fatcat:ddcgz6e7wrfdtcddsmao53lfzq
Showing results 1 — 15 out of 4,419 results