8,883 Hits in 2.2 sec

Bounds on series-parallel slowdown [article]

András Z. Salamon, Vashti Galpin
2009 arXiv   pre-print
The slowdown ratio describes how additional constraints affect the makespan. We disprove an existing conjecture positing a bound of two on the slowdown when workload is not considered.  ...  We use activity networks (task graphs) to model parallel programs and consider series-parallel extensions of these networks.  ...  Additionally, we do not need to consider networks that can be decomposed such that there is at least one series or parallel node in the decomposition, since the slowdown is then bounded above by the slowdown  ... 
arXiv:0904.4512v1 fatcat:qnyzmdxod5ggngfrmrxb7cq2e4

A Comparative Performance Study for Compute Node Sharing

Jeho Park, Shui F. Lam
2012 Journal of Computing Science and Engineering  
Under the assumption, we developed a methodology to simulate job arrivals to a set of compute nodes, and gather and process performance data to calculate the percentage slowdown of parallel jobs.  ...  We introduce a methodology for the study of the application-level performance of time-sharing parallel jobs on a set of compute nodes in high performance clusters and report our findings.  ...  In Section V, we show the results of a series of concurrent job executions on a time-sharing, non-dedicated Linux cluster.  ... 
doi:10.5626/jcse.2012.6.4.287 fatcat:rrqwhqqmlndg5j3ubuuazv3ffm

Scalable and precise dynamic datarace detection for structured parallelism

Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin Vechev, Eran Yahav
2012 SIGPLAN notices  
Our experimental results indicate an average (geometric mean) slowdown of 2.78× on a 16-core SMP system.  ...  Our algorithm requires constant space per memory location, works in parallel, and is efficient in practice. We implemented and evaluated our algorithm on a set of 15 benchmarks.  ...  We would also like to thank John Mellor-Crummey from Rice University for his feedback and suggestions on this work. This work was supported in part by the U.S.  ... 
doi:10.1145/2345156.2254127 fatcat:defsbudcwba3devgqofmnfzupm

Page 3963 of Mathematical Reviews Vol. , Issue 91G [page]

1991 Mathematical Reviews  
We give efficient PRAM parallel algorithms for the string editing problem. If m = min(|x|, |y|) and n = max(|x|, |y|), then our CREW bound is O(log m log n) time with O(mn/log m) processors.  ...  Summary: “The string editing problem for input strings x and y consists of transforming x into y by performing a series of weighted edit operations on x of overall minimum cost.  ... 

Automatic Methods for Hiding Latency in Parallel and Distributed Computation

Matthew Andrews, Tom Leighton, P. Takis Metaxas, Lisa Zhang
1999 SIAM journal on computing (Print)  
For example, given any "dataflow" type of algorithm that runs in T steps on an n-node ring with unit link delays, we show how to run the algorithm in O(T) steps on any n-node bounded-degree connected  ...  In this paper we describe methods for mitigating the degradation in performance caused by high latencies in parallel and distributed networks.  ...  Even in the scenario where a large virtual network is being simulated on a small parallel machine, it is incumbent on the programmer to find the parallelism necessary to efficiently implement the algorithm  ... 
doi:10.1137/s0097539797326502 fatcat:dluxkbnzhjbjvb74tkmjbgucba

Predictability of the recent slowdown and subsequent recovery of large-scale surface warming using statistical methods

Michael E. Mann, Byron A. Steinman, Sonya K. Miller, Leela M. Frankcombe, Matthew H. England, Anson H. Cheung
2016 Geophysical Research Letters  
While our analyses focus on combining semiempirical estimates of internal climatic variability with statistical hindcast experiments, possible implications for initialized model predictions are also discussed  ...  The temporary slowdown in large-scale surface warming during the early 2000s has been attributed to both external and internal sources of climate variability.  ...  We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their  ... 
doi:10.1002/2016gl068159 fatcat:uiaqjf6e6jdppm6jhvl5ezw37q

Mechanisms and policies for supporting fine-grained cycle stealing

Kyung Dong Ryu, Jeffrey K. Hollingsworth, Peter J. Keleher
1999 Proceedings of the 13th international conference on Supercomputing - ICS '99  
on the number of physical pages that can be obtained by guest processes when host processes are active, and 3) a new page-out strategy that adaptively increases the pageout rate of guest processes when  ...  significantly impacting host processes: 1) a new guest process priority that prevents processes from stealing any processor time from host processes, 2) a new page replacement policy that imposes hard bounds  ...  The left graph shows the slowdown experienced by the parallel applications.  ... 
doi:10.1145/305138.305170 dblp:conf/ics/RyuHK99 fatcat:2iizifev45amnn5xjxxb5thkau

User-guided symbiotic space-sharing of real workloads

Jonathan Weinberg, Allan Snavely
2006 Proceedings of the 20th annual international conference on Supercomputing - ICS '06  
Symbiotic space-sharing is a technique that can improve system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources.  ...  Speedup can be induced both by mixing benchmark categories and even by mixing some memory-bound codes. In one instance however, we observe some cross-category slowdown.  ...  We do not present rigorous error bounds on these experiments because the conclusions we intend to draw need only be notional.  ... 
doi:10.1145/1183401.1183450 dblp:conf/ics/WeinbergS06 fatcat:pynmyy2hgvbsbpxiidgqneburm

Exploring the Capacity of a Modern SMT Architecture to Deliver High Scientific Application Performance [chapter]

Evangelia Athanasaki, Nikos Anastopoulos, Kornilios Kourtis, Nectarios Koziris
2006 Lecture Notes in Computer Science  
We evaluate and contrast speculative precomputation (SPR) and thread-level parallelism (TLP) techniques for a series of scientific codes executed on an SMT processor.  ...  We also examine the effect of thread synchronization mechanisms on multithreaded parallel applications that are executed on a single SMT processor.  ...  This requirement can be satisfied by imposing a specific upper bound on the amount of data to be prefetched.  ... 
doi:10.1007/11847366_19 fatcat:o64abdsbdbdszbzlmewiranvdu

Towards Accommodating Real-time Jobs on HPC Platforms [article]

Sam Nickolay, Eun-Sung Jung, Rajkumar Kettimuthu, Ian Foster
2021 arXiv   pre-print
Our results show that with 10% real-time job percentages, just-in-time checkpointing combined with our heuristic can improve the slowdowns of real-time jobs by 35% while limiting the increase of the slowdowns  ...  Scientists want to use results from one experiment to guide the selection of the next or even to improve the course of a single experiment.  ...  Thus, the production parallel job schedulers and the research studies on parallel job schedulers use heuristics.  ... 
arXiv:2103.13130v1 fatcat:ods6hyryyrde5mzdj4x5whtybu

Empirical evaluation of shared parallel execution on independently scheduled clusters

M. Ghanesh, S. Kumar, J. Subhlok
2005 CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005.  
Parallel machines are typically space shared, or time shared such that only one application executes on a group of nodes at any given time.  ...  It is generally assumed that executing multiple parallel applications simultaneously on a group of independently scheduled nodes is not efficient because of synchronization requirements.  ...  Since EP is a compute bound program with no communication, it is expected that it will show the same slowdown whether gang scheduling or independent scheduling (or any other fair way of sharing the CPU  ... 
doi:10.1109/ccgrid.2005.1558570 dblp:conf/ccgrid/GhaneshKS05 fatcat:peeub6zcrje77l7x5xokftm36e

Enabling Large Neural Networks on Tiny Microcontrollers with Swapping [article]

Hongyu Miao, Felix Xiaozhu Lin
2021 arXiv   pre-print
Out-of-core NNs on MCUs raise multiple concerns: execution slowdown, storage wear out, energy consumption, and data security.  ...  Running neural networks (NNs) on microcontroller units (MCUs) is becoming increasingly important, but is very difficult due to the tiny SRAM size of MCU.  ...  A common pattern in an NN is that one or more compute-bound layers followed by one or more IO-bound layers, i.e. a pipeline with interleaved compute-bound and IO-bound stages.  ... 
arXiv:2101.08744v3 fatcat:xaudg3rs7zc6pesq6apa6jhizi

Improved parallel polynomial division and its extensions

D. Bini, V. Pan
1992 Proceedings., 33rd Annual Symposium on Foundations of Computer Science  
The paper demonstrates some new techniques of supereffective slowdown of parallel algebraic computations, which we combine with a technique of stream contraction.  ...  We also show how to extend our techniques to parallel implementation of other recursive processes, such as the evaluation modulo x^N of the m-th root, p(x)^(1/m), of p(x) (for any fixed natural m), for which  ...  This improvement relies on the new techniques that, in particular, decrease by q times the processor bound for a large class of parallel algebraic computations in the result of their slowdown by s = o(  ... 
doi:10.1109/sfcs.1992.267811 dblp:conf/focs/BiniP92 fatcat:4kbn6w4annetvl5eixyjk5hyxe

Scalable and precise dynamic datarace detection for structured parallelism

Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin Vechev, Eran Yahav
2012 Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation - PLDI '12  
Our experimental results indicate an average (geometric mean) slowdown of 2.78× on a 16-core SMP system.  ...  Our algorithm requires constant space per memory location, works in parallel, and is efficient in practice. We implemented and evaluated our algorithm on a set of 15 benchmarks.  ...  We would also like to thank John Mellor-Crummey from Rice University for his feedback and suggestions on this work. This work was supported in part by the U.S.  ... 
doi:10.1145/2254064.2254127 dblp:conf/pldi/RamanZSVY12 fatcat:n5mvqmh4g5frrbvhexvsoqf6lu

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory [article]

Dan Alistarh, Christopher De Sa, Nikola Konstantinov
2018 arXiv   pre-print
Our results give improved upper and lower bounds on the "price of asynchrony" when executing the fundamental SGD algorithm in a concurrent setting.  ...  Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from  ...  of the form √(τ_max n); 4) prove lower bounds on the slowdown due to asynchrony.  ... 
arXiv:1803.08841v2 fatcat:kc2rbde7qnfzroup25ucmvjwta