Bounds on series-parallel slowdown
[article]
2009
arXiv
pre-print
The slowdown ratio describes how additional constraints affect the makespan. We disprove an existing conjecture positing a bound of two on the slowdown when workload is not considered. ...
We use activity networks (task graphs) to model parallel programs and consider series-parallel extensions of these networks. ...
Additionally, we do not need to consider networks that can be decomposed such that there is at least one series or parallel node in the decomposition, since the slowdown is then bounded above by the slowdown ...
arXiv:0904.4512v1
fatcat:qnyzmdxod5ggngfrmrxb7cq2e4
A Comparative Performance Study for Compute Node Sharing
2012
Journal of Computing Science and Engineering
Under the assumption, we developed a methodology to simulate job arrivals to a set of compute nodes, and gather and process performance data to calculate the percentage slowdown of parallel jobs. ...
We introduce a methodology for the study of the application-level performance of time-sharing parallel jobs on a set of compute nodes in high performance clusters and report our findings. ...
In Section V, we show the results of a series of concurrent job executions on a time-sharing, non-dedicated Linux cluster. ...
doi:10.5626/jcse.2012.6.4.287
fatcat:rrqwhqqmlndg5j3ubuuazv3ffm
Scalable and precise dynamic datarace detection for structured parallelism
2012
SIGPLAN notices
Our experimental results indicate an average (geometric mean) slowdown of 2.78× on a 16-core SMP system. ...
Our algorithm requires constant space per memory location, works in parallel, and is efficient in practice. We implemented and evaluated our algorithm on a set of 15 benchmarks. ...
We would also like to thank John Mellor-Crummey from Rice University for his feedback and suggestions on this work. This work was supported in part by the U.S. ...
doi:10.1145/2345156.2254127
fatcat:defsbudcwba3devgqofmnfzupm
Page 3963 of Mathematical Reviews Vol. , Issue 91G
[page]
1991
Mathematical Reviews
We give efficient PRAM parallel algorithms for the string editing problem. If m = min(|x|, |y|) and n = max(|x|, |y|), then our CREW bound is O(log m log n) time with O(mn / log m) processors. ...
Summary: “The string editing problem for input strings x and y consists of transforming x into y by performing a series of weighted edit operations on x of overall minimum cost. ...
Automatic Methods for Hiding Latency in Parallel and Distributed Computation
1999
SIAM journal on computing (Print)
For example, given any "dataflow" type of algorithm that runs in T steps on an n-node ring with unit link delays, we show how to run the algorithm in O(T) steps on any n-node bounded-degree connected ...
In this paper we describe methods for mitigating the degradation in performance caused by high latencies in parallel and distributed networks. ...
Even in the scenario where a large virtual network is being simulated on a small parallel machine, it is incumbent on the programmer to find the parallelism necessary to efficiently implement the algorithm ...
doi:10.1137/s0097539797326502
fatcat:dluxkbnzhjbjvb74tkmjbgucba
Predictability of the recent slowdown and subsequent recovery of large-scale surface warming using statistical methods
2016
Geophysical Research Letters
While our analyses focus on combining semiempirical estimates of internal climatic variability with statistical hindcast experiments, possible implications for initialized model predictions are also discussed ...
The temporary slowdown in large-scale surface warming during the early 2000s has been attributed to both external and internal sources of climate variability. ...
We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their ...
doi:10.1002/2016gl068159
fatcat:uiaqjf6e6jdppm6jhvl5ezw37q
Mechanisms and policies for supporting fine-grained cycle stealing
1999
Proceedings of the 13th international conference on Supercomputing - ICS '99
on the number of physical pages that can be obtained by guest processes when host processes are active, and 3) a new page-out strategy that adaptively increases the pageout rate of guest processes when ...
significantly impacting host processes: 1) a new guest process priority that prevents processes from stealing any processor time from host processes, 2) a new page replacement policy that imposes hard bounds ...
The left graph shows the slowdown experienced by the parallel applications. ...
doi:10.1145/305138.305170
dblp:conf/ics/RyuHK99
fatcat:2iizifev45amnn5xjxxb5thkau
User-guided symbiotic space-sharing of real workloads
2006
Proceedings of the 20th annual international conference on Supercomputing - ICS '06
Symbiotic space-sharing is a technique that can improve system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. ...
Speedup can be induced both by mixing benchmark categories and even by mixing some memory-bound codes. In one instance however, we observe some cross-category slowdown. ...
We do not present rigorous error bounds on these experiments because the conclusions we intend to draw need only be notional. ...
doi:10.1145/1183401.1183450
dblp:conf/ics/WeinbergS06
fatcat:pynmyy2hgvbsbpxiidgqneburm
Exploring the Capacity of a Modern SMT Architecture to Deliver High Scientific Application Performance
[chapter]
2006
Lecture Notes in Computer Science
We evaluate and contrast speculative precomputation (SPR) and thread-level parallelism (TLP) techniques for a series of scientific codes executed on an SMT processor. ...
We also examine the effect of thread synchronization mechanisms on multithreaded parallel applications that are executed on a single SMT processor. ...
This requirement can be satisfied by imposing a specific upper bound on the amount of data to be prefetched. ...
doi:10.1007/11847366_19
fatcat:o64abdsbdbdszbzlmewiranvdu
Towards Accommodating Real-time Jobs on HPC Platforms
[article]
2021
arXiv
pre-print
Our results show that with 10% real-time job percentages, just-in-time checkpointing combined with our heuristic can improve the slowdowns of real-time jobs by 35% while limiting the increase of the slowdowns ...
Scientists want to use results from one experiment to guide the selection of the next or even to improve the course of a single experiment. ...
Thus, the production parallel job schedulers and the research studies on parallel job schedulers use heuristics. ...
arXiv:2103.13130v1
fatcat:ods6hyryyrde5mzdj4x5whtybu
Empirical evaluation of shared parallel execution on independently scheduled clusters
2005
CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005.
Parallel machines are typically space shared, or time shared such that only one application executes on a group of nodes at any given time. ...
It is generally assumed that executing multiple parallel applications simultaneously on a group of independently scheduled nodes is not efficient because of synchronization requirements. ...
Since EP is a compute bound program with no communication, it is expected that it will show the same slowdown whether gang scheduling or independent scheduling (or any other fair way of sharing the CPU ...
doi:10.1109/ccgrid.2005.1558570
dblp:conf/ccgrid/GhaneshKS05
fatcat:peeub6zcrje77l7x5xokftm36e
Enabling Large Neural Networks on Tiny Microcontrollers with Swapping
[article]
2021
arXiv
pre-print
Out-of-core NNs on MCUs raise multiple concerns: execution slowdown, storage wear out, energy consumption, and data security. ...
Running neural networks (NNs) on microcontroller units (MCUs) is becoming increasingly important, but is very difficult due to the tiny SRAM size of MCU. ...
A common pattern in an NN is that one or more compute-bound layers are followed by one or more IO-bound layers, i.e. a pipeline with interleaved compute-bound and IO-bound stages. ...
arXiv:2101.08744v3
fatcat:xaudg3rs7zc6pesq6apa6jhizi
Improved parallel polynomial division and its extensions
1992
Proceedings., 33rd Annual Symposium on Foundations of Computer Science
The paper demonstrates some new techniques of supereffective slowdown of parallel algebraic computations, which we combine with a technique of stream contraction. ...
We also show how to extend our techniques to parallel implementation of other recursive processes, such as the evaluation modulo x^N of the m-th root, p(x)^(1/m), of p(x) (for any fixed natural m), for which ...
This improvement relies on the new techniques that, in particular, decrease by q times the processor bound for a large class of parallel algebraic computations in the result of their slowdown by s = o( ...
doi:10.1109/sfcs.1992.267811
dblp:conf/focs/BiniP92
fatcat:4kbn6w4annetvl5eixyjk5hyxe
Scalable and precise dynamic datarace detection for structured parallelism
2012
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation - PLDI '12
Our experimental results indicate an average (geometric mean) slowdown of 2.78× on a 16-core SMP system. ...
Our algorithm requires constant space per memory location, works in parallel, and is efficient in practice. We implemented and evaluated our algorithm on a set of 15 benchmarks. ...
We would also like to thank John Mellor-Crummey from Rice University for his feedback and suggestions on this work. This work was supported in part by the U.S. ...
doi:10.1145/2254064.2254127
dblp:conf/pldi/RamanZSVY12
fatcat:n5mvqmh4g5frrbvhexvsoqf6lu
The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory
[article]
2018
arXiv
pre-print
Our results give improved upper and lower bounds on the "price of asynchrony" when executing the fundamental SGD algorithm in a concurrent setting. ...
Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from ...
of the form √ τ_max n; 4) prove lower bounds on the slowdown due to asynchrony. ...
arXiv:1803.08841v2
fatcat:kc2rbde7qnfzroup25ucmvjwta