Optimizing bandwidth limited problems using one-sided communication and overlap
2006
Proceedings 20th IEEE International Parallel & Distributed Processing Symposium
In this paper we show that the one-sided communication model used in these languages also has a significant performance advantage for bandwidth-limited applications. ...
Our optimizations rely on aggressively overlapping communication with computation but spreading communication events throughout the course of the local computation. ...
Optimizing Bandwidth-Limited Applications: In this section we consider a problem that is often hailed as the canonical example of a problem limited by bisection bandwidth, the 3D FFT. ...
doi:10.1109/ipdps.2006.1639320
dblp:conf/ipps/BellBNY06
fatcat:33xs5aiegvgezkxiw77wn2ob6u
Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap
2009
2009 IEEE International Symposium on Parallel & Distributed Processing
We demonstrate that the PGAS model, using a new port of the Berkeley UPC compiler and GASNet one-sided communication layer, outperforms two-sided (MPI) communication in both microbenchmarks and a case ...
/P communication layer for supporting one-sided communication and PGAS languages. ...
Acknowledgements: We would like to thank Michael Blocksome, Douglas Miller, Sameer Kumar and the entire IBM DCMF team for their support in helping us port GASNet to BG/P. ...
doi:10.1109/ipdps.2009.5161076
dblp:conf/ipps/NishtalaHBY09
fatcat:wqopvxul3jeezk3ppoub7kdztm
Optimizing communication overlap for high-speed networks
2007
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07
We believe that algorithm design and optimization techniques that hide latency by taking advantage of communication overlap will facilitate obtaining good parallel efficiency and performance on the highly ...
We believe that due to the levels of concurrencies proposed for Petascale systems, efficient use of non-blocking communication including overlapping will be one of the keys for achieving good performance ...
Despite the lower latency and higher bandwidth on Elan networks, when using non-blocking communication, scalability is affected by the small TLB size and limited memory footprint. ...
doi:10.1145/1229428.1229436
dblp:conf/ppopp/IancuS07
fatcat:44xivstxi5dbzl3ydluhn2msce
Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
2016
The international journal of high performance computing applications
We improve data locality, combine it with an efficient sparse matrix vector kernel, and investigate the potential of overlapping computation with communication as well as the possibility of concurrent ...
A comprehensive performance evaluation is conducted using a suitable performance model. ...
Larger problems provide more parallelism, which brings the achieved bandwidth closer to the maximum bandwidth the roofline performance model is based on (see Section 6). ...
doi:10.1177/1094342016646844
fatcat:oyteuyn6qfdw7cf5ya25uag5jq
Speeding up NGB with distributed file streaming framework
2006
Proceedings 20th IEEE International Parallel & Distributed Processing Symposium
By studying I/O patterns of NGB codes we have identified program locations where it is possible to overlap computation and data workflow phases. ...
In addition to the challenges it provides, it also offers new opportunities for optimization. ...
Acknowledgements: The authors would like to thank Eric Huang and Wenguang Chen for their comments during the early stages of this study. ...
doi:10.1109/ipdps.2006.1639655
dblp:conf/ipps/LiCHRK06
fatcat:kgdl735csjdppmgknutpykeww4
Parallel Sparse Matrix-Vector Multiplication as a Test Case for Hybrid MPI+OpenMP Programming
2011
2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. ...
Keller and T. Schönemeyer for valuable discussions, A. Basermann for providing the RCM transformation, and K. Stüben and H. J. Plum for providing and supporting the AMG test case. ...
doi:10.1109/ipdps.2011.332
dblp:conf/ipps/SchubertHFW11
fatcat:64mqcflvdbf7blzuwamb3hbkgi
Finite Duration Root Nyquist Pulses with Maximum In-Band Fractional Energy
2010
IEEE Communications Letters
We design root Nyquist pulses having maximum inband fractional energy for a given finite bandwidth. The problem of maximizing the ratio of in-band energy to total energy has been dealt with earlier. ...
But an exact solution could not be found since it involved optimization of a quadratic objective function with quadratic constraints. ...
We have limitations of channel bandwidth, so we need to concentrate most of its energy in a finite bandwidth to use the maximum spectral resources. ...
doi:10.1109/lcomm.2010.09.100314
fatcat:h4qh2c5fqvcxhdtgsn5j5r2mce
Wideband Printed Monopole Design Using a Genetic Algorithm
2007
IEEE Antennas and Wireless Propagation Letters
The parasitic elements optimize the effective feedgap between the radiator on one side and the groundplane on the other side. ...
ANTENNA GEOMETRY: The microstrip-fed GA plate monopole is printed on one side of FR4 substrate of 1.52 mm thickness and metalization ...
doi:10.1109/lawp.2007.891962
fatcat:iqkjktpuyjcwbcdbvyalptzvoe
A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI
2011
Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems - PMBS '11
rate, aggregate bandwidth, and computation and communication overlap capability. ...
The study also reveals important information about how to optimize one-sided Gemini communication. ...
We also measured the messaging rate using get instead of put for CAF. In the bandwidth limit, as one might expect, the get and put performance is identical. ...
doi:10.1145/2088457.2088467
fatcat:4fxwk2i35ngwdcnwbotdpsmv3i
Productivity and performance using partitioned global address space languages
2007
Proceedings of the 2007 international workshop on Parallel symbolic computation - PASCO '07
Both compilers use a source-to-source strategy that translates the parallel languages to C with calls to a communication layer called GASNet. ...
The result is portable high-performance compilers that run on a large variety of shared and distributed memory multiprocessors. ...
HAND-OPTIMIZED BENCHMARKS: The performance benefits of one-sided communication are not limited to microbenchmarks. ...
doi:10.1145/1278177.1278183
dblp:conf/issac/YelickBCCDDGHHHIKNSWW07
fatcat:hpedjb24vvfkbpi7fbawt6xf4u
A Simulation Framework to Automatically Analyze the Communication-Computation Overlap in Scientific Applications
2010
2010 IEEE International Conference on Cluster Computing
Valgrind instruments the legacy MPI application and generates the execution traces, then Dimemas uses the obtained traces and reconstructs the application's time-behavior on a configurable parallel platform ...
of simulated time behaviors, that further allows useful comparisons of the non-overlapped and the overlapped executions. ...
One solution to optimize network usage is to overlap communication delays with useful computation of the application. ...
doi:10.1109/cluster.2010.33
dblp:conf/cluster/SuboticSLV10
fatcat:egsuajz4mzfy3awxwju657xqwe
Analyzing communication models for distributed thread-collaborative processors in terms of energy and time
2015
2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
In this work, we analyze data movement optimizations for distributed heterogeneous systems based on CPUs and GPUs. ...
Insights include that (1) specialized models offer substantial advantages for a variety of workloads, (2) thread-collaborative models only seem to be limited by reduced overlap possibilities, and (3) a ...
ACKNOWLEDGMENT: We gratefully acknowledge the generous support of this research effort by Nvidia, Xilinx Inc, and the EXTOLL Corporation. ...
doi:10.1109/ispass.2015.7095817
dblp:conf/ispass/KlenkOF15
fatcat:nztvqp3njfb3bidilhlnnbxxay
Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
[article]
2018
arXiv
pre-print
and communication resources. ...
We found that timely training requires high performance parameter servers (PSs) with optimized network stacks and gradient processing pipelines, as well as server and network hardware with balanced computation ...
a communication-bound workload. ...
arXiv:1805.07891v1
fatcat:jrur6u3vjfgrxpfi6lialuhoru
POSTER
2016
Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security - CCS'16
Although several tree-based ORAMs such as PathORAM [8] and RingORAM [6] have achieved near-optimal bandwidth for single-client scenarios, their low overall throughput due to high latency of access, as ...
with privacy (position map) and designing everything else using append-only data structures that can be then merged securely in a separate eviction step. ...
RingORAM [6] further optimizes PathORAM [8] for practical deployment by reducing the bandwidth complexity constants. Problem Definition. ...
doi:10.1145/2976749.2989062
dblp:conf/ccs/ChakrabortiS16
fatcat:2iwjhh2vbzczhnfifgdbjwkkpm
Performance portable optimizations for loops containing communication operations
2008
Proceedings of the 22nd annual international conference on Supercomputing - ICS '08
Effective use of communication networks is critical to the performance and scalability of parallel applications. ...
Studies of well-tuned programs have suggested that PGAS languages are effective at utilizing modern networks because their one-sided communication is a good match to the underlying network hardware. ...
We are aware of only one other compiler effort to exploit overlap for loop nests using one sided communication. This is work performed by Paek and presented in his PhD Thesis. ...
doi:10.1145/1375527.1375567
dblp:conf/ics/IancuCY08
fatcat:ufzgdktrj5civkicf27kfuenoy