Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization
2012
2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Data dependence is central for parallelization, locality optimization, and other algorithms; the approach leaves final validation to the programmer. ...
[Figure: "Framework > Core notions: Data-dependence" — a nesting tree of call, loop, iteration, and access nodes (p0, i0, p1, p3, i1, p2) showing the level that carries a generalized dependence.] ...
doi:10.1109/micro.2012.47
dblp:conf/micro/KetterlinC12
fatcat:nlrwjoqt3vaonmvhuconz3wjwu
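A minimal C illustration of the property a data-dependence profiler such as the one in the Ketterlin and Clauss entry above reports: the first loop carries a read-after-write dependence across iterations, while the second has none and is a parallelization candidate. The loops are hypothetical stand-ins for profiled code, not examples from the paper.

    #include <stdio.h>

    #define N 1000

    int main(void) {
        static double a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* Loop-carried RAW dependence: iteration i reads a[i-1], which
           iteration i-1 wrote, so the iterations cannot safely run in
           parallel as written. */
        for (int i = 1; i < N; i++)
            a[i] = a[i - 1] + b[i];

        /* No cross-iteration dependence: each iteration touches only its
           own elements, so a profiler would flag this loop as parallelizable. */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("%f %f\n", a[N - 1], c[N - 1]);
        return 0;
    }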
Discovery of Potential Parallelism in Sequential Programs
2013
2013 42nd International Conference on Parallel Processing
The data-dependence profiler serves as the foundation of the parallelism discovery framework. Traditional dependence profiling approaches introduce a tremendous amount of time and memory overhead. ...
The framework contains two main components: an efficient data-dependence profiler and a set of parallelism discovery algorithms based on a language-independent concept called Computational Unit. ...
Parwiz also includes a few optimizations to lower the overhead of dynamic data-dependence profiling. ...
doi:10.1109/icpp.2013.119
dblp:conf/icpp/LiJW13
fatcat:6dc5s2ao4rhv7avxb4oai77hoi
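A toy sketch of the shadow-memory idea that dynamic data-dependence profilers (including the one in the entry above) build on: each write records its source line per address, and each read checks whether a different line wrote that address earlier, producing a read-after-write dependence pair. The hashing scheme, single-threaded setting, and lack of loop context are simplifying assumptions, not the paper's design.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy shadow memory: remember, per (hashed) address, the source line
       of the last write; report a RAW dependence when a later read comes
       from a different line.  Hash collisions are ignored for brevity. */
    #define SHADOW_SLOTS 4096

    static int last_writer[SHADOW_SLOTS];      /* 0 = no write seen yet */

    static unsigned slot(const void *addr) {
        return (unsigned)(((uintptr_t)addr >> 3) % SHADOW_SLOTS);
    }

    static void record_write(const void *addr, int src_line) {
        last_writer[slot(addr)] = src_line;
    }

    static void record_read(const void *addr, int src_line) {
        int w = last_writer[slot(addr)];
        if (w != 0 && w != src_line)
            printf("RAW dependence: line %d reads a value written at line %d\n",
                   src_line, w);
    }

    int main(void) {
        double x, y;

        x = 1.0;                        record_write(&x, __LINE__);
        record_read(&x, __LINE__);      y = x + 2.0;
        record_write(&y, __LINE__);

        printf("y = %f\n", y);          /* the RAW pair was reported above */
        return 0;
    }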
Runtime automatic speculative parallelization
2011
International Symposium on Code Generation and Optimization (CGO 2011)
By leveraging the idle cores in a CMP to analyze, optimize, and participate in the execution of a running sequential program, RASP enables a collection of simpler cores to achieve sequential performance ...
In contrast to other systems for automatic speculative parallelization, RASP uses dynamic binary translation to optimize applications on-the-fly, without any need for recompilation or source code. ...
This research was supported in part by a Stanford Graduate Fellowship and an Intel Fellowship. ...
doi:10.1109/cgo.2011.5764675
dblp:conf/cgo/HertzbergO11
fatcat:z4qldi2pbnfrfhojrcwoa2xvom
A data-oriented profiler to assist in data partitioning and distribution for heterogeneous memory in HPC
2016
Parallel Computing
Profiling is of great assistance in understanding and optimizing an application's behavior. ...
In this paper we describe a profiling tool we have developed by extending the Valgrind framework and one of its tools: Callgrind. ...
Background To develop our data-oriented profiling tool, we extend Valgrind, a generic instrumentation framework, and Callgrind, its call-graph profiler. ...
doi:10.1016/j.parco.2015.10.006
fatcat:2yjaubbp4jhqne3m2ty2cd25fi
Tightfit: adaptive parallelization with foresight
2013
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2013
Irregular applications often exhibit data-dependent parallelism: Different inputs, and sometimes also different execution phases, enable different levels of parallelism. ...
The resulting prediction rule serves in deployment runs to foresee the available parallelism for a given workload and tune the parallelization system accordingly. ...
parallelism, and (ii) quantitative and structural analysis of data dependencies as a means of estimating available parallelism while abstracting away deployment-specific details. ...
doi:10.1145/2491411.2491443
dblp:conf/sigsoft/TrippR13
fatcat:in3hpkqqnbatbcx2x5chn6gasa
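A hedged sketch of the "foresight" step described in the Tightfit entry above: an input-dependent feature is fed through a prediction rule to pick a parallelism setting before the parallel section runs. The feature, thresholds, and rule here are hypothetical placeholders, not the learned rule from the paper.

    #include <omp.h>
    #include <stdio.h>

    /* Hypothetical prediction rule: map an input-dependent feature (here,
       the number of independent work items) to a thread count. */
    static int predict_threads(long independent_items) {
        int max = omp_get_max_threads();
        if (independent_items < 2)   return 1;                   /* stay sequential */
        if (independent_items < max) return (int)independent_items;
        return max;
    }

    static void run_adaptive(long independent_items) {
        omp_set_num_threads(predict_threads(independent_items));
        #pragma omp parallel
        {
            /* data-dependent parallel work would go here */
            #pragma omp single
            printf("workload %ld -> %d threads\n",
                   independent_items, omp_get_num_threads());
        }
    }

    int main(void) {
        run_adaptive(3);       /* small input: little parallelism foreseen */
        run_adaptive(10000);   /* large input: use the whole machine */
        return 0;
    }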
Towards a holistic approach to auto-parallelization
2009
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation - PLDI '09
scope for adaptation to different target architectures. ...
Using profile-driven parallelism detection we overcome the limitations of static analysis, enabling us to identify more application parallelism and only rely on the user for final approval. ...
A two-staged parallelization approach combining profiling-driven parallelism detection and machine-learning based mapping to generate OpenMP annotated parallel programs. In addition, data scoping for shared ...
doi:10.1145/1542476.1542496
dblp:conf/pldi/TournavitisWFO09
fatcat:eowtsa2u3bdclepy2iiaxhbrfu
Towards a holistic approach to auto-parallelization
2009
SIGPLAN notices
scope for adaptation to different target architectures. ...
Using profile-driven parallelism detection we overcome the limitations of static analysis, enabling us to identify more application parallelism and only rely on the user for final approval. ...
A two-staged parallelization approach combining profiling-driven parallelism detection and machine-learning based mapping to generate OpenMP annotated parallel programs. In addition, data scoping for shared ...
doi:10.1145/1543135.1542496
fatcat:fekruo3zf5dsxenzofj6vxz4ym
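The two entries above (conference and journal versions of the same work) generate OpenMP-annotated code with explicit data scoping. A hand-written sketch of the kind of annotation such a tool might emit for a simple reduction loop; the loop and variable names are illustrative, not output of the paper's tool.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        for (int i = 0; i < N; i++) a[i] = 0.5 * i;

        double sum = 0.0;
        /* Profile-driven detection would classify a as shared, the loop
           index as private, and sum as a reduction variable. */
        #pragma omp parallel for default(none) shared(a) reduction(+ : sum)
        for (int i = 0; i < N; i++)
            sum += a[i] * a[i];

        printf("sum = %f\n", sum);
        return 0;
    }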
Optimizing the hybrid parallelization of BHAC
[article]
2021
arXiv pre-print
We assess scaling and communication patterns in order to identify and alleviate MPI bottlenecks, with both runtime switches and precise code interventions. ...
Our performance characterization and threading analysis provided guidance in improving the concurrency and thus the efficiency of the OpenMP parallel regions. ...
At this point we performed a few single node scaling tests and tool-assisted profiling runs to determine the optimal node configuration for the (mainly compiler-assisted vectorization and MPI/OpenMP ratio ...
arXiv:2108.12240v1
fatcat:klcw7extx5d3nmucjau6umws6y
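A minimal hybrid MPI+OpenMP pattern of the kind whose rank/thread balance is tuned in the BHAC entry above; the computation is a placeholder, and MPI_THREAD_FUNNELED is a common choice for this structure, not necessarily BHAC's configuration.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank, nranks;
        /* Funneled: only the main thread of each rank makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double local = 0.0;
        #pragma omp parallel for reduction(+ : local)
        for (int i = 0; i < 1000000; i++)
            local += 1.0 / (1.0 + i);

        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d ranks x %d threads, result = %f\n",
                   nranks, omp_get_max_threads(), global);

        MPI_Finalize();
        return 0;
    }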
JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization
[article]
2021
arXiv pre-print
We add a versatile code-translation method for multi-device utilization by which manually-optimized applications can be distributed automatically while keeping original code structure and parallelism. ...
This paper introduces JACC, an OpenACC runtime framework which enables the dynamic extension of OpenACC programs by serving as a transparent layer between the program and the compiler. ...
Acknowledgement The EPEEC project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 801051. ...
arXiv:2110.14340v1
fatcat:acfa6g7xm5dyfajen7fqkn4yri
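For context, a generic directive-based OpenACC kernel of the kind a runtime layer such as JACC sits beneath; this is plain OpenACC, not JACC's generated or translated code.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        const float a = 3.0f;
        /* A single OpenACC parallel loop; a multi-GPU runtime could split
           this iteration space across devices. */
        #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];

        printf("y[0] = %f\n", y[0]);
        return 0;
    }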
Advanced environments for parallel and distributed applications: a view of current status
2002
Parallel Computing
In this paper we provide a view of the design and development activity concerning advanced environments for parallel and distributed computing. ...
Both classes are widely discussed, in light of the key concepts previously outlined, and several examples are provided, in order to give a picture of the current status and trends. ...
, and other tools such as debuggers and profilers. ...
doi:10.1016/s0167-8191(02)00199-0
fatcat:zxwhjjonz5eejl24igm2j5pmre
HPCTOOLKIT: tools for performance analysis of optimized parallel programs
2009
Concurrency and Computation
Recently, new capabilities were added to HPCTOOLKIT for collecting call path profiles for fully-optimized codes without any compiler support, pinpointing and quantifying bottlenecks in multithreaded programs ...
HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully-optimized parallel programs with a measurement overhead of only a few percent. ...
ACKNOWLEDGEMENTS Contributors to HPCTOOLKIT include project alumni Nathan Froyd and Robert Fowler. Cristian Coarfa was involved in the development of scalability analysis using call path profiles. ...
doi:10.1002/cpe.1553
fatcat:nl2k7bwlbnanvn4cu76b23hkwq
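A toy sketch of the general mechanism behind sampling-based call-path profilers such as HPCTOOLKIT: a profiling timer periodically interrupts the program and the handler captures the current call stack. Real tools use a robust asynchronous unwinder; glibc's backtrace() and the plain sample counter below are only illustrative and are not how HPCTOOLKIT is implemented.

    #include <execinfo.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/time.h>

    #define MAX_FRAMES 64

    static volatile sig_atomic_t samples = 0;

    /* SIGPROF handler: unwind the current call path.  A real profiler
       would attribute the sample to this path; here we only count it.
       backtrace() is not strictly async-signal-safe, so this is a sketch. */
    static void on_prof(int sig) {
        void *frames[MAX_FRAMES];
        (void)sig;
        (void)backtrace(frames, MAX_FRAMES);
        samples = samples + 1;
    }

    static double work(long iters) {
        double s = 0.0;
        for (long i = 0; i < iters; i++) s += (double)i * 1e-9;
        return s;
    }

    int main(void) {
        struct itimerval tv = { {0, 10000}, {0, 10000} };   /* 10 ms period */

        signal(SIGPROF, on_prof);
        setitimer(ITIMER_PROF, &tv, NULL);

        double r = work(200000000L);

        printf("result = %f, call-path samples taken = %ld\n", r, (long)samples);
        return 0;
    }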
Energy-aware parallelization flow and toolset for C code
2014
Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems - SCOPES '14
General Terms: execution profiling, data dependency analysis, program parallelization, energy estimation ...
This trend accentuated the need to convert existing sequential code to effectively exploit the resources of these architectures. ...
The parallelization tool (ParTools) performs execution profiling and collects data dependencies program-wide at run-time. ...
doi:10.1145/2609248.2609264
dblp:conf/scopes/LazarescuCGLLPPTS14
fatcat:s4r2rkgq25hj5gwmuh6y3mcmpq
A Review of Parallelization Tools and Introduction to Easypar
2012
International Journal of Computer Applications
The classification is based on different eras of tool development, the role played by these tools in various parallelization stages, and features provided by parallel program assistance tools. ...
Multicore processors have paved the way to increase the performance of any application by the virtue of benefits of parallelization. ...
Parallel code developed until this point may not be optimal and there exists lot of scope for improvement. ...
doi:10.5120/8944-3108
fatcat:mxaohvalvrecrmxlplzyzq7x2i
Hidp: A hierarchical data parallel language
2013
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
This paper contributes HiDP, a hierarchical data parallel language. ...
The purpose of HiDP is to improve the coding productivity of integrating hierarchical data parallelism without significant loss of performance. ...
The * indicates that the parallelism degree is data dependent. ...
doi:10.1109/cgo.2013.6494994
dblp:conf/cgo/ZhangM13
fatcat:fparo6cmjzdypdmotsdn7swop4
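An illustration, in plain C with OpenMP rather than HiDP, of a parallelism degree that is data dependent: the outer loop is data parallel over rows, while the work available inside each row is set by the input (a sparse matrix-vector product over a CSR layout). The CSR naming conventions here are the usual ones, not taken from the paper.

    #include <stdio.h>

    /* Sparse matrix-vector product y = A*x with A in CSR form.  The outer
       loop is data parallel over rows; how much work (and how much nested
       parallelism) each row offers depends on the data. */
    static void spmv_csr(int nrows, const int *rowptr, const int *colidx,
                         const double *vals, const double *x, double *y) {
        #pragma omp parallel for schedule(dynamic, 64)
        for (int r = 0; r < nrows; r++) {
            double sum = 0.0;
            for (int j = rowptr[r]; j < rowptr[r + 1]; j++)
                sum += vals[j] * x[colidx[j]];
            y[r] = sum;
        }
    }

    int main(void) {
        /* 3x3 example: rows have 1, 2, and 1 nonzeros respectively. */
        int    rowptr[] = {0, 1, 3, 4};
        int    colidx[] = {0, 0, 2, 1};
        double vals[]   = {2.0, 1.0, 4.0, 3.0};
        double x[]      = {1.0, 1.0, 1.0};
        double y[3];

        spmv_csr(3, rowptr, colidx, vals, x, y);
        printf("y = [%g, %g, %g]\n", y[0], y[1], y[2]);
        return 0;
    }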
Integrating profile-driven parallelism detection and machine-learning-based mapping
2014
ACM Transactions on Architecture and Code Optimization (TACO)
, demonstrating the potential of profile-guided and machine-learning based parallelization for complex multi-core platforms. ...
Using profile-driven parallelism detection we overcome the limitations of static analysis, enabling the identification of more application parallelism and only rely on the user for final approval. ...
The time spent on profiling and analysis depends on the program to be parallelized and the input data set used for profiling. ...
doi:10.1145/2579561
fatcat:x5b7hvxjgrgjnmyk3pozdrtzye