48,384 Hits in 7.1 sec

Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization

Alain Ketterlin, Philippe Clauss
2012 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture  
Data dependence is central for: parallelization, locality optimization ...  ...  algorithms leaves final validation to the programmer. Framework > Core notions: Data-dependence [figure: call/loop/iteration tree with accesses p0-p3 and iterations i0, i1] (carries a generalized  ... 
doi:10.1109/micro.2012.47 dblp:conf/micro/KetterlinC12 fatcat:nlrwjoqt3vaonmvhuconz3wjwu

Discovery of Potential Parallelism in Sequential Programs

Zhen Li, Ali Jannesari, Felix Wolf
2013 2013 42nd International Conference on Parallel Processing  
The data-dependence profiler serves as the foundation of the parallelism discovery framework. Traditional dependence profiling approaches introduce a tremendous amount of time and memory overhead.  ...  The framework contains two main components: an efficient data-dependence profiler and a set of parallelism discovery algorithms based on a language-independent concept called Computational Unit.  ...  Parwiz also includes a few optimizations to lower the overhead of dynamic data-dependence profiling.  ... 
doi:10.1109/icpp.2013.119 dblp:conf/icpp/LiJW13 fatcat:6dc5s2ao4rhv7avxb4oai77hoi

Runtime automatic speculative parallelization

Ben Hertzberg, Kunle Olukotun
2011 International Symposium on Code Generation and Optimization (CGO 2011)  
By leveraging the idle cores in a CMP to analyze, optimize, and participate in the execution of a running sequential program, RASP enables a collection of simpler cores to achieve sequential performance  ...  In contrast to other systems for automatic speculative parallelization, RASP uses dynamic binary translation to optimize applications on-the-fly, without any need for recompilation or source code.  ...  This research was supported in part by a Stanford Graduate Fellowship and an Intel Fellowship.  ... 
doi:10.1109/cgo.2011.5764675 dblp:conf/cgo/HertzbergO11 fatcat:z4qldi2pbnfrfhojrcwoa2xvom

A data-oriented profiler to assist in data partitioning and distribution for heterogeneous memory in HPC

Antonio J. Peña, Pavan Balaji
2016 Parallel Computing  
Profiling is of great assistance in understanding and optimizing an application's behavior.  ...  In this paper we describe a profiling tool we have developed by extending the Valgrind framework and one of its tools: Callgrind.  ...  Background To develop our data-oriented profiling tool, we extend Valgrind, a generic instrumentation framework, and Callgrind, its call-graph profiler.  ... 
doi:10.1016/j.parco.2015.10.006 fatcat:2yjaubbp4jhqne3m2ty2cd25fi

Tightfit: adaptive parallelization with foresight

Omer Tripp, Noam Rinetzky
2013 Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2013  
Irregular applications often exhibit data-dependent parallelism: Different inputs, and sometimes also different execution phases, enable different levels of parallelism.  ...  The resulting prediction rule serves in deployment runs to foresee the available parallelism for a given workload and tune the parallelization system accordingly.  ...  parallelism, and (ii) quantitative and structural analysis of data dependencies as a means of estimating available parallelism while abstracting away deployment-specific details.  ... 
doi:10.1145/2491411.2491443 dblp:conf/sigsoft/TrippR13 fatcat:in3hpkqqnbatbcx2x5chn6gasa

Towards a holistic approach to auto-parallelization

Georgios Tournavitis, Zheng Wang, Björn Franke, Michael F.P. O'Boyle
2009 Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation - PLDI '09  
scope for adaptation to different target architectures.  ...  Using profile-driven parallelism detection we overcome the limitations of static analysis, enabling us to identify more application parallelism and only rely on the user for final approval.  ...  A two-staged parallelization approach combining profiling-driven parallelism detection and machine-learning based mapping to generate OpenMP-annotated parallel programs. In addition, data scoping for shared  ... 
doi:10.1145/1542476.1542496 dblp:conf/pldi/TournavitisWFO09 fatcat:eowtsa2u3bdclepy2iiaxhbrfu

Towards a holistic approach to auto-parallelization

Georgios Tournavitis, Zheng Wang, Björn Franke, Michael F.P. O'Boyle
2009 SIGPLAN Notices  
scope for adaptation to different target architectures.  ...  Using profile-driven parallelism detection we overcome the limitations of static analysis, enabling us to identify more application parallelism and only rely on the user for final approval.  ...  A two-staged parallelization approach combining profiling-driven parallelism detection and machine-learning based mapping to generate OpenMP-annotated parallel programs. In addition, data scoping for shared  ... 
doi:10.1145/1543135.1542496 fatcat:fekruo3zf5dsxenzofj6vxz4ym

Optimizing the hybrid parallelization of BHAC [article]

Salvatore Cielo, Oliver Porth, Luigi Iapichino, Anupam Karmakar, Hector Olivares, Chun Xia
2021 arXiv   pre-print
We assess scaling and communication patterns in order to identify and alleviate MPI bottlenecks, with both runtime switches and precise code interventions.  ...  Our performance characterization and threading analysis provided guidance in improving the concurrency and thus the efficiency of the OpenMP parallel regions.  ...  At this point we performed a few single node scaling tests and tool-assisted profiling runs to determine the optimal node configuration for the (mainly compiler-assisted vectorization and MPI/OpenMP ratio  ... 
arXiv:2108.12240v1 fatcat:klcw7extx5d3nmucjau6umws6y

JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization [article]

Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña
2021 arXiv   pre-print
We add a versatile code-translation method for multi-device utilization by which manually-optimized applications can be distributed automatically while keeping original code structure and parallelism.  ...  This paper introduces JACC, an OpenACC runtime framework which enables the dynamic extension of OpenACC programs by serving as a transparent layer between the program and the compiler.  ...  Acknowledgement The EPEEC project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 801051.  ... 
arXiv:2110.14340v1 fatcat:acfa6g7xm5dyfajen7fqkn4yri

Advanced environments for parallel and distributed applications: a view of current status

Pasqua D'Ambra, Marco Danelutto, Daniela di Serafino, Marco Lapegna
2002 Parallel Computing  
In this paper we provide a view of the design and development activity concerning advanced environments for parallel and distributed computing.  ...  Both classes are widely discussed, in light of the key concepts previously outlined, and several examples are provided, in order to give a picture of the current status and trends. (P.  ...  , and other tools such as debuggers and profilers.  ... 
doi:10.1016/s0167-8191(02)00199-0 fatcat:zxwhjjonz5eejl24igm2j5pmre

HPCTOOLKIT: tools for performance analysis of optimized parallel programs

L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, N. R. Tallent
2009 Concurrency and Computation  
Recently, new capabilities were added to HPCTOOLKIT for collecting call path profiles for fully-optimized codes without any compiler support, pinpointing and quantifying bottlenecks in multithreaded programs  ...  HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully-optimized parallel programs with a measurement overhead of only a few percent.  ...  ACKNOWLEDGEMENTS HPCTOOLKIT include project alumni Nathan Froyd and Robert Fowler. Cristian Coarfa was involved in the development of scalability analysis using call path profiles.  ... 
doi:10.1002/cpe.1553 fatcat:nl2k7bwlbnanvn4cu76b23hkwq

Energy-aware parallelization flow and toolset for C code

Mihai T. Lazarescu, Albert Cohen, Adrien Guatto, Nhat Minh Lê, Luciano Lavagno, Antoniu Pop, Manuel Prieto, Andrei Terechko, Alexandru Sutii
2014 Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems - SCOPES '14  
General Terms: execution profiling, data dependency analysis, program parallelization, energy estimation  ...  This trend accentuated the need to convert existing sequential code to effectively exploit the resources of these architectures.  ...  The parallelization tool (ParTools) performs execution profiling and collects data dependencies program-wide at run-time.  ... 
doi:10.1145/2609248.2609264 dblp:conf/scopes/LazarescuCGLLPPTS14 fatcat:s4r2rkgq25hj5gwmuh6y3mcmpq

A Review of Parallelization Tools and Introduction to Easypar

Sudhakar Sah, Vinay G. Vaidya
2012 International Journal of Computer Applications  
The classification is based on different eras of tool development, the role played by these tools in various parallelization stages, and features provided by parallel program assistance tools.  ...  Multicore processors have paved the way to increase the performance of any application by virtue of the benefits of parallelization.  ...  Parallel code developed until this point may not be optimal, and there is a lot of scope for improvement.  ... 
doi:10.5120/8944-3108 fatcat:mxaohvalvrecrmxlplzyzq7x2i

Hidp: A hierarchical data parallel language

Yongpeng Zhang, F. Mueller
2013 Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)  
This paper contributes HiDP, a hierarchical data parallel language.  ...  The purpose of HiDP is to improve the coding productivity of integrating hierarchical data parallelism without significant loss of performance.  ...  The * indicates that the parallelism degree is data dependent.  ... 
doi:10.1109/cgo.2013.6494994 dblp:conf/cgo/ZhangM13 fatcat:fparo6cmjzdypdmotsdn7swop4

Integrating profile-driven parallelism detection and machine-learning-based mapping

Zheng Wang, Georgios Tournavitis, Björn Franke, Michael F. P. O'Boyle
2014 ACM Transactions on Architecture and Code Optimization (TACO)  
demonstrating the potential of profile-guided and machine-learning based parallelization for complex multi-core platforms.  ...  Using profile-driven parallelism detection we overcome the limitations of static analysis, enabling the identification of more application parallelism and relying on the user only for final approval.  ...  The time spent on profiling and analysis depends on the program to be parallelized and the input data set used for profiling.  ... 
doi:10.1145/2579561 fatcat:x5b7hvxjgrgjnmyk3pozdrtzye