Performance Framework for HPC Applications on Homogeneous Computing Platform

Chandrashekhar B. N (Department of ISE, Advance computing research, Nitte Meenakshi Institute of Technology, Bangalore-560064, India), Sanjay H. A
2019 International Journal of Image Graphics and Signal Processing  
The key challenge on the homogeneous platform is allocation of workload among CPU and GPU cores.  ...  In this work we have considered a homogeneous cluster in which each node consists of a CPU and graphical processing unit (GPU) of the same capability.  ...  In the figure, the X-axis shows the size of the dynamic random numbers and the Y-axis shows the execution time in seconds.  ... 
doi:10.5815/ijigsp.2019.08.03 fatcat:co7cs3bdm5fqfk43uyxt2px2ue

Interference Management for Distributed Parallel Applications in Consolidated Clusters

Jaeung Han, Seungheun Jeon, Young-ri Choi, Jaehyuk Huh
2016 ACM SIGOPS Operating Systems Review  
In distributed applications, a local interference in a node can affect the whole execution of the application spanning many nodes.  ...  Consolidating multiple applications on a system can improve the resource utilization of data centers.  ...  In function GetTotalExecTime, we compute the total (normalized) execution time of the workloads as the sum of the execution time of each workload.  ... 
doi:10.1145/2954680.2872388 fatcat:pskahmynnvdmzkkktrog7dfwga

The benefit of SMT in the multi-core era

Stijn Eyerman, Lieven Eeckhout
2014 Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14  
The number of active threads in a multi-core processor varies over time and is often much smaller than the number of supported hardware threads.  ...  The overall conclusion is that the benefit of SMT in the multi-core era is to provide flexibility with respect to the available thread-level parallelism.  ...  Acknowledgments We thank the anonymous reviewers for their valuable and constructive feedback. Stijn Eyerman is a postdoctoral fellow of the Research Foundation-Flanders.  ... 
doi:10.1145/2541940.2541954 dblp:conf/asplos/EyermanE14 fatcat:rnfzngcdang4xk5iaong2mz44y

Dynamic thread assignment on heterogeneous multiprocessor architectures

Michela Becchi, Patrick Crowley
2006 Proceedings of the 3rd conference on Computing frontiers - CF '06  
Not only do the behaviors of distinct threads differ, but each thread may also present diversity in its performance and resource usage over time.  ...  Prior work has shown that heterogeneous CMPs can meet the needs of a multi-programmed computing environment better than a homogeneous CMP system.  ...  We compare their performance with the one provided by a random and a pseudo optimal static assignment policy and by two homogeneous configurations.  ... 
doi:10.1145/1128022.1128029 dblp:conf/cf/BecchiC06 fatcat:fgm3hjrsgvaf5hqzu4fhuakham

Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance

Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, Keith I. Farkas
2004 SIGARCH Computer Architecture News  
It does so by matching the various jobs of a diverse workload to the various cores.  ...  This type of architecture covers a spectrum of workloads particularly well, providing high single-thread performance when thread parallelism is low, and high throughput when thread parallelism is high.  ...  Acknowledgments The authors would like to thank the reviewers for helpful feedback, Jeff Brown for significant help with the simulation tools, and Brad Calder for helpful discussions.  ... 
doi:10.1145/1028176.1006707 fatcat:rzncce5rfrfevanucuc5uwkulu

Optimal periodic remapping of dynamic bulk synchronous computations

Ngo-Tai Fong, Cheng-Zhong Xu, Le Yi Wang
2003 Journal of Parallel and Distributed Computing  
Since general optimization techniques tend to reveal stationary properties of the workload process, they are not readily applicable to the analysis of the effect of periodic remapping.  ...  Instead, this paper develops new analytical approaches to precisely characterize the transient statistical behaviors of the workload process on both homogeneous and heterogeneous machines.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.  ... 
doi:10.1016/j.jpdc.2003.07.004 fatcat:egy3qpscwvajdjooyump7cis74

Starchart: hardware and software optimization using recursive partitioning regression trees

Kenzo Van Craeynest, Shoaib Akram, Wim Heirman, Aamer Jaleel, Lieven Eeckhout
2013 Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques  
Equal-time scheduling runs each thread or workload on each core type for an equal fraction of the time, whereas equal-progress scheduling strives at getting equal amounts of work done on each core type.  ...  Single-ISA heterogeneous multi-cores consisting of small (e.g., in-order) and big (e.g., out-of-order) cores dramatically improve energy- and power-efficiency by scheduling workloads on the most appropriate  ...  Kenzo Van Craeynest was supported through a doctoral fellowship by the Agency for Innovation by Science and Technology (IWT). Additional support is provided by the FWO project G.0179.10  ... 
doi:10.1109/pact.2013.6618815 dblp:conf/IEEEpact/CraeynestAHJE13 fatcat:p6ixpmsr4rdxph7uajkdm6iori

Scheduling Algorithms for Asymmetric Multi-core Processors [article]

Alan David
2017 arXiv pre-print
Growing power dissipation due to the high performance requirements of processors suggests multicore processor technology, which has become the technology for the present and next decade.  ...  However, asymmetric multi-core architecture poses a new challenge to the operating system scheduler, which traditionally assumes homogeneous hardware.  ...  Two parts of the instruction's execution time have been considered separately: execution time (amount of time it takes to execute the instruction) and stall time (number of cycles due to last level cache  ... 
arXiv:1702.04028v1 fatcat:uwvezkeg5bazzaruyglkcilqyi

A Probabilistic Machine Learning Approach to Scheduling Parallel Loops with Bayesian Optimization

Khu-rai Kim, Youngjae Kim, Sungyong Park
2020 IEEE Transactions on Parallel and Distributed Systems  
Within the considered workloads, BO FSS improves the execution time of FSS by as much as 22% and 5% on average.  ...  The tuning procedure only requires online execution time measurement of the target loop.  ...  Myoung Suk Kim for his insightful comments about our statistical analysis and Rover Root for his helpful comments about the scientific workloads considered in this work.  ... 
doi:10.1109/tpds.2020.3046461 fatcat:fxc4d32wrba2pgoank5nep6s5a

Graph partitioning for parallel applications in heterogeneous Grid environments

S. Kumar, S. K. Das, R. Biswas
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
Our partitioning algorithm, called MiniMax, generates and maps partitions onto a heterogeneous system with the objective of minimizing the maximum execution time of the parallel distributed application  ...  The problem of partitioning irregular graphs and meshes for parallel computations on homogeneous systems has been extensively studied.  ...  The choice of vertex and edge weights for system and workload graphs determines the execution time of individual processors.  ... 
doi:10.1109/ipdps.2002.1015564 dblp:conf/ipps/KumarDB02 fatcat:qw67iuy3hrb5tpces7oxtqjshy

Load balancing strategy for multicore systems

E. Chovancova, J. Mihal'ov
2015 2015 13th International Conference on Emerging eLearning Technologies and Applications (ICETA)  
We have also compared two important algorithms proposed in the literature in terms of faster execution time and power efficiency.  ...  Software is written for multicore platforms to distribute the workload amongst multiple identical or different cores. This functionality is called thread-level parallelism.  ...  ACKNOWLEDGMENT The work is supported by iFuture: A Leading Research Group in Department of Computer Science, Abdul Wali Khan University, Mardan. The authors are thankful to Miss.  ... 
doi:10.1109/iceta.2015.7558473 fatcat:ffoxzd7j6jc4loehhunoppfhwe

Heterogeneous Environment Aware Streaming Graph Partitioning

Ning Xu, Bin Cui, Lei Chen, Zi Huang, Yingxia Shao
2015 IEEE Transactions on Knowledge and Data Engineering  
Based on this model, we propose a new graph partitioning objective function that aims to minimize the total execution time of the graph-processing job.  ...  With the increasing availability of graph data and widely adopted cloud computing paradigm, graph partitioning has become an efficient pre-processing technique to balance the computing workload and cope  ...  Comparing the execution time in topology T1 to that in the original homogeneous environment (T0), the Combined, Min-Workload and Min-Increased heuristics can reduce the execution time by about 27%, which is about  ... 
doi:10.1109/tkde.2014.2377743 fatcat:ic5md2ju2rfydmbj5vvtptsumm

Speedup and Power Scaling Models for Heterogeneous Many-Core Systems

Ashur Rafiev, Mohammed A. N. Al-Hayanni, Fei Xia, Rishad Shafik, Alexander Romanovsky, Alex Yakovlev
2018 IEEE Transactions on Multi-Scale Computing Systems  
the parallelizability of applications.  ...  The practical use of the method is demonstrated with a quantitative study of system load balancing efficiency.  ...  By definition, the total execution time for the workload I is a sum of sequential and parallel execution times, t_s(n) and t_p(n), and it represents the time interval between the first instruction in  ... 
doi:10.1109/tmscs.2018.2791531 fatcat:oo6yexn4nbfr7fosytk7mdygae

Optimal Task Reallocation in Heterogeneous Distributed Computing Systems with Age-Dependent Delay Statistics

Jorge E. Pezoa, Majeed M. Hayat, Zhuoyao Wang, Sagar Dhakal
2010 2010 39th International Conference on Parallel Processing  
of the statistics of the execution time of a workload.  ...  The model is utilized to devise task reallocation policies that optimize three metrics: the average execution time of a workload, the quality-of-service in executing a workload by a prescribed deadline  ...  The average execution time of a workload, denoted as $\bar{T}(S_0)$, is defined as the expected value of the random workload execution time $T(S_0)$. Mathematically, we have $\bar{T}(S_0) = E[T(S_0)]$.  ... 
doi:10.1109/icpp.2010.20 dblp:conf/icpp/PezoaHWD10 fatcat:dufsxoyhyfacbhwtz4xg6s7yma

Parallelization of Array Method with Hybrid Programming: OpenMP and MPI

Apolinar Velarde Martínez
2022 Applied Sciences  
Synthetic and real workloads have been experimented with to evaluate the performance of the new parallel schedule and compare it to the sequential schedule.  ...  By using the parallel approach with hybrid programming applied to the extraction of characteristics of the PTGs, applied to the search for geographically distributed resources with Lévy random walks and  ...  Conflicts of Interest: The author declares no conflict of interest.  ... 
doi:10.3390/app12157706 fatcat:jzqnahir4vfkvfod7hrrtaesym