10,051 Hits in 11.2 sec

Active learning accelerated automatic heuristic construction for parallel program mapping

William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather
2014 Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14  
We demonstrate this technique by automatically creating a model to determine on which device to execute four parallel programs at differing problem dimensions for a representative CPU-GPU based system.  ...  In this work, we present a low-cost predictive modelling approach for automatic heuristic construction which significantly reduces this training overhead.  ...  Portable mapping of data parallel programs to OpenCL for heterogeneous systems. In CGO '13. [2] S. Kulkarni and J. Cavazos.  ... 
doi:10.1145/2628071.2628128 dblp:conf/IEEEpact/OgilviePWL14 fatcat:3siirje2ojclhmroh7yweyajya

Fast Automatic Heuristic Construction Using Active Learning [chapter]

William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather
2015 Lecture Notes in Computer Science  
We demonstrate this technique by automatically constructing a model to determine on which device to execute four parallel programs at differing problem dimensions for a representative CPU-GPU based heterogeneous  ...  In this work, we present a low-cost predictive modelling approach for automatic heuristic construction which significantly reduces this training overhead.  ...  We have presented a novel, low-cost predictive modelling approach for machine learning based automatic heuristic construction.  ... 
doi:10.1007/978-3-319-17473-0_10 fatcat:gwcklt44kvcgfhgpirsh76lvne

Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review

Suejb Memeti, Sabri Pllana, Alécio Binotto, Joanna Kołodziej, Ivona Brandic
2018 Computing  
and parallel programming models.  ...  The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of software optimization for parallel computing  ...  parameters; With regards to static scheduling, the attention of recent research that use machine learning and meta-heuristics is in the following optimization objectives: mapping program parallelism to  ... 
doi:10.1007/s00607-018-0614-9 fatcat:da2rfxqlcjen5frzfxreimtngm

VLIW Code Generation for a Convolutional Network Accelerator

Maurice Peemen, Wisnu Pramadi, Bart Mesman, Henk Corporaal
2015 Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems - SCOPES '15  
Earlier works have focused on energy efficient accelerators for this class of algorithms, but none of them provides a complete and practical programming model.  ...  This paper presents a compiler flow to map Deep Convolutional Networks (ConvNets) to a highly specialized VLIW accelerator core targeting the low-power embedded market.  ...  For simple programs manual construction of schedules is feasible.  ... 
doi:10.1145/2764967.2771928 dblp:conf/scopes/PeemenPMC15 fatcat:xbrlsv3bbfd3te4c7rrc3a7haq

Integrating profile-driven parallelism detection and machine-learning-based mapping

Zheng Wang, Georgios Tournavitis, Björn Franke, Michael F. P. O'Boyle
2014 ACM Transactions on Architecture and Code Optimization (TACO)  
We then replace the traditional target-specific and inflexible mapping heuristics with a machine-learning based prediction mechanism, resulting in better mapping decisions while automating adaptation to  ...  , demonstrating the potential of profile-guided and machine-learning based parallelization for complex multi-core platforms.  ...  However, the mapping scheme varies not only from program to program, but also from architecture to architecture. Therefore, we need an automatic and portable solution for parallelism mapping.  ... 
doi:10.1145/2579561 fatcat:x5b7hvxjgrgjnmyk3pozdrtzye

Parallel Programming Models for Heterogeneous Many-Cores : A Survey [article]

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 arXiv pre-print
In this article, we provide a comprehensive survey for parallel programming models for heterogeneous many-core architectures and review the compiling techniques of improving programmability and portability  ...  While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to  ...  As the many-core design becomes increasingly diverse, we believe that the machine-learning techniques provide a rigorous, automatic way for constructing optimization heuristics, which is more scalable  ... 
arXiv:2005.04094v1 fatcat:e2psrdnyajh3hih3znnjjbezae

Parallel programming models for heterogeneous many-cores: a comprehensive survey

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 CCF Transactions on High Performance Computing  
In this article, we provide a comprehensive survey for parallel programming models for heterogeneous many-core architectures and review the compiling techniques of improving programmability and portability  ...  While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to  ...  A key enabling technology for optimizing parallel programs is machine learning.  ... 
doi:10.1007/s42514-020-00039-4 fatcat:nn56xhjm6rcu7kya6gfnyjg66q

Algorithmic Skeletons and Parallel Design Patterns in Mainstream Parallel Programming

Marco Danelutto, Gabriele Mencagli, Massimo Torquati, Horacio González–Vélez, Peter Kilpatrick
2020 International Journal of Parallel Programming  
Finally, we give our personal overview, as researchers active for more than two decades in the parallel programming models and frameworks area, of the process that led to the adoption of these concepts in  ...  This paper discusses the impact of structured parallel programming methodologies in state-of-the-art industrial and research parallel programming frameworks.  ...  by Univ. of Pisa project "DECLWARE: Metodologie dichiarative per la progettazione e il deployment di applicazioni" [Declarative methodologies for the design and deployment of applications] (PRA_2018_66) and EU COST Action IC1406 High Performance Modelling and Simulation for  ... 
doi:10.1007/s10766-020-00684-w fatcat:vtqcyf4he5gu3eefbjsb7nrxne

Minimizing the cost of iterative compilation with active learning

William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather
2017 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)  
Balaprakash, of Argonne National Laboratory, for his kind help in providing us the initial data for our research.  ...  Active Learning for Systems Optimization Active learning has recently emerged as a viable means for constructing heuristics for systems optimization. Zuluaga et al.  ...  Accurate heuristics for deciding the best way to optimize a program are hard to construct.  ... 
doi:10.1109/cgo.2017.7863744 fatcat:uwbdczsdevgiplgbwmeaxpdifu

0 Instruction Set Architecture [chapter]

2003 Digital Design and Computer Organization  
For instance, instruction scheduling heuristics may have decided to map several operations of the same kind onto different issue-slots, while these could have been mapped into the same issue-slot.  ...  From this experiment we can learn that the area required for actually constructing the proposed processor architecture is slightly higher, while the energy required for running our algorithm on this architecture  ...  Found initial prototype mapped on: core_3b; Loading information from APEX file ...; Initial prototype has 3 issue-slots; Searching for best 'ed' fitness solution ...; Using 'issue-  ... 
doi:10.1201/b12403-15 fatcat:mygaz2meibgljew5tzvmuw6x5i

Contents

2016 Procedia Computer Science  
in Structured Parallel Programming M.  ...  Constructing Real-time Web Interfaces of Scientific Workflows D.  ... 
doi:10.1016/s1877-0509(16)31051-1 fatcat:pewr5t3hq5fqjike3f6wr2k6pu

Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads [article]

Siyu Wang, Yi Rong, Shiqing Fan, Zhen Zheng, LanSong Diao, Guoping Long, Jun Yang, Xiaoyong Liu, Wei Lin
2020 arXiv pre-print
In this paper, we propose Auto-MAP, a framework for exploring distributed execution plans for DNN workloads, which can automatically discover fast parallelization strategies through reinforcement learning  ...  Efficient exploration remains a major challenge for reinforcement learning.  ...  To harness computing power to achieve better throughput, a critical challenge is how to map diversified workloads to hardware accelerators automatically and efficiently.  ... 
arXiv:2007.04069v1 fatcat:asjj6wtwgnb4dcp5sis2oaogn4

HAO: Hardware-aware neural Architecture Optimization for Efficient Inference [article]

Zhen Dong, Yizhao Gao, Qijing Huang, John Wawrzynek, Hayden K.H. So, Kurt Keutzer
2021 arXiv pre-print
Given a set of hardware resource constraints, our integer programming formulation directly outputs the optimal accelerator configuration for mapping a DNN subgraph that minimizes latency.  ...  Automatic algorithm-hardware co-design for DNN has shown great success in improving the performance of DNNs on FPGAs.  ...  TABLE II: Notations for hardware design. H: feature map height; P_I: parallelism on input channel; W: feature map width; P_O: parallelism on output channel  ... 
arXiv:2104.12766v1 fatcat:wvpt6sil4zhf5dknqhv5zj76lu

A review on the self and dual interactions between machine learning and optimisation

Heda Song, Isaac Triguero, Ender Özcan
2019 Progress in Artificial Intelligence  
The techniques in the former area aim to learn knowledge from data or experience, while the techniques from the latter search for the best option or solution to a given problem.  ...  To employ these techniques automatically and effectively aligning with the real aim of artificial intelligence, both sets of techniques are frequently hybridised, interacting with each other and themselves  ...  algorithm for training a model, selection strategy for active learning, etc.  ... 
doi:10.1007/s13748-019-00185-z fatcat:zr5dsschzzddzgfvwu4sl2jeou

Progressive Codesign of an Architecture and Compiler Using a Proxy Application

Arpith Jacob, Ravi Nair, Tong Chen, Zehra Sura, Changhoan Kim, Carlo Bertolli, Samuel Antao, Kevin O'Brien
2015 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)  
Such co-design is commonly done using hand-tuned codes for simple kernels that typically do not capture the nuances of real-world applications or reveal the complexities of programming a heterogeneous system  ...  The Active Memory Cube (AMC) is a novel near-memory processor that exploits high memory bandwidth and low latency close to DRAM to execute scientific applications in an energy-efficient manner.  ...  for an accelerator, mapping of parallelism, control of resource usage, and code-size.  ... 
doi:10.1109/sbac-pad.2015.18 dblp:conf/sbac-pad/JacobNCSKBAO15 fatcat:2tsunsbztzbdhjuxn5zobup3wi
Showing results 1–15 out of 10,051 results