A language for hierarchical data parallel design-space exploration on GPUs

BO JOEL SVENSSON, RYAN R. NEWTON, MARY SHEERAN
2016 Journal of functional programming  
Thus, we implement not Nested Data Parallelism, but a more limited form that we call Hierarchical Data Parallelism.  ...  Obsidian is an embedded language (in Haskell) for implementing high performance kernels to be run on GPUs.  ...  We thank Henning Thielemann, Josef Svenningsson and Trevor McDonell for a lot of great feedback on the Obsidian implementation.  ... 
doi:10.1017/s0956796816000046 fatcat:b24e4ormoveuffszib5mkggozy

Multidimensional Dataflow Graph Modeling and Mapping for Efficient GPU Implementation

Lai-Huei Wang, Chung-Ching Shen, Gunasekaran Seetharaman, Kannappan Palaniappan, Shuvra S. Bhattacharyya
2012 2012 IEEE Workshop on Signal Processing Systems  
Experimental results from this study show that our approach can be used to derive fast GPU implementations, and enhance trade-off analysis during design space exploration.  ...  We demonstrate our methods with a case study of image histogram implementation on a graphics processing unit (GPU).  ...  Our proposed design methods apply dataflow transformations to exploit data parallelism hierarchically for multidimensional dataflow graphs.  ... 
doi:10.1109/sips.2012.10 dblp:conf/sips/WangSSPB12 fatcat:nkctnqxwmvd3rap4zq3ckf44ze

Author index

2013 20th Annual International Conference on High Performance Computing  
Pavanakumar Revisiting the space-filling curves for storage, reordering and partitioning mesh based data in scientific computing Murphy, Brian Solving Tridiagonal Systems on a GPU Murty, Anurag Efficient  ...  Parallel Betweenness Centrality and Its Application to Social Analytics Jain, Ashutosh GAGM: Genome Assembly on GPU using Mate pairs Jalby, William MIL: A language to build program analysis tools  ... 
doi:10.1109/hipc.2013.6799145 fatcat:jnmign7535ep5jvhn65xp4aqqe

Designing a unified programming model for heterogeneous machines

Michael Garland, Manjunath Kudlur, Yili Zheng
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
It provides constructs for bulk parallelism, synchronization, and data placement which operate across the entire machine.  ...  We describe the design of the Phalanx programming model, which seeks to provide a unified programming model for heterogeneous machines.  ...  We use active messages for moving data when the source or the destination is on a GPU.  ... 
doi:10.1109/sc.2012.48 dblp:conf/sc/GarlandKZ12 fatcat:ua2yayf6svf5nbk7nlmlks267i

Hidp: A hierarchical data parallel language

Yongpeng Zhang, F. Mueller
2013 Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)  
This paper contributes HiDP, a hierarchical data parallel language.  ...  HiDP is a source-to-source compiler that converts a very concise data parallel language into CUDA C++ source code.  ...  This paper presents HiDP, a hierarchical data-parallel language designed for efficient execution on today's SIMT architectures.  ... 
doi:10.1109/cgo.2013.6494994 dblp:conf/cgo/ZhangM13 fatcat:fparo6cmjzdypdmotsdn7swop4

Parallel hybrid evolutionary algorithms on GPU

The Van Luong, Nouredine Melab, El-Ghazali Talbi
2010 IEEE Congress on Evolutionary Computation  
This paper presents a new methodology to design and implement efficiently and effectively hybrid evolutionary algorithms on GPU accelerators.  ...  The methodology enables efficient mappings of the explored search space onto the GPU memory hierarchy.  ...  Section III provides an in-depth look at the three-level decomposition. First, generic concepts for designing parallel hybrid EAs on GPU are presented (high-level).  ... 
doi:10.1109/cec.2010.5586403 dblp:conf/cec/LuongMT10 fatcat:d5f3jph2kza4nc2xsqfqvw2pny

Large neighborhood local search optimization on graphics processing units

The Van Luong, Nouredine Melab, El-Ghazali Talbi
2010 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)  
Keywords: GPU-based metaheuristics; parallel local search algorithms on GPU;  ...  However, designing applications on GPU is still complex and many issues have to be faced. We provide a methodology to design and implement large neighborhood LS algorithms on GPU.  ...  In this paper, a focus has been particularly made on the design of three different neighborhoods to the hierarchical GPU for binary problems.  ... 
doi:10.1109/ipdpsw.2010.5470889 dblp:conf/ipps/LuongMT10 fatcat:lcgl3luqw5aytbol52tirjllcu

A GPU-based iterated tabu search for solving the quadratic 3-dimensional assignment problem

The Van Luong, Lakhdar Loukil, Nouredine Melab, El-Ghazali Talbi
2010 ACS/IEEE International Conference on Computer Systems and Applications - AICCSA 2010  
These methods handle a single solution, iteratively improved by exploring its neighborhood in the solution space. In this paper, we propose an iterated tabu search for solving the Q3AP.  ...  The design of this algorithm is essentially based on a new large neighborhood structure.  ...  In the present work, we propose a hierarchical parallel iterated tabu search algorithm on GPU for solving Q3AP problems.  ... 
doi:10.1109/aiccsa.2010.5587019 dblp:conf/aiccsa/LuongLMT10 fatcat:qy6jioewz5eateaeg3g3xy4x2u

Methodologies for Synthesizing and Analyzing Dynamic Dataflow Programs in Heterogeneous Systems for Edge Computing

Aurelien Bloch, Simone Casale-Brunet, Marco Mattavelli
2021 IEEE Open Journal of Circuits and Systems  
To complete the methodology of seamless porting of dataflow software and partition on CPU or GPU computing nodes, an automated methodology for exploring the configuration space and to identify high performance  ...  The steps do include the optimization of the communication between heterogeneous processing elements, a technique for the efficient mapping and parallelization of computation on independent GPU partitions  ...  Based on this methodology, a design space exploration framework called TURNUS has been developed [44], [45] .  ... 
doi:10.1109/ojcas.2021.3116342 fatcat:tcj74ra5z5bo7oox4w56fo5xuu

Multicore software technologies

Hahn Kim, Robert Bond
2009 IEEE Signal Processing Magazine  
on data parallelism. UPC: industry standard; focuses mostly on data parallelism; based on a familiar language (C). VSIPL++: industry standard; focuses mostly on data parallelism; designed for real-time  ...  languages The vast majority of programming languages are designed for serial computation. There are a number of initiatives to add explicit support for parallelism to popular languages.  ... 
doi:10.1109/msp.2009.934141 fatcat:wtla5y56mneqri2f6ip2b6keg4

In-memory grid files on graphics processors

Ke Yang, Bingsheng He, Rui Fang, Mian Lu, Naga Govindaraju, Qiong Luo, Pedro Sander, Jiaoying Shi
2007 Proceedings of the 3rd international workshop on Data management on new hardware - DaMoN '07  
Considering the hardware characteristics of GPUs, we design a massively multi-threaded GPU-based grid file for static, memory-resident multidimensional point data.  ...  Recently, graphics processing units, or GPUs, have become a viable alternative as commodity, parallel hardware for general-purpose computing, due to their massive data-parallelism, high memory bandwidth  ...  Lidan Shou of Zhejiang University for his lectures on multidimensional access methods.  ... 
doi:10.1145/1363189.1363196 dblp:conf/damon/YangHFLGLSS07 fatcat:jcsnlh5j4beundape2dez34fta

Mapping a data-flow programming model onto heterogeneous platforms

Alina Sbîrlea, Yi Zou, Zoran Budimlić, Jason Cong, Vivek Sarkar
2012 Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems - LCTES '12  
We have designed a software flow for converting high-level CnC programs to the Habanero-C language.  ...  In this paper we explore mapping of a high-level macro data-flow programming model called Concurrent Collections (CnC) onto heterogeneous platforms in order to achieve high performance and low energy consumption  ...  Additional thanks to the Habanero team for their comments and feedback on this work.  ... 
doi:10.1145/2248418.2248428 dblp:conf/lctrts/SbirleaZBCS12 fatcat:pt3s2jlcibehho65hstsw65ahm

Dataflow modeling and design for cognitive radio networks

Lai-Huei Wang, Shuvra S. Bhattacharyya, Aida Vosoughi, Joseph R. Cavallaro, Markku Juntti, Jani Boutellier, Olli Silven, Mikko Valkama
2013 8th International Conference on Cognitive Radio Oriented Wireless Networks  
a GPU architecture.  ...  As RF frequency agility and reconfiguration for carrier aggregation are important goals for 4G LTE Advanced systems, we also focus on dataflow analysis for digital pre-distortion algorithms.  ...  In order to understand the interactions and design space exploration among RF, baseband algorithms, and hardware architectures, we need to prototype our proposed cognitive radio solutions on a testbed.  ... 
doi:10.1109/crowncom.2013.6636817 dblp:conf/crowncom/WangBVCJBSV13 fatcat:sop5pmoe7ra23n5tue3uhtqp6y

Daisen: A Framework for Visualizing Detailed GPU Execution [article]

Yifan Sun, Yixuan Zhang, Ali Mosallaei, Michael D. Shah, Cody Dunne, David Kaeli
2021 arXiv   pre-print
We contribute data and task abstraction for GPU performance analysis.  ...  Based on our task analysis, we propose Daisen, a framework that supports data collection from GPU simulators and provides visualization of the simulator-generated GPU execution traces.  ...  Acknowledgments We thank our reviewers for their constructive feedback. We also thank AMD for supporting this work.  ... 
arXiv:2104.00828v1 fatcat:bm2lyjogy5gvbb7fra4lx242im

Efficient Explicit-State Model Checking on General Purpose Graphics Processors [chapter]

Stefan Edelkamp, Damian Sulewski
2010 Lecture Notes in Computer Science  
We accelerate state space exploration for explicit-state model checking by executing complex operations on the graphics processing unit (GPU).  ...  In contrast to existing approaches enhancing model checking through performing parallel matrix operations on the GPU, we parallelize the breadth-first layered construction of the state space graph.  ...  We assume a hierarchical memory structure of SRAM (small, but fast parallel access) and VRAM (large, but slow parallel access) located on the GPU, together with RAM located on the motherboard.  ... 
doi:10.1007/978-3-642-16164-3_8 fatcat:tzuziwamjnghhizmbegedfm5wi