
Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems

Minyi Guo, Weng-Long Chang, Bo Jiang, Shu-Chien Huang, Sien-Tang Tsai, Michael Ho
2009 Journal of Supercomputing  
as a Cartesian grid (or a template in HPF terms), and to provide data locality for parallelizing compilers so that data access communication costs can be minimized.  ...  Most data alignment methods are mainly devised to align the referenced arrays using linear subscripts or quadratic subscripts with n loop index variables, and the methods are well developed.  ...  In this phase, the data accesses in a program are inspected and formulated as a system of equations in which the unknowns can be utilized to compute the virtual processors for the computations and data  ... 
doi:10.1007/s11227-009-0280-y fatcat:capol3pfenhlvmdemrh7jzzmsa

Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers [chapter]

Tzung-Shi Chen, Chih-Yung Chang
2000 Lecture Notes in Computer Science  
In this paper, we propose new data partition and alignment techniques for partitioning and aligning data arrays with a program in a way of minimizing communication over processors.  ...  We use skewed alignment instead of the dimension-ordered alignment techniques to align data arrays.  ...  For a program running on a distributed memory multicomputer, it is not easy to distribute and manage partitioned data and computations over processors when affine transformation methods addressed in [  ... 
doi:10.1007/3-540-39999-2_10 fatcat:n2u3sxu5indftjmdpeihugsdue

Comparative Analysis of Automatic Parallelization Techniques

Muntha SR, Prasad A, Gogineni K, Nikhil L, Harshavardhan VL
2017 Journal of computer science and systems biology  
This paper aims to provide an understanding of how the process of parallelization is similar to local sequence alignment and how the Smith-Waterman algorithm can be used in the field of parallel computing.  ...  There are various ways to perform this analysis, the most popular one being analyzing the data in an array to detect the privatisable ones.  ... 
doi:10.4172/jcsb.1000262 fatcat:q3k5f7x5rnbmrg2uqv5774qbyy

Compiling and Optimizing Java 8 Programs for GPU Execution

Kazuaki Ishizaki, Akihiro Hayashi, Gita Koblents, Vivek Sarkar
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
cache for array accesses to increase memory efficiency in GPUs, and 3) eliminate redundant data transfer between the host and the GPU.  ...  GPUs can enable significant performance improvements for certain classes of data parallel applications and are widely used in recent computer systems.  ...  We thank Marcel Mitran for his encouragement and support in pursuing the parallel streams API and lambda approach, and thank Jimmy Kwa for his extensive contribution to the implementation.  ... 
doi:10.1109/pact.2015.46 dblp:conf/IEEEpact/IshizakiHKS15 fatcat:c6bwzxy7vbbg5mbo6ohajgmepa

An SIMD Code Generation Technology for Indirect Array

Pengyuan Li, Rongcai Zhao, Qinghua Zhang, Lin Han
2016 Journal of clean energy technologies  
Due to disjoint memory references and non-aligned memory references, existing SIMD compilers can't vectorize loops containing indirect arrays using SIMD (single instruction multiple data) instructions  ...  For an irregular indirect array access, we adopt two separate registers to store the array base and the index address.  ...  accesses to array X are dictated by the value computed by array idx.  ... 
doi:10.7763/ijcte.2016.v8.1047 fatcat:wnew4zwltnherj757n7pdcpici

Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols

Wei-keng Liao, Alok Choudhary
2008 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis  
In this paper, we propose dynamic file partitioning methods that adapt according to the underlying locking protocols in the parallel file systems and evaluate the performance of four partitioning methods  ...  to the degree of parallelism of a given file domain partitioning method.  ...  Figure 11 shows the data partitioning pattern on a 3D array and the mapping of a 4D sub-array to the global array in file.  ... 
doi:10.1109/sc.2008.5222722 dblp:conf/sc/LiaoC08 fatcat:bhwmgtzfkbg6vekccllh7iw5ua

FPGA-based protein sequence alignment : A review

Mohd. Nazrin Md. Isa, Ku Noor Dhaniah Ku Muhsen, Dayana Saiful Nurdin, Muhammad Imran Ahmad, Sohiful Anuar Zainol Murad, Shaiful Nizam Mohyar, Azizi Harun, Razaidi Hussin, Mohamad Halim Abd. Wahid
2017 EPJ Web of Conferences  
Therefore, investigation in processing element (PE) configuration where involves more on memory access in load or access the data (substitution matrix, query sequence character) and the PE configuration  ...  During hardware implementation, there will be performance challenges such as the frequent memory access and highly data dependent in computation process.  ...  There are several challenges on filling in an alignment matrix which consists of frequent memory access and high data dependency during PE configuration.  ... 
doi:10.1051/epjconf/201716201075 fatcat:grc7ik2wmjfjtfjjqq3gxokfoa

Combining structural and procedural programming by parallelizing compilation

Reiner W. Hartenstein, Karin Schmidt
1995 Proceedings of the 1995 ACM symposium on Applied computing - SAC '95  
Data is mapped in a regular form onto the Xputer memory space to be accessible by the Xputers data sequencer hardware which provides a generic set of fast address sequences.  ...  To counteract this deficiency an automatic parallelization and compilation method for Xputers has been developed for the input language C.  ...  } in a regular fashion, and how the data fields of differently mapped data variables can be aligned to a combined data field in order to use only one scan pattern.  ... 
doi:10.1145/315891.315937 dblp:conf/sac/HartensteinS95 fatcat:fcrcwh4frbd7tdrtmt4njdvlym

A flexible sparse matrix data format and parallel algorithms for the assembly of sparse matrices in general finite element applications using atomic synchronisation primitives [article]

Adam Sky, César Polindara, Ingo Muench, Carolin Birk
2021 arXiv   pre-print
In this paper we focus on the assembly process of the global stiffness matrix and explore different algorithms and their efficiency on shared memory systems using C++.  ...  A key aspect of our investigation is the use of atomic synchronization primitives for the derivation of data-race free algorithms and data structures.  ...  This concept refers to the tendency of a computer program to achieve faster access to objects whose addresses are near one another, either in space (spatial locality) or in time (temporal locality).  ... 
arXiv:2012.00585v2 fatcat:5ygxdlwitfcjtd5rzwydeyp6ge

Data-parallel support for numerical irregular problems

E.L. Zapata, O. Plata, R. Asenjo, G.P. Trabado
1999 Parallel Computing  
This paper discusses the effective parallelization of numerical irregular codes, focusing on the definition and use of data-parallel extensions to express the parallelism that they exhibit.  ...  Two kinds of irregularity can be distinguished in these applications. First, irregular control structures, derived from the use of conditional statements on data only known at runtime.  ...  Touriño, at the Department of Electronics and Computation, University of La Coruña, Spain, for the development of some of the methods and experiments described in this paper.  ... 
doi:10.1016/s0167-8191(99)00090-3 fatcat:ymeizehfevcfvkrivsnce2hfs4

Hardware-Acceleration of Short-Read Alignment Based on the Burrows-Wheeler Transform

Hasitha Muthumala Waidyasooriya, Masanori Hariyama
2016 IEEE Transactions on Parallel and Distributed Systems  
The proposed accelerator can align a few hundred million short DNA fragments in an hour by using 80 processing elements in parallel.  ...  We apply a data encoding scheme that reduces the data size by 96 percent, and propose a pipelined hardware decoder to decode the data.  ...  ACKNOWLEDGMENTS This work is partially supported by MEXT KAKENHI Grant Numbers 15K15958 and 24300013.  ... 
doi:10.1109/tpds.2015.2444376 fatcat:gvqh6mx3pbffzojz5ys7lj6t2m

Efficient Parallel Algorithms for 3D Laplacian Smoothing on the GPU

Lei Xiao, Guoxiang Yang, Kunyang Zhao, Gang Mei
2019 Applied Sciences  
This paper presents a GPU-accelerated parallel algorithm for Laplacian smoothing in three dimensions by considering the influence of different data layouts and iteration forms.  ...  In numerical modeling, mesh quality is one of the decisive factors that strongly affects the accuracy of calculations and the convergence of iterations.  ...  Acknowledgments: The authors would like to thank the editor and the reviewers for their contributions. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/app9245437 fatcat:v2v3f6zqdbfhhktprpt5ymfwza

Efficient SIMD code generation for irregular kernels

Seonggun Kim, Hwansoo Han
2012 SIGPLAN notices  
We extract both inter- and intra-iteration parallelism, taking data reorganization overhead into consideration.  ...  However, addressing those challenges is inevitable, since many important compute-intensive applications extensively use array indirection to reduce memory and computation requirements.  ...  For computations on disjoint data, they still fail to extract inter-iteration parallelism.  ... 
doi:10.1145/2370036.2145824 fatcat:xgx5rhqc3ra2vmxfmukmu2a4se

A Customized Many-Core Hardware Acceleration Platform for Short Read Mapping Problems Using Distributed Memory Interface with 3D–Stacked Architecture

Pei Liu, Ahmed Hemani, Kolin Paul, Christian Weis, Matthias Jung, Norbert Wehn
2016 Journal of Signal Processing Systems  
Assembling of those short reads poses a challenge on the mapping of reads to a reference genome in terms of both sensitivity and execution time.  ...  In this paper, we propose a customized many-core hardware acceleration platform for short read mapping problems based on hash-index method.  ...  Acknowledgements We are very grateful to Professor Lars Arvestad for providing many valuable suggestions.  ... 
doi:10.1007/s11265-016-1204-8 fatcat:fmfr633cdjc3tmfkoksgwmk6za

HPF-2 Support for Dynamic Sparse Computations [chapter]

R. Asenjo, O. Plata, E. L. Zapata, J. Touriño, R. Doallo
1999 Lecture Notes in Computer Science  
Dynamic data structures for sparse matrix storage are analyzed, permitting to efficiently deal with fill-in and pivoting issues.  ...  Any of the data representations considered enforces the handling of indirections for data accesses, pointer referencing and dynamic data creation.  ...  Acknowledgements We gratefully thank Iain Duff and all members in the parallel algorithm team at CERFACS, Toulouse (France), for their kindly help and collaboration.  ... 
doi:10.1007/3-540-48319-5_15 fatcat:un5jypbdjrbodhfck2rdclwuhe