Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems
2009
Journal of Supercomputing
as a Cartesian grid (or a template in HPF terms), and to provide data locality for parallelizing compilers so that data access communication costs can be minimized. ...
Most data alignment methods are mainly devised to align the referenced arrays using linear subscripts or quadratic subscripts with n loop index variables, and the methods are well developed. ...
In this phase, the data accesses in a program are inspected and formulated as a system of equations in which the unknowns can be utilized to compute the virtual processors for the computations and data ...
doi:10.1007/s11227-009-0280-y
fatcat:capol3pfenhlvmdemrh7jzzmsa
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers
[chapter]
2000
Lecture Notes in Computer Science
In this paper, we propose new data partition and alignment techniques for partitioning and aligning data arrays with a program in a way of minimizing communication over processors. ...
We use skewed alignment instead of the dimension-ordered alignment techniques to align data arrays. ...
For a program running on a distributed memory multicomputer, it is not easy to distribute and manage partitioned data and computations over processors when affine transformation methods addressed in [ ...
doi:10.1007/3-540-39999-2_10
fatcat:n2u3sxu5indftjmdpeihugsdue
Comparative Analysis of Automatic Parallelization Techniques
2017
Journal of computer science and systems biology
This paper aims to provide an understanding of how the process of parallelization is similar to local sequence alignment and how the Smith-Waterman algorithm can be used in the field of parallel computing. ...
There are various ways to perform this analysis, the most popular one being analyzing the data in an array to detect the privatisable ones. ...
doi:10.4172/jcsb.1000262
fatcat:q3k5f7x5rnbmrg2uqv5774qbyy
Compiling and Optimizing Java 8 Programs for GPU Execution
2015
2015 International Conference on Parallel Architecture and Compilation (PACT)
cache for array accesses to increase memory efficiency in GPUs, and 3) eliminate redundant data transfer between the host and the GPU. ...
GPUs can enable significant performance improvements for certain classes of data parallel applications and are widely used in recent computer systems. ...
We thank Marcel Mitran for his encouragement and support in pursuing the parallel streams API and lambda approach, and thank Jimmy Kwa for his extensive contribution to the implementation. ...
doi:10.1109/pact.2015.46
dblp:conf/IEEEpact/IshizakiHKS15
fatcat:c6bwzxy7vbbg5mbo6ohajgmepa
An SIMD Code Generation Technology for Indirect Array
2016
Journal of clean energy technologies
Due to disjoint memory references and non-aligned memory references, existing SIMD compilers can't vectorize loops containing indirect array utilizing SIMD (single instruction multiple data) instructions ...
For an irregular indirect array access, we adopt two separate registers to store the array base and the index address. ...
accesses to array X are dictated by the value computed by array idx. ...
doi:10.7763/ijcte.2016.v8.1047
fatcat:wnew4zwltnherj757n7pdcpici
Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols
2008
2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
In this paper, we propose dynamic file partitioning methods that adapt according to the underlying locking protocols in the parallel file systems and evaluate the performance of four partitioning methods ...
to the degree of parallelism of a given file domain partitioning method. ...
Figure 11 shows the data partitioning pattern on a 3D array and the mapping of a 4D sub-array to the global array in file. ...
doi:10.1109/sc.2008.5222722
dblp:conf/sc/LiaoC08
fatcat:bhwmgtzfkbg6vekccllh7iw5ua
FPGA-based protein sequence alignment : A review
2017
EPJ Web of Conferences
Therefore, investigation in processing element (PE) configuration where involves more on memory access in load or access the data (substitution matrix, query sequence character) and the PE configuration ...
During hardware implementation, there will be performance challenges such as frequent memory access and high data dependency in the computation process. ...
There are several challenges on filling in an alignment matrix which consists of frequent memory access and high data dependency during PE configuration. ...
doi:10.1051/epjconf/201716201075
fatcat:grc7ik2wmjfjtfjjqq3gxokfoa
Combining structural and procedural programming by parallelizing compilation
1995
Proceedings of the 1995 ACM symposium on Applied computing - SAC '95
Data is mapped in a regular form onto the Xputer memory space to be accessible by the Xputers data sequencer hardware which provides a generic set of fast address sequences. ...
To counteract this deficiency an automatic parallelization and compilation method for Xputers has been developed for the input language C. ...
} in a regular fashion, and how the data fields of differently mapped data variables can be aligned to a combined data field in order to use only one scan pattern. ...
doi:10.1145/315891.315937
dblp:conf/sac/HartensteinS95
fatcat:fcrcwh4frbd7tdrtmt4njdvlym
A flexible sparse matrix data format and parallel algorithms for the assembly of sparse matrices in general finite element applications using atomic synchronisation primitives
[article]
2021
arXiv
pre-print
In this paper we focus on the assembly process of the global stiffness matrix and explore different algorithms and their efficiency on shared memory systems using C++. ...
A key aspect of our investigation is the use of atomic synchronization primitives for the derivation of data-race free algorithms and data structures. ...
This concept refers to the tendency of a computer program to achieve faster access to objects whose addresses are near one another, either in space (spatial locality) or in time (temporal locality). ...
arXiv:2012.00585v2
fatcat:5ygxdlwitfcjtd5rzwydeyp6ge
Data-parallel support for numerical irregular problems
1999
Parallel Computing
This paper discusses the effective parallelization of numerical irregular codes, focusing on the definition and use of data-parallel extensions to express the parallelism that they exhibit. ...
Two kinds of irregularity can be distinguished in these applications. First, irregular control structures, derived from the use of conditional statements on data only known at runtime. ...
Touriño, at the Department of Electronics and Computation, University of La Coruña, Spain, for the development of some of the methods and experiments described in this paper. ...
doi:10.1016/s0167-8191(99)00090-3
fatcat:ymeizehfevcfvkrivsnce2hfs4
Hardware-Acceleration of Short-Read Alignment Based on the Burrows-Wheeler Transform
2016
IEEE Transactions on Parallel and Distributed Systems
The proposed accelerator can align a few hundred million short DNA fragments in an hour by using 80 processing elements in parallel. ...
We apply a data encoding scheme that reduces the data size by 96 percent, and propose a pipelined hardware decoder to decode the data. ...
ACKNOWLEDGMENTS This work is partially supported by MEXT KAKENHI Grant Numbers 15K15958 and 24300013. ...
doi:10.1109/tpds.2015.2444376
fatcat:gvqh6mx3pbffzojz5ys7lj6t2m
Efficient Parallel Algorithms for 3D Laplacian Smoothing on the GPU
2019
Applied Sciences
This paper presents a GPU-accelerated parallel algorithm for Laplacian smoothing in three dimensions by considering the influence of different data layouts and iteration forms. ...
In numerical modeling, mesh quality is one of the decisive factors that strongly affects the accuracy of calculations and the convergence of iterations. ...
Acknowledgments: The authors would like to thank the editor and the reviewers for their contributions.
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/app9245437
fatcat:v2v3f6zqdbfhhktprpt5ymfwza
Efficient SIMD code generation for irregular kernels
2012
SIGPLAN notices
We extract both inter- and intra-iteration parallelism, taking data reorganization overhead into consideration. ...
However, addressing those challenges is inevitable, since many important compute-intensive applications extensively use array indirection to reduce memory and computation requirements. ...
For computations on disjoint data, they still fail to extract inter-iteration parallelism. ...
doi:10.1145/2370036.2145824
fatcat:xgx5rhqc3ra2vmxfmukmu2a4se
A Customized Many-Core Hardware Acceleration Platform for Short Read Mapping Problems Using Distributed Memory Interface with 3D–Stacked Architecture
2016
Journal of Signal Processing Systems
Assembling of those short reads poses a challenge on the mapping of reads to a reference genome in terms of both sensitivity and execution time. ...
In this paper, we propose a customized many-core hardware acceleration platform for short read mapping problems based on hash-index method. ...
Acknowledgements We are very grateful to Professor Lars Arvestad for providing many valuable suggestions. ...
doi:10.1007/s11265-016-1204-8
fatcat:fmfr633cdjc3tmfkoksgwmk6za
HPF-2 Support for Dynamic Sparse Computations
[chapter]
1999
Lecture Notes in Computer Science
Dynamic data structures for sparse matrix storage are analyzed, making it possible to deal efficiently with fill-in and pivoting issues. ...
Any of the data representations considered enforces the handling of indirections for data accesses, pointer referencing and dynamic data creation. ...
Acknowledgements We gratefully thank Iain Duff and all members in the parallel algorithm team at CERFACS, Toulouse (France), for their kind help and collaboration. ...
doi:10.1007/3-540-48319-5_15
fatcat:un5jypbdjrbodhfck2rdclwuhe