SAMS

Chunyang Gou, Georgi K. Kuzmanov, Georgi N. Gaydadjiev
2008 Proceedings of the 2008 workshop on Memory access on future processors a solved problem? - MAW '08  
We propose a Single-Affiliation Multiple-Stride (SAMS) scheme to support both unit-stride and strided conflict-free vector memory accesses.  ...  In this paper, we analyze the problem of supporting conflict-free access for multiple stride families in parallel memory schemes targeted for SIMD processing systems.  ...  In conflict-free access, data could be accessed in parallel as there is no module conflict during the memory access.  ...
doi:10.1145/1366219.1366220 fatcat:zslpaf5vw5exhfrqeyqkjxm72m

Parallel Memory Implementation for Arbitrary Stride Accesses

Eero Aho, Jarno Vanne, Timo Hamalainen
2006 2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation  
Arbitrary stride access capability with interleaved schemes) that try to ensure conflict free parallel data memories is described in previous research where the skewing accesses to a set or maximal amount  ...  Parallel memory modules can be used to access special This paper presents a novel parallel memory data patterns and feed the processors with only algorithm implementation allowing conflict free arbitrary  ...  PROPOSED PARALLEL MEMORY ARCHITECTURE Any skewing scheme that allows a conflict free access for a A block diagram of the proposed parallel memory specific stride 25 also provides a conflict free access  ... 
doi:10.1109/icsamos.2006.300801 dblp:conf/samos/AhoVH06 fatcat:laa7zwskzjfubatu4p65e3cy3a

Efficient Parallel Memory Organization For Turbo Decoders

Perttu Salmela, Ruirui Gu, Shuvra S. Bhattacharyya, Jarmo Takala
2007 Zenodo  
In this paper, an important result for parallel memory accesses in turbo decoders is derived as it is shown that the memory can be split into two banks to maintain conflict free accesses.  ...  Parallel memory access of a turbo decoder is of high importance as indicated by two patent applications [4, 5]. In [6] and [4] a conflict free access scheme is developed.  ...
doi:10.5281/zenodo.40373 fatcat:towtenvd75ayfnva3hbyjptxya

Padding Free Bank Conflict Resolution for CUDA-Based Matrix Transpose Algorithm

Ayaz ul Hassan Khan, Mayez Al-Mouhamed, Allam Fatayer, Anas Almousa, Abdulrahman Baqais, Mohammed Assayony
2014 International Journal of Networked and Distributed Computing (IJNDC)  
In this paper, two matrix transpose algorithms are proposed to alleviate the aforementioned issues of ensuring coalesced access and conflict free bank access.  ...  The degradation in performance involves the memory access pattern such as coalesced access in the global memory and bank conflict in the shared memory of streaming multiprocessors within the GPU.  ...  We have also applied the proposed transpose algorithm to recursive Gaussian implementation of NVIDIA SDK and achieved about 6% improvement in performance.  ...
doi:10.2991/ijndc.2014.2.3.2 fatcat:q4t3uohombc2hb626aoj6i6soe

An Efficient Parallel Algorithm for Latin Square Design: A Multi Core CPU Approach

Abhay B. Rathod, Sanjay M. Gulhane
2017 International Journal of System Modeling and Simulation  
These squares provide conflict free access to various subsets of an n x n array using n memory modules.  ...  The results of parallel Latin Square design were very promising and showed a potential that this design could successfully be applied to the parallel routing problems for conflict free data access.  ...  These squares provide conflict free access to various subsets of an n x n array using n memory modules.  ... 
doi:10.24178/ijsms.2017.2.2.27 fatcat:2lj7gp5kevgqxffkqqoghjp3wa

A Parallel Processing System for a High-Speed Printed Document Recognition

Kyung-Ae Moon, Hyung Lee, Hee-Jun Yoon, Jong-Won Park
1996 IAPR International Workshop on Machine Vision Applications  
This paper transforms a serial recognition algorithm using a mesh feature for the printed characters into the parallel algorithm, proposes a parallel processor system and a parallel memory system for the  ...  The recognition ratio of the algorithms proposed up to date is about 95%, but the recognition speed of the algorithm is 10s of characters per second.  ...  Simulation of the parallel processor and the conflict-free memory system A simulation of the parallel processor and the conflict-free memory system is performed by using CADENCE and illustrated in Fig.  ...
dblp:conf/mva/MoonLYP96 fatcat:i5ccy43gdvforo62rx6a7hwmzq

High-throughput Contention-Free concurrent interleaver architecture for multi-standard turbo decoder

Guohui Wang, Yang Sun, Joseph R. Cavallaro, Yuanbin Guo
2011 ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors  
However, the interleaver has become a major bottleneck that limits the achievable throughput in the parallel decoders due to the massive memory conflicts.  ...  In this paper, we propose a flexible Double-Buffer based Contention-Free (DBCF) interleaver architecture that can efficiently solve the memory conflict problem for parallel turbo decoders with very high  ...  If N_access = 1, there is no memory conflict. While N_access > 1 means that multiple LLRs try to access the same memory module simultaneously hence causing an N_access-way memory conflict.  ...
doi:10.1109/asap.2011.6043259 dblp:conf/asap/WangSCG11 fatcat:ayg2y26egnatlfjvg7xdc5vope
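
The N_access notion quoted in this snippet can be illustrated with a minimal sketch. It assumes a simple modulo mapping from address to memory module, which is an illustrative assumption and not necessarily the mapping used in the paper; the function name `conflict_degree` is hypothetical.

```python
from collections import Counter


def conflict_degree(addresses, num_modules):
    """Return the largest number of simultaneous accesses that hit one module.

    Assumption: address -> module mapping is address % num_modules.
    A result of 1 means the access set is conflict-free; a result of k > 1
    corresponds to a k-way memory conflict.
    """
    hits = Counter(addr % num_modules for addr in addresses)
    return max(hits.values(), default=0)


# Four consecutive addresses over four modules: each module is hit once.
print(conflict_degree([0, 1, 2, 3], 4))
# Addresses 0, 4 and 8 all map to module 0, giving a 3-way conflict.
print(conflict_degree([0, 4, 8, 1], 4))
```

Under this assumed mapping, any access pattern whose addresses share a residue modulo the module count conflicts, which is why interleaver designs look for alternative mappings or buffering.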

A memory mapping approach for parallel interleaver design with multiple read and write accesses

C. Chavet, P. Coussy
2010 Proceedings of 2010 IEEE International Symposium on Circuits and Systems  
However, to be efficient parallel architectures require to avoid collision accesses i.e. concurrent read/write accesses should not target the same memory block.  ...  In this paper we propose a methodology which always finds a collision-free mapping of the variables in the memory banks and which optimizes the resulting interleaving architecture.  ...  Then the write memory access of D in CNear is exchanged in order to be mapped to a memory bank which will solve the conflict access in the current column.  ... 
doi:10.1109/iscas.2010.5537955 dblp:conf/iscas/ChavetC10 fatcat:bvkse23d6vga3gei5arnjb7ctu

Efficient Batched Predecessor Search in Shared Memory on GPUs

Ben Karsin, Henri Casanova, Nodari Sitchinava
2015 2015 IEEE 22nd International Conference on High Performance Computing (HiPC)  
However, due to architectural features, for many problems it is challenging to design parallel algorithms that exploit the full compute power of GPUs. Among these features is the memory design.  ...  Although the issue of coalesced global memory access has been documented and studied extensively, another important architectural feature is the organization of shared memory into banks.  ...  Conflict-Free Search In this section we present a modified PBS algorithm, which we call Parallel Binary Search -Conflict Free (PBS-CF), that is free of shared memory bank conflicts.  ... 
doi:10.1109/hipc.2015.40 dblp:conf/hipc/KarsinCS15 fatcat:3vufqmgijrcelhnzrzli4t6jni

Optimal tree access by elementary and composite templates in parallel memory systems

V. Auletta, S.K. Das, A. De Vivo, M.C. Pinotti, V. Scarano
2002 IEEE Transactions on Parallel and Distributed Systems  
These mappings are evaluated with respect to the following criteria: 1) the largest number of data items that can be accessed in parallel without memory conflicts; 2) the number of memory conflicts that  ...  More specifically, we describe an algorithm for mapping complete binary trees of height H onto M memory modules and prove that it achieves the following performance results: 1) conflict-free access to  ...  ACKNOWLEDGMENTS The authors would like to acknowledge the editor and the anonymous referees for their valuable comments that contributed to greatly improve the paper.  ... 
doi:10.1109/71.995820 fatcat:r7lu5ezw3za7vnfzwl7q3f6iby

Design and Implementation of a Conflict-free Memory Accessing Technique for FFT on Multicluster VLIW DSP

Hong Ye, Naijie Gu, Xiaoci Zhang, Chuanwen Lin
2018 IEICE Electronics Express  
Furthermore, a novel conflict-free memory-addressing scheme called Modulo-Block Scheduling (MBS) is proposed to ensure the continuous operation.  ...  Aiming at solving the conflicts in memory accessing of these applications, an address generation technique called Mod-N Address is presented in this paper.  ...  Because memory access is the major cause of data non-parallelization and power dissipated, a conflict-free memory addressing scheme is important for DSPs.  ...
doi:10.1587/elex.15.20180674 fatcat:y2kcpgb5qjh3zlrz4wdqqkg4ue

Offline Permutation on the CUDA-enabled GPU

Akihiko KASAGI, Koji NAKANO, Yasuaki ITO
2014 IEICE transactions on information and systems  
The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computation on CUDA-enabled GPUs.  ...  The offline permutation is a task to copy numbers stored in an array a of size n to an array b of the same size along a permutation P given in advance.  ...  The conflict-free read and the conflict-free write to the shared memory is performed in k DMMs in parallel. Hence, these rounds takes Table 1 .  ... 
doi:10.1587/transinf.2014pap0010 fatcat:ftvh2gwpojheth4743fklapfbq
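
The offline permutation task described in this snippet can be written as a short sequential reference sketch. It assumes the convention b[P[i]] = a[i], which the snippet does not spell out, and the function name `offline_permutation` is hypothetical; it says nothing about how the paper schedules the copy on the GPU.

```python
def offline_permutation(a, p):
    """Copy a into a new list b along permutation p, so that b[p[i]] == a[i].

    Assumption: p is a permutation of range(len(a)) known in advance.
    """
    n = len(a)
    if sorted(p) != list(range(n)):
        raise ValueError("p must be a permutation of 0..n-1")
    b = [None] * n
    for i, value in enumerate(a):
        b[p[i]] = value  # element i of a lands at position p[i] of b
    return b


print(offline_permutation([10, 20, 30], [2, 0, 1]))
```

Because P is fixed before the copy starts, a parallel implementation is free to reorder the individual writes, which is what makes conflict-free scheduling of the shared-memory accesses possible.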

Memory Organization with Multi-Pattern Parallel Accesses

Arseni Vitkovski, Georgi Kuzmanov, Georgi Gaydadjiev
2008 2008 Design, Automation and Test in Europe  
Index Terms: Conflict-free access, high bandwidth, multi-pattern access, parallel memories.  ...  We propose an interleaved memory organization supporting multi-pattern parallel accesses in two-dimensional (2D) addressing space.  ...  Related work: A number of solutions for conflict-free parallel memory access have been proposed in the literature.  ...
doi:10.1109/date.2008.4484873 dblp:conf/date/VitkovskiKG08 fatcat:e7egxpno45fx7hmadfr5coz26y

Memory organization with multi-pattern parallel accesses

Arseni Vitkovski, Georgi Kuzmanov, Georgi Gaydadjiev
2008 Proceedings of the conference on Design, automation and test in Europe - DATE '08  
Index Terms: Conflict-free access, high bandwidth, multi-pattern access, parallel memories.  ...  We propose an interleaved memory organization supporting multi-pattern parallel accesses in two-dimensional (2D) addressing space.  ...  Related work: A number of solutions for conflict-free parallel memory access have been proposed in the literature.  ...
doi:10.1145/1403375.1403719 fatcat:kkfrqw7sbjh5re2tzatftqdxu4

Optimized Fast Walsh–Hadamard Transform on GPUs for non-binary LDPC decoding

Joao Andrade, Gabriel Falcao, Vitor Silva
2014 Parallel Computing  
We have developed a massively parallel Fast Walsh-Hadamard Transform (FWHT) which exploits the Graphics Processing Unit (GPU) pipeline and memory hierarchy, thereby minimizing the level of memory bank  ...  conflicts and maximizing the number of returned instructions per clock cycle for different generations of graphics processors, with considerable speedup gains in FT-SPA based non-binary LDPC decoding.  ...  For instance, bank 0 is accessed conflict free on the first stage but is accessed with conflict by threads t 0 and t 2 in both the second and third stage, respectively.  ... 
doi:10.1016/j.parco.2014.07.001 fatcat:k7gpp5vhhrajtmfoi2udcc2u6q