A scalable and flexible data synchronization scheme for embedded HW-SW shared-memory systems
2001
Proceedings of the 14th international symposium on Systems synthesis - ISSS '01
In this scheme, synchronization primitives are chosen such that they can be implemented efficiently in both hardware and software on distributed shared memory architectures, without the need for atomic ...
This paper describes the implementation of a data-synchronization scheme that can be used in the functional description and hardware realization of algorithms for heterogeneous multi-processor architectures ...
We clearly separate synchronization from data transportation since in a shared memory architecture no copying of data is required. ...
doi:10.1145/500002.500003
fatcat:dhuvxzosfvamxo7oqrhlgrghxm
A scalable and flexible data synchronization scheme for embedded HW-SW shared-memory systems
2001
Proceedings of the 14th international symposium on Systems synthesis - ISSS '01
In this scheme, synchronization primitives are chosen such that they can be implemented efficiently in both hardware and software on distributed shared memory architectures, without the need for atomic ...
This paper describes the implementation of a data-synchronization scheme that can be used in the functional description and hardware realization of algorithms for heterogeneous multi-processor architectures ...
We clearly separate synchronization from data transportation since in a shared memory architecture no copying of data is required. ...
doi:10.1145/500001.500003
fatcat:6jdsnn4gpjdahnqodahxhiszzy
Generation of Heterogeneous Distributed Architectures for Memory-Intensive Applications Through High-Level Synthesis
2007
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
We use a combination of clustering and min-cut style partitioning techniques to yield distributed architectures, based on simulation profiling while considering various factors including data access locality ...
Synthesis should, therefore, be capable of determining a partitioned architecture, wherein array data and computations may have to be heterogeneously distributed for achieving the best performance speed-up ...
Their work has motivated our research on memory data organization and optimization. ...
doi:10.1109/tvlsi.2007.904096
fatcat:czc256r4zfc7hbe44ir6smrqwu
Massive Parallel Join in NUMA Architecture
2013
2013 IEEE International Congress on Big Data
Compared to traditional on-disk database, IMDB has advantages such as faster access to storage and simpler internal optimization algorithms. ...
In SMP architecture, threads can communicate through shared memory, thus the optimized join algorithms for SMP need to consider more about the processor synchronization cost when accessing shared memory ...
doi:10.1109/bigdata.congress.2013.37
dblp:conf/bigdata/HeZGH13
fatcat:uwwmzrfkjzebpnv2ren4cxzrpm
Long DNA Sequence Comparison on Multicore Architectures
[chapter]
2010
Lecture Notes in Computer Science
We analyze two different SW implementations on the CellBE and use simulation tools to study the performance scalability in a multicore architecture. ...
We study the memory organization that delivers the maximum bandwidth with the minimum cost. ...
Fig. 1. (a) Data dependency (b) Different optimal regions (c) Computation distribution. T_block(b,k): time required to process a block of size b * k.
Fig. 2. (a) SPEs store data in memory. ...
doi:10.1007/978-3-642-15291-7_24
fatcat:vv2w3yjanjhjrjxrxgzlkayvda
Instruction set extensions for photonic synchronous coalesced accesses
2013
2013 IEEE High Performance Extreme Computing Conference (HPEC)
on modern architectures. ...
This operation is described, and its ISA implications explored in the context of the distributed matrix transpose, which exhibits a high degree of data non-locality, and is difficult to efficiently parallelize ...
Related work, specifically how existing parallel architectures work with distributed data, is discussed in section V, followed by conclusions in section VI.
doi:10.1109/hpec.2013.6670326
dblp:conf/hpec/KeltcherWH13
fatcat:y7fki3y375fsvpdumbgldwy4ze
Re-engineering the ant colony optimization for CMP architectures
2019
Journal of Supercomputing
Moreover, parallel efficiency is provided for all targeted architectures, finding that core load imbalance, memory bandwidth limitations, and NUMA effects on data placement are some of the key factors ...
In the latter case, the parallel efficiency is affected by the synchronization frequency, which also affects the quality of the solution found by the distributed implementation. ...
NUMA architectures have a different memory latency depending on the NUMA node accessing the data, and may also vary depending on the consistency state of the accessed data. ...
doi:10.1007/s11227-019-02869-8
fatcat:aajgzsgk3rddrpbrnbbnjckrse
A dataflow-like programming model for future hybrid clusters
2013
International Journal of Networking and Computing
in case the memory consistency model is not optimal. ...
Broadcast, scatter and gather are modeled based on data distribution among the nodes, whereas reduction and scan follow a combining PRAM approach of having multiple threads write to the same memory location ...
The synchronization size can differ for different data and the optimal synchronization size depends on the algorithm and hardware used. ...
doi:10.15803/ijnc.3.1_15
fatcat:hzcymccayzfs7dt3t6ukkcg274
Architecture optimizations for synchronization and communication on chip multiprocessors
2008
Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)
running on CMPs. Problems: Synchronization Overhead (Spin Waits); Memory Bandwidth Bottleneck (Many Simultaneous Accesses); Cache Pollution (Data Evictions from Shared Cache); Demand-Based ... Data Transfers (Depend on Coherence Mechanisms)
Conventional Parallel Programming: data parallelism by splitting data across multiple threads; memory interface is overburdened; Performance ...
doi:10.1109/ipdps.2008.4536357
dblp:conf/ipps/FideJ08
fatcat:lclrjbqwlfbvxpvyukfpbhg3e4
Scalable distributed memory embedded system with a low-cost hardware message passing interface
2009
IEICE Electronics Express
In this paper, we propose a scalable distributed memory system with a low-cost hardware message-passing interface. ...
The proposed interface improves the communication performance between nodes, decreasing the synchronization overhead with a receiver reservation technique. ...
On a distributed memory architecture, there are synchronization issues between the receive and send signals due to imperfect synchronization.
doi:10.1587/elex.6.837
fatcat:4nml3maoa5hpdemsgpxgqfxp2m
From algorithm and architecture specifications to automatic generation of distributed real-time executives: a seamless flow of graphs transformations
2003
First ACM and IEEE International Conference on Formal Methods and Models for Co-Design, 2003. MEMOCODE '03. Proceedings.
We present an original architecture model which allows accurate sequencer modeling, memory allocation, and heterogeneous inter-processor communications for both modes, shared memory and message ...
This paper presents a seamless flow of transformations which performs dedicated distributed executive generation from a high level specification of a pair: algorithm, architecture. ...
Thanks to our architecture model, it is possible to cover a large number of architectures based on various memory and communication networks. ...
doi:10.1109/memcod.2003.1210097
dblp:conf/memocode/GrandpierreS03
fatcat:ahxuoranenh7rix3xduw7ojmpe
A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations
2010
International Journal of Parallel Programming
Both communication and synchronization may incur significant overhead on parallel architectures with shared memory. ...
However, the selection of the optimal ghost zone size depends on the characteristics of both the architecture and the application, and it has only been studied for message-passing systems in distributed ...
The concept of computation replication involved in ghost zones is related to data replication and distribution in the context of distributed memory systems [4][23], which are used to wisely distribute ...
doi:10.1007/s10766-010-0142-5
fatcat:7ygnx3qccbbllivzjfrtvwx4di
Automatic DSP Cache Memory Management And Fast Prototyping For Multiprocessor Image Applications
2006
Zenodo
The parallel aspect of multicomponent architectures raises problems in terms of application distribution: handmade data transfers and synchronizations quickly become very complex and result in lost time ...
Moreover, when external memory is used without cache, data localisation has a great impact on performance. The distribution of data between external and internal memory is crucial. ...
doi:10.5281/zenodo.39900
fatcat:skf3b52qkzc5lhrfus3hrw3d74
P-sync: A Photonically Enabled Architecture for Efficient Non-local Data Access
2013
2013 IEEE 27th International Symposium on Parallel and Distributed Processing
This paper describes a novel synchronized global photonic bus and system architecture called P-sync that uses photonics' distance independence to greatly improve performance on many important applications ...
The architecture is evaluated in the context of a non-local yet common application: the distributed Fast Fourier Transform. ...
of optimized architectures for the user code, optimized generated code, and results from a run on target architectures. ...
doi:10.1109/ipdps.2013.56
dblp:conf/ipps/WhelihanHSRWMMKBBCHBC13
fatcat:3qw2gowypndh7h32jynxnga7b4
Using RTOS In The AAA Methodology Automatic Executive Generation
2006
Zenodo
One of them is generic and does not depend on the algorithm. It supports the architecture specification such as memory allocations, sequence synchronizations and also inter-operator transfers. ...
The optimization problem aims to select the most efficient one among them (real-time constraints, architecture resources, ...). ...
doi:10.5281/zenodo.39917
fatcat:a5wcm3qcwzdojimyx7lwkvhd4y