Implementing sequentially consistent programs on processor consistent platforms

Lisa Higham, Jalal Kawash
2008 Journal of Parallel and Distributed Computing  
This motivates the study of how to implement programs designed for sequential consistency on platforms with weaker consistency models.  ...  One variant of processor consistency originally proposed by Goodman and called here PC-G is an exception.  ...  Specifically, using the addition of only one multi-writer variable, we provide a simple implementation on a PC-G platform of any 2-process sequentially consistent program that uses only single-writer variables  ... 
doi:10.1016/j.jpdc.2007.07.002 fatcat:ogf2lqrfzrghhmeycvs7w7nljm

Implementing sequentially consistent programs on processor consistent platforms

L. Higham, J. Kawash
2004 7th International Symposium on Parallel Architectures, Algorithms and Networks, 2004. Proceedings.  
This motivates the study of how to implement programs designed for sequential consistency on platforms with weaker consistency models.  ...  One variant of processor consistency originally proposed by Goodman and called here PC-G is an exception.  ...  Specifically, using the addition of only one multi-writer variable, we provide a simple implementation on a PC-G platform of any 2-process sequentially consistent program that uses only single-writer variables  ... 
doi:10.1109/ispan.2004.1300500 dblp:conf/ispan/HighamK04 fatcat:e75xhcwnzvfpdautkfnoevkusi

Realization and performance comparison of sequential and weak memory consistency models in network-on-chip based multi-core systems

Abdul Naeem, Xiaowen Chen, Zhonghai Lu, Axel Jantsch
2011 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)  
The results show that the weak consistency improves the performance by 46.17% and 33.76% on average in the code and consistency latencies over the sequential consistency model, due to relaxation in the  ...  This paper studies realization and performance comparison of the sequential and weak consistency models in the network-on-chip (NoC) based distributed shared memory (DSM) multi-core systems.  ...  Acknowledgements This work has been supported partially by the FP7 EU project Mosart under contract number IST-215244, and the HEC/SI joint scholarship program of Pakistan and Sweden.  ... 
doi:10.1109/aspdac.2011.5722176 dblp:conf/aspdac/NaeemCLJ11 fatcat:factsuwubfeenozqh6nefkdsom

Phase correlation processing for DPIV measurements

Adric C. Eckstein, John Charonko, Pavlos Vlachos
2008 Experiments in Fluids  
The design methodology used to implement the PIV application on a specialized FPGA platform under development is described in brief and the resulting performance benefit is analyzed.  ...  The PIV application is mapped to a Nvidia GPU system, resulting in 3x speedup over a dual quad-core Intel processor implementation.  ...  The Nvidia Tesla C1060 GPU was funded through the Nvidia Professor Partnership program.  ... 
doi:10.1007/s00348-008-0492-6 fatcat:hb2ktoovsveslkytejyxmi3uhm

Formal Reasoning about Hardware and Software Memory Models [chapter]

Abhik Roychoudhury
2002 Lecture Notes in Computer Science  
The allowed behaviors of any multithreaded Java program on any implementation platform (multi- or uni-processor), are described in terms of a memory consistency model called the Java Memory Model (JMM).  ...  The Java programming language allows multithreaded programming, where threads can be run on multiprocessor or uniprocessor platforms.  ...  However, this merely means that the returned value of C may be 0 or 1 on certain (not all) implementations. Uni-processor implementations guarantee Sequential Consistency.  ... 
doi:10.1007/3-540-36103-0_44 fatcat:vk6jcwiq4rhypawfwagfxnoqa4

On the Design and Implementation of a Portable DSM System for Low-Cost Multicomputers [chapter]

Federico Meza, Alvaro E. Campos, Cristian Ruz
2003 Lecture Notes in Computer Science  
We present a layered architecture that allows a portable, scalable, and low-cost implementation that runs on Linux and Windows.  ...  Distributed shared memory systems provide an easy-to-program parallel environment, to harness the available computing power of PC networks.  ...  Implementing a sequential-consistency protocol involves writing handlers for page-fault events on read and write operations, and for serving requests from a remote processor.  ... 
doi:10.1007/3-540-44839-x_102 fatcat:quo7lgkrxzeitke3vg4ch6x76i

A Java-based parallel platform for the implementation of evolutionary computation for engineering applications

Chun Che Fung, Jia Bin Li, Kok Wai Wong, Kit Po Wong
2004 International Journal of Systems Science  
This paper proposes an extended version of a previously developed low cost parallel computation platform called Para Worker.  ...  The proposal is particularly useful for the implementation and execution of computational intelligence techniques such as evolutionary computing for engineering applications.  ...  Orca implements sequential consistency. Orca objects can be in one of two states: single-copy or replicated. Replicated objects are maintained consistent with the broadcast mechanism.  ... 
doi:10.1080/00207720412331303651 fatcat:hv2e3vep3jdnpeiqphotp7iiqu

Systematic and Automated Multiprocessor System Design, Programming, and Implementation

H. Nikolov, T. Stefanov, E. Deprettere
2008 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
a single processor.  ...  performance needs of applications executed on such platforms.  ...  On one hand, the applications are typically specified by the application developers as sequential programs using imperative programming languages such as C/C++ or Matlab.  ... 
doi:10.1109/tcad.2007.911337 fatcat:akppsyu3szgdpg6yp4tb32ra7e

Comparison of Hybrid Sorting Algorithms Implemented on Different Parallel Hardware Platforms

Zurek Dominik, Pietron Marcin, Wielgosz Maciej, Wiatr Kazimierz
2013 Computer Science  
There are a lot of well-known sorting algorithms created for sequential execution on a single processor.  ...  Then, a hybrid algorithm will be presented, consisting of parts executed on both platforms (a standard CPU and GPU).  ...  The project is co-funded by the European Regional Development Fund (ERDF) as a part of the Innovative Economy program (POIG.02.03.00-00-018/08).  ... 
doi:10.7494/csci.2013.14.4.679 fatcat:fetcipfq25g3ndi6cump5rxw6u

Regional Consistency: Programmability and Performance for Non-cache-coherent Systems

Bharath Ramesh, Calvin J. Ribbens, Srinidhi Varadarajan
2013 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications  
Our primary objective is to define a memory consistency model that presents the familiar thread-based shared memory programming model, but allows good application performance on non-cache-coherent systems  ...  Results on up to 256 processors for representative benchmarks demonstrate the potential of RegC in the context of our prototype distributed shared memory system.  ...  Sequential consistency (SC) has two important properties: (1) program order is maintained at each processor, (2) global order is an interleaving of all the sequential orders at each processor.  ... 
doi:10.1109/trustcom.2013.115 dblp:conf/trustcom/RameshRV13 fatcat:bc7egm6ufng4zawzojixfx2z2e

Regional Consistency: Programmability and Performance for Non-Cache-Coherent Systems [article]

Bharath Ramesh and Calvin J. Ribbens and Srinidhi Varadarajan
2013 arXiv pre-print
Our primary objective is to define a memory consistency model that presents the familiar thread-based shared memory programming model, but allows good application performance on non-cache-coherent systems  ...  Results on up to 256 processors for representative benchmarks demonstrate the potential of RegC in the context of our prototype distributed shared memory system.  ...  Sequential consistency (SC) has two important properties: (1) program order is maintained at each processor, (2) global order is an interleaving of all the sequential orders at each processor.  ... 
arXiv:1301.4490v1 fatcat:6vouqq535fa5ldk6egitb2kzlq

Memory Architecture and Management in an NoC Platform [chapter]

Axel Jantsch, Xiaowen Chen, Abdul Naeem, Yuang Zhang, Sando Penolazzi, Zhonghai Lu
2011 Scalable Multi-core Architectures  
On-chip Computation is moving away from a sequential to a parallel paradigm leading to dozens, hundreds, and soon even thousands of cores and computational units on a single die.  ...  Based on the observation that processor speed grows by 80% every year but memory access time decreases only by 7% per year, they predict that, if no invention breaks this  ...  Among the many other potential DME applications we present virtual address space management, synchronization, cache coherency, and memory consistency.  ... 
doi:10.1007/978-1-4419-6778-7_1 fatcat:ja4wt52fnzb43okhyabajri33a

Parallelization strategies of the canny edge detector for multi-core CPUs and many-core GPUs

Taieb Lamine Ben Cheikh, Giovanni Beltrame, Gabriela Nicolescu, Farida Cheriet, Sofiene Tahar
2012 10th IEEE International NEWCAS Conference  
Different parallel implementations of the Canny Edge Detector are run on two distinct hardware platforms, namely a multi-core CPU, and a many-core GPU.  ...  Our experiments uncover design rules that, depending on a set of applications and platform factors (parallel features, data size, and architecture), indicate which parallelization scheme is more suitable  ...  This platform includes 480 Streaming Processors (SP) or cores distributed on 15 Streaming Multiprocessors (SM) as 32 SP per SM.  ... 
doi:10.1109/newcas.2012.6328953 dblp:conf/newcas/CheikhBNCT12 fatcat:ebp3xeqwmzhnzpnycm4tbgoitm

Scalability of weak consistency in NoC based multicore architectures

Abdul Naeem, Xiaowen Chen, Zhonghai Lu, Axel Jantsch
2010 Proceedings of 2010 IEEE International Symposium on Circuits and Systems  
Within DSM systems, memory consistency is a critical issue since it affects not only performance but also the correctness of programs.  ...  In this paper, we investigate the scalability of the weak consistency model, which may be implemented using the concept of a transaction counter.  ...  ACKNOWLEDGMENTS This work has been supported partially by the FP7 EU project Mosart under contract number IST-215244, the SI/HEC joint scholarships program of Pakistan and Sweden.  ... 
doi:10.1109/iscas.2010.5537833 dblp:conf/iscas/NaeemCLJ10 fatcat:edra5r4m2bbtnbo6ho7qapyjze

High level design space exploration of RVC codec specifications for multi-core heterogeneous platforms

Christophe Lucarz, Ghislain Roquier, Marco Mattavelli
2010 2010 Conference on Design and Architectures for Signal and Image Processing (DASIP)  
As we are entering in the multicore era, sequential programs are no longer the most appropriate way to specify algorithms targeted to run on several processing units.  ...  Although the RVC standard does not imply any specific implementation design flow, it is an appropriate starting point for targeting multiple processing units platforms.  ...  INTRODUCTION Designing and implementing complex digital systems as video decoders on heterogeneous multi-core platforms is a very difficult task.  ... 
doi:10.1109/dasip.2010.5706264 dblp:conf/dasip/LucarzRM10 fatcat:ufblw2ft6rh3flznr3rh2hs4ry