961 Hits in 9.1 sec

Optimizing Shape Design with Distributed Parallel Genetic Programming on GPUs [chapter]

Simon Harding, W. Banzhaf
2012 Studies in Computational Intelligence  
This technique is well suited for a distributed parallel system to increase efficiency.  ...  Fitness evaluation of the genetic programming technique is accomplished through a custom implementation of a fluid dynamics solver running on graphics processing units (GPUs).  ...  The technique uses a so-called distributed parallel evolutionary algorithm to optimize the solution, along with a general purpose parallel fluid dynamics solver to evaluate the shape parameters.  ... 
doi:10.1007/978-3-642-28789-3_3 fatcat:c625ysjdlffe7m4nddrtgsgayu

Parallel Programming Of A Reservoir Simulator

Mariyamni Awang
1992 Jurnal Teknologi  
This study concerns applying parallel programming to reservoir simulation using a 32-Mbyte, 12-processor parallel computer.  ...  Matrix generation was parallelized using monitors as macros to synchronize calculation. The performance of the simulator was measured by the speedup.  ...  For convergence, the simulation proceeds to the next time step, otherwise the iteration is repeated.  ... 
doi:10.11113/jt.v19.1053 fatcat:i6pt4oylnjaj7e2tnb2nf7qjei

Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models

Dimitrios S. Nikolopoulos, Eduard Ayguadé, Constantine D. Polychronopoulos
2002 International journal of parallel programming  
These techniques can be used to effectively replace manual data distribution in regular applications.  ...  This paper compares data distribution methodologies for scaling the performance of OpenMP on NUMA architectures.  ...  Irregular parallel applications appear to be one class of programs where page migration is not an option and domain-specific knowledge may be required to encode proper algorithms for data and load  ... 
doi:10.1023/a:1019899812171 dblp:journals/ijpp/NikolopoulosAP02 fatcat:3neic6ykybhkzpytgjf4kqpija

Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

Leonid Oliker, Xiaoye Li, Parry Husbands, Rupak Biswas
2002 SIAM Review  
Thanks also to Bruce Hendrickson and the other anonymous referees for their suggestions that helped improve the paper.  ...  This work investigates the performance and the programming effort for the Conjugate Gradient (CG) iterative solver for sparse matrices on each of these architectural platforms using their  ...  A second group of iterative techniques uses a projection process, which is a canonical way of extracting an approximate solution from a subspace.  ... 
doi:10.1137/s00361445003820 fatcat:o7uwxsbcfnf4ppcfc2sbpxncmm

Parallelization of an Unsteady ALE Solver with Deforming Mesh Using OpenACC

Wenpeng Ma, Zhonghua Lu, Wu Yuan, Xiaodong Hu
2017 Scientific Programming  
This paper presents a parallel, GPU-based, deforming mesh-enabled unsteady numerical solver for solving moving body problems by using OpenACC.  ...  And both 2D and 3D cases are conducted to validate the efficiency, correctness, and accuracy of the present solver.  ...  of an ALE solver that is able to simulate unsteady flows with deforming mesh.  ... 
doi:10.1155/2017/4610138 fatcat:bfjxxfuesbcufekrfelw2tgxly

Migrant threads on process farms: parallel programming with Ariadne

Edward Mascarenhas, Vernon Rego
1998 Concurrency Practice and Experience  
Sequential programs are readily converted into parallel programs for shared or distributed memory, with low development effort.  ...  We present a novel and portable threads-based system for the development of concurrent applications on shared and distributed memory environments.  ...  Distributed applications described in Section 6 include a distributed successive over-relaxation (SOR) linear solver, a particle-physics application, and adaptive quadrature.  ... 
doi:10.1002/(sici)1096-9128(19980810)10:9<673::aid-cpe362>3.0.co;2-5 fatcat:ixdg54l7v5b2zibur5fwljtzuy

Programming with transactional coherence and consistency (TCC)

Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben Hertzberg, Mike Chen, Christos Kozyrakis, Kunle Olukotun
2004 SIGPLAN notices  
The performance of these programs may then easily be optimized, based on feedback from real program execution, using a few simple techniques.  ...  We describe two basic programming language constructs for decomposing programs into transactions, a loop conversion syntax and a general transaction-forking mechanism.  ...  ACKNOWLEDGEMENTS This work was supported by NSF grant CCR-0220138 and DARPA PCA program grants F29601-01-2-0085 and F29601-03-2-0117.  ... 
doi:10.1145/1037187.1024395 fatcat:izhh37goeffmhlpl3xizvyv66m

Programming with transactional coherence and consistency (TCC)

Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben Hertzberg, Mike Chen, Christos Kozyrakis, Kunle Olukotun
2004 ACM SIGOPS Operating Systems Review  
The performance of these programs may then easily be optimized, based on feedback from real program execution, using a few simple techniques.  ...  We describe two basic programming language constructs for decomposing programs into transactions, a loop conversion syntax and a general transaction-forking mechanism.  ...  ACKNOWLEDGEMENTS This work was supported by NSF grant CCR-0220138 and DARPA PCA program grants F29601-01-2-0085 and F29601-03-2-0117.  ... 
doi:10.1145/1037949.1024395 fatcat:tyuk7ppydbdxfgqf6ym5cxaxpe

Programming with transactional coherence and consistency (TCC)

Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben Hertzberg, Mike Chen, Christos Kozyrakis, Kunle Olukotun
2004 Proceedings of the 11th international conference on Architectural support for programming languages and operating systems - ASPLOS-XI  
The performance of these programs may then easily be optimized, based on feedback from real program execution, using a few simple techniques.  ...  We describe two basic programming language constructs for decomposing programs into transactions, a loop conversion syntax and a general transaction-forking mechanism.  ...  ACKNOWLEDGEMENTS This work was supported by NSF grant CCR-0220138 and DARPA PCA program grants F29601-01-2-0085 and F29601-03-2-0117.  ... 
doi:10.1145/1024393.1024395 dblp:conf/asplos/HammondCWHCKO04 fatcat:6dkmd6hpdjbo5pkle2uxasqbju

Programming with transactional coherence and consistency (TCC)

Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben Hertzberg, Mike Chen, Christos Kozyrakis, Kunle Olukotun
2004 SIGARCH Computer Architecture News  
The performance of these programs may then easily be optimized, based on feedback from real program execution, using a few simple techniques.  ...  We describe two basic programming language constructs for decomposing programs into transactions, a loop conversion syntax and a general transaction-forking mechanism.  ...  ACKNOWLEDGEMENTS This work was supported by NSF grant CCR-0220138 and DARPA PCA program grants F29601-01-2-0085 and F29601-03-2-0117.  ... 
doi:10.1145/1037947.1024395 fatcat:tjylhp5ikjbofblefxpdr3wpei

Towards Architecture-Adaptable Parallel Programming

Santhosh Kumaran, Michael J. Quinn
1997 Scientific Programming  
In this article, we propose a solution to this problem in the form of an architecture-adaptable programming environment.  ...  From a pragmatic point of view, this is not a major liability since our strategy will be useful in building domain-specific problem solving environments and application-oriented compilers, which can be  ...  I am grateful to my parents for teaching me to dream and to work hard to make those dreams come true. This work was supported by NSF grant ASC-9208971.  ... 
doi:10.1155/1997/586912 fatcat:2el6aq4kwjb2zlwh42mty7wj6u

Thread scheduling for cache locality

James Philbin, Jan Edler, Otto J. Anshus, Craig C. Douglas, Kai Li
1996 Proceedings of the seventh international conference on Architectural support for programming languages and operating systems - ASPLOS-VII  
Experiments with several application programs, on two systems with different cache structures, show that our thread scheduling method can improve program performance by reducing second-level cache misses  ...  This paper describes a method to improve the cache locality of sequential programs by scheduling fine-grained threads.  ...  Acknowledgements We would like to thank Thomas Anderson, Susan Eggers,  ... 
doi:10.1145/237090.237151 dblp:conf/asplos/PhilbinEADL96 fatcat:idrgeas7v5fsxim4ir3mzdntp4

The SPLASH-2 programs

Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, Anoop Gupta
1995 Proceedings of the 22nd annual international symposium on Computer architecture - ISCA '95  
The other, related goal is methodological: to assist people who will use the programs in architectural evaluations to prune the space of application and machine parameters in an informed and meaningful  ...  The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors.  ...  We simulate a cache-coherent shared address space multiprocessor with physically distributed memory and one processor per node.  ... 
doi:10.1145/223982.223990 dblp:conf/isca/WooOTSG95 fatcat:46sii34aejgf7myonpew5lhqxu

The SPLASH-2 programs

Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, Anoop Gupta
1995 SIGARCH Computer Architecture News  
The other, related goal is methodological: to assist people who will use the programs in architectural evaluations to prune the space of application and machine parameters in an informed and meaningful  ...  The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors.  ...  We simulate a cache-coherent shared address space multiprocessor with physically distributed memory and one processor per node.  ... 
doi:10.1145/225830.223990 fatcat:t5jsmginbrffzff6x57qpv4nra

A low-computation-complexity, energy-efficient, and high-performance linear program solver based on primal dual interior point method using memristor crossbars

Ruizhe Cai, Ao Ren, Sucheta Soundarajan, Yanzhi Wang
2018 Nano Communication Networks  
Wang, A low-computation-complexity, energy-efficient, and high-performance linear program solver based on primal dual interior point method using memristor crossbars, Nano Communication Networks (2018)  ...  Abstract Linear programming is required in a wide variety of application including routing, scheduling, and various optimization problems.  ...  Thus, a more robust feasibility detection technique is required to guarantee an optimal solution is given.  ... 
doi:10.1016/j.nancom.2018.01.001 fatcat:glqyjfvqkbechlvgqv2uvzztee