Computational fluid dynamics on parallel processors

William D. Gropp, Edward B. Smith
1990 Computers & Fluids  
Wilson of AFOSR, Michael J. Werle and Joseph R. Caspar of UTRC, and Martin H. Schultz of Yale University, for their comments and suggestions.  ...  These run from very fine grain machines such as vector processors and very long instruction word machines to large numbers of almost independent processors.  ...  Message passing machines have simpler access control but at a higher cost in sharing data.  ... 
doi:10.1016/0045-7930(90)90012-m fatcat:2dht5syuvzbh5i2e7lcunobx7e

Computational models and resource allocation for supercomputers

J. Mauney, D.P. Agrawal, Y.K. Choe, E.A. Harcourt, S. Kim, W.J. Staats
1989 Proceedings of the IEEE  
The computational needs of a program must be cast in terms of the computational model supported by the supercomputer, and this must be done in a way that makes effective use of the machine's resources.  ...  The computational models of available supercomputers and the associated resource allocation techniques are surveyed.  ...  Processor synchronization data flow interleaving synchronous asynchronous asynchronous Computational model Data flow model Pipeline model Array processor model Shared memory model  ... 
doi:10.1109/5.48828 fatcat:vhlqy2v3trfwblcqpsol2zivpi

A Survey of Paradigms for Building and Designing Parallel Computing Machines

Ahmed Faraz, Faiz Ul Haque Zeya, Majid Kaleem
2015 Computer Science & Engineering An International Journal  
We discuss Multiprocessor and Data Flow Machines in a concise manner.  ...  The Wave front Processors combine the Systolic Processor architecture with Data Flow machine architecture.  ...  Hayes [6] also gives a detailed description of Pipelining, RISC machines and CISC machines.  ... 
doi:10.5121/cseij.2015.5101 fatcat:axltnsfkdrbmra43kuoxnbisqi

Some computational challenges of developing efficient parallel algorithms for data-dependent computations in thermal-hydraulics supercomputer applications

S.B. Woodruff
1994 Nuclear Engineering and Design  
Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time.  ...  One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients.  ...  A SIMD computation may be done in a pipelined manner on a single processor on a vector machine such as a CRAY Y-MP or simultaneously on the multiple processors of a data parallel machine such as a CM-2  ... 
doi:10.1016/0029-5493(94)90351-4 fatcat:ntvyhuzimbcbznbqf3jgtjecu4

Page 252 of American Society of Civil Engineers. Collected Journals Vol. 119, Issue 3 [page]

1993 American Society of Civil Engineers. Collected Journals  
In addition, the systolic-array architectures and data-flow machines exploit parallelism in ways that have a promise for achieving high performance.  ...  large n, the vector processor is approximately nine times faster than the serial machine. There are three means of decreasing the time predicted by (2): 1. Increase the number of pipelines in use.  ... 

High Level Simulation & Modeling for Medical Applications - Ultrasound Case [chapter]

A. Chihoub
2002 Lecture Notes in Computer Science  
The results showed that such an architecture was both feasible and cost effective.  ...  In this paper we will present the results of mapping and simulating the B-Mode (echo), and Doppler (flow) algorithms in ultrasound processing onto 1D and 2D based architectures.  ...  The cost projections for such a machine using the number of processors estimated from the simulation indicate that such an approach is also cost efficient.  ... 
doi:10.1007/3-540-45787-9_25 fatcat:kr74ehfcwbgj3myit7nzvjvle4

The Next Four Orders of Magnitude in Performance for Parallel CFD [chapter]

D.E. Keyes
2000 Parallel Computational Fluid Dynamics 1999  
/s on the same machines.  ...  We briefly review the algorithmic structure of typical PDE-based CFD codes that is responsible for this situation and consider possible architectural and algorithmic sources for performance improvement  ...  However, the cost-effectiveness of this brute-force approach towards petaflop/s is highly sensitive to frequency and latency of global reduction operations, and to modest departures from perfect load balance  ... 
doi:10.1016/b978-044482851-4.50033-5 fatcat:qwfqsr2lonbcnhq7vrbiwqklla

The next four orders of magnitude in performance for parallel CFD [chapter]

D.E. Keyes
2000 Parallel Computational Fluid Dynamics 1999  
/s on the same machines.  ...  We briefly review the algorithmic structure of typical PDE-based CFD codes that is responsible for this situation and consider possible architectural and algorithmic sources for performance improvement  ...  However, the cost-effectiveness of this brute-force approach towards petaflop/s is highly sensitive to frequency and latency of global reduction operations, and to modest departures from perfect load balance  ... 
doi:10.1016/b978-044482851-4/50033-5 fatcat:brx6ced73ff2ldw26j5ikhyjhi

Parallel Computers in Signal Processing

Narsingh Deo
1985 Defence Science Journal  
The paper reviews various types of parallel computer architectures from the viewpoint of signal and image processing.  ...  Signal processing often requires a great deal of raw computing power for which it is important to take a look at parallel computers.  ...  there is a master-slave relationship among the processors, and so forth. cost-effective the hardware utilization is.  ... 
doi:10.14429/dsj.35.6031 fatcat:bbmfjzefgrcellkfnvpmuxl5ve

Engineering and scientific processing on the IBM 3090

D. H. Gibson, D. W. Rain, H. F. Walsh
1986 IBM Systems Journal  
Data flow. The IBM 3090 Vector Facility data flow is shown in Figure 4.  ...  The section size is chosen by the processor designer. Large section size increases cost and save/restore time, but reduces startup effects.  ... 
doi:10.1147/sj.251.0036 fatcat:g24zbgjswndz3ord7pmulfsfui

A data-level parallel linear-quadratic penalty algorithm for multicommodity network flows

Mustafa Ç. Pinar, Stavros A. Zenios
1994 ACM Transactions on Mathematical Software  
Particular emphasis is placed on the mapping of both the subproblem and master problem data to the processing elements of a massively parallel computer, the Connection Machine CM-2.  ...  We describe the development of a data-level, massively parallel software system for the solution of multicommodity network flow problems.  ...  Such architectures offer both scalability and cost effectiveness.  ... 
doi:10.1145/198429.198439 fatcat:5xw5y46o3rgzbpgofshdqf3dsy

A parallel language and its compilation to multiprocessor machines or VLSI

Marina C. Chen
1986 Proceedings of the 13th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL '86  
Each optimizing compiler, targeted for a particular machine, determines the appropriate granular size of parallelism and attains a balance between computations and communications.  ...  In Crystal, a program consists of a system of recursion equations and is interpreted as a parallel system.  ...  In a Crystal program, cost of communications can be extracted from difference vectors, as they are mapped to communication vectors each of which has a cost associated with the target machine and technology  ... 
doi:10.1145/512644.512656 dblp:conf/popl/Chen86 fatcat:xfoub3v2wng6jkl4d5dklrcjw4

Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations

Leonid Oliker, Andrew Canning, Jonathan Carter, John Shalf, David Skinner, Stephane Ethier, Rupak Biswas, Jahed Djomehri, Rob Van der Wijngaart
2003 Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03  
This paper examines the intranode performance of the NEC SX-6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of scientific computing areas.  ...  However, certain applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively.  ...  Acknowledgements The authors would like to gratefully thank the Arctic Region Supercomputing Center for access to the NEC SX-6, the Center for Computational Sciences at ORNL for access to the IBM p690, and  ... 
doi:10.1145/1048935.1050213 dblp:conf/sc/OlikerCCSSEBDW03 fatcat:pbiviyz2sraefohdct4e3nljxm

Performance Improvement of Sparse Matrix Vector Product on Vector Machines [chapter]

Sunil R. Tiyyagura, Uwe Küster, Stefan Borowski
2006 Lecture Notes in Computer Science  
In this paper, recent developments in sparse storage formats on vector machines are reviewed. Then, several improvements to memory access in the sparse matrix vector product are suggested.  ...  Matrix vector multiplication is one of the key operations that has a significant impact on the performance of any iterative solver.  ...  Use of Vector Registers Most vector machines provide a programmer interface to vector registers in order to temporarily store data, like the result vector (res).  ... 
doi:10.1007/11758501_30 fatcat:2rgsndtr25e6pbw3w6fq6avrca

Three-dimensional direct particle simulation on the connection machine

Leonardo Dagum
1992 Journal of Thermophysics and Heat Transfer  
It is not practical to let the processors in the main VP-set directly access the data in the geometry VP-set because of the enormous communication cost it would entail.  ...  Connection Machine Architecture The Thinking Machines Connection Machine model CM-2 is a massively parallel single-instruction multiple-data (SIMD) computer consisting of many thousands of bit serial data  ... 
doi:10.2514/3.11545 fatcat:lafst5zvczblzh5dwwkqarhn6e