Extending Modulo Scheduling with Memory Reference Merging [chapter]

Benoît Dupont de Dinechin
1999 Lecture Notes in Computer Science  
Experiments on the Cray T3E demonstrate the benefits of memory reference merging.  ...  This technique has been used over several years on the Cray T3E block scheduler, and was later generalized to the Cray T3E software pipeliner.  ...  Currently, software prefetching on the Cray T3E is limited to library routines that were developed and optimized in assembly code.  ... 
doi:10.1007/978-3-540-49051-7_19 fatcat:cpq6dedt7zasvbaknitzovp75y

Benchmarking computer platforms for lattice QCD applications

M. Hasenbusch, K. Jansen, D. Pleiter, H. Stüben, P. Wegner, T. Wettig, H. Wittig
2004 Nuclear Physics B - Proceedings Supplements  
The platforms considered are apeNEXT, CRAY T3E, Hitachi SR8000, IBM p690, PC-Clusters, and QCDOC.  ...  We define a benchmark suite for lattice QCD and report on benchmark results from several computer platforms.  ...  CRAY T3E-900 The CRAY T3E is a classic massively parallel computer. It has single CPU nodes and a three-dimensional torus network. The T3E architecture is rather well balanced.  ... 
doi:10.1016/s0920-5632(03)02731-2 fatcat:rzizjsx3lzhcdhxsiwktgcmwbq

Performance of the CRAY T3E multiprocessor

Ed Anderson, Jeff Brooks, Charles Grassl, Steve Scott
1997 Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97  
The CRAY T3E is a scalable shared-memory multiprocessor based on the DEC Alpha 21164 microprocessor.  ...  This paper reports our experiences with the CRAY T3E and presents a variety of performance measurements. Section 2 provides a brief overview of the system architecture.  ...  The reader is referred to [6] and [7] for further details on the CRAY T3E design.  ... 
doi:10.1145/509593.509632 dblp:conf/sc/AndersonBGS97 fatcat:3gk4aq4dh5bbhg2xr3ici33j74

Retrospective: improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

Norman P. Jouppi
1998 25 years of the international symposia on Computer architecture (selected papers) - ISCA '98  
Acknowledgments Keith Farkas provided helpful comments on a draft of this retrospective as well as insightful work on stream buffer enhancements.  ...  The Cray T3D and T3E also used stream buffers on the data side, as a replacement for a secondary cache.  ...  As a result of this study allocation filters were implemented in the Cray T3E. More recently, stream buffers have been studied in the context of more modern processor designs.  ... 
doi:10.1145/285930.285958 dblp:conf/isca/Jouppi98 fatcat:bssu2a4sfba2rluwnuv4adr6pe

A study of performance on SMP and distributed memory architectures using a shared memory programming model

Eugene D. Brooks, Karen H. Warren
1997 Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97  
The type qualifier declaration supports an abstract shared memory facility on distributed memory machines while making direct use of hardware support on shared memory architectures.  ...  Although the resulting shared memory programming model is portable, it does not remove the need to arrange for overlapped or blocked remote memory references on platforms that require these tuning measures  ...  Acknowledgments * Work performed under the auspices of the U.S. Department of Energy by the Lawrence Livermore National Laboratory under contract No.  ... 
doi:10.1145/509593.509637 dblp:conf/sc/BrooksW97 fatcat:62k6mlfeebgzfnp3zubqe4ticq

Running a code for lattice quantum chromodynamics efficiently on CRAY T3E systems [chapter]

N. Attig, S. Güsken, P. Lacock, Th. Lippert, K. Schilling, P. Ueberholz, J. Viehoff
1998 Lecture Notes in Computer Science  
We present a detailed analysis of the performance of the stabilized biconjugate gradient algorithm with preconditioning on massively parallel CRAY T3E systems.  ...  Efficient parallel Krylov subspace solvers play a vital role in the solution of these systems.  ...  Acknowledgements The authors gratefully acknowledge the computer time granted by the HLRZ on the CRAY T3E systems of the Research Centre Jülich. They would like to thank E.  ... 
doi:10.1007/bfb0037145 fatcat:aysdiz5nlbcz7e4xejubewec3i

Synchronization and communication in the T3E multiprocessor

Steven L. Scott
1996 Proceedings of the seventh international conference on Architectural support for programming languages and operating systems - ASPLOS-VII  
This paper discusses the Cray T3E multiprocessor, which is based on the DEC Alpha 21164 microprocessor.  ...  This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors.  ...  Credit for much of the work presented in this paper belongs to the other architects of the T3E, Steve Oberlin and Rick Kessler.  ... 
doi:10.1145/237090.237144 dblp:conf/asplos/Scott96 fatcat:bs5zz7ivcjeyjatcic6rifvw3i

Synchronization and communication in the T3E multiprocessor

Steven L. Scott
1996 ACM SIGOPS Operating Systems Review  
This paper discusses the Cray T3E multiprocessor, which is based on the DEC Alpha 21164 microprocessor.  ...  This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors.  ...  Credit for much of the work presented in this paper belongs to the other architects of the T3E, Steve Oberlin and Rick Kessler.  ... 
doi:10.1145/248208.237144 fatcat:lsx5ybe7qnaxxieylpqjk6m4jy

Synchronization and communication in the T3E multiprocessor

Steven L. Scott
1996 SIGPLAN notices  
This paper discusses the Cray T3E multiprocessor, which is based on the DEC Alpha 21164 microprocessor.  ...  This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors.  ...  Credit for much of the work presented in this paper belongs to the other architects of the T3E, Steve Oberlin and Rick Kessler.  ... 
doi:10.1145/248209.237144 fatcat:lnt6kuuwbnfcnnhinksa7lgrem

Direct numerical simulation of turbulence with a PC/linux cluster

G.-S. Karamanos, C. Evangelinos, R. C. Boes, R. M. Kirby, G. E. Karniadakis
1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99  
T3E.  ...  The comparison concentrates on CPU and communication performance. At the kernel level, BLAS libraries are used for CPU performance evaluation.  ...  Naval Oceanographic Office (NAVO), the AP3000 at Imperial College, the Center for Advanced Scientific Computation and Visualisation at Brown University, the IBM SP2 at the Center for Fluid Mechanics at  ... 
doi:10.1145/331532.331585 dblp:conf/sc/KaramanosEBKK99 fatcat:u27ruwmgzjderaocud5jekh3p4

Highly Optimized Code for Lattice Quantum Chromodynamics on the CRAY T3E [chapter]

N. Attig, S. Güsken, P. Lacock, Th. Lippert, K. Schilling, P. Ueberholz, J. Viehoff
1998 Advances in Parallel Computing  
Acknowledgements The authors gratefully acknowledge the computer time granted by the HLRZ on the CRAY T3E of the Research Centre Jülich. They would like to thank E.  ...  Anderson of SGI/Cray Research for his advice and efforts with respect to the assembler programming and R. Vogelsang from SGI GmbH/Cray Research for his continuous support.  ...  CRAY T3E architecture The CRAY T3E, which is the second generation of Cray Research MPP systems, is the ideal machine for this kind of application.  ... 
doi:10.1016/s0927-5452(98)80071-3 fatcat:dusyt7kk65h5vmelpqyjteaozm

Design and evaluation of a TOP100 Linux Super Cluster system

Niklas Edmundsson, Erik Elmroth, Bo Kågström, Markus Mårtensson, Mats Nylén, Åke Sandgren, Mattias Wadenstein
2004 Concurrency and Computation  
The system's utilization figures exceed 90%, i.e. all 240 processors are on average utilized over 90% of the time, 24 hours a day, seven days a week.  ...  In summary, this $500 000 system is extremely cost-effective and shows the performance one would expect of a large-scale supercomputing system with distributed memory architecture.  ...  Finally, we thank the anonymous referees for constructive comments on the first version of this manuscript.  ... 
doi:10.1002/cpe.787 fatcat:h2ulkmmqendhdfdb3f2eam6sry

Compression-based ray casting of very large volume data in distributed environments

C. Bajaj, Insung Ihm, Sanghun Park, Dongsub Song
2000 Proceedings Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region  
We report experimental results on both a Cray T3E and a PC/workstation cluster for the Visible Man dataset.  ...  Our method, based on data compression, attempts to enhance the rendering speedups by quickly reconstructing voxel data from local memory rather than expensively fetching them from remote memory spaces.  ...  Acknowledgements This paper was accomplished with the research fund provided by Korea Research Foundation under Support for Faculty Research Abroad.  ... 
doi:10.1109/hpc.2000.843533 fatcat:p33pz7vsxfgq5gcnkbiawjoc6y

Priority queues and sorting methods for parallel simulation

M.D. Grammatikakis, S. Liesche
2000 IEEE Transactions on Software Engineering  
The optimized message passing network simulator can process $ SHHK packet moves in one second, with an efficiency that exceeds $ SH percent for a few thousands packets on the Cray-T3E with 32 PEs.  ...  Although our concurrent implementations use the Cray-T3E ShMem library, portability can be derived from Open-MP or MPI-2 standard libraries, which will provide support for one-way communication and shared  ...  The majority of this research was carried out while the authors were with the Institute of Informatics at the University of Hildesheim, Germany.  ... 
doi:10.1109/32.846298 fatcat:ttsc6vx2qvh4bfhhitk5meohbi

Parallel scheduling of the PCG method for banded matrices rising from FDM/FEM

E.M. Ortigosa, L.F. Romero, J.I. Ramos
2003 Journal of Parallel and Distributed Computing  
For the computer architectures and number of processors employed in this study, it has been found that this implementation is more efficient than the standard one, and can be applied to narrow-band matrices  ...  In this paper, an analysis of the parallel implementation of this method on several computer architectures and for several programming paradigms is presented.  ...  Tichý of the Institute of Computer Science, Academy of Sciences of the Czech Republic, for the references, comments and proofs of the effects of finite-precision arithmetic on the stability of the PCG  ... 
doi:10.1016/s0743-7315(03)00121-7 fatcat:lskoaluv3ba3ngljhnemdl7qsy