Extending Modulo Scheduling with Memory Reference Merging
[chapter]
1999
Lecture Notes in Computer Science
Experiments on the Cray T3E demonstrate the benefits of memory reference merging. ...
This technique has been used over several years on the Cray T3E block scheduler, and was later generalized to the Cray T3E software pipeliner. ...
Currently, software prefetching on the Cray T3E is limited to library routines that were developed and optimized in assembly code. ...
doi:10.1007/978-3-540-49051-7_19
fatcat:cpq6dedt7zasvbaknitzovp75y
Benchmarking computer platforms for lattice QCD applications
2004
Nuclear Physics B - Proceedings Supplements
The platforms considered are apeNEXT, CRAY T3E, Hitachi SR8000, IBM p690, PC-Clusters, and QCDOC. ...
We define a benchmark suite for lattice QCD and report on benchmark results from several computer platforms. ...
CRAY T3E-900 The CRAY T3E is a classic massively parallel computer. It has single CPU nodes and a three-dimensional torus network. The T3E architecture is rather well balanced. ...
doi:10.1016/s0920-5632(03)02731-2
fatcat:rzizjsx3lzhcdhxsiwktgcmwbq
Performance of the CRAY T3E multiprocessor
1997
Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97
The CRAY T3E is a scalable shared-memory multiprocessor based on the DEC Alpha 21164 microprocessor. ...
This paper reports our experiences with the CRAY T3E and presents a variety of performance measurements. Section 2 provides a brief overview of the system architecture. ...
The reader is referred to [6] and [7] for further details on the CRAY T3E design. ...
doi:10.1145/509593.509632
dblp:conf/sc/AndersonBGS97
fatcat:3gk4aq4dh5bbhg2xr3ici33j74
Retrospective: improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
1998
25 years of the international symposia on Computer architecture (selected papers) - ISCA '98
Acknowledgments Keith Farkas provided helpful comments on a draft of this retrospective as well as insightful work on stream buffer enhancements. ...
The Cray T3D and T3E also used stream buffers on the data side, as a replacement for a secondary cache. ...
As a result of this study allocation filters were implemented in the Cray T3E. More recently, stream buffers have been studied in the context of more modern processor designs. ...
doi:10.1145/285930.285958
dblp:conf/isca/Jouppi98
fatcat:bssu2a4sfba2rluwnuv4adr6pe
A study of performance on SMP and distributed memory architectures using a shared memory programming model
1997
Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97
The type qualifier declaration supports an abstract shared memory facility on distributed memory machines while making direct use of hardware support on shared memory architectures. ...
Although the resulting shared memory programming model is portable, it does not remove the need to arrange for overlapped or blocked remote memory references on platforms that require these tuning measures ...
Acknowledgments * Work performed under the auspices of the U.S. Department of Energy by the Lawrence Livermore National Laboratory under contract No. ...
doi:10.1145/509593.509637
dblp:conf/sc/BrooksW97
fatcat:62k6mlfeebgzfnp3zubqe4ticq
Running a code for lattice quantum chromodynamics efficiently on CRAY T3E systems
[chapter]
1998
Lecture Notes in Computer Science
We present a detailed analysis of the performance of the stabilized biconjugate gradient algorithm with preconditioning on massively parallel CRAY T3E systems. ...
Efficient parallel Krylov subspace solvers play a vital role in the solution of these systems. ...
Acknowledgements The authors gratefully acknowledge the computer time granted by the HLRZ on the CRAY T3E systems of the Research Centre Jülich. They would like to thank E. ...
doi:10.1007/bfb0037145
fatcat:aysdiz5nlbcz7e4xejubewec3i
Synchronization and communication in the T3E multiprocessor
1996
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems - ASPLOS-VII
This paper discusses the Cray T3E multiprocessor, which is based on the DEC Alpha 21164 microprocessor. ...
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. ...
Credit for much of the work presented in this paper belongs to the other architects of the T3E, Steve Oberlin and Rick Kessler. ...
doi:10.1145/237090.237144
dblp:conf/asplos/Scott96
fatcat:bs5zz7ivcjeyjatcic6rifvw3i
Synchronization and communication in the T3E multiprocessor
1996
ACM SIGOPS Operating Systems Review
This paper discusses the Cray T3E multiprocessor, which is based on the DEC Alpha 21164 microprocessor. ...
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. ...
Credit for much of the work presented in this paper belongs to the other architects of the T3E, Steve Oberlin and Rick Kessler. ...
doi:10.1145/248208.237144
fatcat:lsx5ybe7qnaxxieylpqjk6m4jy
Synchronization and communication in the T3E multiprocessor
1996
SIGPLAN notices
This paper discusses the Cray T3E multiprocessor, which is based on the DEC Alpha 21164 microprocessor. ...
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. ...
Credit for much of the work presented in this paper belongs to the other architects of the T3E, Steve Oberlin and Rick Kessler. ...
doi:10.1145/248209.237144
fatcat:lnt6kuuwbnfcnnhinksa7lgrem
Direct numerical simulation of turbulence with a PC/linux cluster
1999
Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99
T3E. ...
The comparison concentrates on CPU and communication performance. At the kernel level, BLAS libraries are used for CPU performance evaluation. ...
Naval Oceanographic Office (NAVO), the AP3000 at Imperial College, the Center for Advanced Scientific Computation and Visualisation at Brown University, the IBM SP2 at the Center for Fluid Mechanics at ...
doi:10.1145/331532.331585
dblp:conf/sc/KaramanosEBKK99
fatcat:u27ruwmgzjderaocud5jekh3p4
Highly Optimized Code for Lattice Quantum Chromodynamics on the CRAY T3E
[chapter]
1998
Advances in Parallel Computing
Acknowledgements The authors gratefully acknowledge the computer time granted by the HLRZ on the CRAY T3E of the Research Centre Jülich. They would like to thank E. ...
Anderson of SGI/Cray Research for his advice and efforts with respect to the assembler programming and R. Vogelsang from SGI GmbH/Cray Research for his continuous support. ...
CRAY T3E architecture The CRAY T3E, which is the second generation of Cray Research MPP systems, is the ideal machine for this kind of application. ...
doi:10.1016/s0927-5452(98)80071-3
fatcat:dusyt7kk65h5vmelpqyjteaozm
Design and evaluation of a TOP100 Linux Super Cluster system
2004
Concurrency and Computation
The system's utilization figures exceed 90%, i.e. all 240 processors are on average utilized over 90% of the time, 24 hours a day, seven days a week. ...
In summary, this $500 000 system is extremely cost-effective and shows the performance one would expect of a large-scale supercomputing system with distributed memory architecture. ...
Finally, we thank the anonymous referees for constructive comments on the first version of this manuscript. ...
doi:10.1002/cpe.787
fatcat:h2ulkmmqendhdfdb3f2eam6sry
Compression-based ray casting of very large volume data in distributed environments
2000
Proceedings Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region
We report experimental results on both a Cray T3E and a PC/workstation cluster for the Visible Man dataset. ...
Our method, based on data compression, attempts to enhance the rendering speedups by quickly reconstructing voxel data from local memory rather than expensively fetching them from remote memory spaces. ...
Acknowledgements This paper was accomplished with the research fund provided by Korea Research Foundation under Support for Faculty Research Abroad. ...
doi:10.1109/hpc.2000.843533
fatcat:p33pz7vsxfgq5gcnkbiawjoc6y
Priority queues and sorting methods for parallel simulation
2000
IEEE Transactions on Software Engineering
The optimized message passing network simulator can process $ SHHK packet moves in one second, with an efficiency that exceeds $ SH percent for a few thousands packets on the Cray-T3E with 32 PEs. ...
Although our concurrent implementations use the Cray-T3E ShMem library, portability can be derived from Open-MP or MPI-2 standard libraries, which will provide support for one-way communication and shared ...
The majority of this research was carried out while the authors were with the Institute of Informatics at the University of Hildesheim, Germany. ...
doi:10.1109/32.846298
fatcat:ttsc6vx2qvh4bfhhitk5meohbi
Parallel scheduling of the PCG method for banded matrices rising from FDM/FEM
2003
Journal of Parallel and Distributed Computing
For the computer architectures and number of processors employed in this study, it has been found that this implementation is more efficient than the standard one, and can be applied to narrow-band matrices ...
In this paper, an analysis of the parallel implementation of this method on several computer architectures and for several programming paradigms is presented. ...
Tichý of the Institute of Computer Science, Academy of Sciences of the Czech Republic, for the references, comments and proofs of the effects of finite-precision arithmetic on the stability of the PCG ...
doi:10.1016/s0743-7315(03)00121-7
fatcat:lskoaluv3ba3ngljhnemdl7qsy