2,186 Hits in 5.1 sec

The Conveyor - an Interconnection Device for Parallel Volumetric Transformations [article]

Daniel Cohen, Reuven Bakalash
1991 Eurographics Workshop on Graphics Hardware  
This paper presents the conveyor, an interconnection device which operates on a 3D skewed memory space and provides the capability of parallel volumetric transformation.  ...  We are grateful to Professor Arie Kaufman for his support and advice. We also wish to acknowledge the careful review and helpful comments from our colleague Roni Yagel.  ...  Multiple buses connecting adjacent modules have a simple topology, but the performance is linear with the number of shifts.  ... 
doi:10.2312/eggh/eggh91/077-085 dblp:conf/egh/CohenB91 fatcat:3tdb37mcqvckjjrjshe7pcfatq

Exploitation of optical interconnects in future server architectures

A. F. Benner, M. Ignatowski, J. A. Kash, D. M. Kuchta, M. B. Ritter
2005 IBM Journal of Research and Development  
Optical fiber links have become ubiquitous for links at the metropolitan and wide area distance scales, and have become common alternatives to electrical links in local area networks and cluster networks  ...  For these links closer to processors, issues such as packaging, power dissipation, and components cost assume increasing importance along with link bandwidth and link distance.  ...  Storage area network (SAN) links are used to connect servers with storage systems, and are typically useful when the stored data are shared among multiple servers.  ... 
doi:10.1147/rd.494.0755 fatcat:kn642j4yczc2nja77r2ejxo6xa

Design of a 3-D fully depleted SOI computational RAM

J.C. Koob, D.A. Leder, R.J. Sung, T.L. Brandon, D.G. Elliott, B.F. Cockburn, L. McIlrath
2005 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
Index Terms: Memory architecture, multiprocessor interconnection, parallel processing, silicon-on-insulator (SOI) technology.  ...  The architecture can be augmented with a nearest-neighbor physical 3-D communications network that can substantially reduce interconnect lengths and relieve routing congestion.  ...  Fossum at the University of Florida for their assistance with SOI technology. The authors also would like to thank A. Alimohammad, C. Gray, C. Giasson, C. Joly, J. Lamoureux, M. Redeker, R.  ... 
doi:10.1109/tvlsi.2004.842890 fatcat:ziu5daxyvbh2nbit5ars2su3m4

High-Performance DSP Processors for Intelligence Applications

Vinni Sharma
2015 IJIREEICE  
DSPs provide high computing power by employing a high level of on-chip parallelism, integrated hardware multipliers, carefully tailored instruction sets, memory organization schemes, hardware support for  ...  networks.  ...  The system of 64 DSPs with 32Mbytes of memory storage allocated to each processor may provide support for approximately 250M interconnections.  ... 
doi:10.17148/ijireeice.2015.3732 fatcat:gck3ih7evzhfbjg6wpkft47jsi

Star: A Local Network System for Real-Time Management of Imagery Data

Chuan-Lin Wu, Tse-Yun Feng, Min-Chang Lin
1982 IEEE Transactions on Computers  
This project serves as a research tool for using current and projected technology to innovate better schemes for parallel image processing.  ...  A model for comparing cost-effectiveness among Starnet, crossbar, and multiple buses is included.  ...  Bus-Structure Network Analysis: A network with log2 N buses is used for comparison, where N is equal to the number of computer nodes to be connected.  ... 
doi:10.1109/tc.1982.1675901 fatcat:bab67cqjnjgmxe5h53d6xwdmpe

Real-Time Power System Dynamics Simulation Using a Parallel Block-Jacobi Preconditioned Newton-GMRES Scheme

Shrirang Abhyankar, Alexander J. Flueck
2012 2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
We present a parallel linear solution scheme using the Krylov-subspace-based iterative solver GMRES with a Block-Jacobi preconditioner that shows promising prospects for real-time dynamics simulation.  ...  Results obtained for both stable and unstable operating conditions show that real-time simulation speed can be realized by using the proposed parallel linear solution scheme.  ...  For example, the user can change the linear solution scheme from GMRES to direct LU factorization, or can change the matrix storage type, or preconditioners, via run-time options.  ... 
doi:10.1109/sc.companion.2012.48 dblp:conf/sc/AbhyankarF12 fatcat:47xinaa77vc67ff5ill3sj4dlu

Design of fault-tolerant associative processors

Behrooz Parhami, Algirdas Avizienis
1973 Proceedings of the 1st annual symposium on Computer architecture - ISCA '73  
Several schemes for reconfiguration are discussed which allow us to establish an appropriate intercommunication pattern after replacing the faulty module by a spare.  ...  Associative processors are divided into four classes of fully parallel, bit-serial, word-serial, and block-oriented systems.  ...  A fully parallel associative memory with only exact-match search operation and without masking capability can be protected against storage errors by using a code with a minimum distance of k in conjunction  ... 
doi:10.1145/800123.803979 dblp:conf/isca/ParhamiA73 fatcat:ykr6rmcbv5bw5ljfyt6kp4mozy

Design of fault-tolerant associative processors

Behrooz Parhami, Algirdas Avizienis
1973 SIGARCH Computer Architecture News  
Several schemes for reconfiguration are discussed which allow us to establish an appropriate intercommunication pattern after replacing the faulty module by a spare.  ...  Associative processors are divided into four classes of fully parallel, bit-serial, word-serial, and block-oriented systems.  ...  A fully parallel associative memory with only exact-match search operation and without masking capability can be protected against storage errors by using a code with a minimum distance of k in conjunction  ... 
doi:10.1145/633642.803979 fatcat:2q3kixrlhnawfnqk2eok35i4kq

On the Efficiency of Register File versus Broadcast Interconnect for Collective Communications in Data-Parallel Hardware Accelerators

Ardavan Pedram, Andreas Gerstlauer, Robert A. van de Geijn
2012 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing  
In this paper, we present how broadcast buses can eliminate the use of power-hungry multi-ported register files in the context of data-parallel hardware accelerators for linear algebra operations.  ...  We compare a broadcast-bus-based architecture with conventional SIMD, 2D-SIMD and flat register file for these operations in terms of area and energy efficiency.  ...  We compare our design with typical SIMD cores with equivalent data parallelism and with L1 and L2 caches that amount to an equivalent aggregate storage space.  ... 
doi:10.1109/sbac-pad.2012.35 dblp:conf/sbac-pad/PedramGG12 fatcat:uprnlnt7ffarxc4j6zwv7omdru

Linear array implementation of the EM algorithm for PET image reconstruction

K. Rajan, L.M. Patnaik, J. Ramakrishna
1995 IEEE Transactions on Nuclear Science  
In addition, a large memory is required for the storage of the image, projection data, and the probability matrix.  ...  The novelty of the scheme is its simplicity. The linear array topology is expandable with a larger number of PE's.  ...  ACKNOWLEDGMENT The authors would like to thank the reviewers for their valuable comments and suggestions.  ... 
doi:10.1109/23.467723 fatcat:icguizcqavc6tc2ghubjl3oqii

COTS Sparse Matrix Utilization in Distribution Power Flow Applications [chapter]

Dino Ablakovic, Izudin Dzafic, Hans-Theo Neisius
2011 Lecture Notes in Electrical Engineering  
system sizes and for different network topologies.  ...  Analysis is given for the parallel direct sparse solver utilized in unbalanced and unsymmetrical Distribution System Power Flow solution, which maximizes parallelization on multi-core processors.  ...  Networks became much larger with the need for interconnection between different networks on distribution and medium voltage level.  ... 
doi:10.1007/978-94-007-2792-2_2 fatcat:7azvmuufwrf6xoavwwdnai47ja

Efficient Implementation of Fast Fourier Transform Using NOC

Lalitha Bhavani.Maddipati
2012 IOSR Journal of Electronics and Communication Engineering  
In this paper, improved algorithms for radix-8 FFT are presented. Various schemes have been proposed for computing FFT.  ...  Memory: Memory can be divided into local memory in the form of register files inside each processor element and into memory banks with storage capacities in the range from hundreds to thousands of words  ... 
doi:10.9790/2834-0321419 fatcat:rgnxn5djnbhofa5lbwa6zbb3iu

Finite Projective Geometry based Fast, Conflict-free Parallel Matrix Computations [article]

Shreeniwas Sapre, Hrishikesh Sharma, Abhishek Patil, B. S. Adiga, Sachin Patkar
2011 arXiv pre-print
For the problem of parallel LU/Cholesky decomposition of general matrices, the approach is motivated by the recently published scheme for interconnects of distributed systems, perfect difference networks  ...  The problem of designing approaches for parallelizing these computations, to obtain speedups as close as possible to the limit set by Amdahl's law, has been researched continuously.  ...  Acknowledgements: The work was carried out with support from Innovation Labs, Tata Consultancy Services Ltd, Bangalore, under project ID 1009295.  ... 
arXiv:1107.1127v1 fatcat:x2hxc6nzxnc4xep36l6v4y4fve

IEEE P1596, a scalable coherent interface for gigabyte/sec multiprocessor applications

D.B. Gustavson
1989 IEEE Transactions on Nuclear Science  
SCI goals include a minimum bandwidth of 1 GByte/sec per processor, efficient support of a coherent distributed-cache image of shared memory; and support for segmentation, bus repeaters and general switched  ...  the more costly switch networks.  ...  Such a scheme cannot be generalized to highly parallel systems. (In fact it cannot generally work across Fastbus Segment Interconnects, so coherency domains are limited  ... 
doi:10.1109/23.34555 fatcat:4gi2ak3avngpdbef3cokrurtxy

Optimal matrix multiplication on fault-tolerant VLSI arrays

P.J. Varman, I.V. Ramakrishnan
1989 IEEE Transactions on Computers  
the desired array structure by appropriate switch settings on buses running parallel to the PE's.  ...  The Diogenes methodology, proposed by Rosenberg, for the design of easily testable and configurable fault-tolerant VLSI arrays, results in collinear layouts of processors (PE's) that are configured into  ...  The processors are interconnected by a system of buses running parallel to the line along which the processors are arranged.  ... 
doi:10.1109/12.16505 fatcat:m3usqgfthrchxfjwokj33bdela