An orthogonal multiprocessor for parallel scientific computations

K. Hwang, P.-S. Tseng, D. Kim
1989 IEEE Transactions on Computers  
In most cases, linear speedup can be achieved on the OMP system.  ...  Parallel algorithms being mapped include matrix arithmetic, linear system solver, FFT, array sorting, linear programming, and parallel PDE solutions.  ...  ACKNOWLEDGMENT The authors appreciate the valuable suggestions made by the anonymous referees during the long reviewing process.  ... 
doi:10.1109/12.8729 fatcat:sjzxxz4z2rdbpf4ulnwvo3ea4y

Compile-time partitioning and scheduling of parallel programs

Vivek Sarkar, John Hennessy
1986 SIGPLAN Notices  
Partitioning and scheduling techniques are necessary to implement parallel languages on multiprocessors.  ...  Multiprocessor performance is maximized when parallelism between tasks is optimally traded off with communication and synchronization overhead.  ...  Batcher's iterative merge-exchange sorting algorithm [4] on 100 integers. This is an excellent algorithm for parallel sorting.  ... 
doi:10.1145/13310.13313 fatcat:74luodw6k5cmnb625wehbeg5su

Parallel sorting by regular sampling

Hanmao Shi, Jonathan Schaeffer
1992 Journal of Parallel and Distributed Computing  
The algorithm reduces memory and bus contention, which many parallel sorting algorithms suffer from, by using a regular sampling of the data to ensure good pivot selection.  ...  On a variety of shared and distributed memory machines, the algorithm achieves better than half-linear speedups.  ...  Although considerable work has been done on the theory of parallel sorting and efficient implementations on SIMD architectures, good parallel performance on a variety of multiprocessor MIMD architectures  ... 
doi:10.1016/0743-7315(92)90075-x fatcat:fkxzpz3n65cs7lcckbp7cgefzm

High-performance DSP architectures for intelligence and control applications

1991 IEEE Control Systems  
This paper describes the architectural features of DSPs for intelligence and control applications, and the node configuration of the IX-n general-purpose neurocomputer, based on the commercially available  ...  High-precision control and fault-tolerance are achieved by exploiting the high-speed arithmetic, on-chip peripherals, direct memory access (DMA) controllers, multiprocessor support and bit-manipulation  ...  Acknowledgment The authors gratefully acknowledge the help of Rosemary Mattingley, and the support of the Department of Electrical and Computer Engineering, Drexel University and the Faculty of Electrical  ... 
doi:10.1109/37.88592 fatcat:vhs764wo7vghvphh2gfkv3chka

Optimized On-Chip-Pipelined Mergesort on the Cell/B.E [chapter]

Rikard Hultén, Christoph W. Kessler, Jörg Keller
2010 Lecture Notes in Computer Science  
fastest merge sort implementation on Cell.  ...  In this paper, we consider parallel mergesort on Cell/B.E. as a representative memory-intensive application in detail, and focus on the global merging phase, which is dominating the overall sorting time  ...  Pipelining), SSF (ePUMA), Vinnova, and CUGS. We thank Niklas Dahl and his colleagues from IBM Sweden for giving us access to their QS20 blade server.  ... 
doi:10.1007/978-3-642-15291-7_19 fatcat:lvnixmfiebg5jg3fhwdyhcbydu

Revision of Relational Joins for Multi-Core and Many-Core Architectures

Martin Krulis, Jakub Yaghob
2011 Databases, Texts, Specifications, Objects  
In this paper, we have focused on standard relational join problem from the perspective of current highly parallel architectures.  ...  Actual trend set by CPU manufacturers and recent development in the field of graphical processing units (GPUs) offered us the computational power of multi-core and many-core architectures.  ...  First approach widely used was the merge join. Both joined sets are sorted first and then merged in single pass.  ... 
dblp:conf/dateso/KrulisY11 fatcat:6oy3sb2pore4fdrknx62qsh4qm

FlexGrip: A soft GPGPU for FPGAs

Kevin Andryc, Murtaza Merchant, Russell Tessier
2013 2013 International Conference on Field-Programmable Technology (FPT)  
This architecture supports direct CUDA compilation to a binary which is executable on the FPGA-based GPGPU without hardware recompilation.  ...  Over the past decade, soft microprocessors and vector processors have been extensively used in FPGAs for a wide variety of applications.  ...  ACKNOWLEDGMENTS We thank L-3 KEO for their support and contributions. We also thank Xilinx for the donation of the ISE 14.2 toolkit and Modelsim SE 10.1 software.  ... 
doi:10.1109/fpt.2013.6718358 dblp:conf/fpt/AndrycMT13 fatcat:7ey67anaezbj7p7dgz2qtlnzty

1983 Index IEEE Transactions on Computers Vol. C-32

1983 IEEE Transactions on Computers  
., and Yuan-Chieh Chow. The analysis and design of some new sorting machines; T-CJul 83 677-683 Wojciechowski, Witold, and Anthony S. Wojcik.  ...  Marsan, Marco Ajmone, + , T-CJan 83 60-71 stability and performance of the R-ALOHA packet broadcast system. balanced merge transposition of matrix stored on sequential file.  ... 
doi:10.1109/tc.1983.1676190 fatcat:xsogjoynp5dt7mqu6dy4tiodfq

ParFORM: Recent development

M. Tentyukov, J.A.M. Vermaseren, H.M. Staudenmaier
2006 Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment  
We report on the status of our project of parallelization of the symbolic manipulation program FORM. We have now parallel versions of FORM running on Cluster- or SMP-architectures.  ...  At the end of a module the sorted streams of terms from all processors have to be merged to one final output stream again.  ...  Let us take the simplest algorithm is well suited for multithreaded process: each thread reads one row of the first matrix, one column of the second matrix and sums up the result of the multiplication  ... 
doi:10.1016/j.nima.2005.11.142 fatcat:wicctnd245hqjn4p7pwybepvne

ASIC Implementation of Various Sorting Techniques for Image Processing Applications

Malleswari Akurati, Asst. Professor, CVR College of Engineering/ ECE Department, Hyderabad, India
2019 CVR Journal of Science & Technology  
This paper addresses the design and analysis of various sorting algorithms, and its VLSI implementation based on a sorting network.  ...  The various sorting algorithms are Sinking sort, Merge sort and Library sort; all the three sorting algorithms are compared in terms of area, power and timing with a complete comparison table.  ...  Here both the I/O operations and the multiprocessors several computing operations are done concurrently.  ... 
doi:10.32377/cvrjst1608 fatcat:we5halldhzdsvojlhee3wpqfbq

Efficient gather and scatter operations on graphics processors

Bingsheng He, Naga K. Govindaraju, Qiong Luo, Burton Smith
2007 Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07  
We have evaluated our algorithms in sorting, hashing, and the sparse matrix-vector multiplication in comparison with their optimized CPU counterparts.  ...  Our results show that these optimizations yield 2-4X improvement on the GPU bandwidth utilization and 30-50% improvement on the response time.  ...  The authors thank the shepherd, Leonid Oliker, and the anonymous reviewers for their insightful suggestions, all of which improved this work.  ... 
doi:10.1145/1362622.1362684 dblp:conf/sc/HeGLS07 fatcat:wv2dh7micna6zkzjmwigvytwba

Toward Large-Scale Shared Memory Multiprocessing [chapter]

John K. Bennett, John B. Carter, Willy Zwaenepoel
1992 Scalable Shared Memory Multiprocessors  
Willow is distinguished from other shared memory multiprocessors by a layered memory organization that significantly reduces the impact of inclusion on the cache hierarchy and that exploits locality  ...  based on the expected or observed access behavior for the data stored in that line.  ...  ACKNOWLEDGEMENTS Other members of the Computer Systems Laboratory have participated in the development of many of the ideas that we have presented.  ... 
doi:10.1007/978-1-4615-3604-8_15 fatcat:k7qiuyjnbzc5bkx4eenkrzmvxq

GPU merge path

Oded Green, Robert McColl, David A. Bader
2012 Proceedings of the 26th ACM international conference on Supercomputing - ICS '12  
An efficient parallel merging algorithm partitions the sorted input arrays into sets of non-overlapping sub-arrays that can be independently merged on multiple cores.  ...  This approach demonstrates an average of 20X and 50X speedup over a sequential merge on the x86 platform for integer and floating point, respectively.  ...  Some of the new algorithms are based on a single sorting method such as the radix sort in [9].  ... 
doi:10.1145/2304576.2304621 dblp:conf/ics/GreenMB12 fatcat:hfcvue7qfnhllfrd2t4g7zotpu

High Performance, Energy Efficiency, and Scalability With GALS Chip Multiprocessors

Zhiyi Yu, B.M. Baas
2009 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
His research interests include high-performance and energy-efficient digital VLSI design, architectures, and processor interconnects, with an emphasis on many-core processors.  ...  We present an example mesh-connected GALS chip multiprocessor and show it has a less than 1% performance (throughput) reduction on average compared to the corresponding synchronous system for many DSP  ...  ACKNOWLEDGMENT The authors would like to thank E. Work, T. Mohsenin, and other VCL colleagues; R. Krishnamurthy, M. Anders, S. Mathew, and Y. P. Cheng.  ... 
doi:10.1109/tvlsi.2008.2001947 fatcat:zrlhdhuagfh4bh25iwr7q6kwqa

Empirical analysis of overheads in cluster environments

Brian K. Schmidt, Vaidy S. Sunderam
1994 Concurrency Practice and Experience  
and data handling interact with the PVM service provider on each host (i.e. the PVM daemon); collectively, the daemons on machines in the host pool emulate a virtual concurrent machine.  ...  are on the verge of increasing by one or two orders of magnitude with the advent of fiber optics.  ...  Figure 3(a): Matrix Multiplication using "Pipe-Multiply-Roll"; Figure 3(b): Algorithm Outline for Matrix Multiplication; Figure 4(a): Split-Sort-Merge Algorithm on 4 Tree-Connected Processors  ... 
doi:10.1002/cpe.4330060102 fatcat:zkhdwir7djcxpc2g7z2sfoxqt4