Local Load Balancing for Data-parallel Branch-and-bound [chapter]

Dominik Henrich
1994 Massively Parallel Processing Applications and Development  
The search itself uses a depth-first strategy.  ...  Therefore, all solutions are searched and a predefined pruning threshold (first heuristic incumbent) is used.  ... 
doi:10.1016/b978-0-444-81784-6.50032-4 fatcat:ejjw22ee3bbkvgrhngodead5v4

SIMD Vectorization of Straight Line FFT Code [chapter]

Stefan Kral, Franz Franchetti, Juergen Lorenz, Christoph W. Ueberhuber
2003 Lecture Notes in Computer Science  
Additionally, a special compiler backend is introduced which is able to (i) utilize particular code properties, (ii) generate optimized address computation, and (iii) apply specialized register allocation  ...  This paper presents compiler technology that targets general purpose microprocessors augmented with SIMD execution units for exploiting data level parallelism.  ...  Upon failure, the last alternative saved on the stack is chosen, resuming the search at that point. This implements a depth-first search with chronological backtracking.  ... 
doi:10.1007/978-3-540-45209-6_39 fatcat:32xmpbzlyvaixnmbfwozss5ebu

Block-Parallel IDA* for GPUs (Extended Manuscript) [article]

Satoru Horie, Alex Fukunaga
2017 arXiv   pre-print
On the 15-puzzle, BPIDA* on a NVIDIA GRID K520 with 1536 CUDA cores achieves a speedup of 4.98 compared to a highly optimized sequential IDA* implementation on a Xeon E5-2670 core.  ...  We propose Block-Parallel IDA* (BPIDA*), which assigns the search of a subtree to a block (a group of threads with access to fast shared memory) rather than a thread.  ...  The BPDFS function is similar to a standard, sequential f-limited depth-first search, but in each iteration of the repeat-until loop in lines 4-16 (Alg. 1), a warp performs the fetch-evaluate-expand cycle  ... 
arXiv:1705.02843v1 fatcat:tie5kpiwpvdqzjhecoocmkqgea

FFT Compiler Techniques [chapter]

Stefan Kral, Franz Franchetti, Juergen Lorenz, Christoph W. Ueberhuber, Peter Wurzinger
2004 Lecture Notes in Computer Science  
, and IBM's SIMD operations implemented on the new processors of the BlueGene/L supercomputer. The paper introduces a special compiler backend for Intel P4's SSE 2 and AMD's 3DNow!  ...  Numerical applications are accelerated by automatically vectorizing blocks of straight line code to be run on processors featuring two-way short vector SIMD extensions like Intel's SSE 2 on Pentium 4,  ...  The Vectorization Algorithm Fftw-Gel's vectorization algorithm implements a depth first search with chronological backtracking.  ... 
doi:10.1007/978-3-540-24723-4_15 fatcat:iftd7t3c6rgengdgu2q6xjsxe4

Adaptive Collapsing on Bounding Volume Hierarchies for Ray-Tracing [article]

André Susano Pinto
2010 Eurographics State of the Art Reports  
This paper presents a new heuristic in the area, based on collapsing some nodes in order to achieve a smaller expected number of node-tests.  ...  As an example, on a 4-SIMD machine test_group4 could be used as cost function.  ... 
doi:10.2312/egsh.20101051 fatcat:4grvbbd22neitcpul4r4sp5cui

Parallel Minimax Tree Searching on GPU [chapter]

Kamil Rocki, Reiji Suda
2010 Lecture Notes in Computer Science  
Moreover, a method of minimizing warp divergence and performance degradation is described. The paper contains both the results of test performed on multiple CPUs and GPUs.  ...  The paper describes results of minimax tree searching algorithm implemented within CUDA platform. The problem regards move choice strategy in the game of Reversi.  ...  CPU First, the basic parallelized algorithm was tested on an 8-way Xeon E5540 machine to check the correctness and to obtain a reference for further GPU results.  ... 
doi:10.1007/978-3-642-14390-8_47 fatcat:vsgtohffubf2baxejohol7u6kq

Parallel graph algorithms

Michael J. Quinn, Narsingh Deo
1984 ACM Computing Surveys  
The algorithms are based on a number of models of parallel computation, including systolic arrays, associative processors, array processors, and multiple CPU computers.  ...  Algorithms and data structures developed to solve graph problems on parallel computers are surveyed.  ...  If p = 1, the result is a depth-first search.  ... 
doi:10.1145/2514.2515 fatcat:vcxpxbxty5cixamq6pplo4i66m

Efficient Utilization of SIMD Extensions

F. Franchetti, S. Kral, J. Lorenz, C.W. Ueberhuber
2005 Proceedings of the IEEE  
This paper describes special purpose compiler technology that supports automatic performance tuning on machines with vector instructions.  ...  The studied SIMD instruction set extensions include Intel's SSE family, AMD's 3DNow!, Motorola's AltiVec, and IBM's BlueGene/L SIMD instructions.  ...  Watson Research Center, making it possible to work on the BlueGene/L prototype and leading to a very pleasant cooperation.  ... 
doi:10.1109/jproc.2004.840491 fatcat:4x6cjeyqlzfdjoau4et7rzoznm

Robust SIMD: Dynamically Adapted SIMD Width and Multi-Threading Depth

Jiayuan Meng, Jeremy W. Sheaffer, Kevin Skadron
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium  
to a conventional SIMD architecture.  ...  In fact, because the above effects are subject to runtime dynamics, a fixed combination of SIMD width and multi-threading depth no longer works ubiquitously across diverse applications or when cache capacities  ...  IIS-0612049 and CNS-0615277, a grant from Intel Research, a professor partnership award from NVIDIA Research, and an NVIDIA Ph.D. fellowship (Meng).  ... 
doi:10.1109/ipdps.2012.20 dblp:conf/ipps/MengSS12 fatcat:hpa3pvru7zhi5of72u4d6d4smu

Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms

Timothy Furtak, José Nelson Amaral, Robert Niewiadomski
2007 Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07  
The improvements provided are orthogonal to the gains obtained through empirical search for a suitable sorting algorithm [11].  ...  Wall-clock performance of d-heaps is improved by up to 39% using a similar technique.  ...  If the maximum search depth in any one phase reaches 3, then that task is further subdivided into moving the first 2 elements into a vector, the next 2 elements into another, and finally combining them  ... 
doi:10.1145/1248377.1248436 dblp:conf/spaa/FurtakAN07 fatcat:5rcjmu67xnfkhfxi3fy7ab24cm

Vectorization techniques for the Blue Gene/L double FPU

J. Lorenz, S. Kral, F. Franchetti, C. W. Ueberhuber
2005 IBM Journal of Research and Development  
This paper presents vectorization techniques tailored to meet the specifics of the two-way single-instruction multiple-data (SIMD) double-precision floating-point unit (FPU), which is a core element of  ...  This paper focuses on the general-purpose basic-block vectorization and optimization methods as they are incorporated in the Vienna MAP vectorizer and optimizer.  ...  Realization of the vectorization engine The MAP vectorization algorithm is implemented using a depth-first search engine with chronological backtracking.  ... 
doi:10.1147/rd.492.0437 fatcat:vdkdszwotvc5fg6l6r2n5de4pu

Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP

Alexandru Voicu
2014 International Journal of Computer Applications  
The purpose of this paper is two-fold, on one hand pursuing an in-depth look at GPU hardware and its characteristics, and on the other demonstrating that portable, generic, mathematically grounded programming  ...  This represents the first implementation of an algorithm from the Ant Colony Optimisation (ACO) family using C++ AMP, whilst at the same time being one of the first uses of the latter programming environment  ...  First, researchers have moved from exact algorithms to approximating ones, with meta-heuristics being a noteworthy example in recent years.  ... 
doi:10.5120/17529-8100 fatcat:vc3r5elwpjek5kkj4xyxzap4xm

A parallel algorithm for graph matching and its MasPar implementation

R. Allen, L. Cinque, S. Tanimoto, L. Shapiro, D. Yasuda
1997 IEEE Transactions on Parallel and Distributed Systems  
We demonstrate that the answer is yes, and in particular, that graph matching has a natural and efficient implementation on SIMD machines.  ...  Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available.  ...  An SIMD processor is substantially less flexible; the speed-up for a search (in the sense of real time) compared to a serial machine is a fraction of the number of processors working on the search.  ... 
doi:10.1109/71.598276 fatcat:jqafoa57q5airmgboahlszizk4

GKLEE

Guodong Li, Peng Li, Geof Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, Sreeranga P. Rajan
2012 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP '12  
We describe GKLEE's test-case reduction heuristics, and the resulting scalability improvement for a given coverage target.  ...  Existing tools based on conservative static analysis or conservative modeling of SIMD concurrency generate false alarms resulting in wasted bug-hunting.  ...  With this view, it is natural that GKLEE supports facilities such as state caching and search heuristics (e.g. depth-first, weighted-random, bump-merging, etc.), all of which are inherited from KLEE.  ... 
doi:10.1145/2145816.2145844 dblp:conf/ppopp/LiLSGGR12 fatcat:dqcea4ws75cavme7x5lwvser7q

GKLEE

Guodong Li, Peng Li, Geof Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, Sreeranga P. Rajan
2012 SIGPLAN notices  
We describe GKLEE's test-case reduction heuristics, and the resulting scalability improvement for a given coverage target.  ...  Existing tools based on conservative static analysis or conservative modeling of SIMD concurrency generate false alarms resulting in wasted bug-hunting.  ...  With this view, it is natural that GKLEE supports facilities such as state caching and search heuristics (e.g. depth-first, weighted-random, bump-merging, etc.), all of which are inherited from KLEE.  ... 
doi:10.1145/2370036.2145844 fatcat:2efpoqfqdjcpfcj3htfelxzdxa