FAST

Changkyu Kim, Jatin Chhugani, Nadathur Satish, Eric Sedlar, Anthony D. Nguyen, Tim Kaldewey, Victor W. Lee, Scott A. Brandt, Pradeep Dubey
2010 Proceedings of the 2010 international conference on Management of data - SIGMOD '10  
FAST eliminates impact of memory latency, and exploits thread-level and data-level parallelism on both CPUs and GPUs to achieve 50 million (CPU) and 85 million (GPU) queries per second, 5X (CPU) and 1.7X  ...  In this paper, we present FAST, an extremely fast architecture sensitive layout of the index tree.  ...  In this paper, we present FAST (Fast Architecture Sensitive Tree) search algorithm that exploits high compute in modern processors for index tree traversal.  ... 
doi:10.1145/1807167.1807206 dblp:conf/sigmod/KimCSSNKLBD10 fatcat:cpc26e36xnft3owjv7npmn3z2e

Designing fast architecture-sensitive tree search on modern multicore/many-core processors

Changkyu Kim, Jatin Chhugani, Nadathur Satish, Eric Sedlar, Anthony D. Nguyen, Tim Kaldewey, Victor W. Lee, Scott A. Brandt, Pradeep Dubey
2011 ACM Transactions on Database Systems  
Designing fast architecture-sensitive tree search on modern multicore/many-core processors.  ...  FAST eliminates the impact of memory latency, and exploits thread-level and data-level parallelism on both CPUs and GPUs to achieve 50 million (CPU) and 85 million (GPU) queries per second for large trees  ...  CPUs and shared buffers on GPUs).  ... 
doi:10.1145/2043652.2043655 fatcat:aznq3gvf45g75goaxjno2bnj5u

A fast GPU algorithm for graph connectivity

Jyothish Soman, Kothapalli Kishore, P J Narayanan
2010 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)  
We also draw interesting observations on why PRAM algorithms, such as the Shiloach-Vishkin algorithm may not be a good fit for the GPU and how they should be modified.  ...  For instance, our implementation finds connected components of a graph of 10 million nodes and 60 million edges in about 500 milliseconds on a GPU, given a random edge list.  ...  The Shiloach-Vishkin algorithm as proposed [24] may not be quite suitable on modern architectures such as the GPU.  ... 
doi:10.1109/ipdpsw.2010.5470817 dblp:conf/ipps/SomanKN10 fatcat:njb65qatt5hd3dmdn6v3smwupm

Scalable fast multipole methods on distributed heterogeneous architectures

Qi Hu, Nail A. Gumerov, Ramani Duraiswami
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11  
We fundamentally reconsider implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture with multicore CPU(s) and one or more GPU accelerators, as well  ...  We first develop a single node version where the CPU part is parallelized using OpenMP and the GPU version via CUDA.  ...  On modern multicore and GPU architectures, this requires parallelization of the algorithm. The FMM has been sought to be parallelized almost since its invention; see e.g., [5, 6, 7, 8].  ... 
doi:10.1145/2063384.2063432 dblp:conf/sc/HuGD11 fatcat:dxmswho42vhp7pm5klcymnebx4

Jet: Fast quantum circuit simulations with parallel task-based tensor-network contraction [article]

Trevor Vincent, Lee J. O'Riordan, Mikhail Andrenkov, Jack Brown, Nathan Killoran, Haoyu Qi, Ish Dhand
2022 arXiv   pre-print
We demonstrate the advantages of our method by benchmarking our code on several Sycamore-53 and Gaussian boson sampling (GBS) supremacy circuits against other simulators.  ...  iii) the concurrent contraction of tensor networks on all available hardware.  ...  The authors thank SOSCIP for their computational resources and financial support. We acknowledge the computational resources and support from SciNet.  ... 
arXiv:2107.09793v3 fatcat:4hoswy5yrnagdo5zs5pzystj2i

QuickProbs—A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

Adam Gudyś, Sebastian Deorowicz, Jun-Tao Guo
2014 PLoS ONE  
We selected the two most time consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation.  ...  Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on quad-core PC equipped with high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than original CPU-parallel MSAProbs  ...  Author Contributions Conceived and designed the experiments: AG SD. Performed the experiments: AG. Analyzed the data: AG SD. Contributed reagents/materials/analysis tools: AG.  ... 
doi:10.1371/journal.pone.0088901 pmid:24586435 pmcid:PMC3934876 fatcat:wfhb4nqdsre43aegdai2lka5t4

Fast Automatic Heuristic Construction Using Active Learning [chapter]

William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather
2015 Lecture Notes in Computer Science  
We demonstrate this technique by automatically constructing a model to determine on which device to execute four parallel programs at differing problem dimensions for a representative CPU-GPU based heterogeneous  ...  Our approach, on the other hand, uses active learning to select and only focus on the most useful training examples.  ...  Our method then searches for an input for which the intermediate models or heuristics most disagree on whether it should be run on the CPU or the GPU.  ... 
doi:10.1007/978-3-319-17473-0_10 fatcat:gwcklt44kvcgfhgpirsh76lvne

Fast parallel GPU-sorting using a hybrid algorithm

Erik Sintorn, Ulf Assarsson
2008 Journal of Parallel and Distributed Computing  
It is 6 times faster than single CPU quicksort, and 10% faster than the recent GPU-based radix sort.  ...  This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achieves high speed by efficiently utilizing the parallelism of the GPU throughout the whole algorithm.  ...  Meanwhile, splitting the list into too many parts would lead to longer binary searches for each bucketsort-thread and more traffic between the CPU and GPU.  ... 
doi:10.1016/j.jpdc.2008.05.012 fatcat:n5i3ukuba5ahda2tqwkt7z7l7a

Fast k-NNG Construction with GPU-Based Quick Multi-Select

Ivan Komarov, Ali Dashti, Roshan M. D'Souza, Moncho Gomez-Gesteira
2014 PLoS ONE  
Benchmarks show significant improvement over state-of-the-art implementations of the k-NN search on GPUs.  ...  Our optimization makes clever use of warp voting functions available on the latest GPUs along with user-controlled cache.  ...  For low dimensional data-sets, there are a variety of indexing data structures such as kd-trees [9], BBD-trees [10], random-projection trees (rp-trees) [11], and hashing based on locally sensitive  ... 
doi:10.1371/journal.pone.0092409 pmid:24809341 pmcid:PMC4014471 fatcat:mcwn2t4adjhz7bqdob2uhtl6a4

Index Search Algorithms for Databases and Modern CPUs [article]

Florian Gross
2017 arXiv   pre-print
Over the years, many different indexing techniques and search algorithms have been proposed, including CSS-trees, CSB+ trees, k-ary binary search, and fast architecture sensitive tree search.  ...  We show how to combine index compilation with previous approaches, such as binary tree search, cache-sensitive tree search, and the architecture-sensitive tree search presented by Kim et al.  ...  Fast Architecture Sensitive Tree Search Fast architecture sensitive tree search (FAST, [KCS+10]) unifies the optimality properties of CSS-tree search and k-ary search.  ... 
arXiv:1706.06697v1 fatcat:mlxplwmmpbgk3dsohcrhhpc2ki

Applications and Techniques for Fast Machine Learning in Science [article]

Allison McCarn Deiana, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini (+74 others)
2021 arXiv   pre-print
The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for  ...  training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms.  ...  hardware, e.g., CPU, GPU, ASIC, and FPGA.  ... 
arXiv:2110.13041v1 fatcat:cvbo2hmfgfcuxi7abezypw2qrm

Applying Deep Learning to Fast Radio Burst Classification

Liam Connor, Joeri van Leeuwen
2018 Astronomical Journal  
Upcoming Fast Radio Burst (FRB) surveys will search ∼10^3 beams on sky with very high duty cycle, generating large numbers of single-pulse candidates.  ...  We construct a tree-like deep neural network (DNN) that takes multiple or individual data products as input (e.g. dynamic spectra and multi-beam detection information) and trains on them simultaneously  ...  We thank Emily Petroff for helpful comments on the manuscript, as well as our anonymous referee for valuable feedback.  ... 
doi:10.3847/1538-3881/aae649 fatcat:pcq3h6e6jvg25gd7bjur3sqmkm

Jet: Fast quantum circuit simulations with parallel task-based tensor-network contraction

Trevor Vincent, Lee J. O'Riordan, Mikhail Andrenkov, Jack Brown, Nathan Killoran, Haoyu Qi, Ish Dhand
2022 Quantum  
We demonstrate the advantages of our method by benchmarking our code on several Sycamore-53 and Gaussian boson sampling (GBS) supremacy circuits against other simulators.  ...  iii) the concurrent contraction of tensor networks on all available hardware.  ...  The authors thank SOSCIP for their computational resources and financial support. We acknowledge the computational resources and support from SciNet.  ... 
doi:10.22331/q-2022-05-09-709 fatcat:4upmghf7wrg4vajckhk3kajjym

Fast Local Tone Mapping, Summed-Area Tables and Mesopic Vision Simulation [chapter]

Marcos Slomp, Michihiro Mikamo, Kazufumi Kane
2012 Computer Graphics  
to photographs and films.  ...  Display devices, on the other hand, are much more restrictive, since there is no way to dynamically improve or alter their inherently fixed dynamic range capabilities.  ...  As the expected clash between GPU and multi-core CPU architectures comes to a close, such memory access constraints tend to disappear.  ... 
doi:10.5772/37288 fatcat:eb73wm3khnarbihmm5jjdt5c3m

Corrfunc — A Suite of Blazing Fast Correlation Functions on the CPU [article]

Manodeep Sinha, Lehman H. Garrison
2019 arXiv   pre-print
The improved performance of Corrfunc arises from both efficient algorithms as well as software design that suits the underlying hardware of modern CPUs.  ...  Corrfunc is designed to be both user-friendly and fast and is publicly available at https://github.com/manodeep/Corrfunc.  ...  Mao and A. Hearin for constructive discussion about Corrfunc over the years. MS would particularly like to thank J.  ... 
arXiv:1911.03545v1 fatcat:a3ydapxxc5culkhpe4ndwedihe