
Improving main memory hash joins on Intel Xeon Phi processors

Saurabh Jha, Bingsheng He, Mian Lu, Xuntao Cheng, Huynh Phung Huynh
2015 Proceedings of the VLDB Endowment  
Modern processor technologies have driven new designs and implementations in main-memory hash joins.  ...  In particular, we study two camps of hash join algorithms: hardware-conscious ones that advocate careful tailoring of the join algorithms to underlying hardware architectures and hardware-oblivious ones  ...  Acknowledgment: We would like to thank the authors of [7] and [8] for providing the source code. This work is supported by a MoE AcRF Tier 2 grant (MOE2012-T2-2-067) in Singapore.  ... 
doi:10.14778/2735703.2735704 fatcat:ion4mquxq5difphvo3fe6pqfma

Evaluating memory-hard proof-of-work algorithms on three processors

Zonghao Feng, Qiong Luo
2020 Proceedings of the VLDB Endowment  
In this paper, we study the performance of representative memory-hard PoW algorithms on the CPU, the Graphics Processing Unit (GPU), and the Intel Knights Landing (KNL) processors.  ...  Our experimental results show that (1) the GPU dominates the CPU and the KNL processors on each algorithm, (2) all algorithms scale well with number of threads on the CPU and KNL, and (3) the size of accessed  ...  The second generation Intel Xeon Phi processor, code named Knights Landing (KNL), is a many-core processor that represents a middle-of-the-road approach.  ... 
doi:10.14778/3380750.3380759 fatcat:pt774tpkyjerpes6yzlzwkgdku

Everything you always wanted to know about compiled and vectorized queries but were afraid to ask

Timo Kersten, Viktor Leis, Alfons Kemper, Thomas Neumann, Andrew Pavlo, Peter Boncz
2018 Proceedings of the VLDB Endowment  
However, until today it is not clear which paradigm yields faster query execution, as many implementation-specific choices obstruct a direct comparison of architectures.  ...  The query engines of most modern database systems are either based on vectorization or data-centric code generation.  ...  The available memory, number of cores, and the fact that many SIMD resources are available make the Knights Landing processor seem like a perfect OLAP machine.  ... 
doi:10.14778/3275366.3275370 fatcat:bkugfkbgtnhdhcr6oeyof4baj4

vectorizing algorithm [chapter]

2014 Dictionary Geotechnical Engineering/Wörterbuch GeoTechnik  
It interleaves multiple execution instances of vectorized code to hide memory access latency with more computation.  ...  SIMD is an instruction set in mainstream processors, which provides the data level parallelism to accelerate the performance of applications.  ...  Experimental Setup: We conduct experiments on two hardware platforms: a server equipped with two Intel Xeon Silver 4110 CPUs based on Skylake micro-architecture (SKX), and a server with an Intel Knights  ... 
doi:10.1007/978-3-642-41714-6_220210 fatcat:rjbjvj5xzzg2zn4u662f5liuje

High Bandwidth Memory on FPGAs: A Data Analytics Perspective [article]

Kaan Kara, Christoph Hagleitner, Dionysios Diamantopoulos, Dimitris Syrivelis, Gustavo Alonso
2020 arXiv   pre-print
In this paper, we study the usage and benefits of HBM on FPGAs from a data analytics perspective.  ...  Inclusion of High Bandwidth Memory (HBM) in FPGA devices is a recent example.  ...  Prominent examples include processors such as Intel Knights Landing (KNL) [40], NVIDIA Titan V, and Google's TPU [1].  ... 
arXiv:2004.01635v1 fatcat:wrme53ej3bgarbeeadw7vvgvpe

An energy-aware performance analysis of SWIMM: Smith-Waterman implementation on Intel's Multicore and Manycore architectures

Enzo Rucci, Carlos García, Guillermo Botella, Armando De Giusti, Marcelo Naiouf, Manuel Prieto-Matías
2015 Concurrency and Computation  
There are several implementations that take advantage of computing parallelization, such as many-cores, FPGAs or GPUs, in order to reduce the alignment effort.  ...  We efficiently exploit data and thread-level parallelism, reaching up to 380 GCUPS on heterogeneous architecture, 350 GCUPS for the isolated Xeon and 50 GCUPS on Xeon Phi.  ...  6.5: • The first one is equipped with: -Two Intel Xeon E5-2670 8-core 2.60GHz CPUs with hyper-threading enabled and 32 GB main memory.  ... 
doi:10.1002/cpe.3598 fatcat:rpxo3tzgofbk5mpr7q6q2rd36y

Adaptive Geospatial Joins for Modern Hardware [article]

Andreas Kipf, Harald Lang, Varun Pandey, Raul Alexandru Persa, Peter Boncz, Thomas Neumann, Alfons Kemper
2018 arXiv   pre-print
We optimized our implementation for modern hardware architectures with wide SIMD vector processing units, including Intel's brand new Knights Landing.  ...  Geospatial joins are a core building block of connected mobility applications. An especially challenging problem are joins between streaming points and static polygons.  ...  Knights Landing Processor: Just like its predecessor, the KNL processor is a many integrated core (MIC) architecture which draws its computational power from wide VPUs.  ... 
arXiv:1802.09488v1 fatcat:jccu7p2fyjds5mqlkbwubc5xdu

Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines

Harald Lang, Linnea Passing, Andreas Kipf, Peter Boncz, Thomas Neumann, Alfons Kemper
2019 The VLDB journal  
We evaluate our approach with three query types: (i) a table scan query based on TPC-H Query 1, that performs up to 34% faster when addressing underutilization, (ii) a hash join query, where we observe  ...  up to 25% higher performance, and (iii) an approximate geospatial join query, which shows performance improvements of up to 30%.  ...  The experiments were conducted on an Intel Skylake-X (SKX) and an Intel Knights Landing (KNL) processor (cf. Table 1).  ... 
doi:10.1007/s00778-019-00547-y fatcat:v5w7ckyrpnhg7jc6l6gfrzg43i

Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures

Mohammed Ahmed Al Farhan, David Keyes
2018 IEEE Transactions on Parallel and Distributed Systems  
On a 64-core KNL chip, we achieve nearly 2.9x speedup of the dominant routines relative to the baseline.  ...  These optimizations are expected to be of value for many other unstructured mesh PDE-based scientific applications as multi and many-core architecture evolves.  ...  New studies of the strong scalability of PETSc-FUN3D at the node level are required with the emergence of the Knights Landing (KNL) and Skylake architectures, to evaluate the effectiveness of new tools  ... 
doi:10.1109/tpds.2018.2826533 fatcat:qr3g4ram3vg33grltsklz5pdoe

Monarch: A Durable Polymorphic Memory For Data Intensive Applications [article]

Ananth Krishna Prasad, Mahdi Nazm Bojnordi
2021 arXiv   pre-print
Our simulation results on a set of parallel memory-intensive applications indicate that Monarch outperforms an ideal DRAM caching by 1.21x on average.  ...  This paper examines Monarch, a resistive 3D stacked memory based on a novel reconfigurable crosspoint array called XAM.  ...  An in-package multi-channel DRAM (MCDRAM) is employed by Intel's Xeon Phi processors (code-named as Knights Landing) [35] .  ... 
arXiv:2108.08497v1 fatcat:ns5ex7ozmzgu7axwm75pbbwmra

Staring into the abyss

Xiangyao Yu, George Bezerra, Andrew Pavlo, Srinivas Devadas, Michael Stonebraker
2014 Proceedings of the VLDB Endowment  
Computer architectures are moving towards an era dominated by many-core machines with dozens or even hundreds of cores on a single chip.  ...  We implemented seven concurrency control algorithms on a main-memory DBMS and using computer simulations scaled our system to 1024 cores.  ...  We implemented a lightweight main memory DBMS with a pluggable architecture that supports seven concurrency control schemes.  ... 
doi:10.14778/2735508.2735511 fatcat:skvpdhdw6nhplk3dlzsxajtqva

Runtime-Guided Management of Scratchpad Memories in Multicore Architectures

Lluc Alvarez, Miquel Moreto, Marc Casas, Emilio Castillo, Xavier Martorell, Jesus Labarta, Eduard Ayguade, Mateo Valero
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
In these models the runtime system manages the execution of the tasks on the architecture, allowing them to apply many optimizations in a generic way at the runtime system level.  ...  In a 32-core multicore architecture, the hybrid memory hierarchy outperforms cache-only hierarchies by up to 16%, reduces on-chip network traffic by up to 31% and saves up to 22% of the consumed power.  ...  Instead, computer architecture is exhibiting a trend towards more heterogeneity, clearly shown by proposals like the Cell B.E. [3], GPGPUs [4], or more recently Intel's Knights Landing [5].  ... 
doi:10.1109/pact.2015.26 dblp:conf/IEEEpact/AlvarezMCCMLAV15 fatcat:6sz6so6b4fay3oia5tx6dx4ooe

Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption [article]

Paweł Rościszewski
2018 arXiv   pre-print
Many important computational problems require utilization of high performance computing (HPC) systems that consist of multi-level structures combining higher and higher numbers of devices with various  ...  Mapping of the application processes on computing devices has also a significant impact on these criteria.  ...  For example, the Knights Landing architecture used by the second generation of Intel MIC accelerators is built from up to 72 cores with possibility to run four threads per core.  ... 
arXiv:1809.07611v1 fatcat:f2vl3kmgznckroj6h3uwt2zwf4

Parallel Programming With Global Asynchronous Memory: Models, C++ Apis And Implementations

Maurizio Drocco, Marco Aldinucci
2017 Zenodo  
Early attempts to bring shared-memory programming model—with its programming advantages—to distributed computing, referred as the Distributed Shared Memory (DSM) model, faded away; one of the main issue  ...  On top of smart pointers, we propose a high-level C++ template library for writing applications in terms of dataflow-like networks, namely GAM nets, consisting of stateful processors exchanging pointers  ...  Knights Landing) processor [88] .  ... 
doi:10.5281/zenodo.1037585 fatcat:ecjm5xj5x5exbfxe3eokl7uneu

Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps

Ruben Langarita, Adria Armejach, Javier Setoain, Pablo Enrique Ibanez Marin, Jesus Alastruey-Benede, Miquel Moreto Planas
2020 IEEE/ACM Transactions on Computational Biology & Bioinformatics  
An algorithm based on this new layout is evaluated on both a Knights Landing (KNL) and a Skylake-based system (SKX).  ...  Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound problem.  ...  In particular, COFI can perform 15 k-steps with a manageable memory footprint of 16 GB. • We evaluate COFI on two different modern hardware platforms: an Intel Xeon Phi 7230 (KNL) and an Intel Xeon Platinum  ... 
doi:10.1109/tcbb.2020.3000253 pmid:32750858 fatcat:2vwez7mvfrcpvhg6ad2jttgcoe