Improving main memory hash joins on Intel Xeon Phi processors
2015
Proceedings of the VLDB Endowment
Modern processor technologies have driven new designs and implementations in main-memory hash joins. ...
In particular, we study two camps of hash join algorithms: hardware-conscious ones that advocate careful tailoring of the join algorithms to underlying hardware architectures and hardware-oblivious ones ...
Acknowledgment We would like to thank the authors of [7] and [8] for providing the source code. This work is supported by a MoE AcRF Tier 2 grant (MOE2012-T2-2-067) in Singapore. ...
doi:10.14778/2735703.2735704
fatcat:ion4mquxq5difphvo3fe6pqfma
Evaluating memory-hard proof-of-work algorithms on three processors
2020
Proceedings of the VLDB Endowment
In this paper, we study the performance of representative memory-hard PoW algorithms on the CPU, the Graphics Processing Unit (GPU), and the Intel Knights Landing (KNL) processors. ...
Our experimental results show that (1) the GPU dominates the CPU and the KNL processors on each algorithm, (2) all algorithms scale well with the number of threads on the CPU and KNL, and (3) the size of accessed ...
The second generation Intel Xeon Phi processor, code named Knights Landing (KNL), is a many-core processor that represents a middle-of-the-road approach. ...
doi:10.14778/3380750.3380759
fatcat:pt774tpkyjerpes6yzlzwkgdku
Everything you always wanted to know about compiled and vectorized queries but were afraid to ask
2018
Proceedings of the VLDB Endowment
However, until today it is not clear which paradigm yields faster query execution, as many implementation-specific choices obstruct a direct comparison of architectures. ...
The query engines of most modern database systems are either based on vectorization or data-centric code generation. ...
The available memory, number of cores, and the fact that many SIMD resources are available make the Knights Landing processor seem like a perfect OLAP machine. ...
doi:10.14778/3275366.3275370
fatcat:bkugfkbgtnhdhcr6oeyof4baj4
vectorizing algorithm
[chapter]
2014
Dictionary Geotechnical Engineering/Wörterbuch GeoTechnik
It interleaves multiple execution instances of vectorized code to hide memory access latency with more computation. ...
SIMD is an instruction set in mainstream processors, which provides the data level parallelism to accelerate the performance of applications. ...
Experimental Setup We conduct experiments on two hardware platforms: a server equipped with two Intel Xeon Silver 4110 CPUs based on Skylake micro-architecture (SKX), and a server with an Intel Knights ...
doi:10.1007/978-3-642-41714-6_220210
fatcat:rjbjvj5xzzg2zn4u662f5liuje
High Bandwidth Memory on FPGAs: A Data Analytics Perspective
[article]
2020
arXiv
pre-print
In this paper, we study the usage and benefits of HBM on FPGAs from a data analytics perspective. ...
Inclusion of High Bandwidth Memory (HBM) in FPGA devices is a recent example. ...
Prominent examples include processors such as Intel Knights Landing (KNL) [40], NVIDIA Titan V, and Google's TPU [1]. ...
arXiv:2004.01635v1
fatcat:wrme53ej3bgarbeeadw7vvgvpe
An energy-aware performance analysis of SWIMM: Smith-Waterman implementation on Intel's Multicore and Manycore architectures
2015
Concurrency and Computation
There are several implementations that take advantage of computing parallelization, such as many-cores, FPGAs or GPUs, in order to reduce the alignment effort. ...
We efficiently exploit data and thread-level parallelism, reaching up to 380 GCUPS on heterogeneous architecture, 350 GCUPS for the isolated Xeon and 50 GCUPS on Xeon Phi. ...
The first one is equipped with two Intel Xeon E5-2670 8-core 2.60 GHz CPUs with hyper-threading enabled and 32 GB main memory. ...
doi:10.1002/cpe.3598
fatcat:rpxo3tzgofbk5mpr7q6q2rd36y
Adaptive Geospatial Joins for Modern Hardware
[article]
2018
arXiv
pre-print
We optimized our implementation for modern hardware architectures with wide SIMD vector processing units, including Intel's brand new Knights Landing. ...
Geospatial joins are a core building block of connected mobility applications. An especially challenging problem are joins between streaming points and static polygons. ...
Knights Landing Processor Just like its predecessor, the KNL processor is a many integrated core (MIC) architecture which draws its computational power from wide VPUs. ...
arXiv:1802.09488v1
fatcat:jccu7p2fyjds5mqlkbwubc5xdu
Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines
2019
The VLDB journal
We evaluate our approach with three query types: (i) a table scan query based on TPC-H Query 1, that performs up to 34% faster when addressing underutilization, (ii) a hashjoin query, where we observe ...
up to 25% higher performance, and (iii) an approximate geospatial join query, which shows performance improvements of up to 30%. ...
The experiments were conducted on an Intel Skylake-X (SKX) and an Intel Knights Landing (KNL) processor (cf. Table 1). ...
doi:10.1007/s00778-019-00547-y
fatcat:v5w7ckyrpnhg7jc6l6gfrzg43i
Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures
2018
IEEE Transactions on Parallel and Distributed Systems
On a 64-core KNL chip, we achieve nearly 2.9x speedup of the dominant routines relative to the baseline. ...
These optimizations are expected to be of value for many other unstructured mesh PDE-based scientific applications as multi and many-core architecture evolves. ...
New studies of the strong scalability of PETSc-FUN3D at the node level are required with the emergence of the Knights Landing (KNL) and Skylake architectures, to evaluate the effectiveness of new tools ...
doi:10.1109/tpds.2018.2826533
fatcat:qr3g4ram3vg33grltsklz5pdoe
Monarch: A Durable Polymorphic Memory For Data Intensive Applications
[article]
2021
arXiv
pre-print
Our simulation results on a set of parallel memory-intensive applications indicate that Monarch outperforms an ideal DRAM caching by 1.21x on average. ...
This paper examines Monarch, a resistive 3D stacked memory based on a novel reconfigurable crosspoint array called XAM. ...
An in-package multi-channel DRAM (MCDRAM) is employed by Intel's Xeon Phi processors (code-named as Knights Landing) [35] . ...
arXiv:2108.08497v1
fatcat:ns5ex7ozmzgu7axwm75pbbwmra
Staring into the abyss
2014
Proceedings of the VLDB Endowment
Computer architectures are moving towards an era dominated by many-core machines with dozens or even hundreds of cores on a single chip. ...
We implemented seven concurrency control algorithms on a main-memory DBMS and using computer simulations scaled our system to 1024 cores. ...
We implemented a lightweight main memory DBMS with a pluggable architecture that supports seven concurrency control schemes. ...
doi:10.14778/2735508.2735511
fatcat:skvpdhdw6nhplk3dlzsxajtqva
Runtime-Guided Management of Scratchpad Memories in Multicore Architectures
2015
2015 International Conference on Parallel Architecture and Compilation (PACT)
In these models the runtime system manages the execution of the tasks on the architecture, allowing them to apply many optimizations in a generic way at the runtime system level. ...
In a 32-core multicore architecture, the hybrid memory hierarchy outperforms cache-only hierarchies by up to 16%, reduces on-chip network traffic by up to 31% and saves up to 22% of the consumed power. ...
Instead, computer architecture is exhibiting a trend towards more heterogeneity, clearly shown by proposals like the Cell B.E. [3], GPGPUs [4], or more recently Intel's Knights Landing [5]. ...
doi:10.1109/pact.2015.26
dblp:conf/IEEEpact/AlvarezMCCMLAV15
fatcat:6sz6so6b4fay3oia5tx6dx4ooe
Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption
[article]
2018
arXiv
pre-print
Many important computational problems require utilization of high performance computing (HPC) systems that consist of multi-level structures combining higher and higher numbers of devices with various ...
Mapping of the application processes on computing devices has also a significant impact on these criteria. ...
For example, the Knights Landing architecture used by the second generation of Intel MIC accelerators is built from up to 72 cores with possibility to run four threads per core. ...
arXiv:1809.07611v1
fatcat:f2vl3kmgznckroj6h3uwt2zwf4
Parallel Programming With Global Asynchronous Memory: Models, C++ APIs And Implementations
2017
Zenodo
Early attempts to bring the shared-memory programming model, with its programming advantages, to distributed computing, referred to as the Distributed Shared Memory (DSM) model, faded away; one of the main issue ...
On top of smart pointers, we propose a high-level C++ template library for writing applications in terms of dataflow-like networks, namely GAM nets, consisting of stateful processors exchanging pointers ...
Knights Landing) processor [88]. ...
doi:10.5281/zenodo.1037585
fatcat:ecjm5xj5x5exbfxe3eokl7uneu
Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps
2020
IEEE/ACM Transactions on Computational Biology & Bioinformatics
An algorithm based on this new layout is evaluated on both a Knights Landing (KNL) and a Skylake-based system (SKX). ...
Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound problem. ...
In particular, COFI can perform 15 k-steps with a manageable memory footprint of 16 GB. • We evaluate COFI on two different modern hardware platforms: an Intel Xeon Phi 7230 (KNL) and an Intel Xeon Platinum ...
doi:10.1109/tcbb.2020.3000253
pmid:32750858
fatcat:2vwez7mvfrcpvhg6ad2jttgcoe