Accelerating AES with Vector Permute Instructions
[chapter]
2009
Lecture Notes in Computer Science
We demonstrate new techniques to speed up the Rijndael (AES) block cipher using vector permute instructions. ...
We focus on Intel's SSSE3 and Motorola's Altivec, but our techniques can be adapted to other systems with vector permute instructions, such as the IBM Xenon and Cell processors, the ARM Cortex series and ...
We examine another hardware option for accelerating and protecting Rijndael: vector units with permutation instructions, such as the PowerPC AltiVec unit or Intel processors supporting the SSSE3 instruction ...
doi:10.1007/978-3-642-04138-9_2
fatcat:pwamfils6beu5d2fsrqxuajyrq
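To make the idea of a vector permute instruction in the entry above concrete, here is a minimal scalar sketch of how SSSE3's PSHUFB behaves as a 16-entry byte table lookup. The helper names (`pshufb`, `nibble_lookup`, `xtime`) are illustrative assumptions, and the GF(2^8) doubling map is only a convenient GF(2)-linear example. This is not presented as the paper's actual construction.

```python
def pshufb(table: bytes, indices: bytes) -> bytes:
    """Scalar model of PSHUFB on one 16-byte vector."""
    assert len(table) == 16 and len(indices) == 16
    out = bytearray(16)
    for i, idx in enumerate(indices):
        # High bit set: the lane is zeroed. Otherwise the low 4 bits pick a table byte.
        out[i] = 0 if idx & 0x80 else table[idx & 0x0F]
    return bytes(out)


def xtime(x: int) -> int:
    """Multiply by 2 in the AES field GF(2^8) (reduction polynomial 0x11B)."""
    x <<= 1
    return (x ^ 0x11B) & 0xFF if x & 0x100 else x


# Any GF(2)-linear byte map f satisfies f(x) = f(low nibble) ^ f(high nibble << 4),
# so two 16-entry tables and two permutes cover all 256 inputs.
LO_TABLE = bytes(xtime(i) for i in range(16))
HI_TABLE = bytes(xtime(i << 4) for i in range(16))


def nibble_lookup(lo_table: bytes, hi_table: bytes, data: bytes) -> bytes:
    """Apply a linear byte map to 16 bytes using two table lookups."""
    lo = bytes(b & 0x0F for b in data)
    hi = bytes(b >> 4 for b in data)
    return bytes(a ^ b for a, b in zip(pshufb(lo_table, lo), pshufb(hi_table, hi)))


if __name__ == "__main__":
    block = bytes(range(0x70, 0x80))
    print(nibble_lookup(LO_TABLE, HI_TABLE, block).hex())
```

The split into low and high nibbles is what lets a 16-entry lookup stand in for a 256-entry table, at the cost of restricting the trick to maps that are linear over GF(2).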
Fast keyed hash/pseudo-random function using SIMD multiply and permute
[article]
2017
arXiv
pre-print
HighwayHash is a new pseudo-random function based on SIMD multiply and permute instructions for thorough and fast hashing. It is 5.2 times as fast as SipHash for 1 KiB inputs. ...
Assuming it withstands further analysis, strengthened variants may also substantially accelerate file checksums and stream ciphers. ...
We introduce a simple but seemingly novel approach: mixing multiplication results with byte-level permute instructions. Let us derive a suitable permutation. ...
arXiv:1612.06257v3
fatcat:eabluwugqbedrgmym4nzqejrla
Efficient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers
[article]
2020
IACR Cryptology ePrint Archive
dynamic loading and execution of block ciphers on the core, we demonstrate a single LWC deployment on an Artix-7 FPGA, capable of executing 3 NIST LWC Standardization Process Round 2 AEAD candidates (COMET-AES ...
In this construct, developers design hardware implementations of authenticated encryption with associated data (AEAD) inside a cryptographic core (CryptoCore) encapsulated by input/output utilities. ...
Our architecture allows for experimentation with cryptographic-specific instruction set extensions (ISEs) and memory-mapped accelerators at low overhead. 4. ...
dblp:journals/iacr/RezvaniCBBLMSVD20
fatcat:52bekqwuqbf7hhodb4j555ew7i
A specialized low-cost vectorized loop buffer for embedded processors
2011
2011 Design, Automation & Test in Europe
The vectorized loop buffer (VLB) is simplified with single loop support for SIMD devices. ...
We extend several instructions to the baseline ISA for programming and integrate it into an embedded processor for evaluation. ...
The first specialization of VLB is to employ an implicit data permutation (IDP) mechanism in its organization via a specially designed permutation vector register file (PVRF) [4]. ...
doi:10.1109/date.2011.5763313
dblp:conf/date/HuangWSLXL11
fatcat:sipdstrb6vevzmxoigsae3cvba
An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics
2017
IEEE Transactions on Circuits and Systems Part 1: Regular Papers
To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks ...
deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ ...
as vectors. ...
doi:10.1109/tcsi.2017.2698019
fatcat:x5o4ec64gnbirpxyqvor2swi7a
Implementation of new hybrid lightweight cryptosystem
2018
Applied Computing and Informatics
Proposed technique uses the fastest bit permutation instruction PERMS with S-box of PRESENT block cipher for non-linearity. ...
An arbitrary n-bit permutation is performed using PERMS instruction in less than log (n) number of instructions. ...
Apart from these two basic methods, bit permutation can be accelerated with the help of certain instructions like BFLY-IBFLY [6], PPERM-PPERM3R, CROSS, GRP, OMFLIP and SWPERM-SIEVE. ...
doi:10.1016/j.aci.2018.05.001
fatcat:xuaaor2tqzaxjnjinfa5k4ttym
Gimli : A Cross-Platform Permutation
[chapter]
2017
Lecture Notes in Computer Science
This paper presents Gimli, a 384-bit permutation designed to achieve high security with high performance across a broad range of platforms, including 64-bit Intel/AMD server CPUs, 64-bit and 32-bit ARM smartphone ...
integer instructions ("SSE2") starting with the Pentium 4 in 2001, and 256-bit vectorized integer instructions ("AVX2") starting with the Haswell in 2013. ...
doi:10.1007/978-3-319-66787-4_15
fatcat:iezmwrpkgfarle7thx4chabixu
Towards a Truly Integrated Vector Processing Unit for Memory-bound Applications Based on a Cost-competitive Computational SRAM Design Solution
2022
ACM Journal on Emerging Technologies in Computing Systems
Operations are performed on large vectors of data occupying the entire physical row of C-SRAM array, leading to high performance gains. ...
We detail the C-SRAM system design on different levels: (i) circuit design and silicon proof of concept, (ii) system interface and instruction set architecture, and (iii) high-level software programming ...
They perform bit-serial operations for in-memory vector acceleration. These approaches can be used as dedicated accelerators for dedicated application domains such as neural network or cryptography. ...
doi:10.1145/3485823
fatcat:56ajw5q2snehvd6h5g7wluckry
Randen - fast backtracking-resistant random generator with AES+Feistel+Reverie
[article]
2018
arXiv
pre-print
Randen is an instantiation of Reverie, a recently published robust sponge-like random generator, with a new permutation built from an improved generalized Feistel structure with 16 branches. ...
This is made possible by hardware acceleration. ...
For convenience, we assume the availability of a platform-specific 128-bit SIMD vector type V with associated Load, Store and AES functions. ...
arXiv:1810.02227v1
fatcat:ocbjk47j6re4vgqwdvlo7nl46u
A Fast and Compact Accelerator for Ascon and Friends
[article]
2020
IACR Cryptology ePrint Archive
This single instruction allows us to realize all cryptographic computations that typically occur on embedded devices with high performance. ...
More concretely, with Isap and Ascon's family of modes for AEAD and hashing, we can perform cryptographic computations with a performance of about 2 cycles/byte, or about 4 cycles/byte if protection against ...
Our accelerator is configured to perform 1 permutation round per clock cycle. ...
dblp:journals/iacr/SteineggerP20
fatcat:mj5xfcjvv5bk3c2zh3yki6urs4
Improving DSP Performance with a Small Amount of Field Programmable Logic
[chapter]
2003
Lecture Notes in Computer Science
We demonstrate our methodology with the implementation of a Viterbi decoder. ...
The area overhead of the FPDAU is small relative to the DSP die size and does not require any changes to the programming model or the instruction set architecture. ...
Many DSPs have custom ACS instructions to accelerate this process. ...
doi:10.1007/978-3-540-45234-8_51
fatcat:5w2qq7yvgzbavohtsr3ra5zt4q
Climate Change Influences Potential Distribution of Infected Aedes aegypti Co-Occurrence with Dengue Epidemics Risk Areas in Tanzania
2016
PLoS ONE
In the 2050 climate scenario, the predicted habitat suitability of infected Ae. aegypti co-occurrence with dengue shifted towards the central and north-eastern parts with intensification in areas ...
Model predictions indicated that habitat suitability for infected Ae. aegypti co-occurrence with dengue virus in current scenarios is highly localized in the coastal areas, including Dar es Salaam, Pwani ...
, CA) according to manufacturer's instructions. ...
doi:10.1371/journal.pone.0162649
pmid:27681327
pmcid:PMC5040426
fatcat:ncxy32fzuvevbkqfub43ljxdqi
Speeding up R-LWE Post-quantum Key Exchange
[chapter]
2016
Lecture Notes in Computer Science
We optimize three independent directions: efficient pseudorandom bytes generation, decreasing the rejection rate during sampling, and vectorizing the sampling step. ...
Vectorized rejection sampling The process of filtering pseudorandom 16-bit candidates can be accelerated by using SIMD instructions. ...
Using AES (with AES-NI). We used the pipelined AES implementation of [7, 6], which performs at 0.92 C/B on our test platform ("Skylake"). ...
doi:10.1007/978-3-319-47560-8_12
fatcat:ouvydjyguvehlefiv74jr5djdq
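The filtering of pseudorandom 16-bit candidates mentioned in the entry above can be modeled in scalar form as follows. This is a minimal sketch under stated assumptions: the modulus `Q = 12289` and the acceptance bound `5 * Q` are illustrative parameters that the snippet does not give, `sample_coefficients` is a hypothetical helper name, and `os.urandom` stands in for whatever pseudorandom byte generator is used. A SIMD version would compare many candidates against the bound in one instruction; the loop here shows only the logic.

```python
import os

Q = 12289        # assumed modulus, for illustration only
BOUND = 5 * Q    # largest multiple of Q below 2**16, so accepted values stay uniform mod Q


def sample_coefficients(n: int, q: int = Q, bound: int = BOUND, rand=os.urandom) -> list:
    """Draw n values uniform in [0, q) by rejecting 16-bit candidates >= bound."""
    out = []
    while len(out) < n:
        raw = rand(2 * 64)  # 64 candidates of 16 bits each per batch
        for i in range(0, len(raw), 2):
            candidate = int.from_bytes(raw[i:i + 2], "little")
            if candidate < bound:      # the filter a SIMD compare would do in bulk
                out.append(candidate % q)
    return out[:n]


if __name__ == "__main__":
    coeffs = sample_coefficients(1024)
    print(len(coeffs), min(coeffs), max(coeffs))
```

Choosing the bound as a multiple of the modulus keeps the rejection rate low (about 6% with these assumed values) while avoiding modulo bias.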
Auto-vectorization of interleaved data for SIMD
2006
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation - PLDI '06
Most implementations of the Single Instruction Multiple Data (SIMD) model available today require that data elements be packed in vector registers. ...
In this paper we demonstrate an automatic compilation scheme that supports effective vectorization in the presence of interleaved data with strides that are powers of 2, facilitating data reorganization ...
them too with a vector instruction. ...
doi:10.1145/1133981.1133997
dblp:conf/pldi/NuzmanRZ06
fatcat:lvj2f752b5gv3jygreunnnbjea
A universal hardware API for authenticated ciphers
2015
2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig)
and AES-CCM. ...
and the Keccak Permutation F, which may be used as building blocks in implementations of related ciphers. ...
AES and Keccak Permutation F Additional support is provided for designers of cipher cores of CAESAR candidates based on AES and Keccak. ...
doi:10.1109/reconfig.2015.7393283
dblp:conf/reconfig/HomsirikamolDFF15
fatcat:ewsfsxnyk5helbx2vuzxzg7fl4