Accelerating AES with Vector Permute Instructions [chapter]

Mike Hamburg
2009 Lecture Notes in Computer Science  
We demonstrate new techniques to speed up the Rijndael (AES) block cipher using vector permute instructions.  ...  We focus on Intel's SSSE3 and Motorola's Altivec, but our techniques can be adapted to other systems with vector permute instructions, such as the IBM Xenon and Cell processors, the ARM Cortex series and  ...  We examine another hardware option for accelerating and protecting Rijndael: vector units with permutation instructions, such as the PowerPC AltiVec unit or Intel processors supporting the SSSE3 instruction  ... 
doi:10.1007/978-3-642-04138-9_2 fatcat:pwamfils6beu5d2fsrqxuajyrq

Fast keyed hash/pseudo-random function using SIMD multiply and permute [article]

Jyrki Alakuijala and Bill Cox and Jan Wassenberg
2017 arXiv pre-print
HighwayHash is a new pseudo-random function based on SIMD multiply and permute instructions for thorough and fast hashing. It is 5.2 times as fast as SipHash for 1 KiB inputs.  ...  Assuming it withstands further analysis, strengthened variants may also substantially accelerate file checksums and stream ciphers.  ...  We introduce a simple but seemingly novel approach: mixing multiplication results with byte-level permute instructions. Let us derive a suitable permutation.  ... 
arXiv:1612.06257v3 fatcat:eabluwugqbedrgmym4nzqejrla

Efficient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers [article]

Behnaz Rezvani, Thomas Conroy, Luke Beckwith, Matthew Bozzay, Trevor Laffoon, David McFeeters, Yijia Shi, Minh Vu, William Diehl
2020 IACR Cryptology ePrint Archive  
dynamic loading and execution of block ciphers on the core, we demonstrate a single LWC deployment on an Artix-7 FPGA, capable of executing 3 NIST LWC Standardization Process Round 2 AEAD candidates (COMET-AES  ...  In this construct, developers design hardware implementations of authenticated encryption with associated data (AEAD) inside a cryptographic core (CryptoCore) encapsulated by input/output utilities.  ...  Our architecture allows for experimentation with cryptographic-specific instruction set extensions (ISEs) and memory-mapped accelerators at low overhead. 4.  ... 
dblp:journals/iacr/RezvaniCBBLMSVD20 fatcat:52bekqwuqbf7hhodb4j555ew7i

A specialized low-cost vectorized loop buffer for embedded processors

Libo Huang, Zhiying Wang, Li Shen, Hongyi Lu, Nong Xiao, Cong Liu
2011 2011 Design, Automation & Test in Europe  
The vectorized loop buffer (VLB) is simplified with single loop support for SIMD devices.  ...  We extend several instructions to the baseline ISA for programming and integrate it into an embedded processor for evaluation.  ...  The first specialization of VLB is to employ implicit data permutation (IDP) mechanism into its organization via a special designed permutation vector register file (PVRF) [4].  ... 
doi:10.1109/date.2011.5763313 dblp:conf/date/HuangWSLXL11 fatcat:sipdstrb6vevzmxoigsae3cvba

An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

Francesco Conti, Robert Schilling, Pasquale Davide Schiavone, Antonio Pullini, Davide Rossi, Frank Kagan Gurkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini
2017 IEEE Transactions on Circuits and Systems Part 1: Regular Papers  
To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks  ...  deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ  ...  as vectors.  ... 
doi:10.1109/tcsi.2017.2698019 fatcat:x5o4ec64gnbirpxyqvor2swi7a

Implementation of new hybrid lightweight cryptosystem

C.G. Thorat, V.S. Inamdar
2018 Applied Computing and Informatics  
Proposed technique uses the fastest bit permutation instruction PERMS with S-box of PRESENT block cipher for non-linearity.  ...  An arbitrary n-bit permutation is performed using PERMS instruction in less than log(n) number of instructions.  ...  Apart from these two basic methods, bit permutation can be accelerated with the help of certain instructions like BFLY-IBFLY [6], PPERM-PPERM3R, CROSS, GRP, OMFLIP and SWPERM-SIEVE.  ... 
doi:10.1016/j.aci.2018.05.001 fatcat:xuaaor2tqzaxjnjinfa5k4ttym

Gimli: A Cross-Platform Permutation [chapter]

Daniel J. Bernstein, Stefan Kölbl, Stefan Lucks, Pedro Maat Costa Massolino, Florian Mendel, Kashif Nawaz, Tobias Schneider, Peter Schwabe, François-Xavier Standaert, Yosuke Todo, Benoît Viguier
2017 Lecture Notes in Computer Science  
This paper presents Gimli, a 384-bit permutation designed to achieve high security with high performance across a broad range of platforms, including 64-bit Intel/AMD server CPUs, 64-bit and 32-bit ARM smartphone  ...  integer instructions ("SSE2") starting with the Pentium 4 in 2001, and 256-bit vectorized integer instructions ("AVX2") starting with the Haswell in 2013.  ... 
doi:10.1007/978-3-319-66787-4_15 fatcat:iezmwrpkgfarle7thx4chabixu

Towards a Truly Integrated Vector Processing Unit for Memory-bound Applications Based on a Cost-competitive Computational SRAM Design Solution

Maha Kooli, Antoine Heraud, Henri-Pierre Charles, Bastien Giraud, Roman Gauchi, Mona Ezzadeen, Kevin Mambu, Valentin Egloff, Jean-Philippe Noel
2022 ACM Journal on Emerging Technologies in Computing Systems  
Operations are performed on large vectors of data occupying the entire physical row of C-SRAM array, leading to high performance gains.  ...  We detail the C-SRAM system design on different levels: (i) circuit design and silicon proof of concept, (ii) system interface and instruction set architecture, and (iii) high-level software programming  ...  They perform bit-serial operations for in-memory vector acceleration. These approaches can be used as dedicated accelerators for dedicated application domains such as neural network or cryptography.  ... 
doi:10.1145/3485823 fatcat:56ajw5q2snehvd6h5g7wluckry

Randen - fast backtracking-resistant random generator with AES+Feistel+Reverie [article]

Jan Wassenberg, Robert Obryk, Jyrki Alakuijala, Emmanuel Mogenet
2018 arXiv pre-print
Randen is an instantiation of Reverie, a recently published robust sponge-like random generator, with a new permutation built from an improved generalized Feistel structure with 16 branches.  ...  This is made possible by hardware acceleration.  ...  For convenience, we assume the availability of a platform-specific 128-bit SIMD vector type V with associated Load, Store and AES functions.  ... 
arXiv:1810.02227v1 fatcat:ocbjk47j6re4vgqwdvlo7nl46u

A Fast and Compact Accelerator for Ascon and Friends [article]

Stefan Steinegger, Robert Primas
2020 IACR Cryptology ePrint Archive  
This single instruction allows us to realize all cryptographic computations that typically occur on embedded devices with high performance.  ...  More concretely, with Isap and Ascon's family of modes for AEAD and hashing, we can perform cryptographic computations with a performance of about 2 cycles/byte, or about 4 cycles/byte if protection against  ...  Our accelerator is configured to perform 1 permutation round per clock cycle.  ... 
dblp:journals/iacr/SteineggerP20 fatcat:mj5xfcjvv5bk3c2zh3yki6urs4

Improving DSP Performance with a Small Amount of Field Programmable Logic [chapter]

John Oliver, Venkatesh Akella
2003 Lecture Notes in Computer Science  
We demonstrate our methodology with the implementation of a Viterbi decoder.  ...  The area overhead of the FPDAU is small relative to the DSP die size and does not require any changes to the programming model or the instruction set architecture.  ...  Many DSPs have custom ACS instructions to accelerate this process.  ... 
doi:10.1007/978-3-540-45234-8_51 fatcat:5w2qq7yvgzbavohtsr3ra5zt4q

Climate Change Influences Potential Distribution of Infected Aedes aegypti Co-Occurrence with Dengue Epidemics Risk Areas in Tanzania

Clement N. Mweya, Sharadhuli I. Kimera, Grades Stanley, Gerald Misinzo, Leonard E. G. Mboera, Richard Paul
2016 PLoS ONE  
In 2050 climate scenario, the predicted habitat suitability of infected Ae. aegypti co-occurrence with dengue shifted towards the central and north-eastern parts with intensification in areas  ...  Model predictions indicated that habitat suitability for infected Ae. aegypti co-occurrence with dengue virus in current scenarios is highly localized in the coastal areas, including Dar es Salaam, Pwani  ...  , CA) according to manufacturer's instructions.  ... 
doi:10.1371/journal.pone.0162649 pmid:27681327 pmcid:PMC5040426 fatcat:ncxy32fzuvevbkqfub43ljxdqi

Speeding up R-LWE Post-quantum Key Exchange [chapter]

Shay Gueron, Fabian Schlieker
2016 Lecture Notes in Computer Science  
We optimize three independent directions: efficient pseudorandom bytes generation, decreasing the rejection rate during sampling, and vectorizing the sampling step.  ...  Vectorized rejection sampling: The process of filtering pseudorandom 16-bit candidates can be accelerated by using SIMD instructions.  ...  Using AES (with AES-NI). We used the pipelined AES implementation of [7, 6], which performs at 0.92 C/B on our test platform ("Skylake").  ... 
doi:10.1007/978-3-319-47560-8_12 fatcat:ouvydjyguvehlefiv74jr5djdq

Auto-vectorization of interleaved data for SIMD

Dorit Nuzman, Ira Rosen, Ayal Zaks
2006 Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation - PLDI '06  
Most implementations of the Single Instruction Multiple Data (SIMD) model available today require that data elements be packed in vector registers.  ...  In this paper we demonstrate an automatic compilation scheme that supports effective vectorization in the presence of interleaved data with strides that are power of 2, facilitating data reorganization  ...  them too with a vector instruction.  ... 
doi:10.1145/1133981.1133997 dblp:conf/pldi/NuzmanRZ06 fatcat:lvj2f752b5gv3jygreunnnbjea

A universal hardware API for authenticated ciphers

Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, Malik Umar Sharif, Kris Gaj
2015 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig)  
and AES-CCM.  ...  and the Keccak Permutation F, which may be used as building blocks in implementations of related ciphers.  ...  AES and Keccak Permutation F: Additional support is provided for designers of cipher cores of CAESAR candidates based on AES and Keccak.  ... 
doi:10.1109/reconfig.2015.7393283 dblp:conf/reconfig/HomsirikamolDFF15 fatcat:ewsfsxnyk5helbx2vuzxzg7fl4