
Implementing Fast Carryless Multiplication [chapter]

Joris van der Hoeven, Robin Larrieu, Grégoire Lecerf
2017 Lecture Notes in Computer Science  
Our current implementation assumes a modern AVX2 and CLMUL enabled processor.  ...  The efficient multiplication of polynomials over the finite field F_2 is a fundamental problem in computer science with several applications to geometric error correcting codes and algebraic crypto-systems  ... 
doi:10.1007/978-3-319-72453-9_9 fatcat:ed5l5di7z5fpvi5xvxnxqzbnei

Faster 128-EEA3 and 128-EIA3 Software [chapter]

Roberto Avanzi, Billy Bob Brumley
2015 Lecture Notes in Computer Science  
We also show how to leverage carryless multiplication to evaluate the universal hash function making up the core of 128-EIA3.  ...  Our software implementation results on Qualcomm's Hexagon DSP architecture indicate significant performance gains when employing these techniques: up to roughly a 2-fold and 2.5-fold throughput improvement  ...  The target architecture offers a 64 by 64-bit carryless multiplication that can be used in an obvious way to implement the approach described in Section 3.2 (in fact, a single 64-bit multiplication (b  ... 
doi:10.1007/978-3-319-27659-5_14 fatcat:xplk2rtof5crvjpm6gxomzr2lq

HalftimeHash: Modern Hashing without 64-bit Multipliers or Finite Fields [article]

Jim Apple
2021 arXiv   pre-print
In addition, HalftimeHash does not use any widening 64-bit multiplications or any finite field arithmetic that could limit its portability.  ...  Similarly, clhash and UMASH, which are based on 64-bit carryless NH, have their execution times dominated by the multiplications in their base step.  ...  [28] Of these, only clhash and UMASH include claims of being AU; each of these uses finite fields and the x86-64 instruction for carryless (polynomial) multiplication.  ... 
arXiv:2104.08865v2 fatcat:s6bf34m2jjbbpiy7qbzdyjfa2m

Strongly Universal String Hashing is Fast

D. Lemire, O. Kaser
2013 Computer journal  
We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycle per byte.  ...  Moreover, conventional wisdom is that hash functions with fewer multiplications are faster. Yet we find that they may fail to be faster due to operation pipelining.  ...  (See Appendix B for implementation details.) We expect the two carryless multiplications to account for most of the running time of the reduction.  ... 
doi:10.1093/comjnl/bxt070 fatcat:ihslalyvrbc63njznfrhcbtxgy

Multiplying boolean Polynomials with Frobenius Partitions in Additive Fast Fourier Transform [article]

Ming-Shing Chen and Chen-Mou Cheng and Po-Chun Kuo and Wen-Ding Li and Bo-Yin Yang
2018 arXiv   pre-print
The algebraic operations, including field multiplication, bit-matrix transpose, and bit-matrix multiplication, are implemented with efficient SIMD instructions.  ...  We show a new algorithm and its implementation for multiplying bit-polynomials of large degrees.  ...  Carryless Multiplication. PCLMULQDQ performs the carryless multiplication of two 64-bit polynomials, i.e., PCLMULQDQ : F_2[x]_{<64} × F_2[x]_{<64} → F_2[x]_{<127}.  ... 
arXiv:1803.11301v1 fatcat:7kk76wb25bfkrj6i6mt2m57ufm
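
The map quoted in this snippet can be sketched in software. The following is a minimal illustration, not code from the paper: the function name `clmul` is hypothetical, and integers are assumed to encode polynomials over F_2 with bit i holding the coefficient of x^i. It is a shift-and-XOR loop, not the hardware instruction.

```python
def clmul(a: int, b: int) -> int:
    """Carryless product of two polynomials over F_2.

    Assumption: bit i of each integer is the coefficient of x^i.
    Addition of partial products is XOR, so no carries propagate.
    """
    result = 0
    while b:
        if b & 1:          # current coefficient of b is 1
            result ^= a    # add (XOR) the shifted copy of a
        a <<= 1            # multiply a by x
        b >>= 1            # move to the next coefficient of b
    return result


# (x + 1) * (x + 1) = x^2 + 1 over F_2, since the 2x term cancels.
square = clmul(0b11, 0b11)

# Two inputs of degree < 64 give a product of degree < 127.
widest = clmul((1 << 64) - 1, (1 << 64) - 1)
```

Here `square` equals `0b101`, and `widest` fits in 127 bits, matching the stated range F_2[x]_{<127}.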

Classic McEliece on the ARM Cortex-M4

Ming-Shing Chen, Tung Chou
2021 Transactions on Cryptographic Hardware and Embedded Systems  
This paper presents a constant-time implementation of Classic McEliece for ARM Cortex-M4.  ...  For the level-1 parameter sets mceliece348864 and mceliece348864f, our implementation takes 582 199 cycles for encapsulation and 2 706 681 cycles for decapsulation.  ...  Individual field multiplications in F_{2^m} are easy to implement on platforms with instructions for carryless multiplications, such as pclmulqdq.  ... 
doi:10.46586/tches.v2021.i3.125-148 fatcat:bro7j463ajf2lj6gt4225gxyfi

A comprehensive analysis of constant-time polynomial inversion for post-quantum cryptosystems

Alessandro Barenghi, Gerardo Pelosi
2020 Proceedings of the 17th ACM International Conference on Computing Frontiers  
We exploited the presence of the carryless multiplication instruction (pclmulqdq) which performs a polynomial multiplication of two 64-bit elements into a 128-bit one.  ...  Indeed, it is possible to obtain a fast squaring exploiting the pclmulqdq instruction which performs a binary polynomial multiplication of two 64-term polynomials into a 128-term one.  ... 
doi:10.1145/3387902.3397224 dblp:conf/cf/BarenghiP20a fatcat:ws3uct525banxj7q5etwlrqmba
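
The squaring remark in this snippet rests on a standard fact: over F_2 the cross terms of a square cancel, so squaring a polynomial only moves coefficient i to position 2i. The sketch below checks this under stated assumptions. Both function names are hypothetical, integers are assumed to encode polynomials with bit i as the coefficient of x^i, and the loop stands in for the pclmulqdq instruction. It is not the authors' implementation.

```python
def clmul(a: int, b: int) -> int:
    """Shift-and-XOR carryless product (software stand-in for pclmulqdq)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def square_by_spreading(a: int) -> int:
    """Square over F_2 by moving bit i of a to bit 2*i."""
    result = 0
    i = 0
    while a:
        if a & 1:
            result |= 1 << (2 * i)
        a >>= 1
        i += 1
    return result


# x^3 + x + 1 squared is x^6 + x^2 + 1.
via_clmul = clmul(0b1011, 0b1011)
via_spread = square_by_spreading(0b1011)
```

Both `via_clmul` and `via_spread` equal `0b1000101`, so a single carryless multiplication of a value by itself yields its square.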

A Hardware Processor Supporting Elliptic Curve Cryptography for Less than 9 kGEs [chapter]

Erich Wenger, Michael Hutter
2011 Lecture Notes in Computer Science  
The total size of the processor is 8,958 GEs for a 0.13 µm CMOS technology and needs 285 kcycles for a point multiplication.  ...  Our results improve the state of the art in low-resource F_{2^163} ECC implementations (14 % less area needed compared to the best solution reported).  ...  Modular Multiplication. Modular multiplication has been realized using the carryless multiply-accumulate unit described in Section 5.1.  ... 
doi:10.1007/978-3-642-27257-8_12 fatcat:m6goe2ajkrc4nfckc5snbts7dq

Abstracts of Current Computer Literature

1970 IEEE transactions on computers  
A number of schemes for implementing a fast multiplier are presented and compared on the basis of speed, complexity, and cost.  ...  The factoring algorithm enables the fast Fourier transform to be implemented in general with four nested loops, and with three loops if N is a power of two.  ... 
doi:10.1109/t-c.1970.223037 fatcat:to2dda73zzh2jlyq3um6rpr3pe

A Constant-time AVX2 Implementation of a Variant of ROLLO

Tung Chou, Jin-Han Liou
2021 Transactions on Cryptographic Hardware and Embedded Systems  
our decapsulation time is 2.4x as fast.  ...  Compared to the state-of-the-art implementation of the level-1 parameter set of BIKE by Chen, Chou, and Krausz, our key generation time is 1.4x as slow, but our encapsulation time is 3.8x as fast, and  ...  First of all, as shown in Section 3, our implementation makes use of pclmulqdq for carryless multiplications. Many platforms do not support any instruction for carryless multiplications.  ... 
doi:10.46586/tches.v2022.i1.152-174 fatcat:t4n4teayqva7zelkndtcapa2qm

Frobenius Additive Fast Fourier Transform [article]

Wen-Ding Li, Ming-Shing Chen, Po-Chun Kuo, Chen-Mou Cheng, Bo-Yin Yang
2018 arXiv   pre-print
To the best of our knowledge, this is the first time that FFT-based multiplication outperforms Karatsuba and the like at such a low degree in terms of bit-operation count.  ...  Termed the Frobenius FFT, this discovery has a profound impact on polynomial multiplication, especially for multiplying binary polynomials, which finds ample application in coding theory and cryptography  ...  Arguably, one of the most important applications of FFT is fast polynomial multiplication.  ... 
arXiv:1802.03932v1 fatcat:monrgcd3sjhkbckih6fnvyqavm

The Fragility of AES-GCM Authentication Algorithm

Shay Gueron, Vlad Krasnov
2014 2014 11th International Conference on Information Technology: New Generations  
A new implementation of the GHASH function has been recently committed to a Git version of OpenSSL, to speed up AES-GCM.  ...  We identified a bug in that implementation, and made sure it was quickly fixed before trickling into an official OpenSSL trunk.  ...  blocks (here shown up to 4; the symbol • represents carry-less multiplication). GHASH optimization: deferring the reduction modulo P step, by aggregating (via carryless multiplications) the cumulative contribution  ... 
doi:10.1109/itng.2014.31 dblp:conf/itng/GueronK14 fatcat:kuigbu76vvgehndulh2qwkux5m

EverCrypt: A Fast, Verified, Cross-Platform Cryptographic Provider

Jonathan Protzenko, Bryan Parno, Aymeric Fromherz, Chris Hawblitzel, Marina Polubelova, Karthikeyan Bhargavan, Benjamin Beurdouche, Joonwon Choi, Antoine Delignat-Lavaud, Cedric Fournet, Natalia Kulatova, Tahina Ramananandro (+4 others)
2020 2020 IEEE Symposium on Security and Privacy (SP)  
The API provably supports agility (choosing between multiple algorithms for the same functionality) and multiplexing (choosing between multiple implementations of the same algorithm).  ...  We substantiate the effectiveness of these techniques with new verified implementations (including hashes, Curve25519, and AES-GCM) whose performance matches or exceeds the best unverified implementations  ...  For example, the GCM-support instruction (PCLMULQDQ) performs a carryless multiply of its arguments, but the GCM algorithm operates over the Galois field GF(2^128), and hence multiplication in the field  ... 
doi:10.1109/sp40000.2020.00114 dblp:conf/sp/ProtzenkoPFHPBB20 fatcat:zbxp4jsbrrdfldn3kiqpceimhu
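
The gap this snippet points at, between a raw carryless product and a product in GF(2^128), is a reduction modulo the field polynomial. A minimal sketch follows, assuming the GCM polynomial x^128 + x^7 + x^2 + x + 1 and a plain encoding where bit i is the coefficient of x^i. GCM itself uses a bit-reflected convention, which this sketch deliberately ignores. All names are hypothetical, and nothing here is taken from EverCrypt.

```python
# x^128 + x^7 + x^2 + x + 1, with bit i as the coefficient of x^i (assumed encoding).
GCM_POLY = (1 << 128) | 0x87


def clmul(a: int, b: int) -> int:
    """Shift-and-XOR carryless product over F_2."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def reduce_mod(p: int, modulus: int = GCM_POLY) -> int:
    """Remainder of p modulo the given polynomial, by long division over F_2."""
    modulus_degree = modulus.bit_length() - 1
    while p.bit_length() > modulus_degree:
        shift = p.bit_length() - 1 - modulus_degree
        p ^= modulus << shift   # cancel the current leading term
    return p


def gf128_mul(a: int, b: int) -> int:
    """Field product: carryless multiply, then reduce."""
    return reduce_mod(clmul(a, b))


# x^127 * x = x^128, which reduces to x^7 + x^2 + x + 1.
wrapped = gf128_mul(1 << 127, 0b10)
```

Here `wrapped` equals `0x87`. Production GHASH code typically replaces the long-division loop with a few extra carryless multiplications, but the two-stage structure is the same.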

Efficient Software Implementation of Laddering Algorithms Over Binary Elliptic Curves [chapter]

Diego F. Aranha, Reza Azarderakhsh, Koray Karabina
2017 Lecture Notes in Computer Science  
In this paper, we keep pushing in this direction and study efficient implementation of regular scalar multiplication algorithms for binary curves equipped with efficient endomorphisms.  ...  More recently, AK and DJB laddering algorithms have been employed by Costello et al. for the implementation of point multiplication on elliptic curves defined over prime fields [6] .  ...  The table demonstrates that binary curves are only competitive in platforms supporting efficient vectorized binary field arithmetic through a very fast carryless multiplier.  ... 
doi:10.1007/978-3-319-71501-8_5 fatcat:dfet5za735ge5piomgp5tzejsi

Four-Dimensional Gallant–Lambert–Vanstone Scalar Multiplication

Patrick Longa, Francesco Sica
2013 Journal of Cryptology  
Our implementations improve the state-of-the-art performance of point multiplication for a variety of scenarios including side-channel protected and unprotected cases with sequential and multicore execution  ...  We show in this work how to merge the two approaches in order to get, for twists of any GLV curve over F_{p^2}, a four-dimensional decomposition together with fast endomorphisms Φ, Ψ over F_{p^2} acting on  ...  Aranha for his advice on multicore programming and Joppe Bos for his help on looking for efficient chains for implementing modular inversion.  ... 
doi:10.1007/s00145-012-9144-3 fatcat:v6z6u3oktvbopnaql2va65f6ky