A scalable and high performance elliptic curve processor with resistance to timing attacks

2005
International Conference on Information Technology: Coding and Computing (ITCC'05) - Volume II
The architecture of this processor is based on the Galois Field of GF(2^n) and the bit-serial field multiplier and squarer are designed. ... The point multiplication algorithm (double-add-subtract) is modified so that the processor performs the same operations for every 3 bits of the scalar k independent of the bit pattern of the 3 bits. ... In our case we have chosen the bit-serial implementation of the GF multiplier and squarer operations. ...
###
Square-rich fixed point polynomial evaluation on FPGAs

2014
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays - FPGA '14
The conventional algorithm, referred to as Horner's rule, involves the least number of steps but can lead to increased latency due to serial computation. ... By using a squarer design that is more efficient than general multiplication, this can result in polynomial evaluation with a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, ... While the structure is simple, Horner's rule suffers from a long latency due to the serial arrangement of operations. ...
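
The contrast between the two baseline schemes named in this snippet can be sketched in a few lines. This is only an illustration in plain Python integers, not the fixed-point FPGA design or the square-rich method of the paper; the function names `horner` and `estrin` and the low-order-first coefficient layout are assumptions made for the example.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with one serial chain of multiply-adds."""
    acc = 0
    for c in reversed(coeffs):
        # Each step depends on the previous one, so nothing can overlap.
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial by pairing terms and squaring x each level."""
    terms = list(coeffs)
    if not terms:
        return 0
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0)  # pad so every term has a partner
        # All pairs in one level are independent and could run in parallel.
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power  # x, x^2, x^4, ...: one squaring per level
    return terms[0]
```

Horner's loop has as many dependent steps as the polynomial has coefficients, while Estrin's scheme needs only about log2 of that many levels, each of which uses one squaring of the running power of x.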
###
Fast Arithmetic Architectures for Public-Key Algorithms over Galois Fields GF((2^n)^m)
[chapter]

1997
Lecture Notes in Computer Science
The approach explores bit parallel arithmetic in the subfield GF(2^n), and serial processing for the extension field arithmetic. ... In particular, the number of clock cycles for one field multiplication, which is the atomic operation in most public-key schemes, can be reduced by a factor of n compared to all other known realizations. ... (XOR), registers (in bits), and number of clock cycles for one multiplication, respectively. ...
###
Fast arithmetic for public-key algorithms in Galois fields with composite exponents

1999
IEEE transactions on computers
The approach explores bit parallel arithmetic in the subfield GF(2^n), and serial processing for the extension field arithmetic. ... The bit parallel squarer architectures have been completely revised. 1 optimizations are discussed. We provide two different approaches to squaring. ... The U operand is fed into the architectures in a bit serial manner, most significant bit first. ...
###
FPGA High Performance Pipelined Architecture Of Elliptic Scalar Multiplication Over GF(2^m) for IOT

2017
International Journal for Research in Applied Science and Engineering Technology
I estimate the maximum number of different bit-width multiplier cores that could be mapped to the Virtex-6 and their performance. ... The number of pipeline stages in the architecture also critically affects the computation time. The optimal number of pipeline stages in the design. ...
###
Efficient Hardware Implementation Of An Elliptic Curve Cryptographic Processor Over GF(2^163)

2012
Zenodo
field with degree 163 in 11.92 s with the maximum achievable frequency of 251 MHz on Xilinx Virtex-4 (XC4VLX200) while 22% of the chip area is occupied, where G is the digit size of the underlying digit-serial ... So, we need to use a 2 to 1 multiplexer that is controlled with the key bits. Therefore, in order to avoid long critical path, another strategy should be considered. ... Output of this squarer together with a number of combinational gates such as AND, OR, and NOT gates are connected to the input of the multiplier. ...
###
A Novel Low-Area Point Multiplication Architecture for Elliptic-Curve Cryptography

2021
Electronics
The hardware resources are reduced with the use of a bit-serial (traditional schoolbook) multiplication method. ... For a pair of m bit polynomial multiplications, m clock cycles are needed in the bit-serial multiplication method. ... In this context, there exist four possibilities: bit-serial, digit-serial, bit-parallel, and digit-parallel to compute polynomial multiplication. ...
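
The "m clock cycles for m-bit operands" behaviour of a bit-serial multiplier can be modelled in software as one loop iteration per bit. The sketch below is an assumed, generic MSB-first shift-and-add multiplier over GF(2^m), not the architecture of the cited paper; the name `gf2m_mul_bit_serial`, the integer encoding of polynomials (bit i holds the coefficient of x^i), and the convention that `poly` includes the x^m term are all choices made for illustration.

```python
def gf2m_mul_bit_serial(a, b, poly, m):
    """Multiply a*b in GF(2^m), consuming one bit of b per iteration (MSB first).

    a and b are field elements encoded as integers below 2**m.
    poly is the irreducible field polynomial, including its x^m term.
    """
    acc = 0
    for i in reversed(range(m)):   # one iteration models one clock cycle
        acc <<= 1                  # multiply the accumulator by x
        if (acc >> m) & 1:         # degree reached m: reduce modulo poly
            acc ^= poly
        if (b >> i) & 1:           # add (XOR) a when the current bit of b is 1
            acc ^= a
    return acc
```

As a usage check, with the AES field GF(2^8) and polynomial 0x11B, `gf2m_mul_bit_serial(0x57, 0x83, 0x11B, 8)` gives 0xC1, the worked example from FIPS 197. A digit-serial variant would process several bits of `b` per iteration, trading area for fewer cycles.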
###
Public Key Cryptography in Sensor Networks—Revisited
[chapter]

2005
Lecture Notes in Computer Science
algorithms, Rabin's Scheme and NtruEncrypt, and analyze their architecture and performance according to various established metrics like power consumption, area, delay, throughput, level of security and energy per bit ... Squarers can be implemented in many ways. As our main concern is to conserve power we chose a bit-serial approach. ... Hence, we built a squarer as a bit-serial multiplier, operating on the entire width of the 512 bit multiplicand and on a single bit of the multiplier at a time. ...
###
A 26.9 K 314.5 Mb/s Soft (32400,32208) BCH Decoder Chip for DVB-S2 System

2010
IEEE Journal of Solid-State Circuits
For the high-speed and long-distance data transmission, the BCH codes with long block length are specified to suppress the error floor due to iterative LDPC decoding. ... In contrast with the hard BCH decoder, the proposed soft BCH decoder that deals with least reliable bits can provide much lower complexity with similar error-correcting performance. ... Notice that, if the error-correcting capability is equal to 1, the number of multipliers and squarers is 0 because only will be computed. ...
###
Exploration of Design Space in ECDSA
[chapter]

2002
Lecture Notes in Computer Science
Polynomial basis inverter and multiplier. The multiplication (X * Y) mod p(x) is implemented by serial add-and-shift algorithm. Three m bits long registers (R_w, R_y, R_v) are used. ... The register R_x is m + 1 bits long, the remaining three m bits long. At the start, R_x holds the field polynomial p(x), R_y resp. R_w contain the divisor y resp. the dividend X, and R_v is cleared. ...
###
Towards Efficient FPGA Implementation of Elliptic Curve Crypto-Processor for Security in IoT and Embedded Devices

2020
Menoufia Journal of Electronic Engineering Research
The aim is to obtain the optimal registers number for an area optimization of ECCP architecture. ... A bit serial multiplier is a good choice for area but a bit parallel is a good choice for time. ... occupied Slices Time in ns Bit parallel 163 - 4,123 4.69 Pipeline Digit serial 82 380.12 2,718 5.26 Pipeline Digit serial 42 408.69 1,497 9.78 Bit parallel 409 - 17,560 5 Pipeline Digit serial 205 276.095 ...
###
Page 458 of IEEE Transactions on Computers Vol. 52, Issue 4
[page]

2003
IEEE Transactions on Computers
167-bit squarer and is highly optimized for the high-level architecture design and lower gate level design. ... But, its is not metic, while being too long for EC operations. ...
###
On Parallelization of High-Speed Processors for Elliptic Curve Cryptography

2008
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A bit-serial multiplier computes one bit of the output per cycle with a single processing block resulting in latency of . ... Hence, a bit-serial implementation of the Massey-Omura multiplier requires three -bit shift registers and one -function block. ...
###
Parabolic synthesis methodology implemented on the sine function

2009
2009 IEEE International Symposium on Circuits and Systems
When analyzing the squarer in Fig. 2, it was found that the resemblance to a bit-serial squarer [6] [7] is large. ... By introducing registers in the design of the bit-serial squarer the partial results of x_n^2 is easily extracted. ...
