Faster Montgomery and double-add ladders for short Weierstrass curves

Mike Hamburg
The Montgomery ladder and Joye ladder are well-known algorithms for elliptic curve scalar multiplication with a regular structure. The Montgomery ladder is best known for its implementation on Montgomery curves, which requires 5M + 4S + 1m + 8A per scalar bit, and 6 field registers. Here (M, S, m, A) represent, respectively, field Multiplications, Squarings, multiplications by a curve constant, and Additions or subtractions. This ladder is also complete, meaning that it works on all input points and all scalars. Many protocols do not use Montgomery curves, but instead use prime-order curves in short Weierstrass form. These have historically been much slower, with ladders costing at least 14 multiplications or squarings per bit: 8M + 6S + 27A for the Montgomery ladder and 8M + 6S + 30A for the Joye ladder. In 2017, Kim et al. improved the Montgomery ladder to 8M + 4S + 12A + 1H per bit using 9 registers, where the H represents a halving. Hamburg simplified Kim et al.'s formulas to 8M + 4S + 8A + 1H per bit using 6 registers. Here we present improved formulas which compute the Montgomery ladder on short Weierstrass curves using 8M + 3S + 7A per bit, and requiring 6 registers. We also give formulas for the Joye ladder that use 9M + 3S + 7A per bit, requiring 5 registers. One of our new formulas supports very efficient 4-way vectorization. We also discuss curve invariants, exceptional points, side-channel protection and how to set up and finish these ladder operations. Finally, we show a novel technique to make these ladders complete when the curve order is not divisible by 2 or 3, at a modest increase in cost. A sample implementation of these techniques is given in the supplementary material, also posted at
doi:10.13154/tches.v2020.i4.189-208 fatcat:i6ufalykdnfbvcsxpa2ik75dbm
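
The regular structure of the Montgomery ladder mentioned in the abstract can be illustrated with a minimal Python sketch. It does not use the formulas of the paper: it uses textbook affine addition on a toy short Weierstrass curve, chosen here purely as an assumption for illustration (y^2 = x^3 + 2x + 2 over F_17, base point (5, 1), group order 19). The names `P_MOD`, `A`, `G`, `add` and `montgomery_ladder` are hypothetical. The sketch only shows that every scalar bit triggers exactly one addition and one doubling, while the invariant R1 - R0 = P is preserved.

```python
# Toy parameters (an assumption for illustration, not from the paper):
# short Weierstrass curve y^2 = x^3 + A*x + 2 over F_17, prime group order 19.
# The constant term of the curve is not needed by the affine formulas below.
P_MOD = 17
A = 2
G = (5, 1)  # a point on the toy curve; None represents the point at infinity


def add(p, q):
    """Textbook affine addition; handles infinity, doubling and inverses."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # q == -p (this also covers doubling a point with y == 0)
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def montgomery_ladder(k, point, nbits):
    """Compute k*point by scanning exactly nbits bits of k, top bit first.

    Invariant: r1 - r0 == point before and after every iteration.
    Each iteration performs one addition and one doubling, whatever the bit.
    """
    r0, r1 = None, point
    for i in reversed(range(nbits)):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), add(r1, r1)
        else:
            r0, r1 = add(r0, r0), add(r0, r1)
    return r0
```

For example, `montgomery_ladder(2, G, 5)` returns `(6, 3)`. The branch on each scalar bit and Python's variable-time integers mean this sketch is not side-channel protected; a hardened implementation would replace the branch with a constant-time conditional swap and use projective formulas such as those the paper presents.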