A Novel Use of Kernel Discriminant Analysis as a Higher-Order Side-Channel Distinguisher [chapter]

Xinping Zhou, Carolyn Whitnall, Elisabeth Oswald, Degang Sun, Zhu Wang
2018 Lecture Notes in Computer Science  
Distinguishers play an important role in Side Channel Analysis (SCA), where real-world leakage information is compared against hypothetical predictions in order to guess at the underlying secret key. However, the direct relationship between leakages and predictions can be disrupted by the mathematical combining of d random values with each sensitive intermediate value of the cryptographic algorithm (a so-called "d-th order masking scheme"). In the case of software implementations, as long as masking has been correctly applied, the guessable intermediates will be independent of any one point in the trace, or indeed of any tuple of fewer than d + 1 points. However, certain (d + 1)-tuples of time points may jointly depend on the guessable intermediates. A typical approach to exploiting this data dependency is to pre-process the trace (computing carefully chosen univariate functions of all possible (d + 1)-tuples) before applying the usual univariate distinguishers. This has a computational complexity which is exponential in the order d of the masking scheme. In this paper, we propose a new distinguisher based on Kernel Discriminant Analysis (KDA) which directly exploits properties of the mask implementation without the need to exhaustively pre-process the traces, thereby distinguishing the correct key with lower complexity. Experimental results for 2nd and 3rd order attacks (i.e. against 1st and 2nd order masking) verify that KDA is an effective distinguisher in protected settings.
[...] Analysis, Side Channel Distinguisher
[...]ity of such attacks began to become apparent with the work of Kocher et al. in the late 1990s [11]. Software countermeasures such as masking [8] successfully disrupt the relationship between sensitive intermediate values and single points of observed leakage, which is precisely the trace feature that Differential Power Analysis (DPA) in particular targets (footnote 4). However, tuples of points of size greater than the number of masks d can still jointly depend on the sensitive intermediates. This gives rise to so-called 'higher-order' DPA [14], which typically proceeds by combining multiple points via some (non-linear) pre-processing function before applying a standard DPA distinguisher, essentially treating the pre-processed traces in a univariate manner, albeit with an exponential (in d) increase in the impact of noise relative to a 'first-order' attack [22].
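The pre-processing step described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' experimental setup: the 4-bit values, Hamming-weight leakage, Gaussian noise, trace length, leaking positions, and the choice of the centred product as the combining function are all illustrative, and every function name is hypothetical. For simplicity the statistic is computed against the known intermediate; in an attack it would be recomputed against the prediction for each key guess.

```python
import itertools
import random


def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_traces=2000, n_points=6, leak_at=(1, 4), sigma=0.5, seed=1):
    """Simulated first-order Boolean masking (assumed leakage model)."""
    rng = random.Random(seed)
    traces, targets = [], []
    for _ in range(n_traces):
        v = rng.randrange(16)              # sensitive intermediate
        m = rng.randrange(16)              # fresh random mask
        t = [rng.gauss(0.0, sigma) for _ in range(n_points)]
        t[leak_at[0]] += hw(v ^ m)         # masked share leaks at one point
        t[leak_at[1]] += hw(m)             # mask leaks at another point
        traces.append(t)
        targets.append(hw(v))
    return traces, targets


def centred_product(traces, i, j):
    """Combine points i and j of every trace by the centred product."""
    n = len(traces)
    mi = sum(t[i] for t in traces) / n
    mj = sum(t[j] for t in traces) / n
    return [(t[i] - mi) * (t[j] - mj) for t in traces]


traces, targets = simulate()

# First-order view: each single point against the sensitive value.
first_order = {i: pearson([t[i] for t in traces], targets) for i in range(6)}

# Second-order view: exhaustive search over all pairs of points.
second_order = {
    (i, j): pearson(centred_product(traces, i, j), targets)
    for i, j in itertools.combinations(range(6), 2)
}
best_pair = max(second_order, key=lambda p: abs(second_order[p]))
print("best pair:", best_pair, "correlation:", round(second_order[best_pair], 3))
```

Even in this toy setting the search visits every pair of points; the number of candidate tuples grows quickly with both the trace length and the masking order.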
Aside from the greater data complexity implied by the inflated noise, higher-order attacks are also hampered by the increasing difficulty of locating the leaking tuples. The computational complexity of an 'exhaustive search' approach, in which all possible point combinations are computed and analysed, grows exponentially with d. Heuristics exist to reduce the search problem by placing informed restrictions on the regions of the trace to be iteratively explored [9] but, precisely because of their heuristic nature, these are not guaranteed to find the best (or indeed any) exploitable combinations. A recent proposal (presented at CARDIS 2016 [6]) aims to bypass the need for explicit enumeration of the (d + 1)-tuples without recourse to heuristics, using Kernel Discriminant Analysis (KDA) [15]. KDA is a generalisation of Linear Discriminant Analysis (LDA), a statistical method to find linear combinations of features (i.e. variables in a dataset, or points in a trace) that characterise class separations. In particular, it outputs projection directions that maximise the ratio of between-group to within-group scatter, so that 'interesting' variation may be concentrated into a reduced-dimension space for further analysis. LDA has been promoted as one of a number of methods to extract sensitive data-dependent features from side-channel traces for some years (beginning with [24], to the best of our knowledge). However, because it only finds linear combinations, it is unable to locate the types of joint data dependencies exhibited by traces which have been protected by software masking. By contrast, the 'kernel trick' employed by KDA allows the data to be implicitly mapped into a higher-dimensional feature space within which to perform the discriminant analysis, thereby extracting non-linear combinations of the sort that (in the case of DPA) do yield sensitive information on further analysis.
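The 'kernel trick' can be made concrete with a short numerical check. The sketch below assumes a homogeneous polynomial kernel of degree d + 1, which is an illustrative choice and not something the excerpt specifies; the function names are hypothetical. It shows that a single inner product raised to a power equals the inner product of two explicit feature vectors containing every ordered product of d + 1 trace points, so those products never need to be written out.

```python
import itertools
import random


def poly_kernel(x, y, degree):
    """Homogeneous polynomial kernel: (x . y) ** degree."""
    return sum(a * b for a, b in zip(x, y)) ** degree


def explicit_features(x, degree):
    """All ordered monomials of the given degree: len(x) ** degree entries."""
    feats = []
    for idx in itertools.product(range(len(x)), repeat=degree):
        prod = 1.0
        for i in idx:
            prod *= x[i]
        feats.append(prod)
    return feats


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5)]   # two toy "traces" of 5 points each
y = [rng.gauss(0, 1) for _ in range(5)]
degree = 3                                # d + 1 for a 2nd-order masking scheme

# Implicit route: one inner product of length 5, then a power.
implicit = poly_kernel(x, y, degree)

# Explicit route: build 5 ** 3 = 125 monomials per trace, then take their inner product.
phi_x = explicit_features(x, degree)
phi_y = explicit_features(y, degree)
explicit = sum(a * b for a, b in zip(phi_x, phi_y))

print(len(phi_x), implicit, explicit)
```

The explicit feature vector includes products such as x[1] * x[4] * x[2], which are the kind of multi-point combinations a higher-order attack needs, yet the kernel evaluates their joint contribution without constructing them.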
Because the mapping of the tuple candidates need not be computed explicitly (by contrast with the pre-processing required by established higher-order DPA methodologies), the complexity of this approach is polynomial, rather than exponential, in d. However, another recent development in the literature has been to demonstrate the direct applicability of LDA as a side-channel distinguisher, not just a pre-processing method. In this capacity, it shares the advantages of other [...]
Footnote 4: Hardware masking schemes also exist, which process shares in parallel but shift the exploitable leakage into higher moments of the (univariate) trace distributions [18].
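To give a rough sense of the polynomial-versus-exponential contrast stated above, the following sketch counts the (d + 1)-tuples an exhaustive pre-processing would have to form for one trace, next to the multiplications needed for one polynomial-kernel evaluation between two traces. The trace length of 1000 points, the polynomial kernel, and the function names are assumptions made for illustration. The two quantities are not in the same units (a kernel method also needs one evaluation per pair of traces), so this compares growth in d, not total running time.

```python
from math import comb


def tuples_to_preprocess(n_points, d):
    """Number of unordered (d + 1)-tuples of points in one trace."""
    return comb(n_points, d + 1)


def kernel_multiplications(n_points, d):
    """Multiplications for one (x . y) ** (d + 1) evaluation:
    n_points for the inner product, then d more for the power."""
    return n_points + d


n_points = 1000
table = [
    (d, tuples_to_preprocess(n_points, d), kernel_multiplications(n_points, d))
    for d in (1, 2, 3)
]
for d, n_tuples, n_mults in table:
    print(f"d={d}: {n_tuples} tuples per trace, {n_mults} multiplications per kernel evaluation")
```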
doi:10.1007/978-3-319-75208-2_5 fatcat:nrmo67rktbfinfipx6ovgliray