A pre-expectation calculus for probabilistic sensitivity

Alejandro Aguirre, Gilles Barthe, Justin Hsu, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja
Sensitivity properties describe how changes to the input of a program affect the output, typically by upper bounding the distance between the outputs of two runs by a monotone function of the distance between the corresponding inputs. When programs are probabilistic, the distance between outputs is a distance between distributions. The Kantorovich lifting provides a general way of defining a distance between distributions by lifting the distance of the underlying sample space; by choosing an
appropriate distance on the base space, one can recover other usual probabilistic distances, such as the Total Variation distance. We develop a relational pre-expectation calculus to upper bound the Kantorovich distance between two executions of a probabilistic program. We illustrate our methods by proving algorithmic stability of a machine learning algorithm, convergence of a reinforcement learning algorithm, and fast mixing for card shuffling algorithms. We also consider some extensions: using our calculus to show convergence of Markov chains to the uniform distribution over states, and an asynchronous extension to reason about pairs of program executions with different control flow.

A typical sensitivity property is algorithmic stability of machine learning algorithms, where the function of interest maps a training set to a learned model, the distance between two training sets is the number of differing examples, and the distance between outputs measures the difference in errors labeling unseen examples. This paper is concerned with sensitivity properties of probabilistic programs. Since such programs produce distributions over their output space, the corresponding notions of sensitivity use distances over distributions. The Total Variation (TV) distance (a.k.a. statistical distance), for example, is a widely used notion of distance that measures the maximal difference between the probabilities of the same event. One key benefit of the TV distance is that it is defined for distributions over arbitrary spaces. However, it is often useful to consider distances inherited from the underlying space. In this setting, the so-called Kantorovich metric gives a generic method to lift a distance E on a ground set X to a distance E# on distributions over X. The class of Kantorovich metrics covers many notions of distance. For instance, the TV distance can be obtained by applying the Kantorovich lifting to the discrete distance, which assigns distance 1 to any pair of distinct points and distance 0 to any pair of equal points.

Approach.
We develop a relational expectation calculus for reasoning about sensitivity of probabilistic computations under the Kantorovich metric. Relational expectations are mappings expressing a quantitative relation (e.g., a distance or metric) between states, modelled as maps of the form State × State → [0, ∞]. The heart of our system is a relational pre-expectation transformer, which takes as input a probabilistic program c written in a core imperative language and a relational expectation E between output states, and produces a relational expectation rpe(c, E) between input states. The calculus is a sound approximation of sensitivity: running the program c on a pair of input states yields output distributions whose Kantorovich distance E# is at most rpe(c, E) evaluated on those inputs.

Technically, our calculus draws inspiration from early work on probabilistic dynamic logic due to Kozen [1985], in which maps E : State → [0, ∞] serve as quantitative counterparts of Boolean predicates P : State → {0, 1}. McIver and Morgan [2005] later coined the term expectation (not to be confused with expected values) for such maps E. Moreover, they developed a weakest pre-expectation calculus for the probabilistic imperative language pGCL. Their calculus was designed as a generalization of Dijkstra's weakest pre-conditions that supports both probabilistic and nondeterministic choice. The basic idea is to define an operator wpe(c, E) that transforms an expectation E over the output states of a program c into an expectation over its input states: evaluated on an input state, wpe(c, E) yields the expected value of E over the output distribution of c. In this way, the expectation is transformed by the effects of the probabilistic program in a backwards fashion, much like how predicates are transformed through Dijkstra's weakest pre-conditions. Our pre-expectation calculus operates similarly, but, as it aims to measure distances between output distributions in terms of inputs, manipulates relational expectations instead.
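To make the backwards transformer concrete, here is a minimal sketch on a toy imperative fragment. The encoding, rule set, and names are our own illustration, not the paper's calculus: we only handle skip, assignment, sequencing, and a coin flip, and we fix the simplest coupling, in which both runs observe the same coin outcome.

```python
# Toy commands: ('skip',), ('assign', var, f) with f: state -> value,
# ('seq', c1, c2), and ('flip', var, p): a coin assigning 1 with prob. p, else 0.
# A relational expectation is a function E(s1, s2) -> nonnegative float.

def rpe(c, E):
    kind = c[0]
    if kind == 'skip':
        return E
    if kind == 'assign':
        _, var, f = c
        # Execute the assignment in both runs, each on its own state.
        return lambda s1, s2: E({**s1, var: f(s1)}, {**s2, var: f(s2)})
    if kind == 'seq':
        _, c1, c2 = c
        # Backwards composition, as in Dijkstra-style predicate transformers.
        return rpe(c1, rpe(c2, E))
    if kind == 'flip':
        _, var, p = c
        # Couple the two runs on the same coin outcome and average.
        return lambda s1, s2: (
            p * E({**s1, var: 1}, {**s2, var: 1})
            + (1 - p) * E({**s1, var: 0}, {**s2, var: 0}))
    raise ValueError(kind)

# Add a shared fair coin to x in both runs, then bound the output
# distance |x - x'| in terms of the pair of input states.
prog = ('seq', ('flip', 'c', 0.5),
               ('assign', 'x', lambda s: s['x'] + s['c']))
E_out = lambda s1, s2: abs(s1['x'] - s2['x'])
bound = rpe(prog, E_out)({'x': 0}, {'x': 3})
# Under this coupling the coin cancels, so the bound is the input distance 3.0.
```

The shared-coin rule is what gives the tight bound here: because both runs add the same increment, the distance |x − x'| is preserved rather than inflated.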
We next motivate the need for relational expectations and explain why they are challenging.

Why do we need relational pre-expectations? The classical weakest pre-expectation calculus enjoys strong theoretical properties: in particular, it is both sound and complete (in an extensional and an intensional sense [Batz et al. 2021]) w.r.t. common program semantics (cf. Gretz et al. [2014]). Therefore, weakest pre-expectations can, in principle, be applied to reason about bounds on the Total Variation distance: given a program c, (i) take a copy c′ over a fresh set of program variables, e.g. if variable x appears in c, substitute it by x′ in c′, and (ii) determine the weakest pre-expectation wpe(c; c′, E), where the expectation E measures the distance between variables in c and their counterparts in c′. However, this naïve approach is not practical for analyzing sensitivity: the TV distance, for example, is defined as a maximum over all events of the output space of the difference between their probabilities; to compute the TV distance, we would need to compute the probability of every single output event. Moreover, the above approach pushes the difficulty of reasoning about sensitivity properties into the task of finding suitable invariants for probabilistic programs, a highly challenging task on its own. In
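The self-composition recipe can be played out on a small example; the toy syntax, the `wpe` transformer, and `tv_distance` below are our own illustrative sketch, not a construction from the paper or the cited works. It also shows a further way the naïve bound can disappoint: for two independent copies of a fair coin flip, wpe yields expected distance 0.5 even though the two output distributions are identical, i.e. at TV distance 0, because the two runs are left uncoupled.

```python
# Toy commands: ('assign', var, f), ('seq', c1, c2), ('flip', var, p) -- a coin
# assigning 1 with probability p, else 0. All names are illustrative.

def wpe(c, E):
    """Unary weakest pre-expectation: wpe(c, E)(s) is the expected value of
    the expectation E over the output distribution of c started in state s."""
    kind = c[0]
    if kind == 'assign':
        _, var, f = c
        return lambda s: E({**s, var: f(s)})
    if kind == 'seq':
        _, c1, c2 = c
        return wpe(c1, wpe(c2, E))
    if kind == 'flip':
        _, var, p = c
        return lambda s: p * E({**s, var: 1}) + (1 - p) * E({**s, var: 0})
    raise ValueError(kind)

def tv_distance(mu, nu):
    """Total Variation distance between two finite distributions (dicts)."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0) - nu.get(k, 0)) for k in support)

# Program c flips a fair coin into x; its fresh copy c' uses x_p instead.
c, c_prime = ('flip', 'x', 0.5), ('flip', 'x_p', 0.5)
# E measures the distance between the variable and its primed counterpart.
E = lambda s: abs(s['x'] - s['x_p'])
bound = wpe(('seq', c, c_prime), E)({})  # expected distance of independent runs

# Both copies produce the same output distribution on x, so the true TV
# distance is 0, yet the self-composition bound is 0.5.
mu = {0: 0.5, 1: 0.5}
print(bound, tv_distance(mu, mu))  # 0.5 0.0
```

The gap between 0.5 and 0 is exactly what a relational calculus can close: by choosing how the two runs' samples are coupled, rpe can match the coins and certify distance 0, whereas self-composition is stuck with the independent product.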
doi:10.18154/rwth-2021-05003