Visitors unchained

François Pottier
2017 Proceedings of the ACM on Programming Languages  
Traversing and transforming abstract syntax trees that involve name binding is notoriously difficult to do in a correct, concise, modular, customizable manner. We address this problem in the setting of OCaml, a functional programming language equipped with powerful object-oriented features. We use visitor classes as partial, composable descriptions of the operations that we wish to perform on abstract syntax trees. We introduce visitors, a simple type-directed facility for generating visitor classes that have no knowledge of binding. Separately, we present alphaLib, a library of small hand-written visitor classes, each of which knows about a specific binding construct, a specific representation of names, and/or a specific operation on abstract syntax trees. By combining these components, a wide range of operations can be defined. Multiple representations of names can be supported, as well as conversions between representations. Binding structure can be described either in a programmatic style, by writing visitor methods, or in a declarative style, via preprogrammed binding combinators.

[...] in the Haskell world, several ways of eliminating it have been proposed. For instance, Cheney's work [2005] uses "Scrap your Boilerplate" [Lämmel and Peyton Jones 2005], while Weirich et al. [2011] rely on RepLib [Weirich 2006]. One contribution of this paper is to deal with this boilerplate, in the setting of OCaml, by exploiting automatically-generated visitor classes. We view visitor classes as incomplete, composable, open-ended descriptions of operations on user-defined data structures. Our visitors come in a limited number of "varieties", such as iter, map, and iter2: fortunately, we find that these three varieties are sufficient for our needs. The rest of the nameplate is concerned with names, binding constructs, of which there is a wide range in the literature, and with operations on abstract syntax trees, of which there is also a wide range: collecting free names, converting between surface and internal representations of names, testing for α-equivalence, performing substitution, and many more come to mind. We argue that, there, too, it is possible to distinguish and separate several concerns. One concern is the treatment of binding constructs.
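To make the iter and map varieties mentioned above more concrete, here is a minimal hand-written sketch of what such visitor classes might look like for a tiny expression type without binding. It is not the output of the visitors generator: the type `expr`, the class names `iter` and `map`, the method names, and the helper functions `count_consts` and `double` are all assumptions chosen for illustration.

```ocaml
(* A small expression type with no binding constructs (hypothetical). *)
type expr =
  | Const of int
  | Add of expr * expr

(* Hand-written stand-in for an "iter" visitor: one method per data
   constructor plus one per type, all threading an environment. *)
class ['env] iter = object (self)
  method visit_Const (_env : 'env) (_i : int) : unit = ()
  method visit_Add (env : 'env) (e1 : expr) (e2 : expr) : unit =
    self#visit_expr env e1;
    self#visit_expr env e2
  method visit_expr (env : 'env) (e : expr) : unit =
    match e with
    | Const i -> self#visit_Const env i
    | Add (e1, e2) -> self#visit_Add env e1 e2
end

(* Hand-written stand-in for a "map" visitor: same shape, but each
   method rebuilds the node it visits. *)
class ['env] map = object (self)
  method visit_Const (_env : 'env) (i : int) : expr = Const i
  method visit_Add (env : 'env) (e1 : expr) (e2 : expr) : expr =
    let e1' = self#visit_expr env e1 in
    let e2' = self#visit_expr env e2 in
    Add (e1', e2')
  method visit_expr (env : 'env) (e : expr) : expr =
    match e with
    | Const i -> self#visit_Const env i
    | Add (e1, e2) -> self#visit_Add env e1 e2
end

(* An operation is obtained by overriding only the methods of interest.
   Here the environment is unused, so it is instantiated with unit. *)
let count_consts (e : expr) : int =
  let n = ref 0 in
  let v = object
    inherit [unit] iter
    method! visit_Const () _i = incr n
  end in
  v#visit_expr () e;
  !n

let double (e : expr) : expr =
  let v = object
    inherit [unit] map
    method! visit_Const () i = Const (2 * i)
  end in
  v#visit_expr () e
```

The point of the sketch is that the traversal code is written once, in the base class, and each operation overrides a single method; this is the sense in which a visitor class is an incomplete, open-ended description of an operation.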
It has been argued in the literature that, instead of baking in support for a limited set of binding constructs, one could offer a domain-specific language, a set of binding combinators, which allow a declarative description of the binding structure. Cαml [Pottier 2006] and Unbound [Weirich et al. 2011] offer examples of such domain-specific languages. A second contribution of this paper is to propose that the meaning and implementation of these binding combinators can be made independent of the representation of names. Indeed, what is the meaning of a binding construct? It is summed up by the manner in which an environment is extended with suitable names at suitable points. Thus, to define the meaning of a binding construct or binding combinator, it suffices to write traversal code for it, which can be independent of the representation of names and environments, as long as it has access to a single operation, namely extend, the operation of extending an environment with a (bound) name. In this paper, we implement this traversal code, too, in the form of visitor classes, which are hand-written. The remaining concern, then, is to deal with concrete representations of bound names, free names, and environments, and with concrete operations on these entities. Examples of concrete operations are "collecting the free names of a term in nominal representation", "substituting a term for a variable in a term in de Bruijn's representation", and "converting a term from nominal representation to de Bruijn's representation". As the issue of traversal (of sums, products, and binding constructs) has been set aside, defining one such concrete operation is a matter of: (1) defining how an environment is extended with a bound name (that is, providing extend) and (2) defining how a free name must be handled. This is done by defining a "kit", a visitor class with two methods.
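The separation described above can be sketched as follows. The traversal class knows where the environment is extended but not how, and a two-method "kit" supplies the missing behavior. This is an illustrative sketch only, not alphaLib's actual code: the type `term`, the class names `iter` and `free_names_kit`, the method name `visit_free`, the method `result`, and the choice of strings for names are all assumptions. Only the name `extend` comes from the text.

```ocaml
(* Lambda-terms, parameterized over the types of bound names ('bn)
   and free names ('fn). Hypothetical type for illustration. *)
type ('bn, 'fn) term =
  | Var of 'fn
  | Lam of 'bn * ('bn, 'fn) term
  | App of ('bn, 'fn) term * ('bn, 'fn) term

(* Traversal code: it fixes the points at which the environment is
   extended, but leaves [extend] and the treatment of free names
   abstract, so it does not depend on how names are represented. *)
class virtual ['env, 'bn, 'fn] iter = object (self)
  method virtual extend : 'bn -> 'env -> 'env
  method virtual visit_free : 'env -> 'fn -> unit
  method visit_term (env : 'env) (t : ('bn, 'fn) term) : unit =
    match t with
    | Var x -> self#visit_free env x
    | Lam (x, body) -> self#visit_term (self#extend x env) body
    | App (t1, t2) ->
        self#visit_term env t1;
        self#visit_term env t2
end

module S = Set.Make (String)

(* A "kit" for one concrete operation: collecting free names when both
   bound and free names are strings. The environment is the set of
   names currently in scope. *)
class free_names_kit = object
  val mutable free = S.empty
  method extend (x : string) (env : S.t) : S.t = S.add x env
  method visit_free (env : S.t) (x : string) : unit =
    if not (S.mem x env) then free <- S.add x free
  method result = free
end

(* The operation is obtained by combining the traversal and the kit
   through multiple inheritance. *)
let free_names (t : (string, string) term) : S.t =
  let v = object
    inherit [S.t, string, string] iter
    inherit free_names_kit
  end in
  v#visit_term S.empty t;
  v#result
```

Under these assumptions, supporting another representation of names or another operation would mean writing a different kit, while the traversal class stays unchanged.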
Our approach to defining concrete operations on terms is modular, as it separates the three components described above, namely: an auto-generated visitor for ordinary data (sums and products), visitors for binding constructs or combinators, and "kits", that is, visitors that know how to extend and look up a concrete environment so as to perform a concrete operation. The last two components can be programmed once and for all and placed in a library. We present one such library, alphaLib [Pottier 2017a], which we are developing. This library, currently at an early stage of development, makes it easy to construct a toolbox of operations on terms in the "nominal" representation, where both bound names and free names are represented as "atoms" with a unique integer identity [Cheney 2005; Shinwell et al. 2003]. It also offers conversions to and from the "raw" representation, where all names are represented as strings. Our approach is not only modular, but also open-ended, that is, customizable. If special behavior is needed when traversing or constructing a user-defined data type, this can be specified by providing a suitable visitor method. As an example, the visitors documentation [Pottier 2017b] shows how to construct visitors for hash-consed data structures. The technique presented there can be combined with the ideas presented in this paper, so one can easily manipulate hash-consed abstract
doi:10.1145/3110272 dblp:journals/pacmpl/Pottier17 fatcat:tp3jzwuatrd37kothu4fzywyse