Zen and the Art of Symbolic Computing: Light and Fast Applicative Algorithms for Computational Linguistics [chapter]

Gérard Huet
2002 Lecture Notes in Computer Science  
Computational linguistics is an application of computer science that presents interesting challenges from the programming methodology point of view. Developing a realistic platform for the treatment of a natural language in its phonological, morphological, syntactic, and ultimately semantic aspects demands a principled modular architecture with complex cooperation between the various layers. Representing large lexical databases, treating sophisticated phonological and morphological operations, and processing large corpora in real time demand fast finite-state toolkits. Analysing syntactic structure, computing anaphoric relations, and dealing with the representation of information flow in dialogue understanding demand the processing of complex constraints on graph structures, with sophisticated sharing of large nondeterministic search spaces.

The talk reports on experiments in using declarative programming for the processing of the Sanskrit language, in its phonological and morphological aspects. A lexicon-based morphological tagger has been designed, using an original algorithm for the analysis of euphony (the so-called sandhi process, which glues the words of a sentence together into a continuous stream of phonemes). This work, described in [2], has been implemented in a purely applicative core subset of Objective Caml [5]. The basic structures underlying this methodology have been abstracted in the Zen toolkit, distributed as free software [3].
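To make the sandhi problem concrete, here is a toy sketch of lexicon-driven segmentation that undoes a junction rewrite. The lexicon, the single rule, and all names are hypothetical illustrations; the paper's actual algorithm compiles such data into finite-state transducers and shares the search space rather than backtracking naively.

```ocaml
(* Toy data: a two-word lexicon and one sandhi rule (u, v, w), read as
   "a final u meeting an initial v surfaces as w"; here the real
   external sandhi tat + eva -> tadeva. All of this is illustrative. *)
let lexicon = [ "tat"; "eva" ]
let rules = [ ("t", "e", "de") ]

let prefix p s =
  String.length p <= String.length s && String.sub s 0 (String.length p) = p
let suffix u s =
  let n = String.length s and m = String.length u in
  m <= n && String.sub s (n - m) m = u
let after n s = String.sub s n (String.length s - n)

(* [segment input] returns all decompositions of [input] into lexicon
   words, undoing sandhi at the junctions. Naive backtracking search. *)
let rec segment input =
  if input = "" then [ [] ]
  else
    List.concat_map
      (fun word ->
        let n = String.length word in
        let plain =                     (* the word occurs unchanged *)
          if prefix word input then
            List.map (fun rest -> word :: rest) (segment (after n input))
          else []
        and glued =                     (* its final segment was rewritten *)
          List.concat_map
            (fun (u, v, w) ->
              if suffix u word then
                let surface = String.sub word 0 (n - String.length u) ^ w in
                if prefix surface input then
                  (* [w] absorbed the next word's initial [v]: restore it *)
                  List.map (fun rest -> word :: rest)
                    (segment (v ^ after (String.length surface) input))
                else []
              else [])
            rules
        in
        plain @ glued)
      lexicon

(* segment "tadeva" = [ [ "tat"; "eva" ] ] *)
```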
Two complementary techniques have been put to use. Firstly, we advocate the systematic use of zippers [1] for the programming of mutable data structures in an applicative way. Zippers, or linear contexts, are related to the interaction combinators of linear logic. Secondly, a sharing functor allows the uniform minimisation of inductive data structures by representing them as shared dags. This is similar to the traditional technique of bottom-up hashing, but the computation of the keys is left to the client invoking the functor, which has two advantages: keys are computed along with the bottom-up traversal of the structure, and, more importantly, their computation may profit from specific statistical properties of the data at hand, optimising bucket balancing in ways that would be unattainable by generic functions. These two complementary technologies are discussed in [4].

The talk discusses the use of these tools in the uniform representation of finite-state automata and transducers as decorated lexical trees (also known as tries).
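As a reminder of the zipper technique advocated above, here is a minimal sketch for binary trees, in the spirit of [1]; the paper applies the same idea to its n-ary lexical trees, and the tree type and names here are illustrative.

```ocaml
(* A minimal zipper for binary trees. *)
type tree = Leaf of string | Node of tree * tree

(* A path records the one-hole context above the focused subtree. *)
type path =
  | Top
  | Left of path * tree    (* focus is a left child; right sibling kept *)
  | Right of tree * path   (* focus is a right child; left sibling kept *)

type location = Loc of tree * path

(* Each move is O(1) and purely applicative: no mutation anywhere. *)
let go_down_left (Loc (t, p)) =
  match t with
  | Node (l, r) -> Loc (l, Left (p, r))
  | Leaf _ -> failwith "go_down_left: at a leaf"

let go_up (Loc (t, p)) =
  match p with
  | Left (up, r) -> Loc (Node (t, r), up)
  | Right (l, up) -> Loc (Node (l, t), up)
  | Top -> failwith "go_up: at the root"

(* Local update looks like in-place assignment, yet rebuilds nothing
   outside the focus: replace the focused subtree, keep the context. *)
let put subtree (Loc (_, p)) = Loc (subtree, p)
```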
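The sharing functor described above might look roughly as follows. This is a minimal sketch under the assumption that the client supplies an integer key in [0, size); the module name and signature are assumptions for illustration, not the Zen toolkit's actual interface.

```ocaml
(* Hash-consing where the key is computed by the client during its own
   bottom-up traversal of the structure being minimised. *)
module Share (M : sig
  type domain       (* the inductive data structure being minimised *)
  val size : int    (* number of hash buckets *)
end) : sig
  (* [share x k] returns a canonical representative of [x], where [k]
     is the client-computed key, assumed to lie in [0, size). *)
  val share : M.domain -> int -> M.domain
end = struct
  let memo : M.domain list array = Array.make M.size []

  let share x k =
    let bucket = memo.(k) in
    (* Subterms were shared first, so structural equality is cheap:
       on shared children it degenerates to pointer comparison. *)
    try List.find (fun y -> y = x) bucket
    with Not_found -> memo.(k) <- x :: bucket; x
end
```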
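Finally, a decorated lexical tree can be sketched as below. The boolean decoration marks accepting nodes, where the paper would store morphological information; the representation shown is illustrative rather than the toolkit's actual one. Read as a deterministic automaton, looking up a letter among a node's children is the transition function.

```ocaml
(* A decorated lexical tree (trie): a decoration plus labelled children. *)
type trie = Trie of bool * (char * trie) list

let empty = Trie (false, [])

(* Insert a word, given as a list of characters. *)
let rec enter (Trie (accept, children)) = function
  | [] -> Trie (true, children)
  | c :: rest ->
      let sub = try List.assoc c children with Not_found -> empty in
      Trie (accept, (c, enter sub rest) :: List.remove_assoc c children)

(* Recognise a word by following labelled arcs from the root. *)
let rec mem (Trie (accept, children)) = function
  | [] -> accept
  | c :: rest ->
      (try mem (List.assoc c children) rest with Not_found -> false)

let explode s = List.init (String.length s) (String.get s)
(* Example: mem (enter empty (explode "eva")) (explode "eva") = true *)
```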
doi:10.1007/3-540-36388-2_3 fatcat:b3e2pe3krbfadcqpreytu7zdse