Associative transducers for the parallel processing of streaming data
Crowd-sourcing and the rise of the internet of things are causing a massive increase in the rate of streaming data that needs to be processed. At the same time, CPU clock speeds are stagnating, so parallel algorithms are needed to process high data rates with low query response times. Automata and transducers are natural models for querying unbounded streams but are inherently sequential, processing each item of data in the stream in order. To continue to use automata and transducers for stream processing at increasing data rates, a new approach is needed that scales to large numbers of cores. This thesis introduces a new computational model, associative transducers, which transforms the execution of a transducer into an associative operator; this associativity is then used to provide highly scalable data parallelism. Associative transducers are backed by a formal model that provides the theoretical basis for executing transducers in an associative manner, and three applications demonstrate their use for processing textual, XML and geospatial data. Each application is individually evaluated against comparable systems, showing almost universal scaling to 64 cores and performance comparable to large MapReduce clusters or commercial database engines, but with no need to load the data prior to querying. For geospatial queries in particular, a system based on associative transducers performs some queries three times faster than a comparable system running on a MapReduce cluster while using a third of the number of cores.
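The general idea of turning transducer execution into an associative operator can be illustrated with a small sketch. It is not the formal model developed in the thesis; it is a commonly used construction, assumed here for illustration only, in which each chunk of input is summarised as a function from every possible start state to an end state plus an output, and summaries are merged by function composition. The word-counting transducer and the names `step`, `summarize`, `combine` and `count_words` are all hypothetical.

```python
from functools import reduce

# Hypothetical two-state transducer that counts words.
# State 0: between words. State 1: inside a word.
STATES = (0, 1)


def step(state, ch):
    """One sequential step: return (next_state, words_emitted)."""
    if ch.isspace():
        return 0, 0
    # A word starts only when a non-space character follows a gap.
    return 1, (1 if state == 0 else 0)


def summarize(chunk):
    """Run the chunk from every start state, since the true one is unknown.

    The result is a tuple indexed by start state, holding
    (end_state, words_emitted) for that start state.
    """
    summary = []
    for start in STATES:
        state, count = start, 0
        for ch in chunk:
            state, emitted = step(state, ch)
            count += emitted
        summary.append((state, count))
    return tuple(summary)


def combine(left, right):
    """Compose two chunk summaries; this operator is associative."""
    merged = []
    for start in STATES:
        mid, left_count = left[start]
        end, right_count = right[mid]
        merged.append((end, left_count + right_count))
    return tuple(merged)


# Summary of the empty chunk: the identity element for combine.
IDENTITY = tuple((s, 0) for s in STATES)


def count_words(text, chunk_size=4):
    """Count words by summarising chunks independently, then reducing."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Each summarize call is independent, so this map could run in parallel.
    summaries = map(summarize, chunks)
    total = reduce(combine, summaries, IDENTITY)
    # The stream as a whole starts between words (state 0).
    return total[0][1]
```

Because `combine` is associative, the summaries can be merged in any grouping, for example as a balanced tree across cores, rather than strictly left to right. The cost is that each chunk is processed once per start state, which is acceptable only when the number of states is small or can be pruned.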