Advances in statistical script learning
To my father, who didn't quite make it to see his son get a doctorate, and mother.

Acknowledgments

It takes a village to make a hermit. The, frankly, shocking completion of my doctoral studies would have been inconceivable without the love, support, and antagonism from so many friends, family members, and professional colleagues. Thanks are due, first and foremost, to my advisor, Ray Mooney, who tolerated (and even encouraged!) my various efforts to cram the (partial) meanings of whole %$!@ing
sentences, and even %$!@ing discourses, into single %$!@ing vectors, among other things. My intellectual debt to my advisor is great. His support and guidance have been unfailing, extremely helpful, and, suffice it to say, animated.

I owe a debt of gratitude also to Vladimir Lifschitz, who worked patiently with me during my first year of graduate school on problems in formal computational logic. I learned from him a good deal about how to approach, decompose, and solve thorny and complex problems, which, it turns out, is very useful. Though, at this particular point in history, a decade ago feels like a century, I nonetheless also owe a debt of intellectual gratitude to my undergraduate honors thesis advisor from a decade ago, Lauri Karttunen, who gave me a wonderful hands-on tutorial on parsimonious linguistic analysis. I'd be remiss in not also thanking Katrin Erk, who's been consistently supportive of me and my work, and whose various conversations and suggestions over the years have always been extremely helpful.

The two summers I spent at Google, working with John DeNero and Saro Meguerdichian, had a formidable influence on me. I owe both of them, in addition to the other wonderful people I worked with there, a nontrivial debt of gratitude. Also, the various grad student co-conspirators with whom I raised Cain during those summers, including , were, and continue to be, really helpful to talk shop with, bounce dumb ideas off, and troll both online and offline. More generally, the whole NLP community, whose various scions and roustabouts are too numerous to enumerate, is a really wonderful group, and getting to know (and even become one of) them over the past years has been a wonderful and rewarding experience.

I've really wasted a tremendous amount of the time of my labmates and fellow graduate students over the past years, and without their help, attention, and friendship, there's basically no way I'd have finished my PhD.
This set of people includes, but is not limited to, and Ayan Acharya. This is to say nothing of many other friends whose time I've further wasted in manifold other nonprofessional ways, who are numerous, but, thankfully to them, no doubt, need not be explicitly shamed here.

Finally, thanks to my family, who have been consistently supportive of me for reasons recondite; in particular, thanks to my wife Amelia, absent whom I suspect I'd have long ago moved to an off-the-grid cabin somewhere in the woods (and she, generally mutatis mutandis, to the desert), my speech devolving to a strange proto-language, my cosmology descending into a Judge-Holden-style ur-violence,

When humans encode information into natural language, they do so with the clear assumption that the reader will be able to seamlessly make inferences based on world knowledge. For example, given the sentence "Mrs. Dalloway said she would buy the flowers herself," one can make a number of probable inferences based on event co-occurrences: she bought flowers, she went to a store, she took the flowers home, and so on. Observing this, it is clear that many different useful natural language end tasks could benefit from models of events as they typically co-occur (so-called script models). Robust question-answering systems must be able to infer highly probable implicit events from what is explicitly stated in a text, as must robust information-extraction systems that map from unstructured text to formal assertions about relations expressed in the text. Coreference resolution systems, semantic role labeling, and even syntactic parsing systems could, in principle, benefit from event co-occurrence models. To this end, we present a number of contributions related to statistical event co-occurrence models. First, we investigate a method of incorporating multiple entities into events in a count-based co-occurrence model.
We find that modeling multiple entities interacting across events allows for improved empirical performance on the task of modeling sequences of events in documents. Second, we give a method of applying Recurrent Neural Network sequence models to the task of predicting held-out predicate-argument structures from documents. This model allows us to easily incorporate entity noun information, and can allow for more complex, higher-arity events than a count-based co-occurrence model. We find the neural model improves performance considerably over the count-based co-occurrence model. Third, we investigate the performance of a sequence-to-sequence encoder-decoder neural model on the task of predicting held-out predicate-argument events from text. This model does not explicitly model any external syntactic information, and does not require a parser. We find the text-level model to be competitive in predictive performance with an event-level model directly mediated by an external syntactic analysis. Finally, motivated by this result, we investigate incorporating features derived from these models into a baseline noun coreference resolution system. We find that, while our additional features do not appreciably improve top-level performance, we can nonetheless provide empirical improvement on a number of restricted classes of difficult coreference decisions.
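
To make the notion of a count-based event co-occurrence model concrete, the following is a minimal illustrative sketch, not the model developed in this work. It assumes a hypothetical toy corpus in which each event is a (verb, subject, object) tuple and the protagonist entity has already been replaced by the placeholder "x". The names DOCS, train, score, and predict are invented for this illustration, and the scoring rule (a sum of simple relative-frequency estimates) is an assumption chosen for brevity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is an ordered list of events,
# each event a (verb, subject, object) tuple; "x" stands for the protagonist.
DOCS = [
    [("go", "x", "store"), ("buy", "x", "flowers"), ("take", "x", "flowers")],
    [("go", "x", "store"), ("buy", "x", "flowers"), ("arrange", "x", "flowers")],
    [("go", "x", "store"), ("buy", "x", "bread")],
]

def train(docs):
    """Count single events and ordered event pairs within each document."""
    unigram, pair = Counter(), Counter()
    for events in docs:
        unigram.update(events)
        # combinations() preserves input order, so (a, b) means a precedes b.
        pair.update(combinations(events, 2))
    return unigram, pair

def score(candidate, context, unigram, pair):
    """Sum, over context events e, of count(e before candidate) / count(e)."""
    return sum(pair[(e, candidate)] / unigram[e] for e in context if unigram[e])

def predict(context, unigram, pair):
    """Return the highest-scoring known event that is not already in the context."""
    candidates = [e for e in unigram if e not in context]
    return max(candidates, key=lambda c: score(c, context, unigram, pair))

unigram, pair = train(DOCS)
print(predict([("go", "x", "store")], unigram, pair))
```

In this toy setting, observing only the event of going to a store makes the flower-buying event the top-ranked inference, because it follows the observed event in two of the three documents. A realistic system would need smoothing, entity abstraction across coreference chains, and far more data.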