Rhetorical Agreement: Maintaining Cohesive Conversations
Developing Enterprise Chatbots
To support a natural conversation flow between humans and automated agents, the rhetorical structure of each message must be analyzed. We classified pairs of text paragraphs as either appropriate or inappropriate for one to follow the other based on considerations of both topic and communicative discourse. To represent a multi-sentence message with respect to how it should follow a previous message in a conversation or dialogue, we built an extension of a discourse tree. An extended discourse
tree is based on a discourse tree for RST relations, with labels for communicative actions and additional arcs for anaphora and ontology-based relations for entities. We refer to such trees as communicative discourse trees (CDTs). We explored the syntactic and discourse features indicative of correct versus incorrect request-response or question-answer pairs. Two learning frameworks were used to recognize such correct pairs: deterministic nearest-neighbor learning of CDTs as graphs, and tree kernel learning of CDTs, in which a feature space of all CDT sub-trees is subject to SVM learning. We formed the positive training set from correct pairs obtained from Yahoo Answers, social networks, corporate conversations (including Enron emails), customer complaints, and interviews by journalists. The corresponding negative training set was created artificially by attaching inappropriate responses, taken from different requests, that nevertheless cover the topics of the questions. The evaluation showed that it is possible to recognize valid pairs in 70% of cases in the domains of weak request-response agreement and in 80% of cases in the domains of strong agreement. Recognition of such pairs is essential to support automated conversations. These accuracy rates are comparable to those of the benchmark task of classifying discourse trees as either valid or invalid. They are also comparable to those for the classification of multi-sentence answers in factoid question-answering systems. We conclude that learning rhetorical structures in the form of CDTs is a key source of data to support answering complex questions.
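As an illustration of the two learning frameworks named above, the following is a minimal sketch, not the authors' implementation. It assumes a toy tree structure (the hypothetical `CDTNode`) in which each node carries an RST relation and an optional communicative action label, and each tree stands for one request-response pair. The kernel shown counts only identical complete sub-trees, which is a deliberate simplification of the tree kernels normally paired with SVM learning. The nearest-neighbor step simply reuses that count as a similarity score. All relation and action labels in the example are invented.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CDTNode:
    """One node of a toy communicative discourse tree (hypothetical structure)."""
    relation: str                   # RST relation name, or "EDU" for a leaf
    action: Optional[str] = None    # communicative action label, if any
    children: List["CDTNode"] = field(default_factory=list)

    def label(self) -> str:
        return f"{self.relation}[{self.action}]" if self.action else self.relation


def subtree_signatures(node: CDTNode, bag: Counter) -> str:
    """Serialize the sub-tree rooted at node and count every complete sub-tree."""
    parts = [subtree_signatures(child, bag) for child in node.children]
    sig = f"({node.label()} {' '.join(parts)})" if parts else f"({node.label()})"
    bag[sig] += 1
    return sig


def subtree_kernel(a: CDTNode, b: CDTNode) -> int:
    """Simplified kernel: number of matching pairs of identical complete sub-trees."""
    bag_a, bag_b = Counter(), Counter()
    subtree_signatures(a, bag_a)
    subtree_signatures(b, bag_b)
    return sum(count * bag_b[sig] for sig, count in bag_a.items())


def nearest_neighbor_label(query: CDTNode, labeled: List[Tuple[CDTNode, str]]) -> str:
    """Return the label of the training tree most similar to the query."""
    best_tree, best_label = max(labeled, key=lambda pair: subtree_kernel(query, pair[0]))
    return best_label


# Invented example trees: one "valid" pair and one "invalid" pair.
valid = CDTNode("Elaboration", children=[CDTNode("EDU", "ask"), CDTNode("EDU", "explain")])
invalid = CDTNode("Contrast", children=[CDTNode("EDU", "ask"), CDTNode("EDU", "deny")])
query = CDTNode("Elaboration", children=[CDTNode("EDU", "ask"), CDTNode("EDU", "explain")])

training = [(valid, "valid"), (invalid, "invalid")]
print(subtree_kernel(query, valid))             # shares all three sub-trees
print(subtree_kernel(query, invalid))           # shares only the "ask" leaf
print(nearest_neighbor_label(query, training))
```

In an SVM setting, a kernel of this kind would supply the pairwise similarity matrix in place of an explicit feature vector over all sub-trees. The nearest-neighbor variant needs no training step but compares the query against every stored tree.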