The Best of Both Worlds: Combining MOCAR and MCDISP

Arjuna Tuzzi, Fiona Tweedie
In the analysis of a corpus of open-ended questions, one of the most important goals is to identify words that distinguish between groups of respondents. The MOCAR procedure within SpadT does this using hypergeometric probabilities (Lebart et al., 1998). However, while the words obtained may occur only within a particular group, the researcher has no indication of their distribution within that group. A word may be chosen which is specific to one or two responses, rather than being representative of the group as a whole. We address this problem using the MCDISP procedure developed by Baayen (1996). The words identified by MOCAR can then be checked for significant under-dispersion, which would indicate that they are confined to a subset of the texts. We illustrate this with data from a corpus of open interviews of graduates of the University of Padua.
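The two-step idea can be illustrated with a minimal Python sketch. It is not the SpadT MOCAR code or Baayen's MCDISP implementation: the function names, parameters, and the simple Monte Carlo scheme below are assumptions made for illustration only. The first function computes a hypergeometric upper-tail probability for a word's frequency in a group; the second estimates, by random reallocation of the word's occurrences over token positions, how often a word would appear in as few texts as observed.

```python
import random
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric count X.

    N: tokens in the whole corpus, K: occurrences of the word in the corpus,
    n: tokens in the group, k: occurrences of the word in the group.
    A small value suggests the word is characteristic of the group.
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def dispersion_p_value(text_lengths, k, observed_texts, n_sim=2000, seed=0):
    """Monte Carlo estimate of P(texts containing the word <= observed_texts).

    Hypothetical stand-in for a dispersion test: the k occurrences are placed
    at random token positions within the group's texts, and we count how many
    distinct texts they fall into. A small p-value indicates under-dispersion.
    """
    rng = random.Random(seed)
    # owner[j] is the index of the text that owns token position j
    owner = [t for t, length in enumerate(text_lengths) for _ in range(length)]
    as_extreme = 0
    for _ in range(n_sim):
        positions = rng.sample(range(len(owner)), k)
        texts_hit = {owner[j] for j in positions}
        if len(texts_hit) <= observed_texts:
            as_extreme += 1
    # add-one correction keeps the estimate strictly positive
    return (as_extreme + 1) / (n_sim + 1)
```

In use, a word would first be screened with `hypergeom_upper_tail`; a word that passes would then be passed to `dispersion_p_value` with the lengths of the group's responses, and a small result would flag it as confined to a subset of the texts rather than typical of the group.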