The DOP Estimation Method Is Biased and Inconsistent

Mark Johnson
2002 Computational Linguistics  
A "Data-Oriented Parsing" or DOP model for statistical parsing associates fragments of linguistic representations with numerical weights, where these weights are estimated by normalizing the empirical frequency of each fragment in a training corpus (see Bod (1998) and references cited therein). This note observes that this estimation method is biased and inconsistent; i.e., that the estimated distribution does not in general converge on the true distribution as the size of the training corpus increases.

* Cognitive and Linguistic Sciences, Providence, RI 02912. I would like to thank Rens Bod, Michael Collins, Eugene Charniak, David MacAllester and the anonymous reviewers for their excellent advice.
© XXXX Association for Computational Linguistics
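
The estimation step described in the abstract, normalizing the empirical frequency of each fragment, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the paper's own code: it assumes normalization is carried out among fragments that share the same root label, and the function name `relative_frequency_weights`, the `(root_label, fragment)` key format, and the toy counts are all hypothetical.

```python
from collections import Counter, defaultdict


def relative_frequency_weights(fragment_counts):
    """Turn raw fragment counts into weights by relative frequency.

    fragment_counts maps (root_label, fragment) -> number of occurrences.
    ASSUMPTION: counts are normalized among fragments with the same root
    label, so the weights for each root label sum to 1.
    """
    # Total count of all fragments sharing each root label.
    totals = defaultdict(int)
    for (root, _fragment), count in fragment_counts.items():
        totals[root] += count

    # Weight = fragment count / total count for its root label.
    return {
        key: count / totals[key[0]]
        for key, count in fragment_counts.items()
    }


# Hypothetical toy counts, invented purely for illustration.
counts = Counter({
    ("S", "(S NP VP)"): 3,
    ("S", "(S (NP she) VP)"): 1,
    ("NP", "(NP she)"): 2,
    ("NP", "(NP he)"): 2,
})

weights = relative_frequency_weights(counts)
for (root, fragment), weight in sorted(weights.items()):
    print(f"{root:3s} {fragment:18s} {weight:.2f}")
```

The sketch shows only the normalization itself; it does not demonstrate the bias or inconsistency result, which concerns how the distribution induced by these weights behaves as the training corpus grows.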
doi:10.1162/089120102317341783 fatcat:hsx5shojtrevnn2fqsjm2qwqyy