A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2018; you can also visit the original URL.
The file type is application/pdf.
Limites de la lemmatisation pour l'extraction de significations [Limits of lemmatization for the extraction of meanings]
unpublished
Corpus lemmatization is a widely used procedure that is sometimes applied merely out of tradition. This paper highlights the limits of the process for the automatic extraction of semantic information, that is, when the contexts in which words occur are used. First, we uncovered significant differences between the contexts of the singular and plural forms of 58 nouns in a large French corpus. Systematically replacing plural forms with singular forms might therefore affect the
fatcat:msodxdvlmffnha7qbwaccxhgyq