Pré-analyse de corpus (Corpus Pre-analysis)
unpublished
Most Natural Language Processing tools need homogeneous corpora in order to deliver relevant results. However, such corpora are rarely available in industrial and applied contexts. This paper presents an original approach to preparing corpora so as to obtain a useful amount of text. The techniques are based on statistical and surface-level linguistic analysis. We present these techniques and an experiment in the information extraction domain. We demonstrate the different techniques
fatcat:jd6os6z42zbrrerbher75wzy7y