A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf
.
A Tidy Data Model for Natural Language Processing using cleanNLP
2017
The R Journal
Recent advances in natural language processing have produced libraries that extract lowlevel features from a collection of raw texts. These features, known as annotations, are usually stored internally in hierarchical, tree-based data structures. This paper proposes a data model to represent annotations as a collection of normalized relational data tables optimized for exploratory data analysis and predictive modeling. The R package cleanNLP, which calls one of two state of the art NLP
doi:10.32614/rj-2017-035
fatcat:55i4l3urxnfdnesqvrlxixmuea