Syntax description synthesis using gradient boosted trees

Arseny Astashkin, Kirill Chuvilin
2017 20th Conference of Open Innovations Association (FRUCT)
The article considers partially formalized text documents, for which it is not possible to construct a formal grammar. Therefore, an external syntax description is used to build the syntax tree. The problem is that preparing such descriptions manually is labor-intensive and requires a high level of expertise. It is proposed to automate this process using machine learning methods. The training set is composed from documents with a known syntax description. Each document is represented as a syntax tree using the TeXnous parser. Each node of these trees represents a syntax element, and the set of all nodes forms the training set. A method for describing a single syntax element is proposed, so that the formal descriptions of syntax elements constitute the space of classes. In the article, this space is limited to the set of parser modes used during document analysis. A set of scientific articles is used for the experiments. The XGBoost implementation of gradient boosted trees is chosen to solve the resulting classification problem.
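The training-set construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, its fields, and the features computed in `node_features` are hypothetical stand-ins for whatever the TeXnous parser actually exposes. Only the overall idea comes from the abstract: every tree node becomes one sample, and its class label is the parser mode used for that syntax element.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical syntax-tree node; the real TeXnous node structure may differ."""
    text: str                                  # source fragment covered by this syntax element
    mode: str                                  # parser mode used for this element (the class label)
    children: List["Node"] = field(default_factory=list)


def node_features(node: Node, depth: int) -> List[float]:
    """Illustrative numeric features for one syntax element (assumed, not from the paper)."""
    text = node.text
    return [
        float(depth),                              # nesting depth in the tree
        float(len(node.children)),                 # number of child elements
        float(len(text)),                          # length of the covered text
        float(text.count("\\")),                   # number of backslashes (command markers)
        float(text.count("{") + text.count("}")),  # number of braces
        1.0 if text.startswith("\\") else 0.0,     # does the element start with a command?
    ]


def build_training_set(
    roots: List[Node],
) -> Tuple[List[List[float]], List[int], List[str]]:
    """Turn every node of every document tree into one (features, label) sample."""
    samples: List[Tuple[List[float], str]] = []
    for root in roots:
        stack = [(root, 0)]                        # iterative depth-first traversal
        while stack:
            node, depth = stack.pop()
            samples.append((node_features(node, depth), node.mode))
            # Push children in reverse so they are visited in document order.
            for child in reversed(node.children):
                stack.append((child, depth + 1))
    # The space of classes is the set of parser modes seen in the documents.
    classes = sorted({mode for _, mode in samples})
    index: Dict[str, int] = {mode: i for i, mode in enumerate(classes)}
    X = [features for features, _ in samples]
    y = [index[mode] for _, mode in samples]   # integer-encoded class labels
    return X, y, classes


if __name__ == "__main__":
    # A toy tree for one document; the mode names are hypothetical.
    doc = Node(r"\section{Intro} Text $x^2$", "document", [
        Node(r"\section{Intro}", "command", [Node("Intro", "argument")]),
        Node("Text ", "text"),
        Node("$x^2$", "math"),
    ])
    X, y, classes = build_training_set([doc])
    print(classes)  # ['argument', 'command', 'document', 'math', 'text']
    print(y)        # [2, 1, 0, 4, 3]
```

Parser modes are encoded as integers because gradient boosting libraries such as XGBoost expect multi-class labels in the range 0 to K-1; the resulting `X` and `y` could then be passed to a classifier such as `xgboost.XGBClassifier` via its `fit` method. The feature set here is deliberately tiny; the paper's own description of a syntax element is not reproduced.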
doi:10.23919/fruct.2017.8071289 dblp:conf/fruct/AstashkinC17 fatcat:pv4a6yf6snc2dgcco6fjrpqn4m