Automatic extraction of a semantic representation of English sentences

Martin Scaiano, Université d'Ottawa / University of Ottawa
Many natural language processing tasks are implemented using methods that do not understand language but instead exploit its statistical properties. This thesis presents a system for extracting a semantic frame representation from sentences, with the intent of building document-understanding systems. Many frame extraction systems have been built for a competition, such as SemEval 2007 Task 19, or as a proof of concept; these systems have not been applied to further work on other tasks. The system presented here is intended as a foundation for developing new algorithms that use a frame-based semantic representation. Frame extraction has two parts: frame identification and role labeling. Frame identification is a fairly simple task, while role labeling is more difficult and has received extensive attention in recent years. The system presented here uses two layers of semantic labeling in all tasks: first, general-purpose role labeling is performed on the initial parse tree; then, using that information as guidance, the roles are labeled with frame-specific information. Several machine learning frameworks are explored for both tasks, usually with varying features and with different divisions of the tasks among classifiers. We experimented with three types of classifiers: Naive Bayes, Decision Trees, and Support Vector Machines. The system's performance is evaluated by cross-validation on the FrameNet data and against the SemEval 2007 frame extraction task data. The system presented here is comparable to other state-of-the-art systems. Considering the intended use of the system and the fact that no optimizations have been made for the SemEval 2007 task, the system's results are promising, especially in terms of precision.
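To make the two-layer design more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the thesis's implementation. It assumes a toy lemma-to-frame lexicon for frame identification, two hand-made parse-tree features (`phrase` and `position`), PropBank-style general labels (`ARG0`, `ARG1`, `ARG2`) for the first layer, and one Naive Bayes classifier per frame for the frame-specific layer. All names and training data (`FRAME_LEXICON`, `FRAME_ROLE_DATA`, `label_roles`, and so on) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon: target lemma -> candidate frames (FrameNet-style names).
FRAME_LEXICON = {"buy": ["Commerce_buy"], "sell": ["Commerce_sell"]}


def identify_frames(lemmas):
    """Frame identification: (token index, frame) for every lemma in the lexicon."""
    return [(i, frame) for i, lemma in enumerate(lemmas)
            for frame in FRAME_LEXICON.get(lemma, [])]


class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> Counter[(name, value)]
        self.feature_values = defaultdict(set)      # name -> values seen in training

    def fit(self, examples):
        for features, label in examples:
            self.label_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[label][(name, value)] += 1
                self.feature_values[name].add(value)
        return self

    def predict(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior P(label)
            for name, value in features.items():
                seen = self.feature_counts[label][(name, value)]
                # The extra +1 in the vocabulary size reserves mass for unseen values.
                vocab = len(self.feature_values[name]) + 1
                score += math.log((seen + 1) / (count + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Layer 1: general-purpose roles from hypothetical parse-tree features.
general_clf = NaiveBayes().fit([
    ({"phrase": "NP", "position": "before"}, "ARG0"),
    ({"phrase": "NP", "position": "after"}, "ARG1"),
    ({"phrase": "PP", "position": "after"}, "ARG2"),
])

# Layer 2: one frame-specific classifier per frame, guided by the layer-1 role.
# Toy (general role, frame element) pairs; hypothetical training data.
FRAME_ROLE_DATA = {
    "Commerce_buy": [("ARG0", "Buyer"), ("ARG1", "Goods"), ("ARG2", "Seller")],
    "Commerce_sell": [("ARG0", "Seller"), ("ARG1", "Goods"), ("ARG2", "Buyer")],
}
frame_clfs = {
    frame: NaiveBayes().fit([({"general": g}, role) for g, role in pairs])
    for frame, pairs in FRAME_ROLE_DATA.items()
}


def label_roles(frame, constituents):
    """Return (text, general role, frame-specific role) for each constituent."""
    results = []
    for text, features in constituents:
        general = general_clf.predict(features)                      # layer 1
        specific = frame_clfs[frame].predict({"general": general})   # layer 2
        results.append((text, general, specific))
    return results


if __name__ == "__main__":
    lemmas = ["Kim", "buy", "a", "car", "from", "Lee"]
    for _, frame in identify_frames(lemmas):
        print(frame, label_roles(frame, [
            ("Kim", {"phrase": "NP", "position": "before"}),
            ("a car", {"phrase": "NP", "position": "after"}),
            ("from Lee", {"phrase": "PP", "position": "after"}),
        ]))
```

Training a separate second-layer classifier for each frame is only one of the possible divisions of the task among classifiers. A Decision Tree or SVM could replace `NaiveBayes` in either layer without changing the shape of the pipeline.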
doi:10.20381/ruor-19088 fatcat:egl47uv7tvcorggctm5mu7ggfq