TATO: Leveraging on Multiple Strategies for Semantic Textual Similarity

Tu Thanh Vu, Quan Hung Tran, Son Bao Pham
2015 Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)  
In this paper, we describe the TATO system, which participated in SemEval-2015 Task 2a: "Semantic Textual Similarity (STS) for English". Our system is trained on published datasets from previous competitions. Using machine learning techniques, it combines multiple similarity measures of varying complexity, ranging from simple lexical and syntactic measures to complex semantic ones, to compute semantic textual similarity. Our final model consists of a simple linear combination of about 30 main features, selected from the large number of features we experimented with. The results are promising, with Pearson's coefficients on the individual datasets ranging from 0.6796 to 0.8167 and an overall weighted mean score of 0.7422, well above the task baseline system.
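As a rough illustration of the approach the abstract outlines, the sketch below combines two toy similarity features linearly and scores predictions with Pearson's coefficient and a weighted mean across datasets. It is not the TATO implementation: the two features (`jaccard`, `length_ratio`), the weights, and the bias are hypothetical stand-ins for the roughly 30 features and learned coefficients of the real system, and weighting the mean by dataset size is an assumption.

```python
import math

def jaccard(a, b):
    """Hypothetical lexical feature: word-level Jaccard overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def length_ratio(a, b):
    """Hypothetical surface feature: shorter/longer token count."""
    la, lb = len(a.split()), len(b.split())
    if max(la, lb) == 0:
        return 1.0
    return min(la, lb) / max(la, lb)

def linear_score(features, weights, bias=0.0):
    """Linear combination of feature values (weights are illustrative)."""
    return bias + sum(w * f for w, f in zip(weights, features))

def pearson(xs, ys):
    """Pearson correlation between predicted and gold scores.

    Assumes both lists have the same length and non-zero variance.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def weighted_mean(scores, sizes):
    """Mean of per-dataset scores, weighted by dataset size (assumed)."""
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)

# Example: score one sentence pair with made-up weights.
s1, s2 = "a man plays a guitar", "a man is playing guitar"
feats = [jaccard(s1, s2), length_ratio(s1, s2)]
prediction = linear_score(feats, weights=[3.0, 2.0], bias=0.0)
```

In a real system the weights would be fitted on the training data, for example by least-squares regression against the gold similarity scores, rather than set by hand as here.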
doi:10.18653/v1/s15-2034 dblp:conf/semeval/VuTP15 fatcat:cqi77ld7ufhvblfqjqseiok4we