A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
This paper presents a novel self-training approach that we use to explore a scenario typical for under-resourced languages. We apply self-training to small multilingual dependency corpora of nine languages. Our approach employs a confidence-based method to gain additional training data from large unlabeled datasets. The method proved effective for five of the nine languages in the SPMRL 2014 Shared Task datasets. We obtained the largest absolute improvement of two
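The confidence-based procedure described in the abstract can be sketched as a generic self-training loop: train on the labeled seed data, predict on the unlabeled pool, and move only high-confidence predictions into the training set. This is a minimal illustrative sketch, not the paper's actual parser or selection criterion; the `train` callback, the toy model, and the threshold value are all assumptions for illustration.

```python
# Hypothetical sketch of confidence-based self-training (not the paper's code).
# A model is trained on a small labeled seed set; unlabeled examples whose
# predicted confidence exceeds a threshold are added as pseudo-labeled data.

from typing import Callable, List, Tuple

Example = Tuple[str, str]                      # (sentence, label)
Model = Callable[[str], Tuple[str, float]]     # sentence -> (label, confidence)


def self_train(
    labeled: List[Example],
    unlabeled: List[str],
    train: Callable[[List[Example]], Model],
    threshold: float = 0.9,
    rounds: int = 3,
) -> List[Example]:
    """Return the enlarged training set after confidence-based self-training."""
    data = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(data)                    # retrain on the current data
        scored = [(s, *model(s)) for s in pool]
        added = [(s, lab) for s, lab, conf in scored if conf >= threshold]
        pool = [s for s, _, conf in scored if conf < threshold]
        data.extend(added)                     # keep only confident predictions
        if not added:                          # stop when nothing new is gained
            break
    return data


def toy_train(data: List[Example]) -> Model:
    """Toy stand-in for a real parser: labels by sentence length only."""
    def model(s: str) -> Tuple[str, float]:
        label = "long" if len(s) > 5 else "short"
        conf = min(1.0, abs(len(s) - 5) / 5)   # confidence grows with margin
        return label, conf
    return model


if __name__ == "__main__":
    result = self_train([("hi", "short")], ["aaaaaaaaaa", "aaaa"], toy_train)
    print(result)
```

In this toy run, the clearly "long" sentence is pseudo-labeled and added, while the borderline one stays in the pool; a real setup would use parser attachment scores as the confidence signal.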