A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2018; you can also visit <a rel="external noopener" href="https://journals.linguisticsociety.org/proceedings/index.php/BLS/article/download/3886/3582">the original URL</a>. The file type is <code>application/pdf</code>.
Automatic Extraction of Linguistic Data from Digitized Documents
<span title="2013-12-16">2013</span>
<i title="Linguistic Society of America">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/dfd2cmq4hfepdp5qrfmibg5qx4" style="color: black;">Proceedings of the annual meeting of the Berkeley Linguistics Society</a>
</i>
In lieu of an abstract, here is a brief excerpt: This paper presents a system for automatically extracting linguistic data from digitized linguistic documents using a combination of existing software packages and custom scripts. The system is designed to leverage existing resources in online digital libraries in order to bootstrap the creation of large, multi-lingual linguistic corpora, which can then be used to conduct data-driven experimental research into cross-linguistic or universal
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3765/bls.v39i1.3886">doi:10.3765/bls.v39i1.3886</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/u6i5xs3a7jgcdfdic3zbhcu5gq">fatcat:u6i5xs3a7jgcdfdic3zbhcu5gq</a>
</span>
... ic phenomena. The system identifies instances of foreign-language text accompanied by reference-language translations within the text of printed books that have been scanned into digital format, and extracts these to produce a parallel corpus of example sentences. While the system achieves a high precision on predicting foreign text, its accuracy overall is low, and directions for improvement and future work are identified.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20180721072306/https://journals.linguisticsociety.org/proceedings/index.php/BLS/article/download/3886/3582" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3765/bls.v39i1.3886">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="unlock alternate icon" style="background-color: #fb971f;"></i>
Publisher / doi.org
</button>
</a>