Orthographic Errors in Web Pages: Toward Cleaner Web Corpora

Christoph Ringlstetter, Klaus U. Schulz, Stoyan Mihov
2006 | Computational Linguistics | MIT Press - Journals
In this article we investigate the distribution of orthographic errors of various types in Web pages.  ...  As a by-product, methods are developed for efficiently detecting erroneous pages and for marking orthographic errors in acceptable Web documents, reducing thus the number of errors in corpora and linguistic  ...  Which methods help to automatically detect Web pages with many orthographic errors? Which methods help to mark orthographic errors found in Web pages?  ... 
doi:10.1162/coli.2006.32.3.295 | fatcat:6p63yzddhzbk3lp3ueps2ggxta
Full text (Web Archive PDF): https://web.archive.org/web/20110401221626/http://aclweb.org/anthology-new/J/J06/J06-3001.pdf

Texts in, meaning out: neural language models in semantic similarity task for Russian [article]

Andrey Kutuzov, Igor Andreev
2015-04-30 | arXiv | pre-print
Moreover, we show that texts in Russian National Corpus (RNC) provide an excellent training material for such models, outperforming other, much larger corpora.  ...  High-quality semantic vectors learned in such a way can be used in a variety of linguistic tasks and promise an exciting field for further study.  ...  Finally, News is even larger than Web, but cleaner and biased towards one particular genre.  ... 
arXiv:1504.08183v1 | fatcat:qactgnco6bannjqbed2j2gtyvy
Full text (Web Archive PDF): https://web.archive.org/web/20200902130706/https://arxiv.org/ftp/arxiv/papers/1504/1504.08183.pdf

Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering

Philipp Koehn, Huda Khayrallah, Kenneth Heafield, Mikel L. Forcada
2018 | Proceedings of the Third Conference on Machine Translation: Shared Task Papers | Association for Computational Linguistics
Seventeen participants from companies, national research labs, and universities participated in this task.  ...  We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be  ...  Acknowledgements The shared task was supported by a Google Faculty Research Award to Johns Hopkins University and by the European Union through the Connected Europe Facility project Provision of Web-  ... 
doi:10.18653/v1/w18-6453 | dblp:conf/wmt/KoehnKHF18 | fatcat:tgouqo2aizhhtd7bmjsvyuxryu
Full text (Web Archive PDF): https://web.archive.org/web/20201106213212/https://www.pure.ed.ac.uk/ws/files/77079263/Findings_of_the_WMT_2018_Shared_Task_on_Parallel_Corpus_Filtering.pdf

Crawl and crowd to bring machine translation to under-resourced languages

Antonio Toral, Miquel Esplá-Gomis, Filip Klubička, Nikola Ljubešić, Vassilis Papavassiliou, Prokopis Prokopidis, Raphael Rubino, Andy Way
2016-06-25 | Language Resources and Evaluation | Springer Nature
We present a widely applicable methodology to bring machine translation (MT) to under-resourced languages in a cost-effective and rapid manner.  ...  Our proposal relies on web crawling to automatically acquire parallel data to train statistical MT systems if any such data can be found for the language pair and domain of interest.  ...  to keep the web pages in the language of interest, e.g.  ... 
doi:10.1007/s10579-016-9363-6 | fatcat:kl7gpyhu6ncuphi4pdk3yf55qy
Full text (Web Archive PDF): https://web.archive.org/web/20190223034116/http://pdfs.semanticscholar.org/48bf/e9aa9aa6a6389e2d2e0484dc216ce173c511.pdf

Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

Philipp Koehn, Francisco Guzmán, Vishrav Chaudhary, Juan Pino
2019 | Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2) | Association for Computational Linguistics
., 2018), we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2% and 10% of the highest-quality  ...  Eleven participants from companies, national research labs, and universities participated in this task.  ...  In Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, pages 136-144, Portland, Oregon. Association for Computational Linguistics.  ... 
doi:10.18653/v1/w19-5404 | dblp:conf/wmt/KoehnGCP19 | fatcat:76jaoe7bhvhqpmqr6bssd5ox4m
Full text (Web Archive PDF): https://web.archive.org/web/20200505083037/https://www.aclweb.org/anthology/W19-5404.pdf

Analysis of named entity recognition and linking for tweets

Leon Derczynski, Diana Maynard, Giuseppe Rizzo, Marieke van Erp, Genevieve Gorrell, Raphaël Troncy, Johann Petrak, Kalina Bontcheva
2015 | Information Processing & Management | Elsevier BV
systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.  ...  Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity  ...  Acknowledgments The authors thank Roland Roller and Sean McCorry of the University of Sheffield, and the CrowdFlower workers, for their help in annotating the entity-linked dataset; and the reviewers for  ... 
doi:10.1016/j.ipm.2014.10.006 | fatcat:3ikmvocd75h7rljgxjszeku4gu
Full text (Web Archive PDF): https://web.archive.org/web/20170812062225/http://www.eurecom.fr/~troncy/Publications/Derczynski_Troncy-ipm15.pdf

The BEA-2019 Shared Task on Grammatical Error Correction

Christopher Bryant, Mariano Felice, Øistein E. Andersen, Ted Briscoe
2019 | Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications | Association for Computational Linguistics
Towards a standard evaluation method for grammatical error detection and correction.  ...  There is a significant difference in the proportion of punctuation (PUNCT) errors across corpora. Punctuation errors account for just 5% of all errors in NUCLE, but almost 20% in W&I.  ...  The differences in how these metrics would rank each team are also shown, where a darker red indicates a lower rank.  ... 
doi:10.18653/v1/w19-4406 | dblp:conf/bea/BryantFAB19 | fatcat:rta345dpvfg27mxbdp3shlo6ki
Full text (Web Archive PDF): https://web.archive.org/web/20200505133316/https://www.aclweb.org/anthology/W19-4406.pdf

Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations [article]

Manuela Sanguinetti, Lauren Cassidy, Cristina Bosco, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes
2020-11-03 | arXiv | pre-print
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation  ...  The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, which is a principle  ...  All remaining errors are ours. The work of C.  ... 
arXiv:2011.02063v1 | fatcat:ubr7qywxxrakzjfbfnzrtmarlm
Full text (Web Archive PDF): https://web.archive.org/web/20201107233241/https://arxiv.org/pdf/2011.02063v1.pdf

A Design Proposal of an Online Corpus-Driven Dictionary of Portuguese for University Students

Tanara Zingano Kuhn
2019 | Journal of Portuguese Linguistics | Ubiquity Press, Ltd.
As shown in Chapter 3, at present there are no corpora meeting that special demand. (Footnote 136: See the Sketch Engine web page for further information: https://www.sketchengine.co.uk/)  ...  on this cleaner version of the corpus.  ... 
doi:10.5334/jpl.209 | fatcat:ofqxdeezmnav3mzbujrgmzt3j4
Full text (Web Archive PDF): https://web.archive.org/web/20190429022558/http://repositorio.ul.pt/bitstream/10451/32013/1/ulfl242468_td.pdf

Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics [article]

Preslav Nakov
2019-11-23 | arXiv | pre-print
Traditionally the Web has been viewed as a source of page hit counts, used as an estimate for n-gram word frequencies.  ...  I address noun compound semantics by automatically generating paraphrasing verbs and prepositions that make explicit the hidden semantic relations between the nouns in a noun compound.  ...  to extraction errors.  ... 
arXiv:1912.01113v1 | fatcat:3ubstjtd7nhphmw4kbewls3onq
Full text (Web Archive PDF): https://web.archive.org/web/20200929202252/https://arxiv.org/pdf/1912.01113v1.pdf

The contingent meaning of -ex brand names in English

Laurel Smith Stvan
2006 | Corpora | Edinburgh University Press
Seven hundred and ninety-three -ex brand name types were collected and examined, derived from American English texts in the Brown and Frown corpora as well as over 600 submissions to the US Patent and  ...  Yet, despite ambiguities in its interpretation, the -ex form shows increasing use.  ...  For example, examining user postings on medical web sites and discussion groups, Malouf et al. (2006: 125) extracted non-brand terms occurring at least fifteen times on the same page as brand names,  ... 
doi:10.3366/cor.2006.1.2.217 | fatcat:7cxbfh6jhvdydnpa3v67cafnua
Full text (Web Archive PDF): https://web.archive.org/web/20140611041136/http://www.uta.edu/faculty/stvan/stvan06-ex-brands.pdf

Microblog-genre noise and impact on semantic annotation accuracy

Leon Derczynski, Diana Maynard, Niraj Aswani, Kalina Bontcheva
2013 | Proceedings of the 24th ACM Conference on Hypertext and Social Media - HT '13 | ACM Press
Consequently, errors are cumulative, and earlier-stage problems can severely reduce the performance of final stages.  ...  Semantic annotation of tweets is typically performed in a pipeline, comprising successive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation  ...  The task is generally approached in two stages: first, the identification of orthographic errors in an input discourse, and second, the correction of these errors.  ... 
doi:10.1145/2481492.2481495 | dblp:conf/ht/DerczynskiMAB13 | fatcat:lrclf4a2zzcdth7cdvxchxnd54
Full text (Web Archive PDF): https://web.archive.org/web/20131001093842/http://derczynski.com:80/sheffield/papers/ner_issues.pdf

A corpus-based survey of four electronic Swahili–English bilingual dictionaries

G De Pauw, G De Schryver, P Wagacha
2009-12-15 | Lexikos | African Journals Online (AJOL)
Aided by a data-driven morphological analyzer and part-of-speech tagger, we quantify the coverage of the dictionaries on large monolingual corpora of Swahili.  ...  In a second series of experiments, we investigate how applicable the dictionaries are as a tool in the development of a machine translation system, by evaluating bilingual coverage on the parallel SAWA  ...  Gilles-Maurice de Schryver would like to thank Ghent University for its continued support of his field trips in Africa.  ... 
doi:10.4314/lex.v19i1.49134 | fatcat:fi4cnlex55gpdfkutvviumlwwu
Full text (Web Archive PDF): https://web.archive.org/web/20170922010650/https://www.ajol.info/index.php/lex/article/viewFile/49134/35479/

A Corpus-based Survey of Four Electronic Swahili–English Bilingual Dictionaries

Guy De Pauw, Gilles-Maurice De Schryver, Peter Waiganjo Wagacha
2011-10-20 | Lexikos | Stellenbosch University
Aided by a data-driven morphological analyzer and part-of-speech tagger, we quantify the coverage of the dictionaries on large monolingual corpora of Swahili.  ...  In a second series of experiments, we investigate how applicable the dictionaries are as a tool in the development of a machine translation system, by evaluating bilingual coverage on the parallel SAWA  ...  Gilles-Maurice de Schryver would like to thank Ghent University for its continued support of his field trips in Africa.  ... 
doi:10.5788/19-0-443 | fatcat:trnsrnd4sfbldlufljqee6cyle
Full text (Web Archive PDF): https://web.archive.org/web/20170922010650/https://www.ajol.info/index.php/lex/article/viewFile/49134/35479/

Automated Transcription of Non-Latin Script Periodicals: A Case Study in the Ottoman Turkish Print Archive [article]

Suphan Kirmizialtin, David Wrisley
2020-11-02 | arXiv | pre-print
We discuss the historical situation of OT text collections and how they were excluded for the most part from the late twentieth century corpora digitization that took place in many Latin script languages  ...  , as some may expect, in right-to-left Arabic script text.  ...  models by manually correcting errors in the automated transcription and expanding each set of ground truth by an additional twenty pages of transcription after which the CER for Ahali newspaper dropped  ... 
arXiv:2011.01139v1 | fatcat:binlks5egzfw7l5m4jg4feehf4
Full text (Web Archive PDF): https://web.archive.org/web/20201105190547/https://arxiv.org/ftp/arxiv/papers/2011/2011.01139.pdf
Showing results 1 to 15 out of 49 results