Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
[article]
2020
arXiv
pre-print
In this work, we build a customized sentence segmenter for Bengali and propose two novel methods for parallel corpus creation in low-resource setups: aligner ensembling and batch filtering. ...
We believe our study will pave the way for future research on Bengali-English machine translation as well as other low-resource languages. ...
Acknowledgements We would like to thank the ICT Division, Government of the People's Republic of Bangladesh for funding the project and Intelligent Machines Limited for providing cloud support. ...
arXiv:2009.09359v2
fatcat:chxob2dxo5adrjfqcexualahmm
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
unpublished
doi:10.18653/v1/2020.emnlp-main.207
fatcat:monhojbz5fa7hdhbh6y2yueu7e
Program Committee
2006
2006 Sixth IEEE International Workshop on Source Code Analysis and Manipulation
Coaching in Computer Assisted Language Learning using Machine Translation Technology. ...
As well as many papers on distributional semantics, there were some on extending the coverage of existing wordnets, linking wordnets to new resources (especially in the medical domain), using wordnets ...
Fund as well as the South African Centre for Digital Language Resources for providing funding in the various phases of the AWN project; as well as reviewers and conference participants for valuable inputs ...
doi:10.1109/scam.2006.23
dblp:conf/scam/X06c
fatcat:2dhsf7loj5hlffu2jxpmlo2qcq
Recent Advances in Neural Text Generation: A Task-Agnostic Survey
[article]
2022
arXiv
pre-print
Finally, we discuss the future directions for the development of neural text generation, including neural pipelines and exploiting background knowledge. ...
These advances have been achieved by numerous developments, which we group under the following four headings: data construction, neural frameworks, training and inference strategies, and evaluation metrics ...
Sohel Rahman, and Rifat Shahriyar. 2020. Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation. ...
arXiv:2203.03047v1
fatcat:iupgvcw2hbge5ioy6quiotnra4
Pretrained Transformers for Text Ranking: BERT and Beyond
2021
Proceedings of the 14th ACM International Conference on Web Search and Data Mining
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task. ...
The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. ...
We'd like to thank the following people for comments on earlier drafts of this work: Maura Grossman, Sebastian Hofstätter, Xueguang Ma, and Bhaskar Mitra. ...
doi:10.1145/3437963.3441667
fatcat:6teqmlndtrgfvk5mneq5l7ecvq
The Taming of the Shrew - non-standard text processing in the Digital Humanities
[article]
2018
methodologies have to be reviewed and redefined, since so-called non-standard texts pose challenges on the lexical and syntactic level, especially for machine-learning-based approaches. ...
We respond to this with modular and thus easily adjustable project workflows and system architectures. Several instances serve as examples for our methodolo [...] ...
Barman et al. (2014) investigate mixed text including three languages: Bengali, English and Hindi. ...
doi:10.18419/opus-9685
fatcat:x35u3cimanc2pfsj2jy3bvfolm
Scanning the Science-Society Horizon
[article]
2016
The increase in people participating in social media on the Internet offers a new resource for monitoring what people are discussing. ...
An open-source data-gathering tool for Twitter data was developed and used to collect a dataset from Twitter with the keyword 'science' during 2011. ...
to be used, a new batch parameter instead of the update_every parameter for choosing between online or batch learning, and the removal of the parameter distributed because the multicore LDA does not have ...
doi:10.25911/5d6664e8354b8
fatcat:jmgtblj2n5e6xosu7ue3sokami
OASIcs, Volume 74, SLATE'19, Complete Volume
[article]
2019
For instance, it can be used for evaluation purposes in Machine Translation: a translation result can be missing a reference, and, still, be a good translation; thus, we should be able to see if it is ...
Other related experiments which use bilingual lexical resources to enhance existing wordnets from existing bilingual English dictionaries by intersecting lemmas are conducted for Sanskrit [9], Bengali ...
First, we can calculate the list of new instances to check for typos (anb-new-items myfamily.ab f.anb command). And finally, commit the new notes: (anb-commit myfamily.ab f.anb command). ...
doi:10.4230/oasics.slate.2019
fatcat:it3a3fn52bhrznke4mv6z7vrr4
Computing Negotiation Update Semantics in Multi-issue Bargaining Dialogues
2017
SEMDIAL 2017 (SaarDial) Workshop on the Semantics and Pragmatics of Dialogue
unpublished
Number 15H03226 entitled "Autonomous Mutual Learning among Japanese Learners of English through Interaction" (PI: Yasunari Harada) and Grant-in-Aid for Scientific Research (C) Project Number 16K02946 entitled ...
Acknowledgments Thanks to Vajjala and Meurers (2016) for sharing their feature set and Xu et al. (2015) for sharing their corpus. ...
expect the English and German sections of the dataset to differ substantially. ...
doi:10.21437/semdial.2017-10
fatcat:7zzmv3z7jrevzokb5dtlf6jr3m