9 Hits in 9.0 sec

Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation [article]

Tahmid Hasan, Abhik Bhattacharjee, Kazi Samin, Masum Hasan, Madhusudan Basak, M. Sohel Rahman, Rifat Shahriyar
2020 arXiv   pre-print
In this work, we build a customized sentence segmenter for Bengali and propose two novel methods for parallel corpus creation on low-resource setups: aligner ensembling and batch filtering.  ...  We believe our study will pave the way for future research on Bengali-English machine translation as well as other low-resource languages.  ...  Acknowledgements We would like to thank the ICT Division, Government of the People's Republic of Bangladesh for funding the project and Intelligent Machines Limited for providing cloud support.  ... 
arXiv:2009.09359v2 fatcat:chxob2dxo5adrjfqcexualahmm

Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation

Tahmid Hasan, Abhik Bhattacharjee, Kazi Samin, Masum Hasan, Madhusudan Basak, M. Sohel Rahman, Rifat Shahriyar
2020 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)   unpublished
In this work, we build a customized sentence segmenter for Bengali and propose two novel methods for parallel corpus creation on low-resource setups: aligner ensembling and batch filtering.  ...  We believe our study will pave the way for future research on Bengali-English machine translation as well as other low-resource languages.  ...  Acknowledgements We would like to thank the ICT Division, Government of the People's Republic of Bangladesh for funding the project and Intelligent Machines Limited for providing cloud support.  ... 
doi:10.18653/v1/2020.emnlp-main.207 fatcat:monhojbz5fa7hdhbh6y2yueu7e

Program Committee

2006 2006 Sixth IEEE International Workshop on Source Code Analysis and Manipulation  
Coaching in Computer Assisted Language Learning using Machine Translation Technology.  ...  As well as many papers on distributional semantics, there were some on extending the coverage of existing wordnets, linking wordnets to new resources (especially in the medical domain), using wordnets  ...  Fund as well as the South African Centre for Digital Language Resources for providing funding in the various phases of the AWN project; as well as reviewers and conference participants for valuable inputs  ... 
doi:10.1109/scam.2006.23 dblp:conf/scam/X06c fatcat:2dhsf7loj5hlffu2jxpmlo2qcq

Recent Advances in Neural Text Generation: A Task-Agnostic Survey [article]

Chen Tang, Frank Guerin, Yucheng Li, Chenghua Lin
2022 arXiv   pre-print
Finally, we discuss the future directions for the development of neural text generation including neural pipelines and exploiting background knowledge.  ...  These advances have been achieved by numerous developments, which we group under the following four headings: data construction, neural frameworks, training and inference strategies, and evaluation metrics  ...  Sohel Rahman, and Rifat Shahriyar. 2020. Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation.  ... 
arXiv:2203.03047v1 fatcat:iupgvcw2hbge5ioy6quiotnra4

Pretrained Transformers for Text Ranking: BERT and Beyond

Andrew Yates, Rodrigo Nogueira, Jimmy Lin
2021 Proceedings of the 14th ACM International Conference on Web Search and Data Mining  
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task.  ...  The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond.  ...  We'd like to thank the following people for comments on earlier drafts of this work: Maura Grossman, Sebastian Hofstätter, Xueguang Ma, and Bhaskar Mitra.  ... 
doi:10.1145/3437963.3441667 fatcat:6teqmlndtrgfvk5mneq5l7ecvq

The Taming of the Shrew - non-standard text processing in the Digital Humanities [article]

Sarah Schulz, Universität Stuttgart
2018
methodologies have to be reviewed and redefined since so called non-standard texts pose challenges on the lexical and syntactic level especially for machine-learning-based approaches.  ...  We answer to this with modular and thus easily adjustable project workflows and system architectures. Several instances serve as examples for our methodolo [...]  ...  Barman et al. (2014) investigate mixed text including three languages: Bengali, English and Hindi.  ... 
doi:10.18419/opus-9685 fatcat:x35u3cimanc2pfsj2jy3bvfolm

Scanning the Science-Society Horizon [article]

Brenda Moon, The Australian National University
2016
The increase in people participating in social media on the Internet offers a new resource for monitoring what people are discussing.  ...  An open source data-gathering tool for Twitter data was developed and used to collect a dataset from Twitter with the keyword 'science' during 2011.  ...  to be used, a new batch parameter instead of the update_every parameter for choosing between online or batch learning, and the removal of the parameter distributed because the multicore LDA does not have  ... 
doi:10.25911/5d6664e8354b8 fatcat:jmgtblj2n5e6xosu7ue3sokami

OASIcs, Volume 74, SLATE'19, Complete Volume [article]

Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, Hugo Gonçalo Oliveira
2019
For instance, it can be used for evaluation purposes in Machine Translation: a translation result can be missing a reference, and, still, be a good translation; thus, we should be able to see if it is  ...  Other related experiments which use bilingual lexical resources to enhance existing wordnets from existing bilingual English dictionaries by intersecting lemmas are conducted for Sanskrit [9] , Bengali  ...  First we can calculate the list of new instances -to check for typos (anb-new-items myfamily.ab f.anb command). And finally commit the new notes: (anb-commit myfamily.ab f.anb command).  ... 
doi:10.4230/oasics.slate.2019 fatcat:it3a3fn52bhrznke4mv6z7vrr4

Computing Negotiation Update Semantics in Multi-issue Bargaining Dialogues

Volha Petukhova, Harry Bunt, Andrei Malchanau
2017 SEMDIAL 2017 (SaarDial) Workshop on the Semantics and Pragmatics of Dialogue   unpublished
Number 15H03226 entitled "Autonomous Mutual Learning among Japanese Learners of English through Interaction" (PI: Yasunari Harada) and Grant-in-Aid for Scientific Research (C) Project Number 16K02946 entitled  ...  Acknowledgments Thanks to Vajjala and Meurers (2016) for sharing their feature set and Xu et al. (2015) for sharing their corpus.  ...  expect the English and German sections of the dataset to differ substantially.  ... 
doi:10.21437/semdial.2017-10 fatcat:7zzmv3z7jrevzokb5dtlf6jr3m