
The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task

Raúl Vázquez, Umut Sulubacak, Jörg Tiedemann
2019 Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)  
Then, we produced scores for each sentence by weighting these features with a classification model.  ...  This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 parallel corpus filtering task. Our scores were produced using a two-step strategy.  ...  as well as the MeMAD project, funded by the European Union's Horizon 2020 Research and Innovation Programme (grant № 780069).  ... 
doi:10.18653/v1/w19-5441 dblp:conf/wmt/VazquezST19 fatcat:7ee2dyld5jfd5amckq3y2h32eu

Handling Noisy Queries in Cross Language FAQ Retrieval

Danish Contractor, Govind Kothari, Tanveer A. Faruquie, L. Venkata Subramaniam, Sumit Negi
2010 Conference on Empirical Methods in Natural Language Processing  
We demonstrate the effectiveness of our approach on a real-life dataset.  ...  In a multilingual society it is essential that data services that were developed for a specific language be made accessible through other local languages also.  ...  Mean reciprocal rank is used to evaluate a system by producing a list of possible responses to a query, ordered by probability of correctness.  ... 
dblp:conf/emnlp/ContractorKFSN10 fatcat:ffytk34o6vea5bwumcqchiedby

Community Perspectives on Zika Virus Disease Prevention in Guatemala: A Qualitative Study

Elli Leontsini, Sean Maloney, Margarita Ramirez, Luisa María Mazariegos, Elisa Juarez Chavez, Diana Kumar, Priya Parikh, Gabrielle C. Hunter
2020 American Journal of Tropical Medicine and Hygiene  
and a lowland town explored other concepts through rank orderings of prevention practices depicted on cards.  ...  Condom use, although salient for Zika prevention, was hindered by gender roles.  ...  Financial support: The study was made possible by the support of the American people through the U.S.  ... 
doi:10.4269/ajtmh.19-0578 pmid:32100677 pmcid:PMC7204582 fatcat:2pjdpim4vberlpqydqngsuyw5a

Accurate semantic textual similarity for cleaning noisy parallel corpora using semantic machine translation evaluation metric: The NRC supervised submissions to the Parallel Corpus Filtering task

Chi-kiu Lo, Michel Simard, Darlene Stewart, Samuel Larkin, Cyril Goutte, Patrick Littell
2018 Proceedings of the Third Conference on Machine Translation: Shared Task Papers  
In fact, our best-performing system, NRC-yisi-bicov, is one of only four submissions ranked in the top 10 in both evaluations.  ...  In this paper, we also describe our unsuccessful attempt at automatically synthesizing a noisy parallel development corpus for tuning the weights to combine different parallelism and fluency features.  ...  Compared to the systems trained on data subselected by the best feature (YiSi-1 precision bicov), those trained on data subselected by the regression score list had their performance decreased by 0.2-0.5  ... 
doi:10.18653/v1/w18-6481 dblp:conf/wmt/LoSSLGL18 fatcat:5ctcocbgyvanjphdmyqnxpciom

Kvasir

Liang Wang, Sotiris Tasoulis, Teemu Roos, Jussi Kangasharju
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15 Companion  
We utilize the processing power of Apache Spark to scale up Kvasir into a practical Internet service.  ...  Latent semantic analysis has long been demonstrated as a promising information retrieval technique to search for relevant articles from large text corpora.  ...  score is calculated for ranking.  ... 
doi:10.1145/2740908.2742825 dblp:conf/www/WangTRK15 fatcat:ufbvyv6bgjg35jsx437gndqp64

Neutron: An Implementation of the Transformer Translation Model and its Variants [article]

Hongfei Xu, Qiuhui Liu
2020 arXiv   pre-print
The Transformer translation model is easier to parallelize and provides better performance than recurrent seq2seq models, which makes it popular in both industry and the research community.  ...  We implement basic functions for training, decoding and data processing, such as freezing / unfreezing model parameters and padding a list of tensors to the same size on an assigned dimension, under "utils/".  ...  Some sentence pairs are meaningless or do not even belong to the language pair under study. Vocabulary-based cleaning is supported for this case.  ... 
arXiv:1903.07402v2 fatcat:5ofb5et6a5e4pcklokooepq7dy

Multicore triangle computations without tuning

Julian Shun, Kanat Tangwongsan
2015 2015 IEEE 31st International Conference on Data Engineering  
On a 40-core machine with two-way hyper-threading, our parallel exact global and local triangle counting algorithms obtain speedups of 17-50x on a set of real-world and synthetic graphs, and are faster  ...  This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations that scale to billions  ...  This work is supported by a Facebook Graduate Fellowship, the National Science Foundation under grant number CCF-1314590, and the Intel Labs Academic Research Office for the Parallel Algorithms for Non-Numeric  ... 
doi:10.1109/icde.2015.7113280 dblp:conf/icde/ShunT15 fatcat:d6ekzi3z2rd3zjc2ketus2oa3u

Page 7552 of Mathematical Reviews Vol. , Issue 2002J [page]

2002 Mathematical Reviews  
(S-UMEA-C; Umea) One-by-one cleaning for practical parallel list ranking. (English summary) Algorithmica 32 (2002), no. 3, 345-363.  ...  Summary: “It is hard to achieve good speed-ups for parallel list ranking on distributed-memory machines because the problem requires a substantial number of communication rounds, each incurring some start-up  ... 

Microsoft's Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data

Marcin Junczys-Dowmunt
2018 Proceedings of the Third Conference on Machine Translation: Shared Task Papers  
Based on human evaluation we ranked first among constrained systems. We believe this is mostly caused by our data filtering/weighting regime.  ...  We participated in one language direction: English-German.  ... 
doi:10.18653/v1/w18-6415 dblp:conf/wmt/Junczys-Dowmunt18 fatcat:34pbjsqouze2xfkqggvq7kzy2i

Microsoft's Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data [article]

Marcin Junczys-Dowmunt
2018 arXiv   pre-print
Based on human evaluation we ranked first among constrained systems. We believe this is mostly caused by our data filtering/weighting regime.  ...  We participated in one language direction -- English-German.  ... 
arXiv:1809.00196v1 fatcat:cewcs3a2lrba5jdznhz56ajcdu

NRC Parallel Corpus Filtering System for WMT 2019

Gabriel Bernier-Colborne, Chi-kiu Lo
2019 Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)  
We describe the National Research Council Canada team's submissions to the parallel corpus filtering task at the Fourth Conference on Machine Translation.  ...  Acknowledgments We thank the anonymous reviewers for their helpful suggestions on this paper.  ...  While YiSi-1 successfully served in the WMT2018 parallel corpus filtering task, YiSi-2 showed comparable accuracy in identifying clean parallel sentences on a hand-annotated subset of test data in our internal  ... 
doi:10.18653/v1/w19-5434 dblp:conf/wmt/Bernier-Colborne19 fatcat:bow5csgv6fe6no25nxf6nc52hu

A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions

Shigehiko Schamoni, Julian Hitschler, Stefan Riezler
2018 Conference of the Association for Machine Translation in the Americas  
We present a dataset and method for improving the translation of noisy image captions that were created by users of Wikimedia Commons.  ...  The dataset is multilingual but non-parallel, and is several orders of magnitude larger than existing parallel data for multimodal machine translation.  ...  This research was supported in part by DFG grant RI-2221/2-1 "Grounding Statistical Machine Translation in Perception and Action", and by an Amazon Academic Research Award (AARA) "Multimodal Pivots for  ... 
dblp:conf/amta/SchamoniHR18 fatcat:zku2t5zomzgxjobsmjgx3jzn2a

Memshare: a Dynamic Multi-tenant Memory Key-value Cache [article]

Asaf Cidon, Daniel Rushton, Stephen M. Rumble, Ryan Stutsman
2016 arXiv   pre-print
Even for single-tenant applications, Memshare increases the average hit rate of the current state-of-the-art memory cache by an additional 2.7% on our real-world trace.  ...  We demonstrate that Memshare increases the combined hit rate of the applications in the trace by 6.1% (from 84.7% hit rate to 90.8% hit rate) and reduces the total number of misses by 39.7% without  ...  If there were no free segments, threads would block waiting for the cleaner to add new segments to the free list. In practice the free list is never empty (we describe the reason below).  ... 
arXiv:1610.08129v1 fatcat:db6dyqa5gzgqrgvoik27pd54mm

The threshold join algorithm for top-k queries in distributed sensor networks

D. Zeinalipour-Yazti, Z. Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras, M. Vlachos, N. Koudas, D. Srivastava
2005 Proceedings of the 2nd international workshop on Data management for sensor networks - DMSN '05  
The objective of a top-k query is to find the k highest ranked answers to a user defined similarity function.  ...  Our preliminary experimental results, using our trace driven simulator, show that TJA is both practical and efficient.  ...  The algorithm constructs a bound which is uniform for all lists, similarly to FA, which is too coarse in practice.  ... 
doi:10.1145/1080885.1080896 dblp:conf/dmsn/Zeinalipour-YaztiVGKTVKS05 fatcat:z3p23k2bb5a5xf4yoimp7atnd4

Measuring sentence parallelism using Mahalanobis distances: The NRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared task

Patrick Littell, Samuel Larkin, Darlene Stewart, Michel Simard, Cyril Goutte, Chi-kiu Lo
2018 Proceedings of the Third Conference on Machine Translation: Shared Task Papers  
One such entry fairly consistently scored in the top ten systems in the 100M-word conditions, and for one task, translating the European Medicines Agency corpus (Tiedemann, 2009), scored among the best systems  ...  The WMT18 shared task on parallel corpus filtering (Koehn et al., 2018b) challenged teams to score sentence pairs from a large high-recall, low-precision web-scraped parallel corpus (Koehn et al., 2018a  ...  That is to say, we did not completely throw out the clean parallel data for this task; we simply used it as two unaligned monolingual corpora.  ... 
doi:10.18653/v1/w18-6480 dblp:conf/wmt/LittellLSSGL18 fatcat:dnvayn743rg3va3rpha7ounjba