Improved Statistical Machine Translation with Hybrid Phrasal Paraphrases Derived from Monolingual Text and a Shallow Lexical Resource

Yuval Marton
2010 Conference of the Association for Machine Translation in the Americas  
Paraphrase generation is useful for various NLP tasks. However, pivoting techniques for paraphrasing have limited applicability because they rely on parallel texts, although they benefit from the linguistic knowledge implicit in the sentence alignment. Distributional paraphrasing has wider applicability but does not benefit from any linguistic knowledge. We combine a distributional semantic distance measure (based on a non-annotated corpus) with a shallow linguistic resource to create a hybrid semantic distance measure of words, which we extend to phrases. We embed this extended hybrid measure in a distributional paraphrasing technique, benefiting from both linguistic knowledge and independence from parallel texts. We evaluate the technique in statistical machine translation tasks by augmenting translation models with paraphrase-based translation rules, and show that it is superior to the non-augmented baseline and to both the distributional and the pivot paraphrasing techniques. We train models on both a full-size dataset and a simulated "low-density" small dataset.
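The general idea of a hybrid word-and-phrase distance can be illustrated with a minimal Python sketch. It is not the paper's formulation. The window-based context counts, the cosine score, and the rule that a shallow lexical resource (here `thesaurus`, a hypothetical word-to-category dictionary) zeroes out any pair whose words share no category are all assumptions chosen for illustration. Phrases are handled by treating a token tuple as a single unit with its own context vector.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count words within `window` positions of each occurrence of `target`.

    `target` is a tuple of tokens, so single words and phrases are treated alike.
    """
    n = len(target)
    vec = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + n:i + n + window]
            vec.update(left + right)
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_similarity(tokens, a, b, thesaurus):
    """Distributional cosine score, gated by a shallow lexical resource.

    Hypothetical gating rule: if no word of `a` shares a thesaurus category
    with any word of `b`, the pair is treated as unrelated (score 0.0).
    """
    cats_a = set().union(*(thesaurus.get(w, set()) for w in a))
    cats_b = set().union(*(thesaurus.get(w, set()) for w in b))
    if not cats_a & cats_b:
        return 0.0
    return cosine(context_vector(tokens, a), context_vector(tokens, b))
```

On a toy corpus such as `"the cat sat on the mat the dog sat on the rug".split()`, with `cat` and `dog` listed under one category and `mat` and `rug` under another, `cat`/`dog` receive a positive distributional score while `cat`/`mat` are filtered out by the lexical resource regardless of their shared contexts. Gating is only one of several ways a lexical resource could be combined with distributional evidence.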