CzEng: Czech-English Parallel Corpus release version 0.5

Ondřej Bojar, Zdeněk Žabokrtský
2006 Prague Bulletin of Mathematical Linguistics  
We introduce CzEng 0.5, a new Czech-English sentence-aligned parallel corpus consisting of around 20 million tokens in each language. The corpus is available on the Internet and can be used under the terms of a license agreement for non-commercial educational and research purposes. In addition to describing the corpus, we present preliminary results of statistical machine translation experiments based on CzEng 0.5.
dblp:journals/pbml/BojarZ06 fatcat:2pbd4glcnrd4ph42ka54bbavjy