Harvesting the Bitexts of the Laws of Hong Kong From the Web
2005
International Joint Conference on Natural Language Processing
In this paper we present our recent work on harvesting English-Chinese bitexts of the laws of Hong Kong from the Web and aligning them to the subparagraph level by exploiting the numbering system of the legal text hierarchy. The basic methodology and practical techniques are reported in detail. The resulting bilingual corpus, comprising 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specialized domain of Hong Kong law. It is particularly …
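The abstract states that alignment relies on the numbering of the legal text hierarchy but does not give the procedure. The following is a minimal Python sketch of that general idea, not the authors' method. It assumes that each numbered unit starts on its own line with a label such as `3.`, `(1)`, `(a)` or `(ii)`, and that the English and Chinese versions carry identical labels. It also uses a hypothetical heuristic to tell roman-numeral subparagraphs apart from lettered paragraphs. The names `classify`, `segment` and `align`, along with the toy input lines, are illustrative only.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed label formats (hypothetical, not taken from the paper):
#   section "3. Heading", subsection "(1)", paragraph "(a)", subparagraph "(ii)"
SECTION_RE = re.compile(r"^(\d+[A-Z]*)\.\s*(.*)$")
LABEL_RE = re.compile(r"^\(([0-9A-Za-z]+)\)\s*(.*)$")
ROMAN_RE = re.compile(r"^(?=[ivx]+$)x{0,3}(ix|iv|v?i{0,3})$")

LEVELS = 4  # slots: section, subsection, paragraph, subparagraph
Key = Tuple[str, ...]


def classify(label: str, path: List[str]) -> int:
    """Return the slot (1-3) that a bracketed label occupies below its section."""
    if label[0].isdigit():
        return 1  # subsection: (1), (2), (2A)
    para = path[2]
    if ROMAN_RE.match(label) and para:
        # Heuristic: "(i)" directly after paragraph "(h)" is the next paragraph;
        # otherwise a roman label inside an open paragraph is a subparagraph.
        follows = len(label) == 1 and len(para) == 1 and ord(label) == ord(para) + 1
        if not follows:
            return 3
    return 2  # paragraph: (a), (b), ...


def segment(lines: List[str]) -> Dict[Key, str]:
    """Map the label path of every numbered unit to its text."""
    path = [""] * LEVELS
    units: Dict[Key, str] = {}
    key: Optional[Key] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        sec = SECTION_RE.match(line)
        lab = None if sec else LABEL_RE.match(line)
        if sec:
            path = [sec.group(1)] + [""] * (LEVELS - 1)  # new section resets all levels
            text = sec.group(2)
        elif lab:
            depth = classify(lab.group(1), path)
            path[depth] = lab.group(1)
            path[depth + 1:] = [""] * (LEVELS - depth - 1)  # close deeper levels
            text = lab.group(2)
        elif key is not None:
            units[key] = (units[key] + " " + line).strip()  # continuation line
            continue
        else:
            continue  # unlabeled text before the first section is ignored
        key = tuple(path)
        units[key] = text
    return units


def align(en_lines: List[str], zh_lines: List[str]) -> List[Tuple[Key, str, str]]:
    """Pair English and Chinese units that share the same label path."""
    en, zh = segment(en_lines), segment(zh_lines)
    return [(k, en[k], zh[k]) for k in en if k in zh]


if __name__ == "__main__":
    # Toy placeholder text, not real statutory wording.
    en = ["3. Interpretation", "(1) In this Ordinance—", "(a) first item;",
          "(i) detail one;", "(ii) detail two;", "(b) second item."]
    zh = ["3. 釋義", "(1) 在本條例中——", "(a) 第一項；",
          "(i) 細節一；", "(ii) 細節二；", "(b) 第二項。"]
    for key, e, z in align(en, zh):
        print(key, e, z)
```

Using the full label path (section, subsection, paragraph, subparagraph) as the key keeps repeated labels such as `(a)` distinct across sections. In this sketch, units that appear in only one language are silently dropped. A real pipeline would need to handle mismatched or missing numbering explicitly.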
dblp:conf/ijcnlp/KitLSW05
fatcat:m2ah3rw7xzgvpbpd2xyempyrqy