Non-linear mappings of the form $P(\text{ngram})^{\gamma}$ and $\frac{\log(1+\tau P(\text{ngram}))}{\log(1+\tau)}$ are applied to the n-gram probabilities in five trainable open-source language identifiers. The first mapping reduces classification errors by 4.0% to 83.9% over a test set of more than one million 65-character strings in 1366 languages, and by 2.6% to 76.7% over a subset of 781 languages. The second mapping improves four of the five identifiers by 10.6% to 83.8% on the larger corpus and 14.4% to 76.7% on the smaller.

doi:10.3115/v1/d14-1069 dblp:conf/emnlp/Brown14 fatcat:d7naeus2q5hh3fuum5r4euoyba
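
The two mappings named in the abstract can be illustrated with a minimal Python sketch. The function names `gamma_map` and `log_map`, the sample probabilities, and the parameter values γ = 0.5 and τ = 100 are assumptions chosen for illustration; the abstract does not state which parameter values were used or how the mapped values feed into each identifier's scoring.

```python
import math


def gamma_map(p: float, gamma: float) -> float:
    """Power mapping: P(ngram) ** gamma."""
    return p ** gamma


def log_map(p: float, tau: float) -> float:
    """Logarithmic mapping: log(1 + tau * P(ngram)) / log(1 + tau)."""
    # log1p(x) computes log(1 + x) accurately for small x.
    return math.log1p(tau * p) / math.log1p(tau)


if __name__ == "__main__":
    # Hypothetical n-gram probabilities, not taken from the paper.
    sample_probs = [0.001, 0.01, 0.1, 1.0]
    gamma = 0.5   # assumed value for illustration
    tau = 100.0   # assumed value for illustration
    for p in sample_probs:
        print(f"P={p:<6} power={gamma_map(p, gamma):.4f} "
              f"log={log_map(p, tau):.4f}")
```

Both functions map 0 to 0 and 1 to 1. With γ < 1 or τ > 0, probabilities strictly between 0 and 1 are raised, so rare n-grams carry relatively more weight than under a linear scale.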