Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap
2007
Conference on Empirical Methods in Natural Language Processing
A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements fall significantly below lossless information-theoretic lower bounds but it produces false positives with some quantifiable probability. Here we present a general framework for deriving smoothed language model probabilities from BFs. We investigate how a BF containing n-gram statistics can be used as a direct replacement for a conventional n-gram model. Recent work has demonstrated that corpus
dblp:conf/emnlp/TalbotO07
fatcat:l7dt4hvvw5bu3acxycu3iqi3hq
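The set-membership behaviour the abstract attributes to a Bloom filter can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of seeded SHA-256 digests as hash functions are all assumptions made for illustration, and the sketch covers only plain membership of n-grams, not the smoothed probabilities the paper derives.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter for string items such as n-grams."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Bit array packed into bytes, initially all zero.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive each hash function by prefixing a seed to the
        # item and hashing with SHA-256; the paper may use a different family.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True if all bits are set. An inserted item is always found; an
        # item that was never inserted may be reported as present by chance.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


ngrams = ["the cat sat", "cat sat on", "sat on the"]
bf = BloomFilter(num_bits=1024, num_hashes=3)
for ngram in ngrams:
    bf.add(ngram)

# Inserted n-grams are always reported as members (no false negatives).
print(all(ngram in bf for ngram in ngrams))
```

A query for an n-gram that was never added usually returns `False`, but can return `True` when other insertions happen to have set all of its bits. That is the false-positive behaviour the abstract mentions, and its rate depends on the number of bits, the number of hash functions, and the number of stored items.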