Lucene for n-grams using the CLUEWeb Collection
2009
Text Retrieval Conference
The ARSC team modified the Apache Lucene engine to accommodate "go words" taken from the Google Gigaword vocabulary of n-grams. The Category "B" subset of the ClueWeb collection was indexed with a divide-and-conquer method that worked across the separate ClueWeb subsets for 1-, 2-, and 3-grams.
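The abstract does not describe the implementation, so the following is only a rough Python illustration, not Lucene code and not the ARSC team's method. It assumes that a "go word" is an n-gram (n up to 3) from a fixed vocabulary and that only go words are indexed, the inverse of a stop-word list. It also assumes that the divide-and-conquer step means building a separate inverted index for each n-gram order. Whitespace tokenization with lowercasing is a simplifying assumption, and the function names `ngrams`, `go_word_terms`, `build_partitioned_index`, and `lookup` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-grams of `tokens`, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def go_word_terms(tokens: List[str], go_words: Set[str], max_n: int = 3) -> Dict[int, List[str]]:
    """Keep only the 1..max_n-grams found in the (assumed) go-word vocabulary,
    grouped by n-gram order. Everything else is discarded, unlike a stop list,
    which discards only the listed words."""
    return {n: [g for g in ngrams(tokens, n) if g in go_words] for n in range(1, max_n + 1)}


def build_partitioned_index(docs: Dict[str, str], go_words: Set[str],
                            max_n: int = 3) -> Dict[int, Dict[str, Set[str]]]:
    """Divide: build one inverted index (term -> set of doc ids) per n-gram order.
    Tokenization is a naive lowercase whitespace split (an assumption)."""
    indexes = {n: defaultdict(set) for n in range(1, max_n + 1)}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for n, terms in go_word_terms(tokens, go_words, max_n).items():
            for term in terms:
                indexes[n][term].add(doc_id)
    # Convert to plain dicts so lookups of missing terms do not create entries.
    return {n: dict(index) for n, index in indexes.items()}


def lookup(indexes: Dict[int, Dict[str, Set[str]]], term: str) -> Set[str]:
    """Conquer: route a query term to the partition matching its word count."""
    n = len(term.split())
    return indexes.get(n, {}).get(term, set())
```

For example, with `go_words = {"new", "new york", "york city", "new york city"}` and the documents `{"d1": "New York City is big", "d2": "a new car"}`, the unigram partition maps `"new"` to both documents, while `lookup(idx, "new york")` consults only the bigram partition. Keeping one partition per n-gram order means each index can be built and searched independently.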
dblp:conf/trec/NewbyFM09