A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf
.
TextBenDS: a generic Textual data Benchmark for Distributed Systems
[article]
2021
arXiv
pre-print
Extracting top-k keywords and documents using weighting schemes are popular techniques employed in text mining and machine learning for different analysis and retrieval tasks. The weights are usually computed in the data preprocessing step, as they are costly to update and keep track of all the modifications performed on the dataset. Furthermore, computation errors are introduced when analyzing only subsets of the dataset. Therefore, in a Big Data context, it is crucial to lower the runtime of
arXiv:2108.05689v1
fatcat:ozt5dleqnnattgwp3ic5w4uxga