A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2015; you can also visit the original URL.
The file type is application/pdf
.
Novel document detection for massive data streams using distributed dictionary learning
2013
IBM Journal of Research and Development
Given the high volume of content being generated online, it becomes necessary to employ automated techniques to separate out the documents belonging to novel topics from the background discussion, in a robust and scalable manner (with respect to the size of the document set). We present a solution to this challenge based on sparse coding, in which a stream of documents (where each document is modeled as an m-dimensional vector y) can be used to learn a dictionary matrix A of dimension m k, such
doi:10.1147/jrd.2013.2247232
fatcat:gb533tvb6nfnxab4v3vr65frry