Research on Similarity Detection of Massive Text Based on Semantic Fingerprint
2018
Proceedings of Information Science and Cloud Computing — PoS(ISCC 2017)
unpublished
To find the required information quickly and efficiently in massive text collections, this paper proposes a method that combines semantic fingerprints with cosine distance. After the Chinese texts are preprocessed, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is used to extract the feature words of each text. The texts are then screened initially with the Simhash algorithm, and finally the resulting candidate texts are compared using the cosine distance in a second similarity pass to extract the most similar
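The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`tfidf`, `simhash`, `hamming`, `cosine`, `most_similar`), the smoothed IDF formula, the 64-bit MD5-based term hash, and the `max_hamming=3` threshold are all assumptions chosen for illustration. Word segmentation of the Chinese texts is assumed to have happened upstream, so each document arrives as a list of tokens.

```python
import hashlib
import math
from collections import Counter

BITS = 64  # fingerprint width (assumption; the abstract does not state one)

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        tf = Counter(doc)
        # Smoothed IDF (assumption) so that no term gets a zero weight.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def simhash(weights):
    """Weighted Simhash: each term votes +w or -w on every fingerprint bit."""
    acc = [0.0] * BITS
    for term, w in weights.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(BITS):
            acc[i] += w if (h >> i) & 1 else -w
    fp = 0
    for i, v in enumerate(acc):
        if v > 0:
            fp |= 1 << i
    return fp

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query, docs, max_hamming=3):
    """Stage 1: keep documents whose fingerprint is close to the query's.
    Stage 2: rank those candidates by cosine similarity, best first."""
    vectors = tfidf(docs)
    fps = [simhash(v) for v in vectors]
    candidates = [i for i in range(len(docs))
                  if i != query and hamming(fps[i], fps[query]) <= max_hamming]
    scored = [(i, cosine(vectors[query], vectors[i])) for i in candidates]
    return sorted(scored, key=lambda p: p[1], reverse=True)
```

Calling `most_similar(0, docs)` returns `(index, cosine similarity)` pairs for the documents that survive the fingerprint screen. In this sketch the cheap Hamming comparison prunes most of the collection, so the more expensive cosine computation runs only on the small candidate set.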
doi:10.22323/1.300.0009
fatcat:4gws6g23sbawbpnwg2xroz2o64