Empirical Analysis of Document Similarity Using Statistical Model

Jyoti Phogat, Atul Kumar
2017 International Journal of Engineering Research and Applications  
ABSTRACT Information retrieval is the core technology behind web search services. This paper presents a statistical method for content-based information retrieval. Three main paradigms of models are used in retrieving information: the Boolean, probabilistic, and vector space models. This paper also presents empirical studies of document similarity and discusses issues of information retrieval systems that use a statistical model. The vector space model is the classical and most widely used retrieval model. The ... model. Retrieval is performed by computing the cosine similarity between the query vector and the set of document vectors. Finally, we conclude by relating the results to human scores for various document types, such as sports, politics, and short stories.
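As an illustration of the cosine-similarity retrieval step the abstract describes, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization and raw term counts as vector weights; the abstract does not state which weighting scheme the paper uses, and the names `term_vector`, `cosine_similarity`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Map lowercased whitespace tokens to raw term counts (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document index) pairs, highest similarity first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), i)
              for i, d in enumerate(documents)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank("football match", docs)` on a small list of document strings returns the documents ordered by similarity to the query. A fuller system would typically replace the raw counts with a weighting such as tf-idf, which is an assumption here and not something the abstract specifies.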
doi:10.9790/9622-0706074650 fatcat:3ei7bz6mm5dibae7d443nc7qxq