150,063 Hits in 3.5 sec

Normalized Web Distance and Word Similarity [article]

Rudi L. Cilibrasi, Paul M.B. Vitanyi
2009 arXiv   pre-print
In the paper introducing the NWD it was called 'normalized Google distance (NGD),' but since Google doesn't allow computer searches anymore, we opt for the more neutral and descriptive NWD. web distance  ...  (NWD) method to determine similarity between words and phrases.  ...  Word Similarity: Normalized Web Distance Can we find an equivalent of the normalized information distance for names and abstract concepts?  ... 
arXiv:0905.4039v1 fatcat:foe6ogydlfg73j3325vv3vaivi

A Review on Text Similarity Technique used in IR and its Application

Nitesh Pradhan, Manasi Gyanchandani, Rajesh Wadhvani
2015 International Journal of Computer Applications  
The normalized Google distance between two search keywords x and y is defined as:  ...  where N is the total number of web pages searched by the Google search engine.  ...  The NID (Normalized Information Distance) takes values between 0 and 1 and expresses similarity on this scale, with 0 meaning identical and 1 meaning completely different.  ...  He has more than 12 years of teaching experience and has guided more than 12 M.Tech scholars. His research areas include information retrieval, data mining, and digital image processing.  ... 
doi:10.5120/21257-4109 fatcat:pebaadbtrrbgljlbz262zyqmzm
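
The definition quoted in the entry above arrives without its formula. As a minimal sketch, assuming the standard form given by Cilibrasi and Vitányi, NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), the computation from page counts might look like the following. The function name `ngd` and its argument names are illustrative and not taken from any of the listed papers; returning infinity for zero counts is a convention chosen for this sketch.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance from page counts (illustrative sketch).

    fx, fy -- number of pages containing x and y respectively
    fxy    -- number of pages containing both x and y
    n      -- total number of indexed pages (assumed larger than fx and fy)
    """
    # Assumption of this sketch: terms that never occur, or never
    # co-occur, are treated as infinitely far apart.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, not real search-engine data.
    print(ngd(1_000, 1_000, 10, 1_000_000))
```

With these hypothetical counts (each term on 1,000 of 1,000,000 pages, co-occurring on 10), the distance is about 0.667; terms that always co-occur give 0.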

Measurement of Semantic Similarity Between Words: A Survey

Ankush Maind
2012 International Journal of Computer Science Engineering and Information Technology  
Semantic similarity measures between words play an important role in community mining, document clustering, information retrieval and automatic metadata extraction.  ...  So it always made an attempt to represent the semantics words as syntactic words. Today, there are various methods proposed for finding the semantic similarity between words.  ...  This proposed metric is named Normalized Google Distance (NGD) and is given by (2). Here, P and Q are the two words between which the distance NGD(P, Q) is to be computed, H(P) denotes the page count  ... 
doi:10.5121/ijcseit.2012.2605 fatcat:uunpmraf5rb3fkhyojzki7hr7e

Multi-Layer Web Services Discovery Using Word Embedding and Clustering Techniques

Waeal J. Obidallah, Bijan Raahemi, Waleed Rashideh
2022 Data  
In layer four, WordNet and Normalized Google Distance are employed to represent and find the similarity between web services documents.  ...  In the third layer, four distance measures, namely, Cosine, Euclidean, Minkowski, and Word Mover, are considered to find the similarities between Web services documents.  ...  for Clustering Web services (WUP (Wu-Palmer Semantic Similarity), WMD (Word Mover's Distance), and NGD (Normalized Google Distance)).  ... 
doi:10.3390/data7050057 fatcat:7noaphxm6fe25lmm4ja2iopf7y

The Normalized Freebase Distance [chapter]

Fréderic Godin, Tom De Nies, Christian Beecks, Laurens De Vocht, Wesley De Neve, Erik Mannens, Thomas Seidl, Rik Van de Walle
2014 Lecture Notes in Computer Science  
In this paper, we propose the Normalized Freebase Distance (NFD), a new measure for determining semantic concept relatedness that is based on principles similar to those of the Normalized Web Distance (NWD).  ...  federal and state governments.  ...  The research activities in this paper were funded by Ghent University, iMinds (by the Flemish Government), the IWT Flanders, the FWO-Flanders, the European Union, and the Excellence Initiative of the German  ... 
doi:10.1007/978-3-319-11955-7_22 fatcat:nwveagk2qfau5fufya63jwiwoe

Word Semantic Similarity Based on Document's Title

Mohamed Said Hamani, Ramdane Maamri
2013 2013 24th International Workshop on Database and Expert Systems Applications  
In order to measure semantic similarity between two given words, this paper proposes a transformation function for web measures along with a new approach that exploits the document's title attribute and  ...  Measuring similarity between words using a search engine based on page counts alone is a challenging task.  ...  [11] proposed a distance metric between words using only page counts retrieved from a web search engine named Normalized Google Distance (NGD).  ... 
doi:10.1109/dexa.2013.12 dblp:conf/dexaw/HamaniM13 fatcat:inkvuqi745b5xlg56jykmqen4e

The Google Similarity Distance [article]

Rudi Cilibrasi, Paul M. B. Vitanyi (CWI, University of Amsterdam)
2007 arXiv   pre-print
This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts.  ...  We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity.  ...  Universality of Normalized Google Distance: Every individual web author produces both an individual Google distribution g_i, and an individual prefix code-word length G_i associated with g_i (see [12  ... 
arXiv:cs/0412098v3 fatcat:zxqwrugd6relzkapruras6wveq

Web Similarity in Sets of Search Terms Using Database Queries

Andrew R. Cohen, Paul M. B. Vitányi
2020 SN Computer Science  
Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or another large electronic database, for instance Wikipedia, and a search engine that returns  ...  A restriction of the NWD to a set of two yields the earlier normalized Google distance (NGD), but no combination of the NGDs of pairs in a set can extract the information the NWD extracts from the set  ...  The term "name" is used here synonymously with "word," "search term," or "query." The normalized distance above is called the normalized web distance (NWD).  ... 
doi:10.1007/s42979-020-00148-5 fatcat:cijvycebufdwrnxohn2cophdwq

Compression-based Similarity [article]

Paul M.B. Vitanyi
2011 arXiv   pre-print
The distances are based on compression of the objects concerned, normalized, and can be viewed as similarity distances.  ...  We can extract a code length from the numbers returned, use the same formula as before, and derive a similarity or relative semantics between names for objects.  ...  Normalized Web Distance The web code length G is defined by G(x) = log 1/g(x) (3) G(x, y) = log 1/g(x, y).  ... 
arXiv:1110.4544v1 fatcat:tpxjgc764rg27cnds5i5dwhdt4

Compression-Based Similarity

Paul M.B. Vitányi
2011 2011 First International Conference on Data Compression, Communications and Processing  
The distances are based on compression of the objects concerned, normalized, and can be viewed as similarity distances.  ...  We can extract a code length from the numbers returned, use the same formula as before, and derive a similarity or relative semantics between names for objects.  ...  Normalized Web Distance The web code length G is defined by G(x) = log 1/g(x) (3) G(x, y) = log 1/g(x, y).  ... 
doi:10.1109/ccp.2011.50 dblp:conf/ccp/Vitanyi11 fatcat:gjocgqpo4zd6tgdke5ydm3empu

Semantic Relation between Words with the Web as Information Source [chapter]

Tanmay Basu, C. A. Murthy
2009 Lecture Notes in Computer Science  
Nowadays it is widely used in the semantic web. This paper aims to present a measure to automatically determine the semantic relation between words using the web as a knowledge source.  ...  This relationship measure will be useful to extract semantic information from the web.  ...  NGD normally takes values between 0 and 1, though the value can lie anywhere between 0 and ∞ [3]. For NGD two words are similar if the value is 0 and the relationship decreases as it grows to 1.  ... 
doi:10.1007/978-3-642-11164-8_43 fatcat:vh7hbruybvgrjnxpuppb27z2bm

Web Similarity [article]

Andrew R. Cohen, Paul M.B. Vitanyi
2015 arXiv   pre-print
Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or any other large electronic database, for instance Wikipedia, and a search engine that returns  ...  We develop the theory and give applications. The derivation of the NWD method is based on Kolmogorov complexity.  ...  The normalized web distance (NWD) of X ∈ X with G(X) < ∞ (equivalently Remark II. 7 .  ... 
arXiv:1502.05957v1 fatcat:kz5rizo2hjbdpfawskesghdjiy

Semantic Similarity Measurement between Words using Lexical Patterns

D. Hema Latha, Dept of Computer Science, Osmania University College For Women (OUCW), D. Linga Reddy, Dept. of Physics, UCS, Osmania University, Hyderabad, India.
2015 CVR Journal of Science & Technology  
Semantic similarity measurement between words is a tedious task in web mining, information extraction and natural language processing.  ...  In this paper, the authors proposed an automatic approach to evaluate the logical or semantic similarity between words or entities with the help of web search engines.  ...  Cilibrasi and Vitanyi proposed a distance metric between words using only page counts retrieved from a web search engine. II.  ... 
doi:10.32377/cvrjst0814 fatcat:mpj77mat3besvlgiqowucznb5e

A Graph-Based Framework for Web Document Mining [chapter]

Adam Schenker, Horst Bunke, Mark Last, Abraham Kandel
2004 Lecture Notes in Computer Science  
We introduce several different types of web document representations that utilize graphs and compare their performance for clustering and classification.  ...  In this paper we describe methods of performing data mining on web documents, where the web document content is represented by graphs.  ...  These are, from left to right: standard, simple, 5-distance, 5-simple distance, raw frequency, and normalized frequency.  ... 
doi:10.1007/978-3-540-28640-0_38 fatcat:2oubhzxixrgzrncps7twnaxzwu

AN EFFECTIVE FUZZY CLUSTERING ALGORITHM FOR WEB DOCUMENT CLASSIFICATION: A CASE STUDY IN CULTURAL CONTENT MINING

GEORGE E. TSEKOURAS, DAMIANOS GAVALAS
2013 International journal of software engineering and knowledge engineering  
We calculate the similarity ('weighted Hamming distances') between the cultural-related document vectors and for each cultural theme, we use cluster analysis to partition the documents into a number of  ...  This article presents a novel crawling and clustering method for extracting and processing cultural data from the web in a fully automated fashion.  ...  Hamming distance is referred to as an appropriate distance metric for error detection and correction codes and has also been used for defining similarity measures among web pages or web links [11] ,  ... 
doi:10.1142/s021819401350023x fatcat:bxvzywjwrzdblhhcubsn75va6y