The Google Similarity Distance [article]

Rudi Cilibrasi, Paul M. B. Vitanyi (CWI, University of Amsterdam)
2007 arXiv   pre-print
This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts.  ...  We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity.  ...  As far as the authors know there do not exist other experiments that create this type of semantic distance automatically from the web using Google or similar search engines.  ... 
arXiv:cs/0412098v3 fatcat:zxqwrugd6relzkapruras6wveq

OLAP textual aggregation approach using the Google similarity distance

Mustapha Bouakkaz, Sabile Loudcher, Youcef Ouinten
2016 International Journal of Business Intelligence and Data Mining  
The distance used in K-means is replaced by the Google similarity distance, which takes into account the semantic similarity of keywords for their aggregation.  ...  In this paper, we propose a new aggregation function for textual data in an OLAP context based on the K-means method.  ...  The Google similarity distance has been proposed by Google and has been tested on more than eight billion web pages (4).  ... 
doi:10.1504/ijbidm.2016.076425 fatcat:wnclc762zvdefchacghmhj2gyq

Universal Similarity [article]

Paul Vitanyi (CWI, University of Amsterdam, National ICT Australia)
2005 arXiv   pre-print
In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms.  ...  For both families we give universal similarity distance measures, incorporating all particular distance measures in the family.  ...  Non-Feature Similarities: Our aim is to capture, in a single similarity metric, every effective distance: effective versions of Hamming distance, Euclidean distance, edit distances, alignment distance,  ... 
arXiv:cs/0504089v2 fatcat:wondmr4gfbhiboppn3zba7gku4

Universal similarity

P. Vitanyi
2005 IEEE Information Theory Workshop, 2005.  
In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms.  ...  For both families we give universal similarity distance measures, incorporating all particular distance measures in the family.  ...  Non-Feature Similarities: Our aim is to capture, in a single similarity metric, every effective distance: effective versions of Hamming distance, Euclidean distance, edit distances, alignment distance,  ... 
doi:10.1109/itw.2005.1531896 dblp:conf/itw/Vitanyi05 fatcat:wb6xj4eky5dullfqydvzdt3n5u

Automatic Extraction of Meaning from the Web

Rudi Cilibrasi, Paul Vitanyi
2006 2006 IEEE International Symposium on Information Theory  
In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms.  ...  For both families we give universal similarity distance measures, incorporating all particular distance measures in the family.  ...  Non-Feature Similarities: Our aim is to capture, in a single similarity metric, every effective distance: effective versions of Hamming distance, Euclidean distance, edit distances, alignment distance,  ... 
doi:10.1109/isit.2006.261979 dblp:conf/isit/CilibrasiV06 fatcat:kz67ir5ihbeevfvneggokymcze

Learning Query-Specific Distance Functions for Large-Scale Web Image Search

Yushi Jing, Michele Covell, David Tsai, James M. Rehg
2013 IEEE transactions on multimedia  
We evaluate the feasibility and efficacy of our proposed system through comprehensive human evaluation, and compare the results with the state-of-the-art image distance function used by Google image search  ...  We conjecture that given such hybrid image search engines, learning per-query distance functions over image features can improve the estimation of image similarity.  ...  Google-L2 represents the highly optimized distance function used by Google Similar Images.  ... 
doi:10.1109/tmm.2013.2279663 fatcat:yngk6votyjdpjm342of35wqypu

Normalized Google Distance of Multisets with Applications [article]

Andrew R. Cohen, P.M.B. Vitanyi
2013 arXiv   pre-print
Normalized Google distance (NGD) is a relative semantic distance based on the World Wide Web (or any other large electronic database, for instance Wikipedia) and a search engine that returns aggregate  ...  We give applications and compare the results with those obtained using the pairwise NGD. The derivation of NGD method is based on Kolmogorov complexity.  ...  In the name case we define a similarity distance based on the background information provided by Google or any search engine that produces aggregate page counts.  ... 
arXiv:1308.3177v1 fatcat:7drxkby2z5hcreqqbetbewwcqq

Similarity of Objects and the Meaning of Words [chapter]

Rudi Cilibrasi, Paul Vitanyi
2006 Lecture Notes in Computer Science  
In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms.  ...  For both families we give universal similarity distance measures, incorporating all particular distance measures in the family.  ...  " distance, because it gives a relative similarity according to the distance (with distance 0 when objects are maximally similar and distance 1 when they are maximally dissimilar) and, conversely, for  ... 
doi:10.1007/11750321_2 fatcat:75cwjcu2brbbvminyxffcaekiq

Similarity of Objects and the Meaning of Words [article]

Rudi Cilibrasi and Paul Vitanyi (CWI and University of Amsterdam)
2006 arXiv   pre-print
In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms.  ...  For both families we give universal similarity distance measures, incorporating all particular distance measures in the family.  ...  " distance, because it gives a relative similarity according to the distance (with distance 0 when objects are maximally similar and distance 1 when they are maximally dissimilar) and, conversely, for  ... 
arXiv:cs/0602065v1 fatcat:2zzebz4vfrgwni6qrkyht7ncjq

Search Engine Similarity Analysis: A Combined Content and Rankings Approach [article]

Konstantina Dritsa, Thodoris Sotiropoulos, Haris Skarpetis, Panos Louridas
2020 arXiv   pre-print
The search engine wars are a favorite topic of on-line analysts, as two of the biggest companies in the world, Google and Microsoft, battle for prevalence of the web search space.  ...  (2) the evolution of their affinity over time, (3) what aspects of the results influence similarity, and (4) how the metric differs over different kinds of search services.  ...  Acknowledgments This work was supported by the European Union's Horizon 2020 research and innovation program "FASTEN" under grant agreement No 825328.  ... 
arXiv:2011.00650v1 fatcat:dp42y5zlizgobjxcl7eqek2cmy

Google distance between words [article]

Bjørn Kjos-Hanssen, Alberto J. Evangelista
2015 arXiv   pre-print
We present a specific counterexample to the triangle inequality for this similarity distance function.  ...  Furthermore, they have developed a similarity distance function that gauges how closely related a pair of words is.  ...  Acknowledgments: We thank the Office of Undergraduate Research, University of Connecticut, for selecting our work for presentation [5]. The second author thanks the organizers M. Hutter, W.  ... 
arXiv:0901.4180v2 fatcat:jfeffqflrnhkzncg5tp4rdld3q

TrackMeNot-so-good-after-all [article]

Rami Al-Rfou', William Jannen, Nikhil Patwardhan
2012 arXiv   pre-print
Google Normalized Distance: Google Normalized Distance (GND) is a measure of semantic similarity based on statistics. GND utilizes an extremely large vocabulary set.  ...  We also performed cluster analysis of the data using Google Normalized Distance as our measure of semantic similarity. The rami.m analysis is shown in Figure 5.  ... 
arXiv:1211.0320v1 fatcat:tfcl7kiizfgx3b3h6tsgcint3m

A new framework for Arabic recitation using speech recognition and the Jaro Winkler algorithm

Souad Larabi-Marie-Sainte, Betool S. Alnamlah, Norah F. Alkassim, Sara Y. Alshathry (Computer Science Department, College of Computer and Information Sciences, Prince Sultan University, Saudi Arabia)
2021 Maǧallaẗ Al-Kuwayt li-l-ʿulūm  
The Samee'a system is based on the Google Cloud Speech Recognition API to convert Arabic speech to text and the Jaro Winkler Distance algorithm to determine the similarity between the original and converted texts  ...  To validate the obtained results, two comparison studies were performed. The Jaro Winkler distance was successfully compared to the cosine and the Euclidean distance.  ...  ACKNOWLEDGEMENTS The authors would like to acknowledge the support of Prince Sultan University.  ... 
doi:10.48129/kjs.v49i1.11231 fatcat:uo7ddxfpmngwvdn4ckraff4yh4

A Four-Pronged Low Cost and Optimized Traffic Routing Solution

Muhammad Talha Qureshi, Athaul Rai, Noman Islam, Ghazala Shafi Sheikh
2020 International Journal of Interactive Mobile Technologies  
To avoid this latency, route estimations (distance and time) are calculated using a four-pronged approach based on Google map API, open street map (OSM), routing cache and logical grid of locations.  ...  The objective is to create a generalized routing system that tries to use Google services in optimized fashion.  ...  Also, thanks to the organization Cubix, Iqra University and Sindh Madrassatul Islam University, that gave us an opportunity to work on most pragmatic-level industry-grade problems in the field of our passion  ... 
doi:10.3991/ijim.v14i10.15057 fatcat:vlb3ecdhtzfhbpl3qve77rzm24

Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines

Jennifer A. Byrne, Cyril Labbé
2016 Scientometrics  
The incorrect use of a particular TPD52L2 shRNA sequence as a negative or non-targeting control was identified in 30/48 (63%) of these publications, using a combination of Google Scholar searches and visual  ...  Comparing 5 publications from China that described knockdowns of the human TPD52L2 gene in human cancer cell lines identified unexpected similarities between these publications, flaws in experimental design  ...  JAB thanks journal editors and peer reviewers for their assistance, and members of the Children's Cancer Research Unit for discussions.  ... 
doi:10.1007/s11192-016-2209-6 fatcat:5tbkd4p42jadjds7v6jqtmvl54