An unsupervised and customizable misspelling generator for mining noisy health-related text sources

Abeed Sarker, Graciela Gonzalez-Hernandez
2018 Journal of Biomedical Informatics  
Data collection and extraction from noisy text sources such as social media typically rely on keyword-based searching/listening.  ...  In this paper, we present a customizable data-centric system that automatically generates common misspellings for complex health-related terms, which can improve the data collection process from noisy  ...  Acknowledgments Research performed for this publication by Abeed Sarker was supported by the National Institute on Drug Abuse (NIDA) of the National Institutes of Health under Award Number R01DA046619.  ... 
doi:10.1016/j.jbi.2018.11.007 pmid:30445220 pmcid:PMC6322919 fatcat:jxoqqskvdvavth2c3a6mdtvdiu

Evaluating a Spelling Support in a Search Engine [chapter]

Hercules Dalianis
2002 Lecture Notes in Computer Science  
Of these queries 10 percent were "misspelled" or erroneous and our spell checker corrected around 90 percent of these.  ...  The domain used was the web site of the Swedish National Tax Board ( Riksskatteverket, RSV), where the search engine was used between April and Sept 2001. One million queries were made by the public.  ...  I would also like to thank Ola Knutsson, Richard Domeij and Martin Hassel at NADA-KTH for valuable comments on the paper and pointers to the literature.  ... 
doi:10.1007/3-540-36271-1_16 fatcat:i7vin2hqkzfuzfe5baopim6xna

An unsupervised and customizable misspelling generator for mining noisy health-related text sources [article]

Abeed Sarker, Graciela Gonzalez-Hernandez
2018 arXiv   pre-print
In this paper, we present a customizable data-centric system that automatically generates common misspellings for complex health-related terms.  ...  Extrinsic evaluation of the system on a set of cancer-related terms showed an increase of over 67% in retrieval rate from Twitter posts when the generated variants are included.  ...  The first step in incorporating social media data for operational and research tasks involves designing data extraction/collection strategies, which are typically reliant on keyword-based searches (e.g  ... 
arXiv:1806.00910v1 fatcat:adxxsh6usrealarla3rwcrqwa4

On Correcting Misspelled Queries in Email Search

Abhijit Bhole, Raghavendra Udupa
We consider the problem of providing spelling corrections for misspelled queries in Email Search using user's own mail data.  ...  We propose SpEQ, a Machine Learning based approach that generates corrections for misspelled queries directly from the user's own mail data.  ...  Secondly, search intent as well as the document collection that needs to be searched are highly specific to the user in Email Search.  ... 
doi:10.1609/aaai.v29i1.9282 fatcat:da6iqbmtqrb7te5lpskqvpc66y

Spelling Errors in the Database: Shadow or Substance?

Barbara Nichols Randall
1999 Library resources & technical services  
Any Mehyl search that retrieves more than 10,000 hits is stopped. We found at least one instance of each misspelling.  ...  Anderson (1995) reiterates that users rely on keyword and subject searching to find information.  ... 
doi:10.5860/lrts.43n3.161 fatcat:kdc3hm4by5bvjgs4x7dgpljrcq

Handling Spelling Errors in Online Catalog Searches

Karen M. Drabenstott, Marjorie S. Weller
1996 Library resources & technical services  
The purpose of this paper is to add to our understanding and knowledge of spelling errors in online catalog searches based on empirical studies of spelling errors in online catalog searches and suggest  ...  One study due to collection failure.  ...  Since the introduction of online catalogs in the early 1980s, librarians, system designers, and researchers have had a very accurate record of users' subject and  ...  of misspellings or collection failure.  ... 
doi:10.5860/lrts.40n2.113 fatcat:b3h2laq3onathb56b5nznrf32e

An information retrieval approach to spelling suggestion

Sai Krishna, Prasad Pingali, Vasudeva Varma
2010 Proceedings of the 19th international conference on World wide web - WWW '10  
In addition, while searching, a misspelled word is decomposed into distinct ngrams of length varying from 2 to 5, that are used to search the index.  ...  INTRODUCTION Brill [1] reports roughly 10-15% of search engine queries contain errors. Popular search engines like Google, Yahoo provide suggestions for misspelled queries.  ... 
doi:10.1145/1772690.1772841 dblp:conf/www/KrishnaPV10 fatcat:akvgwfsz2ng5vkburybwwx2biy
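
The n-gram decomposition described in the snippet above can be sketched as follows. This is only an illustration, not the authors' implementation: the function names `char_ngrams` and `rank_candidates` are hypothetical, and ranking by Jaccard overlap over a small in-memory word list is an assumption standing in for the paper's index search.

```python
def char_ngrams(word, min_n=2, max_n=5):
    """Return the distinct character n-grams of `word` for lengths min_n..max_n."""
    word = word.lower()
    grams = set()
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams


def rank_candidates(misspelled, vocabulary):
    """Rank vocabulary words by n-gram overlap with the misspelled word.

    Assumption: Jaccard similarity is used as the score; the snippet does
    not say how matches against the index are scored.
    """
    query = char_ngrams(misspelled)
    scored = []
    for candidate in vocabulary:
        grams = char_ngrams(candidate)
        union = query | grams
        score = len(query & grams) / len(union) if union else 0.0
        scored.append((score, candidate))
    # Highest score first; ties broken alphabetically for determinism.
    return [c for _, c in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    print(rank_candidates("retreival", ["arrival", "retrieval", "revival"]))
```

Because a transposition only disturbs the n-grams that span the swapped letters, the intended word still shares most of its short n-grams with the misspelling.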

Page 8 of Library Resources & Technical Services Vol. 40, Issue 2 [page]

1996 Library Resources & Technical Services  
First, we recommended that online catalogs be equipped with search trees to place the burden of selecting a subject-searching approach in response to user queries on the system instead of on users and  ...  Third, we recommended that online catalogs be enhanced with tools and techniques to distinguish between queries that fail due to misspellings and collection failure.  ... 

A Data-Driven Method of Discovering Misspellings of Medication Names on Twitter

Keyuan Jiang, Tingyu Chen, Liyuan Huang, Ricardo A Calix, Gordon R Bernard
2018 Studies in Health Technology and Informatics  
Compared with the phonetics-based approach, our method discovered more actual misspellings used on Twitter.  ...  It is known that medication names are misspelled on social media, and finding the misspellings is challenging because there exists no a priori knowledge as to how people would misspell a medication name  ...  This indicates the importance of considering their misspellings when collecting Twitter data.  ... 
pmid:29677938 pmcid:PMC6009827 fatcat:cmaxkf73mnchxjadglwkvyrnmy

Page 168 of Library Resources & Technical Services Vol. 43, Issue 3 [page]

1999 Library Resources & Technical Services  
We are correcting the most frequently misspelled words first. Concern about the impact of misspellings on the catalog should be minor.  ...  The volunteer noted only the number of occurrences of the misspelled terms and found 697 potential misspellings of 106 words on the Ballard list.  ... 

Yizkor books

Jason J. Soo, Rebecca J. Cathey, Ophir Frieder, Michlean J. Amir, Gideon Frieder
2008 Proceeding of the 17th ACM conference on Information and knowledge management - CIKM '08  
We established a centralized index and metadata repository for the Yizkor Book collection and developed a detailed search interface accessible worldwide.  ...  Yizkor Book collections contain firsthand commemorative accounts of events from the era surrounding the rise and fall of Nazi Germany, including documents from before, during, and after the Holocaust.  ...  Thus, we crafted efficient simplistic rules to correct misspellings: First %aro% %r% -Second Ba%on Ba%n B%n Third %on --Fourth Bar% --Fifth B%n --Sixth Ba%on -- The query string is compared  ... 
doi:10.1145/1458082.1458266 dblp:conf/cikm/SooCFAF08 fatcat:yaufuib6svatdjgr5dytv42zai

Correct your text with Google

Stephanie Jacquemont, Francois Jacquenet, Marc Sebban
2007 IEEE/WIC/ACM International Conference on Web Intelligence (WI'07)  
We think that basic spell checking may be improved (a step towards) using the Web as a corpus and taking into account the context of words that are identified as potential misspellings.  ...  We propose to use the Google search engine and some machine learning techniques, in order to design a flexible and dynamic spell checker that may evolve among the time with new linguistic features.  ...  To find them, we search for a misspelling with Google.  ... 
doi:10.1109/wi.2007.38 dblp:conf/webi/JacquemontJS07 fatcat:7e7parwgg5h4nett4ub5h2l3oi

Web Text Corpus for Natural Language Processing

Vinci Liu, James R. Curran
2006 Conference of the European Chapter of the Association for Computational Linguistics  
With many more words available on the web, better results can be obtained by collecting much larger web corpora.  ...  While most previous work accesses web text through search engine hit counts, we created a Web Corpus by downloading web pages to create a topic-diverse collection of 10 billion words of English.  ...  Gigaword Web Misspelling One factor contributing to the larger number of token types in the Web Corpus, as compared with the Gigaword, is the misspelling of words.  ... 
dblp:conf/eacl/LiuC06 fatcat:opmtp4ifobav5g52a7mhacfbhq

Conflation Methods and Spelling Mistakes - A Sensitivity Analysis in Information Retrieval

Philipp Dopichaj, Theo Härder
2004 Workshop Grundlagen von Datenbanken  
We focus on addressing this problem at the conflation stage of the retrieval process and evaluate whether conflation based on n-grams, which is said to be insensitive to misspellings, leads to better retrieval  ...  We do this by performing tests on artificially corrupted test collections and examine which characteristics of the queries and the relevant documents influence the relative retrieval quality achieved using  ...  Introduction The goal of textual information retrieval (IR) is to find from a set of texts information addressing a user's need. 1 Given a user query, a set of documents, the document collection, is searched  ... 
dblp:conf/gvd/DopichajH04 fatcat:yi5zsnitrnh5tmmg3vmd4kuora

Page 272 of Library Resources & Technical Services Vol. 43, Issue 4 [page]

1999 Library Resources & Technical Services  
failed to acknowledge that if misspellings can so readily enter databases and somewhat impede searching, misspellings by searchers are an even greater problem For subject but one that can be  ...  is nothing in the collection.  ... 