26,780 Hits in 5.4 sec

Dynamic User-Defined Similarity Searching in Semi-Structured Text Retrieval [article]

Filippo Geraci, Marco Pellegrini
2007 arXiv   pre-print
Modern text retrieval systems often provide a similarity search utility that allows the user to efficiently find a fixed number k of documents in the data set that are most similar to a given query (here  ...  The problem is more complex when we also allow each such vector space to have an associated user-defined dynamic weight that influences its contribution to the overall dynamic aggregated and weighted similarity  ...  score σ_i(e_j); moreover, for each source s_i we have a positive scalar weight w_i that is user-defined and changes dynamically for each query.  ... 
arXiv:0705.4606v1 fatcat:lidka3yvg5c23cpctih5piuf2u
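The aggregated score in this snippet (a per-source score σ_i(e_j) combined with a positive, query-time weight w_i) can be illustrated with a brute-force baseline. This minimal Python sketch assumes the aggregation is the weighted sum Σ_i w_i·σ_i(e_j) and that each source's scores arrive as a dictionary; the function name and data layout are hypothetical, and the exhaustive scan is not the efficient method the paper proposes.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def weighted_top_k(
    scores_per_source: Sequence[Dict[str, float]],
    weights: Sequence[float],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k documents with the largest sum_i w_i * sigma_i(e_j).

    Assumption: aggregation is a linear weighted sum over sources.
    """
    if len(scores_per_source) != len(weights):
        raise ValueError("need exactly one weight per source")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    totals: Dict[str, float] = {}
    for scores, w in zip(scores_per_source, weights):
        for doc, s in scores.items():
            # a document absent from a source contributes 0 for that source
            totals[doc] = totals.get(doc, 0.0) + w * s
    # full scan of all aggregated scores; an index would aim to avoid this
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])
```

Because the weights are passed on every call, changing them for a new query needs no re-indexing; the price is touching every stored score.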

Dynamic User-Defined Similarity Searching in Semi-Structured Text Retrieval

Filippo Geraci, Marco Pellegrini
2008 Proceedings of the Third International ICST Conference on Scalable Information Systems  
Modern text retrieval systems often provide a similarity search utility that allows the user to efficiently find a fixed number k of documents in the data set that are most similar to a given query (here  ...  The problem is more complex when we also allow each such vector space to have an associated user-defined dynamic weight that influences its contribution to the overall dynamic aggregated and weighted similarity  ...  Dynamic User-Defined Similarity Searching in Semi-Structured Text Retrieval. Filippo Geraci and Marco Pellegrini, Istituto di Informatica e Telematica del CNR, Via G.  ... 
doi:10.4108/icst.infoscale2008.3488 dblp:conf/infoscale/GeraciP08 fatcat:pbt7in5m5fcsdkijofmpvs5avi

A Survey: Techniques of an Efficient Search Annotation based on Web Content Mining

Sobana E., Muthusankar D.
2014 International Journal of Computer Applications  
Due to information overload on the web, information extraction is not effectively based on user needs.  ...  On the World Wide Web, or simply the web, the content changes every day, which is why it is known as a dynamic environment.  ...  Two main approaches are used in Web Content Mining, namely the unstructured text mining approach and the semi-structured and structured mining approach.  ... 
doi:10.5120/18181-9072 fatcat:6vppuxoy3zb35do7mawyeabvni

A NOVEL APPROACH FOR CONVERSION OF SEMISTRUCTURED TO STRUCTURED DATA

Suchitra B., Dr. S. Duraisamy
2018 International Journal on Computer Science and Engineering  
In this paper, the motive of the research work is to learn about web content mining tools and techniques, and the examination is concentrated on semi-structured data.  ...  Web content mining is related to, but differs from, data mining and text mining.  ...  Internal formation of Data View. Information Retrieval: information retrieval can be central to semi-structured documents.  ... 
doi:10.21817/ijcse/2018/v10i2/181002101 fatcat:awqli22gjjdv5bfsku4sgdklkq

Information Retrieval Technique for Web Using NLP

Rini John, Sharvari Govilkar
2017 International Journal on Natural Language Computing  
It further improves the decision making between HCRF and Semi-CRF by using a bidirectional approach rather than a top-down approach. It enables better understanding of the content and page structure.  ...  HCRF) and extended Semi-Markov Conditional Random Fields (i.e. Semi-CRF) along with Visual Page Segmentation are used to get accurate results.  ...  We then use Semi-CRF to segment the text content, obtaining finer and more accurate results that help make the search results for the user query more precise.  ... 
doi:10.5121/ijnlc.2017.6501 fatcat:itsxuqwa7rarvb66x77tc3arou

Web Content Mining Techniques: A Survey

Faustina Johnson, Santosh Kumar Gupta
2012 International Journal of Computer Applications  
The web contains structured, unstructured, semi-structured and multimedia data. This survey focuses on how to apply content mining to these kinds of data.  ...  It also points out how web content mining can be utilized in web usage mining.  ...  The methods used for semi-structured documents are hypertext classification and clustering, learning relations between web documents, learning extraction patterns or rules, and finding patterns in semi-structured  ... 
doi:10.5120/7236-0266 fatcat:zsc5tzbqgvg3vn7l4al4h2e75i

Survey on Query Facets Mining Approaches

2017 International Journal of Science and Research (IJSR)  
Previously, a lot of work has been done on retrieving more relevant data for users in order to meet their information needs, thus improving the performance of search engines.  ...  Different approaches for extracting query facets from web search results to assist information finding are discussed, along with similar techniques used earlier for information retrieval  ...  , the data is now semi-structured.  ... 
doi:10.21275/art20164225 fatcat:zrfcrqzlcja4hphg7ejj3gehpy

Addressing the requirements of a dynamic corporate textual information base

Peter G. Anick, Rex A. Flynn, David R. Hanssen
1991 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '91  
ACKNOWLEDGEMENTS The evolution of the ideas embodied in AI-STARS and its im-  ...  Next we consider querying on semi-structured objects and show how the notion of storing que-  ...  2 FULL-TEXT INDEXING AND SEARCHING IN AI-STARS: We refer to AI-STARS as a "lexicon-assisted" retrieval  ...  In this way, we allow querying on textual information without requiring the user to know all the different ways people have chosen to subdivide their texts (similar to the approach in [McALPINE89  ... 
doi:10.1145/122860.122876 dblp:conf/sigir/AnickFH91 fatcat:ezhznzpodbd27hq36sbxh37434

The Continued Saga of DB-IR Integration [chapter]

R. Baeza-Yates, M. Consens
2004 Proceedings 2004 VLDB Conference  
the database • Stored in the file system • Stored on the web (URL) • User-defined datastore • Easy to Deploy • Easy to Maintain Oracle Text API • Three index types -context: classic text searching  ...  in the presence of structured and text data -How to support IR-style querying on DBs • Because now users seem to know IR/keyword style querying more, even though structure is good because it supports  ... 
doi:10.1016/b978-012088469-8/50118-2 fatcat:dktiusnpj5hcfbu2fopto7psqq

The Continued Saga of DB-IR Integration [chapter]

Ricardo Baeza-Yates, Mariano Consens
2004 Proceedings 2004 VLDB Conference  
the database • Stored in the file system • Stored on the web (URL) • User-defined datastore • Easy to Deploy • Easy to Maintain Oracle Text API • Three index types -context: classic text searching  ...  in the presence of structured and text data -How to support IR-style querying on DBs • Because now users seem to know IR/keyword style querying more, even though structure is good because it supports  ... 
doi:10.1016/b978-012088469-8.50118-2 dblp:conf/vldb/Baeza-YatesC04 fatcat:2lzk6qlgurgbdoj6do2qtxy2za

An Audio Retrieval Algorithm Based on Audio Shot and Inverted Index

Xueyuan Zhang, Qianhua He
2013 International Journal of Machine Learning and Computing  
An efficient audio indexing and retrieval algorithm is proposed to locate similar audio segments in the database.  ...  We also borrow the idea of inverted file from text retrieval to locate candidates efficiently. Furthermore, a similarity measure combining content and temporal order matching is proposed.  ...  the search with dynamic skip width.  ... 
doi:10.7763/ijmlc.2013.v3.294 fatcat:5pbvaz2hjjg5lhotjwfkd2qacq
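The inverted-file idea in this snippet can be sketched in Python under strong assumptions: each clip is already quantized into a sequence of discrete audio-shot labels, the index maps each label to the clips containing it, and the combined similarity is a hypothetical mix (weight `alpha`) of label overlap for content and longest-common-subsequence length for temporal order. None of these choices is taken from the paper; all names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def build_inverted_index(clips: Dict[str, Sequence[str]]) -> Dict[str, Set[str]]:
    """Map each audio-shot label to the set of clip ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for clip_id, labels in clips.items():
        for label in labels:
            index[label].add(clip_id)
    return dict(index)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (order-preserving matches)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def search(query: Sequence[str], clips: Dict[str, Sequence[str]],
           index: Dict[str, Set[str]], alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Score only candidate clips that share at least one label with the query."""
    if not query:
        return []
    candidates: Set[str] = set()
    for label in set(query):
        candidates |= index.get(label, set())
    query_labels = set(query)
    ranked = []
    for clip_id in candidates:
        labels = clips[clip_id]
        content = len(query_labels & set(labels)) / len(query_labels)  # hypothetical content term
        order = lcs_length(query, labels) / len(query)                 # hypothetical order term
        ranked.append((clip_id, alpha * content + (1 - alpha) * order))
    # highest score first; ties broken by clip id for deterministic output
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

The index prunes clips with no shared label before the more expensive order comparison runs, which is the role the snippet gives the inverted file.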

A Novel Probabilistic Approach for Efficient Information Retrieval

Sonia Bansal, Reena Garg
2010 International Journal of Computer Applications  
Although many techniques share common characteristics in the information retrieval hierarchy, they all share a core set of similarities that justify their own class, and these algorithms are designed for  ...  In traditional retrieval systems, the query is given to a large corpus to retrieve the relevant documents.  ...  It can be seen that the studies shown above on using HMMs for information extraction mostly focus on structured or semi-structured data, while free-structured text remains largely unexplored  ... 
doi:10.5120/1354-1827 fatcat:v3565ywcp5avraoqa3n35myxa4
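This snippet refers to HMMs for information extraction. As a generic illustration (not the authors' approach), the Viterbi algorithm below labels a token sequence with its most likely hidden states. The state names, the toy probabilities used to exercise it, and the `unk` smoothing value for unseen tokens are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def viterbi(obs: Sequence[str], states: Sequence[str],
            start_p: Dict[str, float], trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]], unk: float = 1e-6) -> List[str]:
    """Most likely hidden-state sequence for obs, computed in log space.

    Start and transition probabilities must be positive; a token missing
    from a state's emission table gets the small probability `unk`.
    """
    if not obs:
        return []

    def emit(state: str, token: str) -> float:
        return math.log(emit_p[state].get(token, unk))

    # score[s]: best log-probability of any path ending in state s so far
    score = {s: math.log(start_p[s]) + emit(s, obs[0]) for s in states}
    backpointers: List[Dict[str, str]] = []
    for token in obs[1:]:
        new_score: Dict[str, float] = {}
        back: Dict[str, str] = {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = score[prev] + math.log(trans_p[prev][s]) + emit(s, token)
            back[s] = prev
        score = new_score
        backpointers.append(back)
    # trace back from the best final state
    state = max(states, key=lambda s: score[s])
    path = [state]
    for back in reversed(backpointers):
        state = back[state]
        path.append(state)
    return path[::-1]
```

Working in log space keeps long documents from underflowing to zero probability, at the cost of requiring strictly positive parameters.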

Web Page Data Collection Based on Multithread

Wen Tao Liu
2013 Applied Mechanics and Materials  
The web data collection is the process of collecting the semi-structured, large-scale and redundant data which include web content, web structure and web usage in the web by the crawler and it is often  ...  used for the information extraction, information retrieval, search engine and web data mining.  ...  The semi-structured data includes html documents, query logs, web search results and so on.  ... 
doi:10.4028/www.scientific.net/amm.347-350.2575 fatcat:oxiqauy3dfhivf5rzum2iztalu
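Multithreaded collection as described in this snippet can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`: each breadth-first round fetches a batch of URLs in parallel, extracts `<a href>` links with `html.parser`, and skips pages already visited. The `crawl` function, its `max_pages` and `workers` parameters, and the pluggable `fetch` callable are hypothetical; a real crawler would also need politeness delays, robots.txt handling, and URL normalization.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Optional, Set
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def default_fetch(url: str) -> str:
    """Download a page over HTTP(S); used only when no fetch function is given."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def _try_fetch(fetch: Callable[[str], str], url: str) -> Optional[str]:
    try:
        return fetch(url)
    except Exception:  # failing pages are simply skipped in this sketch
        return None


def crawl(seeds: Iterable[str], max_pages: int = 50, workers: int = 8,
          fetch: Callable[[str], str] = default_fetch) -> Dict[str, str]:
    """Breadth-first crawl; each round's URLs are fetched by a thread pool."""
    visited: Set[str] = set()
    pages: Dict[str, str] = {}
    frontier: List[str] = list(seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(pages) < max_pages:
            # de-duplicate in order, drop visited URLs, respect the page budget
            batch = [u for u in dict.fromkeys(frontier) if u not in visited]
            batch = batch[: max_pages - len(pages)]
            visited.update(batch)
            next_frontier: List[str] = []
            results = pool.map(lambda u: _try_fetch(fetch, u), batch)
            for url, html in zip(batch, results):
                if html is None:
                    continue
                pages[url] = html
                parser = LinkParser(url)
                parser.feed(html)
                next_frontier.extend(parser.links)
            frontier = next_frontier
    return pages
```

Passing `fetch` as a parameter keeps the threading logic testable offline, for example with a dictionary lookup such as `lambda url: site[url]`.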

A Systematic Review Web Content Mining Tools and its Applications

Manjunath Pujar, Monica R Mundada
2021 International Journal of Advanced Computer Science and Applications  
Web content mining tools were needed to scan text, images and HTML documents and provide results to the search engine.  ...  Keywords: web content mining; web structure mining; web usage mining; data mining; information retrieval; information extraction  ...  Semi-structured data does not have a predefined structure; it is hierarchical in nature. There are various methods to mine semi-structured data, such as NP, ontology, and wrapper generation.  ... 
doi:10.14569/ijacsa.2021.0120886 fatcat:qqoyefg5hjgutiqaasxmbwhoq4

WEB SCALE INFORMATION EXTRACTION USING WRAPPER INDUCTION APPROACH

RINA ZAMBAD, JAYANT GADGE
2014 International Journal of Electronics and Electrical Engineering  
Reference sets are used for mapping the user search query, which improved the scale of search on unstructured and ungrammatical post data. We validate our approach with experimental results.  ...  The proposed architecture extracts unstructured and ungrammatical data using wrapper induction and shows the results in a structured format.  ...  CONCLUSION AND FUTURE WORK Keyword search over semi-structured and structured data offers users great opportunities to explore better-organized data.  ... 
doi:10.47893/ijeee.2014.1121 fatcat:jh5qa2w3offqrcnkonkke7mwcm
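Wrapper induction, as named in this snippet, can be illustrated with a minimal single-attribute LR wrapper in the style of Kushmerick: the left delimiter is the longest common suffix of the text preceding each labeled value, and the right delimiter is the longest common prefix of the text following it. This sketch is not the paper's architecture; the function names and the toy HTML are hypothetical, and it handles only one field per record.

```python
import os
from typing import List, Sequence, Tuple


def _common_suffix(strings: List[str]) -> str:
    # os.path.commonprefix compares character by character, so reversing works
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def learn_lr_wrapper(examples: Sequence[Tuple[str, List[str]]]) -> Tuple[str, str]:
    """Learn (left, right) delimiters from pages whose target values are labeled in order."""
    lefts: List[str] = []
    rights: List[str] = []
    for page, values in examples:
        pos = 0
        for value in values:
            start = page.index(value, pos)  # raises ValueError if a label is absent
            end = start + len(value)
            lefts.append(page[pos:start])   # text since the previous value
            rights.append(page[end:])       # text after this value
            pos = end
    left = _common_suffix(lefts)
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("examples share no common delimiters")
    return left, right


def extract(page: str, left: str, right: str) -> List[str]:
    """Apply the wrapper: each string between an occurrence of left and the next right."""
    values: List[str] = []
    pos = 0
    while True:
        i = page.find(left, pos)
        if i < 0:
            break
        start = i + len(left)
        j = page.find(right, start)
        if j < 0:
            break
        values.append(page[start:j])
        pos = j + len(right)
    return values
```

Trained on `<ul><li><b>Alice</b> (30)</li><li><b>Bob</b> (25)</li></ul>` with labels `Alice` and `Bob`, this learns the delimiters `"><li><b>"` and `"</b> ("`, which then pull `Carol` and `Dan` from a page with the same layout.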
Showing results 1-15 out of 26,780 results