266,708 Hits in 5.6 sec

A Survey Paper on Concept Mining in Text Documents

K. N., S. K., Dixa Saxena
2017 International Journal of Computer Applications  
Concept mining has become an important research area. Concept mining is used to search for or extract the concepts embedded in a text document.  ...  A concept-based approach searches for informative terms based on their meaning rather than on the presence of a keyword in the text.  ...  words appear more frequently in general. value.  ... 
doi:10.5120/ijca2017914143 fatcat:kfkfox4kxnb4tggriurzei22um

Relating developers' concepts and artefact vocabulary in a financial software module

Tezcan Dilshener, Michel Wermelinger
2011 2011 27th IEEE International Conference on Software Maintenance (ICSM)  
We compared the relative importance of the domain concepts, as understood by developers, in the user manual and in the source code.  ...  We varied the searches (using exact and stem matching, discarding stop-words, etc.) and present the precision and recall. We discuss the implication of our results for maintenance.  ...  ACKNOWLEDGMENTS We thank Simon Butler for his assistance in using the JIM tool, and our industrial partner, a global financial IT solutions provider located in southern Germany, for providing the artefacts  ... 
doi:10.1109/icsm.2011.6080808 dblp:conf/icsm/DilshenerW11 fatcat:fetdf5vofraztn6tzzs5bjexga

Mining the Text Documents Using Phrase Based Tokenizer Approach

Dr. Mrs. D. Shanthi
2013 IOSR Journal of Computer Engineering  
Text mining is the discovery of interesting knowledge in text documents. Many data mining techniques have been proposed for mining useful patterns in text documents.  ...  Polysemy means that a word has multiple meanings, and synonymy means that multiple words have the same meaning.  ...  For a document d, tf(c) is the number of occurrences of concept c in d; and ctf(c) is called the conceptual term frequency of concept c in a sentence s, which is the number of occurrences of concept  ... 
doi:10.9790/0661-0860614 fatcat:t4gxinkuuvfnbpov5wgkukceki
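
The two counts in the snippet above can be illustrated with a minimal sketch. It assumes, for illustration only, that a concept is a fixed sequence of tokens and that a document is a list of tokenized sentences; the paper may define concepts differently (for example through semantic role labelling), and the helper `count_concept` is hypothetical.

```python
from typing import List, Sequence


def count_concept(tokens: Sequence[str], concept: Sequence[str]) -> int:
    """Count contiguous, possibly overlapping matches of `concept` in `tokens`."""
    n = len(concept)
    if n == 0:
        return 0
    target = list(concept)
    return sum(
        1 for i in range(len(tokens) - n + 1) if list(tokens[i:i + n]) == target
    )


def tf(document: List[List[str]], concept: Sequence[str]) -> int:
    """tf(c): occurrences of concept c in document d, summed over its sentences."""
    return sum(count_concept(sentence, concept) for sentence in document)


def ctf(sentence: List[str], concept: Sequence[str]) -> int:
    """ctf(c): occurrences of concept c in a single sentence s."""
    return count_concept(sentence, concept)


doc = [
    ["text", "mining", "finds", "patterns"],
    ["text", "mining", "tools", "support", "text", "mining"],
]
print(tf(doc, ["text", "mining"]))      # 3
print(ctf(doc[1], ["text", "mining"]))  # 2
```

The document-level count is simply the sum of the sentence-level counts under this assumption, which is why tf(c) is never smaller than any single ctf(c).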

A cross-lingual framework for monolingual biomedical information retrieval

Dolf Trieschnigg, Djoerd Hiemstra, Franciska de Jong, Wessel Kraaij
2010 Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10  
Frequently, a concept-based representation defined in terms of a domain-specific terminological resource is employed to deal with this challenge.  ...  We demonstrate that the approach can result in significant improvements in retrieval effectiveness over word-based retrieval.  ...  The model based on a statistical thesaurus (STATTHES) also takes into account how frequently a particular word is used to refer to a concept in a corpus of documents.  ... 
doi:10.1145/1871437.1871463 dblp:conf/cikm/TrieschniggHJK10 fatcat:sfyjjbe32bbvtbjgqbsza6w67a

Enhancing information retrieval through concept-based language modeling and semantic smoothing

Lynda Said Lhadj, Mohand Boughanem, Karima Amrouche
2015 Journal of the Association for Information Science and Technology  
In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal concepts, or word relationships, but such models are estimated using simple n-grams or concept  ...  In this paper, we address polysemy and synonymy mismatch with a concept-based language modeling approach that combines ontological concepts from external resources with frequently found collocations from  ...  We assume that a concept might be a single word or multiple words, and in both cases it might be an ontology entry or a frequent word collocation in the document that has no entry in the ontology.  ... 
doi:10.1002/asi.23553 fatcat:tavmbsf5yjbhxk7dcu3ld54hw4

PAI: Automatic indexing for extracting asserted keywords from a document

Naohiro Matsumura, Yukio Ohsawa, Mitsuru Ishizuka
2003 New Generation Computing  
The strategy is that the author's main point is based on the fundamental concepts represented by the co-occurrence between frequent terms in a document.  ...  In this state, all terms have equally low activities, e.g., 1.  ... 
doi:10.1007/bf03042324 fatcat:pjnmi6lzznfjxn2exzvtsxeoq4

Text Document Retrieval through Clustering using Meaningful Frequent Ordered Word Patterns

Pushpalatha K.P, G. Raju
2018 International Journal of Applied Engineering Research  
Many of the algorithms for text document retrieval are based on the bag-of-words (BoW) approach. The sequence of the words is not given much importance in such algorithms.  ...  Association mining is used to construct a feature set named Frequent Ordered Word Patterns (FOWPs) from WordNet-enriched document data sets.  ...  Since words and groups of words in sequence represent the concepts or topics dealt with in a document, such frequent words or word sets are selected as significant features for clustering.  ... 
doi:10.37622/ijaer/13.7.2018.4822-4833 fatcat:kgwproi2nzabjjed6t7dgetz34

Understanding and customizing stopword lists for enhanced patent mapping

Antoine Blanchard
2007 World Patent Information  
Abstract While the use of patent mapping tools is growing, the 'black-box' systems involved do not generally allow the user to intervene beyond the preliminary retrieval of documents.  ...  Except for one thing: the stopword list, i.e. the list of 'noise' words to be ignored, which can be modified to one's liking and dramatically impacts the final output and analysis.  ...  In the corpus, these frequent words account for a large portion of the 298,574 occurrences of words, whereas a large fraction of words appear at a low frequency, including 2,319 'hapax', i.e. words that  ... 
doi:10.1016/j.wpi.2007.02.002 fatcat:viptx72ewjdi5flpvy5qklb3pe
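
A customizable stopword list, as described above, amounts to filtering tokens against a user-editable set of 'noise' words; words that occur only once (hapax legomena) can be found with a frequency count. The sketch below assumes simple whitespace tokenization, and `DEFAULT_STOPWORDS`, `filter_tokens`, and `hapaxes` are hypothetical names, not the API of any patent mapping tool.

```python
from collections import Counter
from typing import Iterable, List, Set

# Hypothetical starter list; real mapping tools ship their own, much longer lists.
DEFAULT_STOPWORDS: Set[str] = {"the", "a", "an", "of", "and", "to", "in"}


def filter_tokens(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Drop tokens whose lowercase form appears in `stopwords`."""
    return [t for t in tokens if t.lower() not in stopwords]


def hapaxes(tokens: Iterable[str]) -> Set[str]:
    """Return the words that occur exactly once (hapax legomena)."""
    counts = Counter(tokens)
    return {w for w, c in counts.items() if c == 1}


tokens = "The method of the invention and the device of the invention".split()
# Customizing the list: also treat the domain-generic word "invention" as noise.
custom = DEFAULT_STOPWORDS | {"invention"}
print(filter_tokens(tokens, DEFAULT_STOPWORDS))  # ['method', 'invention', 'device', 'invention']
print(filter_tokens(tokens, custom))             # ['method', 'device']
```

Adding a single domain-generic word to the list changes which terms survive, which is the kind of impact on the final output the snippet describes.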

Recent Developments in Text Clustering Techniques

Saurabh Sharma, Vishal Gupta
2012 International Journal of Computer Applications  
To support better business decisions, faster database browsing, and reduced query processing time, information must be extracted from text documents efficiently.  ...  Clustering a huge number of text documents into different clusters, for better management of information, is a wide area in which a great deal of research is currently being pursued.  ...  This implies that this word set is present in at least the minimum number of documents specified by the user. A frequent k-word set is a frequent word set containing k words.  ... 
doi:10.5120/4611-6604 fatcat:xeis4dnwrna4leqzo6oymqpftq
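
The frequent k-word set definition above can be sketched by counting, for every k-word combination, how many documents contain all of its words, and keeping those that reach a user-given minimum support. This brute-force version enumerates combinations per document rather than using an Apriori-style candidate pruning, so it is only suitable for small vocabularies; `frequent_k_word_sets` and `min_support` are illustrative names.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_k_word_sets(
    docs: List[List[str]], k: int, min_support: int
) -> Dict[FrozenSet[str], int]:
    """Return every k-word set contained in at least `min_support` documents."""
    support: Counter = Counter()
    for doc in docs:
        # Use the document's vocabulary so each document counts once per word set.
        vocab = sorted(set(doc))
        for combo in combinations(vocab, k):
            support[frozenset(combo)] += 1
    return {ws: c for ws, c in support.items() if c >= min_support}


docs = [
    ["apple", "banana", "cherry"],
    ["apple", "banana", "apple"],
    ["banana", "cherry", "apple"],
    ["date"],
]
result = frequent_k_word_sets(docs, k=2, min_support=3)
print({tuple(sorted(ws)): c for ws, c in result.items()})  # {('apple', 'banana'): 3}
```

Support is counted per document, not per occurrence, which matches the snippet's "available in at least the minimum number of documents".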

Automated Patent Document Summarization for R&D Intellectual Property Management

Amy Trappey, Charles Trappey, Burgess S. Kao
2006 2006 10th International Conference on Computer Supported Cooperative Work in Design  
In this paper, we propose a patent document summarization system using an integrated approach of key-phrase recognition and significant information density.  ...  In an era of rapid information expansion, people encounter huge amounts of intellectual property (IP) such as patents in digital format.  ...  Acknowledgement This paper is partially supported by the National Science Council (Taiwan) and Queensland University of Technology (Australia).  ... 
doi:10.1109/cscwd.2006.253004 dblp:conf/cscwd/TrappeyTK06 fatcat:pn6qb26z45eghlraglhs7eprxq

Towards Automatic Detection and Tracking of Topic Change [chapter]

Florian Holz, Sven Teresniak
2010 Lecture Notes in Computer Science  
So, the analysis is highly independent of the absolute word frequencies and works over the whole frequency spectrum, and especially well for low-frequency words.  ...  For that, we examine the contextual shift of the concepts over time slices.  ...  Acknowledgments This research has been funded in part by DFG project Topology-based Visual Analysis of Information Spaces 6 as part of the Focus Project Nr. 1335 Scalable Visual Analytics: Interactive  ... 
doi:10.1007/978-3-642-12116-6_27 fatcat:tgjjnn5odzfyvoqywiyujwfm3i

Frequent Itemset Mining for Clustering Near Duplicate Web Documents [chapter]

Dmitry I. Ignatov, Sergei O. Kuznetsov
2009 Lecture Notes in Computer Science  
A vast number of documents on the Web have duplicates, which is a challenge for developing efficient methods that would compute clusters of similar documents.  ...  The practical efficiency of different algorithms for computing frequent closed sets of attributes is compared.  ...  Samokhin for helpful discussions and participating in software realization of the approach.  ... 
doi:10.1007/978-3-642-03079-6_15 fatcat:wjozoujkbfbjjawoeiyqmetljq

Using Luhn's Automatic Abstract Method to Create Graphs of Words for Document Visualization

Luiz Cláudio Santos Silva, Renelson Ribeiro Sampaio
2014 Social Networking  
The method takes pairs of relevant words and computes the linkage force between them. Relevant words become vertices and links become edges in the resulting graph.  ...  Luhn's automatic abstract creation algorithm, and intends to aggregate more information to document visualization than word counting methods do without the need for external sources.  ...  It proposes the selection of a significant word pool through the definition of high- and low-frequency cutoffs. These words should reflect the core subject, or subjects, of the text.  ... 
doi:10.4236/sn.2014.32008 fatcat:tvehoykggbbqnebkwkwy5kb2xe
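
Luhn's significant-word selection, as summarized above, keeps words whose frequency lies between a low and a high cutoff: very common words are treated as uninformative and very rare words as noise. The sketch below assumes the cutoffs are absolute counts chosen by the user; `luhn_significant_words`, `low`, and `high` are hypothetical names, and the paper's linkage-force computation is not reproduced here.

```python
from collections import Counter
from typing import Iterable, Set


def luhn_significant_words(tokens: Iterable[str], low: int, high: int) -> Set[str]:
    """Return the words whose frequency f satisfies low <= f <= high.

    Words above `high` are treated as too common, words below `low` as too rare.
    """
    counts = Counter(t.lower() for t in tokens)
    return {w for w, c in counts.items() if low <= c <= high}


tokens = "The cat sat on the mat and the cat ran".split()
print(luhn_significant_words(tokens, low=2, high=2))  # {'cat'}
```

Here "the" (3 occurrences) falls above the high cutoff and the single-occurrence words fall below the low cutoff, leaving only "cat" in the significant word pool.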

An Efficient Phrase Based Pattern Taxonomy Deploying Method for Text Document Mining

S. Brindha, Dr. S. Sukumaran
2018 International Journal of Trend in Scientific Research and Development  
In this paper we present a statistical language approach to extract concepts formed by relevant single and multi-word units.  ...  The extraction of multi-word expressions has increasingly become a special topic in the last few years.  ...  real word, whose significance can be understood.  ... 
doi:10.31142/ijtsrd11270 fatcat:e2u7mtefnbadzci2nkkcubz6t4

Single Document Text Summarization of a Resource-Poor Language using an Unsupervised Technique

2019 International Journal of Engineering and Advanced Technology  
Both methods were found to perform better for shorter documents than for longer ones.  ...  Automatic text summarization of a resource-poor language is a challenging task. Unsupervised extractive techniques are often preferred for such languages due to the scarcity of resources.  ...  The poor performance of the TF.IDF model can be explained by the fact that significant words indicative of a concept sometimes occur frequently in the document, resulting in a low IDF (or ISF) value.  ... 
doi:10.35940/ijeat.a2250.109119 fatcat:h7napz4zxraqjdxudtc7huyczu
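
The explanation above can be illustrated with a sentence-level TF.ISF score, tf(w, s) · log(N / sf(w)), where N is the number of sentences and sf(w) the number of sentences containing w. This is a minimal sketch assuming pre-tokenized sentences and this common unsmoothed formula; the paper's exact weighting may differ. A word that appears in every sentence gets ISF = log(1) = 0, so its score is zero however often it occurs.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_isf(sentences: List[List[str]]) -> List[Dict[str, float]]:
    """Score every word in every sentence as tf(w, s) * log(N / sf(w))."""
    n = len(sentences)
    # sf(w): number of sentences that contain w at least once
    sf = Counter(w for s in sentences for w in set(s))
    scores = []
    for s in sentences:
        tf = Counter(s)
        scores.append({w: tf[w] * math.log(n / sf[w]) for w in tf})
    return scores


sentences = [
    ["river", "flood", "river"],
    ["river", "bank"],
    ["river", "rain"],
]
scores = tf_isf(sentences)
print(scores[0]["river"])  # 0.0: "river" is in every sentence, so log(3/3) = 0
print(scores[0]["flood"])  # about 1.0986 (log 3)
```

"river" is arguably the topic word of this toy text, yet it receives the lowest possible weight, which is the failure mode the snippet attributes to the TF.IDF model.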
Showing results 1 to 15 out of 266,708