Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies

Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan
1998 The VLDB journal  
We explore how to organize large text databases hierarchically by topic to aid better searching, browsing and filtering.  ...  Many corpora, such as internet directories, digital libraries, and patent databases are manually organized into topic hierarchies, also called taxonomies.  ...  and Martin van den Berg for comments on the paper.  ... 
doi:10.1007/s007780050061 fatcat:nvo3ikw5svb6vkqpmx3zchdqtm

Modeling semantic relations between visual attributes and object categories via dirichlet forest prior

Xin Chen, Xiaohua Hu, Zhongna Zhou, Yuan An, Tingting He, E.K. Park
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
Secondly, we incorporate the obtained semantic associations between visual attributes and object categories into a text-based topic model and extract descriptive latent topics from external textual knowledge  ...  Experimental results show that the proposed model achieves better ability in describing object-related attributes and makes the inferred latent topics more descriptive.  ...  The WordNet is a large scale lexical database of English Language, in which English words are organized into concepts (synonym sets or synsets) according to synonymy and various lexical and semantic relations  ... 
doi:10.1145/2396761.2398428 dblp:conf/cikm/ChenHZAHP12 fatcat:hnsf3g3uwnb6najihunrxffq3q

Thesauri and ontologies in digital libraries

Dagobert Soergel
2005 Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries - JCDL '05  
, and hierarchy) for topic clarification and hierarchic query term expansion).  ...  It will touch on cross-database and cross-language searching as natural extensions of these functions.  ...  In effect, a topic signature is a query which identifies documents relevant to the topic.  ... 
doi:10.1145/1065385.1065526 dblp:conf/jcdl/Soergel05 fatcat:zpmxl5d3qjaldawhnm7d57xrpa

Challenges in the construction of knowledge bases for human microbiome-disease associations

Varsha Dave Badal, Dustin Wright, Yannis Katsis, Ho-Cheol Kim, Austin D. Swafford, Rob Knight, Chun-Nan Hsu
2019 Microbiome  
and highlight the need for additional innovations in natural language processing (NLP), text mining, taxonomic representations, and field-wide vocabulary standardization in human microbiome research.  ...  researchers reading full-text publications.  ...  NLP and text mining have largely matured in the general domains with [12] [13] [14] as prominent examples.  ... 
doi:10.1186/s40168-019-0742-2 pmid:31488215 pmcid:PMC6728997 fatcat:xk6zoptnq5h7vfu3yebq6hhsn4

A Case Study for Large-Scale Human Microbiome Analysis Using JCVI's Metagenomics Reports (METAREP)

Johannes Goll, Mathangi Thiagarajan, Sahar Abubucker, Curtis Huttenhower, Shibu Yooseph, Barbara A. Methé, Michael Edward Zwick
2012 PLoS ONE  
Specifically, the scalability of the dynamic weighting feature is evaluated and established by its application to the analysis of over 400 million weighted gene annotations derived from 14 billion short  ...  These strategies provide users with a reference of how to conduct similar large-scale metagenomic analyses using METAREP with their own sequence data, while in this study they reveal insights into the  ...  Douglas Rusch for his user and technical feedback.  ... 
doi:10.1371/journal.pone.0029044 pmid:22719821 pmcid:PMC3374610 fatcat:32dsjv3suvguxduhe2ee74bo3e

Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey [article]

Hamed Jelodar, Yongli Wang, Chi Yuan, Xia Feng, Xiahui Jiang, Yanchao Li, Liang Zhao
2018 arXiv   pre-print
Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data, text documents.  ...  Researchers have proposed various models based on the LDA in topic modeling. According to previous work, this paper can be very useful and valuable for introducing LDA approaches in topic modeling.  ...  30916011328, 30918015103), and Nanjing Science and Technology Development Plan Project (201805036).  ... 
arXiv:1711.04305v2 fatcat:jzsx6owjyjfo3gkbohrc2ggkzq

BiG-SLiCE: A Highly Scalable Tool Maps the Diversity of 1.2 Million Biosynthetic Gene Clusters [article]

Satria A Kautsar, Justin J.J. van der Hooft, Dick de Ridder, Marnix H Medema
2020 biorxiv/medrxiv   pre-print
Genome mining for Biosynthetic Gene Clusters (BGCs) has become an integral part of natural product discovery.  ...  We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential.  ...  Acknowledgements We thank Joris Louwen for adding 21 manually selected biosynthetic pfams to the library, Vittorio Tracanna for his constructive feedback on the study, and Jorge C.  ... 
doi:10.1101/2020.08.17.240838 fatcat:jou7ldazh5acfjvwhqt56ggiqa

TOP-SPIN: TOPic discovery via Sparse Principal component INterference [article]

Martin Takáč, Selin Damla Ahipaşaoğlu, Ngai-Man Cheung, Peter Richtárik
2013 arXiv   pre-print
Our approach attacks the maximization problem in sparse PCA directly and is scalable to high-dimensional data.  ...  We propose a novel topic discovery algorithm for unlabeled images based on the bag-of-words (BoW) framework.  ...  This gap is present also when SIFT and p = 5,000 is used, where the accuracy for group T is 84.5313% and for group D is 96.6250%.  ... 
arXiv:1311.1406v1 fatcat:idzpjcamyjewpe2qtwhgn33dpi

A Survey of Text Clustering Algorithms [chapter]

Charu C. Aggarwal, ChengXiang Zhai
2012 Mining Text Data  
The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing.  ...  We will discuss the key methods used for text clustering, and their relative advantages. We will also discuss a number of recent advances in the area in the context of social network and linked data.  ...  We note that the field of text clustering is too vast to cover comprehensively in a single chapter.  ... 
doi:10.1007/978-1-4614-3223-4_4 fatcat:ileuq6jxrrgthee2bba5e4secq

Knowledge Harvesting for Business Intelligence [chapter]

Nesrine Ben Mustapha, Marie-Aude Aufaure
2013 Lecture Notes in Business Information Processing  
This paper aims at describing the importance of semantic technologies (ontologies) and knowledge extraction techniques for knowledge management, search and capture in e-business processes.  ...  We will present the state of the art of ontology learning approaches from textual data and web environment and their integration in enterprise systems to perform personalized and incremental knowledge  ...  • Informal Taxonomy: provides explicit organizing categories from general concepts to specific ones.  ... 
doi:10.1007/978-3-642-36318-4_8 fatcat:rzo4x452orbgveortwrw5izgry

A survey on information visualization: recent advances and challenges

Shixia Liu, Weiwei Cui, Yingcai Wu, Mengchen Liu
2014 The Visual Computer  
The research on InfoVis is organized into a taxonomy that contains four main categories, namely empirical methodologies, user interactions, visualization frameworks, and applications, which are each described  ...  At the conclusion of this survey, we identify existing technical challenges and propose directions for future research.  ...  Visualization of static textual information The visualization work on static text information can be classified into two categories: feature-based text visualization and topic-based text visualization.  ... 
doi:10.1007/s00371-013-0892-3 fatcat:k2y4xrmffzghvldzn6fg2tkjqi

Automated Subject Classification of Textual Documents in the Context of Web-Based Hierarchical Browsing

Koraljka Golub
2011 Knowledge organization  
Acknowledgments The Swedish Agency for Innovation Systems provided the main funding for this research.  ...  Acknowledgments Many thanks to Traugott Koch, Anders Ardö, Tatjana Aparac Jelušić, Johan Eklund, Ingo Frommholz, Repke de Vries and the Journal of Documentation reviewers for providing valuable feedback  ...  Chakrabarti, S., Dom, B., and Indyk, P. (1998b), "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies", Journal of  ... 
doi:10.5771/0943-7444-2011-3-230 fatcat:le3bszc7dzgs5if646ctxktfyi

Multi-scale navigation of large trace data: A survey

Naser Ezzati-Jivan, Michel R. Dagenais
2017 Concurrency and Computation  
Acknowledgement The support of the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson Software Research, and Defence Research and Development Canada (DRDC) is gratefully acknowledged  ...  Jumpshot uses the SLOG2 trace format [87], a hierarchical file format to handle a large number of events and states in a scalable way, even for large-scale applications.  ...  structures used for managing intervals and their hierarchical organization.  ... 
doi:10.1002/cpe.4068 fatcat:dhkrl5ukhbd3dguskasfanbicq

TopSpin: TOPic Discovery via Sparse Principal Component INterference [chapter]

Martin Takáč, Selin Damla Ahipaşaoğlu, Ngai-Man Cheung, Peter Richtárik
2019 Brain-Inspired Intelligence and Visual Perception  
This gap is present also when SIFT and p = 5,000 is used, where the accuracy for group T is 84.5313% and for group D is 96.6250%.  ...  This is the reason why we have discarded D and used only T for testing.  ...  In their work, the generative Hierarchical Latent Dirichlet Allocation (hLDA) model, previously used for text analysis [2], is adapted to the visual domain.  ... 
doi:10.1007/978-3-030-12119-8_8 fatcat:hrwwrdmiy5an3gsiz7aw5vb5ya

Mining latent entity structures from massive unstructured and interconnected data

Jiawei Han, Chi Wang
2014 Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14  
A mining framework is proposed, to solve and integrate a chain of tasks: hierarchical topic discovery, topical phrase mining, entity role analysis and entity relation mining.  ...  The framework enables recursive construction of phrase-represented and entity-enriched topic hierarchy from text-attached information networks.  ...  While topic models have clear application in facilitating understanding, organization, and exploration in large text collections such as those found in full-text databases, difficulty in interpretation  ... 
doi:10.1145/2588555.2588890 dblp:conf/sigmod/HanW14 fatcat:js7d3r5yd5gbfgnhjgwsfmco2i