Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies
1998
The VLDB journal
We explore how to organize large text databases hierarchically by topic to aid better searching, browsing and filtering. ...
Many corpora, such as internet directories, digital libraries, and patent databases are manually organized into topic hierarchies, also called taxonomies. ...
and Martin van den Berg for comments on the paper. ...
doi:10.1007/s007780050061
fatcat:nvo3ikw5svb6vkqpmx3zchdqtm
Modeling semantic relations between visual attributes and object categories via dirichlet forest prior
2012
Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12
Secondly, we incorporate the obtained semantic associations between visual attributes and object categories into a text-based topic model and extract descriptive latent topics from external textual knowledge ...
Experimental results show that the proposed model is better at describing object-related attributes and makes the inferred latent topics more descriptive. ...
WordNet is a large-scale lexical database of the English language, in which English words are organized into concepts (synonym sets, or synsets) according to synonymy and various lexical and semantic relations ...
doi:10.1145/2396761.2398428
dblp:conf/cikm/ChenHZAHP12
fatcat:hnsf3g3uwnb6najihunrxffq3q
Thesauri and ontologies in digital libraries
2005
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries - JCDL '05
, and hierarchy) for topic clarification and hierarchic query term expansion). ...
It will touch on cross-database and cross-language searching as natural extensions of these functions. ...
In effect, a topic signature is a query which identifies documents relevant to the topic. ...
doi:10.1145/1065385.1065526
dblp:conf/jcdl/Soergel05
fatcat:zpmxl5d3qjaldawhnm7d57xrpa
Challenges in the construction of knowledge bases for human microbiome-disease associations
2019
Microbiome
and highlight the need for additional innovations in natural language processing (NLP), text mining, taxonomic representations, and field-wide vocabulary standardization in human microbiome research. ...
researchers reading full-text publications. ...
NLP and text mining have largely matured in general domains, with [12], [13], and [14] as prominent examples. ...
doi:10.1186/s40168-019-0742-2
pmid:31488215
pmcid:PMC6728997
fatcat:xk6zoptnq5h7vfu3yebq6hhsn4
A Case Study for Large-Scale Human Microbiome Analysis Using JCVI's Metagenomics Reports (METAREP)
2012
PLoS ONE
Specifically, the scalability of the dynamic weighting feature is evaluated and established by its application to the analysis of over 400 million weighted gene annotations derived from 14 billion short ...
These strategies provide users with a reference of how to conduct similar large-scale metagenomic analyses using METAREP with their own sequence data, while in this study they reveal insights into the ...
Douglas Rusch for his user and technical feedback. ...
doi:10.1371/journal.pone.0029044
pmid:22719821
pmcid:PMC3374610
fatcat:32dsjv3suvguxduhe2ee74bo3e
Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey
[article]
2018
arXiv
pre-print
Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data and text documents. ...
Researchers have proposed various models based on LDA for topic modeling. Building on previous work, this paper can serve as a useful introduction to LDA approaches in topic modeling. ...
30916011328, 30918015103), and Nanjing Science and Technology Development Plan Project (201805036). ...
arXiv:1711.04305v2
fatcat:jzsx6owjyjfo3gkbohrc2ggkzq
BiG-SLiCE: A Highly Scalable Tool Maps the Diversity of 1.2 Million Biosynthetic Gene Clusters
[article]
2020
biorxiv/medrxiv
pre-print
Genome mining for Biosynthetic Gene Clusters (BGCs) has become an integral part of natural product discovery. ...
We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. ...
Acknowledgements We thank Joris Louwen for adding 21 manually selected biosynthetic pfams to the library, Vittorio Tracanna for his constructive feedback on the study, and Jorge C. ...
doi:10.1101/2020.08.17.240838
fatcat:jou7ldazh5acfjvwhqt56ggiqa
TOP-SPIN: TOPic discovery via Sparse Principal component INterference
[article]
2013
arXiv
pre-print
Our approach attacks the maximization problem in sparse PCA directly and is scalable to high-dimensional data. ...
We propose a novel topic discovery algorithm for unlabeled images based on the bag-of-words (BoW) framework. ...
This gap is also present when SIFT and p = 5,000 are used, where the accuracy for group T is 84.5313% and for group D is 96.6250%. ...
arXiv:1311.1406v1
fatcat:idzpjcamyjewpe2qtwhgn33dpi
A Survey of Text Clustering Algorithms
[chapter]
2012
Mining Text Data
The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. ...
We will discuss the key methods used for text clustering, and their relative advantages. We will also discuss a number of recent advances in the area in the context of social network and linked data. ...
We note that the field of text clustering is too vast to cover comprehensively in a single chapter. ...
doi:10.1007/978-1-4614-3223-4_4
fatcat:ileuq6jxrrgthee2bba5e4secq
Knowledge Harvesting for Business Intelligence
[chapter]
2013
Lecture Notes in Business Information Processing
This paper aims to describe the importance of semantic technologies (ontologies) and knowledge extraction techniques for knowledge management, search, and capture in e-business processes. ...
We will present the state of the art of ontology learning approaches from textual data and web environment and their integration in enterprise systems to perform personalized and incremental knowledge ...
• Informal Taxonomy: provides explicit organizing categories from general concepts to specific ones. ...
doi:10.1007/978-3-642-36318-4_8
fatcat:rzo4x452orbgveortwrw5izgry
A survey on information visualization: recent advances and challenges
2014
The Visual Computer
The research on InfoVis is organized into a taxonomy that contains four main categories, namely empirical methodologies, user interactions, visualization frameworks, and applications, which are each described ...
At the conclusion of this survey, we identify existing technical challenges and propose directions for future research. ...
Visualization of static textual information: The visualization work on static text information can be classified into two categories: feature-based text visualization and topic-based text visualization. ...
doi:10.1007/s00371-013-0892-3
fatcat:k2y4xrmffzghvldzn6fg2tkjqi
Automated Subject Classification of Textual Documents in the Context of Web-Based Hierarchical Browsing
2011
Knowledge organization
Acknowledgments The Swedish Agency for Innovation Systems provided the main funding for this research. ...
Acknowledgments Many thanks to Traugott Koch, Anders Ardö, Tatjana Aparac Jelušić, Johan Eklund, Ingo Frommholz, Repke de Vries and the Journal of Documentation reviewers for providing valuable feedback ...
Chakrabarti, S., Dom, B., and Indyk, P. (1998b), "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies", Journal of ...
doi:10.5771/0943-7444-2011-3-230
fatcat:le3bszc7dzgs5if646ctxktfyi
Multi-scale navigation of large trace data: A survey
2017
Concurrency and Computation
Acknowledgement The support of the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson Software Research, and Defence Research and Development Canada (DRDC) is gratefully acknowledged ...
Jumpshot uses the SLOG2 trace format [87], a hierarchical file format that handles a large number of events and states scalably, even for large-scale applications. ...
structures used for managing intervals and their hierarchical organization. ...
doi:10.1002/cpe.4068
fatcat:dhkrl5ukhbd3dguskasfanbicq
TopSpin: TOPic Discovery via Sparse Principal Component INterference
[chapter]
2019
Brain-Inspired Intelligence and Visual Perception
This gap is also present when SIFT and p = 5,000 are used, where the accuracy for group T is 84.5313% and for group D is 96.6250%. ...
This is the reason why we have discarded D and used only T for testing. ...
In their work, the generative Hierarchical Latent Dirichlet Allocation (hLDA) model, previously used for text analysis [2], is adapted to the visual domain. ...
doi:10.1007/978-3-030-12119-8_8
fatcat:hrwwrdmiy5an3gsiz7aw5vb5ya
Mining latent entity structures from massive unstructured and interconnected data
2014
Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14
A mining framework is proposed to solve and integrate a chain of tasks: hierarchical topic discovery, topical phrase mining, entity role analysis, and entity relation mining. ...
The framework enables recursive construction of phrase-represented and entity-enriched topic hierarchy from text-attached information networks. ...
While topic models have clear application in facilitating understanding, organization, and exploration in large text collections such as those found in full-text databases, difficulty in interpretation ...
doi:10.1145/2588555.2588890
dblp:conf/sigmod/HanW14
fatcat:js7d3r5yd5gbfgnhjgwsfmco2i