Seed-Guided Deep Document Clustering [chapter]

Maziar Moradi Fard, Thibaut Thonet, Eric Gaussier
2020 Lecture Notes in Computer Science  
In this paper, we jointly learn deep representations and bias the clustering results through the seed words, leading to a Seed-guided Deep Document Clustering approach.  ...  This seed-guided constrained document clustering problem was recently addressed through topic modeling approaches.  ...  Seed-Guided Deep Document Clustering Deep clustering consists in jointly performing clustering and deep representation learning in an unsupervised fashion (e.g., with an auto-encoder).  ... 
doi:10.1007/978-3-030-45439-5_1 fatcat:cug7brgy6bdxzcrwynaiarcz6y

CoRel: Seed-Guided Topical Taxonomy Construction by Concept Learning and Relation Transferring [article]

Jiaxin Huang, Yiqing Xie, Yu Meng, Yunyi Zhang, Jiawei Han
2020 arXiv   pre-print
In this paper, we propose a method for seed-guided topical taxonomy construction, which takes a corpus and a seed taxonomy described by concept names as input, and constructs a more complete taxonomy based  ...  on user's interest, wherein each node is represented by a cluster of coherent terms.  ...  Seed-Guided Taxonomy Construction.  ... 
arXiv:2010.06714v1 fatcat:u7b7y444dnfxzalqmx4jhqfcqq

Urdu Documents Clustering with Unsupervised and Semi-Supervised Probabilistic Topic Modeling

Mubashar Mustafa, Feng Zeng, Hussain Ghulam, Hafiz Muhammad Arslan
2020 Information  
Document clustering is to group documents according to certain semantic features. Topic model has a richer semantic structure and considerable potential for helping users to know document corpora.  ...  Therefore, document clustering has become a challenging task in Urdu language, which has its own morphology, syntax and semantics.  ...  Seed Topics Seeded-ULDA allow a user to guide the topic discovery process. The user can give sets of seeded words that are representative of the given dataset.  ... 
doi:10.3390/info11110518 fatcat:4t3pre3d2vegzf2kojgeqhflsu

Identification of Vine Weeds in Florida Citrus

Stephen H. Futch, David W. Hall
1969 EDIS  
A combination of leaf, stem, fruit, and/or seed characteristics will aid in the identification process.  ...  A useful guide of characteristics to identify broadleaf plants are included at the end of this article.  ...  Flowers: yellow, 5 to 8 cm long, funnel-shaped, single or in small clusters. Fruit: flat, bean-like, up to 20 inches long, contains oblong, winged seeds.  ... 
doi:10.32473/edis-hs185-2003 fatcat:vochtehfw5brtp6w6isa6dlw7i

Discovering Topic Representative Terms for Short Text Clustering

Shuiqiao Yang, Guangyan Huang, Borui Cai
2019 IEEE Access  
INDEX TERMS Short text, clustering, topic representative terms.  ...  ., supported by a cluster of short texts), and we also call them topic representative terms.  ...  Short texts belong to the same latent topic are grouped as a cluster. STC² [4] is a deep learning based clustering framework for short texts.  ... 
doi:10.1109/access.2019.2927345 fatcat:7jltjkmohzae5ha2wk5rlvblzi

CrowdTSC: Crowd-based Neural Networks for Text Sentiment Classification [article]

Keyu Yang, Yunjun Gao, Lei Liang, Song Bian, Lu Chen, Baihua Zheng
2020 arXiv   pre-print
Sampling and clustering are utilized to reduce the cost of crowdsourcing.  ...  Also, we present an attention-based neural network and a hybrid neural network, which incorporate the collected keywords as human being's guidance into deep neural networks.  ...  To reduce the monetary cost of hiring the crowd workers, we design a cluster-based crowdsourcing method to collect keywords in the given text datasets.  ... 
arXiv:2004.12389v1 fatcat:wu23ccmilnad5bvuwiy7diqx7i

Scaling up Analogy with Crowdsourcing and Machine Learning

Joel Chan, Tom Hope, Dafna Shahaf, Aniket Kittur
2016 International Conference on Case-Based Reasoning  
We demonstrate our approach with a crowdsourced analogy identification task, whose results are used to train deep learning algorithms.  ...  In this paper, we propose to leverage crowdsourcing techniques to construct a dataset with rich "analogy-tuning" signals, used to guide machine learning models towards matches based on relations rather  ...  However, these methods are not aimed at finding analogical clusters, which requires supporting deep relational similarity rather than surface similarity.  ... 
dblp:conf/iccbr/ChanHSK16 fatcat:33ye36x4mzg6xig6txr7ik676i

A Survey on Automatically Mining Facets for Web Queries

Duhita V. Pawar, Vina M. Lomte
2017 International Journal of Electrical and Computer Engineering (IJECE)  
From these top seed sites facets are extracted by document parsing, weighting, clustering and ranking of the extracted facets.  ...  to my guide and Head of the Department of Computer Engineering, RMDSSOE, Prof.  ... 
doi:10.11591/ijece.v7i6.pp3700-3704 fatcat:ahqoepjrfbb75c4xfjcdqnfidm

Feature space learning model

Renchu Guan, Xu Wang, Maurizio Marchese, Mary Qu Yang, Yanchun Liang, Chen Yang
2018 Journal of Ambient Intelligence and Humanized Computing  
To avoid the complex training processes in deep learning models which project original feature space into low-dimensional ones, we propose a novel feature space learning (FSL) model.  ...  FSL algorithms are proposed with the feature space updating procedure; (3) FSL can provide a better data understanding and learn descriptive and compact feature spaces without the tough training for deep  ...  This model combines prior information and an assumption of consistency, which could not only embed the labeled information in similarity measurements, but also guide the clustering procedures.  ... 
doi:10.1007/s12652-018-0805-4 pmid:31068980 pmcid:PMC6502470 fatcat:ny7qwf3axrbklk25sy2cn2voum

Improving Seeded k-Means Clustering with Deviation- and Entropy-Based Term Weightings

2020 IEICE transactions on information and systems  
The outcome of document clustering depends on the scheme used to assign a weight to each term in a document.  ...  In addition, their potential combinations are investigated to find optimal solutions in guiding the clustering process.  ...  controlling/guiding the process of clustering documents.  ... 
doi:10.1587/transinf.2019iip0017 fatcat:nsngvz7ewnfobhizk44utc4v24

Seeded Hierarchical Clustering for Expert-Crafted Taxonomies [article]

Anish Saha, Amith Ananthram, Emily Allaway, Heng Ji, Kathleen McKeown
2022 arXiv   pre-print
In this work, we study Seeded Hierarchical Clustering (SHC): the task of automatically fitting unlabeled data to such taxonomies using only a small set of labeled examples.  ...  HierSeed assigns documents to topics by weighing document density against topic hierarchical structure.  ...  Definitions Problem Formulation Given an unlabeled corpus D (the fitting set), a hierarchy of topics T 1 of height N and a seed documents set S for each topic, the aim of Seeded Hierarchical Clustering  ... 
arXiv:2205.11602v1 fatcat:jmdwans4jnejlp5rxpwlyd3dbe

W2VLDA: Almost Unsupervised System for Aspect Based Sentiment Analysis [article]

Aitor García-Pablos, Montse Cuadros, German Rigau
2017 arXiv   pre-print
Assessing the seed words impact Since the proposed approach heavily relies on the seed words (i.e. seeds words are the only source of supervision to guide the algorithm to the desired goal), it is interesting  ...  In the case of modelling the polarity of the documents, it usually means using a carefully selected set of seed words.  ... 
arXiv:1705.07687v2 fatcat:iuxbvind6rcz7flr2432hdlwu4

Multi-Label Annotation and Classification of Arabic Texts Based on Extracted Seed Keyphrases and Bi-Gram Alphabet Feed Forward Neural Networks Model

Fatma Elghannam
2022 ACM Transactions on Asian and Low-Resource Language Information Processing  
In this phase, review data instances were automatically annotated as multi-label based on the extracted seed keyphrases clusters.  ...  These keyphrases are referred to as seed keyphrases. Extracted seed keyphrases are divided into several clusters based on their topics. Each cluster is assigned a suitable label.  ...  To identify cluster labels, the obtained seed keyphrases are divided into groups (clusters) based on their relevance to a specific topic (their context).  ... 
doi:10.1145/3539607 fatcat:ld4u43gdsrdqtm22vs27r2zuye

Service Class Driven Dynamic Data Source Discovery with DynaBot

Daniel Rocco, James Caverlee, Ling Liu, Terence Critchlow
2007 International Journal of Web Services Research  
Third, DYNABOT incorporates methods and algorithms for efficient probing of the Deep Web and for discovering and clustering Deep Web sources and services through SCD-based service matching analysis.  ...  To address these challenges, we present DYNABOT, a service-centric crawler for discovering and clustering Deep Web sources offering dynamic content. DYNABOT has three unique characteristics.  ...  Another optimization that can be used to guide the analysis process is document text analysis.  ... 
doi:10.4018/jwsr.2007070102 fatcat:hzqbi4umnfbllpr3y43tldhwmq

Topic Embeddings - A New Approach to Classify Very Short Documents Based on Predefined Topics

Lasse Lommel, Meike Riebling, Burkhardt Funk, Christian Junginger
2019 International Conference on Wirtschaftsinformatik  
We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA.  ...  We use a real-world dataset from online advertising, which is comprised of markedly short documents.  ...  In addition to the topic-word level, seeded LDA guides the probability distributions in the document-topic layer.  ... 
dblp:conf/wirtschaftsinformatik/LommelRFJ19 fatcat:7ex3th6dn5hyhnyp5gz4yueabe