1,414 Hits in 4.6 sec

Hierarchical Metadata-Aware Document Categorization under Weak Supervision [article]

Yu Zhang, Xiusi Chen, Yu Meng, Jiawei Han
2020
Hence, this paper studies how to integrate the label hierarchy, metadata, and text signals for document categorization under weak supervision.  ...  Categorizing documents into a given label hierarchy is intuitively appealing due to the ubiquity of hierarchical topic structures in massive text corpora.  ...  CONCLUSIONS We present HiMeCat, an embedding-based generative framework for hierarchical metadata-aware document categorization under weak supervision.  ... 
doi:10.48550/arxiv.2010.13556 fatcat:ecsgzyce3bbs7flpwk73viszga

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information [article]

Yu Zhang, Shweta Garg, Yu Meng, Xiusi Chen, Jiawei Han
2022 arXiv   pre-print
In this paper, we explore the potential of using metadata to help weakly supervised text classification.  ...  To be specific, we model the relationships between documents and metadata via a heterogeneous information network.  ...  [39, 40, 41, 44] use a small set of labeled documents or keywords as supervision to categorize text with metadata.  ... 
arXiv:2111.04022v3 fatcat:cuupcigyzfhj7hjiciwonbpjzq

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification [article]

Yu Zhang, Zhihong Shen, Chieh-Han Wu, Boya Xie, Junheng Hao, Ye-Yi Wang, Kuansan Wang, Jiawei Han
2022 arXiv   pre-print
metadata-aware LMTC method trained on 10K-200K labeled documents; and (3) MICoL tends to predict more infrequent labels than supervised methods, thus alleviates the deteriorated performance on long-tailed  ...  In this paper, we study LMTC under the zero-shot setting, which does not require any annotated documents with labels and only relies on label surface names and descriptions.  ...  [19] propose a generic approach to add categorical metadata into neural text classifiers. Zhang et al. [68] further study metadata-aware LMTC.  ... 
arXiv:2202.05932v2 fatcat:ii74jumzsfd3nlxvcshmn2xlca

Automated Educational Course Metadata Generation Based on Semantics Discovery [chapter]

Marián Šimko, Mária Bieliková
2009 Lecture Notes in Computer Science  
In this paper we present a method for automated metadata generation addressing the educational knowledge discovery problem.  ...  The metadata are created automatically under the adaptive course author's (i.e., teacher's) supervision. Thus, his effort in the authoring process is reduced.  ...  Learning objects were organized hierarchically and represented using the DocBook language.  ... 
doi:10.1007/978-3-642-04636-0_11 fatcat:62w4uyrp5jcmdgs7rjoixlqc3a

Modularization and multi-granularity reuse of learning resources

Marek Meyer, Ralf Steinmetz, Abdulmotaleb El Saddik
2009 ACM SIGMultimedia Records  
Con- 6.2.4 Choosing Effectiveness Measures for Hierarchical Categorization hierarchical classification [SLN03].  ...  (6.6) Categorization of a document with the kNN method is based on the similarity of documents.  ...  The adaptation tool is designed to handle different document formats.  ... 
doi:10.1145/1662529.1662533 fatcat:cykmsfw3p5dvrfmnw4dzgvca3y

Automatic image semantic interpretation using social action and tagging data

Neela Sawant, Jia Li, James Z. Wang
2010 Multimedia tools and applications  
Applications are categorized into four types: concept semantics, person identification, location semantics and event semantics.  ...  Tang et al. presented a sparse graph-based semi-supervised learning approach for removing weak pairing between image features and tags [180].  ...  Functions are categorized into 'organization' and 'communication', whereas sociality is categorized into 'self' and 'social'.  ... 
doi:10.1007/s11042-010-0650-8 fatcat:kqu6kyess5f3re554jsueuuzem

Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering

Sahand Vahidnia, Alireza Abbasi, Hussein A. Abbass
2021 Journal of Data and Information Science  
Document embedding approaches are utilized to transform documents into vector-based representations.  ...  Design/methodology/approach To achieve the objectives, we propose a modified deep clustering method to detect research trends from the abstracts and titles of academic documents.  ...  However, the weak performance of BERT models may come as a surprise.  ... 
doi:10.2478/jdis-2021-0024 fatcat:ww67w2ezhvdrnedr47vwfddima

NTARC: A Data Model for the Systematic Review of Network Traffic Analysis Research

Félix Iglesias, Daniel C. Ferreira, Gernot Vormayr, Maximilian Bachl, Tanja Zseby
2020 Applied Sciences  
Although the goals and methodologies are commonly similar, we lack initiatives to categorize the data, methods, and findings systematically.  ...  The success of data repositories partially lies in creating metadata structures able to categorize and identify datasets and research objects effectively.  ...  The database is made available under a Creative Commons Attribution 4.0 license.  ... 
doi:10.3390/app10124307 fatcat:cvgagi6qyjd3tf5orsp6nymdki

Video Summarization Using Deep Neural Networks: A Survey [article]

Evlampios Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, Vasileios Mezaris, Ioannis Patras
2021 arXiv   pre-print
Based on the outcomes of these comparisons, as well as some documented considerations about the amount of annotated data and the suitability of evaluation protocols, we indicate potential future research  ...  Instead of not using any ground-truth data, they use less-expensive weak labels (such as video-level metadata for video categorization and category-driven summarization, or ground-truth annotations for  ...  This approach uses video-level metadata (e.g., the video title "A man is cooking") to define a categorization of videos.  ... 
arXiv:2101.06072v2 fatcat:7mozntfhdrf3lkw6pwcr5v2rpu

Big-Data-Augmented Approach to Emerging Technologies Identification: Case of Agriculture and Food Sector

Leonid Gokhberg, Ilya F. Kuzminov, Pavel Bakhtin, Elena Tochilina, Alexander Chulok, Anton Timofeev, Alina Lavrynenko
2017 Social Science Research Network  
of both structured (publication, patent metadata) and unstructured (full text reports, declarations and other documents) formats, sentence segmentation and word tokenization, word lemmatization and  ...  This allows to map even emerging fields which haven't yet been categorized for the purposes of official statistics. Figure 2.  ... 
doi:10.2139/ssrn.3078499 fatcat:o6ver5pxordcnmyoehokv5vuqa

Expert recommendation in community question answering: a review and future direction

Zhengfa Yang, Qian Liu, Baowen Sun, Xin Zhao
2019 International Journal of Crowd Science  
Findings This study proposes a comprehensive framework to categorize extant studies into three broad areas of CQA expert recommendation research: understanding profile modeling, recommendation approaches  ...  (Riahi et al., 2012) proposed a segmented topic model (STM) that can discover the hierarchical structure of topics, and thus, instead of grouping all users' questions under one topic, allows each question  ...  (Li and King, 2010) combined expertise-aware QLL with the Jelinek-Mercer smoothing model that leveraged multiple metadata features such as answer length, question-answer length, the number of answers  ... 
doi:10.1108/ijcs-03-2019-0011 fatcat:5waemn4e3zfu5b4n55f6qxafbu

Domino: Discovering Systematic Errors with Cross-Modal Embeddings [article]

Sabri Eyuboglu, Maya Varma, Khaled Saab, Jean-Benoit Delbrouck, Christopher Lee-Messer, Jared Dunnmon, James Zou, Christopher Ré
2022 arXiv   pre-print
Then, motivated by the recent development of powerful cross-modal representation learning approaches, we present Domino, an SDM that leverages cross-modal embeddings and a novel error-aware mixture model  ...  Aortic Valve Malformation Classification (Noisy Label Slice): Weak supervision is commonly used in medical machine learning practice to label clinical datasets.  ...  We begin with a base dataset D_base that has either a hierarchical label structure (e.g., ImageNet) or rich metadata accompanying each example (e.g., CelebA).  ... 
arXiv:2203.14960v3 fatcat:zci7on55mrft5i4lsvhoealira

TAN-NTM: Topic Attention Networks for Neural Topic Modeling [article]

Madhur Panwar, Shashank Shailabh, Milan Aggarwal, Balaji Krishnamurthy
2021 arXiv   pre-print
To this end, we develop a framework TAN-NTM, which processes document as a sequence of tokens through a LSTM whose contextual outputs are attended in a topic-aware manner.  ...  Further, we show that our method learns better latent document-topic features compared to existing topic models through improvement on two downstream tasks: document classification and topic guided keyphrase  ...  Card et al. (2018) leverages document metadata but without metadata their method is same as ProdLDA which is our baseline.  ... 
arXiv:2012.01524v2 fatcat:qzupfp6zzvcejdhqhswx7whniu

Analyzing Sentiments in One Go: A Supervised Joint Topic Modeling Approach

Zhen Hai, Gao Cong, Kuiyu Chang, Peng Cheng, Chunyan Miao
2017 IEEE Transactions on Knowledge and Data Engineering  
We propose a novel probabilistic supervised joint aspect and sentiment model (SJASM) to deal with the problems in one go under a unified framework.  ...  SJASM represents each review document in the form of opinion pairs, and can simultaneously model aspect terms and corresponding opinion words of the review for hidden aspect and sentiment detection.  ...  supported in part by a grant awarded by a Singapore MOE AcRF Tier 2 Grant (ARC30/12), a Singapore MOE AcRF Tier 1 Grant (RG 66/12), and a National Research Foundation, Prime Ministers Office, Singapore under  ... 
doi:10.1109/tkde.2017.2669027 fatcat:24uviz42ffhutc5ayjuvsudo4m

A Survey of Techniques for Event Detection in Twitter

Farzindar Atefeh, Wael Khreich
2013 Computational intelligence  
For instance, Twitter has changed the way people and businesses perform, seek advice, and create "ambient awareness" (a sort of virtual omnipresence) and reinforced the weak and strong tie of friendship  ...  Nevertheless, they are also categorized according to the detection methods that involve supervised, unsupervised, and hybrid approaches.  ... 
doi:10.1111/coin.12017 fatcat:wr3wcvxmavbarityeu2szfcuw4
Showing results 1 to 15 out of 1,414 results