Large Scale Unstructured Document Classification Using Unlabeled Data and Syntactic Information [chapter]

Seong-Bae Park, Byoung-Tak Zhang
2003 Lecture Notes in Computer Science  
In this paper, we present an approach for classifying large scale unstructured documents by incorporating both lexical and syntactic information of documents.  ...  Since both lexical and syntactic information can play roles of separated views for the unstructured documents, the co-training algorithm enhances the performance of document classification using both of  ...  This research was supported by the Korean Ministry of Education under the BK21-IT program and by the Korean Ministry of Science and Technology under BrainTech program.  ... 
doi:10.1007/3-540-36175-8_9 fatcat:2gs4bfit7vbklbx3zbyueocuoi

Co-trained support vector machines for large scale unstructured document classification using unlabeled data and syntactic information

Seong-Bae Park, Byoung-Tak Zhang
2004 Information Processing & Management  
In this paper, we present an approach for classifying large scale unstructured documents by incorporating both the lexical and the syntactic information of documents.  ...  Since both the lexical and the syntactic information can play roles of separated views for the unstructured documents, the co-training algorithm enhances the performance of document classification using  ...  Acknowledgements This research was supported by the Korean Ministry of Education under the BK21-IT Program and by the Korean Ministry of Science and Technology under NRL and BrainTech program.  ... 
doi:10.1016/j.ipm.2003.09.003 fatcat:uydcfmf7jneljntkp3st73slpe

Rapid-Rate: A Framework for Semi-supervised Real-time Sentiment Trend Detection in Unstructured Big Data [article]

Vineet John
2017 arXiv   pre-print
Some sources, however, like social media (Twitter, Facebook), mailing lists (Google Groups) and forums (Quora) contain text data that is much more voluminous, but unstructured and unlabelled.  ...  This research project also aims to design and implement a re-usable document regression pipeline as a framework, Rapid-Rate, that can be used to predict document scores in real-time.  ...  Unstructured Big Data Unstructured data is defined as data, large amounts of it, in this case, which do not abide by a strict schema [14].  ... 
arXiv:1703.08088v1 fatcat:hfvlsyw6azhmlheaq4bc4qyajy

Incremental Learning for Classification of Unstructured Data Using Extreme Learning Machine

Sathya Madhusudhanan, Suresh Jaganathan, Jayashree L S
2018 Algorithms  
In this paper, we propose a framework CUIL (Classification of Unstructured data using Incremental Learning) which clusters the metadata, assigns a label for each cluster and then creates a model using  ...  Unstructured data are irregular information with no predefined data model.  ...  of large-scale unstructured documents.  ... 
doi:10.3390/a11100158 fatcat:jmmgotm53ff37pggupn4hqwiva

Text Complexity Classification Data Mining Model Based on Dynamic Quantitative Relationship between Modality and English Context

Dan Zhang, Gengxin Sun
2021 Mathematical Problems in Engineering  
With the rapid development of mobile internet technology, there are a large number of unstructured data in dynamic data, such as text data, multimedia data, etc., so it is essential to analyze and process these unstructured data to obtain potentially valuable information.  ... 
doi:10.1155/2021/4805537 fatcat:zfnleith7vcwjbkokxkcfprhaq

LeSSA: A Unified Framework based on Lexicons and Semi-Supervised Learning Approaches for Textual Sentiment Classification

Jawad Khan, Young-Koo Lee
2019 Applied Sciences  
How to effectively utilize the concealed significant information in the unstructured data? How to learn the model while considering the most effective sentiment features?  ...  (b) training classification models based on a high-quality training dataset generated by using k-mean clustering, active learning, self-learning, and co-training algorithms.  ...  concealed information in the unstructured data.  ... 
doi:10.3390/app9245562 fatcat:adzlvshbmbfklew457auwrh7ue

Information Extraction from Multifaceted Unstructured Big Data

2019 International journal of recent technology and engineering  
Information extraction can play a vital role in transformation of unstructured data into useful information.  ...  These applications are facilitating the industries in context of data-driven decision making, big data storage, and complex analysis of large data sets.  ...  Supervised and unsupervised approaches used a large amount of training data to achieve high performance but semi-supervised uses both labeled and unlabeled corpus with a small degree of supervision [5  ... 
doi:10.35940/ijrte.b1074.0882s819 fatcat:iwpaamsftrgztduhgfkbmgo3du

An analytical study of information extraction from unstructured and multidimensional big data

Kiran Adnan, Rehan Akbar
2019 Journal of Big Data  
The extracted information from unstructured data is used to prepare data for analysis.  ...  Introduction Information extraction (IE) process extracts useful structured information from the unstructured data in the form of entities, relations, objects, events and many other types.  ...  ") AND ("big data" OR "large-scale data" OR "large data" OR "volume") AND ("unstructured data" OR "nonstructured data" OR "nonrelational data" OR "free text" OR "image" OR "audio" OR "video")).  ... 
doi:10.1186/s40537-019-0254-8 fatcat:qy5l55um7feeblec4hxohr3pqa

Learning for Biomedical Information Extraction: Methodological Review of Recent Advances [article]

Feifan Liu, Jinying Chen, Abhyuday Jagannatha, Hong Yu
2016 arXiv   pre-print
In addition, we dive into open information extraction and deep learning, two emerging and influential techniques and envision next generation of BioIE.  ...  Biomedical information extraction (BioIE) is important to many applications, including clinical decision support, integrative biology, and pharmacovigilance, and therefore it has been an active research  ...  OpenIE techniques have been drawing more and more attention to enhance and scale BioIE systems by utilizing large, complex and heterogeneous data (different genres of textual data, structured vs. unstructured  ... 
arXiv:1606.07993v1 fatcat:7d5om7zxxzhoviiriasrfwg3xi

Learning Task Specific Distributed Paragraph Representations Using a 2-Tier Convolutional Neural Network [chapter]

Tao Chen, Ruifeng Xu, Yulan He, Xuan Wang
2015 Lecture Notes in Computer Science  
Specifically, we learn distributed word representations by a continuous bag-of-words model from a large unstructured text corpus.  ...  Our proposed model has been evaluated on topic classification based on the DBpedia ontology and sentiment classification of Amazon reviews.  ...  It can capture a large number of syntactic and semantic word relationships from unstructured text data.  ... 
doi:10.1007/978-3-319-26532-2_51 fatcat:j3rbpzbujfd75mhxfpveu6wfla

Concept relation extraction using Naïve Bayes classifier for ontology-based question answering systems

G. Suresh kumar, G. Zayaraz
2015 Journal of King Saud University: Computer and Information Sciences  
Automatic ontology construction is possible by extracting concept relations from unstructured large-scale text.  ...  In this paper, we propose a methodology to extract concept relations from unstructured text using a syntactic and semantic probability-based Naïve Bayes classifier.  ...  The EM algorithm is used to maximize the likelihood with both labeled and unlabeled data.  ... 
doi:10.1016/j.jksuci.2014.03.001 fatcat:na3bz6fdsrdkvgkcqggvlyr6sa

Relation extraction using label propagation based semi-supervised learning

Jinxiu Chen, Donghong Ji, Chew Lim Tan, Zhengyu Niu
2006 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL - ACL '06  
It represents labeled and unlabeled examples and their distances as the nodes and the weights of edges of a graph, and tries to obtain a labeling function to satisfy two constraints: 1) it should be fixed  ...  Shortage of manually labeled data is an obstacle to supervised relation extraction methods.  ...  to scale the weights.  ... 
doi:10.3115/1220175.1220192 dblp:conf/acl/ChenJTN06 fatcat:qvt3o7ezujbttdmjvrrrevhrji

Exploring Semi-supervised Variational Autoencoders for Biomedical Relation Extraction [article]

Yijia Zhang, Zhiyong Lu
2019 arXiv   pre-print
In contrast, there is a large amount of unlabeled biomedical text available in PubMed.  ...  The classifier is implemented using multi-layer convolutional neural networks (CNNs), and the encoder and decoder are implemented using both bidirectional long short-term memory networks (Bi-LSTMs) and  ...  Herrero-Zazo, and T. Declerck for their support with the DDI 2013 corpus. Funding This work was supported by the NIH Intramural Research Program, National Library of Medicine.  ... 
arXiv:1901.06103v1 fatcat:j3fuvh6ebnbpdhun7yt3k5lpqi

Distant Supervision with Transductive Learning for Adverse Drug Reaction Identification from Electronic Medical Records

Siriwon Taewijit, Thanaruk Theeramunkong, Mitsuru Ikeda
2017 Journal of Healthcare Engineering  
Information extraction and knowledge discovery regarding adverse drug reaction (ADR) from large-scale clinical texts are very useful and needy processes.  ...  of relations for unlabeled data.  ...  processing of large-scale unstructured clinical texts.  ... 
doi:10.1155/2017/7575280 pmid:29090077 pmcid:PMC5635478 fatcat:pdroxylgljaoragkwuqkucwq4q

NLP-based platform as a service: a brief review

Sebastião Pais, João Cordeiro, M. Luqman Jamil
2022 Journal of Big Data  
Natural language processing (NLP) is a rapidly developing field of artificial intelligence and data science that deals with speech and text processing technologies.  ...  Many NLP-related software tasks have been successfully solved and integrated that are used on the internet, such as morphological  ...  Acknowledgements This work was supported by National Funding from the FCT Fundação para a Ciência e a Tecnologia, through the MOVES Project-PTDC/EEI-AUT/28918/2017, and by Operação Centro-01-0145-FEDER  ... 
doi:10.1186/s40537-022-00603-5 dblp:journals/jbd/PaisCJ22 fatcat:xcgqvsn6fzcglkfbibtonw5qmu