5,936 Hits in 7.6 sec

Integrated multi-strategic Web document pre-processing for sentence and word boundary detection

Junhyeok Shim, Dongseok Kim, Jeongwon Cha, Gary Geunbae Lee, Jungyun Seo
2002 Information Processing & Management  
This paper introduces a multi-strategic integrated text preprocessing method for the difficult problems of sentence boundary disambiguation and word boundary disambiguation in Web texts.  ...  accuracy of word spacing correction, and 94.61% accuracy for whole intermixed text preprocessing problems, from Korean news script Web documents.  ...  Inter-related errors of sentence boundary and word spacing. In order to show that our integrated multi-strategic approach can handle complex inter-related errors between sentence boundary segmentation and  ... 
doi:10.1016/s0306-4573(01)00044-9 fatcat:ufbm3dvblrcgllfoumnxc5hayi
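
The snippet centers on sentence boundary disambiguation: deciding whether a period, question mark, or exclamation mark actually ends a sentence. The sketch below only illustrates that decision with a few rules; it is not the authors' multi-strategic method, which targets Korean Web text and also corrects word spacing. The abbreviation list and the heuristics (abbreviations, single-letter initials, lowercase continuation) are assumptions chosen for illustration.

```python
import re

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs", "fig"}

# A candidate boundary: terminal punctuation, optional closing quotes/brackets,
# then at least one whitespace character.
CANDIDATE = re.compile(r"[.!?][\"')\]]*\s+")


def is_boundary(text: str, start: int, match: re.Match) -> bool:
    """Decide whether a candidate punctuation match really ends a sentence."""
    tokens = text[start:match.start()].split()
    last = tokens[-1] if tokens else ""
    nxt = text[match.end():match.end() + 1]
    if nxt and nxt.islower():
        return False  # "a.m. today" continues the same sentence
    if text[match.start()] == ".":
        if last.lower() in ABBREVIATIONS:
            return False  # "Dr. Smith"
        if len(last) == 1 and last.isupper():
            return False  # initials such as "J. Doe"
    return True


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using the rule-based boundary test above."""
    sentences, start = [], 0
    for match in CANDIDATE.finditer(text):
        if is_boundary(text, start, match):
            sentences.append(text[start:match.end()].strip())
            start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For example, `split_sentences("Dr. Smith arrived at 9 a.m. today. He left early! Did J. Doe stay?")` keeps "Dr.", "a.m.", and "J." inside their sentences and yields three sentences. Rules like these break down on noisy Web text, which is the motivation for combining several strategies.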

Distantly supervised Web relation extraction for knowledge base population

Isabelle Augenstein, Diana Maynard, Fabio Ciravegna, Krzysztof Janowicz, Stefan Schlobach
2016 Semantic Web Journal  
to deal with noise, and integrate information extracted from different Web pages.  ...  Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains and extracting relations across sentence boundaries using unsupervised coreference resolution  ...  Acknowledgements We thank the EKAW and SWJ reviewers for their valuable feedback.  ... 
doi:10.3233/sw-150180 fatcat:3b4eg5nr25aczife7seenjgrvi

Operational Engines [chapter]

Douglas W. Oard, Carl Madson, Joseph Olive, John McCary, Caitlin Christianson
2011 Handbook of Natural Language Processing and Machine Translation  
As an example of that process, Amit Srivastava and Daniel Kiecza (Section 6.2.3) describe the development of a new technique for establishing sentence boundaries during speech recognition that substantially  ...  As Daniel Kiecza and his colleagues (Section 6.2.2) explain, meeting the need for responsive and accurate performance by integrating systems that each have a different development heritage can be a substantial  ...  We would also like to thank Charles Wayne for his early contributions to the pre-GALE efforts.  ... 
doi:10.1007/978-1-4419-7713-7_6 fatcat:lmrbdpy5sjbkxokmxv4a7vbf74

Gather customer concerns from online product reviews – A text summarization approach

Jiaming Zhan, Han Tong Loh, Ying Liu
2009 Expert Systems with Applications  
Different from the existing summarization approaches centered on sentence ranking and clustering, our approach discovers and extracts salient topics from a set of online reviews and further ranks these  ...  Existing methods of opinion mining in processing customer reviews concentrate on counting positive and negative comments of review writers, which is not enough to cover all important topics and concerns  ...  Pre-processing steps, including stop-word removal and word stemming (Porter, 1980), are first applied to the review documents in order to reduce noisy information in the subsequent processes.  ... 
doi:10.1016/j.eswa.2007.12.039 fatcat:suy2t7pw5nacbnctzalvrencnu
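
The pre-processing step described above (stop-word removal followed by stemming) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the stop-word list is a tiny illustrative one, and `crude_stem` is a deliberately simplified suffix stripper, not the Porter (1980) algorithm the authors cite. A real pipeline would more likely use a full stop-word list and an existing Porter implementation such as NLTK's `PorterStemmer`.

```python
import re

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "of",
              "to", "in", "it", "this", "for", "with"}

# Crude suffix rules tried in order. This is NOT the Porter (1980) stemmer.
SUFFIX_RULES = (("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    if word.endswith("ss"):
        return word  # keep words like "glass" or "process" intact
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def preprocess(review: str) -> list[str]:
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```

With these assumptions, `preprocess("The batteries are charging and the screens worked.")` reduces the review to the stems `battery`, `charg`, `screen`, and `work`, which is the kind of normalized vocabulary later topic extraction operates on.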

Development of a Machine Learning Model for Knowledge Acquisition, Relationship Extraction and Discovery in Domain Ontology Engineering using Jaccord Relationship Extraction and Neural Network

2019 International Journal of Recent Technology and Engineering  
A Jaccord Relationship extraction process and the Neural Network Approval for Automated Theory is used for retrieval of data, automated indexing, mapping and knowledge discovery and rule generation.  ...  Updating and validation is impossible without the intervention of domain experts, which is an expensive and tedious process. Thereby, an automatic system to model the ontology has become essential.  ...  The overall process of ontology is illustrated in Figure 1. There are four key phases in the process. The first phase of this process is pre-processing the input web source documents.  ... 
doi:10.35940/ijrte.c6362.098319 fatcat:whgnce4aa5aofalvusaypt6pje
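
The snippet does not define "Jaccord Relationship extraction". Assuming it refers to the Jaccard coefficient, J(A, B) = |A ∩ B| / |A ∪ B|, a minimal sketch of how that measure could propose candidate relations between ontology terms is shown below. The co-occurrence representation (the set of document IDs each term appears in) and the 0.5 threshold are hypothetical choices, not details from the paper.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_relations(term_docs: dict[str, set[int]], threshold: float = 0.5):
    """Return (term1, term2, score) for term pairs whose document sets overlap enough.

    term_docs maps each term to the IDs of documents it occurs in (hypothetical input).
    """
    pairs = []
    for t1, t2 in combinations(sorted(term_docs), 2):
        score = jaccard(term_docs[t1], term_docs[t2])
        if score >= threshold:
            pairs.append((t1, t2, round(score, 3)))
    return pairs
```

For instance, if "ontology" occurs in documents {1, 2, 3, 4} and "concept" in {2, 3, 4}, their coefficient is 3/4 = 0.75, so the pair would be proposed as related. In a real pipeline such pairs would still need validation, which is the expert bottleneck the snippet mentions.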

Joint Information Extraction from the Web Using Linked Data [chapter]

Isabelle Augenstein
2014 Lecture Notes in Computer Science  
Most of the missing information is available on Web pages. To access that knowledge and populate knowledge bases, information extraction methods are necessitated.  ...  Almost all of the big name Web companies are currently engaged in building 'knowledge graphs' and these are showing significant results in improving search, email, calendaring, etc.  ...  Acknowledgements We thank Fabio Ciravegna and Diana Maynard for helping to develop this research plan, Ruben Verborgh and Tom De Nies for their writing tips, as well as the anonymous reviewers for their  ... 
doi:10.1007/978-3-319-11915-1_32 fatcat:4oona332szck3i3gyr46nml4ca

A Method for Identifying Geospatial Data Sharing Websites by Combining Multi-Source Semantic Information and Machine Learning

Quanying Cheng, Yunqiang Zhu, Hongyun Zeng, Jia Song, Shu Wang, Jinqu Zhang, Lang Qian, Yanmin Qi
2021 Applied Sciences  
For this reason, this paper proposes a method to precisely identify GDSW by combining multi-source semantic information and machine learning.  ...  Geospatial data sharing is an inevitable requirement for scientific and technological innovation and economic and social development decisions in the era of big data.  ...  and de-duplication; (4) text data pre-processing: the HTML document of each website was downloaded and processed, and special symbols, word segmentations, stop words, etc. were removed; (5) website feature  ... 
doi:10.3390/app11188705 fatcat:xcibfzfe5jd3hlksc5chrdesyu
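
Step (4) above turns each downloaded HTML document into clean text. A minimal sketch using only the Python standard library is shown below: it collects visible text with `html.parser.HTMLParser` (skipping `<script>` and `<style>` contents), replaces special symbols with spaces, and drops stop words. The stop-word list is an assumption, and splitting on whitespace stands in for word segmentation; Chinese-language pages, for example, would need a dedicated segmenter instead.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


# Illustrative stop words (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "for", "is"}


def html_to_tokens(html: str) -> list[str]:
    """Extract visible text, strip special symbols, and remove stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    text = " ".join(parser.parts).lower()
    # Replace every character that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

Because `HTMLParser` converts character references by default, an entity such as `&amp;` becomes `&` and is then removed as a special symbol along with other punctuation.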

Semantic Representation and Inference for NLP [article]

Dongsheng Wang
2021 arXiv   pre-print
Semantic representation and inference is essential for Natural Language Processing (NLP).  ...  The state of the art for semantic representation and inference is deep learning, and particularly Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and transformer Self-Attention  ...  Acknowledgments This research is partially supported by QUARTZ (721321, EU H2020 MSCA-ITN) and DABAI (5153-00004A, Innovation Fund Denmark).  ... 
arXiv:2106.08117v1 fatcat:qi3546wlhfd2xhqj3f776wa6km

MultiLingMine 2016: Modeling, Learning and Mining for Cross/Multilinguality [chapter]

Dino Ienco, Mathieu Roche, Salvatore Romeo, Paolo Rosso, Andrea Tagarelli
2016 Lecture Notes in Computer Science  
This framework is highly modular and can be customized to create applications based on Multilingual Natural Language Processing for classifying domain-dependent contents.  ...  /multi-lingual information retrieval: Ahmet Aker, Univ.  ...  Zagorka Brodić, professor of French and Serbo-Croatian languages, for the helpful discussions about Latin and Italian languages.  ... 
doi:10.1007/978-3-319-30671-1_83 fatcat:znq74oljzfefrfhzdkpphzekz4

Eliciting Data Relations of IOT based on Creative Computing

Lin Zou, Qinyun Liu, Sicong Ma, Fengbao Ma
2019 International Journal of Performability Engineering  
to make predictions based on data relations; and more importantly, it can save costs for organisations in addition to improving the effectiveness and efficiency of businesses in a creative way.  ...  Internet of things aims to create valuable results by responding to changing environments intelligently and creatively.  ...  There will be several pre-set parameter settings to cover time integrity and spatial integrity.  ... 
doi:10.23940/ijpe.19.02.p20.559570 fatcat:67clht3corhstogbtg4yv23xdm

Leveraging User Feedback for Automated Web Page Inline Linking

Adam Oest, Manjeet Rege
2014 International Journal of Multimedia and Image Processing  
documents (i.e. web pages).  ...  The goal of the system is threefold: to increase user interaction with the site being browsed, to discover relevant keywords for each document, and to effectively cluster the documents into semantically significant  ...  We only keep multi-word phrases if Algorithm 2 returns true for a majority of the words the phrase comprises, and if it returns true for both the first and the last words.  ... 
doi:10.20533/ijmip.2042.4647.2014.0024 fatcat:rnc45pnc7vaztj5a2ipfpjlokm
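
The phrase-filtering rule quoted above can be expressed directly, given any per-word predicate. The snippet does not show Algorithm 2, so `placeholder_predicate` below is a hypothetical stand-in (the word is not a stop word and has at least three characters); only the combining logic (a strict majority of words plus the first and last words) follows the quoted rule.

```python
from typing import Callable

# Illustrative stop words for the placeholder predicate (an assumption).
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "for", "to"}


def placeholder_predicate(word: str) -> bool:
    """Hypothetical stand-in for the paper's Algorithm 2, which is not shown."""
    return word.lower() not in STOP_WORDS and len(word) >= 3


def keep_phrase(words: list[str],
                predicate: Callable[[str], bool] = placeholder_predicate) -> bool:
    """Keep a multi-word phrase if the predicate holds for a strict majority
    of its words and for both its first and last words."""
    if len(words) < 2:
        raise ValueError("the rule applies to multi-word phrases only")
    votes = sum(1 for w in words if predicate(w))
    majority = votes > len(words) / 2
    return majority and predicate(words[0]) and predicate(words[-1])
```

Under this placeholder, "bank of america" is kept (two of three words pass, including both ends), while "the semantic web" is rejected because its first word fails. Requiring the endpoints to pass trims phrases that begin or end with function words.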

CalibreNet: Calibration Networks for Multilingual Sequence Labeling [article]

Shining Liang, Linjun Shou, Jian Pei, Ming Gong, Wanli Zuo, Daxin Jiang
2020 arXiv   pre-print
To tackle the challenge of lack of training data in low-resource languages, we dedicatedly develop a novel unsupervised phrase boundary recovery pre-training task to enhance the multilingual boundary detection  ...  In the second step, CalibreNet refines the boundary of the initial answer.  ...  Instead, we synthesize initial answers and pre-train the model for entity boundary detection.  ... 
arXiv:2011.05723v1 fatcat:a2fcxczxyvg67avdf724icbbve

Combining literature text mining with microarray data: advances for system biology modeling

A. Faro, D. Giordano, C. Spampinato
2011 Briefings in Bioinformatics  
Thus a current challenge of bioinformatics is to develop targeted methods and tools that integrate scientific literature, biological databases and experimental data for reducing the time of database curation  ...  , chemical, genomic (including microarray datasets), clinical and other types of data repositories are now available on the Web.  ...  The workflow of these approaches is: first, the text is tokenized to identify the boundaries of the words and sentences, then a part-of-speech tagging (e.g.  ... 
doi:10.1093/bib/bbr018 pmid:21677032 fatcat:lvtg7ea74ngb5h6i2ywss6cjru
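
The first stage of the workflow described above (tokenizing to find word and sentence boundaries before part-of-speech tagging) can be sketched as a tokenizer that records character offsets, so later stages can map tags back to the original text. The regular expressions are simplifying assumptions: a sentence ends at `.`, `!`, or `?` followed by whitespace, and hyphenated terms such as gene names stay as single tokens. POS tagging itself is not shown.

```python
import re

# Assumed sentence rule: whitespace that follows terminal punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Words (keeping hyphenated terms like "IL-2" together) or single symbols.
WORD = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def _words(text: str, begin: int, end: int) -> list[tuple[str, int, int]]:
    """Tokens inside text[begin:end], with offsets into the full text."""
    return [(m.group(), m.start(), m.end()) for m in WORD.finditer(text, begin, end)]


def tokenize_with_offsets(text: str) -> list[list[tuple[str, int, int]]]:
    """Return sentences as lists of (token, start, end) character spans."""
    sentences, start = [], 0
    for boundary in SENTENCE_END.finditer(text):
        sentences.append(_words(text, start, boundary.start()))
        start = boundary.end()
    if start < len(text):
        sentences.append(_words(text, start, len(text)))
    return sentences
```

For `"IL-2 activates T cells. Expression rose."` this yields two sentences, the first beginning with the token `("IL-2", 0, 4)`. Keeping offsets rather than bare strings lets a downstream tagger or entity recognizer report spans in the source abstract.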

An Extended Cognitive Situation Model for Capturing Subjective Dynamics of Events from Social Media

Yujie Wang, Damminda Alahakoon, Daswin De Silva
2018 Australasian Journal of Information Systems  
These models explain how fragmentary information about events is collected, integrated and updated into a coherent set of views of what the text is about.  ...  This study investigates the mechanisms for constructing and updating the situation models with continuous textual information streamed from heterogeneous forms of media.  ...  Thus researchers conduct statistical analysis of the text structure and classify documents into event-centric topical categories labelled as lists of representative words with timestamps (James Allan,  ... 
doi:10.3127/ajis.v22i0.1701 fatcat:2ipuoayctjcwfg67llnbr4g6cm

A Review on Text-Based Emotion Detection – Techniques, Applications, Datasets, and Future Directions [article]

Sheetal Kusal, Shruti Patil, Jyoti Choudrie, Ketan Kotecha, Deepali Vora, Ilias Pappas
2022 arXiv   pre-print
Artificial Intelligence (AI) has been used for processing data to make decisions, interact with humans, and understand their feelings and emotions.  ...  The field of text-based emotion detection (TBED) is advancing to provide automated solutions to various applications, such as business and finance, to name a few.  ...  Selection Criteria. Authors predominantly employed IEEE, Science Direct, Scopus, and Web of Science databases for searching the documents related to emotion detection.  ... 
arXiv:2205.03235v1 fatcat:b3m25fg6xfc3leeym22eqysq5a
Showing results 1-15 out of 5,936 results