
Online annotation of text streams with structured entities

Ken Q. Pu, Oktie Hassanzadeh, Richard Drake, Renée J. Miller
2010 Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10  
We propose a framework and algorithm for annotating unbounded text streams with entities of a structured database.  ...  Our algorithm does so with a guarantee of constant time and space complexity for each additional word in the text stream, thus infinite text streams can be annotated.  ...  CONCLUSION AND FUTURE WORK We have presented an online entity annotation algorithm which correlates an unbounded text stream with databases of entities by means of non-deterministic annotation.  ... 
doi:10.1145/1871437.1871446 dblp:conf/cikm/PuHDM10 fatcat:qrbj34xfszaifhs24zwbktscjm

The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams [article]

David Graus, Daan Odijk, Maarten de Rijke
2017 arXiv pre-print
The online text streams we use for our analysis comprise social media and news streams, and span over 579 million documents in a timespan of 18 months.  ...  Specifically, we analyze nearly 80,000 entities as they emerge in online text streams before they are incorporated into Wikipedia.  ...  All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.  ... 
arXiv:1701.04039v2 fatcat:h3bh2awyljb5hmyam64kidjjhi

Processing online news streams for large-scale semantic analysis

Milos Krstajic, Florian Mansmann, Andreas Stoffel, Martin Atkinson, Daniel A. Keim
2010 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)  
In this paper, the semantic analysis of such news streams is discussed by introducing a system that streams online news collected by the Europe Media Monitor to our proposed semantic news analysis system  ...  To demonstrate the use of our system, the case studies show a) temporal analysis of entities, such as institutions or persons, and b) their co-occurrence in news articles.  ...  ACKNOWLEDGEMENTS This work was partially funded by the German Research Society (DFG) under grant GK-1042, "Explorative Analysis and Visualization of Large Information Spaces".  ... 
doi:10.1109/icdew.2010.5452710 dblp:conf/icde/KrstajicMSAK10 fatcat:75zm2bodnrebncgdbhkaifxdw4

Pattern-Based Annotation of HTML-Streams [chapter]

Florian Schmedding, Max Schwaibold, Kai Simon
2009 Lecture Notes in Computer Science  
In this paper, we demonstrate Atheris, a system that annotates structured web pages by means of our web data extraction tool ViPER.  ...  Web pages containing RDFa markup facilitate a broad range of new agents that improve their usability for human readers. Unfortunately, there still exist only few web sites featuring such annotations.  ...  The purpose of these rules is the automatic labeling of named entities, the splitting of strings, and the removal of text from data entities to simplify further rule assignments.  ... 
doi:10.1007/978-3-642-02121-3_77 fatcat:57636w2mmbap7adlbqky74vjyy

Making sense of social media streams through semantics: A survey

Kalina Bontcheva, Dominic Rout
2014 Semantic Web Journal  
Unlike carefully authored news text and longer web context, social media streams pose a number of new challenges, due to their large-scale, short, noisy, context-dependent, and dynamic nature.  ...  This paper defines five key research questions in this new application area, examined through a survey of state-of-the-art approaches for mining semantics from social media streams; user, network, and  ...  The authors wish to thank Marta Sabou and Arno Scharl for the discussions on crowdsourcing and its role in semantic technologies research, as well as Diana Maynard for the discussions on opinion mining of  ... 
doi:10.3233/sw-130110 fatcat:uytdbegs3ngcbpu62i4trrxjni

A Perspective on Text Classification, Clustering, and Named-entity Recognition in Social Media

Kia Jahanbin, Research Center for Social Determinants of Health, Jahrom University of Medical Sciences, Jahrom, Iran, Fereshte Rahmanian, Vahid Rahmanian, Masihollah Shakeri, Heshmatollah Shakeri, Zhila Rahmanian, Abdolreza Sotoodeh Jahromi
Knowledge discovery from text (KDT, or text mining), first introduced by Feldman & Dagan (1995), refers to the process of extracting high-quality information from structured sources, such as RDBMS data  ...  (Akbari et al., 2018; Chen et al., 1996), semi-structured sources, such as XML and JSON, and unstructured resources, such as Word documents, videos, and images (Pouriyeh & Doroodchi, 2009; Pouriyeh, 2010).  ...  And producing an annotated block of text that marks the names of entities: [John] Person sold 500 shares of [Apple] Organization in [2018] Time.  ... 
doi:10.21276/ambi.2019.06.1.ga01 fatcat:mvug2ixu5fe3jfshswi42lxofa

Interactive Knowledge Base Population [article]

Travis Wolfe, Mark Dredze, James Mayfield, Paul McNamee, Craig Harman, Tim Finin, Benjamin Van Durme
2015 arXiv pre-print
Most work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible.  ...  We argue for and describe a new paradigm where the focus is on a high-recall extraction over a small collection of documents under the supervision of a human expert, that we call Interactive Knowledge  ...  In the small text collection paradigm, asking a user for a fairly complete annotation of the entities and proposition expressed in text is feasible, and system output can be vetted by an expert.  ... 
arXiv:1506.00301v1 fatcat:jtgxyijndbcvhfz7zr6jdz43j4

ChemEx: information extraction system for chemical data curation

Atima Tharatipyakul, Somrak Numnark, Duangdao Wichadakul, Supawadee Ingsriswang
2012 BMC Bioinformatics  
Text annotator is able to extract compound, organism, and assay entities from text content while structure image recognition enables translation of chemical raster images to machine readable format.  ...  A user can view annotated text along with summarized information of compounds, organism that produces those compounds, and assay tests.  ...  The full contents of the supplement are available online at supplements/13/S17.  ... 
doi:10.1186/1471-2105-13-s17-s9 pmid:23282330 pmcid:PMC3521388 fatcat:quqcssngxvcr3f4w4z4d3mwwkm

Ontology-driven Annotation and Access of Educational Video Data in E-learning [chapter]

Aijuan Dong, Honglin Li, Baoying Wang
2010 E-learning Experiences and Future  
Based on this observation of the presentation structure, we propose a text-based segmentation algorithm, Topic Words Introduction (TWI).  ...  Slide-level segmentation operates on slide video streams, while topic-level segmentation makes use of extracted slide text.  ...  /books/e-learning-experiences-and-future/ontologydriven-annotation-and-access-of-educational-video-data-in-e-learning © 2010 The Author(s).  ... 
doi:10.5772/8798 fatcat:2obk6ytrpreufkzws7tzo7k6fi

TweetsCOV19 – A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic [article]

Dimitar Dimitrov, Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus Zloch, Stefan Dietze
2020 arXiv pre-print
With respect to the recent outbreak of COVID-19, online discourse on Twitter reflects public opinion and perception related to the pandemic itself as well as mitigating measures and their societal impact  ...  However, obtaining, archiving and semantically annotating large amounts of tweets is costly.  ...  Some datasets contain only information filtered from raw twitter stream data, for instance, to extract subsets of relevance to particular events 24 while others include annotations, such as mentioned entities  ... 
arXiv:2006.14492v3 fatcat:63d7pjtk2zeonhwwzcpv3xemxm

Knowledge capture from multiple online sources with the extensible web retrieval toolkit (eWRT)

Albert Weichselbraun, Arno Scharl, Heinz-Peter Lang
2013 Proceedings of the seventh international conference on Knowledge capture - K-CAP '13  
Knowledge capture approaches in the age of massive Web data require robust and scalable mechanisms to acquire, consolidate and pre-process large amounts of heterogeneous data, both unstructured and structured  ...  It includes classes for caching and data management, and provides low-level text processing capabilities including language detection, phonetic string similarity measures, and string normalization.  ...  The research presented in this paper has been conducted as part of the DIVINE Project ( funded by the Austrian Research Promotion Agency, the WISDOM Project (  ... 
doi:10.1145/2479832.2479861 dblp:conf/kcap/WeichselbraunSL13 fatcat:ecrjzz36tzf3rmm4ek6zoubsrm

Hierarchical multi-label classification of social text streams

Zhaochun Ren, Maria-Hendrike Peetz, Shangsong Liang, Willemijn van Dolen, Maarten de Rijke
2014 Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval - SIGIR '14  
We extend each short document in social text streams to a more comprehensive representation via state-of-the-art entity linking and sentence ranking strategies.  ...  In this paper we focus on hierarchical multi-label classification of social text streams.  ...  The tweets were collected and annotated as part of their online reputation management campaign.  ... 
doi:10.1145/2600428.2609595 dblp:conf/sigir/RenPLDR14 fatcat:4h5pyvebgrdu7h4h7rnbnjxfpe

Survey of Semantic Media Annotation Tools for the Web: Towards New Media Applications with Linked Media [chapter]

Lyndon Nixon, Raphaël Troncy
2014 Lecture Notes in Computer Science  
toolsets which can support Linked Media conformant annotation, closing with a call to future semantic media annotation tools and services to follow the same principles and ensure the growth of a Linked  ...  Media layer of semantic descriptions of online media which can be an enabler to richer future online media services.  ...  The annotations are shown with their concept labels, with the addition/editing of annotations taking place in an easy-to-use wizard which allows plain text entry and suggests concepts to the annotator,  ... 
doi:10.1007/978-3-319-11955-7_9 fatcat:a5om5hx5nffotbe2srn4tum5o4

A survey on online tweet segmentation for linguistic features

R.P. Narmadha, G.G. Sreeja
2016 2016 International Conference on Computer Communication and Informatics (ICCCI)  
NLP models have to be constructed in order to learn the tweets with linguistic features. Before feature extraction, tweets are pre-processed with stop-word removal and stemming.  ...  Additionally, a tweet vector cluster is established as potential sub-topic delegates and maintained dynamically in memory during stream processing.  ...  to make the joint predictions where Named Entity Recognition (NER) is used to find the names which are present in the text, and those names can be connected as the entries in structured or semi-structured  ... 
doi:10.1109/iccci.2016.7479955 fatcat:65brivo73ne57nvxchgdvcawwa

Complex networks for event detection in heterogeneous high volume news streams [article]

Iraklis Moutidis, Hywel T.P. Williams
2020 arXiv pre-print
Detecting important events in high volume news streams is an important task for a variety of purposes. The volume and rate of online news increases the need for automated event detection methods that can  ...  Our approach uses natural language processing techniques to detect these entities in a stream of news articles and then creates a time-stamped series of networks in which the detected entities are linked  ...  Acknowledgements The authors acknowledge funding from a commercial entity, Adarga Ltd ( The funder had no input or editorial influence over the manuscript.  ... 
arXiv:2005.13751v1 fatcat:ojln66iduvhxrks6dc5cdz45du