Text segmentation: A topic modeling perspective

Hemant Misra, François Yvon, Olivier Cappé, Joemon Jose
2011 Information Processing & Management  
In this paper, the task of text segmentation is approached from a topic modeling perspective.  ...  We investigate the use of two unsupervised topic models, latent Dirichlet allocation (LDA) and multinomial mixture (MM), to segment a text into semantically coherent parts.  ...  The main objective of this study is to investigate whether text segmentation can be achieved from a topic modeling perspective.  ... 
doi:10.1016/j.ipm.2010.11.008 fatcat:tj7swmzke5cile3nervaj6ef2i

Text Classification of Technical Papers Based on Text Segmentation [chapter]

Thien Hai Nguyen, Kiyoaki Shirai
2013 Lecture Notes in Computer Science  
The goal of this research is to design a multi-label classification model which determines the research topics of a given technical paper.  ...  Furthermore, we proposed a new model for text classification based on the structure of papers, called Back-off model, which achieves 60.45% Exact Match Ratio and 68.75% F-measure.  ...  Identification of nucleus and adjuncts is as a kind of text segmentation, but our text segmentation is fit for technical papers.  ... 
doi:10.1007/978-3-642-38824-8_25 fatcat:wko3nanglzbdra4e5sfv2hx7ze

Attention-based Neural Text Segmentation [article]

Pinkesh Badjatiya, Litton J Kurisinkel, Manish Gupta, Vasudeva Varma
2018 arXiv   pre-print
To the best of our knowledge, this paper is the first one to present a novel supervised neural approach for text segmentation.  ...  Compared to the existing competitive baselines, the proposed model shows a performance improvement of ~7% in WinDiff score on three benchmark datasets.  ...  Conclusions In this paper, we studied the problem of text segmentation from a neural perspective.  ... 
arXiv:1808.09935v1 fatcat:56jdw2kxwbhwxc5g454j75x2ju

Segmenting corpora of texts

Tony Berber Sardinha
2002 DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada  
For the purposes of this investigation, a segment is defined as a contiguous portion of written text consisting of at least two sentences.  ...  The aim of the research presented here is to report on a corpus-based method for discourse analysis that is based on the notion of segmentation, or the division of texts into cohesive portions.  ...  Segmentation is also a fundamental aspect underlying models of discourse. The research reported in this paper is aimed at developing a computer-assisted procedure for segmenting texts.  ... 
doi:10.1590/s0102-44502002000200004 fatcat:uest24gcqrejrhn3ba3ihbgvnu

Text Segmentation with Topic Models

Martin Riedl, Chris Biemann
2012 Journal for Language Technology and Computational Linguistics  
This article presents a general method to use information retrieved from the Latent Dirichlet Allocation (LDA) topic model for Text Segmentation: Using topic assignments instead of words in two well-known  ...  A further contribution to improve the segmentation accuracy is obtained through stabilizing topic assignments by using information from all LDA inference iterations.  ...  Text Segmentation Algorithms using Topic Models C99 using Topic Models The topic-based version of the C99 algorithm (Choi, 2000), called C99LDA, divides the input text into minimal units on sentence  ... 
dblp:journals/ldvf/RiedlB12 fatcat:2ayhl5uokje7vberzo2ksberdq

An Iterative Approach to Text Segmentation [chapter]

Fei Song, William M. Darling, Adnan Duric, Fred W. Kroon
2011 Lecture Notes in Computer Science  
To search for the weakest point, we apply two different measures: one is based on language modeling of text segmentation and the other, on the interconnectivity between two segments.  ...  We present divSeg, a novel method for text segmentation that iteratively splits a portion of text at its weakest point in terms of the connectivity strength between two adjacent parts.  ...  We would also like to thank the editor Juliette Zhang for labeling the topic and subtopic structures for the Discover magazine dataset.  ... 
doi:10.1007/978-3-642-20161-5_63 fatcat:7vhh5krllvc3xd34ui3rkdb4ya

Text Segmentation Techniques: A Critical Review [chapter]

Irina Pak, Phoey Lee Teh
2017 Studies in Computational Intelligence  
Text segmentation is widely used for processing text. It is a method of splitting a document into smaller parts, which is usually called segments. Each segment has its relevant meaning.  ...  Those segments categorized as word, sentence, topic, phrase or any information unit depending on the task of the text analysis.  ...  [4] has used the topic as segment too; they proposed a novel method that includes hierarchical organization and language modeling to split the text into parts.  ... 
doi:10.1007/978-3-319-66984-7_10 fatcat:xcxqs5rcxzd3rlhr2f7phtr5pa

Tracking the Evolution of Social Emotions: A Time-Aware Topic Modeling Perspective

Chen Zhu, Hengshu Zhu, Yong Ge, Enhong Chen, Qi Liu
2014 2014 IEEE International Conference on Data Mining  
A critical challenge is how to model emotions with respect to time spans. To this end, we propose a time-aware topic modeling perspective for solving this problem.  ...  Specifically, we first develop a model named emotion-Topic over Time (eToT), in which we represent the topics of news as a Beta distribution over time and a multinomial distribution over emotions.  ...  [22] modeled texts through a mixture of topic model and sentiment model.  ... 
doi:10.1109/icdm.2014.121 dblp:conf/icdm/ZhuZGCL14 fatcat:ot45j6si3jg65pmcjo6ouoitri

Unsupervised Text Segmentation via Deep Sentence Encoders: a first step towards a common framework for text-based segmentation, summarization and indexing of media content

Iacopo Ghinassi
2021 Zenodo  
In this paper we present a new algorithm for text segmentation based on deep sentence encoders and the TextTiling algorithm.  ...  We will describe how text segmentation is an essential first step in the re-purposing of media content like TV newscasts and how the proposed methodology can add value to other subsequent tasks involving  ...  Topic modelling is a common task in natural language processing and (in the LDA perspective) it has the goal to represent documents as a mixture of a predefined number of topics.  ... 
doi:10.5281/zenodo.4744398 fatcat:ropxpxkcynaqdnv5jswuv6j3ji

Discourse Segmentation of German Texts

Uladzimir Sidarenka, Andreas Peldszus, Manfred Stede
2015 Journal for Language Technology and Computational Linguistics  
This paper addresses the problem of segmenting German texts into minimal discourse units, as they are needed, for example, in RST-based discourse parsing.  ...  Finally, we compare our approaches with the recent discourse segmentation methods proposed for English.  ...  Again, this of course depends on the purpose: A topic-based segmentation of a text, e.g. in the 'text tiling' tradition (Hearst, 1997), is flat in the vast majority of approaches.  ... 
dblp:journals/ldvf/SidarenkaPS15 fatcat:555oe4e3zraqljsf2bahb3f6by

Natural Scene Text Understanding [chapter]

Celine Mancas, Bernard Gosseli
2007 Vision Systems: Segmentation and Pattern Recognition  
fonts, italic characters or with perspective (in a reasonable degree).  ...  Line segmentation Segmentation into lines is an old topic and the two main and successful methods are either the vertical projection profile or the Hough transform [53].  ...  -APPENDIX A - Color Spaces Conversion This appendix details conversions and visualisation of color spaces described in Chapter 2 for the D65 white point, the 2° observer and the sRGB working space  ... 
doi:10.5772/4966 fatcat:2vx67sdtrzfpfousqoagdukutm

Segmentation of Greek Texts by Dynamic Programming [chapter]

Pavlina Fragkou, Athanassios Kehagias, Vassilios Petridis
2008 Tools in Artificial Intelligence  
This is a popular approach, according to which parts of a text having similar vocabulary are likely to belong to a coherent topic segment.  ...  Advances to topic segmentation (closely related to text segmentation) include methods performing topic segmentation based on weighted lexical chains (Sitbon & Bellot, 2005), as well as a new informative  ...  The Tokenizer takes as input raw text and converts it into a stream of tokens.  ... 
doi:10.5772/6089 fatcat:ggidrf63m5hdvisjci2n3sny3m

Artificial Intelligence Technologies in Neurosurgery: a Systematic Literature Review Using Topic Modeling. Part II: Research Objectives and Perspectives

G.V. Danilov, M.A. Shifrin, K.V. Kotik, T.A. Ishankulov, Yu.N. Orlov, A.S. Kulikov, A.A. Potapov
2020 Sovremennye tehnologii v medicine  
The aim of the study was to conduct a systematic literature review to highlight the main directions and trends in the use of AI in neurosurgery.  ...  The current increase in the number of publications on the use of artificial intelligence (AI) technologies in neurosurgery indicates a new trend in clinical neuroscience.  ...  Such technologies (topic modeling) are used in this review.  ... 
doi:10.17691/stm2020.12.6.12 pmid:34796024 pmcid:PMC8596229 fatcat:3yswy55qjzh77kwaimtrh2n36a

Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts [article]

Pavlina Fragkou
2016 arXiv   pre-print
The aim here is to examine whether the combination of text segmentation and information extraction can be beneficial for the identification of the various topics that appear in a document.  ...  In this paper we examine the benefit of performing named entity recognition (NER) and co-reference resolution to an English and a Greek corpus used for text segmentation.  ...  Yu et al. ([YF12]) propose a different approach in which, each segment unit is represented by a distribution of the topics, instead of a set of word tokens thus, a text input is modeled as a sequence  ... 
arXiv:1610.09226v1 fatcat:brsxplcpgzdpbmvr3jpen34u7q

Segmentation of Argumentative Texts with Contextualised Word Representations

Georgios Petasis
2019 Proceedings of the 6th Workshop on Argument Mining  
The segmentation of argumentative units is an important subtask of argument mining, which is frequently addressed at a coarse granularity, usually assuming argumentative units to be no smaller than sentences  ...  Evaluation results suggest the examined models and approaches can exhibit comparable performance, minimising the need for feature engineering.  ...  The segmentation of text into argumentative units is typically the first sub-task encountered in such an argument mining pipeline, aiming to segment texts into argumentative and non-argumentative text  ... 
doi:10.18653/v1/w19-4501 dblp:conf/argmining/Petasis19 fatcat:zdjz6f6w4nbfzcicynk5lv4xve