Named Entity Recognition and Classification on Historical Documents: A Survey [article]

Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet
2021 arXiv   pre-print
In this survey, we present the array of challenges posed by historical documents to NER, inventory existing resources, describe the main approaches deployed so far, and identify key priorities for future  ...  Among semantic indexing opportunities, the recognition and classification of named entities are in great demand among humanities scholars.  ...  Maud Ehrmann and Matteo Romanello were supported by the Swiss National Science Foundation under the grants number CR-SII5_173719 (Impresso - Media Monitoring of the Past) and number PZ00P1_186033 (only for  ... 
arXiv:2109.11406v1 fatcat:zbwoybklk5bjrlf2b67qm6t7e4

Deep sparse auto-encoder features learning for Arabic text recognition

Najoua Rahal, Maroua Tounsi, Amir Hussain, Adel M. Alimi
2021 IEEE Access  
We propose a novel hybrid network, combining a Bag-of-Feature (BoF) framework for feature extraction based on a deep Sparse Auto-Encoder (SAE), and Hidden Markov Models (HMMs), for sequence recognition  ...  INDEX TERMS Arabic text recognition, feature learning, bag of features, sparse auto-encoder, hidden Markov models.  ...  Markov modeling has proved to be the most efficient classifier broadly used for Arabic text recognition.  ... 
doi:10.1109/access.2021.3053618 fatcat:p7jhbokjsjbunceuq4lu7xnmci

A Markov model of urban evolution: Neighbourhood change as a complex process

Daniel Silver, Thiago H. Silva, Wenjia Zhang
2021 PLoS ONE  
Extending previous studies, we pursue a hierarchical approach to classifying neighbourhoods that situates many neighbourhood types within the city's broader structure.  ...  Our hierarchical approach is able to incorporate a richer set of types than most past research and allows us to study how neighbourhoods' positions within this hierarchy shape their trajectories of change  ...  Acknowledgments We are grateful for ongoing discussion of these issues examined in this paper with members of the Urban Genome Project, in particular Scott Sanner, Mark Fox, Rob Wright, and Fernando Calderon-Figueroa  ... 
doi:10.1371/journal.pone.0245357 pmid:33449942 fatcat:aebqoaldvbft7ge6iabavzpt4a

Learning a Concept Hierarchy from Multi-labeled Documents

Viet-An Nguyen, Jordan L. Boyd-Graber, Philip Resnik, Jonathan Chang
2014 Neural Information Processing Systems  
In this paper, we present a model, Label to Hierarchy (L2H), that can induce a hierarchy of user-generated labels and the topics associated with those labels from a set of multi-labeled documents.  ...  While topic models can discover patterns of word usage in large corpora, it is difficult to meld this unsupervised structure with noisy, human-provided labels, especially when the label space is large.  ...  Acknowledgments We thank Kristina Miler, Ke Zhai, Leo Claudino, and He He for helpful discussions, and thank the anonymous reviewers for insightful comments.  ... 
dblp:conf/nips/NguyenBRC14 fatcat:ffllbp4jsrgndnlfix2xghxoym

Social Media Analysis based on Semanticity of Streaming and Batch Data [article]

Barathi Ganesh HB
2018 arXiv   pre-print
In this work Conditional Random Field has been utilized to do the entity recognition and a novel approach has been proposed to find the sociolect aspects of the author (Gender, Age group).  ...  Knowledge extraction differs with respect to the application in which the research on cognitive science fed the necessities for the same.  ...  CRF has various advantages over Hidden Markov Model and Maximum Entropy Markov Models. It outperforms both the approaches for various NLP tasks.  ... 
arXiv:1801.01102v2 fatcat:sr5d2epwa5ejhgtnaru6n5kgzy

Large-Vocabulary Continuous Speech Recognition Systems: A Look at Some Recent Advances

George Saon, Jen-Tzung Chien
2012 IEEE Signal Processing Magazine  
Furthermore, moving beyond temporal segmentation based on discrete Markov chains in HMMs, the Markov switching process [118] was developed as a more complicated process that can realize different BNP  ...  ACOUSTIC MODELING HIDDEN MARKOV MODELS Hidden Markov models (HMMs) [37] are a popular formalism for the representation of temporal or spatial sequence data, e.g., speech, image, video, text, music,  ... 
doi:10.1109/msp.2012.2197156 fatcat:sl3fzg2hz5emrpm6srfuc3n3ye

Probabilistic Modelling of Morphologically Rich Languages [article]

Jan A. Botha
2015 arXiv   pre-print
We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words.  ...  This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language.  ...  For historical reasons, this class of models are referred to simply as n-gram language models. 2) The other major approach is to embed words in a low-dimensional, continuous vector space and learn a function  ... 
arXiv:1508.04271v1 fatcat:6qhsfdbvt5emfiaumtwh2pzs7m

Zone-Wise Segmentation and Lexicon-Driven Recognition for Printed Myanmar Characters

Chit San Lwin, Xiangqian Wu
2018 International Journal of Scientific Research in Computer Science Engineering and Information Technology  
A hidden Markov model is used for recognition of primary characters, while a Kohonen self-organizing map is used for peripheral characters.  ...  This paper presents new segmentation and recognition algorithms for Myanmar script input from offline printed images.  ...  HMM Classification for Primary Ligatures HMM is a statistical Markov model structured to search hidden (unobserved) states based on their transition probabilities.  ... 
doi:10.32628/cseit183844 fatcat:4kfs7v76gzehxcy42r4cz2fa4e

Message from the general chair

Benjamin C. Lee
2015 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
We propose a joint learning model which combines pairwise classification and mention clustering with Markov logic.  ...  Joint Learning for Coreference Resolution with Markov Logic Resolving "This-issue" Anaphora Varada Kolhatkar and Graeme Hirst Saturday 12:00pm-12:30pm -202 A (ICC) We annotate and resolve a particular  ...  method based on Markov Chain Monte Carlo (MCMC) sampling.  ... 
doi:10.1109/ispass.2015.7095776 dblp:conf/ispass/Lee15 fatcat:ehbed6nl6barfgs6pzwcvwxria

Text Detection and Recognition in Imagery: A Survey

Qixiang Ye, David Doermann
2015 IEEE Transactions on Pattern Analysis and Machine Intelligence  
The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared.  ...  This review provides a fundamental comparison and analysis of the remaining problems in the field.  ...  Jie Chen, the Associate Editor and the reviewers for their comments and suggestions.  ... 
doi:10.1109/tpami.2014.2366765 pmid:26352454 fatcat:cuz3qhkglnahdebxqptbsgpjmm

Automatic Language Identification in Texts: A Survey

Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén
2019 The Journal of Artificial Intelligence Research  
Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known.  ...  Language identification ("LI") is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years.  ...  We would like to thank Kimmo Koskenniemi for many valuable discussions and comments concerning the early phases of the features and the methods sections.  ... 
doi:10.1613/jair.1.11675 fatcat:axugpuogyne3nptvamgd3zwgty

Scalable Text Mining with Sparse Generative Models [article]

Antti Puurula
2016 arXiv   pre-print
, with an order of magnitude decrease in classification times for Wikipedia article categorization with a million classes.  ...  The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable.  ...  Acknowledgements We'd like to thank Kaggle and the LSHTC organizers for their work in making the competition a success, and the machine learning group at the University of Waikato for the computers we  ... 
arXiv:1602.02332v1 fatcat:2urzib3btveslj5ggie55irxwq

Automatic Language Identification in Texts: A Survey [article]

Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén
2018 arXiv   pre-print
Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known.  ...  For describing the features and methods we introduce a unified notation.  ...  We would like to thank Kimmo Koskenniemi for many valuable discussions and comments concerning the early phases of the features and the methods sections.  ... 
arXiv:1804.08186v2 fatcat:4rmixp4i5fb55itb7ze5avkgqy

Coreference Resolution: Toward End-to-End and Cross-Lingual Systems

André Ferreira Cruz, Gil Rocha, Henrique Lopes Cardoso
2020 Information  
We analyzed existing state-of-the-art models and approaches, and reviewed recent advances and trends in the field, namely end-to-end systems that jointly model different subtasks of coreference resolution  ...  The task of coreference resolution has attracted considerable attention in the literature due to its importance in deep language understanding and its potential as a subtask in a variety of complex natural  ...  The approach proposed by Howard and Ruder [107] for universal language model fine-tuning for text classification could further improve current performance on less-resourced languages, promising to achieve  ... 
doi:10.3390/info11020074 fatcat:jp4sajikina5noobeomccaamaa

Towards Personalized and Human-in-the-Loop Document Summarization [article]

Samira Ghodratnama
2021 arXiv   pre-print
The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models.  ...  , and (iv) the need for reference summaries.  ...  For example, SUMMARIST [66] is a multilingual summariser that is available in English, Japanese, Spanish, Arabic, Indonesian and Korean and FarsiSum [67] (which is a monolingual text summarisation  ... 
arXiv:2108.09443v2 fatcat:245c3byhg5htrnmbnoofexzh3u