An Unsupervised Language Independent Method of Name Discrimination Using Second Order Co-occurrence Features [chapter]

Ted Pedersen, Anagha Kulkarni, Roxana Angheluta, Zornitsa Kozareva, Thamar Solorio
2006 Lecture Notes in Computer Science  
Previous work by Pedersen, Purandare and Kulkarni (2005) has resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second order co-occurrence  ...  These methods have been shown to perform well with English text, although we believe them to be language independent since they rely on lexical features and use no syntactic features or external knowledge  ...  All of the data and stop-lists for the four languages used in these experiments are available at http://www.d.umn.edu/~tpederse/pubs.html.  ... 
doi:10.1007/11671299_23 fatcat:rczt6hc3yva35ekbxijcubpw2a

Multilingual Scene Character Recognition System using Sparse Auto-Encoder for Efficient Local Features Representation in Bag of Features [article]

Maroua Tounsi, Ikram Moalla, Frank Lebourgeois, Adel M. Alimi
2018 arXiv   pre-print
In this paper, we extended the Bag of Features (BoF)-based model using deep learning for representing features for accurate SCR of different languages.  ...  In the features coding step, a deep Sparse Auto-encoder (SAE)-based strategy was applied to enhance the representative and discriminative abilities of image features.  ...  The research leading to these results has received funding from the Ministry of Higher Education and Scientific Research of Tunisia under the grant agreement number LR11ES48.  ... 
arXiv:1806.07374v4 fatcat:edkrvvarazaurok7cql2aot74a

A Vector Space Modeling Approach to Spoken Language Identification

Haizhou Li, Bin Ma, Chin-Hui Lee
2007 IEEE Transactions on Audio, Speech, and Language Processing  
The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure to circumvent the need for phonetic transcription.  ...  Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector with its attributes representing the co-occurrence statistics of the acoustic units.  ...  Li of the Georgia Institute of Technology and R. Tong of IIR, respectively.  ... 
doi:10.1109/tasl.2006.876860 fatcat:47ruzge2nnf2znqvnw3danubvm

Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation

Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann
2017 Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers  
Namely, we present an unsupervised, knowledge-free WSID approach, which is interpretable at three levels: word sense inventory, sense feature representations, and disambiguation procedure.  ...  The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings.  ...  Acknowledgments We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) foundation under the JOIN-T project.  ... 
doi:10.18653/v1/e17-1009 dblp:conf/eacl/BiemannPFPR17 fatcat:va7bwwnisffvle6oyvrumwyvkq

Parameterized Contrast in Second Order Soft Co-occurrences: A Novel Text Representation Technique in Text Mining and Knowledge Extraction

Amir Hossein Razavi, Stan Matwin, Diana Inkpen, Alexandre Kouznetsov
2009 2009 IEEE International Conference on Data Mining Workshops  
The method is based on second order co-occurrence vectors whose efficiency for representing meaning has been established in many applications, especially for representing word senses in different contexts  ...  We evaluate our method on two tasks: classification of textual description of dreams, and classification of medical abstracts for systematic reviews.  ...  The second-order co-occurrence representation not only contains the main features (words/terms) of each context, but also contains many second order co-occurrence features.  ... 
doi:10.1109/icdmw.2009.49 dblp:conf/icdm/RazaviMIK09 fatcat:pr24ozusnncfxb4sdgkp6hxv6i

Morpheme Segmentation and Concatenation Approaches for Uyghur LVCSR

Mijit Ablimit, Tatsuya Kawahara, Askar Hamdulla
2015 International Journal of Hybrid Information Technology  
In the unsupervised model, the Morfessor tool is used to extract pseudo-morphemes from a raw text corpus.  ...  First is the data-driven approach which concatenates morpheme sequences based on certain measures like co-occurrence frequency or mutual probability.  ...  Acknowledgements This work is supported by National Natural Science Foundation of China (NSFC grant 61462085 and 61163032).  ... 
doi:10.14257/ijhit.2015.8.8.33 fatcat:tkjyubaphbbexnyoc53flpojba

A Comparative Study of Four Language Identification Systems

Bin Ma, Haizhou Li
2006 International Journal of Computational Linguistics and Chinese Language Processing  
We also propose a novel approach to LID system backend design, where the statistics of ASMs and their co-occurrences are used to form ASM-derived feature vectors, in a vector space modeling (VSM) approach  ...  , as opposed to the traditional language modeling (LM) approach, in order to discriminate between individual spoken languages.  ...  Furthermore, some high-frequency, language-specific words can also be converted into acoustic words and included in an acoustic word vocabulary, in order to increase the indexing power of these words for  ... 
dblp:journals/ijclclp/MaL06 fatcat:6vzp4nzsabfj5oofblmu6pnshq

A Self-enriching Methodology for Clustering Narrow Domain Short Texts

D. Pinto, P. Rosso, H. Jimenez-Salazar
2010 Computer journal  
Finally, we integrate all these assessment measures in a freely available web-based system named Watermarking Corpora On-line System, which may be used by computer scientists in order to evaluate the different  ...  Analysis of the behaviour of each TST in the self-term expansion methodology by using the DK-Means clustering method on the hep-ex corpus.  ...  order to determine the correct method for calculating the list of co-occurrence used in the self-term expansion process, we have tested two different co-occurrence methods (n-grams and PMI) with different  ... 
doi:10.1093/comjnl/bxq069 fatcat:46hcjyggxbdqtjc5wyxo3ari2u

Identifying and Explaining Discriminative Attributes

Armins Stepanjans, André Freitas
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
Identifying what is at the center of the meaning of a word and what discriminates it from other words is a fundamental natural language inference task.  ...  This paper describes an explicit word vector representation model (WVM) to support the identification of discriminative attributes.  ...  Acknowledgments The authors would like to thank Viktor Schlegel for the initial supporting machine learning baselines (not included here) and related work discussions in the beginning of the project.  ... 
doi:10.18653/v1/d19-1440 dblp:conf/emnlp/StepanjansF19 fatcat:ipenseutafcxtfzl3hp4hza2ki

Disentangling from Babylonian Confusion – Unsupervised Language Identification [chapter]

Chris Biemann, Sven Teresniak
2005 Lecture Notes in Computer Science  
This work presents an unsupervised solution to language identification.  ...  The method sorts multilingual text corpora on the basis of sentences into the different languages that are contained and makes no assumptions on the number or size of the monolingual fractions.  ...  This work proposes a method that operates on words as features and finds the number of languages as well as the sentences that belong to each language in a fully unsupervised way.  ... 
doi:10.1007/978-3-540-30586-6_87 fatcat:3z7g7xqvwbcatergwsgenkpnlu

Visually Analyzing Contextualized Embeddings [article]

Matthew Berger
2020 arXiv   pre-print
Our approach is inspired by linguistic probes for natural language processing, where tasks are designed to probe language models for linguistic structure, such as parts-of-speech and named entities.  ...  In this paper we introduce a method for visually analyzing contextualized embeddings produced by deep neural network-based language models.  ...  of certain types of linguistic features, e.g. named entities or a part of a constituency parse tree.  ... 
arXiv:2009.02554v1 fatcat:m42moqp5enc7piq22aif3tvkoi

High-Precision Sentence Alignment by Bootstrapping from Wood Standard Annotations

Éva Mújdricza-Maydt, Huiqin Körkel-Qu, Stefan Riezler, Sebastian Padó
2013 Prague Bulletin of Mathematical Linguistics  
We present a semi-supervised, language- and domain-independent approach to high precision sentence alignment.  ...  The key idea is to bootstrap a supervised discriminative learner from wood-standard alignments, i.e. alignments that have been automatically generated by state-of-the-art sentence alignment tools.  ...  As we will see in the experimental evaluation, the contribution of the POS similarity feature is marginal, thus vindicating our claim of language independence.  ... 
doi:10.2478/pralin-2013-0001 fatcat:3r24gxmsvnfv3ir5ayzkly7qxm

Standard Co-training in Multiword Expression Detection [chapter]

Senem Kumova Metin
2017 Lecture Notes in Computer Science  
In this paper, considering MWE detection as a binary classification task, we propose to use a semi-supervised learning algorithm, standard co-training [1]. Co-training is a semi-supervised method that employs  ...  Since MWEs occupy a prominent amount of space in both written and spoken language materials, identification of MWEs is accepted to be an important task in natural language processing.  ...  This work is carried under the grant of TÜBİTAK - The Scientific and Technological Research Council of Turkey to Project No: 115E469, Identification of Multi-word Expressions in Turkish Texts.  ... 
doi:10.1007/978-3-319-72038-8_14 fatcat:43rk556aobdvrbvieycq2yblmy

Action Type induction from multilingual lexical features

Lorenzo Gregori, Rossella Varvara, Andrea Amelio Ravelli
2019 Revista de Procesamiento de Lenguaje Natural (SEPLN)  
Finally, an unsupervised clustering method has been applied on these data in order to discover action classes based on typological closeness.  ...  Those clusters are not language-specific or language-biased, and thus constitute an inter-linguistic classification of action domain.  ...  The application of SVD to our dataset allowed us to obtain a fixed-size feature space, that is independent of the number of languages, and an approximation matrix, that smooths language-specific semantic  ... 
dblp:journals/pdln/GregoriVR19 fatcat:na5ekswtv5apnbjxpgwmbdr2li

Clustering Multi-relationnal TV Data by Diverting Supervised ILP

Vincent Claveau
2017 International Conference on Inductive Logic Programming  
In this paper, we show how to divert ILP to work unsupervised in this context: from artificial learning problems, we induce a notion of similarity between broadcasts, which is later used to perform the  ...  Traditionally, clustering operates on data described by a fixed number of (usually numerical) features; this description schema is said propositional or attribute-value.  ...  They have to be realistic enough in order to produce learning problems that will generate discriminative enough clauses, and thus relevant co-covers.  ... 
dblp:conf/ilp/Claveau17 fatcat:im5ywszeyndfbcwst3hu6qd2ke