An Unsupervised Language Independent Method of Name Discrimination Using Second Order Co-occurrence Features
[chapter]
2006
Lecture Notes in Computer Science
Previous work by Pedersen, Purandare and Kulkarni (2005) has resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second order co-occurrence ...
These methods have been shown to perform well with English text, although we believe them to be language independent since they rely on lexical features and use no syntactic features or external knowledge ...
All of the data and stop-lists for the four languages used in these experiments are available at http://www.d.umn.edu/~tpederse/pubs.html. ...
doi:10.1007/11671299_23
fatcat:rczt6hc3yva35ekbxijcubpw2a
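The second-order representation described in this abstract can be sketched in a few lines: each word gets a first-order co-occurrence vector from the corpus, and a context is then represented by summing the vectors of the words it contains. This is an illustrative reconstruction, not the authors' SenseClusters implementation; the toy corpus and function names are invented.

```python
from collections import Counter, defaultdict

def first_order_vectors(corpus, window=2):
    """Count, for every word, which other words appear within the window."""
    vecs = defaultdict(Counter)
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs

def second_order_context(context, vecs):
    """Represent a context by the sum of its words' co-occurrence vectors."""
    rep = Counter()
    for w in context:
        rep.update(vecs.get(w, Counter()))
    return rep

corpus = [
    ["apple", "pie", "recipe"],
    ["apple", "iphone", "launch"],
    ["pie", "recipe", "bake"],
]
vecs = first_order_vectors(corpus)
# "recipe" dominates even though it is absent from the context itself:
# that indirect evidence is what makes the features "second order".
rep = second_order_context(["pie", "bake"], vecs)
```

Because only lexical counts are used, nothing in the sketch is tied to any particular language, which is the basis of the language-independence claim.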
Multilingual Scene Character Recognition System using Sparse Auto-Encoder for Efficient Local Features Representation in Bag of Features
[article]
2018
arXiv
pre-print
In this paper, we extended the Bag of Features (BoF)-based model using deep learning for representing features for accurate SCR of different languages. ...
In the features coding step, a deep Sparse Auto-encoder (SAE)-based strategy was applied to enhance the representative and discriminative abilities of image features. ...
The research leading to these results has received funding from the Ministry of Higher Education and Scientific Research of Tunisia under the grant agreement number LR11ES48. ...
arXiv:1806.07374v4
fatcat:edkrvvarazaurok7cql2aot74a
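The baseline coding step that the paper's sparse auto-encoder replaces can be sketched as plain vector-quantization Bag of Features: assign each local descriptor to its nearest codeword and accumulate a normalized occurrence histogram. The codebook and descriptors below are toy values, not the paper's features.

```python
def bof_histogram(descriptors, codebook):
    """Hard-assignment BoF coding: nearest codeword per descriptor,
    then a normalized occurrence histogram over the codebook."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        hist[nearest] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

codebook = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8)]
hist = bof_histogram(descriptors, codebook)
```

The SAE-based strategy in the paper swaps this hard nearest-codeword assignment for learned sparse codes, keeping the same histogram-style pooling downstream.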
A Vector Space Modeling Approach to Spoken Language Identification
2007
IEEE Transactions on Audio, Speech, and Language Processing
The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure to circumvent the need for phonetic transcription. ...
Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector with its attributes representing the co-occurrence statistics of the acoustic units. ...
Li of the Georgia Institute of Technology and R. Tong of IIR, respectively. ...
doi:10.1109/tasl.2006.876860
fatcat:47ruzge2nnf2znqvnw3danubvm
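The text-document analogy can be made concrete: treat bigrams of decoded acoustic units as "terms" and count them into a fixed-length vector. The unit labels below are invented placeholders, not actual ASM symbols.

```python
from collections import Counter

def unit_bigram_vector(units, vocab):
    """Convert a decoded unit sequence into a normalized bigram-count
    vector, analogous to a term vector for a text document."""
    bigrams = Counter(zip(units, units[1:]))
    total = sum(bigrams.values()) or 1
    return [bigrams[(a, b)] / total for a in vocab for b in vocab]

units = ["a", "b", "a", "b", "c"]
vec = unit_bigram_vector(units, vocab=["a", "b", "c"])
```

Once every utterance is a fixed-length vector like this, standard vector-space classifiers can discriminate languages without any phonetic transcription.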
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation
2017
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
Namely, we present an unsupervised, knowledge-free WSID approach, which is interpretable at three levels: word sense inventory, sense feature representations, and disambiguation procedure. ...
The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. ...
Acknowledgments We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) foundation under the JOIN-T project. ...
doi:10.18653/v1/e17-1009
dblp:conf/eacl/BiemannPFPR17
fatcat:va7bwwnisffvle6oyvrumwyvkq
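A common knowledge-free WSI scheme of this interpretable kind (a generic sketch, not necessarily the authors' exact algorithm) clusters the target word's co-occurrence neighbours: remove the target from its ego network and take the connected components as induced senses, so each sense is directly readable as a list of words.

```python
def induce_senses(target, cooc):
    """Connected components of the target's ego network (target removed);
    each component is one induced sense, and its member words double
    as an interpretable sense representation."""
    neighbours = set(cooc.get(target, ()))
    seen, senses = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # graph walk restricted to the ego network
        component, queue = set(), [start]
        while queue:
            w = queue.pop()
            if w in component:
                continue
            component.add(w)
            queue.extend(set(cooc.get(w, ())) & neighbours - component)
        seen |= component
        senses.append(sorted(component))
    return senses

# hypothetical symmetric co-occurrence lists
cooc = {
    "python": ["snake", "reptile", "code", "library"],
    "snake": ["python", "reptile"],
    "reptile": ["python", "snake"],
    "code": ["python", "library"],
    "library": ["python", "code"],
}
senses = induce_senses("python", cooc)
```

The animal and programming neighbours fall into separate components, giving two senses whose feature lists a human can inspect directly.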
Parameterized Contrast in Second Order Soft Co-occurrences: A Novel Text Representation Technique in Text Mining and Knowledge Extraction
2009
2009 IEEE International Conference on Data Mining Workshops
The method is based on second order co-occurrence vectors whose efficiency for representing meaning has been established in many applications, especially for representing word senses in different contexts ...
We evaluate our method on two tasks: classification of textual description of dreams, and classification of medical abstracts for systematic reviews. ...
The second-order co-occurrence representation not only contains the main features (words/terms) of each context, but also contains many second order co-occurrence features. ...
doi:10.1109/icdmw.2009.49
dblp:conf/icdm/RazaviMIK09
fatcat:pr24ozusnncfxb4sdgkp6hxv6i
Morpheme Segmentation and Concatenation Approaches for Uyghur LVCSR
2015
International Journal of Hybrid Information Technology
In the unsupervised model, the Morfessor tool is used to extract pseudo-morphemes from a raw text corpus. ...
First is the data-driven approach which concatenates morpheme sequences based on certain measures like co-occurrence frequency or mutual probability. ...
Acknowledgements This work is supported by National Natural Science Foundation of China (NSFC grant 61462085 and 61163032). ...
doi:10.14257/ijhit.2015.8.8.33
fatcat:tkjyubaphbbexnyoc53flpojba
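The frequency-based concatenation idea can be sketched as a single merge pass: count adjacent morpheme pairs across the segmented corpus and join the pairs that co-occur often enough. The morphemes and the threshold below are illustrative, not from the paper.

```python
from collections import Counter

def concatenate_frequent_pairs(segmented_corpus, min_count=2):
    """Merge adjacent morpheme pairs co-occurring at least min_count
    times: a simple data-driven concatenation pass."""
    counts = Counter()
    for word in segmented_corpus:
        counts.update(zip(word, word[1:]))
    frequent = {p for p, c in counts.items() if c >= min_count}
    merged = []
    for word in segmented_corpus:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) in frequent:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

corpus = [["kitab", "lar"], ["kitab", "lar", "da"], ["yol", "da"]]
merged = concatenate_frequent_pairs(corpus)
```

Mutual probability would simply replace the raw count in the `frequent` test; the merge pass itself stays the same.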
A Comparative Study of Four Language Identification Systems
2006
International Journal of Computational Linguistics and Chinese Language Processing
We also propose a novel approach to LID system backend design, where the statistics of ASMs and their co-occurrences are used to form ASM-derived feature vectors, in a vector space modeling (VSM) approach, as opposed to the traditional language modeling (LM) approach, in order to discriminate between individual spoken languages. ...
Furthermore, some high-frequency, language-specific words can also be converted into acoustic words and included in an acoustic word vocabulary, in order to increase the indexing power of these words for ...
dblp:journals/ijclclp/MaL06
fatcat:6vzp4nzsabfj5oofblmu6pnshq
A Self-enriching Methodology for Clustering Narrow Domain Short Texts
2010
Computer journal
Finally, we integrate all these assessment measures in a freely available web-based system named Watermarking Corpora On-line System, which may be used by computer scientists in order to evaluate the different ...
Analysis of the behaviour of each TST in the self-term expansion methodology by using the DK-Means clustering method on the hep-ex corpus. ...
order to determine the correct method for calculating the list of co-occurrence used in the self-term expansion process, we have tested two different co-occurrence methods (n-grams and PMI) with different ...
doi:10.1093/comjnl/bxq069
fatcat:46hcjyggxbdqtjc5wyxo3ari2u
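The PMI variant of the co-occurrence list can be sketched with document-level counts: score each candidate expansion term w against the target t by log(p(w, t) / (p(w) p(t))) and keep the top-ranked words for self-term expansion. The corpus and term names are invented.

```python
import math
from collections import Counter

def pmi_expansion_terms(corpus, term, top_k=2):
    """Rank co-occurring words by pointwise mutual information with
    `term`; the top-ranked words would be appended to documents
    containing `term` during self-term expansion."""
    word_counts = Counter()
    pair_counts = Counter()
    n_docs = len(corpus)
    for doc in corpus:
        seen = set(doc)
        word_counts.update(seen)
        if term in seen:
            pair_counts.update(seen - {term})
    scores = {
        w: math.log((pair_counts[w] / n_docs)
                    / ((word_counts[term] / n_docs) * (word_counts[w] / n_docs)))
        for w in pair_counts
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

corpus = [
    ["quark", "gluon"],
    ["quark", "gluon", "detector"],
    ["detector", "beam"],
    ["quark", "beam"],
]
expansion = pmi_expansion_terms(corpus, "quark")
```

The n-gram alternative mentioned in the abstract would replace the PMI score with raw adjacency counts, which is exactly the trade-off the paper evaluates.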
Identifying and Explaining Discriminative Attributes
2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Identifying what is at the center of the meaning of a word and what discriminates it from other words is a fundamental natural language inference task. ...
This paper describes an explicit word vector representation model (WVM) to support the identification of discriminative attributes. ...
Acknowledgments The authors would like to thank Viktor Schlegel for the initial supporting machine learning baselines (not included here) and related work discussions in the beginning of the project. ...
doi:10.18653/v1/d19-1440
dblp:conf/emnlp/StepanjansF19
fatcat:ipenseutafcxtfzl3hp4hza2ki
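An explicit (interpretable-dimension) word vector model supports a very direct discriminativeness test: an attribute separates two words when its weight in one vector exceeds its weight in the other by some margin. The vectors, weights, and margin below are hypothetical, not the paper's model.

```python
def is_discriminative(attr, w1, w2, vectors, margin=0.2):
    """An attribute discriminates w1 from w2 when its weight in w1's
    explicit vector exceeds its weight in w2's by `margin`."""
    v1 = vectors.get(w1, {}).get(attr, 0.0)
    v2 = vectors.get(w2, {}).get(attr, 0.0)
    return v1 - v2 > margin

# toy explicit vectors: dimensions are human-readable attributes
vectors = {
    "banana": {"yellow": 0.9, "fruit": 0.8},
    "apple":  {"yellow": 0.3, "fruit": 0.9},
}
```

Because every dimension is a named attribute, the decision can be explained by pointing at the two weights, which is the interpretability advantage over dense embeddings.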
Disentangling from Babylonian Confusion – Unsupervised Language Identification
[chapter]
2005
Lecture Notes in Computer Science
This work presents an unsupervised solution to language identification. ...
The method sorts multilingual text corpora on the basis of sentences into the different languages that are contained and makes no assumptions on the number or size of the monolingual fractions. ...
This work proposes a method that operates on words as features and finds the number of languages as well as the sentences that belong to each language in a fully unsupervised way. ...
doi:10.1007/978-3-540-30586-6_87
fatcat:3z7g7xqvwbcatergwsgenkpnlu
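The fully unsupervised setting — an unknown number of languages — can be sketched with greedy clustering on word overlap: a sentence joins the first cluster whose accumulated vocabulary it overlaps enough with, otherwise it founds a new cluster, so the number of languages emerges from the data. The threshold and sentences are illustrative, not the chapter's method.

```python
def cluster_sentences(sentences, threshold=0.2):
    """Greedy clustering on Jaccard word overlap; the cluster count
    is not fixed in advance."""
    clusters = []  # each cluster: (vocabulary set, member indices)
    for i, sent in enumerate(sentences):
        words = set(sent.lower().split())
        for vocab, members in clusters:
            inter = len(words & vocab)
            union = len(words | vocab)
            if union and inter / union >= threshold:
                vocab |= words       # grow the cluster vocabulary in place
                members.append(i)
                break
        else:
            clusters.append((words, [i]))
    return [members for _, members in clusters]

sents = ["the cat sat", "the dog ran", "der hund lief", "der Hund schlief"]
groups = cluster_sentences(sents)
```

Shared function words ("the", "der") are what pull monolingual sentences together, which is why word-level features suffice without any language model.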
Visually Analyzing Contextualized Embeddings
[article]
2020
arXiv
pre-print
Our approach is inspired by linguistic probes for natural language processing, where tasks are designed to probe language models for linguistic structure, such as parts-of-speech and named entities. ...
In this paper we introduce a method for visually analyzing contextualized embeddings produced by deep neural network-based language models. ...
of certain types of linguistic features, e.g. named entities or a part of a constituency parse tree. ...
arXiv:2009.02554v1
fatcat:m42moqp5enc7piq22aif3tvkoi
High-Precision Sentence Alignment by Bootstrapping from Wood Standard Annotations
2013
Prague Bulletin of Mathematical Linguistics
We present a semi-supervised, language- and domain-independent approach to high precision sentence alignment. ...
The key idea is to bootstrap a supervised discriminative learner from wood-standard alignments, i.e. alignments that have been automatically generated by state-of-the-art sentence alignment tools. ...
As we will see in the experimental evaluation, the contribution of the POS similarity feature is marginal, thus vindicating our claim of language independence. ...
doi:10.2478/pralin-2013-0001
fatcat:3r24gxmsvnfv3ir5ayzkly7qxm
Standard Co-training in Multiword Expression Detection
[chapter]
2017
Lecture Notes in Computer Science
In this paper, considering MWE detection as a binary classification task, we propose to use a semi-supervised learning algorithm, standard co-training [1]. Co-training is a semi-supervised method that employs ...
Since MWEs occupy a prominent amount of space in both written and spoken language materials, identification of MWEs is accepted to be an important task in natural language processing. ...
This work is carried under the grant of TÜBİTAK -The Scientific and Technological Research Council of Turkey to Project No: 115E469, Identification of Multi-word Expressions in Turkish Texts. ...
doi:10.1007/978-3-319-72038-8_14
fatcat:43rk556aobdvrbvieycq2yblmy
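Standard co-training can be sketched with a toy per-view learner: the data has two feature views, and in each round each view's model labels its most confident unlabeled examples for the shared pool. The centroid learner, feature values, and labels below are invented placeholders, not the paper's MWE features.

```python
def train(labeled, view):
    """Toy per-view learner: mean feature value (centroid) per label."""
    sums = {}
    for x, y in labeled:
        s, n = sums.get(y, (0.0, 0))
        sums[y] = (s + x[view], n + 1)
    return {y: s / n for y, (s, n) in sums.items()}

def predict(centroids, value):
    """Nearest centroid; confidence is the negated distance."""
    label = min(centroids, key=lambda y: abs(value - centroids[y]))
    return label, -abs(value - centroids[label])

def co_train(labeled, unlabeled, rounds=2, k=1):
    """Each round, each view's model moves its k most confident
    unlabeled examples into the shared labeled pool."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            model = train(labeled, view)
            scored = sorted(((predict(model, x[view]), x) for x in pool),
                            key=lambda s: -s[0][1])
            for (label, _), x in scored[:k]:
                labeled.append((x, label))
                pool.remove(x)
    return labeled

labeled = [((0.0, 0.1), "mwe"), ((1.0, 0.9), "not_mwe")]
unlabeled = [(0.1, 0.2), (0.9, 0.8)]
result = co_train(labeled, unlabeled)
```

The algorithm's premise is that the two views are independently sufficient, so each model's confident predictions act as extra supervision for the other.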
Action Type induction from multilingual lexical features
2019
Revista de Procesamiento de Lenguaje Natural (SEPLN)
Finally, an unsupervised clustering method has been applied on these data in order to discover action classes based on typological closeness. ...
Those clusters are not language-specific or language-biased, and thus constitute an inter-linguistic classification of action domain. ...
The application of SVD to our dataset allowed us to obtain a fixed-size feature space, that is independent of the number of languages, and an approximation matrix, that smooths language-specific semantic ...
dblp:journals/pdln/GregoriVR19
fatcat:na5ekswtv5apnbjxpgwmbdr2li
Clustering Multi-relational TV Data by Diverting Supervised ILP
2017
International Conference on Inductive Logic Programming
In this paper, we show how to divert ILP to work unsupervised in this context: from artificial learning problems, we induce a notion of similarity between broadcasts, which is later used to perform the ...
Traditionally, clustering operates on data described by a fixed number of (usually numerical) features; this description schema is said propositional or attribute-value. ...
They have to be realistic enough in order to produce learning problems that will generate discriminative enough clauses, and thus relevant co-covers. ...
dblp:conf/ilp/Claveau17
fatcat:im5ywszeyndfbcwst3hu6qd2ke
Showing results 1 — 15 out of 5,337 results