Linguistica 5: Unsupervised Learning of Linguistic Structure

Jackson Lee, John Goldsmith
2016 Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations  
While Linguistica 5 inherits its predecessors' strength in unsupervised learning of natural language morphology, it incorporates significant improvements in multiple ways.  ...  This paper introduces Linguistica 5, a software for unsupervised learning of linguistic structure. It is a descendant of Goldsmith's (2001, 2006) Linguistica.  ...  Acknowledgments This work was completed in part with resources provided by the University of Chicago Research Computing Center and Big Ideas Generator.  ... 
doi:10.18653/v1/n16-3005 dblp:conf/naacl/LeeG16 fatcat:ndfwpyknp5arzolzbo77kqrzcm

Page 29 of Computational Linguistics Vol. 27, Issue 1 [page]

2001 Computational Linguistics  
In phase two, the objective is to learn a contextual model of each PN category, augmented with syntactic and semantic features.  ...  Second, through an unsupervised corpus-based technique, typical PN syntactic and semantic contexts are learned.  ... 

Investigating Sub-Word Embedding Strategies for the Morphologically Rich and Free Phrase-Order Hungarian

Bálint Döbrössy, Márton Makrai, Balázs Tarján, György Szaszák
2019 Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)  
Therefore, we explore and evaluate several sub-word unit based embedding strategies: character n-grams, lemmatization provided by an NLP-pipeline, and segments obtained in unsupervised learning (morfessor  ...  For the highly agglutinative Hungarian, semantic accuracy of word embeddings measured on word analogy tasks drops by 50-75% compared to English.  ...  Results showed that using the lemmas instead of the words was by far the most effective approach by maximizing semantic accuracy of the embeddings.  ... 
doi:10.18653/v1/w19-4321 dblp:conf/rep4nlp/DobrossyMTS19 fatcat:72tbnxawfrgbnoornhfb3eqzwe

Joint PoS Tagging and Stemming for Agglutinative Languages [article]

Necva Bölücü, Burcu Can
2017 arXiv   pre-print
In this paper, we present an unsupervised Bayesian model using Hidden Markov Models (HMMs) for joint PoS tagging and stemming for agglutinative languages.  ...  Part-of-speech tagging (PoS tagging) is one of these tasks that often suffers from sparsity.  ...  Acknowledgments This research is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) with the project number EEEAG-E .  ... 
arXiv:1705.08942v1 fatcat:hauvrqaoobfgbly32rkw6ukuza

A Computational Model for Child Inferences of Word Meanings via Syntactic Categories for Different Ages and Languages

Yuji Kawai, Yuji Oshima, Yuki Sasamoto, Yukie Nagai, Minoru Asada
2018 IEEE Transactions on Cognitive and Developmental Systems  
We hypothesize that using this model with different numbers of categories can replicate the manner in which children of different ages learn words.  ...  In addition, cross-linguistic differences originating from the acquisition of language-specific syntactic categories are identified, i.e., the syntactic categories learned from English and Chinese corpora  ...  ACKNOWLEDGMENT The authors gratefully acknowledge the advice of Prof. M. Imai of Keio University.  ... 
doi:10.1109/tcds.2018.2883048 fatcat:pde5glonyjhrpbnv63jojv2pbq

Building Morphological Chains for Agglutinative Languages [article]

Serkan Ozen, Burcu Can
2017 arXiv   pre-print
In this paper, we build morphological chains for agglutinative languages by using a log-linear model for the morphological segmentation task.  ...  The results indicate that candidate generation plays an important role in such an unsupervised log-linear model that is learned using contrastive estimation with negative samples.  ...  Acknowledgments This research is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) with the project number EEEAG-115E464 and we are grateful to TUBITAK for their financial  ... 
arXiv:1705.02314v1 fatcat:djigjptfpfdufo6o6un2k6bige

A Review of Open Information Extraction Techniques

Sally Ali, Hamdy Mousa, M. Hussien
2019 IJCI. International Journal of Computers and Information  
To make use of such huge amounts of textual data, there is a need to detect, extract, and structure the information conveyed through this data in a fast and scalable manner.  ...  This can be performed using Information Extraction Techniques.  ...  of their morphologies limits the implementation of different categories.  ... 
doi:10.21608/ijci.2019.35099 fatcat:ff3ssqyvzvelzp3lklvmpkj2kq

SEMANTIC ANALYZER FOR MARATHI TEXT

Pallavi Bagul
2014 International Journal of Research in Engineering and Technology  
We describe our system as the one which analyzes the text by comparing it with the meaning of the words given in the WordNet.  ...  This paper represents a Semantic Analyzer for checking the semantic correctness of the given input text.  ...  Syntactic Analyzer: Morphological Analyzer's output is used by the Syntactic Analyzer to detect whether the output is syntactically correct or not.  ... 
doi:10.15623/ijret.2014.0303098 fatcat:xiuko7p2ezbmtc4vvgxroiw5we

Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods

Georgios Petasis, Alessandro Cucchiarelli, Paola Velardi, Georgios Paliouras, Vangelis Karkaletsis, Constantine D. Spyropoulos
2000 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '00  
recognition of about 90% of the remaining 50%.  ...  However the high performance of most existing PN classifiers heavily depends upon the availability of large dictionaries of domain-specific Proper Nouns, and a certain amount of manual work for rule writing  ...  The method combined two learning approaches: supervised learning of decision-tree classifiers and unsupervised probabilistic learning of syntactic and semantic context.  ... 
doi:10.1145/345508.345563 dblp:conf/sigir/PetasisCVPKS00 fatcat:jmfriwwgdrclnocwt4xeq23u74

Unsupervised Named Entity Recognition Using Syntactic and Semantic Contextual Evidence

Alessandro Cucchiarelli, Paola Velardi
2001 Computational Linguistics  
the effectiveness of using more fine-grained evidence--namely, syntactic and semantic contextual knowledge--in classifying NEs.  ...  Proper nouns form an open class, making the incompleteness of manually or automatically learned classification rules an obvious problem.  ...  Second, through an unsupervised corpus-based technique, typical PN syntactic and semantic contexts are learned.  ... 
doi:10.1162/089120101300346822 fatcat:hikpb332h5dqtftpu262qnyeg4

A resource-light approach to morpho-syntactic tagging. * Anna Feldman and Jirka Hana

L. Macken
2010 Literary and Linguistic Computing  
This volume deals with the problem of morpho-syntactic tagging, i.e.  ...  This volume describes an alternative approach to morpho-syntactic tagging by porting the relevant information from one language to a related language, avoiding thus the labour-intensive creation of an annotated  ... 
doi:10.1093/llc/fqq012 fatcat:53fribvvlzbi5gintrvhijro74

Using eigenvectors of the bigram graph to infer morpheme identity

Mikhail Belkin, John Goldsmith
2002 Proceedings of the ACL-02 workshop on Morphological and phonological learning  
In particular, we look at the suffixes derived from a corpus by unsupervised learning of morphology, and we ask which of these suffixes have a consistent syntactic function (e.g., in English, -ed is primarily  ...  We exploit this technique for extending the value of automatic learning of morphology.  ...  other, independent heuristics (such as presence of suffixes determined by unsupervised learning of morphology) are syntactically homogenous.  ... 
doi:10.3115/1118647.1118652 dblp:conf/sigmorphon/BelkinG02 fatcat:mhwbsk4yr5anlh67z53w7d3bsy

Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation [article]

Kemal Oflazer and Gokhan Tur (Department of Computer Engineering, Bilkent University, Ankara Turkey)
1996 arXiv   pre-print
The unsupervised learning process produces two sets of rules: (i) choose rules which choose morphological parses of a lexical item satisfying constraint effectively discarding other parses, and (ii) delete  ...  Our approach also uses a novel approach to unknown word processing by employing a secondary morphological processor which recovers any relevant inflectional and derivational information from a lexical  ...  Acknowledgments We would like to thank Xerox Advanced Document Systems, and Lauri Karttunen of Xerox Parc and of Rank Xerox Research Centre (Grenoble) for providing us with the two-level transducer development  ... 
arXiv:cmp-lg/9604001v2 fatcat:jasty72cezhkdmc4gztn2ieope

Computational Models of Language Acquisition [chapter]

Shuly Wintner
2010 Lecture Notes in Computer Science  
can be highly informative of syntactic category. This information can be extracted by some psychologically plausible mechanisms. Using distributional information concerning syntactic categories involves three stages: 1 Measuring  ...  uses these labels to parse. Evaluation on WSJ10 yields an f-score of almost 0.76 when parsing begins from plain text. Note: algorithms for inducing part of speech categories from raw data (i.e., unsupervised  ... 
doi:10.1007/978-3-642-12116-6_8 fatcat:yeyxymemrnhchlpngncalixo3y

Unsupervised Disambiguation of Syncretism in Inflected Lexicons [article]

Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, Jason Eisner
2020 arXiv   pre-print
One can, however, use unsupervised learning (as in EM) to fit a model that probabilistically disambiguates word forms.  ...  Lexical ambiguity makes it difficult to compute various useful statistics of a corpus. A given word form might represent any of several morphological feature bundles.  ...  Here, the goal is to map sequences of forms into coarse-grained syntactic categories. Christodoulopoulos et al. (2010) provide a useful overview of previous work.  ... 
arXiv:1806.03740v2 fatcat:gxsxqf5lpvcx3d2tlygafbehxu