Automatic Term Extraction Using Log-Likelihood Based Comparison with General Reference Corpus
[chapter]
2010
Lecture Notes in Computer Science
The proposed method is based on comparison with general reference corpus using log-likelihood similarity. ...
In the paper we present a method that allows an extraction of single-word terms for a specific domain. At the next stage these terms can be used as candidates for multi-word term extraction. ...
The proposed method is based on comparison with general reference corpus using log-likelihood similarity that is used for corpus comparison. ...
doi:10.1007/978-3-642-13881-2_26
fatcat:wcitrfwuurb6bd74fb2asfcfom
Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code
2009
2009 6th IEEE International Working Conference on Mining Software Repositories
In this paper we present a lexical approach that uses the log-likelihood ratios of word frequencies to automatically provide labels for software components. ...
We present a prototype implementation of our labeling/comparison algorithm and provide examples of its application. ...
We thank Dominique Matter for his help with the parameters of log-likelihood ratios. ...
doi:10.1109/msr.2009.5069499
dblp:conf/msr/Kuhn09
fatcat:jqhojunm5nhfpfeibura7gnu2q
Taxonomy Extraction for Customer Service Knowledge Base Construction
[chapter]
2019
Lecture Notes in Computer Science
can improve the quality of automatically constructed taxonomic knowledge bases. ...
In this paper we explore the use of automatic taxonomy extraction from text as a means to reconstruct a customer-agent taxonomic vocabulary. ...
First we extract the terms that are most relevant to the domain, a task referred to as automatic term recognition (ATR). ...
doi:10.1007/978-3-030-33220-4_13
fatcat:2cio6ivakrbvnjwivtjvmln7xm
Clustering-based Approach to Multiword Expression Extraction and Ranking
2015
Proceedings of the 11th Workshop on Multiword Expressions
We present a domain-independent clustering-based approach for automatic extraction of multiword expressions (MWEs). ...
The method combines statistical information from a general-purpose corpus and texts from Wikipedia articles. ...
For comparison, we use n-best lists that are ranked by popular association measures: t-score, log-likelihood, and MI. ...
doi:10.3115/v1/w15-0906
dblp:conf/mwe/Tutubalina15
fatcat:ylseuc3dc5hclcpuudb3ybum4q
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity in Web Corpora
[chapter]
2018
Communications in Computer and Information Science
We present a case study where we explore the effectiveness of different measures, namely the Mann-Whitney-Wilcoxon Test, Kendall correlation coefficient, Kullback-Leibler divergence, log-likelihood and ...
Several studies have been carried out to assess the representativeness of general-purpose web corpora by comparing them to traditional corpora. ...
medical web corpora, one bootstrapped with hand-picked term seeds, and the other one bootstrapped with automatically extracted term seeds. ...
doi:10.1007/978-3-319-99133-7_17
fatcat:ncso5ksl5vfwvkrdeqehfqbuze
TExSIS: Bilingual terminology extraction from parallel corpora using chunk-based alignment
2013
Terminology
We report on TExSIS, a flexible bilingual terminology extraction system that uses a sophisticated chunk-based alignment method for the generation of candidate terms, after which the specificity of the ...
A comparison of our system with the LUIZ approach described by Vintar (2010) reveals that TExSIS outperforms LUIZ both for monolingual and bilingual terminology extraction. ...
(extraction corpus and general reference corpus). ...
doi:10.1075/term.19.1.01mac
fatcat:fh5gxl2ksfhvhiz3v5mvgtgfci
Text de-identification for privacy protection: A study of its impact on clinical text information content
2014
Journal of Biomedical Informatics
Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically deidentified clinical narratives has only barely been investigated ...
To study this impact in more details and assess how generalizable our findings were, we examined the overlap between select clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic ...
Acknowledgments We thank Abhisek Trivedi and Matthew Maw for their help with these studies. Research supported by VA HSR HIR 08-374. ...
doi:10.1016/j.jbi.2014.01.011
pmid:24502938
fatcat:rso3rhsnsfdnnksstsmdn5bqe4
Speaker Verification Using Support Vector Machines and High-Level Features
2007
IEEE Transactions on Audio, Speech, and Language Processing
We use support vector machine modeling of these n-gram frequencies for speaker verification. We derive a new kernel based upon linearizing a log likelihood ratio scoring system. ...
We demonstrate that our methods produce results significantly better than standard log-likelihood ratio modeling. ...
Both the standard TFIDF term weighting (9) and the log likelihood ratio (TFLLR) term weighting (6) methods were used. ...
doi:10.1109/tasl.2007.902874
fatcat:7hkkdid2a5dbboivvqxmikrs7y
A corpus of Australian contract language
2011
Proceedings of the 13th International Conference on Artificial Intelligence and Law - ICAIL '11
Profiling of the corpus is consistent with its suitability for use in language engineering applications. ...
The corpus conforms to Zipf's law and comparative type to token ratios are consistent with lower term sparsity (an expectation for legal language). ...
Applied to words, the method calculates the log likelihood ('LL') ratio of the frequency of a word in frequency lists extracted from each corpus. ...
doi:10.1145/2018358.2018387
dblp:conf/icail/CurtottiM11
fatcat:vvto626xzrdx7nlwjvn2q57bc4
A Corpus of Australian Contract Language: Description, Profiling and Analysis
2011
Social Science Research Network
Profiling of the corpus is consistent with its suitability for use in language engineering applications. ...
The corpus conforms to Zipf's law and comparative type to token ratios are consistent with lower term sparsity (an expectation for legal language). ...
Applied to words, the method calculates the log likelihood ('LL') ratio of the frequency of a word in frequency lists extracted from each corpus. ...
doi:10.2139/ssrn.2304652
fatcat:cjjgi5ytprgxvh4g7af3gcz6cq
Statistical termhood measurement for mono-word terms via corpus comparison
2009
2009 International Conference on Machine Learning and Cybernetics
This paper examines the performance of a number of statistical measures for mono-word termhood within a corpus comparison framework. ...
These measures are defined in terms of the frequency, information, and rank of a term candidate in a domain and a background corpus. ...
Rayson and Garside [8] identify key items to differentiate one corpus from another using the log-likelihood (LL) statistic. ...
doi:10.1109/icmlc.2009.5212765
fatcat:sxgbfsexdfecvi2e4ic44ogjkq
A Novel Method for Arabic Multi-Word Term Extraction
2014
International Journal of Database Management Systems
These methods present some drawbacks that limit their use. In fact they can only deal with bi-grams terms and their yield not good accuracies. ...
To evaluate and illustrate the efficiency of our proposed method for AMWTs extraction, a comparative study has been conducted based on Kalimat Corpus and using nine experiment schemes: In the linguistic ...
The aim of Extraction term is to automatically extract relevant terms from a given corpus. ...
doi:10.5121/ijdms.2014.6304
fatcat:iv22zu7tkzd5dcnh3em7tjda3e
Automatic analysis of dialect/language sets
2015
International Journal of Speech Technology
First, a method is proposed to measure spectral acoustic differences between dialects based on a volume space analysis within a 3D model using log likelihood score distributions derived from traditional ...
The proposed dialect proximity measures are evaluated and compared on a corpus of Arabic dialects, as well as a corpus of South Indian languages, which are closely related languages. ...
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) ...
doi:10.1007/s10772-014-9268-y
fatcat:ibdzfhwtkjcyna7xpaimz5egfe
Evaluation of automatic collocation extraction methods for language learning
2019
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
A number of methods have been proposed to automatically extract collocations, i.e., conventionalized lexical combinations, from text corpora. ...
This paper compares three end-to-end resources for collocation learning, all of which used the same corpus but different methods. ...
Wu for her insights on FLAX extraction, Aisulu for preparing the COCA list, Ivet for the help in large scale experiments and all the anonymous reviewers for their critical feedback. ...
doi:10.18653/v1/w19-4428
dblp:conf/bea/BhallaK19
fatcat:bvdmjypmxnfu5ac2y3efc5hbpu
Abstractive Summarization of Spoken and Written Conversations Based on Phrasal Queries
2014
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We rank and extract the utterances in a conversation based on the overall content and the phrasal query information. ...
Automatic and manual evaluation results over meeting, chat and email conversations show that our approach significantly outperforms baselines and previous extractive models. ...
We also would like to acknowledge the early discussions on the related topics with Frank Tompa. ...
doi:10.3115/v1/p14-1115
dblp:conf/acl/MehdadCN14
fatcat:f2dfsghiarf4bmcwgyabzomfqq