Combining Code Embedding with Static Analysis for Function-Call Completion [article]

M. Weyssow, H. Sahraoui, B. Frénay, B. Vanderose
2020 arXiv   pre-print
In this work, we present a novel approach for improving current function-call completion tools by learning from independent code repositories, using well-known natural language processing models that  ...  We evaluated our approach on a set of open-source projects unseen during the training.  ...  Evaluating the n-gram Language Model: For this research question (RQ3), we adapted steps 3, 4 and 5 of our approach (see Section III-B).  ... 
arXiv:2008.03731v2 fatcat:yytgbdsh7bhwhfq3vu7rbpisli

An Exploration of Learning to Link with Wikipedia: Features, Methods and Training Collection [chapter]

Jiyin He, Maarten de Rijke
2010 Lecture Notes in Computer Science  
We apply machine learning methods to the anchor-to-best-entry-point task and explore the impact of the following aspects of our approaches: features, learning methods as well as the collection used for  ...  The new Wikipedia collection which is of larger size and which has more links than the collection previously used, provides better training material for learning our models.  ...  For both levels of evaluation (using A2B and using A2F), the best run is the heuristic run, which outperforms all sophisticated learning methods.  ... 
doi:10.1007/978-3-642-14556-8_32 fatcat:i72v3ks6gjb5tig2queqqo3rvm

Stepwise API usage assistance using n -gram language models

André L. Santos, Gonçalo Prendi, Hugo Sousa, Ricardo Ribeiro
2017 Journal of Systems and Software  
In this article we describe an approach for recommending subsequent tokens to complete API sentences using n-gram language models built from source code corpora.  ...  The approach was evaluated against existing client code of four widely used APIs, revealing that in more than 90% of the cases the expected subsequent token is within the 10-top-most proposals of our models  ...  Acknowledgement: We would like to thank Fernando Batista and the anonymous reviewers for their valuable comments for improving earlier drafts of this article.  ... 
doi:10.1016/j.jss.2016.06.063 fatcat:g3kzv2tvabctzl6hg6yhu73h7e

Learning natural coding conventions

Miltiadis Allamanis, Earl T. Barr, Christian Bird, Charles Sutton
2014 Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2014  
We apply NATURALIZE to suggest natural identifier names and formatting conventions.  ...  We used NATURALIZE to generate 18 patches for 5 open source projects: 14 were accepted.  ...  Instead, n-gram models are trained using smoothing methods [22] . In our work, we use Katz smoothing.  ... 
doi:10.1145/2635868.2635883 dblp:conf/sigsoft/AllamanisBBS14 fatcat:omuai7esnfcfhkh4aurbda5tci

Active learning for ontological event extraction incorporating named entity recognition and unknown word handling

Xu Han, Jung-jae Kim, Chee Keong Kwoh
2016 Journal of Biomedical Semantics  
Results and conclusion: We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than such previous methods as entropy and Gibbs  ...  and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins).  ...  By using the word similarity, the n-gram model method is further improved, as the deficiency of n-gram model goes from 0.790 to 0.769, an improvement of 2.66 %.  ... 
doi:10.1186/s13326-016-0059-z pmid:27127603 pmcid:PMC4849099 fatcat:sm5l5g4gnrbehcdlqxlv4sjfee

Duplicate-Search-Based Image Annotation Using Web-Scale Data

Xin-Jing Wang, Lei Zhang, Wei-Ying Ma
2012 Proceedings of the IEEE  
Easy photo-taking and photo-sharing today make image an increasingly important type of media in people's everyday life, which arouses a growing demand for a practical  ...  Annotation of images on the Web, based on label propagation over similar images and social information, is discussed in this paper; a system called Arista is used to demonstrate scalability.  ...  Dai, and X. Zhang for their work on the Arista system.  ... 
doi:10.1109/jproc.2012.2193109 fatcat:fox4i4n53bhp3kwuekocwziua4

Syllabification Model of Indonesian Language Named-Entity Using Syntactic n-Gram

Ahmad Muammar Fanani, Suyanto Suyanto
2021 Procedia Computer Science  
In this research, a syntactic n-Gram is proposed and investigated to syllabify the named entities since it is developed based on the n-gram that has an excellent accuracy and tends to be consistent with  ...  These results suggest that the model is proven to be better at handling named entities than the standard n-gram model.  ... 
doi:10.1016/j.procs.2021.01.058 fatcat:ak6uhecstfgytmjeoxd5ps2jf4

Kode_Stylers: Author Identification through Naturalness of Code: An Ensemble Approach

Panyawut Sriiesaranusorn, Supatsara Wattanakriengkrai, Teyon Son, Takeru Tanaka, Christopher Wiraatmaja, Takashi Ishio, Raula Gaikovina Kula
2020 Forum for Information Retrieval Evaluation  
In this working note, we (i) present methods to obtain features such as tokenization, N-gram TF-IDF, warning messages, and coding styles, (ii) implement our framework using Random Forest and Transformer  ...  Our team, namely Kode_Stylers, participated in the competition and used the naturalness of code as the key to our solution.  ...  Acknowledgments: This work has been supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers JP18H04094, JP18KT0013, JP20K19774, and JP20H05706.  ... 
dblp:conf/fire/Sriiesaranusorn20 fatcat:fjifv7zgjjgblmffehkulocw5e

Suggesting accurate method and class names

Miltiadis Allamanis, Earl T. Barr, Christian Bird, Charles Sutton
2015 Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015  
However, suggesting names for methods and classes is much more difficult.  ...  Recent progress on automatically suggesting names for local variables tantalizes with the prospect of replicating that success with method and class names.  ...  Nonetheless, we use an n-gram model as a point of comparison for naming methods and classes to demonstrate the performance of our approach as that model has performed the best for variable naming in the  ... 
doi:10.1145/2786805.2786849 dblp:conf/sigsoft/AllamanisBBS15 fatcat:ngk2al4r25appe2qieqyyzdu4e

Automatic Spelling Correction based on n-Gram Model

S. M., A. Abd
2018 International Journal of Computer Applications  
The proposed model provides correction suggestions by selecting the most suitable suggestions from a list of corrective suggestions based on lexical resources and n-gram statistics.  ...  The evaluation of the proposed model uses English standard datasets of misspelled words. Error detection, automatic error correction, and replacement are the main features of the proposed model.  ...  Two common approaches for error detection are dictionary lookup and n-gram analysis.  ... 
doi:10.5120/ijca2018917724 fatcat:af2rsml5czevnbvefijdzl333m

Multidocument Arabic Text Summarization Based on Clustering and Word2Vec to Reduce Redundancy

Waheeb, Khan, Chen, Shang
2020 Information  
We adopted Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as an evaluation measure to examine our proposed technique and compare it with state-of-the-art methods.  ...  In this study, we adopt a preprocessing strategy to solve the noise problem and use the word2vec model for two purposes, first, to map the words to fixed-length vectors and, second, to obtain the semantic  ...  Acknowledgments: We are very grateful to the Chinese Scholarship Council (CSC) for providing us financial and moral support. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/info11020059 fatcat:qhu772l35zaqxntyskuynkl3ve

An Empirical Study of Chinese Name Matching and Applications

Nanyun Peng, Mo Yu, Mark Dredze
2015 Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)  
We evaluate methods for name matching in Chinese, including both string matching and learning approaches.  ...  Methods for name matching, an important component to support downstream tasks such as entity linking and entity clustering, have focused on alphabetic languages, primarily English.  ...  We use an SVM with a linear kernel. To learn possible edit rules for Chinese names we add features for pairs of n-grams.  ... 
doi:10.3115/v1/p15-2062 dblp:conf/acl/PengYD15 fatcat:n5wtnxf7ovartofb6lxgmz5qfm

Comparing Feature Engineering Approaches to Predict Complex Programming Behaviors

Wengran Wang, Yudong Rao, Yang Shi, Alexandra Milliken, Chris Martens, Tiffany Barnes, Thomas W. Price
2020 Educational Data Mining  
A piece of programming code is structurally represented by an abstract syntax tree (AST), and a variety of approaches have been proposed to extract features from these ASTs to use in learning algorithms  ...  However, we also find evidence that all approaches led to overfitting, suggesting the need for future research to select and reduce code features, which may reveal advantages in more complex feature engineering  ...  We extracted pq-Grams using the same approach. Training and evaluation.  ... 
dblp:conf/edm/WangRSMMBP20 fatcat:gvegzm3dlbh6lkatqbllopwwcy

On-demand new word learning using world wide web

Stanislas Oger, Georges Linares, Frederic Bechet, Pascal Nocera
2008 Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing  
We first demonstrate the relevance of the Web for the OOV word retrieval. Then, different methods are proposed to retrieve the hypothesis words.  ...  Most of the Web-based methods for lexicon augmentation consist in capturing global semantic features of the targeted domain in order to collect relevant documents from the Web.  ...  Collecting augmented lexicons: Here we evaluate the performance of various methods for OOV word retrieval using both the ASR system outputs and the exact transcripts.  ... 
doi:10.1109/icassp.2008.4518607 dblp:conf/icassp/OgerLBN08 fatcat:xfkk4ap355hqbkdlt7ian4qzm4

Report on the TREC 2006 Genomics Experiment

Samir Abdou, Jacques Savoy
2006 Text Retrieval Conference  
In an effort to find text passages that will meet user requests, we propose and evaluate a new approach to the generation of orthographic variants of search terms (mainly genomic names in our case).  ...  Moreover when comparing a 5-gram indexing approach to word-based indexing schemes, the mean average precision decreases by about 10% when using the n-gram indexing scheme.  ...  The run labeled "UniNE1" combines both the I(n)B2 model (word-based and 5-gram) and the Okapi approach (word-based). As a data fusion approach, we used the zscore method [14] .  ... 
dblp:conf/trec/AbdouS06 fatcat:la5rpd3nfzc6po4cxiarghhe5q