Evaluating Large Language Models Trained on Code
[article]
2021
arXiv
pre-print
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. ...
On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves ...
Finally, we thank GitHub for partnering to build GitHub Copilot and Microsoft Azure for supporting model training with infrastructure management. ...
arXiv:2107.03374v2
fatcat:tnan6rhwq5fsfek2jydeesgmmy
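The HumanEval snippet above grades a model on functional correctness: a program synthesized from a docstring counts as solved only if it passes the task's unit tests. A minimal sketch of that kind of check, with a hand-written candidate and test standing in for the released harness and its hidden tests, might look like this:

```python
# A hand-written candidate and test; the released HumanEval harness and its
# hidden tests are not reproduced here.
candidate = """
def incr_list(l):
    # return the list with every element incremented by 1
    return [x + 1 for x in l]
"""

check = """
def check(incr_list):
    assert incr_list([1, 2, 3]) == [2, 3, 4]
    assert incr_list([]) == []
"""

def passes_tests(candidate_src: str, check_src: str) -> bool:
    """Run the candidate and its unit tests in a scratch namespace."""
    ns = {}
    try:
        exec(candidate_src, ns)
        exec(check_src, ns)
        ns["check"](ns["incr_list"])   # raises AssertionError on failure
        return True
    except Exception:
        return False

print(passes_tests(candidate, check))  # True for this toy example
```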
A Systematic Evaluation of Large Language Models of Code
[article]
2022
arXiv
pre-print
We further identify an important missing piece in the form of a large open-source model trained exclusively on a multi-lingual corpus of code. ...
Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. ...
data, but the model itself is much smaller. There was no large open-source language model trained almost exclusively on code from multiple programming languages. ...
arXiv:2202.13169v3
fatcat:y7ukjlndkbe4fnedymmwezvx4m
Improving Code Autocompletion with Transfer Learning
[article]
2021
arXiv
pre-print
In this paper, we investigate the efficacy of pretraining autocompletion models on non-IDE, non-autocompletion, and different-language example code sequences. ...
But what if limited examples of IDE autocompletion in the target programming language are available for model training? ...
RQ3: Consider the case where a large training corpus is available in one language but not another. ...
arXiv:2105.05991v2
fatcat:upytxisexrajjioyppmbtx3rfa
CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing
[article]
2021
arXiv
pre-print
CodeTrans outperforms the state-of-the-art models on all the tasks. ...
Currently, a growing number of mature natural language processing applications make people's life more convenient. Such applications are built by source code - the language in software engineering. ...
The CodeTrans models would not have been easily publicly accessible without the amazing support from the Hugging Face team; that is why we are very grateful to Patrick von Platen, Julien Chaumond, and Clément ...
arXiv:2104.02443v2
fatcat:loq72uu5a5a4tnxkdvaskros7a
CoTexT: Multi-task Learning with Code-Text Transformer
[article]
2021
arXiv
pre-print
Using self-supervision, CoTexT is pre-trained on large programming language corpora to learn a general understanding of language and code. ...
We first evaluate CoTexT with multi-task learning: we perform Code Summarization on 6 different programming languages and Code Refinement on both the small and medium sizes featured in the CodeXGLUE dataset ...
Evaluation Tasks We evaluate our programming language and natural language generation tasks on TPU v2-8 with the settings from the original T5 model (Raffel et al., 2019). ...
arXiv:2105.08645v4
fatcat:6mtdmdz2tjavreva3dhqbw2cri
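The CoTexT entry above describes one text-to-text model handling several tasks at once (code summarization across six languages, code refinement) with T5 settings. A common way to set that up is T5-style task prefixes; the sketch below illustrates the idea with made-up prefixes and toy examples, not CoTexT's actual preprocessing:

```python
# Illustrative T5-style multi-task formatting: each example carries a task
# prefix so a single text-to-text model can serve multiple code tasks.
def make_example(task: str, source: str, target: str) -> dict:
    return {"input": f"{task}: {source}", "target": target}

examples = [
    make_example(
        "summarize python",
        "def add(a, b): return a + b",
        "add two numbers",
    ),
    make_example(
        "refine java",
        "int f ( int x ) { return x + x + 0 ; }",
        "int f ( int x ) { return x + x ; }",
    ),
]

for ex in examples:
    print(ex["input"], "->", ex["target"])
```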
Search4Code: Code Search Intent Classification Using Weak Supervision
[article]
2021
arXiv
pre-print
We evaluate the approach against several baselines on a real-world dataset comprised of over 1 million queries mined from Bing web search engine and show that the CNN based model can achieve an accuracy ...
Recently, natural language based code search has been an active area of research. However, the lack of real-world large-scale datasets is a significant bottleneck. ...
Discriminative Model Evaluation To evaluate the efficacy of the various discriminative models for code search intent detection, we first train each model on the training data and compare the performance scores ...
arXiv:2011.11950v3
fatcat:ptp7vv6mj5eppde7esjqfiawpq
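The Search4Code snippet describes training discriminative models, with a CNN-based model performing best, to label web search queries as having code-search intent or not. As a rough illustration only (the paper's architecture, weak-supervision labels, and hyperparameters are not reproduced here), a tiny 1-D convolutional query classifier could look like this:

```python
# Toy code-search intent classifier over integer-encoded query tokens.
# Vocabulary, labels, and sizes are made up for illustration.
import torch
import torch.nn as nn

class QueryIntentCNN(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, n_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, 128, kernel_size=3, padding=1)
        self.classify = nn.Linear(128, n_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer-encoded query tokens
        x = self.embed(token_ids).transpose(1, 2)        # (batch, embed, seq)
        x = torch.relu(self.conv(x)).max(dim=2).values   # global max pool
        return self.classify(x)                          # (batch, n_classes)

# toy forward pass over two padded queries
model = QueryIntentCNN(vocab_size=1000)
queries = torch.tensor([[5, 17, 42, 0], [8, 9, 3, 11]])
print(model(queries).shape)  # torch.Size([2, 2])
```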
Learning Autocompletion from Real-World Datasets
[article]
2020
arXiv
pre-print
To combat this effect, we train models on real-world code completion examples and find that these models outperform models trained on committed source code and working version snapshots by 12.8% and 13.8% ...
Furthermore, our study characterizes a large corpus of logged autocompletion usages to investigate why training on real-world examples leads to stronger models. ...
Datasets Software language models are typically trained and evaluated offline on large corpora of existing source code [2]-[5], [7], [8]. ...
arXiv:2011.04542v1
fatcat:jnrpn77o7rgahd6dqgcyu4qd7q
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
[article]
2020
arXiv
pre-print
We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters. ...
We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). ...
model pre-trained on code only. ...
arXiv:2002.08155v4
fatcat:zgz4uic7fvazdm5fbnkjfrefqa
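The CodeBERT entry describes a bimodal encoder over natural-language (NL) and programming-language (PL) pairs. Assuming the publicly released microsoft/codebert-base checkpoint and the Hugging Face transformers library are available, a minimal sketch of encoding one NL-PL pair is:

```python
# Encode an NL/PL pair and take the first-position vector as a joint
# representation. Checkpoint name assumes the public CodeBERT release.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

nl = "return the maximum value in a list"
code = "def max_value(xs): return max(xs)"

inputs = tokenizer(nl, code, return_tensors="pt", truncation=True)
outputs = model(**inputs)
cls_vector = outputs.last_hidden_state[:, 0, :]   # (1, hidden_size)
print(cls_vector.shape)
```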
DeepClone: Modeling Clones to Generate Code Predictions
[article]
2020
arXiv
pre-print
DeepClone applies natural language processing techniques to learn from a large code corpus, and generates code tokens using the model learned. ...
) based on the code written so far. ...
Sohaib Khan (CEO at Hazen.ai) for providing us with valuable feedback on the experimentation part of neural networks. We acknowledge SURFsara for providing us credits to perform experiments. ...
arXiv:2007.11671v2
fatcat:xh2rl2b5lrcnhbfbirlgtuqppu
Maybe Deep Neural Networks are the Best Choice for Modeling Source Code
[article]
2019
arXiv
pre-print
But traditional language models limit the vocabulary to a fixed set of common words. For code, this strong assumption has been shown to have a significant negative effect on predictive performance. ...
To our knowledge, this is the largest neural language model for code that has been reported. ...
A language model is a probability distribution over strings; by training a language model (LM) on a large corpus of well-written code, we hope that the LM will assign high probability to new code that ...
arXiv:1903.05734v1
fatcat:lql53s3x4zdf5d7nsv5h2jnw54
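The last snippet frames a language model as a probability distribution over strings, so well-written code should receive high probability. As an illustration only (GPT-2 is a stand-in here, not the open-vocabulary model the paper trains), scoring a code string by its average per-token negative log-likelihood under an off-the-shelf causal LM looks like this:

```python
# Score a code string under a causal LM: lower average negative
# log-likelihood means the model finds the code more probable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

code = "for i in range(10):\n    print(i)"
inputs = tokenizer(code, return_tensors="pt")
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss  # mean NLL per token

print(f"avg negative log-likelihood: {loss.item():.3f}")
```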
On the Effectiveness of Transfer Learning for Code Search
[article]
2021
arXiv
pre-print
To this end, we pre-train a BERT-based model on combinations of natural language and source code data and evaluate it on pairs of StackOverflow question titles and code answers. ...
In cases where the model was pre-trained on natural language "and" source code data, it also outperforms an information retrieval baseline based on Lucene. ...
In NLP, pre-training usually consists of learning a language model on large corpora of natural language text. ...
arXiv:2108.05890v1
fatcat:jg5lmsk7bzcffe67dofe5t44s4
Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs
[article]
2021
arXiv
pre-print
OpenAI's Codex, a GPT-3 like model trained on a large code corpus, has made headlines in and outside of academia. ...
Our initial evaluation uses the multi-language QuixBugs benchmark (40 bugs in both Python and Java). ...
Taken together, these laws provide evidence for training very large language models. ...
arXiv:2111.03922v1
fatcat:3kwinwq3gjhsva6eeazujtlq4q
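The QuixBugs entry evaluates a large code model on automatic program repair. One hedged sketch of how such an evaluation can be framed is to prompt the model with the buggy function and ask it to complete a fixed version; `complete` below is a placeholder rather than any specific API, and the buggy gcd is an illustrative example, not a QuixBugs subject:

```python
# Build a repair prompt: show the buggy function, then start the header of
# the fixed version for the model to complete. Only prompt construction is
# shown; `complete` is a placeholder for a code completion model call.
buggy = '''def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(a % b, b)   # bug: recursive arguments are wrong
'''

prompt = (
    "### Buggy Python function\n"
    f"{buggy}\n"
    "### Fixed Python function\n"
    "def gcd(a, b):\n"
)

def complete(prompt: str) -> str:
    """Placeholder: call a code completion model here."""
    raise NotImplementedError

print(prompt)
```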
On The Cross-Modal Transfer from Natural Language to Code through Adapter Modules
[article]
2022
arXiv
pre-print
Pre-trained neural Language Models (PTLM), such as CodeBERT, are recently used in software engineering as models pre-trained on large source code corpora. ...
These adapters are trained using programming languages and are inserted in a PTLM that is pre-trained on English corpora (N-PTLM). ...
These tasks exist on CodeXGLUE [20], the General Language Understanding Evaluation benchmark for CODE, which evaluates neural models that are trained for source code. ...
arXiv:2204.08653v1
fatcat:ffl6xpm4lra5hagsskeuycxxgu
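The adapter entry above inserts small trainable modules into a natural-language pre-trained model (N-PTLM) while the host transformer stays frozen. A minimal bottleneck adapter of the usual down-project/up-project-with-residual form is sketched below with illustrative dimensions; the paper's exact adapter configuration is not reproduced:

```python
# Bottleneck adapter: a small trainable block added to a frozen transformer
# layer. The residual connection preserves the frozen model's representation.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = Adapter()
x = torch.randn(2, 16, 768)   # (batch, seq_len, hidden)
print(adapter(x).shape)       # torch.Size([2, 16, 768])
```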
JUNLP@Dravidian-CodeMix-FIRE2020: Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags
[article]
2020
arXiv
pre-print
Since the social media texts are not in one language and are largely code-mixed in nature, the traditional sentiment classification models fail to produce acceptable results. ...
The presented algorithm, when evaluated on the test data, garnered precision, recall, and F1 scores of 0.59, 0.66, and 0.58 respectively. ...
The designed model was evaluated on the test data and garnered an F1 score of 0.58. The code of our developed model is available as a git repository here. ...
arXiv:2010.10111v1
fatcat:itkfdmqojjeb5npfzv7ojrqsx4
Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data
[article]
2020
arXiv
pre-print
of patient records are available for training the clinical prediction model. ...
This process is often constrained by having a relatively small number of patient records for training the model. ...
Finally, the re-trained clinical prediction models were evaluated on the held out test set. ...
arXiv:2001.05295v2
fatcat:nxznepbgtjhjpn2ne6haweqprm
Showing results 1 — 15 out of 316,727 results