Algorithmic Programming Language Identification

by David Klein, Kyle Murray, Simon Weber

Released as an article.

2011  

Abstract

Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our work is based on supervised learning and intelligent statistical features. We also explored, but abandoned, a grammatical approach. In testing, our implementation greatly outperforms that of an existing tool that relies on a Bayesian classifier. Code is written in Python and available under an MIT license.

Archived Files and Locations

application/pdf   83.6 kB
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2011-06-21
Version   v1
Language   en
arXiv  1106.4064v1