Algorithmic Programming Language Identification
release_e4tdjjdhfba5ffakiwwmmzf6qi
by David Klein, Kyle Murray, Simon Weber
2011
Abstract
Motivated by the amount of code that goes unidentified on the web, we
introduce a practical method for algorithmically identifying the programming
language of source code. Our work is based on supervised learning and
intelligent statistical features. We also explored, but abandoned, a
grammatical approach. In testing, our implementation greatly outperforms that
of an existing tool that relies on a Bayesian classifier. The code is written in
Python and is available under an MIT license.
Archived Files and Locations
application/pdf, 83.6 kB, file_n4jql7upjnc6fmf42guo5x4fq4
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 1106.4064v1