
Ziyu Yao, Daniel S. Weld, Wei-Peng Chen, Huan Sun
2018 Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18  
In this paper, we investigate a new problem of systematically mining question-code pairs from Stack Overflow (in contrast to heuristically collecting them).  ...  Furthermore, we present StaQC (Stack Overflow Question-Code pairs), the largest dataset to date of ~148K Python and ~120K SQL question-code pairs, automatically mined from SO using our framework.  ...  STAQC: A SYSTEMATICALLY MINED DATASET OF QUESTION-CODE PAIRS In this section, we present StaQC (Stack Overflow Question-Code pairs), a large-scale and diverse set of question-code pairs automatically mined  ... 
doi:10.1145/3178876.3186081 dblp:conf/www/YaoWCS18 fatcat:srpfshgrgnhclogc5w2onbpdim

CoNCRA: A Convolutional Neural Network Code Retrieval Approach [article]

Marcelo de Rezende Martins, Marco A. Gerosa
2020 arXiv   pre-print
We evaluated our approach's efficacy on a dataset composed of questions and code snippets collected from Stack Overflow.  ...  We propose a technique for semantic code search: A Convolutional Neural Network approach to code retrieval (CoNCRA).  ...  We evaluated the models on the StaQC dataset, a systematically mined question-code dataset from Stack Overflow [20].  ... 
arXiv:2009.01959v1 fatcat:4jdnfwmlzbfo7honp3fqzc3jui

Code Generation from Natural Language with Less Prior and More Monolingual Data [article]

Sajad Norouzi, Keyi Tang, Yanshuai Cao
2021 arXiv   pre-print
By exploiting a relatively sizeable monolingual corpus of the target programming language, which is cheap to mine from the web, we achieved 81.03% exact match accuracy on Django and 32.57 BLEU score on  ...  This work investigates whether a generic transformer-based seq2seq model can achieve competitive performance with minimal code-generation-specific inductive bias design.  ...  We also would like to thank a number of Borealis AI colleagues for helpful discussions, including Wei (Victor) Yang, Peng Xu, Dhruv Kumar, and Simon J. D. Prince for feedback on the writing.  ... 
arXiv:2101.00259v2 fatcat:qytj2jhl6zbrlbz6qndpth5zxy

A Survey on Machine Learning Techniques for Source Code Analysis [article]

Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Federica Sarro
2021 arXiv   pre-print
Additionally, we collate a comprehensive list of available datasets and tools useable in this context.  ...  Context: The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis such as testing and  ...  [355] used StaQC dataset [354]; it contains more than 119 thousand pairs of question title and code snippet related to SQL mined from Stack Overflow. Xie et al.  ... 
arXiv:2110.09610v1 fatcat:jc6c3jnxcbekfbssyy7hn3zwxa