Deep learning applied to the assessment of online student programming exercises
MetadataShow full item record
Massive online open courses (MOOCs) teaching coding are increasing in number and popularity. They commonly include homework assignments in which the students must write code that is evaluated by functional tests. Functional testing may to some extent be automated however provision of more qualitative evaluation and feedback may be prohibitively labor-intensive. Provision of qualitative evaluation at scale, automatically, is the subject of much research effort. In this thesis, deep learning is applied to the task of performing automatic assessment of source code, with a focus on provision of qualitative feedback. Four tasks: language modeling, detecting idiomatic code, semantic code search, and predicting variable names are considered in detail. First, deep learning models are applied to the task of language modeling source code. A comparison is made between the performance of different deep learning language models, and it is shown how language models can be used for source code auto-completion. It is also demonstrated how language models trained on source code can be used for transfer learning, providing improved performance on other tasks. Next, an analysis is made on how the language models from the previous task can be used to detect idiomatic code. It is shown that these language models are able to locate where a student has deviated from correct code idioms. These locations can be highlighted to the student in order to provide qualitative feedback. Then, results are shown on semantic code search, again comparing the performance across a variety of deep learning models. It is demonstrated how semantic code search can be used to reduce the time taken for qualitative evaluation, by automatically pairing a student submission with an instructor’s hand-written feedback. Finally, it is examined how deep learning can be used to predict variable names within source code. These models can be used in a qualitative evaluation setting where the deep learning models can be used to suggest more appropriate variable names. It is also shown that these models can even be used to predict the presence of functional errors. Novel experimental results show that: fine-tuning a pre-trained language model is an effective way to improve performance across a variety of tasks on source code, improving performance by 5% on average; pre-trained language models can be used as zero-shot learners across a variety of tasks, with the zero-shot performance of some architectures outperforming the fine-tuned performance of others; and that language models can be used to detect both semantic and syntactic errors. Other novel findings include: removing the non-variable tokens within source code has negligible impact on the performance of models, and that these remaining tokens can be shuffled with only a minimal decrease in performance.