Code Preparation for Machine Learning

Summary

Converting code into vectors or sequences

Abstract

Machine learning for images, text, and audio has become popular and relatively mainstream. Machine learning for code, by contrast, is a rather new field: only a few commercial products are available, and the research is still in its early stages. Anyone trying to join this field needs to explore many different topics.

This paper aims to bring a software engineer or programming language researcher up to speed on the current state of machine learning and to show what such technologies make possible with respect to code. It covers code2vec, code2seq, CuBERT, CoCluBERT, CodeBERT, TreeBERT, and DeepBugs AST Context Representations with their respective backgrounds, lists tools, and points to available further research. It also covers a few use cases and gives a practical example that runs through the whole paper.
