🚧 UNDER CONSTRUCTION 🚧
nanoGPTxCodeCompletion is a GitHub repository inspired by Andrej Karpathy's nanoGPT and minGPT. The goal of this project is to explore a small version of the GPT (Generative Pre-trained Transformer) architecture for code completion, both at the token level and in-line.
This repository builds on the concepts and techniques presented in the nanoGPT and minGPT repositories. nanoGPT is a simple, fast repository for training and finetuning medium-sized GPT models that prioritizes efficiency and effectiveness, whereas minGPT is a PyTorch re-implementation of GPT that aims to be small, clean, interpretable, and educational.
Additionally, this project draws on CodeXGLUE, a benchmark dataset and open challenge introduced by researchers from Microsoft Research Asia, Developer Division, and Bing. CodeXGLUE provides a collection of code intelligence tasks and a platform for model evaluation and comparison. One of these tasks is code completion, which is essential for productivity in software development.
The code completion task aims to predict the next code token based on the context of previous tokens. In this repository, we specifically focus on token-level code completion, which is analogous to language modeling. Models should be capable of predicting the next token across various types of code.
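To make the analogy with language modeling concrete, here is a minimal sketch of next-token prediction on code. It is not this repository's implementation: the whitespace tokenizer, the `make_examples` helper, the `BigramCompleter` class (a toy stand-in for a GPT model), and the `token_accuracy` function are all hypothetical names introduced only for illustration.

```python
from collections import Counter, defaultdict


def make_examples(tokens, context_size):
    """Build (context, target) pairs: each target is the token that follows its context."""
    examples = []
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - context_size):i])
        examples.append((context, tokens[i]))
    return examples


class BigramCompleter:
    """Toy stand-in for a GPT (assumption): predicts the next token from the last token only."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, tokens):
        # Count how often each token follows each preceding token.
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, context):
        # Return the most frequent follower of the last context token, if any was seen.
        if not context or context[-1] not in self.counts:
            return None
        return self.counts[context[-1]].most_common(1)[0][0]


def token_accuracy(model, examples):
    """Fraction of examples where the predicted next token equals the target."""
    correct = sum(model.predict(ctx) == tgt for ctx, tgt in examples)
    return correct / len(examples)


# Naive whitespace tokenization of a code snippet (a real setup would use a proper tokenizer).
code = "for i in range ( n ) : print ( i )"
tokens = code.split()

examples = make_examples(tokens, context_size=3)
model = BigramCompleter()
model.fit(tokens)

print(examples[3])                      # a context of three tokens and its target
print(model.predict(("in", "range")))   # most frequent token seen after "range"
print(token_accuracy(model, examples))  # accuracy on the training snippet itself
```

A GPT-style model replaces the bigram counts with a Transformer that conditions on the whole context window, but the training signal is the same: given the previous tokens, predict the next one.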