Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task
Code for the program classification algorithms described in the paper "Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task" [1].
- Install Docker CE and GNU make.
- Clone the repository, then clone the submodules using
git submodule update --init --recursive
- Download the dataset [2] from Zenodo and extract the
task-*.csv
files into src/data.
- Classification targets can contain digits, so navigate to
external/code2vec/common.py
and apply the patch:
  @staticmethod
  def legal_method_names_checker(special_words, name):
-     return name != special_words.OOV and re.match(r'^[a-zA-Z|]+$', name)
+     return name != special_words.OOV
- Run
make notebook
from the repository root, then run the notebooks.
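To illustrate why the patch to external/code2vec/common.py is needed, the sketch below compares the original letters-only check with the patched one. It is a standalone approximation, not the code2vec implementation: the OOV value, the helper names original_check and patched_check, and the sample labels are hypothetical, and the original check is wrapped to return a plain boolean.

```python
import re

# Pattern taken from the original check in external/code2vec/common.py.
LETTERS_ONLY = re.compile(r'^[a-zA-Z|]+$')

# Hypothetical stand-in for special_words.OOV; the real value may differ.
OOV = "<OOV>"


def original_check(name, oov=OOV):
    """Approximates the unpatched check: letters and '|' only, and not OOV."""
    return name != oov and LETTERS_ONLY.match(name) is not None


def patched_check(name, oov=OOV):
    """Approximates the patched check: anything except the OOV token."""
    return name != oov


# Hypothetical classification targets; labels with digits fail the original check.
labels = ["sort|array", "task7", "3", OOV]
for label in labels:
    print(f"{label!r}: original={original_check(label)}, patched={patched_check(label)}")
```

With the unpatched check, a target such as "task7" or "3" would be treated as illegal, which is presumably why the regular expression is removed for this dataset.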
- [1] Gorchakov, A.V.; Demidova, L.A.; Sovietov, P.N. Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task. Future Internet 2023, 15, 314.
- [2] Demidova, L.A.; Andrianova, E.G.; Sovietov, P.N.; Gorchakov, A.V. Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant. Data 2023, 8(6), 109.
If you use the code available in this repository in your research work, please consider citing our paper [1] published in Future Internet:
Gorchakov, A.V.; Demidova, L.A.; Sovietov, P.N. Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task. Future Internet 2023, 15, 314. https://doi.org/10.3390/fi15090314