milica-skipina/ML-code-smell-detection

ML-code-smell-detection

This repository contains the reproducibility package for the paper "Automatic detection of Feature Envy and Data Class code smells using machine learning". In our experiments, we used the MLCQ dataset for Data Class and Feature Envy code smell detection:

Madeyski, L. and Lewowski, T., 2020. MLCQ: Industry-relevant code smell data set. In Proceedings of the Evaluation and Assessment in Software Engineering (pp. 342-347).

The MLCQ dataset is publicly available at https://zenodo.org/record/3666840#.YnOJ1ehBwuU.

Dataset

The dataset contains the code snippets from the MLCQ dataset, annotated for the presence of Feature Envy and Data Class code smells, that were available for download:

The dataset was divided into training (80%) and test (20%) sets via stratified random sampling. To obtain more reliable results, each experiment was repeated 51 times on different train-test splits (see the feature envy and data class Jupyter notebooks). These train-test splits can be found:
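The splitting procedure described above can be illustrated with a minimal, standard-library-only sketch. It is not the code used in the notebooks: the function name `stratified_split`, the toy labels, and the choice of seeding each trial with its index are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class keeps roughly
    the same proportion in both parts. Hypothetical helper."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (assumed, not from MLCQ): 1 = smell present, 0 = smell absent.
labels = [1] * 20 + [0] * 80

# One split per trial; the trial index is used as the seed (an assumption).
splits = [stratified_split(labels, 0.2, seed=trial) for trial in range(51)]
```

In practice, a library routine such as scikit-learn's `train_test_split` with the `stratify` argument would typically be used instead of a hand-written helper; the sketch only makes the idea explicit.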

Feature extraction

We extracted the following features:

Results

Jupyter notebooks evaluating the performance of all approaches:

Feature importance analysis

Jupyter notebooks presenting the most important features of models trained over 51 trials using source code metrics:
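As a rough illustration of how per-trial feature importances could be combined into a single ranking, the sketch below averages the importance of each feature across trials and sorts the result. The aggregation by mean, the function name `rank_features`, and the placeholder metric names and values are assumptions, not taken from the notebooks.

```python
from collections import defaultdict
from statistics import mean


def rank_features(importances_per_trial):
    """Average each feature's importance over all trials and return
    (feature, mean_importance) pairs, most important first."""
    collected = defaultdict(list)
    for trial in importances_per_trial:
        for name, value in trial.items():
            collected[name].append(value)
    averaged = {name: mean(values) for name, values in collected.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Placeholder importances for three trials (hypothetical values);
# the real experiments use 51 trials and actual source code metrics.
trials = [
    {"metric_a": 0.5, "metric_b": 0.25, "metric_c": 0.25},
    {"metric_a": 0.25, "metric_b": 0.5, "metric_c": 0.25},
    {"metric_a": 0.5, "metric_b": 0.25, "metric_c": 0.25},
]
ranking = rank_features(trials)
```

Averaging is only one possible choice; counting how often a feature appears among the top-ranked features per trial would be an equally plausible alternative.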
