PandaStats

Technology Exploration

I explored methods to integrate graphical user interfaces with code in Jupyter Notebook. Specifically, I explored the jupyter-widgets library and documentation on developing custom widgets.

Project Purpose

Data scientists often analyze their data and run statistical tests in Python and Jupyter Notebook. Running many statistical tests in Python in an organized fashion, however, can be challenging. When using existing statistics packages, it is easy to ignore important parameters or input incorrect data, especially when testing many different hypotheses.

Alternatively, GUI-based statistics software such as SPSS is easy to use and forces the user to deliberately select the parameters important to each statistical test, leading to fewer errors. However, such software is rarely used in practice because it is inconvenient to run statistical tests in a third-party tool when most analyses are done in Python.

Here, I demonstrate a prototype for PandaStats, a GUI-based widget for conducting statistical analyses in Jupyter Notebook. PandaStats has the potential to reduce errors in hypothesis testing by forcing users to deliberately select parameters and input variables via a GUI. PandaStats can be easily integrated with existing analysis code because it is a widget that runs in the Jupyter Notebook environment.

Technology Fit

This prototype works for a limited set of use cases and parameters, described here. Specifically, it supports five statistical tests: one-way ANOVAs, independent-samples t-tests, simple linear regressions, correlations, and pairwise Wilcoxon signed-rank tests. A more complete version of this idea would support more tests and provide more robust parameter options for each test. Input validation has not been implemented, so incorrect inputs may raise value errors.
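For orientation, the five tests listed above each correspond to a function in pingouin. The sketch below is not the PandaStats source. It is a minimal illustration, assuming the data sits in a pandas DataFrame with one column per variable. The names `SUPPORTED_TESTS` and `run_test`, and the short test keys, are hypothetical. It also shows the kind of basic input check the prototype does not yet perform.

```python
# Hypothetical helper, not part of PandaStats: maps a test name to a pingouin call.
SUPPORTED_TESTS = ("anova", "ttest", "regression", "correlation", "wilcoxon")


def run_test(test, df, x, y, **options):
    """Run one statistical test on columns x and y of a pandas DataFrame.

    For "anova", x is the grouping column and y is the dependent variable.
    For the other tests, x and y are two numeric columns.
    Extra keyword options are passed through to pingouin unchanged.
    """
    # Basic validation, done before any statistics code runs.
    if test not in SUPPORTED_TESTS:
        raise ValueError(f"Unsupported test {test!r}; choose from {SUPPORTED_TESTS}")
    missing = [col for col in (x, y) if col not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in data: {missing}")

    # Imported lazily so the validation above works even without pingouin.
    import pingouin as pg

    if test == "anova":
        return pg.anova(data=df, dv=y, between=x)
    if test == "ttest":
        # pingouin treats the samples as independent unless paired=True is passed.
        return pg.ttest(df[x], df[y], **options)
    if test == "regression":
        return pg.linear_regression(df[[x]], df[y], **options)
    if test == "correlation":
        # Pass method="spearman" in options for a rank correlation.
        return pg.corr(df[x], df[y], **options)
    return pg.wilcoxon(df[x], df[y], **options)
```

With a DataFrame `df` that has numeric columns `a` and `b`, a call such as `run_test("correlation", df, "a", "b", method="spearman")` would return pingouin's results table. A GUI like PandaStats can fill in the same arguments from dropdown selections instead of hand-typed strings.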

To test the prototype code, see the Jupyter notebook panda_stats_test.ipynb. It requires a Python version above 3.10, with pandas, pingouin, and ipywidgets as dependencies.
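Before opening the notebook, it can help to confirm that the environment meets these requirements. The snippet below is a small sketch that uses only the standard library. The function names are hypothetical, and it assumes that Python 3.10 itself is acceptable, which the requirement above does not state explicitly.

```python
import importlib.util
import sys

REQUIRED_PACKAGES = ["pandas", "pingouin", "ipywidgets"]


def python_ok(min_version=(3, 10)):
    """Return True if the running interpreter is at least min_version.

    Assumption: 3.10 itself counts as new enough; raise min_version if not.
    """
    return sys.version_info >= min_version


def missing_dependencies(packages=REQUIRED_PACKAGES):
    """Return the packages from the list that cannot be found for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if not python_ok():
    print("Python version is older than expected:", sys.version.split()[0])
for name in missing_dependencies():
    print("Missing dependency:", name)
```

If nothing is printed, the interpreter version and all three packages were found. Any missing package can then be installed with the package manager already used for the environment.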

Specific test cases are described in the notebook and below.

PandaStats Use Cases

Examples of the PandaStats display and results of running different statistical tests for two different datasets are shown below.

These test cases are fully described in the associated Jupyter notebook.

Dataset 1 - PandaStats Display Before Running Tests

Dataset 1

Dataset 1 - t-test

Dataset 1

Dataset 1 - regression

Dataset 1

Dataset 1 - wilcoxon

Dataset 1

Dataset 2 - PandaStats Display Before Running Tests

Dataset 2

Dataset 2 - ANOVA

Dataset 2

Dataset 2 - Spearman Correlation

Dataset 2

Dataset 2 - Pearson Correlation

Dataset 2

Author

Ethan Trepka

Acknowledgments

jupyter-widgets used for widget display, documentation linked here.

pingouin used for statistics, documentation linked here.

pandas used for data types, documentation linked here.