Skip to content

WillemNieuwenhuis/anon_excel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

97 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Calculate paired T-test

Installation

The app is a python application and can be installed with pip:

pip install anon_excel-1.0.0-py3-none-any.whl

This will also install all dependencies (pandas and scipy).

Usage

usage: anonex [-h] [-a] [-c] [-o] [-s] [-t] [-x] folder

This app scans multiple sets of surveys. It offers an option to clean and store the survey data, and also an option to perform and store a T-test analysis. The T-test is only possible when both pre- and post- survey is available.
Optionally personal information is removed.

positional arguments:
  folder           Specify the folder with the excel report(s)

options:
  -h, --help       show this help message and exit
  -a, --anonymize  Anonymize personal data (default = No)
  -c, --color      Add colors in excel file with clean ranked data (default = No)
  -o, --overwrite  Overwrite existing excel outputs (default = No)
  -s, --strip      Strip leading s-char from s-number (default = No)
  -t, --ttest      Perform T-test calculation (default = No)
  -x, --clean      Save cleaned data (default = No)

The surveys

Inputs are surveys with common questions and students. The surveys are in Excel format. Surveys come in a pre-course survey and a post-course survey. The pre-course survey contains the survey result before lecture/workshop/course. The post-course survey contains survey results obtained after the lecture/workshop/course.

Only a single set of pre- and post-course surveys is expected. The filenames of pre- and post-course surveys are unique: the pre-course survey starts with Pre, the post-course surveys starts with Post. For example:

Pre-Course Survey_ Perceived Sense of Community in Blended Learning(1-89).xlsx
Post-Course Survey_ Perceived Sense of Community in Blended Learning(1-34).xlsx

Properties of the surveys:

  • Both contain student ID's and thus can identify individuals
  • The lists of students do not have to be identical
  • The lists of questions do not have to be the same
  • The answers of the questions are categorical: (Strongly agree (SA), Agree (A), Neutral (N), Disagree (D), Strongly Disagree (SD)

Ranking table

An additional input is the translation table to link the categorical survey values to numerical values. The location is assumed to be in the same folder as the survey data files. For now the name of this ranking table is fixed to:

Scoring.xlsx

[!Note] The ranking table is unique for each set of surveys.

Application requirements

The application can generate multiple outputs: both cleaned data and T-test are optional. When a T-test is calculated cleaning will also be run, but saving the cleaned data is still optional. Cleaned data excel file:

  • Output of cleanup up data for both pre-survey and post-survey data
  • Optionally anonymized student information (student number)
  • Optionally records are color-coded:
    • green record: appears in both survey results
    • purple/blue record: appears only in pre-survey results
    • red record: appears only in post-survey results

Analysis output excel file:

  • Paired T-test results, for both dimensions: questions and students
  • Descriptive statistics for the questions
  • Table with numerical rankings per question for all students: one table with the pre-survey questions, one with the post-survey questions and a third one with the questions exclusive to the post-survey Only questions common to pre- and post- survey are included in the first two tables.

Approach

  1. Find sets of surveys, if none found stop.
  2. Optionally encrypt the student data to remove possibility of identification
  3. Recode the categorical ranking into numerical values
  4. Clean the survey data; for cleaning only post-survey data is not required
  5. Optionally save the cleanup data. Data will be stored in subfolder cleaned
  6. App is finished if no T-test is specified
  7. Extract only the common questions from both surveys
  8. Filter the data to only use data from students participating in both surveys
  9. Calculate the T-test for each question
  10. Calculate the T-test for each student
  11. Save the result to the subfolder analysis

Anonymize student ID's

Turn ID codes in a unique code. This is done with a hashing function called "blake2b". This is a stable hashing function to guarantuee that the hashcode will be the same for the same student each time, as well as unique.

The hashed code will be only be visible in the optional cleaned data, and when the anonymize command line parameter is set appropriately. In the analysis result the anonymized ID's are replaced with human readable ID's: student_nn; all other identifyable data is absent from the analysis results.

The user can control how data is anonymized with the -a option:

  • if not specified: no anomyzation is stored in the output
  • if -a is specified: anonymization is applied on the student ID column and an additional column is added, the original ID column is retained.
  • if -aa is specified: anonymization is applied on the student ID column, an additional column is added, the original ID column is removed

Example command lines

Display help text anonex -h

Clean pre-survey data, removing letters from student numbers anonex -s -x <survey data folder>

Clean pre-survey data, removing non-digits from student numbers and anonymizing student numbers, overwriting any existing previous results. anonex -s -x -aa -o <survey data folder>

Clean pre-survey data, removing non-digits from student numbers and anonymizing student numbers, performing t-test analysis, overwriting any existing previous results. anonex -s -x -t -aa -o <survey data folder>

About

anonimize participants

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages