Code for analysis in the manuscript entitled "Development of a Simplified Smell Test to Identify Patients with Typical Parkinson’s as Informed by Multiple Cohorts, Machine Learning and External Validation".
medRxiv preprint doi: https://doi.org/10.1101/2024.08.09.24311696
Datasets of the 3 cohorts used in this study can be accessed via Zenodo (link to be updated) upon request.
Note: after downloading the data files from Zenodo, please place them in your working directory.
- Data dictionary.xlsx
- Ottawa_cut.csv: Ottawa (PREDIGT) Trial (baseline)
- PROBE_cut.csv: Prognostic Biomarkers in Parkinson’s Disease Study (PROBE; baseline)
- DeNoPa_cut.csv: “De Novo Parkinson disease study” (DeNoPa; baseline, 48-month, and 72-month follow-up visits)
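Before running any .qmd file, you can confirm that the downloaded files are where the code expects them. The following is a minimal sketch, assuming R's working directory is the folder that holds the data files. check_data_files() is a hypothetical helper and is not part of this repo.

```r
# Hypothetical helper (not part of this repo): report which of the expected
# Zenodo data files are missing from the current working directory.
check_data_files <- function(files = c("Ottawa_cut.csv",
                                       "PROBE_cut.csv",
                                       "DeNoPa_cut.csv")) {
  missing <- files[!file.exists(files)]
  if (length(missing) > 0) {
    message("Missing from ", getwd(), ": ", paste(missing, collapse = ", "))
  }
  invisible(missing)  # character(0) when everything is in place
}

missing_files <- check_data_files()
```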
Note: the publicly accessible data files have gone through further de-identification steps to protect the identities of all study participants. If you wish to reproduce all results in the manuscript, please contact the corresponding authors or the cohort PIs for access to the original data sets. The de-identification steps are:
- Transform continuous age to categorical age_cat: this affects the results of Table 1, Supplemental Table 3, and Figure 7(b) in the medRxiv version.
- Remove disease.duration: this affects the results of Table 1.
- Remove race and ethnic: these variables were not directly used to generate any results in the manuscript, but were used to describe the race/ethnicity characteristics of the study cohorts.
Included in the Supportive data folder in this repo:
- diagnosis colors.csv: colors representing each diagnostic group, used for plotting.
- Published rankings.csv: 8 previously published scent rankings, 4 for SST-ID and 4 for UPSIT.
- scent_shared.csv: the 11 scents shared by SST-ID and UPSIT. (Note: this file is generated by 4. Combine UPSIT and SST-ID.qmd, but it is also required by 1. Tables, distributions, ROCs.qmd for plotting, so I included it here to avoid errors when running the code.)
- SST_ID_option.csv: options provided for each SST-ID scent, used for generating ICCs.
- UPSIT_key.csv: answer keys of UPSIT.
- UPSIT_option.csv: options provided for each UPSIT scent, used for generating ICCs.
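Published rankings.csv holds earlier scent rankings that this study's rankings are compared against (Figure 4). Suppose each ranking is stored as ranks keyed by scent name. One common way to quantify agreement is then Spearman's rank correlation over the scents that both rankings contain. This is an illustrative sketch only. rank_agreement() and the toy rankings are hypothetical, and the .qmd files may compare rankings differently.

```r
# Hypothetical sketch: Spearman correlation between two scent rankings,
# restricted to the scents present in both.
# ours, published: named numeric vectors of ranks (names = scent names)
rank_agreement <- function(ours, published) {
  common <- intersect(names(ours), names(published))
  cor(ours[common], published[common], method = "spearman")
}

# Toy rankings (made up for illustration)
ours <- c(orange = 1, peppermint = 2, banana = 3, cloves = 4)
published <- c(peppermint = 1, orange = 2, banana = 3, cloves = 4, fish = 5)
rank_agreement(ours, published)  # 0.8
```

Scents missing from either ranking (here, fish) are simply dropped before correlating.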
Important: This project used TestGardener version 3.2.6. TestGardener recently had a major update (version 3.3.3) that includes several changes incompatible with the code here. We will update the repo to accommodate these changes; until then, please use TestGardener version 3.2.6 to run this code. You can find it here.
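The following is a minimal sketch for checking the installed TestGardener version before rendering the .qmd files. tg_version_ok() is a hypothetical helper. The install command in the message assumes the 3.2.6 release is reachable from your CRAN mirror's archive through remotes::install_version(). If it is not, use the copy linked in the note above.

```r
# Version of TestGardener this code was written against (from the note above).
required_tg <- "3.2.6"

# Hypothetical helper: TRUE only if TestGardener is installed AND its
# version equals `required`.
tg_version_ok <- function(required = required_tg) {
  if (!requireNamespace("TestGardener", quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion("TestGardener") == required
}

if (!tg_version_ok()) {
  message("This code expects TestGardener ", required_tg, ". ",
          "If the archived release is available from your CRAN mirror, try: ",
          "remotes::install_version(\"TestGardener\", version = \"",
          required_tg, "\")")
}
```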
1. Tables, distributions, ROCs.qmd
- Code for:
  - Table 1: demographic and diagnostic characteristics of the 3 cohorts
  - Table 2: AUC, sensitivity, and specificity of SST-ID and UPSIT
  - Supplemental Table 2: relationship of SST-ID/UPSIT scores with age, sex, and diagnostic group
  - Figure 2: distributions and AUC values of SST-ID and UPSIT
  - Supplemental Figure 1: distributions and AUC values of SST-TH and SST-DS
- Need to read:
  - Ottawa_cut.csv
  - PROBE_cut.csv
  - DeNoPa_cut.csv
  - Supportive data/diagnosis colors.csv
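Table 2 and Figure 2 summarize discrimination with AUC, sensitivity, and specificity. For reference, this is a minimal base-R sketch of these quantities for a total smell score. It assumes that lower scores indicate disease and that scores at or below a chosen cutoff are classified as patients. auc_mw() and sens_spec() are hypothetical helpers, and the toy data are made up. The .qmd file may use a different implementation.

```r
# Hypothetical helpers, not code from this repo.
# score:   numeric smell-test score per participant
# is_case: logical, TRUE for patients, FALSE for healthy controls
# Assumes LOWER scores indicate disease.

# Rank-based (Mann-Whitney) AUC: probability that a randomly chosen control
# scores higher than a randomly chosen patient, with ties counted as 1/2.
auc_mw <- function(score, is_case) {
  r <- rank(score)  # mid-ranks for ties
  n_ctrl <- sum(!is_case)
  n_case <- sum(is_case)
  u_ctrl <- sum(r[!is_case]) - n_ctrl * (n_ctrl + 1) / 2  # Mann-Whitney U
  u_ctrl / (n_ctrl * n_case)
}

# Sensitivity and specificity when scores <= cutoff are classified as patients.
sens_spec <- function(score, is_case, cutoff) {
  pred_case <- score <= cutoff
  c(sensitivity = mean(pred_case[is_case]),
    specificity = mean(!pred_case[!is_case]))
}

# Toy data (made up): 3 controls followed by 3 patients
score <- c(12, 14, 15, 8, 9, 12)
is_case <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
auc_mw(score, is_case)        # 8.5 / 9, about 0.944
sens_spec(score, is_case, 9)  # sensitivity 2/3, specificity 1
```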
2. Abbreviated Smell Test SST-ID.qmd
- Code for:
  - Figure 3 (a,b): rankings of SST-ID scents and percentages of correct scent identification in each group
  - Supplemental Figure 2 (a): percentage differences in correct scent identification between HC and PD/DLB groups (% HC - % PD/DLB) in DeNoPa (SST-ID)
  - Figure 4 (a): comparison of SST-ID rankings from this study with 4 previously published ones
  - Figure 6 (a)-(c): validation of the SST-ID subsets
  - Figure 5, Supplemental Figure 3, Supplemental Figure 4: ICCs of SST-ID scents
- Need to read:
  - DeNoPa_cut.csv
  - Supportive data/Published rankings.csv
  - Supportive data/scent_shared.csv
  - Supportive data/SST_ID_option.csv
- Generate files: intermediate results are saved to a folder named "Generated files". Note: you need to create this folder within your working directory.
  - SST_ID_rankings.csv: rankings of SST-ID scents
  - df_sex_D_M.rds, df_sex_D_F.rds, df_sex_D.rds: relationship between scent identification and sex for SST-ID in DeNoPa
  - df_fine_D.rds, df_age_D.rds: relationship between scent identification and age for SST-ID in DeNoPa. Note: these cannot be reproduced if you are using the data from Zenodo.
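The "Generated files" folder must exist before the .qmd files write into it. You can create it from R before rendering. This is a minimal sketch, assuming R's working directory is the folder that holds the data files:

```r
# Create the folder that receives intermediate results (rankings, .rds files),
# relative to the current working directory. Skipped if it already exists.
out_dir <- "Generated files"
if (!dir.exists(out_dir)) {
  dir.create(out_dir)
}
```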
3. Abbreviated Smell Test UPSIT.qmd
- Code for:
  - Figure 3 (c,d): rankings of UPSIT scents and percentages of correct scent identification in each group
  - Supplemental Figure 2 (b): percentage differences in correct scent identification between HC and PD/DLB groups (% HC - % PD/DLB) in the Ottawa Trial (UPSIT)
  - Figure 4 (b): comparison of UPSIT rankings from this study with 4 previously published ones
  - Figure 6 (d,e): validation of the UPSIT subsets
  - Figure 5, Supplemental Figure 3, Supplemental Figure 5: ICCs of UPSIT scents
- Need to read:
  - Ottawa_cut.csv
  - PROBE_cut.csv
  - Supportive data/diagnosis colors.csv
  - Supportive data/Published rankings.csv
  - Supportive data/scent_shared.csv
  - Supportive data/UPSIT_key.csv
  - Supportive data/UPSIT_option.csv
- Generate files: intermediate results are saved to a folder named "Generated files". Note: you need to create this folder within your working directory.
  - UPSIT_rankings.csv: rankings of UPSIT scents
  - df_sex_O_M.rds, df_sex_O_F.rds, df_sex_O.rds: relationship between scent identification and sex for UPSIT in the Ottawa Trial
  - df_fine_O.rds, df_age_O.rds: relationship between scent identification and age for UPSIT in the Ottawa Trial. Note: these cannot be reproduced if you are using the data from Zenodo.
  - df_sex_P_M.rds, df_sex_P_F.rds, df_sex_P.rds: relationship between scent identification and sex for UPSIT in PROBE
  - df_fine_P.rds, df_age_P.rds: relationship between scent identification and age for UPSIT in PROBE. Note: these cannot be reproduced if you are using the data from Zenodo.
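Supplemental Figure 2 plots, for each scent, the percentage of HC participants who identified it correctly minus the percentage in the PD/DLB group. This is a minimal sketch of that difference. It assumes a 0/1 matrix of correct identifications, with one row per participant and one column per scent, and it treats PD and DLB as one pooled comparison group. pct_diff() and the toy data are hypothetical.

```r
# Hypothetical sketch: per-scent % correct in the reference group (HC) minus
# % correct in the pooled comparison group (PD and DLB), largest gap first.
# correct: 0/1 matrix, one row per participant, one column per scent
# group:   diagnosis label per participant
pct_diff <- function(correct, group, ref = "HC", cmp = c("PD", "DLB")) {
  pct_ref <- 100 * colMeans(correct[group == ref, , drop = FALSE])
  pct_cmp <- 100 * colMeans(correct[group %in% cmp, , drop = FALSE])
  sort(pct_ref - pct_cmp, decreasing = TRUE)
}

# Toy data (made up): 4 participants, 3 scents
correct <- matrix(c(1, 1, 0,
                    1, 0, 1,
                    0, 0, 1,
                    1, 0, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(NULL, c("scentA", "scentB", "scentC")))
group <- c("HC", "HC", "PD", "DLB")
pct_diff(correct, group)  # scentA 50, scentB 50, scentC 0
```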
4. Combine UPSIT and SST-ID.qmd
- Code for:
  - Table 3: rankings of the 11 scents shared by SST-ID and UPSIT
  - Table 4: performance of the 7-scent abbreviated smell test
  - Figure 7: relationship of scent identification with sex and age
- Need to read:
  - Ottawa_cut.csv
  - PROBE_cut.csv
  - DeNoPa_cut.csv
  - Supportive data/diagnosis colors.csv
  - intermediate results generated by 2. Abbreviated Smell Test SST-ID.qmd and 3. Abbreviated Smell Test UPSIT.qmd
- Generate files: intermediate results are saved to a folder named "Generated files". Note: you need to create this folder within your working directory.
  - scent_shared.csv: the 11 scents shared by SST-ID and UPSIT.
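Table 4 evaluates an abbreviated test built from a subset of scents. One straightforward way to score such a subset is to count the correctly identified scents in it and compute the AUC of that count. The sketch below illustrates this under that assumption. subset_auc() and the toy data are hypothetical, and this is not the repo's subsetAUC.R.

```r
# Hypothetical sketch (not subsetAUC.R): score a subset of scents as the number
# identified correctly, then compute its rank-based AUC.
# correct: 0/1 matrix (participants x scents) with scent names as column names
# is_case: logical, TRUE for patients; lower scores are assumed to mean disease
subset_auc <- function(correct, is_case, scents) {
  score <- rowSums(correct[, scents, drop = FALSE])  # subset score per person
  r <- rank(score)
  n_ctrl <- sum(!is_case)
  n_case <- sum(is_case)
  # Mann-Whitney U for controls divided by the number of control-patient pairs
  (sum(r[!is_case]) - n_ctrl * (n_ctrl + 1) / 2) / (n_ctrl * n_case)
}

# Toy data (made up): 3 controls followed by 3 patients, 4 scents
correct <- matrix(c(1, 1, 1, 0,
                    1, 1, 0, 1,
                    1, 0, 1, 1,
                    0, 0, 1, 0,
                    1, 0, 0, 0,
                    0, 1, 0, 1),
                  nrow = 6, byrow = TRUE,
                  dimnames = list(NULL, paste0("scent", 1:4)))
is_case <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
subset_auc(correct, is_case, c("scent1", "scent2"))  # 8 / 9, about 0.889
subset_auc(correct, is_case, colnames(correct))      # 1
```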
Supportive functions are placed in the R folder. These functions need to be sourced in 2. Abbreviated Smell Test SST-ID.qmd and 4. Combine UPSIT and SST-ID.qmd.
- itemAUC.R: calculates each scent's AUC value and ranks the scents, using cross-validation
- subsetAUC.R: calculates each subset's AUC value
- Modified TestGardener functions for generating ICCs: make.dataList.R, Wbinsmth.R, eval_surp.R, ICC.plot.R
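itemAUC.R ranks scents by AUC using cross-validation, but the exact procedure is not described in this README. The following is only one plausible sketch. Each scent's 0/1 identification is used as a single-item score. Its AUC is computed on each held-out fold of a stratified k-fold split, and scents are ranked by their mean held-out AUC. item_auc_cv() and the toy data are hypothetical.

```r
# Hypothetical sketch, not the repo's itemAUC.R: rank scents by mean held-out
# AUC across stratified k-fold cross-validation.
# correct: 0/1 matrix (participants x scents) with scent names as column names
# is_case: logical, TRUE for patients; lower scores are assumed to mean disease
item_auc_cv <- function(correct, is_case, k = 5, seed = 1) {
  # Rank-based (Mann-Whitney) AUC: P(control > patient), ties counted as 1/2
  auc <- function(score, case) {
    r <- rank(score)
    n_ctrl <- sum(!case)
    n_case <- sum(case)
    (sum(r[!case]) - n_ctrl * (n_ctrl + 1) / 2) / (n_ctrl * n_case)
  }
  set.seed(seed)  # note: resets the global RNG stream
  # Stratified folds so every held-out fold contains both groups
  # (requires at least k participants in each group)
  fold <- integer(length(is_case))
  fold[is_case] <- sample(rep(seq_len(k), length.out = sum(is_case)))
  fold[!is_case] <- sample(rep(seq_len(k), length.out = sum(!is_case)))
  # One row per scent, one column per fold (assumes at least 2 scents)
  per_fold <- sapply(seq_len(k), function(f) {
    test <- fold == f
    apply(correct[test, , drop = FALSE], 2, auc, case = is_case[test])
  })
  mean_auc <- rowMeans(per_fold)
  out <- data.frame(scent = names(mean_auc), mean_auc = unname(mean_auc))
  out <- out[order(-out$mean_auc), ]
  out$rank <- seq_len(nrow(out))
  rownames(out) <- NULL
  out
}

# Toy data (made up): 10 controls, 10 patients, 3 scents
is_case <- rep(c(FALSE, TRUE), each = 10)
correct <- cbind(scentA = as.integer(!is_case),  # perfectly separates groups
                 scentB = 1L,                     # everyone correct: no signal
                 scentC = as.integer(is_case))    # reversed signal
item_auc_cv(correct, is_case)
```

Stratifying the folds keeps every held-out fold from containing only one group, which would leave the AUC undefined.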