Error in harmonizationApply due to site differences between training and testing data #50
Comments
Thanks for submitting this issue. It's crucial to know whether your two testing sites are included within the nine training sites, or if the sites are entirely disjoint between training and testing.
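The check asked about here can be done before calling `harmonizationApply`. The following is a minimal sketch in plain Python, not part of the package: the helper name `compare_sites` and the site labels are hypothetical, and it assumes the site labels for each dataset are available as simple lists (for example, taken from the site column of the covariates).

```python
def compare_sites(train_sites, test_sites):
    """Classify how the testing sites relate to the training sites.

    Returns a (relation, unseen) pair, where relation is "subset",
    "partial", or "disjoint", and unseen lists the testing sites that
    never appear in the training data.
    """
    train, test = set(train_sites), set(test_sites)
    unseen = sorted(test - train)
    if not unseen:
        relation = "subset"    # every testing site was seen in training
    elif len(unseen) == len(test):
        relation = "disjoint"  # no testing site was seen in training
    else:
        relation = "partial"   # some testing sites were seen, some were not
    return relation, unseen


# Hypothetical labels: nine training sites and two different testing sites.
train_sites = [f"site_{i}" for i in range(1, 10)]
test_sites = ["site_10", "site_11"]

relation, unseen = compare_sites(train_sites, test_sites)
print(relation, unseen)
```

Anything other than `"subset"` means the testing data contains sites the trained model has never seen.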
Hi there, after that I get the error mentioned above. I found the line below in the documentation, which largely explains why I am getting the error. However, I would like to know what a feasible way to perform data harmonisation would be in my case: do I need to concatenate both files and harmonise them together, or harmonise them separately?
This method is not designed to harmonize sites that are not part of the training data. That said, you should be able to fix the error by including all sites in the training and testing sets. Typically, users will designate a subset of their data (i.e., healthy controls) to train the harmonization model. Then, they will apply the model to all data (i.e., patients and healthy controls). If some of your sites only contain patients, then unfortunately this method will not be suitable. This is a known limitation of statistical harmonization methods and I am not aware of a method that appropriately addresses this situation. I am going to close this issue, but please feel free to re-open it if you have further questions.
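The workflow described above (train on healthy controls, apply to everyone) can be sketched in plain Python as follows. This only illustrates the bookkeeping: the helper name `split_for_harmonization`, the `group` field, and the example records are all hypothetical, and the `SITE` key is assumed to mirror the site column of the covariates. The sketch also flags sites that contain only patients, which is the case the method cannot handle.

```python
from collections import defaultdict


def split_for_harmonization(records):
    """Split records into a training subset and the full set to apply to.

    records: list of dicts with a 'SITE' label and a 'group' that is
    either 'control' or 'patient' (hypothetical field names).
    Returns (train_rows, apply_rows, patient_only_sites).
    """
    groups_by_site = defaultdict(set)
    for row in records:
        groups_by_site[row["SITE"]].add(row["group"])

    # Sites with no healthy controls cannot contribute to training.
    patient_only_sites = sorted(
        site for site, groups in groups_by_site.items()
        if "control" not in groups
    )

    train_rows = [row for row in records if row["group"] == "control"]
    apply_rows = list(records)  # the model is applied to all data
    return train_rows, apply_rows, patient_only_sites


# Hypothetical data: site C contains patients only.
records = [
    {"SITE": "A", "group": "control"},
    {"SITE": "A", "group": "patient"},
    {"SITE": "B", "group": "control"},
    {"SITE": "C", "group": "patient"},
]

train_rows, apply_rows, patient_only_sites = split_for_harmonization(records)
print(len(train_rows), len(apply_rows), patient_only_sites)
```

Under this scheme the features for `train_rows` would go to `harmonizationLearn` and those for `apply_rows` to `harmonizationApply`. A non-empty `patient_only_sites` list means those sites are absent from training and the approach does not fit.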
Hey, thanks for the awesome work.
Please supply more details. What are the dimensions of the training data and testing data? Also, what are the dimensions of the training and testing covariates? Including your code may help diagnose the issue.
Hey there,
Thank you for your exciting work.
I recently used this package for my work, and I am a little confused about how to use it and whether the process I followed is correct. I am doing a regression analysis, and the prediction target is age. I collected healthy-control data from nine different sites, and we have patient data from two different sites.
combat_model, features_train_combat = harmonizationLearn(features_train, covariates_train) # smooth_terms=['Age']
When I apply the combat_model returned by the call above to the testing data, I get an error:
features_test_combat = harmonizationApply(features_test, covariates_test, combat_model)
ERROR:
IndexError: index 4 is out of bounds for axis 1 with size 4
I think I am getting this error because of the site difference between the training and testing data: the training sample has nine sites, and the testing data has two sites. So my next question is, do I need to apply harmonization separately to the training and testing datasets?
I could not find any information about how to proceed or whether we should apply harmonization to all (training and testing) at once or apply it separately.
Can you suggest what I should do?
Thank you!