Sparse table support - new feature #71
Comments
Hi, thanks for reporting this. I wouldn't change the default behavior, but I am keen on adding an option to return a long format dataset instead. Let me have a look into it!
Hi again @SoccerGeekPhD, I experimented a little bit on this and can share the following.
I was wondering, do you have any specific benchmarks on the performance of the package that you can share here? I will close this issue for now, but feel free to reopen it if you have more details to share.

Alternatively, I would recommend checking out the {icd} package (currently only available on GitHub), which uses sparse matrix multiplication and compiled code under the hood; see e.g. this draft paper. Note that the {comorbidity} package is much faster now, so the benchmarks in the article are no longer accurate.

I hope this helps,
Alessandro
The wide data frame returned by comorbidity may consume a lot of memory and does not match the input format.
Would changing the output to a sparse format of two columns {ID, Name} just like the input to the function be helpful for memory and performance?
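To make the proposal concrete, here is a minimal sketch in plain Python (not the package's R code) of turning a wide result into two-column {ID, Name} pairs. The `wide` rows, the column names `chf`, `diab`, `canc`, and the helper `to_long` are all hypothetical and are not part of comorbidity's API.

```python
# Hypothetical wide output: one row per patient, one 0/1 column per comorbidity.
wide = [
    {"id": "p1", "chf": 1, "diab": 0, "canc": 0},
    {"id": "p2", "chf": 0, "diab": 0, "canc": 0},
    {"id": "p3", "chf": 1, "diab": 1, "canc": 0},
]


def to_long(rows, id_col="id"):
    """Keep only the (id, name) pairs whose flag is set."""
    return [
        (row[id_col], name)
        for row in rows
        for name, flag in row.items()
        if name != id_col and flag
    ]


long_pairs = to_long(wide)
print(long_pairs)
```

One trade-off of the narrow format: a patient with no flagged comorbidity (here `p2`) has no row at all, so the full list of IDs would need to be kept separately if those patients matter downstream.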
For example, my data science team at a large insurance company works with claims that each carry up to 9 ICD10 codes and a specific service date. Each claim is pivoted to {patientID, ICD10, date, ...} and then filtered to unique {patientID, ICD10} pairs over a time frame to create the input to comorbidity(). So could you keep this narrow format for the output?
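The preprocessing described above could look roughly like the following plain-Python sketch. The `claims` records, their field names, and the helper `claims_to_input` are hypothetical illustrations, not the team's actual pipeline.

```python
from datetime import date

# Hypothetical claims: a patient, a service date, and up to 9 ICD-10 codes each.
claims = [
    {"patient_id": "p1", "service_date": date(2020, 1, 5), "icd10": ["I50", "E119"]},
    {"patient_id": "p1", "service_date": date(2020, 3, 2), "icd10": ["I50"]},
    {"patient_id": "p2", "service_date": date(2019, 6, 1), "icd10": ["C50"]},
]


def claims_to_input(claims, start, end):
    """Pivot claims to one row per code, keep dates in [start, end], deduplicate."""
    seen = set()
    rows = []
    for claim in claims:
        if not (start <= claim["service_date"] <= end):
            continue  # outside the time frame
        for code in claim["icd10"]:
            key = (claim["patient_id"], code)
            if key not in seen:  # keep the first occurrence of each pair only
                seen.add(key)
                rows.append(key)
    return rows


rows = claims_to_input(claims, date(2020, 1, 1), date(2020, 12, 31))
print(rows)
```

The result is already in the narrow {patientID, ICD10} shape, which is why a matching narrow output would slot back into the same pipeline without a reshape step.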