Split ingestion pipeline #61
Correct some of the list/non list codes in tests
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #61 +/- ##
==========================================
+ Coverage 97.58% 97.70% +0.12%
==========================================
Files 39 40 +1
Lines 1779 1962 +183
==========================================
+ Hits 1736 1917 +181
- Misses 43 45 +2
View full report in Codecov by Sentry.
Looks good, just a few minor comments!
Splits the ingestion pipeline into two parts: fhirflat file creation and validation.
The initial file creation doesn't pack/unpack any pydantic FHIR resource classes, while validation tries to create resources but doesn't then convert them back into fhirflat files. Combined, this provides a significant speedup even when file creation and validation are run back to back to mimic the previous `ingest_to_flat` behaviour.
Previous speeds on a sample dataset:
New speeds doing ingestion + validation together (no parallel processing implemented):
Fixes #54
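For readers skimming the description, the split can be pictured with a minimal sketch. This is not fhirflat's actual code or API: `Patient`, `create_flat_rows`, and `validate_rows` are hypothetical names, and a plain dataclass stands in for a pydantic FHIR resource class so the example runs with the standard library only.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a pydantic FHIR resource class (assumption,
# not part of fhirflat). Validation happens when the object is constructed.
@dataclass
class Patient:
    id: str
    gender: str

    def __post_init__(self):
        if self.gender not in {"male", "female", "other", "unknown"}:
            raise ValueError(f"invalid gender: {self.gender!r}")


def create_flat_rows(raw_records):
    """Stage 1: map raw records to flat rows without building resource objects."""
    return [
        {"id": str(r["subject"]), "gender": r["sex"].lower()}
        for r in raw_records
    ]


def validate_rows(rows):
    """Stage 2: construct a resource per row only to check it.

    The resource is discarded afterwards; it is never converted back
    into a flat row.
    """
    valid, errors = [], []
    for row in rows:
        try:
            Patient(**row)
        except (TypeError, ValueError) as exc:
            errors.append((row, str(exc)))
        else:
            valid.append(row)
    return valid, errors


# Illustrative input only; not the sample dataset mentioned above.
raw = [{"subject": 1, "sex": "Female"}, {"subject": 2, "sex": "n/a"}]
rows = create_flat_rows(raw)
valid, errors = validate_rows(rows)
```

Under these assumptions, stage 1 never pays the cost of constructing resource objects, and stage 2 constructs them once without serialising them again, which is the round trip the description says the old combined path performed.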