feat: finemapping template and DAG for UKB PPP #10
99e0990 to 300425b
@project-defiant This has been quite substantially rewritten compared to the first draft. Could you do another round of reviews, please?
I think this is a good setup for me to use in the gwas_catalog ETL step (and in other steps that require finemapping).
    **common.shared_dag_kwargs,
) as dag:
    (
        FinemappingBatchOperator.partial(
The partial will not work beyond the threshold. I tested it on a local Airflow DAG, and it breaks at around 5k partial tasks, even with the threshold increased.
Indeed, but just to clarify: in this PR the partial/expand routine iterates not over individual loci (of which there could be hundreds of thousands in the worst case), but over chunks of the manifest, of which there are fewer than 10 in either case.
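To illustrate the point about chunking, here is a minimal sketch in plain Python of how a large list of loci can be collapsed into a handful of chunks, so that a partial/expand routine would map over the chunks rather than over each locus. The function name `chunk_manifest`, the `max_chunks` parameter, and the locus names are hypothetical and not taken from this PR; the actual manifest format and chunking logic may differ.

```python
from math import ceil


def chunk_manifest(loci: list[str], max_chunks: int = 8) -> list[list[str]]:
    """Split a list of loci into at most `max_chunks` contiguous chunks.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    if max_chunks < 1:
        raise ValueError("max_chunks must be at least 1")
    if not loci:
        return []
    # Round up so that the number of chunks never exceeds max_chunks.
    chunk_size = ceil(len(loci) / max_chunks)
    return [loci[i : i + chunk_size] for i in range(0, len(loci), chunk_size)]


# 100,000 made-up loci become 8 chunks, so only 8 mapped tasks would be needed.
loci = [f"locus_{i}" for i in range(100_000)]
chunks = chunk_manifest(loci, max_chunks=8)
print(len(chunks), len(chunks[0]))
```

With this kind of grouping, the number of mapped tasks depends on the chunk count rather than on the number of loci, which is why the limit on mapped tasks would not be reached.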
* feat: template for creating finemapping jobs
* feat: example DAG for creating finemapping jobs
* fix: quote parameters containing = for Hydra
* chore: add GENTROPY_DOCKER_IMAGE to common layer
* feat: always use a list of jobs in the DAG
* refactor: use manifest as input
* feat: implement generate_manifests_for_finemapping
* refactor: rewrite the DAG to use new functions
* fix: import errors in DAG
* fix: multiple fixes following test runs
The idea is to have a common finemapping template, which specific DAGs can reuse and modify according to their needs.