-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAG-based mapping and execution #31
Conversation
This commit also adds a CLI flag to just print out the map
…anywhere just works
Unclear on best terminology for the "soft inputs" that aggregate stages can now take. Currently it's |
There are cases where the same stage might get called multiple times with the same arguments - the DAG will only see that it doesn't have the cached values at first, and since it doesn't update as the experiment is running, it will think every subsequent call of that stage will still need to execute. By adding in the cache check even when the DAG things the stage should execute, we ensure that it can still use cached values from earlier in the experiment and skip execution.
It seems that if the root logger has a level set to INFO and there are no handlers, it will default to using a handler anyway. I resolved this by setting the logging level to error if quiet is set. Fixes #33
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nathan and I discussed briefly at a high level and this approach seems sound to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no problems with the overall approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just a few minor comments/questions and some code suggestions.
After discussion with Mark, we figured that a better way of handling The idea here would be to pass in the state objects via dictionaries that are keyed by the associated records in the passed records list. Records that don't have the specified name in state would simply not be part of the argument-specific dictionary. As an example, where a previous aggregate stage might have looked like: @aggregate(outputs=["final_results"])
def combine_results(record: Record, records: list[Record]):
final_results = {}
for record in records:
if "results" in record.state:
final_results[record.args.name] = record.state["results"]
return final_results could now become: @aggregate(inputs=["results"], outputs=["final_results"])
def combine_results(record: Record, records: list[Record], results: dict[Record, float]):
final_results = {}
for record, result in results.items():
final_results[record.args.name] = result
return final_results This eliminates the need to directly access state on records and simplifies having to check whether a given state has the artifact at all, while retaining the benefits above from adding in an |
Closes #29
The intent of this PR is to make curifactory primarily determine need-to-execute for each stage based on the DAG and where that stage's outputs are needed (based on cache status of later stage outputs and overwrite conditions etc.)
This allows skipping loading an artifact into memory if it won't ever be needed by an executed stage