Cleanup Docker Containers immediately for samples with errors #706
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
For the Docker sandbox we have a deferred cleanup logic that allows users to see console progress for cleanup. This was initially put in because cleanup could be very long (in which case users might Ctrl+C, leaking containers).
At the time there was no concept of multiple concurrent tasks and no concept of
fail_on_error
, which meant at a practical level "deferred" cleanup was really just deferring a few hundred milliseconds (because any error implied that the entire eval was ending). In the meantime, a number of things have changed:(1) We have resolved the long cleanup issues w/ a combination of
init: true
andstop_grace_period: 1s
(which means that containers terminate typically in 1 second or less)(2) We have added two independent ways that evals can continue in the presence of errors: (a)
max_tasks > 1
means that one task can fail but others keep running; and (b)fail_on_error
means that a sample can fail but the others keep running.The implication of (2) above means that our deferral can result in many containers hanging around far longer than they are needed (whereas before this could never happen b/c errors ended the entire eval).
This change modifies our setting of the
interrupted
flag -- we now only set this forasyncio.CancelledError
(which will deterministically end the entire eval). This means that when ordinary sample errors occur their container(s) will be cleaned up immediately.