Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw an error depending on the expiration of TemporalConfigStorage #1513

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

TrsNium
Copy link
Contributor

@TrsNium TrsNium commented Dec 23, 2020

objective

KubernetesCommandExecutor now throws an exception when the execution time of a pod exceeds the defaultPodTTL.
But we also need to consider that the pod execution time may exceed expiration of TemporalConfigStorage.
This is because pod error detection can be done early.

problem

if we continue processing when the pod's execution time exceed the expiration of TemporalConfigStorage, the curl command will raise an error when trying to upload the archive to the expired TemporaryConfigStorage at the end of the pod process.
It is inefficient to not notice error until the end of the pod process.

Fail if inTemporalConfigStorage has expired.

bashArguments.add(s("curl -s \"%s\" --output %s", url, relativeArchivePath.toString()));

Fail if outTemporalConfigStorage has expired.

bashArguments.add(s("curl -s -X PUT -T %s -L \"%s\"", outputArchivePathName, url));

changes

Make throwing an error depending on the expiration of TemporalConfigStorage.

reopened #1359

@TrsNium TrsNium changed the title Throw an error depending on the expiration date of TemporalConfigStorage Throw an error depending on the expiration of TemporalConfigStorage Dec 23, 2020
@yoyama yoyama self-requested a review December 23, 2020 07:36
@yoyama yoyama added this to the v0.10.0 milestone Dec 23, 2020
@TrsNium TrsNium requested a review from yoyama January 4, 2021 11:20
@yoyama yoyama modified the milestones: v0.10.0, v0.10.1 Jan 26, 2021
@yoyama yoyama requested a review from szyn February 10, 2021 06:22
@yoyama
Copy link
Contributor

yoyama commented Feb 10, 2021

@szyn Could you review this PR?
LGTM for me.

@@ -276,7 +293,7 @@ CommandStatus getCommandStatusFromKubernetes(final CommandContext context,
final InputStream in = outConfigStorage.getContentInputStream(outputArchiveKey);
ProjectArchives.extractTarArchive(context.getLocalProjectPath(), in); // runtime exception
}
else if (defaultPodTTL.isPresent() && isRunningLongerThanTTL(previousStatusJson)) {
else if (isRunningLongerThanOutConfigStorageExpiration(previousStatusJson) || (defaultPodTTL.isPresent() && isRunningLongerThanTTL(previousStatusJson))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you are using a non-blocking operator such as py>. In this case, IIRC, (defaultPodTTL.isPresent() && isRunningLongerThanTTL(previousStatusJson)) will never become true. In addtion, this block will not execute when user/WorkflowExecutionTimeoutEnforcer cancel-requested to the attempt. Is it OK/acceptable for you?
We also had the same kind of issue. Thus we introduce the cleanup function to the Operator and CommnadExecutor interfaces and implements them in PyOperatorFactory and EcsCommandExecutor. Please see #1480, and this function helps to perform the cleanup process if the attempt received cancel-requested. I feel it is a good idea to handle it by using the same way to avoid such an issue.
How about adding it to KubernetesCommandExecutor as well? How do you think about that?

@szyn szyn modified the milestones: v0.10.1, v0.10.2 Jun 3, 2021
@szyn szyn modified the milestones: v0.10.2, v0.10.3 Jul 22, 2021
@szyn szyn modified the milestones: v0.10.3, v0.10.4 Sep 22, 2021
@szyn szyn force-pushed the master branch 3 times, most recently from c55e9cc to bd66ae8 Compare November 25, 2021 05:00
@szyn szyn removed this from the v0.10.4 milestone Jan 14, 2022
@szyn szyn added this to the v0.10.x milestone Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants