Cancel doesn't always work - issue is slurmId #53

Open
rohrlich opened this issue Feb 23, 2022 · 4 comments

Comments

@rohrlich
Contributor

I believe the failure to cancel is due to a wrong SlurmId: the ID grunt records always seems to be one less than the real one (e.g. the real ID is 268743 but grunt has it as 268742). This may only be a problem with array jobs. Has anyone else seen this occasional problem?

The workaround is of course to execute scancel on the server with the correct ID.

@rcoreilly
Member

The array job numbering may be off; however, it has been working for me just in the last day. Not sure what might be different. Are you using the newer array grunter script as found in grunt/grunter_array.py? There was an older version that did things a bit differently.

@rohrlich
Contributor Author

Yes, the latest version. It only happens sometimes, but I don't have a lot of data because I don't often cancel. I can start watching to see if the ID is correct and update the ticket at some point.

@rohrlich
Contributor Author

slurmId definitely off by one. I have the latest grunt code. Is there anything in the grunter code that might be out of date and I should change?

@rohrlich
Contributor Author

Here is some likely helpful information --
job.slurmid has 3 values, e.g.
272125 (this is the one that shows up in the grunt slurm id column)
272126 (actual jobs)
272127 (waiting around to cleanup - I believe)

Here is what squeue shows
272127 high ac1_roh0 rohrlich PD 0:00 1 (Dependency)
272126_0 high ac1_roh0 rohrlich R 2:34 1 agate-12
272126_1 high ac1_roh0 rohrlich R 2:34 1 agate-12
272126_2 high ac1_roh0 rohrlich R 2:34 1 agate-12
272126_3 high ac1_roh0 rohrlich R 2:34 1 agate-16
272126_4 high ac1_roh0 rohrlich R 2:34 1 agate-16
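
To make the mismatch easier to spot, here is a rough Python sketch that pulls the array job ID out of squeue output like the above and compares it with the ID grunt recorded. It assumes the job ID is the first whitespace-separated column and that array tasks appear as `<jobid>_<index>`; `array_job_ids` and `check_recorded_id` are hypothetical names, not functions from grunt or the grunter script.

```python
import re

# Sample taken from the squeue listing above
SQUEUE_SAMPLE = """\
272127 high ac1_roh0 rohrlich PD 0:00 1 (Dependency)
272126_0 high ac1_roh0 rohrlich R 2:34 1 agate-12
272126_1 high ac1_roh0 rohrlich R 2:34 1 agate-12
272126_2 high ac1_roh0 rohrlich R 2:34 1 agate-12
272126_3 high ac1_roh0 rohrlich R 2:34 1 agate-16
272126_4 high ac1_roh0 rohrlich R 2:34 1 agate-16
"""

# Matches array-task IDs such as 272126_0; the suffix is kept loose so
# bracketed ranges for pending tasks would match as well.
ARRAY_ID = re.compile(r"(\d+)_\S+")


def array_job_ids(squeue_text):
    """Return the set of base job IDs that appear with an array-task suffix."""
    ids = set()
    for line in squeue_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        match = ARRAY_ID.fullmatch(fields[0])
        if match:
            ids.add(int(match.group(1)))
    return ids


def check_recorded_id(recorded, squeue_text):
    """Return (is_recorded_id_an_array_job, sorted list of actual array job IDs)."""
    actual = array_job_ids(squeue_text)
    return recorded in actual, sorted(actual)


if __name__ == "__main__":
    ok, actual = check_recorded_id(272125, SQUEUE_SAMPLE)
    print("recorded ID matches an array job:", ok)
    print("array job IDs seen in squeue:", actual)
```

With the sample above, the recorded 272125 is not among the array job IDs, and the only array job found is 272126, which is the ID scancel would need.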
