[Bug] Test timeouts of 6h+, maybe r.futures.calib hangs #1312
Comments
@petrasovaa I saw that you were involved in some r.futures work. Do you know if there's something there to check out?
On my fork, I saw some timeouts 3 months ago and 2 weeks ago, but for 8.4.0. So maybe it's not a regression but something from the addons specifically. I also see that r.random.walk can be really long too. https://github.com/echoix/grass-addons/actions/runs/13048290473/job/36402778863 Are we hitting a memory/swap limit? The job could get killed in that case, no?
Well, it is a non-trivial model, so it's possible something is wrong, but where do you even see it, and how often does it happen?
Pick any of the 6h runs at https://github.com/OSGeo/grass-addons/actions/workflows/ci.yml?query=is%3Afailure
A harder-to-debug situation would be if this and/or r.random.walk are stochastic/gradient-descent and don't converge, i.e., convergence depends on the initial state or seed.
We could always limit the test step to 45 minutes and save ourselves 5 hours of CI time in the European mornings.
A limit sounds good, but a per-test limit seems more informative for failures.
We might as well use both: if a test is effectively hung, the per-test timeout might not trigger.
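For illustration, a minimal sketch of what using both limits could look like, assuming the tests run under pytest with the pytest-timeout plugin; the job and step names are made up, not the actual ci.yml:

```yaml
# Sketch only: job/step names and values are illustrative assumptions.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Run tests
        # Step-level cap: GitHub Actions kills the step after 45 minutes,
        # even if a test process is hung and ignores in-process timeouts.
        timeout-minutes: 45
        # Per-test cap (assumes the pytest-timeout plugin is installed):
        run: pytest --timeout=300 .
```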
I've got to the point where I can run r.futures.calib and r.futures.parallelpga successfully with pytest, but not with gunittest: r.futures.calib still hangs under gunittest. I also overcame one of our previous limitations with pytest: I managed to write an autouse fixture that activates when the test directory is named "testsuite" and contains a data directory. If so, it copies the data directory into a pytest tmp directory and changes the current working directory with monkeypatch (so the change gets reverted), and it really works. One problem: since these are not native pytest tests, the setUpClass() methods execute before the autouse fixture can change the directory, so I switched to setUp() instead (per test), and it is still way faster than waiting for timeouts. If we find a way to have the data directory copied before setUpClass(), we would likely have everything solved for running everything with pytest (even in the main repo).
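For reference, a minimal sketch of what such an autouse fixture could look like in a conftest.py. The directory names ("testsuite" and "data") come from the description above; everything else (fixture name, pytest >= 7 API) is an assumption, not the actual code:

```python
# conftest.py -- sketch of an autouse fixture that copies a testsuite's
# data directory into a pytest tmp dir and runs the test from there.
import shutil
from pathlib import Path

import pytest


@pytest.fixture(autouse=True)
def run_in_tmp_data_dir(request, tmp_path, monkeypatch):
    # request.path is the file containing the current test (pytest >= 7)
    test_dir = Path(request.path).parent
    data_dir = test_dir / "data"
    # Only activate for tests living in a "testsuite" directory
    # that ships a data directory alongside the tests.
    if test_dir.name == "testsuite" and data_dir.is_dir():
        shutil.copytree(data_dir, tmp_path / "data")
        # monkeypatch.chdir() is reverted automatically after each test.
        # Being function-scoped, this runs after setUpClass() but before
        # setUp(), which is why the chdir is only visible from setUp().
        monkeypatch.chdir(tmp_path)
```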
Some tests are hanging and taking 6+ hours before getting killed when running on CI.
Name of the addon
The last line of the log shown is r.futures.calib, but maybe the problem is elsewhere.
Describe the bug
A couple of mornings this week, plus this weekend, I manually stopped some jobs in grass-addons as they were on their way to taking 6h.
To Reproduce
Steps to reproduce the behavior:
See the build and test logs; most (if not all) of the hanging jobs are for main + Python 3.11.
Expected behavior
Tests are fast, or at least, not hanging.
System description
On CI runners.
Additional context
I don't think my work last weekend is the cause, as some of the PRs weren't timing out, but some did. Maybe I just enabled more tests to run (with more dependencies or better translatable strings) and we ended up with hanging tests. There were some timeouts in the weeks before my last work here, but not as consistently as this last week. (Even before the caching: one day I tried deleting all the caches here, and the next daily run timed out too.)
It may be a while loop (I didn't find one in the first two r.futures tests and their code, but there are a lot of nested for loops).
It could be a case where some input is expected and the process is waiting for it, but that is more common with cmd.exe on Windows.
Could a stdin/stdout/stderr stream be closed in a way that the gunittest test runner doesn't handle well?
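As an illustration of how a hang like that can happen (this is not code from the addon or from gunittest), the classic Python subprocess pipe deadlock looks like this:

```python
# Illustration only: a child process writing more output than the OS pipe
# buffer holds will block forever if the parent never drains stdout.
import subprocess
import sys

child = subprocess.Popen(
    [sys.executable, "-c", "print('x' * 10_000_000)"],
    stdout=subprocess.PIPE,
)
# Calling child.wait() here could deadlock once the pipe buffer fills;
# communicate() reads the pipes to the end and avoids the hang.
out, _ = child.communicate()
print(len(out))
```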
We could try 8.4.0 with Python 3.11, or 8.5.0dev with the Python version used in the 8.4.0 job (if it is 3.9+), to see whether it is a Python problem or a regression in development GRASS over the last month or so.