Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add more scheduling options for process runner #9862

Merged
merged 1 commit into from
Dec 4, 2024

Conversation

dsseng
Copy link
Member

@dsseng dsseng commented Dec 2, 2024

This allows us to run low-priority background tasks like xfs_scrub only while idle.

Example usage in #9848

  • chore: process: minor capability cleanup
  • feat: allow running processes with custom priority
  • feat: allow setting ionice on processes
  • feat: allow setting scheduling class on processes

Copy link
Member Author

@dsseng dsseng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure how we test this. Perhaps running a unit test on the host won't work (or maybe it'll be okay if we lower priorities or set idle scheduling class), and an integration test would require exposing unneeded APIs

@smira
Copy link
Member

smira commented Dec 2, 2024

Unsure how we test this. Perhaps running a unit test on the host won't work (or maybe it'll be okay if we lower priorities or set idle scheduling class), and an integration test would require exposing unneeded APIs

if we can do a test, that'd be cool.

we could use this for the dashboard service - lower the priority to idle on everything.

@dsseng
Copy link
Member Author

dsseng commented Dec 2, 2024

Hm, but then we'll need to get those parameters from Talos to test. In case of xfs_scrub I just waited for it to run and inspected the process with chrt, ionice and htop from a debug container

@smira
Copy link
Member

smira commented Dec 2, 2024

Hm, but then we'll need to get those parameters from Talos to test. In case of xfs_scrub I just waited for it to run and inspected the process with chrt, ionice and htop from a debug container

dashboard runs, so we can do an integration test?

@frezbo
Copy link
Member

frezbo commented Dec 2, 2024

Hm, but then we'll need to get those parameters from Talos to test. In case of xfs_scrub I just waited for it to run and inspected the process with chrt, ionice and htop from a debug container

dashboard runs, so we can do an integration test?

yeh we could run those tools from a privileged pod in test, we already have helpers for these

@smira
Copy link
Member

smira commented Dec 2, 2024

Hm, but then we'll need to get those parameters from Talos to test. In case of xfs_scrub I just waited for it to run and inspected the process with chrt, ionice and htop from a debug container

dashboard runs, so we can do an integration test?

yeh we could run those tools from a privileged pod in test, we already have helpers for these

I guess we could simply read from /proc ?

@dsseng
Copy link
Member Author

dsseng commented Dec 2, 2024

I guess nice and maybe scheduling policy could be received using procfs, let's see. Didn't find anything about IO priorities so they're likely fetched using a syscall. But if we can run commands in a privileged pods that could work as well.

@dsseng
Copy link
Member Author

dsseng commented Dec 2, 2024

I don't think giving dashboard some weird scheduling policy would be great. Idle can make it not informative when system has something to do (it won't be scheduled) while RT-like would increase resource usage by dashboard and raise its priority too much

@smira
Copy link
Member

smira commented Dec 2, 2024

I don't think giving dashboard some weird scheduling policy would be great. Idle can make it not informative when system has something to do (it won't be scheduled) while RT-like would increase resource usage by dashboard and raise its priority too much

Probably not totally idle, but de-prioritizing in favor of anything else I guess might make sense

@dsseng
Copy link
Member Author

dsseng commented Dec 2, 2024

Well, out of 6 scheduling policies default perhaps shouldn't be changed. Offsetting priority (nice) would be good

@dsseng dsseng force-pushed the process-sched-opts branch from 4840b76 to aede8a5 Compare December 4, 2024 18:12
Copy link
Member

@smira smira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we de-prioritize dashboard?

@@ -219,6 +222,148 @@ func (suite *ProcessSuite) TestStopSigKill() {
<-done
}

func (suite *ProcessSuite) TestPriority() {
pidFile := filepath.Join(suite.tmpDir, "talos-test-pid-prio")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to t.Skip() this is os.Getuid() != 0? (to make go test ./... pass without buildkit?)

Copy link
Member Author

@dsseng dsseng Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go test -v ./internal/app/machined/pkg/system/runner/process works for me, it's how I developed this. If we have nice < 17 we can drop nice to 17 (generally) without caps or root

@dsseng
Copy link
Member Author

dsseng commented Dec 4, 2024

should we de-prioritize dashboard?

By how many? Not really sure, as it must still be responsive even under load, so just increase niceness by some points

@smira
Copy link
Member

smira commented Dec 4, 2024

should we de-prioritize dashboard?

By how many? Not really sure, as it must still be responsive even under load, so just increase niceness by some points

I'd say it's the last thing that should be responsive (when compared to any other system component).

@dsseng
Copy link
Member Author

dsseng commented Dec 4, 2024

10 should do fine then. 0 - normal, -19 - max, 20 - least prioritized. Should I do it here as well?

This commit adds runner options for priority, IO priority, scheduling policy. It also cleans up previously developed code for capabilities.

This is useful to launch background tasks such as xfs_scrub to not reduce system performance. We set nice 10 for dashboard so that it gives priority to more important system services.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
@dsseng dsseng force-pushed the process-sched-opts branch from aede8a5 to 9081506 Compare December 4, 2024 19:14
@dsseng
Copy link
Member Author

dsseng commented Dec 4, 2024

/m

@talos-bot talos-bot merged commit 9081506 into siderolabs:main Dec 4, 2024
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Backported
Development

Successfully merging this pull request may close these issues.

4 participants