
allow policyscript to include/exclude CPUs #289

Open
phemmer opened this issue Dec 21, 2023 · 10 comments

Comments

@phemmer

phemmer commented Dec 21, 2023

Currently, irqbalance has the environment variable IRQBALANCE_BANNED_CPUS (and IRQBALANCE_BANNED_CPULIST), which allows excluding CPUs from any IRQ assignment. It would be nice if we could be more granular and apply this only to specific IRQs.

I currently have an issue with a driver when its IRQs get assigned to CPUs >= 64. While this is a problem with the driver, I wanted to temporarily work around it by excluding CPUs >= 64 from being used for those IRQs.

Regarding whether this should be a whitelist or blacklist: matching the IRQBALANCE_BANNED_CPUS functionality would mean a blacklist. However, in my case not all my systems have the same CPU count, so with a blacklist, to avoid having to count the CPUs and construct the appropriate mask, I have to provide a mask big enough to cover all the different systems' hardware. This does work, since irqbalance seems to ignore the oversized mask bits, but a whitelist might be more succinct. A whitelist would also be more in line with the existing /proc/irq/$irq/smp_affinity interface provided by the kernel.

So ultimately the request is to add support for an instruction, emitted by the policy script, that whitelists or blacklists (whichever is decided) CPUs for the specific IRQ passed to the script.
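
For illustration only, such a directive might look like the sketch below. The banned_cpulist key is hypothetical and does not exist in irqbalance today; the IRQ range and CPU list are likewise made-up placeholders. The sketch assumes irqbalance's documented policy-script convention of passing the device sysfs path as $1 and the IRQ number as $2.

#!/bin/sh
# Hypothetical sketch: "banned_cpulist" is NOT an existing irqbalance key; it only
# illustrates the kind of per-IRQ directive this request asks for.
irq="$2"
if [ "$irq" -ge 100 ] && [ "$irq" -le 435 ]; then   # placeholder IRQ range for the affected driver
        echo "banned_cpulist=64-127"                # keep these IRQs off CPUs >= 64
fi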

@liuchao173
Member

If CPUs >= 64 are in another NUMA node, you can use the numa_node setting in your policy script to assign the driver's IRQs to a specific NUMA node.
Or you can use ban=true in your policy script and manually set the IRQs' smp_affinity once.
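
A minimal sketch of that approach, assuming the policy script receives the device sysfs path as $1 and the IRQ number as $2, and assuming the affected driver can be recognized by a (placeholder) PCI address in that path:

#!/bin/sh
# Minimal sketch: steer the affected driver's IRQs to NUMA node 0, or ban them from
# irqbalance entirely and pin them by hand. The PCI address below is a placeholder.
devpath="$1"
case "$devpath" in
    */0000:3b:00.0)            # hypothetical PCI address of the affected device
        echo "numa_node=0"     # alternatively: echo "ban=true"
        ;;
esac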

@nhorman
Member

nhorman commented Feb 19, 2024

I'm not sure I see the difficulty in computing a runtime blacklist. You should be able to do that with two lines in your policy script:

# Emit 1 (allowed) for each CPU below the boundary and 0 (banned) for each CPU at or above
# it, then reverse so CPU 0 lands in the least-significant bit (assumes at most 64 CPUs).
BINMASK=$(for i in $(awk '/^processor/ {print $3}' /proc/cpuinfo); do if [ "$i" -ge 24 ]; then echo -n 0; else echo -n 1; fi; done | rev)
printf "%x\n" "$((2#$BINMASK))"

This is of course simplified for illustration, but it should work for any CPU boundary you want.

Additionally, as @liuchao173 noted, you can also just return ban=true and set your affinity manually for the IRQ in question.
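
For completeness, the manual step could look roughly like the following (the IRQ number and CPU range are placeholders); /proc/irq/<n>/smp_affinity takes a hex CPU mask, while smp_affinity_list takes a CPU range:

# Rough sketch with placeholder values: restrict IRQ 123 to CPUs 0-63.
echo 0-63 > /proc/irq/123/smp_affinity_list
# Equivalent using the hex mask interface (bits 0-63 set):
echo ffffffff,ffffffff > /proc/irq/123/smp_affinity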

@phemmer
Author

phemmer commented Feb 19, 2024

If CPUs >= 64 are in another NUMA node, you can use the numa_node setting in your policy script to assign the driver's IRQs to a specific NUMA node.

This is not the case.

Or you can use ban=true in your policy script and manually set the IRQs' smp_affinity once.

On one system I have 336 interrupts that would have to be banned, and thus manually set. That means strategically placing these interrupts so that they are evenly distributed. Doing this by hand is extremely difficult, and avoiding it is the whole point of using irqbalance.

I'm not sure I see the difficulty in computing a runtime blacklist. You should be able to do that with two lines in your policy script:

I'm not following your example script. It outputs a number, not a key=value pair, and the irqbalance man page mentions no support for this. Is this some undocumented feature?

@nhorman
Member

nhorman commented Feb 19, 2024

It's not the entire script; I only meant to demonstrate that building a custom mask at run time based on the number of CPUs shouldn't be particularly difficult, regardless of whether we did whitelists or blacklists. Although, looking at it, I could have sworn I added a key to the policy script that allowed for a per-IRQ banning mask, but I don't see it now. Let me try to dig that up and apply it.

@nhorman nhorman closed this as completed Jul 2, 2024
@phemmer
Author

phemmer commented Jul 2, 2024

Was this implemented? It says "closed this as completed", but I don't see any commits adding the functionality, and the documentation doesn't show anything.

@nhorman
Member

nhorman commented Jul 2, 2024

No. I had assumed, given your lack of response over the last few months, that my suggestion of creating a custom mask worked for you, or that, if you wanted the functionality, you would submit a PR for it.

@phemmer
Author

phemmer commented Jul 2, 2024

I'm really confused. You seemed to acknowledge that the needed functionality is missing:

I could have sworn I added a key to the policy script that allowed for a per-IRQ banning mask, but I don't see it now.

And that you probably had the code around somewhere:

Let me try to dig that up and apply it.

@nhorman
Member

nhorman commented Jul 2, 2024

Oh, sorry, you're right, I didn't update you on that. Apologies. I looked, and it turns out I never implemented this, or if I did, it's been lost to time. That said, I'm not in a position to (re)create that work at the moment. Are you interested in writing a PR?

@nhorman nhorman reopened this Jul 2, 2024
@phemmer
Author

phemmer commented Jul 2, 2024

Maybe. C isn't my strongest language, but I can usually get by. That said, we are in the middle of upgrading both the systems with the hardware causing our issue and our kernel to 6.1, which has some major changes to XDP and seems to have solved the problem. Also, I'm about to head out on vacation. So, to be completely honest, while it's possible I may tackle it, it's probably not likely. If you want to close this, that's fine. I was just confused because, had it been addressed, I definitely would have used the functionality instead of the hack we've currently got in place.

@nhorman
Member

nhorman commented Jul 2, 2024

Yeah, I was cleaning up issues and totally missed my last comment on this one. I'll leave it open; perhaps I'll get to it in a bit.
