feat: add nvidia MIG #258
// The GPU in the current instance is not one of the known GPUs. We attempt using a profile that doesn't belong to one of the known GPUs.
for (gpu, mig_profile) in &mig_settings.profile {
    if !known_gpus.contains(gpu) {
Isn't this always true? You could make it another `ensure!` or omit the check, to help de-indent the code.
`mig_settings` might contain some known GPUs and some MIG-capable GPUs that we don't support yet. If the current instance has one of the currently unsupported GPUs, we wouldn't want to call `nvidia-smi` commands in cases we know aren't valid.
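For readers following the thread, here is a minimal sketch of the check being discussed, written as an early return so the rest of the code stays de-indented. It is not the code from this PR: `profile_for_current_gpu`, its parameters, and the GPU and profile strings used with it are hypothetical names chosen for illustration, and the real settings types may look different.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper (not from the PR): returns the MIG profile to apply
/// for `current_gpu`, or `None` when nothing should be applied.
fn profile_for_current_gpu<'a>(
    current_gpu: &str,
    known_gpus: &HashSet<&str>,
    profiles: &'a HashMap<String, String>,
) -> Option<&'a str> {
    // Guard clause: the settings may list GPUs we do not support, so bail
    // out before any nvidia-smi command would be issued for them.
    if !known_gpus.contains(current_gpu) {
        return None;
    }
    // A known GPU may still have no profile configured, which also yields None.
    profiles.get(current_gpu).map(String::as_str)
}
```

With this shape, a settings map that mixes supported and unsupported GPUs yields a profile only when the current GPU is both known and configured. The caller decides whether `None` is an error or a no-op.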
The force push addresses all of the comments above.
Got rid of the hack commit, since we removed the dependency on settings-sdk structs to simplify the code.
Issue number:
Related:
Description of changes:
Adds the nvidia-migmanager service and binary, which configure the instance with NVIDIA MIG.
Testing done:
`kubectl describe node` shows 56 GPUs after the instance reboots.
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.