-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent NM from automatic (DHCP) configuration on ethernet devices #840
base: main
Are you sure you want to change the base?
Conversation
06a8541
to
bc29060
Compare
fbda264
to
c38781c
Compare
recheck |
c38781c
to
c38f438
Compare
c38f438
to
2ae86e1
Compare
recheck |
recheck |
when: edpm_network_config_tool == 'os-net-config' | ||
|
||
- name: Disable auto-configuration of all interfaces by NetworkManager | ||
when: ( edpm_bootstrap_network_service == 'NetworkManager' ) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is edpm_bootstrap_network_service
var set or default defined? This var does not exist by default. Also the code does not look correct to me. When edpm_bootstrap_network_service
is anything other than 'NetworkManager` os-net-config won't run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That variable appears to be set in the bootstrap role:
https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/roles/edpm_bootstrap/defaults/main.yml#L62
So, I assumed they were using it from there.
The same conditional check is also used in the Bootstrap role by the looks of it:
https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/roles/edpm_bootstrap/tasks/bootstrap.yml#L73
I'm not sure of the background for this variable though. Just an observation in response to this comment. Maybe we can take this opportunity to evaluate our use of the variable to ensure it still makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I assumed they were using it from there.
That's a role var and it won't be availale for other roles or other playbook runs. Also, every service is run as a separate playbook, so that won't be available in this role. If you check the logs, as expected it fails not finding that var.
https://logserver.rdoproject.org/40/840/2ae86e19752a6c6cd36627135f56b38773f689b4/github-check/podified-multinode-edpm-deployment-crc/5f5a454/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/pods/configure-network-edpm-deployment-openstack-edpm-ipam-cmwdj/logs/configure-network-edpm-deployment-openstack-edpm-ipam.log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the code does not look correct to me. When edpm_bootstrap_network_service
is anything other than 'NetworkManager` os-net-config won't run?
- os-net-config with ifcfg scripts doesn't have any dependency on NetworkManager.
- But, the reason for adding this check was, if os-net-config is used for deployment and bootstrap used "NetworkManager" during cloud-init, we need to disable the auto dhcp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I assumed they were using it from there.
That's a role var and it won't be availale for other roles or other playbook runs. Also, every service is run as a separate playbook, so that won't be available in this role.
@rabi , will referring like this fix it?
- https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/roles/edpm_derive_pci_device_spec/defaults/main.yml#L26
(neutron_sriov_agent default is referred in derive_pci role)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for now I can safely assume "edpm_bootstrap_network_service" is NetworkManager and remove the check for CI to pass
Can you answer the question @slagle asked? Should we not stop/disable NetworkManager when using network-scripts. I understand that we've been keeping it enabled and running since TripleO days and doing these hacks to prevent it doing it's things, but do we really need it running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slagle / @rabi, there are few reasons for not disabling NM as of now:
- os-net-config can use nmstate provider in future, which will eventually need NM
- Disabling/stopping NM might need a wider review and QE scope
- The issue we try to fix is the flood of DHCP requests (esp. for SR-IOV VFs) during node provision, which leads to timeout
- SR-IOV VFs are treated as individual i/f by NM and DHCP requests are sent out for all on their creation
cc @karthiksundaravel DanSneddon for further comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The network-scripts has a variable NM_CONTROLLED, which could be used to enable or disable NetworkManager for that interface. All we need now is to prevent NM from configuring the devices which are not configured or yet to be configured by network scripts. Given this requirement, setting no-auto-default=* does the job well. This configuration should hold good irrespective of the os-net-config provider (ifcfg or nmstate) as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with the comments from @karthiksundaravel and we already have seen this causing issues in the field. The only way to be sure that things behave as expected is for auto-default to be turned off in the configuration as it isn't needed for this use case anyway (cloud-init will do what is required, but NM doesn't need to step in and create configurations for unconfigured interfaces).
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bshephar, vcandapp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
3e19b84
to
5cbb92f
Compare
5cbb92f
to
4580656
Compare
recheck |
4580656
to
3ba8450
Compare
As @vcandapp mentioned NM will cause duplicate DHCP requests, so this fix will prevent NM from automatically creating DHCP requests when they are not needed. This will improve the behavior that we see and ensure that NM is operating in a way that does not conflict with os-net-config. It is understandable that NM would create DHCP enabled configurations by default for interfaces for normal use cases, but for cloud we really need for it to get out of the way and let the automation work as designed. |
recheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+2 I'm fine with the current change.
When edpm_network_config role uses os-net-config to perform host network configuration, duplicate DHCP requests from NM are causing issue. Eg. https://issues.redhat.com/browse/OSPRH-9142 This fix prevents NM to configure DHCP by default Signed-off-by: vcandapp <vcandapp@redhat.com>
+2 |
recheck |
When edpm_network_config role uses os-net-config to perform host network configuration, duplicate DHCP requests from NM are causing connectivity issue. Eg.
https://issues.redhat.com/browse/OSPRH-9142
This fix prevents NM from configure DHCP by default
This PR replaces an older PR