Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

netperf: implements dynamic NUMA binding #4252

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mcasquer
Copy link
Contributor

netperf: implements dynamic NUMA binding

The test was taking the last NUMA node to
bind the VM's memory. In some systems the last
NUMA node could have no memory and/or CPUs
assigned, updates the test to take the first
valid node.

Signed-off-by: mcasquer mcasquer@redhat.com
ID: 3321

@mcasquer mcasquer marked this pull request as ready for review January 19, 2025 19:57
@mcasquer
Copy link
Contributor Author

Test cases didn't pass but the error is not related with this patch

 (1/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: STARTED
 (1/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: ERROR: local variable 'client_pub_ip' referenced before assignment (384.46 s)
 (2/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: STARTED
 (2/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: ERROR: local variable 'client_pub_ip' referenced before assignment (376.33 s)
 (3/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: STARTED
 (3/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: ERROR: Timeout expired while waiting for shell command to complete: 'C:\\rhcygwin\\Cygwin.bat -i /Cygwin-Terminal.ico -'    (output: 'The system cannot find the path specified.\n\nC:\\>') (272.61 s)
RESULTS    : PASS 0 | ERROR 3 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@mcasquer
Copy link
Contributor Author

@heywji could you review this PR? Thanks !

@heywji
Copy link
Contributor

heywji commented Jan 20, 2025

LGTM. Thanks to Mario's efforts and help.


Hello, other reviewers.

Let me explain some background here. The netperf of my netkvm test loop is netperf_stress_test.cfg. But someday I type the test case name as 'netperf', some errors reported. After talking with @mcasquer, we confirmed it was because of the NUMA node memory issue.

It's an actual NUMA issue improvement, even though it is not directly connected with my netkvm test loop.

@mcasquer
Copy link
Contributor Author

@zhencliu @PaulYuuu please, could you review this PR and Wenkang's comment? Thanks!

Comment on lines 153 to 154
mem = int(params["mem"])
mem_kb = mem * 1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move it outside loop, don't need to calculate the memory size for each loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code updated, thank you !

mem = int(params["mem"])
mem_kb = mem * 1024
if node_mem_free > mem_kb:
pinned_node = node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add break to break the loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks !

vm = env.get_vm(vm_name)
vm.verify_alive()
except virt_vm.VMCreateError:
test.error("Not enough memory on the nodes to create the VM")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to check error message includes related info before test.error, to avoid other risks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PaulYuuu what do you mean? Including a log trace with the info from the exception instance?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the VM creation failed because of other issues(assert, missing args, others), that should not be a memory problem, but here you assume it fails by insufficient memory for all exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the idea, please tell me if now looks better, I don't have this kind of host for test it 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test.error/cancel depends on you, or find the message to match expected info first, if matched, cancel it, otherwise raise error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code updated and tested the cancel case by hard coding the node:

 (1/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: STARTED
 (1/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: CANCEL: The node: 7 used for VM pinning is not valid (23.93 s)

@mcasquer mcasquer force-pushed the 3321_netperf_numa_node branch from dc7d668 to 3f7f334 Compare January 24, 2025 06:42
@mcasquer
Copy link
Contributor Author

@heywji please, whenever is possible, could you test the latest patch changes? I saw in your host some QEMU and avocado processes running already... thanks !

@mcasquer mcasquer force-pushed the 3321_netperf_numa_node branch from 3f7f334 to ed6a152 Compare January 24, 2025 11:12
The test was taking the last NUMA node to
bind the VM's memory. In some systems the last
NUMA node could have no memory and/or CPUs
assigned, updates the test to take the first
valid node.

Signed-off-by: mcasquer <mcasquer@redhat.com>
@mcasquer mcasquer force-pushed the 3321_netperf_numa_node branch from ed6a152 to 8bccd82 Compare January 24, 2025 11:13
@heywji
Copy link
Contributor

heywji commented Jan 24, 2025

LGTM

 (1/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: STARTED
 (1/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: PASS (1596.48 s)
 (2/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: STARTED
 (2/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.43 s)
 (3/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: STARTED
.default_install.aio_threads.q35: STARTED
 (1/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: PASS (1596.48 s)
 (2/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: STARTED
 (2/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.43 s)
 (3/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: STARTED
 (3/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.13 s)
 (4/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: STARTED
 (4/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.19 s)
 (5/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.q35: STARTED (5/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.22 s)
 (6/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.best_registry_setting.q35: STARTED
 (6/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.best_registry_setting.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.12 s)
 (7/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.cygwin.q35: STARTED
 (7/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.cygwin.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.06 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants