Hi Carl,

I think the host/device copies of the benchmark's input and output data are mixed up. Since the kernel computes `a = b + scalar * c`, the vectors `b` and `c` should be copied from host to device, and the vector `a` should be copied back from device to host. See here and here.
Most importantly from the performance point of view, you're making one unnecessary copy from host to device, which does not happen in the zero-copy and unified memory variants, so the comparison with pageable and pinned memory is not completely fair.
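To make the suggestion concrete, here is a minimal sketch of the copy pattern I have in mind for the explicit-copy (pageable) case. The names `triad`, `run_triad` and `CHECK`, the array size, and the initial values are made up for illustration and are not taken from the benchmark; only the copy directions matter.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking helper, not part of the benchmark.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Kernel from the issue text: a = b + scalar * c.
// a is write-only, b and c are read-only.
__global__ void triad(double *a, const double *b, const double *c,
                      double scalar, size_t n)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = b[i] + scalar * c[i];
}

// Runs the kernel once with explicit copies and returns the number of
// elements that differ from the host reference (0 means success).
size_t run_triad(size_t n, double scalar)
{
    const size_t bytes = n * sizeof(double);

    // Pageable host memory; the values are arbitrary test data.
    std::vector<double> h_a(n, 0.0), h_b(n, 1.0), h_c(n, 2.0);

    double *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    CHECK(cudaMalloc(&d_c, bytes));

    // Inputs only: b and c go host -> device.
    CHECK(cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_c, h_c.data(), bytes, cudaMemcpyHostToDevice));
    // No host -> device copy of a: the kernel never reads it.

    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    triad<<<blocks, threads>>>(d_a, d_b, d_c, scalar, n);
    CHECK(cudaGetLastError());

    // Output only: a comes back device -> host.
    CHECK(cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost));

    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    CHECK(cudaFree(d_c));

    // Compare against the same expression evaluated on the host.
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_a[i] != h_b[i] + scalar * h_c[i]) ++mismatches;
    return mismatches;
}

int main()
{
    const size_t mismatches = run_triad(1 << 20, 3.0);
    printf("mismatches: %zu\n", mismatches);
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

With this pattern each explicit-copy run moves two vectors to the device and one back, which is the same data the zero-copy and unified memory variants have to touch.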
Jakub