To test NGO network performance, I ran the iperf server on the host Linux and the iperf client inside NGO. The client never exits. From the log, I can see the client hanging in a Recvfrom syscall.
The same setup passes when run in Occlum.
To reproduce
Steps to reproduce the behavior:
Run iperf server on host: ./iperf -s -p 6888
Run iperf client in Occlum: occlum run /bin/iperf -c 127.0.0.1 -p 6888 -P 1 -t 1
Expected behavior
The client-side should exit when the test ends.
Logs
Server-side (strace):
After the test ended, the server closed the socket (fd = 4) and exited the thread. However, the client did not respond to the socket close and kept waiting.
Client-side:
When the transmission ended, the client shut down its write side and then appeared to wait for a control message via Recvfrom. The first Recvfrom returned, but the second Recvfrom never returns.
[2022-05-26T06:58:13.758Z][TRACE][C:4][T4][#4907][Setitimer] Syscall { num = Setitimer }
[2022-05-26T06:58:13.758Z][TRACE][C:4][T4][#4907][Setitimer] ret = 0xffffffffffffffda
[2022-05-26T06:58:13.759Z][ERROR][C:4][T4][#4907][Setitimer] Error = ENOSYS (#38, Function not implemented): Unimplemented or unknown syscall [line = 854, file = src/entry/syscall.rs]
[2022-05-26T06:58:13.759Z][TRACE][C:4][T4][#4908][Shutdown] Syscall { num = Shutdown, fd = 3, how = 1 }
[2022-05-26T06:58:13.759Z][DEBUG][C:4][T4][#4908][Shutdown] shutdown: fd: 3, how: 1
[2022-05-26T06:58:13.759Z][TRACE][C:4][T4][#4908][Shutdown] ret = 0x0
[2022-05-26T06:58:13.759Z][TRACE][C:4][T4][#4909][ClockGettime] Syscall { num = ClockGettime, clockid = 0, ts_u = 0x7fff2ca94cf0 }
[2022-05-26T06:58:13.759Z][TRACE][C:4][T4][#4909][ClockGettime] ret = 0x0
[2022-05-26T06:58:13.759Z][TRACE][C:4][T4][#4910][Setsockopt] Syscall { num = Setsockopt, fd = 3, level = 1, optname = 20, optval = 0x7fff2ca94cf0, optlen = 16 }
[2022-05-26T06:58:13.759Z][DEBUG][C:4][T4][#4910][Setsockopt] setsockopt: fd: 3, level: 1, optname: 20, optval: 0x7fff2ca94cf0, optlen: 16
[2022-05-26T06:58:13.759Z][TRACE][C:4][T4][#4910][Setsockopt] ret = 0x0
[2022-05-26T06:58:13.759Z][TRACE][C:4][T4][#4911][Recvfrom] Syscall { num = Recvfrom, fd = 3, base = 0x7fff2c252010, len = 131072, flags = 0, addr = 0x0, addr_len = 0x0 }
[2022-05-26T06:58:13.759Z][TRACE][C:4][T4][#4911][Recvfrom] ret = 0x1c
[2022-05-26T06:58:13.759Z][TRACE][C:4][T4][#4912][Recvfrom] Syscall { num = Recvfrom, fd = 3, base = 0x7fff2c252010, len = 131072, flags = 0, addr = 0x0, addr_len = 0x0 }
[2022-05-26T06:58:13.765Z][DEBUG][C:0] Timer Wheel: will sleep 10s
[2022-05-26T06:58:13.765Z][TRACE][C:3][T3][#74][ClockNanosleep] ret = 0x0
[2022-05-26T06:58:13.765Z][TRACE][C:3][T3][#75][ClockNanosleep] Syscall { num = ClockNanosleep, clockid = 1, flags = 0, request = 0x7fff2c693e00, remain = 0x0 }
[2022-05-26T06:58:13.765Z][DEBUG][C:3][T3][#75][ClockNanosleep] Timer Wheel: try waking
[2022-05-26T06:58:13.765Z][DEBUG][C:0] Timer Wheel: woken up
[2022-05-26T06:58:13.782Z][DEBUG][C:0] Timer Wheel: will sleep 10s
[2022-05-26T06:58:13.782Z][TRACE][C:3][T3][#75][ClockNanosleep] ret = 0x0
[2022-05-26T06:58:13.782Z][TRACE][C:3][T3][#76][ClockNanosleep] Syscall { num = ClockNanosleep, clockid = 1, flags = 0, request = 0x7fff2c693e00, remain = 0x0 }
[2022-05-26T06:58:13.782Z][DEBUG][C:3][T3][#76][ClockNanosleep] Timer Wheel: try waking
[2022-05-26T06:58:13.783Z][DEBUG][C:0] Timer Wheel: woken up
[2022-05-26T06:58:13.799Z][DEBUG][C:0] Timer Wheel: will sleep 10s
So when one side closes the connection, Recvfrom on the other side of the stream should return 0. The man page of recv also states:
When a stream socket peer has performed an orderly shutdown, the return value will be 0 (the traditional "end-of-file" return).
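The expected behavior can be checked on plain Linux with a minimal sketch that mimics the call sequence in the client log: send data, shut down the write side, then call recv until it returns 0 bytes. The function names and the `b"report"` payload are illustrative assumptions and are not taken from iperf.

```python
import socket
import threading


def _server(listener):
    """Accept one client, drain it until EOF, send a final message, close."""
    conn, _ = listener.accept()
    with conn:
        # recv returns b"" once the client has shut down its write side.
        while conn.recv(4096):
            pass
        conn.sendall(b"report")  # hypothetical stand-in for a control message
    # Leaving the with-block closes the socket (orderly shutdown).


def recv_after_peer_close():
    """Return (data received before EOF, value of the final recv call)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    thread = threading.Thread(target=_server, args=(listener,))
    thread.start()

    client = socket.create_connection(listener.getsockname())
    client.settimeout(5)  # fail loudly instead of hanging forever
    client.sendall(b"x" * 1024)
    client.shutdown(socket.SHUT_WR)  # same as "how = 1" in the log

    received = b""
    while True:
        chunk = client.recv(131072)
        if not chunk:  # empty result: the peer closed the connection
            last = chunk
            break
        received += chunk

    client.close()
    thread.join()
    listener.close()
    return received, last


if __name__ == "__main__":
    print(recv_after_peer_close())
```

On a regular Linux socket the loop terminates because the final recv returns an empty result. In the NGO log the equivalent call is the second Recvfrom (#4912), which never returns.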
But NGO behaves differently: the close does not make the recvfrom return. I did a little digging, and it looks like the recvfrom request is submitted to the kernel but never completes, so the calling thread keeps waiting. Thus, it seems to be a limitation of io_uring that close/shutdown can't cancel pending requests and make them return.
According to an issue in liburing (axboe/liburing#568), it looks like this won't be supported before kernel 5.19.
Possible solution/Implementation
Besides this problem, iperf also uses a Netlink socket for other purposes in this test, which NGO does not support either. To run this tool, there are mainly two solutions:
(1) Wait for kernel 5.19, hope that this usage is supported there, and update all related libraries to use the feature. Also, add support for the Netlink socket type.
(2) Add an ocall-based implementation as a fallback for applications like this. We could encourage users to try the io_uring-based network first and, if it doesn't work, switch to the fallback. At least the in-enclave scheduling can still bring some benefit. The drawback of this solution is that some of the network ocalls will block and can't release the CPU.
io_uring is fine. The reason is that the client doesn't really shut down the host fd, so the server is never notified. The shutdown implementation needs an ocall.
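This diagnosis can be illustrated outside the enclave with a small sketch on ordinary sockets. It assumes a server that, like the one in this test, waits for the client's EOF before replying and closing. The `propagate_shutdown` flag is a hypothetical stand-in for whether the runtime forwards shutdown to the host fd; it is not NGO code.

```python
import socket
import threading


def _serve_until_eof(listener):
    """Accept one client, wait for its EOF, then reply and close."""
    conn, _ = listener.accept()
    with conn:
        while conn.recv(4096):  # blocks until the client's FIN arrives
            pass
        try:
            conn.sendall(b"done")
        except OSError:
            pass  # the client may already be gone in the timeout case


def client_sees_eof(propagate_shutdown, timeout=0.5):
    """Return True if the client's recv loop reaches EOF before the timeout."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    thread = threading.Thread(target=_serve_until_eof, args=(listener,))
    thread.start()

    client = socket.create_connection(listener.getsockname())
    client.settimeout(timeout)
    client.sendall(b"payload")
    if propagate_shutdown:
        # The shutdown reaches the host socket, so the server sees EOF.
        client.shutdown(socket.SHUT_WR)
    # Otherwise: model a runtime that records the shutdown only internally.

    try:
        while client.recv(4096):
            pass
        return True  # recv returned b"": the server closed its side
    except socket.timeout:
        return False  # the server is still waiting for our EOF
    finally:
        client.close()  # unblocks the server in the timeout case
        thread.join()
        listener.close()


if __name__ == "__main__":
    print(client_sees_eof(True), client_sees_eof(False))
```

When the shutdown is not forwarded, the server keeps waiting for EOF and never closes, so the client's recv has nothing to wake it. That matches the hang reported above, with no io_uring cancellation involved.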
Additional context
Compared with the client log running in Occlum, the second Recvfrom returns when the server closes the socket.