
Severe performance degradation when TCP is funneled over UDP (GSO/TSO) #11

msune opened this issue Nov 10, 2024 · 2 comments
msune commented Nov 10, 2024

Summary

There is a severe performance degradation when TCP is funneled over UDP for flows within the same host.

I was able to reproduce it with msune/ebpf_gso:main, using this synthetic scenario.

~/personal/ebpf_gso/test$ make check_perf_calibration 
------------------------------------------------------------
Server listening on TCP port 80
TCP window size:  128 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.0.1.2, TCP port 80
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.1 port 47032 connected with 10.0.1.2 port 80 (icwnd/mss/irtt=13/1388/53)
[  1] local 10.0.1.2 port 80 connected with 10.0.0.1 port 47032 (icwnd/mss/irtt=13/1388/33)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0003 sec  5.73 GBytes  4.92 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0109 sec  5.73 GBytes  4.91 Gbits/sec
make[1]: Entering directory '/home/marc/personal/ebpf_gso/test'
make[1]: Leaving directory '/home/marc/personal/ebpf_gso/test'
~/personal/ebpf_gso/test$ make check_perf
------------------------------------------------------------
Server listening on TCP port 8080
TCP window size:  128 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.0.1.2, TCP port 8080
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.1 port 35932 connected with 10.0.1.2 port 8080 (icwnd/mss/irtt=13/1388/36)
[  1] local 10.0.1.2 port 8080 connected with 10.0.0.1 port 35932 (icwnd/mss/irtt=13/1388/15)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-20.4668 sec  69.2 KBytes  27.7 Kbits/sec
make[1]: Entering directory '/home/marc/personal/ebpf_gso/test'
Waiting for server threads to complete. Interrupt again to force quit.
make[1]: Leaving directory '/home/marc/personal/ebpf_gso/test'
~/personal/ebpf_gso/test$ [ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-24.5630 sec  60.0 Bytes  19.5 bits/sec

Env:

  • Kernel: Linux XXX 6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux
  • clang: Debian clang version 14.0.6

Root cause analysis

The repro pushes a UDP header in ns1 and pops it in ns2.

pwru clearly shows that the skb is marked as SKB_GSO_TCPV4 (0x1) (*) when the UDP header is pushed:

0xffff9027fa937400 2   ~in/iperf:135125 4026532397 0              307        0x0800 1440  2816  10.0.0.1:58330->10.0.1.2:80(udp)   udp4_ufo_fragment
(struct skb_shared_info){
 .nr_frags = (__u8)1,
 .gso_size = (short unsigned int)1380,
 .gso_type = (unsigned int)3, <---------------------------- 
 .dataref = (atomic_t){
  .counter = (int)65538,
 },
 .frags = (skb_frag_t[])[
  {
   0x00000000faa803d8,
   2776,
   60,
  },
 ],
}

When the kernel attempts to UFO the packet:

0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7000  10.0.0.1:42592->10.0.1.2:80(udp)   inet_gso_segment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6980  10.0.0.1:42592->10.0.1.2:80(udp)   udp4_ufo_fragment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)
Full pwru trace
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0               0         0x0000 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) ip_local_out
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0               0         0x0000 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) __ip_local_out
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0               0         0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) ip_output
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) nf_hook_slow
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) apparmor_ip_postroute
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) ip_finish_output
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) __ip_finish_output
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) ip_finish_output2
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) neigh_resolve_output
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) eth_header
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) skb_push
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(tcp) __dev_queue_xmit
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(tcp) tcf_classify
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(tcp) skb_ensure_writable
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(udp) skb_ensure_writable
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(udp) bpf_skb_generic_push
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(udp) skb_push
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   netdev_core_pick_tx
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   validate_xmit_skb
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   netif_skb_features
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   passthru_features_check
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   skb_network_protocol
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   __skb_gso_segment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   skb_mac_gso_segment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   skb_network_protocol
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7000  10.0.0.1:42592->10.0.1.2:80(udp)   inet_gso_segment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6980  10.0.0.1:42592->10.0.1.2:80(udp)   udp4_ufo_fragment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)

It drops it, as it can't find SKB_GSO_UDP/SKB_GSO_UDP_L4 in the check here: https://github.com/torvalds/linux/blob/de2f378f2b771b39594c04695feee86476743a69/net/ipv4/udp_offload.c#L429.

(*) There seems to be a bug in the kernel: even with all GSO/TSO offloads disabled, egress skbs are still marked as SKB_GSO_TCPV4.


msune commented Nov 10, 2024

> (*) There seems to be a bug in the kernel; disabling all GSO/TSO offloads keeps marking egress SKBs as TCP_GSO

This should/will be investigated elsewhere, as it's not strictly related to sfunnel/ebpf.


msune commented Nov 10, 2024

Trying to find a workaround

The main issue is that there is no direct access to the skb's gso_type (it lives in skb_shared_info) from a TC BPF program. Some strategies I have tried so far:

Not working: bpf_skb_adjust_room() with encap flags

None of the flags listed in the documentation works for this purpose.

Not working (bug?): abusing bpf_skb_change_tail()

bpf_skb_change_tail() doc mentions:

This helper is a slow path utility intended for replies with control messages. And because it is targeted for slow path, the helper itself can afford to be slow: it implicitly linearizes, unclones and drops offloads from the skb.

The skb's gso_type is correctly reset to 0x0, but the skb is now much larger than the MTU. The (big) packet is later dropped by the MTU check (of course, since it is no longer a GSO packet) in __dev_forward_skb2(), which ends up calling __is_skb_forwardable(); here is the check.

(Note the packet size, 2842, vs. the interface MTU, 1440.)

0xffff902880e2a000 1   ksoftirqd/1:21   4026532600 0              487        0x0800 1440  2842  10.0.0.1:60148->10.0.1.2:8080(tcp) __dev_forward_skb
0xffff902880e2a000 1   ksoftirqd/1:21   4026532600 0              487        0x0800 1440  2842  10.0.0.1:60148->10.0.1.2:8080(tcp) __dev_forward_skb2
0xffff902880e2a000 1   ksoftirqd/1:21   4026532600 0              487        0x0800 1440  2842  10.0.0.1:60148->10.0.1.2:8080(tcp) kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)

A simple repro of this issue, without any encap/decap, is here: msune/ebpf_gso:change_tail_gso.

I think this is a bug: bpf_skb_change_tail() should split the oversized packet into segments before it is sent (and that should happen after all TC BPF hooks have run, I guess). This is probably worth discussing in the Cilium #ebpf Slack channel.
