
Severe performance degradation when TCP is funneled over UDP (GSO/TSO) #11

msune opened this issue Nov 10, 2024 · 2 comments
msune commented Nov 10, 2024

Summary

There is a severe performance degradation when TCP is funneled over UDP for flows within the same host.

I was able to reproduce it with msune/ebpf_gso:main, using this synthetic scenario.

~/personal/ebpf_gso/test$ make check_perf_calibration 
------------------------------------------------------------
Server listening on TCP port 80
TCP window size:  128 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.0.1.2, TCP port 80
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.1 port 47032 connected with 10.0.1.2 port 80 (icwnd/mss/irtt=13/1388/53)
[  1] local 10.0.1.2 port 80 connected with 10.0.0.1 port 47032 (icwnd/mss/irtt=13/1388/33)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0003 sec  5.73 GBytes  4.92 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0109 sec  5.73 GBytes  4.91 Gbits/sec
make[1]: Entering directory '/home/marc/personal/ebpf_gso/test'
make[1]: Leaving directory '/home/marc/personal/ebpf_gso/test'
~/personal/ebpf_gso/test$ make check_perf
------------------------------------------------------------
Server listening on TCP port 8080
TCP window size:  128 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.0.1.2, TCP port 8080
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  1] local 10.0.0.1 port 35932 connected with 10.0.1.2 port 8080 (icwnd/mss/irtt=13/1388/36)
[  1] local 10.0.1.2 port 8080 connected with 10.0.0.1 port 35932 (icwnd/mss/irtt=13/1388/15)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-20.4668 sec  69.2 KBytes  27.7 Kbits/sec
make[1]: Entering directory '/home/marc/personal/ebpf_gso/test'
Waiting for server threads to complete. Interrupt again to force quit.
make[1]: Leaving directory '/home/marc/personal/ebpf_gso/test'
~/personal/ebpf_gso/test$ [ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-24.5630 sec  60.0 Bytes  19.5 bits/sec

Env:

  • Kernel: Linux XXX 6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux
  • clang: Debian clang version 14.0.6

Root cause analysis

The repro pushes a UDP header in ns1 and pops it in ns2.

pwru clearly shows that the skb is marked as SKB_GSO_TCPV4 (0x1) (*) when the UDP header is pushed:

0xffff9027fa937400 2   ~in/iperf:135125 4026532397 0              307        0x0800 1440  2816  10.0.0.1:58330->10.0.1.2:80(udp)   udp4_ufo_fragment
(struct skb_shared_info){
 .nr_frags = (__u8)1,
 .gso_size = (short unsigned int)1380,
 .gso_type = (unsigned int)3, <---------------------------- 
 .dataref = (atomic_t){
  .counter = (int)65538,
 },
 .frags = (skb_frag_t[])[
  {
   0x00000000faa803d8,
   2776,
   60,
  },
 ],
}

When the kernel attempts to UFO the packet:

0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7000  10.0.0.1:42592->10.0.1.2:80(udp)   inet_gso_segment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6980  10.0.0.1:42592->10.0.1.2:80(udp)   udp4_ufo_fragment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)
Full pwru trace
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0               0         0x0000 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) ip_local_out
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0               0         0x0000 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) __ip_local_out
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0               0         0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) ip_output
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) nf_hook_slow
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) apparmor_ip_postroute
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) ip_finish_output
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) __ip_finish_output
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) ip_finish_output2
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) neigh_resolve_output
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) eth_header
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6992  10.0.0.1:42592->10.0.1.2:8080(tcp) skb_push
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(tcp) __dev_queue_xmit
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(tcp) tcf_classify
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(tcp) skb_ensure_writable
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(udp) skb_ensure_writable
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(udp) bpf_skb_generic_push
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7006  10.0.0.1:42592->10.0.1.2:8080(udp) skb_push
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   netdev_core_pick_tx
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   validate_xmit_skb
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   netif_skb_features
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   passthru_features_check
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   skb_network_protocol
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   __skb_gso_segment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   skb_mac_gso_segment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   skb_network_protocol
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7000  10.0.0.1:42592->10.0.1.2:80(udp)   inet_gso_segment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  6980  10.0.0.1:42592->10.0.1.2:80(udp)   udp4_ufo_fragment
0xffff9027df8a4800 6   ~bin/iperf:69662 4026532606 0              107        0x0800 1440  7014  10.0.0.1:42592->10.0.1.2:80(udp)   kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)

It drops it, as it can't find SKB_GSO_UDP/SKB_GSO_UDP_L4 in the check here: https://github.com/torvalds/linux/blob/de2f378f2b771b39594c04695feee86476743a69/net/ipv4/udp_offload.c#L429.

(*) There seems to be a bug in the kernel: even with all GSO/TSO offloads disabled, egress skbs are still marked as SKB_GSO_TCPV4.


msune commented Nov 10, 2024

> (*) There seems to be a bug in the kernel; disabling all GSO/TSO offloads keeps marking egress SKBs as TCP_GSO

This should/will be investigated elsewhere, as it's not strictly related to sfunnel/ebpf.


msune commented Nov 10, 2024

Trying to find a workaround

The main issue is that there is no direct access to the skb's gso_type (it lives in skb_shared_info) from a TC BPF program. Some strategies I have tried so far:

Not working: bpf_skb_adjust_room() with encap flags

None of the flags listed in the documentation works for this purpose.

Not working (bug?): abusing bpf_skb_change_tail()

bpf_skb_change_tail() doc mentions:

This helper is a slow path utility intended for replies with control messages. And because it is targeted for slow path, the helper itself can afford to be slow: it implicitly linearizes, unclones and drops offloads from the skb.

The skb's gso_type is correctly reset to 0x0, but the skb is now much larger than the MTU. The (big) packet is later dropped by the MTU check (of course, since it is no longer a GSO packet) in __dev_forward_skb2(), which ends up calling __is_skb_forwardable(); here is the check.

(Note the packet size, 2842, vs. the interface MTU, 1440.)

0xffff902880e2a000 1   ksoftirqd/1:21   4026532600 0              487        0x0800 1440  2842  10.0.0.1:60148->10.0.1.2:8080(tcp) __dev_forward_skb
0xffff902880e2a000 1   ksoftirqd/1:21   4026532600 0              487        0x0800 1440  2842  10.0.0.1:60148->10.0.1.2:8080(tcp) __dev_forward_skb2
0xffff902880e2a000 1   ksoftirqd/1:21   4026532600 0              487        0x0800 1440  2842  10.0.0.1:60148->10.0.1.2:8080(tcp) kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)

A simple repro of this issue, without any encap/decap, is here: msune/ebpf_gso:change_tail_gso.

I think this is a bug: bpf_skb_change_tail() should split the oversized packet into segments before it is sent (and that should happen after all TC BPF hooks have run, I guess). This is probably worth discussing in the Cilium #ebpf Slack channel.
