mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-12-08 01:08:45 +00:00
Lorenz recently reported:
In our TC classifier cls_redirect [0], we use the following sequence of
helper calls to decapsulate a GUE (basically IP + UDP + custom header)
encapsulated packet:
bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
bpf_redirect(skb->ifindex, BPF_F_INGRESS)
It seems like some checksums of the inner headers are not validated in
this case. For example, a TCP SYN packet with invalid TCP checksum is
still accepted by the network stack and elicits a SYN ACK. [...]
That is, we receive the following packet from the driver:
| ETH | IP | UDP | GUE | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum offloading.
On this packet we run skb_adjust_room_mac(-encap_len), and get the following:
| ETH | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()'ing
into the ingress, we end up in tcp_v4_rcv(). There, skb_checksum_init() is
turned into a no-op due to CHECKSUM_UNNECESSARY.
The bpf_skb_adjust_room() helper is not aware of protocol specifics. Internally,
it handles the CHECKSUM_COMPLETE case via skb_postpull_rcsum(), but that does
not cover CHECKSUM_UNNECESSARY. In this case skb->csum_level of the original
skb prior to bpf_skb_adjust_room() call was 0, that is, covering UDP. Right now
there is no way to adjust the skb->csum_level. NICs that have checksum offload
disabled (CHECKSUM_NONE) or that support CHECKSUM_COMPLETE are not affected.
Use a safe default for CHECKSUM_UNNECESSARY by resetting to CHECKSUM_NONE and
add a flag to the helper called BPF_F_ADJ_ROOM_NO_CSUM_RESET that allows users
from opting out. Opting out is useful for the case where we don't remove/add
full protocol headers, or for the case where a user wants to adjust the csum
level manually e.g. through bpf_csum_level() helper that is added in subsequent
patch.
The bpf_skb_proto_{4_to_6,6_to_4}() for NAT64/46 translation from the BPF
bpf_skb_change_proto() helper uses bpf_skb_net_hdr_{push,pop}() pair internally
as well but doesn't change layers, only transitions between v4 to v6 and vice
versa, therefore no adoption is required there.
[0] https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/
Fixes:
|
||
|---|---|---|
| .. | ||
| bpf_sk_storage.c | ||
| datagram.c | ||
| datagram.h | ||
| dev_addr_lists.c | ||
| dev_ioctl.c | ||
| dev.c | ||
| devlink.c | ||
| drop_monitor.c | ||
| dst_cache.c | ||
| dst.c | ||
| failover.c | ||
| fib_notifier.c | ||
| fib_rules.c | ||
| filter.c | ||
| flow_dissector.c | ||
| flow_offload.c | ||
| gen_estimator.c | ||
| gen_stats.c | ||
| gro_cells.c | ||
| hwbm.c | ||
| link_watch.c | ||
| lwt_bpf.c | ||
| lwtunnel.c | ||
| Makefile | ||
| neighbour.c | ||
| net_namespace.c | ||
| net-procfs.c | ||
| net-sysfs.c | ||
| net-sysfs.h | ||
| net-traces.c | ||
| netclassid_cgroup.c | ||
| netevent.c | ||
| netpoll.c | ||
| netprio_cgroup.c | ||
| page_pool.c | ||
| pktgen.c | ||
| ptp_classifier.c | ||
| request_sock.c | ||
| rtnetlink.c | ||
| scm.c | ||
| secure_seq.c | ||
| skbuff.c | ||
| skmsg.c | ||
| sock_diag.c | ||
| sock_map.c | ||
| sock_reuseport.c | ||
| sock.c | ||
| stream.c | ||
| sysctl_net_core.c | ||
| timestamping.c | ||
| tso.c | ||
| utils.c | ||
| xdp.c | ||