Try different combinations of global functions replacement:
- replace function that changes packet data with one that doesn't;
- replace function that changes packet data with one that does;
- replace function that doesn't change packet data with one that does;
- replace function that doesn't change packet data with one that doesn't;
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241210041100.1898468-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Verify that the sockmap link was not severed, and socket's entry is indeed
removed from the map when the corresponding descriptor gets closed.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241202-sockmap-replace-v1-2-1e88579e7bd5@rbox.co
With CONFIG_KPROBES_ON_FTRACE enabled on powerpc, ftrace_location_range
returns ftrace location for bpf_fentry_test1 at offset of 4 bytes from
function entry. This is because branch to _mcount function is at offset
of 4 bytes in function profile sequence.
To fix this, add entry_offset of 4 bytes while verifying the address for
kprobe entry address of bpf_fentry_test1 in verify_perf_link_info in
selftest, when CONFIG_KPROBES_ON_FTRACE is enabled.
Disassemble of bpf_fentry_test1:
c000000000e4b080 <bpf_fentry_test1>:
c000000000e4b080: a6 02 08 7c mflr r0
c000000000e4b084: b9 e2 22 4b bl c00000000007933c <_mcount>
c000000000e4b088: 01 00 63 38 addi r3,r3,1
c000000000e4b08c: b4 07 63 7c extsw r3,r3
c000000000e4b090: 20 00 80 4e blr
When CONFIG_PPC_FTRACE_OUT_OF_LINE [1] is enabled, these function profile
sequence is moved out of line with an unconditional branch at offset 0.
So, the test works without altering the offset for
'CONFIG_KPROBES_ON_FTRACE && CONFIG_PPC_FTRACE_OUT_OF_LINE' case.
Disassemble of bpf_fentry_test1:
c000000000f95190 <bpf_fentry_test1>:
c000000000f95190: 00 00 00 60 nop
c000000000f95194: 01 00 63 38 addi r3,r3,1
c000000000f95198: b4 07 63 7c extsw r3,r3
c000000000f9519c: 20 00 80 4e blr
[1] https://lore.kernel.org/all/20241030070850.1361304-13-hbathini@linux.ibm.com/
Fixes: 23cf7aa539 ("selftests/bpf: Add selftest for fill_link_info")
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241209065720.234344-1-skb99@linux.ibm.com
The selftests build four kernel modules which use copy-pasted Makefile
targets. This is a bit messy, and doesn't scale so well when we add more
modules, so let's consolidate these rules into a single rule generated
for each module name, and move the module sources into a single
directory.
To avoid parallel builds of the different modules stepping on each
other's toes during the 'modpost' phase of the Kbuild 'make modules',
the module files should really be a grouped target. However, make only
added explicit support for grouped targets in version 4.3, which is
newer than the minimum version supported by the kernel. However, make
implicitly treats pattern matching rules with multiple targets as a
grouped target, so we can work around this by turning the rule into a
pattern matching target. We do this by replacing '.ko' with '%ko' in the
targets with subst().
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/bpf/20241204-bpf-selftests-mod-compile-v5-1-b96231134a49@redhat.com
Add a __caps_unpriv annotation so that tests requiring specific
capabilities while dropping the rest can conveniently specify them
during selftest declaration instead of munging with capabilities at
runtime from the testing binary.
While at it, let us convert test_verifier_mtu to use this new support
instead.
Since we do not want to include linux/capability.h, we only defined the
four main capabilities BPF subsystem deals with in bpf_misc.h for use in
tests. If the user passes a CAP_SYS_NICE or anything else that's not
defined in the header, capability parsing code will return a warning.
Also reject strtol returning 0. CAP_CHOWN = 0 but we'll never need to
use it, and strtol doesn't errno on failed conversion. Fail the test in
such a case.
The original diff for this idea is available at link [0].
[0]: https://lore.kernel.org/bpf/a1e48f5d9ae133e19adc6adf27e19d585e06bab4.camel@gmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
[ Kartikeya: rebase on bpf-next, add warn to parse_caps, convert test_verifier_mtu ]
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Include tests that check for rejection in erroneous cases, like
unbalanced IRQ-disabled counts, within and across subprogs, invalid IRQ
flag state or input to kfuncs, behavior upon overwriting IRQ saved state
on stack, interaction with sleepable kfuncs/helpers, global functions,
and out of order restore. Include some success scenarios as well to
demonstrate usage.
#128/1 irq/irq_save_bad_arg:OK
#128/2 irq/irq_restore_bad_arg:OK
#128/3 irq/irq_restore_missing_2:OK
#128/4 irq/irq_restore_missing_3:OK
#128/5 irq/irq_restore_missing_3_minus_2:OK
#128/6 irq/irq_restore_missing_1_subprog:OK
#128/7 irq/irq_restore_missing_2_subprog:OK
#128/8 irq/irq_restore_missing_3_subprog:OK
#128/9 irq/irq_restore_missing_3_minus_2_subprog:OK
#128/10 irq/irq_balance:OK
#128/11 irq/irq_balance_n:OK
#128/12 irq/irq_balance_subprog:OK
#128/13 irq/irq_global_subprog:OK
#128/14 irq/irq_restore_ooo:OK
#128/15 irq/irq_restore_ooo_3:OK
#128/16 irq/irq_restore_3_subprog:OK
#128/17 irq/irq_restore_4_subprog:OK
#128/18 irq/irq_restore_ooo_3_subprog:OK
#128/19 irq/irq_restore_invalid:OK
#128/20 irq/irq_save_invalid:OK
#128/21 irq/irq_restore_iter:OK
#128/22 irq/irq_save_iter:OK
#128/23 irq/irq_flag_overwrite:OK
#128/24 irq/irq_flag_overwrite_partial:OK
#128/25 irq/irq_ooo_refs_array:OK
#128/26 irq/irq_sleepable_helper:OK
#128/27 irq/irq_sleepable_kfunc:OK
#128 irq:OK
Summary: 1/27 PASSED, 0 SKIPPED, 0 FAILED
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204030400.208005-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
serial_test_flow_dissector_namespace manipulates both the root net
namespace and a dedicated non-root net namespace. If for some reason a
program attach on root namespace succeeds while it was expected to
fail, the unexpected program will remain attached to the root namespace,
possibly affecting other runs or even other tests in the same run.
Fix undesired test failure side effect by explicitly detaching programs
on failing tests expecting attach to fail. As a side effect of this
change, do not test errno value if the tested operation do not fail.
Fixes: 284ed00a59dd ("selftests/bpf: migrate flow_dissector namespace exclusivity test")
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20241128-small_flow_test_fix-v1-1-c12d45c98c59@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This verifies that programs of BPF_PROG_TYPE_CGROUP_SKB can access
skb->data_end with direct packet access when being run with
BPF_PROG_TEST_RUN.
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Link: https://lore.kernel.org/r/20241125152603.375898-2-mahe.tardy@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
test_flow_dissector.sh loads flow_dissector program and subprograms,
creates and configured relevant tunnels and interfaces, and ensure that
the bpf dissection is actually performed correctly. Similar tests exist
in test_progs (thanks to flow_dissector.c) and run the same programs,
but those are only executed with BPF_PROG_RUN: those tests are then
missing some coverage (eg: coverage for flow keys manipulated when the
configured flower uses a port range, which has a dedicated test in
test_flow_dissector.sh)
Convert test_flow_dissector.sh into test_progs so that the corresponding
tests are also run in CI.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-13-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Trying to add udp-dedicated helpers in network_helpers involves
including some udp header, which makes multiple test_progs tests build
fail:
In file included from ./progs/test_cls_redirect.h:13,
from [...]/prog_tests/cls_redirect.c:15:
[...]/usr/include/linux/udp.h:23:8: error: redefinition of ‘struct udphdr’
23 | struct udphdr {
| ^~~~~~
In file included from ./network_helpers.h:17,
from [...]/prog_tests/cls_redirect.c:13:
[...]/usr/include/netinet/udp.h:55:8: note: originally defined here
55 | struct udphdr
| ^~~~~~
This error is due to struct udphdr being defined in both <linux/udp.h>
and <netinet/udp.h>.
Use only <netinet/udp.h> in every test. While at it, perform the same
for tcp.h. For some tests, the change needs to be done in the eBPF
program part as well, because of some headers sharing between both
sides.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-11-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
xdp_metadata test has a small helper computing ipv4 checksums to allow
manually building packets.
Move this helper to network_helpers to share it with other tests.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-9-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit a11c397c43 ("bpf/flow_dissector: add mode to enforce global BPF
flow dissector") is currently tested in test_flow_dissector.sh, which is
not part of test_progs. Add the corresponding test to flow_dissector.c,
which is part of test_progs. The new test reproduces the behavior
implemented in its shell script counterpart:
- attach a flow dissector program to the root net namespace, ensure
that we can not attach another flow dissector in any non-root net
namespace
- attach a flow dissector program to a non-root net namespace, ensure
that we can not attach another flow dissector in root namespace
Since the new test is performing operations in the root net namespace,
make sure to set it as a "serial" test to make sure not to conflict with
any other test.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-7-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The bpf_flow program is able to handle GRE headers in IP packets. Add a
few test data input simulating those GRE packets, with 2 different
cases:
- parse GRE and the encapsulated packet
- parse GRE only
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-6-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The flow_dissector test integrated in test_progs actually runs a wide
matrix of tests over different packets types and bpf programs modes, but
exposes only 3 main tests, preventing tests users from running specific
subtests with a specific input only.
Expose all subtests executed by flow_dissector by using
test__start_subtest().
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-5-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The flow_dissector runs plenty of tests over diffent kind of packets,
grouped into three categories: skb mode, non-skb mode with direct
attach, and non-skb with indirect attach.
Re-split the main function into dedicated tests. Each test now must have
its own setup/teardown, but for the advantage of being able to run them
separately. While at it, make sure that tests attaching the bpf programs
are run in a dedicated ns.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-4-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The flow dissector test currently relies on generic CHECK macros to
perform tests. Update those to newer, more-specific ASSERT macros.
This update allows to get rid of the global duration variable, which was
needed by the CHECK macros
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-3-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The flow_dissector program currently compares flow keys returned by bpf
program with the expected one thanks to a custom macro using memcmp.
Use the new ASSERT_MEMEQ macro to perform this comparision. This update
also allows to get rid of the unused bpf_test_run_opts variable in
run_tests_skb_less (it was only used by the CHECK macro for its duration
field)
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-2-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
CONFIG_PREEMPT is a preemtion model the so called "Low-Latency Desktop".
A different preemption model is PREEMPT_RT the so called "Real-Time".
Both implement preemption in kernel and set CONFIG_PREEMPTION.
There is also the so called "LAZY PREEMPT" which the "Scheduler
controlled preemption model". Here we have also preemption in the kernel
the rules are slightly different.
Therefore the testsuite should not check for CONFIG_PREEMPT (as one
model) but for CONFIG_PREEMPTION to figure out if preemption in the
kernel is possible.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20241119161819.qvEcs-n_@linutronix.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core
----
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL
knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read
access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code
--------------------------------------------
- Add small file operations for debugfs, to reduce the struct ops size.
- Refactoring and optimization for the implementation of page_frag API,
This is a preparatory work to consolidate the page_frag
implementation.
Netfilter
---------
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users
the option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent
CI improvements.
BPF
---
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols
---------
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after close,
the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per device
neigh lists.
Driver API
----------
- Introduce a unified interface to configure transmission H/W shaping,
and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling
-----------------
- forwarding: introduce deferred commands, to simplify
the cleanup phase
Drivers
-------
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- adds support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Adds representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- adds support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implements page pool support
- macsec:
- inherit lower device's features and TSO limits when offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- Add the dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- adds clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4
9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL
wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/
vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX
jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF
b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v
kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie
BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl
lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u
Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx
eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf
THcvVM+bupEZ
=GzPr
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core:
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of
rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infrastructure is guarded by the
CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code:
- Add small file operations for debugfs, to reduce the struct ops
size.
- Refactoring and optimization for the implementation of page_frag
API, This is a preparatory work to consolidate the page_frag
implementation.
Netfilter:
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users the
option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent CI
improvements.
BPF:
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols:
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after
close, the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state
lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per
device neigh lists.
Driver API:
- Introduce a unified interface to configure transmission H/W
shaping, and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling:
- forwarding: introduce deferred commands, to simplify the cleanup
phase
Drivers:
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- add support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Add representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- add support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implement page pool support
- macsec:
- inherit lower device's features and TSO limits when
offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- add dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- add clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature"
* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
mm: page_frag: fix a compile error when kernel is not compiled
Documentation: tipc: fix formatting issue in tipc.rst
selftests: nic_performance: Add selftest for performance of NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex states
selftests: nic_link_layer: Add link layer selftest for NIC driver
bnxt_en: Add FW trace coredump segments to the coredump
bnxt_en: Add a new ethtool -W dump flag
bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
bnxt_en: Add functions to copy host context memory
bnxt_en: Do not free FW log context memory
bnxt_en: Manage the FW trace context memory
bnxt_en: Allocate backing store memory for FW trace logs
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
bnxt_en: Refactor bnxt_free_ctx_mem()
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
bnxt_en: Update firmware interface spec to 1.10.3.85
selftests/bpf: Add some tests with sockmap SK_PASS
bpf: fix recursive lock when verdict program return SK_PASS
wireguard: device: support big tcp GSO
wireguard: selftests: load nf_conntrack if not present
...
Add a new tests in sockmap_basic.c to test SK_PASS for sockmap
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241118030910.36230-3-mrpre@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add three tests for struct_ops using private stack.
./test_progs -t struct_ops_private_stack
#336/1 struct_ops_private_stack/private_stack:OK
#336/2 struct_ops_private_stack/private_stack_fail:OK
#336/3 struct_ops_private_stack/private_stack_recur:OK
#336 struct_ops_private_stack:OK
The following is a snippet of a struct_ops check_member() implementation:
u32 moff = __btf_member_bit_offset(t, member) / 8;
switch (moff) {
case offsetof(struct bpf_testmod_ops3, test_1):
prog->aux->priv_stack_requested = true;
prog->aux->recursion_detected = test_1_recursion_detected;
fallthrough;
default:
break;
}
return 0;
The first test is with nested two different callback functions where the
first prog has more than 512 byte stack size (including subprogs) with
private stack enabled.
The second test is a negative test where the second prog has more than 512
byte stack size without private stack enabled.
The third test is the same callback function recursing itself. At run time,
the jit trampoline recursion check kicks in to prevent the recursion. The
recursion_detected() callback function is implemented by the bpf_testmod,
the following message in dmesg
bpf_testmod: oh no, recursing into test_1, recursion_misses 1
demonstrates the callback function is indeed triggered when recursion miss
happens.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163938.2225528-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Some private stack tests are added including:
- main prog only with stack size greater than BPF_PSTACK_MIN_SIZE.
- main prog only with stack size smaller than BPF_PSTACK_MIN_SIZE.
- prog with one subprog having MAX_BPF_STACK stack size and another
subprog having non-zero small stack size.
- prog with callback function.
- prog with exception in main prog or subprog.
- prog with async callback without nesting
- prog with async callback with possible nesting
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163927.2224750-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Similar to commit [1] sample perf events less often in
test_send_signal_nmi(). This should reduce perf events throttling.
[1] 7015843afc ("selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241112110906.3045278-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The following invocation:
$ t1=send_signal/send_signal_perf_thread_remote \
t2=send_signal/send_signal_nmi_thread_remote \
./test_progs -t $t1,$t2
Leads to send_signal_nmi_thread_remote to be stuck
on a line 180:
/* wait for result */
err = read(pipe_c2p[0], buf, 1);
In this test case:
- perf event PERF_COUNT_HW_CPU_CYCLES is created for parent process;
- BPF program is attached to perf event, and sends a signal to child
process when event occurs;
- parent program burns some CPU in busy loop and calls read() to get
notification from child that it received a signal.
The perf event is declared with .sample_period = 1.
This forces perf to throttle events, and under some unclear conditions
the event does not always occur while parent is in busy loop.
After parent enters read() system call CPU cycles event won't be
generated for parent anymore. Thus, if perf event had not occurred
already the test is stuck.
This commit updates the parent to wait for notification with a timeout,
doing several iterations of busy loop + read_with_timeout().
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241112110906.3045278-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit provides a watchdog timer that sets a limit of how long a
single sub-test could run:
- if sub-test runs for 10 seconds, the name of the test is printed
(currently the name of the test is printed only after it finishes);
- if sub-test runs for 120 seconds, the running thread is terminated
with SIGSEGV (to trigger crash_handler() and get a stack trace).
Specifically:
- the timer is armed on each call to run_one_test();
- re-armed at each call to test__start_subtest();
- is stopped when exiting run_one_test().
Default timeout could be overridden using '-w' or '--watchdog-timeout'
options. Value 0 can be used to turn the timer off.
Here is an example execution:
$ ./ssh-exec.sh ./test_progs -w 5 -t \
send_signal/send_signal_perf_thread_remote,send_signal/send_signal_nmi_thread_remote
WATCHDOG: test case send_signal/send_signal_nmi_thread_remote executes for 5 seconds, terminating with SIGSEGV
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x1f)[0x9049ef]
/lib64/libc.so.6(+0x40d00)[0x7f1f1184fd00]
/lib64/libc.so.6(read+0x4a)[0x7f1f1191cc4a]
./test_progs[0x720dd3]
./test_progs[0x71ef7a]
./test_progs(test_send_signal+0x1db)[0x71edeb]
./test_progs[0x9066c5]
./test_progs(main+0x5ed)[0x9054ad]
/lib64/libc.so.6(+0x2a088)[0x7f1f11839088]
/lib64/libc.so.6(__libc_start_main+0x8b)[0x7f1f1183914b]
./test_progs(_start+0x25)[0x527385]
#292 send_signal:FAIL
test_send_signal_common:PASS:reading pipe 0 nsec
test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
test_send_signal_common:PASS:incorrect result 0 nsec
test_send_signal_common:PASS:pipe_write 0 nsec
test_send_signal_common:PASS:setpriority 0 nsec
Timer is implemented using timer_{create,start} librt API.
Internally librt uses pthreads for SIGEV_THREAD timers,
so this change adds a background timer thread to the test process.
Because of this a few checks in tests 'bpf_iter' and 'iters'
need an update to account for an extra thread.
For parallelized scenario the watchdog is also created for each worker
fork. If one of the workers gets stuck, it would be terminated by a
watchdog. In theory, this might lead to a scenario when all worker
threads are exhausted, however this should not be a problem for
server_main(), as it would exit with some of the tests not run.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241112110906.3045278-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Logic to prevent callbacks from acquiring new references for the program
(i.e. leaving acquired references), and releasing caller references
(i.e. those acquired in parent frames) was introduced in commit
9d9d00ac29 ("bpf: Fix reference state management for synchronous callbacks").
This was necessary because back then, the verifier simulated each
callback once (that could potentially be executed N times, where N can
be zero). This meant that callbacks that left lingering resources or
cleared caller resources could do it more than once, operating on
undefined state or leaking memory.
With the fixes to callback verification in commit
ab5cfac139 ("bpf: verify callbacks as if they are called unknown number of times"),
all of this extra logic is no longer necessary. Hence, drop it as part
of this commit.
Cc: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241109231430.2475236-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
The timer_lockup test needs 2 CPUs to work, on single-CPU nodes it fails
to set thread affinity to CPU 1 since it doesn't exist:
# ./test_progs -t timer_lockup
test_timer_lockup:PASS:timer_lockup__open_and_load 0 nsec
test_timer_lockup:PASS:pthread_create thread1 0 nsec
test_timer_lockup:PASS:pthread_create thread2 0 nsec
timer_lockup_thread:PASS:cpu affinity 0 nsec
timer_lockup_thread:FAIL:cpu affinity unexpected error: 22 (errno 0)
test_timer_lockup:PASS: 0 nsec
#406 timer_lockup:FAIL
Skip the test if only 1 CPU is available.
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Fixes: 50bd5a0c65 ("selftests/bpf: Add timer lockup selftest")
Tested-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241107115231.75200-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Add test cases to verify the following four update operations on htab of
maps don't trigger lockdep warning:
(1) add then delete
(2) add, overwrite, then delete
(3) add, then lookup_and_delete
(4) add two elements, then lookup_and_delete_batch
Test cases are added for pre-allocated and non-preallocated htab of maps
respectively.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241106063542.357743-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Moving the definition of ENOTSUPP into bpf_util.h to remove the
duplicated definitions in multiple files.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241106063542.357743-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
With recent uprobe fix [1] the sync time after unregistering uprobe is
much longer and prolongs the consumer test which creates and destroys
hundreds of uprobes.
This change adds 16 threads (which fits the test logic) and speeds up
the test.
Before the change:
# perf stat --null ./test_progs -t uprobe_multi_test/consumers
#421/9 uprobe_multi_test/consumers:OK
#421 uprobe_multi_test:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Performance counter stats for './test_progs -t uprobe_multi_test/consumers':
28.818778973 seconds time elapsed
0.745518000 seconds user
0.919186000 seconds sys
After the change:
# perf stat --null ./test_progs -t uprobe_multi_test/consumers 2>&1
#421/9 uprobe_multi_test/consumers:OK
#421 uprobe_multi_test:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Performance counter stats for './test_progs -t uprobe_multi_test/consumers':
3.504790814 seconds time elapsed
0.012141000 seconds user
0.751760000 seconds sys
[1] commit 87195a1ee3 ("uprobes: switch to RCU Tasks Trace flavor for better performance")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-14-jolsa@kernel.org
Adding uprobe session consumers to the consumer test,
so we get the session into the test mix.
In addition scaling down the test to have just 1 uprobe
and 1 uretprobe, otherwise the test time grows and is
unsuitable for CI even with threads.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-13-jolsa@kernel.org
Testing that the session ret_handler bypass works on single
uprobe with multiple consumers, each with different session
ignore return value.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-12-jolsa@kernel.org
Making sure kprobe.session program can return only [0,1] values.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-11-jolsa@kernel.org
Making sure uprobe.session program can return only [0,1] values.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-10-jolsa@kernel.org
Adding uprobe session test that verifies the cookie value is stored
properly when single uprobe-ed function is executed recursively.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-9-jolsa@kernel.org
Adding uprobe session test that verifies the cookie value
get properly propagated from entry to return program.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-8-jolsa@kernel.org
Adding uprobe session test and testing that the entry program
return value controls execution of the return probe program.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-7-jolsa@kernel.org
The new uprobe changes bring some new behaviour that we need to reflect
in the consumer test. Now pending uprobe instance in the kernel can
survive longer and thus might call uretprobe consumer callbacks in
some situations in which, previously, such callback would be omitted.
We now need to take that into account in uprobe-multi consumer tests.
The idea being that uretprobe under test either stayed from before to
after (uret_stays + test_bit) or uretprobe instance survived and we
have uretprobe active in after (uret_survives + test_bit).
uret_survives just states that uretprobe survives if there are *any*
uretprobes both before and after (overlapping or not, doesn't matter)
and uprobe was attached before.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241107094337.3848210-1-jolsa@kernel.org
New netns selftest helpers netns_new() and netns_free() has been added
in network_helpers.c, let's use them in mptcp selftests too instead of
using MPTCP's own helpers create_netns() and cleanup_netns().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/c02fda3177b34f9e74a044833fda9761627f4d07.1730338692.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Ensure that trusted PTR_TO_BTF_ID accesses perform PROBE_MEM handling in
raw_tp program. Without the previous fix, this selftest crashes the
kernel due to a NULL-pointer dereference. Also ensure that dead code
elimination does not kick in for checks on the pointer.
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241104171959.2938862-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Availability of the gettid definition across glibc versions supported by
BPF selftests is not certain. Currently, all users in the tree open-code
syscall to gettid. Convert them to a common macro definition.
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241104171959.2938862-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZyP6TxUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISINz7QD/RTuJAzPJXPQmjdzMj7pepjnSQH4K
DnOc1soDqjJPSFkBAMlklDCZqSsFoNtNxagbyILrYQBC/MsV9jngimK46DEN
=pDzC
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-10-31
We've added 13 non-merge commits during the last 16 day(s) which contain
a total of 16 files changed, 710 insertions(+), 668 deletions(-).
The main changes are:
1) Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it, from Puranjay Mohan.
2) Rewrite and migrate the test_tcp_check_syncookie.sh BPF selftest
into test_progs so that it can be run in BPF CI, from Alexis Lothoré.
3) Two BPF sockmap selftest fixes, from Zijian Zhang.
4) Small XDP synproxy BPF selftest cleanup to remove IP_DF check,
from Vincent Li.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Add a selftest for bpf_csum_diff()
selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier
bpf: bpf_csum_diff: Optimize and homogenize for all archs
net: checksum: Move from32to16() to generic header
selftests/bpf: remove xdp_synproxy IP_DF check
selftests/bpf: remove test_tcp_check_syncookie
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap
selftests/bpf: Fix msg_verify_data in test_sockmap
====================
Link: https://patch.msgid.link/20241031221543.108853-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The new subtest runs with bpf_prog_test_run_opts() as a syscall prog.
It iterates the kmem_cache using bpf_for_each loop and count the number
of entries. Finally it checks it with the number of entries from the
regular iterator.
$ ./vmtest.sh -- ./test_progs -t kmem_cache_iter
...
#130/1 kmem_cache_iter/check_task_struct:OK
#130/2 kmem_cache_iter/check_slabinfo:OK
#130/3 kmem_cache_iter/open_coded_iter:OK
#130 kmem_cache_iter:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
Also simplify the code by using attach routine of the skeleton.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241030222819.1800667-2-namhyung@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a selftest for the bpf_csum_diff() helper. This selftests runs the
helper in all three configurations(push, pull, and diff) and verifies
its output. The correct results have been computed by hand and by the
helper's older implementation.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-5-puranjay@kernel.org
As we can see from the title, when I compiled the selftests/bpf, I
saw the error:
implicit declaration of function ‘gettid’ ; did you mean ‘getgid’? [-Werror=implicit-function-declaration]
skel->bss->tid = gettid();
^~~~~~
getgid
Directly call the syscall solves this issue.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20241029074627.80289-1-kerneljasonxing@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch tests the map creation failure when the map_value
has unsupported uptr. The three cases are the struct is larger
than one page, the struct is empty, and the struct is a kernel struct.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-13-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch test the following failures in syscall update_elem
1. The first update_elem(BPF_F_LOCK) should be EOPNOTSUPP. syscall.c takes
care of unpinning the uptr.
2. The second update_elem(BPF_EXIST) fails. syscall.c takes care of
unpinning the uptr.
3. The forth update_elem(BPF_NOEXIST) fails. bpf_local_storage_update
takes care of unpinning the uptr.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-11-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch tests the case when uptr has a struct spanning across two
pages. It is not supported now and EOPNOTSUPP is expected from the
syscall update_elem.
It also tests the whole uptr struct located exactly at the
end of a page and ensures that this case is accepted by update_elem.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make sure the memory of uptrs have been mapped to the kernel properly.
Also ensure the values of uptrs in the kernel haven't been copied
to userspace.
It also has the syscall update_elem/delete_elem test to test the
pin/unpin code paths.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new subtest validating that bpf_object loaded and initialized
through generic APIs is still interoperable with BPF subskeleton,
including initialization and reading of global variables.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241023043908.3834423-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Move the "load w/o expected_attach_type" test case to
prog_tests/sock_create.c and drop the remaining test case, as it is made
redundant with the existing coverage inside prog_tests/sock_create.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20241022152913.574836-4-jrife@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Move all BPF_CGROUP_INET6_POST_BIND and BPF_CGROUP_INET4_POST_BIND test
cases to a new prog_test, prog_tests/sock_post_bind.c, except for
LOAD_REJECT test cases.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20241022152913.574836-2-jrife@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add testcases to test bpf_send_signal_task(). In these new test cases,
the main process triggers the BPF program and the forked process
receives the signals. The target process's signal handler receives a
cookie from the bpf program.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241016084136.10305-3-puranjay@kernel.org
One remaining difference between test_tcp_check_syncookie.sh and
btf_skc_cls_ingress is a small test on the mss value embedded in the
cookie generated with the eBPF helper.
Bring the corresponding test in btf_skc_cls_ingress.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241020-syncookie-v2-5-2db240225fed@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
btf_skc_cls_ingress test currently checks that syncookie and
bpf_sk_assign/release helpers behave correctly in multiple scenarios,
but only with ipv6 socket.
Increase those helpers coverage by adding testing support for IPv4
sockets and IPv4/IPv6 sockets. The rework is mostly based on features
brought earlier in test_tcp_check_syncookie.sh to cover some fixes
performed on those helpers, but test_tcp_check_syncookie.sh is not
integrated in test_progs. The most notable changes linked to this are:
- some rework in the corresponding eBPF program to support both types of
traffic
- the switch from start_server to start_server_str to allow to check
some socket options
- the introduction of new subtests for ipv4 and ipv4/ipv6
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241020-syncookie-v2-4-2db240225fed@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
There are a few global variables in btf_skc_cls_ingress.c, which are not
really used by different tests. Get rid of those global variables, by
performing the following updates:
- make srv_sa6 local to the main runner function
- make skel local to the main function, and propagate it through
function arguments
- get rid of duration by replacing CHECK macros with the ASSERT_XXX
macros. While updating those assert macros:
- do not return early on asserts performing some actual tests, let the
other tests run as well (keep the early return for parts handling
test setup)
- instead of converting the CHECK on skel->bss->linum, just remove it,
since there is already a call to print_err_line after the test to
print the failing line in the bpf program
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241020-syncookie-v2-3-2db240225fed@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
btf_skc_cls_ingress.c currently runs two subtests, and create a
dedicated network namespace for each, but never cleans up the created
namespace once the test has ended.
Add missing namespace cleanup after each subtest to avoid accumulating
namespaces for each new subtest. While at it, switch namespace
management to netns_{new,free}
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241020-syncookie-v2-2-2db240225fed@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
btf_skc_cls_ingress currently describe two tests, both running a simple
tcp server and then initializing a connection to it. The sole difference
between the tests is about the tcp_syncookie configuration, and some
checks around this feature being enabled/disabled.
Share the common code between those two tests by moving the code into a
single runner, parameterized by a "gen_cookies" argument. Split the
performed checks accordingly.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241020-syncookie-v2-1-2db240225fed@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Previously test_task_tid was setting `linfo.task.tid`
to `getpid()` which is the same as `gettid()` for the
parent process. Instead create a new child thread
and set `linfo.task.tid` to `gettid()` to make sure
the tid filtering logic is working as expected.
Signed-off-by: Jordan Rome <linux@jordanrome.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241016210048.1213935-2-linux@jordanrome.com
Add a test case to ensure that attaching a tail callee program with an
freplace program fails, and that updating an extended program to a
prog_array map is also prohibited.
This test is designed to prevent the potential infinite loop issue caused
by the combination of tail calls and freplace, ensuring the correct
behavior and stability of the system.
Additionally, fix the broken tailcalls/tailcall_freplace selftest
because an extension prog should not be tailcalled.
cd tools/testing/selftests/bpf; ./test_progs -t tailcalls
337/25 tailcalls/tailcall_freplace:OK
337/26 tailcalls/tailcall_bpf2bpf_freplace:OK
337 tailcalls:OK
Summary: 1/26 PASSED, 0 SKIPPED, 0 FAILED
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20241015150207.70264-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds test cases for bpf_task_from_vpid() kfunc.
task_kfunc_from_vpid_no_null_check is used to test the case where
the return value is not checked for NULL pointer.
test_task_from_vpid_current is used to test obtaining the
struct task_struct of the process in the pid namespace based on vpid.
test_task_from_vpid_invalid is used to test the case of invalid vpid.
test_task_from_vpid_current and test_task_from_vpid_invalid will run
in the new namespace.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848F13435CD650AC4B7BD7099442@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test traverses all slab caches using the kmem_cache_iter and save
the data into slab_result array map. And check if current task's
pointer is from "task_struct" slab cache using bpf_get_kmem_cache().
Also compare the result array with /proc/slabinfo if available (when
CONFIG_SLUB_DEBUG is on). Note that many of the fields in the slabinfo
are transient, so it only compares the name and objsize fields.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241010232505.1339892-4-namhyung@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch continues the migration and removal process for cgroup
sock_create tests to selftests.
The test being migrated verifies the ability of cgroup BPF to block the
creation of specific types of sockets using a verdict. Specifically, the
test denies socket creation when the socket is of type AF_INET{6},
SOCK_DGRAM, and IPPROTO_ICMP{V6}. If the requested socket type matches
these attributes, the cgroup BPF verdict blocks the socket creation.
As with the previous commit, this test currently lacks coverage in
selftests, so this patch migrates the functionality into the sock_create
tests under selftests. This migration ensures that the socket creation
blocking behavior with cgroup bpf program is properly tested within the
selftest framework.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20241011044847.51584-3-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch migrates the old test for cgroup BPF that sets
sk_bound_dev_if, mark, and priority when AF_INET{6} sockets are created.
The most closely related tests under selftests are 'test_sock' and
'sockopt'. However, these existing tests serve different purposes.
'test_sock' focuses mainly on verifying the socket binding process,
while 'sockopt' concentrates on testing the behavior of getsockopt and
setsockopt operations for various socket options.
Neither of these existing tests directly covers the ability of cgroup
BPF to set socket attributes such as sk_bound_dev_if, mark, and priority
during socket creation. To address this gap, this patch introduces a
migration of the old cgroup socket attribute test, now included as the
'sock_create' test in selftests/bpf. This ensures that the ability to
configure these attributes during socket creation is properly tested.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20241011044847.51584-2-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Removed unnecessary `fd = -1` assignments after closing file descriptors.
because it will be assigned by the function bpf_prog_load().This improves
code readability and removes redundant operations.
Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241010055737.4292-1-zhujun2@cmss.chinamobile.com
Add assertions in `bpf_link_info.uprobe_multi` test to verify that
`count` and `path_size` fields are correctly populated when the fields
are unset.
This tests a previous bug where the `path_size` field was not populated
when `path` and `path_size` were unset.
Signed-off-by: Tyrone Wu <wudevelops@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241011000803.681190-2-wudevelops@gmail.com
xdp_cpumap_attach, in its current form, only checks that an xdp cpumap
program can be executed, but not that it performs correctly the cpu
redirect as configured by userspace (bpf_prog_test_run_opts will return
success even if the redirect program returns an error)
Add a check to ensure that the program performs the configured redirect
as well. The check is based on a global variable incremented by a
chained program executed only if the redirect program properly executes.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241009-convert_xdp_tests-v3-3-51cea913710c@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Current test only checks attach/detach on cpu map type program, and so
does not check that it can be properly executed, neither that it
redirects correctly.
Update the existing test to extend its coverage:
- keep the redirected program loaded
- try to execute it through bpf_prog_test_run_opts with some dummy
context
While at it, bring the following minor improvements:
- isolate test interface in its own namespace
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241009-convert_xdp_tests-v3-2-51cea913710c@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add a test case for kfuncs from multiple external modules, checking
that the correct kfuncs are called regardless of which order they're
called in. Specifically, check that calling the kfuncs in an order
different from the one the modules' BTF are loaded in works.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20241010-fix-kfunc-btf-caching-for-modules-v2-3-745af6c1af98@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Existing code calls connect() with a 'struct sockaddr_in6 *' argument
where a 'struct sockaddr *' argument is declared, yielding compile errors
when building for mips64el/musl-libc:
In file included from cgroup_ancestor.c:3:
cgroup_ancestor.c: In function 'send_datagram':
cgroup_ancestor.c:38:38: error: passing argument 2 of 'connect' from incompatible pointer type [-Werror=incompatible-pointer-types]
38 | if (!ASSERT_OK(connect(sock, &addr, sizeof(addr)), "connect")) {
| ^~~~~
| |
| struct sockaddr_in6 *
./test_progs.h:343:29: note: in definition of macro 'ASSERT_OK'
343 | long long ___res = (res); \
| ^~~
In file included from .../netinet/in.h:10,
from .../arpa/inet.h:9,
from ./test_progs.h:17:
.../sys/socket.h:386:19: note: expected 'const struct sockaddr *' but argument is of type 'struct sockaddr_in6 *'
386 | int connect (int, const struct sockaddr *, socklen_t);
| ^~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
This only compiles because of a glibc extension allowing declaration of the
argument as a "transparent union" which includes both types above.
Explicitly cast the argument to allow compiling for both musl and glibc.
Cc: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Fixes: f957c230e1 ("selftests/bpf: convert test_skb_cgroup_id_user to test_progs")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Reviewed-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241008231232.634047-1-tony.ambardar@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add three success test cases to test the flattening of array of nested
struct. For these three tests, the number of special fields in map is
BTF_FIELDS_MAX, but the array is defined in structs with different
nested level.
Add one failure test case for the flattening as well. In the test case,
the number of special fields in map is BTF_FIELDS_MAX + 1. It will make
btf_parse_fields() in map_create() return -E2BIG, the creation of map
will succeed, but the load of program will fail because the btf_record
is invalid for the map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241008071114.3718177-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add netns cookie test that verifies the helper is now supported and work
in the context of tc programs.
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Link: https://lore.kernel.org/r/20241007095958.97442-2-mahe.tardy@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Let's bail out from consumer test after we hit first fail,
so we don't pollute the log with many instances with possibly
the same error.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With newly merged code the uprobe behaviour is slightly different
and affects uprobe consumer test.
We no longer need to check if the uprobe object is still preserved
after removing last uretprobe, because it stays as long as there's
pending/installed uretprobe instance.
This allows to run uretprobe consumers registered 'after' uprobe was
hit even if previous uretprobe got unregistered before being hit.
The uprobe object will be now removed after the last uprobe ref is
released and in such case it's held by ri->uprobe (return instance)
which is released after the uretprobe is hit.
Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Ihor Solodrai <ihor.solodrai@pm.me>
Closes: https://lore.kernel.org/bpf/w6U8Z9fdhjnkSp2UaFaV1fGqJXvfLEtDKEUyGDkwmoruDJ_AgF_c0FFhrkeKW18OqiP-05s9yDKiT6X-Ns-avN_ABf0dcUkXqbSJN1TQSXo=@pm.me/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The current xdp_devmap_attach test attaches a program
that redirects to another program via devmap.
It is, however, never executed, so do that to catch
any bugs that might occur during execution.
Also, execute the same for a veth pair so that we
also cover the non-generic path.
Warning: Running this without the bugfix in this series
will likely crash your system.
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20240911-devel-koalo-fix-ingress-ifindex-v4-2-5c643ae10258@linutronix.de
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch adds a subtest named test_subflow in test_mptcp to load and
verify the newly added MPTCP subflow BPF program. To goal is to make
sure it is possible to set different socket options per subflows, while
the userspace socket interface only lets the application to set the same
socket options for the whole MPTCP connection and its multiple subflows.
To check that, a client and a server are started in a dedicated netns,
with veth interfaces to simulate multiple paths. They will exchange data
to allow the creation of an additional subflow.
When the different subflows are being created, the new MPTCP subflow BPF
program will set some socket options: marks and TCP CC. The validation
is done by the same program, when the userspace checks the value of the
modified socket options. On the userspace side, it will see that the
default values are still being used on the MPTCP connection, while the
BPF program will see different options set per subflow of the same MPTCP
connection.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240926-upstream-bpf-next-20240506-mptcp-subflow-test-v7-3-d26029e15cdd@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmbk/nIACgkQ6rmadz2v
bTqxuBAAnqW81Rr0nORIxeJMbyo4EiFuYHGk6u5BYP9NPzqHroUPCLVmSP7Hp/Ta
CJjsiZeivZsGa6Qlc3BCa4hHNpqP5WE1C/73svSDn7/99EfxdSBtirpMVFUPsUtn
DDb5chNpvnxKNS8Mw5Ty8wBrdbXHMlSx+IfaFHpv0Yn6EAcuF4UdoEUq2l3PqhfD
Il9Zm127eViPGAP+o+TBZFfW+rRw8d0ngqeRq2GvJ8ibNEDWss+GmBI1Dod7d+fC
dUDg96Ipdm1a5Xz7dnH80eXz9JHdpu6qhQrQMKKArnlpJElrKiOf9b17ZcJoPQOR
ZnstEnUyVnrWROZxUuKY72+2tx3TuSf+L9uZqFHNx3Ix5FIoS+tFbHf4b8SxtsOb
hb2X7SigdGqhQDxUT+IPeO5hsJlIvG1/VYxMXxgc++rh9DjL06hDLUSH1WBSU0fC
kFQ7HrcpAlVHtWmGbwwUyVjD+KC/qmZBTAnkcYT4C62WZVytSCnihIuSFAvV1tpZ
SSIhVPyQ599UoZIiQYihp0S4qP74FotCtErWSrThneh2Cl8kDsRq//lV1nj/PTV8
CpTvz4VCFDFTgthCfd62fP95EwW5K+aE3NjGTPW/9Hx/0+J/1tT+yqWsrToGaruf
TbrqtzQhpclz9UEqA+696cVAXNj9uRU4AoD3YIg72kVnRlkgYd0=
=MDwh
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Introduce '__attribute__((bpf_fastcall))' for helpers and kfuncs with
corresponding support in LLVM.
It is similar to existing 'no_caller_saved_registers' attribute in
GCC/LLVM with a provision for backward compatibility. It allows
compilers generate more efficient BPF code assuming the verifier or
JITs will inline or partially inline a helper/kfunc with such
attribute. bpf_cast_to_kern_ctx, bpf_rdonly_cast,
bpf_get_smp_processor_id are the first set of such helpers.
- Harden and extend ELF build ID parsing logic.
When called from sleepable context the relevants parts of ELF file
will be read to find and fetch .note.gnu.build-id information. Also
harden the logic to avoid TOCTOU, overflow, out-of-bounds problems.
- Improvements and fixes for sched-ext:
- Allow passing BPF iterators as kfunc arguments
- Make the pointer returned from iter_next method trusted
- Fix x86 JIT convergence issue due to growing/shrinking conditional
jumps in variable length encoding
- BPF_LSM related:
- Introduce few VFS kfuncs and consolidate them in
fs/bpf_fs_kfuncs.c
- Enforce correct range of return values from certain LSM hooks
- Disallow attaching to other LSM hooks
- Prerequisite work for upcoming Qdisc in BPF:
- Allow kptrs in program provided structs
- Support for gen_epilogue in verifier_ops
- Important fixes:
- Fix uprobe multi pid filter check
- Fix bpf_strtol and bpf_strtoul helpers
- Track equal scalars history on per-instruction level
- Fix tailcall hierarchy on x86 and arm64
- Fix signed division overflow to prevent INT_MIN/-1 trap on x86
- Fix get kernel stack in BPF progs attached to tracepoint:syscall
- Selftests:
- Add uprobe bench/stress tool
- Generate file dependencies to drastically improve re-build time
- Match JIT-ed and BPF asm with __xlated/__jited keywords
- Convert older tests to test_progs framework
- Add support for RISC-V
- Few fixes when BPF programs are compiled with GCC-BPF backend
(support for GCC-BPF in BPF CI is ongoing in parallel)
- Add traffic monitor
- Enable cross compile and musl libc
* tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (260 commits)
btf: require pahole 1.21+ for DEBUG_INFO_BTF with default DWARF version
btf: move pahole check in scripts/link-vmlinux.sh to lib/Kconfig.debug
btf: remove redundant CONFIG_BPF test in scripts/link-vmlinux.sh
bpf: Call the missed kfree() when there is no special field in btf
bpf: Call the missed btf_record_free() when map creation fails
selftests/bpf: Add a test case to write mtu result into .rodata
selftests/bpf: Add a test case to write strtol result into .rodata
selftests/bpf: Rename ARG_PTR_TO_LONG test description
selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test
bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error
bpf: Improve check_raw_mode_ok test for MEM_UNINIT-tagged types
bpf: Fix helper writes to read-only maps
bpf: Remove truncation test in bpf_strtol and bpf_strtoul helpers
bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit
selftests/bpf: Add tests for sdiv/smod overflow cases
bpf: Fix a sdiv overflow issue
libbpf: Add bpf_object__token_fd accessor
docs/bpf: Add missing BPF program types to docs
docs/bpf: Add constant values for linkages
bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZuH9UQAKCRDbK58LschI
g0/zAP99WOcCBp1M/jSTUOba230+eiol7l5RirDEA6wu7TqY2QEAuvMG0KfCCpTI
I0WqStrK1QMbhwKPodJC1k+17jArKgw=
=jfMU
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-09-11
We've added 12 non-merge commits during the last 16 day(s) which contain
a total of 20 files changed, 228 insertions(+), 30 deletions(-).
There's a minor merge conflict in drivers/net/netkit.c:
00d066a4d4 ("netdev_features: convert NETIF_F_LLTX to dev->lltx")
d966087948 ("netkit: Disable netpoll support")
The main changes are:
1) Enable bpf_dynptr_from_skb for tp_btf such that this can be used
to easily parse skbs in BPF programs attached to tracepoints,
from Philo Lu.
2) Add a cond_resched() point in BPF's sock_hash_free() as there have
been several syzbot soft lockup reports recently, from Eric Dumazet.
3) Fix xsk_buff_can_alloc() to account for queue_empty_descs which
got noticed when zero copy ice driver started to use it,
from Maciej Fijalkowski.
4) Move the xdp:xdp_cpumap_kthread tracepoint before cpumap pushes skbs
up via netif_receive_skb_list() to better measure latencies,
from Daniel Xu.
5) Follow-up to disable netpoll support from netkit, from Daniel Borkmann.
6) Improve xsk selftests to not assume a fixed MAX_SKB_FRAGS of 17 but
instead gather the actual value via /proc/sys/net/core/max_skb_frags,
also from Maciej Fijalkowski.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
sock_map: Add a cond_resched() in sock_hash_free()
selftests/bpf: Expand skb dynptr selftests for tp_btf
bpf: Allow bpf_dynptr_from_skb() for tp_btf
tcp: Use skb__nullable in trace_tcp_send_reset
selftests/bpf: Add test for __nullable suffix in tp_btf
bpf: Support __nullable argument suffix for tp_btf
bpf, cpumap: Move xdp:xdp_cpumap_kthread tracepoint before rcv
selftests/xsk: Read current MAX_SKB_FRAGS from sysctl knob
xsk: Bump xsk_queue::queue_empty_descs in xp_can_alloc()
tcp_bpf: Remove an unused parameter for bpf_tcp_ingress()
bpf, sockmap: Correct spelling skmsg.c
netkit: Disable netpoll support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Link: https://patch.msgid.link/20240911211525.13834-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
No conflicts (sort of) and no adjacent changes.
This merge reverts commit b3c9e65eb2 ("net: hsr: remove seqnr_lock")
from net, as it was superseded by
commit 430d67bdcb ("net: hsr: Use the seqnr lock for frames received via interlink port.")
in net-next.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new set of tests validating behavior of capturing stack traces
with build ID. We extend uprobe_multi target binary with ability to
trigger uprobe (so that we can capture stack traces from it), but also
we allow to force build ID data to be either resident or non-resident in
memory (see also a comment about quirks of MADV_PAGEOUT).
That way we can validate that in non-sleepable context we won't get
build ID (as expected), but with sleepable uprobes we will get that
build ID regardless of it being physically present in memory.
Also, we add a small add-on linker script which reorders
.note.gnu.build-id section and puts it after (big) .text section,
putting build ID data outside of the very first page of ELF file. This
will test all the relaxations we did in build ID parsing logic in kernel
thanks to freader abstraction.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add 3 test cases for skb dynptr used in tp_btf:
- test_dynptr_skb_tp_btf: use skb dynptr in tp_btf and make sure it is
read-only.
- skb_invalid_ctx_fentry/skb_invalid_ctx_fexit: bpf_dynptr_from_skb
should fail in fentry/fexit.
In test_dynptr_skb_tp_btf, to trigger the tracepoint in kfree_skb,
test_pkt_access is used for its test_run, as in kfree_skb.c. Because the
test process is different from others, a new setup type is defined,
i.e., SETUP_SKB_PROG_TP.
The result is like:
$ ./test_progs -t 'dynptr/test_dynptr_skb_tp_btf'
#84/14 dynptr/test_dynptr_skb_tp_btf:OK
#84 dynptr:OK
#127 kfunc_dynptr_param:OK
Summary: 2/1 PASSED, 0 SKIPPED, 0 FAILED
$ ./test_progs -t 'dynptr/skb_invalid_ctx_f'
#84/85 dynptr/skb_invalid_ctx_fentry:OK
#84/86 dynptr/skb_invalid_ctx_fexit:OK
#84 dynptr:OK
#127 kfunc_dynptr_param:OK
Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED
Also fix two coding style nits (change spaces to tabs).
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240911033719.91468-6-lulie@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
When "arg#%d expected pointer to ctx, but got %s" error is printed, both
template parts actually point to the type of the argument, therefore, it
will also say "but got PTR", regardless of what was the actual register
type.
Fix the message to print the register type in the second part of the
template, change the existing test to adapt to the new format, and add a
new test to test the case when arg is a pointer to context, but reg is a
scalar.
Fixes: 00b85860fe ("bpf: Rewrite kfunc argument handling")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240909133909.1315460-1-maxim@isovalent.com
Recently, when compiling bpf selftests on RV64, the following
compilation failure occurred:
progs/bpf_dctcp.c:29:21: error: redefinition of 'fallback' as different kind of symbol
29 | volatile const char fallback[TCP_CA_NAME_MAX];
| ^
/workspace/tools/testing/selftests/bpf/tools/include/vmlinux.h:86812:15: note: previous definition is here
86812 | typedef u32 (*fallback)(u32, const unsigned char *, size_t);
The reason is that the `fallback` symbol has been defined in
arch/riscv/lib/crc32.c, which will cause symbol conflicts when vmlinux.h
is included in bpf_dctcp. Let we rename `fallback` string to
`fallback_cc` in bpf_dctcp to fix this compilation failure.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-3-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The idea is to run same test as for test_pid_filter_process, but instead
of standard fork-ed process we create the process with clone(CLONE_VM..)
to make sure the thread leader process filter works properly in this case.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-5-jolsa@kernel.org
The idea is to create and monitor 3 uprobes, each trigered in separate
process and make sure the bpf program gets executed just for the proper
PID specified via pid filter.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-4-jolsa@kernel.org
Considering that CO-RE direct read access to the first system call
argument is already available on s390 and arm64, let's enable
test_bpf_syscall_macro:syscall_arg1 on these architectures.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-4-pulehui@huaweicloud.com
The core part of the selftest, i.e., the je <-> jmp cycle, mimics the
original sched-ext bpf program. The test will fail without the
previous patch.
I tried to create some cases for other potential cycles
(je <-> je, jmp <-> je and jmp <-> jmp) with similar pattern
to the test in this patch, but failed. So this patch
only contains one test for je <-> jmp cycle.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221256.37389-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add selftest for cases where btf_name_valid_section() does not properly
check for certain types of names.
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Link: https://lore.kernel.org/r/20240831054742.364585-1-aha310510@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Create a BTF with endianness different from host, make a distilled
base/split BTF pair from it, dump as raw bytes, import again and
verify that endianness is preserved.
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240830173406.1581007-1-eddyz87@gmail.com
This patch adds test cases for iter next method returning valid
pointer, which can also used as usage examples.
Currently iter next method should return valid pointer.
iter_next_trusted is the correct usage and test if iter next method
return valid pointer. bpf_iter_task_vma_next has KF_RET_NULL flag,
so the returned pointer may be NULL. We need to check if the pointer
is NULL before using it.
iter_next_trusted_or_null is the incorrect usage. There is no checking
before using the pointer, so it will be rejected by the verifier.
iter_next_rcu and iter_next_rcu_or_null are similar test cases for
KF_RCU_PROTECTED iterators.
iter_next_rcu_not_trusted is used to test that the pointer returned by
iter next method of KF_RCU_PROTECTED iterator cannot be passed in
KF_TRUSTED_ARGS kfuncs.
iter_next_ptr_mem_not_trusted is used to test that base type
PTR_TO_MEM should not be combined with type flag PTR_TRUSTED.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848709758F6922F02AF9F1F99962@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch tests the epilogue patching when the main prog has
multiple BPF_EXIT. The verifier should have patched the 2nd (and
later) BPF_EXIT with a BPF_JA that goes back to the earlier
patched epilogue instructions.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a pro/epilogue test when the main prog has a goto insn
that goes back to the very first instruction of the prog. It is
to test the correctness of the adjust_jmp_off(prog, 0, delta)
after the verifier has applied the prologue and/or epilogue patch.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a gen_epilogue test to test a main prog
using a bpf_tail_call.
A non test_loader test is used. The tailcall target program,
"test_epilogue_subprog", needs to be used in a struct_ops map
before it can be loaded. Another struct_ops map is also needed
to host the actual "test_epilogue_tailcall" struct_ops program
that does the bpf_tail_call. The earlier test_loader patch
will attach all struct_ops maps but the bpf_testmod.c does
not support >1 attached struct_ops.
The earlier patch used the test_loader which has already covered
checking for the patched pro/epilogue instructions. This is done
by the __xlated tag.
This patch goes for the regular skel load and syscall test to do
the tailcall test that can also allow to directly pass the
the "struct st_ops_args *args" as ctx_in to the
SEC("syscall") program.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-8-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This test adds a new struct_ops "bpf_testmod_st_ops" in bpf_testmod.
The ops of the bpf_testmod_st_ops is triggered by new kfunc calls
"bpf_kfunc_st_ops_test_*logue". These new kfunc calls are
primarily used by the SEC("syscall") program. The test triggering
sequence is like:
SEC("syscall")
syscall_prologue(struct st_ops_args *args)
bpf_kfunc_st_op_test_prologue(args)
st_ops->test_prologue(args)
.gen_prologue adds 1000 to args->a
.gen_epilogue adds 10000 to args->a
.gen_epilogue will also set the r0 to 2 * args->a.
The .gen_prologue and .gen_epilogue of the bpf_testmod_st_ops
will test the prog->aux->attach_func_name to decide if
it needs to generate codes.
The main programs of the pro_epilogue.c will call a
new kfunc bpf_kfunc_st_ops_inc10 which does "args->a += 10".
It will also call a subprog() which does "args->a += 1".
This patch uses the test_loader infra to check the __xlated
instructions patched after gen_prologue and/or gen_epilogue.
The __xlated check is based on Eduard's example (Thanks!) in v1.
args->a is returned by the struct_ops prog (either the main prog
or the epilogue). Thus, the __retval of the SEC("syscall") prog
is checked. For example, when triggering the ops in the
'SEC("struct_ops/test_epilogue") int test_epilogue'
The expected args->a is +1 (subprog call) + 10 (kfunc call)
+ 10000 (.gen_epilogue) = 10011.
The expected return value is 2 * 10011 (.gen_epilogue).
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-7-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds test cases for zero offset (implicit cast) or non-zero
offset pointer as KF_ACQUIRE kfuncs argument. Currently KF_ACQUIRE
kfuncs should support passing in pointers like &sk->sk_write_queue
(non-zero offset) or &sk->__sk_common (zero offset) and not be rejected
by the verifier.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848CB6F0D4D9068669A905B99952@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZsiMrQAKCRDbK58LschI
g1mtAP9wBoNO9sNRrJ2OUg69R5uSTT2//v7icN01xwVtx9ir/AD+PJ+v/WG1QVlM
6GNsPoGtQ53ptuiJFfXEkuVELGqKywY=
=I/T4
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-08-23
We've added 10 non-merge commits during the last 15 day(s) which contain
a total of 10 files changed, 222 insertions(+), 190 deletions(-).
The main changes are:
1) Add TCP_BPF_SOCK_OPS_CB_FLAGS to bpf_*sockopt() to address the case
when long-lived sockets miss a chance to set additional callbacks
if a sockops program was not attached early in their lifetime,
from Alan Maguire.
2) Add a batch of BPF selftest improvements which fix a few bugs and add
missing features to improve the test coverage of sockmap/sockhash,
from Michal Luczaj.
3) Fix a false-positive Smatch-reported off-by-one in tcp_validate_cookie()
which is part of the test_tcp_custom_syncookie BPF selftest,
from Kuniyuki Iwashima.
4) Fix the flow_dissector BPF selftest which had a bug in IP header's
tot_len calculation doing subtraction after htons() instead of inside
htons(), from Asbjørn Sloth Tønnesen.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftest: bpf: Remove mssind boundary check in test_tcp_custom_syncookie.c.
selftests/bpf: Introduce __attribute__((cleanup)) in create_pair()
selftests/bpf: Exercise SOCK_STREAM unix_inet_redir_to_connected()
selftests/bpf: Honour the sotype of af_unix redir tests
selftests/bpf: Simplify inet_socketpair() and vsock_socketpair_connectible()
selftests/bpf: Socket pair creation, cleanups
selftests/bpf: Support more socket types in create_pair()
selftests/bpf: Avoid subtraction after htons() in ipip tests
selftests/bpf: add sockopt tests for TCP_BPF_SOCK_OPS_CB_FLAGS
bpf/bpf_get,set_sockopt: add option to set TCP-BPF sock ops flags
====================
Link: https://patch.msgid.link/20240823134959.1091-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a selftest to confirm the issue, which gets -EINVAL when update
attached freplace prog to prog_array map, has been fixed.
cd tools/testing/selftests/bpf; ./test_progs -t tailcalls
328/25 tailcalls/tailcall_freplace:OK
328 tailcalls:OK
Summary: 1/25 PASSED, 0 SKIPPED, 0 FAILED
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20240728114612.48486-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Attribute used by LLVM implementation of the feature had been changed
from no_caller_saved_registers to bpf_fastcall (see [1]).
This commit replaces references to nocsr by references to bpf_fastcall
to keep LLVM and selftests parts in sync.
[1] https://github.com/llvm/llvm-project/pull/105417
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240822084112.3257995-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Check that verifier rejects BPF program containing relocation
pointing to non-existent BTF type.
To force relocation resolution on kernel side test case uses
bpf_attr->core_relos field. This field is not exposed by libbpf,
so directly do BPF system call in the test.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240822080124.2995724-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A program calling sub-program which does a tail call.
The idea is to verify instructions generated by jit for tail calls:
- in program and sub-program prologues;
- for subprogram call instruction;
- for tail call itself.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240820102357.3372779-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Rewrite function to have (unneeded) socket descriptors automatically
close()d when leaving the scope. Make sure the "ownership" of fds is
correctly passed via take_fd(); i.e. descriptor returned to caller will
remain valid.
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-6-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Constants got switched reducing the test's coverage. Replace SOCK_DGRAM
with SOCK_STREAM in one of unix_inet_skb_redir_to_connected() tests.
Fixes: 51354f700d ("bpf, sockmap: Add af_unix test with both sockets in map")
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-5-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Do actually test the sotype as specified by the caller.
This picks up after commit 75e0e27db6 ("selftest/bpf: Change udp to inet
in some function names").
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-4-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Replace implementation with a call to a generic function.
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-3-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Following create_pair() changes, remove unused function argument in
create_socket_pairs() and adapt its callers, i.e. drop the open-coded
loopback socket creation.
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-2-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Extend the function to allow creating socket pairs of SOCK_STREAM,
SOCK_DGRAM and SOCK_SEQPACKET.
Adapt direct callers and leave further cleanups for the following patch.
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-1-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Enable traffic monitoring for the subtests of select_reuseport.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240815053254.470944-7-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Enable traffic monitor for each subtest of sockmap_listen.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240815053254.470944-6-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Enable traffic monitoring for the test case tc_redirect.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240815053254.470944-5-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
test_skb_cgroup_id_user allows testing skb cgroup id retrieval at different
levels, but is not integrated in test_progs, so it is not run
automatically in CI. The test overlaps a bit with
cgroup_skb_sk_lookup_kern, which is integrated in test_progs and test
extensively skb cgroup helpers, but there is still one major difference
between the two tests which justifies the conversion:
cgroup_skb_sk_lookup_kern deals with a BPF_PROG_TYPE_CGROUP_SKB (attached
on a cgroup), while test_skb_cgroup_id_user deals with a
BPF_PROG_TYPE_SCHED_CLS (attached on a qdisc)
Convert test_skb_cgroup_id_user into test_progs framework in order to run
it automatically in CI. The main differences with the original test are the
following:
- rename the test to make it shorter and more straightforward regarding
tested feature
- the wrapping shell script has been dropped since every setup step is now
handled in the main C test file
- the test has been renamed for a shorter name and reflecting the tested
API
- add dedicated assert log per level to ease test failure debugging
- use global variables instead of maps to access bpf prog data
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20240813-convert_cgroup_tests-v4-4-a33c03458cf6@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
test_cgroup_storage is currently a standalone program which is not run
when executing test_progs.
Convert it to the test_progs framework so it can be automatically executed
in CI. The conversion led to the following changes:
- converted the raw bpf program in the userspace test file into a dedicated
test program in progs/ dir
- reduced the scope of cgroup_storage test: the content from this test
overlaps with some other tests already present in test_progs, most
notably netcnt and cgroup_storage_multi*. Those tests already check
extensively local storage, per-cpu local storage, cgroups interaction,
etc. So the new test only keep the part testing that the program return
code (based on map content) properly leads to packet being passed or
dropped.
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20240813-convert_cgroup_tests-v4-2-a33c03458cf6@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
get_current_cgroup_id_user allows testing for bpf_get_current_cgroup_id()
bpf API but is not integrated into test_progs, and so is not tested
automatically in CI.
Convert it to the test_progs framework to allow running it automatically.
The most notable differences with the old test are the following:
- the new test relies on autoattach instead of manually hooking/enabling
the targeted tracepoint through perf_event, which reduces quite a lot the
test code size
- it also accesses bpf prog data through global variables instead of maps
- sleep duration passed to nanosleep syscall has been reduced to its
minimum to not impact overall CI duration (we only care about the syscall
being properly triggered, not about the passed duration)
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20240813-convert_cgroup_tests-v4-1-a33c03458cf6@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
On little-endian systems, doing subtraction after htons()
leads to interesting results:
Given:
MAGIC_BYTES = 123 = 0x007B aka. in big endian: 0x7B00 = 31488
sizeof(struct iphdr) = 20
Before this patch:
__bpf_constant_htons(MAGIC_BYTES) - sizeof(struct iphdr) = 0x7AEC
0x7AEC = htons(0xEC7A) = htons(60538)
So these were outer IP packets with a total length of 123 bytes,
containing an inner IP packet with a total length of 60538 bytes.
After this patch:
__bpf_constant_htons(MAGIC_BYTES - sizeof(struct iphdr)) = htons(103)
Now these packets are outer IP packets with a total length of 123 bytes,
containing an inner IP packet with a total length of 103 bytes.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20240808075906.1849564-1-ast@fiberby.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add tests to set TCP sockopt TCP_BPF_SOCK_OPS_CB_FLAGS via
bpf_setsockopt() and use a cgroup/getsockopt program to retrieve
the value to verify it was set.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20240808150558.1035626-3-alan.maguire@oracle.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add test for bpf_get_dentry_xattr on hook security_inode_getxattr.
Verify that the kfunc can read the xattr. Also test failing getxattr
from user space by returning non-zero from the LSM bpf program.
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240806230904.71194-4-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a bunch of positive selftests which extensively cover the various
contexts and parameters in which the new VFS based BPF kfuncs may be
used from.
Again, the following VFS based BPF kfuncs are thoroughly tested within
this new selftest:
* struct file *bpf_get_task_exe_file(struct task_struct *);
* void bpf_put_file(struct file *);
* int bpf_path_d_path(struct path *, char *, size_t);
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20240731110833.1834742-4-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a bunch of negative selftests responsible for asserting that the
BPF verifier successfully rejects a BPF program load when the
underlying BPF program misuses one of the newly introduced VFS based
BPF kfuncs.
The following VFS based BPF kfuncs are extensively tested within this
new selftest:
* struct file *bpf_get_task_exe_file(struct task_struct *);
* void bpf_put_file(struct file *);
* int bpf_path_d_path(struct path *, char *, size_t);
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20240731110833.1834742-3-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub reported bpf selftest "btf_dump" failure after forwarding to
v6.11-rc1 with netdev.
Error: #33 btf_dump
Error: #33/15 btf_dump/btf_dump: var_data
btf_dump_data:FAIL:find type id unexpected find type id: actual -2 < expected 0
The reason for the failure is due to
commit 94ede2a3e9 ("profiling: remove stale percpu flip buffer variables")
where percpu static variable "cpu_profile_flip" is removed.
Let us replace "cpu_profile_flip" with a variable in bpf subsystem
so whenever that variable gets deleted or renamed, we can detect the
failure immediately. In this case, I picked a static percpu variable
"bpf_cgrp_storage_busy" which is defined in kernel/bpf/bpf_cgrp_storage.c.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240802185434.1749056-1-yonghong.song@linux.dev
These are three important bug fixes for the cross-architecture tree,
fixing a regression with the new syscall.tbl file, the inconsistent
numbering for the new uretprobe syscall and a bug with iowrite64be
on alpha.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmas3bQACgkQYKtH/8kJ
Uicf5xAAhNJuljx11D1WuMq5tT+tKHGTGzHZnQr1S2cNqo26Voeqm1ChaWT5U8Cg
DBg+uSs5xj9aqdw7D/NVreCX/+JoxasJe6caqJhXqXOuMp8/GHIDaFAQV0SGw6Qy
uwAzkRlz9RDZLFROZHoP+UsNr7X5Fv7C8i7Hpq2NusFnpHZTqDoUFas+g3LFy+Ex
K/+y8nw1f6W5ZRl64LH1rBGFPWw2xSu/busghR9HjYXyc9sIQbMxq+14KbNuK8FK
RTxzjUlmUDjX/Vcv7CW+C16mzQl04o4cFFK6xwGUxUaFYhWZHywkdfVvf35gYzCt
Gv3LUn75pVjHOX7rQQcmBKSdHc6wGx1ZmV5vrAABO5c/docsMXdjUSWw2fOOmiB4
JNNuGRK1CCfckpjMdJEXyUrchzI6W9mzqLZAShhNA0T9XnDhR7oekoFWwbDWiCyO
oFPpjYZEtZdD0IdBC5bxhK2ES7HLuidiUL017+ZSSRIgAD9vQrXt2hTPYUmNJEC6
YRsdOXm/tU3RrU/oKqzTnjW6fv7jZSAgND8TFEZFKGXrpjCipKqkrxyqNk3uiHMn
YPEIXsBAeIPcTJGArkuyTRn2bGg2qW2NkmUze7cWkPfaj8P+Ljf9EQ54I533f94R
Qm9G6d4c9ZmJg0h4YZ+WhPPXtTvf22Xdfp/56q8M52U/ThvTBuI=
=0iyr
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"These are three important bug fixes for the cross-architecture tree,
fixing a regression with the new syscall.tbl file, the inconsistent
numbering for the new uretprobe syscall and a bug with iowrite64be on
alpha"
* tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
syscalls: fix syscall macros for newfstat/newfstatat
uretprobe: change syscall number, again
alpha: fix ioread64be()/iowrite64be() helpers
Despite multiple attempts to get the syscall number assignment right
for the newly added uretprobe syscall, we ended up with a bit of a mess:
- The number is defined as 467 based on the assumption that the
xattrat family of syscalls would use 463 through 466, but those
did not make it into 6.11.
- The include/uapi/asm-generic/unistd.h file still lists the number
463, but the new scripts/syscall.tbl that was supposed to have the
same data lists 467 instead as the number for arc, arm64, csky,
hexagon, loongarch, nios2, openrisc and riscv. None of these
architectures actually provide a uretprobe syscall.
- All the other architectures (powerpc, arm, mips, ...) don't list
this syscall at all.
There are two ways to make it consistent again: either list it with
the same syscall number on all architectures, or only list it on x86
but not in scripts/syscall.tbl and asm-generic/unistd.h.
Based on the most recent discussion, it seems like we won't need it
anywhere else, so just remove the inconsistent assignment and instead
move the x86 number to the next available one in the architecture
specific range, which is 335.
Fixes: 5c28424e9a ("syscalls: Fix to add sys_uretprobe to syscall.tbl")
Fixes: 190fec72df ("uprobe: Wire up uretprobe system call")
Fixes: 63ded11097 ("uprobe: Change uretprobe syscall scope and number")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Current cgroup_dev test mostly tests that device operation is accepted or
refused base on passed major/minor (and so, any operation performed during
test involves only char device)
Add a small subtest ensuring that the device type passed to bpf program
allows it to take decisions as well.
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20240731-convert_dev_cgroup-v4-3-849425d90de6@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
test_dev_cgroup is defined as a standalone test program, and so is not
executed in CI.
Convert it to test_progs framework so it is tested automatically in CI, and
remove the old test. In order to be able to run it in test_progs, /dev/null
must remain usable, so change the new test to test operations on devices
1:3 as valid, and operations on devices 1:5 (/dev/zero) as invalid.
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20240731-convert_dev_cgroup-v4-2-849425d90de6@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
When building selftests with RELEASE=1 using GCC compiler, it complaints
about uninitialized err. Fix the problem.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240730231805.1933923-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Linux 5.1 implemented 64-bit time types and related syscalls to address the
Y2038 problem generally across archs. Userspace handling of Y2038 varies
with the libc however. While musl libc uses 64-bit time across all 32-bit
and 64-bit platforms, GNU glibc uses 64-bit time on 64-bit platforms but
defaults to 32-bit time on 32-bit platforms unless they "opt-in" to 64-bit
time or explicitly use 64-bit syscalls and time structures.
One specific area is the standard setsockopt() call, SO_TIMESTAMPNS option
used for timestamping, and the related output 'struct timespec'. GNU glibc
defaults as above, also exposing the SO_TIMESTAMPNS_NEW flag to explicitly
use a 64-bit call and 'struct __kernel_timespec'. Since these are not
exposed or needed with musl libc, their use in tc_redirect.c leads to
compile errors building for mips64el/musl:
tc_redirect.c: In function 'rcv_tstamp':
tc_redirect.c:425:32: error: 'SO_TIMESTAMPNS_NEW' undeclared (first use in this function); did you mean 'SO_TIMESTAMPNS'?
425 | cmsg->cmsg_type == SO_TIMESTAMPNS_NEW)
| ^~~~~~~~~~~~~~~~~~
| SO_TIMESTAMPNS
tc_redirect.c:425:32: note: each undeclared identifier is reported only once for each function it appears in
tc_redirect.c: In function 'test_inet_dtime':
tc_redirect.c:491:49: error: 'SO_TIMESTAMPNS_NEW' undeclared (first use in this function); did you mean 'SO_TIMESTAMPNS'?
491 | err = setsockopt(listen_fd, SOL_SOCKET, SO_TIMESTAMPNS_NEW,
| ^~~~~~~~~~~~~~~~~~
| SO_TIMESTAMPNS
However, using SO_TIMESTAMPNS_NEW isn't strictly needed, nor is Y2038 being
explicitly tested. The timestamp checks in tc_redirect.c are simple: the
packet receive timestamp is non-zero and processed/handled in less than 5
seconds.
Switch to using the standard setsockopt() call and SO_TIMESTAMPNS option to
ensure compatibility across glibc and musl libc. In the worst-case, there
is a 5-second window 14 years from now where tc_redirect tests may fail on
32-bit systems. However, we should reasonably expect glibc to adopt a
64-bit mandate rather than the current "opt-in" policy before the Y2038
roll-over.
Fixes: ce6f6cffae ("selftests/bpf: Wait for the netstamp_needed_key static key to be turned on")
Fixes: c803475fd8 ("bpf: selftests: test skb->tstamp in redirect_neigh")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/031d656c058b4e55ceae56ef49c4e1729b5090f3.1722244708.git.tony.ambardar@gmail.com
Typically stdin, stdout, stderr are treated as reserved identifiers under
ISO/ANSI C and libc implementations further define these as macros, both in
glibc and musl <stdio.h>.
However, while glibc defines:
...
/* Standard streams. */
extern FILE *stdin; /* Standard input stream. */
extern FILE *stdout; /* Standard output stream. */
extern FILE *stderr; /* Standard error output stream. */
/* C89/C99 say they're macros. Make them happy. */
#define stdin stdin
#define stdout stdout
#define stderr stderr
...
musl instead uses (legally):
...
extern FILE *const stdin;
extern FILE *const stdout;
extern FILE *const stderr;
#define stdin (stdin)
#define stdout (stdout)
#define stderr (stderr)
...
The latter results in compile errors when the names are reused as fields of
'struct test_env' and elsewhere in test_progs.[ch] and reg_bounds.c.
Rename the fields to stdout_saved and stderr_saved to avoid many errors
seen building against musl, e.g.:
In file included from test_progs.h:6,
from test_progs.c:5:
test_progs.c: In function 'print_test_result':
test_progs.c:237:21: error: expected identifier before '(' token
237 | fprintf(env.stdout, "#%-*d %s:", TEST_NUM_WIDTH, test->test_num, test->test_name);
| ^~~~~~
test_progs.c:237:9: error: too few arguments to function 'fprintf'
237 | fprintf(env.stdout, "#%-*d %s:", TEST_NUM_WIDTH, test->test_num, test->test_name);
| ^~~~~~~
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/ZqR2DuHdBXPX%2Fyx8@kodidev-ubuntu/
Link: https://lore.kernel.org/bpf/684ea17548e237f39dfb3f7a3d33450069015b21.1722244708.git.tony.ambardar@gmail.com
Compiling lwt_reroute.c with GCC 12.3 for mips64el/musl-libc yields errors:
In file included from .../include/arpa/inet.h:9,
from ./test_progs.h:18,
from tools/testing/selftests/bpf/prog_tests/lwt_helpers.h:11,
from tools/testing/selftests/bpf/prog_tests/lwt_reroute.c:52:
.../include/netinet/in.h:23:8: error: redefinition of 'struct in6_addr'
23 | struct in6_addr {
| ^~~~~~~~
In file included from .../include/linux/icmp.h:24,
from tools/testing/selftests/bpf/prog_tests/lwt_helpers.h:9:
.../include/linux/in6.h:33:8: note: originally defined here
33 | struct in6_addr {
| ^~~~~~~~
.../include/netinet/in.h:34:8: error: redefinition of 'struct sockaddr_in6'
34 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../include/linux/in6.h:50:8: note: originally defined here
50 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../include/netinet/in.h:42:8: error: redefinition of 'struct ipv6_mreq'
42 | struct ipv6_mreq {
| ^~~~~~~~~
.../include/linux/in6.h:60:8: note: originally defined here
60 | struct ipv6_mreq {
| ^~~~~~~~~
These errors occur because <linux/in6.h> is included before <netinet/in.h>,
bypassing the Linux uapi/libc compat mechanism's partial musl support. As
described in [1] and [2], fix these errors by including <netinet/in.h> in
lwt_reroute.c before any uapi headers.
[1]: commit c0bace7984 ("uapi libc compat: add fallback for unsupported libcs")
[2]: https://git.musl-libc.org/cgit/musl/commit/?id=04983f227238
Fixes: 6c77997bc6 ("selftests/bpf: Add lwt_xmit tests for BPF_REROUTE")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/bd2908aec0755ba8b75f5dc41848b00585f5c73e.1722244708.git.tony.ambardar@gmail.com
Fixing ASSERT_OK condition check in uprobe_syscall test,
otherwise we return from test on pipe success.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240726180847.684584-1-jolsa@kernel.org
test_xdp_veth.sh tests that XDP return codes work as expected, by bringing
up multiple veth pairs isolated in different namespaces, attaching specific
xdp programs to each interface, and ensuring that the whole chain allows to
ping one end interface from the first one. The test runs well but is
currently not integrated in test_progs, which prevents it from being run
automatically in the CI infrastructure.
Rewrite it as a C test relying on libbpf to allow running it in the CI
infrastructure. The new code brings up the same network infrastructure and
reuses the same eBPF programs as test_xdp_veth.sh, for which skeletons are
already generated by the bpf tests makefile.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/bpf/20240721-convert_test_xdp_veth-v4-2-23bdba21b2f9@bootlin.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Remove a redundant include of '<linux/in6.h>', whose needed definitions are
already provided by 'test_progs.h'. This avoids errors seen compiling for
mips64el/musl-libc:
In file included from .../arpa/inet.h:9,
from ./test_progs.h:17,
from prog_tests/crypto_sanity.c:10:
.../netinet/in.h:23:8: error: redefinition of 'struct in6_addr'
23 | struct in6_addr {
| ^~~~~~~~
In file included from crypto_sanity.c:7:
.../linux/in6.h:33:8: note: originally defined here
33 | struct in6_addr {
| ^~~~~~~~
.../netinet/in.h:34:8: error: redefinition of 'struct sockaddr_in6'
34 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../linux/in6.h:50:8: note: originally defined here
50 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../netinet/in.h:42:8: error: redefinition of 'struct ipv6_mreq'
42 | struct ipv6_mreq {
| ^~~~~~~~~
.../linux/in6.h:60:8: note: originally defined here
60 | struct ipv6_mreq {
| ^~~~~~~~~
Fixes: 91541ab192 ("selftests: bpf: crypto skcipher algo selftests")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/bpf/911293968f424ad7b462d8805aeb3baee8f4985b.1721713597.git.tony.ambardar@gmail.com
Remove a redundant include of '<linux/in6.h>', whose needed definitions are
already provided by 'test_progs.h'. This avoids errors seen compiling for
mips64el/musl-libc:
In file included from .../arpa/inet.h:9,
from ./test_progs.h:17,
from prog_tests/decap_sanity.c:9:
.../netinet/in.h:23:8: error: redefinition of 'struct in6_addr'
23 | struct in6_addr {
| ^~~~~~~~
In file included from decap_sanity.c:7:
.../linux/in6.h:33:8: note: originally defined here
33 | struct in6_addr {
| ^~~~~~~~
.../netinet/in.h:34:8: error: redefinition of 'struct sockaddr_in6'
34 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../linux/in6.h:50:8: note: originally defined here
50 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../netinet/in.h:42:8: error: redefinition of 'struct ipv6_mreq'
42 | struct ipv6_mreq {
| ^~~~~~~~~
.../linux/in6.h:60:8: note: originally defined here
60 | struct ipv6_mreq {
| ^~~~~~~~~
Fixes: 70a00e2f1d ("selftests/bpf: Test bpf_skb_adjust_room on CHECKSUM_PARTIAL")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/e986ba2d7edccd254b54f7cd049b98f10bafa8c3.1721713597.git.tony.ambardar@gmail.com
Remove a redundant include of '<linux/icmp.h>' which is already provided in
'lwt_helpers.h'. This avoids errors seen compiling for mips64el/musl-libc:
In file included from .../arpa/inet.h:9,
from lwt_redirect.c:51:
.../netinet/in.h:23:8: error: redefinition of 'struct in6_addr'
23 | struct in6_addr {
| ^~~~~~~~
In file included from .../linux/icmp.h:24,
from lwt_redirect.c:50:
.../linux/in6.h:33:8: note: originally defined here
33 | struct in6_addr {
| ^~~~~~~~
.../netinet/in.h:34:8: error: redefinition of 'struct sockaddr_in6'
34 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../linux/in6.h:50:8: note: originally defined here
50 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../netinet/in.h:42:8: error: redefinition of 'struct ipv6_mreq'
42 | struct ipv6_mreq {
| ^~~~~~~~~
.../linux/in6.h:60:8: note: originally defined here
60 | struct ipv6_mreq {
| ^~~~~~~~~
Fixes: 43a7c3ef8a ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/3869dda876d5206d2f8d4dd67331c739ceb0c7f8.1721713597.git.tony.ambardar@gmail.com
The type 'loff_t' is a GNU extension and not exposed by the musl 'fcntl.h'
header unless _GNU_SOURCE is defined. Add this definition to fix errors
seen compiling for mips64el/musl-libc:
In file included from tools/testing/selftests/bpf/prog_tests/core_reloc.c:4:
./bpf_testmod/bpf_testmod.h:10:9: error: unknown type name 'loff_t'
10 | loff_t off;
| ^~~~~~
./bpf_testmod/bpf_testmod.h:16:9: error: unknown type name 'loff_t'
16 | loff_t off;
| ^~~~~~
Fixes: 6bcd39d366 ("selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/11c3af75a7eb6bcb7ad9acfae6a6f470c572eb82.1721713597.git.tony.ambardar@gmail.com
The GNU version of 'struct tcp_info' in 'netinet/tcp.h' is not exposed by
musl headers unless _GNU_SOURCE is defined.
Add this definition to fix errors seen compiling for mips64el/musl-libc:
tcp_rtt.c: In function 'wait_for_ack':
tcp_rtt.c:24:25: error: storage size of 'info' isn't known
24 | struct tcp_info info;
| ^~~~
tcp_rtt.c:24:25: error: unused variable 'info' [-Werror=unused-variable]
cc1: all warnings being treated as errors
Fixes: 1f4f80fed2 ("selftests/bpf: test_progs: convert test_tcp_rtt")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/f2329767b15df206f08a5776d35a47c37da855ae.1721713597.git.tony.ambardar@gmail.com
The GNU version of 'struct tcphdr' has members 'doff', 'source' and 'dest',
which are not exposed by musl libc headers unless _GNU_SOURCE is defined.
Add this definition to fix errors seen compiling for mips64el/musl-libc:
flow_dissector.c:118:30: error: 'struct tcphdr' has no member named 'doff'
118 | .tcp.doff = 5,
| ^~~~
flow_dissector.c:119:30: error: 'struct tcphdr' has no member named 'source'
119 | .tcp.source = 80,
| ^~~~~~
flow_dissector.c:120:30: error: 'struct tcphdr' has no member named 'dest'
120 | .tcp.dest = 8080,
| ^~~~
Fixes: ae173a9157 ("selftests/bpf: support BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/8f7ab21a73f678f9cebd32b26c444a686e57414d.1721713597.git.tony.ambardar@gmail.com
The GNU version of 'struct tcphdr' with member 'doff' is not exposed by
musl headers unless _GNU_SOURCE is defined. Add this definition to fix
errors seen compiling for mips64el/musl-libc:
In file included from kfree_skb.c:2:
kfree_skb.c: In function 'on_sample':
kfree_skb.c:45:30: error: 'struct tcphdr' has no member named 'doff'
45 | if (CHECK(pkt_v6->tcp.doff != 5, "check_tcp",
| ^
Fixes: 580d656d80 ("selftests/bpf: Add kfree_skb raw_tp test")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/e2d8cedc790959c10d6822a51f01a7a3616bea1b.1721713597.git.tony.ambardar@gmail.com
The GNU version of 'struct tcphdr', with members 'doff' and 'urg_ptr', is
not exposed by musl headers unless _GNU_SOURCE is defined.
Add this definition to fix errors seen compiling for mips64el/musl-libc:
parse_tcp_hdr_opt.c:18:21: error: 'struct tcphdr' has no member named 'urg_ptr'
18 | .pk6_v6.tcp.urg_ptr = 123,
| ^~~~~~~
parse_tcp_hdr_opt.c:19:21: error: 'struct tcphdr' has no member named 'doff'
19 | .pk6_v6.tcp.doff = 9, /* 16 bytes of options */
| ^~~~
Fixes: cfa7b01189 ("selftests/bpf: tests for using dynptrs to parse skb and xdp buffers")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/ac5440213c242c62cb4e0d9e0a9cd5058b6a31f6.1721713597.git.tony.ambardar@gmail.com
Update ns_current_pid_tgid.c to use '#include <fcntl.h>' and avoid compile
error against mips64el/musl libc:
In file included from .../prog_tests/ns_current_pid_tgid.c:14:
.../include/sys/fcntl.h:1:2: error: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
| ^~~~~~~
cc1: all warnings being treated as errors
Fixes: 09c02d553c ("bpf, selftests: Fold test_current_pid_tgid_new_ns into test_progs.")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/8bdc869749177b575025bf69600a4ce591822609.1721713597.git.tony.ambardar@gmail.com
Explicitly include '<linux/build_bug.h>' to fix errors seen compiling with
gcc targeting mips64el/musl-libc:
user_ringbuf.c: In function 'test_user_ringbuf_loop':
user_ringbuf.c:426:9: error: implicit declaration of function 'BUILD_BUG_ON' [-Werror=implicit-function-declaration]
426 | BUILD_BUG_ON(total_samples <= c_max_entries);
| ^~~~~~~~~~~~
cc1: all warnings being treated as errors
Fixes: e5a9df51c7 ("selftests/bpf: Add selftests validating the user ringbuf")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/b28575f9221ec54871c46a2e87612bb4bbf46ccd.1721713597.git.tony.ambardar@gmail.com
The addition of general support for unprivileged tests in test_loader.c
breaks building test_verifier on non-glibc (e.g. musl) systems, due to the
inclusion of glibc extension '<error.h>' in 'unpriv_helpers.c'. However,
the header is actually not needed, so remove it to restore building.
Similarly for sk_lookup.c and flow_dissector.c, error.h is not necessary
and causes problems, so drop them.
Fixes: 1d56ade032 ("selftests/bpf: Unprivileged tests for test_loader.c")
Fixes: 0ab5539f85 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Fixes: 0905beec9f ("selftests/bpf: run flow dissector tests in skb-less mode")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/5664367edf5fea4f3f4b4aec3b182bcfc6edff9c.1721713597.git.tony.ambardar@gmail.com
Existing code calls getsockname() with a 'struct sockaddr_in6 *' argument
where a 'struct sockaddr *' argument is declared, yielding compile errors
when building for mips64el/musl-libc:
bpf_iter_setsockopt.c: In function 'get_local_port':
bpf_iter_setsockopt.c:98:30: error: passing argument 2 of 'getsockname' from incompatible pointer type [-Werror=incompatible-pointer-types]
98 | if (!getsockname(fd, &addr, &addrlen))
| ^~~~~
| |
| struct sockaddr_in6 *
In file included from .../netinet/in.h:10,
from .../arpa/inet.h:9,
from ./test_progs.h:17,
from bpf_iter_setsockopt.c:5:
.../sys/socket.h:391:23: note: expected 'struct sockaddr * restrict' but argument is of type 'struct sockaddr_in6 *'
391 | int getsockname (int, struct sockaddr *__restrict, socklen_t *__restrict);
| ^
cc1: all warnings being treated as errors
This compiled under glibc only because the argument is declared to be a
"funky" transparent union which includes both types above. Explicitly cast
the argument to allow compiling for both musl and glibc.
Fixes: eed92afdd1 ("bpf: selftest: Test batching and bpf_(get|set)sockopt in bpf tcp iter")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Geliang Tang <geliang@kernel.org>
Link: https://lore.kernel.org/bpf/f41def0f17b27a23b1709080e4e3f37f4cc11ca9.1721713597.git.tony.ambardar@gmail.com
Tests for no_caller_saved_registers processing logic
(see verifier.c:match_and_mark_nocsr_pattern()):
- a canary positive test case;
- a canary test case for arm64 and riscv64;
- various tests with broken patterns;
- tests with read/write fixed/varying stack access that violate nocsr
stack access contract;
- tests with multiple subprograms;
- tests using nocsr in combination with may_goto/bpf_loop,
as all of these features affect stack depth;
- tests for nocsr stack spills below max stack depth.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240722233844.1406874-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
struct bpf_insn *disasm_insn(struct bpf_insn *insn, char *buf, size_t buf_sz);
Disassembles instruction 'insn' to a text buffer 'buf'.
Removes insn->code hex prefix added by kernel disassembly routine.
Returns a pointer to the next instruction
(increments insn by either 1 or 2).
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240722233844.1406874-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Add a few reg_bounds selftests to test 32/16/8-bit ldsx and subreg comparison.
Without the previous patch, all added tests will fail.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240723162940.2732171-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Adding test that attaches/detaches multiple consumers on
single uprobe and verifies all were hit as expected.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240722202758.3889061-3-jolsa@kernel.org
Regular BPF hash map is not mmap-able from user space. However, map-in-map
with outer map of type BPF_MAP_TYPE_HASH_OF_MAPS and mmap-able array as
inner map can perform similar operations as a mmap-able hash map. This
can be used by applications that benefit from fast accesses to some local
data.
Add a selftest to show this use case.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240723051455.1589192-1-song@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
No need to use a dedicated helper inetaddr_len() to get the length of
the IPv4 or IPv6 address, it can be got by make_sockaddr(), this patch
drops it.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/32e2a4122921051da38a6e4fbb2ebee5f0af5a4e.1721475357.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
This patch uses the public network helers client_socket() + make_sockaddr()
in sk_lookup.c to create the client socket, set the timeout sockopts, and
make the connecting address. The local defined function make_socket()
can be dropped then.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/588771977ac48c27f73526d8421a84b91d7cf218.1721475357.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
This patch uses the new helper connect_to_addr_str() in sk_lookup.c to
create the client socket and connect to the server, instead of using local
defined function make_client(). This local function can be dropped then.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/058199d7ab46802249dae066ca22c98f6be508ee.1721475357.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Add test for lsm tail call to ensure tail call can only be used between
bpf lsm progs attached to the same hook.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20240719110059.797546-9-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
There is an existing "bpf_tcp_ca/unsupp_cong_op" test to ensure
the unsupported tcp-cc "get_info" struct_ops prog cannot be loaded.
This patch adds a new test in the bpf_testmod such that the
unsupported ops test does not depend on other kernel subsystem
where its supporting ops may be changed in the future.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240722183049.2254692-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
The struct member "must_fail" of network_helper_opts() is only used in
cgroup_v1v2 tests, it makes sense to drop it from network_helper_opts.
Return value (fd) of connect_to_fd_opts() and the expect errno (EPERM)
can be checked in cgroup_v1v2.c directly, no need to check them in
connect_fd_to_addr() in network_helpers.c.
This also makes connect_fd_to_addr() function useless. It can be replaced
by connect().
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/3faf336019a9a48e2e8951f4cdebf19e3ac6e441.1721282219.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
The "type" parameter of connect_to_fd_opts() is redundant of "server_fd".
Since the "type" can be obtained inside by invoking getsockopt(SO_TYPE),
without passing it in as a parameter.
This patch drops the "type" parameter of connect_to_fd_opts() and updates
its callers.
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/50d8ce7ab7ab0c0f4d211fc7cc4ebe3d3f63424c.1721282219.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZqIl1AAKCRDbK58LschI
g/MdAP9oyZV9/IZ6Y6Z1fWfio0SB+yJGugcwbFjWcEtNrzsqJQEAwipQnemAI4NC
HBMfK2a/w7vhAFMXrP/SbkB/gUJJ7QE=
=vovf
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-07-25
We've added 14 non-merge commits during the last 8 day(s) which contain
a total of 19 files changed, 177 insertions(+), 70 deletions(-).
The main changes are:
1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and
BPF sockhash. Also add test coverage for this case, from Michal Luczaj.
2) Fix a segmentation issue when downgrading gso_size in the BPF helper
bpf_skb_adjust_room(), from Fred Li.
3) Fix a compiler warning in resolve_btfids due to a missing type cast,
from Liwei Song.
4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte
boundary in the fexit_sleep BPF selftest, from Puranjay Mohan.
5) Fix a xsk regression to require a flag when actuating tx_metadata_len,
from Stanislav Fomichev.
6) Fix function prototype BTF dumping in libbpf for prototypes that have
no input arguments, from Andrii Nakryiko.
7) Fix stacktrace symbol resolution in perf script for BPF programs
containing subprograms, from Hou Tao.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
bpf: Fix a segment issue when downgrading gso_size
tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
bpf, events: Use prog to emit ksymbol event for main program
selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB
selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags
selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected()
af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash
bpftool: Fix typo in usage help
libbpf: Fix no-args func prototype BTF dumping syntax
MAINTAINERS: Update powerpc BPF JIT maintainers
MAINTAINERS: Update email address of Naveen
selftests/bpf: fexit_sleep: Fix stack allocation for arm64
====================
Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This flag is now required to use tx_metadata_len.
Fixes: 40808a237d ("selftests/bpf: Add TX side to xdp_metadata")
Reported-by: Julian Schindel <mail@arctic-alpaca.de>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240713015253.121248-3-sdf@fomichev.me
Uprobes:
- x86/shstk: Make return uprobe work with shadow stack.
- Add uretprobe syscall which speeds up the uretprobe 10-30% faster. This
syscall is automatically used from user-space trampolines which are
generated by the uretprobe. If this syscall is used by normal
user program, it will cause SIGILL. Note that this is currently only
implemented on x86_64.
(This also has 2 fixes for adjusting the syscall number to avoid conflict
with new *attrat syscalls.)
- uprobes/perf: fix user stack traces in the presence of pending uretprobe.
This corrects the uretprobe's trampoline address in the stacktrace with
correct return address.
- selftests/x86: Add a return uprobe with shadow stack test.
- selftests/bpf: Add uretprobe syscall related tests.
. test case for register integrity check.
. test case with register changing case.
. test case for uretprobe syscall without uprobes (expected to be failed).
. test case for uretprobe with shadow stack.
- selftests/bpf: add test validating uprobe/uretprobe stack traces
- MAINTAINERS: Add uprobes entry. This does not specify the tree but to
clarify who maintains and reviews the uprobes.
Kprobes:
- tracing/kprobes: Test case cleanups. Replace redundant WARN_ON_ONCE() +
pr_warn() with WARN_ONCE() and remove unnecessary code from selftest.
- tracing/kprobes: Add symbol counting check when module loads. This
checks the uniqueness of the probed symbol on modules. The same check
has already done for kernel symbols.
(This also has a fix for build error with CONFIG_MODULES=n)
Cleanup:
- Add MODULE_DESCRIPTION() macros for fprobe and kprobe examples.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmaWYxwbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bsUgH/3JcSzDZujQWCZ1f4fJn
QecvTFSYcCl6ck8+/3wm4EsgeCXIFOyPnoPc7k2Gm+l6Dlk1DKGV6wV4tuKFUq9X
9mplcwoVA0Ln+EX9zv9v4s99yUGxcU9xjgC9XT7J52SvqYncPIi6dR0Z9wlJBmyd
Bx3cZk+wSzCYaoqYngI2fKlzsEcYgDIP999fQPRi0HGzNZujc4xeJyjCTC/48yWO
9kreRQq6wFdgRQTwMcR/fKPDKIGZQCU8jkXv5crVV5K3rNaBcwBmCJJMP8PzPU0V
UQ0+8RZK+Qk8SBwXcMNVRqm/efTderob4IYxP8OBe5wjAIE7+vu8r6sqwxRIS54M
Cyg=
=DRSr
-----END PGP SIGNATURE-----
Merge tag 'probes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
"Uprobes:
- x86/shstk: Make return uprobe work with shadow stack
- Add uretprobe syscall which speeds up the uretprobe 10-30% faster.
This syscall is automatically used from user-space trampolines
which are generated by the uretprobe. If this syscall is used by
normal user program, it will cause SIGILL. Note that this is
currently only implemented on x86_64.
(This also has two fixes for adjusting the syscall number to avoid
conflict with new *attrat syscalls.)
- uprobes/perf: fix user stack traces in the presence of pending
uretprobe. This corrects the uretprobe's trampoline address in the
stacktrace with correct return address
- selftests/x86: Add a return uprobe with shadow stack test
- selftests/bpf: Add uretprobe syscall related tests.
- test case for register integrity check
- test case with register changing case
- test case for uretprobe syscall without uprobes (expected to fail)
- test case for uretprobe with shadow stack
- selftests/bpf: add test validating uprobe/uretprobe stack traces
- MAINTAINERS: Add uprobes entry. This does not specify the tree but
to clarify who maintains and reviews the uprobes
Kprobes:
- tracing/kprobes: Test case cleanups.
Replace redundant WARN_ON_ONCE() + pr_warn() with WARN_ONCE() and
remove unnecessary code from selftest
- tracing/kprobes: Add symbol counting check when module loads.
This checks the uniqueness of the probed symbol on modules. The
same check has already done for kernel symbols
(This also has a fix for build error with CONFIG_MODULES=n)
Cleanup:
- Add MODULE_DESCRIPTION() macros for fprobe and kprobe examples"
* tag 'probes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
MAINTAINERS: Add uprobes entry
selftests/bpf: Change uretprobe syscall number in uprobe_syscall test
uprobe: Change uretprobe syscall scope and number
tracing/kprobes: Fix build error when find_module() is not available
tracing/kprobes: Add symbol counting check when module loads
selftests/bpf: add test validating uprobe/uretprobe stack traces
perf,uprobes: fix user stack traces in the presence of pending uretprobes
tracing/kprobe: Remove cleanup code unrelated to selftest
tracing/kprobe: Integrate test warnings into WARN_ONCE
selftests/bpf: Add uretprobe shadow stack test
selftests/bpf: Add uretprobe syscall call from user space test
selftests/bpf: Add uretprobe syscall test for regs changes
selftests/bpf: Add uretprobe syscall test for regs integrity
selftests/x86: Add return uprobe shadow stack test
uprobe: Add uretprobe syscall to speed up return probe
uprobe: Wire up uretprobe system call
x86/shstk: Make return uprobe work with shadow stack
samples: kprobes: add missing MODULE_DESCRIPTION() macros
fprobe: add missing MODULE_DESCRIPTION() macro
Verify that out-of-band packets are silently dropped before they reach the
redirection logic.
The idea is to test with a 2 byte long send(). Should a MSG_OOB flag be in
use, only the last byte will be treated as out-of-band. Test fails if
verd_mapfd indicates a wrong number of packets processed (e.g. if OOB
wasn't dropped at the source) or if it was possible to recv() MSG_OOB from
the mapped socket, or if any stale OOB data have been left reachable from
the unmapped socket.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20240713200218.2140950-5-mhal@rbox.co
Extend pairs_redir_to_connected() and unix_inet_redir_to_connected() with a
send_flags parameter. Replace write() with send() allowing packets to be
sent as MSG_OOB.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20240713200218.2140950-4-mhal@rbox.co
Function ignores the AF_UNIX socket type argument, SOCK_DGRAM is hardcoded.
Fix to respect the argument provided.
Fixes: 75e0e27db6 ("selftest/bpf: Change udp to inet in some function names")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20240713200218.2140950-3-mhal@rbox.co
On ARM64 the stack pointer should be aligned at a 16 byte boundary or
the SPAlignmentFault can occur. The fexit_sleep selftest allocates the
stack for the child process as a character array, this is not guaranteed
to be aligned at 16 bytes.
Because of the SPAlignmentFault, the child process is killed before it
can do the nanosleep call and hence fentry_cnt remains as 0. This causes
the main thread to hang on the following line:
while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);
Fix this by allocating the stack using mmap() as described in the
example in the man page of clone().
Remove the fexit_sleep test from the DENYLIST of arm64.
Fixes: eddbe8e652 ("selftest/bpf: Add a test to check trampoline freeing logic.")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240715173327.8657-1-puranjay@kernel.org
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZpGVmAAKCRDbK58LschI
gxB4AQCgquQis63yqTI36j4iXBT+TuxHEBNoQBSLyzYdrLS1dgD/S5DRJDA+3LD+
394hn/VtB1qvX5vaqjsov4UIwSMyxA0=
=OhSn
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-07-12
We've added 23 non-merge commits during the last 3 day(s) which contain
a total of 18 files changed, 234 insertions(+), 243 deletions(-).
The main changes are:
1) Improve BPF verifier by utilizing overflow.h helpers to check
for overflows, from Shung-Hsi Yu.
2) Fix NULL pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
when attr->attach_prog_fd was not specified, from Tengda Wu.
3) Fix arm64 BPF JIT when generating code for BPF trampolines with
BPF_TRAMP_F_CALL_ORIG which corrupted upper address bits,
from Puranjay Mohan.
4) Remove test_run callback from lwt_seg6local_prog_ops which never worked
in the first place and caused syzbot reports,
from Sebastian Andrzej Siewior.
5) Relax BPF verifier to accept non-zero offset on KF_TRUSTED_ARGS/
/KF_RCU-typed BPF kfuncs, from Matt Bobrowski.
6) Fix a long standing bug in libbpf with regards to handling of BPF
skeleton's forward and backward compatibility, from Andrii Nakryiko.
7) Annotate btf_{seq,snprintf}_show functions with __printf,
from Alan Maguire.
8) BPF selftest improvements to reuse common network helpers in sk_lookup
test and dropping the open-coded inetaddr_len() and make_socket() ones,
from Geliang Tang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
selftests/bpf: Test for null-pointer-deref bugfix in resolve_prog_type()
bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again
bpf: use check_sub_overflow() to check for subtraction overflows
bpf: use check_add_overflow() to check for addition overflows
bpf: fix overflow check in adjust_jmp_off()
bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o
bpf: annotate BTF show functions with __printf
bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG
selftests/bpf: Close obj in error path in xdp_adjust_tail
selftests/bpf: Null checks for links in bpf_tcp_ca
selftests/bpf: Use connect_fd_to_fd in sk_lookup
selftests/bpf: Use start_server_addr in sk_lookup
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Add ASSERT_OK_FD macro
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: fix compilation failure when CONFIG_NF_FLOW_TABLE=m
bpf: Remove tst_run from lwt_seg6local_prog_ops.
bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCU
...
====================
Link: https://patch.msgid.link/20240712212448.5378-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/act_ct.c
26488172b0 ("net/sched: Fix UAF when resolving a clash")
3abbd7ed8b ("act_ct: prepare for stolen verdict coming from conntrack and nat engine")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a selftest that tries to trigger a situation where two timer callbacks
are attempting to cancel each other's timer. By running them continuously,
we hit a condition where both run in parallel and cancel each other.
Without the fix in the previous patch, this would cause a lockup as
hrtimer_cancel on either side will wait for forward progress from the
callback.
Ensure that this situation leads to a EDEADLK error.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240711052709.2148616-1-memxor@gmail.com
If bpf_object__load() fails in test_xdp_adjust_frags_tail_grow(), "obj"
opened before this should be closed. So use "goto out" to close it instead
of using "return" here.
Fixes: 110221081a ("bpf: selftests: update xdp_adjust_tail selftest to include xdp frags")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/f282a1ed2d0e3fb38cceefec8e81cabb69cab260.1720615848.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Run bpf_tcp_ca selftests (./test_progs -t bpf_tcp_ca) on a Loongarch
platform, some "Segmentation fault" errors occur:
'''
test_dctcp:PASS:bpf_dctcp__open_and_load 0 nsec
test_dctcp:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/1 bpf_tcp_ca/dctcp:FAIL
test_cubic:PASS:bpf_cubic__open_and_load 0 nsec
test_cubic:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/2 bpf_tcp_ca/cubic:FAIL
test_dctcp_fallback:PASS:dctcp_skel 0 nsec
test_dctcp_fallback:PASS:bpf_dctcp__load 0 nsec
test_dctcp_fallback:FAIL:dctcp link unexpected error: -524
#29/4 bpf_tcp_ca/dctcp_fallback:FAIL
test_write_sk_pacing:PASS:open_and_load 0 nsec
test_write_sk_pacing:FAIL:attach_struct_ops unexpected error: -524
#29/6 bpf_tcp_ca/write_sk_pacing:FAIL
test_update_ca:PASS:open 0 nsec
test_update_ca:FAIL:attach_struct_ops unexpected error: -524
settcpca:FAIL:setsockopt unexpected setsockopt: \
actual -1 == expected -1
(network_helpers.c:99: errno: No such file or directory) \
Failed to call post_socket_cb
start_test:FAIL:start_server_str unexpected start_server_str: \
actual -1 == expected -1
test_update_ca:FAIL:ca1_ca1_cnt unexpected ca1_ca1_cnt: \
actual 0 <= expected 0
#29/9 bpf_tcp_ca/update_ca:FAIL
#29 bpf_tcp_ca:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x28)[0x5555567ed91c]
linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x7ffffee408b0]
./test_progs(bpf_link__update_map+0x80)[0x555556824a78]
./test_progs(+0x94d68)[0x5555564c4d68]
./test_progs(test_bpf_tcp_ca+0xe8)[0x5555564c6a88]
./test_progs(+0x3bde54)[0x5555567ede54]
./test_progs(main+0x61c)[0x5555567efd54]
/usr/lib64/libc.so.6(+0x22208)[0x7ffff2aaa208]
/usr/lib64/libc.so.6(__libc_start_main+0xac)[0x7ffff2aaa30c]
./test_progs(_start+0x48)[0x55555646bca8]
Segmentation fault
'''
This is because BPF trampoline is not implemented on Loongarch yet,
"link" returned by bpf_map__attach_struct_ops() is NULL. test_progs
crashs when this NULL link passes to bpf_link__update_map(). This
patch adds NULL checks for all links in bpf_tcp_ca to fix these errors.
If "link" is NULL, goto the newly added label "out" to destroy the skel.
v2:
- use "goto out" instead of "return" as Eduard suggested.
Fixes: 06da9f3bd6 ("selftests/bpf: Test switching TCP Congestion Control algorithms.")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/b4c841492bd4ed97964e4e61e92827ce51bf1dc9.1720615848.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch uses public helper connect_fd_to_fd() exported in
network_helpers.h instead of using getsockname() + connect() in
run_lookup_prog() in prog_tests/sk_lookup.c. This can simplify
the code.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/7077c277cde5a1864cdc244727162fb75c8bb9c5.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch uses public helper start_server_addr() in udp_recv_send()
in prog_tests/sk_lookup.c to simplify the code.
And use ASSERT_OK_FD() to check fd returned by start_server_addr().
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/f11cabfef4a2170ecb66a1e8e2e72116d8f621b3.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch uses public helper start_server_str() to simplify make_server()
in prog_tests/sk_lookup.c.
Add a callback setsockopts() to do all sockopts, set it to post_socket_cb
pointer of struct network_helper_opts. And add a new struct cb_opts to save
the data needed to pass to the callback. Then pass this network_helper_opts
to start_server_str().
Also use ASSERT_OK_FD() to check fd returned by start_server_str().
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/5981539f5591d2c4998c962ef2bf45f34c940548.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
In the error path when update_lookup_map() fails in drop_on_reuseport in
prog_tests/sk_lookup.c, "server1", the fd of server 1, should be closed.
This patch fixes this by using "goto close_srv1" lable instead of "detach"
to close "server1" in this case.
Fixes: 0ab5539f85 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/86aed33b4b0ea3f04497c757845cff7e8e621a2d.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZoxN0AAKCRDbK58LschI
g0c5AQDa3ZV9gfbN42y1zSDoM1uOgO60fb+ydxyOYh8l3+OiQQD/fLfpTY3gBFSY
9yi/pZhw/QdNzQskHNIBrHFGtJbMxgs=
=p1Zz
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-07-08
The following pull-request contains BPF updates for your *net-next* tree.
We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).
The main changes are:
1) Support resilient split BTF which cuts down on duplication and makes BTF
as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman.
2) Add support for dumping kfunc prototypes from BTF which enables both detecting
as well as dumping compilable prototypes for kfuncs, from Daniel Xu.
3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
support for BPF exceptions, from Ilya Leoshkevich.
4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.
5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
for skbs and add coverage along with it, from Vadim Fedorenko.
6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.
7) Extend the BPF verifier to track the delta between linked registers in order
to better deal with recent LLVM code optimizations, from Alexei Starovoitov.
8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
have been a pointer to the map value, from Benjamin Tissoires.
9) Extend BPF selftests to add regular expression support for test output matching
and adjust some of the selftest when compiled under gcc, from Cupertino Miranda.
10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
iterates exactly once anyway, from Dan Carpenter.
11) Add the capability to offload the netfilter flowtable in XDP layer through
kfuncs, from Florian Westphal & Lorenzo Bianconi.
12) Various cleanups in networking helpers in BPF selftests to shave off a few
lines of open-coded functions on client/server handling, from Geliang Tang.
13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so
that x86 JIT does not need to implement detection, from Leon Hwang.
14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
out-of-bounds memory access for dynpointers, from Matt Bobrowski.
15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
it might lead to problems on 32-bit archs, from Jiri Olsa.
16) Enhance traffic validation and dynamic batch size support in xsk selftests,
from Tushar Vyavahare.
bpf-next-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
bpf: helpers: fix bpf_wq_set_callback_impl signature
libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
selftests/bpf: Remove exceptions tests from DENYLIST.s390x
s390/bpf: Implement exceptions
s390/bpf: Change seen_reg to a mask
bpf: Remove unnecessary loop in task_file_seq_get_next()
riscv, bpf: Optimize stack usage of trampoline
bpf, devmap: Add .map_alloc_check
selftests/bpf: Remove arena tests from DENYLIST.s390x
selftests/bpf: Add UAF tests for arena atomics
selftests/bpf: Introduce __arena_global
s390/bpf: Support arena atomics
s390/bpf: Enable arena
s390/bpf: Support address space cast instruction
s390/bpf: Support BPF_PROBE_MEM32
s390/bpf: Land on the next JITed instruction after exception
s390/bpf: Introduce pre- and post- probe functions
s390/bpf: Get rid of get_probe_mem_regno()
...
====================
Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Check that __sync_*() functions don't cause kernel panics when handling
freed arena pages.
x86_64 does not support some arena atomics yet, and aarch64 may or may
not support them, based on the availability of LSE atomics at run time.
Do not enable this test for these architectures for simplicity.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240701234304.14336-12-iii@linux.ibm.com
Add testcase where 7th argument is struct for architectures with 8 argument
registers, and increase the complexity of the struct.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240702121944.1091530-4-pulehui@huaweicloud.com
Factor out many args tests from tracing_struct and rename some function names
to make more sense. Meanwhile, remove unnecessary skeleton detach operation
as it will be covered by skeleton destroy operation.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240702121944.1091530-3-pulehui@huaweicloud.com
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
e3f02f32a0 ("ionic: fix kernel panic due to multi-buffer handling")
d9c0420999 ("ionic: Mark error paths in the data path as unlikely")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a set of tests to validate that stack traces captured from or in the
presence of active uprobes and uretprobes are valid and complete.
For this we use BPF program that are installed either on entry or exit
of user function, plus deep-nested USDT. One of target funtions
(target_1) is recursive to generate two different entries in the stack
trace for the same uprobe/uretprobe, testing potential edge conditions.
If there is no fixes, we get something like this for one of the scenarios:
caller: 0x758fff - 0x7595ab
target_1: 0x758fd5 - 0x758fff
target_2: 0x758fca - 0x758fd5
target_3: 0x758fbf - 0x758fca
target_4: 0x758fb3 - 0x758fbf
ENTRY #0: 0x758fb3 (in target_4)
ENTRY #1: 0x758fd3 (in target_2)
ENTRY #2: 0x758ffd (in target_1)
ENTRY #3: 0x7fffffffe000
ENTRY #4: 0x7fffffffe000
ENTRY #5: 0x6f8f39
ENTRY #6: 0x6fa6f0
ENTRY #7: 0x7f403f229590
Entry #3 and #4 (0x7fffffffe000) are uretprobe trampoline addresses
which obscure actual target_1 and another target_1 invocations. Also
note that between entry #0 and entry #1 we are missing an entry for
target_3.
With fixes, we get desired full stack traces:
caller: 0x758fff - 0x7595ab
target_1: 0x758fd5 - 0x758fff
target_2: 0x758fca - 0x758fd5
target_3: 0x758fbf - 0x758fca
target_4: 0x758fb3 - 0x758fbf
ENTRY #0: 0x758fb7 (in target_4)
ENTRY #1: 0x758fc8 (in target_3)
ENTRY #2: 0x758fd3 (in target_2)
ENTRY #3: 0x758ffd (in target_1)
ENTRY #4: 0x758ff3 (in target_1)
ENTRY #5: 0x75922c (in caller)
ENTRY #6: 0x6f8f39
ENTRY #7: 0x6fa6f0
ENTRY #8: 0x7f986adc4cd0
Now there is a logical and complete sequence of function calls.
Link: https://lore.kernel.org/all/20240522013845.1631305-5-andrii@kernel.org/
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
add simple kfuncs to create/destroy a context type to bpf_testmod,
register them and add a kfunc_call test to use them. This provides
test coverage for registration of dtor kfuncs from modules.
By transferring the context pointer to a map value as a __kptr
we also trigger the map-based dtor cleanup logic, improving test
coverage.
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240620091733.1967885-7-alan.maguire@oracle.com
Adding selftest to verify that struct_ops maps are auto attached by
bpf skeleton's `*__attach` function.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240621180324.238379-1-yatsenko@meta.com
Since start_server_str() is added now, it can be used in mptcp.c in
start_mptcp_server() instead of using helpers make_sockaddr() and
start_server_addr() to simplify the code.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/16fb3e2cd60b64b5470b0e69f1aa233feaf2717c.1718932493.git.tanggeliang@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In test_bpf_ip_check_defrag_ok(), the new helper client_socket() can be
used to replace connect_to_fd_opts() with "noconnect" opts, and the strcut
member "noconnect" of network_helper_opts can be dropped now, always
connect to server in connect_to_fd_opts().
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/f45760becce51986e4e08283c7df0f933eb0da14.1718932493.git.tanggeliang@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The opts.{type, noconnect} is at least a bit non intuitive or unnecessary.
The only use case now is in test_bpf_ip_check_defrag_ok which ends up
bypassing most (or at least some) of the connect_to_fd_opts() logic. It's
much better that test should have its own connect_to_fd_opts() instead.
This patch adds a new "type" parameter for connect_to_fd_opts(), then
opts->type and getsockopt(SO_TYPE) can be replaced by "type" parameter in
it.
In connect_to_fd(), use getsockopt(SO_TYPE) to get "type" value and pass
it to connect_to_fd_opts().
In bpf_tcp_ca.c and cgroup_v1v2.c, "SOCK_STREAM" types are passed to
connect_to_fd_opts(), and in ip_check_defrag.c, different types "SOCK_RAW"
and "SOCK_DGRAM" are passed to it.
With these changes, the strcut member "type" of network_helper_opts can be
dropped now.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/cfd20b5ad4085c1d1af5e79df3b09013a407199f.1718932493.git.tanggeliang@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
1e7962114c ("bnxt_en: Restore PTP tx_avail count in case of skb_pad() error")
165f87691a ("bnxt_en: add timestamping statistics support")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ensure relocated BTF looks as expected; in this case identical to
original split BTF, with a few duplicate anonymous types added to
split BTF by the relocation process. Also add relocation tests
for edge cases like missing type in base BTF and multiple types
of the same name.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240613095014.357981-5-alan.maguire@oracle.com
Test generation of split+distilled base BTF, ensuring that
- named base BTF STRUCTs and UNIONs are represented as 0-vlen sized
STRUCT/UNIONs
- named ENUM[64]s are represented as 0-vlen named ENUM[64]s
- anonymous struct/unions are represented in full in split BTF
- anonymous enums are represented in full in split BTF
- types unreferenced from split BTF are not present in distilled
base BTF
Also test that with vmlinux BTF and split BTF based upon it,
we only represent needed base types referenced from split BTF
in distilled base.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240613095014.357981-3-alan.maguire@oracle.com
Add special test to be sure that only __nullable BTF params can be
replaced by NULL. This patch adds fake kfuncs in bpf_testmod to
properly test different params.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://lore.kernel.org/r/20240613211817.1551967-6-vadfed@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding test to verify that when called from outside of the
trampoline provided by kernel, the uretprobe syscall will cause
calling process to receive SIGILL signal and the attached bpf
program is not executed.
Link: https://lore.kernel.org/all/20240611112158.40795-8-jolsa@kernel.org/
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Adding test that creates uprobe consumer on uretprobe which changes some
of the registers. Making sure the changed registers are propagated to the
user space when the ureptobe syscall trampoline is used on x86_64.
To be able to do this, adding support to bpf_testmod to create uprobe via
new attribute file:
/sys/kernel/bpf_testmod_uprobe
This file is expecting file offset and creates related uprobe on current
process exe file and removes existing uprobe if offset is 0. The can be
only single uprobe at any time.
The uprobe has specific consumer that changes registers used in ureprobe
syscall trampoline and which are later checked in the test.
Link: https://lore.kernel.org/all/20240611112158.40795-7-jolsa@kernel.org/
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add uretprobe syscall test that compares register values before
and after the uretprobe is hit. It also compares the register
values seen from attached bpf program.
Link: https://lore.kernel.org/all/20240611112158.40795-6-jolsa@kernel.org/
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZmIsRAAKCRDbK58LschI
g4SSAP0bkl6rPMn7zp1h+/l7hlvpp2aVOmasBTe8hIhAGUbluwD/TGq4sNsGgXFI
i4tUtFRhw8pOjy2guy6526qyJvBs8wY=
=WMhY
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-06-06
We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 50 files changed, 1887 insertions(+), 527 deletions(-).
The main changes are:
1) Add a user space notification mechanism via epoll when a struct_ops
object is getting detached/unregistered, from Kui-Feng Lee.
2) Big batch of BPF selftest refactoring for sockmap and BPF congctl
tests, from Geliang Tang.
3) Add BTF field (type and string fields, right now) iterator support
to libbpf instead of using existing callback-based approaches,
from Andrii Nakryiko.
4) Extend BPF selftests for the latter with a new btf_field_iter
selftest, from Alan Maguire.
5) Add new kfuncs for a generic, open-coded bits iterator,
from Yafang Shao.
6) Fix BPF selftests' kallsyms_find() helper under kernels configured
with CONFIG_LTO_CLANG_THIN, from Yonghong Song.
7) Remove a bunch of unused structs in BPF selftests,
from David Alan Gilbert.
8) Convert test_sockmap section names into names understood by libbpf
so it can deduce program type and attach type, from Jakub Sitnicki.
9) Extend libbpf with the ability to configure log verbosity
via LIBBPF_LOG_LEVEL environment variable, from Mykyta Yatsenko.
10) Fix BPF selftests with regards to bpf_cookie and find_vma flakiness
in nested VMs, from Song Liu.
11) Extend riscv32/64 JITs to introduce shift/add helpers to generate Zba
optimization, from Xiao Wang.
12) Enable BPF programs to declare arrays and struct fields with kptr,
bpf_rb_root, and bpf_list_head, from Kui-Feng Lee.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca
selftests/bpf: Add start_test helper in bpf_tcp_ca
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
libbpf: Auto-attach struct_ops BPF maps in BPF skeleton
selftests/bpf: Add btf_field_iter selftests
selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT
libbpf: Remove callback-based type/string BTF field visitor helpers
bpftool: Use BTF field iterator in btfgen
libbpf: Make use of BTF field iterator in BTF handling code
libbpf: Make use of BTF field iterator in BPF linker code
libbpf: Add BTF field iterator
selftests/bpf: Ignore .llvm.<hash> suffix in kallsyms_find()
selftests/bpf: Fix bpf_cookie and find_vma in nested VM
selftests/bpf: Test global bpf_list_head arrays.
selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types.
selftests/bpf: Test kptr arrays and kptrs in nested struct fields.
bpf: limit the number of levels of a nested struct type.
bpf: look into the types of the fields of a struct type recursively.
...
====================
Link: https://lore.kernel.org/r/20240606223146.23020-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bpf_map_lookup_elem() has been removed from do_test(), it makes the
sk_stg_map argument of do_test() useless. In addition, two exactly the
same opts are passed in all the places where do_test() is invoked, so
cli_opts argument can be dropped too.
This patch drops these two useless arguments of do_test() in bpf_tcp_ca.c.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/7056eab111d78a05bce29d2821228dc93f240de4.1717054461.git.tanggeliang@kylinos.cn
The newly added helper start_test() can be used in test_dctcp_fallback()
too, to replace start_server_str() and connect_to_fd_opts(). In that
way, two network_helper_opts srv_opts and cli_opts are used instead of
the previously shared opts.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/792ca3bb013fa06e618176da02d75e4f79a76733.1717054461.git.tanggeliang@kylinos.cn
This patch uses connect_to_fd_opts() instead of using connect_fd_to_fd()
and settcpca() in do_test() in prog_tests/bpf_tcp_ca.c to accept a struct
network_helper_opts argument.
Then define a dctcp dedicated post_socket_cb callback stg_post_socket_cb(),
invoking both settcpca() and bpf_map_update_elem() in it, and set it in
test_dctcp(). For passing map_fd into stg_post_socket_cb() callback, a new
member map_fd is added in struct cb_opts.
Add another "const struct network_helper_opts *cli_opts" to do_test() to
separate it from the server "opts".
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/876ec90430865bc468e3b7f6fb2648420b075548.1717054461.git.tanggeliang@kylinos.cn
bpf_cookie and find_vma are flaky in nested VMs, which is used by some CI
systems. It turns out these failures are caused by unreliable perf event
in nested VM. Fix these by:
1. Use PERF_COUNT_SW_CPU_CLOCK in find_vma;
2. Increase sample_freq in bpf_cookie.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org
Make sure global arrays of bpf_list_heads and fields of bpf_list_heads in
nested struct types work correctly.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240523174202.461236-10-thinker.li@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make sure global arrays of bpf_rb_root and fields of bpf_rb_root in nested
struct types work correctly.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240523174202.461236-9-thinker.li@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make sure that BPF programs can declare global kptr arrays and kptr fields
in struct types that is the type of a global variable or the type of a
nested descendant field in a global variable.
An array with only one element is special case, that it treats the element
like a non-array kptr field. Nested arrays are also tested to ensure they
are handled properly.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240523174202.461236-8-thinker.li@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
'scale_test_def' is unused since commit 3762a39ce8 ("selftests/bpf: Split out
bpf_verif_scale selftests into multiple tests"). Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240602234112.225107-2-linux@treblig.org
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_classifier.c
abd5576b9c ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
56a5cf538c ("net: ti: icssg-prueth: Fix start counter for ft1 filter")
https://lore.kernel.org/all/20240531123822.3bb7eadf@canb.auug.org.au/
No other adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Verify whether a user space program is informed through epoll with EPOLLHUP
when a struct_ops object is detached.
The BPF code in selftests/bpf/progs/struct_ops_module.c has become
complex. Therefore, struct_ops_detach.c has been added to segregate the BPF
code for detachment tests from the BPF code for other tests based on the
recommendation of Andrii Nakryiko.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240530065946.979330-6-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add test cases for the bits iter:
- Positive cases
- Bit mask representing a single word (8-byte unit)
- Bit mask representing data spanning more than one word
- The index of the set bit
- Nagative cases
- bpf_iter_bits_destroy() is required after calling
bpf_iter_bits_new()
- bpf_iter_bits_destroy() can only destroy an initialized iter
- bpf_iter_bits_next() must use an initialized iter
- Bit mask representing zero words
- Bit mask representing fewer words than expected
- Case for ENOMEM
- Case for NULL pointer
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240517023034.48138-3-laoar.shao@gmail.com
This patch uses new helper start_server_str() in do_test() in bpf_tcp_ca.c
to accept a struct network_helper_opts argument instead of using
start_server() and settcpca(). Then change the type of the first paramenter
of do_test() into a struct network_helper_opts one.
Define its own cb_opts and opts for each test, set its own cc name into
cb_opts.cc, and cc_cb() into post_socket_cb callback, then pass it to
do_test().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/6e1b6555e3284e77c8aa60668c61a66c5f99aa37.1716638248.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch uses start_server_str() helper in test_dctcp_fallback() in
bpf_tcp_ca.c, instead of using start_server() and settcpca(). For
support opts in start_server_str() helper, opts->cb_opts needs to be
passed to post_socket_cb() in __start_server().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/414c749321fa150435f7fe8e12c80fec8b447c78.1716638248.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Since the post_socket_cb() callback is added in struct network_helper_opts,
it's make sense to use it not only in __start_server(), but also in
connect_to_fd_opts(). Then it can be used to set TCP_CONGESTION sockopt.
Add a "void *" type member cb_opts into struct network_helper_opts, and add
a new struct named cb_opts in prog_tests/bpf_tcp_ca.c, then cc can be moved
into struct cb_opts from network_helper_opts. Define a new callback cc_cb()
to set TCP_CONGESTION sockopt, and set it to post_socket_cb pointer of opts.
Define a new cb_opts cubic, set it to cb_opts of opts. Pass this opts to
connect_to_fd_opts() in test_dctcp_fallback().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/b512bb8d8f6854c9ea5c409b69d1bf37c6f272c6.1716638248.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
It's not possible to have one generic/common "struct post_socket_opts"
for all tests. It's better to have the individual test define its own
callback opts struct.
So this patch drops struct post_socket_opts, and changes the second
parameter of post_socket_cb as "void *" type.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/f8bda41c7cb9cb6979b2779f89fb3a684234304f.1716638248.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Verifier enforces that only certain program types can mutate sock{map,hash}
maps, that is update it or delete from it. Add test coverage for these
checks so we don't regress.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240527-sockmap-verify-deletes-v1-3-944b372f2101@cloudflare.com
Validate libbpf's USDT-over-multi-uprobe logic by adding USDTs to
existing multi-uprobe tests. This checks correct libbpf fallback to
singular uprobes (when run on older kernels with buggy PID filtering).
We reuse already established child process and child thread testing
infrastructure, so additions are minimal. These test fail on either
older kernels or older version of libbpf that doesn't detect PID
filtering problems.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend existing multi-uprobe tests to test that PID filtering works
correctly. We already have child *process* tests, but we need also child
*thread* tests. This patch adds spawn_thread() helper to start child
thread, wait for it to be ready, and then instruct it to trigger desired
uprobes.
Additionally, we extend BPF-side code to track thread ID, not just
process ID. Also we detect whether extraneous triggerings with
unexpected process IDs happened, and validate that none of that happened
in practice.
These changes prove that fixed PID filtering logic for multi-uprobe
works as expected. These tests fail on old kernels.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Current implementation of PID filtering logic for multi-uprobes in
uprobe_prog_run() is filtering down to exact *thread*, while the intent
for PID filtering it to filter by *process* instead. The check in
uprobe_prog_run() also differs from the analogous one in
uprobe_multi_link_filter() for some reason. The latter is correct,
checking task->mm, not the task itself.
Fix the check in uprobe_prog_run() to perform the same task->mm check.
While doing this, we also update get_pid_task() use to use PIDTYPE_TGID
type of lookup, given the intent is to get a representative task of an
entire process. This doesn't change behavior, but seems more logical. It
would hold task group leader task now, not any random thread task.
Last but not least, given multi-uprobe support is half-broken due to
this PID filtering logic (depending on whether PID filtering is
important or not), we need to make it easy for user space consumers
(including libbpf) to easily detect whether PID filtering logic was
already fixed.
We do it here by adding an early check on passed pid parameter. If it's
negative (and so has no chance of being a valid PID), we return -EINVAL.
Previous behavior would eventually return -ESRCH ("No process found"),
given there can't be any process with negative PID. This subtle change
won't make any practical change in behavior, but will allow applications
to detect PID filtering fixes easily. Libbpf fixes take advantage of
this in the next patch.
Cc: stable@vger.kernel.org
Acked-by: Jiri Olsa <jolsa@kernel.org>
Fixes: b733eeade4 ("bpf: Add pid filter support for uprobe_multi link")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With changes in the design to forward CLOCK_TAI in the skbuff
framework, existing selftest framework needs modification
to handle forwarding of UDP packets with CLOCK_TAI as clockid.
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509211834.3235191-4-quic_abchauha@quicinc.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add test for allocating and looking up ct entry in a
non-default ct zone with kfuncs bpf_{xdp,skb}_ct_alloc
and bpf_{xdp,skb}_ct_lookup.
Add negative tests for looking up ct entry in a different
ct zone to where it was allocated and with a different
direction.
Update reserved test for old struct definition to test for
ct_zone_id being set when opts size isn't NF_BPF_CT_OPTS_SZ (16).
Signed-off-by: Brad Cowie <brad@faucet.nz>
Link: https://lore.kernel.org/r/20240522050712.732558-2-brad@faucet.nz
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The btf_dump test fails:
test_btf_dump_struct_data:FAIL:file_operations unexpected file_operations: actual '(struct file_operations){
.owner = (struct module *)0xffffffffffffffff,
.fop_flags = (fop_flags_t)4294967295,
.llseek = (loff_t (*)(struct f' != expected '(struct file_operations){
.owner = (struct module *)0xffffffffffffffff,
.llseek = (loff_t (*)(struct file *, loff_t, int))0xffffffffffffffff,'
The "fop_flags" is a recent addition to the struct file_operations in
commit 210a03c9d5 ("fs: claw back a few FMODE_* bits")
This patch changes the test_btf_dump_struct_data() to reflect
this change.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20240516164310.2481460-1-martin.lau@linux.dev
Core & protocols
----------------
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd passing
functionality. New method based on Tarjan's Strongly Connected Components
algorithm should be both faster and remove a lot of workarounds
we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP packets
and forwarding them together. Useful for small switches / routers which
lack basic checksum offload in some scenarios (e.g. PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't
use NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address
labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files,
MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs,
neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link
information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory accounting,
RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2% PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked,
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states.
State can be used either for input or output packet processing.
Things we sprinkled into general kernel code
--------------------------------------------
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter
---------
- Use GFP_KERNEL to clone elements, to deal better with OOM situations
and avoid failures in the .commit step.
BPF
---
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. "Session mode" is a common use-case for tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw tracepoint
programs in order to ease migration from classic to raw tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V JITs.
This allows inlining functions which need to access per-CPU state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86 instruction.
Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor process-context
bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API
----------
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line) config.
- Initial bits of a queue control API, for now allowing a single queue
to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling
-----------------
- Remove the need to create a config file to run the net forwarding tests
so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test machine).
Add a few such tests.
- Create a shared code library for writing Python tests. Expose the YAML
Netlink library from tools/ to the tests for easy Netlink access.
- Move netfilter tests under net/, extend them, separate performance tests
from correctness tests, and iron out issues found by running them
"on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF info,
TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers
-------
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application rather
than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it messes up
TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the MII
bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup.
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices drivers.
Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZD6sQACgkQMUZtbf5S
IrtLYw/+I73ePGIye37o2jpbodcLAUZVfF3r6uYUzK8hokEcKD0QVJa9w7PizLZ3
UO45ClOXFLJCkfP4reFenLfxGCel2AJI+F7VFl2xaO2XgrcH/lnVrHqKZEAEXjls
KoYMnShIolv7h2MKP6hHtyTi2j1wvQUKsZC71o9/fuW+4fUT8gECx1YtYcL73wrw
gEMdlUgBYC3jiiCUHJIFX6iPJ2t/TC+q1eIIF2K/Osrk2kIqQhzoozcL4vpuAZQT
99ljx/qRelXa8oppDb7nM5eulg7WY8ZqxEfFZphTMC5nLEGzClxuOTTl2kDYI/D/
UZmTWZDY+F5F0xvNk2gH84qVJXBOVDoobpT7hVA/tDuybobc/kvGDzRayEVqVzKj
Q0tPlJs+xBZpkK5TVnxaFLJVOM+p1Xosxy3kNVXmuYNBvT/R89UbJiCrUKqKZF+L
z/1mOYUv8UklHqYAeuJSptHvqJjTGa/fsEYP7dAUBbc1N2eVB8mzZ4mgU5rYXbtC
E6UXXiWnoSRm8bmco9QmcWWoXt5UGEizHSJLz6t1R5Df/YmXhWlytll5aCwY1ksf
FNoL7S4u7AZThL1Nwi7yUs4CAjhk/N4aOsk+41S0sALCx30BJuI6UdesAxJ0lu+Z
fwCQYbs27y4p7mBLbkYwcQNxAxGm7PSK4yeyRIy2njiyV4qnLf8=
=EsC2
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd
passing functionality. New method based on Tarjan's Strongly
Connected Components algorithm should be both faster and remove a
lot of workarounds we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP
packets and forwarding them together. Useful for small switches /
routers which lack basic checksum offload in some scenarios (e.g.
PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't use
NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
address labels, netdev threaded NAPI sysfs files, bonding driver's
sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
of the link information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory
accounting, RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2%
PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states. State can be
used either for input or output packet processing.
Things we sprinkled into general kernel code:
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter:
- Use GFP_KERNEL to clone elements, to deal better with OOM
situations and avoid failures in the .commit step.
BPF:
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function
entry and return, the entry program can decide if the return
program gets executed and the entry program can share u64 cookie
value with return program. "Session mode" is a common use-case for
tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw
tracepoint programs in order to ease migration from classic to raw
tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V
JITs. This allows inlining functions which need to access per-CPU
state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86
instruction. Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor
process-context bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto
APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API:
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by
rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line)
config.
- Initial bits of a queue control API, for now allowing a single
queue to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling:
- Remove the need to create a config file to run the net forwarding
tests so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test
machine). Add a few such tests.
- Create a shared code library for writing Python tests. Expose the
YAML Netlink library from tools/ to the tests for easy Netlink
access.
- Move netfilter tests under net/, extend them, separate performance
tests from correctness tests, and iron out issues found by running
them "on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF
info, TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers:
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application
rather than dmesg (large number of driver changes from Asbjørn
Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it
messes up TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the
MII bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API
cleanup
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
drivers. Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support"
* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
selftests: netfilter: fix packetdrill conntrack testcase
net: gro: fix napi_gro_cb zeroed alignment
Bluetooth: btintel_pcie: Refactor and code cleanup
Bluetooth: btintel_pcie: Fix warning reported by sparse
Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
Bluetooth: btintel_pcie: Fix compiler warnings
Bluetooth: btintel_pcie: Add *setup* function to download firmware
Bluetooth: btintel_pcie: Add support for PCIe transport
Bluetooth: btintel: Export few static functions
Bluetooth: HCI: Remove HCI_AMP support
Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
Bluetooth: qca: Fix error code in qca_read_fw_build_info()
Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
Bluetooth: btintel: Add support for BlazarI
LE Create Connection command timeout increased to 20 secs
dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
Bluetooth: compute LE flow credits based on recvbuf space
Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
...