linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-08-25 03:13:21 +00:00

Author	SHA1	Message	Date
Kees Cook	9203e0a82c	wireguard: peer: Replace sockaddr with sockaddr_inet As part of the removal of the variably-sized sockaddr for kernel internals, replace struct sockaddr with sockaddr_inet in the endpoint union. No binary changes; the union size remains unchanged due to sockaddr_inet matching the size of sockaddr_in6. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20250722171836.1078436-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-25 15:29:58 -07:00
Samiullah Khawaja	78afdadafe	net: Use netif_threaded_enable instead of netif_set_threaded in drivers Prepare for adding an enum type for NAPI threaded states by adding netif_threaded_enable API. De-export the existing netif_set_threaded API and only use it internally. Update existing drivers to use netif_threaded_enable instead of the de-exported netif_set_threaded. Note that dev_set_threaded used by mt76 debugfs file is unchanged. Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Link: https://patch.msgid.link/20250723013031.2911384-3-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-24 18:34:55 -07:00
Stanislav Fomichev	5d4d84618e	net: s/dev_set_threaded/netif_set_threaded/ Commit `cc34acd577` ("docs: net: document new locking reality") introduced netif_ vs dev_ function semantics: the former expects locked netdev, the latter takes care of the locking. We don't strictly follow this semantics on either side, but there are more dev_xxx handlers now that don't fit. Rename them to netif_xxx where appropriate. Note that one dev_set_threaded call still remains in mt76 for debugfs file. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250717172333.1288349-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-18 17:27:47 -07:00
Petr Machata	f78c75d84f	net: ipv6: Add a flags argument to ip6tunnel_xmit(), udp_tunnel6_xmit_skb() ip6tunnel_xmit() erases the contents of the SKB control block. In order to be able to set particular IP6CB flags on the SKB, add a corresponding parameter, and propagate it to udp_tunnel6_xmit_skb() as well. In one of the following patches, VXLAN driver will use this facility to mark packets as subject to IPv6 multicast routing. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/acb4f9f3e40c3a931236c3af08a720b017fbfbfb.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:45 -07:00
Petr Machata	e3411e326f	net: ipv4: Add a flags argument to iptunnel_xmit(), udp_tunnel_xmit_skb() iptunnel_xmit() erases the contents of the SKB control block. In order to be able to set particular IPCB flags on the SKB, add a corresponding parameter, and propagate it to udp_tunnel_xmit_skb() as well. In one of the following patches, VXLAN driver will use this facility to mark packets as subject to IP multicast routing. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/89c9daf9f2dc088b6b92ccebcc929f51742de91f.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:44 -07:00
Ingo Molnar	41cb08555c	treewide, timers: Rename from_timer() to timer_container_of() Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com	2025-06-08 09:07:37 +02:00
Mirco Barone	db9ae3b6b4	wireguard: device: enable threaded NAPI Enable threaded NAPI by default for WireGuard devices in response to low performance behavior that we observed when multiple tunnels (and thus multiple wg devices) are deployed on a single host. This affects any kind of multi-tunnel deployment, regardless of whether the tunnels share the same endpoints or not (i.e., a VPN concentrator type of gateway would also be affected). The problem is caused by the fact that, in case of a traffic surge that involves multiple tunnels at the same time, the polling of the NAPI instance of all these wg devices tends to converge onto the same core, causing underutilization of the CPU and bottlenecking performance. This happens because NAPI polling is hosted by default in softirq context, but the WireGuard driver only raises this softirq after the rx peer queue has been drained, which doesn't happen during high traffic. In this case, the softirq already active on a core is reused instead of raising a new one. As a result, once two or more tunnel softirqs have been scheduled on the same core, they remain pinned there until the surge ends. In our experiments, this almost always leads to all tunnel NAPIs being handled on a single core shortly after a surge begins, limiting scalability to less than 3× the performance of a single tunnel, despite plenty of unused CPU cores being available. The proposed mitigation is to enable threaded NAPI for all WireGuard devices. This moves the NAPI polling context to a dedicated per-device kernel thread, allowing the scheduler to balance the load across all available cores. On our 32-core gateways, enabling threaded NAPI yields a ~4× performance improvement with 16 tunnels, increasing throughput from ~13 Gbps to ~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck) jumps from 20% to 100%. We have found no performance regressions in any scenario we tested. Single-tunnel throughput remains unchanged. More details are available in our Netdev paper. Link: https://netdevconf.info/0x18/docs/netdev-0x18-paper23-talk-paper.pdf Signed-off-by: Mirco Barone <mirco.barone@polito.it> Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250605120616.2808744-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-05 07:53:57 -07:00
Jordan Rife	ba3d7b93db	wireguard: allowedips: add WGALLOWEDIP_F_REMOVE_ME flag The current netlink API for WireGuard does not directly support removal of allowed ips from a peer. A user can remove an allowed ip from a peer in one of two ways: 1. By using the WGPEER_F_REPLACE_ALLOWEDIPS flag and providing a new list of allowed ips which omits the allowed ip that is to be removed. 2. By reassigning an allowed ip to a "dummy" peer then removing that peer with WGPEER_F_REMOVE_ME. With the first approach, the driver completely rebuilds the allowed ip list for a peer. If my current configuration is such that a peer has allowed ips 192.168.0.2 and 192.168.0.3 and I want to remove 192.168.0.2 the actual transition looks like this. [192.168.0.2, 192.168.0.3] <-- Initial state [] <-- Step 1: Allowed ips removed for peer [192.168.0.3] <-- Step 2: Allowed ips added back for peer This is true even if the allowed ip list is small and the update does not need to be batched into multiple WG_CMD_SET_DEVICE requests, as the removal and subsequent addition of ips is non-atomic within a single request. Consequently, wg_allowedips_lookup_dst and wg_allowedips_lookup_src may return NULL while reconfiguring a peer even for packets bound for ips a user did not intend to remove leading to unintended interruptions in connectivity. This presents in userspace as failed calls to sendto and sendmsg for UDP sockets. In my case, I ran netperf while repeatedly reconfiguring the allowed ips for a peer with wg. /usr/local/bin/netperf -H 10.102.73.72 -l 10m -t UDP_STREAM -- -R 1 -m 1024 send_data: data send error: No route to host (errno 113) netperf: send_omni: send_data failed: No route to host While this may not be of particular concern for environments where peers and allowed ips are mostly static, systems like Cilium manage peers and allowed ips in a dynamic environment where peers (i.e. Kubernetes nodes) and allowed ips (i.e. pods running on those nodes) can frequently change making WGPEER_F_REPLACE_ALLOWEDIPS problematic. The second approach avoids any possible connectivity interruptions but is hacky and less direct, requiring the creation of a temporary peer just to dispose of an allowed ip. Introduce a new flag called WGALLOWEDIP_F_REMOVE_ME which in the same way that WGPEER_F_REMOVE_ME allows a user to remove a single peer from a WireGuard device's configuration allows a user to remove an ip from a peer's set of allowed ips. This enables incremental updates to a device's configuration without any connectivity blips or messy workarounds. A corresponding patch for wg extends the existing `wg set` interface to leverage this feature. $ wg set wg0 peer <PUBKEY> allowed-ips +192.168.88.0/24,-192.168.0.1/32 When '+' or '-' is prepended to any ip in the list, wg clears WGPEER_F_REPLACE_ALLOWEDIPS and sets the WGALLOWEDIP_F_REMOVE_ME flag on any ip prefixed with '-'. Signed-off-by: Jordan Rife <jordan@jrife.io> [Jason: minor style nits, fixes to selftest, bump of wireguard-tools version] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-5-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-27 09:06:19 +02:00
Jason A. Donenfeld	c852902007	wireguard: netlink: use NLA_POLICY_MASK where possible Rather than manually validating flags against the various __ALL_* constants, put this in the netlink policy description and have the upper layer machinery check it for us. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-4-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-27 09:06:19 +02:00
Kees Cook	71e5da46e7	wireguard: global: add __nonstring annotations for unterminated strings When a character array without a terminating NUL character has a static initializer, GCC 15's -Wunterminated-string-initialization will only warn if the array lacks the "nonstring" attribute[1]. Mark the arrays with __nonstring to correctly identify the char array as "not a C string" and thereby eliminate the warning: ../drivers/net/wireguard/cookie.c:29:56: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Wunterminated-string-initialization] 29 \| static const u8 mac1_key_label[COOKIE_KEY_LABEL_LEN] = "mac1----"; \| ^~~~~~~~~~ ../drivers/net/wireguard/cookie.c:30:58: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Wunterminated-string-initialization] 30 \| static const u8 cookie_key_label[COOKIE_KEY_LABEL_LEN] = "cookie--"; \| ^~~~~~~~~~ ../drivers/net/wireguard/noise.c:28:38: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (38 chars into 37 available) [-Wunterminated-string-initialization] 28 \| static const u8 handshake_name[37] = "Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s"; \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/net/wireguard/noise.c:29:39: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (35 chars into 34 available) [-Wunterminated-string-initialization] 29 \| static const u8 identifier_name[34] = "WireGuard v1 zx2c4 Jason@zx2c4.com"; \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The arrays are always used with their fixed size, so use __nonstring. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1] Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-3-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-27 09:06:19 +02:00
Thomas Gleixner	8fa7292fee	treewide: Switch/rename to timer_delete[_sync]() timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-05 10:30:12 +02:00
Xiao Liang	cf517ac16a	net: Use link/peer netns in newlink() of rtnl_link_ops Add two helper functions - rtnl_newlink_link_net() and rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back to link netns, and link netns falls back to source netns. Convert the use of params->net in netdevice drivers to one of the helper functions for clarity. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-4-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-21 15:28:02 -08:00
Xiao Liang	69c7be1b90	rtnetlink: Pack newlink() params into struct There are 4 net namespaces involved when creating links: - source netns - where the netlink socket resides, - target netns - where to put the device being created, - link netns - netns associated with the device (backend), - peer netns - netns of peer device. Currently, two nets are passed to newlink() callback - "src_net" parameter and "dev_net" (implicitly in net_device). They are set as follows, depending on netlink attributes in the request. +------------+-------------------+---------+---------+ \| peer netns \| IFLA_LINK_NETNSID \| src_net \| dev_net \| +------------+-------------------+---------+---------+ \| \| absent \| source \| target \| \| absent +-------------------+---------+---------+ \| \| present \| link \| link \| +------------+-------------------+---------+---------+ \| \| absent \| peer \| target \| \| present +-------------------+---------+---------+ \| \| present \| peer \| link \| +------------+-------------------+---------+---------+ When IFLA_LINK_NETNSID is present, the device is created in link netns first and then moved to target netns. This has some side effects, including extra ifindex allocation, ifname validation and link events. These could be avoided if we create it in target netns from the beginning. On the other hand, the meaning of src_net parameter is ambiguous. It varies depending on how parameters are passed. It is the effective link (or peer netns) by design, but some drivers ignore it and use dev_net instead. To provide more netns context for drivers, this patch packs existing newlink() parameters, along with the source netns, link netns and peer netns, into a struct. The old "src_net" is renamed to "net" to avoid confusion with real source netns, and will be deprecated later. The use of src_net are converted to params->net trivially. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-21 15:28:02 -08:00
Daniel Borkmann	06a34f7db7	wireguard: device: support big tcp GSO Advertise GSO_MAX_SIZE as TSO max size in order support BIG TCP for wireguard. This helps to improve wireguard performance a bit when enabled as it allows wireguard to aggregate larger skbs in wg_packet_consume_data_done() via napi_gro_receive(), but also allows the stack to build larger skbs on xmit where the driver then segments them before encryption inside wg_xmit(). We've seen a 15% improvement in TCP stream performance. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20241117212030.629159-5-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:32:27 -08:00
Dheeraj Reddy Jonnalagadda	c1822fb64f	wireguard: allowedips: remove redundant selftest call This commit fixes a useless call issue detected by Coverity (CID 1508092). The call to horrible_allowedips_lookup_v4 is unnecessary as its return value is never checked. Signed-off-by: Dheeraj Reddy Jonnalagadda <dheeraj.linuxdev@gmail.com> Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20241117212030.629159-3-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:32:27 -08:00
Tobias Klauser	2c862914fb	wireguard: device: omit unnecessary memset of netdev private data The memory for netdev_priv is allocated using kvzalloc in alloc_netdev_mqs before rtnl_link_ops->setup is called so there is no need to zero it again in wg_setup. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20241117212030.629159-2-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:32:27 -08:00
Alexander Lobakin	00d066a4d4	netdev_features: convert NETIF_F_LLTX to dev->lltx NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-09-03 11:36:43 +02:00
Jason A. Donenfeld	381a7d453f	wireguard: send: annotate intentional data race in checking empty queue KCSAN reports a race in wg_packet_send_keepalive, which is intentional: BUG: KCSAN: data-race in wg_packet_send_keepalive / wg_packet_send_staged_packets write to 0xffff88814cd91280 of 8 bytes by task 3194 on cpu 0: __skb_queue_head_init include/linux/skbuff.h:2162 [inline] skb_queue_splice_init include/linux/skbuff.h:2248 [inline] wg_packet_send_staged_packets+0xe5/0xad0 drivers/net/wireguard/send.c:351 wg_xmit+0x5b8/0x660 drivers/net/wireguard/device.c:218 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3564 __dev_queue_xmit+0xeff/0x1d80 net/core/dev.c:4349 dev_queue_xmit include/linux/netdevice.h:3134 [inline] neigh_connected_output+0x231/0x2a0 net/core/neighbour.c:1592 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0xa66/0xce0 net/ipv6/ip6_output.c:137 ip6_finish_output+0x1a5/0x490 net/ipv6/ip6_output.c:222 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:243 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ndisc_send_skb+0x4a2/0x670 net/ipv6/ndisc.c:509 ndisc_send_rs+0x3ab/0x3e0 net/ipv6/ndisc.c:719 addrconf_dad_completed+0x640/0x8e0 net/ipv6/addrconf.c:4295 addrconf_dad_work+0x891/0xbc0 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706 worker_thread+0x525/0x730 kernel/workqueue.c:2787 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 read to 0xffff88814cd91280 of 8 bytes by task 3202 on cpu 1: skb_queue_empty include/linux/skbuff.h:1798 [inline] wg_packet_send_keepalive+0x20/0x100 drivers/net/wireguard/send.c:225 wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline] wg_packet_handshake_receive_worker+0x445/0x5e0 drivers/net/wireguard/receive.c:213 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706 worker_thread+0x525/0x730 kernel/workqueue.c:2787 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 value changed: 0xffff888148fef200 -> 0xffff88814cd91280 Mark this race as intentional by using the skb_queue_empty_lockless() function rather than skb_queue_empty(), which uses READ_ONCE() internally to annotate the race. Cc: stable@vger.kernel.org Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20240704154517.1572127-5-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:21:10 -07:00
Jason A. Donenfeld	2fe3d6d205	wireguard: queueing: annotate intentional data race in cpu round robin KCSAN reports a race in the CPU round robin function, which, as the comment points out, is intentional: BUG: KCSAN: data-race in wg_packet_send_staged_packets / wg_packet_send_staged_packets read to 0xffff88811254eb28 of 4 bytes by task 3160 on cpu 1: wg_cpumask_next_online drivers/net/wireguard/queueing.h:127 [inline] wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline] wg_packet_create_data drivers/net/wireguard/send.c:320 [inline] wg_packet_send_staged_packets+0x60e/0xac0 drivers/net/wireguard/send.c:388 wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239 wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline] wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213 process_one_work kernel/workqueue.c:3248 [inline] process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329 worker_thread+0x526/0x720 kernel/workqueue.c:3409 kthread+0x1d1/0x210 kernel/kthread.c:389 ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 write to 0xffff88811254eb28 of 4 bytes by task 3158 on cpu 0: wg_cpumask_next_online drivers/net/wireguard/queueing.h:130 [inline] wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline] wg_packet_create_data drivers/net/wireguard/send.c:320 [inline] wg_packet_send_staged_packets+0x6e5/0xac0 drivers/net/wireguard/send.c:388 wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239 wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline] wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213 process_one_work kernel/workqueue.c:3248 [inline] process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329 worker_thread+0x526/0x720 kernel/workqueue.c:3409 kthread+0x1d1/0x210 kernel/kthread.c:389 ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 value changed: 0xffffffff -> 0x00000000 Mark this race as intentional by using READ/WRITE_ONCE(). Cc: stable@vger.kernel.org Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20240704154517.1572127-4-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:21:10 -07:00
Helge Deller	948f991c62	wireguard: allowedips: avoid unaligned 64-bit memory accesses On the parisc platform, the kernel issues kernel warnings because swap_endian() tries to load a 128-bit IPv6 address from an unaligned memory location: Kernel: unaligned access to 0x55f4688c in wg_allowedips_insert_v6+0x2c/0x80 [wireguard] (iir 0xf3010df) Kernel: unaligned access to 0x55f46884 in wg_allowedips_insert_v6+0x38/0x80 [wireguard] (iir 0xf2010dc) Avoid such unaligned memory accesses by instead using the get_unaligned_be64() helper macro. Signed-off-by: Helge Deller <deller@gmx.de> [Jason: replace src[8] in original patch with src+8] Cc: stable@vger.kernel.org Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20240704154517.1572127-3-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:21:10 -07:00
Jakub Kicinski	cd7209628c	genetlink: remove linux/genetlink.h genetlink.h is a shell of what used to be a combined uAPI and kernel header over a decade ago. It has fewer than 10 lines of code. Merge it into net/genetlink.h. In some ways it'd be better to keep the combined header under linux/ but it would make looking through git history harder. Acked-by: Sven Eckelmann <sven@narfation.org> Link: https://lore.kernel.org/r/20240329175710.291749-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-01 21:44:34 -07:00
Jason A. Donenfeld	71cbd32e3d	wireguard: netlink: access device through ctx instead of peer The previous commit fixed a bug that led to a NULL peer->device being dereferenced. It's actually easier and faster performance-wise to instead get the device from ctx->wg. This semantically makes more sense too, since ctx->wg->peer_allowedips.seq is compared with ctx->allowedips_seq, basing them both in ctx. This also acts as a defence in depth provision against freed peers. Cc: stable@vger.kernel.org Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:50 +01:00
Jason A. Donenfeld	55b6c73867	wireguard: netlink: check for dangling peer via is_dead instead of empty list If all peers are removed via wg_peer_remove_all(), rather than setting peer_list to empty, the peer is added to a temporary list with a head on the stack of wg_peer_remove_all(). If a netlink dump is resumed and the cursored peer is one that has been removed via wg_peer_remove_all(), it will iterate from that peer and then attempt to dump freed peers. Fix this by instead checking peer->is_dead, which was explictly created for this purpose. Also move up the device_update_lock lockdep assertion, since reading is_dead relies on that. It can be reproduced by a small script like: echo "Setting config..." ip link add dev wg0 type wireguard wg setconf wg0 /big-config ( while true; do echo "Showing config..." wg showconf wg0 > /dev/null done ) & sleep 4 wg setconf wg0 <(printf "[Peer]\nPublicKey=$(wg genkey)\n") Resulting in: BUG: KASAN: slab-use-after-free in __lock_acquire+0x182a/0x1b20 Read of size 8 at addr ffff88811956ec70 by task wg/59 CPU: 2 PID: 59 Comm: wg Not tainted 6.8.0-rc2-debug+ #5 Call Trace: <TASK> dump_stack_lvl+0x47/0x70 print_address_description.constprop.0+0x2c/0x380 print_report+0xab/0x250 kasan_report+0xba/0xf0 __lock_acquire+0x182a/0x1b20 lock_acquire+0x191/0x4b0 down_read+0x80/0x440 get_peer+0x140/0xcb0 wg_get_device_dump+0x471/0x1130 Cc: stable@vger.kernel.org Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Reported-by: Lillian Berry <lillian@star-ark.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:50 +01:00
Breno Leitao	df9bbb5e77	wireguard: device: remove generic .ndo_get_stats64 Commit `3e2f544dd8` ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:49 +01:00
Breno Leitao	db2952dfbd	wireguard: device: leverage core stats allocator With commit `34d21de99c` ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in this driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:49 +01:00
Nikita Zhandarovich	bba045dc4d	wireguard: receive: annotate data-race around receiving_counter.counter Syzkaller with KCSAN identified a data-race issue when accessing keypair->receiving_counter.counter. Use READ_ONCE() and WRITE_ONCE() annotations to mark the data race as intentional. BUG: KCSAN: data-race in wg_packet_decrypt_worker / wg_packet_rx_poll write to 0xffff888107765888 of 8 bytes by interrupt on cpu 0: counter_validate drivers/net/wireguard/receive.c:321 [inline] wg_packet_rx_poll+0x3ac/0xf00 drivers/net/wireguard/receive.c:461 __napi_poll+0x60/0x3b0 net/core/dev.c:6536 napi_poll net/core/dev.c:6605 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6738 __do_softirq+0xc4/0x279 kernel/softirq.c:553 do_softirq+0x5e/0x90 kernel/softirq.c:454 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:396 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x6c5/0x700 drivers/net/wireguard/receive.c:499 process_one_work kernel/workqueue.c:2633 [inline] ... read to 0xffff888107765888 of 8 bytes by task 3196 on cpu 1: decrypt_packet drivers/net/wireguard/receive.c:252 [inline] wg_packet_decrypt_worker+0x220/0x700 drivers/net/wireguard/receive.c:501 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706 worker_thread+0x525/0x730 kernel/workqueue.c:2787 ... Fixes: `a9e90d9931` ("wireguard: noise: separate receive counter from send counter") Reported-by: syzbot+d1de830e4ecdaac83d89@syzkaller.appspotmail.com Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:49 +01:00
Eric Dumazet	80bfab79b8	net: adopt skb_network_offset() and similar helpers This is a cleanup patch, making code a bit more concise. 1) Use skb_network_offset(skb) in place of (skb_network_header(skb) - skb->data) 2) Use -skb_network_offset(skb) in place of (skb->data - skb_network_header(skb)) 3) Use skb_transport_offset(skb) in place of (skb_transport_header(skb) - skb->data) 4) Use skb_inner_transport_offset(skb) in place of (skb_inner_transport_header(skb) - skb->data) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc Signed-off-by: David S. Miller <davem@davemloft.net>	2024-03-04 08:47:06 +00:00
Eric Dumazet	93da8d75a6	wireguard: use DEV_STATS_INC() wg_xmit() can be called concurrently, KCSAN reported [1] some device stats updates can be lost. Use DEV_STATS_INC() for this unlikely case. [1] BUG: KCSAN: data-race in wg_xmit / wg_xmit read-write to 0xffff888104239160 of 8 bytes by task 1375 on cpu 0: wg_xmit+0x60f/0x680 drivers/net/wireguard/device.c:231 __netdev_start_xmit include/linux/netdevice.h:4918 [inline] netdev_start_xmit include/linux/netdevice.h:4932 [inline] xmit_one net/core/dev.c:3543 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3559 ... read-write to 0xffff888104239160 of 8 bytes by task 1378 on cpu 1: wg_xmit+0x60f/0x680 drivers/net/wireguard/device.c:231 __netdev_start_xmit include/linux/netdevice.h:4918 [inline] netdev_start_xmit include/linux/netdevice.h:4932 [inline] xmit_one net/core/dev.c:3543 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3559 ... v2: also change wg_packet_consume_data_done() (Hangbin Liu) and wg_packet_purge_staged_packets() Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-11-19 19:48:25 +00:00
Herbert Xu	d90dde8c55	wireguard: do not include crypto/algapi.h The header file crypto/algapi.h is for internal use only. Use the header file crypto/utils.h instead. Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-09-15 18:29:44 +08:00
Jakub Kicinski	7288dd2fd4	genetlink: use attrs from struct genl_info Since dumps carry struct genl_info now, use the attrs pointer from genl_info and remove the one in struct genl_dumpit_info. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-15 15:00:45 -07:00
Jason A. Donenfeld	46622219aa	wireguard: allowedips: expand maximum node depth In the allowedips self-test, nodes are inserted into the tree, but it generated an even amount of nodes, but for checking maximum node depth, there is of course the root node, which makes the total number necessarily odd. With two few nodes added, it never triggered the maximum depth check like it should have. So, add 129 nodes instead of 128 nodes, and do so with a more straightforward scheme, starting with all the bits set, and shifting over one each time. Then increase the maximum depth to 129, and choose a better name for that variable to make it clear that it represents depth as opposed to bits. Cc: stable@vger.kernel.org Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20230807132146.2191597-2-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-07 12:26:57 -07:00
Jason A. Donenfeld	326534e837	wireguard: timers: move to using timer_delete_sync The documentation says that del_timer_sync is obsolete, and code should use the equivalent timer_delete_sync instead, so switch to it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-03 09:17:52 +01:00
Jason A. Donenfeld	f58d0a9b4c	wireguard: netlink: send staged packets when setting initial private key Packets bound for peers can queue up prior to the device private key being set. For example, if persistent keepalive is set, a packet is queued up to be sent as soon as the device comes up. However, if the private key hasn't been set yet, the handshake message never sends, and no timer is armed to retry, since that would be pointless. But, if a user later sets a private key, the expectation is that those queued packets, such as a persistent keepalive, are actually sent. So adjust the configuration logic to account for this edge case, and add a test case to make sure this works. Maxim noticed this with a wg-quick(8) config to the tune of: [Interface] PostUp = wg set %i private-key somefile [Peer] PublicKey = ... Endpoint = ... PersistentKeepalive = 25 Here, the private key gets set after the device comes up using a PostUp script, triggering the bug. Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Reported-by: Maxim Cournoyer <maxim.cournoyer@gmail.com> Tested-by: Maxim Cournoyer <maxim.cournoyer@gmail.com> Link: https://lore.kernel.org/wireguard/87fs7xtqrv.fsf@gmail.com/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-03 09:17:52 +01:00
Jason A. Donenfeld	7387943fa3	wireguard: queueing: use saner cpu selection wrapping Using `% nr_cpumask_bits` is slow and complicated, and not totally robust toward dynamic changes to CPU topologies. Rather than storing the next CPU in the round-robin, just store the last one, and also return that value. This simplifies the loop drastically into a much more common pattern. Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Manuel Leiner <manuel.leiner@gmx.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-03 09:17:52 +01:00
Eric Dumazet	d457a0e329	net: move gso declarations and functions to their own files Move declarations into include/net/gso.h and code into net/core/gso.c Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stanislav Fomichev <sdf@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-10 00:11:41 -07:00
Linus Torvalds	8ca09d5fa3	cpumask: fix incorrect cpumask scanning result checks It turns out that commit `596ff4a09b` ("cpumask: re-introduce constant-sized cpumask optimizations") exposed a number of cases of drivers not checking the result of "cpumask_next()" and friends correctly. The documented correct check for "no more cpus in the cpumask" is to check for the result being equal or larger than the number of possible CPU ids, exactly _because_ we've always done those constant-sized cpumask scans using a widened type before. So the return value of a cpumask scan should be checked with if (cpu >= nr_cpu_ids) ... because the cpumask scan did not necessarily stop exactly at that maximum CPU id. But a few cases ended up instead using checks like if (cpu == nr_cpumask_bits) ... which used that internal "widened" number of bits. And that used to work pretty much by accident (ok, in this case "by accident" is simply because it matched the historical internal implementation of the cpumask scanning, so it was more of a "intentionally using implementation details rather than an accident"). But the extended constant-sized optimizations then did that internal implementation differently, and now that code that did things wrong but matched the old implementation no longer worked at all. Which then causes subsequent odd problems due to using what ends up being an invalid CPU ID. Most of these cases require either unusual hardware or special uses to hit, but the random.c one triggers quite easily. All you really need is to have a sufficiently small CONFIG_NR_CPUS value for the bit scanning optimization to be triggered, but not enough CPUs to then actually fill that widened cpumask. At that point, the cpumask scanning will return the NR_CPUS constant, which is _not_ the same as nr_cpumask_bits. This just does the mindless fix with sed -i 's/== nr_cpumask_bits/>= nr_cpu_ids/' to fix the incorrect uses. The ones in the SCSI lpfc driver in particular could probably be fixed more cleanly by just removing that repeated pattern entirely, but I am not emptionally invested enough in that driver to care. Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/lkml/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/ Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/lkml/CAMuHMdUKo_Sf7TjKzcNDa8Ve+6QrK+P8nSQrSQ=6LTRmcBKNww@mail.gmail.com/ Reported-by: Vernon Yang <vernon2gm@gmail.com> Link: https://lore.kernel.org/lkml/20230306160651.2016767-1-vernon2gm@gmail.com/ Cc: Yury Norov <yury.norov@gmail.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-03-06 12:15:13 -08:00
Jiri Slaby (SUSE)	2d4ee16d96	wireguard: timers: cast enum limits members to int in prints Since gcc13, each member of an enum has the same type as the enum. And that is inherited from its members. Provided "REKEY_AFTER_MESSAGES = 1ULL << 60", the named type is unsigned long. This generates warnings with gcc-13: error: format '%d' expects argument of type 'int', but argument 6 has type 'long unsigned int' Cast those particular enum members to int when printing them. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113 Cc: Martin Liska <mliska@suse.cz> Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/all/20221213225208.3343692-2-Jason@zx2c4.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-13 19:30:45 -08:00
Jason A. Donenfeld	e8a533cbeb	treewide: use get_random_u32_inclusive() when possible These cases were done with this Coccinelle: @@ expression H; expression L; @@ - (get_random_u32_below(H) + L) + get_random_u32_inclusive(L, H + L - 1) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - + E - - E ) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - - E - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - - E + F - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - + E + F - - E ) And then subsequently cleaned up by hand, with several automatic cases rejected if it didn't make sense contextually. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-18 02:18:02 +01:00
Jason A. Donenfeld	8032bf1233	treewide: use get_random_u32_below() instead of deprecated function This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-18 02:15:15 +01:00
Jason A. Donenfeld	197173db99	treewide: use get_random_bytes() when possible The prandom_bytes() function has been a deprecated inline wrapper around get_random_bytes() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-10-11 17:42:58 -06:00
Jason A. Donenfeld	7e3cf0843f	treewide: use get_random_{u8,u16}() when possible, part 1 Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value, simply use the get_random_{u8,u16}() functions, which are faster than wasting the additional bytes from a 32-bit value. This was done mechanically with this coccinelle script: @@ expression E; identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u16; typedef __be16; typedef __le16; typedef u8; @@ ( - (get_random_u32() & 0xffff) + get_random_u16() \| - (get_random_u32() & 0xff) + get_random_u8() \| - (get_random_u32() % 65536) + get_random_u16() \| - (get_random_u32() % 256) + get_random_u8() \| - (get_random_u32() >> 16) + get_random_u16() \| - (get_random_u32() >> 24) + get_random_u8() \| - (u16)get_random_u32() + get_random_u16() \| - (u8)get_random_u32() + get_random_u8() \| - (__be16)get_random_u32() + (__be16)get_random_u16() \| - (__le16)get_random_u32() + (__le16)get_random_u16() \| - prandom_u32_max(65536) + get_random_u16() \| - prandom_u32_max(256) + get_random_u8() \| - E->inet_id = get_random_u32() + E->inet_id = get_random_u16() ) @@ identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u16; identifier v; @@ - u16 v = get_random_u32(); + u16 v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u8; identifier v; @@ - u8 v = get_random_u32(); + u8 v = get_random_u8(); @@ identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u16; u16 v; @@ - v = get_random_u32(); + v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u8; u8 v; @@ - v = get_random_u32(); + v = get_random_u8(); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Examine limits @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value < 256: coccinelle.RESULT = cocci.make_ident("get_random_u8") elif value < 65536: coccinelle.RESULT = cocci.make_ident("get_random_u16") else: print("Skipping large mask of %s" % (literal)) cocci.include_match(False) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; identifier add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + (RESULT() & LITERAL) Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-10-11 17:42:58 -06:00
Jakub Kicinski	b48b89f9c1	net: drop the weight argument from netif_napi_add We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-28 18:57:14 -07:00
Jakub Kicinski	0140a7168f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/freescale/fec.h `7b15515fc1` ("Revert "fec: Restart PPS after link state change"") `40c79ce13b` ("net: fec: add stop mode support for imx8 platform") https://lore.kernel.org/all/20220921105337.62b41047@canb.auug.org.au/ drivers/pinctrl/pinctrl-ocelot.c `c297561bc9` ("pinctrl: ocelot: Fix interrupt controller") `181f604b33` ("pinctrl: ocelot: add ability to be used in a non-mmio configuration") https://lore.kernel.org/all/20220921110032.7cd28114@canb.auug.org.au/ tools/testing/selftests/drivers/net/bonding/Makefile `bbb774d921` ("net: Add tests for bonding and team address list management") `152e8ec776` ("selftests/bonding: add a test for bonding lladdr target") https://lore.kernel.org/all/20220921110437.5b7dbd82@canb.auug.org.au/ drivers/net/can/usb/gs_usb.c `5440428b3d` ("can: gs_usb: gs_can_open(): fix race dev->can.state condition") `45dfa45f52` ("can: gs_usb: add RX and TX hardware timestamp support") https://lore.kernel.org/all/84f45a7d-92b6-4dc5-d7a1-072152fab6ff@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-22 13:02:10 -07:00
Jason A. Donenfeld	26c013108c	wireguard: netlink: avoid variable-sized memcpy on sockaddr Doing a variable-sized memcpy is slower, and the compiler isn't smart enough to turn this into a constant-size assignment. Further, Kees' latest fortified memcpy will actually bark, because the destination pointer is type sockaddr, not explicitly sockaddr_in or sockaddr_in6, so it thinks there's an overflow: memcpy: detected field-spanning write (size 28) of single field "&endpoint.addr" at drivers/net/wireguard/netlink.c:446 (size 16) Fix this by just assigning by using explicit casts for each checked case. Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reported-by: syzbot+a448cda4dba2dac50de5@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 11:26:14 -07:00
Jason A. Donenfeld	684dec3cf4	wireguard: ratelimiter: disable timings test by default A previous commit tried to make the ratelimiter timings test more reliable but in the process made it less reliable on other configurations. This is an impossible problem to solve without increasingly ridiculous heuristics. And it's not even a problem that actually needs to be solved in any comprehensive way, since this is only ever used during development. So just cordon this off with a DEBUG_ ifdef, just like we do for the trie's randomized tests, so it can be enabled while hacking on the code, and otherwise disabled in CI. In the process we also revert `151c8e499f`. Fixes: `151c8e499f` ("wireguard: ratelimiter: use hrtimer in selftest") Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 11:26:13 -07:00
Jakub Kicinski	9c5d03d362	genetlink: start to validate reserved header bytes We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-29 12:47:15 +01:00
Linus Torvalds	228dfe98a3	Char / Misc driver changes for 6.0-rc1 Here is the large set of char and misc and other driver subsystem changes for 6.0-rc1. Highlights include: - large set of IIO driver updates, additions, and cleanups - new habanalabs device support added (loads of register maps much like GPUs have) - soundwire driver updates - phy driver updates - slimbus driver updates - tiny virt driver fixes and updates - misc driver fixes and updates - interconnect driver updates - hwtracing driver updates - fpga driver updates - extcon driver updates - firmware driver updates - counter driver update - mhi driver fixes and updates - binder driver fixes and updates - speakup driver fixes Full details are in the long shortlog contents. All of these have been in linux-next for a while without any reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYup9QQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylBKQCfaSuzl9ZP9dTvAw2FPp14oRqXnpoAnicvWAoq 1vU9Vtq2c73uBVLdZm4m =AwP3 -----END PGP SIGNATURE----- Merge tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver updates from Greg KH: "Here is the large set of char and misc and other driver subsystem changes for 6.0-rc1. Highlights include: - large set of IIO driver updates, additions, and cleanups - new habanalabs device support added (loads of register maps much like GPUs have) - soundwire driver updates - phy driver updates - slimbus driver updates - tiny virt driver fixes and updates - misc driver fixes and updates - interconnect driver updates - hwtracing driver updates - fpga driver updates - extcon driver updates - firmware driver updates - counter driver update - mhi driver fixes and updates - binder driver fixes and updates - speakup driver fixes All of these have been in linux-next for a while without any reported problems" * tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (634 commits) drivers: lkdtm: fix clang -Wformat warning char: remove VR41XX related char driver misc: Mark MICROCODE_MINOR unused spmi: trace: fix stack-out-of-bound access in SPMI tracing functions dt-bindings: iio: adc: Add compatible for MT8188 iio: light: isl29028: Fix the warning in isl29028_remove() iio: accel: sca3300: Extend the trigger buffer from 16 to 32 bytes iio: fix iio_format_avail_range() printing for none IIO_VAL_INT iio: adc: max1027: unlock on error path in max1027_read_single_value() iio: proximity: sx9324: add empty line in front of bullet list iio: magnetometer: hmc5843: Remove duplicate 'the' iio: magn: yas530: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: magnetometer: ak8974: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: veml6030: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: vcnl4035: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: vcnl4000: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: tsl2591: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() iio: light: tsl2583: Use DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr() iio: light: isl29028: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() iio: light: gp2ap002: Switch to DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr() ...	2022-08-04 11:05:48 -07:00
Jason A. Donenfeld	c31b14d86d	wireguard: allowedips: don't corrupt stack when detecting overflow In case push_rcu() and related functions are buggy, there's a WARN_ON(len >= 128), which the selftest tries to hit by being tricky. In case it is hit, we shouldn't corrupt the kernel's stack, though; otherwise it may be hard to even receive the report that it's buggy. So conditionalize the stack write based on that WARN_ON()'s return value. Note that this never actually happens anyway. The WARN_ON() in the first place is bounded by IS_ENABLED(DEBUG), and isn't expected to ever actually hit. This is just a debugging sanity check. Additionally, hoist the constant 128 into a named enum, MAX_ALLOWEDIPS_BITS, so that it's clear why this value is chosen. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/all/CAHk-=wjJZGA6w_DxA+k7Ejbqsq+uGK==koPai3sqdsfJqemvag@mail.gmail.com/ Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-02 13:47:50 -07:00
Jason A. Donenfeld	151c8e499f	wireguard: ratelimiter: use hrtimer in selftest Using msleep() is problematic because it's compared against ratelimiter.c's ktime_get_coarse_boottime_ns(), which means on systems with slow jiffies (such as UML's forced HZ=100), the result is inaccurate. So switch to using schedule_hrtimeout(). However, hrtimer gives us access only to the traditional posix timers, and none of the _COARSE variants. So now, rather than being too imprecise like jiffies, it's too precise. One solution would be to give it a large "range" value, but this will still fire early on a loaded system. A better solution is to align the timeout to the actual coarse timer, and then round up to the nearest tick, plus change. So add the timeout to the current coarse time, and then schedule_hrtimer() until the absolute computed time. This should hopefully reduce flakes in CI as well. Note that we keep the retry loop in case the entire function is running behind, because the test could still be scheduled out, by either the kernel or by the hypervisor's kernel, in which case restarting the test and hoping to not be scheduled out still helps. Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-02 13:47:50 -07:00
Kalesh Singh	261e224d6a	pm/sleep: Add PM_USERSPACE_AUTOSLEEP Kconfig Systems that initiate frequent suspend/resume from userspace can make the kernel aware by enabling PM_USERSPACE_AUTOSLEEP config. This allows for certain sleep-sensitive code (wireguard/rng) to decide on what preparatory work should be performed (or not) in their pm_notification callbacks. This patch was prompted by the discussion at [1] which attempts to remove CONFIG_ANDROID that currently guards these code paths. [1] https://lore.kernel.org/r/20220629150102.1582425-1-hch@lst.de/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Link: https://lore.kernel.org/r/20220630191230.235306-1-kaleshsingh@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-07-01 10:39:20 +02:00

1 2 3

118 Commits