mirror_ubuntu-kernels/kernel/bpf
Hou Tao b9e9ed90b1 bpf: Call free_htab_elem() after htab_unlock_bucket()
For htab of maps, when the map is removed from the htab, it may hold the
last reference of the map. bpf_map_fd_put_ptr() will invoke
bpf_map_free_id() to free the id of the removed map element. However,
bpf_map_fd_put_ptr() is invoked while holding a bucket lock
(raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock
(spinlock_t), triggering the following lockdep warning:

  =============================
  [ BUG: Invalid wait context ]
  6.11.0-rc4+ #49 Not tainted
  -----------------------------
  test_maps/4881 is trying to lock:
  ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70
  other info that might help us debug this:
  context-{5:5}
  2 locks held by test_maps/4881:
   #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270
   #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80
  stack backtrace:
  CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x6e/0xb0
   dump_stack+0x10/0x20
   __lock_acquire+0x73e/0x36c0
   lock_acquire+0x182/0x450
   _raw_spin_lock_irqsave+0x43/0x70
   bpf_map_free_id.part.0+0x21/0x70
   bpf_map_put+0xcf/0x110
   bpf_map_fd_put_ptr+0x9a/0xb0
   free_htab_elem+0x69/0xe0
   htab_map_update_elem+0x50f/0xa80
   bpf_fd_htab_map_update_elem+0x131/0x270
   htab_map_update_elem+0x50f/0xa80
   bpf_fd_htab_map_update_elem+0x131/0x270
   bpf_map_update_value+0x266/0x380
   __sys_bpf+0x21bb/0x36b0
   __x64_sys_bpf+0x45/0x60
   x64_sys_call+0x1b2a/0x20d0
   do_syscall_64+0x5d/0x100
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

One way to fix the lockdep warning is using raw_spinlock_t for
map_idr_lock as well. However, bpf_map_alloc_id() invokes
idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a
similar lockdep warning because the slab's lock (s->cpu_slab->lock) is
still a spinlock.

Instead of changing map_idr_lock's type, fix the issue by invoking
htab_put_fd_value() after htab_unlock_bucket(). However, only deferring
the invocation of htab_put_fd_value() is not enough, because the old map
pointers in htab of maps can not be saved during batched deletion.
Therefore, also defer the invocation of free_htab_elem(), so these
to-be-freed elements could be linked together similar to lru map.

There are four callers for ->map_fd_put_ptr:

(1) alloc_htab_elem() (through htab_put_fd_value())
It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of
htab_put_fd_value() can not simply move after htab_unlock_bucket(),
because the old element has already been stashed in htab->extra_elems.
It may be reused immediately after htab_unlock_bucket() and the
invocation of htab_put_fd_value() after htab_unlock_bucket() may release
the newly-added element incorrectly. Therefore, saving the map pointer
of the old element for htab of maps before unlocking the bucket and
releasing the map_ptr after unlock. Beside the map pointer in the old
element, should do the same thing for the special fields in the old
element as well.

(2) free_htab_elem() (through htab_put_fd_value())
Its caller includes __htab_map_lookup_and_delete_elem(),
htab_map_delete_elem() and __htab_map_lookup_and_delete_batch().

For htab_map_delete_elem(), simply invoke free_htab_elem() after
htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just
like lru map, linking the to-be-freed element into node_to_free list
and invoking free_htab_elem() for these element after unlock. It is safe
to reuse batch_flink as the link for node_to_free, because these
elements have been removed from the hash llist.

Because htab of maps doesn't support lookup_and_delete operation,
__htab_map_lookup_and_delete_elem() doesn't have the problem, so kept
it as is.

(3) fd_htab_map_free()
It invokes ->map_fd_put_ptr without raw_spinlock_t.

(4) bpf_fd_htab_map_update_elem()
It invokes ->map_fd_put_ptr without raw_spinlock_t.

After moving free_htab_elem() outside htab bucket lock scope, using
pcpu_freelist_push() instead of __pcpu_freelist_push() to disable
the irq before freeing elements, and protecting the invocations of
bpf_mem_cache_free() with migrate_{disable|enable} pair.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241106063542.357743-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11 08:18:30 -08:00
..
preload bpf: make preloaded map iterators to display map elements count 2023-07-06 12:42:25 -07:00
arena.c bpf: Fix remap of arena. 2024-06-18 17:19:46 +02:00
arraymap.c bpf: Prevent tailcall infinite loop caused by freplace 2024-10-16 09:21:18 -07:00
bloom_filter.c bpf: Check bloom filter map value size 2024-03-27 09:56:17 -07:00
bpf_cgrp_storage.c bpf: Add "bool swap_uptrs" arg to bpf_local_storage_update() and bpf_selem_alloc() 2024-10-24 10:25:59 -07:00
bpf_inode_storage.c bpf: Add "bool swap_uptrs" arg to bpf_local_storage_update() and bpf_selem_alloc() 2024-10-24 10:25:59 -07:00
bpf_iter.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
bpf_local_storage.c bpf: Add uptr support in the map_value of the task local storage. 2024-10-24 10:25:59 -07:00
bpf_lru_list.c bpf: Address KCSAN report on bpf_lru_list 2023-05-12 12:01:03 -07:00
bpf_lru_list.h bpf: lru: Remove unused declaration bpf_lru_promote() 2023-08-08 17:21:42 -07:00
bpf_lsm.c bpf, lsm: Remove bpf_lsm_key_free hook 2024-10-08 12:52:40 -07:00
bpf_struct_ops.c bpf: Check unsupported ops from the bpf_struct_ops's cfi_stubs 2024-07-29 12:54:13 -07:00
bpf_task_storage.c bpf: Add uptr support in the map_value of the task local storage. 2024-10-24 10:25:59 -07:00
btf_iter.c bpf: Remove custom build rule 2024-08-30 08:55:26 -07:00
btf_relocate.c bpf: Remove custom build rule 2024-08-30 08:55:26 -07:00
btf.c bpf: Mark raw_tp arguments with PTR_MAYBE_NULL 2024-11-04 11:37:36 -08:00
cgroup_iter.c bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg 2023-11-07 15:24:25 -08:00
cgroup.c bpf: Allow bpf_current_task_under_cgroup() with BPF_CGROUP_* 2024-08-19 15:25:30 -07:00
core.c bpf: Prevent tailcall infinite loop caused by freplace 2024-10-16 09:21:18 -07:00
cpumap.c bpf, cpumap: Move xdp:xdp_cpumap_kthread tracepoint before rcv 2024-09-11 16:32:11 +02:00
cpumask.c bpf: Allow invoking kfuncs from BPF_PROG_TYPE_SYSCALL progs 2024-04-05 10:56:09 -07:00
crypto.c bpf: crypto: make state and IV dynptr nullable 2024-06-13 16:33:04 -07:00
devmap.c bpf: devmap: provide rxq after redirect 2024-10-02 13:48:26 -07:00
disasm.c bpf: add special internal-only MOV instruction to resolve per-CPU addrs 2024-04-03 10:29:55 -07:00
disasm.h bpf: Relicense disassembler as GPL-2.0-only OR BSD-2-Clause 2021-09-02 14:49:23 +02:00
dispatcher.c bpf: Use arch_bpf_trampoline_size 2023-12-06 17:17:20 -08:00
hashtab.c bpf: Call free_htab_elem() after htab_unlock_bucket() 2024-11-11 08:18:30 -08:00
helpers.c bpf: Add open coded version of kmem_cache iterator 2024-11-01 11:08:32 -07:00
inode.c bpf: Preserve param->string when parsing mount options 2024-10-22 12:56:38 -07:00
Kconfig bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of 2024-05-14 00:36:29 -07:00
kmem_cache_iter.c bpf: Add open coded version of kmem_cache iterator 2024-11-01 11:08:32 -07:00
link_iter.c bpf: Add bpf_link iterator 2022-05-10 11:20:45 -07:00
local_storage.c bpf: Replace 8 seq_puts() calls by seq_putc() calls 2024-07-29 12:53:00 -07:00
log.c bpf: Fix print_reg_state's constant scalar dump 2024-10-17 11:06:34 -07:00
lpm_trie.c bpf: Avoid kfree_rcu() under lock in bpf_lpm_trie. 2024-03-29 11:10:41 -07:00
Makefile bpf: Add kmem_cache iterator 2024-10-14 18:33:04 -07:00
map_in_map.c bpf: switch maps to CLASS(fd, ...) 2024-08-13 15:58:17 -07:00
map_in_map.h bpf: Add map and need_defer parameters to .map_fd_put_ptr() 2023-12-04 17:50:26 -08:00
map_iter.c bpf: treewide: Annotate BPF kfuncs in BTF 2024-01-31 20:40:56 -08:00
memalloc.c bpf: Call kfree(obj) only once in free_one() 2024-10-03 17:47:35 -07:00
mmap_unlock_work.h bpf: Introduce helper bpf_find_vma 2021-11-07 11:54:51 -08:00
mprog.c bpf: Handle bpf_mprog_query with NULL entry 2023-10-06 17:11:20 -07:00
net_namespace.c net: Add includes masked by netdevice.h including uapi/bpf.h 2021-12-29 20:03:05 -08:00
offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-09-21 21:49:45 +02:00
percpu_freelist.c bpf: Initialize same number of free nodes for each pcpu_freelist 2022-11-11 12:05:14 -08:00
percpu_freelist.h
prog_iter.c
queue_stack_maps.c bpf: Avoid deadlock when using queue and stack maps from NMI 2023-09-11 19:04:49 -07:00
relo_core.c bpf: Remove custom build rule 2024-08-30 08:55:26 -07:00
reuseport_array.c bpf: Use sockfd_put() helper 2024-08-30 08:57:47 -07:00
ringbuf.c bpf: Add MEM_WRITE attribute 2024-10-22 15:42:56 -07:00
stackmap.c bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers 2024-09-11 09:58:31 -07:00
syscall.c bpf: Add support for uprobe multi session attach 2024-11-11 08:18:03 -08:00
sysfs_btf.c btf: Avoid weak external references 2024-04-16 16:35:13 +02:00
task_iter.c bpf: Fix iter/task tid filtering 2024-10-17 10:52:18 -07:00
tcx.c bpf, tcx: Get rid of tcx_link_const 2023-10-23 15:01:53 -07:00
tnum.c bpf: simplify tnum output if a fully known constant 2023-12-02 11:36:51 -08:00
token.c bpf: convert bpf_token_create() to CLASS(fd, ...) 2024-09-12 18:58:02 -07:00
trampoline.c bpf: Prevent tailcall infinite loop caused by freplace 2024-10-16 09:21:18 -07:00
verifier.c bpf: Add support for uprobe multi session attach 2024-11-11 08:18:03 -08:00