linux/kernel/rcu
Frederic Weisbecker 61399e0c54 rcu: Fix racy re-initialization of irq_work causing hangs
RCU re-initializes the deferred QS irq work every time before attempting
to queue it. However, there are situations where the irq work is queued
again while it is still pending from an earlier attempt. In that case,
re-initializing it corrupts the irq work queue that is about to be
handled.

This is more likely to happen when the architecture doesn't support
self-IPIs, so that all irq works are lazy and only run from the tick,
as in the following sequence:

1) rcu_read_unlock() is called when IRQs are disabled and there is a
   grace period involving blocked tasks on the node. The irq work
   is then initialized and queued.

2) The related tasks are unblocked and the CPU quiescent state
   is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE,
   allowing the irq work to be requeued in the future (note that the
   previously queued one hasn't fired yet).

3) A new grace period starts and the node has blocked tasks.

4) rcu_read_unlock() is called when IRQs are disabled again. The irq work
   is re-initialized even though it is still queued, which clears its
   list node, and is then requeued. As a result, it is queued after
   itself.

5) The irq work finally fires with the tick. But since it now points
   to itself, the list walk loops forever and hangs.

Fix this by initializing the irq work only once, before the CPU boots.

Fixes: b41642c877 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
2025-08-11 08:43:49 +05:30
Kconfig srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT 2025-03-28 21:19:17 -07:00
Kconfig.debug rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool 2025-02-05 07:14:40 -08:00
Makefile
rcu_segcblist.c rcu/nocb: Simplify (de-)offloading state machine 2024-09-09 00:03:55 +05:30
rcu_segcblist.h rcu: Remove unused declaration rcu_segcblist_offload() 2024-10-22 15:36:56 +02:00
rcu.h Merge branches 'rcu/misc-for-6.16', 'rcu/seq-counters-for-6.16' and 'rcu/torture-for-6.16' into rcu/for-next 2025-05-16 11:18:16 -04:00
rcuscale.c rcuscale: using kcalloc() to relpace kmalloc() 2025-05-16 09:00:54 -04:00
rcutorture.c rcutorture: Remove support for SRCU-lite 2025-07-16 09:48:44 +05:30
refscale.c Merge branches 'rcu-exp.23.07.2025', 'rcu.22.07.2025', 'torture-scripts.16.07.2025', 'srcu.19.07.2025', 'rcu.nocb.18.07.2025' and 'refscale.07.07.2025' into rcu.merge.23.07.2025 2025-07-23 21:42:20 +05:30
srcutiny.c Merge branches 'docs.2025.02.04a', 'lazypreempt.2025.03.04a', 'misc.2025.03.04a', 'srcu.2025.02.05a' and 'torture.2025.02.05a' 2025-03-04 18:47:32 -08:00
srcutree.c srcu: Expedite SRCU-fast grace periods 2025-07-16 09:50:57 +05:30
sync.c rcu: Eliminate lockless accesses to rcu_sync->gp_count 2024-07-04 13:48:57 -07:00
tasks.h treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
tiny.c RCU pull request for v6.15 2025-03-24 19:41:37 -07:00
tree_exp.h rcu/exp: Remove needless CPU up quiescent state report 2025-07-08 23:21:13 +05:30
tree_nocb.h rcu: Fix delayed execution of hurry callbacks 2025-07-18 09:25:34 +05:30
tree_plugin.h rcu: Fix racy re-initialization of irq_work causing hangs 2025-08-11 08:43:49 +05:30
tree_stall.h sched_ext: Changes for v6.17 2025-07-31 16:29:46 -07:00
tree.c rcu: Fix racy re-initialization of irq_work causing hangs 2025-08-11 08:43:49 +05:30
tree.h rcu: Fix racy re-initialization of irq_work causing hangs 2025-08-11 08:43:49 +05:30
update.c torture: Add dowarn argument to torture_sched_setaffinity() 2024-12-14 16:38:23 +01:00