Commit Graph

21 Commits

Author SHA1 Message Date
Sebastian Andrzej Siewior
73c6c02b4f futex: Correct the kernedoc return value for futex_wait_setup().
The kerneldoc for futex_wait_setup() states it can return "0" or "<1".
This isn't true because the error case is "<0" not less than 1.

Document that <0 is returned on error. Drop the possible return values
and state possible reasons.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/20250517151455.1065363-6-bigeasy@linutronix.de
2025-05-21 13:57:41 +02:00
Borislav Petkov (AMD)
bd59f61709 futex: Fix kernel-doc comments
Fix those:

  ./kernel/futex/futex.h:208: warning: Function parameter or struct member 'drop_hb_ref' not described in 'futex_q'
  ./kernel/futex/waitwake.c:343: warning: expecting prototype for futex_wait_queue(). Prototype was for futex_do_wait() instead
  ./kernel/futex/waitwake.c:594: warning: Function parameter or struct member 'task' not described in 'futex_wait_setup'

Fixes: 93f1b6d79a ("futex: Move futex_queue() into futex_wait_setup()")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250512185641.0450a99b@canb.auug.org.au # report
Link: https://lore.kernel.org/r/20250515171641.24073-1-bp@kernel.org     # submission
2025-05-16 09:33:53 +02:00
Sebastian Andrzej Siewior
3f6b233018 futex: Acquire a hash reference in futex_wait_multiple_setup()
futex_wait_multiple_setup() changes task_struct::__state to
!TASK_RUNNING and then enqueues on multiple futexes. Every
futex_q_lock() acquires a reference on the global hash which is
dropped later.

If a rehash is in progress then the loop will block on
mm_struct::futex_hash_bucket for the rehash to complete and this will
lose the previously set task_struct::__state.

Acquire a reference on the local hash to avoiding blocking on
mm_struct::futex_hash_bucket.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-9-bigeasy@linutronix.de
2025-05-03 12:02:06 +02:00
Peter Zijlstra
6c67f8d880 futex: Create futex_hash() get/put class
This gets us:

  hb = futex_hash(key) /* gets hb and inc users */
  futex_hash_get(hb)   /* inc users */
  futex_hash_put(hb)   /* dec users */

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-7-bigeasy@linutronix.de
2025-05-03 12:02:06 +02:00
Peter Zijlstra
8486d12f55 futex: Create hb scopes
Create explicit scopes for hb variables; almost pure re-indent.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-6-bigeasy@linutronix.de
2025-05-03 12:02:05 +02:00
Peter Zijlstra
2fb292096d futex: Pull futex_hash() out of futex_q_lock()
Getting the hash bucket and queuing it are two distinct actions. In
light of wanting to add a put hash bucket function later, untangle
them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-5-bigeasy@linutronix.de
2025-05-03 12:02:05 +02:00
Peter Zijlstra
93f1b6d79a futex: Move futex_queue() into futex_wait_setup()
futex_wait_setup() has a weird calling convention in order to return
hb to use as an argument to futex_queue().

Mostly such that requeue can have an extra test in between.

Reorder code a little to get rid of this and keep the hb usage inside
futex_wait_setup().

[bigeasy: fixes]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-4-bigeasy@linutronix.de
2025-05-03 12:02:05 +02:00
Linus Torvalds
fa76887bb7 Fix a dangling pointer bug in the futex code used by the
uring code, which isn't causing problems at the moment
 due to uring ABI limitations leaving it essentially
 unused in current usages, but is a good idea to fix
 nevertheless.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmenHrkRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jnLQ//T+vNYeyQ5Nc3CuqsZfv5h77ijCLzazSh
 qu5LXyGHHIlLLPEzh53wRQQbGBQ6A2HdbVVphn8k/0v4eT1Ez5yN7AiTYuPkEP73
 m6MWQAWcGQ7M7vR7cvWIsIB1wS5PD2g3UdvS8x+OECZk4lnSx4Xh/TfbRIURwhe2
 SS6jgRGhaodsp8N2o8c/BgrvvHY9aedJQhx4iAh3PiuPomygr9kfIAaQstQNKx61
 w4NQBQhK93LD9duESc+ONDlRhzSvbdJfRby1hbHzvcnCGe5S2aZzOfY31CPJbOt6
 UvbfeStEGEHkfqbZOXEtwVPZ80+U2hWvD67wSXFB0pTc68zkuGN3/Ko88GCyZx5+
 mxDRYWLoExknEUuk/Mc+hOzu1uaCjpXxA8qRr7SW3ewH1QOGr+ZISQgSffRdujbH
 2E2cBh9/HOeVZ/7nAvfkSU+yyfvBwZBP/Q0PN5ODpk3S7ZfCC7h57oClWx4WUuTX
 0H9N2IvPG0hqmqljKkt/5Xc4Qgvh6RA+pmxK0uUngViuw+v81Ea7/m+kbetQRO07
 OPOH/UT4nlmwoCwch+nKr/MRmZADpXEZyeRKS0kBJQLRMkN9VT1+e2Zf3Yir2Ji4
 hveqiJKiIgCPPxz3w+N/XcSgOTQUN1PmOLjEXB+gRNRctsvGZOtuY2HZIydQAMbT
 EjJBwkEWIQo=
 =qhN4
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Ingo Molnar:
 "Fix a dangling pointer bug in the futex code used by the uring code.

  It isn't causing problems at the moment due to uring ABI limitations
  leaving it essentially unused in current usages, but is a good idea to
  fix nevertheless"

* tag 'locking-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Pass in task to futex_queue()
2025-02-08 10:54:11 -08:00
Jens Axboe
5e0e02f0d7 futex: Pass in task to futex_queue()
futex_queue() -> __futex_queue() uses 'current' as the task to store in
the struct futex_q->task field. This is fine for synchronous usage of
the futex infrastructure, but it's not always correct when used by
io_uring where the task doing the initial futex_queue() might not be
available later on. This doesn't lead to any issues currently, as the
io_uring side doesn't support PI futexes, but it does leave a
potentially dangling pointer which is never a good idea.

Have futex_queue() take a task_struct argument, and have the regular
callers pass in 'current' for that. Meanwhile io_uring can just pass in
NULL, as the task should never be used off that path. In theory
req->tctx->task could be used here, but there's no point populating it
with a task field that will never be used anyway.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/22484a23-542c-4003-b721-400688a0d055@kernel.dk
2025-01-24 09:37:30 +01:00
Yafang Shao
b6dcdb06c0 kernel: remove get_task_comm() and print task comm directly
Patch series "Remove get_task_comm() and print task comm directly", v2.

Since task->comm is guaranteed to be NUL-terminated, we can print it
directly without the need to copy it into a separate buffer.  This
simplifies the code and avoids unnecessary operations.


This patch (of 5):

Since task->comm is guaranteed to be NUL-terminated, we can print it
directly without the need to copy it into a separate buffer.  This
simplifies the code and avoids unnecessary operations.

Link: https://lkml.kernel.org/r/20241219023452.69907-1-laoar.shao@gmail.com
Link: https://lkml.kernel.org/r/20241219023452.69907-2-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "André Almeida" <andrealmeid@igalia.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12 20:21:15 -08:00
Kent Overstreet
8b7787a543 plist: Split out plist_types.h
Trimming down sched.h dependencies: we don't want to include more than
the base types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-20 19:26:31 -05:00
Jens Axboe
e9a56c9325 futex: make the vectored futex operations available
Rename unqueue_multiple() as futex_unqueue_multiple(), and make both
that and futex_wait_multiple_setup() available for external users. This
is in preparation for wiring up vectored waits in io_uring.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-29 02:37:07 -06:00
Jens Axboe
e52c43403c futex: abstract out a __futex_wake_mark() helper
Move the unqueue and lock_ptr clear into a helper that futex_wake_mark()
calls. Add it to the public functions as well, in preparation for using
it outside the core futex code.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-29 02:36:54 -06:00
Jens Axboe
12a4be50af futex: factor out the futex wake handling
In preparation for having another waker that isn't futex_wake_mark(),
add a wake handler in futex_q. No extra data is associated with the
handler outside of struct futex_q itself. futex_wake_mark() is defined as
the standard wakeup helper, now set through futex_q_init like other
defaults.

Normal sync futex waiting relies on wake_q holding tasks that should
be woken up. This is what futex_wake_mark() does, it'll unqueue the
futex and add the associated task to the wake queue. For async usage of
futex waiting, rather than having tasks sleeping on the futex, we'll
need to deal with a futex wake differently. For the planned io_uring
case, that means posting a completion event for the task in question.
Having a definable wake handler can help support that use case.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-29 02:36:50 -06:00
peterz@infradead.org
3b63a55f49 futex: Propagate flags into get_futex_key()
Instead of only passing FLAGS_SHARED as a boolean, pass down flags as
a whole.

No functional change intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230921105248.282857501@noisy.programming.kicks-ass.net
2023-09-21 19:22:09 +02:00
peterz@infradead.org
cb8c4312af futex: Add sys_futex_wait()
To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This
syscall implements what was previously known as FUTEX_WAIT_BITSET
except it uses 'unsigned long' for the value and bitmask arguments,
takes timespec and clockid_t arguments for the absolute timeout and
uses FUTEX2 flags.

The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20230921105248.164324363@noisy.programming.kicks-ass.net
2023-09-21 19:22:08 +02:00
peterz@infradead.org
43adf84495 futex: FLAGS_STRICT
The current semantics for futex_wake() are a bit loose, specifically
asking for 0 futexes to be woken actually gets you 1.

Adding a !nr check to sys_futex_wake() makes that it would return 0
for unaligned futex words, because that check comes in the shared
futex_wake() function. Adding the !nr check there, would affect the
legacy sys_futex() semantics.

Hence frob a flag :-(

Suggested-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230921105248.048643656@noisy.programming.kicks-ass.net
2023-09-21 19:22:07 +02:00
peterz@infradead.org
5694289ce1 futex: Flag conversion
Futex has 3 sets of flags:

 - legacy futex op bits
 - futex2 flags
 - internal flags

Add a few helpers to convert from the API flags into the internal
flags.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/20230921105247.722140574@noisy.programming.kicks-ass.net
2023-09-21 19:22:05 +02:00
Peter Zijlstra
f5d39b0208 freezer,sched: Rewrite core freezer logic
Rewrite the core freezer to behave better wrt thawing and be simpler
in general.

By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
ensured frozen tasks stay frozen until thawed and don't randomly wake
up early, as is currently possible.

As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
two PF_flags (yay!).

Specifically; the current scheme works a little like:

	freezer_do_not_count();
	schedule();
	freezer_count();

And either the task is blocked, or it lands in try_to_freezer()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.

However, on thawing, once pm_freezing is cleared, freezer_count()
stops working, and any random/spurious wakeup will let a task run
before its time.

That is, thawing tries to thaw things in explicit order; kernel
threads and workqueues before doing bringing SMP back before userspace
etc.. However due to the above mentioned races it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.

This can be a fatal problem in asymmetric ISA architectures (eg ARMv9)
where the userspace task requires a special CPU to run.

As said; replace this with a special task state TASK_FROZEN and add
the following state transitions:

	TASK_FREEZABLE	-> TASK_FROZEN
	__TASK_STOPPED	-> TASK_FROZEN
	__TASK_TRACED	-> TASK_FROZEN

The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL
(IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state
is already required to deal with spurious wakeups and the freezer
causes one such when thawing the task (since the original state is
lost).

The special __TASK_{STOPPED,TRACED} states *can* be restored since
their canonical state is in ->jobctl.

With this, frozen tasks need an explicit TASK_FROZEN wakeup and are
free of undue (early / spurious) wakeups.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
2022-09-07 21:53:50 +02:00
André Almeida
bf69bad38c futex: Implement sys_futex_waitv()
Add support to wait on multiple futexes. This is the interface
implemented by this syscall:

futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes,
	    unsigned int flags, struct timespec *timeout, clockid_t clockid)

struct futex_waitv {
	__u64 val;
	__u64 uaddr;
	__u32 flags;
	__u32 __reserved;
};

Given an array of struct futex_waitv, wait on each uaddr. The thread
wakes if a futex_wake() is performed at any uaddr. The syscall returns
immediately if any waiter has *uaddr != val. *timeout is an optional
absolute timeout value for the operation. This syscall supports only
64bit sized timeout structs. The flags argument of the syscall should be
empty, but it can be used for future extensions. Flags for shared
futexes, sizes, etc. should be used on the individual flags of each
waiter.

__reserved is used for explicit padding and should be 0, but it might be
used for future extensions. If the userspace uses 32-bit pointers, it
should make sure to explicitly cast it when assigning to waitv::uaddr.

Returns the array index of one of the woken futexes. There’s no given
information of how many were woken, or any particular attribute of it
(if it’s the first woken, if it is of the smaller index...).

Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210923171111.300673-17-andrealmeid@collabora.com
2021-10-07 13:51:11 +02:00
Peter Zijlstra
a046f1a0d3 futex: Split out wait/wake
Move the wait/wake bits into their own file.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@collabora.com>
Link: https://lore.kernel.org/r/20210923171111.300673-15-andrealmeid@collabora.com
2021-10-07 13:51:11 +02:00