The sched/task.h header file currently exposes a tryget_task_struct()
function, but it is very risky to use: if the last refcount of the task
is dropped using put_task_struct_many(), then the task is freed right
away, without an RCU grace period. This means that if the kernel
contains a code path anywhere that may drop the last refcount of a
task with put_task_struct_many(), and it also contains a code path
anywhere that stashes a task pointer under RCU and uses
tryget_task_struct() on it, then if the two ever execute on the same
'struct task_struct', the result is a use-after-free.

The above applies even if the RCU user drops its own task reference
with put_task_struct(), because if that is not the last reference,
then another thread may invoke put_task_struct_many() and free the
task less than a grace period after the RCU user called
put_task_struct().

There does not appear to be an actual problem in the kernel tree right
now, because there are no in-tree users of put_task_struct_many()
where refcount_sub_and_test() might return 'true'. io_uring invokes
the function from task work while the task is still running, so it
will not decrement the refcount all the way to zero. (Note that if I'm
wrong about this, then it is probably possible to trigger a UAF by
combining this code path in io_uring with the tryget_task_struct()
call in sched_ext.)

However, the current situation is fragile and error-prone:

- If you look at put_task_struct_many() in isolation, it looks like it
  would be okay to call it in a situation where refcount_sub_and_test()
  might return 'true'.

- Similarly, if you look at tryget_task_struct(), you would assume
  that you are allowed to call it for a grace period after 'usage'
  hits zero. (If not, why does it exist?)
But if two different kernel developers anywhere in the kernel make
these conflicting assumptions at any point in the future, then the
combination of their code may lead to a use-after-free if there is any
way for them to interact via the same 'struct task_struct'.

Thus, as a defensive measure, we should either make
put_task_struct_many() use call_rcu(), or we should delete
tryget_task_struct(). This patch suggests the former because it does
not change anything for any callers that exist today. (As argued
previously, the body of the 'if' statement is dead code in the kernel
today.)

The comment in put_task_struct() is also updated so that nobody
changes its implementation to only use call_rcu() under PREEMPT_RT in
the future. The current comment suggests that would be a legal change,
but it is similarly incompatible with anyone using
tryget_task_struct().

Signed-off-by: Alice Ryhl
---
Including sched_ext and io_uring in the cc list as they are the only
users of tryget_task_struct() and put_task_struct_many(),
respectively.
---
 include/linux/sched/task.h | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 41ed884cffc9..da2fbd17b676 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -131,19 +131,25 @@ static inline void put_task_struct(struct task_struct *t)
 		return;
 
 	/*
-	 * Under PREEMPT_RT, we can't call __put_task_struct
-	 * in atomic context because it will indirectly
-	 * acquire sleeping locks. The same is true if the
-	 * current process has a mutex enqueued (blocked on
-	 * a PI chain).
+	 * Delay __put_task_struct() for one grace period so
+	 * that tryget_task_struct() may be used for one
+	 * grace period after any call to put_task_struct().
 	 *
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Though, in order to simplify the code, resort to the
-	 * deferred call too.
+	 * This also has the benefit of making it legal to
+	 * call put_task_struct() in atomic context. We
+	 * can't do that under PREEMPT_RT because it will
+	 * indirectly acquire sleeping locks. The same is
+	 * true if the current process has a mutex enqueued
+	 * (blocked on a PI chain).
 	 *
 	 * call_rcu() will schedule __put_task_struct_rcu_cb()
 	 * to be called in process context.
 	 *
+	 * In !RT, it is safe to call __put_task_struct()
+	 * from atomic context, but we still need to delay
+	 * cleanup for a grace period to accommodate
+	 * tryget_task_struct() callers.
+	 *
 	 * __put_task_struct() is called when
 	 * refcount_dec_and_test(&t->usage) succeeds.
 	 *
@@ -164,7 +170,7 @@ DEFINE_FREE(put_task, struct task_struct *, if (_T) put_task_struct(_T))
 static inline void put_task_struct_many(struct task_struct *t, int nr)
 {
 	if (refcount_sub_and_test(nr, &t->usage))
-		__put_task_struct(t);
+		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
 }
 
 void put_task_struct_rcu_user(struct task_struct *task);

---
base-commit: 7fd2df204f342fc17d1a0bfcd474b24232fb0f32
change-id: 20260508-put-task-struct-many-5b5b2f4ae174

Best regards,
-- 
Alice Ryhl