The sched/task.h header file currently exposes a tryget_task_struct()
function, but it is very risky to use: if the last refcount of the task
is dropped using put_task_struct_many(), then the task is freed right
away, without an RCU grace period. This means that if the kernel
contains a code path anywhere that may drop the last refcount of a
task with put_task_struct_many(), and it also contains a code path
anywhere that stashes a task pointer under RCU and uses
tryget_task_struct() on it, then if the two ever execute on the same
'struct task_struct', the result is a use-after-free.

The above applies even if the RCU user drops its own task reference
with put_task_struct(), because if that is not the last reference,
then another thread may invoke put_task_struct_many() and free the
task less than a grace period after the RCU user called
put_task_struct().

There does not appear to be an actual problem in the kernel tree right
now, because there are no in-tree users of put_task_struct_many()
where refcount_sub_and_test() might return 'true'. io_uring invokes
the function from task work while the task is still running, so it
will not decrement the refcount all the way to zero. (Note that if I'm
wrong about this, then it is probably possible to trigger a UAF by
combining this code path in io_uring with the tryget_task_struct()
call in sched_ext.)

However, the current situation is fragile and error-prone:

- If you look at put_task_struct_many() in isolation, it looks like it
  would be okay to call it in a situation where refcount_sub_and_test()
  might return 'true'.

- Similarly, if you look at tryget_task_struct(), you would assume
  that you are allowed to call it for a grace period after 'usage'
  hits zero. (If not, why does it exist?)
But if two different kernel developers anywhere in the kernel make
these conflicting assumptions at any point in the future, then the
combination of their code may lead to a use-after-free if there is any
way for them to interact via the same 'struct task_struct'.

Thus, as a defensive measure, we should either make
put_task_struct_many() use call_rcu(), or we should delete
tryget_task_struct(). This patch suggests the former because it does
not change anything for any callers that exist today. (As argued
previously, the body of the 'if' statement is dead code in the kernel
today.)

The comment in put_task_struct() is also updated so that nobody
changes its implementation to only use call_rcu() under PREEMPT_RT in
the future. The current comment suggests that would be a legal change,
but it is similarly incompatible with anyone using
tryget_task_struct().

Signed-off-by: Alice Ryhl
---
Including sched_ext and io_uring in the cc list as they are the only
users of tryget_task_struct() and put_task_struct_many(),
respectively.
---
 include/linux/sched/task.h | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 41ed884cffc9..da2fbd17b676 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -131,19 +131,25 @@ static inline void put_task_struct(struct task_struct *t)
 		return;
 
 	/*
-	 * Under PREEMPT_RT, we can't call __put_task_struct
-	 * in atomic context because it will indirectly
-	 * acquire sleeping locks. The same is true if the
-	 * current process has a mutex enqueued (blocked on
-	 * a PI chain).
+	 * Delay __put_task_struct() for one grace period so
+	 * that tryget_task_struct() may be used for one
+	 * grace period after any call to put_task_struct().
 	 *
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Though, in order to simplify the code, resort to the
-	 * deferred call too.
+	 * This also has the benefit of making it legal to
+	 * call put_task_struct() in atomic context. We
+	 * can't do that under PREEMPT_RT because it will
+	 * indirectly acquire sleeping locks. The same is
+	 * true if the current process has a mutex enqueued
+	 * (blocked on a PI chain).
 	 *
 	 * call_rcu() will schedule __put_task_struct_rcu_cb()
 	 * to be called in process context.
 	 *
+	 * In !RT, it is safe to call __put_task_struct()
+	 * from atomic context, but we still need to delay
+	 * cleanup for a grace period to accommodate
+	 * tryget_task_struct() callers.
+	 *
 	 * __put_task_struct() is called when
 	 * refcount_dec_and_test(&t->usage) succeeds.
 	 *
@@ -164,7 +170,7 @@ DEFINE_FREE(put_task, struct task_struct *, if (_T) put_task_struct(_T))
 static inline void put_task_struct_many(struct task_struct *t, int nr)
 {
 	if (refcount_sub_and_test(nr, &t->usage))
-		__put_task_struct(t);
+		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
 }
 
 void put_task_struct_rcu_user(struct task_struct *task);

---
base-commit: 7fd2df204f342fc17d1a0bfcd474b24232fb0f32
change-id: 20260508-put-task-struct-many-5b5b2f4ae174

Best regards,
-- 
Alice Ryhl