This moves the condition (tid != 1 && !tmp->child_reaper) to after the
idr allocation. The check then not only ensures that the first process
in a pid namespace has pid 1 when clone3(set_tid) requests a wrong pid,
but also covers the case where the idr itself hands out a wrong pid for
some reason.

Before this patch that could happen: if the
alloc_pid()->pidfs_add_pid() code path fails while creating the first
process, idr->idr_next is no longer zero, and the next process calling
alloc_pid() gets 2 as a pid from idr_alloc_cyclic(). This effectively
leads to an init-less pid namespace, which is a bug.

Note: This is also preparation for the next patch in the series, which
introduces the ability to create init from a task other than the one
that created the pid namespace. The check is needed to make sure that
init is always created first, even in this new case.

Suggested-by: Oleg Nesterov
Signed-off-by: Pavel Tikhomirov

--
v3: Split from main commit. Merge two checks of ->child_reaper into one.
---
 kernel/pid.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 76c2744493e2..ebf013f35cb3 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -215,12 +215,6 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *arg_set_tid,
 			retval = -EINVAL;
 			if (tid < 1 || tid >= pid_max[ns->level - i])
 				goto out_abort;
-			/*
-			 * Also fail if a PID != 1 is requested and
-			 * no PID 1 exists.
-			 */
-			if (tid != 1 && !READ_ONCE(tmp->child_reaper))
-				goto out_abort;
 			retval = -EPERM;
 			if (!checkpoint_restore_ns_capable(tmp->user_ns))
 				goto out_abort;
@@ -296,9 +290,18 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *arg_set_tid,
 
 		pid->numbers[i].nr = nr;
 		pid->numbers[i].ns = tmp;
-		tmp = tmp->parent;
 		i--;
 		retried_preload = false;
+
+		/*
+		 * PID 1 (init) must be created first.
+		 */
+		if (!READ_ONCE(tmp->child_reaper) && nr != 1) {
+			retval = -EINVAL;
+			goto out_free;
+		}
+
+		tmp = tmp->parent;
 	}
 
 	/*
-- 
2.53.0