From: "Franklin \"Snaipe\" Mathieu"

The main motivation for this change is to be able to bind-mount memfd
file descriptors. Prior to this change, it was not easy for a process to
create a private in-memory handle that could then be bind-mounted. A
process had to have access to a tmpfs, create a file in it, call
open_tree on the resulting file descriptor, close the original file
descriptor, unlink the file, and then check that no other process raced
the process to open the new file. Doable, but not great for mounting
sensitive content like secrets.

With this change, it is now possible for a process to prepare a memfd,
and call open_tree on it:

    int tmpfd = memfd_create("secret", 0);
    fchmod(tmpfd, 0600);
    write(tmpfd, "SecretKey", 9);

    int treefd = open_tree(tmpfd, "", OPEN_TREE_CLONE|AT_EMPTY_PATH|AT_RECURSIVE);
    move_mount(treefd, "", -1, "/secret.txt", MOVE_MOUNT_F_EMPTY_PATH);

Signed-off-by: Franklin "Snaipe" Mathieu
---
 fs/namespace.c | 8 ++++++++
 mm/internal.h  | 2 ++
 mm/shmem.c     | 2 +-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d82910f33dc4..f51ad2013662 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -38,6 +38,9 @@
 #include "pnode.h"
 #include "internal.h"
 
+/* For checking memfd bind-mounts via shm_mnt */
+#include "../mm/internal.h"
+
 /* Maximum number of mounts in a mount namespace */
 static unsigned int sysctl_mount_max __read_mostly = 100000;
 
@@ -2901,6 +2904,8 @@ static int do_change_type(const struct path *path, int ms_flags)
  * (3) The caller tries to copy a pidfs mount referring to a pidfd.
  * (4) The caller is trying to copy a mount tree that belongs to an
  *     anonymous mount namespace.
+ * (5) The caller is trying to copy a mount tree belonging to shm_mnt
+ *     (e.g. bind-mounting a file descriptor obtained from memfd_create)
  *
  * For that to be safe, this helper enforces that the origin mount
  * namespace the anonymous mount namespace was created from is the
@@ -2943,6 +2948,9 @@ static inline bool may_copy_tree(const struct path *path)
 	if (d_op == &pidfs_dentry_operations)
 		return true;
 
+	if (path->mnt == shm_mnt)
+		return true;
+
 	if (!is_mounted(path->mnt))
 		return false;
 
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..aa45c5576b16 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -24,6 +24,8 @@
 
 struct folio_batch;
 
+extern struct vfsmount *shm_mnt __ro_after_init;
+
 /*
  * Maintains state across a page table move. The operation assumes both source
  * and destination VMAs already exist and are specified by the user.
diff --git a/mm/shmem.c b/mm/shmem.c
index b9081b817d28..449d6bc813ae 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -43,7 +43,7 @@
 #include
 #include "swap.h"
 
-static struct vfsmount *shm_mnt __ro_after_init;
+struct vfsmount *shm_mnt __ro_after_init;
 
 #ifdef CONFIG_SHMEM
 /*
-- 
2.52.0