Add kdoc comments and describe exactly what these functions are used for
in detail, pointing out importantly that the anon_vma_clone()
!dst->anon_vma && src->anon_vma dance is ONLY for fork.

Both are confusing functions that will be refactored in a subsequent
patch, but the first stage is establishing documentation and some
invariants.

Add some basic CONFIG_DEBUG_VM asserts that help document expected
state, specifically:

anon_vma_clone()
- mmap write lock held.
- We do nothing if src VMA is not faulted.
- The destination VMA has no anon_vma_chain yet.
- We are always operating on the same active VMA (i.e. vma->anon_vma).
- If not forking, must operate on the same mm_struct.

unlink_anon_vmas()
- mmap lock held (write lock except when freeing page tables).
- That unfaulted VMAs are no-ops.

We are presented with a special case when anon_vma_clone() fails to
allocate memory, where we have a VMA with partially set up anon_vma
state.

Since we hold the exclusive mmap write lock, and since we are cloning
from a source VMA which consequently can't also have its anon_vma state
modified, we know no anon_vma referenced can be empty.

This allows us to significantly simplify this case and just remove
anon_vma_chain objects associated with the VMA, so we add a specific
partial cleanup path for this scenario.

This also allows us to drop the hack of setting vma->anon_vma to NULL
before unlinking anon_vma state in this scenario.

Signed-off-by: Lorenzo Stoakes
Reviewed-by: Liam R. Howlett
---
 mm/rmap.c | 130 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 104 insertions(+), 26 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c86f1135222b..54ccf884d90a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -258,30 +258,62 @@ static inline void unlock_anon_vma_root(struct anon_vma *root)
 	up_write(&root->rwsem);
 }
 
-/*
- * Attach the anon_vmas from src to dst.
- * Returns 0 on success, -ENOMEM on failure.
- *
- * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(),
- * copy_vma() and anon_vma_fork(). The first four want an exact copy of src,
- * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma
- * to prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL
- * before call, we can identify this case by checking (!dst->anon_vma &&
- * src->anon_vma).
- *
- * If (!dst->anon_vma && src->anon_vma) is true, this function tries to find
- * and reuse existing anon_vma which has no vmas and only one child anon_vma.
- * This prevents degradation of anon_vma hierarchy to endless linear chain in
- * case of constantly forking task. On the other hand, an anon_vma with more
- * than one child isn't reused even if there was no alive vma, thus rmap
- * walker has a good chance of avoiding scanning the whole hierarchy when it
- * searches where page is mapped.
+static void check_anon_vma_clone(struct vm_area_struct *dst,
+				 struct vm_area_struct *src)
+{
+	/* The write lock must be held. */
+	mmap_assert_write_locked(src->vm_mm);
+	/* If not a fork (implied by dst->anon_vma) then must be on same mm. */
+	VM_WARN_ON_ONCE(dst->anon_vma && dst->vm_mm != src->vm_mm);
+
+	/* If we have anything to do src->anon_vma must be provided. */
+	VM_WARN_ON_ONCE(!src->anon_vma && !list_empty(&src->anon_vma_chain));
+	VM_WARN_ON_ONCE(!src->anon_vma && dst->anon_vma);
+	/* We are establishing a new anon_vma_chain. */
+	VM_WARN_ON_ONCE(!list_empty(&dst->anon_vma_chain));
+	/*
+	 * On fork, dst->anon_vma is set NULL (temporarily). Otherwise,
+	 * anon_vma must be the same across dst and src.
+	 */
+	VM_WARN_ON_ONCE(dst->anon_vma && dst->anon_vma != src->anon_vma);
+}
+
+static void cleanup_partial_anon_vmas(struct vm_area_struct *vma);
+
+/**
+ * anon_vma_clone - Establishes new anon_vma_chain objects in @dst linking to
+ * all of the anon_vma objects contained within @src anon_vma_chain's.
+ * @dst: The destination VMA with an empty anon_vma_chain.
+ * @src: The source VMA we wish to duplicate.
+ *
+ * This is the heart of the VMA side of the anon_vma implementation - we
+ * invoke this function whenever we need to set up a new VMA's anon_vma
+ * state.
+ *
+ * This is invoked for:
+ *
+ * - VMA Merge, but only when @dst is unfaulted and @src is faulted -
+ *   meaning we clone @src into @dst.
+ * - VMA split.
+ * - VMA (m)remap.
+ * - Fork of faulted VMA.
+ *
+ * In all cases other than fork this is simply a duplication. Fork
+ * additionally adds a new active anon_vma.
+ *
+ * ONLY in the case of fork do we try to 'reuse' existing anon_vma's in an
+ * anon_vma hierarchy, reusing anon_vma's which have no VMA associated with
+ * them but do have a single child. This is to avoid waste of memory when
+ * repeatedly forking.
+ *
+ * Returns: 0 on success, -ENOMEM on failure.
  */
 int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
 {
 	struct anon_vma_chain *avc, *pavc;
 	struct anon_vma *root = NULL;
 
+	check_anon_vma_clone(dst, src);
+
 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
 		struct anon_vma *anon_vma;
 
@@ -315,14 +347,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
 	return 0;
 
  enomem_failure:
-	/*
-	 * dst->anon_vma is dropped here otherwise its num_active_vmas can
-	 * be incorrectly decremented in unlink_anon_vmas().
-	 * We can safely do this because callers of anon_vma_clone() don't care
-	 * about dst->anon_vma if anon_vma_clone() failed.
-	 */
-	dst->anon_vma = NULL;
-	unlink_anon_vmas(dst);
+	cleanup_partial_anon_vmas(dst);
 	return -ENOMEM;
 }
 
@@ -393,11 +418,64 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	return -ENOMEM;
 }
 
+/*
+ * In the unfortunate case of anon_vma_clone() failing to allocate memory we
+ * have to clean things up.
+ *
+ * On clone we hold the exclusive mmap write lock, so we can't race
+ * unlink_anon_vmas().
+ * Since we're cloning, we know we can't have empty anon_vma's, since
+ * existing anon_vma's are what we're cloning from.
+ *
+ * So this function needs only traverse the anon_vma_chain and free each
+ * allocated anon_vma_chain.
+ */
+static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
+{
+	struct anon_vma_chain *avc, *next;
+	bool locked = false;
+
+	/*
+	 * We exclude everybody else from being able to modify anon_vma's
+	 * underneath us.
+	 */
+	mmap_assert_locked(vma->vm_mm);
+
+	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
+		struct anon_vma *anon_vma = avc->anon_vma;
+
+		/* All anon_vma's share the same root. */
+		if (!locked) {
+			anon_vma_lock_write(anon_vma);
+			locked = true;
+		}
+
+		anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
+		list_del(&avc->same_vma);
+		anon_vma_chain_free(avc);
+	}
+}
+
+/**
+ * unlink_anon_vmas() - remove all links between a VMA and anon_vma's,
+ * freeing anon_vma_chain objects.
+ * @vma: The VMA whose links to anon_vma objects is to be severed.
+ *
+ * As part of the process anon_vma_chain's are freed,
+ * anon_vma->num_children,num_active_vmas is updated as required and, if the
+ * relevant anon_vma references no further VMAs, its reference count is
+ * decremented.
+ */
 void unlink_anon_vmas(struct vm_area_struct *vma)
 {
 	struct anon_vma_chain *avc, *next;
 	struct anon_vma *root = NULL;
 
+	/* Always hold mmap lock, read-lock on unmap possibly. */
+	mmap_assert_locked(vma->vm_mm);
+
+	/* Unfaulted is a no-op. */
+	VM_WARN_ON_ONCE(!vma->anon_vma && !list_empty(&vma->anon_vma_chain));
+
 	/*
 	 * Unlink each anon_vma chained to the VMA. This list is ordered
 	 * from newest to oldest, ensuring the root anon_vma gets freed last.
-- 
2.52.0