From: Dave Hansen

== Background ==

There are basically two parallel ways to look up a VMA: the
traditional way, which is protected by mmap_lock, and the per-VMA
lock way, which is based on RCU and refcounts.

== Problem ==

The mmap_lock one is more straightforward to use, but it has a big
disadvantage: it can not be mixed with page faults. The fault handler
can take mmap_lock for read itself, and that nested read acquisition
deadlocks if a writer has queued up in between. For example:

	mmap_read_lock(mm);
	// Another thread does mmap_write_lock().
	// New mmap_lock readers are blocked.
	vma = vma_lookup(mm, address);
	// This deadlocks on mmap_read_lock() if it faults:
	copy_from_user(address);
	mmap_read_unlock(mm);

The RCU one can be mixed with faults, but it is not available in all
configs, so all RCU users need to be able to fall back to the
traditional way.

== Solution ==

Add a variant of the RCU-based lookup that waits for writers. This is
basically the same as the existing RCU-based lookup, but if the fast
path fails, it takes mmap_lock for read, which waits for writers to
finish, before returning the VMA.

This has some advantages:

 1. Callers do not need to have a fallback path for when they
    collide with writers.
 2. It can be used in contexts where page faults can happen because
    it may take mmap_lock for read internally but never *holds* it
    on return.
 3. Its fast path does not require taking mmap_lock for read.

Basically, when applied correctly, this approach results in faster
*and* simpler code.

Signed-off-by: Dave Hansen
Cc: Suren Baghdasaryan
Cc: Andrew Morton
Cc: "Liam R. Howlett"
Cc: Lorenzo Stoakes
Cc: Vlastimil Babka
Cc: Shakeel Butt
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman
Cc: Arve Hjønnevåg
Cc: Todd Kjos
Cc: Christian Brauner
Cc: Carlos Llamas
Cc: Alice Ryhl
Cc: "David S. Miller"
Cc: David Ahern
Cc: netdev@vger.kernel.org

--

Changes from v1:
 * Add a comment explaining that this can not be mixed with other
   per-VMA lock or mmap_lock users. It is prone to deadlocks if so.
 * Add a FIXME about making the mmap_read_lock() killable
 * Add more changelog bits about the possibility for an infinite goto
   loop.
 * Adopt vma_start_read_unlocked() implementation from Lorenzo

---

 b/include/linux/mmap_lock.h |    3 +++
 b/mm/mmap_lock.c            |   30 ++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
--- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait	2026-06-10 15:57:55.828431712 -0700
+++ b/include/linux/mmap_lock.h	2026-06-10 15:57:55.834431925 -0700
@@ -257,6 +257,9 @@ static inline bool vma_start_read_locked
 	return vma_start_read_locked_nested(vma, 0);
 }
 
+struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
+					       unsigned long address);
+
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	vma_refcount_put(vma);
diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
--- a/mm/mmap_lock.c~lock-vma-under-rcu-wait	2026-06-10 15:57:55.831431819 -0700
+++ b/mm/mmap_lock.c	2026-06-10 16:02:50.723860779 -0700
@@ -338,6 +338,36 @@ inval:
 	return NULL;
 }
 
+/*
+ * Find the VMA covering 'address' and lock it for reading. Waits for writers to
+ * finish if the VMA is being modified. Returns NULL if there is no VMA covering
+ * 'address'.
+ *
+ * Use only in code paths where no mmap_lock and no VMA lock is held.
+ *
+ * The fast path does not take mmap_lock.
+ */
+struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
+					       unsigned long address)
+{
+	struct vm_area_struct *vma;
+
+	/* Fast path: return stable VMA covering 'address': */
+	vma = lock_vma_under_rcu(mm, address);
+	if (vma)
+		return vma;
+
+	/* Slow path: preclude VMA writers by getting mmap read lock. */
+	guard(rwsem_read)(&mm->mmap_lock);
+	vma = vma_lookup(mm, address);
+	if (!vma)
+		return NULL;
+	if (!vma_start_read_locked(vma))
+		return NULL;
+
+	return vma;
+}
+
 static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
 							    struct vma_iterator *vmi,
 							    unsigned long from_addr)
_