Add hmm_range_fault_unlockable(), a new HMM entry point that allows the
mmap read lock to be dropped during page faults. This follows the
int *locked pattern from get_user_pages_remote() in mm/gup.c: callers
pass an int *locked variable indicating they can handle the lock being
dropped.

When locked is non-NULL, hmm_vma_fault() adds FAULT_FLAG_ALLOW_RETRY
and FAULT_FLAG_KILLABLE to the fault flags passed to handle_mm_fault().
If the fault handler drops the mmap lock (returning VM_FAULT_RETRY or
VM_FAULT_COMPLETED), the function sets *locked = 0 and returns 0,
signalling the caller to restart its walk with a fresh notifier
sequence. If a fatal signal is pending at that point, -EINTR is
returned instead, matching GUP behavior. The lock is not re-taken
internally: the caller is responsible for re-acquiring it and
restarting from the beginning, since previously collected PFNs may be
stale after the lock was dropped.
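
As a rough userspace illustration of this contract (not kernel code),
the sketch below models the mmap lock with a plain flag and uses a
hypothetical stand-in, fake_fault_unlockable(), that drops the lock on
a configurable number of calls. Every name in it is invented for the
example; only the shape of the caller's retry loop follows the
description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; these are not kernel structures. */
struct fake_mm {
	bool read_locked;	/* models the mmap read lock */
	int drops_left;		/* how many calls will still drop the lock */
	int seq;		/* models the notifier sequence */
};

static void fake_read_lock(struct fake_mm *mm)   { mm->read_locked = true; }
static void fake_read_unlock(struct fake_mm *mm) { mm->read_locked = false; }

/*
 * Stand-in for hmm_range_fault_unlockable(): returns 0 both when the
 * range is done and when the lock was dropped; *locked tells them apart.
 */
static int fake_fault_unlockable(struct fake_mm *mm, int *locked)
{
	assert(mm->read_locked);	/* models mmap_assert_locked() */

	if (locked && mm->drops_left > 0) {
		mm->drops_left--;
		mm->read_locked = false;	/* fault handler dropped it */
		*locked = 0;
	}
	return 0;
}

/* Caller loop: restart with a fresh sequence whenever the lock was dropped. */
static int fake_populate(struct fake_mm *mm, int *attempts)
{
	int locked, ret;

	do {
		mm->seq++;		/* models mmu_interval_read_begin() */
		(*attempts)++;
		locked = 1;
		fake_read_lock(mm);
		ret = fake_fault_unlockable(mm, &locked);
		if (locked)
			fake_read_unlock(mm);
	} while (!ret && !locked);

	return ret;
}
```

The caller only unlocks when locked is still 1, and it never reuses
results from an attempt in which the lock was dropped.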

The existing hmm_range_fault() is refactored into a thin wrapper that
calls hmm_range_fault_unlockable(range, NULL). Passing NULL means
FAULT_FLAG_ALLOW_RETRY is never set, preserving existing behavior for
all current callers with no functional change.
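
The wrapper relationship can be pictured with the small userspace
model below. The flag values and the model_* names are assumptions
made up for the illustration and are not the kernel's definitions; the
point is only that a NULL locked pointer yields no retry-related
flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for the model; not the kernel's definitions. */
#define MODEL_FLAG_ALLOW_RETRY	(1u << 0)
#define MODEL_FLAG_KILLABLE	(1u << 1)

/* Extra fault flags are added only when the caller can handle a lock drop. */
static unsigned int model_fault_flags(const int *locked)
{
	unsigned int flags = 0;

	if (locked)
		flags |= MODEL_FLAG_ALLOW_RETRY | MODEL_FLAG_KILLABLE;
	return flags;
}

/* Models the new entry point; returns the flags it would fault with. */
static unsigned int model_range_fault_unlockable(int *locked)
{
	return model_fault_flags(locked);
}

/* Thin wrapper: passing NULL keeps the existing never-drop behavior. */
static unsigned int model_range_fault(void)
{
	return model_range_fault_unlockable(NULL);
}
```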

Faulting hugetlb pages is not supported on the unlockable path: if a
hugetlb page requires faulting, -EFAULT is returned. This is because
walk_hugetlb_range() holds hugetlb_vma_lock_read across the callback
and unconditionally unlocks on return; if the mmap lock is dropped
inside the callback the VMA may be freed, making the walk framework's
unlock a use-after-free. Hugetlb pages already present in page tables
are handled normally.

Documentation/mm/hmm.rst is updated with a new section describing the
unlockable API, its usage pattern, and the hugetlb limitation.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 Documentation/mm/hmm.rst |   89 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/hmm.h      |    1 +
 mm/hmm.c                 |   91 +++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 172 insertions(+), 9 deletions(-)

diff --git a/Documentation/mm/hmm.rst b/Documentation/mm/hmm.rst
index 7d61b7a8b65b7..13874b4dfd5f4 100644
--- a/Documentation/mm/hmm.rst
+++ b/Documentation/mm/hmm.rst
@@ -208,6 +208,95 @@ invalidate() callback. That lock must be held before calling
 mmu_interval_read_retry() to avoid any race with a concurrent CPU page table
 update.
 
+Lock-drop support (hmm_range_fault_unlockable)
+==============================================
+
+Some page fault handlers (e.g., userfaultfd) require the mmap lock to be
+dropped during fault resolution. Drivers that need to support such mappings
+can use::
+
+  int hmm_range_fault_unlockable(struct hmm_range *range, int *locked);
+
+This follows the same ``int *locked`` pattern used by ``get_user_pages_remote()``
+in ``mm/gup.c``. The caller sets ``*locked = 1`` and holds the mmap read lock
+before calling. If the lock is dropped during the fault (VM_FAULT_RETRY or
+VM_FAULT_COMPLETED), the function sets ``*locked = 0`` and returns 0, or
+``-EINTR`` if a fatal signal is pending. On a 0 return the caller must
+re-acquire the lock and restart from the beginning with a fresh notifier
+sequence, since previously collected PFNs may be stale.
+
+The usage pattern is::
+
+ int driver_populate_range_unlockable(...)
+ {
+      struct hmm_range range;
+      int locked;
+      ...
+
+      range.notifier = &interval_sub;
+      range.start = ...;
+      range.end = ...;
+      range.hmm_pfns = ...;
+
+      if (!mmget_not_zero(interval_sub.mm))
+          return -EFAULT;
+
+ again:
+      range.notifier_seq = mmu_interval_read_begin(&interval_sub);
+      locked = 1;
+      mmap_read_lock(mm);
+      ret = hmm_range_fault_unlockable(&range, &locked);
+      if (locked)
+          mmap_read_unlock(mm);
+      if (ret) {
+          if (ret == -EBUSY)
+                 goto again;
+          return ret;
+      }
+      if (!locked)
+          goto again;
+
+      take_lock(driver->update);
+      if (mmu_interval_read_retry(&interval_sub, range.notifier_seq)) {
+          release_lock(driver->update);
+          goto again;
+      }
+
+      /* Use pfns array content to update device page table,
+       * under the update lock */
+
+      release_lock(driver->update);
+      return 0;
+ }
+
+Passing ``locked = NULL`` to ``hmm_range_fault_unlockable()`` is equivalent to
+calling ``hmm_range_fault()``: the mmap lock is never dropped.
+
+Note: hugetlb pages are not supported with the unlockable path. If a hugetlb
+page requires faulting during an ``hmm_range_fault_unlockable()`` call,
+``-EFAULT`` is returned. Hugetlb pages that are already present in page tables
+are handled normally.
+
+This limitation exists because ``walk_hugetlb_range()`` in the page walk
+framework holds ``hugetlb_vma_lock_read`` across the callback and unconditionally
+unlocks on return. If the mmap lock is dropped inside the callback (via
+VM_FAULT_RETRY), the VMA may be freed before the walk framework's unlock,
+resulting in a use-after-free. Possible approaches to lift this limitation in
+the future:
+
+1. Extend the walk framework to allow callbacks to signal that the hugetlb vma
+   lock was dropped (e.g., a flag in ``struct mm_walk`` that tells
+   ``walk_hugetlb_range()`` to skip the unlock).
+
+2. Bypass ``walk_page_range()`` for hugetlb pages in the unlockable path and
+   walk hugetlb page tables directly with custom lock management (similar to
+   how GUP handles hugetlb without the walk framework).
+
+3. Re-acquire the mmap lock before returning from the hugetlb callback (like
+   ``fixup_user_fault()``), ensuring the VMA remains valid for the walk
+   framework's unlock. This changes the "never re-take" contract and would
+   require callers to handle hugetlb differently.
+
 Leverage default_flags and pfn_flags_mask
 =========================================
 
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index db75ffc949a7a..46e581865c48a 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -123,6 +123,7 @@ struct hmm_range {
  * Please see Documentation/mm/hmm.rst for how to use the range API.
  */
 int hmm_range_fault(struct hmm_range *range);
+int hmm_range_fault_unlockable(struct hmm_range *range, int *locked);
 
 /*
  * HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a range
diff --git a/mm/hmm.c b/mm/hmm.c
index 5955f2f0c83db..9bf2fa37f2efd 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -33,6 +33,7 @@
 struct hmm_vma_walk {
 	struct hmm_range	*range;
 	unsigned long		last;
+	int			*locked;
 };
 
 enum {
@@ -86,10 +87,28 @@ static int hmm_vma_fault(unsigned long addr, unsigned long end,
 		fault_flags |= FAULT_FLAG_WRITE;
 	}
 
-	for (; addr < end; addr += PAGE_SIZE)
-		if (handle_mm_fault(vma, addr, fault_flags, NULL) &
-		    VM_FAULT_ERROR)
+	if (hmm_vma_walk->locked)
+		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+
+	for (; addr < end; addr += PAGE_SIZE) {
+		vm_fault_t ret;
+
+		ret = handle_mm_fault(vma, addr, fault_flags, NULL);
+
+		if (ret & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)) {
+			/*
+			 * The mmap lock has been dropped by the fault handler.
+			 * Record the failing address and signal lock-drop to
+			 * the caller.
+			 */
+			*hmm_vma_walk->locked = 0;
+			hmm_vma_walk->last = addr;
+			return -EAGAIN;
+		}
+
+		if (ret & VM_FAULT_ERROR)
 			return -EFAULT;
+	}
 	return -EBUSY;
 }
 
@@ -566,6 +585,17 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	if (required_fault) {
 		int ret;
 
+		/*
+		 * Faulting hugetlb pages is not supported on the unlockable
+		 * path: the walk framework holds hugetlb_vma_lock_read and
+		 * unlocks it on return, so if the fault drops the mmap lock
+		 * and the vma is freed, that unlock is a use-after-free.
+		 */
+		if (hmm_vma_walk->locked) {
+			spin_unlock(ptl);
+			return -EFAULT;
+		}
+
 		spin_unlock(ptl);
 		hugetlb_vma_unlock_read(vma);
 		/*
@@ -655,14 +685,49 @@ static const struct mm_walk_ops hmm_walk_ops = {
  *
  * This is similar to get_user_pages(), except that it can read the page tables
  * without mutating them (ie causing faults).
+ *
+ * The mmap lock must be held by the caller and will remain held on return.
+ * For a variant that allows the mmap lock to be dropped during faults (e.g.,
+ * for userfaultfd support), see hmm_range_fault_unlockable().
  */
 int hmm_range_fault(struct hmm_range *range)
 {
+	return hmm_range_fault_unlockable(range, NULL);
+}
+EXPORT_SYMBOL(hmm_range_fault);
+
+/**
+ * hmm_range_fault_unlockable - fault a range with mmap lock-drop support
+ * @range:	argument structure
+ * @locked:	pointer to lock state variable (input: 1; output: 0 if lock
+ *		was dropped)
+ *
+ * Similar to hmm_range_fault() but allows the mmap lock to be dropped during
+ * page faults. This enables support for userfaultfd-backed mappings and other
+ * cases where handle_mm_fault() may need to release the mmap lock.
+ *
+ * The caller must hold the mmap read lock and set *locked = 1 before calling.
+ * On return:
+ *   - *locked == 1: mmap lock is still held, return value has normal semantics
+ *   - *locked == 0: mmap lock was dropped. The caller must re-acquire the lock
+ *     and restart the operation. Return value is 0, or -EINTR if a fatal
+ *     signal is pending.
+ *
+ * When the lock is dropped, this function does not re-acquire it or retry the
+ * fault internally. Entries already stored in range->hmm_pfns may be stale,
+ * so the caller must restart the walk with a fresh notifier sequence.
+ *
+ * Returns 0 on success or a negative error code. See hmm_range_fault() for
+ * the full list of possible errors.
+ */
+int hmm_range_fault_unlockable(struct hmm_range *range, int *locked)
+{
+	struct mm_struct *mm = range->notifier->mm;
 	struct hmm_vma_walk hmm_vma_walk = {
 		.range = range,
 		.last = range->start,
+		.locked = locked,
 	};
-	struct mm_struct *mm = range->notifier->mm;
 	int ret;
 
 	mmap_assert_locked(mm);
@@ -674,16 +739,24 @@ int hmm_range_fault(struct hmm_range *range)
 			return -EBUSY;
 		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
 				      &hmm_walk_ops, &hmm_vma_walk);
+		if (ret == -EAGAIN) {
+			/*
+			 * The mmap lock was dropped during the fault
+			 * (e.g. userfaultfd). Signal the caller to restart
+			 * by returning with *locked = 0.
+			 */
+			if (fatal_signal_pending(current))
+				return -EINTR;
+			return 0;
+		}
 		/*
-		 * When -EBUSY is returned the loop restarts with
-		 * hmm_vma_walk.last set to an address that has not been stored
-		 * in pfns. All entries < last in the pfn array are set to their
-		 * output, and all >= are still at their input values.
+		 * -EBUSY: page table changed during the walk.
+		 * Restart from hmm_vma_walk.last.
 		 */
 	} while (ret == -EBUSY);
 	return ret;
 }
-EXPORT_SYMBOL(hmm_range_fault);
+EXPORT_SYMBOL(hmm_range_fault_unlockable);
 
 /**
  * hmm_dma_map_alloc - Allocate HMM map structure


Convert the mshv driver's HMM fault path to use
hmm_range_fault_unlockable() instead of hmm_range_fault(). This enables
userfaultfd-backed guest memory regions by allowing the mmap lock to be
dropped during page fault handling.

Extract the per-VMA walk into a dedicated mshv_region_hmm_fault_walk()
helper. The outer mshv_region_hmm_fault_and_lock() handles the do/while
restart loop: if the lock is dropped during a fault (userfaultfd resolution
or similar) or an invalidation occurs (-EBUSY), the function restarts the
entire walk from the beginning with a fresh notifier_seq, since the VMA
layout may have changed.
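
The split between the per-segment walk and the outer restart loop can
be sketched in userspace as follows. This is a model under stated
assumptions, not the driver code: the model_* names and the scripted
outcomes are hypothetical, and the sketch additionally assumes that an
error returned together with a dropped lock (for example -EINTR) ends
the loop instead of restarting it.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_NSEG 3

/* Hypothetical model: scripted outcomes for the segment faults. */
struct model_state {
	int drop_at;	/* segment index that drops the lock once, or -1 */
	int eintr_at;	/* segment index that fails with -EINTR, or -1 */
	int calls;	/* total segment faults issued */
};

/* Stand-in for hmm_range_fault_unlockable() on one VMA segment. */
static int model_fault_segment(struct model_state *st, int seg, int *locked)
{
	st->calls++;
	if (seg == st->eintr_at) {
		*locked = 0;		/* lock dropped, fatal signal pending */
		return -EINTR;
	}
	if (seg == st->drop_at) {
		st->drop_at = -1;	/* drop only once */
		*locked = 0;
	}
	return 0;
}

/* Per-segment walk: stop on error or as soon as the lock is gone. */
static int model_walk(struct model_state *st, int *locked)
{
	for (int seg = 0; seg < MODEL_NSEG; seg++) {
		int ret = model_fault_segment(st, seg, locked);

		if (ret || !*locked)
			return ret;
	}
	return 0;
}

/* Outer loop: restart the whole walk after a lock drop or -EBUSY. */
static int model_fault_and_lock(struct model_state *st)
{
	int locked, ret;

	do {
		locked = 1;		/* models mmap_read_lock() */
		ret = model_walk(st, &locked);
		/* models: if (locked) mmap_read_unlock() */
	} while ((!locked && !ret) || ret == -EBUSY);

	return ret;
}
```

A lock drop in the middle segment causes all segments to be walked
again from the first one, which is why earlier results are never
carried across a restart.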

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c |  127 +++++++++++++++++++++++++++++++--------------
 1 file changed, 87 insertions(+), 40 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index d09940e88298e..05665446ca6d9 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -565,6 +565,75 @@ int mshv_region_get(struct mshv_region *region)
 	return kref_get_unless_zero(&region->mreg_refcount);
 }
 
+/**
+ * mshv_region_hmm_fault_walk - Walk VMAs and fault in pages for a range
+ * @region : Pointer to the memory region structure
+ * @range  : HMM range structure (caller sets notifier and notifier_seq)
+ * @start  : Starting virtual address of the range to fault (inclusive)
+ * @end    : Ending virtual address of the range to fault (exclusive)
+ * @pfns   : Output array for page frame numbers with HMM flags
+ * @locked : Pointer to lock state; set to 0 if mmap lock was dropped
+ * @do_fault: If true, fault in missing pages; if false, snapshot only
+ *
+ * Iterates through VMAs covering [start, end), collecting page frame
+ * numbers via hmm_range_fault_unlockable() for each VMA segment.
+ * When @do_fault is true, missing pages are faulted in and write faults
+ * are requested only when both the VMA and the hypervisor mapping permit
+ * writes, to avoid breaking copy-on-write semantics on read-only mappings.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+static int mshv_region_hmm_fault_walk(struct mshv_region *region,
+				      struct hmm_range *range,
+				      unsigned long start,
+				      unsigned long end,
+				      unsigned long *pfns,
+				      int *locked,
+				      bool do_fault)
+{
+	unsigned long cur_start = start;
+	unsigned long *cur_pfns = pfns;
+
+	while (cur_start < end) {
+		struct vm_area_struct *vma;
+
+		vma = vma_lookup(range->notifier->mm, cur_start);
+		if (!vma)
+			return -EFAULT;
+
+		range->hmm_pfns = cur_pfns;
+		range->start = cur_start;
+		range->end = min(vma->vm_end, end);
+		range->default_flags = 0;
+		if (do_fault) {
+			range->default_flags = HMM_PFN_REQ_FAULT;
+			/*
+			 * Only request writable pages from HMM when
+			 * both the VMA and the hypervisor mapping allow
+			 * writes. Without this, hmm_range_fault() would
+			 * trigger COW on read-only mappings (e.g. shared
+			 * zero pages, file-backed pages), breaking
+			 * copy-on-write semantics and potentially
+			 * granting the guest write access to shared host
+			 * pages.
+			 */
+			if ((vma->vm_flags & VM_WRITE) &&
+			    (region->hv_map_flags & HV_MAP_GPA_WRITABLE))
+				range->default_flags |= HMM_PFN_REQ_WRITE;
+		}
+
+		int ret = hmm_range_fault_unlockable(range, locked);
+
+		if (ret || !*locked)
+			return ret;
+
+		cur_start = range->end;
+		cur_pfns += (range->end - range->start) >> PAGE_SHIFT;
+	}
+
+	return 0;
+}
+
 /**
  * mshv_region_hmm_fault_and_lock - Fault in pages across VMAs and lock
  *                                  the memory region
@@ -575,11 +644,9 @@ int mshv_region_get(struct mshv_region *region)
  * @do_fault: If true, fault in missing pages; if false, snapshot only
  *            pages already present in page tables
  *
- * Iterates through VMAs covering [start, end), collecting page frame
- * numbers via hmm_range_fault() for each VMA segment.  When @do_fault
- * is true, missing pages are faulted in and write faults are requested
- * only when both the VMA and the hypervisor mapping permit writes, to
- * avoid breaking copy-on-write semantics on read-only mappings.
+ * Faults in pages covering [start, end) and acquires region->mreg_mutex.
+ * If the mmap lock is dropped during the fault (e.g. by userfaultfd) or
+ * the mmu notifier sequence is invalidated, the entire walk is restarted.
  *
  * On success, returns with region->mreg_mutex held; the caller is
  * responsible for releasing it.  Returns -EBUSY if the mmu notifier
@@ -597,47 +664,27 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_region *region,
 		.notifier = &region->mreg_mni,
 	};
 	struct mm_struct *mm = region->mreg_mni.mm;
+	int locked;
 	int ret;
 
-	range.notifier_seq = mmu_interval_read_begin(range.notifier);
-	mmap_read_lock(mm);
-	while (start < end) {
-		struct vm_area_struct *vma;
+	do {
+		range.notifier_seq = mmu_interval_read_begin(range.notifier);
+		locked = 1;
+		mmap_read_lock(mm);
 
-		vma = vma_lookup(mm, start);
-		if (!vma) {
-			ret = -EFAULT;
-			break;
-		}
+		ret = mshv_region_hmm_fault_walk(region, &range, start, end,
+						 pfns, &locked, do_fault);
 
-		range.hmm_pfns = pfns;
-		range.start = start;
-		range.end = min(vma->vm_end, end);
-		range.default_flags = 0;
-		if (do_fault) {
-			range.default_flags = HMM_PFN_REQ_FAULT;
-			/*
-			 * Only request writable pages from HMM when both
-			 * the VMA and the hypervisor mapping allow writes.
-			 * Without this, hmm_range_fault() would trigger
-			 * COW on read-only mappings (e.g. shared zero
-			 * pages, file-backed pages), breaking
-			 * copy-on-write semantics and potentially granting
-			 * the guest write access to shared host pages.
-			 */
-			if ((vma->vm_flags & VM_WRITE) &&
-			    (region->hv_map_flags & HV_MAP_GPA_WRITABLE))
-				range.default_flags |= HMM_PFN_REQ_WRITE;
-		}
+		if (locked)
+			mmap_read_unlock(mm);
 
-		ret = hmm_range_fault(&range);
-		if (ret)
-			break;
+		/*
+		 * If the lock was dropped without error (e.g. by userfaultfd),
+		 * restart the entire walk with a fresh notifier_seq since the
+		 * VMA layout may have changed. Also restart on -EBUSY.
+		 */
+	} while ((!locked && !ret) || ret == -EBUSY);
 
-		start = range.end;
-		pfns += (range.end - range.start) >> PAGE_SHIFT;
-	}
-	mmap_read_unlock(mm);
 	if (ret)
 		return ret;
 

Add a selftest that exercises hmm_range_fault_unlockable() with a
userfaultfd-backed mapping. The test:

1. Creates an anonymous mmap region
2. Registers it with userfaultfd (UFFDIO_REGISTER_MODE_MISSING)
3. Spawns a handler thread that responds to page faults by filling
   pages with a known pattern (0xAB) via UFFDIO_COPY
4. Issues HMM_DMIRROR_READ_UNLOCKABLE to the test_hmm driver, which
   calls hmm_range_fault_unlockable() internally
5. Verifies the device read back the data provided by the userfaultfd
   handler

This requires changes to the test_hmm kernel module:
- New dmirror_range_fault_unlockable() that uses the new HMM API
- New dmirror_fault_unlockable() and dmirror_read_unlockable() wrappers
- New HMM_DMIRROR_READ_UNLOCKABLE ioctl (0x09)

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 lib/test_hmm.c                         |  122 +++++++++++++++++++++++++++++
 lib/test_hmm_uapi.h                    |    1 
 tools/testing/selftests/mm/hmm-tests.c |  133 ++++++++++++++++++++++++++++++++
 3 files changed, 256 insertions(+)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 0964d53365e61..20b14e279a8bd 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -327,6 +327,84 @@ static int dmirror_range_fault(struct dmirror *dmirror,
 	return ret;
 }
 
+static int dmirror_range_fault_unlockable(struct dmirror *dmirror,
+					  struct hmm_range *range)
+{
+	struct mm_struct *mm = dmirror->notifier.mm;
+	unsigned long timeout =
+		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+	int locked;
+	int ret;
+
+	while (true) {
+		if (time_after(jiffies, timeout)) {
+			ret = -EBUSY;
+			goto out;
+		}
+
+		range->notifier_seq = mmu_interval_read_begin(range->notifier);
+		locked = 1;
+		mmap_read_lock(mm);
+		ret = hmm_range_fault_unlockable(range, &locked);
+		if (locked)
+			mmap_read_unlock(mm);
+		if (ret) {
+			if (ret == -EBUSY)
+				continue;
+			goto out;
+		}
+		if (!locked)
+			continue;
+
+		mutex_lock(&dmirror->mutex);
+		if (mmu_interval_read_retry(range->notifier,
+					    range->notifier_seq)) {
+			mutex_unlock(&dmirror->mutex);
+			continue;
+		}
+		break;
+	}
+
+	ret = dmirror_do_fault(dmirror, range);
+
+	mutex_unlock(&dmirror->mutex);
+out:
+	return ret;
+}
+
+static int dmirror_fault_unlockable(struct dmirror *dmirror,
+				    unsigned long start,
+				    unsigned long end, bool write)
+{
+	struct mm_struct *mm = dmirror->notifier.mm;
+	unsigned long addr;
+	unsigned long pfns[32];
+	struct hmm_range range = {
+		.notifier = &dmirror->notifier,
+		.hmm_pfns = pfns,
+		.pfn_flags_mask = 0,
+		.default_flags =
+			HMM_PFN_REQ_FAULT | (write ? HMM_PFN_REQ_WRITE : 0),
+		.dev_private_owner = dmirror->mdevice,
+	};
+	int ret = 0;
+
+	if (!mmget_not_zero(mm))
+		return 0;
+
+	for (addr = start; addr < end; addr = range.end) {
+		range.start = addr;
+		range.end = min(addr + (ARRAY_SIZE(pfns) << PAGE_SHIFT), end);
+
+		ret = dmirror_range_fault_unlockable(dmirror, &range);
+		if (ret)
+			break;
+	}
+
+	mmput(mm);
+	return ret;
+}
+
 static int dmirror_fault(struct dmirror *dmirror, unsigned long start,
 			 unsigned long end, bool write)
 {
@@ -426,6 +504,47 @@ static int dmirror_read(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd)
 	return ret;
 }
 
+static int dmirror_read_unlockable(struct dmirror *dmirror,
+				   struct hmm_dmirror_cmd *cmd)
+{
+	struct dmirror_bounce bounce;
+	unsigned long start, end;
+	unsigned long size = cmd->npages << PAGE_SHIFT;
+	int ret;
+
+	start = cmd->addr;
+	end = start + size;
+	if (end < start)
+		return -EINVAL;
+
+	ret = dmirror_bounce_init(&bounce, start, size);
+	if (ret)
+		return ret;
+
+	while (1) {
+		mutex_lock(&dmirror->mutex);
+		ret = dmirror_do_read(dmirror, start, end, &bounce);
+		mutex_unlock(&dmirror->mutex);
+		if (ret != -ENOENT)
+			break;
+
+		start = cmd->addr + (bounce.cpages << PAGE_SHIFT);
+		ret = dmirror_fault_unlockable(dmirror, start, end, false);
+		if (ret)
+			break;
+		cmd->faults++;
+	}
+
+	if (ret == 0) {
+		if (copy_to_user(u64_to_user_ptr(cmd->ptr), bounce.ptr,
+				 bounce.size))
+			ret = -EFAULT;
+	}
+	cmd->cpages = bounce.cpages;
+	dmirror_bounce_fini(&bounce);
+	return ret;
+}
+
 static int dmirror_do_write(struct dmirror *dmirror, unsigned long start,
 			    unsigned long end, struct dmirror_bounce *bounce)
 {
@@ -1537,6 +1656,9 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
 		dmirror->flags = cmd.npages;
 		ret = 0;
 		break;
+	case HMM_DMIRROR_READ_UNLOCKABLE:
+		ret = dmirror_read_unlockable(dmirror, &cmd);
+		break;
 
 	default:
 		return -EINVAL;
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index f94c6d4573382..076df6df92275 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -38,6 +38,7 @@ struct hmm_dmirror_cmd {
 #define HMM_DMIRROR_CHECK_EXCLUSIVE	_IOWR('H', 0x06, struct hmm_dmirror_cmd)
 #define HMM_DMIRROR_RELEASE		_IOWR('H', 0x07, struct hmm_dmirror_cmd)
 #define HMM_DMIRROR_FLAGS		_IOWR('H', 0x08, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_READ_UNLOCKABLE	_IOWR('H', 0x09, struct hmm_dmirror_cmd)
 
 #define HMM_DMIRROR_FLAG_FAIL_ALLOC	(1ULL << 0)
 
diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c
index e8328c89d855e..e7bf061747edd 100644
--- a/tools/testing/selftests/mm/hmm-tests.c
+++ b/tools/testing/selftests/mm/hmm-tests.c
@@ -26,6 +26,9 @@
 #include <sys/mman.h>
 #include <sys/ioctl.h>
 #include <sys/time.h>
+#include <sys/syscall.h>
+#include <linux/userfaultfd.h>
+#include <poll.h>
 
 
 /*
@@ -2852,4 +2855,134 @@ TEST_F_TIMEOUT(hmm, benchmark_thp_migration, 120)
 					&thp_results, &regular_results);
 	}
 }
+
+/*
+ * Test that HMM can fault in pages backed by userfaultfd using the
+ * hmm_range_fault_unlockable() path. This exercises the lock-drop retry
+ * logic in the HMM framework.
+ */
+struct uffd_thread_args {
+	int uffd;
+	void *page_buffer;
+	unsigned long page_size;
+};
+
+static void *uffd_handler_thread(void *arg)
+{
+	struct uffd_thread_args *args = arg;
+	struct uffd_msg msg;
+	struct uffdio_copy copy;
+	struct pollfd pollfd;
+	int ret;
+
+	pollfd.fd = args->uffd;
+	pollfd.events = POLLIN;
+
+	while (1) {
+		ret = poll(&pollfd, 1, 5000);
+		if (ret <= 0)
+			break;
+
+		ret = read(args->uffd, &msg, sizeof(msg));
+		if (ret != sizeof(msg))
+			break;
+
+		if (msg.event != UFFD_EVENT_PAGEFAULT)
+			break;
+
+		/* Fill the page with a known pattern */
+		memset(args->page_buffer, 0xAB, args->page_size);
+
+		copy.dst = msg.arg.pagefault.address & ~(args->page_size - 1);
+		copy.src = (unsigned long)args->page_buffer;
+		copy.len = args->page_size;
+		copy.mode = 0;
+		copy.copy = 0;
+
+		ret = ioctl(args->uffd, UFFDIO_COPY, &copy);
+		if (ret < 0)
+			break;
+	}
+
+	return NULL;
+}
+
+TEST_F(hmm, userfaultfd_read)
+{
+	struct hmm_buffer *buffer;
+	struct uffd_thread_args uffd_args;
+	unsigned long npages;
+	unsigned long size;
+	unsigned long i;
+	unsigned char *ptr;
+	pthread_t thread;
+	int uffd;
+	int ret;
+	struct uffdio_api api;
+	struct uffdio_register reg;
+
+	npages = 4;
+	size = npages << self->page_shift;
+
+	/* Create userfaultfd */
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+	if (uffd < 0)
+		SKIP(return, "userfaultfd not available");
+
+	api.api = UFFD_API;
+	api.features = 0;
+	ret = ioctl(uffd, UFFDIO_API, &api);
+	ASSERT_EQ(ret, 0);
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->fd = -1;
+	buffer->size = size;
+	buffer->mirror = malloc(size);
+	ASSERT_NE(buffer->mirror, NULL);
+
+	/* Create anonymous mapping */
+	buffer->ptr = mmap(NULL, size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS,
+			   -1, 0);
+	ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+	/* Register the region with userfaultfd */
+	reg.range.start = (unsigned long)buffer->ptr;
+	reg.range.len = size;
+	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
+	ret = ioctl(uffd, UFFDIO_REGISTER, &reg);
+	ASSERT_EQ(ret, 0);
+
+	/* Set up the handler thread */
+	uffd_args.uffd = uffd;
+	uffd_args.page_buffer = malloc(self->page_size);
+	ASSERT_NE(uffd_args.page_buffer, NULL);
+	uffd_args.page_size = self->page_size;
+
+	ret = pthread_create(&thread, NULL, uffd_handler_thread, &uffd_args);
+	ASSERT_EQ(ret, 0);
+
+	/*
+	 * Use the unlockable read path which allows the mmap lock to be
+	 * dropped during the fault, enabling userfaultfd resolution.
+	 */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_READ_UNLOCKABLE,
+			      buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Verify the device read the data filled by the uffd handler */
+	ptr = buffer->mirror;
+	for (i = 0; i < size; ++i)
+		ASSERT_EQ(ptr[i], (unsigned char)0xAB);
+
+	pthread_join(thread, NULL);
+	free(uffd_args.page_buffer);
+	close(uffd);
+	hmm_buffer_free(buffer);
+}
+
 TEST_HARNESS_MAIN