Splitting a partially mapped folio caused a regression in the mremap
section of the Intel Xe SVM test suite, resulting in the following
hung-task stack trace:
INFO: task kworker/u65:2:1642 blocked for more than 30 seconds.
[ 212.624286] Tainted: G S W 6.18.0-rc6-xe+ #1719
[ 212.638288] Workqueue: xe_page_fault_work_queue xe_pagefault_queue_work [xe]
[ 212.638323] Call Trace:
[ 212.638324] <TASK>
[ 212.638325] __schedule+0x4b0/0x990
[ 212.638330] schedule+0x22/0xd0
[ 212.638331] io_schedule+0x41/0x60
[ 212.638333] migration_entry_wait_on_locked+0x1d8/0x2d0
[ 212.638336] ? __pfx_wake_page_function+0x10/0x10
[ 212.638339] migration_entry_wait+0xd2/0xe0
[ 212.638341] hmm_vma_walk_pmd+0x7c9/0x8d0
[ 212.638343] walk_pgd_range+0x51d/0xa40
[ 212.638345] __walk_page_range+0x75/0x1e0
[ 212.638347] walk_page_range_mm+0x138/0x1f0
[ 212.638349] hmm_range_fault+0x59/0xa0
[ 212.638351] drm_gpusvm_get_pages+0x194/0x7b0 [drm_gpusvm_helper]
[ 212.638354] drm_gpusvm_range_get_pages+0x2d/0x40 [drm_gpusvm_helper]
[ 212.638355] __xe_svm_handle_pagefault+0x259/0x900 [xe]
[ 212.638375] ? update_load_avg+0x7f/0x6c0
[ 212.638377] ? update_curr+0x13d/0x170
[ 212.638379] xe_svm_handle_pagefault+0x37/0x90 [xe]
[ 212.638396] xe_pagefault_queue_work+0x2da/0x3c0 [xe]
[ 212.638420] process_one_work+0x16e/0x2e0
[ 212.638422] worker_thread+0x284/0x410
[ 212.638423] ? __pfx_worker_thread+0x10/0x10
[ 212.638425] kthread+0xec/0x210
[ 212.638427] ? __pfx_kthread+0x10/0x10
[ 212.638428] ? __pfx_kthread+0x10/0x10
[ 212.638430] ret_from_fork+0xbd/0x100
[ 212.638433] ? __pfx_kthread+0x10/0x10
[ 212.638434] ret_from_fork_asm+0x1a/0x30
[ 212.638436] </TASK>
The issue appears to be that already-installed migration PTEs are not
properly removed after a folio split, due to incorrect retry handling
on both the split-failure and split-success paths. On failure, collect
the remainder of the range as skipped; on success, resume the
collection from the current position rather than rewinding to the
start of the range.
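"Collect the remainder as skipped" here means filling the migrate_vma
src/dst arrays with zero entries for the untouched pages, which is
what migrate_vma_collect_skip() does. Roughly (a sketch from memory of
the upstream helper, not quoted verbatim):

static int migrate_vma_collect_skip(unsigned long start,
				    unsigned long end,
				    struct mm_walk *walk)
{
	struct migrate_vma *migrate = walk->private;
	unsigned long addr;

	/* Mark every remaining page as not participating in the
	 * migration, keeping the caller's arrays in sync with the
	 * address range. */
	for (addr = start; addr < end; addr += PAGE_SIZE) {
		migrate->dst[migrate->npages] = 0;
		migrate->src[migrate->npages++] = 0;
	}

	return 0;
}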
Also, while here, fix migrate_vma_split_folio() to only lock the new
fault folio if it differs from the original fault folio (the folio
being split is not necessarily the one containing the fault page).
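The hazard being avoided is a double lock: the caller already holds
the fault folio's lock, so after the split the new folio's lock must
only be taken when it really is a different folio. A minimal sketch of
the rule (mirroring the diff below, not the literal kernel code):

	if (folio != new_fault_folio) {
		/* Only lock the folio now containing the fault page
		 * if it is not the folio the caller already holds
		 * locked; locking the fault folio a second time
		 * would deadlock. */
		if (new_fault_folio != fault_folio) {
			folio_get(new_fault_folio);
			folio_lock(new_fault_folio);
		}
		folio_unlock(folio);
		folio_put(folio);
	}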
Cc: Andrew Morton
Cc: David Hildenbrand
Cc: Zi Yan
Cc: Joshua Hahn
Cc: Rakie Kim
Cc: Byungchul Park
Cc: Gregory Price
Cc: Ying Huang
Cc: Alistair Popple
Cc: Oscar Salvador
Cc: Lorenzo Stoakes
Cc: Baolin Wang
Cc: "Liam R. Howlett"
Cc: Nico Pache
Cc: Ryan Roberts
Cc: Dev Jain
Cc: Barry Song
Cc: Lyude Paul
Cc: Danilo Krummrich
Cc: David Airlie
Cc: Simona Vetter
Cc: Ralph Campbell
Cc: Mika Penttilä
Cc: Francois Dugast
Cc: Balbir Singh
Signed-off-by: Matthew Brost
---
This fixup should be squashed into the patch "mm/migrate_device: handle
partially mapped folios during" in mm/mm-unstable
---
mm/migrate_device.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index fa42d2ebd024..4506e96dcd20 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -110,8 +110,10 @@ static int migrate_vma_split_folio(struct folio *folio,
 		folio_unlock(folio);
 		folio_put(folio);
 	} else if (folio != new_fault_folio) {
-		folio_get(new_fault_folio);
-		folio_lock(new_fault_folio);
+		if (new_fault_folio != fault_folio) {
+			folio_get(new_fault_folio);
+			folio_lock(new_fault_folio);
+		}
 		folio_unlock(folio);
 		folio_put(folio);
 	}
@@ -266,10 +268,11 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 		return 0;
 	}
 
-	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+	ptep = pte_offset_map_lock(mm, pmdp, start, &ptl);
 	if (!ptep)
 		goto again;
 	arch_enter_lazy_mmu_mode();
+	ptep += (addr - start) / PAGE_SIZE;
 
 	for (; addr < end; addr += PAGE_SIZE, ptep++) {
 		struct dev_pagemap *pgmap;
@@ -351,16 +354,18 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			if (folio && folio_test_large(folio)) {
 				int ret;
 
+				arch_leave_lazy_mmu_mode();
 				pte_unmap_unlock(ptep, ptl);
 				ret = migrate_vma_split_folio(folio,
 							      migrate->fault_page);
 				if (ret) {
-					ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
-					goto next;
+					if (unmapped)
+						flush_tlb_range(walk->vma, start, end);
+
+					return migrate_vma_collect_skip(addr, end, walk);
 				}
 
-				addr = start;
 				goto again;
 			}
 
 			mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
--
2.34.1
Fix folio splitting during PMD collection so that a split failure
collects the remainder of the range as skipped, and a successful split
continues the loop from the current position. This fixes an issue
where migration entries that had already been collected could be left
behind.
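In code terms, the corrected retry path looks roughly like the
following (a simplified sketch mirroring the diff below, not the
literal kernel code):

	ret = migrate_vma_split_folio(folio, migrate->fault_page);
	if (ret) {
		/* Split failed: flush anything already unmapped and
		 * record the rest of the range as skipped instead of
		 * retrying in place. */
		if (unmapped)
			flush_tlb_range(walk->vma, start, end);
		return migrate_vma_collect_skip(addr, end, walk);
	}

	/* Split succeeded: restart the page-table walk; the "again"
	 * path resumes at the current address rather than rewinding
	 * to the start, so migration entries already installed are
	 * not collected twice or left behind. */
	goto again;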
Cc: Andrew Morton
Cc: David Hildenbrand
Cc: Zi Yan
Cc: Joshua Hahn
Cc: Rakie Kim
Cc: Byungchul Park
Cc: Gregory Price
Cc: Ying Huang
Cc: Alistair Popple
Cc: Oscar Salvador
Cc: Lorenzo Stoakes
Cc: Baolin Wang
Cc: "Liam R. Howlett"
Cc: Nico Pache
Cc: Ryan Roberts
Cc: Dev Jain
Cc: Barry Song
Cc: Lyude Paul
Cc: Danilo Krummrich
Cc: David Airlie
Cc: Simona Vetter
Cc: Ralph Campbell
Cc: Mika Penttilä
Cc: Francois Dugast
Cc: Balbir Singh
Signed-off-by: Matthew Brost
---
This fixup should be squashed into the patch "mm/migrate_device: add THP
splitting during migration" in mm/mm-unstable
---
mm/migrate_device.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 4506e96dcd20..ab373fd38961 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -313,16 +313,18 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 		if (folio_test_large(folio)) {
 			int ret;
 
+			arch_leave_lazy_mmu_mode();
 			pte_unmap_unlock(ptep, ptl);
 			ret = migrate_vma_split_folio(folio,
 						      migrate->fault_page);
 			if (ret) {
-				ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
-				goto next;
+				if (unmapped)
+					flush_tlb_range(walk->vma, start, end);
+
+				return migrate_vma_collect_skip(addr, end, walk);
 			}
 
-			addr = start;
 			goto again;
 		}
--
2.34.1