We currently drain the LRU cache for every !LRU folio before retrying
anon folio reuse in wp_can_reuse_anon_folio(), even when the refcount
already shows that a drain cannot make the folio reusable. Instead,
assume !LRU anon folios are in lru_cache, and use the refcount to avoid
many unnecessary LRU drains.

Acked-by: Shakeel Butt
Reviewed-by: Baoquan He
Signed-off-by: Barry Song (Xiaomi)
---
 mm/memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index ff338c2abe92..f6848f4234a6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4193,12 +4193,18 @@ static bool wp_can_reuse_anon_folio(struct folio *folio,
 	 */
 	if (folio_test_ksm(folio) || folio_ref_count(folio) > 3)
 		return false;
-	if (!folio_test_lru(folio))
+	if (!folio_test_lru(folio)) {
+		/*
+		 * Assume folio is on lru_cache and holds a cache reference.
+		 */
+		if (folio_ref_count(folio) > 2 + folio_test_swapcache(folio))
+			return false;
 		/*
 		 * We cannot easily detect+handle references from
 		 * remote LRU caches or references to LRU folios.
 		 */
 		lru_add_drain();
+	}
 	if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio))
 		return false;
 	if (!folio_trylock(folio))
-- 
2.39.3 (Apple Git-146)

The "we just allocated them without exposing them to the swapcache" case
no longer exists, as Kairui has routed synchronous I/O through the
swapcache as well in his series "unify swapin use swap cache and cleanup
flags"[1]. As a result, folio_ref_count() should never be 1 in this
path, since at least two references are held (base ref plus swapcache).
Remove the folio_ref_count()==1 check and update the comment
accordingly.
[1] https://lore.kernel.org/all/20251220-swap-table-p2-v5-0-8862a265a033@tencent.com/

Acked-by: Usama Arif
Reviewed-by: Kairui Song
Reviewed-by: Baoquan He
Acked-by: Shakeel Butt
Signed-off-by: Barry Song (Xiaomi)
---
 mm/memory.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index f6848f4234a6..abd0adcf65f0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5049,12 +5049,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)

 	/*
 	 * Same logic as in do_wp_page(); however, optimize for pages that are
-	 * certainly not shared either because we just allocated them without
-	 * exposing them to the swapcache or because the swap entry indicates
-	 * exclusivity.
+	 * certainly not shared because the swap entry indicates exclusivity.
 	 */
-	if (!folio_test_ksm(folio) &&
-	    (exclusive || folio_ref_count(folio) == 1)) {
+	if (!folio_test_ksm(folio) && exclusive) {
 		if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
 		    !pte_needs_soft_dirty_wp(vma, pte)) {
 			pte = pte_mkwrite(pte, vma);
-- 
2.39.3 (Apple Git-146)

We are doing a lot of redundant lru_add_drain() calls in do_swap_page(),
especially for synchronous I/O devices. For example, the test program
below currently ends up draining lru_cache 100% of the time:

	#include <stddef.h>
	#include <sys/mman.h>

	#define SIZE (100 * 1024 * 1024)

	int main(int argc, char *argv[])
	{
		while (1) {
			volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
					MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

			/* Fault the whole range in. */
			for (size_t i = 0; i < SIZE / sizeof(int); i++)
				p[i] = i % 64;
			/* Push it out to swap. */
			madvise((void *)p, SIZE, MADV_PAGEOUT);
			/* Write faults swap everything back in. */
			for (size_t i = 0; i < SIZE / sizeof(int); i++)
				p[i] = i % 64;
			munmap((void *)p, SIZE);
		}
		return 0;
	}

Folio reuse now relies primarily on the exclusive hint, so draining
lru_cache to drop the reference it holds is largely irrelevant.

Acked-by: Shakeel Butt
Reviewed-by: Baoquan He
Signed-off-by: Barry Song (Xiaomi)
---
 mm/memory.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index abd0adcf65f0..2983a6baf474 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4903,16 +4903,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	} else if (folio != swapcache)
 		page = folio_page(folio, 0);

-	/*
-	 * If we want to map a page that's in the swapcache writable, we
-	 * have to detect via the refcount if we're really the exclusive
-	 * owner. Try removing the extra reference from the local LRU
-	 * caches if required.
-	 */
-	if ((vmf->flags & FAULT_FLAG_WRITE) &&
-	    !folio_test_ksm(folio) && !folio_test_lru(folio))
-		lru_add_drain();
-
 	folio_throttle_swaprate(folio, GFP_KERNEL);

 	/*
-- 
2.39.3 (Apple Git-146)

Originally, we unconditionally called lru_add_drain() for write swap-in
page faults. This might drop the reference held by the per-CPU LRU cache
if the folio happened to reside there. However, there was no guarantee
that the folio was actually cached on the current CPU.

Now that lru_add_drain() has been removed, we have lost one opportunity
to drop a reference held by the LRU cache. We could instead incorporate
that possibility into the condition evaluated by
should_try_to_free_swap().

Suggested-by: Kairui Song
Signed-off-by: Barry Song (Xiaomi)
---
 mm/memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 2983a6baf474..14577c67c61a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5087,8 +5087,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * Remove the swap entry and conditionally try to free up the swapcache.
 	 * Do it after mapping, so raced page faults will likely see the folio
 	 * in swap cache and wait on the folio lock.
+	 * Assume non-LRU folios may be queued in the LRU cache, which contributes
+	 * an additional reference to the folio.
 	 */
-	if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
+	if (should_try_to_free_swap(si, folio, vma, nr_pages +
+				!folio_test_lru(folio), vmf->flags))
 		folio_free_swap(folio);

 	folio_unlock(folio);
-- 
2.39.3 (Apple Git-146)
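
For anyone following the refcount arithmetic in this series, here is a
small userspace model of the thresholds. It is only a sketch and not
kernel code: struct folio_model and the three helpers are hypothetical
names made up for illustration. It does not model the earlier
"folio_ref_count(folio) > 3" check, folio locking, or what
should_try_to_free_swap() does with its argument. It only mirrors the
comparisons and the argument expression visible in the diffs above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the folio state used by the checks. */
struct folio_model {
	int refcount;		/* stands in for folio_ref_count() */
	bool on_lru;		/* stands in for folio_test_lru() */
	bool in_swapcache;	/* stands in for folio_test_swapcache() */
};

/*
 * Mirrors the new early check in wp_can_reuse_anon_folio(): for a !LRU
 * folio, assume one reference belongs to lru_cache. A drain is only
 * worth trying if the count is at most ours + lru_cache + swapcache.
 */
static bool drain_may_help(const struct folio_model *f)
{
	if (f->on_lru)
		return false;	/* the drain is only attempted for !LRU folios */
	return f->refcount <= 2 + (int)f->in_swapcache;
}

/* Mirrors the unchanged check that follows the optional drain. */
static bool refs_allow_reuse(const struct folio_model *f)
{
	return f->refcount <= 1 + (int)f->in_swapcache;
}

/*
 * Mirrors only the value passed to should_try_to_free_swap() in the
 * last patch: nr_pages plus one for an assumed lru_cache reference.
 */
static int free_swap_refs_arg(const struct folio_model *f, int nr_pages)
{
	return nr_pages + !f->on_lru;
}

/* Walks through the cases discussed in the commit messages. */
void model_selftest(void)
{
	struct folio_model f = { .refcount = 2, .on_lru = false,
				 .in_swapcache = false };

	/* Our ref + assumed lru_cache ref: draining may make it reusable. */
	assert(drain_may_help(&f));
	assert(!refs_allow_reuse(&f));

	/* If the drain drops the cache reference, reuse is allowed. */
	f.refcount = 1;
	assert(refs_allow_reuse(&f));

	/* One reference too many: bail out without draining. */
	f.refcount = 3;
	assert(!drain_may_help(&f));

	/* The swapcache reference raises both thresholds by one. */
	f.in_swapcache = true;
	assert(drain_may_help(&f));

	/* !LRU folio: one extra reference is tolerated for lru_cache. */
	assert(free_swap_refs_arg(&f, 1) == 2);
	f.on_lru = true;
	assert(free_swap_refs_arg(&f, 1) == 1);
}
```

Calling model_selftest() from a trivial main() exercises all of the
cases; none of the asserts should fire.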