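wp_can_reuse_anon_folio() currently drains the local LRU cache for
every !LRU anon folio before rechecking the refcount. Instead, assume
a !LRU anon folio is in lru_cache, which holds one reference to it.
If the refcount is still too high after discounting that reference,
the drain cannot make the folio reusable, so return early and avoid
many unnecessary LRU drains.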

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
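Not for the changelog: the early-exit decision can be modelled in
userspace as a rough sketch. struct folio_model, enum reuse_step and
reuse_precheck() are hypothetical names for illustration only; the
thresholds are copied from the hunk below, and the model assumes the
drain drops at most the one reference held by lru_cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the folio state the check reads. */
struct folio_model {
	int  refcount;      /* models folio_ref_count() */
	bool ksm;           /* models folio_test_ksm() */
	bool on_lru;        /* models folio_test_lru() */
	bool in_swapcache;  /* models folio_test_swapcache() */
};

enum reuse_step {
	REUSE_REJECT,           /* return false, no drain */
	REUSE_DRAIN_THEN_CHECK, /* lru_add_drain(), then final refcount check */
	REUSE_CHECK,            /* final refcount check, no drain */
};

/* Models only the checks up to and including the drain decision. */
enum reuse_step reuse_precheck(const struct folio_model *f)
{
	if (f->ksm || f->refcount > 3)
		return REUSE_REJECT;
	if (!f->on_lru) {
		/* Assumption: lru_cache holds exactly one of the references. */
		if (f->refcount > 2 + (int)f->in_swapcache)
			return REUSE_REJECT;
		return REUSE_DRAIN_THEN_CHECK;
	}
	return REUSE_CHECK;
}

/* Small self-check of the cases discussed in the changelog. */
void reuse_precheck_selftest(void)
{
	struct folio_model too_many = { 3, false, false, false };
	struct folio_model cached   = { 2, false, false, false };

	assert(reuse_precheck(&too_many) == REUSE_REJECT);
	assert(reuse_precheck(&cached) == REUSE_DRAIN_THEN_CHECK);
}
```

In this model, a !LRU folio outside the swapcache with three references
is rejected without a drain, while the same folio with two references
still takes the drain path.
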
 mm/memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index ff338c2abe92..f6848f4234a6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4193,12 +4193,18 @@ static bool wp_can_reuse_anon_folio(struct folio *folio,
 	 */
 	if (folio_test_ksm(folio) || folio_ref_count(folio) > 3)
 		return false;
-	if (!folio_test_lru(folio))
+	if (!folio_test_lru(folio)) {
+		/*
+		 * Assume the folio is in lru_cache, which holds one reference.
+		 */
+		if (folio_ref_count(folio) > 2 + folio_test_swapcache(folio))
+			return false;
 		/*
 		 * We cannot easily detect+handle references from
 		 * remote LRU caches or references to LRU folios.
 		 */
 		lru_add_drain();
+	}
 	if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio))
 		return false;
 	if (!folio_trylock(folio))
-- 
2.39.3 (Apple Git-146)


The "we just allocated them without exposing them to the swapcache"
case no longer exists, as Kairui has routed synchronous I/O through
the swapcache as well in his series "unify swapin use swap cache and
cleanup flags"[1]. As a result, folio_ref_count() should never be 1
in this path, since at least two references are held (base ref plus
swapcache). Remove the folio_ref_count()==1 check and update the
comment accordingly.

[1] https://lore.kernel.org/all/20251220-swap-table-p2-v5-0-8862a265a033@tencent.com/

Acked-by: Usama Arif <usama.arif@linux.dev>
Reviewed-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
 mm/memory.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index f6848f4234a6..abd0adcf65f0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5049,12 +5049,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	/*
 	 * Same logic as in do_wp_page(); however, optimize for pages that are
-	 * certainly not shared either because we just allocated them without
-	 * exposing them to the swapcache or because the swap entry indicates
-	 * exclusivity.
+	 * certainly not shared because the swap entry indicates exclusivity.
 	 */
-	if (!folio_test_ksm(folio) &&
-	    (exclusive || folio_ref_count(folio) == 1)) {
+	if (!folio_test_ksm(folio) && exclusive) {
 		if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
 		    !pte_needs_soft_dirty_wp(vma, pte)) {
 			pte = pte_mkwrite(pte, vma);
-- 
2.39.3 (Apple Git-146)


do_swap_page() makes many redundant lru_add_drain() calls,
especially on synchronous I/O devices. For example, the test
program below currently ends up draining lru_cache 100% of the
time:

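 #include <stddef.h>
 #include <sys/mman.h>

 #define SIZE (100UL * 1024 * 1024)

int main(void)
{
	while (1) {
		volatile int *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		/* Populate the range with anonymous pages. */
		for (size_t i = 0; i < SIZE / sizeof(int); i++)
			p[i] = i % 64;
		/* Ask the kernel to reclaim (page out) the range. */
		madvise((void *)p, SIZE, MADV_PAGEOUT);
		/* Write again; the first write to each page faults it back in. */
		for (size_t i = 0; i < SIZE / sizeof(int); i++)
			p[i] = i % 64;
		munmap((void *)p, SIZE);
	}
	return 0;
}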

Folio reuse in do_swap_page() now relies primarily on the exclusive
hint, so draining lru_cache to drop the reference it holds is largely
irrelevant. Remove the drain.

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
 mm/memory.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index abd0adcf65f0..2983a6baf474 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4903,16 +4903,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	} else if (folio != swapcache)
 		page = folio_page(folio, 0);
 
-	/*
-	 * If we want to map a page that's in the swapcache writable, we
-	 * have to detect via the refcount if we're really the exclusive
-	 * owner. Try removing the extra reference from the local LRU
-	 * caches if required.
-	 */
-	if ((vmf->flags & FAULT_FLAG_WRITE) &&
-	    !folio_test_ksm(folio) && !folio_test_lru(folio))
-		lru_add_drain();
-
 	folio_throttle_swaprate(folio, GFP_KERNEL);
 
 	/*
-- 
2.39.3 (Apple Git-146)


Previously, do_swap_page() called lru_add_drain() on write swap-in
page faults for !LRU folios. This could drop the reference held by
the per-CPU LRU cache if the folio happened to reside there, but there
was no guarantee that the folio was actually cached on the current
CPU.

Now that the lru_add_drain() call has been removed, we have lost one
opportunity to drop a reference held by the LRU cache. Instead,
account for that possible reference in the condition evaluated by
should_try_to_free_swap(): for !LRU folios, pass one additional
expected reference.

Suggested-by: Kairui Song <ryncsn@gmail.com>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
 mm/memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 2983a6baf474..14577c67c61a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5087,8 +5087,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * Remove the swap entry and conditionally try to free up the swapcache.
 	 * Do it after mapping, so raced page faults will likely see the folio
 	 * in swap cache and wait on the folio lock.
+	 * Assume non-LRU folios may be queued in the LRU cache, which contributes
+	 * an additional reference to the folio.
 	 */
-	if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
+	if (should_try_to_free_swap(si, folio, vma, nr_pages +
+			!folio_test_lru(folio), vmf->flags))
 		folio_free_swap(folio);
 
 	folio_unlock(folio);
-- 
2.39.3 (Apple Git-146)