No numbers to back this up, but it seemed obvious to me, that if there
are competing lru_add_drain_all()ers, the work will be minimized if each
flushes its own local queues before locking and doing cross-CPU drains.

Signed-off-by: Hugh Dickins
Acked-by: David Hildenbrand
---
 mm/swap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/swap.c b/mm/swap.c
index b74ebe865dd9..881e53b2877e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -834,6 +834,9 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
 	 */
 	this_gen = smp_load_acquire(&lru_drain_gen);
 
+	/* It helps everyone if we do our own local drain immediately. */
+	lru_add_drain();
+
 	mutex_lock(&lock);
 
 	/*
-- 
2.51.0