collapse_file() is capable of collapsing pagecache folios from writable files to PMD folios. Now enable clean pagecache folio collapse in addition to read-only pagecache folio collapse by removing the inode_is_open_for_write() from file_thp_enabled() and only performing filemap_flush() if the file is read-only. This means userspace needs to explicitly flush the content of pagecache folios before khugepaged can collapse the folios, or use madvise(MADV_COLLAPSE), which does the flush in the retry. The reason is that blindly enabling dirty pagecache folio from writable files collapse makes khugepaged flush these folios all the time. It is undesirable to cause system level pagecache flushes. To properly support dirty pagecache folio collapse, filemap_flush() needs to be avoided. Potentially, merging associated buffer instead of dropping it with filemap_release_folio() might be needed. NOTE: this breaks khugepaged selftests for writable file pagecache collapse, which is set to fail all the time. The next commit fixes it. Signed-off-by: Zi Yan Reviewed-by: Lance Yang Cc: Al Viro Cc: Baolin Wang Cc: Barry Song Cc: Chris Mason Cc: Christian Brauner Cc: David Hildenbrand (Arm) Cc: David Sterba Cc: Dev Jain Cc: Jan Kara Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Mike Rapoport Cc: Nico Pache Cc: Ryan Roberts Cc: Shuah Khan Cc: Song Liu Cc: Suren Baghdasaryan Cc: Vlastimil Babka --- mm/huge_memory.c | 2 +- mm/khugepaged.c | 15 +++++++++------ 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d055f53be8502..c565b2a651e06 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -97,7 +97,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) if (!mapping_pmd_folio_support(vma->vm_file->f_mapping)) return false; - return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); + return S_ISREG(inode->i_mode); } /* If returns true, we are unable to access the VMA's folios. */ diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c743ec41a7b8b..395c40c24dbc5 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2342,18 +2342,21 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, } else if (folio_test_dirty(folio)) { /* * This page is dirty because it hasn't - * been flushed since first write. There - * won't be new dirty pages. + * been flushed since first write. * - * Trigger async flush here and hope the - * writeback is done when khugepaged - * revisits this page. + * Trigger async flush for read-only files and + * hope the writeback is done when khugepaged + * revisits this page. Writable files can have + * their folios dirty at any time; blindly + * flushing them would cause undesirable + * system-wide writeback. * * This is a one-off situation. We are not * forcing writeback in loop. */ xas_unlock_irq(&xas); - filemap_flush(mapping); + if (!inode_is_open_for_write(mapping->host)) + filemap_flush(mapping); result = SCAN_PAGE_DIRTY_OR_WRITEBACK; goto xa_unlocked; } else if (folio_test_writeback(folio)) { -- 2.53.0