Add three new mTHP statistics to track collapse failures for different
orders when encountering swap PTEs, excessive none PTEs, and shared
PTEs:

- collapse_exceed_swap_pte: Counts when an mTHP collapse fails due to
  swap PTEs
- collapse_exceed_none_pte: Counts when an mTHP collapse fails due to
  exceeding the none PTE threshold for the given order
- collapse_exceed_shared_pte: Counts when an mTHP collapse fails due to
  shared PTEs

These statistics complement the existing THP_SCAN_EXCEED_* events by
providing per-order granularity for mTHP collapse attempts. The stats
are exposed via sysfs under
/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/ for each
supported hugepage size.

As we currently don't support collapsing mTHPs that contain a swap or
shared entry, these statistics track how often mTHP collapses fail due
to those restrictions.

Signed-off-by: Nico Pache
---
 Documentation/admin-guide/mm/transhuge.rst | 23 ++++++++++++++++++++++
 include/linux/huge_mm.h                    |  3 +++
 mm/huge_memory.c                           |  7 +++++++
 mm/khugepaged.c                            | 16 ++++++++++++---
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 13269a0074d4..7c71cda8aea1 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -709,6 +709,29 @@ nr_anon_partially_mapped
        an anonymous THP as "partially mapped" and count it here, even
        though it is not actually partially mapped anymore.
 
+collapse_exceed_none_pte
+       The number of anonymous mTHP pte ranges where the number of none PTEs
+       exceeded the max_ptes_none threshold. For mTHP collapse, khugepaged
+       checks a PMD region and tracks which PTEs are present. It then tries
+       to collapse to the largest enabled mTHP size. The allowed number of
+       empty PTEs is the max_ptes_none threshold scaled by the collapse
+       order. This counter records the number of times a collapse attempt
+       was skipped for this reason, and khugepaged moved on to try the next
+       available mTHP size.
+
+collapse_exceed_swap_pte
+       The number of anonymous mTHP pte ranges which contain at least one
+       swap PTE. Currently khugepaged does not support collapsing mTHP
+       regions that contain a swap PTE. This counter can be used to monitor
+       the number of khugepaged mTHP collapses that failed due to the
+       presence of a swap PTE.
+
+collapse_exceed_shared_pte
+       The number of anonymous mTHP pte ranges which contain at least one
+       shared PTE. Currently khugepaged does not support collapsing mTHP
+       pte ranges that contain a shared PTE. This counter can be used to
+       monitor the number of khugepaged mTHP collapses that failed due to
+       the presence of a shared PTE.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in ``/proc/vmstat`` to help
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d442f45bd458..990622c96c8b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -144,6 +144,9 @@ enum mthp_stat_item {
 	MTHP_STAT_SPLIT_DEFERRED,
 	MTHP_STAT_NR_ANON,
 	MTHP_STAT_NR_ANON_PARTIALLY_MAPPED,
+	MTHP_STAT_COLLAPSE_EXCEED_SWAP,
+	MTHP_STAT_COLLAPSE_EXCEED_NONE,
+	MTHP_STAT_COLLAPSE_EXCEED_SHARED,
 	__MTHP_STAT_COUNT
 };
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 76509e3d845b..07ea9aafd64c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -638,6 +638,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
 DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
 DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON);
 DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
+
 
 static struct attribute *anon_stats_attrs[] = {
 	&anon_fault_alloc_attr.attr,
@@ -654,6 +658,9 @@ static struct attribute *anon_stats_attrs[] = {
 	&split_deferred_attr.attr,
 	&nr_anon_attr.attr,
 	&nr_anon_partially_mapped_attr.attr,
+	&collapse_exceed_swap_pte_attr.attr,
+	&collapse_exceed_none_pte_attr.attr,
+	&collapse_exceed_shared_pte_attr.attr,
 	NULL,
 };
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ebcc0c85a0d6..8abbe6e4317a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -589,7 +589,9 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			continue;
 		} else {
 			result = SCAN_EXCEED_NONE_PTE;
-			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+			if (order == HPAGE_PMD_ORDER)
+				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+			count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);
 			goto out;
 		}
 	}
@@ -628,10 +630,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		 * shared may cause a future higher order collapse on a
 		 * rescan of the same range.
 		 */
-		if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
-		    shared > khugepaged_max_ptes_shared)) {
+		if (order != HPAGE_PMD_ORDER) {
+			result = SCAN_EXCEED_SHARED_PTE;
+			count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
+			goto out;
+		}
+
+		if (cc->is_khugepaged &&
+		    shared > khugepaged_max_ptes_shared) {
 			result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
+			count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
 			goto out;
 		}
 	}
@@ -1071,6 +1080,7 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 	 * range.
 	 */
 	if (order != HPAGE_PMD_ORDER) {
+		count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
 		pte_unmap(pte);
 		mmap_read_unlock(mm);
 		result = SCAN_EXCEED_SWAP_PTE;
-- 
2.51.0