Add three new mTHP statistics to track collapse failures for different orders when encountering swap PTEs, excessive none PTEs, and shared PTEs: - collapse_exceed_swap_pte: Increment when mTHP collapse fails due to swap PTEs - collapse_exceed_none_pte: Counts when mTHP collapse fails due to exceeding the none PTE threshold for the given order - collapse_exceed_shared_pte: Counts when mTHP collapse fails due to shared PTEs These statistics complement the existing THP_SCAN_EXCEED_* events by providing per-order granularity for mTHP collapse attempts. The stats are exposed via sysfs under `/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/` for each supported hugepage size. As we currently dont support collapsing mTHPs that contain a swap or shared entry, those statistics keep track of how often we are encountering failed mTHP collapses due to these restrictions. Reviewed-by: Baolin Wang Signed-off-by: Nico Pache --- Documentation/admin-guide/mm/transhuge.rst | 24 ++++++++++++++++++++++ include/linux/huge_mm.h | 3 +++ mm/huge_memory.c | 7 +++++++ mm/khugepaged.c | 16 ++++++++++++--- 4 files changed, 47 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index c51932e6275d..d396d1bfb274 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -714,6 +714,30 @@ nr_anon_partially_mapped an anonymous THP as "partially mapped" and count it here, even though it is not actually partially mapped anymore. +collapse_exceed_none_pte + The number of collapse attempts that failed due to exceeding the + max_ptes_none threshold. For mTHP collapse, Currently only max_ptes_none + values of 0 and (HPAGE_PMD_NR - 1) are supported. Any other value will + emit a warning and no mTHP collapse will be attempted. khugepaged will + try to collapse to the largest enabled (m)THP size, if it fails, it will + try the next lower enabled mTHP size. This counter records the number of + times a collapse attempt was skipped for exceeding the max_ptes_none + threshold, and khugepaged will move on to the next available mTHP size. + +collapse_exceed_swap_pte + The number of anonymous mTHP pte ranges which were unable to collapse due + to containing at least one swap PTE. Currently khugepaged does not + support collapsing mTHP regions that contain a swap PTE. This counter can + be used to monitor the number of khugepaged mTHP collapses that failed + due to the presence of a swap PTE. + +collapse_exceed_shared_pte + The number of anonymous mTHP pte ranges which were unable to collapse due + to containing at least one shared PTE. Currently khugepaged does not + support collapsing mTHP pte ranges that contain a shared PTE. This + counter can be used to monitor the number of khugepaged mTHP collapses + that failed due to the presence of a shared PTE. + As the system ages, allocating huge pages may be expensive as the system uses memory compaction to copy data around memory to free a huge page for use. There are some counters in ``/proc/vmstat`` to help diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index f93365e182b4..1082b78e794d 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -144,6 +144,9 @@ enum mthp_stat_item { MTHP_STAT_SPLIT_DEFERRED, MTHP_STAT_NR_ANON, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, + MTHP_STAT_COLLAPSE_EXCEED_SWAP, + MTHP_STAT_COLLAPSE_EXCEED_NONE, + MTHP_STAT_COLLAPSE_EXCEED_SHARED, __MTHP_STAT_COUNT }; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c1e1e91b0e61..b4d9b3ac9a7c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -639,6 +639,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED); DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED); DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON); DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED); + static struct attribute *anon_stats_attrs[] = { &anon_fault_alloc_attr.attr, @@ -655,6 +659,9 @@ static struct attribute *anon_stats_attrs[] = { &split_deferred_attr.attr, &nr_anon_attr.attr, &nr_anon_partially_mapped_attr.attr, + &collapse_exceed_swap_pte_attr.attr, + &collapse_exceed_none_pte_attr.attr, + &collapse_exceed_shared_pte_attr.attr, NULL, }; diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b2ea56c9bb42..efb8a47af65a 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -604,7 +604,9 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, continue; } else { result = SCAN_EXCEED_NONE_PTE; - count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + if (!is_mthp_order(order)) + count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE); goto out; } } @@ -634,10 +636,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, * shared may cause a future higher order collapse on a * rescan of the same range. */ - if (is_mthp_order(order) || (cc->is_khugepaged && - shared > khugepaged_max_ptes_shared)) { + if (is_mthp_order(order)) { + result = SCAN_EXCEED_SHARED_PTE; + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED); + goto out; + } + + if (cc->is_khugepaged && + shared > khugepaged_max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED); goto out; } } @@ -1086,6 +1095,7 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm, * range. */ if (is_mthp_order(order)) { + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP); pte_unmap(pte); mmap_read_unlock(mm); result = SCAN_EXCEED_SWAP_PTE; -- 2.51.1