The histogram for under-quota region prioritization [1] is constructed from all regions that are eligible for the DAMOS target access pattern.  When DAMOS filters are installed, the prioritization-threshold access temperature that is generated from the histogram can therefore be inaccurate.

For example, suppose there are three regions of 1 GiB each, having access temperatures of 100, 50, and 0, and a DAMOS scheme targeting _any_ access temperature with a 2 GiB quota.  The histogram will look like below:

    temperature    size of regions having >= temperature
    0              3 GiB
    50             2 GiB
    100            1 GiB

Based on the histogram and the quota (2 GiB), DAMOS applies the action to only the regions having >=50 temperature.  This is all good.

Now suppose the region of temperature 50 is excluded by a DAMOS filter.  Regardless of the filter, DAMOS will still try to apply the action to only the regions having >=50 temperature.  Because the region of temperature 50 is filtered out, the action is in fact applied to only the region of temperature 100.  Worse yet, if the filter excludes the regions of temperature 50 and 100, no action is applied to any region at all, while the region of temperature 0 is there.

People used to work around this by utilizing multiple contexts instead of the core layer DAMOS filters.  For example, DAMON-based memory tiering approaches, including the quota auto-tuning based one [2], use a DAMON context per NUMA node.  If the issue explained above is effectively alleviated, those could be reconfigured to run with a single context, using DAMOS filters to apply the promotion and demotion to only specific NUMA nodes.

Alleviate the problem by checking core DAMOS filters when generating the histogram.  The reason for checking only core filters is the overhead.
While core filters are usually for coarse-grained filtering (e.g., target/address filters for process, NUMA, or zone level filtering), operation layer filters are usually for fine-grained filtering (e.g., for anon pages).  Doing this check for operation layer filters would cause significant overhead.  Meanwhile, there is no known use case that is affected by the histogram distortion from operation layer filters.  Hence, do this for only core filters for now.  We can revisit operation layer filters in the future; a sort of sampling-based operation layer filtering might be applicable.

After this fix is applied, for the first case, where a DAMOS filter excludes the region of temperature 50, the histogram will look like below:

    temperature    size of regions having >= temperature
    0              2 GiB
    100            1 GiB

DAMOS will therefore set the temperature threshold to 0, allowing both the regions of temperatures 0 and 100 to be applied.

For the second case, where a DAMOS filter excludes the regions of temperature 50 and 100, the histogram will look like below:

    temperature    size of regions having >= temperature
    0              1 GiB

DAMOS will therefore set the temperature threshold to 0, allowing the region of temperature 0 to be applied.
[1] 'Prioritization' section of Documentation/mm/damon/design.rst
[2] commit 0e1c773b501f ("mm/damon/core: introduce damos quota goal metrics for memory node utilization")

Signed-off-by: SeongJae Park
---
 mm/damon/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/damon/core.c b/mm/damon/core.c
index 5e2724a4f285e..bda4218188314 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2309,6 +2309,8 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s)
 		damon_for_each_region(r, t) {
 			if (!__damos_valid_target(r, s))
 				continue;
+			if (damos_core_filter_out(c, t, r, s))
+				continue;
 			score = c->ops.get_scheme_score(c, t, r, s);
 			c->regions_score_histogram[score] +=
 					damon_sz_region(r);
--
2.47.3

kdamond_apply_schemes() uses the safe version of the regions walk (damon_for_each_region_safe()), which is safe against deallocation of the current region inside the loop.  The loop body does not only read but also writes the regions: specifically, regions can be split inside the loop.  Splitting a region, however, neither deallocates a region nor corrupts the list.  There is hence no reason to use the safe walk.  Rather, the pre-fetched next pointer is wasted and causes a problem.

When an address filter is applied and a region intersects with the filter, the filter splits the region on the filter boundary.  The intention is to let DAMOS apply the action to only the filtered address ranges.  However, because DAMOS is doing the safe walk, the second half of the split region, which now sits right after the current one and should be visited on the next iteration, is simply ignored.

Use the non-safe version of the walk, which is safe for this use case. damos_skip_charged_region() was working around the issue using a pointer-of-pointer hack; remove that together.
Signed-off-by: SeongJae Park
---
 mm/damon/core.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/mm/damon/core.c b/mm/damon/core.c
index bda4218188314..0ff190ed8a599 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1707,17 +1707,18 @@ static bool damos_valid_target(struct damon_ctx *c, struct damon_target *t,
  * This function checks if a given region should be skipped or not for the
  * reason.  If only the starting part of the region has previously charged,
  * this function splits the region into two so that the second one covers the
- * area that not charged in the previous charge widnow and saves the second
- * region in *rp and returns false, so that the caller can apply DAMON action
- * to the second one.
+ * area that not charged in the previous charge widnow, and return true.  The
+ * caller can see the second one on the next iteration of the region walk.
+ * Note that this means the caller should use damon_for_each_region() instead
+ * of damon_for_each_region_safe().  If damon_for_each_region_safe() is used,
+ * the second region will just be ignored.
  *
- * Return: true if the region should be entirely skipped, false otherwise.
+ * Return: true if the region should be skipped, false otherwise.
  */
 static bool damos_skip_charged_region(struct damon_target *t,
-		struct damon_region **rp, struct damos *s,
+		struct damon_region *r, struct damos *s,
 		unsigned long min_region_sz)
 {
-	struct damon_region *r = *rp;
 	struct damos_quota *quota = &s->quota;
 	unsigned long sz_to_skip;
 
@@ -1744,8 +1745,7 @@ static bool damos_skip_charged_region(struct damon_target *t,
 			sz_to_skip = min_region_sz;
 		}
 		damon_split_region_at(t, r, sz_to_skip);
-		r = damon_next_region(r);
-		*rp = r;
+		return true;
 	}
 	quota->charge_target_from = NULL;
 	quota->charge_addr_from = 0;
@@ -2004,7 +2004,7 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
 		if (quota->esz && quota->charged_sz >= quota->esz)
 			continue;
 
-		if (damos_skip_charged_region(t, &r, s, c->min_region_sz))
+		if (damos_skip_charged_region(t, r, s, c->min_region_sz))
 			continue;
 
 		if (s->max_nr_snapshots &&
@@ -2347,7 +2347,7 @@ static void damos_trace_stat(struct damon_ctx *c, struct damos *s)
 static void kdamond_apply_schemes(struct damon_ctx *c)
 {
 	struct damon_target *t;
-	struct damon_region *r, *next_r;
+	struct damon_region *r;
 	struct damos *s;
 	unsigned long sample_interval = c->attrs.sample_interval ?
 			c->attrs.sample_interval : 1;
@@ -2373,7 +2373,7 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
 		if (c->ops.target_valid && c->ops.target_valid(t) == false)
 			continue;
 
-		damon_for_each_region_safe(r, next_r, t)
+		damon_for_each_region(r, t)
 			damon_do_apply_schemes(c, t, r);
 	}
--
2.47.3