From: Wanpeng Li

pick_eevdf()'s PICK_BUDDY path only returns cfs_rq->next when the entity
is eligible. A yield_to() target that has negative vlag (is ineligible)
at any level of its sched_entity hierarchy is skipped, and the
set_next_buddy() hint is lost.

Add eevdf_credit_entity_vlag(), which credits a nominated entity up to
the eligibility boundary so that pick_eevdf() can honor the buddy hint.
The helper handles cfs_rq->curr, which is off-tree and can be shifted in
place while carrying any active vprot window; queued entities are left
unchanged.

Add SCHED_FEAT(YIELD_TO_LAG_CREDIT) for callers to gate the helper. The
helper has no caller in this change, so mark it __maybe_unused; there is
no functional change.

Signed-off-by: Wanpeng Li
---
 kernel/sched/fair.c     | 48 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/features.h |  9 ++++++++
 2 files changed, 57 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f982..e7f5ea25fdae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9341,6 +9341,54 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, struct t
 	}
 }
 
+/*
+ * eevdf_credit_entity_vlag - credit a nominated next-buddy to eligibility
+ *
+ * Credit @se (already nominated by set_next_buddy(), so cfs_rq->next == se)
+ * just enough to cancel its negative vlag and reach the eligibility boundary
+ * (vlag = 0), so that pick_eevdf()'s PICK_BUDDY branch returns it. Only
+ * cfs_rq->curr is shifted, in place (it is off-tree and carries any vprot
+ * window). Queued entities are left unchanged.
+ *
+ * Idempotent: a no-op once @se is already eligible. Caller must hold
+ * rq_of(cfs_rq)->lock with rq_clock up to date.
+ */
+static void __maybe_unused
+eevdf_credit_entity_vlag(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	u64 avruntime, credit;
+	s64 vlag;
+
+	/* Callers gate this helper with YIELD_TO_LAG_CREDIT. */
+	if (cfs_rq->nr_queued < 2)
+		return;
+	if (throttled_hierarchy(cfs_rq))
+		return;
+	if (WARN_ON_ONCE(!se->on_rq) || se->sched_delayed)
+		return;
+
+	update_curr(cfs_rq);
+	avruntime = avg_vruntime(cfs_rq);
+	vlag = entity_lag(cfs_rq, se, avruntime);
+
+	/* Already eligible: nothing to do. */
+	if (vlag >= 0)
+		return;
+
+	credit = (u64)(-vlag);
+
+	if (cfs_rq->curr == se) {
+		/* curr is off-tree: in-place shift, carrying any vprot window. */
+		if (protect_slice(se))
+			se->vprot -= credit;
+		se->vruntime -= credit;
+		se->deadline -= credit;
+		return;
+	}
+
+	/* Queued entities are left unchanged by this helper path. */
+}
+
 /*
  * sched_yield() is very simple
  */
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 84c4fe3abd74..65c511c9ca28 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -40,6 +40,15 @@ SCHED_FEAT(NEXT_BUDDY, false)
  */
 SCHED_FEAT(PICK_BUDDY, true)
 
+/*
+ * Let yield_to_task_fair() credit bounded EEVDF lag to the nominated
+ * next-buddy so pick_eevdf() honors the hint even when the target has
+ * negative vlag at some level of its ancestor chain. The credit is bounded
+ * by a queue-depth-scaled margin within entity_lag()'s legal range, so
+ * fairness is preserved.
+ */
+SCHED_FEAT(YIELD_TO_LAG_CREDIT, true)
+
 /*
  * Consider buddies to be cache hot, decreases the likeliness of a
  * cache buddy being migrated away, increases cache locality.
-- 
2.43.0
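
For reviewers who want to check the arithmetic of the cfs_rq->curr branch
outside the kernel, the following is a minimal userspace sketch, not kernel
code. Everything in it is a hypothetical stand-in: struct model_se,
model_lag() and model_credit_curr() do not exist in the tree, the lag is
assumed to be avg_vruntime minus the entity's vruntime, and any clamping
that entity_lag() applies is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sched_entity fields the helper touches. */
struct model_se {
	uint64_t vruntime;
	uint64_t deadline;
	uint64_t vprot;
	bool protect;	/* stands in for protect_slice(se) */
};

/*
 * Assumed lag definition: avg_vruntime - vruntime, so a negative value
 * means the entity is ineligible. No clamping is modelled here.
 */
static int64_t model_lag(uint64_t avg_vruntime, const struct model_se *se)
{
	return (int64_t)(avg_vruntime - se->vruntime);
}

/*
 * Mirror of the curr branch: shift vruntime, deadline and (when the
 * protection window is active) vprot by the same credit, so their
 * relative offsets are preserved. Returns the credit applied, or 0 when
 * the entity was already eligible.
 */
static uint64_t model_credit_curr(uint64_t avg_vruntime, struct model_se *se)
{
	int64_t vlag = model_lag(avg_vruntime, se);
	uint64_t credit;

	if (vlag >= 0)
		return 0;

	credit = (uint64_t)(-vlag);
	if (se->protect)
		se->vprot -= credit;
	se->vruntime -= credit;
	se->deadline -= credit;

	/* The entity now sits exactly on the eligibility boundary. */
	assert(model_lag(avg_vruntime, se) == 0);
	return credit;
}
```

Calling model_credit_curr() a second time with the same avg_vruntime
returns 0 and changes nothing, which is the idempotence the helper's
comment describes. The sketch holds avg_vruntime fixed across the shift
purely for illustration; it makes no claim about how the real average
behaves after curr is moved.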