We've hit the 512-byte limit on stack depth a few times in Cilium recently. As a result, we started reporting in CI our current maximum stack depth across all configurations for each BPF program. Unfortunately, that is not trivial to compute in userspace.

The verifier reports the stack depths of individual subprogs at the end of the logs. However, the maximum combined stack depth also depends on the callgraph of those subprogs: the maximum combined stack depth is the height of the callgraph, weighted by per-subprog stack depths. We can compute a callgraph in userspace from the loaded instructions, but it often doesn't match the verifier's own callgraph because of dead code elimination. Our current approach relies on dumping the BPF_LOG_LEVEL2 logs, but this feels like overkill considering the verifier already has the information we need.

This patch lets the verifier dump the maximum combined stack depth in the logs, on the same line as the per-subprog stack depths:

    stack depth 16+256 max 272

The per-subprog stack depths and the new max stack depth are not directly comparable: the former are sometimes updated during fixups, while the latter is not. As a result, even with a single subprog, we may end up with two slightly different values. The aim of the new max value is to be closest to what is actually enforced by the verifier.
Signed-off-by: Paul Chaignon
---
 include/linux/bpf_verifier.h | 2 ++
 kernel/bpf/verifier.c        | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 976e2b2f40e8..d91843994c82 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -936,6 +936,8 @@ struct bpf_verifier_env {
 	u32 prev_insn_processed, insn_processed;
 	/* number of jmps, calls, exits analyzed so far */
 	u32 prev_jmps_processed, jmps_processed;
+	/* maximum combined stack depth */
+	u32 max_stack_depth;
 	/* total verification time */
 	u64 verification_time;
 	/* maximum number of verifier states kept in 'branching' instructions */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 11054ad89c14..896dbb4515d7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5045,6 +5045,8 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
 		}
 	} else {
 		depth += subprog_depth;
+		if (depth > env->max_stack_depth)
+			env->max_stack_depth = depth;
 		if (depth > MAX_BPF_STACK) {
 			total = 0;
 			for (tmp = idx; tmp >= 0; tmp = dinfo[tmp].caller)
@@ -5185,6 +5187,8 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 	if (priv_stack_mode == PRIV_STACK_UNKNOWN)
 		priv_stack_mode = bpf_enable_priv_stack(env->prog);
 
+	env->max_stack_depth = env->subprog_info[0].stack_depth;
+
 	/* All async_cb subprogs use normal kernel stack. If a particular
 	 * subprog appears in both main prog and async_cb subtree, that
 	 * subprog will use normal kernel stack to avoid potential nesting.
@@ -18289,7 +18293,7 @@ static void print_verification_stats(struct bpf_verifier_env *env)
 	verbose(env, "stack depth %d", env->subprog_info[0].stack_depth);
 	for (i = 1; i < subprog_cnt; i++)
 		verbose(env, "+%d", env->subprog_info[i].stack_depth);
-	verbose(env, "\n");
+	verbose(env, " max %d\n", env->max_stack_depth);
 	verbose(env, "insns processed %d", env->subprog_info[0].insn_processed);
 	for (i = 1; i < subprog_cnt; i++)
 		if (bpf_subprog_is_global(env, i))
-- 
2.43.0

This patch tests the maximum stack depth reporting in verifier logs, with a couple of special cases covered: fastcall, private stacks, and rounding up to 16 bytes. For that last one, we need to skip the test when JIT compilation is disabled, as the rounding is then to 32 bytes.

Signed-off-by: Paul Chaignon
---
 tools/testing/selftests/bpf/progs/verifier_bpf_fastcall.c  | 3 +--
 tools/testing/selftests/bpf/progs/verifier_private_stack.c | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/verifier_bpf_fastcall.c b/tools/testing/selftests/bpf/progs/verifier_bpf_fastcall.c
index 0d9e167555b5..8d7ff38e4c06 100644
--- a/tools/testing/selftests/bpf/progs/verifier_bpf_fastcall.c
+++ b/tools/testing/selftests/bpf/progs/verifier_bpf_fastcall.c
@@ -799,8 +799,7 @@ __naked int bpf_loop_interaction2(void)
 
 SEC("raw_tp")
 __arch_x86_64
-__log_level(4)
-__msg("stack depth 512+0")
+__log_level(4) __msg("stack depth 512+0 max 512")
 /* just to print xlated version when debugging */
 __xlated("r0 = &(void __percpu *)(r0)")
 __success
diff --git a/tools/testing/selftests/bpf/progs/verifier_private_stack.c b/tools/testing/selftests/bpf/progs/verifier_private_stack.c
index 646e8ef82051..4167d3a09252 100644
--- a/tools/testing/selftests/bpf/progs/verifier_private_stack.c
+++ b/tools/testing/selftests/bpf/progs/verifier_private_stack.c
@@ -86,6 +86,7 @@ __naked static void cumulative_stack_depth_subprog(void)
 SEC("kprobe")
 __description("Private stack, subtree > MAX_BPF_STACK")
 __success
+__log_level(4) __msg("stack depth 512+32 max 512")
 __arch_x86_64
 /* private stack fp for the main prog */
 __jited("	movabsq	$0x{{.*}}, %r9")
@@ -324,6 +325,8 @@ int private_stack_async_callback_1(void)
 SEC("fentry/bpf_fentry_test9")
 __description("Private stack, async callback, potential nesting")
 __success __retval(0)
+__load_if_JITed()
+__log_level(4) __msg("stack depth 8+0+256+0 max 272")
 __arch_x86_64
 __jited("	subq	$0x100, %rsp")
 __arch_arm64
-- 
2.43.0