Currently, the precision backtracking logic in __mark_chain_precision()
does not check whether the target register or stack slot is already
marked as precise. This can lead to redundant work when multiple paths
require the same register to be precise.

This patch moves the "skip if already precise" logic into the core
backtracking loop setup in __mark_chain_precision(). This ensures that
all entry points (mark_chain_precision, propagate_precision) benefit
from the optimization and allows for cleaner code at the call sites.

Specifically:
- In __mark_chain_precision(), before starting the backtracking loop,
  check if the initial register/stack slot is already precise.
- For batch propagation (used by propagate_precision), iterate through
  the requested masks and clear bits that are already precise in the
  starting state.
- Remove the now-redundant checks in mark_chain_precision() and
  propagate_precision().

Performance results:

The optimization significantly reduces verification time by skipping
redundant backtracking for registers already marked as precise.

Key improvements (Duration):
- prog_nested_calls: 671us -> 292us (-56.48%)
- prog_non_constant_callback: 254us -> 111us (-56.30%)
- stack_check: 807us -> 411us (-49.07%)
- arena_strsearch: 120us -> 65us (-45.83%)
- prog_null_ctx: 216us -> 164us (-24.07%)

Verified that total instruction and state counts remain identical
across all tests (0.00% change), confirming logic equivalence.

Test steps and detailed raw performance data are as follows.

The data was collected using the following command:

  $ sudo ./veristat -e duration,total_insns,total_states -o csv \
        bpf_flow.bpf.o bpf_loop.bpf.o arena_strsearch.bpf.o

Baseline (at b54345928fa1):

file_name,prog_name,verdict,duration,total_insns,total_states
arena_strsearch.bpf.o,arena_strsearch,failure,120,20,2
bpf_flow.bpf.o,_dissect,success,465,211,13
bpf_flow.bpf.o,flow_dissector_0,success,2408,1461,68
bpf_flow.bpf.o,flow_dissector_1,success,2698,1567,59
bpf_flow.bpf.o,flow_dissector_2,success,2010,1244,56
bpf_flow.bpf.o,flow_dissector_3,success,2250,1498,57
bpf_flow.bpf.o,flow_dissector_4,success,343,259,4
bpf_flow.bpf.o,flow_dissector_5,success,674,416,21
bpf_loop.bpf.o,prog_invalid_flags,success,292,50,5
bpf_loop.bpf.o,prog_nested_calls,success,671,145,19
bpf_loop.bpf.o,prog_non_constant_callback,success,254,41,5
bpf_loop.bpf.o,prog_null_ctx,success,216,22,3
bpf_loop.bpf.o,stack_check,success,807,325,25
bpf_loop.bpf.o,test_prog,success,493,64,7

Patched:

file_name,prog_name,verdict,duration,total_insns,total_states
arena_strsearch.bpf.o,arena_strsearch,failure,65,20,2
bpf_flow.bpf.o,_dissect,success,467,211,13
bpf_flow.bpf.o,flow_dissector_0,success,2392,1461,68
bpf_flow.bpf.o,flow_dissector_1,success,2722,1567,59
bpf_flow.bpf.o,flow_dissector_2,success,2050,1244,56
bpf_flow.bpf.o,flow_dissector_3,success,2258,1498,57
bpf_flow.bpf.o,flow_dissector_4,success,342,259,4
bpf_flow.bpf.o,flow_dissector_5,success,702,416,21
bpf_loop.bpf.o,prog_invalid_flags,success,239,50,5
bpf_loop.bpf.o,prog_nested_calls,success,292,145,19
bpf_loop.bpf.o,prog_non_constant_callback,success,111,41,5
bpf_loop.bpf.o,prog_null_ctx,success,164,22,3
bpf_loop.bpf.o,stack_check,success,411,325,25
bpf_loop.bpf.o,test_prog,success,484,64,7

Comparison (veristat -C):

$ ./veristat -C prec_skip_baseline.csv prec_skip_patched.csv
File                   Program                     Verdict (A)  Verdict (B)  Verdict (DIFF)  Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)  Insns (A)  Insns (B)  Insns (DIFF)  States (A)  States (B)  States (DIFF)
---------------------  --------------------------  -----------  -----------  --------------  -----------------  -----------------  --------------------  ---------  ---------  ------------  ----------  ----------  -------------
arena_strsearch.bpf.o  arena_strsearch             failure      failure      MATCH                         120                 65         -55 (-45.83%)         20         20   +0 (+0.00%)           2           2    +0 (+0.00%)
bpf_flow.bpf.o         _dissect                    success      success      MATCH                         465                467           +2 (+0.43%)        211        211   +0 (+0.00%)          13          13    +0 (+0.00%)
bpf_flow.bpf.o         flow_dissector_0            success      success      MATCH                        2408               2392          -16 (-0.66%)       1461       1461   +0 (+0.00%)          68          68    +0 (+0.00%)
bpf_flow.bpf.o         flow_dissector_1            success      success      MATCH                        2698               2722          +24 (+0.89%)       1567       1567   +0 (+0.00%)          59          59    +0 (+0.00%)
bpf_flow.bpf.o         flow_dissector_2            success      success      MATCH                        2010               2050          +40 (+1.99%)       1244       1244   +0 (+0.00%)          56          56    +0 (+0.00%)
bpf_flow.bpf.o         flow_dissector_3            success      success      MATCH                        2250               2258           +8 (+0.36%)       1498       1498   +0 (+0.00%)          57          57    +0 (+0.00%)
bpf_flow.bpf.o         flow_dissector_4            success      success      MATCH                         343                342           -1 (-0.29%)        259        259   +0 (+0.00%)           4           4    +0 (+0.00%)
bpf_flow.bpf.o         flow_dissector_5            success      success      MATCH                         674                702          +28 (+4.15%)        416        416   +0 (+0.00%)          21          21    +0 (+0.00%)
bpf_loop.bpf.o         prog_invalid_flags          success      success      MATCH                         292                239         -53 (-18.15%)         50         50   +0 (+0.00%)           5           5    +0 (+0.00%)
bpf_loop.bpf.o         prog_nested_calls           success      success      MATCH                         671                292        -379 (-56.48%)        145        145   +0 (+0.00%)          19          19    +0 (+0.00%)
bpf_loop.bpf.o         prog_non_constant_callback  success      success      MATCH                         254                111        -143 (-56.30%)         41         41   +0 (+0.00%)           5           5    +0 (+0.00%)
bpf_loop.bpf.o         prog_null_ctx               success      success      MATCH                         216                164         -52 (-24.07%)         22         22   +0 (+0.00%)           3           3    +0 (+0.00%)
bpf_loop.bpf.o         stack_check                 success      success      MATCH                         807                411        -396 (-49.07%)        325        325   +0 (+0.00%)          25          25    +0 (+0.00%)
bpf_loop.bpf.o         test_prog                   success      success      MATCH                         493                484           -9 (-1.83%)         64         64   +0 (+0.00%)           7           7    +0 (+0.00%)

Suggested-by: Eduard Zingerman
Signed-off-by: Qiliang Yuan
---
v1 -> v2:
- Move "skip if already precise" logic into the core backtracking engine
  __mark_chain_precision() as suggested by Eduard Zingerman.
- Add detailed performance benchmark results and veristat comparison.
- Clean up call sites in mark_chain_precision() and propagate_precision().

 kernel/bpf/verifier.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f0ca69f888fa..340d972bd833 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4765,14 +4765,37 @@ static int __mark_chain_precision(struct bpf_verifier_env *env,
 	 * slot, but don't set precise flag in current state, as precision
 	 * tracking in the current state is unnecessary.
 	 */
-	func = st->frame[bt->frame];
 	if (regno >= 0) {
-		reg = &func->regs[regno];
+		reg = &st->frame[bt->frame]->regs[regno];
 		if (reg->type != SCALAR_VALUE) {
 			verifier_bug(env, "backtracking misuse");
 			return -EFAULT;
 		}
+		if (reg->precise)
+			return 0;
 		bt_set_reg(bt, regno);
+	} else {
+		for (fr = bt->frame; fr >= 0; fr--) {
+			u32 reg_mask = bt_frame_reg_mask(bt, fr);
+			u64 stack_mask = bt_frame_stack_mask(bt, fr);
+			DECLARE_BITMAP(mask, 64);
+
+			func = st->frame[fr];
+			if (reg_mask) {
+				bitmap_from_u64(mask, reg_mask);
+				for_each_set_bit(i, mask, 32) {
+					if (func->regs[i].precise)
+						bt_clear_frame_reg(bt, fr, i);
+				}
+			}
+			if (stack_mask) {
+				bitmap_from_u64(mask, stack_mask);
+				for_each_set_bit(i, mask, 64) {
+					if (func->stack[i].spilled_ptr.precise)
+						bt_clear_frame_slot(bt, fr, i);
+				}
+			}
+		}
 	}
 
 	if (bt_empty(bt))
-- 
2.39.5
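
P.S. (illustration only, not part of the patch): below is a minimal,
self-contained userspace sketch of the mask-filtering idea described in
the commit message -- clear the bits of registers whose precise flag is
already set, so the backtracking walk is skipped for them. The names
struct frame_state, MAX_REGS and filter_already_precise() are invented
for this sketch; the in-kernel code does the equivalent on the backtrack
state via bt_frame_reg_mask()/bt_frame_stack_mask() and
bt_clear_frame_reg()/bt_clear_frame_slot(), as in the diff above.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_REGS 11	/* r0..r10, mirroring the BPF register file */

struct reg_state {
	bool precise;
};

struct frame_state {
	struct reg_state regs[MAX_REGS];
};

/* Return @reg_mask with the bits of already-precise registers cleared,
 * so that only registers still needing backtracking remain set.
 */
static uint32_t filter_already_precise(const struct frame_state *frame,
				       uint32_t reg_mask)
{
	uint32_t pending = reg_mask;
	int i;

	for (i = 0; i < MAX_REGS; i++) {
		if ((pending & (1u << i)) && frame->regs[i].precise)
			pending &= ~(1u << i);
	}
	return pending;
}

int main(void)
{
	struct frame_state frame = {0};
	/* Pretend a caller asked for r1, r2 and r6 to become precise. */
	uint32_t want = (1u << 1) | (1u << 2) | (1u << 6);
	uint32_t todo;

	frame.regs[2].precise = true;	/* r2 is already precise */

	todo = filter_already_precise(&frame, want);
	printf("registers still needing backtracking: mask 0x%x\n", todo);
	return 0;
}

If the filtered mask comes back empty, the caller can return before
walking the instruction history at all, which is where the large
duration drops in the veristat numbers above come from.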