In the old behavior, duplicated packets were sent back to the root qdisc, which could create dangerous infinite loops in hierarchical setups - imagine a scenario where each level of a multi-stage netem hierarchy kept feeding duplicates back to the top, potentially causing system instability or resource exhaustion. The new behavior elegantly solves this by enqueueing duplicates to the same qdisc that created them, ensuring that packet duplication occurs exactly once per netem stage in a controlled, predictable manner. This change enables users to safely construct complex network emulation scenarios using netem hierarchies (like the 4x multiplication demonstrated in testing) without worrying about runaway packet generation, while still preserving the intended duplication effects. Another advantage of this approach is that it eliminates the enqueue reentrant behaviour which triggered many vulnerabilities. See the last patch in this patchset which updates the test cases for such vulnerabilities. Now users can confidently chain multiple netem qdiscs together to achieve sophisticated network impairment combinations, knowing that each stage will apply its effects exactly once to the packet flow, making network testing scenarios more reliable and results more deterministic. I tested netem packet duplication in two configurations: 1. Nest netem-to-netem hierarchy using parent/child attachment 2. Single netem using prio qdisc with netem leaf Setup commands and results: Single netem hierarchy (prio + netem): tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 tc filter add dev lo parent 1:0 protocol ip matchall classid 1:1 tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100% Result: 2x packet multiplication (1→2 packets) 2 echo requests + 4 echo replies = 6 total packets Expected behavior: Only one netem stage exists in this hierarchy, so 1 ping becomes 2 packets (100% duplication). The 2 echo requests generate 2 echo replies, which also get duplicated to 4 replies, yielding the predictable total of 6 packets (2 requests + 4 replies). Nest netem hierarchy (netem + netem): tc qdisc add dev lo root handle 1: netem limit 1000 duplicate 100% tc qdisc add dev lo parent 1: handle 2: netem limit 1000 duplicate 100% Result: 4x packet multiplication (1→2→4 packets) 4 echo requests + 16 echo replies = 20 total packets Expected behavior: Root netem duplicates 1 ping to 2 packets, child netem receives 2 packets and duplicates each to create 4 total packets. Since ping operates bidirectionally, 4 echo requests generate 4 echo replies, which also get duplicated through the same hierarchy (4→8→16), resulting in the predictable total of 20 packets (4 requests + 16 replies). The new netem duplication behavior does not break the documented semantics of "creates a copy of the packet before queuing." The man page description remains true since duplication occurs before the queuing process, creating both original and duplicate packets that are then enqueued. The documentation does not specify which qdisc should receive the duplicates, only that copying happens before queuing. The implementation choice to enqueue duplicates to the same qdisc (rather than root) is an internal detail that maintains the documented behavior while preventing infinite loops in hierarchical configurations. Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication") Reported-by: William Liu Reported-by: Savino Dicanosa Signed-off-by: Cong Wang --- net/sched/sch_netem.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index fdd79d3ccd8c..191f64bd68ff 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -165,6 +165,7 @@ struct netem_sched_data { */ struct netem_skb_cb { u64 time_to_send; + u8 duplicate : 1; }; static inline struct netem_skb_cb *netem_skb_cb(struct sk_buff *skb) @@ -460,8 +461,16 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, skb->prev = NULL; /* Random duplication */ - if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng)) - ++count; + if (q->duplicate) { + bool dup = true; + + if (netem_skb_cb(skb)->duplicate) { + netem_skb_cb(skb)->duplicate = 0; + dup = false; + } + if (dup && q->duplicate >= get_crandom(&q->dup_cor, &q->prng)) + ++count; + } /* Drop packet? */ if (loss_event(q)) { @@ -532,17 +541,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, } /* - * If doing duplication then re-insert at top of the - * qdisc tree, since parent queuer expects that only one - * skb will be queued. + * If doing duplication then re-insert at the same qdisc, + * as going back to the root would induce loops. */ if (skb2) { - struct Qdisc *rootq = qdisc_root_bh(sch); - u32 dupsave = q->duplicate; /* prevent duplicating a dup... */ - - q->duplicate = 0; - rootq->enqueue(skb2, rootq, to_free); - q->duplicate = dupsave; + netem_skb_cb(skb2)->duplicate = 1; + qdisc_enqueue(skb2, sch, to_free); skb2 = NULL; } -- 2.34.1 qfq_choose_next_agg() could return NULL so its return value should be properly checked unless NULL is acceptable. There are two cases we need to deal with: 1) q->in_serv_agg, which is okay with NULL since it is either checked or just compared with other pointer without dereferencing. In fact, it is even intentionally set to NULL in one of the cases. 2) in_serv_agg, which is a temporary local variable, which is not okay with NULL, since it is dereferenced immediately, hence must be checked. This fix corrects one of the 2nd cases, and leaving the 1st case as they are. Although this bug is triggered with the netem duplicate change, the root cause is still within qfq qdisc. Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Cc: Xiang Mei Signed-off-by: Cong Wang --- net/sched/sch_qfq.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index f0eb70353744..f328a58c7b98 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -1147,6 +1147,8 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch) * choose the new aggregate to serve. */ in_serv_agg = q->in_serv_agg = qfq_choose_next_agg(q); + if (!in_serv_agg) + return NULL; skb = qfq_peek_skb(in_serv_agg, &cl, &len); } if (!skb) -- 2.34.1 Integrate the reproducer from William into tc-testing and use scapy to generate packets for testing: # ./tdc.py -e 1676 -- ns/SubPlugin.__init__ -- scapy/SubPlugin.__init__ Test 1676: NETEM nested duplicate 100% [ 1154.071135] v0p0id1676: entered promiscuous mode [ 1154.101066] virtio_net virtio0 enp1s0: entered promiscuous mode [ 1154.146301] virtio_net virtio0 enp1s0: left promiscuous mode . Sent 1 packets. [ 1154.173695] v0p0id1676: left promiscuous mode [ 1154.216159] v0p0id1676: entered promiscuous mode . Sent 1 packets. [ 1154.238398] v0p0id1676: left promiscuous mode [ 1154.260909] v0p0id1676: entered promiscuous mode . Sent 1 packets. [ 1154.282708] v0p0id1676: left promiscuous mode [ 1154.309399] v0p0id1676: entered promiscuous mode . Sent 1 packets. [ 1154.337949] v0p0id1676: left promiscuous mode [ 1154.360924] v0p0id1676: entered promiscuous mode . Sent 1 packets. [ 1154.383522] v0p0id1676: left promiscuous mode All test results: 1..1 ok 1 1676 - NETEM nested duplicate 100% Reported-by: William Liu Signed-off-by: Cong Wang --- .../tc-testing/tc-tests/qdiscs/netem.json | 25 +++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json index 3c4444961488..03c4ceb22990 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json +++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json @@ -336,5 +336,30 @@ "teardown": [ "$TC qdisc del dev $DUMMY handle 1: root" ] + }, + { + "id": "1676", + "name": "NETEM nested duplicate 100%", + "category": ["qdisc", "netem"], + "setup": [ + "$TC qdisc add dev $DEV1 root handle 1: netem limit 1000 duplicate 100%", + "$TC qdisc add dev $DEV1 parent 1: handle 2: netem limit 1000 duplicate 100%" + ], + "teardown": [ + "$TC qdisc del dev $DEV1 root" + ], + "plugins": { + "requires": ["nsPlugin", "scapyPlugin"] + }, + "scapy": { + "iface": "$DEV0", + "count": 5, + "packet": "Ether()/IP(dst='10.10.10.1', src='10.10.10.10')/TCP(sport=12345, dport=80)" + }, + "cmdUnderTest": "$TC -s qdisc show dev $DEV1", + "expExitCode": "0", + "verifyCmd": "$TC -s qdisc show dev $DEV1", + "matchPattern": "Sent [0-9]+ bytes [0-9]+ pkt", + "matchCount": "2" } ] -- 2.34.1 Integrate the test case from Jamal into tc-testing: Test 94a7: Test PRIO with NETEM duplication All test results: 1..1 ok 1 94a7 - Test PRIO with NETEM duplication Signed-off-by: Cong Wang --- .../tc-testing/tc-tests/infra/qdiscs.json | 29 +++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json index c6db7fa94f55..605a478032d8 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json +++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json @@ -763,6 +763,35 @@ "matchJSON": [], "teardown": [ "$TC qdisc del dev $DUMMY root" + ] + }, + { + "id": "94a7", + "name": "Test PRIO with NETEM duplication", + "category": [ + "qdisc", + "prio", + "netem" + ], + "plugins": { + "requires": [ + "nsPlugin" + ] + }, + "setup": [ + "$IP link set dev $DUMMY up || true", + "$IP addr add 10.10.11.10/24 dev $DUMMY || true", + "$TC qdisc add dev $DUMMY root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0", + "$TC filter add dev $DUMMY parent 1:0 protocol ip matchall classid 1:1", + "$TC qdisc add dev $DUMMY parent 1:1 handle 10: netem limit 4 duplicate 100%" + ], + "cmdUnderTest": "ping -c1 -W0.01 -I $DUMMY 10.10.11.11", + "expExitCode": "1", + "verifyCmd": "$TC -s qdisc show dev $DUMMY | grep -A 5 'qdisc netem' | grep -E 'Sent [0-9]+ bytes [0-9]+ pkt'", + "matchPattern": "Sent \\d+ bytes (\\d+) pkt", + "matchCount": "1", + "teardown": [ + "$TC qdisc del dev $DUMMY root handle 1: prio" ] } ] -- 2.34.1 Given that multi-queue NICs are prevalent and the global spinlock issue with single netem instances is a known performance limitation, the setup using mq as a parent for netem is an excellent and highly reasonable pattern for applying netem effects like 100% duplication efficiently on modern Linux systems. Signed-off-by: Cong Wang --- .../tc-testing/tc-tests/infra/qdiscs.json | 31 +++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json index 605a478032d8..b4a507bc48a3 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json +++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json @@ -793,5 +793,36 @@ "teardown": [ "$TC qdisc del dev $DUMMY root handle 1: prio" ] + }, + { + "id": "94a8", + "name": "Test MQ with NETEM duplication", + "category": [ + "qdisc", + "mq", + "netem" + ], + "plugins": { + "requires": ["nsPlugin", "scapyPlugin"] + }, + "setup": [ + "$IP link set dev $DEV1 up", + "$TC qdisc add dev $DEV1 root handle 1: mq", + "$TC qdisc add dev $DEV1 parent 1:1 handle 10: netem duplicate 100%", + "$TC qdisc add dev $DEV1 parent 1:2 handle 20: netem duplicate 100%" + ], + "scapy": { + "iface": "$DEV0", + "count": 5, + "packet": "Ether()/IP(dst='10.10.10.1', src='10.10.10.10')/ICMP()" + }, + "cmdUnderTest": "$TC -s qdisc show dev $DEV1", + "expExitCode": "0", + "verifyCmd": "$TC -s qdisc show dev $DEV1 | grep -A 5 'qdisc netem' | grep -E 'Sent [0-9]+ bytes [0-9]+ pkt'", + "matchPattern": "Sent \\d+ bytes (\\d+) pkt", + "matchCount": "2", + "teardown": [ + "$TC qdisc del dev $DEV1 root handle 1: mq" + ] } ] -- 2.34.1 Now netem does no longer trigger reentrant behaviour of its upper qdiscs, the whole architecture becomes more solid and less error prone. Keep these test cases since one of them still sucessfully caught a bug in QFQ qdisc, but update them to the new netem enqueue behavior. Signed-off-by: Cong Wang --- .../tc-testing/tc-tests/infra/qdiscs.json | 54 +++++++++---------- 1 file changed, 25 insertions(+), 29 deletions(-) diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json index b4a507bc48a3..d43cd3c17526 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json +++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json @@ -381,7 +381,7 @@ }, { "id": "90ec", - "name": "Test DRR's enqueue reentrant behaviour with netem", + "name": "Test DRR with NETEM duplication", "category": [ "qdisc", "drr" @@ -399,11 +399,11 @@ ], "cmdUnderTest": "ping -c 1 -I $DUMMY 10.10.10.1 > /dev/null || true", "expExitCode": "0", - "verifyCmd": "$TC -j -s qdisc ls dev $DUMMY handle 1:0", + "verifyCmd": "$TC -j -s qdisc ls dev $DUMMY handle 2:0", "matchJSON": [ { - "kind": "drr", - "handle": "1:", + "kind": "netem", + "handle": "2:", "bytes": 196, "packets": 2 } @@ -416,7 +416,7 @@ }, { "id": "1f1f", - "name": "Test ETS's enqueue reentrant behaviour with netem", + "name": "Test ETS with NETEM duplication", "category": [ "qdisc", "ets" @@ -434,15 +434,13 @@ ], "cmdUnderTest": "ping -c 1 -I $DUMMY 10.10.10.1 > /dev/null || true", "expExitCode": "0", - "verifyCmd": "$TC -j -s class show dev $DUMMY", + "verifyCmd": "$TC -j -s qdisc show dev $DUMMY handle 2:0", "matchJSON": [ { - "class": "ets", - "handle": "1:1", - "stats": { - "bytes": 196, - "packets": 2 - } + "kind": "netem", + "handle": "2:", + "bytes": 196, + "packets": 2 } ], "matchCount": "1", @@ -453,7 +451,7 @@ }, { "id": "5e6d", - "name": "Test QFQ's enqueue reentrant behaviour with netem", + "name": "Test QFQ with NETEM duplication", "category": [ "qdisc", "qfq" @@ -471,11 +469,11 @@ ], "cmdUnderTest": "ping -c 1 -I $DUMMY 10.10.10.1 > /dev/null || true", "expExitCode": "0", - "verifyCmd": "$TC -j -s qdisc ls dev $DUMMY handle 1:0", + "verifyCmd": "$TC -j -s qdisc ls dev $DUMMY handle 2:0", "matchJSON": [ { - "kind": "qfq", - "handle": "1:", + "kind": "netem", + "handle": "2:", "bytes": 196, "packets": 2 } @@ -488,7 +486,7 @@ }, { "id": "bf1d", - "name": "Test HFSC's enqueue reentrant behaviour with netem", + "name": "Test HFSC with NETEM duplication", "category": [ "qdisc", "hfsc" @@ -512,13 +510,11 @@ ], "cmdUnderTest": "ping -c 1 10.10.10.2 -I$DUMMY > /dev/null || true", "expExitCode": "0", - "verifyCmd": "$TC -j -s qdisc ls dev $DUMMY handle 1:0", + "verifyCmd": "$TC -j -s qdisc ls dev $DUMMY handle 3:0", "matchJSON": [ { - "kind": "hfsc", - "handle": "1:", - "bytes": 392, - "packets": 4 + "kind": "netem", + "handle": "3:" } ], "matchCount": "1", @@ -529,7 +525,7 @@ }, { "id": "7c3b", - "name": "Test nested DRR's enqueue reentrant behaviour with netem", + "name": "Test nested DRR with NETEM duplication", "category": [ "qdisc", "drr" @@ -550,13 +546,13 @@ ], "cmdUnderTest": "ping -c 1 -I $DUMMY 10.10.10.1 > /dev/null || true", "expExitCode": "0", - "verifyCmd": "$TC -j -s qdisc ls dev $DUMMY handle 1:0", + "verifyCmd": "$TC -j -s qdisc ls dev $DUMMY handle 3:0", "matchJSON": [ { - "kind": "drr", - "handle": "1:", - "bytes": 196, - "packets": 2 + "kind": "netem", + "handle": "3:", + "bytes": 98, + "packets": 1 } ], "matchCount": "1", @@ -629,7 +625,7 @@ }, { "id": "309e", - "name": "Test HFSC eltree double add with reentrant enqueue behaviour on netem", + "name": "Test complex HFSC with NETEM duplication", "category": [ "qdisc", "hfsc" -- 2.34.1