In the old behavior, duplicated packets were sent back to the root qdisc, which could create dangerous infinite loops in hierarchical setups - imagine a scenario where each level of a multi-stage netem hierarchy kept feeding duplicates back to the top, potentially causing system instability or resource exhaustion. The new behavior elegantly solves this by enqueueing duplicates to the same qdisc that created them, ensuring that packet duplication occurs exactly once per netem stage in a controlled, predictable manner. This change enables users to safely construct complex network emulation scenarios using netem hierarchies (like the 4x multiplication demonstrated in testing) without worrying about runaway packet generation, while still preserving the intended duplication effects. Users can now confidently chain multiple netem qdiscs together to achieve sophisticated network impairment combinations, knowing that each stage will apply its effects exactly once to the packet flow, making network testing scenarios more reliable and results more deterministic. I tested netem packet duplication in two configurations: 1. Nest netem-to-netem hierarchy using parent/child attachment 2. Single netem using prio qdisc with netem leaf Setup commands and results: Single netem hierarchy (prio + netem): tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 tc filter add dev lo parent 1:0 protocol ip matchall classid 1:1 tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100% Result: 2x packet multiplication (1→2 packets) 2 echo requests + 4 echo replies = 6 total packets Expected behavior: Only one netem stage exists in this hierarchy, so 1 ping becomes 2 packets (100% duplication). The 2 echo requests generate 2 echo replies, which also get duplicated to 4 replies, yielding the predictable total of 6 packets (2 requests + 4 replies). Nest netem hierarchy (netem + netem): tc qdisc add dev lo root handle 1: netem limit 1000 duplicate 100% tc qdisc add dev lo parent 1: handle 2: netem limit 1000 duplicate 100% Result: 4x packet multiplication (1→2→4 packets) 4 echo requests + 16 echo replies = 20 total packets Expected behavior: Root netem duplicates 1 ping to 2 packets, child netem receives 2 packets and duplicates each to create 4 total packets. Since ping operates bidirectionally, 4 echo requests generate 4 echo replies, which also get duplicated through the same hierarchy (4→8→16), resulting in the predictable total of 20 packets (4 requests + 16 replies). The new netem duplication behavior does not break the documented semantics of "creates a copy of the packet before queuing." The man page description remains true since duplication occurs before the queuing process, creating both original and duplicate packets that are then enqueued. The documentation does not specify which qdisc should receive the duplicates, only that copying happens before queuing. The implementation choice to enqueue duplicates to the same qdisc (rather than root) is an internal detail that maintains the documented behavior while preventing infinite loops in hierarchical configurations. Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication") Reported-by: William Liu Reported-by: Savino Dicanosa Signed-off-by: Cong Wang --- net/sched/sch_netem.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index fdd79d3ccd8c..191f64bd68ff 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -165,6 +165,7 @@ struct netem_sched_data { */ struct netem_skb_cb { u64 time_to_send; + u8 duplicate : 1; }; static inline struct netem_skb_cb *netem_skb_cb(struct sk_buff *skb) @@ -460,8 +461,16 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, skb->prev = NULL; /* Random duplication */ - if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng)) - ++count; + if (q->duplicate) { + bool dup = true; + + if (netem_skb_cb(skb)->duplicate) { + netem_skb_cb(skb)->duplicate = 0; + dup = false; + } + if (dup && q->duplicate >= get_crandom(&q->dup_cor, &q->prng)) + ++count; + } /* Drop packet? */ if (loss_event(q)) { @@ -532,17 +541,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, } /* - * If doing duplication then re-insert at top of the - * qdisc tree, since parent queuer expects that only one - * skb will be queued. + * If doing duplication then re-insert at the same qdisc, + * as going back to the root would induce loops. */ if (skb2) { - struct Qdisc *rootq = qdisc_root_bh(sch); - u32 dupsave = q->duplicate; /* prevent duplicating a dup... */ - - q->duplicate = 0; - rootq->enqueue(skb2, rootq, to_free); - q->duplicate = dupsave; + netem_skb_cb(skb2)->duplicate = 1; + qdisc_enqueue(skb2, sch, to_free); skb2 = NULL; } -- 2.34.1 Integrate the reproducer from William into tc-testing and use scapy to generate packets for testing: # ./tdc.py -e 1676 -- ns/SubPlugin.__init__ -- scapy/SubPlugin.__init__ Test 1676: NETEM nested duplicate 100% [ 1154.071135] v0p0id1676: entered promiscuous mode [ 1154.101066] virtio_net virtio0 enp1s0: entered promiscuous mode [ 1154.146301] virtio_net virtio0 enp1s0: left promiscuous mode . Sent 1 packets. [ 1154.173695] v0p0id1676: left promiscuous mode [ 1154.216159] v0p0id1676: entered promiscuous mode . Sent 1 packets. [ 1154.238398] v0p0id1676: left promiscuous mode [ 1154.260909] v0p0id1676: entered promiscuous mode . Sent 1 packets. [ 1154.282708] v0p0id1676: left promiscuous mode [ 1154.309399] v0p0id1676: entered promiscuous mode . Sent 1 packets. [ 1154.337949] v0p0id1676: left promiscuous mode [ 1154.360924] v0p0id1676: entered promiscuous mode . Sent 1 packets. [ 1154.383522] v0p0id1676: left promiscuous mode All test results: 1..1 ok 1 1676 - NETEM nested duplicate 100% Reported-by: William Liu Signed-off-by: Cong Wang --- .../tc-testing/tc-tests/qdiscs/netem.json | 25 +++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json index 3c4444961488..03c4ceb22990 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json +++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json @@ -336,5 +336,30 @@ "teardown": [ "$TC qdisc del dev $DUMMY handle 1: root" ] + }, + { + "id": "1676", + "name": "NETEM nested duplicate 100%", + "category": ["qdisc", "netem"], + "setup": [ + "$TC qdisc add dev $DEV1 root handle 1: netem limit 1000 duplicate 100%", + "$TC qdisc add dev $DEV1 parent 1: handle 2: netem limit 1000 duplicate 100%" + ], + "teardown": [ + "$TC qdisc del dev $DEV1 root" + ], + "plugins": { + "requires": ["nsPlugin", "scapyPlugin"] + }, + "scapy": { + "iface": "$DEV0", + "count": 5, + "packet": "Ether()/IP(dst='10.10.10.1', src='10.10.10.10')/TCP(sport=12345, dport=80)" + }, + "cmdUnderTest": "$TC -s qdisc show dev $DEV1", + "expExitCode": "0", + "verifyCmd": "$TC -s qdisc show dev $DEV1", + "matchPattern": "Sent [0-9]+ bytes [0-9]+ pkt", + "matchCount": "2" } ] -- 2.34.1 Integrate the test case from Jamal into tc-testing: Test 94a7: Test PRIO with NETEM duplication All test results: 1..1 ok 1 94a7 - Test PRIO with NETEM duplication Signed-off-by: Cong Wang --- .../tc-testing/tc-tests/infra/qdiscs.json | 29 +++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json index 5c6851e8d311..bfa6de751270 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json +++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json @@ -672,5 +672,34 @@ "teardown": [ "$TC qdisc del dev $DUMMY root handle 1: drr" ] + }, + { + "id": "94a7", + "name": "Test PRIO with NETEM duplication", + "category": [ + "qdisc", + "prio", + "netem" + ], + "plugins": { + "requires": [ + "nsPlugin" + ] + }, + "setup": [ + "$IP link set dev $DUMMY up || true", + "$IP addr add 10.10.11.10/24 dev $DUMMY || true", + "$TC qdisc add dev $DUMMY root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0", + "$TC filter add dev $DUMMY parent 1:0 protocol ip matchall classid 1:1", + "$TC qdisc add dev $DUMMY parent 1:1 handle 10: netem limit 4 duplicate 100%" + ], + "cmdUnderTest": "ping -c1 -W0.01 -I $DUMMY 10.10.11.11", + "expExitCode": "1", + "verifyCmd": "$TC -s qdisc show dev $DUMMY | grep -A 5 'qdisc netem' | grep -E 'Sent [0-9]+ bytes [0-9]+ pkt'", + "matchPattern": "Sent \\d+ bytes (\\d+) pkt", + "matchCount": "1", + "teardown": [ + "$TC qdisc del dev $DUMMY root handle 1: prio" + ] } ] -- 2.34.1 This is a very reasonable use case for multiqueue NIC, add it to tc-testing to ensure no one should break it. Test 94a8: Test MQ with NETEM duplication [ 753.877611] v0p0id94a8: entered promiscuous mode [ 753.909783] virtio_net virtio0 enp1s0: entered promiscuous mode [ 753.936686] virtio_net virtio0 enp1s0: left promiscuous mode . Sent 1 packets. [ 753.984974] v0p0id94a8: left promiscuous mode [ 754.010725] v0p0id94a8: entered promiscuous mode . Sent 1 packets. [ 754.030879] v0p0id94a8: left promiscuous mode [ 754.067966] v0p0id94a8: entered promiscuous mode . Sent 1 packets. [ 754.096516] v0p0id94a8: left promiscuous mode [ 754.129166] v0p0id94a8: entered promiscuous mode . Sent 1 packets. [ 754.156371] v0p0id94a8: left promiscuous mode [ 754.187278] v0p0id94a8: entered promiscuous mode . Sent 1 packets. [ 754.212102] v0p0id94a8: left promiscuous mode All test results: 1..1 ok 1 94a8 - Test MQ with NETEM duplication Signed-off-by: Cong Wang --- .../tc-testing/tc-tests/infra/qdiscs.json | 30 +++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json index bfa6de751270..8e206260fa79 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json +++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json @@ -701,5 +701,35 @@ "teardown": [ "$TC qdisc del dev $DUMMY root handle 1: prio" ] + }, + { + "id": "94a8", + "name": "Test MQ with NETEM duplication", + "category": [ + "qdisc", + "mq", + "netem" + ], + "plugins": { + "requires": ["nsPlugin", "scapyPlugin"] + }, + "setup": [ + "$IP link set dev $DEV1 up", + "$TC qdisc add dev $DEV1 root handle 1: mq", + "$TC qdisc add dev $DEV1 parent 1:1 handle 10: netem duplicate 100%" + ], + "scapy": { + "iface": "$DEV0", + "count": 5, + "packet": "Ether()/IP(dst='10.10.10.1', src='10.10.10.10')/ICMP()" + }, + "cmdUnderTest": "$TC -s qdisc show dev $DEV1", + "expExitCode": "0", + "verifyCmd": "$TC -s qdisc show dev $DEV1 | grep -A 5 'qdisc netem' | grep -E 'Sent [0-9]+ bytes [0-9]+ pkt'", + "matchPattern": "Sent \\d+ bytes (\\d+) pkt", + "matchCount": "1", + "teardown": [ + "$TC qdisc del dev $DEV1 root handle 1: mq" + ] } ] -- 2.34.1