From: Weiming Shi

__dev_queue_xmit() has two transmit code paths depending on whether the
device has a qdisc attached:

  1. Qdisc path (q->enqueue): calls __dev_xmit_skb()
  2. No-qdisc path: calls dev_hard_start_xmit() directly

Commit 745e20f1b626 ("net: add a recursion limit in xmit path") added
recursion protection to the no-qdisc path via dev_xmit_recursion()
check and dev_xmit_recursion_inc()/dec() tracking. However, the qdisc
path performs no recursion depth checking at all.

This allows unbounded recursion through qdisc-attached devices. For
example, a bond interface in broadcast mode with gretap slaves whose
remote endpoints route back through the bond creates an infinite
transmit loop that exhausts the kernel stack:

  BUG: KASAN: stack-out-of-bounds in blake2s.constprop.0+0xe7/0x160
  Write of size 32 at addr ffff88810033fed0 by task kworker/0:1/11
  Workqueue: mld mld_ifc_work
  Call Trace:
   __build_flow_key.constprop.0 (net/ipv4/route.c:515)
   ip_rt_update_pmtu (net/ipv4/route.c:1073)
   iptunnel_xmit (net/ipv4/ip_tunnel_core.c:84)
   ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847)
   gre_tap_xmit (net/ipv4/ip_gre.c:779)
   dev_hard_start_xmit (net/core/dev.c:3887)
   sch_direct_xmit (net/sched/sch_generic.c:347)
   __dev_queue_xmit (net/core/dev.c:4802)
   bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
   bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
   bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
   dev_hard_start_xmit (net/core/dev.c:3887)
   __dev_queue_xmit (net/core/dev.c:4841)
   ip_finish_output2 (net/ipv4/ip_output.c:237)
   ip_output (net/ipv4/ip_output.c:438)
   iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86)
   gre_tap_xmit (net/ipv4/ip_gre.c:779)
   dev_hard_start_xmit (net/core/dev.c:3887)
   sch_direct_xmit (net/sched/sch_generic.c:347)
   __dev_queue_xmit (net/core/dev.c:4802)
   bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
   bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
   bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
   dev_hard_start_xmit (net/core/dev.c:3887)
   __dev_queue_xmit (net/core/dev.c:4841)
   ip_finish_output2 (net/ipv4/ip_output.c:237)
   ip_output (net/ipv4/ip_output.c:438)
   iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86)
   ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847)
   gre_tap_xmit (net/ipv4/ip_gre.c:779)
   dev_hard_start_xmit (net/core/dev.c:3887)
   sch_direct_xmit (net/sched/sch_generic.c:347)
   __dev_queue_xmit (net/core/dev.c:4802)
   bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
   bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
   bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
   dev_hard_start_xmit (net/core/dev.c:3887)
   __dev_queue_xmit (net/core/dev.c:4841)
   mld_sendpack
   mld_ifc_work
   process_one_work
   worker_thread

  poc (76) used greatest stack depth: 8 bytes left

The per-queue qdisc_run_begin() serialization does not prevent this
because each gretap slave can have multiple TX queues, so each
recursion level may select a different queue. The q->owner check also
fails because each level operates on a different qdisc instance.

Fix by adding the same recursion protection to the qdisc path that the
no-qdisc path already has: check dev_xmit_recursion() before entering
__dev_xmit_skb(), and bracket the call with
dev_xmit_recursion_inc()/dec() to properly track nesting depth across
both transmit paths.
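For illustration only, the check/inc/dec bracketing described above can be modelled in user space. This is a rough sketch, not kernel code: every model_* name is hypothetical, a plain static counter stands in for the kernel's recursion counter, and MODEL_RECURSION_LIMIT is an assumed value rather than one taken from this patch. The looping topology is reduced to a transmit function that always re-enters itself.

```c
#include <stdbool.h>

/* Assumed limit for this model only; not taken from the patch. */
#define MODEL_RECURSION_LIMIT 8

static int model_depth;      /* current nesting depth (hypothetical stand-in) */
static int model_max_depth;  /* deepest nesting observed */
static int model_drops;      /* packets dropped by the guard */

/* Mirrors the shape of a "too deep?" check: true once the limit is exceeded. */
static bool model_xmit_recursion(void)
{
	return model_depth > MODEL_RECURSION_LIMIT;
}

/*
 * Models a transmit path in a looping topology: every transmit routes
 * straight back into the same path. Without the guard this would recurse
 * until the stack is exhausted.
 */
static int model_queue_xmit(void)
{
	int rc;

	if (model_xmit_recursion()) {
		model_drops++;       /* drop instead of recursing further */
		return -1;
	}

	model_depth++;               /* bracket the nested call ... */
	if (model_depth > model_max_depth)
		model_max_depth = model_depth;
	rc = model_queue_xmit();     /* lower device loops back to us */
	model_depth--;               /* ... so depth is restored on unwind */

	return rc;
}
```

Calling model_queue_xmit() once from a depth of zero stops after a bounded number of nested calls, reports one drop, and leaves the counter back at zero. The point of the bracketing is that the counter tracks nesting regardless of which queue or qdisc instance each level selects.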

Fixes: bbd8a0d3a3b6 ("net: Avoid enqueuing skb for default qdiscs")
Reported-by: Xiang Mei
Signed-off-by: Weiming Shi
---
 net/core/dev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index c1a9f7fdcffa..d5d929df67be 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4799,7 +4799,17 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 
 	trace_net_dev_queue(skb);
 	if (q->enqueue) {
+		if (unlikely(dev_xmit_recursion())) {
+			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
+					     dev->name);
+			rc = -ENETDOWN;
+			dev_core_stats_tx_dropped_inc(dev);
+			kfree_skb_list(skb);
+			goto out;
+		}
+		dev_xmit_recursion_inc();
 		rc = __dev_xmit_skb(skb, q, dev, txq);
+		dev_xmit_recursion_dec();
 		goto out;
 	}
 
-- 
2.43.0