From: Shahar Shitrit

Introduce a new helper function, netif_xmit_timeout_ms(), to check
whether a TX queue has timed out and report the timeout duration.
This helper consolidates logic that is duplicated in several
locations and also encapsulates the check for whether the TX queue
is stopped.

As the first user, convert dev_watchdog() to use this helper.

Signed-off-by: Shahar Shitrit
Reviewed-by: Yael Chemla
Signed-off-by: Tariq Toukan
---
 include/linux/netdevice.h | 15 +++++++++++++++
 net/sched/sch_generic.c   |  7 +++----
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e808071dbb7d..3cd73769fcfa 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3680,6 +3680,21 @@ static inline bool netif_xmit_stopped(const struct netdev_queue *dev_queue)
 	return dev_queue->state & QUEUE_STATE_ANY_XOFF;
 }
 
+static inline unsigned int
+netif_xmit_timeout_ms(struct netdev_queue *txq, unsigned long *trans_start)
+{
+	unsigned long txq_trans_start = READ_ONCE(txq->trans_start);
+
+	if (trans_start)
+		*trans_start = txq_trans_start;
+
+	if (netif_xmit_stopped(txq) &&
+	    time_after(jiffies, txq_trans_start + txq->dev->watchdog_timeo))
+		return jiffies_to_msecs(jiffies - txq_trans_start);
+
+	return 0;
+}
+
 static inline bool
 netif_xmit_frozen_or_stopped(const struct netdev_queue *dev_queue)
 {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 852e603c1755..aa6192781a24 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -523,10 +523,9 @@ static void dev_watchdog(struct timer_list *t)
 				 * netdev_tx_sent_queue() and netif_tx_stop_queue().
 				 */
 				smp_mb();
-				trans_start = READ_ONCE(txq->trans_start);
-
-				if (time_after(jiffies, trans_start + dev->watchdog_timeo)) {
-					timedout_ms = jiffies_to_msecs(jiffies - trans_start);
+				timedout_ms = netif_xmit_timeout_ms(txq,
+								    &trans_start);
+				if (timedout_ms) {
 					atomic_long_inc(&txq->trans_timeout);
 					break;
 				}
-- 
2.31.1

From: Shahar Shitrit

Replace the open-coded TX queue timeout check in
hns3_get_timeout_queue() with a call to the netif_xmit_timeout_ms()
helper.

Signed-off-by: Shahar Shitrit
Reviewed-by: Yael Chemla
Signed-off-by: Tariq Toukan
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 7a0654e2d3dd..3e8fe3b5d32b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2811,10 +2811,7 @@ static int hns3_get_timeout_queue(struct net_device *ndev)
 		unsigned long trans_start;
 
 		q = netdev_get_tx_queue(ndev, i);
-		trans_start = READ_ONCE(q->trans_start);
-		if (netif_xmit_stopped(q) &&
-		    time_after(jiffies,
-			       (trans_start + ndev->watchdog_timeo))) {
+		if (netif_xmit_timeout_ms(q, &trans_start)) {
 #ifdef CONFIG_BQL
 			struct dql *dql = &q->dql;
 
-- 
2.31.1

From: Shahar Shitrit

mlx5e_tx_timeout_work() is invoked when the dev_watchdog reports a
timed-out TX queue. Currently, the recovery flow is triggered for all
stopped SQs, which is not always correct: some SQs may be only
temporarily stopped without actually having timed out. Attempting to
recover such SQs results in no EQE being polled (since no real timeout
occurred), which the driver misinterprets as a recovery failure and
which unnecessarily causes the channels to be reopened.

Improve the logic to initiate recovery only for SQs that are both
stopped and timed out.
Use the netif_xmit_timeout_ms() helper, introduced earlier in this
series, to determine whether the netdevice watchdog timeout period has
elapsed since the SQ's last transmit timestamp.

Signed-off-by: Shahar Shitrit
Reviewed-by: Yael Chemla
Signed-off-by: Tariq Toukan
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e537df670758..cd146df29ada 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5139,7 +5139,7 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 			netdev_get_tx_queue(netdev, i);
 		struct mlx5e_txqsq *sq = priv->txq2sq[i];
 
-		if (!netif_xmit_stopped(dev_queue))
+		if (!netif_xmit_timeout_ms(dev_queue, NULL))
 			continue;
 
 		if (mlx5e_reporter_tx_timeout(sq))
-- 
2.31.1
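The decision made by netif_xmit_timeout_ms() can be modeled outside the kernel in plain C. The following is a sketch, not kernel code: `mock_jiffies`, `MOCK_HZ` (assumed to be 1000, so one jiffy equals one millisecond), `mock_time_after()`, `struct mock_net_device`, `struct mock_txq`, `mock_xmit_timeout_ms()` and `mock_count_queues_to_recover()` are all hypothetical stand-ins. A plain `bool stopped` field replaces netif_xmit_stopped() and a plain load replaces READ_ONCE(), so none of the kernel's concurrency guarantees are modeled. The counting loop is a simplified model of the mlx5e_tx_timeout_work() filter after the third patch; it omits the early `break` taken when recovery reopens the channels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-ins for kernel pieces. These are NOT the
 * kernel's definitions; they only model the behavior needed here.
 */
#define MOCK_HZ 1000UL /* assumption: 1 jiffy == 1 ms */

/* Wraparound-safe "a is later than b", same idea as the kernel's time_after(). */
#define mock_time_after(a, b) ((long)((b) - (a)) < 0)

static unsigned long mock_jiffies; /* mocked global tick counter */

/* Valid only when MOCK_HZ divides 1000, which holds for the assumed value. */
static unsigned int mock_jiffies_to_msecs(unsigned long j)
{
	return (unsigned int)(j * (1000UL / MOCK_HZ));
}

struct mock_net_device {
	unsigned long watchdog_timeo; /* watchdog period, in jiffies */
};

struct mock_txq {
	struct mock_net_device *dev;
	unsigned long trans_start; /* jiffies value of the last transmit */
	bool stopped;              /* models netif_xmit_stopped() */
};

/* Mirrors the logic of netif_xmit_timeout_ms() from the first patch. */
static unsigned int mock_xmit_timeout_ms(const struct mock_txq *txq,
					 unsigned long *trans_start)
{
	/* The kernel helper reads this with READ_ONCE(). */
	unsigned long txq_trans_start = txq->trans_start;

	assert(txq->dev != NULL);

	if (trans_start)
		*trans_start = txq_trans_start;

	/* Report a timeout only if the queue is stopped AND the period elapsed. */
	if (txq->stopped &&
	    mock_time_after(mock_jiffies,
			    txq_trans_start + txq->dev->watchdog_timeo))
		return mock_jiffies_to_msecs(mock_jiffies - txq_trans_start);

	return 0; /* running, or stopped but still within the watchdog period */
}

/*
 * Simplified model of the mlx5e_tx_timeout_work() loop after the third
 * patch: count the queues that would be handed to recovery.
 */
static int mock_count_queues_to_recover(const struct mock_txq *qs, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++) {
		if (!mock_xmit_timeout_ms(&qs[i], NULL))
			continue; /* not timed out: skip recovery */
		count++;
	}
	return count;
}
```

For instance, with `watchdog_timeo` set to 5000 jiffies and `mock_jiffies` at 100000, a stopped queue whose last transmit was at 94000 reports 6000 ms, a queue stopped at 99900 reports 0 (so the modeled mlx5 loop skips it instead of attempting recovery), and a running queue reports 0 however old its timestamp is. Because the comparison is done on the signed difference, as time_after() does, the result stays correct when the counter wraps: with `mock_jiffies` at 200 and a last transmit at `(unsigned long)-6000`, the model reports 6200 ms.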