After a firmware error is detected and STATUS_FW_ERROR is set, NAPI can
still be actively polling or get scheduled from a prior interrupt. The
NAPI poll functions (both legacy and MSIX variants) have no check for
STATUS_FW_ERROR and will continue processing stale RX ring entries from
dying firmware. This can dispatch TX completion notifications containing
corrupt SSN values to iwl_mld_handle_tx_resp_notif(), which passes them
to iwl_trans_reclaim(). If the corrupt SSN causes reclaim to walk TX
queue entries that were already freed by a prior correct reclaim, the
result is an skb use-after-free or double-free.

The race window opens when the MSIX IRQ handler schedules NAPI (lines
2319-2321 in rx.c) before processing the error bit (lines 2382-2396), or
when NAPI is already running on another CPU from a previous interrupt
when STATUS_FW_ERROR gets set on the current CPU.

Add STATUS_FW_ERROR checks to both NAPI poll functions to prevent
processing stale RX data after firmware error, and add early-return
guards in the TX response and compressed BA notification handlers as
defense-in-depth. Each check uses WARN_ONCE to log if the race is
actually hit, which aids diagnosis of the hard-to-reproduce skb
use-after-free reported on Intel BE200.

Note that _iwl_trans_pcie_gen2_stop_device() already calls
iwl_pcie_rx_napi_sync() to quiesce NAPI during device teardown, but that
runs much later in the restart sequence. These checks close the window
between error detection and device stop.

Signed-off-by: Cole Leavitt
---
Tested on Intel BE200 (FW 101.6e695a70.0) by forcing NMI via debugfs.
The WARN_ONCE fires reliably:

  iwlwifi: NAPI MSIX poll[0] invoked after FW error
  WARNING: drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c:1058 at iwl_pcie_napi_poll_msix+0xff/0x130 [iwlwifi], CPU#22

Confirming NAPI poll is invoked after STATUS_FW_ERROR is set. Without
this patch, that poll processes stale RX ring data from dead firmware.

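For review purposes, the guard pattern described above can be modelled
in user space. This is only a rough sketch, not driver code: it assumes
C11 atomics as a stand-in for the kernel's test_bit() on trans->status,
and fake_trans, fake_set_fw_error() and fake_napi_poll() are
hypothetical names that do not exist in iwlwifi.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transport state; not an iwlwifi type. */
struct fake_trans {
	atomic_bool fw_error;	/* models the STATUS_FW_ERROR bit */
	int ring_entries;	/* models RX ring entries not yet handled */
	int processed;		/* entries handed to upper-layer handlers */
};

/* Models the error path setting STATUS_FW_ERROR. */
static void fake_set_fw_error(struct fake_trans *t)
{
	atomic_store(&t->fw_error, true);
}

/*
 * Models a NAPI poll callback: returns the amount of work done.
 * With the guard, a poll that runs after the error flag is set
 * returns 0 without touching the (possibly stale) ring.
 */
static int fake_napi_poll(struct fake_trans *t, int budget)
{
	int done = 0;

	if (atomic_load(&t->fw_error))
		return 0;	/* assumed analogue of the early return */

	while (done < budget && t->ring_entries > 0) {
		t->ring_entries--;
		t->processed++;
		done++;
	}

	return done;
}
```

In this model, a poll issued before fake_set_fw_error() consumes ring
entries up to the budget, while a poll issued afterwards leaves both
ring_entries and processed untouched. The sketch does not model
napi_complete_done() or the cross-CPU ordering of the real race.
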
 drivers/net/wireless/intel/iwlwifi/mld/tx.c | 19 ++++++++++++++++++
 .../wireless/intel/iwlwifi/pcie/gen1_2/rx.c | 20 +++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index 3b4b575aadaa..3e99f3ded9bc 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -1071,6 +1071,18 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
 	bool mgmt = false;
 	bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
 
+	/* Firmware is dead: the TX response may contain corrupt SSN values
+	 * from a dying firmware DMA. Processing it could cause
+	 * iwl_trans_reclaim() to free the wrong TX queue entries, leading to
+	 * skb use-after-free or double-free.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
+			  sta_id, txq_id);
+		return;
+	}
+
 	if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
 			 "Invalid tx_resp notif frame_count (%d)\n",
 			 tx_resp->frame_count))
@@ -1349,6 +1361,13 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
 	u8 sta_id = ba_res->sta_id;
 	struct ieee80211_link_sta *link_sta;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: BA notif (sta=%d) after FW error\n",
+			  sta_id);
+		return;
+	}
+
 	if (!tfd_cnt)
 		return;
 
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
index 619a9505e6d9..ba18d35fa55d 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
@@ -1015,6 +1015,18 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	/* Stop processing RX if firmware has crashed. Stale notifications
+	 * from dying firmware (e.g. TX completions with corrupt SSN values)
+	 * can cause use-after-free in reclaim paths.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
@@ -1042,6 +1054,14 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
 		      rxq->id, ret, budget);
-- 
2.52.0