Three driver callbacks schedule a reset and wait for its completion: ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels(). Waiting for reset in ndo_change_mtu() and set_ringparam() was added by commit c2ed2403f12c74 ("iavf: Wait for reset in callbacks which trigger it") to fix a race condition where adding an interface to bonding immediately after MTU or ring parameter change failed because the interface was still in __RESETTING state. The same commit also added waiting in iavf_set_priv_flags(), which was later removed by commit 53844673d55529 ("iavf: kill "legacy-rx" for good"). Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") to ensure the PF has enough time to complete the VF reset when changing channel count, and to return correct error codes to userspace. Commit ef490bbb226702 ("iavf: Add net_shaper_ops support") added net_shaper_ops to iavf, which required reset_task to use _locked NAPI variants (napi_enable_locked, napi_disable_locked) that need the netdev instance lock. Later, commit 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") and commit 2bcf4772e45adb ("net: ethtool: try to protect all callback with netdev instance lock") started holding the netdev instance lock during ndo and ethtool callbacks for drivers with net_shaper_ops. The combination of waiting for reset and the new locking requirements creates a deadlock: the callback holds the lock and waits for reset_task, but reset_task is blocked waiting for the same lock: Thread 1 (callback) Thread 2 (reset_task) ------------------- --------------------- netdev_lock() ndo_change_mtu() or ethtool op iavf_schedule_reset() iavf_wait_for_reset() iavf_reset_task() waiting... netdev_lock() <- DEADLOCK Reproducer: # echo 16 > /sys/class/net/$PF/device/sriov_numvfs # ip link set $VF up # ip link set $VF mtu 5000 RTNETLINK answers: Device or resource busy # dmesg | tail -1 iavf: MTU change interrupted waiting for reset Fix this by temporarily releasing the lock while waiting for reset to complete. This follows the pattern used elsewhere in the kernel (e.g., do_set_master() releases rtnl_lock before calling ndo_add_slave()). Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") Signed-off-by: Petr Oros --- drivers/net/ethernet/intel/iavf/iavf_main.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 8aa6e92c16431f..d7738fb8fa60bc 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -189,13 +189,22 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter) * iavf_wait_for_reset - Wait for reset to finish. * @adapter: board private structure * + * The iavf driver selects NET_SHAPER, so callbacks that trigger reset are + * always called with netdev instance lock held, while reset_task also needs + * this lock. Release the lock while waiting to avoid deadlock. + * * Returns 0 if reset finished successfully, negative on timeout or interrupt. */ int iavf_wait_for_reset(struct iavf_adapter *adapter) { - int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue, - !iavf_is_reset_in_progress(adapter), - msecs_to_jiffies(5000)); + struct net_device *netdev = adapter->netdev; + int ret; + + netdev_unlock(netdev); + ret = wait_event_interruptible_timeout(adapter->reset_waitqueue, + !iavf_is_reset_in_progress(adapter), + msecs_to_jiffies(5000)); + netdev_lock(netdev); /* If ret < 0 then it means wait was interrupted. * If ret == 0 then it means we got a timeout while waiting -- 2.52.0