During error recovery testing a pair of tasks was reported to be hung
due to a dead-lock situation:

- mlx5_unload_one() trying to acquire devlink lock while the PCI error
  recovery code had acquired the pci_cfg_access_lock().
- mlx5_crdump_collect() trying to acquire the pci_cfg_access_lock()
  while devlink_health_report() had acquired the devlink lock.

Move the check for pci_channel_offline prior to acquiring the
pci_cfg_access_lock in mlx5_vsc_gw_lock since collecting the crdump will
fail anyhow while PCI error recovery is running.

Fixes: 33afbfcc105a ("net/mlx5: Stop waiting for PCI if pci channel is offline")
Signed-off-by: Gerd Bayer
---

Hi all,

while the initial hit was recorded during "random" testing, where PCI
error recovery and poll_health() tripped almost simultaneously, I found
a way to reproduce a very similar hang at will on s390:

Inject a PCI error recovery event on a Physical Function with
  zpcictl --reset-fw

then request a crdump with
  devlink health dump show pci/ reporter fw_fatal

With the patch applied I didn't get the hang but kernel logs showed:
[  792.885743] mlx5_core 000a:00:00.0: mlx5_crdump_collect:51:(pid 1415): crdump: failed to lock vsc gw err -13

and the crdump request ended with:
kernel answers: Permission denied

Thanks,
Gerd
---
 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
index 432c98f2626d..d2d3b57a57d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
@@ -73,16 +73,15 @@ int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev)
 	u32 lock_val;
 	int ret;
 
+	if (pci_channel_offline(dev->pdev))
+		return -EACCES;
+
 	pci_cfg_access_lock(dev->pdev);
 	do {
 		if (retries > VSC_MAX_RETRIES) {
 			ret = -EBUSY;
 			goto pci_unlock;
 		}
-		if (pci_channel_offline(dev->pdev)) {
-			ret = -EACCES;
-			goto pci_unlock;
-		}
 
 		/* Check if semaphore is already locked */
 		ret = vsc_read(dev, VSC_SEMAPHORE_OFFSET, &lock_val);
-- 
2.48.1