blk_hctx_poll() can busy-poll until a completion is found or
need_resched() becomes true. On preemptible kernels, the scheduler can
set TIF_NEED_RESCHED on the timer tick and preempt the task at IRQ
return before the loop condition re-evaluates it. After the context
switch, the flag is cleared, so the poller can continue spinning instead
of returning to its caller.

This can happen with io_uring IOPOLL reads inside iocb_bio_iopoll(),
which holds the rcu_read_lock() while calling bio_poll(). If another
poller on the same polled queue drains the available completions, this
poller may repeatedly find no completions and remain inside the RCU
read-side critical section long enough to trigger RCU stall reports:

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P3961
  rcu: (detected by 3, t=60002 jiffies, g=18533, q=4943 ncpus=20)
  task:fio state:R running task stack:0 pid:3961
  Call Trace:
   ? nvme_poll+0x36/0xa0 [nvme]
   ? blk_hctx_poll+0x39/0x90
   ? blk_mq_poll+0x30/0x60
   ? bio_poll+0x87/0x170
   ? iocb_bio_iopoll+0x32/0x50
   ? io_uring_classic_poll+0x25/0x50
   ? io_do_iopoll+0x216/0x420
   ? __do_sys_io_uring_enter+0x2c7/0x7c0

Reproducible with:

  fio -filename=/dev/nvme0n1 -direct=1 -size=4g -rw=randread \
      --numjobs=32 -bs=4K -ioengine=io_uring -hipri=1 -iodepth=1 \
      --registerfiles=1 --group_reporting --thread

Record the starting jiffy and exit the loop once jiffies has advanced.
This bounds each blk_hctx_poll() invocation while also covering the case
where the reschedule flag was cleared by the context switch before the
loop condition could observe it.

Fixes: f22ecf9c14c1 ("blk-mq: delete task running check in blk_hctx_poll()")
Reviewed-by: Fengnan Chang
Suggested-by: Fengnan Chang
Signed-off-by: Anuj Gupta
Signed-off-by: Alok Rathore
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c5c16cce4f8..e5850dc6c5d9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -5248,6 +5248,7 @@ static int blk_hctx_poll(struct request_queue *q, struct blk_mq_hw_ctx *hctx,
 			 struct io_comp_batch *iob, unsigned int flags)
 {
 	int ret;
+	unsigned long timeout = jiffies + 2;
 
 	do {
 		ret = q->mq_ops->poll(hctx, iob);
@@ -5258,7 +5259,7 @@ static int blk_hctx_poll(struct request_queue *q, struct blk_mq_hw_ctx *hctx,
 		if (ret < 0 || (flags & BLK_POLL_ONESHOT))
 			break;
 		cpu_relax();
-	} while (!need_resched());
+	} while (!need_resched() && time_before(jiffies, timeout));
 
 	return 0;
 }
-- 
2.25.1
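
For readers who want to experiment with the bounded-poll pattern outside
the kernel, here is a minimal userspace sketch. It is not kernel code and
not part of the patch: fake_jiffies, tick_before(), poll_bounded(),
struct demo_ctx and demo_poll() are hypothetical names invented for this
illustration. tick_before() mimics the signed-difference comparison that
time_before() is built on, so the deadline stays correct when the tick
counter wraps. The need_resched() and BLK_POLL_ONESHOT checks are left
out because they have no userspace equivalent.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's jiffies tick counter. */
unsigned long fake_jiffies;

/*
 * Wrap-safe "a is before b". The unsigned subtraction wraps modulo
 * 2^N and the result is reinterpreted as signed, which assumes the
 * usual two's complement conversion (as the kernel itself does).
 */
bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical poll callback: >0 completions found, 0 none, <0 error. */
typedef int (*poll_fn)(void *ctx);

/*
 * Poll until a completion is found, an error occurs, or max_ticks
 * ticks have elapsed since entry. Returns the completion count or 0.
 */
int poll_bounded(poll_fn poll_once, void *ctx, unsigned long max_ticks)
{
	unsigned long timeout = fake_jiffies + max_ticks;
	int ret;

	assert(poll_once != NULL);

	do {
		ret = poll_once(ctx);
		if (ret > 0)
			return ret;
		if (ret < 0)
			break;
	} while (tick_before(fake_jiffies, timeout));

	return 0;
}

/* Demo state: complete_on_call == 0 means "never completes". */
struct demo_ctx {
	int calls;
	int complete_on_call;
};

/* Demo callback that pretends one tick elapses per poll attempt. */
int demo_poll(void *p)
{
	struct demo_ctx *c = p;

	c->calls++;
	fake_jiffies++;
	return (c->complete_on_call && c->calls == c->complete_on_call) ? 1 : 0;
}
```

Calling poll_bounded(demo_poll, &ctx, 2) with a context that never
completes returns 0 after two poll attempts, even when fake_jiffies
starts at ULONG_MAX and the computed deadline wraps around to 1. A plain
"fake_jiffies < timeout" comparison would exit that loop immediately.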