bio_may_need_split() uses bi_vcnt to determine if a bio has a single
segment, but bi_vcnt is unreliable for cloned bios. Cloned bios share the
parent's bi_io_vec array but iterate over a subset via bi_iter, so bi_vcnt
may not reflect the actual segment count being iterated.

Replace the bi_vcnt check with bvec iterator access via __bvec_iter_bvec(),
comparing bi_iter.bi_size against the current bvec's length. This correctly
handles both cloned and non-cloned bios.

Move bi_io_vec into the first cache line adjacent to bi_iter. This is a
sensible layout since bi_io_vec and bi_iter are commonly accessed together
throughout the block layer - every bvec iteration requires both fields.
This displaces bi_end_io to the second cache line, which is acceptable
since bi_end_io and bi_private are always fetched together in bio_endio()
anyway.

The struct layout change requires bio_reset() to preserve and restore
bi_io_vec across the memset, since it now falls within BIO_RESET_BYTES.

Nitesh verified that this patch doesn't regress NVMe 512-byte IO perf [1].

Link: https://lore.kernel.org/linux-block/20251220081607.tvnrltcngl3cc2fh@green245.gost/ [1]
Signed-off-by: Ming Lei
---
 block/bio.c               |  3 +++
 block/blk.h               | 12 +++++++++---
 include/linux/blk_types.h |  4 ++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index e726c0e280a8..0e936288034e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -301,9 +301,12 @@ EXPORT_SYMBOL(bio_init);
  */
 void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf)
 {
+	struct bio_vec *bv = bio->bi_io_vec;
+
 	bio_uninit(bio);
 	memset(bio, 0, BIO_RESET_BYTES);
 	atomic_set(&bio->__bi_remaining, 1);
+	bio->bi_io_vec = bv;
 	bio->bi_bdev = bdev;
 	if (bio->bi_bdev)
 		bio_associate_blkg(bio);
diff --git a/block/blk.h b/block/blk.h
index e4c433f62dfc..98f4dfd4ec75 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -371,12 +371,18 @@ struct bio *bio_split_zone_append(struct bio *bio,
 static inline bool bio_may_need_split(struct bio *bio,
 		const struct queue_limits *lim)
 {
+	const struct bio_vec *bv;
+
 	if (lim->chunk_sectors)
 		return true;
-	if (bio->bi_vcnt != 1)
+
+	if (!bio->bi_io_vec)
+		return true;
+
+	bv = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+	if (bio->bi_iter.bi_size > bv->bv_len)
 		return true;
-	return bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset >
-		lim->max_fast_segment_size;
+	return bv->bv_len + bv->bv_offset > lim->max_fast_segment_size;
 }
 
 /**
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 5dc061d318a4..19a888a2f104 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -232,6 +232,8 @@ struct bio {
 
 	atomic_t		__bi_remaining;
 
+	/* The actual vec list, preserved by bio_reset() */
+	struct bio_vec		*bi_io_vec;
 	struct bvec_iter	bi_iter;
 
 	union {
@@ -275,8 +277,6 @@ struct bio {
 
 	atomic_t		__bi_cnt;	/* pin count */
 
-	struct bio_vec		*bi_io_vec;	/* the actual vec list */
-
 	struct bio_set		*bi_pool;
 };
 
-- 
2.47.0


bio_iov_bvec_set() creates a cloned bio that borrows a bvec array from an
iov_iter. For cloned bios, bi_vcnt is meaningless because iteration is
controlled entirely by bi_iter (bi_idx, bi_size, bi_bvec_done), not by
bi_vcnt.

Remove the incorrect bi_vcnt assignment. Explicitly initialize
bi_iter.bi_idx to 0 to ensure iteration starts at the first bvec. While
bi_idx is typically already zero from bio initialization, making this
explicit improves clarity and correctness.

This change also avoids accessing iter->nr_segs, which is an iov_iter
implementation detail that block code should not depend on.

Signed-off-by: Ming Lei
---
 block/bio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index 0e936288034e..2359c0723b88 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1165,8 +1165,8 @@ void bio_iov_bvec_set(struct bio *bio, const struct iov_iter *iter)
 {
 	WARN_ON_ONCE(bio->bi_max_vecs);
 
-	bio->bi_vcnt = iter->nr_segs;
 	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
+	bio->bi_iter.bi_idx = 0;
 	bio->bi_iter.bi_bvec_done = iter->iov_offset;
 	bio->bi_iter.bi_size = iov_iter_count(iter);
 	bio_set_flag(bio, BIO_CLONED);
-- 
2.47.0


io_import_kbuf() recalculates iter->nr_segs to reflect only the bvecs
needed for the requested byte range. This was added to provide an accurate
segment count to bio_iov_bvec_set(), which copied nr_segs to bio->bi_vcnt
for use as a bio split hint.

The previous two patches eliminated this dependency:

- bio_may_need_split() now uses bi_iter instead of bi_vcnt for split
  decisions
- bio_iov_bvec_set() no longer copies nr_segs to bi_vcnt

Since nr_segs is no longer used for bio split decisions, the recalculation
loop is unnecessary. The iov_iter already carries the correct byte count,
which becomes the bio's bi_iter.bi_size and caps iteration, so an oversized
nr_segs is harmless.

Link: https://lkml.org/lkml/2025/4/16/351
Signed-off-by: Ming Lei
---
 io_uring/rsrc.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 41c89f5c616d..ee6283676ba7 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1055,17 +1055,6 @@ static int io_import_kbuf(int ddir, struct iov_iter *iter,
 
 	iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, count);
 	iov_iter_advance(iter, offset);
-
-	if (count < imu->len) {
-		const struct bio_vec *bvec = iter->bvec;
-
-		len += iter->iov_offset;
-		while (len > bvec->bv_len) {
-			len -= bvec->bv_len;
-			bvec++;
-		}
-		iter->nr_segs = 1 + bvec - iter->bvec;
-	}
 	return 0;
 }
-- 
2.47.0
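[Not part of the series: a minimal user-space sketch of the iterator
arithmetic the new bio_may_need_split() check relies on. The struct and
field names mirror the kernel's bio_vec/bvec_iter, but the types and the
iter_bvec()/may_span_multiple() helpers are simplified stand-ins, not
kernel code.]

/*
 * A "cloned" bio shares the parent's vector array and walks it purely
 * through (bi_idx, bi_bvec_done, bi_size); the parent's segment count
 * (bi_vcnt) says nothing about what the clone actually covers.
 */
#include <stdbool.h>
#include <stdio.h>

struct bio_vec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned int bi_idx;		/* index of the current bvec */
	unsigned int bi_bvec_done;	/* bytes consumed in that bvec */
	unsigned int bi_size;		/* bytes remaining in this bio */
};

/* Analogue of __bvec_iter_bvec(): the bvec the iterator points at. */
static const struct bio_vec *iter_bvec(const struct bio_vec *vecs,
				       struct bvec_iter iter)
{
	return &vecs[iter.bi_idx];
}

/*
 * Analogue of the new single-segment test: if the bytes still covered by
 * the bio exceed the current bvec's length, the bio spans more than one
 * segment and may need splitting.
 */
static bool may_span_multiple(const struct bio_vec *vecs, struct bvec_iter iter)
{
	const struct bio_vec *bv = iter_bvec(vecs, iter);

	return iter.bi_size > bv->bv_len;
}

int main(void)
{
	/* Parent bio owns three 4 KiB segments; its bi_vcnt would be 3. */
	const struct bio_vec vecs[3] = {
		{ .bv_len = 4096 }, { .bv_len = 4096 }, { .bv_len = 4096 },
	};
	/* Clone covering only 512 bytes of the second segment. */
	struct bvec_iter small_clone = { .bi_idx = 1, .bi_size = 512 };
	/* Clone covering the second and third segments. */
	struct bvec_iter big_clone = { .bi_idx = 1, .bi_size = 8192 };

	printf("512B clone multi-segment? %d\n", may_span_multiple(vecs, small_clone));
	printf("8KiB clone multi-segment? %d\n", may_span_multiple(vecs, big_clone));
	return 0;
}

A bi_vcnt-based check would consult the parent's count (3) for both clones,
even though the 512-byte clone covers a single segment, which is the
unreliability the first patch describes.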