readahead_folio() returns the next folio from the readahead control
(rac), but it also drops the refcount on the folio that had been held by
the rac. As such, only one refcount (held by the page cache) remains on
the folio after this returns.

This opens a race: if the folio does not have an iomap_folio_state
struct attached to it and the folio gets read in by the filesystem's IO
helper, folio_end_read() may have already been called on the folio. That
unlocks the folio, which allows the page cache to evict it, dropping the
last refcount and freeing the folio. Subsequent logic in
iomap_readahead_iter() or iomap_read_end() that accesses the folio then
hits a use-after-free.

Fix this by invalidating ctx->cur_folio when a folio without
iomap_folio_state metadata attached to it has been handed to the
filesystem's IO helper.

Fixes: b2f35ac4146d ("iomap: add caller-provided callbacks for read and readahead")
Signed-off-by: Joanne Koong
---
 fs/iomap/buffered-io.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6beb876658c0..2243399d70b5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -502,6 +502,8 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
 	loff_t pos = iter->pos;
 	loff_t length = iomap_length(iter);
 	struct folio *folio = ctx->cur_folio;
+	size_t folio_len = folio_size(folio);
+	struct iomap_folio_state *ifs;
 	size_t poff, plen;
 	loff_t pos_diff;
 	int ret;
@@ -513,10 +515,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
 		return iomap_iter_advance(iter, length);
 	}
 
-	ifs_alloc(iter->inode, folio, iter->flags);
+	ifs = ifs_alloc(iter->inode, folio, iter->flags);
 
 	length = min_t(loff_t, length,
-			folio_size(folio) - offset_in_folio(folio, pos));
+			folio_len - offset_in_folio(folio, pos));
 	while (length) {
 		iomap_adjust_read_range(iter->inode, folio, &pos, length,
 				&poff, &plen);
@@ -542,7 +544,24 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
 			ret = ctx->ops->read_folio_range(iter, ctx, plen);
 			if (ret)
 				return ret;
+
 			*bytes_submitted += plen;
+			/*
+			 * If the folio does not have ifs metadata attached,
+			 * then after ->read_folio_range(), the folio might have
+			 * gotten freed (eg iomap_finish_folio_read() ->
+			 * folio_end_read() followed by page cache eviction,
+			 * which for readahead folios drops the last refcount).
+			 * Invalidate ctx->cur_folio here.
+			 *
+			 * For folios without ifs metadata attached, the read
+			 * should be on the entire folio.
+			 */
+			if (!ifs) {
+				ctx->cur_folio = NULL;
+				if (unlikely(plen != folio_len))
+					return -EIO;
+			}
 		}
 
 		ret = iomap_iter_advance(iter, plen);
-- 
2.47.3
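
For readers who want to see the pattern in isolation, here is a minimal
userspace sketch of the "sample what you need, hand off, then drop the
cached pointer" approach used by the fix. It is only an illustration
under simplifying assumptions: toy_folio, toy_ctx, toy_read_folio_range()
and toy_submit() are hypothetical names, not kernel APIs, and the "free"
is modelled with a flag rather than a real deallocation so the state can
be inspected afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); NOT the kernel's struct folio or iomap types. */
struct toy_folio {
	int refcount;	/* 1 == only the "page cache" reference remains */
	bool has_ifs;	/* models an attached iomap_folio_state */
	bool freed;	/* set instead of calling free(), so it can be inspected */
	size_t size;
};

struct toy_ctx {
	struct toy_folio *cur_folio;
};

/*
 * Models the IO helper: without ifs, the read may complete immediately and
 * eviction may then drop the last reference. With ifs, this simplified model
 * assumes completion is deferred, so the folio stays alive.
 */
static void toy_read_folio_range(struct toy_folio *folio)
{
	if (!folio->has_ifs && --folio->refcount == 0)
		folio->freed = true;
}

/*
 * Models the patched submit step. Everything needed from the folio is
 * sampled before the handoff; afterwards only the sampled copies are used.
 * Returns 0 on success, -1 if a folio without ifs was not read in full.
 */
static int toy_submit(struct toy_ctx *ctx, size_t plen)
{
	struct toy_folio *folio = ctx->cur_folio;
	size_t folio_len = folio->size;	/* sampled before handoff */
	bool has_ifs = folio->has_ifs;	/* sampled before handoff */

	toy_read_folio_range(folio);

	if (!has_ifs) {
		/* The folio may already be gone: forget the cached pointer. */
		ctx->cur_folio = NULL;
		if (plen != folio_len)
			return -1;
	}
	return 0;
}
```

Calling toy_submit() on a folio with has_ifs == false and a full-size
plen leaves ctx->cur_folio NULL, so later code cannot dereference the
folio by accident. With has_ifs == true the pointer is left in place.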