When a write stream is set on the file, override the default
directory-locality heuristic with a new heuristic that maps available
AGs into streams.

Isolating distinct write streams into dedicated allocation groups
reduces block interleaving between concurrent writers. Keeping these
streams spatially separated also reduces AGF lock contention and
logical file fragmentation.

If there are fewer AGs than write streams, the write streams are
distributed across the available AGs in round-robin fashion. Otherwise,
the available AGs are partitioned among the write streams. Since each
write stream then maps to a partition of one or more contiguous AGs,
the inode number modulo the partition size is used to choose the
specific AG within the stream's partition. This can help with
intra-stream concurrency when multiple files are being written in a
single stream that has 2 or more AGs.

Example: 8 Allocation Groups, 4 Streams
Partition Size = 2 AGs per Stream

      Stream 1 (ID: 1)             Stream 2 (ID: 2)          Streams 3 & 4
  +-----------+-----------+    +-----------+-----------+    +-------------
  |    AG0    |    AG1    |    |    AG2    |    AG3    |    |  AG4...AG7
  +-----------+-----------+    +-----------+-----------+    +-------------
        ^           ^                ^           ^
        |           |                |           |
        |   File B (ino: 101)        |   File D (ino: 201)
        |  101 % 2 = 1 -> AG 1       |  201 % 2 = 1 -> AG 3
        |                            |
  File A (ino: 100)            File C (ino: 200)
  100 % 2 = 0 -> AG 0          200 % 2 = 0 -> AG 2

If the AGs cannot be evenly distributed among the streams, the last
stream absorbs the remaining AGs.

Note that there are no hard boundaries; this only provides an explicit
routing hint to the XFS allocator so that it can group/isolate files
the way the application has decided to group/isolate them. We still try
to preserve file contiguity, and the full space can be utilized even
with a single stream.
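
For anyone who wants to check the arithmetic outside the kernel, the
mapping can be mirrored in a small userspace sketch. This is only an
illustration, not the kernel code: stream_to_ag(), NO_AG and
check_commit_example() are hypothetical names, plain integers stand in
for the inode and mount fields, and the guards on a zero stream ID or a
zero AG count are assumptions added for the sketch. It also assumes
stream_id <= max_streams.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for NULLAGNUMBER in this userspace sketch. */
#define NO_AG	UINT32_MAX

/*
 * Userspace mirror of the stream -> AG mapping described above.
 * stream_id is 1-based; assumed to be <= max_streams.
 */
static uint32_t
stream_to_ag(uint64_t ino, uint8_t stream_id, uint32_t max_streams,
	     uint32_t nr_ags)
{
	uint32_t start_ag, ags_per_stream;

	/* Assumption: reject degenerate inputs instead of dividing by 0. */
	if (!stream_id || !max_streams || !nr_ags)
		return NO_AG;

	stream_id -= 1;		/* switch to 0-based math */
	ags_per_stream = nr_ags / max_streams;

	if (ags_per_stream == 0) {
		/* fewer AGs than streams: round robin over the AGs */
		start_ag = stream_id % nr_ags;
		ags_per_stream = 1;
	} else {
		/* partition the AGs; the last stream absorbs the remainder */
		start_ag = stream_id * ags_per_stream;
		if (stream_id == max_streams - 1)
			ags_per_stream = nr_ags - start_ag;
	}

	/* pick an AG inside the partition from the inode number */
	return start_ag + (uint32_t)(ino % ags_per_stream);
}

/* Re-checks the four files from the 8 AG / 4 stream example. */
static void
check_commit_example(void)
{
	assert(stream_to_ag(100, 1, 4, 8) == 0);	/* File A */
	assert(stream_to_ag(101, 1, 4, 8) == 1);	/* File B */
	assert(stream_to_ag(200, 2, 4, 8) == 2);	/* File C */
	assert(stream_to_ag(201, 2, 4, 8) == 3);	/* File D */
}
```

With 10 AGs and 4 streams, the same function gives stream 4 the
partition AG6..AG9, which is the "last stream absorbs the remaining
AGs" case; with 2 AGs and 4 streams, streams 1 and 3 both land on AG0.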

Signed-off-by: Kanchan Joshi
---
 fs/xfs/libxfs/xfs_bmap.c |  9 +++++++++
 fs/xfs/xfs_inode.c       | 33 +++++++++++++++++++++++++++++++++
 fs/xfs/xfs_inode.h       |  1 +
 3 files changed, 43 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7a4c8f1aa76c..facf56e8e01d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3591,6 +3591,15 @@ xfs_bmap_btalloc_best_length(
 	int			error;
 
 	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
+
+	/* override the default allocation heuristic if write stream is set */
+	if (ap->ip->i_write_stream && ap->datatype & XFS_ALLOC_USERDATA) {
+		xfs_agnumber_t stream_ag = xfs_inode_write_stream_to_ag(ap->ip);
+
+		if (stream_ag != NULLAGNUMBER)
+			ap->blkno = XFS_AGB_TO_FSB(args->mp, stream_ag, 0);
+	}
+
 	if (!xfs_bmap_adjacent(ap))
 		ap->eof = false;
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9b88b2d1cf9a..e93141d2cd8b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -93,6 +93,39 @@ xfs_inode_set_write_stream(
 	return 0;
 }
 
+xfs_agnumber_t
+xfs_inode_write_stream_to_ag(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	uint8_t			stream_id = ip->i_write_stream;
+	uint32_t		max_streams = xfs_inode_max_write_streams(ip);
+	uint32_t		nr_ags;
+	xfs_agnumber_t		start_ag, ags_per_stream;
+
+	if (XFS_IS_REALTIME_INODE(ip) || !max_streams)
+		return NULLAGNUMBER;
+
+	stream_id -= 1; /* for 0-based math, stream-ids are 1-based */
+
+	nr_ags = mp->m_sb.sb_agcount;
+	ags_per_stream = nr_ags / max_streams;
+
+	/* for the case when we have fewer AGs than streams */
+	if (ags_per_stream == 0) {
+		start_ag = stream_id % nr_ags;
+		ags_per_stream = 1;
+	} else {
+		/* otherwise AGs are partitioned into N streams */
+		start_ag = stream_id * ags_per_stream;
+		/* uneven distribution case: last stream may contain extra */
+		if (stream_id == max_streams-1)
+			ags_per_stream = nr_ags - start_ag;
+	}
+	/* intra-stream concurrency: hash inode to choose AG within partition */
+	return start_ag + (ip->i_ino % ags_per_stream);
+}
+
 /*
  * These two are wrapper routines around the xfs_ilock() routine used to
  * centralize some grungy code.  They are used in places that wish to lock the
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 9f6cab729924..9ab31ff6b5e1 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -682,4 +682,5 @@ int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
 int xfs_inode_max_write_streams(struct xfs_inode *ip);
 uint8_t xfs_inode_get_write_stream(struct xfs_inode *ip);
 int xfs_inode_set_write_stream(struct xfs_inode *ip, uint8_t stream_id);
+xfs_agnumber_t xfs_inode_write_stream_to_ag(struct xfs_inode *ip);
 #endif	/* __XFS_INODE_H__ */
-- 
2.25.1