The material change is I_DIRTY_TIME handling without a spurious ref acquire/release cycle. While here a bunch of smaller changes: 1. predict there is an inode -- bpftrace suggests one is passed vast majority of the time 2. convert BUG_ON into VFS_BUG_ON_INODE 3. assert on ->i_count 4. assert ->i_lock is not held 5. flip the order of I_DIRTY_TIME and nlink count checks as the former is less likely to be true I verified atomic_read(&inode->i_count) does not show up in asm if debug is disabled. Signed-off-by: Mateusz Guzik --- The routine kept annoying me, so here is a further revised variant. I verified this compiles, but I still cannot runtime test. I'm sorry for that. My signed-off is conditional on a good samaritan making sure it works :) diff compared to the thing I sent "informally": - if (unlikely(!inode)) - asserts - slightly reworded iput_final commentary - unlikely() on the second I_DIRTY_TIME check Given the revamp I think it makes sense to attribute the change to me, hence a "proper" mail. The thing surviving from the submission by Josef is: + if (atomic_add_unless(&inode->i_count, -1, 1)) + return; And of course he is the one who brought up the spurious refcount trip in the first place. I'm happy with Reported-by, Co-developed-by or whatever other credit as you guys see fit. That aside I think it would be nice if NULL inodes passed to iput became illegal, but that's a different story for another day. fs/inode.c | 46 +++++++++++++++++++++++++++++++++++----------- 1 file changed, 35 insertions(+), 11 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 01ebdc40021e..01a554e11279 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1908,20 +1908,44 @@ static void iput_final(struct inode *inode) */ void iput(struct inode *inode) { - if (!inode) + if (unlikely(!inode)) return; - BUG_ON(inode->i_state & I_CLEAR); + retry: - if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) { - if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) { - atomic_inc(&inode->i_count); - spin_unlock(&inode->i_lock); - trace_writeback_lazytime_iput(inode); - mark_inode_dirty_sync(inode); - goto retry; - } - iput_final(inode); + lockdep_assert_not_held(&inode->i_lock); + VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode); + /* + * Note this assert is technically racy as if the count is bogusly + * equal to one, then two CPUs racing to further drop it can both + * conclude it's fine. + */ + VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode); + + if (atomic_add_unless(&inode->i_count, -1, 1)) + return; + + if ((inode->i_state & I_DIRTY_TIME) && inode->i_nlink) { + trace_writeback_lazytime_iput(inode); + mark_inode_dirty_sync(inode); + goto retry; } + + spin_lock(&inode->i_lock); + if (unlikely((inode->i_state & I_DIRTY_TIME) && inode->i_nlink)) { + spin_unlock(&inode->i_lock); + goto retry; + } + + if (!atomic_dec_and_test(&inode->i_count)) { + spin_unlock(&inode->i_lock); + return; + } + + /* + * iput_final() drops ->i_lock, we can't assert on it as the inode may + * be deallocated by the time the call returns. + */ + iput_final(inode); } EXPORT_SYMBOL(iput); -- 2.43.0