Refcount of a NORCU dentry must not be incremented after having dropped to zero. Otherwise we might end up with the following race: CPU1: in fast_dput(d), rcu_read_lock(); CPU1: decrements refcount of d to 0 CPU1: notice that it's unhashed CPU2: grab a reference to d CPU2: dput(d), freeing d CPU1: ... looks like we need to evict d, let's grab ->d_lock, recheck the refcount, etc. and that spin_lock(&d->d_lock) ends up a UAF, despite still being in an RCU read-side critical area started back when the refcount had been positive. If not for DCACHE_NORCU in d->d_flags freeing would've been RCU-delayed, so we'd have grabbed ->d_lock, noticed the negative value stored into refcount by __dentry_kill(), dropped the locks and that would be it. For NORCU dentries freeing is _not_ delayed, though. Most of the non-counting references are excluded for NORCU dentries - they are not allowed to be hashed, they never get placed on LRU, they never get placed into anyone's list of children and while dput_to_list() might put them into a shrink list, nobody bumps refcount of something that had been reached that way. However, inode's list of aliases can be a problem - it does not contribute to dentry refcount (for obvious reasons) and we *do* have places that grab references to something found on that list - that's precisely what d_find_alias() is. In case of d_find_alias() we are safe - it skips unhashed aliases, so all NORCU ones are ignored there. d_find_any_alias() is *not* limited to hashed ones, though, and while it's usually called for directories (which never get NORCU dentries), there are callers that use it to get something for non-directories with no hashed aliases. Having d_find_any_alias() hit a NORCU dentry is not impossible - it can be easily arranged if you have CAP_DAC_READ_SEARCH (memfd_create() + mmap() + name_to_handle_at() for /proc/self/map_files/<...> + munmap() + open_by_handle_at() will do that, and adding a second memfd_create() for mount_fd makes it possible to do that without having memfd pinned). The race window is narrow, and it's probably not feasible on bare hardware, but... It's not hard to fix, fortunately: * separate __d_find_dir_alias() (== current __d_find_any_alias()) to be used for directory inodes. * provide dget_alias_ilocked() that would return false for NORCU dentries with zero refcount and return true incrementing refcount otherwise * make __d_find_any_alias() go over the list of aliases, using dget_alias_ilocked() and returning the alias it succeeds on (normally the first one). Any NORCU alias with zero refcount is going to be evicted by the thread that had dropped the final reference; this makes __d_find_any_alias() pretend it had lost the race with eviction. Signed-off-by: Al Viro --- fs/dcache.c | 21 ++++++++++++++++++--- include/linux/dcache.h | 18 ++++++++++++++++++ 2 files changed, 36 insertions(+), 3 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 0aff2c510beb..fa12e18906b9 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1052,7 +1052,10 @@ struct dentry *dget_parent(struct dentry *dentry) } EXPORT_SYMBOL(dget_parent); -static struct dentry * __d_find_any_alias(struct inode *inode) +/* + * inode is a directory, inode->i_lock is held by the caller + */ +static struct dentry * __d_find_dir_alias(struct inode *inode) { struct dentry *alias; @@ -1063,6 +1066,18 @@ static struct dentry * __d_find_any_alias(struct inode *inode) return alias; } +static struct dentry * __d_find_any_alias(struct inode *inode) +{ + struct dentry *alias; + + if (hlist_empty(&inode->i_dentry)) + return NULL; + for_each_alias(alias, inode) + if (dget_alias_ilocked(alias)) + return alias; + return NULL; +} + /** * d_find_any_alias - find any alias for a given inode * @inode: inode to find an alias for @@ -1086,7 +1101,7 @@ static struct dentry *__d_find_alias(struct inode *inode) struct dentry *alias; if (S_ISDIR(inode->i_mode)) - return __d_find_any_alias(inode); + return __d_find_dir_alias(inode); for_each_alias(alias, inode) { spin_lock(&alias->d_lock); @@ -3150,7 +3165,7 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry, security_d_instantiate(dentry, inode); spin_lock(&inode->i_lock); if (S_ISDIR(inode->i_mode)) { - struct dentry *new = __d_find_any_alias(inode); + struct dentry *new = __d_find_dir_alias(inode); if (unlikely(new)) { /* The reference to new ensures it remains an alias */ spin_unlock(&inode->i_lock); diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 97a887be150a..a3409de3f490 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -365,6 +365,24 @@ static inline struct dentry *dget(struct dentry *dentry) return dentry; } +/* dentry->d_inode->i_lock must be held by caller */ +static inline bool dget_alias_ilocked(struct dentry *dentry) +{ + if (likely(!(READ_ONCE(dentry->d_flags) & DCACHE_NORCU))) { + lockref_get(&dentry->d_lockref); + return true; + } + // NORCU dentries with zero refcount MUST NOT be grabbed + spin_lock(&dentry->d_lock); + if (dentry->d_lockref.count > 0) { + dget_dlock(dentry); + spin_unlock(&dentry->d_lock); + return true; + } + spin_unlock(&dentry->d_lock); + return false; +} + extern struct dentry *dget_parent(struct dentry *dentry); /** -- 2.47.3