unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that socket is properly set up, which would include having a defined peer. IOW, there's a window when unix_stream_bpf_update_proto() can be called on socket which still has unix_peer(sk) == NULL. T0 bpf T1 connect ------ ---------- WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED) sock_map_sk_state_allowed(sk) ... sk_pair = unix_peer(sk) sock_hold(sk_pair) sock_hold(newsk) smp_mb__after_atomic() unix_peer(sk) = newsk BUG: kernel NULL pointer dereference, address: 0000000000000080 RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0 Call Trace: sock_map_link+0x564/0x8b0 sock_map_update_common+0x6e/0x340 sock_map_update_elem_sys+0x17d/0x240 __sys_bpf+0x26db/0x3250 __x64_sys_bpf+0x21/0x30 do_syscall_64+0x6b/0x3a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Initial idea was to move peer assignment _before_ the sk_state update[1], but that involved an additional memory barrier, and changing the hot path was rejected. Then a check during proto update was considered[2], but a follow-up discussion[3] concluded the root cause is sockmap taking a wrong lock. Or, more specifically, an insufficient lock[4]. Thus, teach sockmap about the af_unix-specific locking: af_unix protects critical sections under unix_state_lock() operating on unix_sock::lock. [1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/ [2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/ [3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/ [4]: https://lore.kernel.org/netdev/CAAVpQUA+8GL_j63CaKb8hbxoL21izD58yr1NvhOhU=j+35+3og@mail.gmail.com/ Suggested-by: Kuniyuki Iwashima Suggested-by: Martin KaFai Lau Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()") Signed-off-by: Michal Luczaj --- net/core/sock_map.c | 35 +++++++++++++++++++++++++++++------ 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 7ba6a7f24ccd..6109fbe6f99c 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -12,6 +12,7 @@ #include #include #include +#include #include struct bpf_stab { @@ -115,19 +116,43 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype) } static void sock_map_sk_acquire(struct sock *sk) - __acquires(&sk->sk_lock.slock) { lock_sock(sk); + + if (sk_is_unix(sk)) + unix_state_lock(sk); + rcu_read_lock(); } static void sock_map_sk_release(struct sock *sk) - __releases(&sk->sk_lock.slock) { rcu_read_unlock(); + + if (sk_is_unix(sk)) + unix_state_unlock(sk); + release_sock(sk); } +static inline void sock_map_sk_acquire_fast(struct sock *sk) +{ + local_bh_disable(); + bh_lock_sock(sk); + + if (sk_is_unix(sk)) + unix_state_lock(sk); +} + +static inline void sock_map_sk_release_fast(struct sock *sk) +{ + if (sk_is_unix(sk)) + unix_state_unlock(sk); + + bh_unlock_sock(sk); + local_bh_enable(); +} + static void sock_map_add_link(struct sk_psock *psock, struct sk_psock_link *link, struct bpf_map *map, void *link_raw) @@ -604,16 +629,14 @@ static long sock_map_update_elem(struct bpf_map *map, void *key, if (!sock_map_sk_is_suitable(sk)) return -EOPNOTSUPP; - local_bh_disable(); - bh_lock_sock(sk); + sock_map_sk_acquire_fast(sk); if (!sock_map_sk_state_allowed(sk)) ret = -EOPNOTSUPP; else if (map->map_type == BPF_MAP_TYPE_SOCKMAP) ret = sock_map_update_common(map, *(u32 *)key, sk, flags); else ret = sock_hash_update_common(map, key, sk, flags); - bh_unlock_sock(sk); - local_bh_enable(); + sock_map_sk_release_fast(sk); return ret; } -- 2.52.0