The kTLS TX path can hand an open record to a sk_msg verdict program
before encryption. If the verdict applies fewer bytes than the open
record contains, tls_push_record() splits ctx->open_rec into the record
being encrypted and a remainder. The synchronous path reattaches that
remainder before continuing.

With an async AEAD provider, crypto_aead_encrypt() can return
-EINPROGRESS after ctx->open_rec has been unhooked but before the split
remainder is reattached. The remainder is then no longer reachable
through ctx->open_rec or ctx->tx_list, so the data it carries is
silently dropped and the unreachable tls_rec is leaked. The same
composition also entangles the user-page zerocopy lifetime rules with an
async completion path.

A sockmap cannot be attached to a socket after an inet ULP is installed:
sk_psock_init() returns -EINVAL when inet_csk_has_ulp() is true. So the
supported ordering for sockmap + kTLS TX is sockmap first, TLS_TX setup
second.

When TLS_TX setup sees an existing sk_psock, allocate the AEAD with
CRYPTO_ALG_ASYNC set in the allocation mask (type 0), so that only
synchronous implementations match, and set sw_ctx_tx->async_capable = 1
to keep the TX zerocopy path disabled. Together these make the buggy
composition structurally unreachable.

Ordinary kTLS sockets without sk_msg BPF attached are unaffected and
continue to use async-capable providers.

Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Christopher Lusk
Assisted-by: Codex:gpt-5.5
Assisted-by: Claude:claude-opus-4-7
---
Changes since v2 [1]:
- Per netdev maintainer guidance [2], replace the Option-C drain-on-error
  fix with a setup-time surface narrowing in tls_set_sw_offload(): when a
  sockmap is already attached at TLS_TX setup, request a synchronous AEAD
  (CRYPTO_ALG_ASYNC in the allocation mask) and set
  sw_ctx_tx->async_capable = 1.
  Both moves are needed: latching async_capable alone disables zerocopy,
  but tls_do_encryption() can still return -EINPROGRESS on the copy
  path; selecting a sync provider removes that return path for
  sk_msg-attached sockets.
- Drop the selftest from the series per Jakub's note that the existing
  sockmap + TLS coverage at
  tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c exercises this
  configuration [3]. That suite covers sockmap + kTLS policy paths
  broadly; the specific async-pcrypt pass-then-drop failure mode from
  the v2 reproducer was validated for v3 on QEMU/KVM with a
  KASAN+LOCKDEP-instrumented kernel against net base 2156a29aecff before
  send.
- Single-patch series.

Changes since v1:
- v1's remainder-rooting fix was incomplete; Sashiko AI review surfaced
  a real UAF in the v2 follow-up that John Fastabend endorsed on the v1
  thread [4]. The surface-narrowing approach in v3 makes both failure
  modes unreachable by avoiding the async + sk_msg composition entirely
  rather than patching each continuation point.

[1] https://lore.kernel.org/all/20260521025840.976378-1-clusk@northecho.dev/
[2] https://lore.kernel.org/all/20260525133028.58494274@kernel.org/
[3] https://lore.kernel.org/all/20260525133048.2dc6d8d3@kernel.org/
[4] https://lore.kernel.org/all/huduxtn6parzgiaf5cyiyrrvjjvx6jsdedowvrd4nkwmuyeind@j6migjgofh2i/

 net/tls/tls_sw.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 964ebc268..0000000 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2867,7 +2867,20 @@ int tls_set_sw_offload(struct sock *sk, int tx,
 	rec_seq = crypto_info_rec_seq(src_crypto_info, cipher_desc);
 
 	if (!*aead) {
-		*aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, 0);
+		u32 mask = 0;
+
+		if (tx) {
+			struct sk_psock *psock;
+
+			psock = sk_psock_get(sk);
+			if (psock) {
+				mask = CRYPTO_ALG_ASYNC;
+				sw_ctx_tx->async_capable = 1;
+				sk_psock_put(sk, psock);
+			}
+		}
+
+		*aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, mask);
 		if (IS_ERR(*aead)) {
 			rc = PTR_ERR(*aead);
 			*aead = NULL;
-- 
2.54.0
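
Illustration only, not part of the patch: a user-space sketch of why
passing type 0 with CRYPTO_ALG_ASYNC in the mask selects a synchronous
AEAD. It assumes the crypto API's usual lookup filter, where an
implementation is a candidate only if its flags agree with the requested
type on every bit set in the mask. All model_* and MODEL_* names are
hypothetical and exist only for this sketch; the constant is assumed to
match CRYPTO_ALG_ASYNC in include/linux/crypto.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror CRYPTO_ALG_ASYNC from include/linux/crypto.h. */
#define MODEL_CRYPTO_ALG_ASYNC 0x00000080u

/*
 * Hypothetical model of the lookup filter: an implementation with flags
 * `cra_flags` is a candidate only if it agrees with `type` on every bit
 * that is set in `mask`.
 */
static bool model_alg_matches(uint32_t cra_flags, uint32_t type, uint32_t mask)
{
	return ((cra_flags ^ type) & mask) == 0;
}

/*
 * Hypothetical helper mirroring the decision in the patch: only a TX
 * setup that finds an attached psock restricts the lookup to
 * synchronous implementations.
 */
static uint32_t model_tx_alloc_mask(bool tx, bool has_psock)
{
	return (tx && has_psock) ? MODEL_CRYPTO_ALG_ASYNC : 0;
}

/* Walks the four interesting combinations of provider and socket. */
static void model_selfcheck(void)
{
	const uint32_t sync_alg = 0;
	const uint32_t async_alg = MODEL_CRYPTO_ALG_ASYNC;
	uint32_t mask;

	/* Plain kTLS TX socket: mask 0, both kinds of provider match. */
	mask = model_tx_alloc_mask(true, false);
	assert(model_alg_matches(sync_alg, 0, mask));
	assert(model_alg_matches(async_alg, 0, mask));

	/* psock attached: the ASYNC bit must be clear, sync only. */
	mask = model_tx_alloc_mask(true, true);
	assert(model_alg_matches(sync_alg, 0, mask));
	assert(!model_alg_matches(async_alg, 0, mask));
}
```

The model covers only the flag filter. It does not model provider
priority or the zerocopy gate, which the patch handles separately by
setting sw_ctx_tx->async_capable.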