raw_send_hdrinc() validates that the caller-supplied IPv4 header fits
within the message length:

	iphlen = iph->ihl * 4;

	err = -EINVAL;
	if (iphlen > length)
		goto error_free;

	if (iphlen >= sizeof(*iph)) {
		/* fix up saddr, tot_len, id, csum, transport_header */
	}

It does not, however, reject ihl < 5. For such a packet the
"if (iphlen >= sizeof(*iph))" branch is skipped, leaving the crafted
iphdr untouched, but the packet is still handed to __ip_local_out() and
onward. Downstream consumers that read iph->ihl assume a sane value:
net/ipv4/ah4.c:ah_output() in particular subtracts sizeof(struct iphdr)
from top_iph->ihl * 4. Because sizeof yields a size_t, the subtraction
is carried out in unsigned arithmetic and wraps instead of going
negative; the result is passed to memcpy() as a length close to
SIZE_MAX, producing an OOB access and a host kernel panic.

An IPv4 header with ihl < 5 is malformed by definition (RFC 791:
"Internet Header Length is the length of the internet header in 32 bit
words ... Note that the minimum value for a correct header is 5."). The
kernel should not be willing to inject such a packet into its own
output path.

Reject "iphlen < sizeof(*iph)" alongside the existing "iphlen > length"
check. This matches the principle that locally constructed packets that
re-enter the IP stack must pass the same basic sanity tests that a
foreign packet would be subjected to.

Once this lands, the "if (iphlen >= sizeof(*iph))" wrapper around the
fixup branch becomes redundant; it is left in place to keep the patch
minimal and backport-friendly. A follow-up can unwrap it.

Note that commit 86f4c90a1c5c ("ipv4, ipv6: ensure raw socket message
is big enough to hold an IP header") ensures the message buffer is
large enough to hold an iphdr, but does not constrain the self-reported
iph->ihl.

Reachability: the malformed packet source is any caller with
CAP_NET_RAW, including an unprivileged process in a user+net namespace
on a kernel with CONFIG_USER_NS=y.
The reproduced AH crash also requires a matching xfrm AH policy on the
outgoing route; a container granted CAP_NET_ADMIN can install that
state and policy in its netns. Loopback bypasses xfrm_output, so the
trigger uses a real netdev.

Reproduced on UML + KASAN: kernel-mode fault at addr 0x0 with
memcpy_orig at the crash site. The same crash shape reproduces inside a
rootless Docker container with --cap-add NET_ADMIN on a stock distro
kernel.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Suggested-by: Herbert Xu
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito
---
 net/ipv4/raw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 5aaf9c62c8e1..68e88cb3e55c 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -391,7 +391,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	 * in, reject the frame as invalid
 	 */
 	err = -EINVAL;
-	if (iphlen > length)
+	if (iphlen > length || iphlen < sizeof(*iph))
 		goto error_free;
 
 	if (iphlen >= sizeof(*iph)) {
-- 
2.53.0

ah_output() and ah_output_done() copy the IPv4 options area with

	if (top_iph->ihl != 5) {
		memcpy(dst, src, top_iph->ihl * 4 - sizeof(struct iphdr));
	}

The "!= 5" guard correctly excludes the no-options case (ihl == 5) and
allows ihl > 5 where options are present. It does NOT exclude ihl < 5.
For ihl in [0, 4], top_iph->ihl * 4 is less than sizeof(struct iphdr)
(20). Because sizeof yields a size_t, the int operand is converted to
size_t and the subtraction wraps around instead of going negative. The
resulting length is close to SIZE_MAX and memcpy() walks off the slab
allocation backing the skb's network header.

With the preceding patch ("ipv4: raw: reject IP_HDRINCL packets with
ihl < 5") in place, an ihl < 5 packet from a raw IP_HDRINCL socket is
rejected before it reaches the local-output path.
However, post-LOCAL_OUT hook mangling (nftables payload-set, NFQUEUE
reinject) can still rewrite the IPv4 header after the raw_send_hdrinc()
validation has run and deliver an ihl < 5 packet to ah_output().
Reaching this path requires CAP_NET_ADMIN in the relevant netns; that
is a smaller class of callers than the original CAP_NET_RAW path, but
it is not empty.

Independently of the post-LOCAL_OUT mangling question, the AH consumer
should not contain a memcpy() whose size is derived from an
attacker-influenced field without a floor.

Change the guard to "top_iph->ihl > 5" at all three sites:

  - ah_output_done() (the .complete callback path)
  - ah_output() (the synchronous options-copy site)
  - ah_output() (the post-hash restore site)

Behavior for valid packets (ihl in {5, 6, ..., 15}) is unchanged. For
malformed packets with ihl < 5, the options copy is cleanly skipped;
the malformed field no longer becomes a huge memcpy() length.

This is the defense-in-depth half of the series; the sanity check
added earlier in the output path by the preceding patch is the primary
fix.

An audit for the same pattern found no analogous bug in ah_input(),
ip_clear_mutable_options(), or net/ipv6/ah6.c (IPv6 has a fixed-length
header and no IP_HDRINCL equivalent for crafting an ihl < 5 ipv6hdr).

Reproduced on UML + KASAN: kernel-mode fault at addr 0x0 with
memcpy_orig at the crash site on a pre-fix kernel. The AH guard was
verified by forcing the same packets through xfrm: the xfrm state
counter incremented and no KASAN splat or panic occurred. With the
preceding patch in this series, the original raw IP_HDRINCL path is
rejected before AH.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito
---
 net/ipv4/ah4.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 4366cbac3f06..8fa31bdf9792 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -137,7 +137,7 @@ static void ah_output_done(void *data, int err)
 	top_iph->tos = iph->tos;
 	top_iph->ttl = iph->ttl;
 	top_iph->frag_off = iph->frag_off;
-	if (top_iph->ihl != 5) {
+	if (top_iph->ihl > 5) {
 		top_iph->daddr = iph->daddr;
 		memcpy(top_iph+1, iph+1, top_iph->ihl*4 - sizeof(struct iphdr));
 	}
@@ -197,7 +197,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	iph->ttl = top_iph->ttl;
 	iph->frag_off = top_iph->frag_off;
 
-	if (top_iph->ihl != 5) {
+	if (top_iph->ihl > 5) {
 		iph->daddr = top_iph->daddr;
 		memcpy(iph+1, top_iph+1, top_iph->ihl*4 - sizeof(struct iphdr));
 		err = ip_clear_mutable_options(top_iph, &top_iph->daddr);
@@ -253,7 +253,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	top_iph->tos = iph->tos;
 	top_iph->ttl = iph->ttl;
 	top_iph->frag_off = iph->frag_off;
-	if (top_iph->ihl != 5) {
+	if (top_iph->ihl > 5) {
 		top_iph->daddr = iph->daddr;
 		memcpy(top_iph+1, iph+1, top_iph->ihl*4 - sizeof(struct iphdr));
 	}
-- 
2.53.0