Currently skb metadata location is derived from the MAC header offset. This breaks when L2 tunnel/tagging devices (VLAN, GRE, etc.) reset the MAC offset after pulling the encapsulation header, making the metadata inaccessible. A naive fix would be to move metadata on every skb_pull() path. However, we can avoid a memmove on L2 decapsulation if we can locate metadata independently of the MAC offset. Introduce a meta_end field in skb_shared_info to track where metadata ends, decoupling it from mac_header. The new field takes 2 bytes out of the existing 4 byte hole, with structure size unchanged if we reorder the gso_type field. Update skb_metadata_set() to record meta_end at the time of the call, and adjust skb_data_move() and pskb_expand_head() to keep meta_end in sync with head buffer layout. Remove the now-unneeded metadata adjustment in skb_reorder_vlan_header(). Note that this breaks BPF skb metadata access through skb->data_meta when there is a gap between meta_end and skb->data. Following BPF verifier changes address this. Also, we still need to relocate the metadata on encapsulation on forward path. VLAN and QinQ have already been patched when fixing TC BPF helpers [1], but other tagging/tunnel code still requires similar changes. This will be done as a follow up. Signed-off-by: Jakub Sitnicki --- include/linux/skbuff.h | 14 ++++++++++++-- net/core/skbuff.c | 10 ++-------- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ff90281ddf90..8868db976e1f 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -595,15 +595,16 @@ struct skb_shared_info { __u8 meta_len; __u8 nr_frags; __u8 tx_flags; + u16 meta_end; unsigned short gso_size; /* Warning: this field is not always filled in (UFO)! */ unsigned short gso_segs; + unsigned int gso_type; struct sk_buff *frag_list; union { struct skb_shared_hwtstamps hwtstamps; struct xsk_tx_metadata_compl xsk_meta; }; - unsigned int gso_type; u32 tskey; /* @@ -4499,7 +4500,7 @@ static inline u8 skb_metadata_len(const struct sk_buff *skb) static inline void *skb_metadata_end(const struct sk_buff *skb) { - return skb_mac_header(skb); + return skb->head + skb_shinfo(skb)->meta_end; } static inline bool __skb_metadata_differs(const struct sk_buff *skb_a, @@ -4554,8 +4555,16 @@ static inline bool skb_metadata_differs(const struct sk_buff *skb_a, true : __skb_metadata_differs(skb_a, skb_b, len_a); } +/** + * skb_metadata_set - Record packet metadata length and location. + * @skb: packet carrying the metadata + * @meta_len: number of bytes of metadata preceding skb->data + * + * Must be called when skb->data already points past the metadata area. + */ static inline void skb_metadata_set(struct sk_buff *skb, u8 meta_len) { + skb_shinfo(skb)->meta_end = skb_headroom(skb); skb_shinfo(skb)->meta_len = meta_len; } @@ -4601,6 +4610,7 @@ static inline void skb_data_move(struct sk_buff *skb, const int len, } memmove(meta + len, meta, meta_len + n); + skb_shinfo(skb)->meta_end += len; return; no_metadata: diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4f4d7ab7057f..7142487644c3 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2306,6 +2306,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, #endif skb->tail += off; skb_headers_offset_update(skb, nhead); + skb_shinfo(skb)->meta_end += nhead; skb->cloned = 0; skb->hdr_len = 0; skb->nohdr = 0; @@ -6219,8 +6220,7 @@ EXPORT_SYMBOL_GPL(skb_scrub_packet); static struct sk_buff *skb_reorder_vlan_header(struct sk_buff *skb) { - int mac_len, meta_len; - void *meta; + int mac_len; if (skb_cow(skb, skb_headroom(skb)) < 0) { kfree_skb(skb); @@ -6233,12 +6233,6 @@ static struct sk_buff *skb_reorder_vlan_header(struct sk_buff *skb) mac_len - VLAN_HLEN - ETH_TLEN); } - meta_len = skb_metadata_len(skb); - if (meta_len) { - meta = skb_metadata_end(skb) - meta_len; - memmove(meta + VLAN_HLEN, meta, meta_len); - } - skb->mac_header += VLAN_HLEN; return skb; } -- 2.43.0