From: Ankit Garg The DQO RX datapath programs a per-buffer-queue-descriptor header_buf_addr at post time and reads the split header back at completion time. Both the post and the read currently index the header buffer by queue position rather than by the buffer's identity: - post (gve_rx_post_buffers_dqo): header_buf_addr is computed from bufq->tail - read (gve_rx_dqo): the header is read from desc_idx (the completion queue head index) This relies on the buffer-queue index and the completion-queue index being equal for the start of every packet, i.e. on the device consuming posted buffers and returning completions in the exact same order. That assumption does not hold once HW-GRO is enabled with multiple flows: coalesced segments are accepted and completed in an order that may differ from the order buffers were posted, and segments from different flows may interleave. That results in two problems: 1. Wrong header slot on read. Because the read offset is derived from the completion index (desc_idx) while the device wrote the header to the address programmed for the buffer's buf_id, the driver can copy a header belonging to a different packet. This shows up as throughput drop (about 30% drop and large numbers of TCP retransmissions) with header-split and HW-GRO both enabled and many streams. 2. Header buffer reused while still owned by the device. The driver advances bufq->head by one per completion and re-posts buffers based on that. Arrival of N RX completions only guarantees that at least N RX buffer descriptors have been read by the device. It does not guarantee that the device has relinquished the ownership of all the buffers corresponding to those N descriptors. With out-of-order completions (e.g. the completion for a packet copied into buffer N arrives before the completion for a packet copied into buffer N-1), the driver can re-post and overwrite a header buffer that the device is still going to write into, corrupting the header of a packet whose completion has not yet been processed. Fix both issues by indexing the header buffer by buf_id on both the post and read paths. Reading from buf_id's slot is therefore always correct regardless of completion ordering (fixes problem 1). Indexing by buf_id also ties each header slot to the lifetime of its buffer state. A buffer state is only returned to the free/recycle lists when its own completion (buf_id) is processed, so its header slot can only be re-posted after the device is done with it. This makes header slot reuse safe under out-of-order completions (fixes problem 2). Allocate (gve_rx_alloc_hdr_bufs) and free (gve_rx_free_hdr_bufs) the header buffers based on num_buf_states to match the buf_id indexing. Cc: stable@vger.kernel.org Fixes: 5e37d8254e7f ("gve: Add header split data path") Signed-off-by: Ankit Garg Reviewed-by: Praveen Kaligineedi Reviewed-by: Jordan Rhee Reviewed-by: Harshitha Ramamurthy Signed-off-by: Joshua Washington --- drivers/net/ethernet/google/gve/gve_rx_dqo.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c index 7924dce7..02cba280 100644 --- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c @@ -21,11 +21,13 @@ static void gve_rx_free_hdr_bufs(struct gve_priv *priv, struct gve_rx_ring *rx) { struct device *hdev = &priv->pdev->dev; - int buf_count = rx->dqo.bufq.mask + 1; if (rx->dqo.hdr_bufs.data) { - dma_free_coherent(hdev, priv->header_buf_size * buf_count, - rx->dqo.hdr_bufs.data, rx->dqo.hdr_bufs.addr); + size_t size = + (size_t)priv->header_buf_size * rx->dqo.num_buf_states; + + dma_free_coherent(hdev, size, rx->dqo.hdr_bufs.data, + rx->dqo.hdr_bufs.addr); rx->dqo.hdr_bufs.data = NULL; } } @@ -254,7 +256,7 @@ int gve_rx_alloc_ring_dqo(struct gve_priv *priv, /* Allocate header buffers for header-split */ if (cfg->enable_header_split) - if (gve_rx_alloc_hdr_bufs(priv, rx, buffer_queue_slots)) + if (gve_rx_alloc_hdr_bufs(priv, rx, rx->dqo.num_buf_states)) goto err; /* Allocate RX completion queue */ @@ -381,10 +383,13 @@ void gve_rx_post_buffers_dqo(struct gve_rx_ring *rx) break; } - if (rx->dqo.hdr_bufs.data) + if (rx->dqo.hdr_bufs.data) { + u16 buf_id = le16_to_cpu(desc->buf_id); + desc->header_buf_addr = cpu_to_le64(rx->dqo.hdr_bufs.addr + - priv->header_buf_size * bufq->tail); + (size_t)priv->header_buf_size * buf_id); + } bufq->tail = (bufq->tail + 1) & bufq->mask; complq->num_free_slots--; @@ -826,10 +831,13 @@ static int gve_rx_dqo(struct napi_struct *napi, struct gve_rx_ring *rx, int unsplit = 0; if (hdr_len && !hbo) { - rx->ctx.skb_head = gve_rx_copy_data(priv->dev, napi, - rx->dqo.hdr_bufs.data + - desc_idx * priv->header_buf_size, - hdr_len); + size_t offset = + (size_t)buffer_id * priv->header_buf_size; + + rx->ctx.skb_head = + gve_rx_copy_data(priv->dev, napi, + rx->dqo.hdr_bufs.data + offset, + hdr_len); if (unlikely(!rx->ctx.skb_head)) goto error; rx->ctx.skb_tail = rx->ctx.skb_head; -- 2.55.0.rc0.738.g0c8ab3ebcc-goog