Expose the page_pool ring size limit (16384) as a constant so we can
reuse it in drivers.

Signed-off-by: Mina Almasry
---
 include/net/page_pool/types.h | 2 ++
 net/core/page_pool.c          | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 1509a536cb85..5edba3122b10 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -58,6 +58,8 @@ struct pp_alloc_cache {
 	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
 };
 
+#define PAGE_POOL_MAX_RING_SIZE 16384
+
 /**
  * struct page_pool_params - page pool parameters
  * @fast:	params accessed frequently on hotpath
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1a5edec485f1..7b2808da294f 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -211,7 +211,7 @@ static int page_pool_init(struct page_pool *pool,
 		return -EINVAL;
 
 	if (pool->p.pool_size)
-		ring_qsize = min(pool->p.pool_size, 16384);
+		ring_qsize = min(pool->p.pool_size, PAGE_POOL_MAX_RING_SIZE);
 
 	/* DMA direction is either DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
	 * DMA_BIDIRECTIONAL is for allowing page used for DMA sending,

base-commit: 327c20c21d80e0d87834b392d83ae73c955ad8ff
-- 
2.51.2.1026.g39e6a42477-goog
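Illustrative aside, not part of the series: with the constant exported, a
driver that wants the deepest possible recycling ring can name it instead of
hard-coding 16384. A minimal sketch under that assumption (the my_* name and
surrounding function are hypothetical; the page_pool_params fields,
PP_FLAG_DMA_MAP and PAGE_POOL_MAX_RING_SIZE are the interfaces from the
headers touched above):

#include <linux/dma-mapping.h>
#include <linux/netdevice.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/types.h>

/* Hypothetical helper: request the maximum ring size up front. Note that
 * page_pool_init() clamps pool_size to PAGE_POOL_MAX_RING_SIZE, so asking
 * for more is silently reduced to this value.
 */
static struct page_pool *my_create_page_pool(struct device *dev,
					     struct napi_struct *napi)
{
	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP,
		.order		= 0,
		.pool_size	= PAGE_POOL_MAX_RING_SIZE,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.napi		= napi,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp);
}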
NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
current gve devmem TCP configuration.

Root causing showed that this particular workload results in a very
bursty pattern of devmem allocations and frees, exhausting the
page_pool ring buffer. This results in sock_devmem_dontneed taking up
to 5ms to free a batch of 128 netmems, as each free fails to find an
available entry in the pp->ring and falls all the way back to the
(slow) gen_pool. It also results in gve_alloc_buffer hitting bursts of
successive allocations which likewise find no entries in the pp->ring
(not dontneed'd yet, presumably); each such allocation takes up to
100us, slowing down the napi poll loop.

From there, the slowness of the napi poll loop results, I suspect, in
the rx buffers not being processed in time, and in packet drops
detected by tcpdump. The sum of all this badness is that the workload
runs at around 0.5 GB/s, when the expected perf is around 12 GB/s.

This entire behavior can be avoided by increasing the pp->ring size to
the maximum allowed 16384, which makes the pp able to handle the
bursty alloc/frees of this particular workload. AFAICT there should be
no negative side effect of arbitrarily increasing the pp->ring size in
this manner for ZC configs; the memory is preallocated and pinned by
the memory provider anyway.

Tested by running an AllToAll PXN=2 workload.

Before: Avg bus bandwidth : 0.434191
After:  Avg bus bandwidth : 12.5494

Note that there is more we can do to optimize this path, such as bulk
netmem dontneeds, bulk netmem pp refills, and possibly taking a page
from the io_uring zcrx playbook and replacing the gen_pool with a
simpler fixed-size array based allocator, but this seems sufficient to
fix these critical workloads.

With thanks to Willem and Eric for helping root cause this.

Cc: ziweixiao@google.com
Fixes: 62d7f40503bc ("gve: support unreadable netmem")
Reported-by: Vedant Mathur
Signed-off-by: Mina Almasry
---
 drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
index 0e2b703c673a..f63ffdd3b3ba 100644
--- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
@@ -8,6 +8,8 @@
 #include "gve.h"
 #include "gve_utils.h"
 
+#include "net/netdev_queues.h"
+
 int gve_buf_ref_cnt(struct gve_rx_buf_state_dqo *bs)
 {
 	return page_count(bs->page_info.page) - bs->page_info.pagecnt_bias;
@@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
 	if (priv->header_split_enabled) {
 		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
 		pp.queue_idx = rx->q_num;
+		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
+			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
 	}
 
 	return page_pool_create(&pp);
-- 
2.51.2.1026.g39e6a42477-goog
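Illustrative aside, not part of the series: other drivers can adopt the same
workaround by following the gve hunk above, i.e. only bumping the ring to its
maximum when an unreadable memory provider is actually bound to the queue. A
minimal sketch (the my_* helper is hypothetical; netif_rxq_has_unreadable_mp(),
PAGE_POOL_MAX_RING_SIZE and struct page_pool_params are the interfaces used in
this series):

#include <net/netdev_queues.h>
#include <net/page_pool/types.h>

/* Bump the recycling ring to its maximum only when an unreadable
 * (devmem / io_uring zcrx) memory provider is bound to this queue; that
 * memory is preallocated and pinned by the provider, so the deeper ring
 * absorbs bursty alloc/free patterns without pinning extra memory.
 */
static void my_size_pool_for_devmem(struct net_device *dev, int rxq_idx,
				    struct page_pool_params *pp)
{
	if (netif_rxq_has_unreadable_mp(dev, rxq_idx))
		pp->pool_size = PAGE_POOL_MAX_RING_SIZE;
}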