Commit Graph

12 Commits

Author SHA1 Message Date
Joshua Washington
2236836eab gve: implement DQO TX datapath for AF_XDP zero-copy
In the descriptor clean path, a number of changes need to be made to
accommodate out of order completions and double completions.

The XSK stack can only handle completions being processed in order, as a
single counter is incremented in xsk_tx_completed to sigify how many XSK
descriptors have been completed. Because completions can come back out
of order in DQ, a separate queue of XSK descriptors must be maintained.
This queue keeps the pending packets in the order that they were written
so that the descriptors can be counted in xsk_tx_completed in the same
order.

For double completions, a new pending packet state and type are
introduced. The new type, GVE_TX_PENDING_PACKET_DQO_XSK, plays an
anlogous role to pre-existing _SKB and _XDP_FRAME pending packet types
for XSK descriptors. The new state, GVE_PACKET_STATE_XSK_COMPLETE,
represents packets for which no more completions are expected. This
includes packets which have received a packet completion or reinjection
completion, as well as packets whose reinjection completion timer have
timed out. At this point, such packets can be counted as part of
xsk_tx_completed() and freed.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-5-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-22 11:35:49 +02:00
Joshua Washington
d8a8ca14c9 gve: add XDP_TX and XDP_REDIRECT support for DQ RDA
This patch adds support for XDP_TX and XDP_REDIRECT for the DQ RDA
queue format. To appropriately support transmission of XDP frames, a
new pending packet type GVE_TX_PENDING_PACKET_DQO_XDP_FRAME is
introduced for completion handling, as there was a previous assumption
that completed packets would be SKBs.

XDP_TX handling completes the basic XDP actions, so the feature is
recorded accordingly. This patch also enables the ndo_xdp_xmit callback
allowing DQ to handle XDP_REDIRECT packets originating from another
interface.

The XDP spinlock is moved to common TX ring fields so that it can be
used in both GQ and DQ. Originally, it was in a section which was
mutually exclusive for GQ and DQ.

In summary, 3 XDP features are exposed for the DQ RDA queue format:
1) NETDEV_XDP_ACT_BASIC
2) NETDEV_XDP_ACT_NDO_XMIT
3) NETDEV_XDP_ACT_REDIRECT

Note that XDP and header-data split are mutually exclusive for the time
being due to lack of multi-buffer XDP support.

This patch does not add support for the DQ QPL format. That is to come
in a future patch series.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-21 14:26:24 +01:00
Shailend Chand
c93462b914 gve: Implement queue api
The new netdev queue api is implemented for gve.

Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/all/20240501232549.1327174-11-shailend@google.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-06 18:23:05 -07:00
Shailend Chand
f13697cc7a gve: Switch to config-aware queue allocation
The new config-aware functions will help achieve the goal of being able
to allocate resources for new queues while there already are active
queues serving traffic.

These new functions work off of arbitrary queue allocation configs
rather than just the currently active config in priv, and they return
the newly allocated resources instead of writing them into priv.

Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122182632.1102721-4-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-23 17:41:31 -08:00
Shailend Chand
1dfc2e4611 gve: Refactor napi add and remove functions
This change makes the napi poll functions non-static and moves the
gve_(add|remove)_napi functions to gve_utils.c, to make possible future
"start queue" hooks in the datapath files.

Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122182632.1102721-3-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-23 17:41:31 -08:00
Eric Dumazet
18de1e517e gve: add gve_features_check()
It is suboptimal to attempt skb linearization from ndo_start_xmit()
if a gso skb has pathological layout, or if host stack does not have
access to the payload (TCP direct). Linearization of large skbs
can also fail under memory pressure.

We should instead have an ndo_features_check() so that we can
fallback to GSO, which is supported even for TCP direct,
and generally much more efficient (no payload copy).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Bailey Forrest <bcf@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Praveen Kaligineedi <pkaligineedi@google.com>
Cc: Shailend Chand <shailend@google.com>
Cc: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-17 03:29:16 +00:00
Tao Liu
6081ac2013 gve: Add tx|rx-coalesce-usec for DQO
Adding ethtool support for changing rx-coalesce-usec and tx-coalesce-usec
when using the DQO queue format.

Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16 10:41:54 +00:00
Catherine Sullivan
d30baacc04 gve: Move the irq db indexes out of the ntfy block struct
Giving the device access to other kernel structs is not ideal.
Move the indexes into their own array and just keep pointers to
them in the ntfy block struct.

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16 10:41:53 +00:00
Bailey Forrest
a57e5de476 gve: DQO: Add TX path
TX SKBs will have their buffers DMA mapped with the device. Each buffer
will have at least one TX descriptor associated. Each SKB will also have
a metadata descriptor.

Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects.
Every TX SKB will have an associated pending_packet object. A TX SKB's
descriptors will use its pending_packet's index as the completion tag,
which will be returned on the TX completion queue.

The device implements a "flow-miss model". Most packets will simply
receive a packet completion. The flow-miss system may choose to process
a packet based on its contents. A TX packet which experiences a flow
miss would receive a miss completion followed by a later reinjection
completion. The miss-completion is received when the packet starts to be
processed by the flow-miss system and the reinjection completion is
received when the flow-miss system completes processing the packet and
sends it on the wire.

Notable mentions:

- Buffers may be freed after receiving the miss-completion, but in order
  to avoid packet reordering, we do not complete the SKB until receiving
  the reinjection completion.

- The driver must robustly handle the unlikely scenario where a miss
  completion does not have an associated reinjection completion. This is
  accomplished by maintaining a list of packets which have a pending
  reinjection completion. After a short timeout (5 seconds), the
  SKB and buffers are released and the pending_packet is moved to a
  second list which has a longer timeout (60 seconds), where the
  pending_packet will not be reused. When the longer timeout elapses,
  the driver may assume the reinjection completion would never be
  received and the pending_packet may be reused.

- Completion handling is triggered by an interrupt and is done in the
  NAPI poll function. Because the TX path and completion exist in
  different threading contexts they maintain their own lists for free
  pending_packet objects. The TX path uses a lock-free approach to steal
  the list from the completion path.

- Both the TSO context and general context descriptors have metadata
  bytes. The device requires that if multiple descriptors contain the
  same field, each descriptor must have the same value set for that
  field.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
0dcc144a79 gve: DQO: Configure interrupts on device up
When interrupts are first enabled, we also set the ratelimits, which
will be static for the entire usage of the device.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
9c1a59a2f4 gve: DQO: Add ring allocation and initialization
Allocate the buffer and completion ring structures. Do not populate the
rings yet. That will happen in the respective rx and tx datapath
follow-on patches

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
5e8c5adf95 gve: DQO: Add core netdev features
Add napi netdev device registration, interrupt handling and initial tx
and rx polling stubs. The stubs will be filled in follow-on patches.

Also:
- LRO feature advertisement and handling
- Also update ethtool logic

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00