Commit Graph

337 Commits

Author SHA1 Message Date
Akihiko Odaki
6071d13c6a virtio-net: Fix VLAN filter table reset timing
Problem
-------

The expected initial state of the table depends on feature negotiation:

With VIRTIO_NET_F_CTRL_VLAN:
  The table must be empty in accordance with the specification.
Without VIRTIO_NET_F_CTRL_VLAN:
  The table must be filled to permit all VLAN traffic.

Prior to commit 06b636a1e2 ("virtio-net: do not reset vlan filtering
at set_features"), virtio_net_set_features() always reset the VLAN
table. That commit changed the behavior to skip table reset when
VIRTIO_NET_F_CTRL_VLAN was negotiated, assuming the table would be
properly cleared during device reset and remain stable.

However, this assumption breaks when a driver renegotiates features:
1. Initial negotiation without VIRTIO_NET_F_CTRL_VLAN (table filled)
2. Renegotiation with VIRTIO_NET_F_CTRL_VLAN (table will not be cleared)

The problem was exacerbated by commit 0caed25cd1 ("virtio: Call
set_features during reset"), which triggered virtio_net_set_features()
during device reset, exposing the bug whenever VIRTIO_NET_F_CTRL_VLAN
was negotiated after a device reset.

Solution
--------

Fix the issue by initializing the table when virtio_net_set_features()
is called to change the VIRTIO_NET_F_CTRL_VLAN bit of
vdev->guest_features.

This approach ensures the correct table state regardless of feature
negotiation sequence by performing initialization in
virtio_net_set_features() as QEMU did prior to commit 06b636a1e2
("virtio-net: do not reset vlan filtering at set_features").

This change still preserves the goal of the commit, which was to avoid
resetting the table during migration, by checking whether the
VIRTIO_NET_F_CTRL_VLAN bit of vdev->guest_features is being changed;
vdev->guest_features is set before virtio_net_set_features() gets called
during migration.

It also avoids resetting the table when the driver sets a feature
bitmask with no change for the VIRTIO_NET_F_CTRL_VLAN bit, which makes
the operation idempotent and its semantics cleaner.

Additionally, this change ensures the table is initialized after
feature negotiation and before the DRIVER_OK status bit being set for
compatibility with the Linux driver before commit 50c0ada627f5
("virtio-net: fix race between ndo_open() and virtio_device_ready()"),
which did not ensure to set the DRIVER_OK status bit before modifying
the table.

Fixes: 06b636a1e2 ("virtio-net: do not reset vlan filtering at set_features")
Cc: qemu-stable@nongnu.org
Reported-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Tested-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20250727-vlan-v3-1-bbee738619b1@rsg.ci.i.u-tokyo.ac.jp>
Tested-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01 08:33:04 -04:00
Stefan Hajnoczi
e452053097 virtio,pci,pc: features, fixes, tests
SPCR acpi table can now be disabled
 vhost-vdpa can now report hashing capability to guest
 PPTT acpi table now tells guest vCPUs are identical
 vost-user-blk now shuts down faster
 loongarch64 now supports bios-tables-test
 intel_iommu now supports ATS
 cxl now supports DCD Fabric Management Command Set
 arm now supports acpi pci hotplug
 
 fixes, cleanups
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ
 pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv
 joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i
 TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg
 h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z
 ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc=
 =sktm
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pci,pc: features, fixes, tests

SPCR acpi table can now be disabled
vhost-vdpa can now report hashing capability to guest
PPTT acpi table now tells guest vCPUs are identical
vost-user-blk now shuts down faster
loongarch64 now supports bios-tables-test
intel_iommu now supports ATS
cxl now supports DCD Fabric Management Command Set
arm now supports acpi pci hotplug

fixes, cleanups

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

 # -----BEGIN PGP SIGNATURE-----
 #
 # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo
 # YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ
 # pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv
 # joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i
 # TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg
 # h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z
 # ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc=
 # =sktm
 # -----END PGP SIGNATURE-----
 # gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT
 # gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
 # gpg:                issuer "mst@redhat.com"
 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
 # gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
 # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
 #      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits)
  hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release
  hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add
  hw/cxl: Create helper function to create DC Event Records from extents
  hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists
  hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config
  hw/mem: cxl_type3: Add DC Region bitmap lock
  hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header
  hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config
  hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct
  hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info
  hw/cxl: fix DC extent capacity tracking
  tests: virt: Update expected ACPI tables for virt test
  hw/acpi/aml-build: Build a root node in the PPTT table
  hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes
  tests: virt: Allow changes to PPTT test table
  qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp
  qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex
  tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test
  tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests
  hw/arm/virt: Let virt support pci hotplug/unplug GED event
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Conflicts:
  net/vhost-vdpa.c
  vhost_vdpa_set_steering_ebpf() was removed, resolve the context
  conflict.
2025-07-16 07:00:47 -04:00
Akihiko Odaki
729b573419 virtio-net: Add hash type options
By default, virtio-net limits the hash types that will be advertised to
the guest so that all hash types are covered by the offloading
capability the client provides. This change allows to override this
behavior and to advertise hash types that require user-space hash
calculation by specifying "on" for the corresponding properties.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250530-vdpa-v1-6-5af4109b1c19@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-07-14 05:25:34 -04:00
Akihiko Odaki
7b6e7e4990 virtio-net: Retrieve peer hashing capability
Retrieve peer hashing capability instead of hardcoding.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250530-vdpa-v1-4-5af4109b1c19@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-07-14 05:25:34 -04:00
Akihiko Odaki
2deec9ab7d virtio-net: Move virtio_net_get_features() down
Move virtio_net_get_features() to the later part of the file so that
it can call other functions.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250530-vdpa-v1-3-5af4109b1c19@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-07-14 05:25:34 -04:00
Laurent Vivier
ba5acc5d6e net: Add is_vhost_user flag to vhost_net struct
Introduce a boolean is_vhost_user field to the vhost_net
structure. This flag is initialized during vhost_net_init based
on whether the backend is vhost-user.

This refactoring simplifies checks for vhost-user specific behavior,
replacing direct comparisons of 'net->nc->info->type' with the new
flag. It improves readability and encapsulates the backend type
information directly within the vhost_net instance.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2025-07-14 13:27:09 +08:00
Laurent Vivier
33b78a30a3 net: Allow network backends to advertise max TX queue size
This commit refactors how the maximum transmit queue size for
virtio-net devices is determined, making the mechanism more generic
and extensible.

Previously, virtio_net_max_tx_queue_size() contained hardcoded
checks for specific network backend types (vhost-user and
vhost-vdpa) to determine their supported maximum queue size. This
created direct dependencies and would require modifications for
every new backend that supports variable queue sizes.

To improve flexibility, a new max_tx_queue_size field is added
to the vhost_net structure. This allows each network backend
to advertise its supported maximum transmit queue size directly.

The virtio_net_max_tx_queue_size() function now retrieves the max
TX queue size from the vhost_net struct, if available and set.
Otherwise, it defaults to VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2025-07-14 13:27:09 +08:00
Laurent Vivier
7136352b40 vhost_net: Rename vhost_set_vring_enable() for clarity
This is a cosmetic change with no functional impact.

The function vhost_set_vring_enable() is specific to vhost_net and
is used outside of vhost_net.c (specifically, in
hw/net/virtio-net.c). To prevent confusion with other similarly named
vhost functions, such as the one found in cryptodev-vhost.c, it has
been renamed to vhost_net_set_vring_enable(). This clarifies that the
function belongs to the vhost_net module.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2025-07-14 13:27:09 +08:00
Akihiko Odaki
adda0ad56b virtio-net: Add queues for RSS during migration
virtio_net_pre_load_queues() inspects vdev->guest_features to tell if
VIRTIO_NET_F_RSS or VIRTIO_NET_F_MQ is enabled to infer the required
number of queues. This works for VIRTIO_NET_F_MQ but it doesn't for
VIRTIO_NET_F_RSS because only the lowest 32 bits of vdev->guest_features
is set at the point and VIRTIO_NET_F_RSS uses bit 60 while
VIRTIO_NET_F_MQ uses bit 22.

Instead of inferring the required number of queues from
vdev->guest_features, use the number loaded from the vm state. This
change also has a nice side effect to remove a duplicate peer queue
pair change by circumventing virtio_net_set_multiqueue().

Also update the comment in include/hw/virtio/virtio.h to prevent an
implementation of pre_load_queues() from refering to any fields being
loaded during migration by accident in the future.

Fixes: 8c49756825 ("virtio-net: Add only one queue pair when realizing")

Tested-by: Lei Yang <leiyang@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2025-07-14 13:26:52 +08:00
Daniel P. Berrangé
21cf31c51a hw/net/virtio-net: skip automatic zero-init of large arrays
The 'virtio_net_receive_rcu' method has three arrays with
VIRTQUEUE_MAX_SIZE elements, which are apprixmately 32k in
size used for copying data between guest and host. Skip the
automatic zero-init of these arrays to eliminate the
performance overhead in the I/O hot path.

The three arrays will be selectively initialized as required
when processing network buffers.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20250610123709.835102-22-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-06-12 13:40:16 -04:00
Haoqian He
bc85aae420 vhost-user: return failure if backend crash when live migration
Live migration should be terminated if the vhost-user backend crashes
before the migration completes.

Specifically, since the vhost device will be stopped when VM is stopped
before the end of the live migration, in current implementation if the
backend crashes, vhost-user device set_status() won't return failure,
live migration won't perceive the disconnection between QEMU and the
backend.

When the VM is migrated to the destination, the inflight IO will be
resubmitted, and if the IO was completed out of order before, it will
cause IO error.

To fix this issue:
1. Add the return value to set_status() for VirtioDeviceClass.
  a. For the vhost-user device, return failure when the backend crashes.
  b. For other virtio devices, always return 0.
2. Return failure if vhost_dev_stop() failed for vhost-user device.

If QEMU loses connection with the vhost-user backend, virtio set_status()
can return failure to the upper layer, migration_completion() can handle
the error, terminate the live migration, and restore the VM, so that
inflight IO can be completed normally.

Signed-off-by: Haoqian He <haoqian.he@smartx.com>
Message-Id: <20250416024729.3289157-4-haoqian.he@smartx.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-05-14 05:39:15 -04:00
Philippe Mathieu-Daudé
12d1a768bd qom: Have class_init() take a const data argument
Mechanical change using gsed, then style manually adapted
to pass checkpatch.pl script.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250424194905.82506-4-philmd@linaro.org>
2025-04-25 17:00:41 +02:00
Antoine Damhet
e28fbd1c52 Revert "virtio-net: Copy received header to buffer"
This reverts commit 7987d2be5a.

The goal was to remove the need to patch the (const) input buffer
with a recomputed UDP checksum by copying headers to a RW region and
inject the checksum there. The patch computed the checksum only from the
header fields (missing the rest of the payload) producing an invalid one
and making guests fail to acquire a DHCP lease.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2727
Cc: qemu-stable@nongnu.org
Signed-off-by: Antoine Damhet <adamhet@scaleway.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20250408145345.142947-1-adamhet@scaleway.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-04-15 08:24:48 -04:00
Akihiko Odaki
c17ad4b11b virtio-net: Fix num_buffers for version 1
The specification says the device MUST set num_buffers to 1 if
VIRTIO_NET_F_MRG_RXBUF has not been negotiated.

Fixes: df91055db5 ("virtio-net: enable virtio 1.0")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250108-buffers-v1-1-a0c85ff31aeb@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-02 12:44:47 -04:00
Stefan Hajnoczi
b69801dd6b virtio,pc,pci: features, fixes, cleanups
Features:
 
 SR-IOV emulation for pci
 virtio-mem-pci support for s390
 interleave support for cxl
 big endian support for vdpa svq
 new QAPI events for vhost-user
 
 Also vIOMMU reset order fixups are in.
 Fixes, cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4
 7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE
 fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS
 BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ
 twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg
 XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw=
 =tej8
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pc,pci: features, fixes, cleanups

Features:

SR-IOV emulation for pci
virtio-mem-pci support for s390
interleave support for cxl
big endian support for vdpa svq
new QAPI events for vhost-user

Also vIOMMU reset order fixups are in.
Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

 # -----BEGIN PGP SIGNATURE-----
 #
 # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo
 # YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4
 # 7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE
 # fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS
 # BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ
 # twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg
 # XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw=
 # =tej8
 # -----END PGP SIGNATURE-----
 # gpg: Signature made Fri 21 Feb 2025 20:21:31 HKT
 # gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
 # gpg:                issuer "mst@redhat.com"
 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
 # gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
 # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
 #      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (41 commits)
  docs/devel/reset: Document reset expectations for DMA and IOMMU
  hw/vfio/common: Add a trace point in vfio_reset_handler
  hw/arm/smmuv3: Move reset to exit phase
  hw/i386/intel-iommu: Migrate to 3-phase reset
  hw/virtio/virtio-iommu: Migrate to 3-phase reset
  vhost-user-snd: correct the calculation of config_size
  net: vhost-user: add QAPI events to report connection state
  hw/virtio/virtio-nsm: Respond with correct length
  vdpa: Fix endian bugs in shadow virtqueue
  MAINTAINERS: add more files to `vhost`
  cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()
  vhost-iova-tree: Update documentation
  vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees
  vhost-iova-tree: Implement an IOVA-only tree
  amd_iommu: Use correct bitmask to set capability BAR
  amd_iommu: Use correct DTE field for interrupt passthrough
  hw/virtio: reset virtio balloon stats on machine reset
  mem/cxl_type3: support 3, 6, 12 and 16 interleave ways
  hw/mem/cxl_type3: Ensure errp is set on realization failure
  hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-02-22 05:06:39 +08:00
Akihiko Odaki
bc82af6b0d hw/net: Fix NULL dereference with software RSS
When an eBPF program cannot be attached, virtio_net_load_ebpf() returns
false, and virtio_net_device_realize() enters the code path to handle
errors because of this, but it causes NULL dereference because no error
is generated.

Change virtio_net_load_ebpf() to return false only when a fatal error
occurred.

Fixes: b5900dff14 ("hw/net: report errors from failing to use eBPF RSS FDs")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-software-v1-1-9e5161b534d8@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-20 18:23:19 -05:00
Daniel P. Berrangé
407bc4bf90 qapi: Move include/qapi/qmp/ to include/qobject/
The general expectation is that header files should follow the same
file/path naming scheme as the corresponding source file. There are
various historical exceptions to this practice in QEMU, with one of
the most notable being the include/qapi/qmp/ directory. Most of the
headers there correspond to source files in qobject/.

This patch corrects most of that inconsistency by creating
include/qobject/ and moving the headers for qobject/ there.

This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h:
scripts/get_maintainer.pl now reports "QAPI" instead of "No
maintainers found".

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20241118151235.2665921-2-armbru@redhat.com>
[Rebased]
2025-02-10 15:33:16 +01:00
Laurent Vivier
60f543ad91 virtio-net: vhost-user: Implement internal migration
Add support of VHOST_USER_PROTOCOL_F_DEVICE_STATE in virtio-net
with vhost-user backend.

Cc: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20250115135044.799698-3-lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-15 13:07:34 -05:00
Stefan Hajnoczi
65cb7129f4 Accel & Exec patch queue
- Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander)
 - Add '-d invalid_mem' logging option (Zoltan)
 - Create QOM containers explicitly (Peter)
 - Rename sysemu/ -> system/ (Philippe)
 - Re-orderning of include/exec/ headers (Philippe)
   Move a lot of declarations from these legacy mixed bag headers:
     . "exec/cpu-all.h"
     . "exec/cpu-common.h"
     . "exec/cpu-defs.h"
     . "exec/exec-all.h"
     . "exec/translate-all"
   to these more specific ones:
     . "exec/page-protection.h"
     . "exec/translation-block.h"
     . "user/cpu_loop.h"
     . "user/guest-host.h"
     . "user/page-protection.h"
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t
 wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt
 KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K
 A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8
 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe///
 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r
 xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl
 VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay
 ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP
 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd
 +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6
 x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo=
 =cjz8
 -----END PGP SIGNATURE-----

Merge tag 'exec-20241220' of https://github.com/philmd/qemu into staging

Accel & Exec patch queue

- Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander)
- Add '-d invalid_mem' logging option (Zoltan)
- Create QOM containers explicitly (Peter)
- Rename sysemu/ -> system/ (Philippe)
- Re-orderning of include/exec/ headers (Philippe)
  Move a lot of declarations from these legacy mixed bag headers:
    . "exec/cpu-all.h"
    . "exec/cpu-common.h"
    . "exec/cpu-defs.h"
    . "exec/exec-all.h"
    . "exec/translate-all"
  to these more specific ones:
    . "exec/page-protection.h"
    . "exec/translation-block.h"
    . "user/cpu_loop.h"
    . "user/guest-host.h"
    . "user/page-protection.h"

 # -----BEGIN PGP SIGNATURE-----
 #
 # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t
 # wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt
 # KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K
 # A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8
 # 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe///
 # 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r
 # xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl
 # VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay
 # ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP
 # 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd
 # +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6
 # x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo=
 # =cjz8
 # -----END PGP SIGNATURE-----
 # gpg: Signature made Fri 20 Dec 2024 11:45:20 EST
 # gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
 # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown]
 # gpg: WARNING: This key is not certified with a trusted signature!
 # gpg:          There is no indication that the signature belongs to the owner.
 # Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* tag 'exec-20241220' of https://github.com/philmd/qemu: (59 commits)
  util/qemu-timer: fix indentation
  meson: Do not define CONFIG_DEVICES on user emulation
  system/accel-ops: Remove unnecessary 'exec/cpu-common.h' header
  system/numa: Remove unnecessary 'exec/cpu-common.h' header
  hw/xen: Remove unnecessary 'exec/cpu-common.h' header
  target/mips: Drop left-over comment about Jazz machine
  target/mips: Remove tswap() calls in semihosting uhi_fstat_cb()
  target/xtensa: Remove tswap() calls in semihosting simcall() helper
  accel/tcg: Un-inline translator_is_same_page()
  accel/tcg: Include missing 'exec/translation-block.h' header
  accel/tcg: Move tcg_cflags_has/set() to 'exec/translation-block.h'
  accel/tcg: Restrict curr_cflags() declaration to 'internal-common.h'
  qemu/coroutine: Include missing 'qemu/atomic.h' header
  exec/translation-block: Include missing 'qemu/atomic.h' header
  accel/tcg: Declare cpu_loop_exit_requested() in 'exec/cpu-common.h'
  exec/cpu-all: Include 'cpu.h' earlier so MMU_USER_IDX is always defined
  target/sparc: Move sparc_restore_state_to_opc() to cpu.c
  target/sparc: Uninline cpu_get_tb_cpu_state()
  target/loongarch: Declare loongarch_cpu_dump_state() locally
  user: Move various declarations out of 'exec/exec-all.h'
  ...

Conflicts:
	hw/char/riscv_htif.c
	hw/intc/riscv_aplic.c
	target/s390x/cpu.c

	Apply sysemu header path changes to not in the pull request.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-12-21 11:07:00 -05:00
Philippe Mathieu-Daudé
32cad1ffb8 include: Rename sysemu/ -> system/
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.

Files renamed manually then mechanical change using sed tool.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
2024-12-20 17:44:56 +01:00
Richard Henderson
5fcabe628b include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST
Now that all of the Property arrays are counted, we can remove
the terminator object from each array.  Update the assertions
in device_class_set_props to match.

With struct Property being 88 bytes, this was a rather large
form of terminator.  Saves 30k from qemu-system-aarch64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-19 19:36:37 +01:00
Richard Henderson
e732f00f32 hw/net: Constify all Property
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-12-15 12:55:41 -06:00
Akihiko Odaki
9379ea9db3 virtio-net: Add queues before loading them
Call virtio_net_set_multiqueue() to add queues before loading their
states. Otherwise the loaded queues will not have handlers and elements
in them will not be processed.

Cc: qemu-stable@nongnu.org
Fixes: 8c49756825 ("virtio-net: Add only one queue pair when realizing")
Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-11-26 11:56:44 +08:00
Akihiko Odaki
7987d2be5a virtio-net: Copy received header to buffer
receive_header() used to cast the const qualifier of the pointer to the
received packet away to modify the header. Avoid this by copying the
received header to buffer.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-11-25 14:00:51 +08:00
Akihiko Odaki
17437418c4 virtio-net: Initialize hash reporting values
The specification says hash_report should be set to
VIRTIO_NET_HASH_REPORT_NONE if VIRTIO_NET_F_HASH_REPORT is negotiated
but not configured with VIRTIO_NET_CTRL_MQ_RSS_CONFIG. However,
virtio_net_receive_rcu() instead wrote out the content of the extra_hdr
variable, which is not uninitialized in such a case.

Fix this by zeroing the extra_hdr.

Fixes: e22f0603fb ("virtio-net: reference implementation of hash report")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-11-25 14:00:06 +08:00
Akihiko Odaki
1981fa9d7d virtio-net: Fix hash reporting when the queue changes
virtio_net_process_rss() fills the values used for hash reporting, but
the values used to be thrown away with a recursive function call if
the queue changes after RSS. Avoid the function call to keep the values.

Fixes: a4c960eedc ("virtio-net: Do not write hashes to peer buffer")
Buglink: https://issues.redhat.com/browse/RHEL-59572
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-11-25 14:00:04 +08:00
Akihiko Odaki
162bdb8113 virtio-net: Do not check for the queue before RSS
virtio_net_can_receive() checks if the queue is ready, but RSS will
change the queue to use so, strictly speaking, we may still be able to
receive the packet even if the queue initially provided is not ready.
Perform RSS before virtio_net_can_receive() to cover such a case.

Fixes: 4474e37a5b ("virtio-net: implement RX RSS processing")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-11-25 14:00:04 +08:00
Akihiko Odaki
a8575f7fb2 virtio-net: Fix size check in dhclient workaround
work_around_broken_dhclient() accesses IP and UDP headers to detect
relevant packets and to calculate checksums, but it didn't check if
the packet has size sufficient to accommodate them, causing out-of-bound
access hazards. Fix this by correcting the size requirement.

Fixes: 1d41b0c1ec ("Work around dhclient brokenness")
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-11-25 14:00:04 +08:00
Peter Maydell
5814c08467 hw/net/virtio-net.c: Don't assume IP length field is aligned
In virtio-net.c we assume that the IP length field in the packet is
aligned, and we copy its address into a uint16_t* in the
VirtioNetRscUnit struct which we then dereference later.  This isn't
a safe assumption; it will also result in compilation failures if we
mark the ip_header struct as QEMU_PACKED because the compiler will
not let you take the address of an unaligned struct field.

Make the ip_plen field in VirtioNetRscUnit a void*, and make all the
places where we read or write through that pointer instead use some
new accessor functions read_unit_ip_len() and write_unit_ip_len()
which account for the pointer being potentially unaligned and also do
the network-byte-order conversion we were previously using htons() to
perform.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241114141619.806652-2-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-11-18 13:45:45 +01:00
Akihiko Odaki
cd76e8fcbe virtio-net: Avoid indirection_table_mask overflow
We computes indirections_len by adding 1 to indirection_table_mask, but
it may overflow indirection_table_mask is UINT16_MAX. Check if
indirection_table_mask is small enough before adding 1.

Fixes: 590790297c ("virtio-net: implement RSS configuration command")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-10-29 15:53:29 +08:00
Daniel P. Berrangé
ae311fb315 hw/net: improve tracing of eBPF RSS setup
This adds more trace events to key eBPF RSS setup operations, and
also distinguishes events from multiple NIC instances.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-10-28 14:37:25 +08:00
Daniel P. Berrangé
b5900dff14 hw/net: report errors from failing to use eBPF RSS FDs
If the user/mgmt app passed in a set of pre-opened FDs for eBPF RSS,
then it is expecting QEMU to use them. Any failure to do so must be
considered a fatal error and propagated back up the stack, otherwise
deployment mistakes will not be detectable in a prompt manner. When
not using pre-opened FDs, then eBPF RSS is tried on a "best effort"
basis only and thus fallback to software RSS is valid.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-10-28 14:37:25 +08:00
Daniel P. Berrangé
00b69f1d86 ebpf: add formal error reporting to all APIs
The eBPF code is currently reporting error messages through trace
events. Trace events are fine for debugging, but they are not to be
considered the primary error reporting mechanism, as their output
is inaccessible to callers.

This adds an "Error **errp" parameter to all methods which have
important error scenarios to report to the caller.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-10-28 14:37:25 +08:00
Daniel P. Berrangé
493a2403c2 hw/net: fix typo s/epbf/ebpf/ in virtio-net
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-10-28 14:37:25 +08:00
Hanna Czenczek
00adced5c3 virtio: Allow .get_vhost() without vhost_started
Historically, .get_vhost() was probably only called when
vdev->vhost_started is true.  However, we now decidedly want to call it
also when vhost_started is false, specifically so we can issue a reset
to the vhost back-end while device operation is stopped.

Some .get_vhost() implementations dereference some pointers (or return
offsets from them) that are probably guaranteed to be non-NULL when
vhost_started is true, but not necessarily otherwise.  This patch makes
all such implementations check all such pointers, returning NULL if any
is NULL.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20240723163941.48775-2-hreitz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-09-10 14:27:56 -04:00
Nicholas Piggin
44bc14fa1e virtio-net: Use virtual time for RSC timers
Receive coalescing is visible to the target machine, so its timers
should use virtual time like other timers in virtio-net, to be
compatible with record-replay.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20240813050638.446172-10-npiggin@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240813202329.1237572-18-alex.bennee@linaro.org>
2024-08-16 14:04:19 +01:00
Nicholas Piggin
a0bf401b8e virtio-net: Use replay_schedule_bh_event for bhs that affect machine state
The regular qemu_bh_schedule() calls result in non-deterministic
execution of the bh in record-replay mode, which causes replay failure.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20240813050638.446172-9-npiggin@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240813202329.1237572-17-alex.bennee@linaro.org>
2024-08-16 14:04:19 +01:00
thomas
f937309fbd virtio-net: Fix network stall at the host side waiting for kick
Patch 06b1297017 ("virtio-net: fix network stall under load")
added double-check to test whether the available buffer size
can satisfy the request or not, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. It will be lucky if the available buffer size becomes
okay after the double-check, then the host can send the packet
to the guest. If the buffer size still can't satisfy the request,
even if the guest has added some buffers, viritio-net would
stall at the host side forever.

The patch enables notification and checks whether the guest has
added some buffers since last check of available buffers when
the available buffers are insufficient. If no buffer is added,
return false, else recheck the available buffers in the loop.
If the available buffers are sufficient, disable notification
and return true.

Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
   to int, it returns an opaque that represents the shadow_avail_idx
   of the virtqueue on success, else -1 on error.
2. Add a new API: virtio_queue_enable_notification_and_check(),
   it takes an opaque as input arg which is returned from
   virtqueue_get_avail_bytes(). It enables notification firstly,
   then checks whether the guest has added some buffers since
   last check of available buffers or not by virtio_queue_poll(),
   return ture if yes.

The patch also reverts patch "06b12970174".

The case below can reproduce the stall.

                                       Guest 0
                                     +--------+
                                     | iperf  |
                    ---------------> | server |
         Host       |                +--------+
       +--------+   |                    ...
       | iperf  |----
       | client |----                  Guest n
       +--------+   |                +--------+
                    |                | iperf  |
                    ---------------> | server |
                                     +--------+

Boot many guests from qemu with virtio network:
 qemu ... -netdev tap,id=net_x \
    -device virtio-net-pci-non-transitional,\
    iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x

Each guest acts as iperf server with commands below:
 iperf3 -s -D -i 10 -p 8001
 iperf3 -s -D -i 10 -p 8002

The host as iperf client:
 iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000

After some time, the host loses connection to the guest,
the guest can send packet to the host, but can't receive
packet from the host.

It's more likely to happen if SWIOTLB is enabled in the guest,
allocating and freeing bounce buffer takes some CPU ticks,
copying from/to bounce buffer takes more CPU ticks, compared
with that there is no bounce buffer in the guest.
Once the rate of producing packets from the host approximates
the rate of receiveing packets in the guest, the guest would
loop in NAPI.

         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               |  need kick the host?
               |  NAPI continues
               v
         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               v
              ...           ...

On the other hand, the host fetches free buf from avail
ring, if the buf in the avail ring is not enough, the
host notifies the guest the event by writing the avail
idx read from avail ring to the event idx of used ring,
then the host goes to sleep, waiting for the kick signal
from the guest.

Once the guest finds the host is waiting for kick singal
(in virtqueue_kick_prepare_split()), it kicks the host.

The host may stall forever at the sequences below:

         Host                        Guest
     ------------                 -----------
 fetch buf, send packet           receive packet ---
         ...                          ...         |
 fetch buf, send packet             add buf       |
         ...                        add buf   virtnet_poll
    buf not enough      avail idx-> add buf       |
    read avail idx                  add buf       |
                                    add buf      ---
                                  receive packet ---
    write event idx                   ...         |
    wait for kick                   add buf   virtnet_poll
                                      ...         |
                                                 ---
                                 no more packet, exit NAPI

In the first loop of NAPI above, indicated in the range of
virtnet_poll above, the host is sending packets while the
guest is receiving packets and adding buffers.
 step 1: The buf is not enough, for example, a big packet
         needs 5 buf, but the available buf count is 3.
         The host read current avail idx.
 step 2: The guest adds some buf, then checks whether the
         host is waiting for kick signal, not at this time.
         The used ring is not empty, the guest continues
         the second loop of NAPI.
 step 3: The host writes the avail idx read from avail
         ring to used ring as event idx via
         virtio_queue_set_notification(q->rx_vq, 1).
 step 4: At the end of the second loop of NAPI, recheck
         whether kick is needed, as the event idx in the
         used ring written by the host is beyound the
         range of kick condition, the guest will not
         send kick signal to the host.

Fixes: 06b1297017 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-08-02 11:09:52 +08:00
Akihiko Odaki
f1595ceb9a virtio-net: Ensure queue index fits with RSS
Ensure the queue index points to a valid queue when software RSS
enabled. The new calculation matches with the behavior of Linux's TAP
device with the RSS eBPF program.

Fixes: 4474e37a5b ("virtio-net: implement RX RSS processing")
Reported-by: Zhibin Hu <huzhibin5@huawei.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-08-02 11:08:44 +08:00
Dmitry Frolov
d4f471eb7e hw/net/virtio-net.c: fix crash in iov_copy()
A crash found while fuzzing device virtio-net-socket-check-used.
Assertion "offset == 0" in iov_copy() fails if less than guest_hdr_len bytes
were transmited.

Signed-off-by: Dmitry Frolov <frolov@swemel.ru>
Message-Id: <20240613143529.602591-2-frolov@swemel.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01 17:16:05 -04:00
Alexey Dobriyan
2c3e4e2de6 virtio-net: drop too short packets early
Reproducer from https://gitlab.com/qemu-project/qemu/-/issues/1451
creates small packet (1 segment, len = 10 == n->guest_hdr_len),
then destroys queue.

"if (n->host_hdr_len != n->guest_hdr_len)" is triggered, if body creates
zero length/zero segment packet as there is nothing after guest header.

qemu_sendv_packet_async() tries to send it.

slirp discards it because it is smaller than Ethernet header,
but returns 0 because tx hooks are supposed to return total length of data.

0 is propagated upwards and is interpreted as "packet has been sent"
which is terrible because queue is being destroyed, nobody is waiting for TX
to complete and assert it triggered.

Fix is discard such empty packets instead of sending them.

Length 1 packets will go via different codepath:

	virtqueue_push(q->tx_vq, elem, 0);
	virtio_notify(vdev, q->tx_vq);
	g_free(elem);

and aren't problematic.

Signed-off-by: Alexey Dobriyan <adobriyan@yandex-team.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
a4c960eedc virtio-net: Do not write hashes to peer buffer
The peer buffer is qualified with const and not meant to be modified.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
13d40aa88b virtio-net: Always set populate_hash
The member is not cleared during reset so may have a stale value.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
0e07198ea3 virtio-net: Unify the logic to update NIC state for RSS
The code to attach or detach the eBPF program to RSS were duplicated so
unify them into one function to save some code.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
cef776c03a virtio-net: Disable RSS on reset
RSS is disabled by default.

Fixes: 590790297c ("virtio-net: implement RSS configuration command")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
942f420e5c virtio-net: Shrink header byte swapping buffer
Byte swapping is only performed for the part of header shared with the
legacy standard and the buffer only needs to cover it.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
ad57f700f4 virtio-net: Copy header only when necessary
The copied header is only used for byte swapping.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
8c49756825 virtio-net: Add only one queue pair when realizing
Multiqueue usage is not negotiated yet when realizing. If more than
one queue is added and the guest never requests to enable multiqueue,
the extra queues will not be deleted when unrealizing and leak.

Fixes: f9d6dbf0bf ("virtio-net: remove virtio queues if the guest doesn't support multiqueue")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
283be5966e virtio-net: Do not propagate ebpf-rss-fds errors
Propagating ebpf-rss-fds errors has several problems.

First, it makes device realization fail and disables the fallback to the
conventional eBPF loading.

Second, it leaks memory by making device realization fail without
freeing memory already allocated.

Third, the convention is to set an error when a function returns false,
but virtio_net_load_ebpf_fds() and virtio_net_load_ebpf() returns false
without setting an error, which is confusing.

Remove the propagation to fix these problems.

Fixes: 0524ea0510 ("ebpf: Added eBPF initialization by fds.")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
4b52d63249 tap: Remove qemu_using_vnet_hdr()
Since qemu_set_vnet_hdr_len() is always called when
qemu_using_vnet_hdr() is called, we can merge them and save some code.

For consistency, express that the virtio-net header is not in use by
returning 0 with qemu_get_vnet_hdr_len() instead of having a dedicated
function, qemu_get_using_vnet_hdr().

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:25 +08:00