SCSIDevice keeps track of in-flight requests for device reset and Task
Management Functions (TMFs). The request list requires protection so
that multi-threaded SCSI emulation can be implemented in commits that
follow.
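
As a rough illustration of the idea (in Python for brevity; QEMU itself is C, and the class and method names here are hypothetical), a request list guarded by a lock might look like this:

```python
import threading


class Device:
    """Hypothetical stand-in for SCSIDevice; not QEMU's actual structure."""

    def __init__(self):
        self._requests = []                     # in-flight requests
        self._requests_lock = threading.Lock()  # protects self._requests

    def enqueue(self, req):
        with self._requests_lock:
            self._requests.append(req)

    def complete(self, req):
        with self._requests_lock:
            self._requests.remove(req)

    def reset(self):
        """Cancel everything in flight, as a device reset or TMF would."""
        with self._requests_lock:
            cancelled = list(self._requests)
            self._requests.clear()
        return cancelled
```

Taking a snapshot under the lock lets reset() act on a consistent set of requests even while other threads keep submitting.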
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-5-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Until now, a SCSIDevice's I/O requests have run in a single AioContext.
In order to support multiple IOThreads it will be necessary to move to
the concept of a per-SCSIRequest AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-4-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the past a single AioContext was used for block I/O and it was
fetched using blk_get_aio_context(). Nowadays the block layer supports
running I/O from any AioContext and multiple AioContexts at the same
time. Remove the dma_blk_io() AioContext argument and use the current
AioContext instead.
This makes calling the function easier and enables multiple IOThreads to
use dma_blk_io() concurrently for the same block device.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-3-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 71544d30a6 ("scsi: push request restart to SCSIDevice") removed
the only user of SCSIDiskState->bh.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-2-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qsd-migrate is currently only working for raw, qcow2 and qed.
Other formats are failing, e.g. because they don't support migration.
Thus let's limit this test to the three usable formats now.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250224214058.205889-1-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
aio_dispatch_handler() adds handlers to ctx->poll_aio_handlers if
polling should be enabled. If we call adjust_polling_time() for all
polling handlers before this, new polling handlers are still left at
poll->ns = 0 and polling is only actually enabled after the next event.
Move the adjust_polling_time() call after aio_dispatch_handler().
This fixes test-nested-aio-poll, which expects that polling becomes
effective the first time around.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311141912.135657-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Adaptive polling has a big problem: It doesn't consider that an event
loop can wait for many different events that may have very different
typical latencies.
For example, think of a guest that tends to send a new I/O request soon
after the previous I/O request completes, but the storage on the host is
rather slow. In this case, getting the new request from guest quickly
means that polling is enabled, but the next thing is performing the I/O
request on the backend, which is slow and disables polling again for the
next guest request. This means that in such a scenario, polling could
help for every other event, but is only ever enabled when it can't
succeed.
In order to fix this, keep a separate AioPolledEvent for each
AioHandler. We will then know that the backend file descriptor always
has a high latency and isn't worth polling for, but we also know that
the guest is always fast and we should poll for it. This solves at least
half of the problem: we can now keep polling for those cases where it
makes sense and get the improved performance from it.
Since the event loop doesn't know which event will be next, we still do
some unnecessary polling while we're waiting for the slow disk. I made
some attempts to be more clever than just randomly growing and shrinking
the polling time, and even to let callers be explicit about when they
expect a new event, but so far this hasn't resulted in improved
performance or even caused performance regressions. For now, let's just
fix the part that is easy enough to fix, we can revisit the rest later.
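
A simplified model of per-handler polling state, in Python for brevity. The class name, the thresholds and the grow factor are assumptions made for this sketch, not the values QEMU uses:

```python
class PolledEvent:
    """Hypothetical per-handler polling state (one per event source)."""

    def __init__(self, max_ns=32000, start_ns=4000):
        self.ns = 0            # current polling window; 0 means "don't poll"
        self.max_ns = max_ns
        self.start_ns = start_ns

    def adjust(self, wait_ns):
        """Update the window after an event arrived wait_ns after waiting began."""
        if wait_ns <= self.ns:
            return             # polling would have caught it: keep the window
        if wait_ns > self.max_ns:
            self.ns = 0        # too slow to be worth polling for
        else:
            # Grow the window, starting at start_ns and capped at max_ns.
            self.ns = min(max(self.ns * 2, self.start_ns), self.max_ns)


guest = PolledEvent()     # fast: guest submits a new request quickly
backend = PolledEvent()   # slow: host storage completion
for _ in range(4):
    guest.adjust(3000)
    backend.adjust(500000)
```

With separate state, the slow backend no longer resets the window that the fast guest events have earned. With a single shared PolledEvent, each backend completion would set ns back to 0.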
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-5-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As a preparation for having multiple adaptive polling states per
AioContext, move the 'ns' field into a separate struct.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-4-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For block drivers that don't advertise FUA support, we already call
bdrv_co_flush(), which considers BDRV_O_NO_FLUSH. However, drivers that
do support FUA still see the FUA flag with BDRV_O_NO_FLUSH and get the
associated performance penalty that cache.no-flush=on was supposed to
avoid.
Clear FUA for write requests if BDRV_O_NO_FLUSH is set.
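
The rule can be sketched as follows, in Python for brevity. The flag names and values are placeholders chosen for this example and do not match QEMU's real constants:

```python
# Flag values are placeholders for illustration, not QEMU's real constants.
O_NO_FLUSH = 0x1   # open flag: cache.no-flush=on
REQ_FUA = 0x2      # per-request flag: force unit access


def effective_write_flags(open_flags, req_flags):
    """Drop FUA from a write request when the node was opened with no-flush."""
    if open_flags & O_NO_FLUSH:
        req_flags &= ~REQ_FUA
    return req_flags
```

Other request flags pass through untouched; only the FUA bit is masked.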
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Until now, FUA was always emulated with a separate flush after the write
for file-posix. The overhead of processing a second request can reduce
performance significantly for a guest disk that has disabled the write
cache, especially if the host disk is already write through, too, and
the flush isn't actually doing anything.
Advertise support for REQ_FUA in write requests and implement it for
Linux AIO and io_uring using the RWF_DSYNC flag for write requests. The
thread pool still performs a separate fdatasync() call. This can be
improved later by using the pwritev2() syscall if available.
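
To illustrate the difference between a native FUA write and the emulated one, here is a small Python sketch (Linux only). The function name is made up for this example; os.pwritev() with os.RWF_DSYNC is Python's wrapper around the pwritev2() syscall:

```python
import errno
import os


def write_fua(fd, data, offset):
    """Write data at offset and return only once it is on stable storage."""
    if hasattr(os, "pwritev") and hasattr(os, "RWF_DSYNC"):
        try:
            # One syscall: the write itself carries the sync requirement.
            return os.pwritev(fd, [data], offset, os.RWF_DSYNC)
        except OSError as e:
            # Fall back only if the flag itself is not supported.
            if e.errno not in (errno.EINVAL, errno.ENOSYS, errno.EOPNOTSUPP):
                raise
    # Emulation: a plain write followed by a separate flush.
    written = os.pwrite(fd, data, offset)
    os.fdatasync(fd)
    return written
```

The fallback path corresponds to what the thread pool still does: two operations instead of one.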
As an example, this is how fio numbers can be improved in some scenarios
with this patch (all using virtio-blk with cache=directsync on an nvme
block device for the VM, fio with ioengine=libaio,direct=1,sync=1):
                              | old           | with FUA support
------------------------------+---------------+-------------------
bs=4k, iodepth=1, numjobs=1   |  45.6k iops   |  56.1k iops
bs=4k, iodepth=1, numjobs=16  | 183.3k iops   | 236.0k iops
bs=4k, iodepth=16, numjobs=1  | 258.4k iops   | 311.1k iops
However, not all scenarios are clear wins. On another slower disk I saw
little to no improvement. In fact, in two corner case scenarios, I even
observed a regression, which I however consider acceptable:
1. On slow host disks in a write through cache mode, when the guest is
using virtio-blk in a separate iothread so that polling can be
enabled, and each completion is quickly followed up with a new
request (so that polling gets it), it can happen that enabling FUA
makes things slower - the additional very fast no-op flush we used to
have gave the adaptive polling algorithm a success so that it kept
polling. Without it, we only have the slow write request, which
disables polling. This is a problem in the polling algorithm that
will be fixed later in this series.
2. With a high queue depth, it can be beneficial to have flush requests
for another reason: The optimisation in bdrv_co_flush() that flushes
only once per write generation acts as a synchronisation mechanism
that lets all requests complete at the same time. This can result in
better batching and if the disk is very fast (I only saw this with a
null_blk backend), this can make up for the overhead of the flush and
improve throughput. In theory, we could optionally introduce a
similar artificial latency in the normal completion path to achieve
the same kind of completion batching. This is not implemented in this
series.
Compatibility is not a concern for the kernel side of io_uring, it has
supported RWF_DSYNC from the start. However, io_uring_prep_writev2() is
not available before liburing 2.2.
Linux AIO started supporting it in Linux 4.13 and libaio 0.3.111. The
kernel is not a problem for any supported build platform, so it's not
necessary to add runtime checks. However, openSUSE is still stuck with
an older libaio version that would break the build.
We must detect the presence of the writev2 functions in the user space
libraries at build time to avoid build failures.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-2-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I could have sworn I had this in a previous iteration of the patches
but I guess it got lost in a re-base. As we are going to call
vulkaninfo to probe for "bad" drivers we need to skip if the binary
isn't available.
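
A generic way to express such a skip with the standard library is shown below. This is only a sketch: the helper and the test class are hypothetical, and the functional test framework may provide its own decorator for this.

```python
import shutil
import unittest


def have_command(name):
    """Return True if an executable called name is found on PATH."""
    return shutil.which(name) is not None


class VulkanProbeTest(unittest.TestCase):
    # Hypothetical test; the real one lives in tests/functional.
    @unittest.skipUnless(have_command("vulkaninfo"), "vulkaninfo not installed")
    def test_probe_drivers(self):
        pass  # would run vulkaninfo here and inspect its output
```

The test is reported as skipped rather than failed on hosts without the binary.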
Fixes: 9f7e493d11 (tests/functional: skip vulkan tests with nVidia)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250312190314.1632357-1-alex.bennee@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Merge tag 'pull-vfio-20250311' of https://github.com/legoater/qemu into staging
vfio queue:
* Fixed endianness of VFIO device state packets
* Improved IGD passthrough support with legacy mode
* Improved build
* Added support for old AMD GPUs (x550)
* Updated property documentation
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmfQfQcACgkQUaNDx8/7
# 7KEUNw/+PjFpHrz5muQ8itkbyd36eJJdcxCl+9IPIWfnUfB582epkLcgvWyswGUo
# krFTregoRG0PKtgZDtv95owGtVJOgK6XYFadGHiYkvvsb41twOYsP7/SuI+KMiEv
# IDFLMvCTyorSIIoEF8i2EexfGPRV1VoWwvBoHgRRmYlzwzXnufjABpoZ0a25DTye
# DQ4yhSfqoIh1gOcdL9tPictnZg9OxKr2ePXNdrtymtEIhg3ZobD3Jd8J4WCcsfKT
# fxxBO5NsGgA8oM7i02fYN9kgMwqTnVhSAu1wq9PXsbrnNXam+trywAWSO6CjL+rV
# ++STWNSrRoHzuotRBr7BzrTpTFyQyfwBWqUT5L4NlhgXB3Xybk+M6Zj08Yva8pjE
# w78JQKvKp54gU34AWBW0/J6+u3v+iE8l1Eywx6xueF9Q+YSUDeW9B1LDdjFJryhF
# d8j3J+vuglbdsp05D+tVErf5cqFvFDfrjTkXkZNtmx7wky45XS9ZvNazYW1KI3f9
# bg8Wjb7ZujuvxpSjycPRZzdKa8kqSgSZg7fg91Wimiy1Iqe3SZVVWNchLYiPp8Dm
# nXMfOEpVHQZ1vzeo7dVWyxu9Y1ujgvUQy8kMa9q2W2S7HQ5Sna79n7eMVJxqZQ4G
# m0ETFToOcPPOnZBWgqNOSUlSQncFuIVgNTDvycQ9dMhGorYcBDI=
# =Vh0m
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 12 Mar 2025 02:12:23 HKT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full]
# gpg: aka "Cédric Le Goater <clg@kaod.org>" [full]
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-vfio-20250311' of https://github.com/legoater/qemu: (21 commits)
vfio/pci: Drop debug commentary from x-device-dirty-page-tracking
vfio/pci-quirks: Exclude non-ioport BAR from ATI quirk
hw/vfio: Compile display.c once
hw/vfio: Compile iommufd.c once
hw/vfio: Compile more objects once
hw/vfio: Compile some common objects once
hw/vfio/common: Get target page size using runtime helpers
hw/vfio/common: Include missing 'system/tcg.h' header
hw/vfio/spapr: Do not include <linux/kvm.h>
system: Declare qemu_[min/max]rampagesize() in 'system/hostmem.h'
vfio/migration: Use BE byte order for device state wire packets
vfio/igd: Fix broken KVMGT OpRegion support
vfio/igd: Introduce x-igd-lpc option for LPC bridge ID quirk
vfio/igd: Handle x-igd-opregion option in config quirk
vfio/igd: Decouple common quirks from legacy mode
vfio/igd: Refactor vfio_probe_igd_bar4_quirk into pci config quirk
vfio/pci: Add placeholder for device-specific config space quirks
vfio/igd: Move LPC bridge initialization to a separate function
vfio/igd: Consolidate OpRegion initialization into a single function
vfio/igd: Do not include GTT stolen size in etc/igd-bdsm-size
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Assets are uniquely identified by human-readable-ish url, so make an
AssetError exception class that prints url with error message.
A property 'transient' is used to capture whether the client may retry
or try again later, or if it is a serious and likely permanent error.
This is used to retain the existing behaviour of treating HTTP errors
other than 404 as 'transient' and not causing precache step to fail.
Additionally, partial-downloads and stale asset caches that fail to
resolve after the retry limit are now treated as transient and do not
cause precache step to fail.
For background: The NetBSD archive is, at the time of writing, failing
with short transfer. Retrying the fetch at that position (as wget does)
results in a "503 backend unavailable" error. We would like to get that
error code directly, but I have not found a way to do that with urllib,
so treating the short-copy as a transient failure covers that case (and
seems like a reasonable way to handle it in general).
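
A minimal sketch of such an exception class and of a caller that honours the 'transient' property. The constructor signature and the precache() helper are assumptions made for this illustration, not the code in tests/functional:

```python
class AssetError(Exception):
    """Sketch of an asset failure that carries the URL and a retry hint."""

    def __init__(self, url, msg, transient=False):
        super().__init__(f"{url}: {msg}")
        self.url = url
        self.transient = transient


def precache(fetch, url):
    """Return True if the asset was cached, False if skipped as transient."""
    try:
        fetch(url)
        return True
    except AssetError as e:
        if e.transient:
            return False   # server hiccup: do not fail the precache step
        raise
```

Permanent errors such as a 404 still propagate, so a wrong URL is noticed immediately.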
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250312130002.945508-4-npiggin@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
If the server provides a Content-Length header, use that to verify the
size of the downloaded file. This catches cases where the connection
terminates early, and gives the opportunity to retry. Without this, the
checksum will likely mismatch and fail without retry.
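
The check itself is small. A sketch, with a hypothetical function name and a plain mapping standing in for the response headers:

```python
def check_download_size(headers, path_size):
    """Return True if the downloaded size matches Content-Length, if any.

    headers is a mapping of response headers; path_size is the number of
    bytes actually written to disk. Both names are illustrative.
    """
    expected = headers.get("Content-Length")
    if expected is None:
        return True              # server gave no length: nothing to verify
    return int(expected) == path_size
```

A False result can be treated like any other failed attempt and fed back into the retry loop.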
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250312130002.945508-3-npiggin@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Currently the fetch code does not fail gracefully when the retry limit is
exceeded: it just falls through the loop with no file, which ends up
hitting other errors.
Add a check for non-existing file, which indicates the retry limit was
exceeded.
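
The shape of the fix, as a sketch. The function name, the callback and the exception type are assumptions for this example:

```python
import os


def fetch_with_retries(download, dest, retries=3):
    """Call download(dest) up to retries times; fail loudly if no file results."""
    for _ in range(retries):
        try:
            download(dest)
            break
        except OSError:
            continue           # transient failure: try again
    if not os.path.exists(dest):
        # Falling out of the loop without a file means every attempt failed.
        raise RuntimeError(f"{dest}: retry limit exceeded")
    return dest
```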
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250312130002.945508-2-npiggin@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The tests have been converted to the functional framework, so
we should not talk about Avocado here anymore.
Fixes: f7d6b77220 ("tests/functional: Convert BananaPi tests to the functional framework")
Fixes: 380f7268b7 ("tests/functional: Convert the OrangePi tests to the functional framework")
Fixes: 4c0a2df81c ("tests/functional: Convert some tests that download files via fetch_asset()")
Message-ID: <20250311160847.388670-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
On my machine the arm_replay test takes over 2 minutes to run
in a config with Rust enabled and debug enabled:
  $ time (cd build/rust ; PYTHONPATH=../../python:../../tests/functional \
      QEMU_TEST_QEMU_BINARY=./qemu-system-arm ./pyvenv/bin/python3 \
      ../../tests/functional/test_arm_replay.py)
  TAP version 13
  ok 1 test_arm_replay.ArmReplay.test_cubieboard
  ok 2 test_arm_replay.ArmReplay.test_vexpressa9
  ok 3 test_arm_replay.ArmReplay.test_virt
  1..3

  real    2m16.564s
  user    2m13.461s
  sys     0m3.523s
Bump up the timeout to 4 minutes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250310102830.3752440-1-peter.maydell@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When commit 72cdd672e1 extended the ppc64 e500 test to add network
support, it forgot to require the 'user' netdev backend. Fix that.
Fixes: 72cdd672e1 ("tests/functional: Replace the ppc64 e500 advent calendar test")
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250308071328.193694-1-clg@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This was missed at the time.
Fixes: 812b31d3f9 ("configs: rename default-configs to configs and reorganise")
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250306174113.427116-1-groug@kaod.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
All instances of TYPE_IMX_USDHC set vendor=SDHCI_VENDOR_IMX.
No need to special-case it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20250308213640.13138-3-philmd@linaro.org>
Allows SYNDBG definitions to be available for common compilation units.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250307215623.524987-5-pierrick.bouvier@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Rather than checking ACPI availability at compile time by
checking the CONFIG_ACPI definition from CONFIG_DEVICES,
check at runtime via acpi_builtin().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20250307223949.54040-5-philmd@linaro.org>
Define acpi_tables / acpi_tables_len stubs, then replace the
compile-time CONFIG_ACPI check in fw_cfg.c by a runtime one.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20250307223949.54040-4-philmd@linaro.org>
acpi_builtin() can be used to check at runtime whether
the ACPI subsystem is built in a qemu-system binary.
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250307223949.54040-3-philmd@linaro.org>
qemu_arch_available() is a bit simpler to understand while
reviewing than the undocumented arch_type variable.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250305005225.95051-5-philmd@linaro.org>
We shouldn't use target specific globals for machine properties.
These ones could be desugarized, as explained in [*]. While that is
certainly doable, it is neither trivial nor my priority for now. Just move
them to a different file to clarify they are *globals*, like the
generic globals residing in system/globals.c.
Since arch_init.c was introduced using the MIT license (see commit
ad96090a01), retain the same license for the new globals-target.c
file.
[*] https://lore.kernel.org/qemu-devel/e514d6db-781d-4afe-b057-9046c70044dc@redhat.com/
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250305005225.95051-2-philmd@linaro.org>
There is no TARGET_ARM_64 definition. Luckily enough,
when TARGET_AARCH64 is defined, TARGET_ARM also is.
Fixes: 733766cd37 ("hw/arm: introduce xenpvh machine")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250305153929.43687-2-philmd@linaro.org>
For accesses to the 91c111 data register, the address within the
packet's data frame is determined by a combination of the pointer
register and the offset used to access the data register, so that you
can access data at effectively wider than byte width. The pointer
register's pointer field is 11 bits wide, which is exactly the size
to index a 2048-byte data frame.
We weren't quite getting the logic right for ensuring that we end up
with a pointer value to use in the s->data[][] array that isn't out
of bounds:
* we correctly mask when getting the initial pointer value
* for the "autoincrement the pointer register" case, we
correctly mask after adding 1 so that the pointer register
wraps back around at the 2048 byte mark
* but for the non-autoincrement case where we have to add the
low 2 bits of the data register offset, we don't account
for the possibility that the pointer register is 0x7ff
and the addition should wrap
Fix this bug by factoring out the "get the p value to use as an array
index" into a function, making it use FIELD macro names rather than
hard-coded constants, and having a utility function that does "add a
value and wrap it" that we can use both for the "autoincrement" and
"add the offset bits" codepaths.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2758
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250228191652.1957208-1-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
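
The wrap-around logic of the 91c111 data register fix above can be modelled in a few lines of Python. This is a simplified sketch with hypothetical names that only tracks the 11-bit pointer field, not the device code itself:

```python
PTR_BITS = 11                      # pointer field width
PTR_MASK = (1 << PTR_BITS) - 1     # 0x7ff, indexes a 2048-byte frame


def ptr_add_wrap(ptr, delta):
    """Add delta to a pointer value, wrapping at the 2048-byte mark."""
    return (ptr + delta) & PTR_MASK


def data_index(ptr, offset, autoinc):
    """Return (array index, new pointer field) for a data register access.

    With autoincrement the pointer field is bumped by one after use;
    otherwise the low 2 bits of the register offset are added to it.
    """
    p = ptr & PTR_MASK
    if autoinc:
        return p, ptr_add_wrap(p, 1)
    return ptr_add_wrap(p, offset & 3), p
```

Using the same helper for both paths means a pointer of 0x7ff plus an offset can no longer index past the end of the frame.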
Now we have a constant for the maximum packet size, we can use it
to replace various hardcoded 2048 values.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250228174802.1945417-4-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
When the smc91c111 transmits a packet, it must read a control byte
which is at the end of the data area and CRC. However, we don't
sanitize the length field in the packet buffer, so if the guest sets
the length field to something large we will try to read past the end
of the packet data buffer when we access the control byte.
As usual, the datasheet says nothing about the behaviour of the
hardware if the guest misprograms it in this way. It says only that
the maximum valid length is 2048 bytes. We choose to log the guest
error and silently drop the packet.
This requires us to factor out the "mark the tx packet as complete"
logic, so we can call it for this "drop packet" case as well as at
the end of the loop when we send a valid packet.
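
A sketch of the length check, in Python for brevity. The buffer layout and all names are assumptions for this illustration: bytes 2-3 are taken to hold a little-endian byte count that covers a 4-byte header, the data area, and a 2-byte trailer whose last byte is the control byte.

```python
MAX_PACKET_SIZE = 2048   # maximum valid length


def tx_packet(buf, send, log):
    """Transmit the frame in buf, or drop it if its length field is bogus."""
    length = buf[2] | (buf[3] << 8)
    if length < 6 or length > MAX_PACKET_SIZE or length > len(buf):
        log("guest error: invalid tx length %d" % length)
        return False                 # drop the packet, apart from the log
    control = buf[length - 1]        # safe: length was validated first
    send(bytes(buf[4:length - 2]), control)
    return True
```

The control byte is only read after the guest-supplied length has been bounded, which is the point of the fix.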
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2742
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250228174802.1945417-3-peter.maydell@linaro.org>
[PMD: Update smc91c111_do_tx() as len > MAX_PACKET_SIZE]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The smc91c111 uses packet numbers as an index into its internal
s->data[][] array. Valid packet numbers are between 0 and 3, but
the code does not generally check this, and there are various
places where the guest can hand us an arbitrary packet number
and cause an out-of-bounds access to the data array.
Add validation of packet numbers. The datasheet is not very
helpful about how guest errors like this should be handled:
it says nothing on the subject, and none of the documented
error conditions are relevant. We choose to log the situation
with LOG_GUEST_ERROR and silently ignore the attempted operation.
In the places where we are about to access the data[][] array
using a packet number and we know the number is valid because
we got it from somewhere that has already validated, we add
an assert() to document that belief.
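
The two kinds of check can be sketched like this, in Python for brevity. The class and its methods are hypothetical and only model the pattern: validate and log at the guest-facing entry point, assert in internal helpers.

```python
NUM_PACKETS = 4   # valid packet numbers are 0..3


def packetnum_is_valid(n):
    return 0 <= n < NUM_PACKETS


class Nic:
    """Hypothetical model of a device with a fixed array of packet buffers."""

    def __init__(self):
        self.data = [bytearray(2048) for _ in range(NUM_PACKETS)]
        self.guest_errors = []

    def release_packet(self, n):
        """Guest-supplied number: validate, log and ignore if out of range."""
        if not packetnum_is_valid(n):
            self.guest_errors.append("invalid packet number %d" % n)
            return
        self._clear(n)

    def _clear(self, n):
        # Internal caller has already validated n; document that belief.
        assert packetnum_is_valid(n)
        self.data[n][:] = bytes(len(self.data[n]))
```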
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250228174802.1945417-2-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The implementation just allows Linux to determine date and time.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20250223114708.1780-19-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The interrupt enable registers are not reset to 0 on Freescale eSDHC
but some bits are enabled on reset. At least some U-Boot versions seem
to expect this and not initialise these registers before expecting
interrupts. Use existing vendor property for Freescale eSDHC and set
the reset value of the interrupt registers to match Freescale
documentation.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-ID: <20250210160329.DDA7F4E600E@zero.eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The intent behind the x-device-dirty-page-tracking option is twofold:
1) development/testing in the presence of VFs with VF dirty page tracking
2) deliberately choosing platform dirty tracker over the VF one.
The scenario in item 2) is useful when the VF dirty tracker is not as fast
as the IOMMU one, when it has limitations (e.g. the number of trackers is
limited, or the aggregated address space under tracking is limited), for
efficiency/scalability (e.g. 1 pagetable in the IOMMU dirty tracker to scan
vs N VFs), or just for troubleshooting. Given item 2), the option is not
restricted to debugging, hence drop the debug parenthesis from the option
description.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250311174807.79825-1-joao.m.martins@oracle.com
[ clg: Fixed subject spelling ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The ATI BAR4 quirk is targeting an ioport BAR. Older devices may
have a BAR4 which is not an ioport, causing a segfault here. Test
the BAR type to skip these devices.
Similar to
"8f419c5b: vfio/pci-quirks: Exclude non-ioport BAR from NVIDIA quirk"
Untested, as I don't have the card to test.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2856
Signed-off-by: Vasilis Liaskovitis <vliaskovitis@suse.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250310235833.41026-1-vliaskovitis@suse.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
display.c doesn't rely on target specific definitions,
move it to system_ss[] to build it once.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-8-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-9-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Removing the unused "exec/ram_addr.h" header allows compiling
iommufd.c once for all targets.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-6-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-8-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
These files depend on the VFIO symbol in their Kconfig
definition. They don't rely on target specific definitions,
move them to system_ss[] to build them once.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-5-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-7-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Prefer runtime helpers to get target page size.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250305153929.43687-3-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-5-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
<linux/kvm.h> is already included by "system/kvm.h" in the next line.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250307180337.14811-3-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-3-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Both qemu_minrampagesize() and qemu_maxrampagesize() are
related to host memory backends, having the following call
stack:
qemu_minrampagesize()
-> find_min_backend_pagesize()
-> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)
qemu_maxrampagesize()
-> find_max_backend_pagesize()
-> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)
Having TYPE_MEMORY_BACKEND defined in "system/hostmem.h":
include/system/hostmem.h:23:#define TYPE_MEMORY_BACKEND "memory-backend"
Move their prototype declaration to "system/hostmem.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-7-philmd@linaro.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-2-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Wire data commonly uses BE byte order (including in the existing migration
protocol), so use it for VFIO device state packets as well.
This will allow VFIO multifd device state transfer between hosts with
different endianness.
Although currently there is no such use case, it's good to have it now
for completeness.
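
For illustration, this is what a fixed big-endian header looks like in Python. The field list is hypothetical and chosen only for the example; it is not the actual VFIO packet layout:

```python
import struct

# Hypothetical header layout for illustration: four unsigned 32-bit fields.
# ">" selects big-endian regardless of the host's native byte order.
HEADER = struct.Struct(">IIII")


def pack_header(version, idx, flags, length):
    return HEADER.pack(version, idx, flags, length)


def unpack_header(raw):
    return HEADER.unpack(raw[:HEADER.size])
```

Because both sides convert to and from a fixed byte order, the bytes on the wire are the same whether the sender is little-endian or big-endian.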
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/dcfc04cc1a50655650dbac8398e2742ada84ee39.1741611079.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The KVMGT/GVT-g vGPU also exposes OpRegion. But unlike IGD passthrough,
it only needs the OpRegion quirk. A previous change that moved
x-igd-opregion handling to the config quirk broke KVMGT functionality, as
it brought extra checks and applied other quirks. Here we check whether
the device is mdev (KVMGT) or not (passthrough), and then apply the
corresponding quirks.
As before, users must manually specify x-igd-opregion=on to enable it
on KVMGT devices. In the future, we may check the VID/DID and enable
OpRegion automatically.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-11-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>