Commit Graph

120494 Commits

Author SHA1 Message Date
Hanna Czenczek
dc087a9859 vhost: Do not abort on log-start error
Commit 3688fec892 ("memory: Add Error** argument to .log_global_start()
handler") enabled vhost_log_global_start() to return a proper error, but
did not change it to do so; instead, it still aborts the whole process
on error.

This crash can be reproduced by e.g. killing a virtiofsd daemon before
initiating migration.  In such a case, qemu should not crash, but just
make the attempted migration fail.

Buglink: https://issues.redhat.com/browse/RHEL-94534
Reported-by: Tingting Mao <timao@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20250724125928.61045-2-hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit c1997099dc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-08-02 08:12:45 +03:00
Jonah Palmer
f8af005037 virtio: fix off-by-one and invalid access in virtqueue_ordered_fill
Commit b44135daa3 introduced virtqueue_ordered_fill for
VIRTIO_F_IN_ORDER support but had a few issues:

* Conditional while loop used 'steps <= max_steps' but should've been
  'steps < max_steps' since reaching steps == max_steps would indicate
  that we didn't find an element, which is an error. Without this
  change, the code would attempt to read invalid data at an index
  outside of our search range.

* Incremented 'steps' using the next chain's ndescs instead of the
  current one.

This patch corrects the loop bounds and synchronizes 'steps' and index
increments.

We also add a defensive sanity check against malicious or invalid
descriptor counts to avoid a potential infinite loop and DoS.

Fixes: b44135daa3 ("virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support")
Reported-by: terrynini <terrynini38514@gmail.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20250721150208.2409779-1-jonah.palmer@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 6fcf5ebafa)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-08-02 08:11:22 +03:00
Bibo Mao
d6b2796a28 target/loongarch: Fix valid virtual address checking
On LoongArch64 system, the high 32 bit of 64 bit virtual address should be
0x00000[0-7]yyy or 0xffff8yyy. The bit from 47 to 63 should be all 0 or
all 1.

Function get_physical_address() only checks bit 48 to 63, there will be
problem with the following test case. On physical machine, there is bus
error report and program exits abnormally. However on qemu TCG system
emulation mode, the program runs normally. The virtual address
0xffff000000000000ULL + addr and addr are treated the same on TLB entry
checking. This patch fixes this issue.

void main()
{
        void *addr, *addr1;
        int val;

        addr = malloc(100);
        *(int *)addr = 1;
        addr1 = 0xffff000000000000ULL + addr;
        val = *(int *)addr1;
        printf("val %d \n", val);
}

Cc: qemu-stable@nongnu.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-ID: <20250714015446.746163-1-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
(cherry picked from commit caab7ac835)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-08-02 08:09:19 +03:00
Jay Chang
93890300d8 target/riscv: Restrict midelegh access to S-mode harts
RISC-V AIA Spec states:
"For a machine-level environment, extension Smaia encompasses all added
CSRs and all modifications to interrupt response behavior that the AIA
specifies for a hart, over all privilege levels. For a supervisor-level
environment, extension Ssaia is essentially the same as Smaia except
excluding the machine-level CSRs and behavior not directly visible to
supervisor level."

Since midelegh is an AIA machine-mode CSR, add Smaia extension check in
aia_smode32 predicate.

Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Jay Chang <jay.chang@sifive.com>
Reviewed-by: Nutty Liu<liujingqi@lanxincomputing.com>
Message-ID: <20250701030021.99218-3-jay.chang@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 86bc3a0abf)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-31 09:06:44 +03:00
Jay Chang
f45d7aaf67 target/riscv: Restrict mideleg/medeleg/medelegh access to S-mode harts
RISC-V Privileged Spec states:
"In harts with S-mode, the medeleg and mideleg registers must exist, and
setting a bit in medeleg or mideleg will delegate the corresponding trap
, when occurring in S-mode or U-mode, to the S-mode trap handler. In
harts without S-mode, the medeleg and mideleg registers should not
exist."

Add smode predicate to ensure these CSRs are only accessible when S-mode
is supported.

Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Jay Chang <jay.chang@sifive.com>
Reviewed-by: Nutty Liu<liujingqi@lanxincomputing.com>
Message-ID: <20250701030021.99218-2-jay.chang@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit e443ba0336)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-31 09:06:34 +03:00
Yang Jialong
3c28bee827 intc/riscv_aplic: Fix target register read when source is inactive
The RISC-V Advanced interrupt Architecture:
4.5.16. Interrupt targets:
If interrupt source i is inactive in this domain, register target[i] is
read-only zero.

Signed-off-by: Yang Jialong <z_bajeer@yeah.net>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250728055114.252024-1-z_bajeer@yeah.net>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit b6f1244678)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-31 09:06:18 +03:00
Vac Chen
94c441fe10 target/riscv: Fix pmp range wraparound on zero
pmp_is_in_range() prefers to match addresses within the interval
[start, end]. To archieve this, pmpaddrX is decremented during the end
address update.

In TOR mode, a rule is ignored if its start address is greater than or
equal to its end address.

However, if pmpaddrX is set to 0, this decrement operation causes the
calulated end address to wrap around to UINT_MAX. In this scenario, the
address guard for this PMP entry would become ineffective.

This patch addresses the issue by moving the guard check earlier,
preventing the problematic wraparound when pmpaddrX is zero.

Signed-off-by: Vac Chen <vacantron@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250706065554.42953-1-vacantron@gmail.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 77707bfdf8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-30 21:17:15 +03:00
Xu Lu
423a91255c target/riscv: Fix exception type when VU accesses supervisor CSRs
When supervisor CSRs are accessed from VU-mode, a virtual instruction
exception should be raised instead of an illegal instruction.

Fixes: c1fbcecb3a (target/riscv: Fix csr number based privilege checking)
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Message-ID: <20250708060720.7030-1-luxu.kernel@bytedance.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 30ef718423)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-30 21:00:58 +03:00
Daniel Henrique Barboza
093997ee90 target/riscv: do not call GETPC() in check_ret_from_m_mode()
GETPC() should always be called from the top level helper, e.g. the
first helper that is called by the translation code. We stopped doing
that in commit 3157a553ec, and then we introduced problems when
unwinding the exceptions being thrown by helper_mret(), as reported by
[1].

Call GETPC() at the top level helper and pass the value along.

[1] https://gitlab.com/qemu-project/qemu/-/issues/3020

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Fixes: 3157a553ec ("target/riscv: Add Smrnmi mnret instruction")
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3020
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250714133739.1248296-1-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 16aa7771af)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-30 20:59:31 +03:00
Daniel Henrique Barboza
d1baf44157 linux-user/strace.list: add riscv_hwprobe entry
We're missing a strace entry for riscv_hwprobe, and using -strace will
report it as "Unknown syscall 258".

After this patch we'll have:

$ ./build/qemu-riscv64 -strace test_mutex_riscv
110182 riscv_hwprobe(0x7f207efdc700,1,0,0,0,0) = 0
110182 brk(NULL) = 0x0000000000082000
(...)

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250728170633.113384-1-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit e111ffe48b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-30 20:58:53 +03:00
Michael Tokarev
5a2e67526f roms/Makefile: fix npcmNxx_bootrom build rules
Since commit 70ce076fa6, the actual rom source dirs
are subdirs of vbootrom/ submodule, not in top-level of it.

Fixes: 70ce076fa6 "roms: Update vbootrom to 1287b6e"
Fixes: 269b7effd9 ("pc-bios: Add NPCM8XX vBootrom")

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250727215511.807880-1-mjt@tls.msk.ru>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 653a75a9d7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-29 20:12:06 +03:00
Pierrick Bouvier
58741ac6ef system/physmem: fix use-after-free with dispatch
A use-after-free bug was reported when booting a Linux kernel during the
pci setup phase. It's quite hard to reproduce (needs smp, and favored by
having several pci devices with BAR and specific Linux config, which
is Debian default one in this case).

After investigation (see the associated bug ticket), it appears that,
under specific conditions, we might access a cached AddressSpaceDispatch
that was reclaimed by RCU thread meanwhile.
In the Linux boot scenario, during the pci phase, memory region are
destroyed/recreated, resulting in exposition of the bug.

The core of the issue is that we cache the dispatch associated to
current cpu in cpu->cpu_ases[asidx].memory_dispatch. It is updated with
tcg_commit, which runs asynchronously on a given cpu.
At some point, we leave the rcu critial section, and the RCU thread
starts reclaiming it, but tcg_commit is not yet invoked, resulting in
the use-after-free.

It's not the first problem around this area, and commit 0d58c66068 [1]
("softmmu: Use async_run_on_cpu in tcg_commit") already tried to
address it. It did a good job, but it seems that we found a specific
situation where it's not enough.

This patch takes a simple approach: remove the cached value creating the
issue, and make sure we always get the current mapping for address
space, using address_space_to_dispatch(cpu->cpu_ases[asidx].as).
It's equivalent to qatomic_rcu_read(&as->current_map)->dispatch;
This is not really costly, we just need two dereferences,
including one atomic (rcu) read, which is negligible considering we are
already on mmu slow path anyway.

Note that tcg_commit is still needed, as it's taking care of flushing
TLB, removing previously mapped entries.

Another solution would be to cache directly values under the dispatch
(dispatch themselves are not ref counted), keep an active reference on
associated memory section, and release it when appropriate (tricky).
Given the time already spent debugging this area now and previously, I
strongly prefer eliminating the root of the issue, instead of adding
more complexity for a hypothetical performance gain. RCU is precisely
used to ensure good performance when reading data, so caching is not as
beneficial as it might seem IMHO.

[1] 0d58c66068

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3040
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20250724161142.2803091-1-pierrick.bouvier@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 2865bf1c57)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-29 20:10:21 +03:00
Luc Michel
eb1adad5f4 hw/net/cadence_gem: fix register mask initialization
The gem_init_register_masks function was called at init time but it
relies on the num-priority-queues property. Call it at realize time
instead.

Cc: qemu-stable@nongnu.org
Fixes: 4c70e32f05 ("net: cadence_gem: Define access permission for interrupt registers")
Signed-off-by: Luc Michel <luc.michel@amd.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
Reviewed-by: Sai Pavan Boddu <sai.pavan.boddu@amd.com>
Message-ID: <20250716095432.81923-2-luc.michel@amd.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 2bfcd27e00)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-29 20:05:29 +03:00
Philippe Mathieu-Daudé
3891ba6581 target/mips: Only update MVPControl.EVP bit if executed by master VPE
According to the 'MIPS MT Application-Specific Extension' manual:

  If the VPE executing the instruction is not a Master VPE,
  with the MVP bit of the VPEConf0 register set, the EVP bit
  is unchanged by the instruction.

Modify the DVPE/EVPE opcodes to only update the MVPControl.EVP bit
if executed on a master VPE.

Cc: qemu-stable@nongnu.org
Reported-by: Hansni Bu
Buglink: https://bugs.launchpad.net/qemu/+bug/1926277
Fixes: f249412c74 ("mips: Add MT halting and waking of VPEs")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-ID: <20210427133343.159718-1-f4bug@amsat.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit e895095c78)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-29 20:03:24 +03:00
Alex Bennée
2da4045a3c docs/user: clarify user-mode expects the same OS
While we somewhat cover this later when we talk about supported
operating systems make it clear in the front matter.

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250725154517.3523095-2-alex.bennee@linaro.org>
(cherry picked from commit 8d6c7de1cc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-28 19:12:17 +03:00
Peter Maydell
309f46d077 linux-user/aarch64: Support TPIDR2_MAGIC signal frame record
FEAT_SME adds the TPIDR2 userspace-accessible system register, which
is used as part of the procedure calling standard's lazy saving
scheme for the ZA registers:
 https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#66the-za-lazy-saving-scheme

The Linux kernel has a signal frame record for saving
and restoring this value when calling signal handlers, but
we forgot to implement this. The result is that code which
tries to unwind an exception out of a signal handler will
not work correctly.

Add support for the missing record.

Cc: qemu-stable@nongnu.org
Fixes: 78011586b9 ("target/arm: Enable SME for user-only")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250725175510.3864231-3-peter.maydell@linaro.org>
(cherry picked from commit 99870aff90)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-28 19:09:07 +03:00
Peter Maydell
7764278a30 linux-user/aarch64: Clear TPIDR2_EL0 when delivering signals
A recent change to the kernel (Linux commit b376108e1f88
"arm64/fpsimd: signal: Clear TPIDR2 when delivering signals") updated
the signal-handler entry code to always clear TPIDR2_EL0.

This is necessary for the userspace ZA lazy saving scheme to work
correctly when unwinding exceptions across a signal boundary.
(For the essay-length description of the incorrect behaviour and
why this is the correct fix, see the commit message for the
kernel commit.)

Make QEMU also clear TPIDR2_EL0 on signal entry, applying the
equivalent bugfix to our implementation.

Note that getting this unwinding to work correctly also requires
changes to the userspace code, e.g.  as implemented in gcc in
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b5ffc8e75a8

This change is technically an ABI change; from the kernel's
point of view SME was never enabled (it was hidden behind
CONFIG_BROKEN) before the change. From QEMU's point of view
our SME-related signal handling was broken anyway as we weren't
saving and restoring TPIDR2_EL0.

Cc: qemu-stable@nongnu.org
Fixes: 78011586b9 ("target/arm: Enable SME for user-only")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250725175510.3864231-2-peter.maydell@linaro.org>
(cherry picked from commit 3cdd990aa9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-28 19:06:12 +03:00
Paolo Bonzini
3b5ea01387 target/i386: fix width of third operand of VINSERTx128
Table A-5 of the Intel manual incorrectly lists the third operand of
VINSERTx128 as Wqq, but it is actually a 128-bit value.  This is
visible when W is a memory operand close to the end of the page.

Fixes the recently-added poly1305_kunit test in linux-next.

(No testcase yet, but I plan to modify test-avx2 to use memory
close to the end of the page.  This would work because the test
vectors correctly have the memory operand as xmm2/m128).

Reported-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: qemu-stable@nongnu.org
Fixes: 7906847768 ("target/i386: reimplement 0x0f 0x3a, add AVX", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit feea87cd6b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-28 18:58:59 +03:00
Michael Tokarev
4d7cc3805a hw/display/qxl-render.c: fix qxl_unpack_chunks() chunk size calculation
In case of multiple chunks, code in qxl_unpack_chunks() takes size of the
wrong (next in the chain) chunk, instead of using current chunk size.
This leads to wrong number of bytes being copied, and to crashes if next
chunk size is larger than the current one.

Based on the code by Gao Yong.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1628
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit b8882becd5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-28 18:58:26 +03:00
Peter Maydell
ddedb92861 host-utils: Drop workaround for buggy Apple Clang __builtin_subcll()
In commit b0438861ef ("host-utils: Avoid using __builtin_subcll on
buggy versions of Apple Clang") we added a workaround for a bug in
Apple Clang 14 where its __builtin_subcll() implementation was wrong.
This bug was only present in Apple Clang 14, not in upstream clang,
and is not present in Apple Clang versions 15 and newer.

Since commit 4e035201 we have required at least Apple Clang 15, so we
no longer build with the buggy versions.  We can therefore drop the
workaround. This is effectively a revert of b0438861ef.

This should not be backported to stable branches, which may still
need to support Apple Clang 14.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3030
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250714145033.1908788-1-peter.maydell@linaro.org
(cherry picked from commit e74aad9f81)
(Mjt: 10.0 already requres clang 15+, so it's okay to backport
 this change to 10.0 branch)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-23 11:47:29 +03:00
Michael Tokarev
66d21643c2 Update version for 10.0.3 release
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-22 20:46:10 +03:00
Zenghui Yu
a5ac9803d5 hvf: arm: Emulate ICC_RPR_EL1 accesses properly
Commit a2260983c6 ("hvf: arm: Add support for GICv3") added GICv3 support
by implementing emulation for a few system registers. ICC_RPR_EL1 was
defined but not plugged in the sysreg handlers (for no good reason).

Fix it.

Fixes: a2260983c6 ("hvf: arm: Add support for GICv3")
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250714160139.10404-3-zenghui.yu@linux.dev
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit e6da704b71)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-21 22:11:49 +03:00
Peter Maydell
3341f5cd5c target/arm: Correct encoding of Debug Communications Channel registers
We don't implement the Debug Communications Channel (DCC), but
we do attempt to provide dummy versions of its system registers
so that software that tries to access them doesn't fall over.

However, we got the tx/rx register definitions wrong. These
should be:

AArch32:
  DBGDTRTX   p14 0 c0 c5 0  (on writes)
  DBGDTRRX   p14 0 c0 c5 0  (on reads)

AArch64:
  DBGDTRTX_EL0  2 3 0 5 0 (on writes)
  DBGDTRRX_EL0  2 3 0 5 0 (on reads)
  DBGDTR_EL0    2 3 0 4 0 (reads and writes)

where DBGDTRTX and DBGDTRRX are effectively different names for the
same 32-bit register, which has tx behaviour on writes and rx
behaviour on reads.  The AArch64-only DBGDTR_EL0 is a 64-bit wide
register whose top and bottom halves map to the DBGDTRRX and DBGDTRTX
registers.

Currently we have just one cpreg struct, which:
 * calls itself DBGDTR_EL0
 * uses the DBGDTRTX_EL0/DBGDTRRX_EL0 encoding
 * is marked as ARM_CP_STATE_BOTH but has the wrong opc1
   value for AArch32
 * is implemented as RAZ/WI

Correct the encoding so:
 * we name the DBGDTRTX/DBGDTRRX register correctly
 * we split it into AA64 and AA32 versions so we can get the
   AA32 encoding right
 * we implement DBGDTR_EL0 at its correct encoding

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2986
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250708141049.778361-1-peter.maydell@linaro.org
(cherry picked from commit 655659a74a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-21 22:02:51 +03:00
Daniel P. Berrangé
08fa61a365 ui: fix setting client_endian field defaults
When a VNC client sends a "set pixel format" message, the
'client_endian' field will get initialized, however, it is
valid to omit this message if the client wants to use the
server's native pixel format. In the latter scenario nothing
is initializing the 'client_endian' field, so it remains set
to 0, matching neither G_LITTLE_ENDIAN nor G_BIG_ENDIAN. This
then results in pixel format conversion routines taking the
wrong code paths.

This problem existed before the 'client_be' flag was changed
into the 'client_endian' value, but the lack of initialization
meant it semantically defaulted to little endian, so only big
endian systems would potentially be exposed to incorrect pixel
translation.

The 'virt-viewer' / 'remote-viewer' apps always send a "set
pixel format" message so aren't exposed to any problems, but
the classical 'vncviewer' app will show the problem easily.

Fixes: 7ed96710e8
Reported-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit 3ac6daa9e1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-21 22:00:41 +03:00
Peter Maydell
9561a8c140 hw/net/npcm_gmac.c: Send the right data for second packet in a row
The transmit loop in gmac_try_send_next_packet() is constructed in a
way that means it will send incorrect data if it it sends more than
one packet.

The function assembles the outbound data in a dynamically allocated
block of memory which is pointed to by tx_send_buffer.  We track the
first point in this block of memory which is not yet used with the
prev_buf_size offset, initially zero.  We track the size of the
packet we're sending with the length variable, also initially zero.

As we read chunks of data out of guest memory, we write them to
tx_send_buffer[prev_buf_size], and then increment both prev_buf_size
and length.  (We might dynamically reallocate the buffer if needed.)

When we send a packet, we checksum and send length bytes, starting at
tx_send_buffer, and then we reset length to 0.  This gives the right
data for the first packet.  But we don't reset prev_buf_size.  This
means that if we process more descriptors with further data for the
next packet, that data will continue to accumulate at offset
prev_buf_size, i.e.  after the data for the first packet.  But when
we transmit that second packet, we send length bytes from
tx_send_buffer, so we will send a packet which has the length of the
second packet but the data of the first one.

The fix for this is to also clear prev_buf_size after the packet has
been sent -- we never need the data from packet one after we've sent
it, so we can write packet two's data starting at the beginning of
the buffer.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 871a6e5b33)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-21 18:17:37 +03:00
Paolo Bonzini
24778b1c7e target/i386: do not expose ARCH_CAPABILITIES on AMD CPU
KVM emulates the ARCH_CAPABILITIES on x86 for both Intel and AMD
cpus, although the IA32_ARCH_CAPABILITIES MSR is an Intel-specific
MSR and it makes no sense to emulate it on AMD.

As a consequence, VMs created on AMD with qemu -cpu host and using
KVM will advertise the ARCH_CAPABILITIES feature and provide the
IA32_ARCH_CAPABILITIES MSR. This can cause issues (like Windows BSOD)
as the guest OS might not expect this MSR to exist on such cpus (the
AMD documentation specifies that ARCH_CAPABILITIES feature and MSR
are not defined on the AMD architecture).

A fix was proposed in KVM code, however KVM maintainers don't want to
change this behavior that exists for 6+ years and suggest changes to be
done in QEMU instead.  Therefore, hide the bit from "-cpu host":
migration of -cpu host guests is only possible between identical host
kernel and QEMU versions, therefore this is not a problematic breakage.

If a future AMD machine does include the MSR, that would re-expose the
Windows guest bug; but it would not be KVM/QEMU's problem at that
point, as we'd be following a genuine physical CPU impl.

Reported-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d3a24134e3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-19 10:42:58 +03:00
Zhao Liu
82517381c5 i386/cpu: Honor maximum value for CPUID.8000001DH.EAX[25:14]
CPUID.8000001DH:EAX[25:14] is "NumSharingCache", and the number of
logical processors sharing this cache is the value of this field
incremented by 1. Because of its width limitation, the maximum value
currently supported is 4095.

Though at present Q35 supports up to 4096 CPUs, by constructing a
specific topology, the width of the APIC ID can be extended beyond 12
bits. For example, using `-smp threads=33,cores=9,modules=9` results in
a die level offset of 6 + 4 + 4 = 14 bits, which can also cause
overflow. Check and honor the maximum value as CPUID.04H did.

Cc: Babu Moger <babu.moger@amd.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-8-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5d21ee453a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:24 +03:00
Qian Wen
1822478999 i386/cpu: Fix overflow of cache topology fields in CPUID.04H
According to SDM, CPUID.0x4:EAX[31:26] indicates the Maximum number of
addressable IDs for processor cores in the physical package. If we
launch over 64 cores VM, the 6-bit field will overflow, and the wrong
core_id number will be reported.

Since the HW reports 0x3f when the intel processor has over 64 cores,
limit the max value written to EAX[31:26] to 63, so max num_cores should
be 64.

For EAX[14:25], though at present Q35 supports up to 4096 CPUs, by
constructing a specific topology, the width of the APIC ID can be
extended beyond 12 bits. For example, using `-smp threads=33,cores=9,
modules=9` results in a die level offset of 6 + 4 + 4 = 14 bits, which
can also cause overflow.  check and honor the maximum value for
EAX[14:25] as well.

In addition, for host-cache-info case, also apply the same checks and
fixes.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Qian Wen <qian.wen@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-7-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3e86124e7c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
Michael Tokarev
d097553158 i386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16]
The legacy topology enumerated by CPUID.1.EBX[23:16] is defined in SDM
Vol2:

Bits 23-16: Maximum number of addressable IDs for logical processors in
this physical package.

When threads_per_socket > 255, it will 1) overwrite bits[31:24] which is
apic_id, 2) bits [23:16] get truncated.

Specifically, if launching the VM with -smp 256, the value written to
EBX[23:16] is 0 because of data overflow. If the guest only supports
legacy topology, without V2 Extended Topology enumerated by CPUID.0x1f
or Extended Topology enumerated by CPUID.0x0b to support over 255 CPUs,
the return of the kernel invoking cpu_smt_allowed() is false and APs
(application processors) will fail to bring up. Then only CPU 0 is online,
and others are offline.

For example, launch VM via:
qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
    -cpu qemu64,cpuid-0xb=off -smp 256 -m 32G \
    -drive file=guest.img,if=none,id=virtio-disk0,format=raw \
    -device virtio-blk-pci,drive=virtio-disk0,bootindex=1 --nographic

The guest shows:
    CPU(s):               256
    On-line CPU(s) list:  0
    Off-line CPU(s) list: 1-255

To avoid this issue caused by overflow, limit the max value written to
EBX[23:16] to 255 as the HW does.

Cc: qemu-stable@nongnu.org
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Qian Wen <qian.wen@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-6-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a62fef5829)
(Mjt: fixup for 10.0.x series due to missing v10.0.0-2217-gf985a1195b
 "i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]")
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
Akihiko Odaki
2df7954daa ui/vnc: Do not copy z_stream
vnc_worker_thread_loop() copies z_stream stored in its local VncState to
the persistent VncState, and the copied one is freed with deflateEnd()
later. However, deflateEnd() refuses to operate with a copied z_stream
and returns Z_STREAM_ERROR, leaking the allocated memory.

Avoid copying the zlib state to fix the memory leak.

Fixes: bd023f953e ("vnc: threaded VNC server")
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250603-zlib-v3-1-20b857bd8d05@rsg.ci.i.u-tokyo.ac.jp>
(cherry picked from commit aef22331b5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
David Hildenbrand
12e88c0c6c vhost: Fix used memslot tracking when destroying a vhost device
When we unplug a vhost device, we end up calling vhost_dev_cleanup()
where we do a memory_listener_unregister().

This memory_listener_unregister() call will end up disconnecting the
listener from the address space through listener_del_address_space().

In that process, we effectively communicate the removal of all memory
regions from that listener, resulting in region_del() + commit()
callbacks getting triggered.

So in case of vhost, we end up calling vhost_commit() with no remaining
memory slots (0).

In vhost_commit() we end up overwriting the global variables
used_memslots / used_shared_memslots, used for detecting the number
of free memslots. With used_memslots / used_shared_memslots set to 0
by vhost_commit() during device removal, we'll later assume that the
other vhost devices still have plenty of memslots left when calling
vhost_get_free_memslots().

Let's fix it by simply removing the global variables and depending
only on the actual per-device count.

Easy to reproduce by adding two vhost-user devices to a VM and then
hot-unplugging one of them.

While at it, detect unexpected underflows in vhost_get_free_memslots()
and issue a warning.

Reported-by: yuanminghao <yuanmh12@chinatelecom.cn>
Link: https://lore.kernel.org/qemu-devel/20241121060755.164310-1-yuanmh12@chinatelecom.cn/
Fixes: 2ce68e4cf5 ("vhost: add vhost_has_free_slot() interface")
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20250603111336.1858888-1-david@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9f749129e2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
Cole Robinson
2533500b4a roms: re-remove execute bit from hppa-firmware*
This was fixed in c9d77526bd for
9.2.0 but regressed in db34be3291 in
10.0.0

When the bit is present, rpmbuild complains about missing ELF build-id

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-ID: <52d0edfbb9b2f63a866f0065a721f3a95da6f8ba.1747590860.git.crobinso@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit a598090eba)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
Kevin Wolf
e50ca171e2 file-posix: Fix aio=threads performance regression after enablign FUA
For aio=threads, we're currently not implementing REQ_FUA in any useful
way, but just do a separate raw_co_flush_to_disk() call. This changes
behaviour compared to the old state, which used bdrv_co_flush() with its
optimisations. As a quick fix, call bdrv_co_flush() again like before.
Eventually, we can use pwritev2() to make use of RWF_DSYNC if available,
but we'll still have to keep this code path as a fallback, so this fix
is required either way.

While the fix itself is a one-liner, some new graph locking annotations
are needed to convince TSA that the locking is correct.

Cc: qemu-stable@nongnu.org
Fixes: 984a32f17e ("file-posix: Support FUA writes")
Buglink: https://issues.redhat.com/browse/RHEL-96854
Reported-by: Tingting Mao <timao@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250625085019.27735-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit d402da1360)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
Ethan Milon
787a817cd5 amd_iommu: Fix truncation of oldval in amdvi_writeq
The variable `oldval` was incorrectly declared as a 32-bit `uint32_t`.
This could lead to truncation and incorrect behavior where the upper
read-only 32 bits are significant.

Fix the type of `oldval` to match the return type of `ldq_le_p()`.

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca68 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Ethan Milon <ethan.milon@eviden.com>
Message-Id: <20250617150427.20585-9-alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 5788929e05)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
Alejandro Jimenez
fc1ad5124f amd_iommu: Remove duplicated definitions
No functional change.

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-8-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 5959b641c9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
Alejandro Jimenez
90c635c926 amd_iommu: Fix the calculation for Device Table size
Correctly calculate the Device Table size using the format encoded in the
Device Table Base Address Register (MMIO Offset 0000h).

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca68 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-7-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 67d3077ee4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
Alejandro Jimenez
18e70a46c3 amd_iommu: Fix mask to retrieve Interrupt Table Root Pointer from DTE
Fix an off-by-one error in the definition of AMDVI_IR_PHYS_ADDR_MASK. The
current definition masks off the most significant bit of the Interrupt Table
Root ptr i.e. it only generates a mask with bits [50:6] set. See the AMD I/O
Virtualization Technology (IOMMU) Specification for the Interrupt Table
Root Pointer[51:6] field in the Device Table Entry format.

Cc: qemu-stable@nongnu.org
Fixes: b44159fe00 ("x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-6-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 123cf4bdd3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:23 +03:00
Alejandro Jimenez
caaa648508 amd_iommu: Fix masks for various IOMMU MMIO Registers
Address various issues with definitions of the MMIO registers e.g. for the
Device Table Address Register, the size mask currently encompasses reserved
bits [11:9], so change it to only extract the bits [8:0] encoding size.

Convert masks to use GENMASK64 for consistency, and make unrelated
definitions independent.

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca68 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-5-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 108e10ff69)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:22 +03:00
Alejandro Jimenez
d1ea4a1b0e amd_iommu: Update bitmasks representing DTE reserved fields
The DTE validation method verifies that all bits in reserved DTE fields are
unset. Update them according to the latest definition available in AMD I/O
Virtualization Technology (IOMMU) Specification - Section 2.2.2.1 Device
Table Entry Format. Remove the magic numbers and use a macro helper to
generate bitmasks covering the specified ranges for better legibility.

Note that some reserved fields specify that events are generated when they
contain non-zero bits, or checks are skipped under certain configurations.
This change only updates the reserved masks, checks for special conditions
are not yet implemented.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-4-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ff3dcb3bf6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:22 +03:00
Alejandro Jimenez
1ca9d2e0c2 amd_iommu: Fix Device ID decoding for INVALIDATE_IOTLB_PAGES command
The DeviceID bits are extracted using an incorrect offset in the call to
amdvi_iotlb_remove_page(). This field is read (correctly) earlier, so use
the value already retrieved for devid.

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca68 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-3-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit c63b8d1425)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:22 +03:00
Alejandro Jimenez
10a9eedc20 amd_iommu: Fix Miscellaneous Information Register 0 encoding
The definitions encoding the maximum Virtual, Physical, and Guest Virtual
Address sizes supported by the IOMMU are using incorrect offsets i.e. the
VASize and GVASize offsets are switched. The value in the GVAsize field is
also modified, since it was incorrectly encoded.

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca68 ("hw/i386: Introduce AMD IOMMU")
Co-developed-by: Ethan MILON <ethan.milon@eviden.com>
Signed-off-by: Ethan MILON <ethan.milon@eviden.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Message-Id: <20250617150427.20585-2-alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 091c7d7924)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-17 06:57:22 +03:00
Akihiko Odaki
f272f318c0 virtio-net: Add queues for RSS during migration
virtio_net_pre_load_queues() inspects vdev->guest_features to tell if
VIRTIO_NET_F_RSS or VIRTIO_NET_F_MQ is enabled to infer the required
number of queues. This works for VIRTIO_NET_F_MQ but it doesn't for
VIRTIO_NET_F_RSS because only the lowest 32 bits of vdev->guest_features
is set at the point and VIRTIO_NET_F_RSS uses bit 60 while
VIRTIO_NET_F_MQ uses bit 22.

Instead of inferring the required number of queues from
vdev->guest_features, use the number loaded from the vm state. This
change also has a nice side effect to remove a duplicate peer queue
pair change by circumventing virtio_net_set_multiqueue().

Also update the comment in include/hw/virtio/virtio.h to prevent an
implementation of pre_load_queues() from refering to any fields being
loaded during migration by accident in the future.

Fixes: 8c49756825 ("virtio-net: Add only one queue pair when realizing")

Tested-by: Lei Yang <leiyang@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit adda0ad56b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-15 23:06:56 +03:00
Anastasia Belova
6624ff3972 net: fix buffer overflow in af_xdp_umem_create()
s->pool has n_descs elements so maximum i should be
n_descs - 1. Fix the upper bound.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: cb039ef3d9 ("net: add initial support for AF_XDP network backend")
Cc: qemu-stable@nongnu.org
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Anastasia Belova <nabelova31@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 110d0fa2d4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-15 23:06:36 +03:00
Thomas Huth
a655b6548f accel/kvm: Adjust the note about the minimum required kernel version
Since commit 126e7f7803 ("kvm: require KVM_CAP_IOEVENTFD and
KVM_CAP_IOEVENTFD_ANY_LENGTH") we require at least kernel 4.5 to
be able to use KVM. Adjust the upgrade_note accordingly.
While we're at it, remove the text about kvm-kmod and the
SourceForge URL since this is not actively maintained anymore.

Fixes: 126e7f7803 ("kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTH")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit f180e367fc)
2025-07-15 23:05:15 +03:00
Peter Maydell
c49db93c36 linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC
In the linux-user do_fork() function we try to set the FD_CLOEXEC
flag on a pidfd like this:

    fcntl(pid_fd, F_SETFD, fcntl(pid_fd, F_GETFL) | FD_CLOEXEC);

This has two problems:
 (1) it doesn't check errors, which Coverity complains about
 (2) we use F_GETFL when we mean F_GETFD

Deal with both of these problems by using qemu_set_cloexec() instead.
That function will assert() if the fcntls fail, which is fine (we are
inside fork_start()/fork_end() so we know nothing can mess around
with our file descriptors here, and we just got this one from
pidfd_open()).

(As we are touching the if() statement here, we correct the
indentation.)

Coverity: CID 1508111
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250711141217.1429412-1-peter.maydell@linaro.org>
(cherry picked from commit d6390204c6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-13 12:04:36 +03:00
Chaney, Ben
b4ead90726 migration: Don't sync volatile memory after migration completes
Syncing volatile memory provides no benefit, instead it can cause
performance issues in some cases.  Only sync memory that is marked as
non-volatile after migration completes on destination.

Signed-off-by: Ben Chaney <bchaney@akamai.com>
Fixes: bd108a44bc (migration: ram: Switch to ram block writeback)
Link: https://lore.kernel.org/r/1CC43F59-336F-4A12-84AD-DB89E0A17A95@akamai.com
Signed-off-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 983899eab4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-13 11:36:12 +03:00
Geoffrey Thomas
a4e31f5d8e linux-user: Hold the fd-trans lock across fork
If another thread is holding target_fd_trans_lock during a fork,
then the lock becomes permanently locked in the child and the
emulator deadlocks at the next interaction with the fd-trans table.
As with other locks, acquire the lock in fork_start() and release
it in fork_end().

Cc: qemu-stable@nongnu.org
Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com>
Fixes: c093364f4d "fd-trans: Fix race condition on reallocation of the translation table."
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2846
Buglink: https://github.com/astral-sh/uv/issues/6105
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250314124742.4965-1-geofft@ldpreload.com>
(cherry picked from commit e4e839b2ee)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-13 11:23:43 +03:00
Peter Maydell
1759558915 linux-user: Check for EFAULT failure in nanosleep
target_to_host_timespec() returns an error if the memory the guest
passed us isn't actually readable.  We check for this everywhere
except the callsite in the TARGET_NR_nanosleep case, so this mistake
was caught by a Coverity heuristic.

Add the missing error checks to the calls that convert between the
host and target timespec structs.

Coverity: CID 1507104
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250710164355.1296648-1-peter.maydell@linaro.org>
(cherry picked from commit c4828cb850)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-13 11:21:51 +03:00
Peter Maydell
1714828a56 linux-user: Implement fchmodat2 syscall
The fchmodat2 syscall is new from Linux 6.6; it is like the
existing fchmodat syscall except that it takes a flags parameter.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3019
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250710113123.1109461-1-peter.maydell@linaro.org>
(cherry picked from commit 6a3e132a1b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-13 11:19:28 +03:00
Bernhard Beschow
676bc0f4a7 hw/arm/fsl-imx8mp: Wire VIRQ and VFIQ
Allows to run KVM guests inside the imx8mp-evk machine.

Fixes: a4eefc69b2 ("hw/arm: Add i.MX 8M Plus EVK board")
CC: qemu-stable
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 930180f3b9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-07-13 11:16:36 +03:00