Using qom-list and qom-get to get all the nodes and property values in
a QOM tree can take multiple seconds because it requires 1000's of
individual QOM requests. Some managers fetch the entire tree or a
large subset of it when starting a new VM, and this cost is a
substantial fraction of start up time.
Define the qom-list-get command, which fetches all the properties and
values for a list of paths. This can be much faster than qom-list
plus qom-get. When getting an entire QOM tree, I measured a 10x
speedup in elapsed time.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <1752248703-217318-2-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-ID: <20250711054005.60969-12-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
"Returns: <description>" is rendered like "Return: <Type> –
<description>". Mentioning the type in the description again is
commonly redundant. Rephrase such descriptions not to.
Well, I tried. Maybe not very hard. Sorry!
Signed-off-by: John Snow <jsnow@redhat.com>
Message-ID: <20250711051045.51110-5-jsnow@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
[Commit message amended to explain why]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Use the conventional "- If <error-condition>" phrasing, optionally
with ", <error-class>".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250708072828.105185-3-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Use imperative mood "Do ..." instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250708072828.105185-2-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Remove the QAPI doc section heading syntax, use plain rST section
headings instead.
Tests and documentation are updated to match.
Interestingly, Plain rST headings work fine before this patch, except
for over- and underlining with '=', which the doc parser rejected as
invalid QAPI doc section heading in free-form comments.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-ID: <20250618165353.1980365-5-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Add more detail to commit message]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Adds an IGVM loader to QEMU which processes a given IGVM file and
applies the directives within the file to the current guest
configuration.
The IGVM loader can be used to configure both confidential and
non-confidential guests. For confidential guests, the
ConfidentialGuestSupport object for the system is used to encrypt
memory, apply the initial CPU state and perform other confidential guest
operations.
The loader is configured via a new IgvmCfg QOM object which allows the
user to provide a path to the IGVM file to process.
Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Gerd Hoffman <kraxel@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/ae3a07d8f514d93845a9c16bb155c847cb567b0d.1751554099.git.roy.hopkins@randomman.co.uk
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add property "quote-generation-socket" to tdx-guest, which is a property
of type SocketAddress to specify Quote Generation Service(QGS).
On request of GetQuote, it connects to the QGS socket, read request
data from shared guest memory, send the request data to the QGS,
and store the response into shared guest memory, at last notify
TD guest by interrupt.
command line example:
qemu-system-x86_64 \
-object '{"qom-type":"tdx-guest","id":"tdx0","quote-generation-socket":{"type":"unix", "path":"/var/run/tdx-qgs/qgs.socket"}}' \
-machine confidential-guest-support=tdx0
Note, above example uses the unix socket. It can be other types, like vsock,
which depends on the implementation of QGS.
To avoid no response from QGS server, setup a timer for the transaction.
If timeout, make it an error and interrupt guest. Define the threshold of
time to 30s at present, maybe change to other value if not appropriate.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250527073916.1243024-3-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Three sha384 hash values, mrconfigid, mrowner and mrownerconfig, of a TD
can be provided for TDX attestation. Detailed meaning of them can be
found: https://lore.kernel.org/qemu-devel/31d6dbc1-f453-4cef-ab08-4813f4e0ff92@intel.com/
Allow user to specify those values via property mrconfigid, mrowner and
mrownerconfig. They are all in base64 format.
example
-object tdx-guest, \
mrconfigid=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrowner=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrownerconfig=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-14-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bit 28 of TD attribute, named SEPT_VE_DISABLE. When set to 1, it disables
EPT violation conversion to #VE on guest TD access of PENDING pages.
Some guest OS (e.g., Linux TD guest) may require this bit as 1.
Otherwise refuse to boot.
Add sept-ve-disable property for tdx-guest object, for user to configure
this bit.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-10-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce tdx-guest object which inherits X86_CONFIDENTIAL_GUEST,
and will be used to create TDX VMs (TDs) by
qemu -machine ...,confidential-guest-support=tdx0 \
-object tdx-guest,id=tdx0
It has one QAPI member 'attributes' defined, which allows user to set
TD's attributes directly.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-3-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This feature was only applied during the 9.2 cycle, so reflect
that rather than 9.1.
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Closes: https://lore.kernel.org/qemu-devel/ZyngEiwmYeZ-DvCy@redhat.com/
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20241107123446.902801-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
These are very similar to the recently added Generic Initiators
but instead of representing an initiator of memory traffic they
represent an edge point beyond which may lie either targets or
initiators. Here we add these ports such that they may
be targets of hmat_lb records to describe the latency and
bandwidth from host side initiators to the port. A discoverable
mechanism such as UEFI CDAT read from CXL devices and switches
is used to discover the remainder of the path, and the OS can build
up full latency and bandwidth numbers as need for work and data
placement decisions.
Acked-by: Markus Armbruster <armbru@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916174122.1843197-1-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sweep the entire documentation again. Last done in commit
209e64d9ed (qapi: Refill doc comments to conform to current
conventions).
To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3". Finds no
differences. Comparing with diff is not useful, as the reflown
paragraphs are visible there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240729065220.860163-1-armbru@redhat.com>
[Straightforward conflict with commit 442110bc6f resolved]
Some QOM properties are associated with ObjectTypes that already
depend on CONFIG_* switches. So to avoid generating dead code,
let's also make the definition of those properties dependent on
the corresponding CONFIG_*.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-ID: <20240604135931.311709-1-sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Make SecretKeyringProperties conditional, too]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
When an Example section has a brief explanation, convert it to a
qmp-example:: section using the :title: option.
Rule of thumb: If the title can fit on a single line and requires no rST
markup, it's a good candidate for using the :title: option of
qmp-example.
In this patch, trailing punctuation is removed from the title section
for consistent headline aesthetics. In just one case, specifics of the
example are removed to make the title read better.
See commit-4: "docs/qapidoc: create qmp-example directive", for a
detailed explanation of this custom directive syntax.
See commit+2: "qapi: remove "Example" doc section" for a detailed
explanation of why.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240717021312.606116-8-jsnow@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Use the no-option form of ".. qmp-example::" to convert any Examples
that do not have any form of caption or explanation whatsoever. Note
that in a few cases, example sections are split into two or more
separate example blocks. This is only done stylistically to create a
delineation between two or more logically independent examples.
See commit-3: "docs/qapidoc: create qmp-example directive", for a
detailed explanation of this custom directive syntax.
See commit+3: "qapi: remove "Example" doc section" for a detailed
explanation of why.
Note: an empty "TODO" line was added to announce-self to keep the
example from floating up into the body; this will be addressed more
rigorously in the new qapidoc generator.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-ID: <20240717021312.606116-7-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Markup fixed in one place]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Commit 8f9a9259d3 added ObjectType member @x-vfio-user-server with
feature unstable, but neglected to explain why it is unstable. Do
that now.
Fixes: 8f9a9259d3 (vfio-user: define vfio-user-server object)
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Cc: John G Johnson <john.g.johnson@oracle.com>
Cc: Jagannathan Raman <jag.raman@oracle.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240703095310.1242102-1-armbru@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
[Indentation fixed]
Currently if the 'legacy-vm-type' property of the sev-guest object is
'on', QEMU will attempt to use the newer KVM_SEV_INIT2 kernel
interface in conjunction with the newer KVM_X86_SEV_VM and
KVM_X86_SEV_ES_VM KVM VM types.
This can lead to measurement changes if, for instance, an SEV guest was
created on a host that originally had an older kernel that didn't
support KVM_SEV_INIT2, but is booted on the same host later on after the
host kernel was upgraded.
Instead, if legacy-vm-type is 'off', QEMU should fail if the
KVM_SEV_INIT2 interface is not provided by the current host kernel.
Modify the fallback handling accordingly.
In the future, VMSA features and other flags might be added to QEMU
which will require legacy-vm-type to be 'off' because they will rely
on the newer KVM_SEV_INIT2 interface. It may be difficult to convey to
users what values of legacy-vm-type are compatible with which
features/options, so as part of this rework, switch legacy-vm-type to a
tri-state OnOffAuto option. 'auto' in this case will automatically
switch to using the newer KVM_SEV_INIT2, but only if it is required to
make use of new VMSA features or other options only available via
KVM_SEV_INIT2.
Defining 'auto' in this way would avoid inadvertantly breaking
compatibility with older kernels since it would only be used in cases
where users opt into newer features that are only available via
KVM_SEV_INIT2 and newer kernels, and provide better default behavior
than the legacy-vm-type=off behavior that was previously in place, so
make it the default for 9.1+ machine types.
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
cc: kvm@vger.kernel.org
Signed-off-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240710041005.83720-1-michael.roth@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We do not need a dedicated section for notes. By eliminating a specially
parsed section, these notes can be treated as normal rST paragraphs in
the new QMP reference manual, and can be placed and styled much more
flexibly.
Convert all existing "Note" and "Notes" sections to pure rST. As part of
the conversion, capitalize the first letter of each sentence and add
trailing punctuation where appropriate to ensure notes look sensible and
consistent in rendered HTML documentation. Markup is also re-aligned to
the de-facto standard of 3 spaces for directives.
Update docs/devel/qapi-code-gen.rst to reflect the new paradigm, and
update the QAPI parser to prohibit "Note" sections while suggesting a
new syntax. The exact formatting to use is a matter of taste, but a good
candidate is simply:
.. note:: lorem ipsum ...
... dolor sit amet ...
... consectetur adipiscing elit ...
... but there are other choices, too. The Sphinx readthedocs theme
offers theming for the following forms (capitalization unimportant); all
are adorned with a (!) symbol () in the title bar for rendered HTML
docs.
See
https://sphinx-rtd-theme.readthedocs.io/en/stable/demo/demo.html#admonitions
for examples of each directive/admonition in use.
These are rendered in orange:
.. Attention:: ...
.. Caution:: ...
.. WARNING:: ...
These are rendered in red:
.. DANGER:: ...
.. Error:: ...
These are rendered in green:
.. Hint:: ...
.. Important:: ...
.. Tip:: ...
These are rendered in blue:
.. Note:: ...
.. admonition:: custom title
admonition body text
This patch uses ".. note::" almost everywhere, with just two "caution"
directives. Several instances of "Notes:" have been converted to
merely ".. note::", or multiple ".. note::" where appropriate.
".. admonition:: notes" is used in a few places where we had an
ordered list of multiple notes that would not make sense as
standalone/separate admonitions. Two "Note:" following "Example:"
have been turned into ordinary paragraphs within the example.
NOTE: Because qapidoc.py does not attempt to preserve source ordering of
sections, the conversion of Notes from a "tagged section" to an
"untagged section" means that rendering order for some notes *may
change* as a result of this patch. The forthcoming qapidoc.py rewrite
strictly preserves source ordering in the rendered documentation, so
this issue will be rectified in the new generator.
Signed-off-by: John Snow <jsnow@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com> [for block*.json]
Message-ID: <20240626222128.406106-11-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message clarified slightly, period added to one more note]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).
The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.
This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.
In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com> (QAPI schema)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240618100519.145853-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The default value of the @share option of the @MemoryBackendProperties
really depends on the backend type, so let's document the default
values in the same place where we define the option to avoid
dispersing the information.
Cc: David Hildenbrand <david@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240618100043.144657-2-sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
SEV-SNP support relies on a different set of properties/state than the
existing 'sev-guest' object. This patch introduces the 'sev-snp-guest'
object, which can be used to configure an SEV-SNP guest. For example,
a default-configured SEV-SNP guest with no additional information
passed in for use with attestation:
-object sev-snp-guest,id=sev0
or a fully-specified SEV-SNP guest where all spec-defined binary
blobs are passed in as base64-encoded strings:
-object sev-snp-guest,id=sev0, \
policy=0x30000, \
init-flags=0, \
id-block=YWFhYWFhYWFhYWFhYWFhCg==, \
id-auth=CxHK/OKLkXGn/KpAC7Wl1FSiisWDbGTEKz..., \
author-key-enabled=on, \
host-data=LNkCWBRC5CcdGXirbNUV1OrsR28s..., \
guest-visible-workarounds=AA==, \
See the QAPI schema updates included in this patch for more usage
details.
In some cases these blobs may be up to 4096 characters, but this is
generally well below the default limit for linux hosts where
command-line sizes are defined by the sysconf-configurable ARG_MAX
value, which defaults to 2097152 characters for Ubuntu hosts, for
example.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Acked-by: Markus Armbruster <armbru@redhat.com> (for QAPI schema)
Signed-off-by: Michael Roth <michael.roth@amd.com>
Co-developed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Pankaj Gupta <pankaj.gupta@amd.com>
Message-ID: <20240530111643.1091816-8-pankaj.gupta@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently all SEV/SEV-ES functionality is managed through a single
'sev-guest' QOM type. With upcoming support for SEV-SNP, taking this
same approach won't work well since some of the properties/state
managed by 'sev-guest' is not applicable to SEV-SNP, which will instead
rely on a new QOM type with its own set of properties/state.
To prepare for this, this patch moves common state into an abstract
'sev-common' parent type to encapsulate properties/state that are
common to both SEV/SEV-ES and SEV-SNP, leaving only SEV/SEV-ES-specific
properties/state in the current 'sev-guest' type. This should not
affect current behavior or command-line options.
As part of this patch, some related changes are also made:
- a static 'sev_guest' variable is currently used to keep track of
the 'sev-guest' instance. SEV-SNP would similarly introduce an
'sev_snp_guest' static variable. But these instances are now
available via qdev_get_machine()->cgs, so switch to using that
instead and drop the static variable.
- 'sev_guest' is currently used as the name for the static variable
holding a pointer to the 'sev-guest' instance. Re-purpose the name
as a local variable referring the 'sev-guest' instance, and use
that consistently throughout the code so it can be easily
distinguished from sev-common/sev-snp-guest instances.
- 'sev' is generally used as the name for local variables holding a
pointer to the 'sev-guest' instance. In cases where that now points
to common state, use the name 'sev_common'; in cases where that now
points to state specific to 'sev-guest' instance, use the name
'sev_guest'
In order to enable kernel-hashes for SNP, pull it from
SevGuestProperties to its parent SevCommonProperties so
it will be available for both SEV and SNP.
Signed-off-by: Michael Roth <michael.roth@amd.com>
Co-developed-by: Dov Murik <dovmurik@linux.ibm.com>
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Acked-by: Markus Armbruster <armbru@redhat.com> (QAPI schema)
Co-developed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Pankaj Gupta <pankaj.gupta@amd.com>
Message-ID: <20240530111643.1091816-5-pankaj.gupta@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU will currently automatically make use of the KVM_SEV_INIT2 API for
initializing SEV and SEV-ES guests verses the older
KVM_SEV_INIT/KVM_SEV_ES_INIT interfaces.
However, the older interfaces will silently avoid sync'ing FPU/XSAVE
state to the VMSA prior to encryption, thus relying on behavior and
measurements that assume the related fields to be allow zero.
With KVM_SEV_INIT2, this state is now synced into the VMSA, resulting in
measurements changes and, theoretically, behaviorial changes, though the
latter are unlikely to be seen in practice.
To allow a smooth transition to the newer interface, while still
providing a mechanism to maintain backward compatibility with VMs
created using the older interfaces, provide a new command-line
parameter:
-object sev-guest,legacy-vm-type=true,...
and have it default to false.
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240409230743.962513-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For legibility, wrap text paragraphs so every line is at most 70
characters long.
To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3". Finds no
differences. Comparing with diff is not useful, as the refilled
paragraphs are visible there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240322140910.328840-11-armbru@redhat.com>
NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows
partitioning of the GPU device resources (including device memory) into
several (upto 8) isolated instances. Each of the partitioned memory needs
a dedicated NUMA node to operate. The partitions are not fixed and they
can be created/deleted at runtime.
Unfortunately Linux OS does not provide a means to dynamically create/destroy
NUMA nodes and such feature implementation is not expected to be trivial. The
nodes that OS discovers at the boot time while parsing SRAT remains fixed. So
we utilize the Generic Initiator (GI) Affinity structures that allows
association between nodes and devices. Multiple GI structures per BDF is
possible, allowing creation of multiple nodes by exposing unique PXM in each
of these structures.
Implement the mechanism to build the GI affinity structures as Qemu currently
does not. Introduce a new acpi-generic-initiator object to allow host admin
link a device with an associated NUMA node. Qemu maintains this association
and use this object to build the requisite GI Affinity Structure.
When multiple NUMA nodes are associated with a device, it is required to
create those many number of acpi-generic-initiator objects, each representing
a unique device:node association.
Following is one of a decoded GI affinity structure in VM ACPI SRAT.
[0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201 1] Length : 20
[0CAh 0202 1] Reserved1 : 00
[0CBh 0203 1] Device Handle Type : 01
[0CCh 0204 4] Proximity Domain : 00000007
[0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00
00 00 00 00 00
[0E0h 0224 4] Flags (decoded below) : 00000001
Enabled : 1
[0E4h 0228 4] Reserved2 : 00000000
[0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233 1] Length : 20
An admin can provide a range of acpi-generic-initiator objects, each
associating a device (by providing the id through pci-dev argument)
to the desired NUMA node (using the node argument). Currently, only PCI
device is supported.
For the grace hopper system, create a range of 8 nodes and associate that
with the device using the acpi-generic-initiator object. While a configuration
of less than 8 nodes per device is allowed, such configuration will prevent
utilization of the feature to the fullest. The following sample creates 8
nodes per PCI device for a VM with 2 PCI devices and link them to the
respecitve PCI device using acpi-generic-initiator objects:
-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
-numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
-numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
-numa node,nodeid=16 -numa node,nodeid=17 \
-device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
-object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
-object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
-object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
-object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
-object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
-object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
-object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
-object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \
Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-2-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
By convention, we indent the second and subsequent lines of
descriptions and tagged sections, except for examples.
Turn this into a hard rule, and apply it to examples, too.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240216145841.2099240-11-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[Straightforward conflicts in qapi/migration.json resolved]
Commit e050e42678 (qapi: Use explicit bulleted lists) added list
markup to correct bad rendering:
A JSON block comment like this:
Returns: nothing on success
If @node is not a valid block device, DeviceNotFound
If @name is not found, GenericError with an explanation
renders like this:
Returns: nothing on success If node is not a valid block device,
DeviceNotFound If name is not found, GenericError with an explanation
because whitespace is not significant.
Use an actual bulleted list, so that the formatting is correct.
It missed a few instances. Commit a937b6aa73 (qapi: Reformat doc
comments to conform to current conventions) then reflowed them.
Revert the reflowing, and add list markup.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240120095327.666239-6-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Introduce an iommufd object which allows the interaction
with the host /dev/iommu device.
The /dev/iommu can have been already pre-opened outside of qemu,
in which case the fd can be passed directly along with the
iommufd object:
This allows the iommufd object to be shared accross several
subsystems (VFIO, VDPA, ...). For example, libvirt would open
the /dev/iommu once.
If no fd is passed along with the iommufd object, the /dev/iommu
is opened by the qemu code.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
For now, "share=off,readonly=on" would always result in us opening the
file R/O and mmap'ing the opened file MAP_PRIVATE R/O -- effectively
turning it into ROM.
Especially for VM templating, "share=off" is a common use case. However,
that use case is impossible with files that lack write permissions,
because "share=off,readonly=on" will not give us writable RAM.
The sole user of ROM via memory-backend-file are R/O NVDIMMs, but as we
have users (Kata Containers) that rely on the existing behavior --
malicious VMs should not be able to consume COW memory for R/O NVDIMMs --
we cannot change the semantics of "share=off,readonly=on"
So let's add a new "rom" property with on/off/auto values. "auto" is
the default and what most people will use: for historical reasons, to not
change the old semantics, it defaults to the value of the "readonly"
property.
For VM templating, one can now use:
-object memory-backend-file,share=off,readonly=on,rom=off,...
But we'll disallow:
-object memory-backend-file,share=on,readonly=on,rom=off,...
because we would otherwise get an error when trying to mmap the R/O file
shared and writable. An explicit error message is cleaner.
We will also disallow for now:
-object memory-backend-file,share=off,readonly=off,rom=on,...
-object memory-backend-file,share=on,readonly=off,rom=on,...
It's not harmful, but also not really required for now.
Alternatives that were abandoned:
* Make "unarmed=on" for the NVDIMM set the memory region container
readonly. We would still see a change of ROM->RAM and possibly run
into memslot limits with vhost-user. Further, there might be use cases
for "unarmed=on" that should still allow writing to that memory
(temporary files, system RAM, ...).
* Add a new "readonly=on/off/auto" parameter for NVDIMMs. Similar issues
as with "unarmed=on".
* Make "readonly" consume "on/off/file" instead of being a 'bool' type.
This would slightly changes the behavior of the "readonly" parameter:
values like true/false (as accepted by a 'bool'type) would no longer be
accepted.
Message-ID: <20230906120503.359863-4-david@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Since commit a937b6aa73 (qapi: Reformat doc comments to conform to
current conventions), a number of comments not conforming to the
current formatting conventions were added. No problem, just sweep
the entire documentation once more.
To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3". Finds no
differences. Comparing with diff is not useful, as the reflown
paragraphs are visible there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20230720071610.1096458-7-armbru@redhat.com>
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.
In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.
To make this work consistently, also fix up all places in QEMU that
expect fd offsets to be 0.
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20230403221421.60877-1-graf@amazon.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Change
# @name: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed
# do eiusmod tempor incididunt ut labore et dolore magna aliqua.
to
# @name: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed
# do eiusmod tempor incididunt ut labore et dolore magna aliqua.
See recent commit "qapi: Relax doc string @name: description
indentation rules" for rationale.
Reflow paragraphs to 70 columns width, and consistently use two spaces
to separate sentences.
To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3". Finds no
differences. Comparing with diff is not useful, as the reflown
paragraphs are visible there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230428105429.1687850-18-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Lukas Straub <lukasstraub2@web.de>
[Straightforward conflicts in qapi/audio.json qapi/misc-target.json
qapi/run-state.json resolved]
A few examples neglect to prefix QMP input with '->'. Fix that.
Two examples have extra space after '<-'. Delete it.
A few examples neglect to show output. Provide some. The example
output for query-vcpu-dirty-limit could use further improvement. Add
a TODO comment.
Use "Examples:" instead of "Example:" where multiple examples are
given.
One example section numbers its two examples. Not done elsewhere;
drop.
Another example section separates them with "or". Likewise.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230425064223.820979-8-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Documentation suggests @foo is merely shorthand for ``foo``. It's
not, it carries additional meaning: it's a reference to a QAPI schema
name.
Reword the documentation to spell that out.
Fix up the few ``foo`` that should be @foo.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230425064223.820979-7-armbru@redhat.com>
Add 'throttle-bps' and 'throttle-ops' limitation to set QoS. The
two arguments work with both QEMU command line and QMP command.
Example of QEMU command line:
-object cryptodev-backend-builtin,id=cryptodev1,throttle-bps=1600,\
throttle-ops=100
Example of QMP command:
virsh qemu-monitor-command buster --hmp qom-set /objects/cryptodev1 \
throttle-ops 100
or cancel limitation:
virsh qemu-monitor-command buster --hmp qom-set /objects/cryptodev1 \
throttle-ops 0
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230301105847.253084-11-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
lots of acpi rework
first version of biosbits infrastructure
ASID support in vhost-vdpa
core_count2 support in smbios
PCIe DOE emulation
virtio vq reset
HMAT support
part of infrastructure for viommu support in vhost-vdpa
VTD PASID support
fixes, tests all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
=zTCn
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
pci,pc,virtio: features, tests, fixes, cleanups
lots of acpi rework
first version of biosbits infrastructure
ASID support in vhost-vdpa
core_count2 support in smbios
PCIe DOE emulation
virtio vq reset
HMAT support
part of infrastructure for viommu support in vhost-vdpa
VTD PASID support
fixes, tests all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
# uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
# 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
# Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
# 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
# EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
# =zTCn
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 07 Nov 2022 14:27:53 EST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (83 commits)
checkpatch: better pattern for inline comments
hw/virtio: introduce virtio_device_should_start
tests/acpi: update tables for new core count test
bios-tables-test: add test for number of cores > 255
tests/acpi: allow changes for core_count2 test
bios-tables-test: teach test to use smbios 3.0 tables
hw/smbios: add core_count2 to smbios table type 4
vhost-user: Support vhost_dev_start
vhost: Change the sequence of device start
intel-iommu: PASID support
intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
intel-iommu: drop VTDBus
intel-iommu: don't warn guest errors when getting rid2pasid entry
vfio: move implement of vfio_get_xlat_addr() to memory.c
tests: virt: Update expected *.acpihmatvirt tables
tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators
hw/arm/virt: Enable HMAT on arm virt machine
tests: Add HMAT AArch64/virt empty table files
tests: acpi: q35: update expected blobs *.hmat-noinitiators expected HMAT:
tests: acpi: q35: add test for hmat nodes without initiators
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Most of them were found and fixed using codespell.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221030105944.311940-1-sw@weilnetz.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
cryptodev: Added a new type of backend named lkcf-backend for
cryptodev. This backend upload asymmetric keys to linux kernel,
and let kernel do the accelerations if possible.
The lkcf stands for Linux Kernel Cryptography Framework.
Signed-off-by: lei he <helei.sig11@bytedance.com>
Message-Id: <20221008085030.70212-5-helei.sig11@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Let's allow for specifying a thread context via the "prealloc-context"
property. When set, preallcoation threads will be crated via the
thread context -- inheriting the same CPU affinity as the thread
context.
Pinning preallcoation threads to CPUs can heavily increase performance
in NUMA setups, because, preallocation from a CPU close to the target
NUMA node(s) is faster then preallocation from a CPU further remote,
simply because of memory bandwidth for initializing memory with zeroes.
This is especially relevant for very large VMs backed by huge/gigantic
pages, whereby preallocation is mandatory.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-7-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's make it easier to pin threads created via a ThreadContext to
all host CPUs currently belonging to a given set of host NUMA nodes --
which is the common case.
"node-affinity" is simply a shortcut for setting "cpu-affinity" manually
to the list of host CPUs belonging to the set of host nodes. This property
can only be written.
A simple QEMU example to set the CPU affinity to host node 1 on a system
with two nodes, 24 CPUs each, whereby odd-numbered host CPUs belong to
host node 1:
qemu-system-x86_64 -S \
-object thread-context,id=tc1,node-affinity=1
And we can query the cpu-affinity via HMP/QMP:
(qemu) qom-get tc1 cpu-affinity
[
1,
3,
5,
7,
9,
11,
13,
15,
17,
19,
21,
23,
25,
27,
29,
31,
33,
35,
37,
39,
41,
43,
45,
47
]
We cannot query the node-affinity:
(qemu) qom-get tc1 node-affinity
Error: Insufficient permission to perform this operation
But note that due to dynamic library loading this example will not work
before we actually make use of thread_context_create_thread() in QEMU
code, because the type will otherwise not get registered. We'll wire
this up next to make it work.
Note that if the host CPUs for a host node change due do CPU hot(un)plug
CPU onlining/offlining (i.e., lscpu output changes) after the ThreadContext
was started, the CPU affinity will not get updated.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221014134720.168738-5-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>