No iommu driver implements this any more, get rid of it.
Link: https://lore.kernel.org/r/9-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Compute the isolated_msi over all the devices in the IOMMU group because
iommufd and vfio both need to know that the entire group is isolated
before granting access to it.
Link: https://lore.kernel.org/r/2-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
iommufd is the user API to control the IOMMU subsystem as it relates to
managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended features, some being highly IOMMU device
specific:
- Binding iommu_domain's to PASID/SSID
- Userspace IO page tables, for ARM, x86 and S390
- Kernel bypassed invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
Many of these HW features exist to support VM use cases - for instance the
combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a
guest. Dirty tracking enables VM live migration with SRIOV devices and
PASID support allow creating "scalable IOV" devices, among other things.
As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs, which
is currently VFIO and VDPA.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF
YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl
s7fAu+CQ1pr9+9NKGevD+frw8Solsw4=
=jJkd
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd implementation from Jason Gunthorpe:
"iommufd is the user API to control the IOMMU subsystem as it relates
to managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended features, some being highly IOMMU
device specific:
- Binding iommu_domain's to PASID/SSID
- Userspace IO page tables, for ARM, x86 and S390
- Kernel bypassed invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
Many of these HW features exist to support VM use cases - for instance
the combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
Dirty tracking enables VM live migration with SRIOV devices and PASID
support allow creating "scalable IOV" devices, among other things.
As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs,
which is currently VFIO and VDPA"
For more background, see the extended explanations in Jason's pull request:
https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
iommufd: Change the order of MSI setup
iommufd: Improve a few unclear bits of code
iommufd: Fix comment typos
vfio: Move vfio group specific code into group.c
vfio: Refactor dma APIs for emulated devices
vfio: Wrap vfio group module init/clean code into helpers
vfio: Refactor vfio_device open and close
vfio: Make vfio_device_open() truly device specific
vfio: Swap order of vfio_device_container_register() and open_device()
vfio: Set device->group in helper function
vfio: Create wrappers for group register/unregister
vfio: Move the sanity check of the group to vfio_create_group()
vfio: Simplify vfio_create_group()
iommufd: Allow iommufd to supply /dev/vfio/vfio
vfio: Make vfio_container optionally compiled
vfio: Move container related MODULE_ALIAS statements into container.c
vfio-iommufd: Support iommufd for emulated VFIO devices
vfio-iommufd: Support iommufd for physical VFIO devices
vfio-iommufd: Allow iommufd to be used in place of a container fd
vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
...
Access to the internals of struct iommu_fwspec by non-IOMMU drivers is
discouraged. Many drivers for Tegra SoCs, however, need access to their
IOMMU stream IDs so that they can be programmed into various hardware
registers.
Formalize this access into a common helper to make it easier to audit
and maintain.
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20221206165945.3551774-3-thierry.reding@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
This structure is to be considered private to the IOMMU API. Except for
very few exceptions, IOMMU consumer drivers should treat this as opaque
data.
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20221206165945.3551774-2-thierry.reding@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
These complement the group interfaces used by VFIO and are for use by
iommufd. The main difference is that multiple devices in the same group
can all share the ownership by passing the same ownership pointer.
Move the common code into shared functions.
Link: https://lore.kernel.org/r/2-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This queries if a domain linked to a device should expect to support
enforce_cache_coherency() so iommufd can negotiate the rules for when a
domain should be shared or not.
For iommufd a device that declares IOMMU_CAP_ENFORCE_CACHE_COHERENCY will
not be attached to a domain that does not support it.
Link: https://lore.kernel.org/r/1-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This series is to replace the previous EMEDIUMTYPE patch in a VFIO series:
https://lore.kernel.org/kvm/Yxnt9uQTmbqul5lf@8bytes.org/
The purpose is to regulate all existing ->attach_dev callback functions to
use EINVAL exclusively for an incompatibility error between a device and a
domain. This allows VFIO and IOMMUFD to detect such a soft error, and then
try a different domain with the same device.
Among all the patches, the first two are preparatory changes. And then one
patch to update kdocs and another three patches for the enforcement
effort.
Link: https://lore.kernel.org/r/cover.1666042872.git.nicolinc@nvidia.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY2JjUQAKCRCFwuHvBreF
YaFbAP492zvOEaZaRxiK4XcdsU1ZBCovB/2Keh/QIQdb7Ig6hgD/dW7TygTP1+4a
Oqpcu/6aLeHvhayfZt1142S3e0HuHwU=
=g5C+
-----END PGP SIGNATURE-----
Merge tag 'for-joerg' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core
iommu: Define EINVAL as device/domain incompatibility
This series is to replace the previous EMEDIUMTYPE patch in a VFIO series:
https://lore.kernel.org/kvm/Yxnt9uQTmbqul5lf@8bytes.org/
The purpose is to regulate all existing ->attach_dev callback functions to
use EINVAL exclusively for an incompatibility error between a device and a
domain. This allows VFIO and IOMMUFD to detect such a soft error, and then
try a different domain with the same device.
Among all the patches, the first two are preparatory changes. And then one
patch to update kdocs and another three patches for the enforcement
effort.
Link: https://lore.kernel.org/r/cover.1666042872.git.nicolinc@nvidia.com
This adds some mechanisms around the iommu_domain so that the I/O page
fault handling framework could route a page fault to the domain and
call the fault handler from it.
Add pointers to the page fault handler and its private data in struct
iommu_domain. The fault handler will be called with the private data
as a parameter once a page fault is routed to the domain. Any kernel
component which owns an iommu domain could install handler and its
private parameter so that the page fault could be further routed and
handled.
This also prepares the SVA implementation to be the first consumer of
the per-domain page fault handling model. The I/O page fault handler
for SVA is copied to the SVA file with mmget_not_zero() added before
mmap_read_lock().
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
These ops'es have been deprecated. There's no need for them anymore.
Remove them to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The existing iommu SVA interfaces are implemented by calling the SVA
specific iommu ops provided by the IOMMU drivers. There's no need for
any SVA specific ops in iommu_ops vector anymore as we can achieve
this through the generic attach/detach_dev_pasid domain ops.
This refactors the IOMMU SVA interfaces implementation by using the
iommu_attach/detach_device_pasid interfaces and align them with the
concept of the SVA iommu domain. Put the new SVA code in the SVA
related file in order to make it self-contained.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The SVA iommu_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructures
to support SVA domain in the iommu core. It includes:
- Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA domain
type. The IOMMU drivers that support allocation of the SVA domain
should provide its own SVA domain specific iommu_domain_ops.
- Add a helper to allocate an SVA domain. The iommu_domain_free()
is still used to free an SVA domain.
The report_iommu_fault() should be replaced by the new
iommu_report_device_fault(). Leave the existing fault handler with the
existing users and the newly added SVA members excludes it.
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but not limited):
- SVA (Shared Virtual Address)
- kernel DMA with PASID
- hardware-assist mediated device
This adds the set_dev_pasid domain ops for setting the domain onto a
PASID of a device and remove_dev_pasid iommu ops for removing any setup
on a PASID of device. This also adds interfaces for device drivers to
attach/detach/retrieve a domain for a PASID of a device.
If multiple devices share a single group, it's fine as long the fabric
always routes every TLP marked with a PASID to the host bridge and only
the host bridge. For example, ACS achieves this universally and has been
checked when pci_enable_pasid() is called. As we can't reliably tell the
source apart in a group, all the devices in a group have to be considered
as the same source, and mapped to the same PASID table.
The DMA ownership is about the whole device (more precisely, iommu group),
including the RID and PASIDs. When the ownership is converted, the pasid
array must be empty. This also adds necessary checks in the DMA ownership
interfaces.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The current kernel DMA with PASID support is based on the SVA with a flag
SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address
space to a PASID of the device. The device driver programs the device with
kernel virtual address (KVA) for DMA access. There have been security and
functional issues with this approach:
- The lack of IOTLB synchronization upon kernel page table updates.
(vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
- Other than slight more protection, using kernel virtual address (KVA)
has little advantage over physical address. There are also no use
cases yet where DMA engines need kernel virtual addresses for in-kernel
DMA.
This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface.
The device drivers are suggested to handle kernel DMA with PASID through
the kernel DMA APIs.
The drvdata parameter in iommu_sva_bind_device() and all callbacks is not
needed anymore. Cleanup them as well.
Link: https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device and lifting it into the
per-device dev_iommu struct could help to avoid the boilerplate code
in various IOMMU drivers.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use this field to keep the number of supported PASIDs that an IOMMU
hardware is able to support. This is a generic attribute of an IOMMU
and lifting it into the per-IOMMU device structure makes it possible
to allocate a PASID for device without calls into the IOMMU drivers.
Any iommu driver that supports PASID related features should set this
field before enabling them on the devices.
In the Intel IOMMU driver, intel_iommu_sm is moved to CONFIG_INTEL_IOMMU
enclave so that the pasid_supported() helper could be used in dmar.c
without compilation errors.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cases like VFIO wish to attach a device to an existing domain that was
not allocated specifically from the device. This raises a condition
where the IOMMU driver can fail the domain attach because the domain and
device are incompatible with each other.
This is a soft failure that can be resolved by using a different domain.
Provide a dedicated errno EINVAL from the IOMMU driver during attach that
the reason why the attach failed is because of domain incompatibility.
VFIO can use this to know that the attach is a soft failure and it should
continue searching. Otherwise, the attach will be a hard failure and VFIO
will return the code to userspace.
Update kdocs to add rules of return value to the attach_dev op and APIs.
Link: https://lore.kernel.org/r/bd56d93c18621104a0fa1b0de31e9b760b81b769.1666042872.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add gfp parameter to iommu_alloc_resv_region() for the callers to specify
the memory allocation behavior. Thus iommu_alloc_resv_region() could also
be available in critical contexts.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with
only a couple more public interfaces left pertaining to MSI integration.
Since these depend on the main IOMMU API header anyway, move their
declarations there, taking the opportunity to update the half-baked
comments to proper kerneldoc along the way.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/9cd99738f52094e6bed44bfee03fa4f288d20695.1660668998.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Clean up the remaining trivial bus_set_iommu() callsites along
with the implementation. Now drivers only have to know and care
about iommu_device instances, phew!
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ea383d5f4d74ffe200ab61248e5de6e95846180a.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With all callers now converted to the device-specific version, retire
the old bus-based interface, and give drivers the chance to indicate
accurate per-instance capabilities.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
dev_has_feat has been removed from iommu_ops in commit
309c56e846 ("iommu: remove the unused dev_has_feat method"),
remove its description in the struct doc.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Link: https://lore.kernel.org/r/20220815013339.2552-1-yuancan@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All drivers that implement get_resv_regions just use
generic_put_resv_regions to implement the put side. Remove the
indirections and document the allocations constraints.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220708080616.238833-4-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This method is never actually called.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220708080616.238833-2-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Parse through the IORT RMR nodes and populate the reserve region list
corresponding to a given IOMMU and device(optional). Also, go through
the ID mappings of the RMR node and retrieve all the SIDs associated
with it.
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Steven Price <steven.price@arm.com>
Tested-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220615101044.1972-5-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A callback is introduced to struct iommu_resv_region to free memory
allocations associated with the reserved region. This will be useful
when we introduce support for IORT RMR based reserved regions.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steven Price <steven.price@arm.com>
Tested-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220615101044.1972-2-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
While the comment was correct that this flag was intended to convey the
block no-snoop support in the IOMMU, it has become widely implemented and
used to mean the IOMMU supports IOMMU_CACHE as a map flag. Only the Intel
driver was different.
Now that the Intel driver is using enforce_cache_coherency() update the
comment to make it clear that IOMMU_CAP_CACHE_COHERENCY is only about
IOMMU_CACHE. Fix the Intel driver to return true since IOMMU_CACHE always
works.
The two places that test this flag, usnic and vdpa, are both assigning
userspace pages to a driver controlled iommu_domain and require
IOMMU_CACHE behavior as they offer no way for userspace to synchronize
caches.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and
IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.
Currently only Intel and AMD IOMMUs are known to support this
feature. They both implement it as an IOPTE bit, that when set, will cause
PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though
the no-snoop bit was clear.
The new API is triggered by calling enforce_cache_coherency() before
mapping any IOVA to the domain which globally switches on no-snoop
blocking. This allows other implementations that might block no-snoop
globally and outside the IOPTE - AMD also documents such a HW capability.
Leave AMD out of sync with Intel and have it block no-snoop even for
in-kernel users. This can be trivially resolved in a follow up patch.
Only VFIO needs to call this API because it does not have detailed control
over the device to avoid requesting no-snoop behavior at the device
level. Other places using domains with real kernel drivers should simply
avoid asking their devices to set the no-snoop bit.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu group changes notifer is not referenced in the tree. Remove it
to avoid dead code.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220418005000.897664-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Multiple devices may be placed in the same IOMMU group because they
cannot be isolated from each other. These devices must either be
entirely under kernel control or userspace control, never a mixture.
This adds dma ownership management in iommu core and exposes several
interfaces for the device drivers and the device userspace assignment
framework (i.e. VFIO), so that any conflict between user and kernel
controlled dma could be detected at the beginning.
The device driver oriented interfaces are,
int iommu_device_use_default_domain(struct device *dev);
void iommu_device_unuse_default_domain(struct device *dev);
By calling iommu_device_use_default_domain(), the device driver tells
the iommu layer that the device dma is handled through the kernel DMA
APIs. The iommu layer will manage the IOVA and use the default domain
for DMA address translation.
The device user-space assignment framework oriented interfaces are,
int iommu_group_claim_dma_owner(struct iommu_group *group,
void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);
The device userspace assignment must be disallowed if the DMA owner
claiming interface returns failure.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
VT-d's dmar_platform_optin() actually represents a combination of
properties fairly well standardised by Microsoft as "Pre-boot DMA
Protection" and "Kernel DMA Protection"[1]. As such, we can provide
interested consumers with an abstracted capability rather than
driver-specific interfaces that won't scale. We name it for the former
aspect since that's what external callers are most likely to be
interested in; the latter is for the IOMMU layer to handle itself.
[1] https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/d6218dff2702472da80db6aec2c9589010684551.1650878781.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
iommu_capable() only really works for systems where all IOMMU instances
are completely homogeneous, and all devices are IOMMU-mapped. Implement
the new variant which will be able to give a more accurate answer for
whichever device the caller is actually interested in, and even more so
once all the external users have been converted and we can reliably pass
the device pointer through the internal driver interface too.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/8407eb9586677995b7a9fd70d0fd82d85929a9bb.1650878781.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the domain specific operations out of struct iommu_ops into a new
structure that only has domain specific operations. This solves the
problem of needing to know if the method vector for a given operation
needs to be retrieved from the device or the domain. Logically the domain
ops are the ones that make sense for external subsystems and endpoint
drivers to use, while device ops, with the sole exception of domain_alloc,
are IOMMU API internals.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The is_attach_deferred iommu_ops callback is a device op. The domain
argument is unnecessary and never used. Remove it to make code clean.
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The common iommu_ops is hooked to both device and domain. When a helper
has both device and domain pointer, the way to get the iommu_ops looks
messy in iommu core. This sorts out the way to get iommu_ops. The device
related helpers go through device pointer, while the domain related ones
go through domain pointer.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The apply_resv_region callback in iommu_ops was introduced to reserve an
IOVA range in the given DMA domain when the IOMMU driver manages the IOVA
by itself. As all drivers converted to use dma-iommu in the core, there's
no driver using this anymore. Remove it to avoid dead code.
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The aux-domain related interfaces and iommu_ops are not referenced
anywhere in the tree. We've also reached a consensus to redesign it
based the new iommufd framework. Remove them to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The guest pasid related uapi interfaces and definitions are not referenced
anywhere in the tree. We've also reached a consensus to replace them with
a new iommufd design. Remove them to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
page->freelist is for the use of slab. We already have the ability
to free a list of pages in the core mm, but it requires the use of a
list_head and for the pages to be chained together through page->lru.
Switch the Intel IOMMU and IOVA code over to using free_pages_list().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[rm: split from original patch, cosmetic tweaks, fix fq entries]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/2115b560d9a0ce7cd4b948bd51a2b7bde8fdfd59.1639753638.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
- New DART IOMMU driver for Apple Silicon M1 chips.
- Optimizations for iommu_[map/unmap] performance
- Selective TLB flush support for the AMD IOMMU driver to make
it more efficient on emulated IOMMUs.
- Rework IOVA setup and default domain type setting to move more
code out of IOMMU drivers and to support runtime switching
between certain types of default domains.
- VT-d Updates from Lu Baolu:
- Update the virtual command related registers
- Enable Intel IOMMU scalable mode by default
- Preset A/D bits for user space DMA usage
- Allow devices to have more than 32 outstanding PRs
- Various cleanups
- ARM SMMU Updates from Will Deacon:
- SMMUv3: Minor optimisation to avoid zeroing struct members on CMD submission
- SMMUv3: Increased use of batched commands to reduce submission latency
- SMMUv3: Refactoring in preparation for ECMDQ support
- SMMUv2: Fix races when probing devices with identical StreamIDs
- SMMUv2: Optimise walk cache flushing for Qualcomm implementations
- SMMUv2: Allow deep sleep states for some Qualcomm SoCs with shared clocks
- Various smaller optimizations, cleanups, and fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmEyKYAACgkQK/BELZcB
GuOzAxAAnJ02PG07BnFFFGN2/o3eVON4LQUXquMePjcZ8A8oQf073jO/ybWNnpJK
5V+DHRg2CAugFHks/EwIrxFXAWZuStrcnk81d8t6T6ROQl47Zv1qshksUTnsDQnz
V7mQ1P/pcsBwCUf73aD9ncLmLkiuTVfHKoKfe3gHeQI+H+2Lw4ijzB8kIwUqhkHI
heJZLDmO87S2Mr7zlCmMQH5R550fHrTKSbUCx9QqFu3GgWsjkU+3u1S17xR1bEoW
hmhJhyAw+MLrSgdeG4U9o+6AcQuRELEHfVSq7PtDxQ6hEVziGYGY4Nk+YiEcXFiv
mu9qfEkaP/2QOKszvks+nhHrwDnJ9WLnEEskiEFwjsaFauIKsRscfYVUBTWeYXJT
9t/PVngigWLDhGO0NEPthQvJExvJJs1MQQ72CcA6dd0XdGpN+aRglIUWUJP/nQHd
doAx4/1YWnHVkWWUef8NgmVvlHdoXjA7vy4QGL9FYCqV6ImfhAkJYKJ99X6Ovlmk
gje/Kx+5wUPT2nXNbTkjalIylyUNpugMY4xD7K06VXjvRMUf2SbYNDQxYJaDDld6
nDt0F0NvEyrj7HO8egwIZbX3MOikhMGHur48yEyCTbm+9oHQffkODq1o4OfuxJh2
nq0G5Plln9CEmhQVwzibcPSNlYPe8AZbbXqQ9DrJFusEpYj+01c=
=zVaQ
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- New DART IOMMU driver for Apple Silicon M1 chips
- Optimizations for iommu_[map/unmap] performance
- Selective TLB flush support for the AMD IOMMU driver to make it more
efficient on emulated IOMMUs
- Rework IOVA setup and default domain type setting to move more code
out of IOMMU drivers and to support runtime switching between certain
types of default domains
- VT-d Updates from Lu Baolu:
- Update the virtual command related registers
- Enable Intel IOMMU scalable mode by default
- Preset A/D bits for user space DMA usage
- Allow devices to have more than 32 outstanding PRs
- Various cleanups
- ARM SMMU Updates from Will Deacon:
SMMUv3:
- Minor optimisation to avoid zeroing struct members on CMD submission
- Increased use of batched commands to reduce submission latency
- Refactoring in preparation for ECMDQ support
SMMUv2:
- Fix races when probing devices with identical StreamIDs
- Optimise walk cache flushing for Qualcomm implementations
- Allow deep sleep states for some Qualcomm SoCs with shared clocks
- Various smaller optimizations, cleanups, and fixes
* tag 'iommu-updates-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (85 commits)
iommu/io-pgtable: Abstract iommu_iotlb_gather access
iommu/arm-smmu: Fix missing unlock on error in arm_smmu_device_group()
iommu/vt-d: Add present bit check in pasid entry setup helpers
iommu/vt-d: Use pasid_pte_is_present() helper function
iommu/vt-d: Drop the kernel doc annotation
iommu/vt-d: Allow devices to have more than 32 outstanding PRs
iommu/vt-d: Preset A/D bits for user space DMA usage
iommu/vt-d: Enable Intel IOMMU scalable mode by default
iommu/vt-d: Refactor Kconfig a bit
iommu/vt-d: Remove unnecessary oom message
iommu/vt-d: Update the virtual command related registers
iommu: Allow enabling non-strict mode dynamically
iommu: Merge strictness and domain type configs
iommu: Only log strictness for DMA domains
iommu: Expose DMA domain strictness via sysfs
iommu: Express DMA strictness via the domain type
iommu/vt-d: Prepare for multiple DMA domain types
iommu/arm-smmu: Prepare for multiple DMA domain types
iommu/amd: Prepare for multiple DMA domain types
iommu: Introduce explicit type for non-strict DMA domains
...
Previously io-pgtable merely passed the iommu_iotlb_gather pointer
through to helpers, but now it has grown its own direct dereference.
This turns out to break the build for !IOMMU_API configs where the
structure only has a dummy definition. It will probably also crash
drivers who don't use the gather mechanism and simply pass in NULL.
Wrap this dereference in a suitable helper which can both be stubbed
out for !IOMMU_API and encapsulate a NULL check otherwise.
Fixes: 7a7c5badf8 ("iommu: Indicate queued flushes via gather data")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/83672ee76f6405c82845a55c148fa836f56fbbc1.1629465282.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Eliminate the iommu_get_dma_strict() indirection and pipe the
information through the domain type from the beginning. Besides
the flow simplification this also has several nice side-effects:
- Automatically implies strict mode for untrusted devices by
virtue of their IOMMU_DOMAIN_DMA override.
- Ensures that we only end up using flush queues for drivers
which are aware of them and can actually benefit.
- Allows us to handle flush queue init failure by falling back
to strict mode instead of leaving it to possibly blow up later.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/47083d69155577f1367877b1594921948c366eb3.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Promote the difference between strict and non-strict DMA domains from an
internal detail to a distinct domain feature and type, to pave the road
for exposing it through the sysfs default domain interface.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/08cd2afaf6b63c58ad49acec3517c9b32c2bb946.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since iommu_iotlb_gather exists to help drivers optimise flushing for a
given unmap request, it is also the logical place to indicate whether
the unmap is strict or not, and thus help them further optimise for
whether to expect a sync or a flush_all subsequently. As part of that,
it also seems fair to make the flush queue code take responsibility for
enforcing the really subtle ordering requirement it brings, so that we
don't need to worry about forgetting that if new drivers want to add
flush queue support, and can consolidate the existing versions.
While we're adding to the kerneldoc, also fill in some info for
@freelist which was overlooked previously.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/bf5f8e2ad84e48c712ccbf80fa8c610594c7595f.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that everyone has converged on iommu-dma for IOMMU_DOMAIN_DMA
support, we can abandon the notion of drivers being responsible for the
cookie type, and consolidate all the management into the core code.
CC: Yong Wu <yong.wu@mediatek.com>
CC: Chunyan Zhang <chunyan.zhang@unisoc.com>
CC: Maxime Ripard <mripard@kernel.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/46a2c0e7419c7d1d931762dc7b6a69fa082d199a.1628682048.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Convert to ssize_t return code so the return code from __iommu_map()
can be returned all the way down through dma_iommu_map_sg().
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Refactor iommu_iotlb_gather_add_page() and factor out the logic that
detects whether IOTLB gather range and a new range are disjoint. To be
used by the next patch that implements different gathering logic for
AMD.
Note that updating gather->pgsize unconditionally does not affect
correctness as the function had (and has) an invariant, in which
gather->pgsize always represents the flushing granularity of its range.
Arguably, “size" should never be zero, but lets assume for the matter of
discussion that it might.
If "size" equals to "gather->pgsize", then the assignment in question
has no impact.
Otherwise, if "size" is non-zero, then iommu_iotlb_sync() would
initialize the size and range (see iommu_iotlb_gather_init()), and the
invariant is kept.
Otherwise, "size" is zero, and "gather" already holds a range, so
gather->pgsize is non-zero and (gather->pgsize && gather->pgsize !=
size) is true. Therefore, again, iommu_iotlb_sync() would be called and
initialize the size.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210723093209.714328-5-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The Mediatek driver is not the only one which might want a basic
address-based gathering behaviour, so although it's arguably simple
enough to open-code, let's factor it out for the sake of cleanliness.
Let's also take this opportunity to document the intent of these
helpers for clarity.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210723093209.714328-4-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We only ever now set strict mode enabled in iommu_set_dma_strict(), so
just remove the argument.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-7-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can
call into the io-pgtable code, to map a physically contiguous
rnage of pages of the same size.
For IOMMU drivers that do not specify a map_pages() callback,
the existing logic of mapping memory one page block at a time
will be used.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-5-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can call
into the io-pgtable code, to unmap a virtually contiguous
range of pages of the same size.
For IOMMU drivers that do not specify an unmap_pages() callback,
the existing logic of unmapping memory one page block at a time
will be used.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-3-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rather than have separate opaque setter functions that are easy to
overlook and lead to repetitive boilerplate in drivers, let's pass the
relevant initialisation parameters directly to iommu_device_register().
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ab001b87c533b6f4db71eb90db6f888953986c36.1617285386.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It happens that the 3 drivers which first supported being modular are
also ones which play games with their pgsize_bitmap, so have non-const
iommu_ops where dynamically setting the owner manages to work out OK.
However, it's less than ideal to force that upon all drivers which want
to be modular - like the new sprd-iommu driver which now has a potential
bug in that regard - so let's just statically set the module owner and
let ops remain const wherever possible.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/31423b99ff609c3d4b291c701a7a7a810d9ce8dc.1617285386.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use an explicit set_pgtable_quirks method instead that just passes
the actual quirk bitmask instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-20-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Instead make the global iommu_dma_strict paramete in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.
This make sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86. As those default to lazy flushing a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overriden by an explicit parameter.
[ported on top of the other iommu_attr changes and added a few small
missing bits]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210401155256.298656-19-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use an explicit enable_nesting method instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-17-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The geometry information can be trivially queried from the iommu_domain
struture.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-16-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
DOMAIN_ATTR_PAGING is never used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-15-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Instead of a separate call to enable all devices from the list, just
enable the liodn once the device is attached to the iommu domain.
This also remove the DOMAIN_ATTR_FSL_PAMU_ENABLE iommu_attr.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-11-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a fsl_pamu_configure_l1_stash API that qman_portal can call directly
instead of indirecting through the iommu attr API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-8-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The only thing that fsl_pamu_window_enable does for the current caller
is to fill in the prot value in the only dma_window structure, and to
propagate a few values from the iommu_domain_geometry struture into the
dma_window. Remove the dma_window entirely, hardcode the prot value and
otherwise use the iommu_domain_geometry structure instead.
Remove the now unused ->domain_window_enable iommu method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-7-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The only domains allocated forces use of a single window. Remove all
the code related to multiple window support, as well as the need for
qman_portal to force a single window.
Remove the now unused DOMAIN_ATTR_WINDOWS iommu_attr.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-6-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
None of the values returned by this function are ever queried. Also
remove the DOMAIN_ATTR_FSL_PAMUV1 enum value that is not otherwise used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-3-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
domain_window_disable is wired up by fsl_pamu, but never actually called.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-2-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some systems allow devices to handle I/O Page Faults in the core mm. For
example systems implementing the PCIe PRI extension or Arm SMMU stall
model. Infrastructure for reporting these recoverable page faults was
added to the IOMMU core by commit 0c830e6b32 ("iommu: Introduce device
fault report API"). Add a page fault handler for host SVA.
IOMMU driver can now instantiate several fault workqueues and link them
to IOPF-capable devices. Drivers can choose between a single global
workqueue, one per IOMMU device, one per low-level fault queue, one per
domain, etc.
When it receives a fault event, most commonly in an IRQ handler, the
IOMMU driver reports the fault using iommu_report_device_fault(), which
calls the registered handler. The page fault handler then calls the mm
fault handler, and reports either success or failure with
iommu_page_response(). After the handler succeeds, the hardware retries
the access.
The iopf_param pointer could be embedded into iommu_fault_param. But
putting iopf_param into the iommu_param structure allows us not to care
about ordering between calls to iopf_queue_add_device() and
iommu_register_device_fault_handler().
Tested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20210401154718.307519-7-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some devices manage I/O Page Faults (IOPF) themselves instead of relying
on PCIe PRI or Arm SMMU stall. Allow their drivers to enable SVA without
mandating IOMMU-managed IOPF. The other device drivers now need to first
enable IOMMU_DEV_FEAT_IOPF before enabling IOMMU_DEV_FEAT_SVA. Enabling
IOMMU_DEV_FEAT_IOPF on its own doesn't have any effect visible to the
device driver, it is used in combination with other features.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20210401154718.307519-4-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The pasid-num-bits property shouldn't need a dedicated fwspec field,
it's a job for device properties. Add properties for IORT, and access
the number of PASID bits using device_property_read_u32().
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/20210401154718.307519-3-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
After the change of patch ("iommu: Switch gather->end to the
inclusive end"), the performace drops from 1600+K IOPS to 1200K in our
kunpeng ARM64 platform.
We find that the range [start1, end1) actually is joint from the range
[end1, end2), but it is considered as disjoint after the change,
so it needs more times of TLB sync, and spends more time on it.
So fix the boundary issue to avoid performance drop.
Fixes: 862c3715de ("iommu: Switch gather->end to the inclusive end")
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1616643504-120688-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, because domain attach allows to be deferred from iommu
driver to device driver, and when iommu initializes, the devices
on the bus will be scanned and the default groups will be allocated.
Due to the above changes, some devices could be added to the same
group as below:
[ 3.859417] pci 0000:01:00.0: Adding to iommu group 16
[ 3.864572] pci 0000:01:00.1: Adding to iommu group 16
[ 3.869738] pci 0000:02:00.0: Adding to iommu group 17
[ 3.874892] pci 0000:02:00.1: Adding to iommu group 17
But when attaching these devices, it doesn't allow that a group has
more than one device, otherwise it will return an error. This conflicts
with the deferred attaching. Unfortunately, it has two devices in the
same group for my side, for example:
[ 9.627014] iommu_group_device_count(): device name[0]:0000:01:00.0
[ 9.633545] iommu_group_device_count(): device name[1]:0000:01:00.1
...
[ 10.255609] iommu_group_device_count(): device name[0]:0000:02:00.0
[ 10.262144] iommu_group_device_count(): device name[1]:0000:02:00.1
Finally, which caused the failure of tg3 driver when tg3 driver calls
the dma_alloc_coherent() to allocate coherent memory in the tg3_test_dma().
[ 9.660310] tg3 0000:01:00.0: DMA engine test failed, aborting
[ 9.754085] tg3: probe of 0000:01:00.0 failed with error -12
[ 9.997512] tg3 0000:01:00.1: DMA engine test failed, aborting
[ 10.043053] tg3: probe of 0000:01:00.1 failed with error -12
[ 10.288905] tg3 0000:02:00.0: DMA engine test failed, aborting
[ 10.334070] tg3: probe of 0000:02:00.0 failed with error -12
[ 10.578303] tg3 0000:02:00.1: DMA engine test failed, aborting
[ 10.622629] tg3: probe of 0000:02:00.1 failed with error -12
In addition, the similar situations also occur in other drivers such
as the bnxt_en driver. That can be reproduced easily in kdump kernel
when SME is active.
Let's move the handling currently in iommu_dma_deferred_attach() into
the iommu core code so that it can call the __iommu_attach_device()
directly instead of the iommu_attach_device(). The external interface
iommu_attach_device() is not suitable for handling this situation.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20210126115337.20068-3-lijiang@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently gather->end is "unsigned long" which may be overflow in
arch32 in the corner case: 0xfff00000 + 0x100000(iova + size).
Although it doesn't affect the size(end - start), it affects the checking
"gather->end < end"
This patch changes this "end" to the real end address
(end = start + size - 1). Correspondingly, update the length to
"end - start + 1".
Fixes: a7d20dc19d ("iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes")
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107122909.16317-5-yong.wu@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
iotlb_sync_map allow IOMMU drivers tlb sync after completing the whole
mapping. This patch adds iova and size as the parameters in it. then the
IOMMU driver could flush tlb with the whole range once after iova mapping
to improve performance.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107122909.16317-3-yong.wu@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
Function iommu_dev_has_feature() has never been referenced in the tree,
and there does not appear to be anything coming soon to use it, so delete
it.
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1609940111-28563-7-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Intel VT-D updates for 5.11. The main thing here is converting the code
over to the iommu-dma API, which required some improvements to the core
code to preserve existing functionality.
* for-next/iommu/vt-d:
iommu/vt-d: Avoid GFP_ATOMIC where it is not needed
iommu/vt-d: Remove set but not used variable
iommu/vt-d: Cleanup after converting to dma-iommu ops
iommu/vt-d: Convert intel iommu driver to the iommu ops
iommu/vt-d: Update domain geometry in iommu_ops.at(de)tach_dev
iommu: Add quirk for Intel graphic devices in map_sg
iommu: Allow the dma-iommu api to use bounce buffers
iommu: Add iommu_dma_free_cpu_cached_iovas()
iommu: Handle freelists when using deferred flushing in iommu drivers
iommu/vt-d: include conditionally on CONFIG_INTEL_IOMMU_SVM
Add a new iommu domain attribute DOMAIN_ATTR_IO_PGTABLE_CFG
for pagetable configuration which initially will be used to
set quirks like for system cache aka last level cache to be
used by client drivers like GPU to set right attributes for
caching the hardware pagetables into the system cache and
later can be extended to include other page table configuration
data.
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Link: https://lore.kernel.org/r/9190aa16f378fc0a7f8e57b2b9f60b033e7eeb4f.1606287059.git.saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
Allow the iommu_unmap_fast to return newly freed page table pages and
pass the freelist to queue_iova in the dma-iommu ops path.
This is useful for iommu drivers (in this case the intel iommu driver)
which need to wait for the ioTLB to be flushed before newly
free/unmapped page table pages can be freed. This way we can still batch
ioTLB free operations and handle the freelists.
Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://lore.kernel.org/r/20201124082057.2614359-2-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
Including:
- ARM-SMMU Updates from Will:
- Continued SVM enablement, where page-table is shared with
CPU
- Groundwork to support integrated SMMU with Adreno GPU
- Allow disabling of MSI-based polling on the kernel
command-line
- Minor driver fixes and cleanups (octal permissions, error
messages, ...)
- Secure Nested Paging Support for AMD IOMMU. The IOMMU will
fault when a device tries DMA on memory owned by a guest. This
needs new fault-types as well as a rewrite of the IOMMU memory
semaphore for command completions.
- Allow broken Intel IOMMUs (wrong address widths reported) to
still be used for interrupt remapping.
- IOMMU UAPI updates for supporting vSVA, where the IOMMU can
access address spaces of processes running in a VM.
- Support for the MT8167 IOMMU in the Mediatek IOMMU driver.
- Device-tree updates for the Renesas driver to support r8a7742.
- Several smaller fixes and cleanups all over the place.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl+Fy9MACgkQK/BELZcB
GuNxtRAA0TdYHXt6XyLWmvRAX/ySZSz6KOneZWWwpsQ9wh2/iv1PtBsrV0ltf+6g
CaX4ROZUVRbV9wPD+7maBRbzxrG3QhfEaaV+K45Q2J/QE1wjkyV8qj1eORWTUUoc
nis4FhGDKk2ER/Gsajy2Hjs4+6i43gdWG/+ghVGaCRo8mCZyoz1/6AyMQyN3deuO
NqWOv9E7hsavZjRs/w/LXG7eSE20cZwtt//kPVJF0r9eQqC6i1eJDQj48iRqJVqd
R0dwBQZaLz++qQptyKebDNlmH/3aAsb+A8nCeS7ZwHqWC1QujTWOUYWpFyPPbOmC
KVsQXzTzRfnVTDECF1Pk5d3yi45KILLU3B4zDJfUJjbL3KDYjuVUvhHF/pcGcjC3
H1LWJqHSAL8sJwHvKhpi0VtQ5SOxXnLO5fGG/CZT/Xb4QyM+mkwkFLdn1TryZTR/
M4XA+QuI96TzY7HQUJdSoEDANxoBef6gPnxdDKOnK1v4hfNsPAl7o8hZkM3w0DK8
GoFZUV+vjBhFcymGcQegSNiea28Hfi+hBe+PPHCmw+tJm47cketD5uP5jJ5NGaUe
MKU/QXWXc6oqeBTQT6ki5zJbJXKttbPa8eEmp+FrMatc9kruvBVhQoMbj7Vd3CA1
dC4zK9Awy7yj24ZhZfnAFx2DboCmBTUI3QKjDt9K5PRZyMeyoP8=
=C0Sg
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- ARM-SMMU Updates from Will:
- Continued SVM enablement, where page-table is shared with CPU
- Groundwork to support integrated SMMU with Adreno GPU
- Allow disabling of MSI-based polling on the kernel command-line
- Minor driver fixes and cleanups (octal permissions, error
messages, ...)
- Secure Nested Paging Support for AMD IOMMU. The IOMMU will fault when
a device tries DMA on memory owned by a guest. This needs new
fault-types as well as a rewrite of the IOMMU memory semaphore for
command completions.
- Allow broken Intel IOMMUs (wrong address widths reported) to still be
used for interrupt remapping.
- IOMMU UAPI updates for supporting vSVA, where the IOMMU can access
address spaces of processes running in a VM.
- Support for the MT8167 IOMMU in the Mediatek IOMMU driver.
- Device-tree updates for the Renesas driver to support r8a7742.
- Several smaller fixes and cleanups all over the place.
* tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (57 commits)
iommu/vt-d: Gracefully handle DMAR units with no supported address widths
iommu/vt-d: Check UAPI data processed by IOMMU core
iommu/uapi: Handle data and argsz filled by users
iommu/uapi: Rename uapi functions
iommu/uapi: Use named union for user data
iommu/uapi: Add argsz for user filled data
docs: IOMMU user API
iommu/qcom: add missing put_device() call in qcom_iommu_of_xlate()
iommu/arm-smmu-v3: Add SVA device feature
iommu/arm-smmu-v3: Check for SVA features
iommu/arm-smmu-v3: Seize private ASID
iommu/arm-smmu-v3: Share process page tables
iommu/arm-smmu-v3: Move definitions to a header
iommu/io-pgtable-arm: Move some definitions to a header
iommu/arm-smmu-v3: Ensure queue is read after updating prod pointer
iommu/amd: Re-purpose Exclusion range registers to support SNP CWWB
iommu/amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR
iommu/amd: Use 4K page for completion wait write-back semaphore
iommu/tegra-smmu: Allow to group clients in same swgroup
iommu/tegra-smmu: Fix iova->phys translation
...
IOMMU user APIs are responsible for processing user data. This patch
changes the interface such that user pointers can be passed into IOMMU
code directly. Separate kernel APIs without user pointers are introduced
for in-kernel users of the UAPI functionality.
IOMMU UAPI data has a user filled argsz field which indicates the data
length of the structure. User data is not trusted, argsz must be
validated based on the current kernel data size, mandatory data size,
and feature flags.
User data may also be extended, resulting in possible argsz increase.
Backward compatibility is ensured based on size and flags (or
the functional equivalent fields) checking.
This patch adds sanity checks in the IOMMU layer. In addition to argsz,
reserved/unused fields in padding, flags, and version are also checked.
Details are documented in Documentation/userspace-api/iommu.rst
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/1601051567-54787-6-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
User APIs such as iommu_sva_unbind_gpasid() may also be used by the
kernel. Since we introduced user pointer to the UAPI functions,
in-kernel callers cannot share the same APIs. In-kernel callers are also
trusted, there is no need to validate the data.
We plan to have two flavors of the same API functions, one called
through ioctls, carrying a user pointer and one called directly with
valid IOMMU UAPI structs. To differentiate both, let's rename existing
functions with an iommu_uapi_ prefix.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/1601051567-54787-5-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".
No PASID type change in uapi although it defines PASID as __u64 in
some places.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1600187413-163670-2-git-send-email-fenghua.yu@intel.com
To keep naming consistent we should stick with *iotlb*. This patch
renames a few remaining functions.
Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Link: https://lore.kernel.org/r/20200817210051.13546-1-murphyt7@tcd.ie
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The IOMMU_SYS_CACHE_ONLY flag was never exposed via the DMA API and
has no in-tree users. Remove it.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Isaac J. Manjarres" <isaacm@codeaurora.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
Move the recently added sg_table wrapper out of CONFIG_IOMMU_SUPPORT to
let the client code copile also when IOMMU support is disabled.
Fixes: 48530d9fab ("iommu: add generic helper for mapping sgtable objects")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200630081756.18526-1-m.szyprowski@samsung.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
- A big part of this is a change in how devices get connected to
IOMMUs in the core code. It contains the change from the old
add_device()/remove_device() to the new
probe_device()/release_device() call-backs. As a result
functionality that was previously in the IOMMU drivers has
been moved to the IOMMU core code, including IOMMU group
allocation for each device.
The reason for this change was to get more robust allocation
of default domains for the iommu groups.
A couple of fixes were necessary after this was merged into
the IOMMU tree, but there are no known bugs left. The last fix
is applied on-top of the merge commit for the topic branches.
- Removal of the driver private domain handling in the Intel
VT-d driver. This was fragile code and I am glad it is gone
now.
- More Intel VT-d updates from Lu Baolu:
- Nested Shared Virtual Addressing (SVA) support to the
Intel VT-d driver
- Replacement of the Intel SVM interfaces to the common
IOMMU SVA API
- SVA Page Request draining support
- ARM-SMMU Updates from Will:
- Avoid mapping reserved MMIO space on SMMUv3, so that
it can be claimed by the PMU driver
- Use xarray to manage ASIDs on SMMUv3
- Reword confusing shutdown message
- DT compatible string updates
- Allow implementations to override the default domain
type
- A new IOMMU driver for the Allwinner Sun50i platform
- Support for ATS gets disabled for untrusted devices (like
Thunderbolt devices). This includes a PCI patch, acked by
Bjorn.
- Some cleanups to the AMD IOMMU driver to make more use of
IOMMU core features.
- Unification of some printk formats in the Intel and AMD IOMMU
drivers and in the IOVA code.
- Updates for DT bindings
- A number of smaller fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl7eX5gACgkQK/BELZcB
GuOMMQ//Si8h3uC4QhTmeNM6OwYpTcImMuCtqOebVDOJYWfbjGb4U2ZvDSUu4r7u
KGj66pWBq9kciKaM5HcLnWNg4iNNG+iZHwYSOy2DAOdPorWh40aM/Obozdd4D4eK
sXt4uy1JEQem/Bm4eTwmvaJV5/riyK6xn1HVocPejstGSJCh4kal/bYuhj415qEa
LLrN0AcitoPaSRl4Pl7/wEtesk+Az0g94jY9qDhtxIQJXWlAwO25s+rIPy4S7QuW
WAFGU+Xp+J7WC3hQm6nHKQtURIqPHtqozT9Flws9YETuyeKwn47GRitMiAXZsy7R
t+kj1cHyglEhe2hdPnJBSFIjyrO3cCrV7CUVryJHigPCQOaQLjoEegThQCYU3VQu
FPRBX+bp4haHeo3BCBy2jQv4JZrPFkTVXeVEtpMRDOoJLb2OKaI34xbOvGy6dMM0
dFtpbAW2IjHuneJaQCbJIC+jaEYii8mr3Zwok4LS8u8Sy+7PPSKmt6Tti3enD8+C
pBB/0CxNJvQFhl13s6oI8NHTT9D6cPTbjxc2Gfc3UuKyyWsz+eR54gRhaBi0FypA
p6syMosNVjjOaHFd5K5gsbpUFCC3X/drIhqeXRLgQ51mqfkNZMuBBtiyLWTk7iJd
CK+1f2aqtBrpUdSNjTzE/XmR+AhjIn2oIcG/7jPCgYXQoSGM2Sg=
=a4z4
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"A big part of this is a change in how devices get connected to IOMMUs
in the core code. It contains the change from the old add_device() /
remove_device() to the new probe_device() / release_device()
call-backs.
As a result functionality that was previously in the IOMMU drivers has
been moved to the IOMMU core code, including IOMMU group allocation
for each device. The reason for this change was to get more robust
allocation of default domains for the iommu groups.
A couple of fixes were necessary after this was merged into the IOMMU
tree, but there are no known bugs left. The last fix is applied on-top
of the merge commit for the topic branches.
Other than that change, we have:
- Removal of the driver private domain handling in the Intel VT-d
driver. This was fragile code and I am glad it is gone now.
- More Intel VT-d updates from Lu Baolu:
- Nested Shared Virtual Addressing (SVA) support to the Intel VT-d
driver
- Replacement of the Intel SVM interfaces to the common IOMMU SVA
API
- SVA Page Request draining support
- ARM-SMMU Updates from Will:
- Avoid mapping reserved MMIO space on SMMUv3, so that it can be
claimed by the PMU driver
- Use xarray to manage ASIDs on SMMUv3
- Reword confusing shutdown message
- DT compatible string updates
- Allow implementations to override the default domain type
- A new IOMMU driver for the Allwinner Sun50i platform
- Support for ATS gets disabled for untrusted devices (like
Thunderbolt devices). This includes a PCI patch, acked by Bjorn.
- Some cleanups to the AMD IOMMU driver to make more use of IOMMU
core features.
- Unification of some printk formats in the Intel and AMD IOMMU
drivers and in the IOVA code.
- Updates for DT bindings
- A number of smaller fixes and cleanups.
* tag 'iommu-updates-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (109 commits)
iommu: Check for deferred attach in iommu_group_do_dma_attach()
iommu/amd: Remove redundant devid checks
iommu/amd: Store dev_data as device iommu private data
iommu/amd: Merge private header files
iommu/amd: Remove PD_DMA_OPS_MASK
iommu/amd: Consolidate domain allocation/freeing
iommu/amd: Free page-table in protection_domain_free()
iommu/amd: Allocate page-table in protection_domain_init()
iommu/amd: Let free_pagetable() not rely on domain->pt_root
iommu/amd: Unexport get_dev_data()
iommu/vt-d: Fix compile warning
iommu/vt-d: Remove real DMA lookup in find_domain
iommu/vt-d: Allocate domain info for real DMA sub-devices
iommu/vt-d: Only clear real DMA device's context entries
iommu: Remove iommu_sva_ops::mm_exit()
uacce: Remove mm_exit() op
iommu/sun50i: Constify sun50i_iommu_ops
iommu/hyper-v: Constify hyperv_ir_domain_ops
iommu/vt-d: Use pci_ats_supported()
iommu/arm-smmu-v3: Use pci_ats_supported()
...
After binding a device to an mm, device drivers currently need to
register a mm_exit handler. This function is called when the mm exits,
to gracefully stop DMA targeting the address space and flush page faults
to the IOMMU.
This is deemed too complex for the MMU release() notifier, which may be
triggered by any mmput() invocation, from about 120 callsites [1]. The
upcoming SVA module has an example of such complexity: the I/O Page
Fault handler would need to call mmput_async() instead of mmput() after
handling an IOPF, to avoid triggering the release() notifier which would
in turn drain the IOPF queue and lock up.
Another concern is the DMA stop function taking too long, up to several
minutes [2]. For some mmput() callers this may disturb other users. For
example, if the OOM killer picks the mm bound to a device as the victim
and that mm's memory is locked, if the release() takes too long, it
might choose additional innocent victims to kill.
To simplify the MMU release notifier, don't forward the notification to
device drivers. Since they don't stop DMA on mm exit anymore, the PASID
lifetime is extended:
(1) The device driver calls bind(). A PASID is allocated.
Here any DMA fault is handled by mm, and on error we don't print
anything to dmesg. Userspace can easily trigger errors by issuing DMA
on unmapped buffers.
(2) exit_mmap(), for example the process took a SIGKILL. This step
doesn't happen during normal operations. Remove the pgd from the
PASID table, since the page tables are about to be freed. Invalidate
the IOTLBs.
Here the device may still perform DMA on the address space. Incoming
transactions are aborted but faults aren't printed out. ATS
Translation Requests return Successful Translation Completions with
R=W=0. PRI Page Requests return with Invalid Request.
(3) The device driver stops DMA, possibly following release of a fd, and
calls unbind(). PASID table is cleared, IOTLB invalidated if
necessary. The page fault queues are drained, and the PASID is
freed.
If DMA for that PASID is still running here, something went seriously
wrong and errors should be reported.
For now remove iommu_sva_ops entirely. We might need to re-introduce
them at some point, for example to notify device drivers of unhandled
IOPF.
[1] https://lore.kernel.org/linux-iommu/20200306174239.GM31668@ziepe.ca/
[2] https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa796c3@amd.com/
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200423125329.782066-3-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
After moving iommu_group setup to iommu core code [1][2] and removing
private domain support in vt-d [3], there are no users for functions such
as iommu_request_dm_for_dev(), iommu_request_dma_domain_for_dev() and
request_default_domain_for_dev(). So, remove these functions.
[1] commit dce8d6964e ("iommu/amd: Convert to probe/release_device()
call-backs")
[2] commit e5d1841f18 ("iommu/vt-d: Convert to probe/release_device()
call-backs")
[3] commit 327d5b2fee ("iommu/vt-d: Allow 32bit devices to uses DMA
domain")
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200513224721.20504-1-sai.praneeth.prakhya@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
struct sg_table is a common structure used for describing a memory
buffer. It consists of a scatterlist with memory pages and DMA addresses
(sgl entry), as well as the number of scatterlist entries: CPU pages
(orig_nents entry) and DMA mapped pages (nents entry).
It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling mapping functions with a wrong number of entries.
To avoid such issues, lets introduce a common wrapper operating directly
on the struct sg_table objects, which take care of the proper use of
the nents and orig_nents entries.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The function is now only used in IOMMU core code and shouldn't be used
outside of it anyway, so remove the export for it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-35-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All drivers are converted to use the probe/release_device()
call-backs, so the add_device/remove_device() pointers are unused and
the code using them can be removed.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-33-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a check to the bus_iommu_probe() call-path to make sure it ignores
devices which have already been successfully probed. Then export the
bus_iommu_probe() function so it can be used by IOMMU drivers.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-14-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add call-backs to 'struct iommu_ops' as an alternative to the
add_device() and remove_device() call-backs, which will be removed when
all drivers are converted.
The new call-backs will not setup IOMMU groups and domains anymore,
so also add a probe_finalize() call-back where the IOMMU driver can do
per-device setup work which require the device to be set up with a
group and a domain.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-8-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some devices are reqired to use a specific type (identity or dma)
of default domain when they are used with a vendor iommu. When the
system level default domain type is different from it, the vendor
iommu driver has to request a new default domain with
iommu_request_dma_domain_for_dev() and iommu_request_dm_for_dev()
in the add_dev() callback. Unfortunately, these two helpers only
work when the group hasn't been assigned to any other devices,
hence, some vendor iommu driver has to use a private domain if
it fails to request a new default one.
This adds def_domain_type() callback in the iommu_ops, so that
any special requirement of default domain for a device could be
aware by the iommu generic layer.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
[ jroedel@suse.de: Added iommu_get_def_domain_type() function and use
it to allocate the default domain ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-3-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the pointer for iommu private data from struct iommu_fwspec to
struct dev_iommu.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Will Deacon <will@kernel.org> # arm-smmu
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20200326150841.10083-17-joro@8bytes.org
Add dev_iommu_priv_get/set() functions to access per-device iommu
private data. This makes it easier to move the pointer to a different
location.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Will Deacon <will@kernel.org> # arm-smmu
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20200326150841.10083-9-joro@8bytes.org
Move the iommu_fwspec pointer in struct device into struct dev_iommu.
This is a step in the effort to reduce the iommu related pointers in
struct device to one.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Will Deacon <will@kernel.org> # arm-smmu
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200326150841.10083-7-joro@8bytes.org
There are users outside of the IOMMU code that need to call that
function. Define it for !CONFIG_IOMMU_API too so that compilation does
not break.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20200326150841.10083-2-joro@8bytes.org
Although the 1-element array was a typical pre-C99 way to implement
variable-length structures, and indeed is a fundamental construct in the
APIs of certain other popular platforms, there's no good reason for it
here (and in particular the sizeof() trick is far too "clever" for its
own good). We can just as easily implement iommu_fwspec's preallocation
behaviour using a standard flexible array member, so let's make it look
the way most readers would expect.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For platform devices that support SubstreamID (SSID), firmware provides
the number of supported SSID bits. Restrict it to what the SMMU supports
and cache it into master->ssid_bits, which will also be used for PCI
PASID.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Requiring each IOMMU driver to initialise the 'owner' field of their
'struct iommu_ops' is error-prone and easily forgotten. Follow the
example set by PCI and USB by assigning THIS_MODULE automatically when
registering the ops structure with IOMMU core.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
Implement a generic function for removing reserved regions. This can be
used by drivers that don't do anything fancy with these regions other
than allocating memory for them.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
To avoid accidental removal of an active IOMMU driver module, take a
reference to the driver module in 'iommu_probe_device()' immediately
prior to invoking the '->add_device()' callback and hold it until the
after the device has been removed by '->remove_device()'.
Suggested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The 'IOMMU_QCOM_SYS_CACHE' IOMMU protection flag is exposed to all
users of the IOMMU API. Despite its name, the idea behind it isn't
especially tied to Qualcomm implementations and could conceivably be
used by other systems.
Rename it to 'IOMMU_SYS_CACHE_ONLY' and update the comment to describe
a bit better the idea behind it.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Isaac J. Manjarres" <isaacm@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
Guest shared virtual address (SVA) may require host to shadow guest
PASID tables. Guest PASID can also be allocated from the host via
enlightened interfaces. In this case, guest needs to bind the guest
mm, i.e. cr3 in guest physical address to the actual PASID table in
the host IOMMU. Nesting will be turned on such that guest virtual
address can go through a two level translation:
- 1st level translates GVA to GPA
- 2nd level translates GPA to HPA
This patch introduces APIs to bind guest PASID data to the assigned
device entry in the physical IOMMU. See the diagram below for usage
explanation.
.-------------. .---------------------------.
| vIOMMU | | Guest process mm, FL only |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------' |
| | V
| | GP
'-------------'
Guest
------| Shadow |----------------------- GP->HP* ---------
v v |
Host v
.-------------. .----------------------.
| pIOMMU | | Bind FL for GVA-GPA |
| | '----------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.---------------------.
| | |Set SL to GPA-HPA |
| | '---------------------'
'-------------'
Where:
- FL = First level/stage one page tables
- SL = Second level/stage two page tables
- GP = Guest PASID
- HP = Host PASID
* Conversion needed if non-identity GP-HP mapping option is chosen.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In any virtualization use case, when the first translation stage
is "owned" by the guest OS, the host IOMMU driver has no knowledge
of caching structure updates unless the guest invalidation activities
are trapped by the virtualizer and passed down to the host.
Since the invalidation data can be obtained from user space and will be
written into physical IOMMU, we must allow security check at various
layers. Therefore, generic invalidation data format are proposed here,
model specific IOMMU drivers need to convert them into their own format.
Signed-off-by: Yi L Liu <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a gfp_t parameter to the iommu_ops::map function.
Remove the needless locking in the AMD iommu driver.
The iommu_ops::map function (or the iommu_map function which calls it)
was always supposed to be sleepable (according to Joerg's comment in
this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
should probably have had a "might_sleep()" since it was written. However
currently the dma-iommu api can call iommu_map in an atomic context,
which it shouldn't do. This doesn't cause any problems because any iommu
driver which uses the dma-iommu api uses gfp_atomic in it's
iommu_ops::map function. But doing this wastes the memory allocators
atomic pools.
Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a couple of functions to allow changing the default
domain type from architecture code and a function for iommu
drivers to request whether the default domain is
passthrough.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
To allow IOMMU drivers to batch up TLB flushing operations and postpone
them until ->iotlb_sync() is called, extend the prototypes for the
->unmap() and ->iotlb_sync() IOMMU ops callbacks to take a pointer to
the current iommu_iotlb_gather structure.
All affected IOMMU drivers are updated, but there should be no
functional change since the extra parameter is ignored for now.
Signed-off-by: Will Deacon <will@kernel.org>
Introduce a helper function for drivers to use when updating an
iommu_iotlb_gather structure in response to an ->unmap() call, rather
than having to open-code the logic in every page-table implementation.
Signed-off-by: Will Deacon <will@kernel.org>
To permit batching of TLB flushes across multiple calls to the IOMMU
driver's ->unmap() implementation, introduce a new structure for
tracking the address range to be flushed and the granularity at which
the flushing is required.
This is hooked into the IOMMU API and its caller are updated to make use
of the new structure. Subsequent patches will plumb this into the IOMMU
drivers as well, but for now the gathering information is ignored.
Signed-off-by: Will Deacon <will@kernel.org>
Commit add02cfdc9 ("iommu: Introduce Interface for IOMMU TLB Flushing")
added three new TLB flushing operations to the IOMMU API so that the
underlying driver operations can be batched when unmapping large regions
of IO virtual address space.
However, the ->iotlb_range_add() callback has not been implemented by
any IOMMU drivers (amd_iommu.c implements it as an empty function, which
incurs the overhead of an indirect branch). Instead, drivers either flush
the entire IOTLB in the ->iotlb_sync() callback or perform the necessary
invalidation during ->unmap().
Attempting to implement ->iotlb_range_add() for arm-smmu-v3.c revealed
two major issues:
1. The page size used to map the region in the page-table is not known,
and so it is not generally possible to issue TLB flushes in the most
efficient manner.
2. The only mutable state passed to the callback is a pointer to the
iommu_domain, which can be accessed concurrently and therefore
requires expensive synchronisation to keep track of the outstanding
flushes.
Remove the callback entirely in preparation for extending ->unmap() and
->iotlb_sync() to update a token on the caller's stack.
Signed-off-by: Will Deacon <will@kernel.org>
Few Qualcomm platforms such as, sdm845 have an additional outer
cache called as System cache, aka. Last level cache (LLC) that
allows non-coherent devices to upgrade to using caching.
This cache sits right before the DDR, and is tightly coupled
with the memory controller. The clients using this cache request
their slices from this system cache, make it active, and can then
start using it.
There is a fundamental assumption that non-coherent devices can't
access caches. This change adds an exception where they *can* use
some level of cache despite still being non-coherent overall.
The coherent devices that use cacheable memory, and CPU make use of
this system cache by default.
Looking at memory types, we have following -
a) Normal uncached :- MAIR 0x44, inner non-cacheable,
outer non-cacheable;
b) Normal cached :- MAIR 0xff, inner read write-back non-transient,
outer read write-back non-transient;
attribute setting for coherenet I/O devices.
and, for non-coherent i/o devices that can allocate in system cache
another type gets added -
c) Normal sys-cached :- MAIR 0xf4, inner non-cacheable,
outer read write-back non-transient
Coherent I/O devices use system cache by marking the memory as
normal cached.
Non-coherent I/O devices should mark the memory as normal
sys-cached in page tables to use system cache.
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Introduce a new type for reserved region. This corresponds
to directly mapped regions which are known to be relaxable
in some specific conditions, such as device assignment use
case. Well known examples are those used by USB controllers
providing PS/2 keyboard emulation for pre-boot BIOS and
early BOOT or RMRRs associated to IGD working in legacy mode.
Since commit c875d2c1b8 ("iommu/vt-d: Exclude devices using RMRRs
from IOMMU API domains") and commit 18436afdc1 ("iommu/vt-d: Allow
RMRR on graphics devices too"), those regions are currently
considered "safe" with respect to device assignment use case
which requires a non direct mapping at IOMMU physical level
(RAM GPA -> HPA mapping).
Those RMRRs currently exist and sometimes the device is
attempting to access it but this has not been considered
an issue until now.
However at the moment, iommu_get_group_resv_regions() is
not able to make any difference between directly mapped
regions: those which must be absolutely enforced and those
like above ones which are known as relaxable.
This is a blocker for reporting severe conflicts between
non relaxable RMRRs (like MSI doorbells) and guest GPA space.
With this new reserved region type we will be able to use
iommu_get_group_resv_regions() to enumerate the IOVA space
that is usable through the IOMMU API without introducing
regressions with respect to existing device assignment
use cases (USB and IGD).
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some IOMMU hardware features, for example PCI PRI and Arm SMMU Stall,
enable recoverable I/O page faults. Allow IOMMU drivers to report PRI Page
Requests and Stall events through the new fault reporting API. The
consumer of the fault can be either an I/O page fault handler in the host,
or a guest OS.
Once handled, the fault must be completed by sending a page response back
to the IOMMU. Add an iommu_page_response() function to complete a page
fault.
There are two ways to extend the userspace API:
* Add a field to iommu_page_response and a flag to
iommu_page_response::flags describing the validity of this field.
* Introduce a new iommu_page_response_X structure with a different version
number. The kernel must then support both versions.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Traditionally, device specific faults are detected and handled within
their own device drivers. When IOMMU is enabled, faults such as DMA
related transactions are detected by IOMMU. There is no generic
reporting mechanism to report faults back to the in-kernel device
driver or the guest OS in case of assigned devices.
This patch introduces a registration API for device specific fault
handlers. This differs from the existing iommu_set_fault_handler/
report_iommu_fault infrastructures in several ways:
- it allows to report more sophisticated fault events (both
unrecoverable faults and page request faults) due to the nature
of the iommu_fault struct
- it is device specific and not domain specific.
The current iommu_report_device_fault() implementation only handles
the "shoot and forget" unrecoverable fault case. Handling of page
request faults or stalled faults will come later.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Device faults detected by IOMMU can be reported outside the IOMMU
subsystem for further processing. This patch introduces
a generic device fault data structure.
The fault can be either an unrecoverable fault or a page request,
also referred to as a recoverable fault.
We only care about non internal faults that are likely to be reported
to an external subsystem.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 59 temple place suite 330 boston ma 02111
1307 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 136 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.384967451@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Normally during iommu probing a device, a default doamin will
be allocated and attached to the device. The domain type of
the default domain is statically defined, which results in a
situation where the allocated default domain isn't suitable
for the device due to some limitations. We already have API
iommu_request_dm_for_dev() to replace a DMA domain with an
identity one. This adds iommu_request_dma_domain_for_dev()
to request a dma domain if an allocated identity domain isn't
suitable for the device in question.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Root complex node in IORT has a bit telling whether it supports ATS or
not. Store this bit in the IOMMU fwspec when setting up a device, so it
can be accessed later by an IOMMU driver. In the future we'll probably
want to store this bit at the host bridge or SMMU rather than in each
endpoint.
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add bind() and unbind() operations to the IOMMU API.
iommu_sva_bind_device() binds a device to an mm, and returns a handle to
the bond, which is released by calling iommu_sva_unbind_device().
Each mm bound to devices gets a PASID (by convention, a 20-bit system-wide
ID representing the address space), which can be retrieved with
iommu_sva_get_pasid(). When programming DMA addresses, device drivers
include this PASID in a device-specific manner, to let the device access
the given address space. Since the process memory may be paged out, device
and IOMMU must support I/O page faults (e.g. PCI PRI).
Using iommu_sva_set_ops(), device drivers provide an mm_exit() callback
that is called by the IOMMU driver if the process exits before the device
driver called unbind(). In mm_exit(), device driver should disable DMA
from the given context, so that the core IOMMU can reallocate the PASID.
Whether the process exited or nor, the device driver should always release
the handle with unbind().
To use these functions, device driver must first enable the
IOMMU_DEV_FEAT_SVA device feature with iommu_dev_enable_feature().
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Sharing a physical PCI device in a finer-granularity way
is becoming a consensus in the industry. IOMMU vendors
are also engaging efforts to support such sharing as well
as possible. Among the efforts, the capability of support
finer-granularity DMA isolation is a common requirement
due to the security consideration. With finer-granularity
DMA isolation, subsets of a PCI function can be isolated
from each others by the IOMMU. As a result, there is a
request in software to attach multiple domains to a physical
PCI device. One example of such use model is the Intel
Scalable IOV [1] [2]. The Intel vt-d 3.0 spec [3] introduces
the scalable mode which enables PASID granularity DMA
isolation.
This adds the APIs to support multiple domains per device.
In order to ease the discussions, we call it 'a domain in
auxiliary mode' or simply 'auxiliary domain' when multiple
domains are attached to a physical device.
The APIs include:
* iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
- Detect both IOMMU and PCI endpoint devices supporting
the feature (aux-domain here) without the host driver
dependency.
* iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
- Check the enabling status of the feature (aux-domain
here). The aux-domain interfaces are available only
if this returns true.
* iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
- Enable/disable device specific aux-domain feature.
* iommu_aux_attach_device(domain, dev)
- Attaches @domain to @dev in the auxiliary mode. Multiple
domains could be attached to a single device in the
auxiliary mode with each domain representing an isolated
address space for an assignable subset of the device.
* iommu_aux_detach_device(domain, dev)
- Detach @domain which has been attached to @dev in the
auxiliary mode.
* iommu_aux_get_pasid(domain, dev)
- Return ID used for finer-granularity DMA translation.
For the Intel Scalable IOV usage model, this will be
a PASID. The device which supports Scalable IOV needs
to write this ID to the device register so that DMA
requests could be tagged with a right PASID prefix.
This has been updated with the latest proposal from Joerg
posted here [5].
Many people involved in discussions of this design.
Kevin Tian <kevin.tian@intel.com>
Liu Yi L <yi.l.liu@intel.com>
Ashok Raj <ashok.raj@intel.com>
Sanjay Kumar <sanjay.k.kumar@intel.com>
Jacob Pan <jacob.jun.pan@linux.intel.com>
Alex Williamson <alex.williamson@redhat.com>
Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Joerg Roedel <joro@8bytes.org>
and some discussions can be found here [4] [5].
[1] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
[3] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[4] https://lkml.org/lkml/2018/7/26/4
[5] https://www.spinics.net/lists/iommu/msg31874.html
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Suggested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce iotlb_sync_map() callback that is invoked in the end of
iommu_map(). This new callback allows IOMMU drivers to avoid syncing
after mapping of each contiguous chunk and sync only when the whole
mapping is completed, optimizing performance of the mapping operation.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
These wrappers will be used to easily change the location of
the field later when all users are converted.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With the flush queue infrastructure already abstracted into IOVA
domains, hooking it up in iommu-dma is pretty simple. Since there is a
degree of dependency on the IOMMU driver knowing what to do to play
along, we key the whole thing off a domain attribute which will be set
on default DMA ops domains to request non-strict invalidation. That way,
drivers can indicate the appropriate support by acknowledging the
attribute, and we can easily fall back to strict invalidation otherwise.
The flush queue callback needs a handle on the iommu_domain which owns
our cookie, so we have to add a pointer back to that, but neatly, that's
also sufficient to indicate whether we're using a flush queue or not,
and thus which way to release IOVAs. The only slight subtlety is
switching __iommu_dma_unmap() from calling iommu_unmap() to explicit
iommu_unmap_fast()/iommu_tlb_sync() so that we can elide the sync
entirely in non-strict mode.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
[rm: convert to domain attribute, tweak comments and commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since these are trivially handled by the .domain_{get,set}_attr
callbacks when relevant, we can streamline struct iommu_ops for
everyone.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
API callers to retrieve the domain pointer, for DMA ops domains it
doesn't scale well for large systems and multi-queue devices, since the
momentary refcount adjustment will lead to exclusive cacheline contention
when multiple CPUs are operating in parallel on different mappings for
the same device.
In the case of DMA ops domains, however, this refcounting is actually
unnecessary, since they already imply that the group exists and is
managed by platform code and IOMMU internals (by virtue of
iommu_group_get_for_dev()) such that a reference will already be held
for the lifetime of the device. Thus we can avoid the bottleneck by
providing a fast lookup specifically for the DMA code to retrieve the
default domain it already knows it has set up - a simple read-only
dereference plays much nicer with cache-coherency protocols.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Implement bus specific support for the fsl-mc bus including
registering arm_smmu_ops and bus specific device add operations.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All iommu drivers use the default_iommu_map_sg implementation, and there
is no good reason to ever override it. Just expose it as iommu_map_sg
directly and remove the indirection, specially in our post-spectre world
where indirect calls are horribly expensive.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Provide base enablement for using debugfs to expose internal data of an
IOMMU driver. When called, create the /sys/kernel/debug/iommu directory.
Emit a strong warning at boot time to indicate that this feature is
enabled.
This function is called from iommu_init, and creates the initial DebugFS
directory. Drivers may then call iommu_debugfs_new_driver_dir() to
instantiate a device-specific directory to expose internal data.
It will return a pointer to the new dentry structure created in
/sys/kernel/debug/iommu, or NULL in the event of a failure.
Since the IOMMU driver can not be removed from the running system, there
is no need for an "off" function.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, iommu_unmap, iommu_unmap_fast and iommu_map_sg return
size_t. However, some of the return values are error codes (< 0),
which can be misinterpreted as large size. Therefore, returning size 0
instead to signify failure to map/unmap.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The definition of map_sg was split during a recent addition to iommu_ops.
Put it back together.
Fixes: add02cfdc9 ("iommu: Introduce Interface for IOMMU TLB Flushing")
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With the current IOMMU-API the hardware TLBs have to be
flushed in every iommu_ops->unmap() call-back.
For unmapping large amounts of address space, like it
happens when a KVM domain with assigned devices is
destroyed, this causes thousands of unnecessary TLB flushes
in the IOMMU hardware because the unmap call-back runs for
every unmapped physical page.
With the TLB Flush Interface and the new iommu_unmap_fast()
function introduced here the need to clean the hardware TLBs
is removed from the unmapping code-path. Users of
iommu_unmap_fast() have to explicitly call the TLB-Flush
functions to sync the page-table changes to the hardware.
Three functions for TLB-Flushes are introduced:
* iommu_flush_tlb_all() - Flushes all TLB entries
associated with that
domain. TLBs entries are
flushed when this function
returns.
* iommu_tlb_range_add() - This will add a given
range to the flush queue
for this domain.
* iommu_tlb_sync() - Flushes all queued ranges from
the hardware TLBs. Returns when
the flush is finished.
The semantic of this interface is intentionally similar to
the iommu_gather_ops from the io-pgtable code.
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This new call-back will be used to check if the domain attach need be
deferred for now. If yes, the domain attach/detach will return directly.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The struct iommu_device has a 'struct device' embedded into
it, not as a pointer, but the whole struct. In the
conversion of the iommu drivers to use struct iommu_device
it was forgotten that the relase function for that struct
device simply calls kfree() on the pointer.
This frees memory that was never allocated and causes memory
corruption.
To fix this issue, use a pointer to struct device instead of
embedding the whole struct. This needs some updates in the
iommu sysfs code as well as the Intel VT-d and AMD IOMMU
driver.
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: 39ab9555c2 ('iommu: Add sysfs bindings for struct iommu_device')
Cc: stable@vger.kernel.org # >= v4.11
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit 7d3002cc8c ("iommu/core: split mapping to page sizes as supported
by the hardware") replaced 'int gfp_order' with a 'size_t size' of
iommu_map / iommu_unmap function arguments, but missed the function
prototypes for the disabled CONFIG_IOMMU_API case, let's correct them
for consistency.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The function is in no fast-path, there is no need for it to
be static inline in a header file. This also removes the
need to include iommu trace-points in iommu.h.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We make use of 'struct device' in iommu.h, so include
device.h to make it available explicitly.
Re-order the other headers while at it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This is a fairly subtle thing - let's make sure it's described as
clearly as possible to avoid potential misunderstandings.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The introduction of reserved regions has left a couple of rough edges
which we could do with sorting out sooner rather than later. Since we
are not yet addressing the potential dynamic aspect of software-managed
reservations and presenting them at arbitrary fixed addresses, it is
incongruous that we end up displaying hardware vs. software-managed MSI
regions to userspace differently, especially since ARM-based systems may
actually require one or the other, or even potentially both at once,
(which iommu-dma currently has no hope of dealing with at all). Let's
resolve the former user-visible inconsistency ASAP before the ABI has
been baked into a kernel release, in a way that also lays the groundwork
for the latter shortcoming to be addressed by follow-up patches.
For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use
IOMMU_RESV_MSI to describe the hardware type, and document everything a
little bit. Since the x86 MSI remapping hardware falls squarely under
this meaning of IOMMU_RESV_MSI, apply that type to their regions as well,
so that we tell the same story to userspace across all platforms.
Secondly, as the various region types require quite different handling,
and it really makes little sense to ever try combining them, convert the
bitfield-esque #defines to a plain enum in the process before anyone
gets the wrong impression.
Fixes: d30ddcaa7b ("iommu: Add a new type field in iommu_resv_region")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: David Woodhouse <dwmw2@infradead.org>
CC: kvm@vger.kernel.org
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
And also move its remaining functionality to
iommu_device_register() and 'struct iommu_device'.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: devicetree@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There is currently support for iommu sysfs bindings, but
those need to be implemented in the IOMMU drivers. Add a
more generic version of this by adding a struct device to
struct iommu_device and use that for the sysfs bindings.
Also convert the AMD and Intel IOMMU driver to make use of
it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This struct represents one hardware iommu in the iommu core
code. For now it only has the iommu-ops associated with it,
but that will be extended soon.
The register/unregister interface is also added, as well as
making use of it in the Intel and AMD IOMMU drivers.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename the function to iommu_ops_from_fwnode(), because that
is what the function actually does. The new name is much
more descriptive about what the function does.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce iommu_get_group_resv_regions whose role consists in
enumerating all devices from the group and collecting their
reserved regions. The list is sorted and overlaps between
regions of the same type are handled by merging the regions.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Introduce a new helper serving the purpose to allocate a reserved
region. This will be used in iommu driver implementing reserved
region callbacks.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We introduce a new field to differentiate the reserved region
types and specialize the apply_resv_region implementation.
Legacy direct mapped regions have IOMMU_RESV_DIRECT type.
We introduce 2 new reserved memory types:
- IOMMU_RESV_MSI will characterize MSI regions that are mapped
- IOMMU_RESV_RESERVED characterize regions that cannot by mapped.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We want to extend the callbacks used for dm regions and
use them for reserved regions. Reserved regions can be
- directly mapped regions
- regions that cannot be iommu mapped (PCI host bridge windows, ...)
- MSI regions (because they belong to another address space or because
they are not translated by the IOMMU and need special handling)
So let's rename the struct and also the callbacks.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add the IOMMU_PRIV attribute, which is used to indicate privileged
mappings.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The of_iommu_{set/get}_ops() API is used to associate a device
tree node with a specific set of IOMMU operations. The same
kernel interface is required on systems booting with ACPI, where
devices are not associated with a device tree node, therefore
the interface requires generalization.
The struct device fwnode member represents the fwnode token associated
with the device and the struct it points at is firmware specific;
regardless, it is initialized on both ACPI and DT systems and makes an
ideal candidate to use it to associate a set of IOMMU operations to a
given device, through its struct device.fwnode member pointer, paving
the way for representing per-device iommu_ops (ie an iommu instance
associated with a device).
Convert the DT specific of_iommu_{set/get}_ops() interface to
use struct device.fwnode as a look-up token, making the interface
usable on ACPI systems and rename the data structures and the
registration API so that they are made to represent their usage
more clearly.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Tomasz Nowicki <tn@semihalf.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
iommu_group_get_for_dev() expects that the IOMMU driver's device_group
callback return a group with a reference held for the given device.
Whilst allocating a new group is fine, and pci_device_group() correctly
handles reusing an existing group, there is no general means for IOMMU
drivers doing their own group lookup to take additional references on an
existing group pointer without having to also store device pointers or
resort to elaborate trickery.
Add an IOMMU-driver-specific function to fill the hole.
Acked-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce a common structure to hold the per-device firmware data that
most IOMMU drivers need to keep track of. This enables us to configure
much of that data from common firmware code, and consolidate a lot of
the equivalent implementations, device look-up tables, etc. which are
currently strewn across IOMMU drivers.
This will also be enable us to address the outstanding "multiple IOMMUs
on the platform bus" problem by tweaking IOMMU API calls to prefer
dev->fwspec->ops before falling back to dev->bus->iommu_ops, and thus
gracefully handle those troublesome systems which we currently cannot.
As the first user, hook up the OF IOMMU configuration mechanism. The
driver-defined nature of DT cells means that we still need the drivers
to translate and add the IDs themselves, but future users such as the
much less free-form ACPI IORT will be much simpler and self-contained.
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This new call-back will be used by the iommu driver to do
reserve the given dm_region in its iova space before the
mapping is created.
The call-back is temporary until the dma-ops implementation
is part of the common iommu code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Many IOMMUs support multiple page table formats, meaning that any given
domain may only support a subset of the hardware page sizes presented in
iommu_ops->pgsize_bitmap. There are also certain use-cases where the
creator of a domain may want to control which page sizes are used, for
example to force the use of hugepage mappings to reduce pagetable walk
depth.
To this end, add a per-domain pgsize_bitmap to represent the subset of
page sizes actually in use, to make it possible for domains with
different requirements to coexist.
Signed-off-by: Will Deacon <will.deacon@arm.com>
[rm: hijacked and rebased original patch with new commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The priv field from iommu_ops is a hangover from the of_dma_configure
series and isn't actually used. Remove it before it has chance to
spread.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
On some platforms, MMIO regions might need slightly different treatment
compared to mapping regular memory; add the notion of MMIO mappings to
the IOMMU API's memory type flags, so that callers can let the IOMMU
drivers know to do the right thing.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
iommu drivers that support the standard DT bindings use a of_xlate
callback pointer, but that is only part of struct iommu_ops when
CONFIG_OF_IOMMU is enabled, leading to build errors in randconfig
builds when that is not provided:
drivers/iommu/mtk_iommu.c:497:2: error: unknown field 'of_xlate' specified in initializer
.of_xlate = mtk_iommu_of_xlate,
^
drivers/iommu/mtk_iommu.c:497:14: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
.of_xlate = mtk_iommu_of_xlate,
^~~~~~~~~~~~~~~~~~
drivers/iommu/mtk_iommu.c:497:14: note: (near initialization for 'mtk_iommu_ops.domain_get_attr')
We can work around it by adding more #ifdefs in each driver, but
it seems nicer to just allow setting the pointer even if it is
unused. This makes the driver code look nicer, and it gives better
compile-time coverage when test building on other architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 0df4fabe20 ("iommu/mediatek: Add mt8173 IOMMU driver")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Update the comments around struct iommu_ops to match
current state and fix a few typos while at it.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename that function to pci_device_group() and export it, so
that IOMMU drivers can use it as their device_group
call-back.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
That call-back is currently unused, change it into a
call-back function for finding the right IOMMU group for a
device.
This is a first step to remove the hard-coded PCI dependency
in the iommu-group code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Taking inspiration from the existing arch/arm code, break out some
generic functions to interface the DMA-API to the IOMMU-API. This will
do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
Since associating an IOVA allocator with an IOMMU domain is a fairly
common need, rather than introduce yet another private structure just to
do this for ourselves, extend the top-level struct iommu_domain with the
notion. A simple opaque cookie allows reuse by other IOMMU API users
with their various different incompatible allocator types.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Using __printf attributes helps to detect several format string issues
at compile time (even though -Wformat-security is currently disabled in
Makefile). For example it can detect when formatting a pointer as a
number, like the issue fixed in commit a3fa71c40f ("wl18xx: show
rx_frames_per_rates as an array as it really is"), or when the arguments
do not match the format string, c.f. for example commit 5ce1aca814
("reiserfs: fix __RASSERT format string").
To prevent similar bugs in the future, add a __printf attribute to every
function prototype which needs one in include/linux/ and lib/. These
functions were mostly found by using gcc's -Wsuggest-attribute=format
flag.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This function can be called by an IOMMU driver to request
that a device's default domain is direct mapped.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add two new functions to the IOMMU-API to allow the IOMMU
drivers to export the requirements for direct mapped regions
per device.
This is useful for exporting the information in Intel VT-d's
RMRR entries or AMD-Vi's unity mappings.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All drivers have been converted to the new domain_alloc and
domain_free iommu-ops. So remove the old ones and get rid of
iommu_domain->priv too, as this is no longer needed when the
struct iommu_domain is embedded in the private structures of
the iommu drivers.
Tested-by: Thierry Reding <treding@nvidia.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This allows to handle domains differently based on their
type in the future. An IOMMU driver can implement certain
optimizations for DMA-API domains for example.
The domain types can be extended later and some of the
existing domain attributes can be migrated to become domain
flags.
Tested-by: Thierry Reding <treding@nvidia.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>