Commit Graph

691 Commits

Author SHA1 Message Date
Jingqi Liu
e8aa45f8cc iommu/vt-d: debugfs: Dump entry pointing to huge page
For the page table entry pointing to a huge page, the data below the
level of the huge page is meaningless and does not need to be dumped.

Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com>
Link: https://lore.kernel.org/r/20231013135811.73953-2-Jingqi.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-16 09:34:51 +02:00
Jiapeng Chong
c61c255e11 iommu/vt-d: Remove unused function
The function are defined in the pasid.c file, but not called
elsewhere, so delete the unused function.

drivers/iommu/intel/pasid.c:342:20: warning: unused function 'pasid_set_wpe'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6185
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230818091603.64800-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-16 09:34:50 +02:00
Yi Liu
c97d1b20d3 iommu/vt-d: Add domain_alloc_user op
Add the domain_alloc_user() op implementation. It supports allocating
domains to be used as parent under nested translation.

Unlike other drivers VT-D uses only a single page table format so it only
needs to check if the HW can support nesting.

Link: https://lore.kernel.org/r/20230928071528.26258-7-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-10 13:31:24 -03:00
Niklas Schnelle
fa4c450709 iommu: Allow .iotlb_sync_map to fail and handle s390's -ENOMEM return
On s390 when using a paging hypervisor, .iotlb_sync_map is used to sync
mappings by letting the hypervisor inspect the synced IOVA range and
updating a shadow table. This however means that .iotlb_sync_map can
fail as the hypervisor may run out of resources while doing the sync.
This can be due to the hypervisor being unable to pin guest pages, due
to a limit on mapped addresses such as vfio_iommu_type1.dma_entry_limit
or lack of other resources. Either way such a failure to sync a mapping
should result in a DMA_MAPPING_ERROR.

Now especially when running with batched IOTLB flushes for unmap it may
be that some IOVAs have already been invalidated but not yet synced via
.iotlb_sync_map. Thus if the hypervisor indicates running out of
resources, first do a global flush allowing the hypervisor to free
resources associated with these mappings as well a retry creating the
new mappings and only if that also fails report this error to callers.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> # sun50i
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-1-9e5fc4dacc36@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-02 08:42:57 +02:00
Zhang Rui
59df44bfb0 iommu/vt-d: Avoid memory allocation in iommu_suspend()
The iommu_suspend() syscore suspend callback is invoked with IRQ disabled.
Allocating memory with the GFP_KERNEL flag may re-enable IRQs during
the suspend callback, which can cause intermittent suspend/hibernation
problems with the following kernel traces:

Calling iommu_suspend+0x0/0x1d0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 15 at kernel/time/timekeeping.c:868 ktime_get+0x9b/0xb0
...
CPU: 0 PID: 15 Comm: rcu_preempt Tainted: G     U      E      6.3-intel #r1
RIP: 0010:ktime_get+0x9b/0xb0
...
Call Trace:
 <IRQ>
 tick_sched_timer+0x22/0x90
 ? __pfx_tick_sched_timer+0x10/0x10
 __hrtimer_run_queues+0x111/0x2b0
 hrtimer_interrupt+0xfa/0x230
 __sysvec_apic_timer_interrupt+0x63/0x140
 sysvec_apic_timer_interrupt+0x7b/0xa0
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1f/0x30
...
------------[ cut here ]------------
Interrupts enabled after iommu_suspend+0x0/0x1d0
WARNING: CPU: 0 PID: 27420 at drivers/base/syscore.c:68 syscore_suspend+0x147/0x270
CPU: 0 PID: 27420 Comm: rtcwake Tainted: G     U  W   E      6.3-intel #r1
RIP: 0010:syscore_suspend+0x147/0x270
...
Call Trace:
 <TASK>
 hibernation_snapshot+0x25b/0x670
 hibernate+0xcd/0x390
 state_store+0xcf/0xe0
 kobj_attr_store+0x13/0x30
 sysfs_kf_write+0x3f/0x50
 kernfs_fop_write_iter+0x128/0x200
 vfs_write+0x1fd/0x3c0
 ksys_write+0x6f/0xf0
 __x64_sys_write+0x1d/0x30
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

Given that only 4 words memory is needed, avoid the memory allocation in
iommu_suspend().

CC: stable@kernel.org
Fixes: 33e0715710 ("iommu/vt-d: Avoid GFP_ATOMIC where it is not needed")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Ooi, Chin Hao <chin.hao.ooi@intel.com>
Link: https://lore.kernel.org/r/20230921093956.234692-1-rui.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230925120417.55977-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 16:10:36 +02:00
Ard Biesheuvel
cf8e865810 arch: Remove Itanium (IA-64) architecture
The Itanium architecture is obsolete, and an informal survey [0] reveals
that any residual use of Itanium hardware in production is mostly HP-UX
or OpenVMS based. The use of Linux on Itanium appears to be limited to
enthusiasts that occasionally boot a fresh Linux kernel to see whether
things are still working as intended, and perhaps to churn out some
distro packages that are rarely used in practice.

None of the original companies behind Itanium still produce or support
any hardware or software for the architecture, and it is listed as
'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers
that contributed on behalf of those companies (nor anyone else, for that
matter) have been willing to support or maintain the architecture
upstream or even be responsible for applying the odd fix. The Intel
firmware team removed all IA-64 support from the Tianocore/EDK2
reference implementation of EFI in 2018. (Itanium is the original
architecture for which EFI was developed, and the way Linux supports it
deviates significantly from other architectures.) Some distros, such as
Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have
dropped support years ago.

While the argument is being made [1] that there is a 'for the common
good' angle to being able to build and run existing projects such as the
Grid Community Toolkit [2] on Itanium for interoperability testing, the
fact remains that none of those projects are known to be deployed on
Linux/ia64, and very few people actually have access to such a system in
the first place. Even if there were ways imaginable in which Linux/ia64
could be put to good use today, what matters is whether anyone is
actually doing that, and this does not appear to be the case.

There are no emulators widely available, and so boot testing Itanium is
generally infeasible for ordinary contributors. GCC still supports IA-64
but its compile farm [3] no longer has any IA-64 machines. GLIBC would
like to get rid of IA-64 [4] too because it would permit some overdue
code cleanups. In summary, the benefits to the ecosystem of having IA-64
be part of it are mostly theoretical, whereas the maintenance overhead
of keeping it supported is real.

So let's rip off the band aid, and remove the IA-64 arch code entirely.
This follows the timeline proposed by the Debian/ia64 maintainer [5],
which removes support in a controlled manner, leaving IA-64 in a known
good state in the most recent LTS release. Other projects will follow
once the kernel support is removed.

[0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/
[1] https://lore.kernel.org/all/0075883c-7c51-00f5-2c2d-5119c1820410@web.de/
[2] https://gridcf.org/gct-docs/latest/index.html
[3] https://cfarm.tetaneutral.net/machines/list/
[4] https://lore.kernel.org/all/87bkiilpc4.fsf@mid.deneb.enyo.de/
[5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/

Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-09-11 08:13:17 +00:00
Linus Torvalds
0468be89b3 IOMMU Updates for Linux v6.6
Including:
 
 	- Core changes:
 	  - Consolidate probe_device path
 	  - Make the PCI-SAC IOVA allocation trick PCI-only
 
 	- AMD IOMMU:
 	  - Consolidate PPR log handling
 	  - Interrupt handling improvements
 	  - Refcount fixes for amd_iommu_v2 driver
 
 	- Intel VT-d driver:
 	  - Enable idxd device DMA with pasid through iommu dma ops.
 	  - Lift RESV_DIRECT check from VT-d driver to core.
 	  - Miscellaneous cleanups and fixes.
 
 	- ARM-SMMU drivers:
 	  - Device-tree binding updates:
 	    - Add additional compatible strings for Qualcomm SoCs
 	    - Allow ASIDs to be configured in the DT to work around Qualcomm's
 	      broken hypervisor
 	    - Fix clocks for Qualcomm's MSM8998 SoC
 	  - SMMUv2:
 	    - Support for Qualcomm's legacy firmware implementation featured on
 	      at least MSM8956 and MSM8976.
 	    - Match compatible strings for Qualcomm SM6350 and SM6375 SoC variants
 	  - SMMUv3:
 	    - Use 'ida' instead of a bitmap for VMID allocation
 
 	  - Rockchip IOMMU:
 	    - Lift page-table allocation restrictions on newer hardware
 
 	  - Mediatek IOMMU:
 	    - Add MT8188 IOMMU Support
 
 	  - Renesas IOMMU:
 	    - Allow PCIe devices
 
 	- Usual set of cleanups an smaller fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmTx7IMACgkQK/BELZcB
 GuMxUA/+P/wYvAKCbDpXyszIpyCTx37BkeRTBaVqG0vEKLG6439i+PIm3oudQK+6
 0y+1clJi0Ddu0uv1ck90cIEP1YDuKaKdrOVeE7TtlK+6LKYxTyeN+mz4csMIbahI
 6JMrWzrIEPIyMBHzAepQiGDCsmDkrCngPj0WmA7+EQZSSHVYp+TLe6OLzNs74vDF
 zCITkYNq6aKyg/dNJpMRy6VOHvw9PUiwRvm7ko7WONP4VCtpW4g3Jpkerf19zoV2
 s0nwZuGn3o7F0aFOpRJPPKQNfQnNjOjHdxjcsGBafD9qqAk4TLvnZH24njKtPidJ
 P8CiAu//HxhDyUPTgTIrDroVOGVG7s85XO+WesjPkEI3vnNjXy+qEIinQBJ3oIaI
 ppDLSnArEhfSRgt6dXvPCJ/g4+WGS9jNV85GCa7XBtal2Msu8G89NKC97mpmjCkb
 lnGmCF9t7Tkt/fLWxw4GADBN3m2tOib1GQMvPYAF2WM3jH5aRq2UliIRuCHZkzwv
 EF3SiFQQqab6oogU9tF/A1QLUKQ8QfYOdabqL9z2COgF5tS00VC6b/6VTNkKeBHe
 qIiOpI7IWo76tFJule5gRaUth9nVkjpEo6kL9I6rEldOlFJrX6uaHTta6/isY3gx
 vkN98V/OThRUbDwMD122YVKNNjZE2MNsTeptXqB3jHvl3UWiLsQ=
 =RV+G
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "Core changes:

   - Consolidate probe_device path

   - Make the PCI-SAC IOVA allocation trick PCI-only

  AMD IOMMU:

   - Consolidate PPR log handling

   - Interrupt handling improvements

   - Refcount fixes for amd_iommu_v2 driver

  Intel VT-d driver:

   - Enable idxd device DMA with pasid through iommu dma ops

   - Lift RESV_DIRECT check from VT-d driver to core

   - Miscellaneous cleanups and fixes

  ARM-SMMU drivers:

   - Device-tree binding updates:
      - Add additional compatible strings for Qualcomm SoCs
      - Allow ASIDs to be configured in the DT to work around Qualcomm's
        broken hypervisor
      - Fix clocks for Qualcomm's MSM8998 SoC

   - SMMUv2:
      - Support for Qualcomm's legacy firmware implementation featured
        on at least MSM8956 and MSM8976
      - Match compatible strings for Qualcomm SM6350 and SM6375 SoC
        variants

   - SMMUv3:
      - Use 'ida' instead of a bitmap for VMID allocation

   - Rockchip IOMMU:
      - Lift page-table allocation restrictions on newer hardware

   - Mediatek IOMMU:
      - Add MT8188 IOMMU Support

   - Renesas IOMMU:
      - Allow PCIe devices

  .. and the usual set of cleanups an smaller fixes"

* tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (64 commits)
  iommu: Explicitly include correct DT includes
  iommu/amd: Remove unused declarations
  iommu/arm-smmu-qcom: Add SM6375 SMMUv2
  iommu/arm-smmu-qcom: Add SM6350 DPU compatible
  iommu/arm-smmu-qcom: Add SM6375 DPU compatible
  iommu/arm-smmu-qcom: Sort the compatible list alphabetically
  dt-bindings: arm-smmu: Fix MSM8998 clocks description
  iommu/vt-d: Remove unused extern declaration dmar_parse_dev_scope()
  iommu/vt-d: Fix to convert mm pfn to dma pfn
  iommu/vt-d: Fix to flush cache of PASID directory table
  iommu/vt-d: Remove rmrr check in domain attaching device path
  iommu: Prevent RESV_DIRECT devices from blocking domains
  dmaengine/idxd: Re-enable kernel workqueue under DMA API
  iommu/vt-d: Add set_dev_pasid callback for dma domain
  iommu/vt-d: Prepare for set_dev_pasid callback
  iommu/vt-d: Make prq draining code generic
  iommu/vt-d: Remove pasid_mutex
  iommu/vt-d: Add domain_flush_pasid_iotlb()
  iommu: Move global PASID allocation from SVA to core
  iommu: Generalize PASID 0 for normal DMA w/o PASID
  ...
2023-09-01 16:54:25 -07:00
Linus Torvalds
4debf77169 iommufd for 6.6
This includes a shared branch with VFIO:
 
  - Enhance VFIO_DEVICE_GET_PCI_HOT_RESET_INFO so it can work with iommufd
    FDs, not just group FDs. This removes the last place in the uAPI that
    required the group fd.
 
  - Give VFIO a new device node /dev/vfio/devices/vfioX (the so called cdev
    node) which is very similar to the FD from VFIO_GROUP_GET_DEVICE_FD.
    The cdev is associated with the struct device that the VFIO driver is
    bound to and shows up in sysfs in the normal way.
 
  - Add a cdev IOCTL VFIO_DEVICE_BIND_IOMMUFD which allows a newly opened
    /dev/vfio/devices/vfioX to be associated with an IOMMUFD, this replaces
    the VFIO_GROUP_SET_CONTAINER flow.
 
  - Add cdev IOCTLs VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT to allow the IOMMU
    translation the vfio_device is associated with to be changed. This is a
    significant new feature for VFIO as previously each vfio_device was
    fixed to a single translation.
 
    The translation is under the control of iommufd, so it can be any of
    the different translation modes that iommufd is learning to create.
 
 At this point VFIO has compilation options to remove the legacy interfaces
 and in modern mode it behaves like a normal driver subsystem. The
 /dev/vfio/iommu and /dev/vfio/groupX nodes are not present and each
 vfio_device only has a /dev/vfio/devices/vfioX cdev node that represents
 the device.
 
 On top of this is built some of the new iommufd functionality:
 
  - IOMMU_HWPT_ALLOC allows userspace to directly create the low level
    IO Page table objects and affiliate them with IOAS objects that hold
    the translation mapping. This is the basic functionality for the
    normal IOMMU_DOMAIN_PAGING domains.
 
  - VFIO_DEVICE_ATTACH_IOMMUFD_PT can be used to replace the current
    translation. This is wired up to through all the layers down to the
    driver so the driver has the ability to implement a hitless
    replacement. This is necessary to fully support guest behaviors when
    emulating HW (eg guest atomic change of translation)
 
  - IOMMU_GET_HW_INFO returns information about the IOMMU driver HW that
    owns a VFIO device. This includes support for the Intel iommu, and
    patches have been posted for all the other server IOMMU.
 
 Along the way are a number of internal items:
 
  - New iommufd kapis iommufd_ctx_has_group(), iommufd_device_to_ictx(),
    iommufd_device_to_id(), iommufd_access_detach(), iommufd_ctx_from_fd(),
    iommufd_device_replace()
 
  - iommufd now internally tracks iommu_groups as it needs some per-group
    data
 
  - Reorganize how the internal hwpt allocation flows to have more robust
    locking
 
  - Improve the access interfaces to support detach and replace of an IOAS
    from an access
 
  - New selftests and a rework of how the selftests creates a mock iommu
    driver to be more like a real iommu driver
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZO/QDQAKCRCFwuHvBreF
 YZ2iAP4hNEF6MJLRI2A28V3I/80f3x9Ed3Cirp/Q8ZdVEE+HYQD8DFaafJ0y3iPQ
 5mxD4ZrZ9KfUns/gUqCT5oPHjrcvSAM=
 =EQCw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "On top of the vfio updates is built some new iommufd functionality:

   - IOMMU_HWPT_ALLOC allows userspace to directly create the low level
     IO Page table objects and affiliate them with IOAS objects that
     hold the translation mapping. This is the basic functionality for
     the normal IOMMU_DOMAIN_PAGING domains.

   - VFIO_DEVICE_ATTACH_IOMMUFD_PT can be used to replace the current
     translation. This is wired up to through all the layers down to the
     driver so the driver has the ability to implement a hitless
     replacement. This is necessary to fully support guest behaviors
     when emulating HW (eg guest atomic change of translation)

   - IOMMU_GET_HW_INFO returns information about the IOMMU driver HW
     that owns a VFIO device. This includes support for the Intel iommu,
     and patches have been posted for all the other server IOMMU.

  Along the way are a number of internal items:

   - New iommufd kernel APIs: iommufd_ctx_has_group(),
        iommufd_device_to_ictx(), iommufd_device_to_id(),
        iommufd_access_detach(), iommufd_ctx_from_fd(),
        iommufd_device_replace()

   - iommufd now internally tracks iommu_groups as it needs some
     per-group data

   - Reorganize how the internal hwpt allocation flows to have more
     robust locking

   - Improve the access interfaces to support detach and replace of an
     IOAS from an access

   - New selftests and a rework of how the selftests creates a mock
     iommu driver to be more like a real iommu driver"

Link: https://lore.kernel.org/lkml/ZO%2FTe6LU1ENf58ZW@nvidia.com/

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (34 commits)
  iommufd/selftest: Don't leak the platform device memory when unloading the module
  iommu/vt-d: Implement hw_info for iommu capability query
  iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl
  iommufd: Add IOMMU_GET_HW_INFO
  iommu: Add new iommu op to get iommu hardware information
  iommu: Move dev_iommu_ops() to private header
  iommufd: Remove iommufd_ref_to_users()
  iommufd/selftest: Make the mock iommu driver into a real driver
  vfio: Support IO page table replacement
  iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
  iommufd: Add iommufd_access_replace() API
  iommufd: Use iommufd_access_change_ioas in iommufd_access_destroy_object
  iommufd: Add iommufd_access_change_ioas(_id) helpers
  iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
  vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
  iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
  iommufd/selftest: Return the real idev id from selftest mock_domain
  iommufd: Add IOMMU_HWPT_ALLOC
  iommufd/selftest: Test iommufd_device_replace()
  iommufd: Make destroy_rwsem use a lock class per object type
  ...
2023-08-30 20:41:37 -07:00
Linus Torvalds
1687d8aca5 * Rework apic callbacks, getting rid of unnecessary ones and
coalescing lots of silly duplicates.
  * Use static_calls() instead of indirect calls for apic->foo()
  * Tons of cleanups an crap removal along the way
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTvfO8ACgkQaDWVMHDJ
 krAP2A//ccii/LuvtTnNEIMMR5w2rwTdHv91ancgFkC8pOeNk37Z8sSLq8tKuLFA
 vgjBIysVIqunuRcNCJ+eqwIIxYfU+UGCWHppzLwO+DY3Q7o9EoTL0BgytdAqxpQQ
 ntEVarqWq25QYXKFoAqbUTJ1UXa42/8HfiXAX/jvP+ACXfilkGPZre6ASxlXeOhm
 XbgPuNQPmXi2WYQH9GCQEsz2Nh80hKap8upK2WbQzzJ3lXsm+xA//4klab0HCYwl
 Uc302uVZozyXRMKbAlwmgasTFOLiV8KKriJ0oHoktBpWgkpdR9uv/RDeSaFR3DAl
 aFmecD4k/Hqezg4yVl+4YpEn2KjxiwARCm4PMW5AV7lpWBPBHAOOai65yJlAi9U6
 bP8pM0+aIx9xg7oWfsTnQ7RkIJ+GZ0w+KZ9LXFM59iu3eV1pAJE3UVyUehe/J1q9
 n8OcH0UeHRlAb8HckqVm1AC7IPvfHw4OAPtUq7z3NFDwbq6i651Tu7f+i2bj31cX
 77Ames+fx6WjxUjyFbJwaK44E7Qez3waztdBfn91qw+m0b+gnKE3ieDNpJTqmm5b
 mKulV7KJwwS6cdqY3+Kr+pIlN+uuGAv7wGzVLcaEAXucDsVn/YAMJHY2+v97xv+n
 J9N+yeaYtmSXVlDsJ6dndMrTQMmcasK1CVXKxs+VYq5Lgf+A68w=
 =eoKm
 -----END PGP SIGNATURE-----

Merge tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 apic updates from Dave Hansen:
 "This includes a very thorough rework of the 'struct apic' handlers.
  Quite a variety of them popped up over the years, especially in the
  32-bit days when odd apics were much more in vogue.

  The end result speaks for itself, which is a removal of a ton of code
  and static calls to replace indirect calls.

  If there's any breakage here, it's likely to be around the 32-bit
  museum pieces that get light to no testing these days.

  Summary:

   - Rework apic callbacks, getting rid of unnecessary ones and
     coalescing lots of silly duplicates.

   - Use static_calls() instead of indirect calls for apic->foo()

   - Tons of cleanups an crap removal along the way"

* tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  x86/apic: Turn on static calls
  x86/apic: Provide static call infrastructure for APIC callbacks
  x86/apic: Wrap IPI calls into helper functions
  x86/apic: Mark all hotpath APIC callback wrappers __always_inline
  x86/xen/apic: Mark apic __ro_after_init
  x86/apic: Convert other overrides to apic_update_callback()
  x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb()
  x86/apic: Provide apic_update_callback()
  x86/xen/apic: Use standard apic driver mechanism for Xen PV
  x86/apic: Provide common init infrastructure
  x86/apic: Wrap apic->native_eoi() into a helper
  x86/apic: Nuke ack_APIC_irq()
  x86/apic: Remove pointless arguments from [native_]eoi_write()
  x86/apic/noop: Tidy up the code
  x86/apic: Remove pointless NULL initializations
  x86/apic: Sanitize APIC ID range validation
  x86/apic: Prepare x2APIC for using apic::max_apic_id
  x86/apic: Simplify X2APIC ID validation
  x86/apic: Add max_apic_id member
  x86/apic: Wrap APIC ID validation into an inline
  ...
2023-08-30 10:44:46 -07:00
Joerg Roedel
d8fe59f110 Merge branches 'apple/dart', 'arm/mediatek', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next 2023-08-21 14:18:43 +02:00
Alistair Popple
1af5a81099 mmu_notifiers: rename invalidate_range notifier
There are two main use cases for mmu notifiers.  One is by KVM which uses
mmu_notifier_invalidate_range_start()/end() to manage a software TLB.

The other is to manage hardware TLBs which need to use the
invalidate_range() callback because HW can establish new TLB entries at
any time.  Hence using start/end() can lead to memory corruption as these
callbacks happen too soon/late during page unmap.

mmu notifier users should therefore either use the start()/end() callbacks
or the invalidate_range() callbacks.  To make this usage clearer rename
the invalidate_range() callback to arch_invalidate_secondary_tlbs() and
update documention.

Link: https://lkml.kernel.org/r/6f77248cd25545c8020a54b4e567e8b72be4dca1.1690292440.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Frederic Barrat <fbarrat@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zhi Wang <zhi.wang.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:41 -07:00
Yi Liu
55243393b0 iommu/vt-d: Implement hw_info for iommu capability query
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.

Link: https://lore.kernel.org/r/20230818101033.4100-6-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-18 12:52:15 -03:00
Yanfei Xu
fb5f50a43d iommu/vt-d: Fix to convert mm pfn to dma pfn
For the case that VT-d page is smaller than mm page, converting dma pfn
should be handled in two cases which are for start pfn and for end pfn.
Currently the calculation of end dma pfn is incorrect and the result is
less than real page frame number which is causing the mapping of iova
always misses some page frames.

Rename the mm_to_dma_pfn() to mm_to_dma_pfn_start() and add a new helper
for converting end dma pfn named mm_to_dma_pfn_end().

Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230625082046.979742-1-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:46:19 +02:00
Yanfei Xu
8a3b8e63f8 iommu/vt-d: Fix to flush cache of PASID directory table
Even the PCI devices don't support pasid capability, PASID table is
mandatory for a PCI device in scalable mode. However flushing cache
of pasid directory table for these devices are not taken after pasid
table is allocated as the "size" of table is zero. Fix it by
calculating the size by page order.

Found this when reading the code, no real problem encountered for now.

Fixes: 194b3348bd ("iommu/vt-d: Fix PASID directory pointer coherency")
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230616081045.721873-1-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:46:19 +02:00
Lu Baolu
d3aedf94f4 iommu/vt-d: Remove rmrr check in domain attaching device path
The core code now prevents devices with RMRR regions from being assigned
to user space. There is no need to check for this condition in individual
drivers. Remove it to avoid duplicate code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230724060352.113458-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:46:18 +02:00
Lu Baolu
7d0c9da6c1 iommu/vt-d: Add set_dev_pasid callback for dma domain
This allows the upper layers to set a domain to a PASID of a device
if the PASID feature is supported by the IOMMU hardware. The typical
use cases are, for example, kernel DMA with PASID and hardware
assisted mediated device drivers.

The attaching device and pasid information is tracked in a per-domain
list and is used for IOTLB and devTLB invalidation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-8-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:39 +02:00
Lu Baolu
37f900e718 iommu/vt-d: Prepare for set_dev_pasid callback
The domain_flush_pasid_iotlb() helper function is used to flush the IOTLB
entries for a given PASID. Previously, this function assumed that
RID2PASID was only used for the first-level DMA translation. However, with
the introduction of the set_dev_pasid callback, this assumption is no
longer valid.

Add a check before using the RID2PASID for PASID invalidation. This check
ensures that the domain has been attached to a physical device before
using RID2PASID.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-7-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:38 +02:00
Lu Baolu
154786235d iommu/vt-d: Make prq draining code generic
Currently draining page requests and responses for a pasid is part of SVA
implementation. This is because the driver only supports attaching an SVA
domain to a device pasid. As we are about to support attaching other types
of domains to a device pasid, the prq draining code becomes generic.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-6-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:37 +02:00
Lu Baolu
b61701881f iommu/vt-d: Remove pasid_mutex
The pasid_mutex was used to protect the paths of set/remove_dev_pasid().
It's duplicate with iommu_sva_lock. Remove it to avoid duplicate code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-5-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:37 +02:00
Lu Baolu
ac1a3483fe iommu/vt-d: Add domain_flush_pasid_iotlb()
The VT-d spec requires to use PASID-based-IOTLB invalidation descriptor
to invalidate IOTLB and the paging-structure caches for a first-stage
page table. Add a generic helper to do this.

RID2PASID is used if the domain has been attached to a physical device,
otherwise real PASIDs that the domain has been attached to will be used.
The 'real' PASID attachment is handled in the subsequent change.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-4-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:37 +02:00
Jacob Pan
4298780126 iommu: Generalize PASID 0 for normal DMA w/o PASID
PCIe Process address space ID (PASID) is used to tag DMA traffic, it
provides finer grained isolation than requester ID (RID).

For each device/RID, 0 is a special PASID for the normal DMA (no
PASID). This is universal across all architectures that supports PASID,
therefore warranted to be reserved globally and declared in the common
header. Consequently, we can avoid the conflict between different PASID
use cases in the generic code. e.g. SVA and DMA API with PASIDs.

This paved away for device drivers to choose global PASID policy while
continue doing normal DMA.

Noting that VT-d could support none-zero RID/NO_PASID, but currently not
used.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:34 +02:00
Thomas Gleixner
a539cc86a1 x86/vector: Rename send_cleanup_vector() to vector_schedule_cleanup()
Rename send_cleanup_vector() to vector_schedule_cleanup() to prepare for
replacing the vector cleanup IPI with a timer callback.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20230621171248.6805-2-xin3.li@intel.com
2023-08-06 14:15:09 +02:00
Jason Gunthorpe
6eb4da8cf5 iommu: Have __iommu_probe_device() check for already probed devices
This is a step toward making __iommu_probe_device() self contained.

It should, under proper locking, check if the device is already associated
with an iommu driver and resolve parallel probes. All but one of the
callers open code this test using two different means, but they all
rely on dev->iommu_group.

Currently the bus_iommu_probe()/probe_iommu_group() and
probe_acpi_namespace_devices() rejects already probed devices with an
unlocked read of dev->iommu_group. The OF and ACPI "replay" functions use
device_iommu_mapped() which is the same read without the pointless
refcount.

Move this test into __iommu_probe_device() and put it under the
iommu_probe_device_lock. The store to dev->iommu_group is in
iommu_group_add_device() which is also called under this lock for iommu
driver devices, making it properly locked.

The only path that didn't have this check is the hotplug path triggered by
BUS_NOTIFY_ADD_DEVICE. The only way to get dev->iommu_group assigned
outside the probe path is via iommu_group_add_device(). Today the only
caller is VFIO no-iommu which never associates with an iommu driver. Thus
adding this additional check is safe.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-07-14 16:14:12 +02:00
Linus Torvalds
d35ac6ac0e IOMMU Updates for Linux v6.5
Including:
 
 	- Core changes:
 	  - iova_magazine_alloc() optimization
 	  - Make flush-queue an IOMMU driver capability
 	  - Consolidate the error handling around device attachment
 
 	- AMD IOMMU changes:
 	  - AVIC Interrupt Remapping Improvements
 	  - Some minor fixes and cleanups
 
 	- Intel VT-d changes from Lu Baolu:
 	  - Small and misc cleanups
 
 	- ARM-SMMU changes from Will Deacon:
 	  - Device-tree binding updates:
 	    * Add missing clocks for SC8280XP and SA8775 Adreno SMMUs
 	    * Add two new Qualcomm SMMUs in SDX75 and SM6375
 	  - Workarounds for Arm MMU-700 errata:
 	    * 1076982: Avoid use of SEV-based cmdq wakeup
 	    * 2812531: Terminate command batches with a CMD_SYNC
 	    * Enforce single-stage translation to avoid nesting-related errata
 	  - Set the correct level hint for range TLB invalidation on teardown
 
 	- Some other minor fixes and cleanups (including Freescale PAMU and
 	  virtio-iommu changes)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmSVnS0ACgkQK/BELZcB
 GuM4txAAvtE5pMxM4V/9uTJt+de/vd8XiaH2kfQEULCJm2Yz07Z5+oE+QRtjPc2D
 No+98IGMJCNOg+U+6JZ8P2GR3/soFvKdYjhY/iKTXK+C6jiy3dStIFN/KzzHkbpu
 Y/fUZ5B+DizTO6837osDWIdAz3PcwV3Vk/ogHe3FoHWU13RJYOMp2FAox0QreBNE
 kb7tK3ki/RCasbF9rMt9ClB0SZEVDysRkYF7AtXtsMNVm5jpQAITXVcNUYMeaJFL
 n0J8hjn3EiZj7dgzxbL5bRgDyfPadwJkWz2BxkQ6x0gopgHu0EimGL8p2Bei2f8x
 lv2y692L6zZth2ZgjSkecf3Lo4YHirsP/1U1zrLDjEgeBZ0vRxiX0qsvCb9692C1
 +shy5jOX22ub+zJ2UFHMNGKu3ZdhcKi+meejdqM/GrHcRfZABh26bQILFnPF3Oxp
 2WFb2v7Hq9qdQP50jsGbLji6n165aRW969fBdsk1uDUoCDHNOcdHQS3FsiKAAz5d
 /Z/3PR9tQgnF9bDXJB6RbGJ1rQxHlfvarOQCAYiC02ALj4FnuSLiFSBLe1bI4InR
 AgmnQaH2jmFMWHibdvj3q3sm33sLhOjmAE+ZX0YOhFfgrRGHq88qRwV53IfW477E
 8a+6A+tnu28axk7yVJMvvz5/PeYkD2CMeplYQycUiaQutjvN9sk=
 =aRMe
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "Core changes:
   - iova_magazine_alloc() optimization
   - Make flush-queue an IOMMU driver capability
   - Consolidate the error handling around device attachment

  AMD IOMMU changes:
   - AVIC Interrupt Remapping Improvements
   - Some minor fixes and cleanups

  Intel VT-d changes from Lu Baolu:
   - Small and misc cleanups

  ARM-SMMU changes from Will Deacon:
   - Device-tree binding updates:
      - Add missing clocks for SC8280XP and SA8775 Adreno SMMUs
      - Add two new Qualcomm SMMUs in SDX75 and SM6375
   - Workarounds for Arm MMU-700 errata:
      - 1076982: Avoid use of SEV-based cmdq wakeup
      - 2812531: Terminate command batches with a CMD_SYNC
      - Enforce single-stage translation to avoid nesting-related errata
   - Set the correct level hint for range TLB invalidation on teardown

  .. and some other minor fixes and cleanups (including Freescale PAMU
  and virtio-iommu changes)"

* tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (50 commits)
  iommu/vt-d: Remove commented-out code
  iommu/vt-d: Remove two WARN_ON in domain_context_mapping_one()
  iommu/vt-d: Handle the failure case of dmar_reenable_qi()
  iommu/vt-d: Remove unnecessary (void*) conversions
  iommu/amd: Remove extern from function prototypes
  iommu/amd: Use BIT/BIT_ULL macro to define bit fields
  iommu/amd: Fix DTE_IRQ_PHYS_ADDR_MASK macro
  iommu/amd: Fix compile error for unused function
  iommu/amd: Improving Interrupt Remapping Table Invalidation
  iommu/amd: Do not Invalidate IRT when IRTE caching is disabled
  iommu/amd: Introduce Disable IRTE Caching Support
  iommu/amd: Remove the unused struct amd_ir_data.ref
  iommu/amd: Switch amd_iommu_update_ga() to use modify_irte_ga()
  iommu/arm-smmu-v3: Set TTL invalidation hint better
  iommu/arm-smmu-v3: Document nesting-related errata
  iommu/arm-smmu-v3: Add explicit feature for nesting
  iommu/arm-smmu-v3: Document MMU-700 erratum 2812531
  iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982
  dt-bindings: arm-smmu: Add SDX75 SMMU compatible
  dt-bindings: arm-smmu: Add SM6375 GPU SMMU
  ...
2023-06-29 20:51:03 -07:00
Joerg Roedel
a7a334076d Merge branches 'iommu/fixes', 'arm/smmu', 'ppc/pamu', 'virtio', 'x86/vt-d', 'core' and 'x86/amd' into next 2023-06-19 10:12:42 +02:00
Lu Baolu
b4da4e112a iommu/vt-d: Remove commented-out code
These lines of code were commented out when they were first added in commit
ba39592764 ("Intel IOMMU: Intel IOMMU driver"). We do not want to restore
them because the VT-d spec has deprecated the read/write draining hit.

VT-d spec (section 11.4.2):
"
 Hardware implementation with Major Version 2 or higher (VER_REG), always
 performs required drain without software explicitly requesting a drain in
 IOTLB invalidation. This field is deprecated and hardware  will always
 report it as 1 to maintain backward compatibility with software.
"

Remove the code to make the code cleaner.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20230609060514.15154-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:33 +02:00
Yanfei Xu
3f13f72787 iommu/vt-d: Remove two WARN_ON in domain_context_mapping_one()
Remove the WARN_ON(did == 0) as the domain id 0 is reserved and
set once the domain_ids is allocated. So iommu_init_domains will
never return 0.

Remove the WARN_ON(!table) as this pointer will be accessed in
the following code, if empty "table" really happens, the kernel
will report a NULL pointer reference warning at the first place.

Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230605112659.308981-3-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:32 +02:00
Yanfei Xu
a0e9911ac1 iommu/vt-d: Handle the failure case of dmar_reenable_qi()
dmar_reenable_qi() may not succeed. Check and return when it fails.

Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230605112659.308981-2-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:32 +02:00
Suhui
82d9654f92 iommu/vt-d: Remove unnecessary (void*) conversions
No need cast (void*) to (struct root_entry *).

Signed-off-by: Suhui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20230425033743.75986-1-suhui@nfschina.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:31 +02:00
Peter Zijlstra
b1fe7f2cda x86,intel_iommu: Replace cmpxchg_double()
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.855976804@infradead.org
2023-06-05 09:36:38 +02:00
Robin Murphy
a4fdd97622 iommu: Use flush queue capability
It remains really handy to have distinct DMA domain types within core
code for the sake of default domain policy selection, but we can now
hide that detail from drivers by using the new capability instead.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1c552d99e8ba452bdac48209fa74c0bdd52fd9d9.1683233867.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:38:45 +02:00
Robin Murphy
4a20ce0ff6 iommu: Add a capability for flush queue support
Passing a special type to domain_alloc to indirectly query whether flush
queues are a worthwhile optimisation with the given driver is a bit
clunky, and looking increasingly anachronistic. Let's put that into an
explicit capability instead.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/f0086a93dbccb92622e1ace775846d81c1c4b174.1683233867.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:38:44 +02:00
Linus Torvalds
58390c8ce1 IOMMU Updates for Linux 6.4
Including:
 
 	- Convert to platform remove callback returning void
 
 	- Extend changing default domain to normal group
 
 	- Intel VT-d updates:
 	    - Remove VT-d virtual command interface and IOASID
 	    - Allow the VT-d driver to support non-PRI IOPF
 	    - Remove PASID supervisor request support
 	    - Various small and misc cleanups
 
 	- ARM SMMU updates:
 	    - Device-tree binding updates:
 	        * Allow Qualcomm GPU SMMUs to accept relevant clock properties
 	        * Document Qualcomm 8550 SoC as implementing an MMU-500
 	        * Favour new "qcom,smmu-500" binding for Adreno SMMUs
 
 	    - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
 	      implementations
 
 	    - Acknowledge SMMUv3 PRI queue overflow when consuming events
 
 	    - Document (in a comment) why ATS is disabled for bypass streams
 
 	- AMD IOMMU updates:
 	    - 5-level page-table support
 	    - NUMA awareness for memory allocations
 
 	- Unisoc driver: Support for reattaching an existing domain
 
 	- Rockchip driver: Add missing set_platform_dma_ops callback
 
 	- Mediatek driver: Adjust the dma-ranges
 
 	- Various other small fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmRONeAACgkQK/BELZcB
 GuPmpw/8C9ruxQ0JU5rcDBXQGvos4gMmxlbELMrBpbbiTtdb35xchpKfdhnECGIF
 k2SrrcF40R/S82SyzNU/eZtGKirtcXvGFraUFgu/QdCcnnqpRHs+IJMXX2NJP+it
 +0wO1uiInt3CN1ERcR4F31cDKiWjDG8bvQVE5LIyiy4KrIU5ld2G91Fkaa0R13Au
 6H+/wKkcUC6OyaGE6wPx474xBkapT20vj5AIQuAWisXJJR0wbBon1sUTo/IRKsU+
 IkNxH0W+1PNImJ+crAdf/nkOlyqoChY4ww6cm07LrOsBLIsX5bCqXfL4HvKthElD
 MEgk2SN5kfjfR5Vf29W4hZVM1CT8VbhO41I7OzaZ6X6RU2PXoldPKlgKtZGeSKn1
 9bcMpSgB0BtbttvBevSkxTo5KHFozXS2DG3DFoMB3yFMme8Th0LrhBZ9oB7NIPNw
 ntMo4K75vviC6Vvzjy4Anj/+y+Zm3W6wDDP7F12O6WZLkK5s4hrSsHUm/MQnnKQP
 muJlG870RnSl73xUQZe3cuBxktXuJ3EHqqYIPE0npzvauu8hhWcis3opf2Y+U2s8
 aBCCIgp5kTKqjHLh2e4lNCKZf1/b/dhxRcRBQhpAIb8YsjMlIJyM+G8Jz6K6gBga
 5Ld+68UQ3oHJwoLV1HCFN8jbpQ9KZn1s9+h3yrYjRAcLNiFb3nU=
 =OvTo
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Convert to platform remove callback returning void

 - Extend changing default domain to normal group

 - Intel VT-d updates:
     - Remove VT-d virtual command interface and IOASID
     - Allow the VT-d driver to support non-PRI IOPF
     - Remove PASID supervisor request support
     - Various small and misc cleanups

 - ARM SMMU updates:
     - Device-tree binding updates:
         * Allow Qualcomm GPU SMMUs to accept relevant clock properties
         * Document Qualcomm 8550 SoC as implementing an MMU-500
         * Favour new "qcom,smmu-500" binding for Adreno SMMUs

     - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
       implementations

     - Acknowledge SMMUv3 PRI queue overflow when consuming events

     - Document (in a comment) why ATS is disabled for bypass streams

 - AMD IOMMU updates:
     - 5-level page-table support
     - NUMA awareness for memory allocations

 - Unisoc driver: Support for reattaching an existing domain

 - Rockchip driver: Add missing set_platform_dma_ops callback

 - Mediatek driver: Adjust the dma-ranges

 - Various other small fixes and cleanups

* tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits)
  iommu: Remove iommu_group_get_by_id()
  iommu: Make iommu_release_device() static
  iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope()
  iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn)
  iommu/vt-d: Remove BUG_ON in map/unmap()
  iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
  iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
  iommu/vt-d: Remove BUG_ON on checking valid pfn range
  iommu/vt-d: Make size of operands same in bitwise operations
  iommu/vt-d: Remove PASID supervisor request support
  iommu/vt-d: Use non-privileged mode for all PASIDs
  iommu/vt-d: Remove extern from function prototypes
  iommu/vt-d: Do not use GFP_ATOMIC when not needed
  iommu/vt-d: Remove unnecessary checks in iopf disabling path
  iommu/vt-d: Move PRI handling to IOPF feature path
  iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
  iommu/vt-d: Move iopf code from SVA to IOPF enabling path
  iommu/vt-d: Allow SVA with device-specific IOPF
  dmaengine: idxd: Add enable/disable device IOPF feature
  arm64: dts: mt8186: Add dma-ranges for the parent "soc" node
  ...
2023-04-30 13:00:38 -07:00
Joerg Roedel
e51b419839 Merge branches 'iommu/fixes', 'arm/allwinner', 'arm/exynos', 'arm/mediatek', 'arm/omap', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 'unisoc', 'x86/vt-d', 'x86/amd', 'core' and 'platform-remove_new' into next 2023-04-14 13:45:50 +02:00
Tina Zhang
e60d63e32d iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope()
The dmar_insert_dev_scope() could fail if any unexpected condition is
encountered. However, in this situation, the kernel should attempt
recovery and proceed with execution. Remove BUG_ON with WARN_ON, so that
kernel can avoid being crashed when an unexpected condition occurs.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-8-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:53 +02:00
Tina Zhang
ff45ab9646 iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn)
When dmar_alloc_pci_notify_info() is being invoked, the invoker has
ensured the dev->is_virtfn is false. So, remove the useless BUG_ON in
dmar_alloc_pci_notify_info().

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-7-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:53 +02:00
Tina Zhang
cbf2f9e8ba iommu/vt-d: Remove BUG_ON in map/unmap()
Domain map/unmap with invalid parameters shouldn't crash the kernel.
Therefore, using if() replaces the BUG_ON.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-6-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:52 +02:00
Tina Zhang
998d4c2db3 iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
When performing domain_context_mapping or getting dma_pte of a pfn, the
availability of the domain page table directory is ensured. Therefore,
the domain->pgd checkings are unnecessary.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-5-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:52 +02:00
Tina Zhang
4a627a2593 iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
VT-d iotlb cache invalidation request with unexpected type is considered
as a bug to developers, which can be fixed. So, when such kind of issue
comes out, it needs to be reported through the kernel log, instead of
halting the system. Replacing BUG_ON with warning reporting.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-4-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:51 +02:00
Tina Zhang
35dc5d8998 iommu/vt-d: Remove BUG_ON on checking valid pfn range
When encountering an unexpected invalid pfn range, the kernel should
attempt recovery and proceed with execution. Therefore, using WARN_ON to
replace BUG_ON to avoid halting the machine.

Besides, one redundant checking is reduced.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-3-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:51 +02:00
Tina Zhang
b31064f881 iommu/vt-d: Make size of operands same in bitwise operations
This addresses the following issue reported by klocwork tool:

 - operands of different size in bitwise operations

Suggested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-2-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:50 +02:00
Jacob Pan
113a031bec iommu/vt-d: Remove PASID supervisor request support
There's no more usage, remove PASID supervisor support.

Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230331231137.1947675-3-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:50 +02:00
Jacob Pan
a7050fbde3 iommu/vt-d: Use non-privileged mode for all PASIDs
Supervisor Request Enable (SRE) bit in a PASID entry is for permission
checking on DMA requests. When SRE = 0, DMA with supervisor privilege
will be blocked. However, for in-kernel DMA this is not necessary in that
we are targeting kernel memory anyway. There's no need to differentiate
user and kernel for in-kernel DMA.

Let's use non-privileged (user) permission for all PASIDs used in kernel,
it will be consistent with DMA without PASID (RID_PASID) as well.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230331231137.1947675-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:49 +02:00
Lu Baolu
a06c2ecec1 iommu/vt-d: Remove extern from function prototypes
The kernel coding style does not require 'extern' in function prototypes
in .h files, so remove them from drivers/iommu/intel/iommu.h as they are
not needed.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230331045452.500265-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:49 +02:00
Christophe JAILLET
41d71e09a1 iommu/vt-d: Do not use GFP_ATOMIC when not needed
There is no need to use GFP_ATOMIC here. GFP_KERNEL is already used for
some other memory allocations just a few lines above.

Commit e3a981d61d ("iommu/vt-d: Convert allocations to GFP_KERNEL") has
changed the other memory allocation flags.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/e2a8a1019ffc8a86b4b4ed93def3623f60581274.1675542576.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:48 +02:00
Lu Baolu
7b8aa998d6 iommu/vt-d: Remove unnecessary checks in iopf disabling path
iommu_unregister_device_fault_handler() and iopf_queue_remove_device()
are called after device has stopped issuing new page falut requests and
all outstanding page requests have been drained. They should never fail.
Trigger a warning if it happens unfortunately.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:48 +02:00
Lu Baolu
fbcde5bb92 iommu/vt-d: Move PRI handling to IOPF feature path
PRI is only used for IOPF. With this move, the PCI/PRI feature could be
controlled by the device driver through iommu_dev_enable/disable_feature()
interfaces.

Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:48 +02:00
Lu Baolu
5ae4008055 iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
They should be part of the per-device iommu private data initialization.

Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:47 +02:00
Lu Baolu
3d4c7cc3d1 iommu/vt-d: Move iopf code from SVA to IOPF enabling path
Generally enabling IOMMU_DEV_FEAT_SVA requires IOMMU_DEV_FEAT_IOPF, but
some devices manage I/O Page Faults themselves instead of relying on the
IOMMU. Move IOPF related code from SVA to IOPF enabling path.

For the device drivers that relies on the IOMMU for IOPF through PCI/PRI,
IOMMU_DEV_FEAT_IOPF must be enabled before and disabled after
IOMMU_DEV_FEAT_SVA.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:47 +02:00
Lu Baolu
a86fb77173 iommu/vt-d: Allow SVA with device-specific IOPF
Currently enabling SVA requires IOPF support from the IOMMU and device
PCI PRI. However, some devices can handle IOPF by itself without ever
sending PCI page requests nor advertising PRI capability.

Allow SVA support with IOPF handled either by IOMMU (PCI PRI) or device
driver (device-specific IOPF). As long as IOPF could be handled, SVA
should continue to work.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:46 +02:00
Kan Liang
16812c9655 iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
A warning can be triggered when hotplug CPU 0.
$ echo 0 > /sys/devices/system/cpu/cpu0/online

 ------------[ cut here ]------------
 Voluntary context switch within RCU read-side critical section!
 WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318
          rcu_note_context_switch+0x4f4/0x580
 RIP: 0010:rcu_note_context_switch+0x4f4/0x580
 Call Trace:
  <TASK>
  ? perf_event_update_userpage+0x104/0x150
  __schedule+0x8d/0x960
  ? perf_event_set_state.part.82+0x11/0x50
  schedule+0x44/0xb0
  schedule_timeout+0x226/0x310
  ? __perf_event_disable+0x64/0x1a0
  ? _raw_spin_unlock+0x14/0x30
  wait_for_completion+0x94/0x130
  __wait_rcu_gp+0x108/0x130
  synchronize_rcu+0x67/0x70
  ? invoke_rcu_core+0xb0/0xb0
  ? __bpf_trace_rcu_stall_warning+0x10/0x10
  perf_pmu_migrate_context+0x121/0x370
  iommu_pmu_cpu_offline+0x6a/0xa0
  ? iommu_pmu_del+0x1e0/0x1e0
  cpuhp_invoke_callback+0x129/0x510
  cpuhp_thread_fun+0x94/0x150
  smpboot_thread_fn+0x183/0x220
  ? sort_range+0x20/0x20
  kthread+0xe6/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>
 ---[ end trace 0000000000000000 ]---

The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(),
when migrating a PMU to a new CPU. However, the current for_each_iommu()
is within RCU read-side critical section.

Two methods were considered to fix the issue.
- Use the dmar_global_lock to replace the RCU read lock when going
  through the drhd list. But it triggers a lockdep warning.
- Use the cpuhp_setup_state_multi() to set up a dedicated state for each
  IOMMU PMU. The lock can be avoided.

The latter method is implemented in this patch. Since each IOMMU PMU has
a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track
the state. The state can be dynamically allocated now. Remove the
CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE.

Fixes: 46284c6ceb ("iommu/vt-d: Support cpumask for IOMMU perfmon")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230328182028.1366416-1-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230329134721.469447-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:06:16 +02:00
Lu Baolu
bfd3c6b9fa iommu/vt-d: Allow zero SAGAW if second-stage not supported
The VT-d spec states (in section 11.4.2) that hardware implementations
reporting second-stage translation support (SSTS) field as Clear also
report the SAGAW field as 0. Fix an inappropriate check in alloc_iommu().

Fixes: 792fb43ce2 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default")
Suggested-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230318024824.124542-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:06:15 +02:00
Lu Baolu
c7d624520c iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.

Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
the DMAR global data structures are not touched there. Remove it to avoid
below lockdep warning.

 ======================================================
 WARNING: possible circular locking dependency detected
 6.3.0-rc2 #468 Not tainted
 ------------------------------------------------------
 swapper/0/1 is trying to acquire lock:
 ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
   at: __irq_domain_alloc_irqs+0x3b/0xa0

 but task is already holding lock:
 ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
   at: intel_iommu_init+0x58e/0x880

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (dmar_global_lock){++++}-{3:3}:
        lock_acquire+0xd6/0x320
        down_read+0x42/0x180
        intel_irq_remapping_alloc+0xad/0x750
        mp_irqdomain_alloc+0xb8/0x2b0
        irq_domain_alloc_irqs_locked+0x12f/0x2d0
        __irq_domain_alloc_irqs+0x56/0xa0
        alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
        mp_map_pin_to_irq+0x1dc/0x330
        setup_IO_APIC+0x128/0x210
        apic_intr_mode_init+0x67/0x110
        x86_late_time_init+0x24/0x40
        start_kernel+0x41e/0x7e0
        secondary_startup_64_no_verify+0xe0/0xeb

 -> #0 (&domain->mutex){+.+.}-{3:3}:
        check_prevs_add+0x160/0xef0
        __lock_acquire+0x147d/0x1950
        lock_acquire+0xd6/0x320
        __mutex_lock+0x9c/0xfc0
        __irq_domain_alloc_irqs+0x3b/0xa0
        dmar_alloc_hwirq+0x9e/0x120
        iommu_pmu_register+0x11d/0x200
        intel_iommu_init+0x5de/0x880
        pci_iommu_init+0x12/0x40
        do_one_initcall+0x65/0x350
        kernel_init_freeable+0x3ca/0x610
        kernel_init+0x1a/0x140
        ret_from_fork+0x29/0x50

 other info that might help us debug this:

 Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(dmar_global_lock);
                                lock(&domain->mutex);
                                lock(dmar_global_lock);
   lock(&domain->mutex);

                *** DEADLOCK ***

Fixes: 9dbb8e3452 ("irqdomain: Switch to per-domain locking")
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Tested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230314051836.23817-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:06:15 +02:00
Jason Gunthorpe
99b5726b44 iommu: Remove ioasid infrastructure
This has no use anymore, delete it all.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-8-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:31 +02:00
Jacob Pan
fffaed1e24 iommu/ioasid: Rename INVALID_IOASID
INVALID_IOASID and IOMMU_PASID_INVALID are duplicated. Rename
INVALID_IOASID and consolidate since we are moving away from IOASID
infrastructure.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-7-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:27 +02:00
Jacob Pan
760f41d182 iommu/vt-d: Remove virtual command interface
Virtual command interface was introduced to allow using host PASIDs
inside VMs. It is unused and abandoned due to architectural change.

With this patch, we can safely remove this feature and the related helpers.

Link: https://lore.kernel.org/r/20230210230206.3160144-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:21 +02:00
Lu Baolu
c33fcc13ee iommu: Use sysfs_emit() for sysfs show
Use sysfs_emit() instead of the sprintf() for sysfs entries. sysfs_emit()
knows the maximum of the temporary buffer used for outputting sysfs
content and avoids overrunning the buffer length.

Prefer 'long long' over 'long long int' as suggested by checkpatch.pl.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322123421.278852-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:47:10 +01:00
Linus Torvalds
143c7bc649 iommufd for 6.3
Some polishing and small fixes for iommufd:
 
 - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem
 
 - Use GFP_KERNEL_ACCOUNT inside the iommu_domains
 
 - Support VFIO_NOIOMMU mode with iommufd
 
 - Various typos
 
 - A list corruption bug if HWPTs are used for attach
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/TgzQAKCRCFwuHvBreF
 Ya3AAP4/WxTJIbDvtTyH3Fae3NxTdO8j8gsUvU1vrRYG83zdnAEAxd1yii7GEO8D
 crkeq9D4FUiPAkFnJ64Exw2FHb060Qg=
 =RABK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Some polishing and small fixes for iommufd:

   - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt
     subsystem

   - Use GFP_KERNEL_ACCOUNT inside the iommu_domains

   - Support VFIO_NOIOMMU mode with iommufd

   - Various typos

   - A list corruption bug if HWPTs are used for attach"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
  iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
  vfio: Support VFIO_NOIOMMU with iommufd
  iommufd: Add three missing structures in ucmd_buffer
  selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
  iommu: Remove IOMMU_CAP_INTR_REMAP
  irq/s390: Add arch_is_isolated_msi() for s390
  iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
  genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI
  genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code
  iommufd: Convert to msi_device_has_isolated_msi()
  vfio/type1: Convert to iommu_group_has_isolated_msi()
  iommu: Add iommu_group_has_isolated_msi()
  genirq/msi: Add msi_device_has_isolated_msi()
2023-02-24 14:34:12 -08:00
Joerg Roedel
bedd29d793 Merge branches 'apple/dart', 'arm/exynos', 'arm/renesas', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next 2023-02-18 15:43:04 +01:00
Tina Zhang
257ec29074 iommu/vt-d: Allow to use flush-queue when first level is default
Commit 29b3283972 ("iommu/vt-d: Do not use flush-queue when caching-mode
is on") forced default domains to be strict mode as long as IOMMU
caching-mode is flagged. The reason for doing this is that when vIOMMU
uses VT-d caching mode to synchronize shadowing page tables, the strict
mode shows better performance.

However, this optimization is orthogonal to the first-level page table
because the Intel VT-d architecture does not define the caching mode of
the first-level page table. Refer to VT-d spec, section 6.1, "When the
CM field is reported as Set, any software updates to remapping
structures other than first-stage mapping (including updates to not-
present entries or present entries whose programming resulted in
translation faults) requires explicit invalidation of the caches."
Exclude the first-level page table from this optimization.

Generally using first-stage translation in vIOMMU implies nested
translation enabled in the physical IOMMU. In this case the first-stage
page table is wholly captured by the guest. The vIOMMU only needs to
transfer the cache invalidations on vIOMMU to the physical IOMMU.
Forcing the default domain to strict mode will cause more frequent
cache invalidations, resulting in performance degradation. In a real
performance benchmark test measured by iperf receive, the performance
result on Sapphire Rapids 100Gb NIC shows:
w/ this fix ~51 Gbits/s, w/o this fix ~39.3 Gbits/s.

Theoretically a first-stage IOMMU page table can still be shadowed
in absence of the caching mode, e.g. with host write-protecting guest
IOMMU page table to synchronize changed PTEs with the physical
IOMMU page table. In this case the shadowing overhead is decoupled
from emulating IOTLB invalidation then the overhead of the latter part
is solely decided by the frequency of IOTLB invalidations. Hence
allowing guest default dma domain to be lazy can also benefit the
overall performance by reducing the total VM-exit numbers.

Fixes: 29b3283972 ("iommu/vt-d: Do not use flush-queue when caching-mode is on")
Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230214025618.2292889-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:05 +01:00
Jacob Pan
194b3348bd iommu/vt-d: Fix PASID directory pointer coherency
On platforms that do not support IOMMU Extended capability bit 0
Page-walk Coherency, CPU caches are not snooped when IOMMU is accessing
any translation structures. IOMMU access goes only directly to
memory. Intel IOMMU code was missing a flush for the PASID table
directory that resulted in the unrecoverable fault as shown below.

This patch adds clflush calls whenever allocating and updating
a PASID table directory to ensure cache coherency.

On the reverse direction, there's no need to clflush the PASID directory
pointer when we deactivate a context entry in that IOMMU hardware will
not see the old PASID directory pointer after we clear the context entry.
PASID directory entries are also never freed once allocated.

 DMAR: DRHD: handling fault status reg 3
 DMAR: [DMA Read NO_PASID] Request device [00:0d.2] fault addr 0x1026a4000
       [fault reason 0x51] SM: Present bit in Directory Entry is clear
 DMAR: Dump dmar1 table entries for IOVA 0x1026a4000
 DMAR: scalable mode root entry: hi 0x0000000102448001, low 0x0000000101b3e001
 DMAR: context entry: hi 0x0000000000000000, low 0x0000000101b4d401
 DMAR: pasid dir entry: 0x0000000101b4e001
 DMAR: pasid table entry[0]: 0x0000000000000109
 DMAR: pasid table entry[1]: 0x0000000000000001
 DMAR: pasid table entry[2]: 0x0000000000000000
 DMAR: pasid table entry[3]: 0x0000000000000000
 DMAR: pasid table entry[4]: 0x0000000000000000
 DMAR: pasid table entry[5]: 0x0000000000000000
 DMAR: pasid table entry[6]: 0x0000000000000000
 DMAR: pasid table entry[7]: 0x0000000000000000
 DMAR: PTE not present at level 4

Cc: <stable@vger.kernel.org>
Fixes: 0bbeb01a4f ("iommu/vt-d: Manage scalalble mode PASID tables")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Sukumar Ghorai <sukumar.ghorai@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230209212843.1788125-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:05 +01:00
Jacob Pan
16a75bbe48 iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
Intel IOMMU driver implements IOTLB flush queue with domain selective
or PASID selective invalidations. In this case there's no need to track
IOVA page range and sync IOTLBs, which may cause significant performance
hit.

This patch adds a check to avoid IOVA gather page and IOTLB sync for
the lazy path.

The performance difference on Sapphire Rapids 100Gb NIC is improved by
the following (as measured by iperf send):

w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s

Cc: <stable@vger.kernel.org>
Fixes: 2a2b8eaa5b ("iommu: Handle freelists when using deferred flushing in iommu drivers")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230209175330.1783556-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:05 +01:00
Lu Baolu
60b1daa3b1 iommu/vt-d: Fix error handling in sva enable/disable paths
Roll back all previous actions in error paths of intel_iommu_enable_sva()
and intel_iommu_disable_sva().

Fixes: d5b9e4bfe0 ("iommu/vt-d: Report prq to io-pgfault framework")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230208051559.700109-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:04 +01:00
Kan Liang
d8a7c0cf05 iommu/vt-d: Enable IOMMU perfmon support
Register and enable an IOMMU perfmon for each active IOMMU device.

The failure of IOMMU perfmon registration doesn't impact other
functionalities of an IOMMU device.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-8-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:09 +01:00
Kan Liang
4a0d426565 iommu/vt-d: Add IOMMU perfmon overflow handler support
While enabled to count events and an event occurrence causes the counter
value to increment and roll over to or past zero, this is termed a
counter overflow. The overflow can trigger an interrupt. The IOMMU
perfmon needs to handle the case properly.

New HW IRQs are allocated for each IOMMU device for perfmon. The IRQ IDs
are after the SVM range.

In the overflow handler, the counter is not frozen. It's very unlikely
that the same counter overflows again during the period. But it's
possible that other counters overflow at the same time. Read the
overflow register at the end of the handler and check whether there are
more.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-7-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:08 +01:00
Kan Liang
46284c6ceb iommu/vt-d: Support cpumask for IOMMU perfmon
The perf subsystem assumes that all counters are by default per-CPU. So
the user space tool reads a counter from each CPU. However, the IOMMU
counters are system-wide and can be read from any CPU. Here we use a CPU
mask to restrict counting to one CPU to handle the issue. (with CPU
hotplug notifier to choose a different CPU if the chosen one is taken
off-line).

The CPU is exposed to /sys/bus/event_source/devices/dmar*/cpumask for
the user space perf tool.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-6-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:08 +01:00
Kan Liang
7232ab8b89 iommu/vt-d: Add IOMMU perfmon support
Implement the IOMMU performance monitor capability, which supports the
collection of information about key events occurring during operation of
the remapping hardware, to aid performance tuning and debug.

The IOMMU perfmon support is implemented as part of the IOMMU driver and
interfaces with the Linux perf subsystem.

The IOMMU PMU has the following unique features compared with the other
PMUs.
- Support counting. Not support sampling.
- Does not support per-thread counting. The scope is system-wide.
- Support per-counter capability register. The event constraints can be
  enumerated.
- The available event and event group can also be enumerated.
- Extra Enhanced Commands are introduced to control the counters.

Add a new variable, struct iommu_pmu *pmu, to in the struct intel_iommu
to track the PMU related information.

Add iommu_pmu_register() and iommu_pmu_unregister() to register and
unregister a IOMMU PMU. The register function setup the IOMMU PMU ops
and invoke the standard perf_pmu_register() interface to register a PMU
in the perf subsystem. This patch only exposes the functions. The
following patch will enable them in the IOMMU driver.

The IOMMU PMUs can be found under /sys/bus/event_source/devices/dmar*

The available filters and event format can be found at the format folder

 $ ls /sys/bus/event_source/devices/dmar1/format/
 event  event_group  filter_ats  filter_ats_en  filter_page_table
 filter_page_table_en

The supported events can be found at the events folder

 $ ls /sys/bus/event_source/devices/dmar1/events/
 ats_blocked        fs_nonleaf_hit           int_cache_hit_posted
 iommu_mem_blocked  iotlb_hit        pasid_cache_lookup  ss_nonleaf_hit
 ctxt_cache_hit     fs_nonleaf_lookup        int_cache_lookup
 iommu_mrds         iotlb_lookup     pg_req_posted    ss_nonleaf_lookup
 ctxt_cache_lookup  int_cache_hit_nonposted  iommu_clocks
 iommu_requests     pasid_cache_hit  pw_occupancy

The command below illustrates filter usage with a simple example.

 $ perf stat -e dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/
   -a sleep 1

 Performance counter stats for 'system wide':

   368,947      dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/

 1.002592074 seconds time elapsed

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-5-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:06 +01:00
Kan Liang
dc57875866 iommu/vt-d: Support Enhanced Command Interface
The Enhanced Command Register is to submit command and operand of
enhanced commands to DMA Remapping hardware. It can supports up to 256
enhanced commands.

There is a HW register to indicate the availability of all 256 enhanced
commands. Each bit stands for each command. But there isn't an existing
interface to read/write all 256 bits. Introduce the u64 ecmdcap[4] to
store the existence of each enhanced command. Read 4 times to get all of
them in map_iommu().

Add a helper to facilitate an enhanced command launch. Make sure hardware
complete the command. Also add a helper to facilitate the check of PMU
essentials. These helpers will be used later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-4-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:05 +01:00
Kan Liang
a6a5006dad iommu/vt-d: Retrieve IOMMU perfmon capability information
The performance monitoring infrastructure, perfmon, is to support
collection of information about key events occurring during operation of
the remapping hardware, to aid performance tuning and debug. Each
remapping hardware unit has capability registers that indicate support
for performance monitoring features and enumerate the capabilities.

Add alloc_iommu_pmu() to retrieve IOMMU perfmon capability information
for each iommu unit. The information is stored in the iommu->pmu data
structure. Capability registers are read-only, so it's safe to prefetch
and store them in the pmu structure. This could avoid unnecessary VMEXIT
when this code is running in the virtualization environment.

Add free_iommu_pmu() to free the saved capability information when
freeing the iommu unit.

Add a kernel config option for the IOMMU perfmon feature. Unless a user
explicitly uses the perf tool to monitor the IOMMU perfmon event, there
isn't any impact for the existing IOMMU. Enable it by default.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-3-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:04 +01:00
Kan Liang
4db96bfe9d iommu/vt-d: Support size of the register set in DRHD
A new field, which indicates the size of the remapping hardware register
set for this remapping unit, is introduced in the DMA-remapping hardware
unit definition (DRHD) structure with the VT-d Spec 4.0. With this
information, SW doesn't need to 'guess' the size of the register set
anymore.

Update the struct acpi_dmar_hardware_unit to reflect the field. Store the
size of the register set in struct dmar_drhd_unit for each dmar device.

The 'size' information is ResvZ for the old BIOS and platforms. Fall back
to the old guessing method. There is nothing changed.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-2-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:03 +01:00
Lu Baolu
e06d244355 iommu/vt-d: Set No Execute Enable bit in PASID table entry
Setup No Execute Enable bit (Bit 133) of a scalable mode PASID entry.
This is to allow the use of XD bit of the first level page table.

Fixes: ddf09b6d43 ("iommu/vt-d: Setup pasid entries for iova over first level")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230126095438.354205-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:03 +01:00
Lu Baolu
ec9ab12dee iommu/vt-d: Remove sva from intel_svm_dev
After commit be51b1d6bb ("iommu/sva: Refactoring
iommu_sva_bind/unbind_device()"), the iommu driver doesn't need to
return an iommu_sva pointer anymore. This removes the sva field
from intel_svm_dev and cleanups the code accordingly.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:02 +01:00
Lu Baolu
49cab9d2b8 iommu/vt-d: Remove users from intel_svm_dev
It was used as a reference counter of an existing bond between device
and user application memory address. Commit be51b1d6bb ("iommu/sva:
Refactoring iommu_sva_bind/unbind_device()") has added this in iommu
core. Remove it to avoid duplicate code.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:02 +01:00
Lu Baolu
557abbd60c iommu/vt-d: Remove unused fields in svm structures
They aren't used anywhere. Remove them to avoid dead code.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:01 +01:00
Lu Baolu
d82e6ae67a iommu/vt-d: Remove include/linux/intel-svm.h
There's no need to have a public header for Intel SVA implementation.
The device driver should interact with Intel SVA implementation via
the IOMMU generic APIs.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:00 +01:00
Jason Gunthorpe
fd9f2a9122 Merge branch 'iommu-memory-accounting' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu intoiommufd/for-next
Jason Gunthorpe says:

====================
iommufd follows the same design as KVM and uses memory cgroups to limit
the amount of kernel memory a iommufd file descriptor can pin down. The
various internal data structures already use GFP_KERNEL_ACCOUNT to charge
its own memory.

However, one of the biggest consumers of kernel memory is the IOPTEs
stored under the iommu_domain and these allocations are not tracked.

This series is the first step in fixing it.

The iommu driver contract already includes a 'gfp' argument to the
map_pages op, allowing iommufd to specify GFP_KERNEL_ACCOUNT and then
having the driver allocate the IOPTE tables with that flag will capture a
significant amount of the allocations.

Update the iommu_map() API to pass in the GFP argument, and fix all call
sites. Replace iommu_map_atomic().

Audit the "enterprise" iommu drivers to make sure they do the right thing.
Intel and S390 ignore the GFP argument and always use GFP_ATOMIC. This is
problematic for iommufd anyhow, so fix it. AMD and ARM SMMUv2/3 are
already correct.

A follow up series will be needed to capture the allocations made when the
iommu_domain itself is allocated, which will complete the job.
====================

* 'iommu-memory-accounting' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/s390: Use GFP_KERNEL in sleepable contexts
  iommu/s390: Push the gfp parameter to the kmem_cache_alloc()'s
  iommu/intel: Use GFP_KERNEL in sleepable contexts
  iommu/intel: Support the gfp argument to the map_pages op
  iommu/intel: Add a gfp parameter to alloc_pgtable_page()
  iommufd: Use GFP_KERNEL_ACCOUNT for iommu_map()
  iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous()
  iommu: Add a gfp parameter to iommu_map_sg()
  iommu: Remove iommu_map_atomic()
  iommu: Add a gfp parameter to iommu_map()

Link: https://lore.kernel.org/linux-iommu/0-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-30 13:54:35 -04:00
Jason Gunthorpe
4951eb2623 iommu/intel: Use GFP_KERNEL in sleepable contexts
These contexts are sleepable, so use the proper annotation. The GFP_ATOMIC
was added mechanically in the prior patches.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-25 11:52:06 +01:00
Jason Gunthorpe
2d4d767659 iommu/intel: Support the gfp argument to the map_pages op
Flow it down to alloc_pgtable_page() via pfn_to_dma_pte() and
__domain_mapping().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-25 11:52:06 +01:00
Jason Gunthorpe
2552d3a229 iommu/intel: Add a gfp parameter to alloc_pgtable_page()
This is eventually called by iommufd through intel_iommu_map_pages() and
it should not be forced to atomic. Push the GFP_ATOMIC to all callers.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-25 11:52:05 +01:00
Jason Gunthorpe
f188bdb5f1 iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
On x86 platforms when the HW can support interrupt remapping the iommu
driver creates an irq_domain for the IR hardware and creates a child MSI
irq_domain.

When the global irq_remapping_enabled is set, the IR MSI domain is
assigned to the PCI devices (by intel_irq_remap_add_device(), or
amd_iommu_set_pci_msi_domain()) making those devices have the isolated MSI
property.

Due to how interrupt domains work, setting IRQ_DOMAIN_FLAG_ISOLATED_MSI on
the parent IR domain will cause all struct devices attached to it to
return true from msi_device_has_isolated_msi(). This replaces the
IOMMU_CAP_INTR_REMAP flag as all places using IOMMU_CAP_INTR_REMAP also
call msi_device_has_isolated_msi()

Set the flag and delete the cap.

Link: https://lore.kernel.org/r/7-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-11 16:27:23 -04:00
Linus Torvalds
b8fd76f418 IOMMU Updates for Linux v6.2
Including:
 
 	- Core code:
 	  - map/unmap_pages() cleanup
 	  - SVA and IOPF refactoring
 	  - Clean up and document return codes from device/domain
 	    attachment code
 
 	- AMD driver:
 	  - Rework and extend parsing code for ivrs_ioapic, ivrs_hpet
 	    and ivrs_acpihid command line options
 	  - Some smaller cleanups
 
 	- Intel driver:
 	  - Blocking domain support
 	  - Cleanups
 
 	- S390 driver:
 	  - Fixes and improvements for attach and aperture handling
 
 	- PAMU driver:
 	  - Resource leak fix and cleanup
 
 	- Rockchip driver:
 	  - Page table permission bit fix
 
 	- Mediatek driver:
 	  - Improve safety from invalid dts input
 	  - Smaller fixes and improvements
 
 	- Exynos driver:
 	  - Fix driver initialization sequence
 
 	- Sun50i driver:
 	  - Remove IOMMU_DOMAIN_IDENTITY as it has not been working
 	    forever
 	  - Various other fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmOd1PQACgkQK/BELZcB
 GuO7NxAAiwJUO99pTwvqnByzcC783AuE/fqKHDb9DZaN6Cr0VXSbKEwm8Lc2PC00
 2CTwK/zGhy8BKBQnPiooJ+YOMPjE4yhFIF9jr5ASH5AVWv8EEFpo8zIFKAcF5rh/
 c2Y5RIUwsGXuhR7U3lMTw84r39TZG2eHPwTEU6KvEJ1LCOMyD8IBYrZK2rvpGpem
 3swXUfF5bQGAT8LlIFN7p+qsVs6ZtuD40qre3kerjrBtCPUMlxIIV5TJ8oQTecsk
 vKpD51mEVW+rjUKvqui8NDYuPfT76F2FPS37dfA1F36p8dmsMGSrtWngNm73r546
 AmY8Gui6wKsv4Qn7Mxv49f/WZIXzdRTXOKx/zhYvvGxu7keqQIRIWYcLSxqfaGku
 cqJT401Ws1NHmRpx/t90lMH/anY5+kUMRTQG9Iq5ruLhExskd0SJcffa1i7YIGIe
 lPCTDf7MOXfDudR0Dtp87pGZQBaSkrSzZvb7qZY3Bj83WGZnLPpl6Z3N8KbkGzEO
 zNNvv1CtxZnIPrdOaKvfxQlAKiWKxkPRHuqk1TE8hkoNOe5ZgdOSJP5SeCrZ5tEf
 qljPXvDVF9f8CYw7QlfEDnbLnqDMGZpPAGqKPItbaijQLPZx4Jm4dw6+7i9hETIa
 wJ+1R9iAf+qiR0rlqueALKRaI4DjE8RU8yYSDpn2kn0BUOhWmb8=
 =ZM/m
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "Core code:
   - map/unmap_pages() cleanup
   - SVA and IOPF refactoring
   - Clean up and document return codes from device/domain attachment

  AMD driver:
   - Rework and extend parsing code for ivrs_ioapic, ivrs_hpet and
     ivrs_acpihid command line options
   - Some smaller cleanups

  Intel driver:
   - Blocking domain support
   - Cleanups

  S390 driver:
   - Fixes and improvements for attach and aperture handling

  PAMU driver:
   - Resource leak fix and cleanup

  Rockchip driver:
   - Page table permission bit fix

  Mediatek driver:
   - Improve safety from invalid dts input
   - Smaller fixes and improvements

  Exynos driver:
   - Fix driver initialization sequence

  Sun50i driver:
   - Remove IOMMU_DOMAIN_IDENTITY as it has not been working forever
   - Various other fixes"

* tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (74 commits)
  iommu/mediatek: Fix forever loop in error handling
  iommu/mediatek: Fix crash on isr after kexec()
  iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY
  iommu/amd: Fix typo in macro parameter name
  iommu/mediatek: Remove unused "mapping" member from mtk_iommu_data
  iommu/mediatek: Improve safety for mediatek,smi property in larb nodes
  iommu/mediatek: Validate number of phandles associated with "mediatek,larbs"
  iommu/mediatek: Add error path for loop of mm_dts_parse
  iommu/mediatek: Use component_match_add
  iommu/mediatek: Add platform_device_put for recovering the device refcnt
  iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe()
  iommu/vt-d: Use real field for indication of first level
  iommu/vt-d: Remove unnecessary domain_context_mapped()
  iommu/vt-d: Rename domain_add_dev_info()
  iommu/vt-d: Rename iommu_disable_dev_iotlb()
  iommu/vt-d: Add blocking domain support
  iommu/vt-d: Add device_block_translation() helper
  iommu/vt-d: Allocate pasid table in device probe path
  iommu/amd: Check return value of mmu_notifier_register()
  iommu/amd: Fix pci device refcount leak in ppr_notifier()
  ...
2022-12-19 08:34:39 -06:00
Linus Torvalds
4f292c4de4 New Feature:
* Randomize the per-cpu entry areas
 Cleanups:
 * Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open
   coding it
 * Move to "native" set_memory_rox() helper
 * Clean up pmd_get_atomic() and i386-PAE
 * Remove some unused page table size macros
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOc53UACgkQaDWVMHDJ
 krCUHw//SGZ+La0hLZLAiAiZTXLZZHpYkOmg1Oj1+11qSU11uZzTFqDpauhaKpRS
 cJCSh+D+RXe5e2ipgt0+Zl0hESLt7pJf8258OE4ra0DL/IlyO9uqruAs9Kn3eRS/
 Fk76nG8gdEU+JKJqpG02GqOLslYQuIy96n9hpuj1x25b614+uezPfC7S4XEat0NT
 MbJQ+jnVDf16aJIJkzT+iSwhubDVeh+bSHeO0SSCzX23WLUqDeg5NvlyxoCHGbBh
 UpUTWggV/0pYAkBKRHToeJs8qTWREwuuH/8JGewpe9A0tjdB5wyZfNL2PuracweN
 9MauXC3T5f0+Ca4yIIaPq1fF7Ny/PR2dBFihk27rOD0N7tjaZxNwal2pB1sZcmvZ
 +PAokjyTPVH5ZXjkMYGGAUe1jyjwr2+TgFSZxhTnDuGtyVQiY4pihGKOifLCX6tv
 x6khvYeTBw7wfaDRtKEAf+2kLHYn+71HszHP/8bNKX9T03h+Zf0i1wdZu5xbM5Gc
 VK2wR7bCC+UftJJYG0pldcHg2qaF19RBHK2tLwp7zngUv7lTbkKfkgKjre73KV2a
 D4b76lrqdUMo6UYwYdw7WtDyarZS4OVLq2DcNhwwMddBCaX8kyN5a4AqwQlZYJ0u
 dM+kuMofE8U3yMxmMhJimkZUsj09yLHIqfynY0jbAcU3nhKZZNY=
 =wwVF
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Dave Hansen:
 "New Feature:

   - Randomize the per-cpu entry areas

  Cleanups:

   - Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it

   - Move to "native" set_memory_rox() helper

   - Clean up pmd_get_atomic() and i386-PAE

   - Remove some unused page table size macros"

* tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  x86/mm: Ensure forced page table splitting
  x86/kasan: Populate shadow for shared chunk of the CPU entry area
  x86/kasan: Add helpers to align shadow addresses up and down
  x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
  x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
  x86/mm: Recompute physical address for every page of per-CPU CEA mapping
  x86/mm: Rename __change_page_attr_set_clr(.checkalias)
  x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
  x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
  x86/mm: Add a few comments
  x86/mm: Fix CR3_ADDR_MASK
  x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
  mm: Convert __HAVE_ARCH_P..P_GET to the new style
  mm: Remove pointless barrier() after pmdp_get_lockless()
  x86/mm/pae: Get rid of set_64bit()
  x86_64: Remove pointless set_64bit() usage
  x86/mm/pae: Be consistent with pXXp_get_and_clear()
  x86/mm/pae: Use WRITE_ONCE()
  x86/mm/pae: Don't (ab)use atomic64
  mm/gup: Fix the lockless PMD access
  ...
2022-12-17 14:06:53 -06:00
Peter Zijlstra
9ee850acd2 x86_64: Remove pointless set_64bit() usage
The use of set_64bit() in X86_64 only code is pretty pointless, seeing
how it's a direct assignment. Remove all this nonsense.

[nathanchance: unbreak irte]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114425.168036718%40infradead.org
2022-12-15 10:37:27 -08:00
Linus Torvalds
08cdc21579 iommufd for 6.2
iommufd is the user API to control the IOMMU subsystem as it relates to
 managing IO page tables that point at user space memory.
 
 It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
 container) which is the VFIO specific interface for a similar idea.
 
 We see a broad need for extended features, some being highly IOMMU device
 specific:
  - Binding iommu_domain's to PASID/SSID
  - Userspace IO page tables, for ARM, x86 and S390
  - Kernel bypassed invalidation of user page tables
  - Re-use of the KVM page table in the IOMMU
  - Dirty page tracking in the IOMMU
  - Runtime Increase/Decrease of IOPTE size
  - PRI support with faults resolved in userspace
 
 Many of these HW features exist to support VM use cases - for instance the
 combination of PASID, PRI and Userspace IO Page Tables allows an
 implementation of DMA Shared Virtual Addressing (vSVA) within a
 guest. Dirty tracking enables VM live migration with SRIOV devices and
 PASID support allow creating "scalable IOV" devices, among other things.
 
 As these features are fundamental to a VM platform they need to be
 uniformly exposed to all the driver families that do DMA into VMs, which
 is currently VFIO and VDPA.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF
 YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl
 s7fAu+CQ1pr9+9NKGevD+frw8Solsw4=
 =jJkd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd implementation from Jason Gunthorpe:
 "iommufd is the user API to control the IOMMU subsystem as it relates
  to managing IO page tables that point at user space memory.

  It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
  container) which is the VFIO specific interface for a similar idea.

  We see a broad need for extended features, some being highly IOMMU
  device specific:
   - Binding iommu_domain's to PASID/SSID
   - Userspace IO page tables, for ARM, x86 and S390
   - Kernel bypassed invalidation of user page tables
   - Re-use of the KVM page table in the IOMMU
   - Dirty page tracking in the IOMMU
   - Runtime Increase/Decrease of IOPTE size
   - PRI support with faults resolved in userspace

  Many of these HW features exist to support VM use cases - for instance
  the combination of PASID, PRI and Userspace IO Page Tables allows an
  implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
  Dirty tracking enables VM live migration with SRIOV devices and PASID
  support allow creating "scalable IOV" devices, among other things.

  As these features are fundamental to a VM platform they need to be
  uniformly exposed to all the driver families that do DMA into VMs,
  which is currently VFIO and VDPA"

For more background, see the extended explanations in Jason's pull request:

  https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
  iommufd: Change the order of MSI setup
  iommufd: Improve a few unclear bits of code
  iommufd: Fix comment typos
  vfio: Move vfio group specific code into group.c
  vfio: Refactor dma APIs for emulated devices
  vfio: Wrap vfio group module init/clean code into helpers
  vfio: Refactor vfio_device open and close
  vfio: Make vfio_device_open() truly device specific
  vfio: Swap order of vfio_device_container_register() and open_device()
  vfio: Set device->group in helper function
  vfio: Create wrappers for group register/unregister
  vfio: Move the sanity check of the group to vfio_create_group()
  vfio: Simplify vfio_create_group()
  iommufd: Allow iommufd to supply /dev/vfio/vfio
  vfio: Make vfio_container optionally compiled
  vfio: Move container related MODULE_ALIAS statements into container.c
  vfio-iommufd: Support iommufd for emulated VFIO devices
  vfio-iommufd: Support iommufd for physical VFIO devices
  vfio-iommufd: Allow iommufd to be used in place of a container fd
  vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
  ...
2022-12-14 09:15:43 -08:00
Linus Torvalds
9d33edb20f Updates for the interrupt core and driver subsystem:
- Core:
 
    The bulk is the rework of the MSI subsystem to support per device MSI
    interrupt domains. This solves conceptual problems of the current
    PCI/MSI design which are in the way of providing support for PCI/MSI[-X]
    and the upcoming PCI/IMS mechanism on the same device.
 
    IMS (Interrupt Message Store] is a new specification which allows device
    manufactures to provide implementation defined storage for MSI messages
    contrary to the uniform and specification defined storage mechanisms for
    PCI/MSI and PCI/MSI-X. IMS not only allows to overcome the size limitations
    of the MSI-X table, but also gives the device manufacturer the freedom to
    store the message in arbitrary places, even in host memory which is shared
    with the device.
 
    There have been several attempts to glue this into the current MSI code,
    but after lengthy discussions it turned out that there is a fundamental
    design problem in the current PCI/MSI-X implementation. This needs some
    historical background.
 
    When PCI/MSI[-X] support was added around 2003, interrupt management was
    completely different from what we have today in the actively developed
    architectures. Interrupt management was completely architecture specific
    and while there were attempts to create common infrastructure the
    commonalities were rudimentary and just providing shared data structures and
    interfaces so that drivers could be written in an architecture agnostic
    way.
 
    The initial PCI/MSI[-X] support obviously plugged into this model which
    resulted in some basic shared infrastructure in the PCI core code for
    setting up MSI descriptors, which are a pure software construct for holding
    data relevant for a particular MSI interrupt, but the actual association to
    Linux interrupts was completely architecture specific. This model is still
    supported today to keep museum architectures and notorious stranglers
    alive.
 
    In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel,
    which was creating yet another architecture specific mechanism and resulted
    in an unholy mess on top of the existing horrors of x86 interrupt handling.
    The x86 interrupt management code was already an incomprehensible maze of
    indirections between the CPU vector management, interrupt remapping and the
    actual IO/APIC and PCI/MSI[-X] implementation.
 
    At roughly the same time ARM struggled with the ever growing SoC specific
    extensions which were glued on top of the architected GIC interrupt
    controller.
 
    This resulted in a fundamental redesign of interrupt management and
    provided the today prevailing concept of hierarchical interrupt
    domains. This allowed to disentangle the interactions between x86 vector
    domain and interrupt remapping and also allowed ARM to handle the zoo of
    SoC specific interrupt components in a sane way.
 
    The concept of hierarchical interrupt domains aims to encapsulate the
    functionality of particular IP blocks which are involved in interrupt
    delivery so that they become extensible and pluggable. The X86
    encapsulation looks like this:
 
                                             |--- device 1
      [Vector]---[Remapping]---[PCI/MSI]--|...
                                             |--- device N
 
    where the remapping domain is an optional component and in case that it is
    not available the PCI/MSI[-X] domains have the vector domain as their
    parent. This reduced the required interaction between the domains pretty
    much to the initialization phase where it is obviously required to
    establish the proper parent relation ship in the components of the
    hierarchy.
 
    While in most cases the model is strictly representing the chain of IP
    blocks and abstracting them so they can be plugged together to form a
    hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware
    it's clear that the actual PCI/MSI[-X] interrupt controller is not a global
    entity, but strict a per PCI device entity.
 
    Here we took a short cut on the hierarchical model and went for the easy
    solution of providing "global" PCI/MSI domains which was possible because
    the PCI/MSI[-X] handling is uniform across the devices. This also allowed
    to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in
    turn made it simple to keep the existing architecture specific management
    alive.
 
    A similar problem was created in the ARM world with support for IP block
    specific message storage. Instead of going all the way to stack a IP block
    specific domain on top of the generic MSI domain this ended in a construct
    which provides a "global" platform MSI domain which allows overriding the
    irq_write_msi_msg() callback per allocation.
 
    In course of the lengthy discussions we identified other abuse of the MSI
    infrastructure in wireless drivers, NTB etc. where support for
    implementation specific message storage was just mindlessly glued into the
    existing infrastructure. Some of this just works by chance on particular
    platforms but will fail in hard to diagnose ways when the driver is used
    on platforms where the underlying MSI interrupt management code does not
    expect the creative abuse.
 
    Another shortcoming of today's PCI/MSI-X support is the inability to
    allocate or free individual vectors after the initial enablement of
    MSI-X. This results in an works by chance implementation of VFIO (PCI
    pass-through) where interrupts on the host side are not set up upfront to
    avoid resource exhaustion. They are expanded at run-time when the guest
    actually tries to use them. The way how this is implemented is that the
    host disables MSI-X and then re-enables it with a larger number of
    vectors again. That works by chance because most device drivers set up
    all interrupts before the device actually will utilize them. But that's
    not universally true because some drivers allocate a large enough number
    of vectors but do not utilize them until it's actually required,
    e.g. for acceleration support. But at that point other interrupts of the
    device might be in active use and the MSI-X disable/enable dance can
    just result in losing interrupts and therefore hard to diagnose subtle
    problems.
 
    Last but not least the "global" PCI/MSI-X domain approach prevents to
    utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS
    is not longer providing a uniform storage and configuration model.
 
    The solution to this is to implement the missing step and switch from
    global PCI/MSI domains to per device PCI/MSI domains. The resulting
    hierarchy then looks like this:
 
                               |--- [PCI/MSI] device 1
      [Vector]---[Remapping]---|...
                               |--- [PCI/MSI] device N
 
    which in turn allows to provide support for multiple domains per device:
 
                               |--- [PCI/MSI] device 1
                               |--- [PCI/IMS] device 1
      [Vector]---[Remapping]---|...
                               |--- [PCI/MSI] device N
                               |--- [PCI/IMS] device N
 
    This work converts the MSI and PCI/MSI core and the x86 interrupt
    domains to the new model, provides new interfaces for post-enable
    allocation/free of MSI-X interrupts and the base framework for PCI/IMS.
    PCI/IMS has been verified with the work in progress IDXD driver.
 
    There is work in progress to convert ARM over which will replace the
    platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
    "solutions" are in the works as well.
 
  - Drivers:
 
    - Updates for the LoongArch interrupt chip drivers
 
    - Support for MTK CIRQv2
 
    - The usual small fixes and updates all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUsygTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYXiD/40tXKzCzf0qFIqUlZLia1N3RRrwrNC
 DVTixuLtR9MrjwE+jWLQILa85SHInV8syXHSd35SzhsGDxkURFGi+HBgVWmysODf
 br9VSh3Gi+kt7iXtIwAg8WNWviGNmS3kPksxCko54F0YnJhMY5r5bhQVUBQkwFG2
 wES1C9Uzd4pdV2bl24Z+WKL85cSmZ+pHunyKw1n401lBABXnTF9c4f13zC14jd+y
 wDxNrmOxeL3mEH4Pg6VyrDuTOURSf3TjJjeEq3EYqvUo0FyLt9I/cKX0AELcZQX7
 fkRjrQQAvXNj39RJfeSkojDfllEPUHp7XSluhdBu5aIovSamdYGCDnuEoZ+l4MJ+
 CojIErp3Dwj/uSaf5c7C3OaDAqH2CpOFWIcrUebShJE60hVKLEpUwd6W8juplaoT
 gxyXRb1Y+BeJvO8VhMN4i7f3232+sj8wuj+HTRTTbqMhkElnin94tAx8rgwR1sgR
 BiOGMJi4K2Y8s9Rqqp0Dvs01CW4guIYvSR4YY+WDbbi1xgiev89OYs6zZTJCJe4Y
 NUwwpqYSyP1brmtdDdBOZLqegjQm+TwUb6oOaasFem4vT1swgawgLcDnPOx45bk5
 /FWt3EmnZxMz99x9jdDn1+BCqAZsKyEbEY1avvhPVMTwoVIuSX2ceTBMLseGq+jM
 03JfvdxnueM3gw==
 =9erA
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt core and driver subsystem:

  The bulk is the rework of the MSI subsystem to support per device MSI
  interrupt domains. This solves conceptual problems of the current
  PCI/MSI design which are in the way of providing support for
  PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device.

  IMS (Interrupt Message Store] is a new specification which allows
  device manufactures to provide implementation defined storage for MSI
  messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified
  message store which is uniform accross all devices). The PCI/MSI[-X]
  uniformity allowed us to get away with "global" PCI/MSI domains.

  IMS not only allows to overcome the size limitations of the MSI-X
  table, but also gives the device manufacturer the freedom to store the
  message in arbitrary places, even in host memory which is shared with
  the device.

  There have been several attempts to glue this into the current MSI
  code, but after lengthy discussions it turned out that there is a
  fundamental design problem in the current PCI/MSI-X implementation.
  This needs some historical background.

  When PCI/MSI[-X] support was added around 2003, interrupt management
  was completely different from what we have today in the actively
  developed architectures. Interrupt management was completely
  architecture specific and while there were attempts to create common
  infrastructure the commonalities were rudimentary and just providing
  shared data structures and interfaces so that drivers could be written
  in an architecture agnostic way.

  The initial PCI/MSI[-X] support obviously plugged into this model
  which resulted in some basic shared infrastructure in the PCI core
  code for setting up MSI descriptors, which are a pure software
  construct for holding data relevant for a particular MSI interrupt,
  but the actual association to Linux interrupts was completely
  architecture specific. This model is still supported today to keep
  museum architectures and notorious stragglers alive.

  In 2013 Intel tried to add support for hot-pluggable IO/APICs to the
  kernel, which was creating yet another architecture specific mechanism
  and resulted in an unholy mess on top of the existing horrors of x86
  interrupt handling. The x86 interrupt management code was already an
  incomprehensible maze of indirections between the CPU vector
  management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X]
  implementation.

  At roughly the same time ARM struggled with the ever growing SoC
  specific extensions which were glued on top of the architected GIC
  interrupt controller.

  This resulted in a fundamental redesign of interrupt management and
  provided the today prevailing concept of hierarchical interrupt
  domains. This allowed to disentangle the interactions between x86
  vector domain and interrupt remapping and also allowed ARM to handle
  the zoo of SoC specific interrupt components in a sane way.

  The concept of hierarchical interrupt domains aims to encapsulate the
  functionality of particular IP blocks which are involved in interrupt
  delivery so that they become extensible and pluggable. The X86
  encapsulation looks like this:

                                            |--- device 1
     [Vector]---[Remapping]---[PCI/MSI]--|...
                                            |--- device N

  where the remapping domain is an optional component and in case that
  it is not available the PCI/MSI[-X] domains have the vector domain as
  their parent. This reduced the required interaction between the
  domains pretty much to the initialization phase where it is obviously
  required to establish the proper parent relation ship in the
  components of the hierarchy.

  While in most cases the model is strictly representing the chain of IP
  blocks and abstracting them so they can be plugged together to form a
  hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the
  hardware it's clear that the actual PCI/MSI[-X] interrupt controller
  is not a global entity, but strict a per PCI device entity.

  Here we took a short cut on the hierarchical model and went for the
  easy solution of providing "global" PCI/MSI domains which was possible
  because the PCI/MSI[-X] handling is uniform across the devices. This
  also allowed to keep the existing PCI/MSI[-X] infrastructure mostly
  unchanged which in turn made it simple to keep the existing
  architecture specific management alive.

  A similar problem was created in the ARM world with support for IP
  block specific message storage. Instead of going all the way to stack
  a IP block specific domain on top of the generic MSI domain this ended
  in a construct which provides a "global" platform MSI domain which
  allows overriding the irq_write_msi_msg() callback per allocation.

  In course of the lengthy discussions we identified other abuse of the
  MSI infrastructure in wireless drivers, NTB etc. where support for
  implementation specific message storage was just mindlessly glued into
  the existing infrastructure. Some of this just works by chance on
  particular platforms but will fail in hard to diagnose ways when the
  driver is used on platforms where the underlying MSI interrupt
  management code does not expect the creative abuse.

  Another shortcoming of today's PCI/MSI-X support is the inability to
  allocate or free individual vectors after the initial enablement of
  MSI-X. This results in an works by chance implementation of VFIO (PCI
  pass-through) where interrupts on the host side are not set up upfront
  to avoid resource exhaustion. They are expanded at run-time when the
  guest actually tries to use them. The way how this is implemented is
  that the host disables MSI-X and then re-enables it with a larger
  number of vectors again. That works by chance because most device
  drivers set up all interrupts before the device actually will utilize
  them. But that's not universally true because some drivers allocate a
  large enough number of vectors but do not utilize them until it's
  actually required, e.g. for acceleration support. But at that point
  other interrupts of the device might be in active use and the MSI-X
  disable/enable dance can just result in losing interrupts and
  therefore hard to diagnose subtle problems.

  Last but not least the "global" PCI/MSI-X domain approach prevents to
  utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact
  that IMS is not longer providing a uniform storage and configuration
  model.

  The solution to this is to implement the missing step and switch from
  global PCI/MSI domains to per device PCI/MSI domains. The resulting
  hierarchy then looks like this:

                              |--- [PCI/MSI] device 1
     [Vector]---[Remapping]---|...
                              |--- [PCI/MSI] device N

  which in turn allows to provide support for multiple domains per
  device:

                              |--- [PCI/MSI] device 1
                              |--- [PCI/IMS] device 1
     [Vector]---[Remapping]---|...
                              |--- [PCI/MSI] device N
                              |--- [PCI/IMS] device N

  This work converts the MSI and PCI/MSI core and the x86 interrupt
  domains to the new model, provides new interfaces for post-enable
  allocation/free of MSI-X interrupts and the base framework for
  PCI/IMS. PCI/IMS has been verified with the work in progress IDXD
  driver.

  There is work in progress to convert ARM over which will replace the
  platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
  "solutions" are in the works as well.

  Drivers:

   - Updates for the LoongArch interrupt chip drivers

   - Support for MTK CIRQv2

   - The usual small fixes and updates all over the place"

* tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits)
  irqchip/ti-sci-inta: Fix kernel doc
  irqchip/gic-v2m: Mark a few functions __init
  irqchip/gic-v2m: Include arm-gic-common.h
  irqchip/irq-mvebu-icu: Fix works by chance pointer assignment
  iommu/amd: Enable PCI/IMS
  iommu/vt-d: Enable PCI/IMS
  x86/apic/msi: Enable PCI/IMS
  PCI/MSI: Provide pci_ims_alloc/free_irq()
  PCI/MSI: Provide IMS (Interrupt Message Store) support
  genirq/msi: Provide constants for PCI/IMS support
  x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
  PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X
  PCI/MSI: Provide prepare_desc() MSI domain op
  PCI/MSI: Split MSI-X descriptor setup
  genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
  genirq/msi: Provide msi_domain_alloc_irq_at()
  genirq/msi: Provide msi_domain_ops:: Prepare_desc()
  genirq/msi: Provide msi_desc:: Msi_data
  genirq/msi: Provide struct msi_map
  x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
  ...
2022-12-12 11:21:29 -08:00
Joerg Roedel
e3eca2e4f6 Merge branches 'arm/allwinner', 'arm/exynos', 'arm/mediatek', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 's390', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-12-12 12:50:53 +01:00
Thomas Gleixner
810531a1af iommu/vt-d: Enable PCI/IMS
PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag,
but only when on real hardware.

Virtualized IOMMUs need additional support, e.g. for PASID.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232327.081482253@linutronix.de
2022-12-05 22:22:35 +01:00
Thomas Gleixner
9a945234ab iommu/vt-d: Switch to MSI parent domains
Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and setup per
device domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de
2022-12-05 22:22:33 +01:00
Thomas Gleixner
b6d5fc3a52 x86/apic/vector: Provide MSI parent domain
Enable MSI parent domain support in the x86 vector domain and fixup the
checks in the iommu implementations to check whether device::msi::domain is
the default MSI parent domain. That keeps the existing logic to protect
e.g. devices behind VMD working.

The interrupt remap PCI/MSI code still works because the underlying vector
domain still provides the same functionality.

None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are
affected either. They still work the same way both at the low level and the
PCI/MSI implementations they provide.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de
2022-12-05 22:22:33 +01:00
Jacob Pan
81c95fbaeb iommu/vt-d: Fix buggy QAT device mask
Impacted QAT device IDs that need extra dtlb flush quirk is ranging
from 0x4940 to 0x4943. After bitwise AND device ID with 0xfffc the
result should be 0x4940 instead of 0x494c to identify these devices.

Fixes: e65a6897be ("iommu/vt-d: Add a fix for devices need extra dtlb flush")
Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20221203005610.2927487-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05 14:27:03 +01:00
Jason Gunthorpe
90337f526c Merge tag 'v6.1-rc7' into iommufd.git for-next
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version.
The rc fix was done a different way when iommufd patches reworked this
code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02 12:04:39 -04:00
Xiongfeng Wang
4bedbbd782 iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()
for_each_pci_dev() is implemented by pci_get_device(). The comment of
pci_get_device() says that it will increase the reference count for the
returned pci_dev and also decrease the reference count for the input
pci_dev @from if it is not NULL.

If we break for_each_pci_dev() loop with pdev not NULL, we need to call
pci_dev_put() to decrease the reference count. Add the missing
pci_dev_put() for the error path to avoid reference count leak.

Fixes: 2e45528930 ("iommu/vt-d: Unify the way to process DMAR device scope array")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Link: https://lore.kernel.org/r/20221121113649.190393-3-wangxiongfeng2@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02 11:45:33 +01:00
Xiongfeng Wang
afca9e19cc iommu/vt-d: Fix PCI device refcount leak in has_external_pci()
for_each_pci_dev() is implemented by pci_get_device(). The comment of
pci_get_device() says that it will increase the reference count for the
returned pci_dev and also decrease the reference count for the input
pci_dev @from if it is not NULL.

If we break for_each_pci_dev() loop with pdev not NULL, we need to call
pci_dev_put() to decrease the reference count. Add the missing
pci_dev_put() before 'return true' to avoid reference count leak.

Fixes: 89a6079df7 ("iommu/vt-d: Force IOMMU on for platform opt in hint")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Link: https://lore.kernel.org/r/20221121113649.190393-2-wangxiongfeng2@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02 11:45:32 +01:00
Yang Yingliang
6927d35238 iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()
As comment of pci_get_domain_bus_and_slot() says, it returns a pci device
with refcount increment, when finish using it, the caller must decrease
the reference count by calling pci_dev_put(). So call pci_dev_put() after
using the 'pdev' to avoid refcount leak.

Besides, if the 'pdev' is null or intel_svm_prq_report() returns error,
there is no need to trace this fault.

Fixes: 06f4b8d09d ("iommu/vt-d: Remove unnecessary SVA data accesses in page fault path")
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20221119144028.2452731-1-yangyingliang@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02 11:45:32 +01:00
Jacob Pan
e65a6897be iommu/vt-d: Add a fix for devices need extra dtlb flush
QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in
address translation service (ATS). These devices may inadvertently issue
ATS invalidation completion before posted writes initiated with
translated address that utilized translations matching the invalidation
address range, violating the invalidation completion ordering.

This patch adds an extra device TLB invalidation for the affected devices,
it is needed to ensure no more posted writes with translated address
following the invalidation completion. Therefore, the ordering is
preserved and data-corruption is prevented.

Device TLBs are invalidated under the following six conditions:
1. Device driver does DMA API unmap IOVA
2. Device driver unbind a PASID from a process, sva_unbind_device()
3. PASID is torn down, after PASID cache is flushed. e.g. process
exit_mmap() due to crash
4. Under SVA usage, called by mmu_notifier.invalidate_range() where
VM has to free pages that were unmapped
5. userspace driver unmaps a DMA buffer
6. Cache invalidation in vSVA usage (upcoming)

For #1 and #2, device drivers are responsible for stopping DMA traffic
before unmap/unbind. For #3, iommu driver gets mmu_notifier to
invalidate TLB the same way as normal user unmap which will do an extra
invalidation. The dTLB invalidation after PASID cache flush does not
need an extra invalidation.

Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
covered by this patch due to common code path with #5.

Tested-by: Yuzhang Luo <yuzhang.luo@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20221130062449.1360063-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02 11:45:31 +01:00
Jason Gunthorpe
4989764d8e iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY
This queries if a domain linked to a device should expect to support
enforce_cache_coherency() so iommufd can negotiate the rules for when a
domain should be shared or not.

For iommufd a device that declares IOMMU_CAP_ENFORCE_CACHE_COHERENCY will
not be attached to a domain that does not support it.

Link: https://lore.kernel.org/r/1-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-29 16:34:15 -04:00
Lu Baolu
e5b0feb436 iommu/vt-d: Use real field for indication of first level
The dmar_domain uses bit field members to indicate the behaviors. Add
a bit field for using first level and remove the flags member to avoid
duplication.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:22 +01:00
Lu Baolu
b1cf1563f3 iommu/vt-d: Remove unnecessary domain_context_mapped()
The device_domain_info::domain accurately records the domain attached to
the device. It is unnecessary to check whether the context is present in
the attach_dev path. Remove it to make the code neat.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:22 +01:00
Lu Baolu
a8204479f2 iommu/vt-d: Rename domain_add_dev_info()
dmar_domain_attach_device() is more meaningful according to what this
helper does.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:21 +01:00
Lu Baolu
ba502132f5 iommu/vt-d: Rename iommu_disable_dev_iotlb()
Rename iommu_disable_dev_iotlb() to iommu_disable_pci_caps() to pair with
iommu_enable_pci_caps().

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:21 +01:00
Lu Baolu
35a99c54dd iommu/vt-d: Add blocking domain support
The Intel IOMMU hardwares support blocking DMA transactions by clearing
the translation table entries. This implements a real blocking domain to
avoid using an empty UNMANAGED domain. The detach_dev callback of the
domain ops is not used in any path. Remove it to avoid dead code as well.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:21 +01:00
Lu Baolu
c7be17c290 iommu/vt-d: Add device_block_translation() helper
If domain attaching to device fails, the IOMMU driver should bring the
device to blocking DMA state. The upper layer is expected to recover it
by attaching a new domain. Use device_block_translation() in the error
path of dev_attach to make the behavior specific.

The difference between device_block_translation() and the previous
dmar_remove_one_dev_info() is that, in the scalable mode, it is the
RID2PASID entry instead of context entry being cleared. As a result,
enabling PCI capabilities is moved up.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:20 +01:00
Lu Baolu
ec62b44241 iommu/vt-d: Allocate pasid table in device probe path
Whether or not a domain is attached to the device, the pasid table should
always be valid as long as it has been probed. This moves the pasid table
allocation from the domain attaching device path to device probe path and
frees it in the device release path.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:20 +01:00
Tina Zhang
7fc961cf7f iommu/vt-d: Set SRE bit only when hardware has SRS cap
SRS cap is the hardware cap telling if the hardware IOMMU can support
requests seeking supervisor privilege or not. SRE bit in scalable-mode
PASID table entry is treated as Reserved(0) for implementation not
supporting SRS cap.

Checking SRS cap before setting SRE bit can avoid the non-recoverable
fault of "Non-zero reserved field set in PASID Table Entry" caused by
setting SRE bit while there is no SRS cap support. The fault messages
look like below:

 DMAR: DRHD: handling fault status reg 2
 DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000
       [fault reason 0x5a]
       SM: Non-zero reserved field set in PASID Table Entry

Fixes: 6f7db75e1c ("iommu/vt-d: Add second level page table interface")
Cc: stable@vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20221115070346.1112273-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-19 10:46:52 +01:00
Tina Zhang
242b0aaeab iommu/vt-d: Preset Access bit for IOVA in FL non-leaf paging entries
The A/D bits are preseted for IOVA over first level(FL) usage for both
kernel DMA (i.e, domain typs is IOMMU_DOMAIN_DMA) and user space DMA
usage (i.e., domain type is IOMMU_DOMAIN_UNMANAGED).

Presetting A bit in FL requires to preset the bit in every related paging
entries, including the non-leaf ones. Otherwise, hardware may treat this
as an error. For example, in a case of ECAP_REG.SMPWC==0, DMA faults might
occur with below DMAR fault messages (wrapped for line length) dumped.

 DMAR: DRHD: handling fault status reg 2
 DMAR: [DMA Read NO_PASID] Request device [aa:00.0] fault addr 0x10c3a6000
    [fault reason 0x90]
    SM: A/D bit update needed in first-level entry when set up in no snoop

Fixes: 289b3b005c ("iommu/vt-d: Preset A/D bits for user space DMA usage")
Cc: stable@vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20221113010324.1094483-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-19 10:46:51 +01:00
Thomas Gleixner
d474d92d70 x86/apic: Remove X86_IRQ_ALLOC_CONTIGUOUS_VECTORS
Now that the PCI/MSI core code does early checking for multi-MSI support
X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore.

Remove the flag and rely on MSI_FLAG_MULTI_PCI_MSI.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122015.865042356@linutronix.de
2022-11-17 15:15:22 +01:00
Thomas Gleixner
527f378c42 iommu/vt-d: Remove bogus check for multi MSI-X
PCI/Multi-MSI is MSI specific and not supported for MSI-X.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/20221111122013.713848846@linutronix.de
2022-11-17 15:15:18 +01:00
Joerg Roedel
69e61edebe iommu: Define EINVAL as device/domain incompatibility
This series is to replace the previous EMEDIUMTYPE patch in a VFIO series:
 https://lore.kernel.org/kvm/Yxnt9uQTmbqul5lf@8bytes.org/
 
 The purpose is to regulate all existing ->attach_dev callback functions to
 use EINVAL exclusively for an incompatibility error between a device and a
 domain. This allows VFIO and IOMMUFD to detect such a soft error, and then
 try a different domain with the same device.
 
 Among all the patches, the first two are preparatory changes. And then one
 patch to update kdocs and another three patches for the enforcement
 effort.
 
 Link: https://lore.kernel.org/r/cover.1666042872.git.nicolinc@nvidia.com
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY2JjUQAKCRCFwuHvBreF
 YaFbAP492zvOEaZaRxiK4XcdsU1ZBCovB/2Keh/QIQdb7Ig6hgD/dW7TygTP1+4a
 Oqpcu/6aLeHvhayfZt1142S3e0HuHwU=
 =g5C+
 -----END PGP SIGNATURE-----

Merge tag 'for-joerg' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core

iommu: Define EINVAL as device/domain incompatibility

This series is to replace the previous EMEDIUMTYPE patch in a VFIO series:
https://lore.kernel.org/kvm/Yxnt9uQTmbqul5lf@8bytes.org/

The purpose is to regulate all existing ->attach_dev callback functions to
use EINVAL exclusively for an incompatibility error between a device and a
domain. This allows VFIO and IOMMUFD to detect such a soft error, and then
try a different domain with the same device.

Among all the patches, the first two are preparatory changes. And then one
patch to update kdocs and another three patches for the enforcement
effort.

Link: https://lore.kernel.org/r/cover.1666042872.git.nicolinc@nvidia.com
2022-11-03 15:51:48 +01:00
Lu Baolu
757636ed26 iommu: Rename iommu-sva-lib.{c,h}
Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as it contains all code
for SVA implementation in iommu core.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-14-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03 15:47:54 +01:00
Lu Baolu
1c263576f4 iommu: Remove SVA related callbacks from iommu ops
These ops'es have been deprecated. There's no need for them anymore.
Remove them to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03 15:47:51 +01:00
Lu Baolu
eaca8889a1 iommu/vt-d: Add SVA domain support
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops. This implementation is based on the existing SVA
code. Possible cleanup and refactoring are left for incremental
changes later.

The VT-d driver will also need to support setting a DMA domain to a
PASID of device. Current SVA implementation uses different data
structures to track the domain and device PASID relationship. That's
the reason why we need to check the domain type in remove_dev_pasid
callback. Eventually we'll consolidate the data structures and remove
the need of domain type check.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03 15:47:49 +01:00
Lu Baolu
942fd5435d iommu: Remove SVM_FLAG_SUPERVISOR_MODE support
The current kernel DMA with PASID support is based on the SVA with a flag
SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address
space to a PASID of the device. The device driver programs the device with
kernel virtual address (KVA) for DMA access. There have been security and
functional issues with this approach:

- The lack of IOTLB synchronization upon kernel page table updates.
  (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
- Other than slight more protection, using kernel virtual address (KVA)
  has little advantage over physical address. There are also no use
  cases yet where DMA engines need kernel virtual addresses for in-kernel
  DMA.

This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface.
The device drivers are suggested to handle kernel DMA with PASID through
the kernel DMA APIs.

The drvdata parameter in iommu_sva_bind_device() and all callbacks is not
needed anymore. Cleanup them as well.

Link: https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03 15:47:45 +01:00
Lu Baolu
1adf3cc20d iommu: Add max_pasids field in struct iommu_device
Use this field to keep the number of supported PASIDs that an IOMMU
hardware is able to support. This is a generic attribute of an IOMMU
and lifting it into the per-IOMMU device structure makes it possible
to allocate a PASID for device without calls into the IOMMU drivers.
Any iommu driver that supports PASID related features should set this
field before enabling them on the devices.

In the Intel IOMMU driver, intel_iommu_sm is moved to CONFIG_INTEL_IOMMU
enclave so that the pasid_supported() helper could be used in dmar.c
without compilation errors.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03 15:47:43 +01:00
Nicolin Chen
f4a1477357 iommu: Use EINVAL for incompatible device/domain in ->attach_dev
Following the new rules in include/linux/iommu.h kdocs, update all drivers
->attach_dev callback functions to return EINVAL in the failure paths that
are related to domain incompatibility.

Also, drop adjacent error prints to prevent a kernel log spam.

Link: https://lore.kernel.org/r/f52a07f7320da94afe575c9631340d0019a203a7.1666042873.git.nicolinc@nvidia.com
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-01 14:39:59 -03:00
Nicolin Chen
bd7ebb7719 iommu: Regulate EINVAL in ->attach_dev callback functions
Following the new rules in include/linux/iommu.h kdocs, EINVAL now can be
used to indicate that domain and device are incompatible by a caller that
treats it as a soft failure and tries attaching to another domain.

On the other hand, there are ->attach_dev callback functions returning it
for obvious device-specific errors. They will result in some inefficiency
in the caller handling routine.

Update these places to corresponding errnos following the new rules.

Link: https://lore.kernel.org/r/5924c03bea637f05feb2a20d624bae086b555ec5.1666042872.git.nicolinc@nvidia.com
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-01 14:39:59 -03:00
Jerry Snitselaar
620bf9f981 iommu/vt-d: Clean up si_domain in the init_dmars() error path
A splat from kmem_cache_destroy() was seen with a kernel prior to
commit ee2653bbe8 ("iommu/vt-d: Remove domain and devinfo mempool")
when there was a failure in init_dmars(), because the iommu_domain
cache still had objects. While the mempool code is now gone, there
still is a leak of the si_domain memory if init_dmars() fails. So
clean up si_domain in the init_dmars() error path.

Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Fixes: 86080ccc22 ("iommu/vt-d: Allocate si_domain in init_dmars()")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-21 10:49:35 +02:00
Lu Baolu
bf638a6513 iommu/vt-d: Use rcu_lock in get_resv_regions
Commit 5f64ce5411 ("iommu/vt-d: Duplicate iommu_resv_region objects
per device list") converted rcu_lock in get_resv_regions to
dmar_global_lock to allow sleeping in iommu_alloc_resv_region(). This
introduced possible recursive locking if get_resv_regions is called from
within a section where intel_iommu_init() already holds dmar_global_lock.

Especially, after commit 57365a04c9 ("iommu: Move bus setup to IOMMU
device registration"), below lockdep splats could always be seen.

 ============================================
 WARNING: possible recursive locking detected
 6.0.0-rc4+ #325 Tainted: G          I
 --------------------------------------------
 swapper/0/1 is trying to acquire lock:
 ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
 intel_iommu_get_resv_regions+0x25/0x270

 but task is already holding lock:
 ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
 intel_iommu_init+0x36d/0x6ea

 ...

 Call Trace:
  <TASK>
  dump_stack_lvl+0x48/0x5f
  __lock_acquire.cold.73+0xad/0x2bb
  lock_acquire+0xc2/0x2e0
  ? intel_iommu_get_resv_regions+0x25/0x270
  ? lock_is_held_type+0x9d/0x110
  down_read+0x42/0x150
  ? intel_iommu_get_resv_regions+0x25/0x270
  intel_iommu_get_resv_regions+0x25/0x270
  iommu_create_device_direct_mappings.isra.28+0x8d/0x1c0
  ? iommu_get_dma_cookie+0x6d/0x90
  bus_iommu_probe+0x19f/0x2e0
  iommu_device_register+0xd4/0x130
  intel_iommu_init+0x3e1/0x6ea
  ? iommu_setup+0x289/0x289
  ? rdinit_setup+0x34/0x34
  pci_iommu_init+0x12/0x3a
  do_one_initcall+0x65/0x320
  ? rdinit_setup+0x34/0x34
  ? rcu_read_lock_sched_held+0x5a/0x80
  kernel_init_freeable+0x28a/0x2f3
  ? rest_init+0x1b0/0x1b0
  kernel_init+0x1a/0x130
  ret_from_fork+0x1f/0x30
  </TASK>

This rolls back dmar_global_lock to rcu_lock in get_resv_regions to avoid
the lockdep splat.

Fixes: 57365a04c9 ("iommu: Move bus setup to IOMMU device registration")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-21 10:49:34 +02:00
Lu Baolu
0251d0107c iommu: Add gfp parameter to iommu_alloc_resv_region
Add gfp parameter to iommu_alloc_resv_region() for the callers to specify
the memory allocation behavior. Thus iommu_alloc_resv_region() could also
be available in critical contexts.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-21 10:49:32 +02:00
Joerg Roedel
38713c6028 Merge branches 'apple/dart', 'arm/mediatek', 'arm/omap', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-09-26 15:52:31 +02:00
Lu Baolu
6ad931a232 iommu/vt-d: Avoid unnecessary global DMA cache invalidation
Some VT-d hardware implementations invalidate all DMA remapping hardware
translation caches as part of SRTP flow. The VT-d spec adds a ESRTPS
(Enhanced Set Root Table Pointer Support, section 11.4.2 in VT-d spec)
capability bit to indicate this. With this bit set, software has no need
to issue the global invalidation request.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220919062523.3438951-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26 15:52:26 +02:00
Lu Baolu
eb5b20114b iommu/vt-d: Avoid unnecessary global IRTE cache invalidation
Some VT-d hardware implementations invalidate all interrupt remapping
hardware translation caches as part of SIRTP flow. The VT-d spec adds
a ESIRTPS (Enhanced Set Interrupt Remap Table Pointer Support, section
11.4.2 in VT-d spec) capability bit to indicate this.

The spec also states in 11.4.4 that hardware also performs global
invalidation on all interrupt remapping caches as part of Interrupt
Remapping Disable operation if ESIRTPS capability bit is set.

This checks the ESIRTPS capability bit and skip software global cache
invalidation if it's set.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220921065741.3572495-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26 15:52:26 +02:00
Yi Liu
b722cb32f0 iommu/vt-d: Rename cap_5lp_support to cap_fl5lp_support
This renaming better describes it is for first level page table (a.k.a
first stage page table since VT-d spec 3.4).

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220916071326.2223901-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26 15:52:25 +02:00
Lu Baolu
4759858726 iommu/vt-d: Remove pasid_set_eafe()
It is not used anywhere in the tree. Remove it to avoid dead code.
No functional change intended.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220915081645.1834555-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26 15:52:24 +02:00
Lu Baolu
0faa19a151 iommu/vt-d: Decouple PASID & PRI enabling from SVA
Previously the PCI PASID and PRI capabilities are enabled in the path of
iommu device probe only if INTEL_IOMMU_SVM is configured and the device
supports ATS. As we've already decoupled the I/O page fault handler from
SVA, we could also decouple PASID and PRI enabling from it to make room
for growth of new features like kernel DMA with PASID, SIOV and nested
translation.

At the same time, the iommu_enable_dev_iotlb() helper is also called in
iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) path. It's unnecessary
and duplicate. This cleanups this helper to make the code neat.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220915085814.2261409-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26 15:52:24 +02:00
Lu Baolu
06f4b8d09d iommu/vt-d: Remove unnecessary SVA data accesses in page fault path
The existing I/O page fault handling code accesses the per-PASID SVA data
structures. This is unnecessary and makes the fault handling code only
suitable for SVA scenarios. This removes the SVA data accesses from the
I/O page fault reporting and responding code, so that the fault handling
code could be generic.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220914011821.400986-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26 15:52:23 +02:00
Yi Liu
1548978070 iommu/vt-d: Check correct capability for sagaw determination
Check 5-level paging capability for 57 bits address width instead of
checking 1GB large page capability.

Fixes: 53fc7ad6ed ("iommu/vt-d: Correctly calculate sagaw value of IOMMU")
Cc: stable@vger.kernel.org
Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Link: https://lore.kernel.org/r/20220916071212.2223869-2-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-21 10:22:54 +02:00
Lu Baolu
7ebb5f8e00 Revert "iommu/vt-d: Fix possible recursive locking in intel_iommu_init()"
This reverts commit 9cd4f14344.

Some issues were reported on the original commit. Some thunderbolt devices
don't work anymore due to the following DMA fault.

DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [09:00.0] fault index 0x8080
      [fault reason 0x25]
      Blocked a compatibility format interrupt request

Bring it back for now to avoid functional regression.

Fixes: 9cd4f14344 ("iommu/vt-d: Fix possible recursive locking in intel_iommu_init()")
Link: https://lore.kernel.org/linux-iommu/485A6EA5-6D58-42EA-B298-8571E97422DE@getmailspring.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216497
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: <stable@vger.kernel.org> # 5.19.x
Reported-and-tested-by: George Hilliard <thirtythreeforty@gmail.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220920081701.3453504-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-21 10:22:54 +02:00
Lu Baolu
9cd4f14344 iommu/vt-d: Fix possible recursive locking in intel_iommu_init()
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.

The dmar_global_lock used in the intel_iommu_init() might cause recursive
locking issue, for example, intel_iommu_get_resv_regions() is taking the
dmar_global_lock from within a section where intel_iommu_init() already
holds it via probe_acpi_namespace_devices().

Using dmar_global_lock in intel_iommu_init() could be relaxed since it is
unlikely that any IO board must be hot added before the IOMMU subsystem is
initialized. This eliminates the possible recursive locking issue by moving
down DMAR hotplug support after the IOMMU is initialized and removing the
uses of dmar_global_lock in intel_iommu_init().

Fixes: d5692d4af0 ("iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/894db0ccae854b35c73814485569b634237b5538.1657034828.git.robin.murphy@arm.com
Link: https://lore.kernel.org/r/20220718235325.3952426-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-11 08:19:24 +02:00
Joerg Roedel
7f34891b15 Merge branch 'iommu/fixes' into core 2022-09-09 09:27:09 +02:00
Robin Murphy
f2042ed21d iommu/dma: Make header private
Now that dma-iommu.h only contains internal interfaces, make it
private to the IOMMU subsytem.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b237e06c56a101f77af142a54b629b27aa179d22.1660668998.git.robin.murphy@arm.com
[ joro : re-add stub for iommu_dma_get_resv_regions ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-09 09:26:22 +02:00
Lu Baolu
35bf49e054 iommu/vt-d: Fix lockdep splat due to klist iteration in atomic context
With CONFIG_INTEL_IOMMU_DEBUGFS enabled, below lockdep splat are seen
when an I/O fault occurs on a machine with an Intel IOMMU in it.

 DMAR: DRHD: handling fault status reg 3
 DMAR: [DMA Write NO_PASID] Request device [00:1a.0] fault addr 0x0
       [fault reason 0x05] PTE Write access is not set
 DMAR: Dump dmar0 table entries for IOVA 0x0
 DMAR: root entry: 0x0000000127f42001
 DMAR: context entry: hi 0x0000000000001502, low 0x000000012d8ab001
 ================================
 WARNING: inconsistent lock state
 5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1 Not tainted
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 rngd/1006 [HC1[1]:SC0[0]:HE0:SE1] takes:
 ff177021416f2d78 (&k->k_lock){?.+.}-{2:2}, at: klist_next+0x1b/0x160
 {HARDIRQ-ON-W} state was registered at:
   lock_acquire+0xce/0x2d0
   _raw_spin_lock+0x33/0x80
   klist_add_tail+0x46/0x80
   bus_add_device+0xee/0x150
   device_add+0x39d/0x9a0
   add_memory_block+0x108/0x1d0
   memory_dev_init+0xe1/0x117
   driver_init+0x43/0x4d
   kernel_init_freeable+0x1c2/0x2cc
   kernel_init+0x16/0x140
   ret_from_fork+0x1f/0x30
 irq event stamp: 7812
 hardirqs last  enabled at (7811): [<ffffffff85000e86>] asm_sysvec_apic_timer_interrupt+0x16/0x20
 hardirqs last disabled at (7812): [<ffffffff84f16894>] irqentry_enter+0x54/0x60
 softirqs last  enabled at (7794): [<ffffffff840ff669>] __irq_exit_rcu+0xf9/0x170
 softirqs last disabled at (7787): [<ffffffff840ff669>] __irq_exit_rcu+0xf9/0x170

The klist iterator functions using spin_*lock_irq*() but the klist
insertion functions using spin_*lock(), combined with the Intel DMAR
IOMMU driver iterating over klists from atomic (hardirq) context, where
pci_get_domain_bus_and_slot() calls into bus_find_device() which iterates
over klists.

As currently there's no plan to fix the klist to make it safe to use in
atomic context, this fixes the lockdep splat by avoid calling
pci_get_domain_bus_and_slot() in the hardirq context.

Fixes: 8ac0b64b97 ("iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()")
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Link: https://lore.kernel.org/linux-iommu/Yvo2dfpEh%2FWC+Wrr@wantstofly.org/
Link: https://lore.kernel.org/linux-iommu/YvyBdPwrTuHHbn5X@wantstofly.org/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220819015949.4795-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 15:14:57 +02:00
Lu Baolu
a349ffcb4d iommu/vt-d: Fix recursive lock issue in iommu_flush_dev_iotlb()
The per domain spinlock is acquired in iommu_flush_dev_iotlb(), which
is possbile to be called in the interrupt context. For example, the
drm-intel's CI system got completely blocked with below error:

 WARNING: inconsistent lock state
 6.0.0-rc1-CI_DRM_11990-g6590d43d39b9+ #1 Not tainted
 --------------------------------
 inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
 swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
 ffff88810440d678 (&domain->lock){+.?.}-{2:2}, at: iommu_flush_dev_iotlb.part.61+0x23/0x80
 {SOFTIRQ-ON-W} state was registered at:
   lock_acquire+0xd3/0x310
   _raw_spin_lock+0x2a/0x40
   domain_update_iommu_cap+0x20b/0x2c0
   intel_iommu_attach_device+0x5bd/0x860
   __iommu_attach_device+0x18/0xe0
   bus_iommu_probe+0x1f3/0x2d0
   bus_set_iommu+0x82/0xd0
   intel_iommu_init+0xe45/0x102a
   pci_iommu_init+0x9/0x31
   do_one_initcall+0x53/0x2f0
   kernel_init_freeable+0x18f/0x1e1
   kernel_init+0x11/0x120
   ret_from_fork+0x1f/0x30
 irq event stamp: 162354
 hardirqs last  enabled at (162354): [<ffffffff81b59274>] _raw_spin_unlock_irqrestore+0x54/0x70
 hardirqs last disabled at (162353): [<ffffffff81b5901b>] _raw_spin_lock_irqsave+0x4b/0x50
 softirqs last  enabled at (162338): [<ffffffff81e00323>] __do_softirq+0x323/0x48e
 softirqs last disabled at (162349): [<ffffffff810c1588>] irq_exit_rcu+0xb8/0xe0
 other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0
        ----
   lock(&domain->lock);
   <Interrupt>
     lock(&domain->lock);
   *** DEADLOCK ***
 1 lock held by swapper/6/0:

This coverts the spin_lock/unlock() into the irq save/restore varieties
to fix the recursive locking issues.

Fixes: ffd5869d93 ("iommu/vt-d: Replace spin_lock_irqsave() with spin_lock()")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20220817025650.3253959-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 15:14:56 +02:00
Lu Baolu
53fc7ad6ed iommu/vt-d: Correctly calculate sagaw value of IOMMU
The Intel IOMMU driver possibly selects between the first-level and the
second-level translation tables for DMA address translation. However,
the levels of page-table walks for the 4KB base page size are calculated
from the SAGAW field of the capability register, which is only valid for
the second-level page table. This causes the IOMMU driver to stop working
if the hardware (or the emulated IOMMU) advertises only first-level
translation capability and reports the SAGAW field as 0.

This solves the above problem by considering both the first level and the
second level when calculating the supported page table levels.

Fixes: b802d070a5 ("iommu/vt-d: Use iova over first level")
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220817023558.3253263-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 15:14:56 +02:00
Lu Baolu
0c5f6c0d82 iommu/vt-d: Fix kdump kernels boot failure with scalable mode
The translation table copying code for kdump kernels is currently based
on the extended root/context entry formats of ECS mode defined in older
VT-d v2.5, and doesn't handle the scalable mode formats. This causes
the kexec capture kernel boot failure with DMAR faults if the IOMMU was
enabled in scalable mode by the previous kernel.

The ECS mode has already been deprecated by the VT-d spec since v3.0 and
Intel IOMMU driver doesn't support this mode as there's no real hardware
implementation. Hence this converts ECS checking in copying table code
into scalable mode.

The existing copying code consumes a bit in the context entry as a mark
of copied entry. It needs to work for the old format as well as for the
extended context entries. As it's hard to find such a common bit for both
legacy and scalable mode context entries. This replaces it with a per-
IOMMU bitmap.

Fixes: 7373a8cc38 ("iommu/vt-d: Setup context and enable RID2PASID support")
Cc: stable@vger.kernel.org
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Wen Jin <wen.jin@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220817011035.3250131-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 15:14:55 +02:00
Robin Murphy
de9f8a91eb iommu/dma: Clean up Kconfig
Although iommu-dma is a per-architecture chonce, that is currently
implemented in a rather haphazard way. Selecting from the arch Kconfig
was the original logical approach, but is complicated by having to
manage dependencies; conversely, selecting from drivers ends up hiding
the architecture dependency *too* well. Instead, let's just have it
enable itself automatically when IOMMU API support is enabled for the
relevant architectures. It can't get much clearer than that.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/2e33c8bc2b1bb478157b7964bfed976cb7466139.1660668998.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 14:46:59 +02:00
Robin Murphy
29e932295b iommu: Clean up bus_set_iommu()
Clean up the remaining trivial bus_set_iommu() callsites along
with the implementation. Now drivers only have to know and care
about iommu_device instances, phew!

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ea383d5f4d74ffe200ab61248e5de6e95846180a.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 14:26:17 +02:00
Robin Murphy
c919739ce4 iommu/vt-d: Handle race between registration and device probe
Currently we rely on registering all our instances before initially
allowing any .probe_device calls via bus_set_iommu(). In preparation for
phasing out the latter, make sure we won't inadvertently return success
for a device associated with a known but not yet registered instance,
otherwise we'll run straight into iommu_group_get_for_dev() trying to
use NULL ops.

That also highlights an issue with intel_iommu_get_resv_regions() taking
dmar_global_lock from within a section where intel_iommu_init() already
holds it, which already exists via probe_acpi_namespace_devices() when
an ANDD device is probed, but gets more obvious with the upcoming change
to iommu_device_register(). Since they are both read locks it manages
not to deadlock in practice, and a more in-depth rework of this locking
is underway, so no attempt is made to address it here.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/579f2692291bcbfc3ac64f7456fcff0d629af131.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 14:25:01 +02:00
Robin Murphy
359ad15763 iommu: Retire iommu_capable()
With all callers now converted to the device-specific version, retire
the old bus-based interface, and give drivers the chance to indicate
accurate per-instance capabilities.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 14:16:37 +02:00
Linus Torvalds
4e23eeebb2 Bitmap patches for v6.0-rc1
This branch consists of:
 
 Qu Wenruo:
 lib: bitmap: fix the duplicated comments on bitmap_to_arr64()
 https://lore.kernel.org/lkml/0d85e1dbad52ad7fb5787c4432bdb36cbd24f632.1656063005.git.wqu@suse.com/
 
 Alexander Lobakin:
 bitops: let optimize out non-atomic bitops on compile-time constants
 https://lore.kernel.org/lkml/20220624121313.2382500-1-alexandr.lobakin@intel.com/T/
 
 Yury Norov:
 lib: cleanup bitmap-related headers
 https://lore.kernel.org/linux-arm-kernel/YtCVeOGLiQ4gNPSf@yury-laptop/T/#m305522194c4d38edfdaffa71fcaaf2e2ca00a961
 
 Alexander Lobakin:
 x86/olpc: fix 'logical not is only applied to the left hand side'
 https://www.spinics.net/lists/kernel/msg4440064.html
 
 Yury Norov:
 lib/nodemask: inline wrappers around bitmap
 https://lore.kernel.org/all/20220723214537.2054208-1-yury.norov@gmail.com/
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmLpVvwACgkQsUSA/Tof
 vsiAHgwAwS9pl8GJ+fKYnue2CYo9349d2oT6BBUs/Rv8uqYEa4QkpYsR7NS733TG
 pos0hhoRvSOzrUP4qppXUjfJ+NkzLgpnKFOeWfFoNAKlHuaaMRvF3Y0Q/P8g0/Kg
 HPWcCQLHyCH9Wjs3e2TTgRjxTrHuruD2VJ401/PX/lw0DicUhmev5mUFa10uwFkP
 ZJRprjoFn9HJ0Hk16pFZDi36d3YumhACOcWRiJdoBDrEPV3S6lm9EeOy/yHBNp5k
 9bKj+RboeT2t70KaZcKv+M5j1nu0cAhl7kRkjcxcmGyimI0l82Vgq9yFxhGqvWg8
 RnCrJ5EaO08FGCAKG9GEwzdiNa24Gdq5XZSpQA7JZHmhmchpnnlNenJicyv0gOQi
 abChZeWSEsyA+78l2+kk9nezfVKUOnKDEZQxBVTOyWsmZYxHZV94oam340VjQDaY
 4/fETdOy/qqPIxnpxAeFGWxZjcVaYiYPLj7KLPMsB0aAAF7pZrem465vSfgbrE81
 +gCdqrWd
 =4dTW
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-6.0-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo)

 - optimize out non-atomic bitops on compile-time constants (Alexander
   Lobakin)

 - cleanup bitmap-related headers (Yury Norov)

 - x86/olpc: fix 'logical not is only applied to the left hand side'
   (Alexander Lobakin)

 - lib/nodemask: inline wrappers around bitmap (Yury Norov)

* tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits)
  lib/nodemask: inline next_node_in() and node_random()
  powerpc: drop dependency on <asm/machdep.h> in archrandom.h
  x86/olpc: fix 'logical not is only applied to the left hand side'
  lib/cpumask: move some one-line wrappers to header file
  headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure
  headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>
  headers/deps: mm: Optimize <linux/gfp.h> header dependencies
  lib/cpumask: move trivial wrappers around find_bit to the header
  lib/cpumask: change return types to unsigned where appropriate
  cpumask: change return types to bool where appropriate
  lib/bitmap: change type of bitmap_weight to unsigned long
  lib/bitmap: change return types to bool where appropriate
  arm: align find_bit declarations with generic kernel
  iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
  lib/test_bitmap: test the tail after bitmap_to_arr64()
  lib/bitmap: fix off-by-one in bitmap_to_arr64()
  lib: test_bitmap: add compile-time optimization/evaluations assertions
  bitmap: don't assume compiler evaluates small mem*() builtins calls
  net/ice: fix initializing the bitmap in the switch code
  bitops: let optimize out non-atomic bitops on compile-time constants
  ...
2022-08-07 17:52:35 -07:00
Joerg Roedel
c10100a416 Merge branches 'arm/exynos', 'arm/mediatek', 'arm/msm', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-07-29 12:06:56 +02:00
Yury Norov
4dea97f863 lib/bitmap: change type of bitmap_weight to unsigned long
bitmap_weight() doesn't return negative values, so change it's type
to unsigned long. It may help compiler to generate better code and
catch bugs.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-07-15 06:35:54 -07:00
Lu Baolu
bdb46d1758 iommu/vt-d: Remove global g_iommus array
The g_iommus and g_num_of_iommus is not used anywhere. Remove them to
avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:42 +02:00
Lu Baolu
97a79de99a iommu/vt-d: Remove unnecessary check in intel_iommu_add()
The Intel IOMMU hot-add process starts from dmar_device_hotplug(). It
uses the global dmar_global_lock to synchronize all the hot-add and
hot-remove paths. In the hot-add path, the new IOMMU data structures
are allocated firstly by dmar_parse_one_drhd() and then initialized by
dmar_hp_add_drhd(). All the IOMMU units are allocated and initialized
in the same synchronized path. There is no case where any IOMMU unit
is created and then initialized for multiple times.

This removes the unnecessary check in intel_iommu_add() which is the
last reference place of the global IOMMU array.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:42 +02:00
Lu Baolu
ba949f4cd4 iommu/vt-d: Refactor iommu information of each domain
When a DMA domain is attached to a device, it needs to allocate a domain
ID from its IOMMU. Currently, the domain ID information is stored in two
static arrays embedded in the domain structure. This can lead to memory
waste when the driver is running on a small platform.

This optimizes these static arrays by replacing them with an xarray and
consuming memory on demand.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:41 +02:00
Lu Baolu
913432f217 iommu/vt-d: Use IDA interface to manage iommu sequence id
Switch dmar unit sequence id allocation and release from bitmap to IDA
interface.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:40 +02:00
Lu Baolu
c3f27c834a iommu/vt-d: Remove unused domain_get_iommu()
It is not used anywhere. Remove it to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:40 +02:00
Lu Baolu
5eaafdf0c0 iommu/vt-d: Convert global spinlock into per domain lock
Using a global device_domain_lock spinlock to protect per-domain device
tracking lists is an inefficient way, especially considering this lock
is also needed in the hot paths. This optimizes the locking mechanism
by converting the global lock to per domain lock.

On the other hand, as the device tracking lists are never accessed in
any interrupt context, there is no need to disable interrupts while
spinning. Replace irqsave variant with spinlock calls.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:39 +02:00
Lu Baolu
969aaefbaa iommu/vt-d: Use device_domain_lock accurately
The device_domain_lock is used to protect the device tracking list of
a domain. Remove unnecessary spin_lock/unlock()'s and move the necessary
ones around the list access.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:39 +02:00
Lu Baolu
db75c9573b iommu/vt-d: Fold __dmar_remove_one_dev_info() into its caller
Fold __dmar_remove_one_dev_info() into dmar_remove_one_dev_info() which
is its only caller. Make the spin lock critical range only cover the
device list change code and remove some unnecessary checks.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:38 +02:00
Lu Baolu
79d82ce402 iommu/vt-d: Check device list of domain in domain free path
When the IOMMU domain is about to be freed, it should not be set on any
device. Instead of silently dealing with some bug cases, it's better to
trigger a warning to report and fix any potential bugs at the first time.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:38 +02:00
Lu Baolu
8430fd3f32 iommu/vt-d: Acquiring lock in pasid manipulation helpers
The iommu->lock is used to protect the per-IOMMU pasid directory table
and pasid table. Move the spinlock acquisition/release into the helpers
to make the code self-contained.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:37 +02:00
Lu Baolu
2c3262f9e8 iommu/vt-d: Acquiring lock in domain ID allocation helpers
The iommu->lock is used to protect the per-IOMMU domain ID resource.
Moving the lock into the ID alloc/free helpers makes the code more
compact. At the same time, the device_domain_lock is irrelevant to
the domain ID resource, remove its assertion as well.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:37 +02:00
Lu Baolu
ffd5869d93 iommu/vt-d: Replace spin_lock_irqsave() with spin_lock()
The iommu->lock is used to protect changes in root/context/pasid tables
and domain ID allocation. There's no use case to change these resources
in any interrupt context. Therefore, it is unnecessary to disable the
interrupts when the spinlock is held. The same thing happens on the
device_domain_lock side, which protects the device domain attachment
information. This replaces spin_lock/unlock_irqsave/irqrestore() calls
with the normal spin_lock/unlock().

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:36 +02:00
Lu Baolu
2e1c8dafb8 iommu/vt-d: Unnecessary spinlock for root table alloc and free
The IOMMU root table is allocated and freed in the IOMMU initialization
code in static boot or hot-remove paths. There's no need for a spinlock.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:35 +02:00
Lu Baolu
8ac0b64b97 iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()
Use pci_get_domain_bus_and_slot() instead of searching the global list
to retrieve the pci device pointer. This also removes the global
device_domain_list as there isn't any consumer anymore.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:35 +02:00
Lu Baolu
98f7b0db49 iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()
The disable_dmar_iommu() is called when IOMMU initialization fails or
the IOMMU is hot-removed from the system. In both cases, there is no
need to clear the IOMMU translation data structures for devices.

On the initialization path, the device probing only happens after the
IOMMU is initialized successfully, hence there're no translation data
structures.

On the hot-remove path, there is no real use case where the IOMMU is
hot-removed, but the devices that it manages are still alive in the
system. The translation data structures were torn down during device
release, hence there's no need to repeat it in IOMMU hot-remove path
either. This removes the unnecessary code and only leaves a check.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:34 +02:00
Lu Baolu
983ebe57b3 iommu/vt-d: debugfs: Remove device_domain_lock usage
The domain_translation_struct debugfs node is used to dump the DMAR page
tables for the PCI devices. It potentially races with setting domains to
devices. The existing code uses the global spinlock device_domain_lock to
avoid the races.

This removes the use of device_domain_lock outside of iommu.c by replacing
it with the group mutex lock. Using the group mutex lock is cleaner and
more compatible to following cleanups.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:33 +02:00
Lu Baolu
9f18abab60 iommu/vt-d: Remove unused iovad from dmar_domain
Not used anywhere. Cleanup it to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220527053424.3111186-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:33 +02:00
Lu Baolu
2585a2790e iommu/vt-d: Move include/linux/intel-iommu.h under iommu
This header file is private to the Intel IOMMU driver. Move it to the
driver folder.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220514014322.2927339-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:31 +02:00
Lu Baolu
853788b9a6 x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU
tboot_force_iommu() is only called by the Intel IOMMU driver. Move the
helper into that driver. No functional change intended.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220514014322.2927339-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:30 +02:00
Lu Baolu
f9903555dd iommu/vt-d: Remove unnecessary exported symbol
The exported symbol intel_iommu_gfx_mapped is not used anywhere in the
tree. Remove it to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220514014322.2927339-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:29 +02:00
Lu Baolu
933ab6d301 iommu/vt-d: Move trace/events/intel_iommu.h under iommu
This header file is private to the Intel IOMMU driver. Move it to the
driver folder.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220514014322.2927339-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:28 +02:00
Christoph Hellwig
ae3ff39a51 iommu: remove the put_resv_regions method
All drivers that implement get_resv_regions just use
generic_put_resv_regions to implement the put side.  Remove the
indirections and document the allocations constraints.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220708080616.238833-4-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:13:45 +02:00
Alexander Lobakin
b0b0b77ea6 iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
KASAN reports:

[ 4.668325][ T0] BUG: KASAN: wild-memory-access in dmar_parse_one_rhsa (arch/x86/include/asm/bitops.h:214 arch/x86/include/asm/bitops.h:226 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/nodemask.h:415 drivers/iommu/intel/dmar.c:497)
[    4.676149][    T0] Read of size 8 at addr 1fffffff85115558 by task swapper/0/0
[    4.683454][    T0]
[    4.685638][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc3-00004-g0e862838f290 #1
[    4.694331][    T0] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-TLN4F, BIOS 1.1 03/02/2016
[    4.703196][    T0] Call Trace:
[    4.706334][    T0]  <TASK>
[ 4.709133][ T0] ? dmar_parse_one_rhsa (arch/x86/include/asm/bitops.h:214 arch/x86/include/asm/bitops.h:226 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/nodemask.h:415 drivers/iommu/intel/dmar.c:497)

after converting the type of the first argument (@nr, bit number)
of arch_test_bit() from `long` to `unsigned long`[0].

Under certain conditions (for example, when ACPI NUMA is disabled
via command line), pxm_to_node() can return %NUMA_NO_NODE (-1).
It is valid 'magic' number of NUMA node, but not valid bit number
to use in bitops.
node_online() eventually descends to test_bit() without checking
for the input, assuming it's on caller side (which might be good
for perf-critical tasks). There, -1 becomes %ULONG_MAX which leads
to an insane array index when calculating bit position in memory.

For now, add an explicit check for @node being not %NUMA_NO_NODE
before calling test_bit(). The actual logics didn't change here
at all.

[0] 0e862838f2

Fixes: ee34b32d8c ("dmar: support for parsing Remapping Hardware Static Affinity structure")
Cc: stable@vger.kernel.org # 2.6.33+
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-07-13 09:10:32 -07:00
Lu Baolu
4140d77a02 iommu/vt-d: Fix RID2PASID setup/teardown failure
The IOMMU driver shares the pasid table for PCI alias devices. When the
RID2PASID entry of the shared pasid table has been filled by the first
device, the subsequent device will encounter the "DMAR: Setup RID2PASID
failed" failure as the pasid entry has already been marked as present.
As the result, the IOMMU probing process will be aborted.

On the contrary, when any alias device is hot-removed from the system,
for example, by writing to /sys/bus/pci/devices/.../remove, the shared
RID2PASID will be cleared without any notifications to other devices.
As the result, any DMAs from those rest devices are blocked.

Sharing pasid table among PCI alias devices could save two memory pages
for devices underneath the PCIe-to-PCI bridges. Anyway, considering that
those devices are rare on modern platforms that support VT-d in scalable
mode and the saved memory is negligible, it's reasonable to remove this
part of immature code to make the driver feasible and stable.

Fixes: ef848b7e5a ("iommu/vt-d: Setup pasid entry for RID2PASID support")
Reported-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reported-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220623065720.727849-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220625133430.2200315-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-06 12:59:21 +02:00
Yian Chen
316f92a705 iommu/vt-d: Fix PCI bus rescan device hot add
Notifier calling chain uses priority to determine the execution
order of the notifiers or listeners registered to the chain.
PCI bus device hot add utilizes the notification mechanism.

The current code sets low priority (INT_MIN) to Intel
dmar_pci_bus_notifier and postpones DMAR decoding after adding
new device into IOMMU. The result is that struct device pointer
cannot be found in DRHD search for the new device's DMAR/IOMMU.
Subsequently, the device is put under the "catch-all" IOMMU
instead of the correct one. This could cause system hang when
device TLB invalidation is sent to the wrong IOMMU. Invalidation
timeout error and hard lockup have been observed and data
inconsistency/crush may occur as well.

This patch fixes the issue by setting a positive priority(1) for
dmar_pci_bus_notifier while the priority of IOMMU bus notifier
uses the default value(0), therefore DMAR decoding will be in
advance of DRHD search for a new device to find the correct IOMMU.

Following is a 2-step example that triggers the bug by simulating
PCI device hot add behavior in Intel Sapphire Rapids server.

echo 1 > /sys/bus/pci/devices/0000:6a:01.0/remove
echo 1 > /sys/bus/pci/rescan

Fixes: 59ce0515cd ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
Cc: stable@vger.kernel.org # v3.15+
Reported-by: Zhang, Bernice <bernice.zhang@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Yian Chen <yian.chen@intel.com>
Link: https://lore.kernel.org/r/20220521002115.1624069-1-yian.chen@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-06 12:44:07 +02:00
Linus Torvalds
e1cbc3b96a IOMMU Updates for Linux v5.19
Including:
 
 	- Intel VT-d driver updates
 	  - Domain force snooping improvement.
 	  - Cleanups, no intentional functional changes.
 
 	- ARM SMMU driver updates
 	  - Add new Qualcomm device-tree compatible strings
 	  - Add new Nvidia device-tree compatible string for Tegra234
 	  - Fix UAF in SMMUv3 shared virtual addressing code
 	  - Force identity-mapped domains for users of ye olde SMMU
 	    legacy binding
 	  - Minor cleanups
 
 	- Patches to fix a BUG_ON in the vfio_iommu_group_notifier
 	  - Groundwork for upcoming iommufd framework
 	  - Introduction of DMA ownership so that an entire IOMMU group
 	    is either controlled by the kernel or by user-space
 
 	- MT8195 and MT8186 support in the Mediatek IOMMU driver
 
 	- Patches to make forcing of cache-coherent DMA more coherent
 	  between IOMMU drivers
 
 	- Fixes for thunderbolt device DMA protection
 
 	- Various smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmKWCbUACgkQK/BELZcB
 GuPHmRAAuoH9iK/jrC3SgrqpBfH2iRN7ovIX8dFvgbQWX27lhXF4gvj2/nYdIvPK
 75j/LmdibuzV3Iez4kjbGKNG1AikwK3dKIH21a84f3ctnoamQyL6nMfCVBFaVD/D
 kvPpTHyjbGPNf6KZyWQdkJ5DXD1aoG1DKkBnslH5pTNPqGuNqbcnRTg0YxiJFLBv
 5w2B6jL06XRzunh+Sp1Dbj+po8ROjLRCEU+tdrndO8W/Dyp6+ZNNuxL9/3BM9zMj
 py0M4piFtGnhmJSdym1eeHm7r1YRjkZw+MN+e8NcrcSihmDutEWo7nRRxA5uVaa+
 3O2DNERqCvQUYxfNRUOKwzV8v51GYQHEPhvOe/MLgaEQDmDmlF2dHNGm93eCMdrv
 m1cT011oU7pa4qHomwLyTJxSsR7FzJ37igq/WeY++MBhl+frqfzEQPVxF+W7GLb8
 QvT/+woCPzLVpJbE7s0FUD4nbPd8c1dAz4+HO1DajxILIOTq1bnPIorSjgXODRjq
 yzsiP1rAg0L0PsL7pXn3cPMzNCE//xtOsRsAGmaVv6wBoMLyWVFCU/wjPEdjrSWA
 nXpAuCL84uxCEl/KLYMsg9UhjT6ko7CuKdsybIG9zNIiUau43uSqgTen0xCpYt0i
 m//O/X3tPyxmoLKRW+XVehGOrBZW+qrQny6hk/Zex+6UJQqVMTA=
 =W0hj
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Intel VT-d driver updates:
     - Domain force snooping improvement.
     - Cleanups, no intentional functional changes.

 - ARM SMMU driver updates:
     - Add new Qualcomm device-tree compatible strings
     - Add new Nvidia device-tree compatible string for Tegra234
     - Fix UAF in SMMUv3 shared virtual addressing code
     - Force identity-mapped domains for users of ye olde SMMU legacy
       binding
     - Minor cleanups

 - Fix a BUG_ON in the vfio_iommu_group_notifier:
     - Groundwork for upcoming iommufd framework
     - Introduction of DMA ownership so that an entire IOMMU group is
       either controlled by the kernel or by user-space

 - MT8195 and MT8186 support in the Mediatek IOMMU driver

 - Make forcing of cache-coherent DMA more coherent between IOMMU
   drivers

 - Fixes for thunderbolt device DMA protection

 - Various smaller fixes and cleanups

* tag 'iommu-updates-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (88 commits)
  iommu/amd: Increase timeout waiting for GA log enablement
  iommu/s390: Tolerate repeat attach_dev calls
  iommu/vt-d: Remove hard coding PGSNP bit in PASID entries
  iommu/vt-d: Remove domain_update_iommu_snooping()
  iommu/vt-d: Check domain force_snooping against attached devices
  iommu/vt-d: Block force-snoop domain attaching if no SC support
  iommu/vt-d: Size Page Request Queue to avoid overflow condition
  iommu/vt-d: Fold dmar_insert_one_dev_info() into its caller
  iommu/vt-d: Change return type of dmar_insert_one_dev_info()
  iommu/vt-d: Remove unneeded validity check on dev
  iommu/dma: Explicitly sort PCI DMA windows
  iommu/dma: Fix iova map result check bug
  iommu/mediatek: Fix NULL pointer dereference when printing dev_name
  iommu: iommu_group_claim_dma_owner() must always assign a domain
  iommu/arm-smmu: Force identity domains for legacy binding
  iommu/arm-smmu: Support Tegra234 SMMU
  dt-bindings: arm-smmu: Add compatible for Tegra234 SOC
  dt-bindings: arm-smmu: Document nvidia,memory-controller property
  iommu/arm-smmu-qcom: Add SC8280XP support
  dt-bindings: arm-smmu: Add compatible for Qualcomm SC8280XP
  ...
2022-05-31 09:56:54 -07:00
Linus Torvalds
3f306ea2e1 dma-mapping updates for Linux 5.19
- don't over-decrypt memory (Robin Murphy)
  - takes min align mask into account for the swiotlb max mapping size
    (Tianyu Lan)
  - use GFP_ATOMIC in dma-debug (Mikulas Patocka)
  - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)
  - don't fail on highmem CMA pages in dma_direct_alloc_pages (me)
  - cleanup swiotlb initialization and share more code with swiotlb-xen
    (me, Stefano Stabellini)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmKObTQLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYObmA//dIcDB/q4iFGD+WJh4MhM+asx0ZsdF2OJz42WEhgT
 Z9duOrgcneEQundCamqJP9rNTs980LHDA8uWQC5rZEc9vxuRVOdS7bSgYRUwWh6B
 r0ZjOsvQCn+ChoZML8uyk4rfmEINq+EvJuec3G5fgecZOhPuJS2i2uzzv5cHwqgP
 ChC0fwyZlkfdECXgvZXbEoCJLfTgGNlziN6Ai8dirSoqgEQUoCsY89/M7OiEBvV2
 R4XUWD7OvQERfB4t6xLuUHyzf9PAuWB+OiblRVNeAmK3lMjxVrc3k4kIowgklnzD
 8hfmphAa9Zou3zdfi6Gd4fiQRHRVOwKVp1rtqUmJ+lPSiwyMzu64z9ld2+2qac0h
 V4sSr/yJkhxnBT4/0MkTChvhnRobisackpUzNRpiM4ck7cNVb7eAvkISsbH+pWI9
 aEexPhbyskjlV+GOyM4QL4ygG0dpXY0HSyoh6uaSVsaXMycnWIsJCPidXxV1HGV0
 q2/RLHuHwYxia8cYCF01/DQvwOKSjwbU0zModxtRezGD5GYh2C0a+SrA1aX+qiTu
 yGJCs2UHtSQstAt78tTVp499YeDeL/oGSQkPAu8zyRkSczzF+CncGTuXyoJbAWyK
 otcgERWljgZ4scxjfu1uacfoVhKQ7nOu7hiJokL0U80FESAennLC3ZlocvB9h/ff
 HNA=
 =n2rk
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - don't over-decrypt memory (Robin Murphy)

 - takes min align mask into account for the swiotlb max mapping size
   (Tianyu Lan)

 - use GFP_ATOMIC in dma-debug (Mikulas Patocka)

 - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)

 - don't fail on highmem CMA pages in dma_direct_alloc_pages (me)

 - cleanup swiotlb initialization and share more code with swiotlb-xen
   (me, Stefano Stabellini)

* tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits)
  dma-direct: don't over-decrypt memory
  swiotlb: max mapping size takes min align mask into account
  swiotlb: use the right nslabs-derived sizes in swiotlb_init_late
  swiotlb: use the right nslabs value in swiotlb_init_remap
  swiotlb: don't panic when the swiotlb buffer can't be allocated
  dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC
  dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages
  swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm
  x86: remove cruft from <asm/dma-mapping.h>
  swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl
  swiotlb: merge swiotlb-xen initialization into swiotlb
  swiotlb: provide swiotlb_init variants that remap the buffer
  swiotlb: pass a gfp_mask argument to swiotlb_init_late
  swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction
  swiotlb: make the swiotlb_init interface more useful
  x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled
  x86: remove the IOMMU table infrastructure
  MIPS/octeon: use swiotlb_init instead of open coding it
  arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region
  swiotlb: rename swiotlb_late_init_with_default_size
  ...
2022-05-25 19:18:36 -07:00
Linus Torvalds
2518f226c6 drm for 5.19-rc1
dma-buf:
 - add dma_resv_replace_fences
 - add dma_resv_get_singleton
 - make dma_excl_fence private
 
 core:
 - EDID parser refactorings
 - switch drivers to drm_mode_copy/duplicate
 - DRM managed mutex initialization
 
 display-helper:
 - put HDMI, SCDC, HDCP, DSC and DP into new module
 
 gem:
 - rework fence handling
 
 ttm:
 - rework bulk move handling
 - add common debugfs for resource managers
 - convert to kvcalloc
 
 format helpers:
 - support monochrome formats
 - RGB888, RGB565 to XRGB8888 conversions
 
 fbdev:
 - cfb/sys_imageblit fixes
 - pagelist corruption fix
 - create offb platform device
 - deferred io improvements
 
 sysfb:
 - Kconfig rework
 - support for VESA mode selection
 
 bridge:
 - conversions to devm_drm_of_get_bridge
 - conversions to panel_bridge
 - analogix_dp - autosuspend support
 - it66121 - audio support
 - tc358767 - DSI to DPI support
 - icn6211 - PLL/I2C fixes, DT property
 - adv7611 - enable DRM_BRIDGE_OP_HPD
 - anx7625 - fill ELD if no monitor
 - dw_hdmi - add audio support
 - lontium LT9211 support, i.MXMP LDB
 - it6505: Kconfig fix, DPCD set power fix
 - adv7511 - CEC support for ADV7535
 
 panel:
 - ltk035c5444t, B133UAN01, NV3052C panel support
 - DataImage FG040346DSSWBG04 support
 - st7735r - DT bindings fix
 - ssd130x - fixes
 
 i915:
 - DG2 laptop PCI-IDs ("motherboard down")
 - Initial RPL-P PCI IDs
 - compute engine ABI
 - DG2 Tile4 support
 - DG2 CCS clear color compression support
 - DG2 render/media compression formats support
 - ATS-M platform info
 - RPL-S PCI IDs added
 - Bump ADL-P DMC version to v2.16
 - Support static DRRS
 - Support multiple eDP/LVDS native mode refresh rates
 - DP HDR support for HSW+
 - Lots of display refactoring + fixes
 - GuC hwconfig support and query
 - sysfs support for multi-tile
 - fdinfo per-client gpu utilisation
 - add geometry subslices query
 - fix prime mmap with LMEM
 - fix vm open count and remove vma refcounts
 - contiguous allocation fixes
 - steered register write support
 - small PCI BAR enablement
 - GuC error capture support
 - sunset igpu legacy mmap support for newer devices
 - GuC version 70.1.1 support
 
 amdgpu:
 - Initial SoC21 support
 - SMU 13.x enablement
 - SMU 13.0.4 support
 - ttm_eu cleanups
 - USB-C, GPUVM updates
 - TMZ fixes for RV
 - RAS support for VCN
 - PM sysfs code cleanup
 - DC FP rework
 - extend CG/PG flags to 64-bit
 - SI dpm lockdep fix
 - runtime PM fixes
 
 amdkfd:
 - RAS/SVM fixes
 - TLB flush fixes
 - CRIU GWS support
 - ignore bogus MEC signals more efficiently
 
 msm:
 - Fourcc modifier for tiled but not compressed layouts
 - Support for userspace allocated IOVA (GPU virtual address)
 - DPU: DSC (Display Stream Compression) support
 - DP: eDP support
 - DP: conversion to use drm_bridge and drm_bridge_connector
 - Merge DPU1 and MDP5 MDSS driver
 - DPU: writeback support
 
 nouveau:
 - make some structures static
 - make some variables static
 - switch to drm_gem_plane_helper_prepare_fb
 
 radeon:
 - misc fixes/cleanups
 
 mxsfb:
 - rework crtc mode setting
 - LCDIF CRC support
 
 etnaviv:
 - fencing improvements
 - fix address space collisions
 - cleanup MMU reference handling
 
 gma500:
 - GEM/GTT improvements
 - connector handling fixes
 
 komeda:
 - switch to plane reset helper
 
 mediatek:
 - MIPI DSI improvements
 
 omapdrm:
 - GEM improvements
 
 qxl:
 - aarch64 support
 
 vc4:
 - add a CL submission tracepoint
 - HDMI YUV support
 - HDMI/clock improvements
 - drop is_hdmi caching
 
 virtio:
 - remove restriction of non-zero blob types
 
 vmwgfx:
 - support for cursormob and cursorbypass 4
 - fence improvements
 
 tidss:
 - reset DISPC on startup
 
 solomon:
 - SPI support
 - DT improvements
 
 sun4i:
 - allwinner D1 support
 - drop is_hdmi caching
 
 imx:
 - use swap() instead of open-coding
 - use devm_platform_ioremap_resource
 - remove redunant initializations
 
 ast:
 - Displayport support
 
 rockchip:
 - Refactor IOMMU initialisation
 - make some structures static
 - replace drm_detect_hdmi_monitor with drm_display_info.is_hdmi
 - support swapped YUV formats,
 - clock improvements
 - rk3568 support
 - VOP2 support
 
 mediatek:
 - MT8186 support
 
 tegra:
 - debugabillity improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmKNxkAACgkQDHTzWXnE
 hr4hghAAqSXeMEw1w34miyM28hcOpXqkDfT1VVooqFxBT8MBqamzpZvCH94qsZwm
 3DRXlhQ4pk8wzUcWJpGprdNakxNQPpFVs2UuxYxOyrxYpdkbOwqsEcM3d8VXD9Cy
 E36z+dr85A8Te/J0Yg/FLoZMHulTlidqEZeOz6SMaNUohtrmH/oPWR+cPIy4/Zpp
 yysfbBSKTwblJFDf4+nIpks/VvJYAUO3i6KClT/Rh79Yg6de582AU0YaNcEArTi6
 JqdiYIoYLx609Ecy5NVme6wR/ai46afFLMYt3ZIP4OfHRINk+YL01BYMo2JE2M8l
 xjOH0Iwb7evzWqLK/ESwqp3P7nyppmLlfbZOFHWUfNJsjq2H3ePaAGhzOlYx1c70
 XENzY4IvpYYdR0pJuh1gw1cNZfM9JDAynGJ5jvsATLGBGQbpFsy3w/PMZT17q8an
 DpBwqQmShUdCJ2m+6zznC3VsxJpbvWKNE1I93NxAWZXmFYxoHCzRihahUxKcNDrQ
 ZLH7RSlk9SE/ZtNSLkU15YnKtoW+ThFIssUpVio6U/fZot1+efZkmkXplSuFvj6R
 i7s14hMWQjSJzpJg1DXfhDMycEOujNiQppCG2EaDlVxvUtCqYBd3EHOI7KQON//+
 iVtmEEnWh5rcCM+WsxLGf3Y7sVP3vfo1LOCxshb1XVfDmeMksoI=
 =BYQA
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Intel have enabled DG2 on certain SKUs for laptops, AMD has started
  some new GPU support, msm has user allocated VA controls

  dma-buf:
   - add dma_resv_replace_fences
   - add dma_resv_get_singleton
   - make dma_excl_fence private

  core:
   - EDID parser refactorings
   - switch drivers to drm_mode_copy/duplicate
   - DRM managed mutex initialization

  display-helper:
   - put HDMI, SCDC, HDCP, DSC and DP into new module

  gem:
   - rework fence handling

  ttm:
   - rework bulk move handling
   - add common debugfs for resource managers
   - convert to kvcalloc

  format helpers:
   - support monochrome formats
   - RGB888, RGB565 to XRGB8888 conversions

  fbdev:
   - cfb/sys_imageblit fixes
   - pagelist corruption fix
   - create offb platform device
   - deferred io improvements

  sysfb:
   - Kconfig rework
   - support for VESA mode selection

  bridge:
   - conversions to devm_drm_of_get_bridge
   - conversions to panel_bridge
   - analogix_dp - autosuspend support
   - it66121 - audio support
   - tc358767 - DSI to DPI support
   - icn6211 - PLL/I2C fixes, DT property
   - adv7611 - enable DRM_BRIDGE_OP_HPD
   - anx7625 - fill ELD if no monitor
   - dw_hdmi - add audio support
   - lontium LT9211 support, i.MXMP LDB
   - it6505: Kconfig fix, DPCD set power fix
   - adv7511 - CEC support for ADV7535

  panel:
   - ltk035c5444t, B133UAN01, NV3052C panel support
   - DataImage FG040346DSSWBG04 support
   - st7735r - DT bindings fix
   - ssd130x - fixes

  i915:
   - DG2 laptop PCI-IDs ("motherboard down")
   - Initial RPL-P PCI IDs
   - compute engine ABI
   - DG2 Tile4 support
   - DG2 CCS clear color compression support
   - DG2 render/media compression formats support
   - ATS-M platform info
   - RPL-S PCI IDs added
   - Bump ADL-P DMC version to v2.16
   - Support static DRRS
   - Support multiple eDP/LVDS native mode refresh rates
   - DP HDR support for HSW+
   - Lots of display refactoring + fixes
   - GuC hwconfig support and query
   - sysfs support for multi-tile
   - fdinfo per-client gpu utilisation
   - add geometry subslices query
   - fix prime mmap with LMEM
   - fix vm open count and remove vma refcounts
   - contiguous allocation fixes
   - steered register write support
   - small PCI BAR enablement
   - GuC error capture support
   - sunset igpu legacy mmap support for newer devices
   - GuC version 70.1.1 support

  amdgpu:
   - Initial SoC21 support
   - SMU 13.x enablement
   - SMU 13.0.4 support
   - ttm_eu cleanups
   - USB-C, GPUVM updates
   - TMZ fixes for RV
   - RAS support for VCN
   - PM sysfs code cleanup
   - DC FP rework
   - extend CG/PG flags to 64-bit
   - SI dpm lockdep fix
   - runtime PM fixes

  amdkfd:
   - RAS/SVM fixes
   - TLB flush fixes
   - CRIU GWS support
   - ignore bogus MEC signals more efficiently

  msm:
   - Fourcc modifier for tiled but not compressed layouts
   - Support for userspace allocated IOVA (GPU virtual address)
   - DPU: DSC (Display Stream Compression) support
   - DP: eDP support
   - DP: conversion to use drm_bridge and drm_bridge_connector
   - Merge DPU1 and MDP5 MDSS driver
   - DPU: writeback support

  nouveau:
   - make some structures static
   - make some variables static
   - switch to drm_gem_plane_helper_prepare_fb

  radeon:
   - misc fixes/cleanups

  mxsfb:
   - rework crtc mode setting
   - LCDIF CRC support

  etnaviv:
   - fencing improvements
   - fix address space collisions
   - cleanup MMU reference handling

  gma500:
   - GEM/GTT improvements
   - connector handling fixes

  komeda:
   - switch to plane reset helper

  mediatek:
   - MIPI DSI improvements

  omapdrm:
   - GEM improvements

  qxl:
   - aarch64 support

  vc4:
   - add a CL submission tracepoint
   - HDMI YUV support
   - HDMI/clock improvements
   - drop is_hdmi caching

  virtio:
   - remove restriction of non-zero blob types

  vmwgfx:
   - support for cursormob and cursorbypass 4
   - fence improvements

  tidss:
   - reset DISPC on startup

  solomon:
   - SPI support
   - DT improvements

  sun4i:
   - allwinner D1 support
   - drop is_hdmi caching

  imx:
   - use swap() instead of open-coding
   - use devm_platform_ioremap_resource
   - remove redunant initializations

  ast:
   - Displayport support

  rockchip:
   - Refactor IOMMU initialisation
   - make some structures static
   - replace drm_detect_hdmi_monitor with drm_display_info.is_hdmi
   - support swapped YUV formats,
   - clock improvements
   - rk3568 support
   - VOP2 support

  mediatek:
   - MT8186 support

  tegra:
   - debugabillity improvements"

* tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm: (1740 commits)
  drm/i915/dsi: fix VBT send packet port selection for ICL+
  drm/i915/uc: Fix undefined behavior due to shift overflowing the constant
  drm/i915/reg: fix undefined behavior due to shift overflowing the constant
  drm/i915/gt: Fix use of static in macro mismatch
  drm/i915/audio: fix audio code enable/disable pipe logging
  drm/i915: Fix CFI violation with show_dynamic_id()
  drm/i915: Fix 'mixing different enum types' warnings in intel_display_power.c
  drm/i915/gt: Fix build error without CONFIG_PM
  drm/msm/dpu: handle pm_runtime_get_sync() errors in bind path
  drm/msm/dpu: add DRM_MODE_ROTATE_180 back to supported rotations
  drm/msm: don't free the IRQ if it was not requested
  drm/msm/dpu: limit writeback modes according to max_linewidth
  drm/amd: Don't reset dGPUs if the system is going to s2idle
  drm/amdgpu: Unmap legacy queue when MES is enabled
  drm: msm: fix possible memory leak in mdp5_crtc_cursor_set()
  drm/msm: Fix fb plane offset calculation
  drm/msm/a6xx: Fix refcount leak in a6xx_gpu_init
  drm/msm/dsi: don't powerup at modeset time for parade-ps8640
  drm/rockchip: Change register space names in vop2
  dt-bindings: display: rockchip: make reg-names mandatory for VOP2
  ...
2022-05-25 16:18:27 -07:00
Joerg Roedel
b0dacee202 Merge branches 'apple/dart', 'arm/mediatek', 'arm/msm', 'arm/smmu', 'ppc/pamu', 'x86/vt-d', 'x86/amd' and 'vfio-notifier-fix' into next 2022-05-20 12:27:17 +02:00
Lu Baolu
0d647b33e7 iommu/vt-d: Remove hard coding PGSNP bit in PASID entries
As enforce_cache_coherency has been introduced into the iommu_domain_ops,
the kernel component which owns the iommu domain is able to opt-in its
requirement for force snooping support. The iommu driver has no need to
hard code the page snoop control bit in the PASID table entries anymore.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220508123525.1973626-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220510023407.2759143-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-13 15:14:56 +02:00
Lu Baolu
e80552267b iommu/vt-d: Remove domain_update_iommu_snooping()
The IOMMU force snooping capability is not required to be consistent
among all the IOMMUs anymore. Remove force snooping capability check
in the IOMMU hot-add path and domain_update_iommu_snooping() becomes
a dead code now.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220508123525.1973626-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220510023407.2759143-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-13 15:14:56 +02:00
Lu Baolu
fc0051cb95 iommu/vt-d: Check domain force_snooping against attached devices
As domain->force_snooping only impacts the devices attached with the
domain, there's no need to check against all IOMMU units. On the other
hand, force_snooping could be set on a domain no matter whether it has
been attached or not, and once set it is an immutable flag. If no
device attached, the operation always succeeds. Then this empty domain
can be only attached to a device of which the IOMMU supports snoop
control.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220508123525.1973626-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220510023407.2759143-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-13 15:14:56 +02:00
Lu Baolu
9d6ab26a75 iommu/vt-d: Block force-snoop domain attaching if no SC support
In the attach_dev callback of the default domain ops, if the domain has
been set force_snooping, but the iommu hardware of the device does not
support SC(Snoop Control) capability, the callback should block it and
return a corresponding error code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220508123525.1973626-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220510023407.2759143-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-13 15:14:56 +02:00
Lu Baolu
bac4e778d6 iommu/vt-d: Fold dmar_insert_one_dev_info() into its caller
Fold dmar_insert_one_dev_info() into domain_add_dev_info() which is its
only caller.

No intentional functional impact.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220416120423.879552-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220510023407.2759143-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-13 15:14:56 +02:00
Lu Baolu
e19c3992b9 iommu/vt-d: Change return type of dmar_insert_one_dev_info()
The dmar_insert_one_dev_info() returns the pass-in domain on success and
NULL on failure. This doesn't make much sense. Change it to an integer.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220416120423.879552-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220510023407.2759143-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-13 15:14:56 +02:00
Muhammad Usama Anjum
cd901e9284 iommu/vt-d: Remove unneeded validity check on dev
dev_iommu_priv_get() is being used at the top of this function which
dereferences dev. Dev cannot be NULL after this. Remove the validity
check on dev and simplify the code.

Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20220313150337.593650-1-usama.anjum@collabora.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220510023407.2759143-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-13 15:14:56 +02:00
Jason Gunthorpe
f78dc1dad8 iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for IOMMU_CACHE
While the comment was correct that this flag was intended to convey the
block no-snoop support in the IOMMU, it has become widely implemented and
used to mean the IOMMU supports IOMMU_CACHE as a map flag. Only the Intel
driver was different.

Now that the Intel driver is using enforce_cache_coherency() update the
comment to make it clear that IOMMU_CAP_CACHE_COHERENCY is only about
IOMMU_CACHE.  Fix the Intel driver to return true since IOMMU_CACHE always
works.

The two places that test this flag, usnic and vdpa, are both assigning
userspace pages to a driver controlled iommu_domain and require
IOMMU_CACHE behavior as they offer no way for userspace to synchronize
caches.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 17:24:57 +02:00
Jason Gunthorpe
71cfafda9c vfio: Move the Intel no-snoop control off of IOMMU_CACHE
IOMMU_CACHE means "normal DMA to this iommu_domain's IOVA should be cache
coherent" and is used by the DMA API. The definition allows for special
non-coherent DMA to exist - ie processing of the no-snoop flag in PCIe
TLPs - so long as this behavior is opt-in by the device driver.

The flag is mainly used by the DMA API to synchronize the IOMMU setting
with the expected cache behavior of the DMA master. eg based on
dev_is_dma_coherent() in some case.

For Intel IOMMU IOMMU_CACHE was redefined to mean 'force all DMA to be
cache coherent' which has the practical effect of causing the IOMMU to
ignore the no-snoop bit in a PCIe TLP.

x86 platforms are always IOMMU_CACHE, so Intel should ignore this flag.

Instead use the new domain op enforce_cache_coherency() which causes every
IOPTE created in the domain to have the no-snoop blocking behavior.

Reconfigure VFIO to always use IOMMU_CACHE and call
enforce_cache_coherency() to operate the special Intel behavior.

Remove the IOMMU_CACHE test from Intel IOMMU.

Ultimately VFIO plumbs the result of enforce_cache_coherency() back into
the x86 platform code through kvm_arch_register_noncoherent_dma() which
controls if the WBINVD instruction is available in the guest.  No other
archs implement kvm_arch_register_noncoherent_dma() nor are there any
other known consumers of VFIO_DMA_CC_IOMMU that might be affected by the
user visible result change on non-x86 archs.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 17:24:57 +02:00
Jason Gunthorpe
6043257b1d iommu: Introduce the domain op enforce_cache_coherency()
This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and
IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.

Currently only Intel and AMD IOMMUs are known to support this
feature. They both implement it as an IOPTE bit, that when set, will cause
PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though
the no-snoop bit was clear.

The new API is triggered by calling enforce_cache_coherency() before
mapping any IOVA to the domain which globally switches on no-snoop
blocking. This allows other implementations that might block no-snoop
globally and outside the IOPTE - AMD also documents such a HW capability.

Leave AMD out of sync with Intel and have it block no-snoop even for
in-kernel users. This can be trivially resolved in a follow up patch.

Only VFIO needs to call this API because it does not have detailed control
over the device to avoid requesting no-snoop behavior at the device
level. Other places using domains with real kernel drivers should simply
avoid asking their devices to set the no-snoop bit.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 17:24:57 +02:00
Lu Baolu
da8669ff41 iommu/vt-d: Drop stop marker messages
The page fault handling framework in the IOMMU core explicitly states
that it doesn't handle PCI PASID Stop Marker and the IOMMU drivers must
discard them before reporting faults. This handles Stop Marker messages
in prq_event_thread() before reporting events to the core.

The VT-d driver explicitly drains the pending page requests when a CPU
page table (represented by a mm struct) is unbound from a PASID according
to the procedures defined in the VT-d spec. The Stop Marker messages do
not need a response. Hence, it is safe to drop the Stop Marker messages
silently if any of them is found in the page request queue.

Fixes: d5b9e4bfe0 ("iommu/vt-d: Report prq to io-pgfault framework")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220421113558.3504874-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220423082330.3897867-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 11:01:53 +02:00
David Stevens
59bf3557cf iommu/vt-d: Calculate mask for non-aligned flushes
Calculate the appropriate mask for non-size-aligned page selective
invalidation. Since psi uses the mask value to mask out the lower order
bits of the target address, properly flushing the iotlb requires using a
mask value such that [pfn, pfn+pages) all lie within the flushed
size-aligned region.  This is not normally an issue because iova.c
always allocates iovas that are aligned to their size. However, iovas
which come from other sources (e.g. userspace via VFIO) may not be
aligned.

To properly flush the IOTLB, both the start and end pfns need to be
equal after applying the mask. That means that the most efficient mask
to use is the index of the lowest bit that is equal where all higher
bits are also equal. For example, if pfn=0x17f and pages=3, then
end_pfn=0x181, so the smallest mask we can use is 8. Any differences
above the highest bit of pages are due to carrying, so by xnor'ing pfn
and end_pfn and then masking out the lower order bits based on pages, we
get 0xffffff00, where the first set bit is the mask we want to use.

Fixes: 6fe1010d6d ("vfio/type1: DMA unmap chunking")
Cc: stable@vger.kernel.org
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220401022430.1262215-1-stevensd@google.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220410013533.3959168-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 11:01:06 +02:00
Robin Murphy
d0be55fbeb iommu: Add capability for pre-boot DMA protection
VT-d's dmar_platform_optin() actually represents a combination of
properties fairly well standardised by Microsoft as "Pre-boot DMA
Protection" and "Kernel DMA Protection"[1]. As such, we can provide
interested consumers with an abstracted capability rather than
driver-specific interfaces that won't scale. We name it for the former
aspect since that's what external callers are most likely to be
interested in; the latter is for the IOMMU layer to handle itself.

[1] https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/d6218dff2702472da80db6aec2c9589010684551.1650878781.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 10:30:25 +02:00
Christoph Hellwig
78013eaadf x86: remove the IOMMU table infrastructure
The IOMMU table tries to separate the different IOMMUs into different
backends, but actually requires various cross calls.

Rewrite the code to do the generic swiotlb/swiotlb-xen setup directly
in pci-dma.c and then just call into the IOMMU drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18 07:21:10 +02:00
Jani Nikula
83970cd63b Merge drm/drm-next into drm-intel-next
Sync up with v5.18-rc1, in particular to get 5e3094cfd9
("drm/i915/xehpsdv: Add has_flat_ccs to device info").

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2022-04-11 16:01:56 +03:00
Linus Torvalds
34af78c4e6 IOMMU Updates for Linux v5.18
Including:
 
 	- IOMMU Core changes:
 	  - Removal of aux domain related code as it is basically dead
 	    and will be replaced by iommu-fd framework
 	  - Split of iommu_ops to carry domain-specific call-backs
 	    separatly
 	  - Cleanup to remove useless ops->capable implementations
 	  - Improve 32-bit free space estimate in iova allocator
 
 	- Intel VT-d updates:
 	  - Various cleanups of the driver
 	  - Support for ATS of SoC-integrated devices listed in
 	    ACPI/SATC table
 
 	- ARM SMMU updates:
 	  - Fix SMMUv3 soft lockup during continuous stream of events
 	  - Fix error path for Qualcomm SMMU probe()
 	  - Rework SMMU IRQ setup to prepare the ground for PMU support
 	  - Minor cleanups and refactoring
 
 	- AMD IOMMU driver:
 	  - Some minor cleanups and error-handling fixes
 
 	- Rockchip IOMMU driver:
 	  - Use standard driver registration
 
 	- MSM IOMMU driver:
 	  - Minor cleanup and change to standard driver registration
 
 	- Mediatek IOMMU driver:
 	  - Fixes for IOTLB flushing logic
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmI8OUkACgkQK/BELZcB
 GuNz9xAAvlgEg3byMx1y6LY9IctVGyLsegweVGM4+m6XR7qvT5Llc1E2Yw4Gooe4
 EAceOihDKW2T9VnMlz9g/cG7Modrx60chcB22KKfxDXPl6yF3R89EMd7DE43T6n/
 KPrP9+EsBnI8QSXyYu9ZowioX4CYwWhWD0dKHKAwDvw0BWHHUJ4hTaoHqEoIqLdP
 vubeHziIok/g1sylSpJjTzV7r/Na8Q3TGcb/Mi5qC8uiyiyx40vtaduMGNW+/ToN
 EqOKszxPmHfHv/xf0IHo0eUZ2L/JAe0mAlZzOb09f5F2sXJrbC05jlmRaDmSjT+u
 iEc1r2By/0xo6iOuQC3wD6kTvwwO/ecpNYGhXYXdTbtLquYfL5PSXjRHEU9gf2BO
 i/llPDsnytPvm/hnmbi26ChNR6yrDPz5bkoCUl5mnB1jZcaZtIURN7cRlEPPZUWo
 62VDNdqWDB6AvALc1/SwYdJX/i5eaBf+niS7/BJ/IkLp2oJxFzrGsU8SRJFHNYsa
 zdFIUUoTw647Ul6derSpGzHow169/RwVKYPiXMsaA8/viPNjpBOtfg56abn1WLW6
 N4CtwNu6tt+sPfftFdFx2cDEMW2zpWg5doMddBfEx9FAk0HJ4WLZiTpaO2PxcLyd
 kCAsGHj+ViAZHINVKFV4nQN/V9yQtcIc4UPmSGJBtKCK3KUYujw=
 =bcqr
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - IOMMU Core changes:
      - Removal of aux domain related code as it is basically dead and
        will be replaced by iommu-fd framework
      - Split of iommu_ops to carry domain-specific call-backs separatly
      - Cleanup to remove useless ops->capable implementations
      - Improve 32-bit free space estimate in iova allocator

 - Intel VT-d updates:
      - Various cleanups of the driver
      - Support for ATS of SoC-integrated devices listed in ACPI/SATC
        table

 - ARM SMMU updates:
      - Fix SMMUv3 soft lockup during continuous stream of events
      - Fix error path for Qualcomm SMMU probe()
      - Rework SMMU IRQ setup to prepare the ground for PMU support
      - Minor cleanups and refactoring

 - AMD IOMMU driver:
      - Some minor cleanups and error-handling fixes

 - Rockchip IOMMU driver:
      - Use standard driver registration

 - MSM IOMMU driver:
      - Minor cleanup and change to standard driver registration

 - Mediatek IOMMU driver:
      - Fixes for IOTLB flushing logic

* tag 'iommu-updates-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (47 commits)
  iommu/amd: Improve amd_iommu_v2_exit()
  iommu/amd: Remove unused struct fault.devid
  iommu/amd: Clean up function declarations
  iommu/amd: Call memunmap in error path
  iommu/arm-smmu: Account for PMU interrupts
  iommu/vt-d: Enable ATS for the devices in SATC table
  iommu/vt-d: Remove unused function intel_svm_capable()
  iommu/vt-d: Add missing "__init" for rmrr_sanity_check()
  iommu/vt-d: Move intel_iommu_ops to header file
  iommu/vt-d: Fix indentation of goto labels
  iommu/vt-d: Remove unnecessary prototypes
  iommu/vt-d: Remove unnecessary includes
  iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO
  iommu/vt-d: Remove domain and devinfo mempool
  iommu/vt-d: Remove iova_cache_get/put()
  iommu/vt-d: Remove finding domain in dmar_insert_one_dev_info()
  iommu/vt-d: Remove intel_iommu::domains
  iommu/mediatek: Always tlb_flush_all when each PM resume
  iommu/mediatek: Add tlb_lock in tlb_flush_all
  iommu/mediatek: Remove the power status checking in tlb flush all
  ...
2022-03-24 19:48:57 -07:00
Linus Torvalds
3fd33273a4 Reenable ENQCMD/PASID support:
- Simplify the PASID handling to allocate the PASID once, associate it to
    the mm of a process and free it on mm_exit(). The previous attempt of
    refcounted PASIDs and dynamic alloc()/free() turned out to be error
    prone and too complex. The PASID space is 20bits, so the case of
    resource exhaustion is a pure academic concern.
 
  - Populate the PASID MSR on demand via #GP to avoid racy updates via IPIs.
 
  - Reenable ENQCMD and let objtool check for the forbidden usage of ENQCMD
    in the kernel.
 
  - Update the documentation for Shared Virtual Addressing accordingly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmI4WpETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUfnD/0bY94rgEX4Uuy/mFQ1W8X8XlcyKrha
 0/cRATb+4QV/pwJgGr2nClKhGlFMYPdJLvKMC1TCUPCVrLD1RNmluIZoFzeqXwhm
 jDdCcFOuGZ2D4ujDPWwOOpKBT1ytovnQa7+lH6QJyKkEqdcC2ncOvGJQoiRxRQIG
 8wTVs/OUvQJ5ZhSZQMKQN4uMWMyHEjhbroYS30/uNi/598jTPgzlEoa14XocQ9Os
 nS6ALvjuc9MsJ34F61etMaJU1ZMI3Wx75u9QjEvX6hmJs87YdvgwE7lzJUKFDEuh
 gewM0wp2fTa8/azzP0eMiHTin56PqFdmllzRqXmilbZMEPOeI29dZVArCdpKcAn0
 r9p1kJUT3Xl2G3Oir/OdCaaQHcznD1Y5ZFOyh12wgEucZ/rdeSr7nq7n5HoOL5Bw
 Q2o6YvTkE9DOL0nTN1lSXGiPspou7fzX0uUcRBrbJUS3sBv4zGIlaJXUaTVnSdAt
 VZj4LeOK7v2BjyeiOY0iaaIQd3xjmLUF0UjozXS5M13SoVcToZRbyWqhDzPvNuKA
 imQb/dnFpXhABgmuqAiJLeqM0VtGMFNc780OURkcsBSPng+iSEdV4DzuhK0jpU8x
 Uk1RuGMd/vgmrlDFBrw+orQQiiKR1ixpI0LiHfcOBycfJhqTwcnrNZvAN5/do28Z
 E23+QzlUbZF0cw==
 =Dy8V
 -----END PGP SIGNATURE-----

Merge tag 'x86-pasid-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 PASID support from Thomas Gleixner:
 "Reenable ENQCMD/PASID support:

   - Simplify the PASID handling to allocate the PASID once, associate
     it to the mm of a process and free it on mm_exit().

     The previous attempt of refcounted PASIDs and dynamic
     alloc()/free() turned out to be error prone and too complex. The
     PASID space is 20bits, so the case of resource exhaustion is a pure
     academic concern.

   - Populate the PASID MSR on demand via #GP to avoid racy updates via
     IPIs.

   - Reenable ENQCMD and let objtool check for the forbidden usage of
     ENQCMD in the kernel.

   - Update the documentation for Shared Virtual Addressing accordingly"

* tag 'x86-pasid-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Update documentation for SVA (Shared Virtual Addressing)
  tools/objtool: Check for use of the ENQCMD instruction in the kernel
  x86/cpufeatures: Re-enable ENQCMD
  x86/traps: Demand-populate PASID MSR via #GP
  sched: Define and initialize a flag to identify valid PASID in the task
  x86/fpu: Clear PASID when copying fpstate
  iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
  kernel/fork: Initialize mm's PASID
  iommu/ioasid: Introduce a helper to check for valid PASIDs
  mm: Change CONFIG option for mm->pasid field
  iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
2022-03-21 12:28:13 -07:00
Joerg Roedel
e17c6debd4 Merge branches 'arm/mediatek', 'arm/msm', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'x86/vt-d' and 'x86/amd' into next 2022-03-08 12:21:31 +01:00
Yian Chen
97f2f2c531 iommu/vt-d: Enable ATS for the devices in SATC table
Starting from Intel VT-d v3.2, Intel platform BIOS can provide additional
SATC table structure. SATC table includes a list of SoC integrated devices
that support ATC (Address translation cache).

Enabling ATC (via ATS capability) can be a functional requirement for SATC
device operation or optional to enhance device performance/functionality.
This is determined by the bit of ATC_REQUIRED in SATC table. When IOMMU is
working in scalable mode, software chooses to always enable ATS for every
device in SATC table because Intel SoC devices in SATC table are trusted to
use ATS.

On the other hand, if IOMMU is in legacy mode, ATS of SATC capable devices
can work transparently to software and be automatically enabled by IOMMU
hardware. As the result, there is no need for software to enable ATS on
these devices.

This also removes dmar_find_matched_atsr_unit() helper as it becomes dead
code now.

Signed-off-by: Yian Chen <yian.chen@intel.com>
Link: https://lore.kernel.org/r/20220222185416.1722611-1-yian.chen@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220301020159.633356-13-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:31 +01:00
YueHaibing
b897a1b7ad iommu/vt-d: Remove unused function intel_svm_capable()
This is unused since commit 4048377414 ("iommu/vt-d: Use
iommu_sva_alloc(free)_pasid() helpers").

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20220216113851.25004-1-yuehaibing@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220301020159.633356-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:31 +01:00
Marco Bonelli
45967ffb9e iommu/vt-d: Add missing "__init" for rmrr_sanity_check()
rmrr_sanity_check() calls arch_rmrr_sanity_check(), which is marked as
"__init". Add the annotation for rmrr_sanity_check() too. This function is
currently only called from dmar_parse_one_rmrr() which is also "__init".

Signed-off-by: Marco Bonelli <marco@mebeim.net>
Link: https://lore.kernel.org/r/20220210023141.9208-1-marco@mebeim.net
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220301020159.633356-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:31 +01:00
Andy Shevchenko
2852631d96 iommu/vt-d: Move intel_iommu_ops to header file
Compiler is not happy about hidden declaration of intel_iommu_ops.

.../iommu.c:414:24: warning: symbol 'intel_iommu_ops' was not declared. Should it be static?

Move declaration to header file to make compiler happy.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220207141240.8253-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220301020159.633356-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:31 +01:00
Lu Baolu
2187a57ef0 iommu/vt-d: Fix indentation of goto labels
Remove blanks before goto labels.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:31 +01:00
Lu Baolu
782861df7d iommu/vt-d: Remove unnecessary prototypes
Some prototypes in iommu.c are unnecessary. Delete them.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:30 +01:00
Lu Baolu
763e656c69 iommu/vt-d: Remove unnecessary includes
Remove unnecessary include files and sort the remaining alphabetically.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:30 +01:00
Lu Baolu
586081d3f6 iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO
Allocate and set the per-device iommu private data during iommu device
probe. This makes the per-device iommu private data always available
during iommu_probe_device() and iommu_release_device(). With this changed,
the dummy DEFER_DEVICE_DOMAIN_INFO pointer could be removed. The wrappers
for getting the private data and domain are also cleaned.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:30 +01:00
Lu Baolu
ee2653bbe8 iommu/vt-d: Remove domain and devinfo mempool
The domain and devinfo memory blocks are only allocated during device
probe and released during remove. There's no hot-path context, hence
no need for memory pools.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:30 +01:00
Lu Baolu
c8850a6e6d iommu/vt-d: Remove iova_cache_get/put()
These have been done in drivers/iommu/dma-iommu.c. Remove this duplicate
code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:30 +01:00
Lu Baolu
c5d27545fb iommu/vt-d: Remove finding domain in dmar_insert_one_dev_info()
The Intel IOMMU driver has already converted to use default domain
framework in iommu core. There's no need to find a domain for the
device in the domain attaching path. Cleanup that code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:30 +01:00
Lu Baolu
402e6688a7 iommu/vt-d: Remove intel_iommu::domains
The "domains" field of the intel_iommu structure keeps the mapping of
domain_id to dmar_domain. This information is not used anywhere. Remove
and cleanup it to avoid unnecessary memory consumption.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-03-04 16:46:30 +01:00
Joerg Roedel
067e95fc34 Merge branch 'core' into x86/vt-d 2022-03-04 16:46:19 +01:00
Tejas Upadhyay
0a967f5bfd iommu/vt-d: Add RPLS to quirk list to skip TE disabling
The VT-d spec requires (10.4.4 Global Command Register, TE
field) that:

Hardware implementations supporting DMA draining must drain
any in-flight DMA read/write requests queued within the
Root-Complex before completing the translation enable
command and reflecting the status of the command through
the TES field in the Global Status register.

Unfortunately, some integrated graphic devices fail to do
so after some kind of power state transition. As the
result, the system might stuck in iommu_disable_translati
on(), waiting for the completion of TE transition.

This adds RPLS to a quirk list for those devices and skips
TE disabling if the qurik hits.

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4898
Tested-by: Raviteja Goud Talla <ravitejax.goud.talla@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302043256.191529-1-tejaskumarx.surendrakumar.upadhyay@intel.com
2022-03-02 17:06:30 -05:00
Adrian Huang
b00833768e iommu/vt-d: Fix double list_add when enabling VMD in scalable mode
When enabling VMD and IOMMU scalable mode, the following kernel panic
call trace/kernel log is shown in Eagle Stream platform (Sapphire Rapids
CPU) during booting:

pci 0000:59:00.5: Adding to iommu group 42
...
vmd 0000:59:00.5: PCI host bridge to bus 10000:80
pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:01.0: enabling Extended Tags
pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:03.0: enabling Extended Tags
pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
Workqueue: events work_for_cpu_fn
RIP: 0010:__list_add_valid.cold+0x26/0x3f
Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f
      0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1
      fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9
      9e e8 8b b1 fe
RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
FS:  0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 intel_pasid_alloc_table+0x9c/0x1d0
 dmar_insert_one_dev_info+0x423/0x540
 ? device_to_iommu+0x12d/0x2f0
 intel_iommu_attach_device+0x116/0x290
 __iommu_attach_device+0x1a/0x90
 iommu_group_add_device+0x190/0x2c0
 __iommu_probe_device+0x13e/0x250
 iommu_probe_device+0x24/0x150
 iommu_bus_notifier+0x69/0x90
 blocking_notifier_call_chain+0x5a/0x80
 device_add+0x3db/0x7b0
 ? arch_memremap_can_ram_remap+0x19/0x50
 ? memremap+0x75/0x140
 pci_device_add+0x193/0x1d0
 pci_scan_single_device+0xb9/0xf0
 pci_scan_slot+0x4c/0x110
 pci_scan_child_bus_extend+0x3a/0x290
 vmd_enable_domain.constprop.0+0x63e/0x820
 vmd_probe+0x163/0x190
 local_pci_probe+0x42/0x80
 work_for_cpu_fn+0x13/0x20
 process_one_work+0x1e2/0x3b0
 worker_thread+0x1c4/0x3a0
 ? rescuer_thread+0x370/0x370
 kthread+0xc7/0xf0
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
...
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---

The following 'lspci' output shows devices '10000:80:*' are subdevices of
the VMD device 0000:59:00.5:

  $ lspci
  ...
  0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
  ...
  10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
  10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
  10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
  10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
  10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
  10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]

The symptom 'list_add double add' is caused by the following failure
message:

  pci 10000:80:01.0: DMAR: Setup RID2PASID failed
  pci 10000:80:01.0: Failed to add to iommu group 42: -16
  pci 10000:80:03.0: [8086:352b] type 01 class 0x060400

Device 10000:80:01.0 is the subdevice of the VMD device 0000:59:00.5,
so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD
device 0000:59:00.5. Here is call path:

  intel_pasid_alloc_table
    pci_for_each_dma_alias
     get_alias_pasid_table
       search_pasid_table

pci_real_dma_dev() in pci_for_each_dma_alias() gets the real dma device
which is the VMD device 0000:59:00.5. However, pte of the VMD device
0000:59:00.5 has been configured during this message "pci 0000:59:00.5:
Adding to iommu group 42". So, the status -EBUSY is returned when
configuring pasid entry for device 10000:80:01.0.

It then invokes dmar_remove_one_dev_info() to release
'struct device_domain_info *' from iommu_devinfo_cache. But, the pasid
table is not released because of the following statement in
__dmar_remove_one_dev_info():

	if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
		...
		intel_pasid_free_table(info->dev);
        }

The subsequent dmar_insert_one_dev_info() operation of device
10000:80:03.0 allocates 'struct device_domain_info *' from
iommu_devinfo_cache. The allocated address is the same address that
is released previously for device 10000:80:01.0. Finally, invoking
device_attach_pasid_table() causes the issue.

`git bisect` points to the offending commit 474dd1c650 ("iommu/vt-d:
Fix clearing real DMA device's scalable-mode context entries"), which
releases the pasid table if the device is not the subdevice by
checking the returned status of dev_is_real_dma_subdevice().
Reverting the offending commit can work around the issue.

The solution is to prevent from allocating pasid table if those
devices are subdevices of the VMD device.

Fixes: 474dd1c650 ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/20220216091307.703-1-adrianhuang0701@gmail.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220221053348.262724-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-02-28 13:48:44 +01:00
Lu Baolu
9a630a4b41 iommu: Split struct iommu_ops
Move the domain specific operations out of struct iommu_ops into a new
structure that only has domain specific operations. This solves the
problem of needing to know if the method vector for a given operation
needs to be retrieved from the device or the domain. Logically the domain
ops are the ones that make sense for external subsystems and endpoint
drivers to use, while device ops, with the sole exception of domain_alloc,
are IOMMU API internals.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-02-28 13:25:49 +01:00
Lu Baolu
41bb23e70b iommu: Remove unused argument in is_attach_deferred
The is_attach_deferred iommu_ops callback is a device op. The domain
argument is unnecessary and never used. Remove it to make code clean.

Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-02-28 13:25:49 +01:00
Lu Baolu
241469685d iommu/vt-d: Remove aux-domain related callbacks
The aux-domain related callbacks are not called in the tree. Remove them
to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-02-28 13:25:48 +01:00
Lu Baolu
989192ac6a iommu/vt-d: Remove guest pasid related callbacks
The guest pasid related callbacks are not called in the tree. Remove them
to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-02-28 13:25:48 +01:00
Fenghua Yu
701fac4038 iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
PASIDs are process-wide. It was attempted to use refcounted PASIDs to
free them when the last thread drops the refcount. This turned out to
be complex and error prone. Given the fact that the PASID space is 20
bits, which allows up to 1M processes to have a PASID associated
concurrently, PASID resource exhaustion is not a realistic concern.

Therefore, it was decided to simplify the approach and stick with lazy
on demand PASID allocation, but drop the eager free approach and make an
allocated PASID's lifetime bound to the lifetime of the process.

Get rid of the refcounting mechanisms and replace/rename the interfaces
to reflect this new approach.

  [ bp: Massage commit message. ]

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20220207230254.3342514-6-fenghua.yu@intel.com
2022-02-15 11:31:35 +01:00
Fenghua Yu
7ba564722d iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
This CONFIG option originally only referred to the Shared
Virtual Address (SVA) library. But it is now also used for
non-library portions of code.

Drop the "_LIB" suffix so that there is just one configuration
option for all code relating to SVA.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220207230254.3342514-2-fenghua.yu@intel.com
2022-02-14 19:17:46 +01:00
Rafael J. Wysocki
f266c11bce iommu/vtd: Replace acpi_bus_get_device()
Replace acpi_bus_get_device() that is going to be dropped with
acpi_fetch_acpi_dev().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1807113.tdWV9SEqCh@kreacher
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-02-14 12:59:45 +01:00
Guoqing Jiang
99e675d473 iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
After commit e3beca48a4 ("irqdomain/treewide: Keep firmware node
unconditionally allocated"). For tear down scenario, fn is only freed
after fail to allocate ir_domain, though it also should be freed in case
dmar_enable_qi returns error.

Besides free fn, irq_domain and ir_msi_domain need to be removed as well
if intel_setup_irq_remapping fails to enable queued invalidation.

Improve the rewinding path by add out_free_ir_domain and out_free_fwnode
lables per Baolu's suggestion.

Fixes: e3beca48a4 ("irqdomain/treewide: Keep firmware node unconditionally allocated")
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20220119063640.16864-1-guoqing.jiang@linux.dev
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220128031002.2219155-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-01-31 16:53:09 +01:00
Joerg Roedel
66dc1b791c Merge branches 'arm/smmu', 'virtio', 'x86/amd', 'x86/vt-d' and 'core' into next 2022-01-04 10:33:45 +01:00
Matthew Wilcox (Oracle)
87f60cc65d iommu/vt-d: Use put_pages_list
page->freelist is for the use of slab.  We already have the ability
to free a list of pages in the core mm, but it requires the use of a
list_head and for the pages to be chained together through page->lru.
Switch the Intel IOMMU and IOVA code over to using free_pages_list().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[rm: split from original patch, cosmetic tweaks, fix fq entries]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/2115b560d9a0ce7cd4b948bd51a2b7bde8fdfd59.1639753638.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-12-20 09:03:05 +01:00
Maíra Canal
c95a9c278d iommu/vt-d: Remove unused dma_to_mm_pfn function
Remove dma_to_buf_pfn function, which is not used in the codebase.

This was pointed by clang with the following warning:

'dma_to_mm_pfn' [-Wunused-function]
static inline unsigned long dma_to_mm_pfn(unsigned long dma_pfn)
                            ^
https://lore.kernel.org/r/YYhY7GqlrcTZlzuA@fedora

drivers/iommu/intel/iommu.c:136:29: warning: unused function
Signed-off-by: Maíra Canal <maira.canal@usp.br>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211217083817.1745419-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-12-17 09:40:29 +01:00
Kefeng Wang
f5209f9127 iommu/vt-d: Drop duplicate check in dma_pte_free_pagetable()
The BUG_ON check exists in dma_pte_clear_range(), kill the duplicate
check.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20211025032307.182974-1-wangkefeng.wang@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211217083817.1745419-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-12-17 09:40:29 +01:00
Christophe JAILLET
bb71257396 iommu/vt-d: Use bitmap_zalloc() when applicable
'iommu->domain_ids' is a bitmap. So use 'bitmap_zalloc()' to simplify code
and improve the semantic.

Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/cb7a3e0a8d522447a06298a4f244c3df072f948b.1635018498.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211217083817.1745419-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-12-17 09:40:29 +01:00
Kees Cook
4599d78a82 iommu/vt-d: Use correctly sized arguments for bit field
The find.h APIs are designed to be used only on unsigned long arguments.
This can technically result in a over-read, but it is harmless in this
case. Regardless, fix it to avoid the warning seen under -Warray-bounds,
which we'd like to enable globally:

In file included from ./include/linux/bitmap.h:9,
                 from drivers/iommu/intel/iommu.c:17:
drivers/iommu/intel/iommu.c: In function 'domain_context_mapping_one':
./include/linux/find.h:119:37: warning: array subscript 'long unsigned int[0]' is partly outside array bounds of 'int[1]' [-Warray-bounds]
  119 |                 unsigned long val = *addr & GENMASK(size - 1, 0);
      |                                     ^~~~~
drivers/iommu/intel/iommu.c:2115:18: note: while referencing 'max_pde'
 2115 |         int pds, max_pde;
      |                  ^~~~~~~

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211215232432.2069605-1-keescook@chromium.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-12-17 09:05:04 +01:00
Alex Williamson
86dc40c7ea iommu/vt-d: Fix unmap_pages support
When supporting only the .map and .unmap callbacks of iommu_ops,
the IOMMU driver can make assumptions about the size and alignment
used for mappings based on the driver provided pgsize_bitmap.  VT-d
previously used essentially PAGE_MASK for this bitmap as any power
of two mapping was acceptably filled by native page sizes.

However, with the .map_pages and .unmap_pages interface we're now
getting page-size and count arguments.  If we simply combine these
as (page-size * count) and make use of the previous map/unmap
functions internally, any size and alignment assumptions are very
different.

As an example, a given vfio device assignment VM will often create
a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
does not support IOMMU super pages, the unmap_pages interface will
ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
will recurse down to level 2 of the page table where the first half
of the pfn range exactly matches the entire pte level.  We clear the
pte, increment the pfn by the level size, but (oops) the next pte is
on a new page, so we exit the loop an pop back up a level.  When we
then update the pfn based on that higher level, we seem to assume
that the previous pfn value was at the start of the level.  In this
case the level size is 256K pfns, which we add to the base pfn and
get a results of 0x7fe00, which is clearly greater than 0x401ff,
so we're done.  Meanwhile we never cleared the ptes for the remainder
of the range.  When the VM remaps this range, we're overwriting valid
ptes and the VT-d driver complains loudly, as reported by the user
report linked below.

The fix for this seems relatively simple, if each iteration of the
loop in dma_pte_clear_level() is assumed to clear to the end of the
level pte page, then our next pfn should be calculated from level_pfn
rather than our working pfn.

Fixes: 3f34f12597 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omen
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-11-26 22:54:47 +01:00
Christophe JAILLET
4e5973dd27 iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock()
If we return -EOPNOTSUPP, the rcu lock remains lock. This is spurious.
Go through the end of the function instead. This way, the missing
'rcu_read_unlock()' is called.

Fixes: 7afd7f6aa2 ("iommu/vt-d: Check FL and SL capability sanity in scalable mode")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/40cc077ca5f543614eab2a10e84d29dd190273f6.1636217517.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-11-26 22:54:47 +01:00
Longpeng(Mike)
9906b9352a iommu/vt-d: Avoid duplicate removing in __domain_mapping()
The __domain_mapping() always removes the pages in the range from
'iov_pfn' to 'end_pfn', but the 'end_pfn' is always the last pfn
of the range that the caller wants to map.

This would introduce too many duplicated removing and leads the
map operation take too long, for example:

  Map iova=0x100000,nr_pages=0x7d61800
    iov_pfn: 0x100000, end_pfn: 0x7e617ff
    iov_pfn: 0x140000, end_pfn: 0x7e617ff
    iov_pfn: 0x180000, end_pfn: 0x7e617ff
    iov_pfn: 0x1c0000, end_pfn: 0x7e617ff
    iov_pfn: 0x200000, end_pfn: 0x7e617ff
    ...
  it takes about 50ms in total.

We can reduce the cost by recalculate the 'end_pfn' and limit it
to the boundary of the end of this pte page.

  Map iova=0x100000,nr_pages=0x7d61800
    iov_pfn: 0x100000, end_pfn: 0x13ffff
    iov_pfn: 0x140000, end_pfn: 0x17ffff
    iov_pfn: 0x180000, end_pfn: 0x1bffff
    iov_pfn: 0x1c0000, end_pfn: 0x1fffff
    iov_pfn: 0x200000, end_pfn: 0x23ffff
    ...
  it only need 9ms now.

This also removes a meaningless BUG_ON() in __domain_mapping().

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Tested-by: Liujunjie <liujunjie23@huawei.com>
Link: https://lore.kernel.org/r/20211008000433.1115-1-longpeng2@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18 12:31:48 +02:00
Fenghua Yu
00ecd54013 iommu/vt-d: Clean up unused PASID updating functions
update_pasid() and its call chain are currently unused in the tree because
Thomas disabled the ENQCMD feature. The feature will be re-enabled shortly
using a different approach and update_pasid() and its call chain will not
be used in the new approach.

Remove the useless functions.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210920192349.2602141-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18 12:31:48 +02:00
Lu Baolu
94f797ad61 iommu/vt-d: Delete dev_has_feat callback
The commit 262948f8ba ("iommu: Delete iommu_dev_has_feature()") has
deleted the iommu_dev_has_feature() interface. Remove the dev_has_feat
callback to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210929072030.1330225-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20211014053839.727419-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18 12:31:48 +02:00
Lu Baolu
032c5ee40e iommu/vt-d: Use second level for GPA->HPA translation
The IOMMU VT-d implementation uses the first level for GPA->HPA translation
by default. Although both the first level and the second level could handle
the DMA translation, they're different in some way. For example, the second
level translation has separate controls for the Access/Dirty page tracking.
With the first level translation, there's no such control. On the other
hand, the second level translation has the page-level control for forcing
snoop, but the first level only has global control with pasid granularity.

This uses the second level for GPA->HPA translation so that we can provide
a consistent hardware interface for use cases like dirty page tracking for
live migration.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20211014053839.727419-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18 12:31:48 +02:00
Lu Baolu
7afd7f6aa2 iommu/vt-d: Check FL and SL capability sanity in scalable mode
An iommu domain could be allocated and mapped before it's attached to any
device. This requires that in scalable mode, when the domain is allocated,
the format (FL or SL) of the page table must be determined. In order to
achieve this, the platform should support consistent SL or FL capabilities
on all IOMMU's. This adds a check for this and aborts IOMMU probing if it
doesn't meet this requirement.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20211014053839.727419-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18 12:31:48 +02:00
Lu Baolu
b34380a6d7 iommu/vt-d: Remove duplicate identity domain flag
The iommu_domain data structure already has the "type" field to keep the
type of a domain. It's unnecessary to have the DOMAIN_FLAG_STATIC_IDENTITY
flag in the vt-d implementation. This cleans it up with no functionality
change.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20211014053839.727419-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18 12:31:48 +02:00
Kyung Min Park
914ff7719e iommu/vt-d: Dump DMAR translation structure when DMA fault occurs
When the dmar translation fault happens, the kernel prints a single line
fault reason with corresponding hexadecimal code defined in the Intel VT-d
specification.

Currently, when user wants to debug the translation fault in detail,
debugfs is used for dumping the dmar_translation_struct, which is not
available when the kernel failed to boot.

Dump the DMAR translation structure, pagewalk the IO page table and print
the page table entry when the fault happens.

This takes effect only when CONFIG_DMAR_DEBUG is enabled.

Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Link: https://lore.kernel.org/r/20210815203845.31287-1-kyung.min.park@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18 12:31:48 +02:00
Tvrtko Ursulin
5240aed2cd iommu/vt-d: Do not falsely log intel_iommu is unsupported kernel option
Handling of intel_iommu kernel command line option should return "true" to
indicate option is valid and so avoid logging it as unknown by the core
parsing code.

Also log unknown sub-options at the notice level to let user know of
potential typos or similar.

Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://lore.kernel.org/r/20210831112947.310080-1-tvrtko.ursulin@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-10-18 12:31:48 +02:00
Bjorn Helgaas
0b482d0c75 iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses
719a193356 ("iommu/vt-d: Tweak the description of a DMA fault") changed
the DMA fault reason from hex to decimal.  It also added "0x" prefixes to
the PCI bus/device, e.g.,

  - DMAR: [INTR-REMAP] Request device [00:00.5]
  + DMAR: [INTR-REMAP] Request device [0x00:0x00.5]

These no longer match dev_printk() and other similar messages in
dmar_match_pci_path() and dmar_acpi_insert_dev_scope().

Drop the "0x" prefixes from the bus and device addresses.

Fixes: 719a193356 ("iommu/vt-d: Tweak the description of a DMA fault")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20210903193711.483999-1-helgaas@kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210922054726.499110-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-09-28 11:41:11 +02:00
Fenghua Yu
6ef0505158 iommu/vt-d: Fix a deadlock in intel_svm_drain_prq()
pasid_mutex and dev->iommu->param->lock are held while unbinding mm is
flushing IO page fault workqueue and waiting for all page fault works to
finish. But an in-flight page fault work also need to hold the two locks
while unbinding mm are holding them and waiting for the work to finish.
This may cause an ABBA deadlock issue as shown below:

	idxd 0000:00:0a.0: unbind PASID 2
	======================================================
	WARNING: possible circular locking dependency detected
	5.14.0-rc7+ #549 Not tainted [  186.615245] ----------
	dsa_test/898 is trying to acquire lock:
	ffff888100d854e8 (&param->lock){+.+.}-{3:3}, at:
	iopf_queue_flush_dev+0x29/0x60
	but task is already holding lock:
	ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
	intel_svm_unbind+0x34/0x1e0
	which lock already depends on the new lock.

	the existing dependency chain (in reverse order) is:

	-> #2 (pasid_mutex){+.+.}-{3:3}:
	       __mutex_lock+0x75/0x730
	       mutex_lock_nested+0x1b/0x20
	       intel_svm_page_response+0x8e/0x260
	       iommu_page_response+0x122/0x200
	       iopf_handle_group+0x1c2/0x240
	       process_one_work+0x2a5/0x5a0
	       worker_thread+0x55/0x400
	       kthread+0x13b/0x160
	       ret_from_fork+0x22/0x30

	-> #1 (&param->fault_param->lock){+.+.}-{3:3}:
	       __mutex_lock+0x75/0x730
	       mutex_lock_nested+0x1b/0x20
	       iommu_report_device_fault+0xc2/0x170
	       prq_event_thread+0x28a/0x580
	       irq_thread_fn+0x28/0x60
	       irq_thread+0xcf/0x180
	       kthread+0x13b/0x160
	       ret_from_fork+0x22/0x30

	-> #0 (&param->lock){+.+.}-{3:3}:
	       __lock_acquire+0x1134/0x1d60
	       lock_acquire+0xc6/0x2e0
	       __mutex_lock+0x75/0x730
	       mutex_lock_nested+0x1b/0x20
	       iopf_queue_flush_dev+0x29/0x60
	       intel_svm_drain_prq+0x127/0x210
	       intel_svm_unbind+0xc5/0x1e0
	       iommu_sva_unbind_device+0x62/0x80
	       idxd_cdev_release+0x15a/0x200 [idxd]
	       __fput+0x9c/0x250
	       ____fput+0xe/0x10
	       task_work_run+0x64/0xa0
	       exit_to_user_mode_prepare+0x227/0x230
	       syscall_exit_to_user_mode+0x2c/0x60
	       do_syscall_64+0x48/0x90
	       entry_SYSCALL_64_after_hwframe+0x44/0xae

	other info that might help us debug this:

	Chain exists of:
	  &param->lock --> &param->fault_param->lock --> pasid_mutex

	 Possible unsafe locking scenario:

	       CPU0                    CPU1
	       ----                    ----
	  lock(pasid_mutex);
				       lock(&param->fault_param->lock);
				       lock(pasid_mutex);
	  lock(&param->lock);

	 *** DEADLOCK ***

	2 locks held by dsa_test/898:
	 #0: ffff888100cc1cc0 (&group->mutex){+.+.}-{3:3}, at:
	 iommu_sva_unbind_device+0x53/0x80
	 #1: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
	 intel_svm_unbind+0x34/0x1e0

	stack backtrace:
	CPU: 2 PID: 898 Comm: dsa_test Not tainted 5.14.0-rc7+ #549
	Hardware name: Intel Corporation Kabylake Client platform/KBL S
	DDR4 UD IMM CRB, BIOS KBLSE2R1.R00.X050.P01.1608011715 08/01/2016
	Call Trace:
	 dump_stack_lvl+0x5b/0x74
	 dump_stack+0x10/0x12
	 print_circular_bug.cold+0x13d/0x142
	 check_noncircular+0xf1/0x110
	 __lock_acquire+0x1134/0x1d60
	 lock_acquire+0xc6/0x2e0
	 ? iopf_queue_flush_dev+0x29/0x60
	 ? pci_mmcfg_read+0xde/0x240
	 __mutex_lock+0x75/0x730
	 ? iopf_queue_flush_dev+0x29/0x60
	 ? pci_mmcfg_read+0xfd/0x240
	 ? iopf_queue_flush_dev+0x29/0x60
	 mutex_lock_nested+0x1b/0x20
	 iopf_queue_flush_dev+0x29/0x60
	 intel_svm_drain_prq+0x127/0x210
	 ? intel_pasid_tear_down_entry+0x22e/0x240
	 intel_svm_unbind+0xc5/0x1e0
	 iommu_sva_unbind_device+0x62/0x80
	 idxd_cdev_release+0x15a/0x200

pasid_mutex protects pasid and svm data mapping data. It's unnecessary
to hold pasid_mutex while flushing the workqueue. To fix the deadlock
issue, unlock pasid_pasid during flushing the workqueue to allow the works
to be handled.

Fixes: d5b9e4bfe0 ("iommu/vt-d: Report prq to io-pgfault framework")
Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210828070622.2437559-3-baolu.lu@linux.intel.com
[joro: Removed timing information from kernel log messages]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-09-09 13:18:07 +02:00
Fenghua Yu
a21518cb23 iommu/vt-d: Fix PASID leak in intel_svm_unbind_mm()
The mm->pasid will be used in intel_svm_free_pasid() after load_pasid()
during unbinding mm. Clearing it in load_pasid() will cause PASID cannot
be freed in intel_svm_free_pasid().

Additionally mm->pasid was updated already before load_pasid() during pasid
allocation. No need to update it again in load_pasid() during binding mm.
Don't update mm->pasid to avoid the issues in both binding mm and unbinding
mm.

Fixes: 4048377414 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210828070622.2437559-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-09-09 13:18:06 +02:00
Joerg Roedel
d8768d7eb9 Merge branches 'apple/dart', 'arm/smmu', 'iommu/fixes', 'x86/amd', 'x86/vt-d' and 'core' into next 2021-08-20 17:14:35 +02:00
Liu Yi L
423d39d851 iommu/vt-d: Add present bit check in pasid entry setup helpers
The helper functions should not modify the pasid entries which are still
in use. Add a check against present bit.

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20210817042425.1784279-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210818134852.1847070-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19 10:41:08 +02:00
Liu Yi L
8123b0b868 iommu/vt-d: Use pasid_pte_is_present() helper function
Use the pasid_pte_is_present() helper for present bit check in the
intel_pasid_tear_down_entry().

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20210817042425.1784279-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210818134852.1847070-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19 10:41:08 +02:00
Andy Shevchenko
9ddc348214 iommu/vt-d: Drop the kernel doc annotation
Kernel doc validator is unhappy with the following

.../perf.c:16: warning: Function parameter or member 'latency_lock' not described in 'DEFINE_SPINLOCK'
.../perf.c:16: warning: expecting prototype for perf.c(). Prototype was for DEFINE_SPINLOCK() instead

Drop kernel doc annotation since the top comment is not in the required format.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210729163538.40101-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210818134852.1847070-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19 10:41:08 +02:00
Lu Baolu
48811c4434 iommu/vt-d: Allow devices to have more than 32 outstanding PRs
The minimum per-IOMMU PRQ queue size is one 4K page, this is more entries
than the hardcoded limit of 32 in the current VT-d code. Some devices can
support up to 512 outstanding PRQs but underutilized by this limit of 32.
Although, 32 gives some rough fairness when multiple devices share the same
IOMMU PRQ queue, but far from optimal for customized use case. This extends
the per-IOMMU PRQ queue size to four 4K pages and let the devices have as
many outstanding page requests as they can.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19 10:41:08 +02:00
Lu Baolu
289b3b005c iommu/vt-d: Preset A/D bits for user space DMA usage
We preset the access and dirty bits for IOVA over first level usage only
for the kernel DMA (i.e., when domain type is IOMMU_DOMAIN_DMA). We should
also preset the FL A/D for user space DMA usage. The idea is that even the
user space A/D bit memory write is unnecessary. We should avoid it to
minimize the overhead.

Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19 10:41:08 +02:00
Lu Baolu
792fb43ce2 iommu/vt-d: Enable Intel IOMMU scalable mode by default
The commit 8950dcd83a ("iommu/vt-d: Leave scalable mode default off")
leaves the scalable mode default off and end users could turn it on with
"intel_iommu=sm_on". Using the Intel IOMMU scalable mode for kernel DMA,
user-level device access and Shared Virtual Address have been enabled.
This enables the scalable mode by default if the hardware advertises the
support and adds kernel options of "intel_iommu=sm_on/sm_off" for end
users to configure it through the kernel parameters.

Suggested-by: Ashok Raj <ashok.raj@intel.com>
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19 10:41:08 +02:00
Lu Baolu
01dac2d9d2 iommu/vt-d: Refactor Kconfig a bit
Put all sub-options inside a "if INTEL_IOMMU" so that they don't need to
always depend on INTEL_IOMMU. Use IS_ENABLED() instead of #ifdef as well.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19 10:41:08 +02:00
Zhen Lei
5e41c99894 iommu/vt-d: Remove unnecessary oom message
Fixes scripts/checkpatch.pl warning:
WARNING: Possible unnecessary 'out of memory' message

Remove it can help us save a bit of memory.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210609124937.14260-1-thunder.leizhen@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210818134852.1847070-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19 10:41:08 +02:00
Lu Baolu
4d99efb229 iommu/vt-d: Update the virtual command related registers
The VT-d spec Revision 3.3 updated the virtual command registers, virtual
command opcode B register, virtual command response register and virtual
command capability register (Section 10.4.43, 10.4.44, 10.4.45, 10.4.46).
This updates the virtual command interface implementation in the Intel
IOMMU driver accordingly.

Fixes: 24f27d32ab ("iommu/vt-d: Enlightened PASID allocation")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Sanjay Kumar <sanjay.k.kumar@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210713042649.3547403-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19 10:41:08 +02:00
Robin Murphy
78ca078459 iommu/vt-d: Prepare for multiple DMA domain types
In preparation for the strict vs. non-strict decision for DMA domains
to be expressed in the domain type, make sure we expose our flush queue
awareness by accepting the new domain type, and test the specific
feature flag where we want to identify DMA domains in general. The DMA
ops reset/setup can simply be made unconditional, since iommu-dma
already knows only to touch DMA domains.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/31a8ef868d593a2f3826a6a120edee81815375a7.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18 13:27:49 +02:00
Robin Murphy
f297e27f83 iommu/vt-d: Drop IOVA cookie management
The core code bakes its own cookies now.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/e9dbe3b6108f8538e17e0c5f59f8feeb714f51a4.1628682048.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18 13:25:31 +02:00
Liu Yi L
8798d36411 iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry()
This fixes improper iotlb invalidation in intel_pasid_tear_down_entry().
When a PASID was used as nested mode, released and reused, the following
error message will appear:

[  180.187556] Unexpected page request in Privilege Mode
[  180.187565] Unexpected page request in Privilege Mode
[  180.279933] Unexpected page request in Privilege Mode
[  180.279937] Unexpected page request in Privilege Mode

Per chapter 6.5.3.3 of VT-d spec 3.3, when tear down a pasid entry, the
software should use Domain selective IOTLB flush if the PGTT of the pasid
entry is SL only or Nested, while for the pasid entries whose PGTT is FL
only or PT using PASID-based IOTLB flush is enough.

Fixes: 2cd1311a26 ("iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr")
Signed-off-by: Kumar Sanjay K <sanjay.k.kumar@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Tested-by: Yi Sun <yi.y.sun@intel.com>
Link: https://lore.kernel.org/r/20210817042425.1784279-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210817124321.1517985-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18 13:15:58 +02:00
Fenghua Yu
62ef907a04 iommu/vt-d: Fix PASID reference leak
A PASID reference is increased whenever a device is bound to an mm (and
its PASID) successfully (i.e. the device's sdev user count is increased).
But the reference is not dropped every time the device is unbound
successfully from the mm (i.e. the device's sdev user count is decreased).
The reference is dropped only once by calling intel_svm_free_pasid() when
there isn't any device bound to the mm. intel_svm_free_pasid() drops the
reference and only frees the PASID on zero reference.

Fix the issue by dropping the PASID reference and freeing the PASID when
no reference on successful unbinding the device by calling
intel_svm_free_pasid() .

Fixes: 4048377414 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210813181345.1870742-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210817124321.1517985-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18 13:15:58 +02:00
Lu Baolu
75cc1018a9 iommu/vt-d: Move clflush'es from iotlb_sync_map() to map_pages()
As the Intel VT-d driver has switched to use the iommu_ops.map_pages()
callback, multiple pages of the same size will be mapped in a call.
There's no need to put the clflush'es in iotlb_sync_map() callback.
Move them back into __domain_mapping() to simplify the code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720020615.4144323-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26 13:56:25 +02:00
Lu Baolu
3f34f12597 iommu/vt-d: Implement map/unmap_pages() iommu_ops callback
Implement the map_pages() and unmap_pages() callback for the Intel IOMMU
driver to allow calls from iommu core to map and unmap multiple pages of
the same size in one call. With map/unmap_pages() implemented, the prior
map/unmap callbacks are deprecated.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720020615.4144323-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26 13:56:25 +02:00
Lu Baolu
a886d5a7e6 iommu/vt-d: Report real pgsize bitmap to iommu core
The pgsize bitmap is used to advertise the page sizes our hardware supports
to the IOMMU core, which will then use this information to split physically
contiguous memory regions it is mapping into page sizes that we support.

Traditionally the IOMMU core just handed us the mappings directly, after
making sure the size is an order of a 4KiB page and that the mapping has
natural alignment. To retain this behavior, we currently advertise that we
support all page sizes that are an order of 4KiB.

We are about to utilize the new IOMMU map/unmap_pages APIs. We could change
this to advertise the real page sizes we support.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720020615.4144323-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26 13:56:25 +02:00
John Garry
308723e358 iommu: Remove mode argument from iommu_set_dma_strict()
We only ever now set strict mode enabled in iommu_set_dma_strict(), so
just remove the argument.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-7-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26 13:27:38 +02:00
Zhen Lei
d0e108b8e9 iommu/vt-d: Add support for IOMMU default DMA mode build options
Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set,
as is current behaviour.

Also delete global flag intel_iommu_strict:
- In intel_iommu_setup(), call iommu_set_dma_strict(true) directly. Also
  remove the print, as iommu_subsys_init() prints the mode and we have
  already marked this param as deprecated.

- For cap_caching_mode() check in intel_iommu_setup(), call
  iommu_set_dma_strict(true) directly; also reword the accompanying print
  with a level downgrade and also add the missing '\n'.

- For Ironlake GPU, again call iommu_set_dma_strict(true) directly and
  keep the accompanying print.

[jpg: Remove intel_iommu_strict]

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-5-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26 13:27:38 +02:00
John Garry
1d479f160c iommu: Deprecate Intel and AMD cmdline methods to enable strict mode
Now that the x86 drivers support iommu.strict, deprecate the custom
methods.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-2-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26 13:27:38 +02:00
Lu Baolu
474dd1c650 iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries
The commit 2b0140c696 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
fixes an issue of "sub-device is removed where the context entry is cleared
for all aliases". But this commit didn't consider the PASID entry and PASID
table in VT-d scalable mode. This fix increases the coverage of scalable
mode.

Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Fixes: 8038bdb855 ("iommu/vt-d: Only clear real DMA device's context entries")
Fixes: 2b0140c696 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Cc: stable@vger.kernel.org # v5.6+
Cc: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210712071712.3416949-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-14 12:58:07 +02:00
Sanjay Kumar
37764b952e iommu/vt-d: Global devTLB flush when present context entry changed
This fixes a bug in context cache clear operation. The code was not
following the correct invalidation flow. A global device TLB invalidation
should be added after the IOTLB invalidation. At the same time, it
uses the domain ID from the context entry. But in scalable mode, the
domain ID is in PASID table entry, not context entry.

Fixes: 7373a8cc38 ("iommu/vt-d: Setup context and enable RID2PASID support")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210712071315.3416543-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-14 12:57:39 +02:00
Joerg Roedel
2b9d8e3e9a Merge branches 'iommu/fixes', 'arm/rockchip', 'arm/smmu', 'x86/vt-d', 'x86/amd', 'virtio' and 'core' into next 2021-06-25 15:23:25 +02:00
Jean-Philippe Brucker
ac6d704679 iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()
Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
virtual platforms, but isn't currently possible. The overflow check in
iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
a limit address instead of a size, so callers don't have to fake a size
to work around the check.

The base and limit parameters are being phased out, because:
* they are redundant for x86 callers. dma-iommu already reserves the
  first page, and the upper limit is already in domain->geometry.
* they can now be obtained from dev->dma_range_map on Arm.
But removing them on Arm isn't completely straightforward so is left for
future work. As an intermediate step, simplify the x86 callers by
passing dummy limits.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20210618152059.1194210-5-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-25 15:02:43 +02:00
Colin Ian King
934ed4580c iommu/vt-d: Fix dereference of pointer info before it is null checked
The assignment of iommu from info->iommu occurs before info is null checked
hence leading to a potential null pointer dereference issue. Fix this by
assigning iommu and checking if iommu is null after null checking info.

Addresses-Coverity: ("Dereference before null check")
Fixes: 4c82b88696 ("iommu/vt-d: Allocate/register iopf queue for sva devices")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210611135024.32781-1-colin.king@canonical.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-18 15:19:50 +02:00
Joerg Roedel
d6a9642bd6 iommu/vt-d: Fix linker error on 32-bit
A recent commit broke the build on 32-bit x86. The linker throws these
messages:

	ld: drivers/iommu/intel/perf.o: in function `dmar_latency_snapshot':
	perf.c:(.text+0x40c): undefined reference to `__udivdi3'
	ld: perf.c:(.text+0x458): undefined reference to `__udivdi3'

The reason are the 64-bit divides in dmar_latency_snapshot(). Use the
div_u64() helper function for those.

Fixes: 55ee5e67a5 ("iommu/vt-d: Add common code for dmar latency performance monitors")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210610083120.29224-1-joro@8bytes.org
2021-06-10 10:33:14 +02:00
Parav Pandit
7a0f06c197 iommu/vt-d: No need to typecast
Page directory assignment by alloc_pgtable_page() or phys_to_virt()
doesn't need typecasting as both routines return void*. Hence, remove
typecasting from both the calls.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-24-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:14 +02:00
Parav Pandit
cee57d4fe7 iommu/vt-d: Remove unnecessary braces
No need for braces for single line statement under if() block.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-22-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:14 +02:00
Parav Pandit
74f6d776ae iommu/vt-d: Removed unused iommu_count in dmar domain
DMAR domain uses per DMAR refcount. It is indexed by iommu seq_id.
Older iommu_count is only incremented and decremented but no decisions
are taken based on this refcount. This is not of much use.

Hence, remove iommu_count and further simplify domain_detach_iommu()
by returning void.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-21-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:14 +02:00
Parav Pandit
1f106ff0ea iommu/vt-d: Use bitfields for DMAR capabilities
IOTLB device presence, iommu coherency and snooping are boolean
capabilities. Use them as bits and keep them adjacent.

Structure layout before the reorg.
$ pahole -C dmar_domain drivers/iommu/intel/dmar.o
struct dmar_domain {
        int                        nid;                  /*     0     4 */
        unsigned int               iommu_refcnt[128];    /*     4   512 */
        /* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
        u16                        iommu_did[128];       /*   516   256 */
        /* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
        bool                       has_iotlb_device;     /*   772     1 */

        /* XXX 3 bytes hole, try to pack */

        struct list_head           devices;              /*   776    16 */
        struct list_head           subdevices;           /*   792    16 */
        struct iova_domain         iovad __attribute__((__aligned__(8)));
							 /*   808  2320 */
        /* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
        struct dma_pte *           pgd;                  /*  3128     8 */
        /* --- cacheline 49 boundary (3136 bytes) --- */
        int                        gaw;                  /*  3136     4 */
        int                        agaw;                 /*  3140     4 */
        int                        flags;                /*  3144     4 */
        int                        iommu_coherency;      /*  3148     4 */
        int                        iommu_snooping;       /*  3152     4 */
        int                        iommu_count;          /*  3156     4 */
        int                        iommu_superpage;      /*  3160     4 */

        /* XXX 4 bytes hole, try to pack */

        u64                        max_addr;             /*  3168     8 */
        u32                        default_pasid;        /*  3176     4 */

        /* XXX 4 bytes hole, try to pack */

        struct iommu_domain        domain;               /*  3184    72 */

        /* size: 3256, cachelines: 51, members: 18 */
        /* sum members: 3245, holes: 3, sum holes: 11 */
        /* forced alignments: 1 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));

After arranging it for natural padding and to make flags as u8 bits, it
saves 8 bytes for the struct.

struct dmar_domain {
        int                        nid;                  /*     0     4 */
        unsigned int               iommu_refcnt[128];    /*     4   512 */
        /* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
        u16                        iommu_did[128];       /*   516   256 */
        /* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
        u8                         has_iotlb_device:1;   /*   772: 0  1 */
        u8                         iommu_coherency:1;    /*   772: 1  1 */
        u8                         iommu_snooping:1;     /*   772: 2  1 */

        /* XXX 5 bits hole, try to pack */
        /* XXX 3 bytes hole, try to pack */

        struct list_head           devices;              /*   776    16 */
        struct list_head           subdevices;           /*   792    16 */
        struct iova_domain         iovad __attribute__((__aligned__(8)));
							 /*   808  2320 */
        /* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
        struct dma_pte *           pgd;                  /*  3128     8 */
        /* --- cacheline 49 boundary (3136 bytes) --- */
        int                        gaw;                  /*  3136     4 */
        int                        agaw;                 /*  3140     4 */
        int                        flags;                /*  3144     4 */
        int                        iommu_count;          /*  3148     4 */
        int                        iommu_superpage;      /*  3152     4 */

        /* XXX 4 bytes hole, try to pack */

        u64                        max_addr;             /*  3160     8 */
        u32                        default_pasid;        /*  3168     4 */

        /* XXX 4 bytes hole, try to pack */

        struct iommu_domain        domain;               /*  3176    72 */

        /* size: 3248, cachelines: 51, members: 18 */
        /* sum members: 3236, holes: 3, sum holes: 11 */
        /* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
        /* forced alignments: 1 */
        /* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-20-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
YueHaibing
3bc770b0e9 iommu/vt-d: Use DEVICE_ATTR_RO macro
Use DEVICE_ATTR_RO() helper instead of plain DEVICE_ATTR(),
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210528130229.22108-1-yuehaibing@huawei.com
Link: https://lore.kernel.org/r/20210610020115.1637656-19-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Gustavo A. R. Silva
606636dcbd iommu/vt-d: Fix out-bounds-warning in intel/svm.c
Replace a couple of calls to memcpy() with simple assignments in order
to fix the following out-of-bounds warning:

drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from
    the object at 'desc' is out of the bounds of referenced subobject
    'qw2' with type 'long long unsigned int' at offset 16 [-Warray-bounds]

The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). This causes a legitimate compiler warning because memcpy()
overruns the length of &desc.qw2 and &resp.qw2, respectively.

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210414201403.GA392764@embeddedor
Link: https://lore.kernel.org/r/20210610020115.1637656-18-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
0f4834ab25 iommu/vt-d: Add PRQ handling latency sampling
The execution time for page fault request handling is performance critical
and needs to be monitored. This adds code to sample the execution time of
page fault request handling.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-17-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
74eb87a0f9 iommu/vt-d: Add cache invalidation latency sampling
Queued invalidation execution time is performance critical and needs
to be monitored. This adds code to sample the execution time of IOTLB/
devTLB/ICE cache invalidation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-16-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
456bb0b97f iommu/vt-d: Expose latency monitor data through debugfs
A debugfs interface /sys/kernel/debug/iommu/intel/dmar_perf_latency is
created to control and show counts of execution time ranges for various
types per DMAR. The interface may help debug any potential performance
issue.

By default, the interface is disabled.

Possible write value of /sys/kernel/debug/iommu/intel/dmar_perf_latency
  0 - disable sampling all latency data
  1 - enable sampling IOTLB invalidation latency data
  2 - enable sampling devTLB invalidation latency data
  3 - enable sampling intr entry cache invalidation latency data
  4 - enable sampling prq handling latency data

Read /sys/kernel/debug/iommu/intel/dmar_perf_latency gives a snapshot
of sampling result of all enabled monitors.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-15-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
55ee5e67a5 iommu/vt-d: Add common code for dmar latency performance monitors
The execution time of some operations is very performance critical, such
as cache invalidation and PRQ processing time. This adds some common code
to monitor the execution time range of those operations. The interfaces
include enabling/disabling, checking status, updating sampling data and
providing a common string format for users.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-14-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
e93a67f5a0 iommu/vt-d: Add prq_report trace event
This adds a new trace event to track the page fault request report.
This event will provide almost all information defined in a page
request descriptor.

A sample output:
| prq_report: dmar0/0000:00:0a.0 seq# 1: rid=0x50 addr=0x559ef6f97 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 2: rid=0x50 addr=0x559ef6f9c rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 3: rid=0x50 addr=0x559ef6f98 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 4: rid=0x50 addr=0x559ef6f9d rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 5: rid=0x50 addr=0x559ef6f99 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 6: rid=0x50 addr=0x559ef6f9e rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 7: rid=0x50 addr=0x559ef6f9a r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 8: rid=0x50 addr=0x559ef6f9f rw--l pasid=0x2 index=0x1

This will be helpful for I/O page fault related debugging.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-13-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
d5b9e4bfe0 iommu/vt-d: Report prq to io-pgfault framework
Let the IO page fault requests get handled through the io-pgfault
framework.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
4c82b88696 iommu/vt-d: Allocate/register iopf queue for sva devices
This allocates and registers the iopf queue infrastructure for devices
which want to support IO page fault for SVA.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
ae7f09b14b iommu/vt-d: Refactor prq_event_thread()
Refactor prq_event_thread() by moving handling single prq event out of
the main loop.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
9e52cc0fed iommu/vt-d: Use common helper to lookup svm devices
It's common to iterate the svm device list and find a matched device. Add
common helpers to do this and consolidate the code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:13 +02:00
Lu Baolu
4048377414 iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers
Align the pasid alloc/free code with the generic helpers defined in the
iommu core. This also refactored the SVA binding code to improve the
readability.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:12 +02:00
Lu Baolu
100b8a14a3 iommu/vt-d: Add pasid private data helpers
We are about to use iommu_sva_alloc/free_pasid() helpers in iommu core.
That means the pasid life cycle will be managed by iommu core. Use a
local array to save the per pasid private data instead of attaching it
the real pasid.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:12 +02:00
Lu Baolu
521f546b4e iommu/vt-d: Support asynchronous IOMMU nested capabilities
Current VT-d implementation supports nested translation only if all
underlying IOMMUs support the nested capability. This is unnecessary
as the upper layer is allowed to create different containers and set
them with different type of iommu backend. The IOMMU driver needs to
guarantee that devices attached to a nested mode iommu_domain should
support nested capabilility.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210517065701.5078-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:12 +02:00
Lu Baolu
879fcc6bda iommu/vt-d: Select PCI_ATS explicitly
The Intel VT-d implementation supports device TLB management. Select
PCI_ATS explicitly so that the pci_ats helpers are always available.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210512065313.3441309-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:12 +02:00
Lu Baolu
719a193356 iommu/vt-d: Tweak the description of a DMA fault
The Intel IOMMU driver reports the DMA fault reason in a decimal number
while the VT-d specification uses a hexadecimal one. It's inconvenient
that users need to covert them everytime before consulting the spec.
Let's use hexadecimal number for a DMA fault reason.

The fault message uses 0xffffffff as PASID for DMA requests w/o PASID.
This is confusing. Tweak this by adding "NO_PASID" explicitly.

Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210517065425.4953-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:12 +02:00
Aditya Srivastava
367f82de5a iommu/vt-d: Fix kernel-doc syntax in file header
The opening comment mark '/**' is used for highlighting the beginning of
kernel-doc comments.
The header for drivers/iommu/intel/pasid.c follows this syntax, but
the content inside does not comply with kernel-doc.

This line was probably not meant for kernel-doc parsing, but is parsed
due to the presence of kernel-doc like comment syntax(i.e, '/**'), which
causes unexpected warnings from kernel-doc:
warning: Function parameter or member 'fmt' not described in 'pr_fmt'

Provide a simple fix by replacing this occurrence with general comment
format, i.e. '/*', to prevent kernel-doc from parsing it.

Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210523143245.19040-1-yashsri421@gmail.com
Link: https://lore.kernel.org/r/20210610020115.1637656-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:12 +02:00
Colin Ian King
05d2cbf969 iommu/vt-d: Remove redundant assignment to variable agaw
The variable agaw is initialized with a value that is never read and it
is being updated later with a new value as a counter in a for-loop. The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210416171826.64091-1-colin.king@canonical.com
Link: https://lore.kernel.org/r/20210610020115.1637656-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10 09:06:12 +02:00
Rolf Eike Beer
0ee74d5a48 iommu/vt-d: Fix sysfs leak in alloc_iommu()
iommu_device_sysfs_add() is called before, so is has to be cleaned on subsequent
errors.

Fixes: 39ab9555c2 ("iommu: Add sysfs bindings for struct iommu_device")
Cc: stable@vger.kernel.org # 4.11.x
Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/17411490.HIIP88n32C@mobilepool36.emlix.com
Link: https://lore.kernel.org/r/20210525070802.361755-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-05-27 16:07:08 +02:00
Lu Baolu
54c80d9074 iommu/vt-d: Use user privilege for RID2PASID translation
When first-level page tables are used for IOVA translation, we use user
privilege by setting U/S bit in the page table entry. This is to make it
consistent with the second level translation, where the U/S enforcement
is not available. Clear the SRE (Supervisor Request Enable) field in the
pasid table entry of RID2PASID so that requests requesting the supervisor
privilege are blocked and treated as DMA remapping faults.

Fixes: b802d070a5 ("iommu/vt-d: Use iova over first level")
Suggested-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210512064426.3440915-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210519015027.108468-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-05-19 08:51:02 +02:00
Dan Carpenter
1a590a1c8b iommu/vt-d: Check for allocation failure in aux_detach_device()
In current kernels small allocations never fail, but checking for
allocation failure is the correct thing to do.

Fixes: 18abda7a2d ("iommu/vt-d: Fix general protection fault in aux_detach_device()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/YJuobKuSn81dOPLd@mwanda
Link: https://lore.kernel.org/r/20210519015027.108468-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-05-19 08:51:02 +02:00
Linus Torvalds
57151b502c pci-v5.13-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmCRp48UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwsVRAAsIYueNKzZczpkeQwHigYzf4HLdKm
 yyT2c/Zlj9REAUOe7ApkowVAJWiMGDJP0J361KIluAGvAxnkMP1V6WlVdByorYd0
 CrXc/UhD//cs+3QDo4SmJRHyL8q5QQTDa8Z/8seVJUYTR/t5OhSpMOuEJPhpeQ1s
 nqUk0yWNJRoN6wn6T/7KqgYEvPhARXo9epuWy5MNPZ5f8E7SRi/QG/6hP8/YOLpK
 A+8beIOX5LAvUJaXxEovwv5UQnSUkeZTGDyRietQYE6xXNeHPKCvZ7vDjjSE7NOW
 mIodD6JcG3n/riYV3sMA5PKDZgsPI3P/qJU6Y6vWBBYOaO/kQX/c7CZ+M2bcZay4
 mh1dW0vOqoTy/pAVwQB2aq08Rrg2SAskpNdeyzduXllmuTyuwCMPXzG4RKmbQ8I1
 qMFb8qOyNulRAWcTKgSMKByEQYASQsFA5yShtaba6h0+vqrseuP6hchBKKOEan8F
 9THTI3ZflKwRvGjkI0MDbp0z0+wPYmNhrcZDpAJ3bEltw58E8TL/9aBtuhajmo8+
 wJ64mZclFuMmSyhsfkAXOvjeKXMlEBaw7vinZGbcACmv4ZGI0MV7r4vVYQbQltcy
 myzB6xJxcWB8N07UpKpUbsGMb9JjTUPlaT36eZNvUZQDntrE1ljt8RSq3nphDrcD
 KmBRU8ru74I2RE0=
 =WvTD
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:
   - Release OF node when pci_scan_device() fails (Dmitry Baryshkov)
   - Add pci_disable_parity() (Bjorn Helgaas)
   - Disable Mellanox Tavor parity reporting (Heiner Kallweit)
   - Disable N2100 r8169 parity reporting (Heiner Kallweit)
   - Fix RCiEP device to RCEC association (Qiuxu Zhuo)
   - Convert sysfs "config", "rom", "reset", "label", "index",
     "acpi_index" to static attributes to help fix races in device
     enumeration (Krzysztof Wilczyński)
   - Convert sysfs "vpd" to static attribute (Heiner Kallweit, Krzysztof
     Wilczyński)
   - Use sysfs_emit() in "show" functions (Krzysztof Wilczyński)
   - Remove unused alloc_pci_root_info() return value (Krzysztof
     Wilczyński)

  PCI device hotplug:
   - Fix acpiphp reference count leak (Feilong Lin)

  Power management:
   - Fix acpi_pci_set_power_state() debug message (Rafael J. Wysocki)
   - Fix runtime PM imbalance (Dinghao Liu)

  Virtualization:
   - Increase delay after FLR to work around Intel DC P4510 NVMe erratum
     (Raphael Norwitz)

  MSI:
   - Convert rcar, tegra, xilinx to MSI domains (Marc Zyngier)
   - For rcar, xilinx, use controller address as MSI doorbell (Marc
     Zyngier)
   - Remove unused hv msi_controller struct (Marc Zyngier)
   - Remove unused PCI core msi_controller support (Marc Zyngier)
   - Remove struct msi_controller altogether (Marc Zyngier)
   - Remove unused default_teardown_msi_irqs() (Marc Zyngier)
   - Let host bridges declare their reliance on MSI domains (Marc
     Zyngier)
   - Make pci_host_common_probe() declare its reliance on MSI domains
     (Marc Zyngier)
   - Advertise mediatek lack of built-in MSI handling (Thomas Gleixner)
   - Document ways of ending up with NO_MSI (Marc Zyngier)
   - Refactor HT advertising of NO_MSI flag (Marc Zyngier)

  VPD:
   - Remove obsolete Broadcom NIC VPD length-limiting quirk (Heiner
     Kallweit)
   - Remove sysfs VPD size checking dead code (Heiner Kallweit)
   - Convert VPF sysfs file to static attribute (Heiner Kallweit)
   - Remove unnecessary pci_set_vpd_size() (Heiner Kallweit)
   - Tone down "missing VPD" message (Heiner Kallweit)

  Endpoint framework:
   - Fix NULL pointer dereference when epc_features not implemented
     (Shradha Todi)
   - Add missing destroy_workqueue() in endpoint test (Yang Yingliang)

  Amazon Annapurna Labs PCIe controller driver:
   - Fix compile testing without CONFIG_PCI_ECAM (Arnd Bergmann)
   - Fix "no symbols" warnings when compile testing with
     CONFIG_TRIM_UNUSED_KSYMS (Arnd Bergmann)

  APM X-Gene PCIe controller driver:
   - Fix cfg resource mapping regression (Dejin Zheng)

  Broadcom iProc PCIe controller driver:
   - Return zero for success of iproc_msi_irq_domain_alloc() (Pali
     Rohár)

  Broadcom STB PCIe controller driver:
   - Add reset_control_rearm() stub for !CONFIG_RESET_CONTROLLER (Jim
     Quinlan)
   - Fix use of BCM7216 reset controller (Jim Quinlan)
   - Use reset/rearm for Broadcom STB pulse reset instead of
     deassert/assert (Jim Quinlan)
   - Fix brcm_pcie_probe() error return for unsupported revision (Wei
     Yongjun)

  Cavium ThunderX PCIe controller driver:
   - Fix compile testing (Arnd Bergmann)
   - Fix "no symbols" warnings when compile testing with
     CONFIG_TRIM_UNUSED_KSYMS (Arnd Bergmann)

  Freescale Layerscape PCIe controller driver:
   - Fix ls_pcie_ep_probe() syntax error (comma for semicolon)
     (Krzysztof Wilczyński)
   - Remove layerscape-gen4 dependencies on OF and ARM64, add dependency
     on ARCH_LAYERSCAPE (Geert Uytterhoeven)

  HiSilicon HIP PCIe controller driver:
   - Remove obsolete HiSilicon PCIe DT description (Dongdong Liu)

  Intel Gateway PCIe controller driver:
   - Remove unused pcie_app_rd() (Jiapeng Chong)

  Intel VMD host bridge driver:
   - Program IRTE with Requester ID of VMD endpoint, not child device
     (Jon Derrick)
   - Disable VMD MSI-X remapping when possible so children can use more
     MSI-X vectors (Jon Derrick)

  MediaTek PCIe controller driver:
   - Configure FC and FTS for functions other than 0 (Ryder Lee)
   - Add YAML schema for MediaTek (Jianjun Wang)
   - Export pci_pio_to_address() for module use (Jianjun Wang)
   - Add MediaTek MT8192 PCIe controller driver (Jianjun Wang)
   - Add MediaTek MT8192 INTx support (Jianjun Wang)
   - Add MediaTek MT8192 MSI support (Jianjun Wang)
   - Add MediaTek MT8192 system power management support (Jianjun Wang)
   - Add missing MODULE_DEVICE_TABLE (Qiheng Lin)

  Microchip PolarFlare PCIe controller driver:
   - Make several symbols static (Wei Yongjun)

  NVIDIA Tegra PCIe controller driver:
   - Add MCFG quirks for Tegra194 ECAM errata (Vidya Sagar)
   - Make several symbols const (Rikard Falkeborn)
   - Fix Kconfig host/endpoint typo (Wesley Sheng)

  SiFive FU740 PCIe controller driver:
   - Add pcie_aux clock to prci driver (Greentime Hu)
   - Use reset-simple in prci driver for PCIe (Greentime Hu)
   - Add SiFive FU740 PCIe host controller driver and DT binding (Paul
     Walmsley, Greentime Hu)

  Synopsys DesignWare PCIe controller driver:
   - Move MSI Receiver init to dw_pcie_host_init() so it is
     re-initialized along with the RC in resume (Jisheng Zhang)
   - Move iATU detection earlier to fix regression (Hou Zhiqiang)

  TI J721E PCIe driver:
   - Add DT binding and TI j721e support for refclk to PCIe connector
     (Kishon Vijay Abraham I)
   - Add host mode and endpoint mode DT bindings for TI AM64 SoC (Kishon
     Vijay Abraham I)

  TI Keystone PCIe controller driver:
   - Use generic config accessors for TI AM65x (K3) to fix regression
     (Kishon Vijay Abraham I)

  Xilinx NWL PCIe controller driver:
   - Add support for coherent PCIe DMA traffic using CCI (Bharat Kumar
     Gogada)
   - Add optional "dma-coherent" DT property (Bharat Kumar Gogada)

  Miscellaneous:
   - Fix kernel-doc warnings (Krzysztof Wilczyński)
   - Remove unused MicroGate SyncLink device IDs (Jiri Slaby)
   - Remove redundant dev_err() for devm_ioremap_resource() failure
     (Chen Hui)
   - Remove redundant initialization (Colin Ian King)
   - Drop redundant dev_err() for platform_get_irq() errors (Krzysztof
     Wilczyński)"

* tag 'pci-v5.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (98 commits)
  riscv: dts: Add PCIe support for the SiFive FU740-C000 SoC
  PCI: fu740: Add SiFive FU740 PCIe host controller driver
  dt-bindings: PCI: Add SiFive FU740 PCIe host controller
  MAINTAINERS: Add maintainers for SiFive FU740 PCIe driver
  clk: sifive: Use reset-simple in prci driver for PCIe driver
  clk: sifive: Add pcie_aux clock in prci driver for PCIe driver
  PCI: brcmstb: Use reset/rearm instead of deassert/assert
  ata: ahci_brcm: Fix use of BCM7216 reset controller
  reset: add missing empty function reset_control_rearm()
  PCI: Allow VPD access for QLogic ISP2722
  PCI/VPD: Add helper pci_get_func0_dev()
  PCI/VPD: Remove pci_vpd_find_tag() SRDT handling
  PCI/VPD: Remove pci_vpd_find_tag() 'offset' argument
  PCI/VPD: Change pci_vpd_init() return type to void
  PCI/VPD: Make missing VPD message less alarming
  PCI/VPD: Remove pci_set_vpd_size()
  x86/PCI: Remove unused alloc_pci_root_info() return value
  MAINTAINERS: Add Jianjun Wang as MediaTek PCI co-maintainer
  PCI: mediatek-gen3: Add system PM support
  PCI: mediatek-gen3: Add MSI support
  ...
2021-05-05 13:24:11 -07:00
Robin Murphy
2d471b20c5 iommu: Streamline registration interface
Rather than have separate opaque setter functions that are easy to
overlook and lead to repetitive boilerplate in drivers, let's pass the
relevant initialisation parameters directly to iommu_device_register().

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ab001b87c533b6f4db71eb90db6f888953986c36.1617285386.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-16 17:20:45 +02:00
Joerg Roedel
49d11527e5 Merge branches 'iommu/fixes', 'arm/mediatek', 'arm/smmu', 'arm/exynos', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next 2021-04-16 17:16:03 +02:00
Longpeng(Mike)
38c527aeb4 iommu/vt-d: Force to flush iotlb before creating superpage
The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.

1.mmap(4GB,MAP_HUGETLB)
2.
  while (1) {
   (a)    DMA MAP   0,0xa0000
   (b)    DMA UNMAP 0,0xa0000
   (c)    DMA MAP   0,0xc0000000
             * DMA read IOVA 0 may failure here (Not present)
             * if the problem occurs.
   (d)    DMA UNMAP 0,0xc0000000
  }

The page table(only focus on IOVA 0) after (a) is:
 PML4: 0x19db5c1003   entry:0xffff899bdcd2f000
  PDPE: 0x1a1cacb003  entry:0xffff89b35b5c1000
   PDE: 0x1a30a72003  entry:0xffff89b39cacb000
    PTE: 0x21d200803  entry:0xffff89b3b0a72000

The page table after (b) is:
 PML4: 0x19db5c1003   entry:0xffff899bdcd2f000
  PDPE: 0x1a1cacb003  entry:0xffff89b35b5c1000
   PDE: 0x1a30a72003  entry:0xffff89b39cacb000
    PTE: 0x0          entry:0xffff89b3b0a72000

The page table after (c) is:
 PML4: 0x19db5c1003   entry:0xffff899bdcd2f000
  PDPE: 0x1a1cacb003  entry:0xffff89b35b5c1000
   PDE: 0x21d200883   entry:0xffff89b39cacb000 (*)

Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.

However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping
     dma_pte_free_pagetable
       Set the PDE entry to ZERO
     Set the PDE entry to 0x21d200883

So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Gonglei (Arei) <arei.gonglei@huawei.com>

Fixes: 6491d4d028 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable@vger.kernel.org> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@huawei.com/
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210415004628.1779-1-longpeng2@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-15 16:12:31 +02:00
Christophe JAILLET
745610c4a3 iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()'
If 'intel_cap_audit()' fails, we should return directly, as already done in
the surrounding error handling path.

Fixes: ad3d190299 ("iommu/vt-d: Audit IOMMU Capabilities and add helper functions")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/98d531caabe66012b4fffc7813fd4b9470afd517.1618124777.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-15 15:44:38 +02:00
Lu Baolu
906f86c860 iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86
Commit f68c7f539b ("iommu/vt-d: Enable write protect for supervisor
SVM") added pasid_enable_wpe() which hit below compile error with !X86.

../drivers/iommu/intel/pasid.c: In function 'pasid_enable_wpe':
../drivers/iommu/intel/pasid.c:554:22: error: implicit declaration of function 'read_cr0' [-Werror=implicit-function-declaration]
  554 |  unsigned long cr0 = read_cr0();
      |                      ^~~~~~~~
In file included from ../include/linux/build_bug.h:5,
                 from ../include/linux/bits.h:22,
                 from ../include/linux/bitops.h:6,
                 from ../drivers/iommu/intel/pasid.c:12:
../drivers/iommu/intel/pasid.c:557:23: error: 'X86_CR0_WP' undeclared (first use in this function)
  557 |  if (unlikely(!(cr0 & X86_CR0_WP))) {
      |                       ^~~~~~~~~~
../include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
   78 | # define unlikely(x) __builtin_expect(!!(x), 0)
      |                                          ^
../drivers/iommu/intel/pasid.c:557:23: note: each undeclared identifier is reported only once for each function it appears in
  557 |  if (unlikely(!(cr0 & X86_CR0_WP))) {
      |                       ^~~~~~~~~~
../include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
   78 | # define unlikely(x) __builtin_expect(!!(x), 0)
      |

Add the missing dependency.

Cc: Sanjay Kumar <sanjay.k.kumar@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: f68c7f539b ("iommu/vt-d: Enable write protect for supervisor SVM")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20210411062312.3057579-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-15 15:43:20 +02:00
Lu Baolu
8b74b6ab25 iommu/vt-d: Avoid unnecessary cache flush in pasid entry teardown
When a present pasid entry is disassembled, all kinds of pasid related
caches need to be flushed. But when a pasid entry is not being used
(PRESENT bit not set), we don't need to do this. Check the PRESENT bit
in intel_pasid_tear_down_entry() and avoid flushing caches if it's not
set.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 11:55:47 +02:00
Lu Baolu
c0474a606e iommu/vt-d: Invalidate PASID cache when root/context entry changed
When the Intel IOMMU is operating in the scalable mode, some information
from the root and context table may be used to tag entries in the PASID
cache. Software should invalidate the PASID-cache when changing root or
context table entries.

Suggested-by: Ashok Raj <ashok.raj@intel.com>
Fixes: 7373a8cc38 ("iommu/vt-d: Setup context and enable RID2PASID support")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 11:55:47 +02:00
Lu Baolu
eea53c5816 iommu/vt-d: Remove WO permissions on second-level paging entries
When the first level page table is used for IOVA translation, it only
supports Read-Only and Read-Write permissions. The Write-Only permission
is not supported as the PRESENT bit (implying Read permission) should
always set. When using second level, we still give separate permissions
that allows WriteOnly which seems inconsistent and awkward. We want to
have consistent behavior. After moving to 1st level, we don't want things
to work sometimes, and break if we use 2nd level for the same mappings.
Hence remove this configuration.

Suggested-by: Ashok Raj <ashok.raj@intel.com>
Fixes: b802d070a5 ("iommu/vt-d: Use iova over first level")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 11:55:47 +02:00
Lu Baolu
03d205094a iommu/vt-d: Report the right page fault address
The Address field of the Page Request Descriptor only keeps bit [63:12]
of the offending address. Convert it to a full address before reporting
it to device drivers.

Fixes: eb8d93ea3c ("iommu/vt-d: Report page request faults for guest SVA")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 11:55:46 +02:00
Robin Murphy
a250c23f15 iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
Instead make the global iommu_dma_strict paramete in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.

This make sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overriden by an explicit parameter.

[ported on top of the other iommu_attr changes and added a few small
 missing bits]

Signed-off-by: Robin Murphy <robin.murphy@arm.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210401155256.298656-19-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:56:53 +02:00
Christoph Hellwig
7e14754778 iommu: remove DOMAIN_ATTR_NESTING
Use an explicit enable_nesting method instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-17-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:56:53 +02:00
Jean-Philippe Brucker
9003351cb6 iommu/vt-d: Support IOMMU_DEV_FEAT_IOPF
Allow drivers to query and enable IOMMU_DEV_FEAT_IOPF, which amounts to
checking whether PRI is enabled.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20210401154718.307519-5-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:54:29 +02:00
Lu Baolu
6c00612d0c iommu/vt-d: Report right snoop capability when using FL for IOVA
The Intel VT-d driver checks wrong register to report snoop capablility
when using first level page table for GPA to HPA translation. This might
lead the IOMMU driver to say that it supports snooping control, but in
reality, it does not. Fix this by always setting PASID-table-entry.PGSNP
whenever a pasid entry is setting up for GPA to HPA translation so that
the IOMMU driver could report snoop capability as long as it runs in the
scalable mode.

Fixes: b802d070a5 ("iommu/vt-d: Use iova over first level")
Suggested-by: Rajesh Sankaran <rajesh.sankaran@intel.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210330021145.13824-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:41:30 +02:00
John Garry
363f266eef iommu/vt-d: Remove IOVA domain rcache flushing for CPU offlining
Now that the core code handles flushing per-IOVA domain CPU rcaches,
remove the handling here.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1616675401-151997-3-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:27:27 +02:00
Lu Baolu
442b81836d iommu/vt-d: Make unnecessarily global functions static
Make some functions static as they are only used inside pasid.c.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323010600.678627-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:15:58 +02:00
Lu Baolu
1b169fdf42 iommu/vt-d: Remove unused function declarations
Some functions have been removed. Remove the remaining function
delarations.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323010600.678627-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:15:49 +02:00
Lu Baolu
06905ea831 iommu/vt-d: Remove SVM_FLAG_PRIVATE_PASID
The SVM_FLAG_PRIVATE_PASID has never been referenced in the tree, and
there's no plan to have anything to use it. So cleanup it.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323010600.678627-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:15:19 +02:00
Lu Baolu
2e1a44c1c4 iommu/vt-d: Remove svm_dev_ops
The svm_dev_ops has never been referenced in the tree, and there's no
plan to have anything to use it. Remove it to make the code neat.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323010600.678627-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:15:19 +02:00