Trivially migrate the existing global static to ops->blocked_domain.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/3-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The global static should pre-define the type, and the NOP free function can
now be left as NULL.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/2-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When remapping hardware is configured by system software in scalable mode
as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
in first-stage page-table entries even when second-stage mappings indicate
that corresponding first-stage page-table is Read-Only.
As a result, contents of pages designated by the VMM as Read-Only can be
modified by the IOMMU via PML5E (PML4E for 4-level tables) access as part of
the address translation process due to DMAs issued by the guest.
This disallows read-only mappings in a domain that is meant to be used as a
nested parent. See the Sapphire Rapids Specification Update [1], errata
details, SPR17. Userspace can learn about this limitation by checking the
IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported by the IOMMU_GET_HW_INFO
ioctl (see the sketch below).
[1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
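A rough userspace sketch of that check, assuming the iommufd uAPI layout at
the time of this series (treat the exact field names as assumptions):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/iommufd.h>

  static int vtd_has_spr17_erratum(int iommufd, uint32_t dev_id)
  {
          struct iommu_hw_info_vtd vtd = {};
          struct iommu_hw_info cmd = {
                  .size = sizeof(cmd),
                  .dev_id = dev_id,
                  .data_len = sizeof(vtd),
                  .data_uptr = (uintptr_t)&vtd,
          };

          if (ioctl(iommufd, IOMMU_GET_HW_INFO, &cmd))
                  return -1;
          if (cmd.out_data_type != IOMMU_HW_INFO_TYPE_INTEL_VTD)
                  return 0;
          return !!(vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17);
  }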
Link: https://lore.kernel.org/r/20231026044216.64964-9-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This adds support for the IOMMU_HWPT_DATA_VTD_S1 type. A 'nested_parent' flag
is also added to mark the nested parent domain so that the input parent domain
can be sanitized.
Link: https://lore.kernel.org/r/20231026044216.64964-8-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This makes the helpers visible to nested.c.
Link: https://lore.kernel.org/r/20231026044216.64964-6-yi.l.liu@intel.com
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This adds a scalable mode check before allocating the nested parent domain, as
checking the nested capability alone is not enough. The user may turn off
scalable mode, which also means no nested support even if the hardware
supports it.
Fixes: c97d1b20d3 ("iommu/vt-d: Add domain_alloc_user op")
Link: https://lore.kernel.org/r/20231024150011.44642-1-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The domain_alloc_user op already accepts user flags for domain allocation; add
a parent domain pointer and driver-specific user data support as well. The
user data is tagged with a type so that iommu drivers can add their own
driver-specific user data per hw_pagetable.
Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an
iommufd core uAPI structure. Make the user data opaque to the core, since a
userspace driver must match the kernel driver. In the future, if drivers share
some common parameters, there could be generic parameters as well.
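A minimal sketch of that bundle, assuming the in-kernel form simply mirrors
the uAPI fields (the exact field names are an assumption):

  struct iommu_user_data {
          unsigned int type;      /* tag, e.g. IOMMU_HWPT_DATA_VTD_S1 */
          void __user *uptr;      /* data_ptr from the core uAPI structure */
          size_t len;             /* data_len from the core uAPI structure */
  };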
Link: https://lore.kernel.org/r/20231026043938.63898-7-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The IOMMU advertises Access/Dirty bits for the second-stage page table if the
extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS). The
first-stage table is compatible with the CPU page table, thus A/D bits are
implicitly supported. Relevant Intel IOMMU SDM refs: for the first-stage table
"3.6.2 Accessed, Extended Accessed, and Dirty Flags" and for the second-stage
table "3.7.2 Accessed and Dirty Flags".
Dirty tracking on the first-stage page table is enabled by default, so no
control bits are needed and enabling it just returns 0. To use SSADS, set
bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB
via pasid_flush_caches() following the manual. Relevant SDM refs:
"3.7.2 Accessed and Dirty Flags"
"6.5.3.3 Guidance to Software for Invalidations,
Table 23. Guidance to Software for Invalidations"
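A rough illustration of the SSADE enable step, with the constant and helper
names assumed for the example rather than taken from the driver:

  #define PASID_ENTRY_SSADE       BIT_ULL(9)      /* second-stage A/D enable */

  static void example_enable_ssads(u64 *pasid_entry_qword)
  {
          /* Set bit 9 (SSADE) in the scalable-mode PASID table entry... */
          *pasid_entry_qword |= PASID_ENTRY_SSADE;
          /* ...then flush caches, e.g. via pasid_flush_caches(), per the manual */
  }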
The PTE dirty bit is located in bit 9 and is cached in the IOTLB, so flush the
IOTLB to make sure the IOMMU attempts to set the dirty bit again. Note that
iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather, and thus the
caller of the iommu op will flush the IOTLB. The relevant manual covering the
hardware translation is chapter 6, with special mention of:
"6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
"6.2.4 IOTLB"
Select IOMMUFD_DRIVER only if IOMMUFD is enabled, given that IOMMU dirty
tracking requires IOMMUFD.
Link: https://lore.kernel.org/r/20231024135109.73787-13-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add a debugfs directory per pair of {device, pasid} if the mappings of
its page table are created and destroyed by the iommu_map/unmap()
interfaces. i.e. /sys/kernel/debug/iommu/intel/<device source id>/<pasid>.
Create a debugfs file in the directory for users to dump the page
table corresponding to {device, pasid}. e.g.
/sys/kernel/debug/iommu/intel/0000:00:02.0/1/domain_translation_struct.
For the default domain without pasid, it creates a debugfs file in the
debugfs device directory for users to dump its page table. e.g.
/sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct.
When setting a domain to a PASID of device, create a debugfs file in
the pasid debugfs directory for users to dump the page table of the
specified pasid. Remove the debugfs device directory of the device
when releasing a device. e.g.
/sys/kernel/debug/iommu/intel/0000:00:01.0
Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com>
Link: https://lore.kernel.org/r/20231013135811.73953-3-Jingqi.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add the domain_alloc_user() op implementation. It supports allocating
domains to be used as parent under nested translation.
Unlike other drivers VT-D uses only a single page table format so it only
needs to check if the HW can support nesting.
Link: https://lore.kernel.org/r/20230928071528.26258-7-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
On s390 when using a paging hypervisor, .iotlb_sync_map is used to sync
mappings by letting the hypervisor inspect the synced IOVA range and
updating a shadow table. This however means that .iotlb_sync_map can
fail as the hypervisor may run out of resources while doing the sync.
This can be due to the hypervisor being unable to pin guest pages, due
to a limit on mapped addresses such as vfio_iommu_type1.dma_entry_limit
or lack of other resources. Either way such a failure to sync a mapping
should result in a DMA_MAPPING_ERROR.
Now, especially when running with batched IOTLB flushes for unmap, it may be
that some IOVAs have already been invalidated but not yet synced via
.iotlb_sync_map. Thus, if the hypervisor indicates it has run out of
resources, first do a global flush allowing the hypervisor to free resources
associated with these mappings as well, then retry creating the new mappings,
and only if that also fails report the error to callers.
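A rough sketch of that recovery order; the function and the exact call path
are illustrative, not the actual dma-iommu/s390 code:

  static int example_map_with_retry(struct iommu_domain *domain,
                                    unsigned long iova, phys_addr_t phys,
                                    size_t size, int prot)
  {
          int ret = iommu_map(domain, iova, phys, size, prot, GFP_KERNEL);

          if (ret) {
                  /*
                   * The hypervisor may be out of resources; flush IOTLB
                   * entries left over from batched unmaps and retry once.
                   * If this also fails, the caller turns the error into
                   * DMA_MAPPING_ERROR.
                   */
                  iommu_flush_iotlb_all(domain);
                  ret = iommu_map(domain, iova, phys, size, prot, GFP_KERNEL);
          }
          return ret;
  }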
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> # sun50i
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-1-9e5fc4dacc36@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu_suspend() syscore suspend callback is invoked with IRQs disabled.
Allocating memory with the GFP_KERNEL flag may re-enable IRQs during the
suspend callback, which can cause intermittent suspend/hibernation problems
with the following kernel traces:
Calling iommu_suspend+0x0/0x1d0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 15 at kernel/time/timekeeping.c:868 ktime_get+0x9b/0xb0
...
CPU: 0 PID: 15 Comm: rcu_preempt Tainted: G U E 6.3-intel #r1
RIP: 0010:ktime_get+0x9b/0xb0
...
Call Trace:
<IRQ>
tick_sched_timer+0x22/0x90
? __pfx_tick_sched_timer+0x10/0x10
__hrtimer_run_queues+0x111/0x2b0
hrtimer_interrupt+0xfa/0x230
__sysvec_apic_timer_interrupt+0x63/0x140
sysvec_apic_timer_interrupt+0x7b/0xa0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1f/0x30
...
------------[ cut here ]------------
Interrupts enabled after iommu_suspend+0x0/0x1d0
WARNING: CPU: 0 PID: 27420 at drivers/base/syscore.c:68 syscore_suspend+0x147/0x270
CPU: 0 PID: 27420 Comm: rtcwake Tainted: G U W E 6.3-intel #r1
RIP: 0010:syscore_suspend+0x147/0x270
...
Call Trace:
<TASK>
hibernation_snapshot+0x25b/0x670
hibernate+0xcd/0x390
state_store+0xcf/0xe0
kobj_attr_store+0x13/0x30
sysfs_kf_write+0x3f/0x50
kernfs_fop_write_iter+0x128/0x200
vfs_write+0x1fd/0x3c0
ksys_write+0x6f/0xf0
__x64_sys_write+0x1d/0x30
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Given that only 4 words of memory are needed, avoid the memory allocation in
iommu_suspend().
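A sketch of the idea, assuming the four saved register words move into a fixed
per-IOMMU array (names follow the existing MAX_SR_DMAR_REGS convention; the
struct shown is illustrative):

  #define MAX_SR_DMAR_REGS        4

  struct example_intel_iommu {
          /* ... */
          u32 iommu_state[MAX_SR_DMAR_REGS];      /* was a kcalloc()'d pointer */
  };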
CC: stable@kernel.org
Fixes: 33e0715710 ("iommu/vt-d: Avoid GFP_ATOMIC where it is not needed")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Ooi, Chin Hao <chin.hao.ooi@intel.com>
Link: https://lore.kernel.org/r/20230921093956.234692-1-rui.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230925120417.55977-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
Link: https://lore.kernel.org/r/20230818101033.4100-6-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For the case where the VT-d page is smaller than the mm page, converting a DMA
pfn must be handled in two cases: one for the start pfn and one for the end
pfn. Currently the calculation of the end DMA pfn is incorrect; the result is
less than the real page frame number, which causes the IOVA mapping to always
miss some page frames.
Rename mm_to_dma_pfn() to mm_to_dma_pfn_start() and add a new helper named
mm_to_dma_pfn_end() for converting the end DMA pfn.
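A sketch of the two helpers, assuming PAGE_SHIFT >= VTD_PAGE_SHIFT (e.g. 16K
mm pages over 4K VT-d pages):

  static inline unsigned long mm_to_dma_pfn_start(unsigned long mm_pfn)
  {
          return mm_pfn << (PAGE_SHIFT - VTD_PAGE_SHIFT);
  }

  static inline unsigned long mm_to_dma_pfn_end(unsigned long mm_pfn)
  {
          /* last VT-d pfn covered by this mm pfn, not the first of the next */
          return ((mm_pfn + 1) << (PAGE_SHIFT - VTD_PAGE_SHIFT)) - 1;
  }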
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230625082046.979742-1-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The core code now prevents devices with RMRR regions from being assigned
to user space. There is no need to check for this condition in individual
drivers. Remove it to avoid duplicate code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230724060352.113458-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This allows the upper layers to set a domain to a PASID of a device if the
PASID feature is supported by the IOMMU hardware. Typical use cases are, for
example, kernel DMA with PASID and hardware-assisted mediated device drivers.
The attached device and PASID information is tracked in a per-domain list and
is used for IOTLB and devTLB invalidation.
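A minimal sketch of that per-domain tracking; the struct and field names are
illustrative:

  struct dev_pasid_info {
          struct list_head link_domain;   /* entry in the domain's list */
          struct device *dev;
          ioasid_t pasid;                 /* used for IOTLB/devTLB invalidation */
  };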
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-8-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The domain_flush_pasid_iotlb() helper function is used to flush the IOTLB
entries for a given PASID. Previously, this function assumed that
RID2PASID was only used for the first-level DMA translation. However, with
the introduction of the set_dev_pasid callback, this assumption is no
longer valid.
Add a check before using the RID2PASID for PASID invalidation. This check
ensures that the domain has been attached to a physical device before
using RID2PASID.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-7-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, draining page requests and responses for a PASID is part of the SVA
implementation. This is because the driver only supports attaching an SVA
domain to a device PASID. As we are about to support attaching other types of
domains to a device PASID, the PRQ draining code becomes generic.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-6-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The VT-d spec requires using the PASID-based IOTLB invalidation descriptor to
invalidate the IOTLB and the paging-structure caches for a first-stage page
table. Add a generic helper to do this.
RID2PASID is used if the domain has been attached to a physical device;
otherwise the real PASIDs that the domain has been attached to will be used.
The 'real' PASID attachment is handled in the subsequent change.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-4-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
PCIe Process Address Space ID (PASID) is used to tag DMA traffic; it provides
finer-grained isolation than the requester ID (RID).
For each device/RID, 0 is a special PASID for normal DMA (no PASID). This is
universal across all architectures that support PASID, and is therefore
warranted to be reserved globally and declared in the common header.
Consequently, we can avoid conflicts between different PASID use cases in the
generic code, e.g. SVA and the DMA API with PASIDs.
This paves the way for device drivers to choose a global PASID policy while
continuing to do normal DMA.
Note that VT-d could support a non-zero RID/NO_PASID, but this is currently
not used.
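In the common header this amounts to a single reserved value, roughly:

  #define IOMMU_NO_PASID  (0U)    /* Reserved for normal DMA without PASID */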
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This is a step toward making __iommu_probe_device() self contained.
It should, under proper locking, check if the device is already associated
with an iommu driver and resolve parallel probes. All but one of the
callers open code this test using two different means, but they all
rely on dev->iommu_group.
Currently, bus_iommu_probe()/probe_iommu_group() and
probe_acpi_namespace_devices() reject already probed devices with an unlocked
read of dev->iommu_group. The OF and ACPI "replay" functions use
device_iommu_mapped(), which is the same read without the pointless refcount.
Move this test into __iommu_probe_device() and put it under the
iommu_probe_device_lock. The store to dev->iommu_group is in
iommu_group_add_device() which is also called under this lock for iommu
driver devices, making it properly locked.
The only path that didn't have this check is the hotplug path triggered by
BUS_NOTIFY_ADD_DEVICE. The only way to get dev->iommu_group assigned
outside the probe path is via iommu_group_add_device(). Today the only
caller is VFIO no-iommu which never associates with an iommu driver. Thus
adding this additional check is safe.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
These lines of code were commented out when they were first added in commit
ba39592764 ("Intel IOMMU: Intel IOMMU driver"). We do not want to restore them
because the VT-d spec has deprecated the read/write draining hint.
VT-d spec (section 11.4.2):
"
Hardware implementation with Major Version 2 or higher (VER_REG), always
performs required drain without software explicitly requesting a drain in
IOTLB invalidation. This field is deprecated and hardware will always
report it as 1 to maintain backward compatibility with software.
"
Remove the code to make the code cleaner.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20230609060514.15154-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Remove the WARN_ON(did == 0) as domain id 0 is reserved and set once the
domain_ids bitmap is allocated, so iommu_init_domains() will never return 0.
Remove the WARN_ON(!table) as this pointer will be accessed by the following
code anyway; if an empty "table" really happens, the kernel will report a NULL
pointer dereference right there.
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230605112659.308981-3-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
dmar_reenable_qi() may not succeed. Check and return when it fails.
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230605112659.308981-2-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It remains really handy to have distinct DMA domain types within core
code for the sake of default domain policy selection, but we can now
hide that detail from drivers by using the new capability instead.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1c552d99e8ba452bdac48209fa74c0bdd52fd9d9.1683233867.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Passing a special type to domain_alloc to indirectly query whether flush
queues are a worthwhile optimisation with the given driver is a bit
clunky, and looking increasingly anachronistic. Let's put that into an
explicit capability instead.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/f0086a93dbccb92622e1ace775846d81c1c4b174.1683233867.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Domain map/unmap with invalid parameters shouldn't crash the kernel.
Therefore, replace the BUG_ON with an if() check.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-6-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When performing domain_context_mapping or getting the dma_pte of a pfn, the
availability of the domain page table directory is already ensured. Therefore,
the domain->pgd checks are unnecessary.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-5-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A VT-d IOTLB cache invalidation request with an unexpected type is considered
a developer bug, which can be fixed. So, when such an issue occurs, it should
be reported through the kernel log instead of halting the system. Replace the
BUG_ON with a warning.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-4-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When encountering an unexpected invalid pfn range, the kernel should attempt
recovery and proceed with execution. Therefore, replace BUG_ON with WARN_ON to
avoid halting the machine.
Besides, one redundant check is removed.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-3-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This addresses the following issue reported by klocwork tool:
- operands of different size in bitwise operations
Suggested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-2-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Supervisor Request Enable (SRE) bit in a PASID entry is for permission
checking on DMA requests. When SRE = 0, DMA with supervisor privilege
will be blocked. However, for in-kernel DMA this is not necessary in that
we are targeting kernel memory anyway. There's no need to differentiate
user and kernel for in-kernel DMA.
Let's use non-privileged (user) permission for all PASIDs used in the kernel;
this is also consistent with DMA without PASID (RID_PASID).
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230331231137.1947675-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
iommu_unregister_device_fault_handler() and iopf_queue_remove_device() are
called after the device has stopped issuing new page fault requests and all
outstanding page requests have been drained. They should never fail. Trigger a
warning if that nevertheless happens.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
PRI is only used for IOPF. With this move, the PCI/PRI feature can be
controlled by the device driver through the iommu_dev_enable/disable_feature()
interfaces.
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
They should be part of the per-device iommu private data initialization.
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Generally, enabling IOMMU_DEV_FEAT_SVA requires IOMMU_DEV_FEAT_IOPF, but some
devices manage I/O Page Faults themselves instead of relying on the IOMMU.
Move the IOPF-related code from the SVA enabling path to the IOPF enabling
path.
For device drivers that rely on the IOMMU for IOPF through PCI/PRI,
IOMMU_DEV_FEAT_IOPF must be enabled before, and disabled after,
IOMMU_DEV_FEAT_SVA.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, enabling SVA requires IOPF support from the IOMMU and device PCI
PRI. However, some devices can handle IOPF by themselves without ever sending
PCI page requests or advertising the PRI capability.
Allow SVA support with IOPF handled either by the IOMMU (PCI PRI) or by the
device driver (device-specific IOPF). As long as IOPF can be handled, SVA
should continue to work.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
INVALID_IOASID and IOMMU_PASID_INVALID are duplicated. Rename INVALID_IOASID
and consolidate, since we are moving away from the IOASID infrastructure.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-7-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use sysfs_emit() instead of sprintf() for sysfs entries. sysfs_emit() knows
the maximum size of the temporary buffer used for outputting sysfs content and
avoids overrunning the buffer length.
Prefer 'long long' over 'long long int' as suggested by checkpatch.pl.
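A hedged before/after sketch of the conversion; the attribute and value shown
are illustrative, not a specific attribute from the driver:

  static ssize_t version_show(struct device *dev,
                              struct device_attribute *attr, char *buf)
  {
          u64 val = 0;    /* illustrative; the real code reads an IOMMU register */

          /* was: return sprintf(buf, "%llx\n", val); */
          return sysfs_emit(buf, "%llx\n", val);
  }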
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322123421.278852-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some polishing and small fixes for iommufd:
- Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem
- Use GFP_KERNEL_ACCOUNT inside the iommu_domains
- Support VFIO_NOIOMMU mode with iommufd
- Various typos
- A list corruption bug if HWPTs are used for attach
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"Some polishing and small fixes for iommufd:
- Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt
subsystem
- Use GFP_KERNEL_ACCOUNT inside the iommu_domains
- Support VFIO_NOIOMMU mode with iommufd
- Various typos
- A list corruption bug if HWPTs are used for attach"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
vfio: Support VFIO_NOIOMMU with iommufd
iommufd: Add three missing structures in ucmd_buffer
selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
iommu: Remove IOMMU_CAP_INTR_REMAP
irq/s390: Add arch_is_isolated_msi() for s390
iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI
genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code
iommufd: Convert to msi_device_has_isolated_msi()
vfio/type1: Convert to iommu_group_has_isolated_msi()
iommu: Add iommu_group_has_isolated_msi()
genirq/msi: Add msi_device_has_isolated_msi()
Commit 29b3283972 ("iommu/vt-d: Do not use flush-queue when caching-mode
is on") forced default domains to be strict mode as long as IOMMU
caching-mode is flagged. The reason for doing this is that when vIOMMU
uses VT-d caching mode to synchronize shadowing page tables, the strict
mode shows better performance.
However, this optimization is orthogonal to the first-level page table
because the Intel VT-d architecture does not define the caching mode of
the first-level page table. Refer to VT-d spec, section 6.1, "When the
CM field is reported as Set, any software updates to remapping
structures other than first-stage mapping (including updates to not-
present entries or present entries whose programming resulted in
translation faults) requires explicit invalidation of the caches."
Exclude the first-level page table from this optimization.
Generally using first-stage translation in vIOMMU implies nested
translation enabled in the physical IOMMU. In this case the first-stage
page table is wholly captured by the guest. The vIOMMU only needs to
transfer the cache invalidations on vIOMMU to the physical IOMMU.
Forcing the default domain to strict mode will cause more frequent
cache invalidations, resulting in performance degradation. In a real
performance benchmark test measured by iperf receive, the performance
result on Sapphire Rapids 100Gb NIC shows:
w/ this fix ~51 Gbits/s, w/o this fix ~39.3 Gbits/s.
Theoretically a first-stage IOMMU page table can still be shadowed in the
absence of the caching mode, e.g. with the host write-protecting the guest
IOMMU page table to synchronize changed PTEs with the physical IOMMU page
table. In this case the shadowing overhead is decoupled from emulating IOTLB
invalidation, and the overhead of the latter part is solely decided by the
frequency of IOTLB invalidations. Hence allowing the guest default DMA domain
to be lazy can also benefit the overall performance by reducing the total
number of VM exits.
Fixes: 29b3283972 ("iommu/vt-d: Do not use flush-queue when caching-mode is on")
Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230214025618.2292889-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The Intel IOMMU driver implements the IOTLB flush queue with domain-selective
or PASID-selective invalidations. In this case there's no need to track the
IOVA page range and sync IOTLBs, which may cause a significant performance
hit.
This patch adds a check to avoid IOVA gather page accumulation and IOTLB sync
for the lazy path.
The performance on a Sapphire Rapids 100Gb NIC improves as follows (as
measured by iperf send): w/o this fix ~48 Gbits/s, with this fix ~54 Gbits/s.
Cc: <stable@vger.kernel.org>
Fixes: 2a2b8eaa5b ("iommu: Handle freelists when using deferred flushing in iommu drivers")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230209175330.1783556-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Roll back all previous actions in error paths of intel_iommu_enable_sva()
and intel_iommu_disable_sva().
Fixes: d5b9e4bfe0 ("iommu/vt-d: Report prq to io-pgfault framework")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230208051559.700109-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Register and enable an IOMMU perfmon for each active IOMMU device.
The failure of IOMMU perfmon registration doesn't impact other
functionalities of an IOMMU device.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-8-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The Enhanced Command Register is used to submit the command and operand of
enhanced commands to the DMA Remapping hardware. It can support up to 256
enhanced commands.
There is a HW register to indicate the availability of all 256 enhanced
commands. Each bit stands for one command, but there isn't an existing
interface to read/write all 256 bits. Introduce a u64 ecmdcap[4] array to
store the availability of each enhanced command, and read the register 4 times
in map_iommu() to get all of them.
Add a helper to facilitate launching an enhanced command and make sure the
hardware completes it. Also add a helper to facilitate the check of PMU
essentials. These helpers will be used later.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-4-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There's no need to have a public header for Intel SVA implementation.
The device driver should interact with Intel SVA implementation via
the IOMMU generic APIs.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Jason Gunthorpe says:
====================
iommufd follows the same design as KVM and uses memory cgroups to limit
the amount of kernel memory an iommufd file descriptor can pin down. The
various internal data structures already use GFP_KERNEL_ACCOUNT to charge
their own memory.
However, one of the biggest consumers of kernel memory is the IOPTEs
stored under the iommu_domain and these allocations are not tracked.
This series is the first step in fixing it.
The iommu driver contract already includes a 'gfp' argument to the
map_pages op, allowing iommufd to specify GFP_KERNEL_ACCOUNT and then
having the driver allocate the IOPTE tables with that flag will capture a
significant amount of the allocations.
Update the iommu_map() API to pass in the GFP argument, and fix all call
sites. Replace iommu_map_atomic().
Audit the "enterprise" iommu drivers to make sure they do the right thing.
Intel and S390 ignore the GFP argument and always use GFP_ATOMIC. This is
problematic for iommufd anyhow, so fix it. AMD and ARM SMMUv2/3 are
already correct.
A follow up series will be needed to capture the allocations made when the
iommu_domain itself is allocated, which will complete the job.
====================
* 'iommu-memory-accounting' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/s390: Use GFP_KERNEL in sleepable contexts
iommu/s390: Push the gfp parameter to the kmem_cache_alloc()'s
iommu/intel: Use GFP_KERNEL in sleepable contexts
iommu/intel: Support the gfp argument to the map_pages op
iommu/intel: Add a gfp parameter to alloc_pgtable_page()
iommufd: Use GFP_KERNEL_ACCOUNT for iommu_map()
iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous()
iommu: Add a gfp parameter to iommu_map_sg()
iommu: Remove iommu_map_atomic()
iommu: Add a gfp parameter to iommu_map()
Link: https://lore.kernel.org/linux-iommu/0-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
These contexts are sleepable, so use the proper annotation. The GFP_ATOMIC
was added mechanically in the prior patches.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Flow it down to alloc_pgtable_page() via pfn_to_dma_pte() and
__domain_mapping().
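The end of that chain is roughly the following, shown as a sketch that is
close to, but not necessarily identical to, the driver's helper:

  void *alloc_pgtable_page(int node, gfp_t gfp)
  {
          struct page *page = alloc_pages_node(node, gfp | __GFP_ZERO, 0);

          return page ? page_address(page) : NULL;
  }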
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This is eventually called by iommufd through intel_iommu_map_pages() and
it should not be forced to atomic. Push the GFP_ATOMIC to all callers.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
On x86 platforms when the HW can support interrupt remapping the iommu
driver creates an irq_domain for the IR hardware and creates a child MSI
irq_domain.
When the global irq_remapping_enabled is set, the IR MSI domain is
assigned to the PCI devices (by intel_irq_remap_add_device(), or
amd_iommu_set_pci_msi_domain()) making those devices have the isolated MSI
property.
Due to how interrupt domains work, setting IRQ_DOMAIN_FLAG_ISOLATED_MSI on
the parent IR domain will cause all struct devices attached to it to
return true from msi_device_has_isolated_msi(). This replaces the
IOMMU_CAP_INTR_REMAP flag, as all places using IOMMU_CAP_INTR_REMAP also
call msi_device_has_isolated_msi().
Set the flag and delete the cap.
Link: https://lore.kernel.org/r/7-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Including:
- Core code:
- map/unmap_pages() cleanup
- SVA and IOPF refactoring
- Clean up and document return codes from device/domain
attachment code
- AMD driver:
- Rework and extend parsing code for ivrs_ioapic, ivrs_hpet
and ivrs_acpihid command line options
- Some smaller cleanups
- Intel driver:
- Blocking domain support
- Cleanups
- S390 driver:
- Fixes and improvements for attach and aperture handling
- PAMU driver:
- Resource leak fix and cleanup
- Rockchip driver:
- Page table permission bit fix
- Mediatek driver:
- Improve safety from invalid dts input
- Smaller fixes and improvements
- Exynos driver:
- Fix driver initialization sequence
- Sun50i driver:
- Remove IOMMU_DOMAIN_IDENTITY as it has not been working
forever
- Various other fixes
Merge tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core code:
- map/unmap_pages() cleanup
- SVA and IOPF refactoring
- Clean up and document return codes from device/domain attachment
AMD driver:
- Rework and extend parsing code for ivrs_ioapic, ivrs_hpet and
ivrs_acpihid command line options
- Some smaller cleanups
Intel driver:
- Blocking domain support
- Cleanups
S390 driver:
- Fixes and improvements for attach and aperture handling
PAMU driver:
- Resource leak fix and cleanup
Rockchip driver:
- Page table permission bit fix
Mediatek driver:
- Improve safety from invalid dts input
- Smaller fixes and improvements
Exynos driver:
- Fix driver initialization sequence
Sun50i driver:
- Remove IOMMU_DOMAIN_IDENTITY as it has not been working forever
- Various other fixes"
* tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (74 commits)
iommu/mediatek: Fix forever loop in error handling
iommu/mediatek: Fix crash on isr after kexec()
iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY
iommu/amd: Fix typo in macro parameter name
iommu/mediatek: Remove unused "mapping" member from mtk_iommu_data
iommu/mediatek: Improve safety for mediatek,smi property in larb nodes
iommu/mediatek: Validate number of phandles associated with "mediatek,larbs"
iommu/mediatek: Add error path for loop of mm_dts_parse
iommu/mediatek: Use component_match_add
iommu/mediatek: Add platform_device_put for recovering the device refcnt
iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe()
iommu/vt-d: Use real field for indication of first level
iommu/vt-d: Remove unnecessary domain_context_mapped()
iommu/vt-d: Rename domain_add_dev_info()
iommu/vt-d: Rename iommu_disable_dev_iotlb()
iommu/vt-d: Add blocking domain support
iommu/vt-d: Add device_block_translation() helper
iommu/vt-d: Allocate pasid table in device probe path
iommu/amd: Check return value of mmu_notifier_register()
iommu/amd: Fix pci device refcount leak in ppr_notifier()
...
iommufd is the user API to control the IOMMU subsystem as it relates to
managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended features, some being highly IOMMU device
specific:
- Binding iommu_domain's to PASID/SSID
- Userspace IO page tables, for ARM, x86 and S390
- Kernel bypassed invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
Many of these HW features exist to support VM use cases - for instance the
combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a
guest. Dirty tracking enables VM live migration with SRIOV devices and
PASID support allows creating "scalable IOV" devices, among other things.
As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs, which
is currently VFIO and VDPA.
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd implementation from Jason Gunthorpe:
"iommufd is the user API to control the IOMMU subsystem as it relates
to managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended features, some being highly IOMMU
device specific:
- Binding iommu_domain's to PASID/SSID
- Userspace IO page tables, for ARM, x86 and S390
- Kernel bypassed invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
Many of these HW features exist to support VM use cases - for instance
the combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
Dirty tracking enables VM live migration with SRIOV devices and PASID
support allows creating "scalable IOV" devices, among other things.
As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs,
which is currently VFIO and VDPA"
For more background, see the extended explanations in Jason's pull request:
https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
iommufd: Change the order of MSI setup
iommufd: Improve a few unclear bits of code
iommufd: Fix comment typos
vfio: Move vfio group specific code into group.c
vfio: Refactor dma APIs for emulated devices
vfio: Wrap vfio group module init/clean code into helpers
vfio: Refactor vfio_device open and close
vfio: Make vfio_device_open() truly device specific
vfio: Swap order of vfio_device_container_register() and open_device()
vfio: Set device->group in helper function
vfio: Create wrappers for group register/unregister
vfio: Move the sanity check of the group to vfio_create_group()
vfio: Simplify vfio_create_group()
iommufd: Allow iommufd to supply /dev/vfio/vfio
vfio: Make vfio_container optionally compiled
vfio: Move container related MODULE_ALIAS statements into container.c
vfio-iommufd: Support iommufd for emulated VFIO devices
vfio-iommufd: Support iommufd for physical VFIO devices
vfio-iommufd: Allow iommufd to be used in place of a container fd
vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
...
The impacted QAT device IDs that need the extra dTLB flush quirk range from
0x4940 to 0x4943. After a bitwise AND of the device ID with 0xfffc, the result
should be compared against 0x4940 instead of 0x494c to identify these devices.
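A sketch of the corrected check; 0x4940..0x4943 all collapse to 0x4940 once
masked with 0xfffc, so 0x494c could never match:

  static bool qat_needs_extra_dtlb_flush(u16 device_id)
  {
          return (device_id & 0xfffc) == 0x4940;  /* was compared against 0x494c */
  }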
Fixes: e65a6897be ("iommu/vt-d: Add a fix for devices need extra dtlb flush")
Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20221203005610.2927487-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommufd version.
The rc fix was done a different way when the iommufd patches reworked this
code.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
for_each_pci_dev() is implemented via pci_get_device(). The comment of
pci_get_device() says that it will increase the reference count for the
returned pci_dev and also decrease the reference count for the input pci_dev
@from if it is not NULL.
If we break out of the for_each_pci_dev() loop with pdev not NULL, we need to
call pci_dev_put() to decrease the reference count. Add the missing
pci_dev_put() before the 'return true' to avoid a reference count leak.
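A sketch of the fixed loop pattern; the predicate is only illustrative of the
affected helper:

  static bool example_has_external_pci(void)
  {
          struct pci_dev *pdev = NULL;

          for_each_pci_dev(pdev) {
                  if (pdev->external_facing) {
                          pci_dev_put(pdev);      /* drop the ref held by the iterator */
                          return true;
                  }
          }
          return false;
  }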
Fixes: 89a6079df7 ("iommu/vt-d: Force IOMMU on for platform opt in hint")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Link: https://lore.kernel.org/r/20221121113649.190393-2-wangxiongfeng2@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in
address translation service (ATS). These devices may inadvertently issue
ATS invalidation completion before posted writes initiated with
translated address that utilized translations matching the invalidation
address range, violating the invalidation completion ordering.
This patch adds an extra device TLB invalidation for the affected devices; it
is needed to ensure there are no more posted writes with translated addresses
following the invalidation completion. Therefore, the ordering is preserved
and data corruption is prevented.
Device TLBs are invalidated under the following six conditions:
1. Device driver does DMA API unmap IOVA
2. Device driver unbind a PASID from a process, sva_unbind_device()
3. PASID is torn down, after PASID cache is flushed. e.g. process
exit_mmap() due to crash
4. Under SVA usage, called by mmu_notifier.invalidate_range() where
VM has to free pages that were unmapped
5. userspace driver unmaps a DMA buffer
6. Cache invalidation in vSVA usage (upcoming)
For #1 and #2, device drivers are responsible for stopping DMA traffic before
unmap/unbind. For #3, the iommu driver gets an mmu_notifier callback to
invalidate the TLB the same way as a normal user unmap, which will do an extra
invalidation. The dTLB invalidation after the PASID cache flush does not need
an extra invalidation.
Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
covered by this patch due to common code path with #5.
Tested-by: Yuzhang Luo <yuzhang.luo@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20221130062449.1360063-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This queries if a domain linked to a device should expect to support
enforce_cache_coherency() so iommufd can negotiate the rules for when a
domain should be shared or not.
For iommufd a device that declares IOMMU_CAP_ENFORCE_CACHE_COHERENCY will
not be attached to a domain that does not support it.
Link: https://lore.kernel.org/r/1-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The dmar_domain uses bit field members to indicate the behaviors. Add
a bit field for using first level and remove the flags member to avoid
duplication.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The device_domain_info::domain accurately records the domain attached to
the device. It is unnecessary to check whether the context is present in
the attach_dev path. Remove it to make the code neat.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
dmar_domain_attach_device() is more meaningful according to what this
helper does.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename iommu_disable_dev_iotlb() to iommu_disable_pci_caps() to pair with
iommu_enable_pci_caps().
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The Intel IOMMU hardware supports blocking DMA transactions by clearing the
translation table entries. This implements a real blocking domain to avoid
using an empty UNMANAGED domain. The detach_dev callback of the domain ops is
not used in any path; remove it to avoid dead code as well.
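A rough sketch of the shape this takes, assuming the attach handler simply
tears down the device's translation (the symbol names are illustrative):

  static int blocking_domain_attach_dev(struct iommu_domain *domain,
                                        struct device *dev)
  {
          device_block_translation(dev);  /* clear context/RID2PASID entries */
          return 0;
  }

  static const struct iommu_domain_ops blocking_domain_ops = {
          .attach_dev = blocking_domain_attach_dev,
  };

  static struct iommu_domain blocking_domain = {
          .ops = &blocking_domain_ops,
  };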
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If attaching a domain to a device fails, the IOMMU driver should bring the
device to the blocking DMA state. The upper layer is expected to recover it by
attaching a new domain. Use device_block_translation() in the error path of
dev_attach to make this behavior explicit.
The difference between device_block_translation() and the previous
dmar_remove_one_dev_info() is that, in scalable mode, it is the RID2PASID
entry instead of the context entry that is cleared. As a result, enabling the
PCI capabilities is moved up.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Whether or not a domain is attached to the device, the pasid table should
always be valid as long as the device has been probed. This moves the pasid
table allocation from the domain attach path to the device probe path, and
frees it in the device release path.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The A/D bits are preset for IOVA over first-level (FL) usage for both kernel
DMA (i.e., domain type is IOMMU_DOMAIN_DMA) and user space DMA usage (i.e.,
domain type is IOMMU_DOMAIN_UNMANAGED).
Presetting the A bit in FL requires presetting the bit in every related paging
entry, including the non-leaf ones. Otherwise, hardware may treat this as an
error. For example, in a case of ECAP_REG.SMPWC==0, DMA faults might occur
with the below DMAR fault messages (wrapped for line length) dumped.
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [aa:00.0] fault addr 0x10c3a6000
[fault reason 0x90]
SM: A/D bit update needed in first-level entry when set up in no snoop
Fixes: 289b3b005c ("iommu/vt-d: Preset A/D bits for user space DMA usage")
Cc: stable@vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20221113010324.1094483-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This series is to replace the previous EMEDIUMTYPE patch in a VFIO series:
https://lore.kernel.org/kvm/Yxnt9uQTmbqul5lf@8bytes.org/
The purpose is to regulate all existing ->attach_dev callback functions to
use EINVAL exclusively for an incompatibility error between a device and a
domain. This allows VFIO and IOMMUFD to detect such a soft error, and then
try a different domain with the same device.
Among all the patches, the first two are preparatory changes; then one patch
updates the kdocs, and another three patches handle the enforcement effort.
Link: https://lore.kernel.org/r/cover.1666042872.git.nicolinc@nvidia.com
Merge tag 'for-joerg' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core
iommu: Define EINVAL as device/domain incompatibility
This series is to replace the previous EMEDIUMTYPE patch in a VFIO series:
https://lore.kernel.org/kvm/Yxnt9uQTmbqul5lf@8bytes.org/
The purpose is to regulate all existing ->attach_dev callback functions to
use EINVAL exclusively for an incompatibility error between a device and a
domain. This allows VFIO and IOMMUFD to detect such a soft error, and then
try a different domain with the same device.
Among all the patches, the first two are preparatory changes; then one patch
updates the kdocs, and another three patches handle the enforcement effort.
Link: https://lore.kernel.org/r/cover.1666042872.git.nicolinc@nvidia.com
Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as it contains all code
for SVA implementation in iommu core.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-14-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
These ops have been deprecated and there's no need for them anymore. Remove
them to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops. This implementation is based on the existing SVA
code. Possible cleanup and refactoring are left for incremental
changes later.
The VT-d driver will also need to support setting a DMA domain to a
PASID of a device. The current SVA implementation uses different data
structures to track the domain and device PASID relationship, which is
why the domain type must be checked in the remove_dev_pasid callback.
Eventually we'll consolidate the data structures and remove the need
for the domain type check.
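As a rough sketch of the shape this takes (struct iommu_domain_ops and its
set_dev_pasid/free members are the core's; the handler names below are
hypothetical):

  /* Sketch: the SVA domain carries its own small ops table. */
  static const struct iommu_domain_ops example_sva_domain_ops = {
          .set_dev_pasid = example_sva_set_dev_pasid,   /* hypothetical */
          .free          = example_sva_domain_free,     /* hypothetical */
  };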
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Following the new rules in include/linux/iommu.h kdocs, update all drivers
->attach_dev callback functions to return EINVAL in the failure paths that
are related to domain incompatibility.
Also, drop adjacent error prints to prevent kernel log spam.
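For illustration, a minimal sketch of such a failure path after the change
(the compatibility check itself is hypothetical):

  /* Sketch: signal device/domain incompatibility with -EINVAL and no
   * error print, so VFIO/IOMMUFD can quietly retry another domain. */
  if (domain->geometry.aperture_end < required_end)   /* hypothetical check */
          return -EINVAL;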
Link: https://lore.kernel.org/r/f52a07f7320da94afe575c9631340d0019a203a7.1666042873.git.nicolinc@nvidia.com
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
A splat from kmem_cache_destroy() was seen with a kernel prior to
commit ee2653bbe8 ("iommu/vt-d: Remove domain and devinfo mempool")
when there was a failure in init_dmars(), because the iommu_domain
cache still had objects. While the mempool code is now gone, there
still is a leak of the si_domain memory if init_dmars() fails. So
clean up si_domain in the init_dmars() error path.
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Fixes: 86080ccc22 ("iommu/vt-d: Allocate si_domain in init_dmars()")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit 5f64ce5411 ("iommu/vt-d: Duplicate iommu_resv_region objects
per device list") converted rcu_lock in get_resv_regions to
dmar_global_lock to allow sleeping in iommu_alloc_resv_region(). This
introduced possible recursive locking if get_resv_regions is called from
within a section where intel_iommu_init() already holds dmar_global_lock.
Especially, after commit 57365a04c9 ("iommu: Move bus setup to IOMMU
device registration"), below lockdep splats could always be seen.
============================================
WARNING: possible recursive locking detected
6.0.0-rc4+ #325 Tainted: G I
--------------------------------------------
swapper/0/1 is trying to acquire lock:
ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
intel_iommu_get_resv_regions+0x25/0x270
but task is already holding lock:
ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
intel_iommu_init+0x36d/0x6ea
...
Call Trace:
<TASK>
dump_stack_lvl+0x48/0x5f
__lock_acquire.cold.73+0xad/0x2bb
lock_acquire+0xc2/0x2e0
? intel_iommu_get_resv_regions+0x25/0x270
? lock_is_held_type+0x9d/0x110
down_read+0x42/0x150
? intel_iommu_get_resv_regions+0x25/0x270
intel_iommu_get_resv_regions+0x25/0x270
iommu_create_device_direct_mappings.isra.28+0x8d/0x1c0
? iommu_get_dma_cookie+0x6d/0x90
bus_iommu_probe+0x19f/0x2e0
iommu_device_register+0xd4/0x130
intel_iommu_init+0x3e1/0x6ea
? iommu_setup+0x289/0x289
? rdinit_setup+0x34/0x34
pci_iommu_init+0x12/0x3a
do_one_initcall+0x65/0x320
? rdinit_setup+0x34/0x34
? rcu_read_lock_sched_held+0x5a/0x80
kernel_init_freeable+0x28a/0x2f3
? rest_init+0x1b0/0x1b0
kernel_init+0x1a/0x130
ret_from_fork+0x1f/0x30
</TASK>
This rolls back dmar_global_lock to rcu_lock in get_resv_regions to avoid
the lockdep splat.
Fixes: 57365a04c9 ("iommu: Move bus setup to IOMMU device registration")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add gfp parameter to iommu_alloc_resv_region() for the callers to specify
the memory allocation behavior. Thus iommu_alloc_resv_region() can also
be used in critical contexts.
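A sketch of a caller after the change, assuming the updated argument order
(the range and type shown mirror the IOAPIC MSI region in the Intel driver):

  /* Sketch: the allocation behavior is now explicit at the call site. */
  reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
                                IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
                                0, IOMMU_RESV_MSI, GFP_ATOMIC);
  if (!reg)
          return;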
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some VT-d hardware implementations invalidate all DMA remapping hardware
translation caches as part of the SRTP flow. The VT-d spec adds an ESRTPS
(Enhanced Set Root Table Pointer Support, section 11.4.2 in VT-d spec)
capability bit to indicate this. With this bit set, software has no need
to issue the global invalidation request.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220919062523.3438951-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This renaming better describes that it is for the first-level page table
(a.k.a. first-stage page table since VT-d spec 3.4).
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220916071326.2223901-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Previously the PCI PASID and PRI capabilities were enabled in the iommu
device probe path only if INTEL_IOMMU_SVM is configured and the device
supports ATS. As we've already decoupled the I/O page fault handler from
SVA, we can also decouple PASID and PRI enabling from it to make room
for growth of new features like kernel DMA with PASID, SIOV and nested
translation.
At the same time, the iommu_enable_dev_iotlb() helper is also called in
the iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) path, which is
unnecessary and duplicated. Clean up this helper to keep the code neat.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220915085814.2261409-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This reverts commit 9cd4f14344.
Some issues were reported on the original commit. Some thunderbolt devices
don't work anymore due to the following DMA fault.
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [09:00.0] fault index 0x8080
[fault reason 0x25]
Blocked a compatibility format interrupt request
Bring it back for now to avoid functional regression.
Fixes: 9cd4f14344 ("iommu/vt-d: Fix possible recursive locking in intel_iommu_init()")
Link: https://lore.kernel.org/linux-iommu/485A6EA5-6D58-42EA-B298-8571E97422DE@getmailspring.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216497
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: <stable@vger.kernel.org> # 5.19.x
Reported-and-tested-by: George Hilliard <thirtythreeforty@gmail.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220920081701.3453504-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.
The dmar_global_lock used in the intel_iommu_init() might cause recursive
locking issue, for example, intel_iommu_get_resv_regions() is taking the
dmar_global_lock from within a section where intel_iommu_init() already
holds it via probe_acpi_namespace_devices().
The use of dmar_global_lock in intel_iommu_init() can be relaxed since it is
unlikely that any IO board must be hot-added before the IOMMU subsystem is
initialized. This eliminates the possible recursive locking issue by moving
DMAR hotplug support down to after the IOMMU is initialized and removing the
uses of dmar_global_lock in intel_iommu_init().
Fixes: d5692d4af0 ("iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/894db0ccae854b35c73814485569b634237b5538.1657034828.git.robin.murphy@arm.com
Link: https://lore.kernel.org/r/20220718235325.3952426-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With CONFIG_INTEL_IOMMU_DEBUGFS enabled, the lockdep splat below is seen
when an I/O fault occurs on a machine with an Intel IOMMU in it.
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write NO_PASID] Request device [00:1a.0] fault addr 0x0
[fault reason 0x05] PTE Write access is not set
DMAR: Dump dmar0 table entries for IOVA 0x0
DMAR: root entry: 0x0000000127f42001
DMAR: context entry: hi 0x0000000000001502, low 0x000000012d8ab001
================================
WARNING: inconsistent lock state
5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
rngd/1006 [HC1[1]:SC0[0]:HE0:SE1] takes:
ff177021416f2d78 (&k->k_lock){?.+.}-{2:2}, at: klist_next+0x1b/0x160
{HARDIRQ-ON-W} state was registered at:
lock_acquire+0xce/0x2d0
_raw_spin_lock+0x33/0x80
klist_add_tail+0x46/0x80
bus_add_device+0xee/0x150
device_add+0x39d/0x9a0
add_memory_block+0x108/0x1d0
memory_dev_init+0xe1/0x117
driver_init+0x43/0x4d
kernel_init_freeable+0x1c2/0x2cc
kernel_init+0x16/0x140
ret_from_fork+0x1f/0x30
irq event stamp: 7812
hardirqs last enabled at (7811): [<ffffffff85000e86>] asm_sysvec_apic_timer_interrupt+0x16/0x20
hardirqs last disabled at (7812): [<ffffffff84f16894>] irqentry_enter+0x54/0x60
softirqs last enabled at (7794): [<ffffffff840ff669>] __irq_exit_rcu+0xf9/0x170
softirqs last disabled at (7787): [<ffffffff840ff669>] __irq_exit_rcu+0xf9/0x170
The klist iterator functions use spin_*lock_irq*() while the klist
insertion functions use spin_*lock(). Combined with the Intel DMAR
IOMMU driver iterating over klists from atomic (hardirq) context, where
pci_get_domain_bus_and_slot() calls into bus_find_device() which iterates
over klists, this leads to the inconsistent lock state above.
As there is currently no plan to make klists safe for use in atomic
context, fix the lockdep splat by avoiding the call to
pci_get_domain_bus_and_slot() in hardirq context.
Fixes: 8ac0b64b97 ("iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()")
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Link: https://lore.kernel.org/linux-iommu/Yvo2dfpEh%2FWC+Wrr@wantstofly.org/
Link: https://lore.kernel.org/linux-iommu/YvyBdPwrTuHHbn5X@wantstofly.org/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220819015949.4795-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The per-domain spinlock is acquired in iommu_flush_dev_iotlb(), which
may be called in interrupt context. For example, the drm-intel CI
system got completely blocked with the error below:
WARNING: inconsistent lock state
6.0.0-rc1-CI_DRM_11990-g6590d43d39b9+ #1 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff88810440d678 (&domain->lock){+.?.}-{2:2}, at: iommu_flush_dev_iotlb.part.61+0x23/0x80
{SOFTIRQ-ON-W} state was registered at:
lock_acquire+0xd3/0x310
_raw_spin_lock+0x2a/0x40
domain_update_iommu_cap+0x20b/0x2c0
intel_iommu_attach_device+0x5bd/0x860
__iommu_attach_device+0x18/0xe0
bus_iommu_probe+0x1f3/0x2d0
bus_set_iommu+0x82/0xd0
intel_iommu_init+0xe45/0x102a
pci_iommu_init+0x9/0x31
do_one_initcall+0x53/0x2f0
kernel_init_freeable+0x18f/0x1e1
kernel_init+0x11/0x120
ret_from_fork+0x1f/0x30
irq event stamp: 162354
hardirqs last enabled at (162354): [<ffffffff81b59274>] _raw_spin_unlock_irqrestore+0x54/0x70
hardirqs last disabled at (162353): [<ffffffff81b5901b>] _raw_spin_lock_irqsave+0x4b/0x50
softirqs last enabled at (162338): [<ffffffff81e00323>] __do_softirq+0x323/0x48e
softirqs last disabled at (162349): [<ffffffff810c1588>] irq_exit_rcu+0xb8/0xe0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&domain->lock);
<Interrupt>
lock(&domain->lock);
*** DEADLOCK ***
1 lock held by swapper/6/0:
This converts the spin_lock/unlock() calls into the irqsave/irqrestore
variants to fix the recursive locking issue.
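A minimal sketch of the converted locking, assuming the driver's per-domain
device list (the flush helper name is illustrative):

  /* Sketch: take the domain lock with interrupts disabled, since
   * iommu_flush_dev_iotlb() can run from (soft)irq context. */
  unsigned long flags;

  spin_lock_irqsave(&domain->lock, flags);
  list_for_each_entry(info, &domain->devices, link)
          __flush_dev_iotlb(info, addr, mask);          /* illustrative */
  spin_unlock_irqrestore(&domain->lock, flags);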
Fixes: ffd5869d93 ("iommu/vt-d: Replace spin_lock_irqsave() with spin_lock()")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20220817025650.3253959-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The Intel IOMMU driver possibly selects between the first-level and the
second-level translation tables for DMA address translation. However,
the levels of page-table walks for the 4KB base page size are calculated
from the SAGAW field of the capability register, which is only valid for
the second-level page table. This causes the IOMMU driver to stop working
if the hardware (or the emulated IOMMU) advertises only first-level
translation capability and reports the SAGAW field as 0.
This solves the above problem by considering both the first level and the
second level when calculating the supported page table levels.
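A hedged sketch of the idea (the capability accessor names are best-effort
approximations of the driver's):

  /* Sketch: derive supported widths from both translation types. */
  unsigned long fl_sagaw, sl_sagaw;

  fl_sagaw = BIT(2) | (cap_fl5lp_support(iommu->cap) ? BIT(3) : 0);
  sl_sagaw = cap_sagaw(iommu->cap);

  if (!sm_supported(iommu) || !ecap_flts(iommu->ecap))
          return sl_sagaw;               /* second-level only */
  if (!ecap_slts(iommu->ecap))
          return fl_sagaw;               /* first-level only */
  return fl_sagaw & sl_sagaw;            /* both supported */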
Fixes: b802d070a5 ("iommu/vt-d: Use iova over first level")
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220817023558.3253263-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The translation table copying code for kdump kernels is currently based
on the extended root/context entry formats of ECS mode defined in older
VT-d v2.5, and doesn't handle the scalable mode formats. This causes
the kexec capture kernel to fail booting with DMAR faults if the IOMMU
was enabled in scalable mode by the previous kernel.
The ECS mode has already been deprecated by the VT-d spec since v3.0 and
Intel IOMMU driver doesn't support this mode as there's no real hardware
implementation. Hence this converts ECS checking in copying table code
into scalable mode.
The existing copying code consumes a bit in the context entry as a mark
of a copied entry, and it needs to work for the old format as well as
for the extended context entries. Since it's hard to find such a common
bit for both legacy and scalable mode context entries, replace it with
a per-IOMMU bitmap.
Fixes: 7373a8cc38 ("iommu/vt-d: Setup context and enable RID2PASID support")
Cc: stable@vger.kernel.org
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Wen Jin <wen.jin@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220817011035.3250131-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Clean up the remaining trivial bus_set_iommu() callsites along
with the implementation. Now drivers only have to know and care
about iommu_device instances, phew!
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ea383d5f4d74ffe200ab61248e5de6e95846180a.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently we rely on registering all our instances before initially
allowing any .probe_device calls via bus_set_iommu(). In preparation for
phasing out the latter, make sure we won't inadvertently return success
for a device associated with a known but not yet registered instance,
otherwise we'll run straight into iommu_group_get_for_dev() trying to
use NULL ops.
That also highlights an issue with intel_iommu_get_resv_regions() taking
dmar_global_lock from within a section where intel_iommu_init() already
holds it, which already exists via probe_acpi_namespace_devices() when
an ANDD device is probed, but gets more obvious with the upcoming change
to iommu_device_register(). Since they are both read locks it manages
not to deadlock in practice, and a more in-depth rework of this locking
is underway, so no attempt is made to address it here.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/579f2692291bcbfc3ac64f7456fcff0d629af131.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With all callers now converted to the device-specific version, retire
the old bus-based interface, and give drivers the chance to indicate
accurate per-instance capabilities.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The g_iommus and g_num_of_iommus are not used anywhere. Remove them to
avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The Intel IOMMU hot-add process starts from dmar_device_hotplug(). It
uses the global dmar_global_lock to synchronize all the hot-add and
hot-remove paths. In the hot-add path, the new IOMMU data structures
are first allocated by dmar_parse_one_drhd() and then initialized by
dmar_hp_add_drhd(). All the IOMMU units are allocated and initialized
in the same synchronized path, so there is no case where an IOMMU unit
is created and then initialized multiple times.
This removes the unnecessary check in intel_iommu_add(), which is the
last place that references the global IOMMU array.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When a DMA domain is attached to a device, it needs to allocate a domain
ID from its IOMMU. Currently, the domain ID information is stored in two
static arrays embedded in the domain structure. This can lead to memory
waste when the driver is running on a small platform.
This optimizes these static arrays by replacing them with an xarray and
consuming memory on demand.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It is not used anywhere. Remove it to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Using a global device_domain_lock spinlock to protect per-domain device
tracking lists is inefficient, especially considering that this lock is
also needed in the hot paths. This optimizes the locking mechanism by
converting the global lock to a per-domain lock.
On the other hand, as the device tracking lists are never accessed in
any interrupt context, there is no need to disable interrupts while
spinning. Replace the irqsave variants with plain spinlock calls.
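A small sketch of the resulting pattern (field names per the driver after
this change):

  /* Sketch: the device list is guarded by the domain's own lock, and a
   * plain spin_lock() suffices since it is never taken from IRQ context. */
  spin_lock(&domain->lock);
  list_add(&info->link, &domain->devices);
  spin_unlock(&domain->lock);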
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The device_domain_lock is used to protect the device tracking list of
a domain. Remove unnecessary spin_lock/unlock()'s and move the necessary
ones around the list access.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Fold __dmar_remove_one_dev_info() into dmar_remove_one_dev_info() which
is its only caller. Make the spin lock critical range only cover the
device list change code and remove some unnecessary checks.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When the IOMMU domain is about to be freed, it should not be set on any
device. Instead of silently dealing with some bug cases, it's better to
trigger a warning so that any potential bugs can be reported and fixed early.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu->lock is used to protect the per-IOMMU pasid directory table
and pasid table. Move the spinlock acquisition/release into the helpers
to make the code self-contained.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu->lock is used to protect the per-IOMMU domain ID resource.
Moving the lock into the ID alloc/free helpers makes the code more
compact. At the same time, since the device_domain_lock is irrelevant
to the domain ID resource, remove its assertion as well.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu->lock is used to protect changes in root/context/pasid tables
and domain ID allocation. There's no use case to change these resources
in any interrupt context. Therefore, it is unnecessary to disable the
interrupts when the spinlock is held. The same thing happens on the
device_domain_lock side, which protects the device domain attachment
information. This replaces spin_lock/unlock_irqsave/irqrestore() calls
with the normal spin_lock/unlock().
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The IOMMU root table is allocated and freed in the IOMMU initialization
code in static boot or hot-remove paths. There's no need for a spinlock.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use pci_get_domain_bus_and_slot() instead of searching the global list
to retrieve the pci device pointer. This also removes the global
device_domain_list as there isn't any consumer anymore.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The disable_dmar_iommu() is called when IOMMU initialization fails or
the IOMMU is hot-removed from the system. In both cases, there is no
need to clear the IOMMU translation data structures for devices.
On the initialization path, the device probing only happens after the
IOMMU is initialized successfully, hence there're no translation data
structures.
On the hot-remove path, there is no real use case where the IOMMU is
hot-removed, but the devices that it manages are still alive in the
system. The translation data structures were torn down during device
release, hence there's no need to repeat it in IOMMU hot-remove path
either. This removes the unnecessary code and only leaves a check.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The domain_translation_struct debugfs node is used to dump the DMAR page
tables for the PCI devices. It potentially races with setting domains to
devices. The existing code uses the global spinlock device_domain_lock to
avoid the races.
This removes the use of device_domain_lock outside of iommu.c by replacing
it with the group mutex lock. Using the group mutex lock is cleaner and
more compatible with the following cleanups.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220706025524.2904370-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This header file is private to the Intel IOMMU driver. Move it to the
driver folder.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220514014322.2927339-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
tboot_force_iommu() is only called by the Intel IOMMU driver. Move the
helper into that driver. No functional change intended.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220514014322.2927339-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The exported symbol intel_iommu_gfx_mapped is not used anywhere in the
tree. Remove it to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220514014322.2927339-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All drivers that implement get_resv_regions just use
generic_put_resv_regions to implement the put side. Remove the
indirections and document the allocations constraints.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220708080616.238833-4-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The IOMMU driver shares the pasid table for PCI alias devices. When the
RID2PASID entry of the shared pasid table has been filled by the first
device, the subsequent device will encounter the "DMAR: Setup RID2PASID
failed" failure as the pasid entry has already been marked as present.
As a result, the IOMMU probing process will be aborted.
On the contrary, when any alias device is hot-removed from the system,
for example, by writing to /sys/bus/pci/devices/.../remove, the shared
RID2PASID will be cleared without any notification to other devices.
As a result, any DMAs from the remaining devices are blocked.
Sharing a pasid table among PCI alias devices could save two memory pages
for devices underneath the PCIe-to-PCI bridges. However, considering that
those devices are rare on modern platforms that support VT-d in scalable
mode and the saved memory is negligible, it's reasonable to remove this
immature part of the code to keep the driver feasible and stable.
Fixes: ef848b7e5a ("iommu/vt-d: Setup pasid entry for RID2PASID support")
Reported-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reported-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220623065720.727849-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220625133430.2200315-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
- Intel VT-d driver updates
- Domain force snooping improvement.
- Cleanups, no intentional functional changes.
- ARM SMMU driver updates
- Add new Qualcomm device-tree compatible strings
- Add new Nvidia device-tree compatible string for Tegra234
- Fix UAF in SMMUv3 shared virtual addressing code
- Force identity-mapped domains for users of ye olde SMMU
legacy binding
- Minor cleanups
- Patches to fix a BUG_ON in the vfio_iommu_group_notifier
- Groundwork for upcoming iommufd framework
- Introduction of DMA ownership so that an entire IOMMU group
is either controlled by the kernel or by user-space
- MT8195 and MT8186 support in the Mediatek IOMMU driver
- Patches to make forcing of cache-coherent DMA more coherent
between IOMMU drivers
- Fixes for thunderbolt device DMA protection
- Various smaller fixes and cleanups
Merge tag 'iommu-updates-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- Intel VT-d driver updates:
- Domain force snooping improvement.
- Cleanups, no intentional functional changes.
- ARM SMMU driver updates:
- Add new Qualcomm device-tree compatible strings
- Add new Nvidia device-tree compatible string for Tegra234
- Fix UAF in SMMUv3 shared virtual addressing code
- Force identity-mapped domains for users of ye olde SMMU legacy
binding
- Minor cleanups
- Fix a BUG_ON in the vfio_iommu_group_notifier:
- Groundwork for upcoming iommufd framework
- Introduction of DMA ownership so that an entire IOMMU group is
either controlled by the kernel or by user-space
- MT8195 and MT8186 support in the Mediatek IOMMU driver
- Make forcing of cache-coherent DMA more coherent between IOMMU
drivers
- Fixes for thunderbolt device DMA protection
- Various smaller fixes and cleanups
* tag 'iommu-updates-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (88 commits)
iommu/amd: Increase timeout waiting for GA log enablement
iommu/s390: Tolerate repeat attach_dev calls
iommu/vt-d: Remove hard coding PGSNP bit in PASID entries
iommu/vt-d: Remove domain_update_iommu_snooping()
iommu/vt-d: Check domain force_snooping against attached devices
iommu/vt-d: Block force-snoop domain attaching if no SC support
iommu/vt-d: Size Page Request Queue to avoid overflow condition
iommu/vt-d: Fold dmar_insert_one_dev_info() into its caller
iommu/vt-d: Change return type of dmar_insert_one_dev_info()
iommu/vt-d: Remove unneeded validity check on dev
iommu/dma: Explicitly sort PCI DMA windows
iommu/dma: Fix iova map result check bug
iommu/mediatek: Fix NULL pointer dereference when printing dev_name
iommu: iommu_group_claim_dma_owner() must always assign a domain
iommu/arm-smmu: Force identity domains for legacy binding
iommu/arm-smmu: Support Tegra234 SMMU
dt-bindings: arm-smmu: Add compatible for Tegra234 SOC
dt-bindings: arm-smmu: Document nvidia,memory-controller property
iommu/arm-smmu-qcom: Add SC8280XP support
dt-bindings: arm-smmu: Add compatible for Qualcomm SC8280XP
...
As domain->force_snooping only impacts the devices attached to the
domain, there's no need to check against all IOMMU units. On the other
hand, force_snooping can be set on a domain no matter whether it has
been attached or not, and once set it is an immutable flag. If no
device is attached, the operation always succeeds; the empty domain can
then only be attached to devices whose IOMMU supports snoop control.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220508123525.1973626-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220510023407.2759143-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In the attach_dev callback of the default domain ops, if the domain has
been set force_snooping, but the iommu hardware of the device does not
support SC(Snoop Control) capability, the callback should block it and
return a corresponding error code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220508123525.1973626-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220510023407.2759143-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
While the comment was correct that this flag was intended to convey the
block no-snoop support in the IOMMU, it has become widely implemented and
used to mean the IOMMU supports IOMMU_CACHE as a map flag. Only the Intel
driver was different.
Now that the Intel driver is using enforce_cache_coherency() update the
comment to make it clear that IOMMU_CAP_CACHE_COHERENCY is only about
IOMMU_CACHE. Fix the Intel driver to return true since IOMMU_CACHE always
works.
The two places that test this flag, usnic and vdpa, are both assigning
userspace pages to a driver controlled iommu_domain and require
IOMMU_CACHE behavior as they offer no way for userspace to synchronize
caches.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
IOMMU_CACHE means "normal DMA to this iommu_domain's IOVA should be cache
coherent" and is used by the DMA API. The definition allows for special
non-coherent DMA to exist - ie processing of the no-snoop flag in PCIe
TLPs - so long as this behavior is opt-in by the device driver.
The flag is mainly used by the DMA API to synchronize the IOMMU setting
with the expected cache behavior of the DMA master, e.g. based on
dev_is_dma_coherent() in some cases.
For Intel IOMMU IOMMU_CACHE was redefined to mean 'force all DMA to be
cache coherent' which has the practical effect of causing the IOMMU to
ignore the no-snoop bit in a PCIe TLP.
x86 platforms are always IOMMU_CACHE, so Intel should ignore this flag.
Instead use the new domain op enforce_cache_coherency() which causes every
IOPTE created in the domain to have the no-snoop blocking behavior.
Reconfigure VFIO to always use IOMMU_CACHE and call
enforce_cache_coherency() to operate the special Intel behavior.
Remove the IOMMU_CACHE test from Intel IOMMU.
Ultimately VFIO plumbs the result of enforce_cache_coherency() back into
the x86 platform code through kvm_arch_register_noncoherent_dma() which
controls if the WBINVD instruction is available in the guest. No other
archs implement kvm_arch_register_noncoherent_dma() nor are there any
other known consumers of VFIO_DMA_CC_IOMMU that might be affected by the
user visible result change on non-x86 archs.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and
IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.
Currently only Intel and AMD IOMMUs are known to support this
feature. They both implement it as an IOPTE bit, that when set, will cause
PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though
the no-snoop bit was clear.
The new API is triggered by calling enforce_cache_coherency() before
mapping any IOVA to the domain which globally switches on no-snoop
blocking. This allows other implementations that might block no-snoop
globally and outside the IOPTE - AMD also documents such a HW capability.
Leave AMD out of sync with Intel and have it block no-snoop even for
in-kernel users. This can be trivially resolved in a follow up patch.
Only VFIO needs to call this API because it does not have detailed control
over the device to avoid requesting no-snoop behavior at the device
level. Other places using domains with real kernel drivers should simply
avoid asking their devices to set the no-snoop bit.
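A hedged sketch of how a VFIO-style consumer opts in before creating any
mappings (the surrounding bookkeeping is illustrative):

  /* Sketch: switch on no-snoop blocking before any IOVA is mapped. */
  bool enforced = false;

  if (domain->ops->enforce_cache_coherency)
          enforced = domain->ops->enforce_cache_coherency(domain);
  /* 'enforced' then feeds the VFIO_DMA_CC_IOMMU / noncoherent-DMA logic. */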
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Calculate the appropriate mask for non-size-aligned page selective
invalidation. Since psi uses the mask value to mask out the lower order
bits of the target address, properly flushing the iotlb requires using a
mask value such that [pfn, pfn+pages) all lie within the flushed
size-aligned region. This is not normally an issue because iova.c
always allocates iovas that are aligned to their size. However, iovas
which come from other sources (e.g. userspace via VFIO) may not be
aligned.
To properly flush the IOTLB, both the start and end pfns need to be
equal after applying the mask. That means that the most efficient mask
to use is the index of the lowest bit that is equal where all higher
bits are also equal. For example, if pfn=0x17f and pages=3, then
end_pfn=0x181, so the smallest mask we can use is 8. Any differences
above the highest bit of pages are due to carrying, so by xnor'ing pfn
and end_pfn and then masking out the lower order bits based on pages, we
get 0xffffff00, where the first set bit is the mask we want to use.
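A self-contained sketch of that computation (roundup_pow_of_two() and
__ffs() as in the kernel headers; the function name is illustrative):

  /* Sketch: smallest size-aligned mask covering [pfn, pfn + pages). */
  static unsigned int psi_mask(unsigned long pfn, unsigned long pages)
  {
          unsigned long end_pfn = pfn + pages - 1;
          unsigned long shared = ~(pfn ^ end_pfn);     /* xnor: equal bits */

          /* Ignore differences below the (rounded-up) request size. */
          shared &= ~(roundup_pow_of_two(pages) - 1);

          /* First set bit is the order of the aligned region to flush,
           * e.g. pfn=0x17f, pages=3 -> shared=...ffffff00 -> mask=8. */
          return __ffs(shared);
  }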
Fixes: 6fe1010d6d ("vfio/type1: DMA unmap chunking")
Cc: stable@vger.kernel.org
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220401022430.1262215-1-stevensd@google.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220410013533.3959168-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
VT-d's dmar_platform_optin() actually represents a combination of
properties fairly well standardised by Microsoft as "Pre-boot DMA
Protection" and "Kernel DMA Protection"[1]. As such, we can provide
interested consumers with an abstracted capability rather than
driver-specific interfaces that won't scale. We name it for the former
aspect since that's what external callers are most likely to be
interested in; the latter is for the IOMMU layer to handle itself.
[1] https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/d6218dff2702472da80db6aec2c9589010684551.1650878781.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Sync up with v5.18-rc1, in particular to get 5e3094cfd9
("drm/i915/xehpsdv: Add has_flat_ccs to device info").
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Starting from Intel VT-d v3.2, the Intel platform BIOS can provide an
additional SATC table structure. The SATC table includes a list of SoC
integrated devices that support ATC (Address Translation Cache).
Enabling ATC (via the ATS capability) can be a functional requirement for
SATC device operation or optional to enhance device performance and
functionality. This is determined by the ATC_REQUIRED bit in the SATC
table. When the IOMMU is working in scalable mode, software chooses to
always enable ATS for every device in the SATC table because Intel SoC
devices listed there are trusted to use ATS.
On the other hand, if the IOMMU is in legacy mode, ATS of SATC-capable
devices can work transparently to software and be automatically enabled
by the IOMMU hardware. As a result, there is no need for software to
enable ATS on these devices.
This also removes dmar_find_matched_atsr_unit() helper as it becomes dead
code now.
Signed-off-by: Yian Chen <yian.chen@intel.com>
Link: https://lore.kernel.org/r/20220222185416.1722611-1-yian.chen@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220301020159.633356-13-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Allocate and set the per-device iommu private data during iommu device
probe. This makes the per-device iommu private data always available
during iommu_probe_device() and iommu_release_device(). With this change,
the dummy DEFER_DEVICE_DOMAIN_INFO pointer can be removed. The wrappers
for getting the private data and domain are also cleaned up.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The VT-d spec requires (10.4.4 Global Command Register, TE
field) that:
Hardware implementations supporting DMA draining must drain
any in-flight DMA read/write requests queued within the
Root-Complex before completing the translation enable
command and reflecting the status of the command through
the TES field in the Global Status register.
Unfortunately, some integrated graphics devices fail to do
so after some kind of power state transition. As a result,
the system might get stuck in iommu_disable_translation(),
waiting for the completion of the TE transition.
This adds RPLS to a quirk list for those devices and skips
TE disabling if the quirk hits.
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4898
Tested-by: Raviteja Goud Talla <ravitejax.goud.talla@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302043256.191529-1-tejaskumarx.surendrakumar.upadhyay@intel.com
When enabling VMD and IOMMU scalable mode, the following kernel panic
call trace/kernel log is shown on the Eagle Stream platform (Sapphire Rapids
CPU) during boot:
pci 0000:59:00.5: Adding to iommu group 42
...
vmd 0000:59:00.5: PCI host bridge to bus 10000:80
pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:01.0: enabling Extended Tags
pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:03.0: enabling Extended Tags
pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
Workqueue: events work_for_cpu_fn
RIP: 0010:__list_add_valid.cold+0x26/0x3f
Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f
0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1
fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9
9e e8 8b b1 fe
RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
FS: 0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
intel_pasid_alloc_table+0x9c/0x1d0
dmar_insert_one_dev_info+0x423/0x540
? device_to_iommu+0x12d/0x2f0
intel_iommu_attach_device+0x116/0x290
__iommu_attach_device+0x1a/0x90
iommu_group_add_device+0x190/0x2c0
__iommu_probe_device+0x13e/0x250
iommu_probe_device+0x24/0x150
iommu_bus_notifier+0x69/0x90
blocking_notifier_call_chain+0x5a/0x80
device_add+0x3db/0x7b0
? arch_memremap_can_ram_remap+0x19/0x50
? memremap+0x75/0x140
pci_device_add+0x193/0x1d0
pci_scan_single_device+0xb9/0xf0
pci_scan_slot+0x4c/0x110
pci_scan_child_bus_extend+0x3a/0x290
vmd_enable_domain.constprop.0+0x63e/0x820
vmd_probe+0x163/0x190
local_pci_probe+0x42/0x80
work_for_cpu_fn+0x13/0x20
process_one_work+0x1e2/0x3b0
worker_thread+0x1c4/0x3a0
? rescuer_thread+0x370/0x370
kthread+0xc7/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
...
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
The following 'lspci' output shows devices '10000:80:*' are subdevices of
the VMD device 0000:59:00.5:
$ lspci
...
0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
...
10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
The symptom 'list_add double add' is caused by the following failure
message:
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
Device 10000:80:01.0 is the subdevice of the VMD device 0000:59:00.5,
so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD
device 0000:59:00.5. Here is the call path:
intel_pasid_alloc_table
pci_for_each_dma_alias
get_alias_pasid_table
search_pasid_table
pci_real_dma_dev() in pci_for_each_dma_alias() gets the real DMA device,
which is the VMD device 0000:59:00.5. However, the pasid entry of the
VMD device 0000:59:00.5 had already been configured when the message
"pci 0000:59:00.5: Adding to iommu group 42" was printed. So, the status
-EBUSY is returned when configuring the pasid entry for device
10000:80:01.0.
It then invokes dmar_remove_one_dev_info() to release
'struct device_domain_info *' from iommu_devinfo_cache. But, the pasid
table is not released because of the following statement in
__dmar_remove_one_dev_info():
if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
...
intel_pasid_free_table(info->dev);
}
The subsequent dmar_insert_one_dev_info() operation of device
10000:80:03.0 allocates 'struct device_domain_info *' from
iommu_devinfo_cache. The allocated address is the same address that
was released previously for device 10000:80:01.0. Finally, invoking
device_attach_pasid_table() causes the issue.
`git bisect` points to the offending commit 474dd1c650 ("iommu/vt-d:
Fix clearing real DMA device's scalable-mode context entries"), which
releases the pasid table if the device is not the subdevice by
checking the returned status of dev_is_real_dma_subdevice().
Reverting the offending commit can work around the issue.
The solution is to avoid allocating a pasid table for devices that are
subdevices of the VMD device.
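Roughly, the fix amounts to a check like the one below in the pasid table
setup path (the helpers exist in the driver; the placement is a sketch):

  /* Sketch: only the real DMA device owns a pasid table; VMD subdevices
   * reach it via pci_real_dma_dev(), so skip allocation for them. */
  if (!dev_is_real_dma_subdevice(dev))
          intel_pasid_alloc_table(dev);   /* error handling omitted in sketch */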
Fixes: 474dd1c650 ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/20220216091307.703-1-adrianhuang0701@gmail.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220221053348.262724-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the domain specific operations out of struct iommu_ops into a new
structure that only has domain specific operations. This solves the
problem of needing to know if the method vector for a given operation
needs to be retrieved from the device or the domain. Logically the domain
ops are the ones that make sense for external subsystems and endpoint
drivers to use, while device ops, with the sole exception of domain_alloc,
are IOMMU API internals.
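For orientation, a trimmed sketch of the new structure's shape (only two
members shown; the full set mirrors the domain-scoped callbacks moved out
of struct iommu_ops):

  struct iommu_domain_ops {
          int  (*attach_dev)(struct iommu_domain *domain, struct device *dev);
          void (*free)(struct iommu_domain *domain);
          /* map_pages/unmap_pages, iotlb_sync, ... omitted here */
  };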
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The is_attach_deferred iommu_ops callback is a device op. The domain
argument is unnecessary and never used. Remove it to make code clean.
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The aux-domain related callbacks are not called in the tree. Remove them
to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The guest pasid related callbacks are not called in the tree. Remove them
to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
page->freelist is for the use of slab. We already have the ability
to free a list of pages in the core mm, but it requires the use of a
list_head and for the pages to be chained together through page->lru.
Switch the Intel IOMMU and IOVA code over to using free_pages_list().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[rm: split from original patch, cosmetic tweaks, fix fq entries]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/2115b560d9a0ce7cd4b948bd51a2b7bde8fdfd59.1639753638.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Remove the dma_to_mm_pfn() function, which is not used in the codebase.
This was pointed out by clang with the following warning:
drivers/iommu/intel/iommu.c:136:29: warning: unused function
'dma_to_mm_pfn' [-Wunused-function]
static inline unsigned long dma_to_mm_pfn(unsigned long dma_pfn)
^
https://lore.kernel.org/r/YYhY7GqlrcTZlzuA@fedora
Signed-off-by: Maíra Canal <maira.canal@usp.br>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211217083817.1745419-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The find.h APIs are designed to be used only on unsigned long arguments.
This can technically result in an over-read, but it is harmless in this
case. Regardless, fix it to avoid the warning seen under -Warray-bounds,
which we'd like to enable globally:
In file included from ./include/linux/bitmap.h:9,
from drivers/iommu/intel/iommu.c:17:
drivers/iommu/intel/iommu.c: In function 'domain_context_mapping_one':
./include/linux/find.h:119:37: warning: array subscript 'long unsigned int[0]' is partly outside array bounds of 'int[1]' [-Warray-bounds]
119 | unsigned long val = *addr & GENMASK(size - 1, 0);
| ^~~~~
drivers/iommu/intel/iommu.c:2115:18: note: while referencing 'max_pde'
2115 | int pds, max_pde;
| ^~~~~~~
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211215232432.2069605-1-keescook@chromium.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When supporting only the .map and .unmap callbacks of iommu_ops,
the IOMMU driver can make assumptions about the size and alignment
used for mappings based on the driver provided pgsize_bitmap. VT-d
previously used essentially PAGE_MASK for this bitmap as any power
of two mapping was acceptably filled by native page sizes.
However, with the .map_pages and .unmap_pages interface we're now
getting page-size and count arguments. If we simply combine these
as (page-size * count) and make use of the previous map/unmap
functions internally, any size and alignment assumptions are very
different.
As an example, a given vfio device assignment VM will often create
a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
does not support IOMMU super pages, the unmap_pages interface will
ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
will recurse down to level 2 of the page table where the first half
of the pfn range exactly matches the entire pte level. We clear the
pte, increment the pfn by the level size, but (oops) the next pte is
on a new page, so we exit the loop and pop back up a level. When we
then update the pfn based on that higher level, we seem to assume
that the previous pfn value was at the start of the level. In this
case the level size is 256K pfns, which we add to the base pfn and
get a result of 0x7fe00, which is clearly greater than 0x401ff,
so we're done. Meanwhile we never cleared the ptes for the remainder
of the range. When the VM remaps this range, we're overwriting valid
ptes and the VT-d driver complains loudly, as reported by the user
report linked below.
The fix for this seems relatively simple, if each iteration of the
loop in dma_pte_clear_level() is assumed to clear to the end of the
level pte page, then our next pfn should be calculated from level_pfn
rather than our working pfn.
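The essence of the fix, sketched with the names already used in that loop
(level_pfn and level_size()):

  /* Sketch: advance from the start of the level-aligned region just
   * handled, not from the (possibly unaligned) working pfn.
   *
   *   before:  pfn += level_size(level);
   *   after:   pfn = level_pfn + level_size(level);
   */
  pfn = level_pfn + level_size(level);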
Fixes: 3f34f12597 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omen
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The __domain_mapping() always removes the pages in the range from
'iov_pfn' to 'end_pfn', but the 'end_pfn' is always the last pfn
of the range that the caller wants to map.
This introduces a lot of duplicated removal and makes the map
operation take too long, for example:
Map iova=0x100000,nr_pages=0x7d61800
iov_pfn: 0x100000, end_pfn: 0x7e617ff
iov_pfn: 0x140000, end_pfn: 0x7e617ff
iov_pfn: 0x180000, end_pfn: 0x7e617ff
iov_pfn: 0x1c0000, end_pfn: 0x7e617ff
iov_pfn: 0x200000, end_pfn: 0x7e617ff
...
it takes about 50ms in total.
We can reduce the cost by recalculating 'end_pfn' and limiting it
to the boundary of the end of this pte page.
Map iova=0x100000,nr_pages=0x7d61800
iov_pfn: 0x100000, end_pfn: 0x13ffff
iov_pfn: 0x140000, end_pfn: 0x17ffff
iov_pfn: 0x180000, end_pfn: 0x1bffff
iov_pfn: 0x1c0000, end_pfn: 0x1fffff
iov_pfn: 0x200000, end_pfn: 0x23ffff
...
it only needs 9ms now.
This also removes a meaningless BUG_ON() in __domain_mapping().
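As a standalone illustration (not driver code) of the clamp, one pte page at
the 2M-superpage level holds 512 entries of 512 pfns each, i.e. 0x40000 pfns,
so end_pfn is limited to the end of the pte page containing iov_pfn:
  #include <stdio.h>

  static unsigned long clamp_end_pfn(unsigned long iov_pfn, unsigned long nr_pages)
  {
          unsigned long pte_page_pfns = 512UL * 512UL;   /* pfns covered by one pte page */
          unsigned long page_end = iov_pfn | (pte_page_pfns - 1);
          unsigned long req_end = iov_pfn + nr_pages - 1;

          return page_end < req_end ? page_end : req_end;
  }

  int main(void)
  {
          /* matches the "after" trace above: 0x100000/0x7d61800 -> 0x13ffff */
          printf("end_pfn = %#lx\n", clamp_end_pfn(0x100000, 0x7d61800));
          return 0;
  }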
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Tested-by: Liujunjie <liujunjie23@huawei.com>
Link: https://lore.kernel.org/r/20211008000433.1115-1-longpeng2@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The IOMMU VT-d implementation uses the first level for GPA->HPA translation
by default. Although both the first level and the second level could handle
the DMA translation, they're different in some way. For example, the second
level translation has separate controls for the Access/Dirty page tracking.
With the first level translation, there's no such control. On the other
hand, the second level translation has the page-level control for forcing
snoop, but the first level only has global control with pasid granularity.
This uses the second level for GPA->HPA translation so that we can provide
a consistent hardware interface for use cases like dirty page tracking for
live migration.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20211014053839.727419-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When a dmar translation fault happens, the kernel prints a single-line
fault reason with the corresponding hexadecimal code defined in the Intel
VT-d specification.
Currently, when a user wants to debug the translation fault in detail,
debugfs is used to dump the dmar_translation_struct, which is not
available when the kernel fails to boot.
Dump the DMAR translation structure, pagewalk the IO page table and print
the page table entry when the fault happens.
This takes effect only when CONFIG_DMAR_DEBUG is enabled.
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Link: https://lore.kernel.org/r/20210815203845.31287-1-kyung.min.park@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Handling of the intel_iommu kernel command line option should return
"true" to indicate the option is valid, so the core parsing code does not
log it as unknown.
Also log unknown sub-options at the notice level to let the user know of
potential typos or similar.
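An abbreviated sketch of what the __setup handler looks like after such a
change (most sub-option handling elided; dmar_disabled and pr_notice()
follow the driver, the rest is illustrative):
  static int __init intel_iommu_setup(char *str)
  {
          if (!str)
                  return -EINVAL;

          while (*str) {
                  if (!strncmp(str, "on", 2)) {
                          dmar_disabled = 0;
                  } else {
                          /* handling of "off", "sm_on", ... elided; anything else: */
                          pr_notice("Unknown option - '%s'\n", str);
                  }

                  str += strcspn(str, ",");
                  while (*str == ',')
                          str++;
          }

          return 1;       /* tell the core parser the option was handled */
  }
  __setup("intel_iommu=", intel_iommu_setup);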
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://lore.kernel.org/r/20210831112947.310080-1-tvrtko.ursulin@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The minimum per-IOMMU PRQ queue size is one 4K page, which already holds
more entries than the hardcoded limit of 32 in the current VT-d code. Some
devices can support up to 512 outstanding PRQs but are underutilized by
this limit of 32. Although 32 gives some rough fairness when multiple
devices share the same IOMMU PRQ queue, it is far from optimal for
customized use cases. This extends the per-IOMMU PRQ queue size to four 4K
pages and lets the devices have as many outstanding page requests as they
can.
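A sketch of the corresponding sizing constants, assuming a 32-byte page
request descriptor as defined by the VT-d specification (macro names follow
the driver):
  #define PRQ_ORDER       2       /* was 0: one 4K page; now four 4K pages */
  #define PRQ_RING_MASK   ((0x1000 << PRQ_ORDER) - 0x20) /* offset of the last descriptor */
  #define PRQ_DEPTH       ((0x1000 << PRQ_ORDER) >> 5)   /* 512 descriptors of 32 bytes */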
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We preset the access and dirty bits for IOVA over first level usage only
for kernel DMA (i.e., when the domain type is IOMMU_DOMAIN_DMA). We should
also preset the FL A/D bits for user space DMA usage. The idea is that the
A/D bit memory writes are unnecessary for user space DMA as well, so we
should avoid them to minimize the overhead.
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The commit 8950dcd83a ("iommu/vt-d: Leave scalable mode default off")
left scalable mode off by default, and end users could turn it on with
"intel_iommu=sm_on". Since then, using the Intel IOMMU scalable mode for
kernel DMA, user-level device access and Shared Virtual Address has been
enabled.
This enables scalable mode by default if the hardware advertises support
and adds the kernel options "intel_iommu=sm_on/sm_off" for end users to
configure it through the kernel parameters.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In preparation for the strict vs. non-strict decision for DMA domains
to be expressed in the domain type, make sure we expose our flush queue
awareness by accepting the new domain type, and test the specific
feature flag where we want to identify DMA domains in general. The DMA
ops reset/setup can simply be made unconditional, since iommu-dma
already knows only to touch DMA domains.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/31a8ef868d593a2f3826a6a120edee81815375a7.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
As the Intel VT-d driver has switched to use the iommu_ops.map_pages()
callback, multiple pages of the same size will be mapped in a call.
There's no need to put the clflush'es in iotlb_sync_map() callback.
Move them back into __domain_mapping() to simplify the code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720020615.4144323-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Implement the map_pages() and unmap_pages() callback for the Intel IOMMU
driver to allow calls from iommu core to map and unmap multiple pages of
the same size in one call. With map/unmap_pages() implemented, the prior
map/unmap callbacks are deprecated.
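A sketch of how the map_pages() callback can wrap the existing single-range
path (based on the in-tree .map_pages op signature; unmap_pages() follows
the same pattern and the body is a sketch, not the verbatim patch):
  static int intel_iommu_map_pages(struct iommu_domain *domain,
                                   unsigned long iova, phys_addr_t paddr,
                                   size_t pgsize, size_t pgcount,
                                   int prot, gfp_t gfp, size_t *mapped)
  {
          unsigned long pgshift = __ffs(pgsize);
          size_t size = pgcount << pgshift;
          int ret;

          if (pgsize != SZ_4K && pgsize != SZ_2M && pgsize != SZ_1G)
                  return -EINVAL;

          if (!IS_ALIGNED(iova | paddr, pgsize))
                  return -EINVAL;

          ret = intel_iommu_map(domain, iova, paddr, size, prot, gfp);
          if (!ret && mapped)
                  *mapped = size;

          return ret;
  }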
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720020615.4144323-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The pgsize bitmap is used to advertise the page sizes our hardware supports
to the IOMMU core, which will then use this information to split physically
contiguous memory regions it is mapping into page sizes that we support.
Traditionally the IOMMU core just handed us the mappings directly, after
making sure the size is a power-of-two multiple of 4KiB and that the
mapping has natural alignment. To retain this behavior, we currently
advertise that we support all page sizes that are a power-of-two multiple
of 4KiB.
We are about to utilize the new IOMMU map/unmap_pages APIs, so we can now
change this to advertise the real page sizes we support.
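A sketch of the advertisement, assuming domain->iommu_superpage encodes the
supported super-page levels (1: 2MiB, 2: 2MiB and 1GiB):
  static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
  {
          unsigned long bitmap = 0;

          /*
           * A 1-level super page supports a page size of 2MiB, a 2-level
           * super page supports page sizes of both 2MiB and 1GiB.
           */
          if (domain->iommu_superpage == 1)
                  bitmap |= SZ_2M;
          else if (domain->iommu_superpage == 2)
                  bitmap |= SZ_2M | SZ_1G;

          return bitmap;
  }

  /* at domain initialisation: */
  domain->domain.pgsize_bitmap = SZ_4K | domain_super_pgsize_bitmap(domain);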
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720020615.4144323-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We now only ever set strict mode enabled in iommu_set_dma_strict(), so
just remove the argument.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-7-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set,
as is current behaviour.
Also delete global flag intel_iommu_strict:
- In intel_iommu_setup(), call iommu_set_dma_strict(true) directly. Also
remove the print, as iommu_subsys_init() prints the mode and we have
already marked this param as deprecated.
- For cap_caching_mode() check in intel_iommu_setup(), call
iommu_set_dma_strict(true) directly; also reword the accompanying print
with a level downgrade and add the missing '\n'.
- For Ironlake GPU, again call iommu_set_dma_strict(true) directly and
keep the accompanying print.
[jpg: Remove intel_iommu_strict]
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-5-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that the x86 drivers support iommu.strict, deprecate the custom
methods.
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-2-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The commit 2b0140c696 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
fixed the issue where, when a sub-device is removed, the context entry is
cleared for all aliases. But that commit didn't consider the PASID entry
and PASID table in VT-d scalable mode. This fix extends the coverage to
scalable mode.
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Fixes: 8038bdb855 ("iommu/vt-d: Only clear real DMA device's context entries")
Fixes: 2b0140c696 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Cc: stable@vger.kernel.org # v5.6+
Cc: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210712071712.3416949-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This fixes a bug in the context cache clear operation. The code was not
following the correct invalidation flow: a global device TLB invalidation
should be added after the IOTLB invalidation. In addition, it used the
domain ID from the context entry, but in scalable mode the domain ID is in
the PASID table entry, not the context entry.
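An abbreviated sketch of the corrected flow in the context-clear path (the
scalable-mode domain-ID lookup from the PASID table is elided and did_old is
assumed to already hold the right domain ID; helper names follow the
driver):
  context_clear_entry(context);
  __iommu_flush_cache(iommu, context, sizeof(*context));

  iommu->flush.flush_context(iommu, did_old,
                             (((u16)bus) << 8) | devfn,
                             DMA_CCMD_MASK_NOBIT, DMA_CCMD_DEVICE_INVL);
  iommu->flush.flush_iotlb(iommu, did_old, 0, 0, DMA_TLB_DSI_FLUSH);
  /* the global device TLB invalidation added by this fix: */
  __iommu_flush_dev_iotlb(info, 0, MAX_AGAW_PFN_WIDTH);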
Fixes: 7373a8cc38 ("iommu/vt-d: Setup context and enable RID2PASID support")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210712071315.3416543-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
virtual platforms, but isn't currently possible. The overflow check in
iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
a limit address instead of a size, so callers don't have to fake a size
to work around the check.
The base and limit parameters are being phased out, because:
* they are redundant for x86 callers. dma-iommu already reserves the
first page, and the upper limit is already in domain->geometry.
* they can now be obtained from dev->dma_range_map on Arm.
But removing them on Arm isn't completely straightforward, so it is left
for future work. As an intermediate step, simplify the x86 callers by
passing dummy limits.
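For example, an x86 call site then reduces to something like the following
(assuming a zero base and an unrestricted limit):
  iommu_setup_dma_ops(dev, 0, U64_MAX);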
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20210618152059.1194210-5-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The assignment of iommu from info->iommu occurs before info is
null-checked, leading to a potential null pointer dereference. Fix this by
assigning and null-checking iommu only after info has been null-checked.
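A minimal sketch of the reordering (the surrounding function is elided;
get_domain_info() and the iommu field follow the driver, the rest is
illustrative):
  struct device_domain_info *info = get_domain_info(dev);
  struct intel_iommu *iommu;

  if (!info)
          return -EINVAL;

  iommu = info->iommu;            /* only dereference info after the check */
  if (!iommu)
          return -EINVAL;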
Addresses-Coverity: ("Dereference before null check")
Fixes: 4c82b88696 ("iommu/vt-d: Allocate/register iopf queue for sva devices")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210611135024.32781-1-colin.king@canonical.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The DMAR domain uses a per-DMAR refcount, indexed by the iommu seq_id.
The older iommu_count is only incremented and decremented, but no
decisions are taken based on this refcount, so it is not of much use.
Hence, remove iommu_count and further simplify domain_detach_iommu()
by returning void.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com
Link: https://lore.kernel.org/r/20210610020115.1637656-21-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The current VT-d implementation supports nested translation only if all
underlying IOMMUs support the nested capability. This is unnecessary, as
the upper layer is allowed to create different containers and back them
with different types of IOMMU. The IOMMU driver only needs to guarantee
that devices attached to a nested mode iommu_domain support the nested
capability.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210517065701.5078-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210610020115.1637656-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When first-level page tables are used for IOVA translation, we use user
privilege by setting the U/S bit in the page table entry. This is to make
it consistent with the second-level translation, where U/S enforcement is
not available. Clear the SRE (Supervisor Request Enable) field in the
PASID table entry of RID2PASID so that requests with supervisor privilege
are blocked and treated as DMA remapping faults.
Fixes: b802d070a5 ("iommu/vt-d: Use iova over first level")
Suggested-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210512064426.3440915-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210519015027.108468-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rather than have separate opaque setter functions that are easy to
overlook and lead to repetitive boilerplate in drivers, let's pass the
relevant initialisation parameters directly to iommu_device_register().
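For example, a driver-side registration then becomes a single call carrying
the ops and the hardware device (a sketch of a VT-d style call site, error
handling elided):
  ret = iommu_device_register(&iommu->iommu, &intel_iommu_ops, iommu->dev);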
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ab001b87c533b6f4db71eb90db6f888953986c36.1617285386.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The translation caches may preserve obsolete data when the mapping size
is changed. The following sequence can reveal the problem with high
probability.
1.mmap(4GB,MAP_HUGETLB)
2.
while (1) {
(a) DMA MAP 0,0xa0000
(b) DMA UNMAP 0,0xa0000
(c) DMA MAP 0,0xc0000000
* DMA read of IOVA 0 may fail here (Not present)
* if the problem occurs.
(d) DMA UNMAP 0,0xc0000000
}
The page table(only focus on IOVA 0) after (a) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x21d200803 entry:0xffff89b3b0a72000
The page table after (b) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x0 entry:0xffff89b3b0a72000
The page table after (c) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x21d200883 entry:0xffff89b39cacb000 (*)
Because the PDE entry after (b) is present, it won't be
flushed even though the iommu driver flushes the cache on
unmap, so obsolete data may be preserved in the cache, which
would cause a wrong translation in the end.
However, we can see the PDE entry finally switches to a
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:
1. PDE: 0x1a30a72003
2. __domain_mapping
dma_pte_free_pagetable
Set the PDE entry to ZERO
Set the PDE entry to 0x21d200883
So we must flush the cache after the entry switches to ZERO
to avoid the obsolete info being preserved.
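An abbreviated sketch of the resulting ordering when switching a range to a
superpage (helper names follow the driver; the per-pte loop is elided and
this is an approximation, not the verbatim patch):
  /* tear down the old level, leaving the PDE entry ZERO ... */
  dma_pte_free_pagetable(domain, start_pfn, end_pfn, level + 1);
  /* ... flush the now-zero entry on every IOMMU of the domain ... */
  for_each_domain_iommu(i, domain)
          iommu_flush_iotlb_psi(g_iommus[i], domain, start_pfn,
                                end_pfn - start_pfn + 1, 0, 0);
  /* ... and only then install the large-page PTE (0x21d200883 above) */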
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Gonglei (Arei) <arei.gonglei@huawei.com>
Fixes: 6491d4d028 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable@vger.kernel.org> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@huawei.com/
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210415004628.1779-1-longpeng2@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When the Intel IOMMU is operating in the scalable mode, some information
from the root and context table may be used to tag entries in the PASID
cache. Software should invalidate the PASID-cache when changing root or
context table entries.
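A sketch of the added invalidation when a copied context entry is changed
(did_old is the domain ID the stale entries were tagged with;
qi_flush_pasid_cache() follows the driver, the placement is illustrative):
  if (sm_supported(iommu))
          qi_flush_pasid_cache(iommu, did_old, QI_PC_ALL_PASIDS, 0);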
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Fixes: 7373a8cc38 ("iommu/vt-d: Setup context and enable RID2PASID support")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When the first-level page table is used for IOVA translation, it only
supports Read-Only and Read-Write permissions. The Write-Only permission
is not supported, as the PRESENT bit (implying Read permission) should
always be set. When using the second level, we still allow separate
permissions including Write-Only, which seems inconsistent and awkward.
We want consistent behavior: after moving to the first level, we don't
want things to work sometimes and then break if we use the second level
for the same mappings. Hence remove this configuration.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Fixes: b802d070a5 ("iommu/vt-d: Use iova over first level")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Instead, make the global iommu_dma_strict parameter in iommu.c canonical
by exporting helpers to get and set it, and use those directly in the
drivers. This makes sure that the iommu.strict parameter also works for
the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing,
a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.
[ported on top of the other iommu_attr changes and added a few small
missing bits]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210401155256.298656-19-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>