linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-01 06:39:05 +00:00

Author	SHA1	Message	Date
Rolf Eike Beer	85ef671f97	iommu: make inclusion of amd directory conditional Nothing in there is active if CONFIG_AMD_IOMMU is not enabled, so the whole directory can depend on that switch as well. Fixes: `cbe94c6e1a` ("iommu/amd: Move Kconfig and Makefile bits down into amd directory") Signed-off-by: Rolf Eike Beer <eb@emlix.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1894970.atdPhlSkOF@devpool92.emlix.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:46:46 +02:00
Rolf Eike Beer	ddcc66cfe8	iommu: make inclusion of intel directory conditional Nothing in there is active if CONFIG_INTEL_IOMMU is not enabled, so the whole directory can depend on that switch as well. Fixes: `ab65ba57e3` ("iommu/vt-d: Move Kconfig and Makefile bits down into intel directory") Signed-off-by: Rolf Eike Beer <eb@emlix.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/3818749.MHq7AAxBmi@devpool92.emlix.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:46:46 +02:00
Rolf Eike Beer	9548feff84	iommu: remove duplicate selection of DMAR_TABLE This is already done in intel/Kconfig. Fixes: `70bad345e6` ("iommu: Fix compilation without CONFIG_IOMMU_INTEL") Signed-off-by: Rolf Eike Beer <eb@emlix.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/2232605.Mh6RI2rZIc@devpool92.emlix.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:46:45 +02:00
Colin Ian King	04cfc1ae14	iommu/fsl_pamu: remove trailing space after \n There is an extraenous space after \n in a pr_debug message. Remove it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250430151853.923614-1-colin.i.king@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:39:04 +02:00
Lukas Wunner	3be5fa2366	Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices" Commit `991de2e590` ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") changed IRQ handling on PCI driver probing. It inadvertently broke resume from system sleep on AMD platforms: https://lore.kernel.org/r/20150926164651.GA3640@pd.tnic/ This was fixed by two independent commits: * `8affb487d4` ("x86/PCI: Don't alloc pcibios-irq when MSI is enabled") * `cbbc00be2c` ("iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices") The breaking change and one of these two fixes were subsequently reverted: * `fe25d07887` ("Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"") * `6c777e8799` ("Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"") This rendered the second fix unnecessary, so revert it as well. It used the match_driver flag in struct pci_dev, which is internal to the PCI core and not supposed to be touched by arbitrary drivers. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://patch.msgid.link/9a3ddff5cc49512044f963ba0904347bd404094d.1745572340.git.lukas@wunner.de	2025-05-15 13:40:46 +00:00
Dmitry Baryshkov	b3f3c493e9	iommu/arm-smmu-qcom: Add SAR2130P MDSS compatible Add the SAR2130P compatible to clients compatible list, the device require identity domain. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250418-sar2130p-display-v5-9-442c905cb3a4@oss.qualcomm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-06 13:57:09 +01:00
Qinxin Xia	be5a2d3f8f	iommu/arm-smmu-v3: Fix incorrect return in arm_smmu_attach_dev After commit `48e7b8e284` ("iommu/arm-smmu-v3: Remove arm_smmu_domain_finalise() during attach"), an error code is not returned on the attach path when the smmu does not match with the domain. This causes problems with VFIO because vfio_iommu_type1_attach_group() relies on this check to determine domain compatability. Re-instate the -EINVAL return value when the SMMU doesn't match on the device attach path. Fixes: `48e7b8e284` ("iommu/arm-smmu-v3: Remove arm_smmu_domain_finalise() during attach") Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com> Link: https://lore.kernel.org/r/20250422112951.2027969-1-xiaqinxin@huawei.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-06 13:45:49 +01:00
Leon Romanovsky	433a76207d	dma-mapping: Implement link/unlink ranges API Introduce new DMA APIs to perform DMA linkage of buffers in layers higher than DMA. In proposed API, the callers will perform the following steps. In map path: if (dma_can_use_iova(...)) dma_iova_alloc() for (page in range) dma_iova_link_next(...) dma_iova_sync(...) else /* Fallback to legacy map pages */ for (all pages) dma_map_page(...) In unmap path: if (dma_can_use_iova(...)) dma_iova_destroy() else for (all pages) dma_unmap_page(...) Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-05-06 08:36:53 +02:00
Christoph Hellwig	ed18a46262	iommu/dma: Factor out a iommu_dma_map_swiotlb helper Split the iommu logic from iommu_dma_map_page into a separate helper. This not only keeps the code neatly separated, but will also allow for reuse in another caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-05-06 08:36:53 +02:00
Leon Romanovsky	393cf700e6	dma-mapping: Provide an interface to allow allocate IOVA The existing .map_pages() callback provides both allocating of IOVA and linking DMA pages. That combination works great for most of the callers who use it in control paths, but is less effective in fast paths where there may be multiple calls to map_page(). These advanced callers already manage their data in some sort of database and can perform IOVA allocation in advance, leaving range linkage operation to be in fast path. Provide an interface to allocate/deallocate IOVA and next patch link/unlink DMA ranges to that specific IOVA. In the new API a DMA mapping transaction is identified by a struct dma_iova_state, which holds some recomputed information for the transaction which does not change for each page being mapped, so add a check if IOVA can be used for the specific transaction. The API is exported from dma-iommu as it is the only implementation supported, the namespace is clearly different from iommu_* functions which are not allowed to be used. This code layout allows us to save function call per API call used in datapath as well as a lot of boilerplate code. Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-05-06 08:36:53 +02:00
Leon Romanovsky	dc2e692943	iommu: add kernel-doc for iommu_unmap_fast Add kernel-doc section for iommu_unmap_fast to document existing limitation of underlying functions which can't split individual ranges. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-05-06 08:36:53 +02:00
Christoph Hellwig	5c87cffe2d	iommu: generalize the batched sync after map interface For the upcoming IOVA-based DMA API we want to batch the ops->iotlb_sync_map() call after mapping multiple IOVAs from dma-iommu without having a scatterlist. Improve the API. Add a wrapper for the map_sync as iommu_sync_map() so that callers don't need to poke into the methods directly. Formalize __iommu_map() into iommu_map_nosync() which requires the caller to call iommu_sync_map() after all maps are completed. Refactor the existing sanity checks from all the different layers into iommu_map_nosync(). Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Will Deacon <will@kernel.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-05-06 08:36:53 +02:00
Christoph Hellwig	ca2c2e4a78	dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h To support the upcoming non-scatterlist mapping helpers, we need to go back to have them called outside of the DMA API. Thus move them out of dma-map-ops.h, which is only for DMA API implementations to pci-p2pdma.h, which is for driver use. Note that the core helper is still not exported as the mapping is expected to be done only by very highlevel subsystem code at least for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-05-06 08:36:53 +02:00
Christoph Hellwig	a25e7962db	PCI/P2PDMA: Refactor the p2pdma mapping helpers The current scheme with a single helper to determine the P2P status and map a scatterlist segment force users to always use the map_sg helper to DMA map, which we're trying to get away from because they are very cache inefficient. Refactor the code so that there is a single helper that checks the P2P state for a page, including the result that it is not a P2P page to simplify the callers, and a second one to perform the address translation for a bus mapped P2P transfer that does not depend on the scatterlist structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-05-06 08:36:53 +02:00
Linus Torvalds	8164851725	IOMMU Fixes for Linux v6.15-rc4: Including: * ARM-SMMU fixes: - Fix broken detection of the S2FWB feature - Ensure page-size bitmap is initialised for SVA domains - Fix handling of SMMU client devices with duplicate Stream IDs - Don't fail SMMU probe if Stream IDs are aliased across clients * Intel VT-d fixes: - Add quirk for IGFX device - Revert an ATS change to fix a boot failure * AMD IOMMU: - Fix potential buffer overflow * Core: - Fix for iommu_copy_struct_from_user() -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmgUbnIACgkQK/BELZcB GuNZYw//c0mNpQw+Psx/IVDL5/G3e+6yF1ZxgvNrpD4Z2d9en99FkvzyOhMy8PNQ 013z6Do5MYEL+Z+Obkeh3fVZhPh7bNYsXLeZ2KtzP9f39WJzSW8oNJ1gC2knP6uf ZZsc90cj7m+F4mXOLXs9HNvsb+koOZZ8gKByrod4hfUJfJW+9KqBXeJYYGR83anK ALpTkaBF512GxsmWNvPqYM0PMUkmTX1at9Cd+gcqmHaqyEfkT5Wk1r85MWxBwgBo JbfaEASUBhcwRhBkJJvyncB17iRo/nn+tihl2FjxHRAv2qFPy4J0b56lmcXZMAIJ bTi9zNW5EUarJUfwAIppdbcx16OCvBhp6AU26esgrfNcPG2tdcgDkVshHKNjWHAW tkaA1usbx8zAoxmYNrx1lbrKnwIVyITmKVUGPn6MulDXmgdDEOIhHEzXE6mDYez5 R66Ge6aw1Of2PQIzEgKSE5Ya6OKWHhxswBfYN3oC5V31SGWvhvbDuOWblwua07US 6UtwUplSDG88BfKKc9joJN6hSF23IwPnYN8OgCHQVQXi//keRG7/5rNm7cLgRPId VJYwbi9OFmZRhFfmRItpBgtropV+nEIaq/fc7c1nQBNBr571iBrCKeoIIcqna0Q0 ZeGB9KmIksCxoxyZiVD+68sDCeCz8xHrqV8Ya4T5IhqXxL2A4Zc= =r0II -----END PGP SIGNATURE----- Merge tag 'iommu-fixes-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: "ARM-SMMU fixes: - Fix broken detection of the S2FWB feature - Ensure page-size bitmap is initialised for SVA domains - Fix handling of SMMU client devices with duplicate Stream IDs - Don't fail SMMU probe if Stream IDs are aliased across clients Intel VT-d fixes: - Add quirk for IGFX device - Revert an ATS change to fix a boot failure AMD IOMMU: - Fix potential buffer overflow Core: - Fix for iommu_copy_struct_from_user()" * tag 'iommu-fixes-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57) iommu/vt-d: Revert ATS timing change to fix boot failure iommu: Fix two issues in iommu_copy_struct_from_user() iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids iommu/arm-smmu-v3: Fix pgsize_bit for sva domains iommu/arm-smmu-v3: Add missing S2FWB feature detection	2025-05-02 08:57:00 -07:00
Sairaj Kodilkar	94a9921e2c	iommu/amd: Add support for HTRangeIgnore feature AMD IOMMU reserves the address range 0xfd00000000-0xffffffffff for the hypertransport protocol (HT) and has special meaning. Hence devices cannot use this address range for the DMA. However on some AMD platforms this HT range is shifted to the very top of the address space and new feature bit `HTRangeIgnore` is introduced. When this feature bit is on, IOMMU treats the GPA access to the legacy HT range as regular GPA access. Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250317055020.25214-1-sarunkod@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-02 08:38:09 +02:00
Sean Christopherson	94c721ea03	iommu/amd: Ensure GA log notifier callbacks finish running before module unload Synchronize RCU when unregistering KVM's GA log notifier to ensure all in-flight interrupt handlers complete before KVM-the module is unloaded. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250315031048.2374109-1-seanjc@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:40:20 +02:00
Jason Gunthorpe	e586e22974	iommu: Protect against overflow in iommu_pgsize() On a 32 bit system calling: iommu_map(0, 0x40000000) When using the AMD V1 page table type with a domain->pgsize of 0xfffff000 causes iommu_pgsize() to miscalculate a result of: size=0x40000000 count=2 count should be 1. This completely corrupts the mapping process. This is because the final test to adjust the pagesize malfunctions when the addition overflows. Use check_add_overflow() to prevent this. Fixes: `b1d99dc5f9` ("iommu: Hook up '->unmap_pages' driver callback") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-3ad28fc2e3a3+163327-iommu_overflow_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:33:30 +02:00
Robin Murphy	da33e87bd2	iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit `b46064a188` ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:31:18 +02:00
Lu Baolu	6f7340120a	iommu: Allow attaching static domains in iommu_attach_device_pasid() The idxd driver attaches the default domain to a PASID of the device to perform kernel DMA using that PASID. The domain is attached to the device's PASID through iommu_attach_device_pasid(), which checks if the domain->owner matches the iommu_ops retrieved from the device. If they do not match, it returns a failure. if (ops != domain->owner \|\| pasid == IOMMU_NO_PASID) return -EINVAL; The static identity domain implemented by the intel iommu driver doesn't specify the domain owner. Therefore, kernel DMA with PASID doesn't work for the idxd driver if the device translation mode is set to passthrough. Generally the owner field of static domains are not set because they are already part of iommu ops. Add a helper domain_iommu_ops_compatible() that checks if a domain is compatible with the device's iommu ops. This helper explicitly allows the static blocked and identity domains associated with the device's iommu_ops to be considered compatible. Fixes: `2031c469f8` ("iommu/vt-d: Add support for static identity domain") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220031 Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/20250422191554.GC1213339@ziepe.ca/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250424034123.2311362-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:23:35 +02:00
Arnd Bergmann	fa26198d30	iommu/io-pgtable-arm: dynamically allocate selftest device struct In general a 'struct device' is way too large to be put on the kernel stack. Apparently something just caused it to grow a slightly larger, which pushed the arm_lpae_do_selftests() function over the warning limit in some configurations: drivers/iommu/io-pgtable-arm.c:1423:19: error: stack frame size (1032) exceeds limit (1024) in 'arm_lpae_do_selftests' [-Werror,-Wframe-larger-than] 1423 \| static int __init arm_lpae_do_selftests(void) \| ^ Change the function to use a dynamically allocated faux_device instead of the on-stack device structure. Fixes: `ca25ec247a` ("iommu/io-pgtable-arm: Remove iommu_dev==NULL special case") Link: https://lore.kernel.org/all/ab75a444-22a1-47f5-b3c0-253660395b5a@arm.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20250423164826.2931382-1-arnd@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:21:18 +02:00
Arnd Bergmann	33647d0be3	iommu: ipmmu-vmsa: avoid Wformat-security warning iommu_device_sysfs_add() requires a constant format string, otherwise a W=1 build produces a warning: drivers/iommu/ipmmu-vmsa.c:1093:62: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security] 1093 \| ret = iommu_device_sysfs_add(&mmu->iommu, &pdev->dev, NULL, dev_name(&pdev->dev)); \| ^~~~~~~~~~~~~~~~~~~~ drivers/iommu/ipmmu-vmsa.c:1093:62: note: treat the string as an argument to avoid this 1093 \| ret = iommu_device_sysfs_add(&mmu->iommu, &pdev->dev, NULL, dev_name(&pdev->dev)); \| ^ \| "%s", This was an old bug but I saw it now because the code was changed as part of commit `d9d3cede41` ("iommu/ipmmu-vmsa: Register in a sensible order"). Fixes: `7af9a5fdb9` ("iommu/ipmmu-vmsa: Use iommu_device_sysfs_add()/remove()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250423164006.2661372-1-arnd@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:20:09 +02:00
Jason Gunthorpe	21c03574df	iommu: Hide ops.domain_alloc behind CONFIG_FSL_PAMU fsl_pamu is the last user of domain_alloc(), and it is using it to create something weird that doesn't really fit into the iommu subsystem architecture. It is a not a paging domain since it doesn't have any map/unmap ops. It may be some special kind of identity domain. For now just leave it as is. Wrap it's definition in CONFIG_FSL_PAMU to discourage any new drivers from attempting to use it. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/5-v4-ff5fb6b03bd1+288-iommu_virtio_domains_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:14:59 +02:00
Jason Gunthorpe	a4672d0fe1	iommu: Do not call domain_alloc() in iommu_sva_domain_alloc() No driver implements SVA under domain_alloc() anymore, this is dead code. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/4-v4-ff5fb6b03bd1+288-iommu_virtio_domains_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:14:59 +02:00
Jason Gunthorpe	07107e7444	iommu/virtio: Move to domain_alloc_paging() virtio has the complication that it sometimes wants to return a paging domain for IDENTITY which makes this conversion a little different than other drivers. Add a viommu_domain_alloc_paging() that combines viommu_domain_alloc() and viommu_domain_finalise() to always return a fully initialized and finalized paging domain. Use viommu_domain_alloc_identity() to implement the special non-bypass IDENTITY flow by calling viommu_domain_alloc_paging() then viommu_domain_map_identity(). Remove support for deferred finalize and the vdomain->mutex. Remove core support for domain_alloc() IDENTITY as virtio was the last driver using it. Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/3-v4-ff5fb6b03bd1+288-iommu_virtio_domains_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:14:58 +02:00
Jason Gunthorpe	0d609a1450	iommu: Add domain_alloc_identity() virtio-iommu has a mode where the IDENTITY domain is actually a paging domain with an identity mapping covering some of the system address space manually created. To support this add a new domain_alloc_identity() op that accepts the struct device so that virtio can allocate and fully finalize a paging domain to return. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/2-v4-ff5fb6b03bd1+288-iommu_virtio_domains_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:14:57 +02:00
Jason Gunthorpe	0d76a6edae	iommu/virtio: Break out bypass identity support into a global static To make way for a domain_alloc_paging conversion add the typical global static IDENTITY domain. This supports VMMs that have a VIRTIO_IOMMU_F_BYPASS_CONFIG config. If the VMM does not have support then the domain_alloc path is still used, which creates an IDENTITY domain out of a paging domain. Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/1-v4-ff5fb6b03bd1+288-iommu_virtio_domains_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:14:57 +02:00
Chen Ni	c6f0d53ebf	iommu: apple-dart: Remove unnecessary NULL check before free_io_pgtable_ops() free_io_pgtable_ops() checks for NULL pointers internally. Remove unneeded NULL check here. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20250422072511.1334243-1-nichen@iscas.ac.cn Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:11:26 +02:00
Lu Baolu	f984fb09e6	iommu: Remove iommu_dev_enable/disable_feature() No external drivers use these interfaces anymore. Furthermore, no existing iommu drivers implement anything in the callbacks. Remove them to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20250418080130.1844424-9-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:04:35 +02:00
Lu Baolu	be2a24322c	iommufd: Remove unnecessary IOMMU_DEV_FEAT_IOPF The iopf enablement has been moved to the iommu drivers. It is unnecessary for iommufd to handle iopf enablement. Remove the iopf enablement logic to avoid duplication. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250418080130.1844424-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:04:34 +02:00
Lu Baolu	c2fa4d4cce	iommufd/selftest: Put iopf enablement in domain attach path Update iopf enablement in the iommufd mock device driver to use the new method, similar to the arm-smmu-v3 driver. Enable iopf support when any domain with an iopf_handler is attached, and disable it when the domain is removed. Add a refcount in the mock device state structure to keep track of the number of domains set to the device and PASIDs that require iopf. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250418080130.1844424-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:04:33 +02:00
Lu Baolu	17fce9d233	iommu/vt-d: Put iopf enablement in domain attach path Update iopf enablement in the driver to use the new method, similar to the arm-smmu-v3 driver. Enable iopf support when any domain with an iopf_handler is attached, and disable it when the domain is removed. Place all the logic for controlling the PRI and iopf queue in the domain set/remove/replace paths. Keep track of the number of domains set to the device and PASIDs that require iopf. When the first domain requiring iopf is attached, add the device to the iopf queue and enable PRI. When the last domain is removed, remove it from the iopf queue and disable PRI. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250418080130.1844424-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:04:31 +02:00
Jason Gunthorpe	7c8896dd4a	iommu: Remove IOMMU_DEV_FEAT_SVA None of the drivers implement anything here anymore, remove the dead code. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250418080130.1844424-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:04:29 +02:00
Jason Gunthorpe	cfea71aea9	iommu/arm-smmu-v3: Put iopf enablement in the domain attach path SMMUv3 co-mingles FEAT_IOPF and FEAT_SVA behaviors so that fault reporting doesn't work unless both are enabled. This is not correct and causes problems for iommufd which does not enable FEAT_SVA for it's fault capable domains. These APIs are both obsolete, update SMMUv3 to use the new method like AMD implements. A driver should enable iopf support when a domain with an iopf_handler is attached, and disable iopf support when the domain is removed. Move the fault support logic to sva domain allocation and to domain attach, refusing to create or attach fault capable domains if the HW doesn't support it. Move all the logic for controlling the iopf queue under arm_smmu_attach_prepare(). Keep track of the number of domains on the master (over all the SSIDs) that require iopf. When the first domain requiring iopf is attached create the iopf queue, when the last domain is detached destroy it. Turn FEAT_IOPF and FEAT_SVA into no ops. Remove the sva_lock, this is all protected by the group mutex. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250418080130.1844424-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:04:28 +02:00
Mingcong Bai	2c8a7c66c9	iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57) On the Lenovo ThinkPad X201, when Intel VT-d is enabled in the BIOS, the kernel boots with errors related to DMAR, the graphical interface appeared quite choppy, and the system resets erratically within a minute after it booted: DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xb97ff000 [fault reason 0x05] PTE Write access is not set Upon comparing boot logs with VT-d on/off, I found that the Intel Calpella quirk (`quirk_calpella_no_shadow_gtt()') correctly applied the igfx IOMMU disable/quirk correctly: pci 0000:00:00.0: DMAR: BIOS has allocated no shadow GTT; disabling IOMMU for graphics Whereas with VT-d on, it went into the "else" branch, which then triggered the DMAR handling fault above: ... else if (!disable_igfx_iommu) { /* we have to ensure the gfx device is idle before we flush */ pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n"); iommu_set_dma_strict(); } Now, this is not exactly scientific, but moving 0x0044 to quirk_iommu_igfx seems to have fixed the aforementioned issue. Running a few `git blame' runs on the function, I have found that the quirk was originally introduced as a fix specific to ThinkPad X201: commit `9eecabcb9a` ("intel-iommu: Abort IOMMU setup for igfx if BIOS gave no shadow GTT space") Which was later revised twice to the "else" branch we saw above: - 2011: commit `6fbcfb3e46` ("intel-iommu: Workaround IOTLB hang on Ironlake GPU") - 2024: commit `ba00196ca4` ("iommu/vt-d: Decouple igfx_off from graphic identity mapping") I'm uncertain whether further testings on this particular laptops were done in 2011 and (honestly I'm not sure) 2024, but I would be happy to do some distro-specific testing if that's what would be required to verify this patch. P.S., I also see IDs 0x0040, 0x0062, and 0x006a listed under the same `quirk_calpella_no_shadow_gtt()' quirk, but I'm not sure how similar these chipsets are (if they share the same issue with VT-d or even, indeed, if this issue is specific to a bug in the Lenovo BIOS). With regards to 0x0062, it seems to be a Centrino wireless card, but not a chipset? I have also listed a couple (distro and kernel) bug reports below as references (some of them are from 7-8 years ago!), as they seem to be similar issue found on different Westmere/Ironlake, Haswell, and Broadwell hardware setups. Cc: stable@vger.kernel.org Fixes: `6fbcfb3e46` ("intel-iommu: Workaround IOTLB hang on Ironlake GPU") Fixes: `ba00196ca4` ("iommu/vt-d: Decouple igfx_off from graphic identity mapping") Link: https://groups.google.com/g/qubes-users/c/4NP4goUds2c?pli=1 Link: https://bugs.archlinux.org/task/65362 Link: https://bbs.archlinux.org/viewtopic.php?id=230323 Reported-by: Wenhao Sun <weiguangtwk@outlook.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=197029 Signed-off-by: Mingcong Bai <jeffbai@aosc.io> Link: https://lore.kernel.org/r/20250415133330.12528-1-jeffbai@aosc.io Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:00:58 +02:00
Sean Christopherson	aae251a380	iommu/amd: WARN if KVM attempts to set vCPU affinity without posted intrrupts WARN if KVM attempts to set vCPU affinity when posted interrupts aren't enabled, as KVM shouldn't try to enable posting when they're unsupported, and the IOMMU driver darn well should only advertise posting support when AMD_IOMMU_GUEST_IR_VAPIC() is true. Note, KVM consumes is_guest_mode only on success. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250404193923.1413163-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-04-24 09:52:31 -04:00
Sean Christopherson	07172206a2	iommu/amd: Return an error if vCPU affinity is set for non-vCPU IRTE Return -EINVAL instead of success if amd_ir_set_vcpu_affinity() is invoked without use_vapic; lying to KVM about whether or not the IRTE was configured to post IRQs is all kinds of bad. Fixes: `d98de49a53` ("iommu/amd: Enable vAPIC interrupt remapping mode by default") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250404193923.1413163-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-04-24 09:52:31 -04:00
Lu Baolu	4f1492efb4	iommu/vt-d: Revert ATS timing change to fix boot failure Commit <5518f239aff1> ("iommu/vt-d: Move scalable mode ATS enablement to probe path") changed the PCI ATS enablement logic to run earlier, specifically before the default domain attachment. On some client platforms, this change resulted in boot failures, causing the kernel to panic with the following message and call trace: Kernel panic - not syncing: DMAR hardware is malfunctioning CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #175 Call Trace: <TASK> dump_stack_lvl+0x6f/0xb0 dump_stack+0x10/0x16 panic+0x10a/0x2b7 iommu_enable_translation.cold+0xc/0xc intel_iommu_init+0xe39/0xec0 ? trace_hardirqs_on+0x1e/0xd0 ? __pfx_pci_iommu_init+0x10/0x10 pci_iommu_init+0xd/0x40 do_one_initcall+0x5b/0x390 kernel_init_freeable+0x26d/0x2b0 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x15/0x120 ret_from_fork+0x35/0x60 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1a/0x30 RIP: 1f0f:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0000:0000000000000000 EFLAGS: 841f0f2e66 ORIG_RAX: 1f0f2e6600000000 RAX: 0000000000000000 RBX: 1f0f2e6600000000 RCX: 2e66000000000084 RDX: 0000000000841f0f RSI: 000000841f0f2e66 RDI: 00841f0f2e660000 RBP: 00841f0f2e660000 R08: 00841f0f2e660000 R09: 000000841f0f2e66 R10: 0000000000841f0f R11: 2e66000000000084 R12: 000000841f0f2e66 R13: 0000000000841f0f R14: 2e66000000000084 R15: 1f0f2e6600000000 </TASK> ---[ end Kernel panic - not syncing: DMAR hardware is malfunctioning ]--- Fix this by reverting the timing change for ATS enablement introduced by the offending commit and restoring the previous behavior. Fixes: `5518f239af` ("iommu/vt-d: Move scalable mode ATS enablement to probe path") Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Closes: https://lore.kernel.org/linux-iommu/01b9c72f-460d-4f77-b696-54c6825babc9@linux.intel.com/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250416073608.1799578-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:45:28 +02:00
Matthew Rosato	81244074b5	iommu/s390: allow larger region tables Extend the aperture calculation to consider sizes beyond the maximum size of a region third table. Attempt to always use the smallest table size possible to avoid unnecessary extra steps during translation. Update reserved region calculations to use the appropriate table size. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20250411202433.181683-6-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:43:12 +02:00
Matthew Rosato	d5fbc5efbd	iommu/s390: support map/unmap for additional table regions Map and unmap ops use the shared dma_walk_cpu_trans routine, update this using the origin_type of the dma_table to determine how many table levels must be walked. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20250411202433.181683-5-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:43:12 +02:00
Matthew Rosato	1fe3f3cad5	iommu/s390: support iova_to_phys for additional table regions The origin_type of the dma_table is used to determine how many table levels must be traversed for the translation. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20250411202433.181683-4-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:43:11 +02:00
Matthew Rosato	a2392b8f1f	iommu/s390: support cleanup of additional table regions Extend the existing dma_cleanup_tables to also handle region second and region first tables. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20250411202433.181683-3-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:43:10 +02:00
Matthew Rosato	83c1aec210	iommu/s390: set appropriate IOTA region type When registering the I/O Translation Anchor, use the current table type stored in the s390_domain to set the appropriate region type indication. For the moment, the table type will always be stored as region third. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20250411202433.181683-2-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:43:10 +02:00
AngeloGioacchino Del Regno	f6a1e89ab6	iommu/mediatek: Add support for Dimensity 1200 MT6893 MM IOMMU Add support for the two MM IOMMUs found in the MediaTek Dimensity 1200 (MT6893) SoC, used for display, camera, imgsys and vpu. Since the MT6893 SoC is almost fully compatible with MT8192, also move the mt8192_larb_region_msk before the newly introduced mt6893 platform data, as the larb regions from mt8192 are fully reused. Note that the only difference with MT8192 is that MT6893 needs the SHARE_PGTABLE flag. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Yong Wu <yong.wu@mediatek.com> Link: https://lore.kernel.org/r/20250410144008.475888-3-angelogioacchino.delregno@collabora.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:41:11 +02:00
Pavel Paklov	8dee308e4c	iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid There is a string parsing logic error which can lead to an overflow of hid or uid buffers. Comparing ACPIID_LEN against a total string length doesn't take into account the lengths of individual hid and uid buffers so the check is insufficient in some cases. For example if the length of hid string is 4 and the length of the uid string is 260, the length of str will be equal to ACPIID_LEN + 1 but uid string will overflow uid buffer which size is 256. The same applies to the hid string with length 13 and uid string with length 250. Check the length of hid and uid strings separately to prevent buffer overflow. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `ca3bf5d47c` ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Cc: stable@vger.kernel.org Signed-off-by: Pavel Paklov <Pavel.Paklov@cyberprotect.ru> Link: https://lore.kernel.org/r/20250325092259.392844-1-Pavel.Paklov@cyberprotect.ru Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:37:21 +02:00
Robin Murphy	0da188c846	iommu: Split out and tidy up Arm Kconfig There are quite a lot of options for the Arm drivers, still all buried in the top-level Kconfig. For ease of use and consistency with all the other subdirectories, break these out into drivers/arm. For similar clarity and self-consistency, also tweak the ARM_SMMU sub-options to use "if" instead of "depends", to match ARM_SMMU_V3. Lastly also clean up the slightly messy description of ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT as highlighted by Geert - by now we really shouldn't need commentary on v4.x kernel behaviour anyway - and downgrade it to EXPERT as the first step in the 6-year-old threat to remove it entirely. Cc: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/a614ec86ba78c09cd16e348f633f6bb38793391f.1742480488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:34:00 +02:00
Robin Murphy	0c8e9c148e	iommu: Avoid introducing more races Although the lock-juggling is only a temporary workaround, we don't want it to make things avoidably worse. Jason was right to be nervous, since bus_iommu_probe() doesn't care which IOMMU instance it's probing for, so it probably is possible for one walk to finish a probe which a different walk started, thus we do want to check for that. Also there's no need to drop the lock just to have of_iommu_configure() do nothing when a fwspec already exists; check that directly and avoid opening a window at all in that (still somewhat likely) case. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/09d901ad11b3a410fbb6e27f7d04ad4609c3fe4a.1741706365.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:31:07 +02:00
Jason Gunthorpe	249d3327f0	iommu/vtd: Remove iommu_alloc_pages_node() Intel is the only thing that uses this now, convert to the size versions, trying to avoid PAGE_SHIFT. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/23-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:54 +02:00
Jason Gunthorpe	c3b42b6ffa	iommu/amd: Use iommu_alloc_pages_node_sz() for the IRT Use the actual size of the irq_table allocation, limiting to 128 due to the HW alignment needs. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/22-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:53 +02:00
Jason Gunthorpe	5087f663c2	iommu/pages: Remove iommu_alloc_page_node() Use iommu_alloc_pages_node_sz() instead. AMD and Intel are both using 4K pages for these structures since those drivers only work on 4K PAGE_SIZE. riscv is also spec'd to use SZ_4K. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/21-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:51 +02:00
Jason Gunthorpe	28024569e8	iommu/pages: Remove iommu_alloc_page/pages() A few small changes to the remaining drivers using these will allow them to be removed: - Exynos wants to allocate fixed 16K/8K allocations - Rockchip already has a define SPAGE_SIZE which is used by the dma_map immediately following, using SPAGE_ORDER which is a lg2size - tegra has size constants already for its two allocations Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:50 +02:00
Jason Gunthorpe	d50aaa4a9f	iommu: Update various drivers to pass in lg2sz instead of order to iommu pages Convert most of the places calling get_order() as an argument to the iommu-pages allocator into order_base_2() or the _sz flavour instead. These places already have an exact size, there is no particular reason to use order here. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/19-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:48 +02:00
Jason Gunthorpe	9dda3f01dd	iommu/riscv: Update to use iommu_alloc_pages_node_lg2() One part of RISCV already has a computed size, however the queue allocation must be aligned to 4k. The other objects are 4k by spec. Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/18-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:48 +02:00
Jason Gunthorpe	5faa04c4ed	iommu/amd: Use roundup_pow_two() instead of get_order() If x >= PAGE_SIZE then: 1 << (get_order(x) + PAGE_SHIFT) == roundup_pow_two() Inline this into the only caller, compute the size of the HW device table in terms of 4K pages which matches the HW definition. Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/17-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:47 +02:00
Jason Gunthorpe	e874c666b1	iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc() This is just CPU memory used by the driver to track things, it doesn't need to use iommu-pages. All of them are indexed by devid and devid is bounded by pci_seg->last_bdf or we are already out of bounds on the page allocation. Switch them to use some version of kvmalloc_array() and drop the now unused constants and remove the tbl_size() round up to PAGE_SIZE multiples logic. Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/16-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:47 +02:00
Jason Gunthorpe	b3efacc451	iommu/pages: Allow sub page sizes to be passed into the allocator Generally drivers have a specific idea what their HW structure size should be. In a lot of cases this is related to PAGE_SIZE, but not always. ARM64, for example, allows a 4K IO page table size on a 64K CPU page table system. Currently we don't have any good support for sub page allocations, but make the API accommodate this by accepting a sub page size from the caller and rounding up internally. This is done by moving away from order as the size input and using size: size == 1 << (order + PAGE_SHIFT) Following patches convert drivers away from using order and try to specify allocation sizes independent of PAGE_SIZE. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/15-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:46 +02:00
Jason Gunthorpe	580ccca4ee	iommu/pages: Move the __GFP_HIGHMEM checks into the common code The entire allocator API is built around using the kernel virtual address, it is illegal to pass GFP_HIGHMEM in as a GFP flag. Block it in the common code. Remove the duplicated checks from drivers. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/14-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:45 +02:00
Jason Gunthorpe	212fcf36c6	iommu/pages: Move from struct page to struct ioptdesc and folio This brings the iommu page table allocator into the modern world of having its own private page descriptor and not re-using fields from struct page for its own purpose. It follows the basic pattern of struct ptdesc which did this transformation for the CPU page table allocator. Currently iommu-pages is pretty basic so this isn't a huge benefit, however I see a coming need for features that CPU allocator has, like sub PAGE_SIZE allocations, and RCU freeing. This provides the base infrastructure to implement those cleanly. Remove numa_node_id() calls from the inlines and instead use NUMA_NO_NODE which will get switched to numa_mem_id(), which seems to be the right ID to use for memory allocations. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/13-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:44 +02:00
Jason Gunthorpe	27bc9f717f	iommu/pages: Remove iommu_put_pages_list_old and the _Generic Nothing uses the old list_head path now, remove it. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/12-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:43 +02:00
Jason Gunthorpe	868240c34e	iommu: Change iommu_iotlb_gather to use iommu_page_list This converts the remaining places using list of pages to the new API. The Intel free path was shared with its gather path, so it is converted at the same time. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/11-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:42 +02:00
Jason Gunthorpe	c70637cdd8	iommu/amd: Convert to use struct iommu_pages_list Change the internal freelist to use struct iommu_pages_list. AMD uses the freelist to batch free the entire table during domain destruction, and to replace table levels with leafs during map. Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/10-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:42 +02:00
Jason Gunthorpe	d4d5153ad6	iommu/riscv: Convert to use struct iommu_pages_list Change the internal freelist to use struct iommu_pages_list. riscv uses this page list to free page table levels that are replaced with leaf ptes. Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:41 +02:00
Jason Gunthorpe	13f43d7cf3	iommu/pages: Formalize the freelist API We want to get rid of struct page references outside the internal allocator implementation. The free list has the driver open code something like: list_add_tail(&virt_to_page(ptr)->lru, freelist); Move the above into a small inline and make the freelist into a wrapper type 'struct iommu_pages_list' so that the compiler can help check all the conversion. This struct has also proven helpful in some future ideas to convert to a singly linked list to get an extra pointer in the struct page, and to signal that the pages should be freed with RCU. Use a temporary _Generic so we don't need to rename the free function as the patches progress. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:40 +02:00
Jason Gunthorpe	f5af4a4f7c	iommu/pages: De-inline the substantial functions These are called in a lot of places and are not trivial. Move them to the core module. Tidy some of the comments and function arguments, fold __iommu_alloc_account() into its only caller, change __iommu_free_account() into __iommu_free_page() to remove some duplication. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:39 +02:00
Jason Gunthorpe	3e8e986ce8	iommu/pages: Remove iommu_free_page() Use iommu_free_pages() instead. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:36 +02:00
Jason Gunthorpe	4316ba4a50	iommu/pages: Remove the order argument to iommu_free_pages() Now that we have a folio under the allocation iommu_free_pages() can know the order of the original allocation and do the correct thing to free it. The next patch will rename iommu_free_page() to iommu_free_pages() so we have naming consistency with iommu_alloc_pages_node(). Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:33 +02:00
Jason Gunthorpe	c11a1a4792	iommu/pages: Make iommu_put_pages_list() work with high order allocations alloc_pages_node(, order) needs to be paired with __free_pages(, order) to free all the allocated pages. For order != 0 the return from alloc_pages_node() is just a page list, it hasn't been formed into a folio. However iommu_put_pages_list() just calls put_page() on the head page of an allocation, which will end up leaking the tail pages if order != 0. Fix this by using __GFP_COMP to create a high order folio and then always use put_page() to free the full high order folio. __iommu_free_account() can get the order of the allocation via folio_order(), which corrects the accounting of high order allocations in iommu_put_pages_list(). This is the same technique slub uses. As far as I can tell, none of the places using high order allocations are also using the free list, so this not a current bug. Fixes: `06c375053c` ("iommu/vt-d: add wrapper functions for page allocations") Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:33 +02:00
Jason Gunthorpe	8360c03dd9	iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages() These were only used by tegra-smmu and leaked the struct page out of the API. Delete them since tega-smmu has been converted to the other APIs. In the process flatten the call tree so we have fewer one line functions calling other one line functions.. iommu_alloc_pages_node() is the real allocator and everything else can just call it directly. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:32 +02:00
Jason Gunthorpe	a96969a915	iommu/tegra: Do not use struct page as the handle for pts Instead use the virtual address and dma_map_single() like as->pd uses. Introduce a small struct tegra_pt instead of void * to have some clarity what is using this API and add compile safety during the conversion. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:32 +02:00
Jason Gunthorpe	50568f87d1	iommu/terga: Do not use struct page as the handle for as->pd memory Instead use the virtual address. Change from dma_map_page() to dma_map_single() which works directly on a KVA. Add a type for the pd table level for clarity. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:22:31 +02:00
Robin Murphy	2d00c34d66	iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully We've never supported StreamID aliasing between devices, and as such they will never have had functioning DMA, but this is not fatal to the SMMU itself. Although aliasing between hard-wired platform device StreamIDs would tend to raise questions about the whole system, in practice it's far more likely to occur relatively innocently due to legacy PCI bridges, where the underlying StreamID mappings are still perfectly reasonable. As such, return a more benign -ENODEV when failing probe for such an unsupported device (and log a more obvious error message), so that it doesn't break the entire SMMU probe now that bus_iommu_probe() runs in the right order and can propagate that error back. The end result is still that the device doesn't get an IOMMU group and probably won't work, same as before. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/39d54e49c8476efc4653e352150d44b185d6d50f.1744380554.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-04-17 12:42:02 +01:00
Nicolin Chen	b00d24997a	iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids ASPEED VGA card has two built-in devices: 0008:06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 06) 0008:07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) Its toplogy looks like this: +-[0008:00]---00.0-[01-09]--+-00.0-[02-09]--+-00.0-[03]----00.0 Sandisk Corp Device 5017 \| +-01.0-[04]-- \| +-02.0-[05]----00.0 NVIDIA Corporation Device \| +-03.0-[06-07]----00.0-[07]----00.0 ASPEED Technology, Inc. ASPEED Graphics Family \| +-04.0-[08]----00.0 Renesas Technology Corp. uPD720201 USB 3.0 Host Controller \| \-05.0-[09]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller \-00.1 PMC-Sierra Inc. Device 4028 The IORT logic populaties two identical IDs into the fwspec->ids array via DMA aliasing in iort_pci_iommu_init() called by pci_for_each_dma_alias(). Though the SMMU driver had been able to handle this situation since commit `563b5cbe33` ("iommu/arm-smmu-v3: Cope with duplicated Stream IDs"), that got broken by the later commit `cdf315f907` ("iommu/arm-smmu-v3: Maintain a SID->device structure"), which ended up with allocating separate streams with the same stuffing. On a kernel prior to v6.15-rc1, there has been an overlooked warning: pci 0008:07:00.0: vgaarb: setting as boot VGA device pci 0008:07:00.0: vgaarb: bridge control possible pci 0008:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none pcieport 0008:06:00.0: Adding to iommu group 14 ast 0008:07:00.0: stream 67328 already in tree <===== WARNING ast 0008:07:00.0: enabling device (0002 -> 0003) ast 0008:07:00.0: Using default configuration ast 0008:07:00.0: AST 2600 detected ast 0008:07:00.0: [drm] Using analog VGA ast 0008:07:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16 [drm] Initialized ast 0.1.0 for 0008:07:00.0 on minor 0 ast 0008:07:00.0: [drm] fb0: astdrmfb frame buffer device With v6.15-rc, since the commit `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path"), the error returned with the warning is moved to the SMMU device probe flow: arm_smmu_probe_device+0x15c/0x4c0 __iommu_probe_device+0x150/0x4f8 probe_iommu_group+0x44/0x80 bus_for_each_dev+0x7c/0x100 bus_iommu_probe+0x48/0x1a8 iommu_device_register+0xb8/0x178 arm_smmu_device_probe+0x1350/0x1db0 which then fails the entire SMMU driver probe: pci 0008:06:00.0: Adding to iommu group 21 pci 0008:07:00.0: stream 67328 already in tree arm-smmu-v3 arm-smmu-v3.9.auto: Failed to register iommu arm-smmu-v3 arm-smmu-v3.9.auto: probe with driver arm-smmu-v3 failed with error -22 Since SMMU driver had been already expecting a potential duplicated Stream ID in arm_smmu_install_ste_for_dev(), change the arm_smmu_insert_master() routine to ignore a duplicated ID from the fwspec->sids array as well. Note: this has been failing the iommu_device_probe() since 2021, although a recent iommu commit in v6.15-rc1 that moves iommu_device_probe() started to fail the SMMU driver probe. Since nobody has cared about DMA Alias support, leave that as it was but fix the fundamental iommu_device_probe() breakage. Fixes: `cdf315f907` ("iommu/arm-smmu-v3: Maintain a SID->device structure") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20250415185620.504299-1-nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2025-04-17 12:37:58 +01:00
Balbir Singh	12f7802197	iommu/arm-smmu-v3: Fix pgsize_bit for sva domains UBSan caught a bug with IOMMU SVA domains, where the reported exponent value in __arm_smmu_tlb_inv_range() was >= 64. __arm_smmu_tlb_inv_range() uses the domain's pgsize_bitmap to compute the number of pages to invalidate and the invalidation range. Currently arm_smmu_sva_domain_alloc() does not setup the iommu domain's pgsize_bitmap. This leads to __ffs() on the value returning 64 and that leads to undefined behaviour w.r.t. shift operations Fix this by initializing the iommu_domain's pgsize_bitmap to PAGE_SIZE. Effectively the code needs to use the smallest page size for invalidation Cc: stable@vger.kernel.org Fixes: `eb6c97647b` ("iommu/arm-smmu-v3: Avoid constructing invalid range commands") Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250412002354.3071449-1-balbirs@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2025-04-17 12:28:42 +01:00
Aneesh Kumar K.V (Arm)	45e00e3671	iommu/arm-smmu-v3: Add missing S2FWB feature detection Commit `67e4fe3985` ("iommu/arm-smmu-v3: Use S2FWB for NESTED domains") introduced S2FWB usage but omitted the corresponding feature detection. As a result, vIOMMU allocation fails on FVP in arm_vsmmu_alloc(), due to the following check: if (!arm_smmu_master_canwbs(master) && !(smmu->features & ARM_SMMU_FEAT_S2FWB)) return ERR_PTR(-EOPNOTSUPP); This patch adds the missing detection logic to prevent allocation failure when S2FWB is supported. Fixes: `67e4fe3985` ("iommu/arm-smmu-v3: Use S2FWB for NESTED domains") Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250408033351.1012411-1-aneesh.kumar@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2025-04-17 12:25:34 +01:00
Nicolin Chen	767e22001d	iommu/tegra241-cmdqv: Fix warnings due to dmam_free_coherent() Two WARNINGs are observed when SMMU driver rolls back upon failure: arm-smmu-v3.9.auto: Failed to register iommu arm-smmu-v3.9.auto: probe with driver arm-smmu-v3 failed with error -22 ------------[ cut here ]------------ WARNING: CPU: 5 PID: 1 at kernel/dma/mapping.c:74 dmam_free_coherent+0xc0/0xd8 Call trace: dmam_free_coherent+0xc0/0xd8 (P) tegra241_vintf_free_lvcmdq+0x74/0x188 tegra241_cmdqv_remove_vintf+0x60/0x148 tegra241_cmdqv_remove+0x48/0xc8 arm_smmu_impl_remove+0x28/0x60 devm_action_release+0x1c/0x40 ------------[ cut here ]------------ 128 pages are still in use! WARNING: CPU: 16 PID: 1 at mm/page_alloc.c:6902 free_contig_range+0x18c/0x1c8 Call trace: free_contig_range+0x18c/0x1c8 (P) cma_release+0x154/0x2f0 dma_free_contiguous+0x38/0xa0 dma_direct_free+0x10c/0x248 dma_free_attrs+0x100/0x290 dmam_free_coherent+0x78/0xd8 tegra241_vintf_free_lvcmdq+0x74/0x160 tegra241_cmdqv_remove+0x98/0x198 arm_smmu_impl_remove+0x28/0x60 devm_action_release+0x1c/0x40 This is because the LVCMDQ queue memory are managed by devres, while that dmam_free_coherent() is called in the context of devm_action_release(). Jason pointed out that "arm_smmu_impl_probe() has mis-ordered the devres callbacks if ops->device_remove() is going to be manually freeing things that probe allocated": https://lore.kernel.org/linux-iommu/20250407174408.GB1722458@nvidia.com/ In fact, tegra241_cmdqv_init_structures() only allocates memory resources which means any failure that it generates would be similar to -ENOMEM, so there is no point in having that "falling back to standard SMMU" routine, as the standard SMMU would likely fail to allocate memory too. Remove the unwind part in tegra241_cmdqv_init_structures(), and return a proper error code to ask SMMU driver to call tegra241_cmdqv_remove() via impl_ops->device_remove(). Then, drop tegra241_vintf_free_lvcmdq() since devres will take care of that. Fixes: `483e0bd888` ("iommu/tegra241-cmdqv: Do not allocate vcmdq until dma_set_mask_and_coherent") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250407201908.172225-1-nicolinc@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 12:44:27 +02:00
Pei Xiao	ae4814a3aa	iommu: remove unneeded semicolon cocci warnings: drivers/iommu/dma-iommu.c:1788:2-3: Unneeded semicolon so remove unneeded semicolon to fix cocci warnings. Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/tencent_73EEE47E6ECCF538229C9B9E6A0272DA2B05@qq.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 12:43:06 +02:00
Louis-Alexis Eyraud	38e8844005	iommu/mediatek: Fix NULL pointer deference in mtk_iommu_device_group Currently, mtk_iommu calls during probe iommu_device_register before the hw_list from driver data is initialized. Since iommu probing issue fix, it leads to NULL pointer dereference in mtk_iommu_device_group when hw_list is accessed with list_first_entry (not null safe). So, change the call order to ensure iommu_device_register is called after the driver data are initialized. Fixes: `9e3a2a6436` ("iommu/mediatek: Adapt sharing and non-sharing pgtable case") Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Reviewed-by: Yong Wu <yong.wu@mediatek.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> # MT8183 Juniper, MT8186 Tentacruel Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com> Link: https://lore.kernel.org/r/20250403-fix-mtk-iommu-error-v2-1-fe8b18f8b0a8@collabora.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 12:40:55 +02:00
Marek Szyprowski	99deffc409	iommu/exynos: Fix suspend/resume with IDENTITY domain Commit `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") changed the sequence of probing the SYSMMU controller devices and calls to arm_iommu_attach_device(), what results in resuming SYSMMU controller earlier, when it is still set to IDENTITY mapping. Such change revealed the bug in IDENTITY handling in the exynos-iommu driver. When SYSMMU controller is set to IDENTITY mapping, data->domain is NULL, so adjust checks in suspend & resume callbacks to handle this case correctly. Fixes: `b3d14960e6` ("iommu/exynos: Implement an IDENTITY domain") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250401202731.2810474-1-m.szyprowski@samsung.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 12:40:41 +02:00
Robin Murphy	d9d3cede41	iommu/ipmmu-vmsa: Register in a sensible order IPMMU registers almost-initialised instances, but misses assigning the drvdata to make them fully functional, so initial calls back into ipmmu_probe_device() are likely to fail unnecessarily. Reorder this to work as it should, also pruning the long-out-of-date comment and adding the missing sysfs cleanup on error for good measure. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/53be6667544de65a15415b699e38a9a965692e45.1742481687.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 09:24:10 +02:00
Robin Murphy	280e5a3010	iommu: Clear iommu-dma ops on cleanup If iommu_device_register() encounters an error, it can end up tearing down already-configured groups and default domains, however this currently still leaves devices hooked up to iommu-dma (and even historically the behaviour in this area was at best inconsistent across architectures/drivers...) Although in the case that an IOMMU is present whose driver has failed to probe, users cannot necessarily expect DMA to work anyway, it's still arguable that we should do our best to put things back as if the IOMMU driver was never there at all, and certainly the potential for crashing in iommu-dma itself is undesirable. Make sure we clean up the dev->dma_iommu flag along with everything else. Reported-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Closes: https://lore.kernel.org/all/CAGXv+5HJpTYmQ2h-GD7GjyeYT7bL9EBCvu0mz5LgpzJZtzfW0w@mail.gmail.com/ Tested-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/e788aa927f6d827dd4ea1ed608fada79f2bab030.1744284228.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 09:15:59 +02:00
Petr Tesarik	7d8c490ba3	iommu/vt-d: Remove an unnecessary call set_dma_ops() Do not touch per-device DMA ops when the driver has been converted to use the dma-iommu API. Fixes: `c588072bba` ("iommu/vt-d: Convert intel iommu driver to the iommu ops") Signed-off-by: Petr Tesarik <ptesarik@suse.com> Link: https://lore.kernel.org/r/20250403165605.278541-1-ptesarik@suse.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 09:06:06 +02:00
Sean Christopherson	548183ea38	iommu/vt-d: Wire up irq_ack() to irq_move_irq() for posted MSIs Set the posted MSI irq_chip's irq_ack() hook to irq_move_irq() instead of a dummy/empty callback so that posted MSIs process pending changes to the IRQ's SMP affinity. Failure to honor a pending set-affinity results in userspace being unable to change the effective affinity of the IRQ, as IRQD_SETAFFINITY_PENDING is never cleared and so irq_set_affinity_locked() always defers moving the IRQ. The issue is most easily reproducible by setting /proc/irq/xx/smp_affinity multiple times in quick succession, as only the first update is likely to be handled in process context. Fixes: `ed1e48ea43` ("iommu/vt-d: Enable posted mode for device MSIs") Cc: Robert Lippert <rlippert@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Wentao Yang <wentaoyang@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250321194249.1217961-1-seanjc@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 09:06:06 +02:00
Fedor Pchelkin	df4bf3fa1b	iommu: Fix crash in report_iommu_fault() The following crash is observed while handling an IOMMU fault with a recent kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle page fault for address: ffff8c708299f700 PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3 Oops: Oops: 0011 [#1] SMP NOPTI CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20 PREEMPT(lazy) Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW 09/27/2024 RIP: 0010:0xffff8c708299f700 Call Trace: <TASK> ? report_iommu_fault+0x78/0xd3 ? amd_iommu_report_page_fault+0x91/0x150 ? amd_iommu_int_thread+0x77/0x180 ? __pfx_irq_thread_fn+0x10/0x10 ? irq_thread_fn+0x23/0x60 ? irq_thread+0xf9/0x1e0 ? __pfx_irq_thread_dtor+0x10/0x10 ? __pfx_irq_thread+0x10/0x10 ? kthread+0xfc/0x240 ? __pfx_kthread+0x10/0x10 ? ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ? ret_from_fork_asm+0x1a/0x30 </TASK> report_iommu_fault() checks for an installed handler comparing the corresponding field to NULL. It can (and could before) be called for a domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA, specifically. Cookie is represented as a union so we may end up with a garbage value treated there if this happens for a domain with another cookie type. Formerly there were two exclusive cookie types in the union. IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault(). Call the fault handler only if the passed domain has a required cookie type. Found by Linux Verification Center (linuxtesting.org). Fixes: `6aa63a4ec9` ("iommu: Sort out domain user data") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250408213342.285955-1-pchelkin@ispras.ru Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 09:03:44 +02:00
Thomas Gleixner	8fa7292fee	treewide: Switch/rename to timer_delete[_sync]() timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-05 10:30:12 +02:00
Linus Torvalds	48552153cf	iommufd 6.15 merge window pull Two significant new items: - Allow reporting IOMMU HW events to userspace when the events are clearly linked to a device. This is linked to the VIOMMU object and is intended to be used by a VMM to forward HW events to the virtual machine as part of emulating a vIOMMU. ARM SMMUv3 is the first driver to use this mechanism. Like the existing fault events the data is delivered through a simple FD returning event records on read(). - PASID support in VFIO. "Process Address Space ID" is a PCI feature that allows the device to tag all PCI DMA operations with an ID. The IOMMU will then use the ID to select a unique translation for those DMAs. This is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to manage each PASID entry. The support is generic so any VFIO user could attach any translation to a PASID, and the support should work on ARM SMMUv3 as well. AMD requires additional driver work. Some minor updates, along with fixes: - Prevent using nested parents with fault's, no driver support today - Put a single "cookie_type" value in the iommu_domain to indicate what owns the various opaque owner fields -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ+q6NgAKCRCFwuHvBreF YZ3zAQDbl4/Z0O+CLN2AXq4Zeiyq1HTSoF94hzqmm7lQ17zTIwD8CCdyLXHvupaq tkBIv5IovpaxlrSk6M0kh2K8vPCk9Qk= =CIM3 -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Two significant new items: - Allow reporting IOMMU HW events to userspace when the events are clearly linked to a device. This is linked to the VIOMMU object and is intended to be used by a VMM to forward HW events to the virtual machine as part of emulating a vIOMMU. ARM SMMUv3 is the first driver to use this mechanism. Like the existing fault events the data is delivered through a simple FD returning event records on read(). - PASID support in VFIO. The "Process Address Space ID" is a PCI feature that allows the device to tag all PCI DMA operations with an ID. The IOMMU will then use the ID to select a unique translation for those DMAs. This is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to manage each PASID entry. The support is generic so any VFIO user could attach any translation to a PASID, and the support should work on ARM SMMUv3 as well. AMD requires additional driver work. Some minor updates, along with fixes: - Prevent using nested parents with fault's, no driver support today - Put a single "cookie_type" value in the iommu_domain to indicate what owns the various opaque owner fields" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits) iommufd: Test attach before detaching pasid iommufd: Fix iommu_vevent_header tables markup iommu: Convert unreachable() to BUG() iommufd: Balance veventq->num_events inc/dec iommufd: Initialize the flags of vevent in iommufd_viommu_report_event() iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability vfio: VFIO_DEVICE_[AT\|DE]TACH_IOMMUFD_PT support pasid vfio-iommufd: Support pasid [at\|de]tach for physical VFIO devices ida: Add ida_find_first_range() iommufd/selftest: Add coverage for iommufd pasid attach/detach iommufd/selftest: Add test ops to test pasid attach/detach iommufd/selftest: Add a helper to get test device iommufd/selftest: Add set_dev_pasid in mock iommu iommufd: Allow allocating PASID-compatible domain iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support iommufd: Enforce PASID-compatible domain for RID iommufd: Support pasid attach/replace iommufd: Enforce PASID-compatible domain in PASID path iommufd/device: Add pasid_attach array to track per-PASID attach ...	2025-04-01 18:03:46 -07:00
Yi Liu	7be11d34f6	iommufd: Test attach before detaching pasid Check if the pasid has been attached before going further in the detach path. This fixes a crash found by syzkaller. Add a selftest as well. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 668 Comm: repro Not tainted 6.14.0-next-20250325-eb4bc4b07f66 #1 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org4 RIP: 0010:iommufd_hw_pagetable_detach+0x8a/0x4d0 Code: 00 00 00 44 89 ee 48 89 c7 48 89 75 c8 48 89 45 c0 e8 ca 55 17 02 48 89 c2 49 89 c4 48 b8 00 00 00b RSP: 0018:ffff888021b17b78 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff888014b5a000 RCX: ffff888021b17a64 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88801dad07fc RBP: ffff888021b17bc8 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: ffff88801dad0e58 R12: 0000000000000000 R13: 0000000000000001 R14: ffff888021b17e18 R15: ffff8880132d3008 FS: 00007fca52013600(0000) GS:ffff8880e3684000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200006c0 CR3: 00000000112d0005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> iommufd_device_detach+0x2a/0x2e0 iommufd_test+0x2f99/0x5cd0 iommufd_fops_ioctl+0x38e/0x520 __x64_sys_ioctl+0x1ba/0x220 x64_sys_call+0x122e/0x2150 do_syscall_64+0x6d/0x150 entry_SYSCALL_64_after_hwframe+0x76/0x7e Link: https://patch.msgid.link/r/20250328133448.22052-1-yi.l.liu@intel.com Reported-by: Lai Yi <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/linux-iommu/Z+X0tzxhiaupJT7b@ly-workstation Fixes: `c0e301b297` ("iommufd/device: Add pasid_attach array to track per-PASID attach") Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-28 11:40:41 -03:00
Josh Poimboeuf	3a2ffd3f3e	iommu: Convert unreachable() to BUG() Bare unreachable() should be avoided as it generates undefined behavior, e.g. falling through to the next function. Use BUG() instead so the error is defined. Fixes the following warnings: drivers/iommu/dma-iommu.o: warning: objtool: iommu_dma_sw_msi+0x92: can't find jump dest instruction at .text+0x54d5 vmlinux.o: warning: objtool: iommu_dma_get_msi_page() falls through to next function __iommu_dma_unmap() Link: https://patch.msgid.link/r/0c801ae017ec078cacd39f8f0898fc7780535f85.1743053325.git.jpoimboe@kernel.org Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/314f8809-cd59-479b-97d7-49356bf1c8d1@infradead.org Reported-by: Paul E. McKenney <paulmck@kernel.org> Closes: https://lore.kernel.org/5dd1f35e-8ece-43b7-ad6d-86d02d2718f6@paulmck-laptop Fixes: `6aa63a4ec9` ("iommu: Sort out domain user data") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-28 10:07:23 -03:00
Yi Liu	6fc85bbbea	iommufd: Balance veventq->num_events inc/dec iommufd_veventq_fops_read() decrements veventq->num_events when a vevent is read out. However, the report path ony increments veventq->num_events for normal events. To be balanced, make the read path decrement num_events only for normal vevents. Fixes: `e36ba5ab80` ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC") Link: https://patch.msgid.link/r/20250324120034.5940-3-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-28 10:07:23 -03:00
Yi Liu	41464a4628	iommufd: Initialize the flags of vevent in iommufd_viommu_report_event() The vevent->header.flags is not initialized per allocation, hence the vevent read path may treat the vevent as lost_events_header wrongly. Use kzalloc() to alloc memory for new vevent. Fixes: `e8e1ef9b77` ("iommufd/viommu: Add iommufd_viommu_report_event helper") Link: https://patch.msgid.link/r/20250324120034.5940-2-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-28 10:07:23 -03:00
Yi Liu	803f97298e	iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability PASID usage requires PASID support in both device and IOMMU. Since the iommu drivers always enable the PASID capability for the device if it is supported, this extends the IOMMU_GET_HW_INFO to report the PASID capability to userspace. Also, enhances the selftest accordingly. Link: https://patch.msgid.link/r/20250321180143.8468-5-yi.l.liu@intel.com Cc: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-28 10:07:23 -03:00
Linus Torvalds	336b4dae6d	IOMMU Updates for Linux v6.15 Including: - Core: IOMMUFD dependencies from Jason: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API - Core: Fixes for probe-issues from Robin - Intel VT-d changes: - Checking for SVA support in domain allocation and attach paths - Move PCI ATS and PRI configuration into probe paths - Fix a pentential hang on reboot -f - Miscellaneous cleanups - AMD-Vi changes: - Support for up to 2k IRQs per PCI device function - Set of smaller fixes - ARM-SMMU changes: - SMMUv2 devicetree binding updates for Qualcomm implementations (QCS8300 GPU and MSM8937) - Clean up SMMUv2 runtime PM implementation to help with wider rework of pm_runtime_put_autosuspend() - Rockchip driver changes: - Driver adjustments for recent DT probing changes - S390 IOMMU changes: - Support for IOMMU passthrough - Apple Dart changes: - Driver adjustments to meet ISP device requirements - Null-ptr deref fix - Disable subpage protection for DART 1 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmfkFSkACgkQK/BELZcB GuOVUg/9En4QlF0eeg4E1stszGcOOXn9IkKiGP9iqKpL/WqVVitoagn7nwq0rBFF PdqqcGbSPNHctiUursB18DyENOAmRUE8mGG29po9rAfq5wxZ2yA6eLmpFAf9QDCW vY3nz2oen1lUrzbPwVI1IR67L6BrtqEUz/1syf295RMr2yXkdmBXb12lbL1rdjv0 7YfWwlDtMOVsewlZmCuDAAfWHtd7cO/ulDcYwq+GOiSXojVmKw8624EFlFbM9E9f tkceE4mjGQnN/CpZzUc48f1DgIW82PVVTek95Cznbk9ACDJQ4VYYwDMx2WszfMzu D1CXLiQu0vrxQgFdHn02bw3T4yvXQ4HeawIaTF+1J1gEHbrj6MoVQ88zaO3GnKFk yxYI5sfOs5iL6zSMkig4OytXuRmRqDVJhgrF1i4XIM3GXrRKf9EiRZ/1eXb+1gab 3Q5k7+u4+7vWbaKLsxOdIBNzz02hb7LVndPIFlWmSnr1YY0LFBfwmgbfTAsSINZo 22Ni8aE2C42lYOZxJ67m3khLpqTZaTERtQIab/gUd5FcuEjxKuHBIzj6SZeC/0Q9 Y6rzklHxJxxtQLrUJp16IzjpiMJbWH0H2jKstRberOpxk3L5aLrJNd8M1B+CokmP LZeKgqgxV5F+VlR2Bap5AXL9ZEuqjCVDIEy79j8XzAzxobDVnVM= =a8Pd -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core iommufd dependencies from Jason: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API Core fixes for probe-issues from Robin Intel VT-d changes: - Checking for SVA support in domain allocation and attach paths - Move PCI ATS and PRI configuration into probe paths - Fix a pentential hang on reboot -f - Miscellaneous cleanups AMD-Vi changes: - Support for up to 2k IRQs per PCI device function - Set of smaller fixes ARM-SMMU changes: - SMMUv2 devicetree binding updates for Qualcomm implementations (QCS8300 GPU and MSM8937) - Clean up SMMUv2 runtime PM implementation to help with wider rework of pm_runtime_put_autosuspend() Rockchip driver changes: - Driver adjustments for recent DT probing changes S390 IOMMU changes: - Support for IOMMU passthrough Apple Dart changes: - Driver adjustments to meet ISP device requirements - Null-ptr deref fix - Disable subpage protection for DART 1" * tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits) iommu/vt-d: Fix possible circular locking dependency iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled iommu: apple-dart: fix potential null pointer deref iommu/rockchip: Retire global dma_dev workaround iommu/rockchip: Register in a sensible order iommu/rockchip: Allocate per-device data sensibly iommu/mediatek-v1: Support COMPILE_TEST iommu/amd: Enable support for up to 2K interrupts per function iommu/amd: Rename DTE_INTTABLEN* and MAX_IRQS_PER_TABLE macro iommu/amd: Replace slab cache allocator with page allocator iommu/amd: Introduce generic function to set multibit feature value iommu: Don't warn prematurely about dodgy probes iommu/arm-smmu: Set rpm auto_suspend once during probe dt-bindings: arm-smmu: Document QCS8300 GPU SMMU iommu: Get DT/ACPI parsing into the proper probe path iommu: Keep dev->iommu state consistent iommu: Resolve ops in iommu_init_device() iommu: Handle race with default domain setup iommu: Unexport iommu_fwspec_free() ...	2025-03-26 20:10:09 -07:00
Linus Torvalds	a5b3d8660b	hyperv-next for 6.15 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmfhlLATHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXgchCADOz33rSm4G4w4r0qT05dTDi/lZkEdK 64dQq322XXP/C9FfR66d30243gsAmuM5a0SvzFHLXAOu6yqM270Xehd/Rud+Um2s lSVnc0Ux0AWBgksqFd0t577aN7zmJEukosEYO5lBNop+zOcadrm3S6Th/AoL2h/D yphPkhH13bsCK+Wll/eBOQLIhC9iA0konYbBLuEQ5MqvUbrzc6Rmb5gxsHHZKOqg vLjkrYR/d3s2gIpKxiFp0RwvzGyffZEHxvU/YF3hTenPMlTlnXWbyspBSTVmWggP 13IFLzqxDdW9RgUnGB4xRc424AC1LKqEr42QPQE7zGvl2jdJriA2Q1LT =BXqj -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Add support for running as the root partition in Hyper-V (Microsoft Hypervisor) by exposing /dev/mshv (Nuno and various people) - Add support for CPU offlining in Hyper-V (Hamza Mahfooz) - Misc fixes and cleanups (Roman Kisel, Tianyu Lan, Wei Liu, Michael Kelley, Thorsten Blum) * tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: fix an indentation issue in mshyperv.h x86/hyperv: Add comments about hv_vpset and var size hypercall input args Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs hyperv: Add definitions for root partition driver to hv headers x86: hyperv: Add mshv_handler() irq handler and setup function Drivers: hv: Introduce per-cpu event ring tail Drivers: hv: Export some functions for use by root partition module acpi: numa: Export node_to_pxm() hyperv: Introduce hv_recommend_using_aeoi() arm64/hyperv: Add some missing functions to arm64 x86/mshyperv: Add support for extended Hyper-V features hyperv: Log hypercall status codes as strings x86/hyperv: Fix check of return value from snp_set_vmsa() x86/hyperv: Add VTL mode callback for restarting the system x86/hyperv: Add VTL mode emergency restart callback hyperv: Remove unused union and structs hyperv: Add CONFIG_MSHV_ROOT to gate root partition support hyperv: Change hv_root_partition into a function hyperv: Convert hypercall statuses to linux error codes drivers/hv: add CPU offlining support ...	2025-03-25 14:47:04 -07:00
Yi Liu	c1b52b0a97	iommufd/selftest: Add test ops to test pasid attach/detach This adds 4 test ops for pasid attach/replace/detach testing. There are ops to attach/detach pasid, and also op to check the attached hwpt of a pasid. Link: https://patch.msgid.link/r/20250321171940.7213-18-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	068e140251	iommufd/selftest: Add a helper to get test device There is need to get the selftest device (sobj->type == TYPE_IDEV) in multiple places, so have a helper to for it. Link: https://patch.msgid.link/r/20250321171940.7213-17-yi.l.liu@intel.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	9eb59204d5	iommufd/selftest: Add set_dev_pasid in mock iommu The callback is needed to make pasid_attach/detach path complete for mock device. A nop is enough for set_dev_pasid. A MOCK_FLAGS_DEVICE_PASID is added to indicate a pasid-capable mock device for the pasid test cases. Other test cases will still create a non-pasid mock device. While the mock iommu always pretends to be pasid-capable. Link: https://patch.msgid.link/r/20250321171940.7213-16-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	dbc5f37b4f	iommufd: Allow allocating PASID-compatible domain The underlying infrastructure has supported the PASID attach and related enforcement per the requirement of the IOMMU_HWPT_ALLOC_PASID flag. This extends iommufd to support PASID compatible domain requested by userspace. Link: https://patch.msgid.link/r/20250321171940.7213-15-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	ce15c13e7a	iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Intel iommu driver just treats it as a nop since Intel VT-d does not have special requirement on domains attached to either the PASID or RID of a PASID-capable device. Link: https://patch.msgid.link/r/20250321171940.7213-14-yi.l.liu@intel.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	4c3f4f432c	iommufd: Enforce PASID-compatible domain for RID Per the definition of IOMMU_HWPT_ALLOC_PASID, iommufd needs to enforce the RID to use PASID-compatible domain if PASID has been attached, and vice versa. The PASID path has already enforced it. This adds the enforcement in the RID path. This enforcement requires a lock across the RID and PASID attach path, the idev->igroup->lock is used as both the RID and the PASID path holds it. Link: https://patch.msgid.link/r/20250321171940.7213-13-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	2fb69c602d	iommufd: Support pasid attach/replace This extends the below APIs to support PASID. Device drivers to manage pasid attach/replace/detach. int iommufd_device_attach(struct iommufd_device idev, ioasid_t pasid, u32 pt_id); int iommufd_device_replace(struct iommufd_device idev, ioasid_t pasid, u32 pt_id); void iommufd_device_detach(struct iommufd_device *idev, ioasid_t pasid); The pasid operations share underlying attach/replace/detach infrastructure with the device operations, but still have some different implications: - no reserved region per pasid otherwise SVA architecture is already broken (CPU address space doesn't count device reserved regions); - accordingly no sw_msi trick; Cache coherency enforcement is still applied to pasid operations since it is about memory accesses post page table walking (no matter the walk is per RID or per PASID). Link: https://patch.msgid.link/r/20250321171940.7213-12-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	ff3f014ebb	iommufd: Enforce PASID-compatible domain in PASID path AMD IOMMU requires attaching PASID-compatible domains to PASID-capable devices. This includes the domains attached to RID and PASIDs. Related discussions in link [1] and [2]. ARM also has such a requirement, Intel does not need it, but can live up with it. Hence, iommufd is going to enforce this requirement as it is not harmful to vendors that do not need it. Mark the PASID-compatible domains and enforce it in the PASID path. [1] https://lore.kernel.org/linux-iommu/20240709182303.GK14050@ziepe.ca/ [2] https://lore.kernel.org/linux-iommu/20240822124433.GD3468552@ziepe.ca/ Link: https://patch.msgid.link/r/20250321171940.7213-11-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	c0e301b297	iommufd/device: Add pasid_attach array to track per-PASID attach PASIDs of PASID-capable device can be attached to hwpt separately, hence a pasid array to track per-PASID attachment is necessary. The index IOMMU_NO_PASID is used by the RID path. Hence drop the igroup->attach. Link: https://patch.msgid.link/r/20250321171940.7213-10-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	831b40f841	iommufd/device: Replace device_list with device_array igroup->attach->device_list is used to track attached device of a group in the RID path. Such tracking is also needed in the PASID path in order to share path with the RID path. While there is only one list_head in the iommufd_device. It cannot work if the device has been attached in both RID path and PASID path. To solve it, replacing the device_list with an xarray. The attached iommufd_device is stored in the entry indexed by the idev->obj.id. Link: https://patch.msgid.link/r/20250321171940.7213-9-yi.l.liu@intel.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	75f990aef3	iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct The igroup->hwpt and igroup->device_list are used to track the hwpt attach of a group in the RID path. While the coming PASID path also needs such tracking. To be prepared, wrap igroup->hwpt and igroup->device_list into attach struct which is allocated per attaching the first device of the group and freed per detaching the last device of the group. Link: https://patch.msgid.link/r/20250321171940.7213-8-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	ba1de6cd41	iommufd/device: Add helper to detect the first attach of a group The existing code detects the first attach by checking the igroup->device_list. However, the igroup->hwpt can also be used to detect the first attach. In future modifications, it is better to check the igroup->hwpt instead of the device_list. To improve readbility and also prepare for further modifications on this part, this adds a helper for it. Link: https://patch.msgid.link/r/20250321171940.7213-7-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	2eaa7f845e	iommufd/device: Replace idev->igroup with local variable With more use of the fields of igroup, use a local vairable instead of using the idev->igroup heavily. No functional change expected. Link: https://patch.msgid.link/r/20250321171940.7213-6-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	bc06f7f66d	iommufd/device: Only add reserved_iova in non-pasid path As the pasid is passed through the attach/replace/detach helpers, it is necessary to ensure only the non-pasid path adds reserved_iova. Link: https://patch.msgid.link/r/20250321171940.7213-5-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Yi Liu	03c9b102be	iommufd: Pass @pasid through the device attach/replace path Most of the core logic before conducting the actual device attach/ replace operation can be shared with pasid attach/replace. So pass @pasid through the device attach/replace helpers to prepare adding pasid attach/replace. So far the @pasid should only be IOMMU_NO_PASID. No functional change. Link: https://patch.msgid.link/r/20250321171940.7213-4-yi.l.liu@intel.com Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Yi Liu	8a9e1e773f	iommu: Introduce a replace API for device pasid Provide a high-level API to allow replacements of one domain with another for specific pasid of a device. This is similar to iommu_replace_group_handle() and it is expected to be used only by IOMMUFD. Link: https://patch.msgid.link/r/20250321171940.7213-3-yi.l.liu@intel.com Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Yi Liu	ada14b9f1a	iommu: Require passing new handles to APIs supporting handle Add kdoc to highligt the caller of iommu_[attach\|replace]_group_handle() and iommu_attach_device_pasid() should always provide a new handle. This can avoid race with lockless reference to the handle. e.g. the find_fault_handler() and iommu_report_device_fault() in the PRI path. Link: https://patch.msgid.link/r/20250321171940.7213-2-yi.l.liu@intel.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Nicolin Chen	06d54f00f3	iommu: Drop sw_msi from iommu_domain There are only two sw_msi implementations in the entire system, thus it's not very necessary to have an sw_msi pointer. Instead, check domain->cookie_type to call the two sw_msi implementations directly from the core code. Link: https://patch.msgid.link/r/7ded87c871afcbaac665b71354de0a335087bf0f.1742871535.git.nicolinc@nvidia.com Suggested-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:19 -03:00
Nicolin Chen	ec031e1b35	iommufd: Move iommufd_sw_msi and related functions to driver.c To provide the iommufd_sw_msi() to the iommu core that is under a different Kconfig, move it and its related functions to driver.c. Then, stub it into the iommu-priv header. The iommufd_sw_msi_install() continues to be used by iommufd internal, so put it in the private header. Note that iommufd_sw_msi() will be called in the iommu core, replacing the sw_msi function pointer. Given that IOMMU_API is "bool" in Kconfig, change IOMMUFD_DRIVER_CORE to "bool" as well. Since this affects the module size, here is before-n-after size comparison: [Before] text data bss dec hex filename 18797 848 56 19701 4cf5 drivers/iommu/iommufd/device.o 722 44 0 766 2fe drivers/iommu/iommufd/driver.o [After] text data bss dec hex filename 17735 808 56 18599 48a7 drivers/iommu/iommufd/device.o 3020 180 0 3200 c80 drivers/iommu/iommufd/driver.o Link: https://patch.msgid.link/r/374c159592dba7852bee20968f3f66fa0ee8ca93.1742871535.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:19 -03:00
Robin Murphy	6aa63a4ec9	iommu: Sort out domain user data When DMA/MSI cookies were made first-class citizens back in commit `46983fcd67` ("iommu: Pull IOVA cookie management into the core"), there was no real need to further expose the two different cookie types. However, now that IOMMUFD wants to add a third type of MSI-mapping cookie, we do have a nicely compelling reason to properly dismabiguate things at the domain level beyond just vaguely guessing from the domain type. Meanwhile, we also effectively have another "cookie" in the form of the anonymous union for other user data, which isn't much better in terms of being vague and unenforced. The fact is that all these cookie types are mutually exclusive, in the sense that combining them makes zero sense and/or would be catastrophic (iommu_set_fault_handler() on an SVA domain, anyone?) - the only combination which might be reasonable is perhaps a fault handler and an MSI cookie, but nobody's doing that at the moment, so let's rule it out as well for the sake of being clear and robust. To that end, we pull DMA and MSI cookies apart a little more, mostly to clear up the ambiguity at domain teardown, then for clarity (and to save a little space), move them into the union, whose ownership we can then properly describe and enforce entirely unambiguously. [nicolinc: rebase on latest tree; use prefix IOMMU_COOKIE_; merge unions in iommu_domain; add IOMMU_COOKIE_IOMMUFD for iommufd_hwpt] Link: https://patch.msgid.link/r/1ace9076c95204bbe193ee77499d395f15f44b23.1742871535.git.nicolinc@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:18 -03:00
Nuno Das Neves	3817854ba8	hyperv: Log hypercall status codes as strings Introduce hv_status_printk() macros as a convenience to log hypercall errors, formatting them with the status code (HV_STATUS_*) as a raw hex value and also as a string, which saves some time while debugging. Create a table of HV_STATUS_ codes with strings and mapped errnos, and use it for hv_result_to_string() and hv_result_to_errno(). Use the new hv_status_printk()s in hv_proc.c, hyperv-iommu.c, and irqdomain.c hypercalls to aid debugging in the root partition. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-2-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-2-git-send-email-nunodasneves@linux.microsoft.com>	2025-03-20 21:23:03 +00:00
Joerg Roedel	22df63a23a	Merge branches 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', 'rockchip', 's390', 'core', 'intel/vt-d' and 'amd/amd-vi' into next	2025-03-20 09:11:09 +01:00
Lu Baolu	93ae6e68b6	iommu/vt-d: Fix possible circular locking dependency We have recently seen report of lockdep circular lock dependency warnings on platforms like Skylake and Kabylake: ====================================================== WARNING: possible circular locking dependency detected 6.14.0-rc6-CI_DRM_16276-gca2c04fe76e8+ #1 Not tainted ------------------------------------------------------ swapper/0/1 is trying to acquire lock: ffffffff8360ee48 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70 but task is already holding lock: ffff888102c7efa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xe75/0x11f0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (&device->physical_node_lock){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 intel_iommu_init+0xe75/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #5 (dmar_global_lock){++++}-{3:3}: down_read+0x43/0x1d0 enable_drhd_fault_handling+0x21/0x110 cpuhp_invoke_callback+0x4c6/0x870 cpuhp_issue_call+0xbf/0x1f0 __cpuhp_setup_state_cpuslocked+0x111/0x320 __cpuhp_setup_state+0xb0/0x220 irq_remap_enable_fault_handling+0x3f/0xa0 apic_intr_mode_init+0x5c/0x110 x86_late_time_init+0x24/0x40 start_kernel+0x895/0xbd0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xbf/0x110 common_startup_64+0x13e/0x141 -> #4 (cpuhp_state_mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 __cpuhp_setup_state_cpuslocked+0x67/0x320 __cpuhp_setup_state+0xb0/0x220 page_alloc_init_cpuhp+0x2d/0x60 mm_core_init+0x18/0x2c0 start_kernel+0x576/0xbd0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xbf/0x110 common_startup_64+0x13e/0x141 -> #3 (cpu_hotplug_lock){++++}-{0:0}: __cpuhp_state_add_instance+0x4f/0x220 iova_domain_init_rcaches+0x214/0x280 iommu_setup_dma_ops+0x1a4/0x710 iommu_device_register+0x17d/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #2 (&domain->iova_cookie->mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 iommu_setup_dma_ops+0x16b/0x710 iommu_device_register+0x17d/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #1 (&group->mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 __iommu_probe_device+0x24c/0x4e0 probe_iommu_group+0x2b/0x50 bus_for_each_dev+0x7d/0xe0 iommu_device_register+0xe1/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #0 (iommu_probe_device_lock){+.+.}-{3:3}: __lock_acquire+0x1637/0x2810 lock_acquire+0xc9/0x300 __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 iommu_probe_device+0x1d/0x70 intel_iommu_init+0xe90/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: iommu_probe_device_lock --> dmar_global_lock --> &device->physical_node_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&device->physical_node_lock); lock(dmar_global_lock); lock(&device->physical_node_lock); lock(iommu_probe_device_lock); * DEADLOCK * This driver uses a global lock to protect the list of enumerated DMA remapping units. It is necessary due to the driver's support for dynamic addition and removal of remapping units at runtime. Two distinct code paths require iteration over this remapping unit list: - Device registration and probing: the driver iterates the list to register each remapping unit with the upper layer IOMMU framework and subsequently probe the devices managed by that unit. - Global configuration: Upper layer components may also iterate the list to apply configuration changes. The lock acquisition order between these two code paths was reversed. This caused lockdep warnings, indicating a risk of deadlock. Fix this warning by releasing the global lock before invoking upper layer interfaces for device registration. Fixes: `b150654f74` ("iommu/vt-d: Fix suspicious RCU usage") Closes: https://lore.kernel.org/linux-iommu/SJ1PR11MB612953431F94F18C954C4A9CB9D32@SJ1PR11MB6129.namprd11.prod.outlook.com/ Tested-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250317035714.1041549-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 09:03:57 +01:00
Sean Christopherson	688124cc54	iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes Don't overwrite an IRTE that is posting IRQs to a vCPU with a posted MSI entry if the host IRQ affinity happens to change. If/when the IRTE is reverted back to "host mode", it will be reconfigured as a posted MSI or remapped entry as appropriate. Drop the "mode" field, which doesn't differentiate between posted MSIs and posted vCPUs, in favor of a dedicated posted_vcpu flag. Note! The two posted_{msi,vcpu} flags are intentionally not mutually exclusive; an IRTE can transition between posted MSI and posted vCPU. Fixes: `ed1e48ea43` ("iommu/vt-d: Enable posted mode for device MSIs") Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250315025135.2365846-3-seanjc@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 09:03:56 +01:00
Sean Christopherson	2454823e97	iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled Add a helper to take care of reconfiguring an IRTE to deliver IRQs to the host, i.e. not to a vCPU, and use the helper when an IRTE's vCPU affinity is nullified, i.e. when KVM puts an IRTE back into "host" mode. Because posted MSIs use an ephemeral IRTE, using modify_irte() puts the IRTE into full remapped mode, i.e. unintentionally disables posted MSIs on the IRQ. Fixes: `ed1e48ea43` ("iommu/vt-d: Enable posted mode for device MSIs") Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250315025135.2365846-2-seanjc@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 09:03:56 +01:00
Qasim Ijaz	b8741496c0	iommu: apple-dart: fix potential null pointer deref If kzalloc() fails, accessing cfg->supports_bypass causes a null pointer dereference. Fix by checking for NULL immediately after allocation and returning -ENOMEM. Fixes: `3bc0102835` ("iommu: apple-dart: Allow mismatched bypass support") Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Link: https://lore.kernel.org/r/20250314230102.11008-1-qasdev00@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 08:59:15 +01:00
Robin Murphy	dcde1c4aa7	iommu/rockchip: Retire global dma_dev workaround The global dma_dev trick was mostly because the old domain_alloc op provided no context, so no way to know which IOMMU was to own the pagetable, or if a suitable one even existed at all. In the new multi-instance world with domain_alloc_paging this is no longer a concern - now we know that the given device must be associated with a valid IOMMU instance which provided the op to call in the first place, and therefore that instance can and should be the pagetable owner. To avoid worrying about the lifetime and stability of the rk_domain->iommus list, and keep the lookups simple and efficient, we'll still stash a dma_dev pointer, but now it's accurately per-domain. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Quentin Schulz <quentin.schulz@cherry.de> Tested-by: Dang Huynh <danct12@riseup.net> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/25dc948a7d35c8142c5719ac22bc523f8524d006.1741886382.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 08:58:24 +01:00
Robin Murphy	f90aa59eb2	iommu/rockchip: Register in a sensible order Currently Rockchip calls iommu_device_register() before it's finished setting up the hardware and driver state, and as such it now gets unhappy in various ways when registration starts working the way it was always intended to, and probing client devices straight away. Reorder the operations to ensure that what we're registering is a prepared and functional IOMMU instance. Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Quentin Schulz <quentin.schulz@cherry.de> Tested-by: Dang Huynh <danct12@riseup.net> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/e69532f00bf49d98322b96788edb7e2e305e4006.1741886382.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 08:58:23 +01:00
Robin Murphy	f48dcda8f6	iommu/rockchip: Allocate per-device data sensibly Now that DT-based probing is finally happening in the right order again, it reveals an issue in Rockchip's of_xlate, which can now be called during registration, but is using the global dma_dev which is only assigned later. However, this makes little sense when we're already looking up the correct IOMMU device, who should logically be the owner of the devm allocation anyway. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Anders Roxell <anders.roxell@linaro.org> Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Quentin Schulz <quentin.schulz@cherry.de> Tested-by: Dang Huynh <danct12@riseup.net> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/771e91cf16b3048e93f657153b76905665878fa2.1741886382.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 08:58:22 +01:00
Nicolin Chen	da0c56520e	iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations There is a DoS concern on the shared hardware event queue among devices passed through to VMs, that too many translation failures that belong to VMs could overflow the shared hardware event queue if those VMs or their VMMs don't handle/recover the devices properly. The MEV bit in the STE allows to configure the SMMU HW to merge similar event records, though there is no guarantee. Set it in a nested STE for DoS mitigations. In the future, we might want to enable the MEV for non-nested cases too such as domain->type == IOMMU_DOMAIN_UNMANAGED or even IOMMU_DOMAIN_DMA. Link: https://patch.msgid.link/r/8ed12feef67fc65273d0f5925f401a81f56acebe.1741719725.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	e7d3fa3d29	iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU Aside from the IOPF framework, iommufd provides an additional pathway to report hardware events, via the vEVENTQ of vIOMMU infrastructure. Define an iommu_vevent_arm_smmuv3 uAPI structure, and report stage-1 events in the threaded IRQ handler. Also, add another four event record types that can be forwarded to a VM. Link: https://patch.msgid.link/r/5cf6719682fdfdabffdb08374cdf31ad2466d75a.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	f0ea207ed7	iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster Use it to store all vSMMU-related data. The vsid (Virtual Stream ID) will be the first use case. Since the vsid reader will be the eventq handler that already holds a streams_mutex, reuse that to fence the vmaster too. Also add a pair of arm_smmu_attach_prepare/commit_vmaster helpers to set or unset the master->vmaster pointer. Put the helpers inside the existing arm_smmu_attach_prepare/commit(). For identity/blocked ops that don't call arm_smmu_attach_prepare/commit(), add a simpler arm_smmu_master_clear_vmaster helper to unset the vmaster. Link: https://patch.msgid.link/r/a7f282e1a531279e25f06c651e95d56f6b120886.1741719725.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	b3cc0b7599	iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ coverage The handler will get vDEVICE object from the given mdev and convert it to its per-vIOMMU virtual ID to mimic a real IOMMU driver. Link: https://patch.msgid.link/r/1ea874d20e56d65e7cfd6e0e8e01bd3dbd038761.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	941d0719aa	iommufd/selftest: Require vdev_id when attaching to a nested domain When attaching a device to a vIOMMU-based nested domain, vdev_id must be present. Add a piece of code hard-requesting it, preparing for a vEVENTQ support in the following patch. Then, update the TEST_F. A HWPT-based nested domain will return a NULL new_viommu, thus no such a vDEVICE requirement. Link: https://patch.msgid.link/r/4051ca8a819e51cb30de6b4fe9e4d94d956afe3d.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	e8e1ef9b77	iommufd/viommu: Add iommufd_viommu_report_event helper Similar to iommu_report_device_fault, this allows IOMMU drivers to report vIOMMU events from threaded IRQ handlers to user space hypervisors. Link: https://patch.msgid.link/r/44be825042c8255e75d0151b338ffd8ba0e4920b.1741719725.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:47 -03:00
Nicolin Chen	ea94b211c5	iommufd/viommu: Add iommufd_viommu_get_vdev_id helper This is a reverse search v.s. iommufd_viommu_find_dev, as drivers may want to convert a struct device pointer (physical) to its virtual device ID for an event injection to the user space VM. Again, this avoids exposing more core structures to the drivers, than the iommufd_viommu alone. Link: https://patch.msgid.link/r/18b8e8bc1b8104d43b205d21602c036fd0804e56.1741719725.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:47 -03:00
Nicolin Chen	e36ba5ab80	iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC Introduce a new IOMMUFD_OBJ_VEVENTQ object for vIOMMU Event Queue that provides user space (VMM) another FD to read the vIOMMU Events. Allow a vIOMMU object to allocate vEVENTQs, with a condition that each vIOMMU can only have one single vEVENTQ per type. Add iommufd_veventq_alloc() with iommufd_veventq_ops for the new ioctl. Link: https://patch.msgid.link/r/21acf0751dd5c93846935ee06f93b9c65eff5e04.1741719725.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:47 -03:00
Nicolin Chen	0507f337fc	iommufd: Rename fault.c to eventq.c Rename the file, aligning with the new eventq object. Link: https://patch.msgid.link/r/d726397e2d08028e25a1cb6eb9febefac35a32ba.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:51:19 -03:00
Nicolin Chen	5426a78beb	iommufd: Abstract an iommufd_eventq from iommufd_fault The fault object was designed exclusively for hwpt's IO page faults (PRI). But its queue implementation can be reused for other purposes too, such as hardware IRQ and event injections to user space. Meanwhile, a fault object holds a list of faults. So it's more accurate to call it a "fault queue". Combining the reusing idea above, abstract a new iommufd_eventq as a common structure embedded into struct iommufd_fault, similar to hwpt_paging holding a common hwpt. Add a common iommufd_eventq_ops and iommufd_eventq_init to prepare for an IOMMUFD_OBJ_VEVENTQ (vIOMMU Event Queue). Link: https://patch.msgid.link/r/e7336a857954209aabb466e0694aab323da95d90.1741719725.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:51:19 -03:00
Nicolin Chen	927dabc9aa	iommufd/fault: Add an iommufd_fault_init() helper The infrastructure of a fault object will be shared with a new vEVENTQ object in a following change. Add an iommufd_fault_init helper and an INIT_EVENTQ_FOPS marco for a vEVENTQ allocator to use too. Reorder the iommufd_ctx_get and refcount_inc, to keep them symmetrical with the iommufd_fault_fops_release(). Since the new vEVENTQ doesn't need "response" and its "mutex", so keep the xa_init_flags and mutex_init in their original locations. Link: https://patch.msgid.link/r/a9522c521909baeb1bd843950b2490478f3d06e0.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:51:19 -03:00
Nicolin Chen	dbf00d7d89	iommufd/fault: Move two fault functions out of the header There is no need to keep them in the header. The vEVENTQ version of these two functions will turn out to be a different implementation and will not share with this fault version. Thus, move them out of the header. Link: https://patch.msgid.link/r/7eebe32f3d354799f5e28128c693c3c284740b21.1741719725.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:51:18 -03:00
Robin Murphy	ba40f9dc95	iommu/mediatek-v1: Support COMPILE_TEST Add COMPILE_TEST support to Mediatek v1 so I have less chance of breaking it again in future. This just needs the ARM dma-iommu API stubbing out for now. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/a8d7c0b4393b360ce556f5f15e229d1e2fe73c83.1741702556.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:17:30 +01:00
Kishon Vijay Abraham I	19e5cc156c	iommu/amd: Enable support for up to 2K interrupts per function AMD IOMMU optionally supports up to 2K interrupts per function on newer platforms. Support for this feature is indicated through Extended Feature 2 Register (MMIO Offset 01A0h[NumIntRemapSup]). Allocate 2K IRTEs per device when this support is available. Co-developed-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250307095822.2274-5-sarunkod@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:14:17 +01:00
Sairaj Kodilkar	950865c1b8	iommu/amd: Rename DTE_INTTABLEN* and MAX_IRQS_PER_TABLE macro AMD iommu can support both 512 and 2K interrupts on newer platform. Hence add suffix "512" to the existing macros. Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250307095822.2274-4-sarunkod@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:14:16 +01:00
Sairaj Kodilkar	eaf717fa1c	iommu/amd: Replace slab cache allocator with page allocator Commit `05152a0494` ("iommu/amd: Add slab-cache for irq remapping tables") introduces slab cache allocator. But slab cache allocator provides benefit only when the allocation and deallocation of many identical objects is frequent. The AMD IOMMU driver allocates Interrupt remapping table (IRT) when device driver requests IRQ for the first time and never frees it. Hence the slab allocator does not provide any benefit here. Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250307095822.2274-3-sarunkod@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:14:15 +01:00
Sairaj Kodilkar	1c608b0b28	iommu/amd: Introduce generic function to set multibit feature value Define generic function `iommu_feature_set()` to set the values in the feature control register and replace `iommu_set_inv_tlb_timeout()` with it. Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250307095822.2274-2-sarunkod@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:14:14 +01:00
Robin Murphy	73d2f10957	iommu: Don't warn prematurely about dodgy probes The warning for suspect probe conditions inadvertently got moved too early in a prior respin - it happened to work out OK for fwspecs, but in general still needs to be after the ops->probe_device call so drivers which filter devices for themselves have a chance to do that. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Link: https://lore.kernel.org/r/72a4853e7ef36e7c1c4ca171ed4ed8e1a463a61a.1741791691.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-12 16:58:27 +01:00
Pranjal Shrivastava	0a679336dc	iommu/arm-smmu: Set rpm auto_suspend once during probe The current code calls arm_smmu_rpm_use_autosuspend() during device attach, which seems unusual as it sets the autosuspend delay and the 'use_autosuspend' flag for the smmu device. These parameters can be simply set once during the smmu probe and in order to avoid bouncing rpm states, we can simply mark_last_busy() during a client dev attach as discussed in [1]. Move the handling of arm_smmu_rpm_use_autosuspend() to the SMMU probe and modify the arm_smmu_rpm_put() function to mark_last_busy() before calling __pm_runtime_put_autosuspend(). Additionally, s/pm_runtime_put_autosuspend/__pm_runtime_put_autosuspend/ to help with the refactor of the pm_runtime_put_autosuspend() API [2]. Link: https://lore.kernel.org/r/20241023164835.GF29251@willie-the-truck [1] Link: https://git.kernel.org/linus/b7d46644e554 [2] Signed-off-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250123195636.4182099-1-praan@google.com Signed-off-by: Will Deacon <will@kernel.org>	2025-03-11 13:23:24 +00:00
Robin Murphy	bcb81ac6ae	iommu: Get DT/ACPI parsing into the proper probe path In hindsight, there were some crucial subtleties overlooked when moving {of,acpi}_dma_configure() to driver probe time to allow waiting for IOMMU drivers with -EPROBE_DEFER, and these have become an ever-increasing source of problems. The IOMMU API has some fundamental assumptions that iommu_probe_device() is called for every device added to the system, in the order in which they are added. Calling it in a random order or not at all dependent on driver binding leads to malformed groups, a potential lack of isolation for devices with no driver, and all manner of unexpected concurrency and race conditions. We've attempted to mitigate the latter with point-fix bodges like iommu_probe_device_lock, but it's a losing battle and the time has come to bite the bullet and address the true source of the problem instead. The crux of the matter is that the firmware parsing actually serves two distinct purposes; one is identifying the IOMMU instance associated with a device so we can check its availability, the second is actually telling that instance about the relevant firmware-provided data for the device. However the latter also depends on the former, and at the time there was no good place to defer and retry that separately from the availability check we also wanted for client driver probe. Nowadays, though, we have a proper notion of multiple IOMMU instances in the core API itself, and each one gets a chance to probe its own devices upon registration, so we can finally make that work as intended for DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to be able to run the iommu_fwspec machinery currently buried deep in the wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be surprisingly straightforward to bootstrap this transformation by pretty much just calling the same path twice. At client driver probe time, dev->driver is obviously set; conversely at device_add(), or a subsequent bus_iommu_probe(), any device waiting for an IOMMU really should not have a driver already, so we can use that as a condition to disambiguate the two cases, and avoid recursing back into the IOMMU core at the wrong times. Obviously this isn't the nicest thing, but for now it gives us a functional baseline to then unpick the layers in between without many more awkward cross-subsystem patches. There are some minor side-effects like dma_range_map potentially being created earlier, and some debug prints being repeated, but these aren't significantly detrimental. Let's make things work first, then deal with making them nice. With the basic flow finally in the right order again, the next step is probably turning the bus->dma_configure paths inside-out, since all we really need from bus code is its notion of which device and input ID(s) to parse the common firmware properties with... Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:43 +01:00
Robin Murphy	3832862eb9	iommu: Keep dev->iommu state consistent At the moment, if of_iommu_configure() allocates dev->iommu itself via iommu_fwspec_init(), then suffers a DT parsing failure, it cleans up the fwspec but leaves the empty dev_iommu hanging around. So far this is benign (if a tiny bit wasteful), but we'd like to be able to reason about dev->iommu having a consistent and unambiguous lifecycle. Thus make sure that the of_iommu cleanup undoes precisely whatever it did. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/d219663a3f23001f23d520a883ac622d70b4e642.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:42 +01:00
Robin Murphy	fd598f71b6	iommu: Resolve ops in iommu_init_device() Since iommu_init_device() was factored out, it is in fact the only consumer of the ops which __iommu_probe_device() is resolving, so let it do that itself rather than passing them in. This also puts the ops lookup at a more logical point relative to the rest of the flow through __iommu_probe_device(). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/fa4b6cfc67a352488b7f4e0b736008307ce9ac2e.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:41 +01:00
Robin Murphy	b46064a188	iommu: Handle race with default domain setup It turns out that deferred default domain creation leaves a subtle race window during iommu_device_register() wherein a client driver may asynchronously probe in parallel and get as far as performing DMA API operations with dma-direct, only to be switched to iommu-dma underfoot once the default domain attachment finally happens, with obviously disastrous consequences. Even the wonky of_iommu_configure() path is at risk, since iommu_fwspec_init() will no longer defer client probe as the instance ops are (necessarily) already registered, and the "replay" iommu_probe_device() call can see dev->iommu_group already set and so think there's nothing to do either. Fortunately we already have the right tool in the right place in the form of iommu_device_use_default_domain(), which just needs to ensure that said default domain is actually ready to be used. Deferring the client probe shouldn't have too much impact, given that this only happens while the IOMMU driver is probing, and thus due to kick the deferred probe list again once it finishes. Reported-by: Charan Teja Kalla <quic_charante@quicinc.com> Fixes: `98ac73f99b` ("iommu: Require a default_domain for all iommu drivers") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/e88b94c9b575034a2c98a48b3d383654cbda7902.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:40 +01:00
Robin Murphy	29c6e1c2b9	iommu: Unexport iommu_fwspec_free() The drivers doing their own fwspec parsing have no need to call iommu_fwspec_free() since fwspecs were moved into dev_iommu, as returning an error from .probe_device will tear down the whole lot anyway. Move it into the private interface now that it only serves for of_iommu to clean up in an error case. I have no idea what mtk_v1 was doing in effectively guaranteeing a NULL fwspec would be dereferenced if no "iommus" DT property was found, so add a check for that to at least make the code look sane. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/36e245489361de2d13db22a510fa5c79e7126278.1740667667.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:39 +01:00
Lu Baolu	4c293add58	iommu/vt-d: Cleanup intel_context_flush_present() The intel_context_flush_present() is called in places where either the scalable mode is disabled, or scalable mode is enabled but all PASID entries are known to be non-present. In these cases, the flush_domains path within intel_context_flush_present() will never execute. This dead code is therefore removed. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250228092631.3425464-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:05 +01:00
Lu Baolu	87caaba1d1	iommu/vt-d: Move PRI enablement in probe path Update PRI enablement to use the new method, similar to the amd iommu driver. Enable PRI in the device probe path and disable it when the device is released. PRI is enabled throughout the device's iommu lifecycle. The infrastructure for the iommu subsystem to handle iopf requests is created during iopf enablement and released during iopf disablement. All invalid page requests from the device are automatically handled by the iommu subsystem if iopf is not enabled. Add iopf_refcount to track the iopf enablement. Convert the return type of intel_iommu_disable_iopf() to void, as there is no way to handle a failure when disabling this feature. Make intel_iommu_enable/disable_iopf() helpers global, as they will be used beyond the current file in the subsequent patch. The iopf_refcount is not protected by any lock. This is acceptable, as there is no concurrent access to it in the current code. The following patch will address this by moving it to the domain attach/detach paths, which are protected by the iommu group mutex. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250228092631.3425464-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:04 +01:00
Lu Baolu	5518f239af	iommu/vt-d: Move scalable mode ATS enablement to probe path Device ATS is currently enabled when a domain is attached to the device and disabled when the domain is detached. This creates a limitation: when the IOMMU is operating in scalable mode and IOPF is enabled, the device's domain cannot be changed. The previous code enables ATS when a domain is set to a device's RID and disables it during RID domain switch. So, if a PASID is set with a domain requiring PRI, ATS should remain enabled until the domain is removed. During the PASID domain's lifecycle, if the RID's domain changes, PRI will be disrupted because it depends on ATS, which is disabled when the blocking domain is set for the device's RID. Remove this limitation by moving ATS enablement to the device probe path. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250228092631.3425464-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:04 +01:00
Jason Gunthorpe	607ba1bb09	iommu/vt-d: Check if SVA is supported when attaching the SVA domain Attach of a SVA domain should fail if SVA is not supported, move the check for SVA support out of IOMMU_DEV_FEAT_SVA and into attach. Also check when allocating a SVA domain to match other drivers. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250228092631.3425464-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:03 +01:00
Jason Gunthorpe	a8653e5cc2	iommu/vt-d: Use virt_to_phys() If all the inlines are unwound virt_to_dma_pfn() is simply: return page_to_pfn(virt_to_page(p)) << (PAGE_SHIFT - VTD_PAGE_SHIFT); Which can be re-arranged to: (page_to_pfn(virt_to_page(p)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT The only caller is: ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) re-arranged to: ((page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT) << VTD_PAGE_SHIFT Which simplifies to: page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT That is the same as virt_to_phys(tmp_page), so just remove all of this. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-e797f4dc6918+93057-iommu_pages_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:02 +01:00
Yunhui Cui	9ce7603ad3	iommu/vt-d: Fix system hang on reboot -f We found that executing the command ./a.out &;reboot -f (where a.out is a program that only executes a while(1) infinite loop) can probabilistically cause the system to hang in the intel_iommu_shutdown() function, rendering it unresponsive. Through analysis, we identified that the factors contributing to this issue are as follows: 1. The reboot -f command does not prompt the kernel to notify the application layer to perform cleanup actions, allowing the application to continue running. 2. When the kernel reaches the intel_iommu_shutdown() function, only the BSP (Bootstrap Processor) CPU is operational in the system. 3. During the execution of intel_iommu_shutdown(), the function down_write (&dmar_global_lock) causes the process to sleep and be scheduled out. 4. At this point, though the processor's interrupt flag is not cleared, allowing interrupts to be accepted. However, only legacy devices and NMI (Non-Maskable Interrupt) interrupts could come in, as other interrupts routing have already been disabled. If no legacy or NMI interrupts occur at this stage, the scheduler will not be able to run. 5. If the application got scheduled at this time is executing a while(1)- type loop, it will be unable to be preempted, leading to an infinite loop and causing the system to become unresponsive. To resolve this issue, the intel_iommu_shutdown() function should not execute down_write(), which can potentially cause the process to be scheduled out. Furthermore, since only the BSP is running during the later stages of the reboot, there is no need for protection against parallel access to the DMAR (DMA Remapping) unit. Therefore, the following lines could be removed: down_write(&dmar_global_lock); up_write(&dmar_global_lock); After testing, the issue has been resolved. Fixes: `6c3a44ed3c` ("iommu/vt-d: Turn off translations at shutdown") Co-developed-by: Ethan Zhao <haifeng.zhao@linux.intel.com> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20250303062421.17929-1-cuiyunhui@bytedance.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:02 +01:00
Hector Martin	3bc0102835	iommu: apple-dart: Allow mismatched bypass support This is needed by ISP, which has DART0 with bypass and DART1/2 without. Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20250307-isp-dart-v3-2-684fe4489591@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:29:52 +01:00
Hector Martin	9d7b779e30	iommu: apple-dart: Increase MAX_DARTS_PER_DEVICE to 3 ISP needs this. Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20250307-isp-dart-v3-1-684fe4489591@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:29:51 +01:00
Vasant Hegde	625586855f	iommu/amd: Consolidate protection domain free code Consolidate protection domain free code inside amd_iommu_domain_free() and remove protection_domain_free() function. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-8-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:11 +01:00
Vasant Hegde	5536e19e94	iommu/amd: Remove unused forward declaration Remove unused forward declaration. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-7-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:10 +01:00
Vasant Hegde	ee4cf9260a	iommu/amd: Fix header file Move function declaration inside AMD_IOMMU_H defination. Fixes: `fd5dff9de4` ("iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers") Fixes: `457da57646` ("iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()") Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-6-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:10 +01:00
Vasant Hegde	60785fceda	iommu/amd: Remove outdated comment Remove outdated comment. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-5-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:09 +01:00
Vasant Hegde	36a1cfd497	iommu/amd/pgtbl_v2: Improve error handling Return -ENOMEM if v2_alloc_pte() fails to allocate memory. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:09 +01:00
Vasant Hegde	e481f8a5db	iommu/amd: Remove unused variable Remove 'amd_iommu_aperture_order' as its not used since commit `d9cfed9254`. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:08 +01:00
Vasant Hegde	558d2bbd45	iommu/amd: Log IOMMU control register in event log path Useful for debugging ILLEGAL_DEV_TABLE_ENTRY events as some of the DTE settings depends on Control register settings. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:08 +01:00
Robin Murphy	032d7e435c	iommu/dma: Remove redundant locking This reverts commit `ac9a5d522b`. iommu_dma_init_domain() is now only called under the group mutex, as it should be given that that the default domain belongs to the group, so the additional internal locking is no longer needed. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/a943d4c198e6a1fffe998337d577dc3aa7f660a9.1740585469.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:23:58 +01:00
Yi Liu	55c85fa757	iommufd: Fail replace if device has not been attached The current implementation of iommufd_device_do_replace() implicitly assumes that the input device has already been attached. However, there is no explicit check to verify this assumption. If another device within the same group has been attached, the replace operation might succeed, but the input device itself may not have been attached yet. As a result, the input device might not be tracked in the igroup->device_list, and its reserved IOVA might not be added. Despite this, the caller might incorrectly assume that the device has been successfully replaced, which could lead to unexpected behavior or errors. To address this issue, add a check to ensure that the input device has been attached before proceeding with the replace operation. This check will help maintain the integrity of the device tracking system and prevent potential issues arising from incorrect assumptions about the device's attachment status. Fixes: `e88d4ec154` ("iommufd: Add iommufd_device_replace()") Link: https://patch.msgid.link/r/20250306034842.5950-1-yi.l.liu@intel.com Cc: stable@vger.kernel.org Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-07 17:05:24 -04:00
Nicolin Chen	897008d0f7	iommufd: Set domain->iommufd_hwpt in all hwpt->domain allocators Setting domain->iommufd_hwpt in iommufd_hwpt_alloc() only covers the HWPT allocations from user space, but not for an auto domain. This resulted in a NULL pointer access in the auto domain pathway: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 pc : iommufd_sw_msi+0x54/0x2b0 lr : iommufd_sw_msi+0x40/0x2b0 Call trace: iommufd_sw_msi+0x54/0x2b0 (P) iommu_dma_prepare_msi+0x64/0xa8 its_irq_domain_alloc+0xf0/0x2c0 irq_domain_alloc_irqs_parent+0x2c/0xa8 msi_domain_alloc+0xa0/0x1a8 Since iommufd_sw_msi() requires to access the domain->iommufd_hwpt, it is better to set that explicitly prior to calling iommu_domain_set_sw_msi(). Fixes: `748706d7ca` ("iommu: Turn fault_data to iommufd private pointer") Link: https://patch.msgid.link/r/20250305211800.229465-1-nicolinc@nvidia.com Reported-by: Ankit Agrawal <ankita@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Ankit Agrawal <ankita@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-07 15:56:22 -04:00
Nicolin Chen	a05df03a88	iommufd: Fix uninitialized rc in iommufd_access_rw() Reported by smatch: drivers/iommu/iommufd/device.c:1392 iommufd_access_rw() error: uninitialized symbol 'rc'. Fixes: `8d40205f60` ("iommufd: Add kAPI toward external drivers for kernel access") Link: https://patch.msgid.link/r/20250227200729.85030-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202502271339.a2nWr9UA-lkp@intel.com/ [nicolinc: can't find an original report but only in "old smatch warnings"] Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-04 09:41:25 -04:00
Yi Liu	1062d81086	iommufd: Disallow allocating nested parent domain with fault ID Allocating a domain with a fault ID indicates that the domain is faultable. However, there is a gap for the nested parent domain to support PRI. Some hardware lacks the capability to distinguish whether PRI occurs at stage 1 or stage 2. This limitation may require software-based page table walking to resolve. Since no in-tree IOMMU driver currently supports this functionality, it is disallowed. For more details, refer to the related discussion at [1]. [1] https://lore.kernel.org/linux-iommu/bd1655c6-8b2f-4cfa-adb1-badc00d01811@intel.com/ Link: https://patch.msgid.link/r/20250226104012.82079-1-yi.l.liu@intel.com Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-04 09:34:54 -04:00
Joerg Roedel	59b6c3469e	iommu shared branch with iommufd The three dependent series on a shared branch: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ8HEaQAKCRCFwuHvBreF YfEHAP9khiTmkfsx5JGBtKMqFbF/TShyZfgcLTCWw/VGXLcdswD/fnxA8L+y6zD4 9P73iOqSfHRV69dXu9CU6duZZifWawI= =YZh7 -----END PGP SIGNATURE----- Merge tag 'for-joerg' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core iommu shared branch with iommufd The three dependent series on a shared branch: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API	2025-02-28 16:35:19 +01:00
Yi Liu	5e9f822c9c	iommu: Swap the order of setting group->pasid_array and calling attach op of iommu drivers The current implementation stores entry to the group->pasid_array before the underlying iommu driver has successfully set the new domain. This can lead to issues where PRIs are received on the new domain before the attach operation is completed. This patch swaps the order of operations to ensure that the domain is set in the underlying iommu driver before updating the group->pasid_array. Link: https://patch.msgid.link/r/20250226011849.5102-5-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Yi Liu	e1ea9d30d8	iommu: Store either domain or handle in group->pasid_array iommu_attach_device_pasid() only stores handle to group->pasid_array when there is a valid handle input. However, it makes the iommu_attach_device_pasid() unable to detect if the pasid has been attached or not previously. To be complete, let the iommu_attach_device_pasid() store the domain to group->pasid_array if no valid handle. The other users of the group->pasid_array should be updated to be consistent. e.g. the iommu_attach_group_handle() and iommu_replace_group_handle(). Link: https://patch.msgid.link/r/20250226011849.5102-4-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Yi Liu	473ec072a6	iommu: Drop iommu_group_replace_domain() iommufd does not use it now, so drop it. Link: https://patch.msgid.link/r/20250226011849.5102-3-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Yi Liu	237603a46a	iommu: Make @handle mandatory in iommu_{attach\|replace}_group_handle() Caller of the two APIs always provide a valid handle, make @handle as mandatory parameter. Take this chance incoporate the handle->domain set under the protection of group->mutex in iommu_attach_group_handle(). Link: https://patch.msgid.link/r/20250226011849.5102-2-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Lu Baolu	b150654f74	iommu/vt-d: Fix suspicious RCU usage Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts locally") moved the call to enable_drhd_fault_handling() to a code path that does not hold any lock while traversing the drhd list. Fix it by ensuring the dmar_global_lock lock is held when traversing the drhd list. Without this fix, the following warning is triggered: ============================= WARNING: suspicious RCU usage 6.14.0-rc3 #55 Not tainted ----------------------------- drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 2 locks held by cpuhp/1/23: #0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 #1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 stack backtrace: CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 #55 Call Trace: <TASK> dump_stack_lvl+0xb7/0xd0 lockdep_rcu_suspicious+0x159/0x1f0 ? __pfx_enable_drhd_fault_handling+0x10/0x10 enable_drhd_fault_handling+0x151/0x180 cpuhp_invoke_callback+0x1df/0x990 cpuhp_thread_fun+0x1ea/0x2c0 smpboot_thread_fn+0x1f5/0x2e0 ? __pfx_smpboot_thread_fn+0x10/0x10 kthread+0x12a/0x2d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x4a/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Holding the lock in enable_drhd_fault_handling() triggers a lockdep splat about a possible deadlock between dmar_global_lock and cpu_hotplug_lock. This is avoided by not holding dmar_global_lock when calling iommu_device_register(), which initiates the device probe process. Fixes: `d74169ceb0` ("iommu/vt-d: Allocate DMAR fault interrupts locally") Reported-and-tested-by: Ido Schimmel <idosch@nvidia.com> Closes: https://lore.kernel.org/linux-iommu/Zx9OwdLIc_VoQ0-a@shredder.mtl.com/ Tested-by: Breno Leitao <leitao@debian.org> Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250218022422.2315082-1-baolu.lu@linux.intel.com Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-28 12:19:01 +01:00
Jerry Snitselaar	64f792981e	iommu/vt-d: Remove device comparison in context_setup_pass_through_cb Remove the device comparison check in context_setup_pass_through_cb. pci_for_each_dma_alias already makes a decision on whether the callback function should be called for a device. With the check in place it will fail to create context entries for aliases as it walks up to the root bus. Fixes: `2031c469f8` ("iommu/vt-d: Add support for static identity domain") Closes: https://lore.kernel.org/linux-iommu/82499eb6-00b7-4f83-879a-e97b4144f576@linux.intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20250224180316.140123-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-28 12:19:00 +01:00
Alejandro Jimenez	c5b0320bbf	iommu/amd: Preserve default DTE fields when updating Host Page Table Root When updating the page table root field on the DTE, avoid overwriting any bits that are already set. The earlier call to make_clear_dte() writes default values that all DTEs must have set (currently DTE[V]), and those must be preserved. Currently this doesn't cause problems since the page table root update is the first field that is set after make_clear_dte() is called, and DTE_FLAG_V is set again later along with the permission bits (IR/IW). Remove this redundant assignment too. Fixes: `fd5dff9de4` ("iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers") Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250106191413.3107140-1-alejandro.j.jimenez@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-28 12:18:00 +01:00
Jason Gunthorpe	40f5175d0e	iommufd: Implement sw_msi support natively iommufd has a model where the iommu_domain can be changed while the VFIO device is attached. In this case, the MSI should continue to work. This corner case has not worked because the dma-iommu implementation of sw_msi is tied to a single domain. Implement the sw_msi mapping directly and use a global per-fd table to associate assigned IOVA to the MSI pages. This allows the MSI pages to be loaded into a domain before it is attached ensuring that MSI is not disrupted. Link: https://patch.msgid.link/r/e13d23eeacd67c0a692fc468c85b483f4dd51c57.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-27 15:29:35 -04:00
Nuno Das Neves	db912b8954	hyperv: Change hv_root_partition into a function Introduce hv_curr_partition_type to store the partition type as an enum. Right now this is limited to guest or root partition, but there will be other kinds in future and the enum is easily extensible. Set up hv_curr_partition_type early in Hyper-V initialization with hv_identify_partition_type(). hv_root_partition() just queries this value, and shouldn't be called before that. Making this check into a function sets the stage for adding a config option to gate the compilation of root partition code. In particular, hv_root_partition() can be stubbed out always be false if root partition support isn't desired. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com>	2025-02-22 02:21:45 +00:00
Nicolin Chen	748706d7ca	iommu: Turn fault_data to iommufd private pointer A "fault_data" was added exclusively for the iommufd_fault_iopf_handler() used by IOPF/PRI use cases, along with the attach_handle. Now, the iommufd version of the sw_msi function will reuse the attach_handle and fault_data for a non-fault case. Rename "fault_data" to "iommufd_hwpt" so as not to confine it to a "fault" case. Move it into a union to be the iommufd private pointer. A following patch will move the iova_cookie to the union for dma-iommu too after the iommufd_sw_msi implementation is added. Since we have two unions now, add some simple comments for readability. Link: https://patch.msgid.link/r/ee5039503f28a16590916e9eef28b917e2d1607a.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:49:05 -04:00
Jason Gunthorpe	96093fe54f	irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it Currently, IRQ_MSI_IOMMU is selected if DMA_IOMMU is available to provide an implementation for iommu_dma_prepare/compose_msi_msg(). However, it'll make more sense for irqchips that call prepare/compose to select it, and that will trigger all the additional code and data to be compiled into the kernel. If IRQ_MSI_IOMMU is selected with no IOMMU side implementation, then the prepare/compose() will be NOP stubs. If IRQ_MSI_IOMMU is not selected by an irqchip, then the related code on the iommu side is compiled out. Link: https://patch.msgid.link/r/a2620f67002c5cdf974e89ca3bf905f5c0817be6.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:48:38 -04:00
Jason Gunthorpe	288683c92b	iommu: Make iommu_dma_prepare_msi() into a generic operation SW_MSI supports IOMMU to translate an MSI message before the MSI message is delivered to the interrupt controller. On such systems, an iommu_domain must have a translation for the MSI message for interrupts to work. The IRQ subsystem will call into IOMMU to request that a physical page be set up to receive MSI messages, and the IOMMU then sets an IOVA that maps to that physical page. Ultimately the IOVA is programmed into the device via the msi_msg. Generalize this by allowing iommu_domain owners to provide implementations of this mapping. Add a function pointer in struct iommu_domain to allow a domain owner to provide its own implementation. Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in the iommu_get_msi_cookie() path. Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain doesn't change or become freed while running. Races with IRQ operations from VFIO and domain changes from iommufd are possible here. Replace the msi_prepare_lock with a lockdep assertion for the group mutex as documentation. For the dmau_iommu.c each iommu_domain is unique to a group. Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:04:12 -04:00
Jason Gunthorpe	9349887e93	genirq/msi: Refactor iommu_dma_compose_msi_msg() The two-step process to translate the MSI address involves two functions, iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as it had to dereference the opaque cookie pointer. Now, the previous patch changed the cookie pointer into an integer so there is no longer any need for the iommu layer to be involved. Further, the call sites of iommu_dma_compose_msi_msg() all follow the same pattern of setting an MSI message address_hi/lo to non-translated and then immediately calling iommu_dma_compose_msi_msg(). Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly accepts the u64 version of the address and simplifies all the callers. Move the new helper to linux/msi.h since it has nothing to do with iommu. Aside from refactoring, this logically prepares for the next patch, which allows multiple implementation options for iommu_dma_prepare_msi(). So, it does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c as it no longer provides the only iommu_dma_prepare_msi() implementation. Link: https://patch.msgid.link/r/eda62a9bafa825e9cdabd7ddc61ad5a21c32af24.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:04:12 -04:00
Jason Gunthorpe	1f7df3a691	genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie The IOMMU translation for MSI message addresses has been a 2-step process, separated in time: 1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address is stored in the MSI descriptor when an MSI interrupt is allocated. 2) iommu_dma_compose_msi_msg(): this cookie pointer is used to compute a translated message address. This has an inherent lifetime problem for the pointer stored in the cookie that must remain valid between the two steps. However, there is no locking at the irq layer that helps protect the lifetime. Today, this works under the assumption that the iommu domain is not changed while MSI interrupts being programmed. This is true for normal DMA API users within the kernel, as the iommu domain is attached before the driver is probed and cannot be changed while a driver is attached. Classic VFIO type1 also prevented changing the iommu domain while VFIO was running as it does not support changing the "container" after starting up. However, iommufd has improved this so that the iommu domain can be changed during VFIO operation. This potentially allows userspace to directly race VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()). This potentially causes both the cookie pointer and the unlocked call to iommu_get_domain_for_dev() on the MSI translation path to become UAFs. Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA address is already known during iommu_dma_prepare_msi() and cannot change. Thus, it can simply be stored as an integer in the MSI descriptor. The other UAF related to iommu_get_domain_for_dev() will be addressed in patch "iommu: Make iommu_dma_prepare_msi() into a generic operation" by using the IOMMU group mutex. Link: https://patch.msgid.link/r/a4f2cd76b9dc1833ee6c1cf325cba57def22231c.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:04:11 -04:00
Asahi Lina	5276c1e076	iommu/io-pgtable-dart: Only set subpage protection disable for DART 1 Subpage protection can't be disabled on t6000-style darts, as such the disable flag no longer applies, and probably even affects something else. Fixes: `dc09fe1c5e` ("iommu/io-pgtable-dart: Add DART PTE support for t6000") Signed-off-by: Asahi Lina <lina@asahilina.net> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20250219-dart2-no-sp-disable-v1-1-9f324cfa4e70@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-21 12:06:54 +01:00
Matthew Rosato	64af12c6ec	iommu/s390: implement iommu passthrough via identity domain Enabled via the kernel command-line 'iommu.passthrough=1' option. Introduce the concept of identity domains to s390-iommu, which relies on the bus_dma_region to offset identity mappings to the start of the DMA aperture advertized by CLP. Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250212213418.182902-5-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-21 12:02:00 +01:00
Matthew Rosato	0ed5967a0a	iommu/s390: handle IOAT registration based on domain At this point, the dma_table is really a property of the s390-iommu domain. Rather than checking its contents elsewhere in the codebase, move the code that registers the table with firmware into s390-iommu and make a decision what to register with firmware based upon the type of domain in use for the device in question. Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20250212213418.182902-4-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-21 12:01:58 +01:00
Linus Torvalds	82ff316456	ARM: - Large set of fixes for vector handling, specially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours. - Fix use of kernel VAs at EL2 when emulating timers with nVHE. - Small set of pKVM improvements and cleanups. x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference. - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1. - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmev2ykUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMvxwf/bw2u08moAYWAjJLROFvfiKXnznLS iqJ2+jcw0lJ7wDqm4Zw8M5t74Rd+y5yzkLkZOyjav9yBB09zRkItiTHljCNMOQnt 2QptBa3CUN8N+rNnvVRt6dMkhw7z6n7eoFRSIDY2Y9PgiTapbFXPV1gFkMPO6+0f SyF4LCr0iuDkJdvGAZJAH/Mp8nG6dv/A6a+Q+R1RkbKn9c2OdWw4VMfhIzimFGN6 0RFjbfXXvyO0aU/W/VHwvvuhcjGkAZWfHDdaTXqbvSMhayW562UPVMVBwXdVBmDj Dk1gCKcbm4WyktbXYW6iOYj3MgdK96eI24ozps4R0aDexsrTRY4IfH4KEg== =20Ql -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Large set of fixes for vector handling, especially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours - Fix use of kernel VAs at EL2 when emulating timers with nVHE - Small set of pKVM improvements and cleanups x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1 - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) x86/sev: Fix broken SNP support with KVM module built-in KVM: SVM: Ensure PSP module is initialized if KVM module is built-in crypto: ccp: Add external API interface for PSP module initialization KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic() KVM: arm64: timer: Drop warning on failed interrupt signalling KVM: arm64: Fix alignment of kvm_hyp_memcache allocations KVM: arm64: Convert timer offset VA when accessed in HYP code KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp() KVM: arm64: Eagerly switch ZCR_EL{1,2} KVM: arm64: Mark some header functions as inline KVM: arm64: Refactor exit handlers KVM: arm64: Refactor CPTR trap deactivation KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN KVM: arm64: Remove host FPSIMD saving for non-protected KVM KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop KVM: nSVM: Enter guest mode before initializing nested NPT MMU KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper ...	2025-02-16 10:25:12 -08:00
Ashish Kalra	409f45387c	x86/sev: Fix broken SNP support with KVM module built-in Fix issues with enabling SNP host support and effectively SNP support which is broken with respect to the KVM module being built-in. SNP host support is enabled in snp_rmptable_init() which is invoked as device_initcall(). SNP check on IOMMU is done during IOMMU PCI init (IOMMU_PCI_INIT stage). And for that reason snp_rmptable_init() is currently invoked via device_initcall() and cannot be invoked via subsys_initcall() as core IOMMU subsystem gets initialized via subsys_initcall(). Now, if kvm_amd module is built-in, it gets initialized before SNP host support is enabled in snp_rmptable_init() : [ 10.131811] kvm_amd: TSC scaling supported [ 10.136384] kvm_amd: Nested Virtualization enabled [ 10.141734] kvm_amd: Nested Paging enabled [ 10.146304] kvm_amd: LBR virtualization supported [ 10.151557] kvm_amd: SEV enabled (ASIDs 100 - 509) [ 10.156905] kvm_amd: SEV-ES enabled (ASIDs 1 - 99) [ 10.162256] kvm_amd: SEV-SNP enabled (ASIDs 1 - 99) [ 10.171508] kvm_amd: Virtual VMLOAD VMSAVE supported [ 10.177052] kvm_amd: Virtual GIF supported ... ... [ 10.201648] kvm_amd: in svm_enable_virtualization_cpu And then svm_x86_ops->enable_virtualization_cpu() (svm_enable_virtualization_cpu) programs MSR_VM_HSAVE_PA as following: wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa); So VM_HSAVE_PA is non-zero before SNP support is enabled on all CPUs. snp_rmptable_init() gets invoked after svm_enable_virtualization_cpu() as following : ... [ 11.256138] kvm_amd: in svm_enable_virtualization_cpu ... [ 11.264918] SEV-SNP: in snp_rmptable_init This triggers a #GP exception in snp_rmptable_init() when snp_enable() is invoked to set SNP_EN in SYSCFG MSR: [ 11.294289] unchecked MSR access error: WRMSR to 0xc0010010 (tried to write 0x0000000003fc0000) at rIP: 0xffffffffaf5d5c28 (native_write_msr+0x8/0x30) ... [ 11.294404] Call Trace: [ 11.294482] <IRQ> [ 11.294513] ? show_stack_regs+0x26/0x30 [ 11.294522] ? ex_handler_msr+0x10f/0x180 [ 11.294529] ? search_extable+0x2b/0x40 [ 11.294538] ? fixup_exception+0x2dd/0x340 [ 11.294542] ? exc_general_protection+0x14f/0x440 [ 11.294550] ? asm_exc_general_protection+0x2b/0x30 [ 11.294557] ? __pfx_snp_enable+0x10/0x10 [ 11.294567] ? native_write_msr+0x8/0x30 [ 11.294570] ? __snp_enable+0x5d/0x70 [ 11.294575] snp_enable+0x19/0x20 [ 11.294578] __flush_smp_call_function_queue+0x9c/0x3a0 [ 11.294586] generic_smp_call_function_single_interrupt+0x17/0x20 [ 11.294589] __sysvec_call_function+0x20/0x90 [ 11.294596] sysvec_call_function+0x80/0xb0 [ 11.294601] </IRQ> [ 11.294603] <TASK> [ 11.294605] asm_sysvec_call_function+0x1f/0x30 ... [ 11.294631] arch_cpu_idle+0xd/0x20 [ 11.294633] default_idle_call+0x34/0xd0 [ 11.294636] do_idle+0x1f1/0x230 [ 11.294643] ? complete+0x71/0x80 [ 11.294649] cpu_startup_entry+0x30/0x40 [ 11.294652] start_secondary+0x12d/0x160 [ 11.294655] common_startup_64+0x13e/0x141 [ 11.294662] </TASK> This #GP exception is getting triggered due to the following errata for AMD family 19h Models 10h-1Fh Processors: Processor may generate spurious #GP(0) Exception on WRMSR instruction: Description: The Processor will generate a spurious #GP(0) Exception on a WRMSR instruction if the following conditions are all met: - the target of the WRMSR is a SYSCFG register. - the write changes the value of SYSCFG.SNPEn from 0 to 1. - One of the threads that share the physical core has a non-zero value in the VM_HSAVE_PA MSR. The document being referred to above: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf To summarize, with kvm_amd module being built-in, KVM/SVM initialization happens before host SNP is enabled and this SVM initialization sets VM_HSAVE_PA to non-zero, which then triggers a #GP when SYSCFG.SNPEn is being set and this will subsequently cause SNP_INIT(_EX) to fail with INVALID_CONFIG error as SYSCFG[SnpEn] is not set on all CPUs. Essentially SNP host enabling code should be invoked before KVM initialization, which is currently not the case when KVM is built-in. Add fix to call snp_rmptable_init() early from iommu_snp_enable() directly and not invoked via device_initcall() which enables SNP host support before KVM initialization with kvm_amd module built-in. Add additional handling for `iommu=off` or `amd_iommu=off` options. Note that IOMMUs need to be enabled for SNP initialization, therefore, if host SNP support is enabled but late IOMMU initialization fails then that will cause PSP driver's SNP_INIT to fail as IOMMU SNP sanity checks in SNP firmware will fail with invalid configuration error as below: [ 9.723114] ccp 0000:23:00.1: sev enabled [ 9.727602] ccp 0000:23:00.1: psp enabled [ 9.732527] ccp 0000:a2:00.1: enabling device (0000 -> 0002) [ 9.739098] ccp 0000:a2:00.1: no command queues available [ 9.745167] ccp 0000:a2:00.1: psp enabled [ 9.805337] ccp 0000:23:00.1: SEV-SNP: failed to INIT rc -5, error 0x3 [ 9.866426] ccp 0000:23:00.1: SEV API:1.53 build:5 Fixes: `c3b86e61b7` ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature") Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Acked-by: Joerg Roedel <jroedel@suse.de> Message-ID: <138b520fb83964782303b43ade4369cd181fdd9c.1739226950.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-02-14 18:39:19 -05:00
Lu Baolu	add43c4fbc	iommu/vt-d: Make intel_iommu_drain_pasid_prq() cover faults for RID This driver supports page faults on PCI RID since commit <9f831c16c69e> ("iommu/vt-d: Remove the pasid present check in prq_event_thread") by allowing the reporting of page faults with the pasid_present field cleared to the upper layer for further handling. The fundamental assumption here is that the detach or replace operations act as a fence for page faults. This implies that all pending page faults associated with a specific RID or PASID are flushed when a domain is detached or replaced from a device RID or PASID. However, the intel_iommu_drain_pasid_prq() helper does not correctly handle faults for RID. This leads to faults potentially remaining pending in the iommu hardware queue even after the domain is detached, thereby violating the aforementioned assumption. Fix this issue by extending intel_iommu_drain_pasid_prq() to cover faults for RID. Fixes: `9f831c16c6` ("iommu/vt-d: Remove the pasid present check in prq_event_thread") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250121023150.815972-1-baolu.lu@linux.intel.com Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20250211005512.985563-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-14 09:12:47 +01:00
Andrew Kreimer	4a8991fe9c	iommu/exynos: Fix typos There are some typos in comments/messages: - modyfying -> modifying - Unabled -> Unable Fix them via codespell. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Link: https://lore.kernel.org/r/20250210112027.29791-1-algonell@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-14 09:12:46 +01:00
Easwar Hariharan	78be7f0453	iommu: Fix a spelling error Fix spelling error IDENITY -> IDENTITY in drivers/iommu/iommu.c. Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250128190522.70800-1-eahariha@linux.microsoft.com [ joro: Add commit message ] Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-14 09:12:46 +01:00
Vasant Hegde	ef75966abf	iommu/amd: Expicitly enable CNTRL.EPHEn bit in resume path With recent kernel, AMDGPU failed to resume after suspend on certain laptop. Sample log: ----------- Nov 14 11:52:19 Thinkbook kernel: iommu ivhd0: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:06:00.0 pasid=0x00000 address=0x135300000 flags=0x0080] Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[0]: 7d90000000000003 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[1]: 0000100103fc0009 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[2]: 2000000117840013 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[3]: 0000000000000000 This is because in resume path, CNTRL[EPHEn] is not set. Fix this by setting CNTRL[EPHEn] to 1 in resume path if EFR[EPHSUP] is set. Note May be better approach is to save the control register in suspend path and restore it in resume path instead of trying to set indivisual bits. We will have separate patch for that. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219499 Fixes: `c4cb231111` ("iommu/amd: Add support for enable/disable IOPF") Tested-by: Hamish McIntyre-Bhatty <kernel-bugzilla@regd.hamishmb.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250127094411.5931-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-14 09:11:58 +01:00
Nicolin Chen	dc10ba25d4	iommufd/fault: Remove iommufd_fault_domain_attach/detach/replace_dev() There are new attach/detach/replace helpers in device.c taking care of both the attach_handle and the fault specific routines for iopf_enable/disable() and auto response. Clean up these redundant functions in the fault.c file. Link: https://patch.msgid.link/r/3ca94625e9d78270d9a715fa0809414fddd57e58.1738645017.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-11 14:21:03 -04:00
Nicolin Chen	fb21b1568a	iommufd: Make attach_handle generic than fault specific "attach_handle" was added exclusively for the iommufd_fault_iopf_handler() used by IOPF/PRI use cases. Now, both the MSI and PASID series require to reuse the attach_handle for non-fault cases. Add a set of new attach/detach/replace helpers that does the attach_handle allocation/releasing/replacement in the common path and also handles those fault specific routines such as iopf enabling/disabling and auto response. This covers both non-fault and fault cases in a clean way, replacing those inline helpers in the header. The following patch will clean up those old helpers in the fault.c file. Link: https://patch.msgid.link/r/32687df01c02291d89986a9fca897bbbe2b10987.1738645017.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-11 14:21:03 -04:00
Lu Baolu	9759ae2cee	iommu: Fix potential memory leak in iopf_queue_remove_device() The iopf_queue_remove_device() helper removes a device from the per-iommu iopf queue when PRI is disabled on the device. It responds to all outstanding iopf's with an IOMMU_PAGE_RESP_INVALID code and detaches the device from the queue. However, it fails to release the group structure that represents a group of iopf's awaiting for a response after responding to the hardware. This can cause a memory leak if iopf_queue_remove_device() is called with pending iopf's. Fix it by calling iopf_free_group() after the iopf group is responded. Fixes: `1991123271` ("iommu: Track iopf group instead of last fault") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250117055800.782462-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-10 14:49:30 +01:00
Linus Torvalds	382e391365	hyperv-next for v6.14 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmeTFQ4THHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXqMWB/4uHjnu50u+m00OwXAKQr6i92zh50BZ RQragd9s9C8tuUNwPDmS/ct2BNAhoy43KJ0ClegdZjKxT1Ys8cLv4Wr5CaGckqWq +WCHqTgt+cPe0vUofqahB5wiAZMsnBgzFkV/OfFwBx0wkub9y5T3qVq5KapYlaDI 7Gftb+wg1AAsrdZ/HuLRy5ZVvkM/73rU2uoi8WXjr/T14E1krCFR/qirLd1OXo6Q Jb97qhnCt/N9JPwIq5/VnYWde5Mpqz6UgtA2rFLDXgNGz+h9/ND6ecWFHjZWNVdc AKWZTO5t+fRVBOSyahoyRoYSntPw3wlxyL7A2/54h6j4Dex7wLt6NQBj =empO -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Introduce a new set of Hyper-V headers in include/hyperv and replace the old hyperv-tlfs.h with the new headers (Nuno Das Neves) - Fixes for the Hyper-V VTL mode (Roman Kisel) - Fixes for cpu mask usage in Hyper-V code (Michael Kelley) - Document the guest VM hibernation behaviour (Michael Kelley) - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain) * tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Documentation: hyperv: Add overview of guest VM hibernation hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() hyperv: Do not overlap the hvcall IO areas in get_vtl() hyperv: Enable the hypercall output page for the VTL mode hv_balloon: Fallback to generic_online_page() for non-HV hot added mem Drivers: hv: vmbus: Log on missing offers if any Drivers: hv: vmbus: Wait for boot-time offers during boot and resume uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup iommu/hyper-v: Don't assume cpu_possible_mask is dense Drivers: hv: Don't assume cpu_possible_mask is dense x86/hyperv: Don't assume cpu_possible_mask is dense hyperv: Remove the now unused hyperv-tlfs.h files hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h hyperv: Add new Hyper-V headers in include/hyperv hyperv: Clean up unnecessary #includes hyperv: Move hv_connection_id to hyperv-tlfs.h	2025-01-25 09:22:55 -08:00
Linus Torvalds	aa44198a6c	iommufd 6.14 merge window pull No major functionality this cycle: - iommufd part of the domain_alloc_paging_flags() conversion - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers - Increase a timeout waiting for other threads to drop transient refcounts that syzkaller was hitting - Fix a UBSAN hit in iova_bitmap due to shift out of bounds - Add missing cleanup of fault events during FD shutdown, fixing a memory leak - Improve the fault delivery flow to have a smaller locking critical region that does not include copy_to_user() - Fix 32 bit ABI breakage due to missed implicit padding, and fix the stack memory leakage -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ5JgnAAKCRCFwuHvBreF YQWrAP9ItOsbeOxYIEVQK1E90HbMWVHF8RhWDPnpChgzyorjjQEA/OOou6uTAcVC fJybLk3T7JrutMmBFvVLsRg+yCJPlgE= =ldhT -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "No major functionality this cycle: - iommufd part of the domain_alloc_paging_flags() conversion - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers - Increase a timeout waiting for other threads to drop transient refcounts that syzkaller was hitting - Fix a UBSAN hit in iova_bitmap due to shift out of bounds - Add missing cleanup of fault events during FD shutdown, fixing a memory leak - Improve the fault delivery flow to have a smaller locking critical region that does not include copy_to_user() - Fix 32 bit ABI breakage due to missed implicit padding, and fix the stack memory leakage" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix struct iommu_hwpt_pgfault init and padding iommufd/fault: Use a separate spinlock to protect fault->deliver list iommufd/fault: Destroy response and mutex in iommufd_fault_destroy() iommufd: Keep OBJ/IOCTL lists in an alphabetical order iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index() iommu: iommufd: fix WARNING in iommufd_device_unbind iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core iommufd/selftest: Remove domain_alloc_paging()	2025-01-24 12:04:35 -08:00
Linus Torvalds	f1c243fc78	IOMMU Updates for Linux v6.14 Including: - Core changes: - PASID support for the blocked_domain. - ARM-SMMU Updates: - SMMUv2: * Implement per-client prefetcher configuration on Qualcomm SoCs. * Support for the Adreno SMMU on Qualcomm's SDM670 SOC. - SMMUv3: * Pretty-printing of event records. * Drop the ->domain_alloc_paging implementation in favour of ->domain_alloc_paging_flags(flags==0). - IO-PGTable: * Generalisation of the page-table walker to enable external walkers (e.g. for debugging unexpected page-faults from the GPU). * Minor fix for handling concatenated PGDs at stage-2 with 16KiB pages. - Misc: * Clean-up device probing and replace the crufty probe-deferral hack with a more robust implementation of arm_smmu_get_by_fwnode(). * Device-tree binding updates for a bunch of Qualcomm platforms. - Intel VT-d Updates: - Remove domain_alloc_paging(). - Remove capability audit code. - Draining PRQ in sva unbind path when FPD bit set. - Link cache tags of same iommu unit together. - AMD-Vi Updates: - Use CMPXCHG128 to update DTE. - Cleanups of the domain_alloc_paging() path. - RiscV IOMMU: - Platform MSI support. - Shutdown support. - Rockchip IOMMU: - Add DT bindings for Rockchip RK3576. - More smaller fixes and cleanups. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmeQsnQACgkQK/BELZcB GuOHug/8DDIuKlUHU2+3U2kKxMb8o1kDxGPkfKgXBaxxpprvY9DARRv7N4mnF6Vu Db8P7QihXfQKnZHXEiqHE7TlA5B+IVQd3Kz96P4sY3OlVWGZYqyKv2GEHyG5CjN/ bay7bfgeo2EVEiAio6VToFFWTm+oxFZzhoYFIlAyZAuIQUp17gHXf7YyhUwk4rOz 8g0XMH6uldidID6BVpArxHh/bN9MOTdHzkyhwPF3FL8E94ziX6rWILH9ADYxBn2o wqHR1STxv398k62toPpWb78c2RdANI8treDXsYpCyDF87dygdP+SA0qkK3G6kAVA /IiPothAj6oNm+Gvwd04tEkuqVVrVqrmWE3VXSps33Tk+apYtcmLCtdpwY/F93D1 EZwTVqveBKk2hSWYVDlEyj9XKmZ9dYWDGvg2Fx844gltQHoHtEgYL+azCMU/ry7k 3+KlkUqFZZxUDbVBbbDdMF+NpxyZTtfKsaLB8f5laOP1R8Os3+dC69Gw6bhoxaMc xfL3v245V5kWSRy+w02TyZGmZSzRx0FKUbFLKpLOvZD6pfx8t8oqJTG9FwN1KzpL lvcBpPB5AUeNJQpKcVSjs/ne8WiOqdABt7ge4E9J5TwrDI8sXk0mwJaPrlDgK1V/ 0xkkLmxWnqsu04CyDay+Uv3Bls/rRBpikR/iGt9P3BbZs7Cyryw= =Xei0 -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core changes: - PASID support for the blocked_domain ARM-SMMU Updates: - SMMUv2: - Implement per-client prefetcher configuration on Qualcomm SoCs - Support for the Adreno SMMU on Qualcomm's SDM670 SOC - SMMUv3: - Pretty-printing of event records - Drop the ->domain_alloc_paging implementation in favour of domain_alloc_paging_flags(flags==0) - IO-PGTable: - Generalisation of the page-table walker to enable external walkers (e.g. for debugging unexpected page-faults from the GPU) - Minor fix for handling concatenated PGDs at stage-2 with 16KiB pages - Misc: - Clean-up device probing and replace the crufty probe-deferral hack with a more robust implementation of arm_smmu_get_by_fwnode() - Device-tree binding updates for a bunch of Qualcomm platforms Intel VT-d Updates: - Remove domain_alloc_paging() - Remove capability audit code - Draining PRQ in sva unbind path when FPD bit set - Link cache tags of same iommu unit together AMD-Vi Updates: - Use CMPXCHG128 to update DTE - Cleanups of the domain_alloc_paging() path RiscV IOMMU: - Platform MSI support - Shutdown support Rockchip IOMMU: - Add DT bindings for Rockchip RK3576 More smaller fixes and cleanups" * tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (66 commits) iommu: Use str_enable_disable-like helpers iommu/amd: Fully decode all combinations of alloc_paging_flags iommu/amd: Move the nid to pdom_setup_pgtable() iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_mode iommu/amd: Remove type argument from do_iommu_domain_alloc() and related iommu/amd: Remove dev == NULL checks iommu/amd: Remove domain_alloc() iommu/amd: Remove unused amd_iommu_domain_update() iommu/riscv: Fixup compile warning iommu/arm-smmu-v3: Add missing #include of linux/string_choices.h iommu/arm-smmu-v3: Use str_read_write helper w/ logs iommu/io-pgtable-arm: Add way to debug pgtable walk iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys iommu/io-pgtable-arm: Make pgtable walker more generic iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500 iommu/arm-smmu: Introduce ACTLR custom prefetcher settings iommu/arm-smmu: Add support for PRR bit setup iommu/arm-smmu: Refactor qcom_smmu structure to include single pointer iommu/arm-smmu: Re-enable context caching in smmu reset operation iommu/vt-d: Link cache tags of same iommu unit together ...	2025-01-24 07:33:46 -08:00
Linus Torvalds	4c551165e7	Updates for the interrupt subsystem: - Consolidation of the machine_kexec_mask_interrupts() by providing a generic implementation and replacing the copy & pasta orgy in the relevant architectures. - Prevent unconditional operations on interrupt chips during kexec shutdown, which can trigger warnings in certain cases when the underlying interrupt has been shut down before. - Make the enforcement of interrupt handling in interrupt context unconditionally available, so that it actually works for non x86 related interrupt chips. The earlier enablement for ARM GIC chips set the required chip flag, but did not notice that the check was hidden behind a config switch which is not selected by ARM[64]. - Decrapify the handling of deferred interrupt affinity setting. Some interrupt chips require that affinity changes are made from the context of handling an interrupt to avoid certain race conditions. For x86 this was the default, but with interrupt remapping this requirement was lifted and a flag was introduced which tells the core code that affinity changes can be done in any context. Unrestricted affinity changes are the default for the majority of interrupt chips. RISCV has the requirement to add the deferred mode to one of it's interrupt controllers, but with the original implementation this would require to add the any context flag to all other RISC-V interrupt chips. That's backwards, so reverse the logic and require that chips, which need the deferred mode have to be marked accordingly. That avoids chasing the 'sane' chips and marking them. - Add multi-node support to the Loongarch AVEC interrupt controller driver. - The usual tiny cleanups, fixes and improvements all over the place. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmePkVITHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoRbQD/9bHVph/V9Ekl7JAX3aY4gG4JbRhOc7 dp1VAcHRhktRfoTztYRbjsbMu2nvZ58GKA8bkOS2jHSF/m3PbkIJfOhwk0YdIAoa +kdy5yDgqCGfkqW43DN4Cr+CnzGjWMitw67tFp3fhwehMDpDjdt2L28IjtanSS0f hO6FV7o65MWeJwxk4Isb2/nvkO+X23Lrp6RrWS8SXBnF9FFXxiPIg/fiOPTizhCh 1W/bSGxLLb9WwsVzmlGAKVFlXDij0QGaIUug2fdVZ63OsELXD7tJrLSPG133yk92 ppIa0s6BT4IBsfM00us4hG15PkLuJmP3yWWcoquG0rP8Wq58VOXiN6+rcJIyvB+5 mWceTH6IKfZGoRQKwXC7BxeBAIb147reiJtb06meq1/8ADIvzafiNy0c8x9i/UaV QiyhPVENjaGCGDomZmJQqN7Yb02Wge1k8InQnodDrHxZNl/bX/B1Z8Bxd0n6hPHg NSJXYif2AxgaddpohsdygqRDbT6SNyQdj7YjJFY5qAGJ3yFyJ4JB6WTqkWW4o1vH 3FVqdAnJmejAmmYSkah0Hkem2T5QASQmTWb93PLxiV6q+d0NM8stWAujjyVdIV/B W4Uj9mQ20cz54TjLtxqX+A1k6KcqOWRgh1l2QbUlFsgsOP3V8yz47yqYdR9qMWlO 9kNEjI3sw+G/IQ== =q4rj -----END PGP SIGNATURE----- Merge tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: - Consolidate the machine_kexec_mask_interrupts() by providing a generic implementation and replacing the copy & pasta orgy in the relevant architectures. - Prevent unconditional operations on interrupt chips during kexec shutdown, which can trigger warnings in certain cases when the underlying interrupt has been shut down before. - Make the enforcement of interrupt handling in interrupt context unconditionally available, so that it actually works for non x86 related interrupt chips. The earlier enablement for ARM GIC chips set the required chip flag, but did not notice that the check was hidden behind a config switch which is not selected by ARM[64]. - Decrapify the handling of deferred interrupt affinity setting. Some interrupt chips require that affinity changes are made from the context of handling an interrupt to avoid certain race conditions. For x86 this was the default, but with interrupt remapping this requirement was lifted and a flag was introduced which tells the core code that affinity changes can be done in any context. Unrestricted affinity changes are the default for the majority of interrupt chips. RISCV has the requirement to add the deferred mode to one of it's interrupt controllers, but with the original implementation this would require to add the any context flag to all other RISC-V interrupt chips. That's backwards, so reverse the logic and require that chips, which need the deferred mode have to be marked accordingly. That avoids chasing the 'sane' chips and marking them. - Add multi-node support to the Loongarch AVEC interrupt controller driver. - The usual tiny cleanups, fixes and improvements all over the place. * tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set() genirq/timings: Add kernel-doc for a function parameter genirq: Remove IRQ_MOVE_PCNTXT and related code x86/apic: Convert to IRQCHIP_MOVE_DEFERRED genirq: Provide IRQCHIP_MOVE_DEFERRED hexagon: Remove GENERIC_PENDING_IRQ leftover ARC: Remove GENERIC_PENDING_IRQ genirq: Remove handle_enforce_irqctx() wrapper genirq: Make handle_enforce_irqctx() unconditionally available irqchip/loongarch-avec: Add multi-nodes topology support irqchip/ts4800: Replace seq_printf() by seq_puts() irqchip/ti-sci-inta : Add module build support irqchip/ti-sci-intr: Add module build support irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown kexec: Consolidate machine_kexec_mask_interrupts() implementation genirq: Reuse irq_thread_fn() for forced thread case genirq: Move irq_thread_fn() further up in the code	2025-01-21 13:51:07 -08:00
Nicolin Chen	e721f619e3	iommufd: Fix struct iommu_hwpt_pgfault init and padding The iommu_hwpt_pgfault is used to report IO page fault data to userspace, but iommufd_fault_fops_read was never zeroing its padding. This leaks the content of the kernel stack memory to userspace. Also, the iommufd uAPI requires explicit padding and use of __aligned_u64 to ensure ABI compatibility's with 32 bit. pahole result, before: struct iommu_hwpt_pgfault { __u32 flags; /* 0 4 / __u32 dev_id; / 4 4 / __u32 pasid; / 8 4 / __u32 grpid; / 12 4 / __u32 perm; / 16 4 / / XXX 4 bytes hole, try to pack / __u64 addr; / 24 8 / __u32 length; / 32 4 / __u32 cookie; / 36 4 / / size: 40, cachelines: 1, members: 8 / / sum members: 36, holes: 1, sum holes: 4 / / last cacheline: 40 bytes / }; pahole result, after: struct iommu_hwpt_pgfault { __u32 flags; / 0 4 / __u32 dev_id; / 4 4 / __u32 pasid; / 8 4 / __u32 grpid; / 12 4 / __u32 perm; / 16 4 / __u32 __reserved; / 20 4 / __u64 addr __attribute__((__aligned__(8))); / 24 8 / __u32 length; / 32 4 / __u32 cookie; / 36 4 / / size: 40, cachelines: 1, members: 9 / / forced alignments: 1 / / last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); Fixes: `c714f15860` ("iommufd: Add fault and response message definitions") Link: https://patch.msgid.link/r/20250120195051.2450-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-01-21 13:55:49 -04:00
Nicolin Chen	3d49020a32	iommufd/fault: Use a separate spinlock to protect fault->deliver list The fault->mutex serializes the fault read()/write() fops and the iommufd_fault_auto_response_faults(), mainly for fault->response. Also, it was conveniently used to fence the fault->deliver in poll() fop and iommufd_fault_iopf_handler(). However, copy_from/to_user() may sleep if pagefaults are enabled. Thus, they could take a long time to wait for user pages to swap in, blocking iommufd_fault_iopf_handler() and its caller that is typically a shared IRQ handler of an IOMMU driver, resulting in a potential global DOS. Instead of reusing the mutex to protect the fault->deliver list, add a separate spinlock, nested under the mutex, to do the job. iommufd_fault_iopf_handler() would no longer be blocked by copy_from/to_user(). Add a free_list in iommufd_auto_response_faults(), so the spinlock can simply fence a fast list_for_each_entry_safe routine. Provide two deliver list helpers for iommufd_fault_fops_read() to use: - Fetch the first iopf_group out of the fault->deliver list - Restore an iopf_group back to the head of the fault->deliver list Lastly, move the mutex closer to the response in the fault structure, and update its kdoc accordingly. Fixes: `07838f7fd5` ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-01-20 12:31:15 -04:00
Joerg Roedel	125f34e4c1	Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'qualcomm/msm', 'rockchip', 'riscv', 'core', 'intel/vt-d' and 'amd/amd-vi' into next	2025-01-17 09:02:35 +01:00
Krzysztof Kozlowski	54e7d90089	iommu: Use str_enable_disable-like helpers Replace ternary (condition ? "enable" : "disable") syntax with helpers from string_choices.h because: 1. Simple function call with one argument is easier to read. Ternary operator has three arguments and with wrapping might lead to quite long code. 2. Is slightly shorter thus also easier to read. 3. It brings uniformity in the text - same string. 4. Allows deduping by the linker, which results in a smaller binary file. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250114192642.912331-1-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 09:00:37 +01:00

... 2 3 4 5 6 ...

5989 Commits