linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-01 06:39:05 +00:00

Author	SHA1	Message	Date
Yi Liu	c0e301b297	iommufd/device: Add pasid_attach array to track per-PASID attach PASIDs of PASID-capable device can be attached to hwpt separately, hence a pasid array to track per-PASID attachment is necessary. The index IOMMU_NO_PASID is used by the RID path. Hence drop the igroup->attach. Link: https://patch.msgid.link/r/20250321171940.7213-10-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	831b40f841	iommufd/device: Replace device_list with device_array igroup->attach->device_list is used to track attached device of a group in the RID path. Such tracking is also needed in the PASID path in order to share path with the RID path. While there is only one list_head in the iommufd_device. It cannot work if the device has been attached in both RID path and PASID path. To solve it, replacing the device_list with an xarray. The attached iommufd_device is stored in the entry indexed by the idev->obj.id. Link: https://patch.msgid.link/r/20250321171940.7213-9-yi.l.liu@intel.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	75f990aef3	iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct The igroup->hwpt and igroup->device_list are used to track the hwpt attach of a group in the RID path. While the coming PASID path also needs such tracking. To be prepared, wrap igroup->hwpt and igroup->device_list into attach struct which is allocated per attaching the first device of the group and freed per detaching the last device of the group. Link: https://patch.msgid.link/r/20250321171940.7213-8-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	ba1de6cd41	iommufd/device: Add helper to detect the first attach of a group The existing code detects the first attach by checking the igroup->device_list. However, the igroup->hwpt can also be used to detect the first attach. In future modifications, it is better to check the igroup->hwpt instead of the device_list. To improve readbility and also prepare for further modifications on this part, this adds a helper for it. Link: https://patch.msgid.link/r/20250321171940.7213-7-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	2eaa7f845e	iommufd/device: Replace idev->igroup with local variable With more use of the fields of igroup, use a local vairable instead of using the idev->igroup heavily. No functional change expected. Link: https://patch.msgid.link/r/20250321171940.7213-6-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Yi Liu	bc06f7f66d	iommufd/device: Only add reserved_iova in non-pasid path As the pasid is passed through the attach/replace/detach helpers, it is necessary to ensure only the non-pasid path adds reserved_iova. Link: https://patch.msgid.link/r/20250321171940.7213-5-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Yi Liu	03c9b102be	iommufd: Pass @pasid through the device attach/replace path Most of the core logic before conducting the actual device attach/ replace operation can be shared with pasid attach/replace. So pass @pasid through the device attach/replace helpers to prepare adding pasid attach/replace. So far the @pasid should only be IOMMU_NO_PASID. No functional change. Link: https://patch.msgid.link/r/20250321171940.7213-4-yi.l.liu@intel.com Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Yi Liu	8a9e1e773f	iommu: Introduce a replace API for device pasid Provide a high-level API to allow replacements of one domain with another for specific pasid of a device. This is similar to iommu_replace_group_handle() and it is expected to be used only by IOMMUFD. Link: https://patch.msgid.link/r/20250321171940.7213-3-yi.l.liu@intel.com Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Yi Liu	ada14b9f1a	iommu: Require passing new handles to APIs supporting handle Add kdoc to highligt the caller of iommu_[attach\|replace]_group_handle() and iommu_attach_device_pasid() should always provide a new handle. This can avoid race with lockless reference to the handle. e.g. the find_fault_handler() and iommu_report_device_fault() in the PRI path. Link: https://patch.msgid.link/r/20250321171940.7213-2-yi.l.liu@intel.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Nicolin Chen	06d54f00f3	iommu: Drop sw_msi from iommu_domain There are only two sw_msi implementations in the entire system, thus it's not very necessary to have an sw_msi pointer. Instead, check domain->cookie_type to call the two sw_msi implementations directly from the core code. Link: https://patch.msgid.link/r/7ded87c871afcbaac665b71354de0a335087bf0f.1742871535.git.nicolinc@nvidia.com Suggested-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:19 -03:00
Nicolin Chen	ec031e1b35	iommufd: Move iommufd_sw_msi and related functions to driver.c To provide the iommufd_sw_msi() to the iommu core that is under a different Kconfig, move it and its related functions to driver.c. Then, stub it into the iommu-priv header. The iommufd_sw_msi_install() continues to be used by iommufd internal, so put it in the private header. Note that iommufd_sw_msi() will be called in the iommu core, replacing the sw_msi function pointer. Given that IOMMU_API is "bool" in Kconfig, change IOMMUFD_DRIVER_CORE to "bool" as well. Since this affects the module size, here is before-n-after size comparison: [Before] text data bss dec hex filename 18797 848 56 19701 4cf5 drivers/iommu/iommufd/device.o 722 44 0 766 2fe drivers/iommu/iommufd/driver.o [After] text data bss dec hex filename 17735 808 56 18599 48a7 drivers/iommu/iommufd/device.o 3020 180 0 3200 c80 drivers/iommu/iommufd/driver.o Link: https://patch.msgid.link/r/374c159592dba7852bee20968f3f66fa0ee8ca93.1742871535.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:19 -03:00
Robin Murphy	6aa63a4ec9	iommu: Sort out domain user data When DMA/MSI cookies were made first-class citizens back in commit `46983fcd67` ("iommu: Pull IOVA cookie management into the core"), there was no real need to further expose the two different cookie types. However, now that IOMMUFD wants to add a third type of MSI-mapping cookie, we do have a nicely compelling reason to properly dismabiguate things at the domain level beyond just vaguely guessing from the domain type. Meanwhile, we also effectively have another "cookie" in the form of the anonymous union for other user data, which isn't much better in terms of being vague and unenforced. The fact is that all these cookie types are mutually exclusive, in the sense that combining them makes zero sense and/or would be catastrophic (iommu_set_fault_handler() on an SVA domain, anyone?) - the only combination which might be reasonable is perhaps a fault handler and an MSI cookie, but nobody's doing that at the moment, so let's rule it out as well for the sake of being clear and robust. To that end, we pull DMA and MSI cookies apart a little more, mostly to clear up the ambiguity at domain teardown, then for clarity (and to save a little space), move them into the union, whose ownership we can then properly describe and enforce entirely unambiguously. [nicolinc: rebase on latest tree; use prefix IOMMU_COOKIE_; merge unions in iommu_domain; add IOMMU_COOKIE_IOMMUFD for iommufd_hwpt] Link: https://patch.msgid.link/r/1ace9076c95204bbe193ee77499d395f15f44b23.1742871535.git.nicolinc@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:18 -03:00
Nuno Das Neves	3817854ba8	hyperv: Log hypercall status codes as strings Introduce hv_status_printk() macros as a convenience to log hypercall errors, formatting them with the status code (HV_STATUS_*) as a raw hex value and also as a string, which saves some time while debugging. Create a table of HV_STATUS_ codes with strings and mapped errnos, and use it for hv_result_to_string() and hv_result_to_errno(). Use the new hv_status_printk()s in hv_proc.c, hyperv-iommu.c, and irqdomain.c hypercalls to aid debugging in the root partition. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-2-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-2-git-send-email-nunodasneves@linux.microsoft.com>	2025-03-20 21:23:03 +00:00
Joerg Roedel	22df63a23a	Merge branches 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', 'rockchip', 's390', 'core', 'intel/vt-d' and 'amd/amd-vi' into next	2025-03-20 09:11:09 +01:00
Lu Baolu	93ae6e68b6	iommu/vt-d: Fix possible circular locking dependency We have recently seen report of lockdep circular lock dependency warnings on platforms like Skylake and Kabylake: ====================================================== WARNING: possible circular locking dependency detected 6.14.0-rc6-CI_DRM_16276-gca2c04fe76e8+ #1 Not tainted ------------------------------------------------------ swapper/0/1 is trying to acquire lock: ffffffff8360ee48 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70 but task is already holding lock: ffff888102c7efa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xe75/0x11f0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (&device->physical_node_lock){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 intel_iommu_init+0xe75/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #5 (dmar_global_lock){++++}-{3:3}: down_read+0x43/0x1d0 enable_drhd_fault_handling+0x21/0x110 cpuhp_invoke_callback+0x4c6/0x870 cpuhp_issue_call+0xbf/0x1f0 __cpuhp_setup_state_cpuslocked+0x111/0x320 __cpuhp_setup_state+0xb0/0x220 irq_remap_enable_fault_handling+0x3f/0xa0 apic_intr_mode_init+0x5c/0x110 x86_late_time_init+0x24/0x40 start_kernel+0x895/0xbd0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xbf/0x110 common_startup_64+0x13e/0x141 -> #4 (cpuhp_state_mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 __cpuhp_setup_state_cpuslocked+0x67/0x320 __cpuhp_setup_state+0xb0/0x220 page_alloc_init_cpuhp+0x2d/0x60 mm_core_init+0x18/0x2c0 start_kernel+0x576/0xbd0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xbf/0x110 common_startup_64+0x13e/0x141 -> #3 (cpu_hotplug_lock){++++}-{0:0}: __cpuhp_state_add_instance+0x4f/0x220 iova_domain_init_rcaches+0x214/0x280 iommu_setup_dma_ops+0x1a4/0x710 iommu_device_register+0x17d/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #2 (&domain->iova_cookie->mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 iommu_setup_dma_ops+0x16b/0x710 iommu_device_register+0x17d/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #1 (&group->mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 __iommu_probe_device+0x24c/0x4e0 probe_iommu_group+0x2b/0x50 bus_for_each_dev+0x7d/0xe0 iommu_device_register+0xe1/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #0 (iommu_probe_device_lock){+.+.}-{3:3}: __lock_acquire+0x1637/0x2810 lock_acquire+0xc9/0x300 __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 iommu_probe_device+0x1d/0x70 intel_iommu_init+0xe90/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: iommu_probe_device_lock --> dmar_global_lock --> &device->physical_node_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&device->physical_node_lock); lock(dmar_global_lock); lock(&device->physical_node_lock); lock(iommu_probe_device_lock); * DEADLOCK * This driver uses a global lock to protect the list of enumerated DMA remapping units. It is necessary due to the driver's support for dynamic addition and removal of remapping units at runtime. Two distinct code paths require iteration over this remapping unit list: - Device registration and probing: the driver iterates the list to register each remapping unit with the upper layer IOMMU framework and subsequently probe the devices managed by that unit. - Global configuration: Upper layer components may also iterate the list to apply configuration changes. The lock acquisition order between these two code paths was reversed. This caused lockdep warnings, indicating a risk of deadlock. Fix this warning by releasing the global lock before invoking upper layer interfaces for device registration. Fixes: `b150654f74` ("iommu/vt-d: Fix suspicious RCU usage") Closes: https://lore.kernel.org/linux-iommu/SJ1PR11MB612953431F94F18C954C4A9CB9D32@SJ1PR11MB6129.namprd11.prod.outlook.com/ Tested-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250317035714.1041549-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 09:03:57 +01:00
Sean Christopherson	688124cc54	iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes Don't overwrite an IRTE that is posting IRQs to a vCPU with a posted MSI entry if the host IRQ affinity happens to change. If/when the IRTE is reverted back to "host mode", it will be reconfigured as a posted MSI or remapped entry as appropriate. Drop the "mode" field, which doesn't differentiate between posted MSIs and posted vCPUs, in favor of a dedicated posted_vcpu flag. Note! The two posted_{msi,vcpu} flags are intentionally not mutually exclusive; an IRTE can transition between posted MSI and posted vCPU. Fixes: `ed1e48ea43` ("iommu/vt-d: Enable posted mode for device MSIs") Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250315025135.2365846-3-seanjc@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 09:03:56 +01:00
Sean Christopherson	2454823e97	iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled Add a helper to take care of reconfiguring an IRTE to deliver IRQs to the host, i.e. not to a vCPU, and use the helper when an IRTE's vCPU affinity is nullified, i.e. when KVM puts an IRTE back into "host" mode. Because posted MSIs use an ephemeral IRTE, using modify_irte() puts the IRTE into full remapped mode, i.e. unintentionally disables posted MSIs on the IRQ. Fixes: `ed1e48ea43` ("iommu/vt-d: Enable posted mode for device MSIs") Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250315025135.2365846-2-seanjc@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 09:03:56 +01:00
Qasim Ijaz	b8741496c0	iommu: apple-dart: fix potential null pointer deref If kzalloc() fails, accessing cfg->supports_bypass causes a null pointer dereference. Fix by checking for NULL immediately after allocation and returning -ENOMEM. Fixes: `3bc0102835` ("iommu: apple-dart: Allow mismatched bypass support") Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Link: https://lore.kernel.org/r/20250314230102.11008-1-qasdev00@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 08:59:15 +01:00
Robin Murphy	dcde1c4aa7	iommu/rockchip: Retire global dma_dev workaround The global dma_dev trick was mostly because the old domain_alloc op provided no context, so no way to know which IOMMU was to own the pagetable, or if a suitable one even existed at all. In the new multi-instance world with domain_alloc_paging this is no longer a concern - now we know that the given device must be associated with a valid IOMMU instance which provided the op to call in the first place, and therefore that instance can and should be the pagetable owner. To avoid worrying about the lifetime and stability of the rk_domain->iommus list, and keep the lookups simple and efficient, we'll still stash a dma_dev pointer, but now it's accurately per-domain. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Quentin Schulz <quentin.schulz@cherry.de> Tested-by: Dang Huynh <danct12@riseup.net> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/25dc948a7d35c8142c5719ac22bc523f8524d006.1741886382.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 08:58:24 +01:00
Robin Murphy	f90aa59eb2	iommu/rockchip: Register in a sensible order Currently Rockchip calls iommu_device_register() before it's finished setting up the hardware and driver state, and as such it now gets unhappy in various ways when registration starts working the way it was always intended to, and probing client devices straight away. Reorder the operations to ensure that what we're registering is a prepared and functional IOMMU instance. Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Quentin Schulz <quentin.schulz@cherry.de> Tested-by: Dang Huynh <danct12@riseup.net> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/e69532f00bf49d98322b96788edb7e2e305e4006.1741886382.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 08:58:23 +01:00
Robin Murphy	f48dcda8f6	iommu/rockchip: Allocate per-device data sensibly Now that DT-based probing is finally happening in the right order again, it reveals an issue in Rockchip's of_xlate, which can now be called during registration, but is using the global dma_dev which is only assigned later. However, this makes little sense when we're already looking up the correct IOMMU device, who should logically be the owner of the devm allocation anyway. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Anders Roxell <anders.roxell@linaro.org> Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Quentin Schulz <quentin.schulz@cherry.de> Tested-by: Dang Huynh <danct12@riseup.net> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/771e91cf16b3048e93f657153b76905665878fa2.1741886382.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-20 08:58:22 +01:00
Nicolin Chen	da0c56520e	iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations There is a DoS concern on the shared hardware event queue among devices passed through to VMs, that too many translation failures that belong to VMs could overflow the shared hardware event queue if those VMs or their VMMs don't handle/recover the devices properly. The MEV bit in the STE allows to configure the SMMU HW to merge similar event records, though there is no guarantee. Set it in a nested STE for DoS mitigations. In the future, we might want to enable the MEV for non-nested cases too such as domain->type == IOMMU_DOMAIN_UNMANAGED or even IOMMU_DOMAIN_DMA. Link: https://patch.msgid.link/r/8ed12feef67fc65273d0f5925f401a81f56acebe.1741719725.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	e7d3fa3d29	iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU Aside from the IOPF framework, iommufd provides an additional pathway to report hardware events, via the vEVENTQ of vIOMMU infrastructure. Define an iommu_vevent_arm_smmuv3 uAPI structure, and report stage-1 events in the threaded IRQ handler. Also, add another four event record types that can be forwarded to a VM. Link: https://patch.msgid.link/r/5cf6719682fdfdabffdb08374cdf31ad2466d75a.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	f0ea207ed7	iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster Use it to store all vSMMU-related data. The vsid (Virtual Stream ID) will be the first use case. Since the vsid reader will be the eventq handler that already holds a streams_mutex, reuse that to fence the vmaster too. Also add a pair of arm_smmu_attach_prepare/commit_vmaster helpers to set or unset the master->vmaster pointer. Put the helpers inside the existing arm_smmu_attach_prepare/commit(). For identity/blocked ops that don't call arm_smmu_attach_prepare/commit(), add a simpler arm_smmu_master_clear_vmaster helper to unset the vmaster. Link: https://patch.msgid.link/r/a7f282e1a531279e25f06c651e95d56f6b120886.1741719725.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	b3cc0b7599	iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ coverage The handler will get vDEVICE object from the given mdev and convert it to its per-vIOMMU virtual ID to mimic a real IOMMU driver. Link: https://patch.msgid.link/r/1ea874d20e56d65e7cfd6e0e8e01bd3dbd038761.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	941d0719aa	iommufd/selftest: Require vdev_id when attaching to a nested domain When attaching a device to a vIOMMU-based nested domain, vdev_id must be present. Add a piece of code hard-requesting it, preparing for a vEVENTQ support in the following patch. Then, update the TEST_F. A HWPT-based nested domain will return a NULL new_viommu, thus no such a vDEVICE requirement. Link: https://patch.msgid.link/r/4051ca8a819e51cb30de6b4fe9e4d94d956afe3d.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	e8e1ef9b77	iommufd/viommu: Add iommufd_viommu_report_event helper Similar to iommu_report_device_fault, this allows IOMMU drivers to report vIOMMU events from threaded IRQ handlers to user space hypervisors. Link: https://patch.msgid.link/r/44be825042c8255e75d0151b338ffd8ba0e4920b.1741719725.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:47 -03:00
Nicolin Chen	ea94b211c5	iommufd/viommu: Add iommufd_viommu_get_vdev_id helper This is a reverse search v.s. iommufd_viommu_find_dev, as drivers may want to convert a struct device pointer (physical) to its virtual device ID for an event injection to the user space VM. Again, this avoids exposing more core structures to the drivers, than the iommufd_viommu alone. Link: https://patch.msgid.link/r/18b8e8bc1b8104d43b205d21602c036fd0804e56.1741719725.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:47 -03:00
Nicolin Chen	e36ba5ab80	iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC Introduce a new IOMMUFD_OBJ_VEVENTQ object for vIOMMU Event Queue that provides user space (VMM) another FD to read the vIOMMU Events. Allow a vIOMMU object to allocate vEVENTQs, with a condition that each vIOMMU can only have one single vEVENTQ per type. Add iommufd_veventq_alloc() with iommufd_veventq_ops for the new ioctl. Link: https://patch.msgid.link/r/21acf0751dd5c93846935ee06f93b9c65eff5e04.1741719725.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:47 -03:00
Nicolin Chen	0507f337fc	iommufd: Rename fault.c to eventq.c Rename the file, aligning with the new eventq object. Link: https://patch.msgid.link/r/d726397e2d08028e25a1cb6eb9febefac35a32ba.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:51:19 -03:00
Nicolin Chen	5426a78beb	iommufd: Abstract an iommufd_eventq from iommufd_fault The fault object was designed exclusively for hwpt's IO page faults (PRI). But its queue implementation can be reused for other purposes too, such as hardware IRQ and event injections to user space. Meanwhile, a fault object holds a list of faults. So it's more accurate to call it a "fault queue". Combining the reusing idea above, abstract a new iommufd_eventq as a common structure embedded into struct iommufd_fault, similar to hwpt_paging holding a common hwpt. Add a common iommufd_eventq_ops and iommufd_eventq_init to prepare for an IOMMUFD_OBJ_VEVENTQ (vIOMMU Event Queue). Link: https://patch.msgid.link/r/e7336a857954209aabb466e0694aab323da95d90.1741719725.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:51:19 -03:00
Nicolin Chen	927dabc9aa	iommufd/fault: Add an iommufd_fault_init() helper The infrastructure of a fault object will be shared with a new vEVENTQ object in a following change. Add an iommufd_fault_init helper and an INIT_EVENTQ_FOPS marco for a vEVENTQ allocator to use too. Reorder the iommufd_ctx_get and refcount_inc, to keep them symmetrical with the iommufd_fault_fops_release(). Since the new vEVENTQ doesn't need "response" and its "mutex", so keep the xa_init_flags and mutex_init in their original locations. Link: https://patch.msgid.link/r/a9522c521909baeb1bd843950b2490478f3d06e0.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:51:19 -03:00
Nicolin Chen	dbf00d7d89	iommufd/fault: Move two fault functions out of the header There is no need to keep them in the header. The vEVENTQ version of these two functions will turn out to be a different implementation and will not share with this fault version. Thus, move them out of the header. Link: https://patch.msgid.link/r/7eebe32f3d354799f5e28128c693c3c284740b21.1741719725.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:51:18 -03:00
Robin Murphy	ba40f9dc95	iommu/mediatek-v1: Support COMPILE_TEST Add COMPILE_TEST support to Mediatek v1 so I have less chance of breaking it again in future. This just needs the ARM dma-iommu API stubbing out for now. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/a8d7c0b4393b360ce556f5f15e229d1e2fe73c83.1741702556.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:17:30 +01:00
Kishon Vijay Abraham I	19e5cc156c	iommu/amd: Enable support for up to 2K interrupts per function AMD IOMMU optionally supports up to 2K interrupts per function on newer platforms. Support for this feature is indicated through Extended Feature 2 Register (MMIO Offset 01A0h[NumIntRemapSup]). Allocate 2K IRTEs per device when this support is available. Co-developed-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250307095822.2274-5-sarunkod@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:14:17 +01:00
Sairaj Kodilkar	950865c1b8	iommu/amd: Rename DTE_INTTABLEN* and MAX_IRQS_PER_TABLE macro AMD iommu can support both 512 and 2K interrupts on newer platform. Hence add suffix "512" to the existing macros. Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250307095822.2274-4-sarunkod@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:14:16 +01:00
Sairaj Kodilkar	eaf717fa1c	iommu/amd: Replace slab cache allocator with page allocator Commit `05152a0494` ("iommu/amd: Add slab-cache for irq remapping tables") introduces slab cache allocator. But slab cache allocator provides benefit only when the allocation and deallocation of many identical objects is frequent. The AMD IOMMU driver allocates Interrupt remapping table (IRT) when device driver requests IRQ for the first time and never frees it. Hence the slab allocator does not provide any benefit here. Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250307095822.2274-3-sarunkod@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:14:15 +01:00
Sairaj Kodilkar	1c608b0b28	iommu/amd: Introduce generic function to set multibit feature value Define generic function `iommu_feature_set()` to set the values in the feature control register and replace `iommu_set_inv_tlb_timeout()` with it. Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250307095822.2274-2-sarunkod@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-13 12:14:14 +01:00
Robin Murphy	73d2f10957	iommu: Don't warn prematurely about dodgy probes The warning for suspect probe conditions inadvertently got moved too early in a prior respin - it happened to work out OK for fwspecs, but in general still needs to be after the ops->probe_device call so drivers which filter devices for themselves have a chance to do that. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Link: https://lore.kernel.org/r/72a4853e7ef36e7c1c4ca171ed4ed8e1a463a61a.1741791691.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-12 16:58:27 +01:00
Pranjal Shrivastava	0a679336dc	iommu/arm-smmu: Set rpm auto_suspend once during probe The current code calls arm_smmu_rpm_use_autosuspend() during device attach, which seems unusual as it sets the autosuspend delay and the 'use_autosuspend' flag for the smmu device. These parameters can be simply set once during the smmu probe and in order to avoid bouncing rpm states, we can simply mark_last_busy() during a client dev attach as discussed in [1]. Move the handling of arm_smmu_rpm_use_autosuspend() to the SMMU probe and modify the arm_smmu_rpm_put() function to mark_last_busy() before calling __pm_runtime_put_autosuspend(). Additionally, s/pm_runtime_put_autosuspend/__pm_runtime_put_autosuspend/ to help with the refactor of the pm_runtime_put_autosuspend() API [2]. Link: https://lore.kernel.org/r/20241023164835.GF29251@willie-the-truck [1] Link: https://git.kernel.org/linus/b7d46644e554 [2] Signed-off-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250123195636.4182099-1-praan@google.com Signed-off-by: Will Deacon <will@kernel.org>	2025-03-11 13:23:24 +00:00
Robin Murphy	bcb81ac6ae	iommu: Get DT/ACPI parsing into the proper probe path In hindsight, there were some crucial subtleties overlooked when moving {of,acpi}_dma_configure() to driver probe time to allow waiting for IOMMU drivers with -EPROBE_DEFER, and these have become an ever-increasing source of problems. The IOMMU API has some fundamental assumptions that iommu_probe_device() is called for every device added to the system, in the order in which they are added. Calling it in a random order or not at all dependent on driver binding leads to malformed groups, a potential lack of isolation for devices with no driver, and all manner of unexpected concurrency and race conditions. We've attempted to mitigate the latter with point-fix bodges like iommu_probe_device_lock, but it's a losing battle and the time has come to bite the bullet and address the true source of the problem instead. The crux of the matter is that the firmware parsing actually serves two distinct purposes; one is identifying the IOMMU instance associated with a device so we can check its availability, the second is actually telling that instance about the relevant firmware-provided data for the device. However the latter also depends on the former, and at the time there was no good place to defer and retry that separately from the availability check we also wanted for client driver probe. Nowadays, though, we have a proper notion of multiple IOMMU instances in the core API itself, and each one gets a chance to probe its own devices upon registration, so we can finally make that work as intended for DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to be able to run the iommu_fwspec machinery currently buried deep in the wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be surprisingly straightforward to bootstrap this transformation by pretty much just calling the same path twice. At client driver probe time, dev->driver is obviously set; conversely at device_add(), or a subsequent bus_iommu_probe(), any device waiting for an IOMMU really should not have a driver already, so we can use that as a condition to disambiguate the two cases, and avoid recursing back into the IOMMU core at the wrong times. Obviously this isn't the nicest thing, but for now it gives us a functional baseline to then unpick the layers in between without many more awkward cross-subsystem patches. There are some minor side-effects like dma_range_map potentially being created earlier, and some debug prints being repeated, but these aren't significantly detrimental. Let's make things work first, then deal with making them nice. With the basic flow finally in the right order again, the next step is probably turning the bus->dma_configure paths inside-out, since all we really need from bus code is its notion of which device and input ID(s) to parse the common firmware properties with... Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:43 +01:00
Robin Murphy	3832862eb9	iommu: Keep dev->iommu state consistent At the moment, if of_iommu_configure() allocates dev->iommu itself via iommu_fwspec_init(), then suffers a DT parsing failure, it cleans up the fwspec but leaves the empty dev_iommu hanging around. So far this is benign (if a tiny bit wasteful), but we'd like to be able to reason about dev->iommu having a consistent and unambiguous lifecycle. Thus make sure that the of_iommu cleanup undoes precisely whatever it did. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/d219663a3f23001f23d520a883ac622d70b4e642.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:42 +01:00
Robin Murphy	fd598f71b6	iommu: Resolve ops in iommu_init_device() Since iommu_init_device() was factored out, it is in fact the only consumer of the ops which __iommu_probe_device() is resolving, so let it do that itself rather than passing them in. This also puts the ops lookup at a more logical point relative to the rest of the flow through __iommu_probe_device(). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/fa4b6cfc67a352488b7f4e0b736008307ce9ac2e.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:41 +01:00
Robin Murphy	b46064a188	iommu: Handle race with default domain setup It turns out that deferred default domain creation leaves a subtle race window during iommu_device_register() wherein a client driver may asynchronously probe in parallel and get as far as performing DMA API operations with dma-direct, only to be switched to iommu-dma underfoot once the default domain attachment finally happens, with obviously disastrous consequences. Even the wonky of_iommu_configure() path is at risk, since iommu_fwspec_init() will no longer defer client probe as the instance ops are (necessarily) already registered, and the "replay" iommu_probe_device() call can see dev->iommu_group already set and so think there's nothing to do either. Fortunately we already have the right tool in the right place in the form of iommu_device_use_default_domain(), which just needs to ensure that said default domain is actually ready to be used. Deferring the client probe shouldn't have too much impact, given that this only happens while the IOMMU driver is probing, and thus due to kick the deferred probe list again once it finishes. Reported-by: Charan Teja Kalla <quic_charante@quicinc.com> Fixes: `98ac73f99b` ("iommu: Require a default_domain for all iommu drivers") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/e88b94c9b575034a2c98a48b3d383654cbda7902.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:40 +01:00
Robin Murphy	29c6e1c2b9	iommu: Unexport iommu_fwspec_free() The drivers doing their own fwspec parsing have no need to call iommu_fwspec_free() since fwspecs were moved into dev_iommu, as returning an error from .probe_device will tear down the whole lot anyway. Move it into the private interface now that it only serves for of_iommu to clean up in an error case. I have no idea what mtk_v1 was doing in effectively guaranteeing a NULL fwspec would be dereferenced if no "iommus" DT property was found, so add a check for that to at least make the code look sane. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/36e245489361de2d13db22a510fa5c79e7126278.1740667667.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:39 +01:00
Lu Baolu	4c293add58	iommu/vt-d: Cleanup intel_context_flush_present() The intel_context_flush_present() is called in places where either the scalable mode is disabled, or scalable mode is enabled but all PASID entries are known to be non-present. In these cases, the flush_domains path within intel_context_flush_present() will never execute. This dead code is therefore removed. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250228092631.3425464-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:05 +01:00
Lu Baolu	87caaba1d1	iommu/vt-d: Move PRI enablement in probe path Update PRI enablement to use the new method, similar to the amd iommu driver. Enable PRI in the device probe path and disable it when the device is released. PRI is enabled throughout the device's iommu lifecycle. The infrastructure for the iommu subsystem to handle iopf requests is created during iopf enablement and released during iopf disablement. All invalid page requests from the device are automatically handled by the iommu subsystem if iopf is not enabled. Add iopf_refcount to track the iopf enablement. Convert the return type of intel_iommu_disable_iopf() to void, as there is no way to handle a failure when disabling this feature. Make intel_iommu_enable/disable_iopf() helpers global, as they will be used beyond the current file in the subsequent patch. The iopf_refcount is not protected by any lock. This is acceptable, as there is no concurrent access to it in the current code. The following patch will address this by moving it to the domain attach/detach paths, which are protected by the iommu group mutex. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250228092631.3425464-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:04 +01:00
Lu Baolu	5518f239af	iommu/vt-d: Move scalable mode ATS enablement to probe path Device ATS is currently enabled when a domain is attached to the device and disabled when the domain is detached. This creates a limitation: when the IOMMU is operating in scalable mode and IOPF is enabled, the device's domain cannot be changed. The previous code enables ATS when a domain is set to a device's RID and disables it during RID domain switch. So, if a PASID is set with a domain requiring PRI, ATS should remain enabled until the domain is removed. During the PASID domain's lifecycle, if the RID's domain changes, PRI will be disrupted because it depends on ATS, which is disabled when the blocking domain is set for the device's RID. Remove this limitation by moving ATS enablement to the device probe path. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250228092631.3425464-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:04 +01:00
Jason Gunthorpe	607ba1bb09	iommu/vt-d: Check if SVA is supported when attaching the SVA domain Attach of a SVA domain should fail if SVA is not supported, move the check for SVA support out of IOMMU_DEV_FEAT_SVA and into attach. Also check when allocating a SVA domain to match other drivers. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/r/20250228092631.3425464-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:03 +01:00
Jason Gunthorpe	a8653e5cc2	iommu/vt-d: Use virt_to_phys() If all the inlines are unwound virt_to_dma_pfn() is simply: return page_to_pfn(virt_to_page(p)) << (PAGE_SHIFT - VTD_PAGE_SHIFT); Which can be re-arranged to: (page_to_pfn(virt_to_page(p)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT The only caller is: ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) re-arranged to: ((page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT) << VTD_PAGE_SHIFT Which simplifies to: page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT That is the same as virt_to_phys(tmp_page), so just remove all of this. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-e797f4dc6918+93057-iommu_pages_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:02 +01:00
Yunhui Cui	9ce7603ad3	iommu/vt-d: Fix system hang on reboot -f We found that executing the command ./a.out &;reboot -f (where a.out is a program that only executes a while(1) infinite loop) can probabilistically cause the system to hang in the intel_iommu_shutdown() function, rendering it unresponsive. Through analysis, we identified that the factors contributing to this issue are as follows: 1. The reboot -f command does not prompt the kernel to notify the application layer to perform cleanup actions, allowing the application to continue running. 2. When the kernel reaches the intel_iommu_shutdown() function, only the BSP (Bootstrap Processor) CPU is operational in the system. 3. During the execution of intel_iommu_shutdown(), the function down_write (&dmar_global_lock) causes the process to sleep and be scheduled out. 4. At this point, though the processor's interrupt flag is not cleared, allowing interrupts to be accepted. However, only legacy devices and NMI (Non-Maskable Interrupt) interrupts could come in, as other interrupts routing have already been disabled. If no legacy or NMI interrupts occur at this stage, the scheduler will not be able to run. 5. If the application got scheduled at this time is executing a while(1)- type loop, it will be unable to be preempted, leading to an infinite loop and causing the system to become unresponsive. To resolve this issue, the intel_iommu_shutdown() function should not execute down_write(), which can potentially cause the process to be scheduled out. Furthermore, since only the BSP is running during the later stages of the reboot, there is no need for protection against parallel access to the DMAR (DMA Remapping) unit. Therefore, the following lines could be removed: down_write(&dmar_global_lock); up_write(&dmar_global_lock); After testing, the issue has been resolved. Fixes: `6c3a44ed3c` ("iommu/vt-d: Turn off translations at shutdown") Co-developed-by: Ethan Zhao <haifeng.zhao@linux.intel.com> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20250303062421.17929-1-cuiyunhui@bytedance.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:31:02 +01:00
Hector Martin	3bc0102835	iommu: apple-dart: Allow mismatched bypass support This is needed by ISP, which has DART0 with bypass and DART1/2 without. Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20250307-isp-dart-v3-2-684fe4489591@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:29:52 +01:00
Hector Martin	9d7b779e30	iommu: apple-dart: Increase MAX_DARTS_PER_DEVICE to 3 ISP needs this. Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20250307-isp-dart-v3-1-684fe4489591@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:29:51 +01:00
Vasant Hegde	625586855f	iommu/amd: Consolidate protection domain free code Consolidate protection domain free code inside amd_iommu_domain_free() and remove protection_domain_free() function. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-8-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:11 +01:00
Vasant Hegde	5536e19e94	iommu/amd: Remove unused forward declaration Remove unused forward declaration. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-7-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:10 +01:00
Vasant Hegde	ee4cf9260a	iommu/amd: Fix header file Move function declaration inside AMD_IOMMU_H defination. Fixes: `fd5dff9de4` ("iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers") Fixes: `457da57646` ("iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()") Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-6-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:10 +01:00
Vasant Hegde	60785fceda	iommu/amd: Remove outdated comment Remove outdated comment. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-5-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:09 +01:00
Vasant Hegde	36a1cfd497	iommu/amd/pgtbl_v2: Improve error handling Return -ENOMEM if v2_alloc_pte() fails to allocate memory. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:09 +01:00
Vasant Hegde	e481f8a5db	iommu/amd: Remove unused variable Remove 'amd_iommu_aperture_order' as its not used since commit `d9cfed9254`. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:08 +01:00
Vasant Hegde	558d2bbd45	iommu/amd: Log IOMMU control register in event log path Useful for debugging ILLEGAL_DEV_TABLE_ENTRY events as some of the DTE settings depends on Control register settings. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:27:08 +01:00
Robin Murphy	032d7e435c	iommu/dma: Remove redundant locking This reverts commit `ac9a5d522b`. iommu_dma_init_domain() is now only called under the group mutex, as it should be given that that the default domain belongs to the group, so the additional internal locking is no longer needed. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/a943d4c198e6a1fffe998337d577dc3aa7f660a9.1740585469.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-10 09:23:58 +01:00
Yi Liu	55c85fa757	iommufd: Fail replace if device has not been attached The current implementation of iommufd_device_do_replace() implicitly assumes that the input device has already been attached. However, there is no explicit check to verify this assumption. If another device within the same group has been attached, the replace operation might succeed, but the input device itself may not have been attached yet. As a result, the input device might not be tracked in the igroup->device_list, and its reserved IOVA might not be added. Despite this, the caller might incorrectly assume that the device has been successfully replaced, which could lead to unexpected behavior or errors. To address this issue, add a check to ensure that the input device has been attached before proceeding with the replace operation. This check will help maintain the integrity of the device tracking system and prevent potential issues arising from incorrect assumptions about the device's attachment status. Fixes: `e88d4ec154` ("iommufd: Add iommufd_device_replace()") Link: https://patch.msgid.link/r/20250306034842.5950-1-yi.l.liu@intel.com Cc: stable@vger.kernel.org Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-07 17:05:24 -04:00
Nicolin Chen	897008d0f7	iommufd: Set domain->iommufd_hwpt in all hwpt->domain allocators Setting domain->iommufd_hwpt in iommufd_hwpt_alloc() only covers the HWPT allocations from user space, but not for an auto domain. This resulted in a NULL pointer access in the auto domain pathway: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 pc : iommufd_sw_msi+0x54/0x2b0 lr : iommufd_sw_msi+0x40/0x2b0 Call trace: iommufd_sw_msi+0x54/0x2b0 (P) iommu_dma_prepare_msi+0x64/0xa8 its_irq_domain_alloc+0xf0/0x2c0 irq_domain_alloc_irqs_parent+0x2c/0xa8 msi_domain_alloc+0xa0/0x1a8 Since iommufd_sw_msi() requires to access the domain->iommufd_hwpt, it is better to set that explicitly prior to calling iommu_domain_set_sw_msi(). Fixes: `748706d7ca` ("iommu: Turn fault_data to iommufd private pointer") Link: https://patch.msgid.link/r/20250305211800.229465-1-nicolinc@nvidia.com Reported-by: Ankit Agrawal <ankita@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Ankit Agrawal <ankita@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-07 15:56:22 -04:00
Nicolin Chen	a05df03a88	iommufd: Fix uninitialized rc in iommufd_access_rw() Reported by smatch: drivers/iommu/iommufd/device.c:1392 iommufd_access_rw() error: uninitialized symbol 'rc'. Fixes: `8d40205f60` ("iommufd: Add kAPI toward external drivers for kernel access") Link: https://patch.msgid.link/r/20250227200729.85030-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202502271339.a2nWr9UA-lkp@intel.com/ [nicolinc: can't find an original report but only in "old smatch warnings"] Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-04 09:41:25 -04:00
Yi Liu	1062d81086	iommufd: Disallow allocating nested parent domain with fault ID Allocating a domain with a fault ID indicates that the domain is faultable. However, there is a gap for the nested parent domain to support PRI. Some hardware lacks the capability to distinguish whether PRI occurs at stage 1 or stage 2. This limitation may require software-based page table walking to resolve. Since no in-tree IOMMU driver currently supports this functionality, it is disallowed. For more details, refer to the related discussion at [1]. [1] https://lore.kernel.org/linux-iommu/bd1655c6-8b2f-4cfa-adb1-badc00d01811@intel.com/ Link: https://patch.msgid.link/r/20250226104012.82079-1-yi.l.liu@intel.com Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-04 09:34:54 -04:00
Joerg Roedel	59b6c3469e	iommu shared branch with iommufd The three dependent series on a shared branch: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ8HEaQAKCRCFwuHvBreF YfEHAP9khiTmkfsx5JGBtKMqFbF/TShyZfgcLTCWw/VGXLcdswD/fnxA8L+y6zD4 9P73iOqSfHRV69dXu9CU6duZZifWawI= =YZh7 -----END PGP SIGNATURE----- Merge tag 'for-joerg' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core iommu shared branch with iommufd The three dependent series on a shared branch: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API	2025-02-28 16:35:19 +01:00
Yi Liu	5e9f822c9c	iommu: Swap the order of setting group->pasid_array and calling attach op of iommu drivers The current implementation stores entry to the group->pasid_array before the underlying iommu driver has successfully set the new domain. This can lead to issues where PRIs are received on the new domain before the attach operation is completed. This patch swaps the order of operations to ensure that the domain is set in the underlying iommu driver before updating the group->pasid_array. Link: https://patch.msgid.link/r/20250226011849.5102-5-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Yi Liu	e1ea9d30d8	iommu: Store either domain or handle in group->pasid_array iommu_attach_device_pasid() only stores handle to group->pasid_array when there is a valid handle input. However, it makes the iommu_attach_device_pasid() unable to detect if the pasid has been attached or not previously. To be complete, let the iommu_attach_device_pasid() store the domain to group->pasid_array if no valid handle. The other users of the group->pasid_array should be updated to be consistent. e.g. the iommu_attach_group_handle() and iommu_replace_group_handle(). Link: https://patch.msgid.link/r/20250226011849.5102-4-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Yi Liu	473ec072a6	iommu: Drop iommu_group_replace_domain() iommufd does not use it now, so drop it. Link: https://patch.msgid.link/r/20250226011849.5102-3-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Yi Liu	237603a46a	iommu: Make @handle mandatory in iommu_{attach\|replace}_group_handle() Caller of the two APIs always provide a valid handle, make @handle as mandatory parameter. Take this chance incoporate the handle->domain set under the protection of group->mutex in iommu_attach_group_handle(). Link: https://patch.msgid.link/r/20250226011849.5102-2-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Lu Baolu	b150654f74	iommu/vt-d: Fix suspicious RCU usage Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts locally") moved the call to enable_drhd_fault_handling() to a code path that does not hold any lock while traversing the drhd list. Fix it by ensuring the dmar_global_lock lock is held when traversing the drhd list. Without this fix, the following warning is triggered: ============================= WARNING: suspicious RCU usage 6.14.0-rc3 #55 Not tainted ----------------------------- drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 2 locks held by cpuhp/1/23: #0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 #1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 stack backtrace: CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 #55 Call Trace: <TASK> dump_stack_lvl+0xb7/0xd0 lockdep_rcu_suspicious+0x159/0x1f0 ? __pfx_enable_drhd_fault_handling+0x10/0x10 enable_drhd_fault_handling+0x151/0x180 cpuhp_invoke_callback+0x1df/0x990 cpuhp_thread_fun+0x1ea/0x2c0 smpboot_thread_fn+0x1f5/0x2e0 ? __pfx_smpboot_thread_fn+0x10/0x10 kthread+0x12a/0x2d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x4a/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Holding the lock in enable_drhd_fault_handling() triggers a lockdep splat about a possible deadlock between dmar_global_lock and cpu_hotplug_lock. This is avoided by not holding dmar_global_lock when calling iommu_device_register(), which initiates the device probe process. Fixes: `d74169ceb0` ("iommu/vt-d: Allocate DMAR fault interrupts locally") Reported-and-tested-by: Ido Schimmel <idosch@nvidia.com> Closes: https://lore.kernel.org/linux-iommu/Zx9OwdLIc_VoQ0-a@shredder.mtl.com/ Tested-by: Breno Leitao <leitao@debian.org> Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250218022422.2315082-1-baolu.lu@linux.intel.com Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-28 12:19:01 +01:00
Jerry Snitselaar	64f792981e	iommu/vt-d: Remove device comparison in context_setup_pass_through_cb Remove the device comparison check in context_setup_pass_through_cb. pci_for_each_dma_alias already makes a decision on whether the callback function should be called for a device. With the check in place it will fail to create context entries for aliases as it walks up to the root bus. Fixes: `2031c469f8` ("iommu/vt-d: Add support for static identity domain") Closes: https://lore.kernel.org/linux-iommu/82499eb6-00b7-4f83-879a-e97b4144f576@linux.intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20250224180316.140123-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-28 12:19:00 +01:00
Alejandro Jimenez	c5b0320bbf	iommu/amd: Preserve default DTE fields when updating Host Page Table Root When updating the page table root field on the DTE, avoid overwriting any bits that are already set. The earlier call to make_clear_dte() writes default values that all DTEs must have set (currently DTE[V]), and those must be preserved. Currently this doesn't cause problems since the page table root update is the first field that is set after make_clear_dte() is called, and DTE_FLAG_V is set again later along with the permission bits (IR/IW). Remove this redundant assignment too. Fixes: `fd5dff9de4` ("iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers") Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250106191413.3107140-1-alejandro.j.jimenez@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-28 12:18:00 +01:00
Jason Gunthorpe	40f5175d0e	iommufd: Implement sw_msi support natively iommufd has a model where the iommu_domain can be changed while the VFIO device is attached. In this case, the MSI should continue to work. This corner case has not worked because the dma-iommu implementation of sw_msi is tied to a single domain. Implement the sw_msi mapping directly and use a global per-fd table to associate assigned IOVA to the MSI pages. This allows the MSI pages to be loaded into a domain before it is attached ensuring that MSI is not disrupted. Link: https://patch.msgid.link/r/e13d23eeacd67c0a692fc468c85b483f4dd51c57.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-27 15:29:35 -04:00
Nuno Das Neves	db912b8954	hyperv: Change hv_root_partition into a function Introduce hv_curr_partition_type to store the partition type as an enum. Right now this is limited to guest or root partition, but there will be other kinds in future and the enum is easily extensible. Set up hv_curr_partition_type early in Hyper-V initialization with hv_identify_partition_type(). hv_root_partition() just queries this value, and shouldn't be called before that. Making this check into a function sets the stage for adding a config option to gate the compilation of root partition code. In particular, hv_root_partition() can be stubbed out always be false if root partition support isn't desired. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com>	2025-02-22 02:21:45 +00:00
Nicolin Chen	748706d7ca	iommu: Turn fault_data to iommufd private pointer A "fault_data" was added exclusively for the iommufd_fault_iopf_handler() used by IOPF/PRI use cases, along with the attach_handle. Now, the iommufd version of the sw_msi function will reuse the attach_handle and fault_data for a non-fault case. Rename "fault_data" to "iommufd_hwpt" so as not to confine it to a "fault" case. Move it into a union to be the iommufd private pointer. A following patch will move the iova_cookie to the union for dma-iommu too after the iommufd_sw_msi implementation is added. Since we have two unions now, add some simple comments for readability. Link: https://patch.msgid.link/r/ee5039503f28a16590916e9eef28b917e2d1607a.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:49:05 -04:00
Jason Gunthorpe	96093fe54f	irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it Currently, IRQ_MSI_IOMMU is selected if DMA_IOMMU is available to provide an implementation for iommu_dma_prepare/compose_msi_msg(). However, it'll make more sense for irqchips that call prepare/compose to select it, and that will trigger all the additional code and data to be compiled into the kernel. If IRQ_MSI_IOMMU is selected with no IOMMU side implementation, then the prepare/compose() will be NOP stubs. If IRQ_MSI_IOMMU is not selected by an irqchip, then the related code on the iommu side is compiled out. Link: https://patch.msgid.link/r/a2620f67002c5cdf974e89ca3bf905f5c0817be6.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:48:38 -04:00
Jason Gunthorpe	288683c92b	iommu: Make iommu_dma_prepare_msi() into a generic operation SW_MSI supports IOMMU to translate an MSI message before the MSI message is delivered to the interrupt controller. On such systems, an iommu_domain must have a translation for the MSI message for interrupts to work. The IRQ subsystem will call into IOMMU to request that a physical page be set up to receive MSI messages, and the IOMMU then sets an IOVA that maps to that physical page. Ultimately the IOVA is programmed into the device via the msi_msg. Generalize this by allowing iommu_domain owners to provide implementations of this mapping. Add a function pointer in struct iommu_domain to allow a domain owner to provide its own implementation. Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in the iommu_get_msi_cookie() path. Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain doesn't change or become freed while running. Races with IRQ operations from VFIO and domain changes from iommufd are possible here. Replace the msi_prepare_lock with a lockdep assertion for the group mutex as documentation. For the dmau_iommu.c each iommu_domain is unique to a group. Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:04:12 -04:00
Jason Gunthorpe	9349887e93	genirq/msi: Refactor iommu_dma_compose_msi_msg() The two-step process to translate the MSI address involves two functions, iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as it had to dereference the opaque cookie pointer. Now, the previous patch changed the cookie pointer into an integer so there is no longer any need for the iommu layer to be involved. Further, the call sites of iommu_dma_compose_msi_msg() all follow the same pattern of setting an MSI message address_hi/lo to non-translated and then immediately calling iommu_dma_compose_msi_msg(). Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly accepts the u64 version of the address and simplifies all the callers. Move the new helper to linux/msi.h since it has nothing to do with iommu. Aside from refactoring, this logically prepares for the next patch, which allows multiple implementation options for iommu_dma_prepare_msi(). So, it does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c as it no longer provides the only iommu_dma_prepare_msi() implementation. Link: https://patch.msgid.link/r/eda62a9bafa825e9cdabd7ddc61ad5a21c32af24.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:04:12 -04:00
Jason Gunthorpe	1f7df3a691	genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie The IOMMU translation for MSI message addresses has been a 2-step process, separated in time: 1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address is stored in the MSI descriptor when an MSI interrupt is allocated. 2) iommu_dma_compose_msi_msg(): this cookie pointer is used to compute a translated message address. This has an inherent lifetime problem for the pointer stored in the cookie that must remain valid between the two steps. However, there is no locking at the irq layer that helps protect the lifetime. Today, this works under the assumption that the iommu domain is not changed while MSI interrupts being programmed. This is true for normal DMA API users within the kernel, as the iommu domain is attached before the driver is probed and cannot be changed while a driver is attached. Classic VFIO type1 also prevented changing the iommu domain while VFIO was running as it does not support changing the "container" after starting up. However, iommufd has improved this so that the iommu domain can be changed during VFIO operation. This potentially allows userspace to directly race VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()). This potentially causes both the cookie pointer and the unlocked call to iommu_get_domain_for_dev() on the MSI translation path to become UAFs. Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA address is already known during iommu_dma_prepare_msi() and cannot change. Thus, it can simply be stored as an integer in the MSI descriptor. The other UAF related to iommu_get_domain_for_dev() will be addressed in patch "iommu: Make iommu_dma_prepare_msi() into a generic operation" by using the IOMMU group mutex. Link: https://patch.msgid.link/r/a4f2cd76b9dc1833ee6c1cf325cba57def22231c.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:04:11 -04:00
Asahi Lina	5276c1e076	iommu/io-pgtable-dart: Only set subpage protection disable for DART 1 Subpage protection can't be disabled on t6000-style darts, as such the disable flag no longer applies, and probably even affects something else. Fixes: `dc09fe1c5e` ("iommu/io-pgtable-dart: Add DART PTE support for t6000") Signed-off-by: Asahi Lina <lina@asahilina.net> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20250219-dart2-no-sp-disable-v1-1-9f324cfa4e70@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-21 12:06:54 +01:00
Matthew Rosato	64af12c6ec	iommu/s390: implement iommu passthrough via identity domain Enabled via the kernel command-line 'iommu.passthrough=1' option. Introduce the concept of identity domains to s390-iommu, which relies on the bus_dma_region to offset identity mappings to the start of the DMA aperture advertized by CLP. Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250212213418.182902-5-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-21 12:02:00 +01:00
Matthew Rosato	0ed5967a0a	iommu/s390: handle IOAT registration based on domain At this point, the dma_table is really a property of the s390-iommu domain. Rather than checking its contents elsewhere in the codebase, move the code that registers the table with firmware into s390-iommu and make a decision what to register with firmware based upon the type of domain in use for the device in question. Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20250212213418.182902-4-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-21 12:01:58 +01:00
Linus Torvalds	82ff316456	ARM: - Large set of fixes for vector handling, specially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours. - Fix use of kernel VAs at EL2 when emulating timers with nVHE. - Small set of pKVM improvements and cleanups. x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference. - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1. - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmev2ykUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMvxwf/bw2u08moAYWAjJLROFvfiKXnznLS iqJ2+jcw0lJ7wDqm4Zw8M5t74Rd+y5yzkLkZOyjav9yBB09zRkItiTHljCNMOQnt 2QptBa3CUN8N+rNnvVRt6dMkhw7z6n7eoFRSIDY2Y9PgiTapbFXPV1gFkMPO6+0f SyF4LCr0iuDkJdvGAZJAH/Mp8nG6dv/A6a+Q+R1RkbKn9c2OdWw4VMfhIzimFGN6 0RFjbfXXvyO0aU/W/VHwvvuhcjGkAZWfHDdaTXqbvSMhayW562UPVMVBwXdVBmDj Dk1gCKcbm4WyktbXYW6iOYj3MgdK96eI24ozps4R0aDexsrTRY4IfH4KEg== =20Ql -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Large set of fixes for vector handling, especially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours - Fix use of kernel VAs at EL2 when emulating timers with nVHE - Small set of pKVM improvements and cleanups x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1 - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) x86/sev: Fix broken SNP support with KVM module built-in KVM: SVM: Ensure PSP module is initialized if KVM module is built-in crypto: ccp: Add external API interface for PSP module initialization KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic() KVM: arm64: timer: Drop warning on failed interrupt signalling KVM: arm64: Fix alignment of kvm_hyp_memcache allocations KVM: arm64: Convert timer offset VA when accessed in HYP code KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp() KVM: arm64: Eagerly switch ZCR_EL{1,2} KVM: arm64: Mark some header functions as inline KVM: arm64: Refactor exit handlers KVM: arm64: Refactor CPTR trap deactivation KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN KVM: arm64: Remove host FPSIMD saving for non-protected KVM KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop KVM: nSVM: Enter guest mode before initializing nested NPT MMU KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper ...	2025-02-16 10:25:12 -08:00
Ashish Kalra	409f45387c	x86/sev: Fix broken SNP support with KVM module built-in Fix issues with enabling SNP host support and effectively SNP support which is broken with respect to the KVM module being built-in. SNP host support is enabled in snp_rmptable_init() which is invoked as device_initcall(). SNP check on IOMMU is done during IOMMU PCI init (IOMMU_PCI_INIT stage). And for that reason snp_rmptable_init() is currently invoked via device_initcall() and cannot be invoked via subsys_initcall() as core IOMMU subsystem gets initialized via subsys_initcall(). Now, if kvm_amd module is built-in, it gets initialized before SNP host support is enabled in snp_rmptable_init() : [ 10.131811] kvm_amd: TSC scaling supported [ 10.136384] kvm_amd: Nested Virtualization enabled [ 10.141734] kvm_amd: Nested Paging enabled [ 10.146304] kvm_amd: LBR virtualization supported [ 10.151557] kvm_amd: SEV enabled (ASIDs 100 - 509) [ 10.156905] kvm_amd: SEV-ES enabled (ASIDs 1 - 99) [ 10.162256] kvm_amd: SEV-SNP enabled (ASIDs 1 - 99) [ 10.171508] kvm_amd: Virtual VMLOAD VMSAVE supported [ 10.177052] kvm_amd: Virtual GIF supported ... ... [ 10.201648] kvm_amd: in svm_enable_virtualization_cpu And then svm_x86_ops->enable_virtualization_cpu() (svm_enable_virtualization_cpu) programs MSR_VM_HSAVE_PA as following: wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa); So VM_HSAVE_PA is non-zero before SNP support is enabled on all CPUs. snp_rmptable_init() gets invoked after svm_enable_virtualization_cpu() as following : ... [ 11.256138] kvm_amd: in svm_enable_virtualization_cpu ... [ 11.264918] SEV-SNP: in snp_rmptable_init This triggers a #GP exception in snp_rmptable_init() when snp_enable() is invoked to set SNP_EN in SYSCFG MSR: [ 11.294289] unchecked MSR access error: WRMSR to 0xc0010010 (tried to write 0x0000000003fc0000) at rIP: 0xffffffffaf5d5c28 (native_write_msr+0x8/0x30) ... [ 11.294404] Call Trace: [ 11.294482] <IRQ> [ 11.294513] ? show_stack_regs+0x26/0x30 [ 11.294522] ? ex_handler_msr+0x10f/0x180 [ 11.294529] ? search_extable+0x2b/0x40 [ 11.294538] ? fixup_exception+0x2dd/0x340 [ 11.294542] ? exc_general_protection+0x14f/0x440 [ 11.294550] ? asm_exc_general_protection+0x2b/0x30 [ 11.294557] ? __pfx_snp_enable+0x10/0x10 [ 11.294567] ? native_write_msr+0x8/0x30 [ 11.294570] ? __snp_enable+0x5d/0x70 [ 11.294575] snp_enable+0x19/0x20 [ 11.294578] __flush_smp_call_function_queue+0x9c/0x3a0 [ 11.294586] generic_smp_call_function_single_interrupt+0x17/0x20 [ 11.294589] __sysvec_call_function+0x20/0x90 [ 11.294596] sysvec_call_function+0x80/0xb0 [ 11.294601] </IRQ> [ 11.294603] <TASK> [ 11.294605] asm_sysvec_call_function+0x1f/0x30 ... [ 11.294631] arch_cpu_idle+0xd/0x20 [ 11.294633] default_idle_call+0x34/0xd0 [ 11.294636] do_idle+0x1f1/0x230 [ 11.294643] ? complete+0x71/0x80 [ 11.294649] cpu_startup_entry+0x30/0x40 [ 11.294652] start_secondary+0x12d/0x160 [ 11.294655] common_startup_64+0x13e/0x141 [ 11.294662] </TASK> This #GP exception is getting triggered due to the following errata for AMD family 19h Models 10h-1Fh Processors: Processor may generate spurious #GP(0) Exception on WRMSR instruction: Description: The Processor will generate a spurious #GP(0) Exception on a WRMSR instruction if the following conditions are all met: - the target of the WRMSR is a SYSCFG register. - the write changes the value of SYSCFG.SNPEn from 0 to 1. - One of the threads that share the physical core has a non-zero value in the VM_HSAVE_PA MSR. The document being referred to above: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf To summarize, with kvm_amd module being built-in, KVM/SVM initialization happens before host SNP is enabled and this SVM initialization sets VM_HSAVE_PA to non-zero, which then triggers a #GP when SYSCFG.SNPEn is being set and this will subsequently cause SNP_INIT(_EX) to fail with INVALID_CONFIG error as SYSCFG[SnpEn] is not set on all CPUs. Essentially SNP host enabling code should be invoked before KVM initialization, which is currently not the case when KVM is built-in. Add fix to call snp_rmptable_init() early from iommu_snp_enable() directly and not invoked via device_initcall() which enables SNP host support before KVM initialization with kvm_amd module built-in. Add additional handling for `iommu=off` or `amd_iommu=off` options. Note that IOMMUs need to be enabled for SNP initialization, therefore, if host SNP support is enabled but late IOMMU initialization fails then that will cause PSP driver's SNP_INIT to fail as IOMMU SNP sanity checks in SNP firmware will fail with invalid configuration error as below: [ 9.723114] ccp 0000:23:00.1: sev enabled [ 9.727602] ccp 0000:23:00.1: psp enabled [ 9.732527] ccp 0000:a2:00.1: enabling device (0000 -> 0002) [ 9.739098] ccp 0000:a2:00.1: no command queues available [ 9.745167] ccp 0000:a2:00.1: psp enabled [ 9.805337] ccp 0000:23:00.1: SEV-SNP: failed to INIT rc -5, error 0x3 [ 9.866426] ccp 0000:23:00.1: SEV API:1.53 build:5 Fixes: `c3b86e61b7` ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature") Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Acked-by: Joerg Roedel <jroedel@suse.de> Message-ID: <138b520fb83964782303b43ade4369cd181fdd9c.1739226950.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-02-14 18:39:19 -05:00
Lu Baolu	add43c4fbc	iommu/vt-d: Make intel_iommu_drain_pasid_prq() cover faults for RID This driver supports page faults on PCI RID since commit <9f831c16c69e> ("iommu/vt-d: Remove the pasid present check in prq_event_thread") by allowing the reporting of page faults with the pasid_present field cleared to the upper layer for further handling. The fundamental assumption here is that the detach or replace operations act as a fence for page faults. This implies that all pending page faults associated with a specific RID or PASID are flushed when a domain is detached or replaced from a device RID or PASID. However, the intel_iommu_drain_pasid_prq() helper does not correctly handle faults for RID. This leads to faults potentially remaining pending in the iommu hardware queue even after the domain is detached, thereby violating the aforementioned assumption. Fix this issue by extending intel_iommu_drain_pasid_prq() to cover faults for RID. Fixes: `9f831c16c6` ("iommu/vt-d: Remove the pasid present check in prq_event_thread") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250121023150.815972-1-baolu.lu@linux.intel.com Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20250211005512.985563-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-14 09:12:47 +01:00
Andrew Kreimer	4a8991fe9c	iommu/exynos: Fix typos There are some typos in comments/messages: - modyfying -> modifying - Unabled -> Unable Fix them via codespell. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Link: https://lore.kernel.org/r/20250210112027.29791-1-algonell@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-14 09:12:46 +01:00
Easwar Hariharan	78be7f0453	iommu: Fix a spelling error Fix spelling error IDENITY -> IDENTITY in drivers/iommu/iommu.c. Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250128190522.70800-1-eahariha@linux.microsoft.com [ joro: Add commit message ] Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-14 09:12:46 +01:00
Vasant Hegde	ef75966abf	iommu/amd: Expicitly enable CNTRL.EPHEn bit in resume path With recent kernel, AMDGPU failed to resume after suspend on certain laptop. Sample log: ----------- Nov 14 11:52:19 Thinkbook kernel: iommu ivhd0: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:06:00.0 pasid=0x00000 address=0x135300000 flags=0x0080] Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[0]: 7d90000000000003 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[1]: 0000100103fc0009 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[2]: 2000000117840013 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[3]: 0000000000000000 This is because in resume path, CNTRL[EPHEn] is not set. Fix this by setting CNTRL[EPHEn] to 1 in resume path if EFR[EPHSUP] is set. Note May be better approach is to save the control register in suspend path and restore it in resume path instead of trying to set indivisual bits. We will have separate patch for that. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219499 Fixes: `c4cb231111` ("iommu/amd: Add support for enable/disable IOPF") Tested-by: Hamish McIntyre-Bhatty <kernel-bugzilla@regd.hamishmb.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250127094411.5931-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-14 09:11:58 +01:00
Nicolin Chen	dc10ba25d4	iommufd/fault: Remove iommufd_fault_domain_attach/detach/replace_dev() There are new attach/detach/replace helpers in device.c taking care of both the attach_handle and the fault specific routines for iopf_enable/disable() and auto response. Clean up these redundant functions in the fault.c file. Link: https://patch.msgid.link/r/3ca94625e9d78270d9a715fa0809414fddd57e58.1738645017.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-11 14:21:03 -04:00
Nicolin Chen	fb21b1568a	iommufd: Make attach_handle generic than fault specific "attach_handle" was added exclusively for the iommufd_fault_iopf_handler() used by IOPF/PRI use cases. Now, both the MSI and PASID series require to reuse the attach_handle for non-fault cases. Add a set of new attach/detach/replace helpers that does the attach_handle allocation/releasing/replacement in the common path and also handles those fault specific routines such as iopf enabling/disabling and auto response. This covers both non-fault and fault cases in a clean way, replacing those inline helpers in the header. The following patch will clean up those old helpers in the fault.c file. Link: https://patch.msgid.link/r/32687df01c02291d89986a9fca897bbbe2b10987.1738645017.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-11 14:21:03 -04:00
Lu Baolu	9759ae2cee	iommu: Fix potential memory leak in iopf_queue_remove_device() The iopf_queue_remove_device() helper removes a device from the per-iommu iopf queue when PRI is disabled on the device. It responds to all outstanding iopf's with an IOMMU_PAGE_RESP_INVALID code and detaches the device from the queue. However, it fails to release the group structure that represents a group of iopf's awaiting for a response after responding to the hardware. This can cause a memory leak if iopf_queue_remove_device() is called with pending iopf's. Fix it by calling iopf_free_group() after the iopf group is responded. Fixes: `1991123271` ("iommu: Track iopf group instead of last fault") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250117055800.782462-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-10 14:49:30 +01:00
Linus Torvalds	382e391365	hyperv-next for v6.14 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmeTFQ4THHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXqMWB/4uHjnu50u+m00OwXAKQr6i92zh50BZ RQragd9s9C8tuUNwPDmS/ct2BNAhoy43KJ0ClegdZjKxT1Ys8cLv4Wr5CaGckqWq +WCHqTgt+cPe0vUofqahB5wiAZMsnBgzFkV/OfFwBx0wkub9y5T3qVq5KapYlaDI 7Gftb+wg1AAsrdZ/HuLRy5ZVvkM/73rU2uoi8WXjr/T14E1krCFR/qirLd1OXo6Q Jb97qhnCt/N9JPwIq5/VnYWde5Mpqz6UgtA2rFLDXgNGz+h9/ND6ecWFHjZWNVdc AKWZTO5t+fRVBOSyahoyRoYSntPw3wlxyL7A2/54h6j4Dex7wLt6NQBj =empO -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Introduce a new set of Hyper-V headers in include/hyperv and replace the old hyperv-tlfs.h with the new headers (Nuno Das Neves) - Fixes for the Hyper-V VTL mode (Roman Kisel) - Fixes for cpu mask usage in Hyper-V code (Michael Kelley) - Document the guest VM hibernation behaviour (Michael Kelley) - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain) * tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Documentation: hyperv: Add overview of guest VM hibernation hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() hyperv: Do not overlap the hvcall IO areas in get_vtl() hyperv: Enable the hypercall output page for the VTL mode hv_balloon: Fallback to generic_online_page() for non-HV hot added mem Drivers: hv: vmbus: Log on missing offers if any Drivers: hv: vmbus: Wait for boot-time offers during boot and resume uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup iommu/hyper-v: Don't assume cpu_possible_mask is dense Drivers: hv: Don't assume cpu_possible_mask is dense x86/hyperv: Don't assume cpu_possible_mask is dense hyperv: Remove the now unused hyperv-tlfs.h files hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h hyperv: Add new Hyper-V headers in include/hyperv hyperv: Clean up unnecessary #includes hyperv: Move hv_connection_id to hyperv-tlfs.h	2025-01-25 09:22:55 -08:00
Linus Torvalds	aa44198a6c	iommufd 6.14 merge window pull No major functionality this cycle: - iommufd part of the domain_alloc_paging_flags() conversion - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers - Increase a timeout waiting for other threads to drop transient refcounts that syzkaller was hitting - Fix a UBSAN hit in iova_bitmap due to shift out of bounds - Add missing cleanup of fault events during FD shutdown, fixing a memory leak - Improve the fault delivery flow to have a smaller locking critical region that does not include copy_to_user() - Fix 32 bit ABI breakage due to missed implicit padding, and fix the stack memory leakage -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ5JgnAAKCRCFwuHvBreF YQWrAP9ItOsbeOxYIEVQK1E90HbMWVHF8RhWDPnpChgzyorjjQEA/OOou6uTAcVC fJybLk3T7JrutMmBFvVLsRg+yCJPlgE= =ldhT -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "No major functionality this cycle: - iommufd part of the domain_alloc_paging_flags() conversion - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers - Increase a timeout waiting for other threads to drop transient refcounts that syzkaller was hitting - Fix a UBSAN hit in iova_bitmap due to shift out of bounds - Add missing cleanup of fault events during FD shutdown, fixing a memory leak - Improve the fault delivery flow to have a smaller locking critical region that does not include copy_to_user() - Fix 32 bit ABI breakage due to missed implicit padding, and fix the stack memory leakage" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix struct iommu_hwpt_pgfault init and padding iommufd/fault: Use a separate spinlock to protect fault->deliver list iommufd/fault: Destroy response and mutex in iommufd_fault_destroy() iommufd: Keep OBJ/IOCTL lists in an alphabetical order iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index() iommu: iommufd: fix WARNING in iommufd_device_unbind iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core iommufd/selftest: Remove domain_alloc_paging()	2025-01-24 12:04:35 -08:00
Linus Torvalds	f1c243fc78	IOMMU Updates for Linux v6.14 Including: - Core changes: - PASID support for the blocked_domain. - ARM-SMMU Updates: - SMMUv2: * Implement per-client prefetcher configuration on Qualcomm SoCs. * Support for the Adreno SMMU on Qualcomm's SDM670 SOC. - SMMUv3: * Pretty-printing of event records. * Drop the ->domain_alloc_paging implementation in favour of ->domain_alloc_paging_flags(flags==0). - IO-PGTable: * Generalisation of the page-table walker to enable external walkers (e.g. for debugging unexpected page-faults from the GPU). * Minor fix for handling concatenated PGDs at stage-2 with 16KiB pages. - Misc: * Clean-up device probing and replace the crufty probe-deferral hack with a more robust implementation of arm_smmu_get_by_fwnode(). * Device-tree binding updates for a bunch of Qualcomm platforms. - Intel VT-d Updates: - Remove domain_alloc_paging(). - Remove capability audit code. - Draining PRQ in sva unbind path when FPD bit set. - Link cache tags of same iommu unit together. - AMD-Vi Updates: - Use CMPXCHG128 to update DTE. - Cleanups of the domain_alloc_paging() path. - RiscV IOMMU: - Platform MSI support. - Shutdown support. - Rockchip IOMMU: - Add DT bindings for Rockchip RK3576. - More smaller fixes and cleanups. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmeQsnQACgkQK/BELZcB GuOHug/8DDIuKlUHU2+3U2kKxMb8o1kDxGPkfKgXBaxxpprvY9DARRv7N4mnF6Vu Db8P7QihXfQKnZHXEiqHE7TlA5B+IVQd3Kz96P4sY3OlVWGZYqyKv2GEHyG5CjN/ bay7bfgeo2EVEiAio6VToFFWTm+oxFZzhoYFIlAyZAuIQUp17gHXf7YyhUwk4rOz 8g0XMH6uldidID6BVpArxHh/bN9MOTdHzkyhwPF3FL8E94ziX6rWILH9ADYxBn2o wqHR1STxv398k62toPpWb78c2RdANI8treDXsYpCyDF87dygdP+SA0qkK3G6kAVA /IiPothAj6oNm+Gvwd04tEkuqVVrVqrmWE3VXSps33Tk+apYtcmLCtdpwY/F93D1 EZwTVqveBKk2hSWYVDlEyj9XKmZ9dYWDGvg2Fx844gltQHoHtEgYL+azCMU/ry7k 3+KlkUqFZZxUDbVBbbDdMF+NpxyZTtfKsaLB8f5laOP1R8Os3+dC69Gw6bhoxaMc xfL3v245V5kWSRy+w02TyZGmZSzRx0FKUbFLKpLOvZD6pfx8t8oqJTG9FwN1KzpL lvcBpPB5AUeNJQpKcVSjs/ne8WiOqdABt7ge4E9J5TwrDI8sXk0mwJaPrlDgK1V/ 0xkkLmxWnqsu04CyDay+Uv3Bls/rRBpikR/iGt9P3BbZs7Cyryw= =Xei0 -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core changes: - PASID support for the blocked_domain ARM-SMMU Updates: - SMMUv2: - Implement per-client prefetcher configuration on Qualcomm SoCs - Support for the Adreno SMMU on Qualcomm's SDM670 SOC - SMMUv3: - Pretty-printing of event records - Drop the ->domain_alloc_paging implementation in favour of domain_alloc_paging_flags(flags==0) - IO-PGTable: - Generalisation of the page-table walker to enable external walkers (e.g. for debugging unexpected page-faults from the GPU) - Minor fix for handling concatenated PGDs at stage-2 with 16KiB pages - Misc: - Clean-up device probing and replace the crufty probe-deferral hack with a more robust implementation of arm_smmu_get_by_fwnode() - Device-tree binding updates for a bunch of Qualcomm platforms Intel VT-d Updates: - Remove domain_alloc_paging() - Remove capability audit code - Draining PRQ in sva unbind path when FPD bit set - Link cache tags of same iommu unit together AMD-Vi Updates: - Use CMPXCHG128 to update DTE - Cleanups of the domain_alloc_paging() path RiscV IOMMU: - Platform MSI support - Shutdown support Rockchip IOMMU: - Add DT bindings for Rockchip RK3576 More smaller fixes and cleanups" * tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (66 commits) iommu: Use str_enable_disable-like helpers iommu/amd: Fully decode all combinations of alloc_paging_flags iommu/amd: Move the nid to pdom_setup_pgtable() iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_mode iommu/amd: Remove type argument from do_iommu_domain_alloc() and related iommu/amd: Remove dev == NULL checks iommu/amd: Remove domain_alloc() iommu/amd: Remove unused amd_iommu_domain_update() iommu/riscv: Fixup compile warning iommu/arm-smmu-v3: Add missing #include of linux/string_choices.h iommu/arm-smmu-v3: Use str_read_write helper w/ logs iommu/io-pgtable-arm: Add way to debug pgtable walk iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys iommu/io-pgtable-arm: Make pgtable walker more generic iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500 iommu/arm-smmu: Introduce ACTLR custom prefetcher settings iommu/arm-smmu: Add support for PRR bit setup iommu/arm-smmu: Refactor qcom_smmu structure to include single pointer iommu/arm-smmu: Re-enable context caching in smmu reset operation iommu/vt-d: Link cache tags of same iommu unit together ...	2025-01-24 07:33:46 -08:00
Linus Torvalds	4c551165e7	Updates for the interrupt subsystem: - Consolidation of the machine_kexec_mask_interrupts() by providing a generic implementation and replacing the copy & pasta orgy in the relevant architectures. - Prevent unconditional operations on interrupt chips during kexec shutdown, which can trigger warnings in certain cases when the underlying interrupt has been shut down before. - Make the enforcement of interrupt handling in interrupt context unconditionally available, so that it actually works for non x86 related interrupt chips. The earlier enablement for ARM GIC chips set the required chip flag, but did not notice that the check was hidden behind a config switch which is not selected by ARM[64]. - Decrapify the handling of deferred interrupt affinity setting. Some interrupt chips require that affinity changes are made from the context of handling an interrupt to avoid certain race conditions. For x86 this was the default, but with interrupt remapping this requirement was lifted and a flag was introduced which tells the core code that affinity changes can be done in any context. Unrestricted affinity changes are the default for the majority of interrupt chips. RISCV has the requirement to add the deferred mode to one of it's interrupt controllers, but with the original implementation this would require to add the any context flag to all other RISC-V interrupt chips. That's backwards, so reverse the logic and require that chips, which need the deferred mode have to be marked accordingly. That avoids chasing the 'sane' chips and marking them. - Add multi-node support to the Loongarch AVEC interrupt controller driver. - The usual tiny cleanups, fixes and improvements all over the place. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmePkVITHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoRbQD/9bHVph/V9Ekl7JAX3aY4gG4JbRhOc7 dp1VAcHRhktRfoTztYRbjsbMu2nvZ58GKA8bkOS2jHSF/m3PbkIJfOhwk0YdIAoa +kdy5yDgqCGfkqW43DN4Cr+CnzGjWMitw67tFp3fhwehMDpDjdt2L28IjtanSS0f hO6FV7o65MWeJwxk4Isb2/nvkO+X23Lrp6RrWS8SXBnF9FFXxiPIg/fiOPTizhCh 1W/bSGxLLb9WwsVzmlGAKVFlXDij0QGaIUug2fdVZ63OsELXD7tJrLSPG133yk92 ppIa0s6BT4IBsfM00us4hG15PkLuJmP3yWWcoquG0rP8Wq58VOXiN6+rcJIyvB+5 mWceTH6IKfZGoRQKwXC7BxeBAIb147reiJtb06meq1/8ADIvzafiNy0c8x9i/UaV QiyhPVENjaGCGDomZmJQqN7Yb02Wge1k8InQnodDrHxZNl/bX/B1Z8Bxd0n6hPHg NSJXYif2AxgaddpohsdygqRDbT6SNyQdj7YjJFY5qAGJ3yFyJ4JB6WTqkWW4o1vH 3FVqdAnJmejAmmYSkah0Hkem2T5QASQmTWb93PLxiV6q+d0NM8stWAujjyVdIV/B W4Uj9mQ20cz54TjLtxqX+A1k6KcqOWRgh1l2QbUlFsgsOP3V8yz47yqYdR9qMWlO 9kNEjI3sw+G/IQ== =q4rj -----END PGP SIGNATURE----- Merge tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: - Consolidate the machine_kexec_mask_interrupts() by providing a generic implementation and replacing the copy & pasta orgy in the relevant architectures. - Prevent unconditional operations on interrupt chips during kexec shutdown, which can trigger warnings in certain cases when the underlying interrupt has been shut down before. - Make the enforcement of interrupt handling in interrupt context unconditionally available, so that it actually works for non x86 related interrupt chips. The earlier enablement for ARM GIC chips set the required chip flag, but did not notice that the check was hidden behind a config switch which is not selected by ARM[64]. - Decrapify the handling of deferred interrupt affinity setting. Some interrupt chips require that affinity changes are made from the context of handling an interrupt to avoid certain race conditions. For x86 this was the default, but with interrupt remapping this requirement was lifted and a flag was introduced which tells the core code that affinity changes can be done in any context. Unrestricted affinity changes are the default for the majority of interrupt chips. RISCV has the requirement to add the deferred mode to one of it's interrupt controllers, but with the original implementation this would require to add the any context flag to all other RISC-V interrupt chips. That's backwards, so reverse the logic and require that chips, which need the deferred mode have to be marked accordingly. That avoids chasing the 'sane' chips and marking them. - Add multi-node support to the Loongarch AVEC interrupt controller driver. - The usual tiny cleanups, fixes and improvements all over the place. * tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set() genirq/timings: Add kernel-doc for a function parameter genirq: Remove IRQ_MOVE_PCNTXT and related code x86/apic: Convert to IRQCHIP_MOVE_DEFERRED genirq: Provide IRQCHIP_MOVE_DEFERRED hexagon: Remove GENERIC_PENDING_IRQ leftover ARC: Remove GENERIC_PENDING_IRQ genirq: Remove handle_enforce_irqctx() wrapper genirq: Make handle_enforce_irqctx() unconditionally available irqchip/loongarch-avec: Add multi-nodes topology support irqchip/ts4800: Replace seq_printf() by seq_puts() irqchip/ti-sci-inta : Add module build support irqchip/ti-sci-intr: Add module build support irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown kexec: Consolidate machine_kexec_mask_interrupts() implementation genirq: Reuse irq_thread_fn() for forced thread case genirq: Move irq_thread_fn() further up in the code	2025-01-21 13:51:07 -08:00
Nicolin Chen	e721f619e3	iommufd: Fix struct iommu_hwpt_pgfault init and padding The iommu_hwpt_pgfault is used to report IO page fault data to userspace, but iommufd_fault_fops_read was never zeroing its padding. This leaks the content of the kernel stack memory to userspace. Also, the iommufd uAPI requires explicit padding and use of __aligned_u64 to ensure ABI compatibility's with 32 bit. pahole result, before: struct iommu_hwpt_pgfault { __u32 flags; /* 0 4 / __u32 dev_id; / 4 4 / __u32 pasid; / 8 4 / __u32 grpid; / 12 4 / __u32 perm; / 16 4 / / XXX 4 bytes hole, try to pack / __u64 addr; / 24 8 / __u32 length; / 32 4 / __u32 cookie; / 36 4 / / size: 40, cachelines: 1, members: 8 / / sum members: 36, holes: 1, sum holes: 4 / / last cacheline: 40 bytes / }; pahole result, after: struct iommu_hwpt_pgfault { __u32 flags; / 0 4 / __u32 dev_id; / 4 4 / __u32 pasid; / 8 4 / __u32 grpid; / 12 4 / __u32 perm; / 16 4 / __u32 __reserved; / 20 4 / __u64 addr __attribute__((__aligned__(8))); / 24 8 / __u32 length; / 32 4 / __u32 cookie; / 36 4 / / size: 40, cachelines: 1, members: 9 / / forced alignments: 1 / / last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); Fixes: `c714f15860` ("iommufd: Add fault and response message definitions") Link: https://patch.msgid.link/r/20250120195051.2450-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-01-21 13:55:49 -04:00
Nicolin Chen	3d49020a32	iommufd/fault: Use a separate spinlock to protect fault->deliver list The fault->mutex serializes the fault read()/write() fops and the iommufd_fault_auto_response_faults(), mainly for fault->response. Also, it was conveniently used to fence the fault->deliver in poll() fop and iommufd_fault_iopf_handler(). However, copy_from/to_user() may sleep if pagefaults are enabled. Thus, they could take a long time to wait for user pages to swap in, blocking iommufd_fault_iopf_handler() and its caller that is typically a shared IRQ handler of an IOMMU driver, resulting in a potential global DOS. Instead of reusing the mutex to protect the fault->deliver list, add a separate spinlock, nested under the mutex, to do the job. iommufd_fault_iopf_handler() would no longer be blocked by copy_from/to_user(). Add a free_list in iommufd_auto_response_faults(), so the spinlock can simply fence a fast list_for_each_entry_safe routine. Provide two deliver list helpers for iommufd_fault_fops_read() to use: - Fetch the first iopf_group out of the fault->deliver list - Restore an iopf_group back to the head of the fault->deliver list Lastly, move the mutex closer to the response in the fault structure, and update its kdoc accordingly. Fixes: `07838f7fd5` ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-01-20 12:31:15 -04:00
Joerg Roedel	125f34e4c1	Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'qualcomm/msm', 'rockchip', 'riscv', 'core', 'intel/vt-d' and 'amd/amd-vi' into next	2025-01-17 09:02:35 +01:00
Krzysztof Kozlowski	54e7d90089	iommu: Use str_enable_disable-like helpers Replace ternary (condition ? "enable" : "disable") syntax with helpers from string_choices.h because: 1. Simple function call with one argument is easier to read. Ternary operator has three arguments and with wrapping might lead to quite long code. 2. Is slightly shorter thus also easier to read. 3. It brings uniformity in the text - same string. 4. Allows deduping by the linker, which results in a smaller binary file. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250114192642.912331-1-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 09:00:37 +01:00
Jason Gunthorpe	082f1bcae8	iommu/amd: Fully decode all combinations of alloc_paging_flags Currently AMD does not support IOMMU_HWPT_ALLOC_PASID \| IOMMU_HWPT_ALLOC_DIRTY_TRACKING It should be rejected. Instead it creates a V1 domain without dirty tracking support. Use a switch to fully decode the flags. Fixes: `ce2cd17546` ("iommu/amd: Enhance amd_iommu_domain_alloc_user()") Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 08:59:32 +01:00
Jason Gunthorpe	5a081f7f42	iommu/amd: Move the nid to pdom_setup_pgtable() The only thing that uses the nid is the io_pgtable code, and it should be set before calling alloc_io_pgtable_ops() to ensure that the top levels are allocated on the correct nid. Since dev is never NULL now we can just do this trivially and remove the other uses of nid. SVA and identity code paths never use it since they don't use io_pgtable. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 08:59:31 +01:00
Jason Gunthorpe	13b4ec7491	iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_mode Currently it uses enum io_pgtable_fmt which is from the io pagetable code and most of the enum values are invalid. protection_domain_mode is internal the driver and has the only two valid values. Fix some signatures and variables to use the right type as well. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 08:59:30 +01:00
Jason Gunthorpe	55b237dd7f	iommu/amd: Remove type argument from do_iommu_domain_alloc() and related do_iommu_domain_alloc() is only called from amd_iommu_domain_alloc_paging_flags() so type is always IOMMU_DOMAIN_UNMANAGED. Remove type and all the dead conditionals checking it. IOMMU_DOMAIN_IDENTITY checks are similarly obsolete as the conversion to the global static identity domain removed those call paths. The caller of protection_domain_alloc() should set the type, fix the miss in the SVA code. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 08:59:29 +01:00
Jason Gunthorpe	02bcd1a8b9	iommu/amd: Remove dev == NULL checks This is no longer possible, amd_iommu_domain_alloc_paging_flags() is never called with dev = NULL from the core code. Similarly get_amd_iommu_from_dev() can never be NULL either. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 08:59:29 +01:00
Jason Gunthorpe	f9b80f941e	iommu/amd: Remove domain_alloc() IOMMU drivers should not be sensitive to the domain type, a paging domain should be created based only on the flags passed in, the same for all callers. AMD was using the domain_alloc() path to force VFIO into a v1 domain type, because v1 gives higher performance. However now that IOMMU_HWPT_ALLOC_PASID is present, and a NULL device is not possible, domain_alloc_paging_flags() will do the right thing for VFIO. When invoked from VFIO flags will be 0 and the amd_iommu_pgtable type of domain will be selected. This is v1 by default unless the kernel command line has overridden it to v2. If the admin is forcing v2 assume they know what they are doing so force it everywhere, including for VFIO. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 08:59:28 +01:00
Alejandro Jimenez	1a684b099f	iommu/amd: Remove unused amd_iommu_domain_update() All the callers have been removed by the below commit, remove the implementation and prototypes. Fixes: `322d889ae7` ("iommu/amd: Remove amd_iommu_domain_update() from page table freeing") Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 08:59:28 +01:00
Guo Ren	10c62c38b0	iommu/riscv: Fixup compile warning When __BITS_PER_LONG == 32, size_t is defined as unsigned int rather than unsigned long. Therefore, we should use size_t to avoid type-checking errors. Fixes: `488ffbf181` ("iommu/riscv: Paging domain support") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Tomasz Jeznach <tjeznach@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com> Link: https://lore.kernel.org/r/20250103024616.3359159-1-guoren@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-17 08:58:06 +01:00
Nicolin Chen	3f4818ec13	iommufd/fault: Destroy response and mutex in iommufd_fault_destroy() Both were missing in the initial patch. Fixes: `07838f7fd5` ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/bc8bb13e215af27e62ee51bdba3648dd4ed2dce3.1736923732.git.nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-01-16 16:40:33 -04:00
Thomas Gleixner	7d04319a05	x86/apic: Convert to IRQCHIP_MOVE_DEFERRED Instead of marking individual interrupts as safe to be migrated in arbitrary contexts, mark the interrupt chips, which require the interrupt to be moved in actual interrupt context, with the new IRQCHIP_MOVE_DEFERRED flag. This makes more sense because this is a per interrupt chip property and not restricted to individual interrupts. That flips the logic from the historical opt-out to a opt-in model. This is simpler to handle for other architectures, which default to unrestricted affinity setting. It also allows to cleanup the redundant core logic significantly. All interrupt chips, which belong to a top-level domain sitting directly on top of the x86 vector domain are marked accordingly, unless the related setup code marks the interrupts with IRQ_MOVE_PCNTXT, i.e. XEN. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/all/20241210103335.563277044@linutronix.de	2025-01-15 21:38:53 +01:00
Nicolin Chen	442003f3a8	iommufd: Keep OBJ/IOCTL lists in an alphabetical order Reorder the existing OBJ/IOCTL lists. Also run clang-format for the same coding style at line wrappings. No functional change. Link: https://patch.msgid.link/r/c5e6d6e0a0bb7abc92ad26937fde19c9426bee96.1736237481.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-01-14 15:26:46 -04:00
Qasim Ijaz	e24c155105	iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index() Resolve a UBSAN shift-out-of-bounds issue in iova_bitmap_offset_to_index() where shifting the constant "1" (of type int) by bitmap->mapped.pgshift (an unsigned long value) could result in undefined behavior. The constant "1" defaults to a 32-bit "int", and when "pgshift" exceeds 31 (e.g., pgshift = 63) the shift operation overflows, as the result cannot be represented in a 32-bit type. To resolve this, the constant is updated to "1UL", promoting it to an unsigned long type to match the operand's type. Fixes: `58ccf0190d` ("vfio: Add an IOVA bitmap support") Link: https://patch.msgid.link/r/20250113223820.10713-1-qasdev00@gmail.com Reported-by: syzbot <syzbot+85992ace37d5b7b51635@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=85992ace37d5b7b51635 Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-01-14 13:53:18 -04:00
Suraj Sonawane	d9df72c6ac	iommu: iommufd: fix WARNING in iommufd_device_unbind Fix an issue detected by syzbot: WARNING in iommufd_device_unbind iommufd: Time out waiting for iommufd object to become free Resolve a warning in iommufd_device_unbind caused by a timeout while waiting for the shortterm_users reference count to reach zero. The existing 10-second timeout is insufficient in some scenarios, resulting in failures the above warning. Increase the timeout in iommufd_object_dec_wait_shortterm from 10 seconds to 60 seconds to allow sufficient time for the reference count to drop to zero. This change prevents premature timeouts and reduces the likelihood of warnings during iommufd_device_unbind. Fixes: `6f9c4d8c46` ("iommufd: Do not UAF during iommufd_put_object()") Link: https://patch.msgid.link/r/20241123195900.3176-1-surajsonawane0215@gmail.com Reported-by: syzbot+c92878e123785b1fa2db@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c92878e123785b1fa2db Tested-by: syzbot+c92878e123785b1fa2db@syzkaller.appspotmail.com Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-01-14 13:50:05 -04:00
Will Deacon	1f3dc29d24	iommu/arm-smmu-v3: Add missing #include of linux/string_choices.h Commit `f2c77f6e41` ("iommu/arm-smmu-v3: Use str_read_write helper w/ logs") introduced a call to str_read_write() in the SMMUv3 driver but without an explicit #include of <linux/string_choices.h>. This breaks the build for custom configurations where CONFIG_ACPI=n: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:1909:4: error: call to undeclared function 'str_read_write'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 1909 \| str_read_write(evt->read), \| ^ Add the missing #include. Link: https://lore.kernel.org/r/d07e82a4-2880-4ae3-961b-471bfa7ac6c4@samsung.com Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: `f2c77f6e41` ("iommu/arm-smmu-v3: Use str_read_write helper w/ logs") Signed-off-by: Will Deacon <will@kernel.org>	2025-01-10 13:28:56 +00:00
Michael Kelley	4f6b64f3d3	iommu/hyper-v: Don't assume cpu_possible_mask is dense Current code gets the APIC IDs for CPUs numbered 255 and lower. This code assumes cpu_possible_mask is dense, which is not true in the general case per [1]. If cpu_possible_mask contains holes, num_possible_cpus() is less than nr_cpu_ids, so some CPUs might get skipped. Furthermore, getting the APIC ID of a CPU that isn't in cpu_possible_mask is invalid. However, the configurations that Hyper-V provides to guest VMs on x86 hardware, in combination with how x86 code assigns Linux CPU numbers, does always produce a dense cpu_possible_mask. So the dense assumption is not currently causing failures. But for robustness against future changes in how cpu_possible_mask is populated, update the code to no longer assume dense. The correct approach is to determine the range to scan based on nr_cpu_ids, and skip any CPUs that are not in the cpu_possible_mask. [1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241003035333.49261-4-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241003035333.49261-4-mhklinux@outlook.com>	2025-01-10 00:54:21 +00:00
Pranjal Shrivastava	f2c77f6e41	iommu/arm-smmu-v3: Use str_read_write helper w/ logs Adopt the `str_read_write` helper in event logging as suggested by the coccinelle tool. Signed-off-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/all/20250107130053.GC6991@willie-the-truck/ Link: https://lore.kernel.org/r/20250107165100.1093357-1-praan@google.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-08 14:00:34 +00:00
Rob Clark	aff028a819	iommu/io-pgtable-arm: Add way to debug pgtable walk Add an io-pgtable method to walk the pgtable returning the raw PTEs that would be traversed for a given iova access. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241210165127.600817-4-robdclark@gmail.com [will: Removed 'arm_lpae_io_pgtable_walk_data::level' per Mostafa] Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 15:44:20 +00:00
Rob Clark	d9e589e6ad	iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys Re-use the generic pgtable walk path. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241210165127.600817-3-robdclark@gmail.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 15:42:23 +00:00
Rob Clark	821500d5c5	iommu/io-pgtable-arm: Make pgtable walker more generic We can re-use this basic pgtable walk logic in a few places. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241210165127.600817-2-robdclark@gmail.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 15:42:23 +00:00
Bibek Kumar Patro	3e35c3e725	iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500 Add ACTLR data table for qcom_smmu_500 including corresponding data entry and set prefetch value by way of a list of compatible strings. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-6-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 13:55:28 +00:00
Bibek Kumar Patro	9fe18d825a	iommu/arm-smmu: Introduce ACTLR custom prefetcher settings Currently in Qualcomm SoCs the default prefetch is set to 1 which allows the TLB to fetch just the next page table. MMU-500 features ACTLR register which is implementation defined and is used for Qualcomm SoCs to have a custom prefetch setting enabling TLB to prefetch the next set of page tables accordingly allowing for faster translations. ACTLR value is unique for each SMR (Stream matching register) and stored in a pre-populated table. This value is set to the register during context bank initialisation. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-5-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 13:55:28 +00:00
Bibek Kumar Patro	7f2ef1bfc7	iommu/arm-smmu: Add support for PRR bit setup Add an adreno-smmu-priv interface for drm/msm to call into arm-smmu-qcom and initiate the "Partially Resident Region" (PRR) bit setup or reset sequence as per request. This will be used by GPU to setup the PRR bit and related configuration registers through adreno-smmu private interface instead of directly poking the smmu hardware. Suggested-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-4-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 13:55:07 +00:00
Bibek Kumar Patro	445d7a8ed9	iommu/arm-smmu: Refactor qcom_smmu structure to include single pointer qcom_smmu_match_data is static and constant so refactor qcom_smmu to store single pointer to qcom_smmu_match_data instead of replicating multiple child members of the same and handle the further dereferences in the places that want them. Suggested-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-3-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 13:26:51 +00:00
Bibek Kumar Patro	ef4144b1b4	iommu/arm-smmu: Re-enable context caching in smmu reset operation Default MMU-500 reset operation disables context caching in prefetch buffer. It is however expected for context banks using the ACTLR register to retain their prefetch value during reset and runtime suspend. Add config 'ARM_SMMU_MMU_500_CPRE_ERRATA' to gate this errata workaround in default MMU-500 reset operation which defaults to 'Y' and provide option to disable workaround for context caching in prefetch buffer as and when needed. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-2-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 13:26:28 +00:00
Zhenzhong Duan	acf5d49aaf	iommu/vt-d: Link cache tags of same iommu unit together Cache tag invalidation requests for a domain are accumulated until a different iommu unit is found when traversing the cache_tags linked list. But cache tags of same iommu unit can be distributed in the linked list, this make batched flush less efficient. E.g., one device backed by iommu0 is attached to a domain in between two devices attaching backed by iommu1. Group cache tags together for same iommu unit in cache_tag_assign() to maximize the performance of batched flush. Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/r/20241219054358.8654-1-zhenzhong.duan@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-07 09:30:53 +01:00
Lu Baolu	cf08ca81d0	iommu/vt-d: Draining PRQ in sva unbind path when FPD bit set When a device uses a PASID for SVA (Shared Virtual Address), it's possible that the PASID entry is marked as non-present and FPD bit set before the device flushes all ongoing DMA requests and removes the SVA domain. This can occur when an exception happens and the process terminates before the device driver stops DMA and calls the iommu driver to unbind the PASID. There's no need to drain the PRQ in the mm release path. Instead, the PRQ will be drained in the SVA unbind path. But in such case, intel_pasid_tear_down_entry() only checks the presence of the pasid entry and returns directly. Add the code to clear the FPD bit and drain the PRQ. Fixes: `c43e1ccdeb` ("iommu/vt-d: Drain PRQs when domain removed from RID") Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241217024240.139615-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-07 09:30:52 +01:00
Lu Baolu	c220629940	iommu/vt-d: Remove iommu cap audit The capability audit code was introduced by commit <ad3d19029979> "iommu/vt-d: Audit IOMMU Capabilities and add helper functions", aiming to verify the consistency of capabilities across all IOMMUs for supported features. Nowadays, all the kAPIs of the iommu subsystem have evolved to be device oriented, in preparation for supporting heterogeneous IOMMU architectures. There is no longer a need to require capability consistence among IOMMUs for any feature. Remove the iommu cap audit code to make the driver align with the design in the iommu core. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241216071828.22962-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-07 09:30:51 +01:00
Jason Gunthorpe	de1dda7e0b	iommu/vt-d: Remove domain_alloc_paging() This is duplicated by intel_iommu_domain_alloc_paging_flags(), just remove it. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/0-v1-b101d00c5ee5+17645-vtd_paging_flags_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-07 09:30:50 +01:00
Kees Bakker	60f030f741	iommu/vt-d: Avoid use of NULL after WARN_ON_ONCE There is a WARN_ON_ONCE to catch an unlikely situation when domain_remove_dev_pasid can't find the `pasid`. In case it nevertheless happens we must avoid using a NULL pointer. Signed-off-by: Kees Bakker <kees@ijzerbout.nl> Link: https://lore.kernel.org/r/20241218201048.E544818E57E@bout3.ijzerbout.nl Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-07 09:30:50 +01:00
Gao Shiyuan	5bb494d5cb	iommu/amd: remove return value of amd_iommu_detect The return value of amd_iommu_detect is not used, so remove it and is consistent with other iommu detect functions. Signed-off-by: Gao Shiyuan <gaoshiyuan@baidu.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250103165808.80939-1-gaoshiyuan@baidu.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-06 12:42:00 +01:00
Zhang Heng	afc0cbc6e2	iommu/msm: Use helper function devm_clk_get_prepared() Since commit `7ef9651e97` ("clk: Provide new devm_clk helpers for prepared and enabled clocks"), devm_clk_get() and clk_prepare() can now be replaced by devm_clk_get_prepared() when driver prepares the clocks for the whole lifetime of the device. Moreover, it is no longer necessary to unprepare the clocks explicitly. Signed-off-by: Zhang Heng <zhangheng@kylinos.cn> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20250103113059.463033-1-zhangheng@kylinos.cn Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-06 12:41:00 +01:00
Xu Lu	77a44196ab	iommu/riscv: Add shutdown function for iommu driver This commit supplies shutdown callback for iommu driver. The shutdown callback resets necessary registers so that newly booted kernel can pass riscv_iommu_init_check() after kexec. Also, the shutdown callback resets iommu mode to bare instead of off so that new kernel can still use PCIE devices even when CONFIG_RISCV_IOMMU is not enabled. Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Link: https://lore.kernel.org/r/20250103093220.38106-3-luxu.kernel@bytedance.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-06 12:38:11 +01:00
Xu Lu	8d8d3752c0	iommu/riscv: Empty iommu queue before enabling it Changing cqen/fqen/pqen from 0 to 1 sets the cqh/fqt/pqt registers to 0. But the cqt/fqh/pqh registers are left unmodified. This commit resets cqt/fqh/pqh registers to ensure corresponding queues are empty before being enabled during initialization. Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Link: https://lore.kernel.org/r/20250103093220.38106-2-luxu.kernel@bytedance.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-01-06 12:37:19 +01:00
Nicolin Chen	e94dc6ddda	iommu/tegra241-cmdqv: Read SMMU IDR1.CMDQS instead of hardcoding The hardware limitation "max=19" actually comes from SMMU Command Queue. So, it'd be more natural for tegra241-cmdqv driver to read it out rather than hardcoding it itself. This is not an issue yet for a kernel on a baremetal system, but a guest kernel setting the queue base/size in form of IPA/gPA might result in a noncontiguous queue in the physical address space, if underlying physical pages backing up the guest RAM aren't contiguous entirely: e.g. 2MB-page backed guest RAM cannot guarantee a contiguous queue if it is 8MB (capped to VCMDQ_LOG2SIZE_MAX=19). This might lead to command errors when HW does linear-read from a noncontiguous queue memory. Adding this extra IDR1.CMDQS cap (in the guest kernel) allows VMM to set SMMU's IDR1.CMDQS=17 for the case mentioned above, so a guest-level queue will be capped to maximum 2MB, ensuring a contiguous queue memory. Fixes: `a3799717b8` ("iommu/tegra241-cmdqv: Fix alignment failure at max_n_shift") Reported-by: Ian Kalinowski <ikalinowski@nvidia.com> Cc: stable@vger.kernel.org Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20241219051421.1850267-1-nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-19 15:44:11 +00:00
Mostafa Saleh	b7b8a63055	iommu/io-pgtable-arm: Fix cfg reading in arm_lpae_concat_mandatory() The newly introduced arm_lpae_concat_mandatory() function reads the ias/oas fields from the 'io_pgtable_cfg' copy embedded inside the 'arm_lpae_io_pgtable' structure. However, this copy is not set until later in alloc_io_pgtable_ops() after the alloc() function has been called. Use the address sizes passed in the 'io_pgtable_cfg' structure when deciding whether or not to concatenate the PGD. Fixes: `4dcac8407f` ("iommu/io-pgtable-arm: Fix stage-2 concatenation with 16K") Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241215200412.561400-1-smostafa@google.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-19 15:41:27 +00:00
Yi Liu	647b7aad19	iommu: Remove the remove_dev_pasid op The iommu drivers that supports PASID have supported attaching pasid to the blocked_domain, hence remove the remove_dev_pasid op from the iommu_ops. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-8-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:39:37 +01:00
Yi Liu	5f53638882	iommu/amd: Make the blocked domain support PASID The blocked domain can be extended to park PASID of a device to be the DMA blocking state. By this the remove_dev_pasid() op is dropped. Remove PASID from old domain and device GCR3 table. No need to attach PASID to the blocked domain as clearing PASID from GCR3 table will make sure all DMAs for that PASID are blocked. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-7-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:39:37 +01:00
Yi Liu	4f0bdab175	iommu/vt-d: Make the blocked domain support PASID The blocked domain can be extended to park PASID of a device to be the DMA blocking state. By this the remove_dev_pasid() op is dropped. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-6-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:39:36 +01:00
Jason Gunthorpe	ef181762cb	iommu/arm-smmu-v3: Make the blocked domain support PASID The blocked domain is used to park RID to be blocking DMA state. This can be extended to PASID as well. By this, the remove_dev_pasid() op of ARM SMMUv3 can be dropped. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-5-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:39:36 +01:00
Yi Liu	b18301b915	iommu: Detaching pasid by attaching to the blocked_domain The iommu drivers are on the way to detach pasid by attaching to the blocked domain. However, this cannot be done in one shot. During the transition, iommu core would select between the remove_dev_pasid op and the blocked domain. Suggested-by: Kevin Tian <kevin.tian@intel.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-4-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:39:35 +01:00
Yi Liu	1fbf73425f	iommu: Consolidate the ops->remove_dev_pasid usage into a helper Add a wrapper for the ops->remove_dev_pasid, this consolidates the iommu_ops fetching and callback invoking. It is also a preparation for starting the transition from using remove_dev_pasid op to detach pasid to the way using blocked_domain to detach pasid. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-3-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:39:35 +01:00
Yi Liu	fb3de9f9b0	iommu: Prevent pasid attach if no ops->remove_dev_pasid driver should implement both set_dev_pasid and remove_dev_pasid op, otherwise it is a problem how to detach pasid. In reality, it is impossible that an iommu driver implements set_dev_pasid() but no remove_dev_pasid() op. However, it is better to check it. Move the group check to be the first as dev_iommu_ops() may fail when there is no valid group. Also take the chance to remove the dev_has_iommu() check as it is duplicated to the group check. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-2-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:39:34 +01:00
Suravee Suthikulpanit	b0988acc94	iommu/amd: Remove amd_iommu_apply_erratum_63() Also replace __set_dev_entry_bit() with set_dte_bit() and remove unused helper functions. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241118054937.5203-10-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:37:43 +01:00
Suravee Suthikulpanit	457da57646	iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() When updating only within a 64-bit tuple of a DTE, just lock the DTE and use WRITE_ONCE() because it is writing to memory read back by HW. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241118054937.5203-9-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:37:42 +01:00
Suravee Suthikulpanit	66ea3f96ae	iommu/amd: Modify clear_dte_entry() to avoid in-place update By reusing the make_clear_dte() and update_dte256(). Also, there is no need to set TV bit for non-SNP system when clearing DTE for blocked domain, and no longer need to apply erratum 63 in clear_dte() since it is already stored in struct ivhd_dte_flags and apply in set_dte_entry(). Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241118054937.5203-8-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:37:42 +01:00
Suravee Suthikulpanit	a2ce608a1e	iommu/amd: Introduce helper function get_dte256() And use it in clone_alias() along with update_dte256(). Also use get_dte256() in dump_dte_entry(). Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241118054937.5203-7-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:37:41 +01:00
Suravee Suthikulpanit	fd5dff9de4	iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Also, the set_dte_entry() is used to program several DTE fields (e.g. stage1 table, stage2 table, domain id, and etc.), which is difficult to keep track with current implementation. Therefore, separate logic for clearing DTE (i.e. make_clear_dte) and another function for setting up the GCR3 Table Root Pointer, GIOV, GV, GLX, and GuestPagingMode into another function set_dte_gcr3_table(). Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241118054937.5203-6-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:37:40 +01:00
Suravee Suthikulpanit	8b3f787338	iommu/amd: Introduce helper function to update 256-bit DTE The current implementation does not follow 128-bit write requirement to update DTE as specified in the AMD I/O Virtualization Techonology (IOMMU) Specification. Therefore, modify the struct dev_table_entry to contain union of u128 data array, and introduce a helper functions update_dte256() to update DTE using two 128-bit cmpxchg operations to update 256-bit DTE with the modified structure, and take into account the DTE[V, GV] bits when programming the DTE to ensure proper order of DTE programming and flushing. In addition, introduce a per-DTE spin_lock struct dev_data.dte_lock to provide synchronization when updating the DTE to prevent cmpxchg128 failure. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241118054937.5203-5-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:37:39 +01:00
Suravee Suthikulpanit	7bea695ada	iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags During early initialization, the driver parses IVRS IVHD block to get list of downstream devices along with their DTE flags (i.e INITPass, EIntPass, NMIPass, SysMgt, Lint0Pass, Lint1Pass). This information is currently store in the device DTE, and needs to be preserved when clearing and configuring each DTE, which makes it difficult to manage. Introduce struct ivhd_dte_flags to store IVHD DTE settings for a device or range of devices, which are stored in the amd_ivhd_dev_flags_list during initial IVHD parsing. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241118054937.5203-4-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:37:38 +01:00
Suravee Suthikulpanit	82582f85ed	iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported According to the AMD IOMMU spec, IOMMU hardware reads the entire DTE in a single 256-bit transaction. It is recommended to update DTE using 128-bit operation followed by an INVALIDATE_DEVTAB_ENTYRY command when the IV=1b or V=1b before the change. According to the AMD BIOS and Kernel Developer's Guide (BDKG) dated back to family 10h Processor [1], which is the first introduction of AMD IOMMU, AMD processor always has CPUID Fn0000_0001_ECX[CMPXCHG16B]=1. Therefore, it is safe to assume cmpxchg128 is available with all AMD processor w/ IOMMU. In addition, the CMPXCHG16B feature has already been checked separately before enabling the GA, XT, and GAM modes. Consolidate the detection logic, and fail the IOMMU initialization if the feature is not supported. [1] https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/31116.pdf Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241118054937.5203-3-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:37:38 +01:00
Suravee Suthikulpanit	f20a6e3eb2	iommu/amd: Misc ACPI IVRS debug info clean up * Remove redundant AMD-Vi prefix. * Print IVHD device entry settings field using hex value. * Print root device of IVHD ACPI device entry using hex value. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241118054937.5203-2-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:37:37 +01:00
Andrew Jones	d5f88acdd6	iommu/riscv: Add support for platform msi Apply platform_device_msi_init_and_alloc_irqs() to add support for MSIs when the IOMMU is a platform device. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241112133504.491984-4-ajones@ventanamicro.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-18 09:32:53 +01:00
Lu Baolu	dda2b8c3c6	iommu/vt-d: Avoid draining PRQ in sva mm release path When a PASID is used for SVA by a device, it's possible that the PASID entry is cleared before the device flushes all ongoing DMA requests and removes the SVA domain. This can occur when an exception happens and the process terminates before the device driver stops DMA and calls the iommu driver to unbind the PASID. There's no need to drain the PRQ in the mm release path. Instead, the PRQ will be drained in the SVA unbind path. Unfortunately, commit `c43e1ccdeb` ("iommu/vt-d: Drain PRQs when domain removed from RID") changed this behavior by unconditionally draining the PRQ in intel_pasid_tear_down_entry(). This can lead to a potential sleeping-in-atomic-context issue. Smatch static checker warning: drivers/iommu/intel/prq.c:95 intel_iommu_drain_pasid_prq() warn: sleeping in atomic context To avoid this issue, prevent draining the PRQ in the SVA mm release path and restore the previous behavior. Fixes: `c43e1ccdeb` ("iommu/vt-d: Drain PRQs when domain removed from RID") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/linux-iommu/c5187676-2fa2-4e29-94e0-4a279dc88b49@stanley.mountain/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241212021529.1104745-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-13 15:54:27 +01:00
Yi Liu	74536f9196	iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain The qi_batch is allocated when assigning cache tag for a domain. While for nested parent domain, it is missed. Hence, when trying to map pages to the nested parent, NULL dereference occurred. Also, there is potential memleak since there is no lock around domain->qi_batch allocation. To solve it, add a helper for qi_batch allocation, and call it in both the __cache_tag_assign_domain() and __cache_tag_assign_parent_domain(). BUG: kernel NULL pointer dereference, address: 0000000000000200 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8104795067 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632 Call Trace: ? __die+0x24/0x70 ? page_fault_oops+0x80/0x150 ? do_user_addr_fault+0x63/0x7b0 ? exc_page_fault+0x7c/0x220 ? asm_exc_page_fault+0x26/0x30 ? cache_tag_flush_range_np+0x13c/0x260 intel_iommu_iotlb_sync_map+0x1a/0x30 iommu_map+0x61/0xf0 batch_to_domain+0x188/0x250 iopt_area_fill_domains+0x125/0x320 ? rcu_is_watching+0x11/0x50 iopt_map_pages+0x63/0x100 iopt_map_common.isra.0+0xa7/0x190 iopt_map_user_pages+0x6a/0x80 iommufd_ioas_map+0xcd/0x1d0 iommufd_fops_ioctl+0x118/0x1c0 __x64_sys_ioctl+0x93/0xc0 do_syscall_64+0x71/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `705c1cdf1e` ("iommu/vt-d: Introduce batched cache invalidation") Cc: stable@vger.kernel.org Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241210130322.17175-1-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-13 15:54:25 +01:00
Lu Baolu	1f2557e08a	iommu/vt-d: Remove cache tags before disabling ATS The current implementation removes cache tags after disabling ATS, leading to potential memory leaks and kernel crashes. Specifically, CACHE_TAG_DEVTLB type cache tags may still remain in the list even after the domain is freed, causing a use-after-free condition. This issue really shows up when multiple VFs from different PFs passed through to a single user-space process via vfio-pci. In such cases, the kernel may crash with kernel messages like: BUG: kernel NULL pointer dereference, address: 0000000000000014 PGD 19036a067 P4D 1940a3067 PUD 136c9b067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 74 UID: 0 PID: 3183 Comm: testCli Not tainted 6.11.9 #2 RIP: 0010:cache_tag_flush_range+0x9b/0x250 Call Trace: <TASK> ? __die+0x1f/0x60 ? page_fault_oops+0x163/0x590 ? exc_page_fault+0x72/0x190 ? asm_exc_page_fault+0x22/0x30 ? cache_tag_flush_range+0x9b/0x250 ? cache_tag_flush_range+0x5d/0x250 intel_iommu_tlb_sync+0x29/0x40 intel_iommu_unmap_pages+0xfe/0x160 __iommu_unmap+0xd8/0x1a0 vfio_unmap_unpin+0x182/0x340 [vfio_iommu_type1] vfio_remove_dma+0x2a/0xb0 [vfio_iommu_type1] vfio_iommu_type1_ioctl+0xafa/0x18e0 [vfio_iommu_type1] Move cache_tag_unassign_domain() before iommu_disable_pci_caps() to fix it. Fixes: `3b1d9e2b2d` ("iommu/vt-d: Add cache tag assignment interface") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241129020506.576413-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-13 15:54:24 +01:00
Joerg Roedel	7460d43bd7	Arm SMMU fixes for 6.13-rc - Use raw_smp_processor_id() when balancing traffic for NVIDIA's custom command queue implementation. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmdaJS8QHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNBkkB/97jVNBdS6nZ+LYZ8Tt7hGLnoO5Wo9BPcxA lKfqrHz/WanPrRobiF4M5UNN01ITnjcWeVsjxWqox8liE9AwXftbpbnP6qQ6bEgI ZpUNcAhF/CBRcAmJpCwHfoDvpOg5dm9PmuUEhAitD4R3w+I0Yygdm+fLmrsrG7/e fx6M52b1J++BK7MXAUAijIi1TG5yVVuHIku+HkOdMurSAwxiH9Shwtib1RK8w/63 O2DVjJNfbM35xXFduIPSzbOqTComC/j1dd3mvbdCsumPKr3UzSbkWMZ3q3OiwOyU BNVXpvx1ymjv3iI7kFuEW1v1ykUiimZhzbdAKaT3Dyp46fkhX1pu =7lXv -----END PGP SIGNATURE----- Merge tag 'arm-smmu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into fixes Arm SMMU fixes for 6.13-rc - Use raw_smp_processor_id() when balancing traffic for NVIDIA's custom command queue implementation.	2024-12-12 17:25:37 +01:00
Yi Liu	11534b4de2	iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core IOMMU_HWPT_FAULT_ID_VALID is used to mark if the fault_id field of iommu_hwp_alloc is valid or not. As the fault_id field is handled in the iommufd core, so it makes sense to sanitize the IOMMU_HWPT_FAULT_ID_VALID flag in the iommufd core, and mask it out before passing the user flags to the iommu drivers. Link: https://patch.msgid.link/r/20241207120108.5640-1-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-12-11 15:46:14 -04:00
Jason Gunthorpe	d61927d784	iommufd/selftest: Remove domain_alloc_paging() Since this implements domain_alloc_paging_flags() it only needs one op. Fold mock_domain_alloc_paging() into mock_domain_alloc_paging_flags(). Link: https://patch.msgid.link/r/0-v1-8a3e7e21ff6a+1745d-iommufd_paging_flags_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-12-11 14:09:29 -04:00
Jason Gunthorpe	91da87d38c	iommu/amd: Add lockdep asserts for domain->dev_list Add an assertion to all the iteration points that don't obviously have the lock held already. These all take the locker higher in their call chains. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/2-v1-3b9edcf8067d+3975-amd_dev_list_locking_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-10 10:12:06 +01:00
Jason Gunthorpe	4a552f7890	iommu/amd: Put list_add/del(dev_data) back under the domain->lock The list domain->dev_list is protected by the domain->lock spinlock. Any iteration, addition or removal must be under the lock. Move the list_del() up into the critical section. pdom_is_sva_capable(), and destroy_gcr3_table() do not interact with the list element. Wrap the list_add() in a lock, it would make more sense if this was under the same critical section as adjusting the refcounts earlier, but that requires more complications. Fixes: `d6b47dec36` ("iommu/amd: Reduce domain lock scope in attach device path") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/1-v1-3b9edcf8067d+3975-amd_dev_list_locking_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-12-10 10:12:05 +01:00
Mostafa Saleh	376ce8b35e	iommu/io-pgtable-arm: Add coverage for different OAS in selftest Run selftests with different OAS values intead of hardcoding it to 48 bits. We always keep OAS >= IAS to make the config valid for stage-2. This can be further improved, if we split IAS/OAS configuration for stage-1 and stage-2 (to use input sizes compatible with VA_BITS as SMMUv3 does, or IAS > OAS which is valid for stage-1). However, that adds more complexity, and the current change improves coverage and makes it possible to test all concatenation cases. Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241202140604.422235-3-smostafa@google.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 23:53:19 +00:00
Mostafa Saleh	4dcac8407f	iommu/io-pgtable-arm: Fix stage-2 concatenation with 16K At the moment, io-pgtable-arm uses concatenation only if it is possible at level 0, which misses a case where concatenation is mandatory at level 1 according to R_SRKBC in Arm spec DDI0487 K.a. Also, that means concatenation can be used when not mandated, contradicting the comment on the code. However, these cases can only happen if the SMMUv3 driver is changed to use ias != oas for stage-2. This patch re-writes the code to use concatenation only if mandatory, fixing the missing case for level-1 and granule 16K with PA = 40 bits. Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241202140604.422235-2-smostafa@google.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 23:53:18 +00:00
Luis Claudio R. Goncalves	1f80621816	iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context During boot some of the calls to tegra241_cmdqv_get_cmdq() will happen in preemptible context. As this function calls smp_processor_id(), if CONFIG_DEBUG_PREEMPT is enabled, these calls will trigger a series of "BUG: using smp_processor_id() in preemptible" backtraces. As tegra241_cmdqv_get_cmdq() only calls smp_processor_id() to use the CPU number as a factor to balance out traffic on cmdq usage, it is safe to use raw_smp_processor_id() here. Cc: <stable@vger.kernel.org> Fixes: `918eb5c856` ("iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV") Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/Z1L1mja3nXzsJ0Pk@uudg.org Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 23:38:30 +00:00
Jason Gunthorpe	cdfb9840fc	iommu/arm-smmu-v3: Remove domain_alloc_paging() arm_smmu_domain_alloc_paging_flags() with a flags = 0 now does the same thing as arm_smmu_domain_alloc_paging(), remove arm_smmu_domain_alloc_paging(). Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v1-0bb8d5313a27+27b-smmuv3_paging_flags_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 23:09:03 +00:00
Jason Gunthorpe	bb857c5c01	iommu/arm-smmu-v3: Make domain_alloc_paging_flags() directly determine the S1/S2 The selection of S1/S2 is a bit indirect today, make domain_alloc_paging_flags() directly decode the flags and select the correct S1/S2 type. Directly reject flag combinations the HW doesn't support when processing the flags. Fix missing rejection of some flag combinations that are not supported today (ie NEST_PARENT \| DIRTY_TRACKING) by using a switch statement to list out exactly the combinations that are currently supported. Move the determination of the stage out of arm_smmu_domain_finalise() and into both callers. As today the default stage is S1 if supported in HW. This makes arm_smmu_domain_alloc_paging_flags() self contained and no longer calling arm_smmu_domain_alloc_paging(). Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v1-0bb8d5313a27+27b-smmuv3_paging_flags_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 23:08:06 +00:00
Jason Gunthorpe	48e7b8e284	iommu/arm-smmu-v3: Remove arm_smmu_domain_finalise() during attach Domains are now always finalized during allocation because the core code no longer permits a NULL dev argument to domain_alloc_paging/_flags(). Remove the late finalize during attach that supported domains that were not fully initialized. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v1-0bb8d5313a27+27b-smmuv3_paging_flags_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 23:08:06 +00:00
Robin Murphy	6e192214c6	iommu/arm-smmu-v3: Document SVA interaction with new pagetable features Process pagetables may now be using new permission-indirection-based features which an SMMU may not understand when given such a table for SVA. Although SMMUv3.4 does add its own S1PIE feature, realistically we're still going to have to cope with feature mismatches between CPUs and SMMUs, so let's start simple and essentially just document the expectations for what falls out as-is. Although it seems unlikely for SVA applications to also depend on memory-hardening features, or vice-versa, the relative lifecycles make it tricky to enforce mutual exclusivity. Thankfully our PIE index allocation makes it relatively benign for an SMMU to keep interpreting them as direct permissions, the only real implication is that an SVA application cannot harden itself against its own devices with these features. Thus, inform the user about that just in case they have other expectations. Also we don't (yet) support LPA2, so deny SVA entirely if we're going to misunderstand the pagetable format altogether. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/68a37b00a720f0827cac0e4f40e4d3a688924054.1733406275.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 22:49:19 +00:00
Robin Murphy	46b3df8eb9	iommu: Manage driver probe deferral better Since iommu_fwspec_init() absorbed the basic driver probe deferral check to wait for an IOMMU to register, we may as well handle the probe deferral timeout there as well. The current inconsistency of callers results in client devices deferring forever on an arm64 ACPI system where an SMMU has failed its own driver probe. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/41fa59f156ef8d196d08fa75c4901e6d4b12e6c4.1733406914.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 22:46:54 +00:00
Robin Murphy	fcbd621567	iommu/arm-smmu-v3: Clean up more on probe failure kmemleak noticed that the iopf queue allocated deep down within arm_smmu_init_structures() can be leaked by a subsequent error return from arm_smmu_device_probe(). Furthermore, after arm_smmu_device_reset() we will also leave the SMMU enabled with an empty Stream Table, silently blocking all DMA. This proves rather annoying for debugging said probe failure, so let's handle it a bit better by putting the SMMU back into (more or less) the same state as if it hadn't probed at all. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/5137901958471cf67f2fad5c2229f8a8f1ae901a.1733406914.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 22:46:53 +00:00
Robin Murphy	97cb1fa027	iommu/arm-smmu: Retire probe deferral workaround This reverts commit `229e6ee43d`. Now that the fundamental ordering issue between arm_smmu_get_by_fwnode() and iommu_device_register() is resolved, the race condition for client probe no longer exists either, so retire the specific workaround. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/4167c5dfa052d4c8bb780f0a30af63dcfc4ce6c1.1733406914.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 22:46:53 +00:00
Robin Murphy	7d835134d4	iommu/arm-smmu: Make instance lookup robust Relying on the driver list was a cute idea for minimising the scope of our SMMU device lookups, however it turns out to have a subtle flaw. The SMMU device only gets added to that list after arm_smmu_device_probe() returns success, so there's actually no way the iommu_device_register() call from there could ever work as intended, even if it wasn't already hampered by the fwspec setup not happening early enough. Switch both arm_smmu_get_by_fwnode() implementations to use a platform bus lookup instead, which will reliably work. Also make sure that we don't register SMMUv2 instances until we've fully initialised them, to avoid similar consequences of the lookup now finding a device with no drvdata. Moving the error returns is also a perfect excuse to streamline them with dev_err_probe() in the process. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/6d7ce1dc31873abdb75c895fb8bd2097cce098b4.1733406914.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 22:46:53 +00:00
Jason Gunthorpe	9b640ae7fb	iommu/arm-smmuv3: Update comments about ATS and bypass The SMMUv3 spec has a note that BYPASS and ATS don't work together under the STE EATS field definition. However there is another section "13.6.4 Full ATS skipping stage 1" that explains under certain conditions BYPASS and ATS do work together if the STE is using S1DSS to select BYPASS and the CD table has the possibility for a substream. When these comments were written the understanding was that all forms of BYPASS just didn't work and this was to be a future problem to solve. It turns out that ATS and IDENTITY will always work just fine: - If STE.Config = BYPASS then the PCI ATS is disabled - If a PASID domain is attached then S1DSS = BYPASS and ATS will be enabled. This meets the requirements of 13.6.4 to automatically generate 1:1 ATS replies on the RID. Update the comments to reflect this. Fixes: `7497f4211f` ("iommu/arm-smmu-v3: Make changing domains be hitless for ATS") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-f27174f44f39+27a33-smmuv3_ats_note_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 22:41:42 +00:00
Pranjal Shrivastava	d814b70b9b	iommu/arm-smmu-v3: Log better event records Currently, the driver dumps the raw hex for a received event record. Improve this by leveraging `struct arm_smmu_event` for event fields and log human-readable event records with meaningful information. Signed-off-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20241203184906.2264528-3-praan@google.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 21:43:43 +00:00
Pranjal Shrivastava	43ca55f555	iommu/arm-smmu-v3: Introduce struct arm_smmu_event Introduce `struct arm_smmu_event` to represent event records. Parse out relevant fields from raw event records for ease and use the new `struct arm_smmu_event` instead. Signed-off-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20241203184906.2264528-2-praan@google.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 21:43:43 +00:00
Richard Acayan	4231473890	iommu/arm-smmu-qcom: add sdm670 adreno iommu compatible Add the compatible for the separate IOMMU on SDM670 for the Adreno GPU. This IOMMU has the compatible strings: "qcom,sdm670-smmu-v2", "qcom,adreno-smmu", "qcom,smmu-v2" While the SMMU 500 doesn't need an entry for this specific SoC, the SMMU v2 compatible should have its own entry, as the fallback entry in arm-smmu.c handles "qcom,smmu-v2" without per-process page table support unless there is an entry here. This entry can't be the "qcom,adreno-smmu" compatible because dedicated GPU IOMMUs can also be SMMU 500 with different handling. Signed-off-by: Richard Acayan <mailingradian@gmail.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20241114004713.42404-6-mailingradian@gmail.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-09 19:09:10 +00:00
Linus Torvalds	6a103867b9	iommufd 6.13 first rc pull - Correct typos in comments - Elaborate a comment about how the uAPI works for IOMMU_HW_INFO_TYPE_ARM_SMMUV3 - Fix a double free on error path and add test coverage for the bug -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ1Hx1QAKCRCFwuHvBreF YRA6AQCtirCoWoMgq1mSsFjtzmr5XCffS1g6T7KsKs54sDZZXAD/doa4NpE3LTPo huOPMfH2O0HufLpj+FTzew4q0RbAJAc= =z0Ak -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd fixes from Jason Gunthorpe: "One bug fix and some documentation updates: - Correct typos in comments - Elaborate a comment about how the uAPI works for IOMMU_HW_INFO_TYPE_ARM_SMMUV3 - Fix a double free on error path and add test coverage for the bug" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommu/arm-smmu-v3: Improve uAPI comment for IOMMU_HW_INFO_TYPE_ARM_SMMUV3 iommufd/selftest: Cover IOMMU_FAULT_QUEUE_ALLOC in iommufd_fail_nth iommufd: Fix out_fput in iommufd_fault_alloc() iommufd: Fix typos in kernel-doc comments	2024-12-05 15:02:20 -08:00
Nicolin Chen	af7f478051	iommufd: Fix out_fput in iommufd_fault_alloc() As fput() calls the file->f_op->release op, where fault obj and ictx are getting released, there is no need to release these two after fput() one more time, which would result in imbalanced refcounts: refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 48 PID: 2369 at lib/refcount.c:31 refcount_warn_saturate+0x60/0x230 Call trace: refcount_warn_saturate+0x60/0x230 (P) refcount_warn_saturate+0x60/0x230 (L) iommufd_fault_fops_release+0x9c/0xe0 [iommufd] ... VFS: Close: file count is 0 (f_op=iommufd_fops [iommufd]) WARNING: CPU: 48 PID: 2369 at fs/open.c:1507 filp_flush+0x3c/0xf0 Call trace: filp_flush+0x3c/0xf0 (P) filp_flush+0x3c/0xf0 (L) __arm64_sys_close+0x34/0x98 ... imbalanced put on file reference count WARNING: CPU: 48 PID: 2369 at fs/file.c:74 __file_ref_put+0x100/0x138 Call trace: __file_ref_put+0x100/0x138 (P) __file_ref_put+0x100/0x138 (L) __fput_sync+0x4c/0xd0 Drop those two lines to fix the warnings above. Cc: stable@vger.kernel.org Fixes: `07838f7fd5` ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/b5651beb3a6b1adeef26fffac24607353bf67ba1.1733212723.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-12-03 12:15:00 -04:00
Peter Zijlstra	cdd30ebb1b	module: Convert symbol namespace to string literal Clean up the existing export namespace code along the same lines of commit `33def8498f` ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS \| while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify$ns$/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify$ns$/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS$([^)])$/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(])$([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(])\(([^,]+), ([^)]+)$/ && $0 !~ /(EXPORT_SYMBOL_NS[^(])/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(])$([^,]+), ([^)]+)$/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-12-02 11:34:44 -08:00
Linus Torvalds	e70140ba0d	Get rid of 'remove_new' relic from platform driver struct The continual trickle of small conversion patches is grating on me, and is really not helping. Just get rid of the 'remove_new' member function, which is just an alias for the plain 'remove', and had a comment to that effect: /* * .remove_new() is a relic from a prototype conversion of .remove(). * New drivers are supposed to implement .remove(). Once all drivers are * converted to not use .remove_new any more, it will be dropped. */ This was just a tree-wide 'sed' script that replaced '.remove_new' with '.remove', with some care taken to turn a subsequent tab into two tabs to make things line up. I did do some minimal manual whitespace adjustment for places that used spaces to line things up. Then I just removed the old (sic) .remove_new member function, and this is the end result. No more unnecessary conversion noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-12-01 15:12:43 -08:00
Linus Torvalds	64e6fc27d6	iommufd 6.13 merge window pull #2 Change the driver callback op domain_alloc_user() into two ops domain_alloc_paging_flags() and domain_alloc_nesting() that better describe what the ops are expected to do. There will be per-driver cleanup based on this going into the next cycle via the driver trees. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ0c+GQAKCRCFwuHvBreF YVJXAP97zriu5x9AVNzG6mLOOYYROuhCTXGieKPqNg/apGOyogD/cMuMql3w86h+ oBIMai6GaTBUU+Ie8S07PM0849TpLgA= =yyYk -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull more iommufd updates from Jason Gunthorpe: "Change the driver callback op domain_alloc_user() into two ops: domain_alloc_paging_flags() and domain_alloc_nesting() that better describe what the ops are expected to do. There will be per-driver cleanup based on this going into the next cycle via the driver trees" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommu: Rename ops->domain_alloc_user() to domain_alloc_paging_flags() iommu: Add ops->domain_alloc_nested()	2024-11-27 14:24:34 -08:00
Linus Torvalds	5c00ff742b	- The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro\|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq xyyenpibWgUoShU2wZ/Ha8FE5WDINwg= =JfWR -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro\|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ...	2024-11-23 09:58:07 -08:00
Linus Torvalds	ceba6f6f33	IOMMU Updates for Linux v6.13: Including: - Core Updates: - Convert call-sites using iommu_domain_alloc() to more specific versions and remove function. - Introduce iommu_paging_domain_alloc_flags(). - Extend support for allocating PASID-capable domains to more drivers. - Remove iommu_present(). - Some smaller improvements. - New IOMMU driver for RISC-V. - Intel VT-d Updates: - Add domain_alloc_paging support. - Enable user space IOPFs in non-PASID and non-svm cases. - Small code refactoring and cleanups. - Add domain replacement support for pasid. - AMD-Vi Updates: - Adapt to iommu_paging_domain_alloc_flags() interface and alloc V2 page-tables by default. - Replace custom domain ID allocator with IDA allocator. - Add ops->release_domain() support. - Other improvements to device attach and domain allocation code paths. - ARM-SMMU Updates: - SMMUv2: - Return -EPROBE_DEFER for client devices probing before their SMMU. - Devicetree binding updates for Qualcomm MMU-500 implementations. - SMMUv3: - Minor fixes and cleanup for NVIDIA's virtual command queue driver. - IO-PGTable: - Fix indexing of concatenated PGDs and extend selftest coverage. - Remove unused block-splitting support. - S390 IOMMU: - Implement support for blocking domain. - Mediatek IOMMU: - Enable 35-bit physical address support for mt8186. - OMAP IOMMU driver: - Adapt to recent IOMMU core changes and unbreak driver. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmdAPOoACgkQK/BELZcB GuOs1w/+PoLbOYUjmJiOfpI6YNSEfF2tE4z2al/YYIBcNoAmTTRauuhv6+S0gVRy NTfSucw7OuLlbE9vGsdY02UL1PK58NGfUF8Z2rZSf+RRgLACc47cjZWh0vzDlNbP 4LTdqJXmIWiYcmDtY7LmHtwTSiB900YFZwZOHmTSfNyJt8UC4tBPRh8k2YD3vuxc QZlxSihEf+F+vm8GtW40Ia9BiG3YhCYAcHq6Y4dKxI0JWN+7oRiPN8CF+z/vcdjV VpCDBcbHjvqqpXJvddQHA0SrGDBMHz1AXYhRXnfe7Ogh6SbaSWDSsdaIS27DsOzC L6fxW3+sNmfEOO1RmJoizkHzAtkLWCLNjBvjOb1hUCpwLcKf5nhgE3wOQSwzqumn KbxpoQpHFJutikDBGRsKJCsNqS8ZNWd4Z8rHhTnq2ctuYUFvurkcwX4WXOSRpsoA iJ+x1ezk9FxObHj/B+1nIAwKoeaLyFEwJe7Etom/E2m/2mq2oQOrq1bvfIGCms5h mqLYJ9L9MDanhEiOshHooy6ROPD842XmWILfq3HUi9JcrB/BvILPRsESQnNAn3Zl 8ImbR5VijGGDy50KBE8I9abRwDTIn9c2JJVDSh3tAz1aicGnRLcIeqNeuJ4IEQZf IQb7qcZQge17ie/Pwr24GlwrKG7DhOg5NXvl3DiVUum2NFGjuBc= =V9hb -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core Updates: - Convert call-sites using iommu_domain_alloc() to more specific versions and remove function - Introduce iommu_paging_domain_alloc_flags() - Extend support for allocating PASID-capable domains to more drivers - Remove iommu_present() - Some smaller improvements New IOMMU driver for RISC-V Intel VT-d Updates: - Add domain_alloc_paging support - Enable user space IOPFs in non-PASID and non-svm cases - Small code refactoring and cleanups - Add domain replacement support for pasid AMD-Vi Updates: - Adapt to iommu_paging_domain_alloc_flags() interface and alloc V2 page-tables by default - Replace custom domain ID allocator with IDA allocator - Add ops->release_domain() support - Other improvements to device attach and domain allocation code paths ARM-SMMU Updates: - SMMUv2: - Return -EPROBE_DEFER for client devices probing before their SMMU - Devicetree binding updates for Qualcomm MMU-500 implementations - SMMUv3: - Minor fixes and cleanup for NVIDIA's virtual command queue driver - IO-PGTable: - Fix indexing of concatenated PGDs and extend selftest coverage - Remove unused block-splitting support S390 IOMMU: - Implement support for blocking domain Mediatek IOMMU: - Enable 35-bit physical address support for mt8186 OMAP IOMMU driver: - Adapt to recent IOMMU core changes and unbreak driver" * tag 'iommu-updates-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (92 commits) iommu/tegra241-cmdqv: Fix alignment failure at max_n_shift iommu: Make set_dev_pasid op support domain replacement iommu/arm-smmu-v3: Make set_dev_pasid() op support replace iommu/vt-d: Add set_dev_pasid callback for nested domain iommu/vt-d: Make identity_domain_set_dev_pasid() to handle domain replacement iommu/vt-d: Make intel_svm_set_dev_pasid() support domain replacement iommu/vt-d: Limit intel_iommu_set_dev_pasid() for paging domain iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement iommu/vt-d: Add iommu_domain_did() to get did iommu/vt-d: Consolidate the struct dev_pasid_info add/remove iommu/vt-d: Add pasid replace helpers iommu/vt-d: Refactor the pasid setup helpers iommu/vt-d: Add a helper to flush cache for updating present pasid entry iommu: Pass old domain to set_dev_pasid op iommu/iova: Fix typo 'adderss' iommu: Add a kdoc to iommu_unmap() iommu/io-pgtable-arm-v7s: Remove split on unmap behavior iommu/io-pgtable-arm: Remove split on unmap behavior iommu/vt-d: Drain PRQs when domain removed from RID iommu/vt-d: Drop pasid requirement for prq initialization ...	2024-11-22 19:55:10 -08:00
Jason Gunthorpe	d53764723e	iommu: Rename ops->domain_alloc_user() to domain_alloc_paging_flags() Now that the main domain allocating path is calling this function it doesn't make sense to leave it named _user. Change the name to alloc_paging_flags() to mirror the new iommu_paging_domain_alloc_flags() function. A driver should implement only one of ops->domain_alloc_paging() or ops->domain_alloc_paging_flags(). The former is a simpler interface with less boiler plate that the majority of drivers use. The latter is for drivers with a greater feature set (PASID, multiple page table support, advanced iommufd support, nesting, etc). Additional patches will be needed to achieve this. Link: https://patch.msgid.link/r/2-v1-c252ebdeb57b+329-iommu_paging_flags_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-22 14:43:45 -04:00
Jason Gunthorpe	64214c2b95	iommu: Add ops->domain_alloc_nested() It turns out all the drivers that are using this immediately call into another function, so just make that function directly into the op. This makes paging=NULL for domain_alloc_user and we can remove the argument in the next patch. The function mirrors the similar op in the viommu that allocates a nested domain on top of the viommu's nesting parent. This version supports cases where a viommu is not being used. Link: https://patch.msgid.link/r/1-v1-c252ebdeb57b+329-iommu_paging_flags_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-22 14:43:45 -04:00
Jason Gunthorpe	2d76228195	IOMMU Updates for Linux v6.13: Including: - Core Updates: - Convert call-sites using iommu_domain_alloc() to more specific versions and remove function. - Introduce iommu_paging_domain_alloc_flags(). - Extend support for allocating PASID-capable domains to more drivers. - Remove iommu_present(). - Some smaller improvements. - New IOMMU driver for RISC-V. - Intel VT-d Updates: - Add domain_alloc_paging support. - Enable user space IOPFs in non-PASID and non-svm cases. - Small code refactoring and cleanups. - Add domain replacement support for pasid. - AMD-Vi Updates: - Adapt to iommu_paging_domain_alloc_flags() interface and alloc V2 page-tables by default. - Replace custom domain ID allocator with IDA allocator. - Add ops->release_domain() support. - Other improvements to device attach and domain allocation code paths. - ARM-SMMU Updates: - SMMUv2: - Return -EPROBE_DEFER for client devices probing before their SMMU. - Devicetree binding updates for Qualcomm MMU-500 implementations. - SMMUv3: - Minor fixes and cleanup for NVIDIA's virtual command queue driver. - IO-PGTable: - Fix indexing of concatenated PGDs and extend selftest coverage. - Remove unused block-splitting support. - S390 IOMMU: - Implement support for blocking domain. - Mediatek IOMMU: - Enable 35-bit physical address support for mt8186. - OMAP IOMMU driver: - Adapt to recent IOMMU core changes and unbreak driver. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmdAPOoACgkQK/BELZcB GuOs1w/+PoLbOYUjmJiOfpI6YNSEfF2tE4z2al/YYIBcNoAmTTRauuhv6+S0gVRy NTfSucw7OuLlbE9vGsdY02UL1PK58NGfUF8Z2rZSf+RRgLACc47cjZWh0vzDlNbP 4LTdqJXmIWiYcmDtY7LmHtwTSiB900YFZwZOHmTSfNyJt8UC4tBPRh8k2YD3vuxc QZlxSihEf+F+vm8GtW40Ia9BiG3YhCYAcHq6Y4dKxI0JWN+7oRiPN8CF+z/vcdjV VpCDBcbHjvqqpXJvddQHA0SrGDBMHz1AXYhRXnfe7Ogh6SbaSWDSsdaIS27DsOzC L6fxW3+sNmfEOO1RmJoizkHzAtkLWCLNjBvjOb1hUCpwLcKf5nhgE3wOQSwzqumn KbxpoQpHFJutikDBGRsKJCsNqS8ZNWd4Z8rHhTnq2ctuYUFvurkcwX4WXOSRpsoA iJ+x1ezk9FxObHj/B+1nIAwKoeaLyFEwJe7Etom/E2m/2mq2oQOrq1bvfIGCms5h mqLYJ9L9MDanhEiOshHooy6ROPD842XmWILfq3HUi9JcrB/BvILPRsESQnNAn3Zl 8ImbR5VijGGDy50KBE8I9abRwDTIn9c2JJVDSh3tAz1aicGnRLcIeqNeuJ4IEQZf IQb7qcZQge17ie/Pwr24GlwrKG7DhOg5NXvl3DiVUum2NFGjuBc= =V9hb -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.13' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/iommu/linux into iommufd.git Merge with Joerg's tree for dependencies on the next patches. ====================================== IOMMU Updates for Linux v6.13: Including: - Core Updates: - Convert call-sites using iommu_domain_alloc() to more specific versions and remove function. - Introduce iommu_paging_domain_alloc_flags(). - Extend support for allocating PASID-capable domains to more drivers. - Remove iommu_present(). - Some smaller improvements. - New IOMMU driver for RISC-V. - Intel VT-d Updates: - Add domain_alloc_paging support. - Enable user space IOPFs in non-PASID and non-svm cases. - Small code refactoring and cleanups. - Add domain replacement support for pasid. - AMD-Vi Updates: - Adapt to iommu_paging_domain_alloc_flags() interface and alloc V2 page-tables by default. - Replace custom domain ID allocator with IDA allocator. - Add ops->release_domain() support. - Other improvements to device attach and domain allocation code paths. - ARM-SMMU Updates: - SMMUv2: - Return -EPROBE_DEFER for client devices probing before their SMMU. - Devicetree binding updates for Qualcomm MMU-500 implementations. - SMMUv3: - Minor fixes and cleanup for NVIDIA's virtual command queue driver. - IO-PGTable: - Fix indexing of concatenated PGDs and extend selftest coverage. - Remove unused block-splitting support. - S390 IOMMU: - Implement support for blocking domain. - Mediatek IOMMU: - Enable 35-bit physical address support for mt8186. - OMAP IOMMU driver: - Adapt to recent IOMMU core changes and unbreak driver. ====================================== Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-22 14:37:25 -04:00
Linus Torvalds	341d041daa	iommufd 6.13 merge window pull Several new features and uAPI for iommufd: - IOMMU_IOAS_MAP_FILE allows passing in a file descriptor as the backing memory for an iommu mapping. To date VFIO/iommufd have used VMA's and pin_user_pages(), this now allows using memfds and memfd_pin_folios(). Notably this creates a pure folio path from the memfd to the iommu page table where memory is never broken down to PAGE_SIZE. - IOMMU_IOAS_CHANGE_PROCESS moves the pinned page accounting between two processes. Combined with the above this allows iommufd to support a VMM re-start using exec() where something like qemu would exec() a new version of itself and fd pass the memfds/iommufd/etc to the new process. The memfd allows DMA access to the memory to continue while the new process is getting setup, and the CHANGE_PROCESS updates all the accounting. - Support for fault reporting to userspace on non-PRI HW, such as ARM stall-mode embedded devices. - IOMMU_VIOMMU_ALLOC introduces the concept of a HW/driver backed virtual iommu. This will be used by VMMs to access hardware features that are contained with in a VM. The first use is to inform the kernel of the virtual SID to physical SID mapping when issuing SID based invalidation on ARM. Further uses will tie HW features that are directly accessed by the VM, such as invalidation queue assignment and others. - IOMMU_VDEVICE_ALLOC informs the kernel about the mapping of virtual device to physical device within a VIOMMU. Minimially this is used to translate VM issued cache invalidation commands from virtual to physical device IDs. - Enhancements to IOMMU_HWPT_INVALIDATE and IOMMU_HWPT_ALLOC to work with the VIOMMU - ARM SMMuv3 support for nested translation. Using the VIOMMU and VDEVICE the driver can model this HW's behavior for nested translation. This includes a shared branch from Will. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZzzKKwAKCRCFwuHvBreF YaCMAQDOQAgw87eUYKnY7vFodlsTUA2E8uSxDmk6nPWySd0NKwD/flOP85MdEs9O Ot+RoL4/J3IyNH+eg5kN68odmx4mAw8= =ec8x -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Several new features and uAPI for iommufd: - IOMMU_IOAS_MAP_FILE allows passing in a file descriptor as the backing memory for an iommu mapping. To date VFIO/iommufd have used VMA's and pin_user_pages(), this now allows using memfds and memfd_pin_folios(). Notably this creates a pure folio path from the memfd to the iommu page table where memory is never broken down to PAGE_SIZE. - IOMMU_IOAS_CHANGE_PROCESS moves the pinned page accounting between two processes. Combined with the above this allows iommufd to support a VMM re-start using exec() where something like qemu would exec() a new version of itself and fd pass the memfds/iommufd/etc to the new process. The memfd allows DMA access to the memory to continue while the new process is getting setup, and the CHANGE_PROCESS updates all the accounting. - Support for fault reporting to userspace on non-PRI HW, such as ARM stall-mode embedded devices. - IOMMU_VIOMMU_ALLOC introduces the concept of a HW/driver backed virtual iommu. This will be used by VMMs to access hardware features that are contained with in a VM. The first use is to inform the kernel of the virtual SID to physical SID mapping when issuing SID based invalidation on ARM. Further uses will tie HW features that are directly accessed by the VM, such as invalidation queue assignment and others. - IOMMU_VDEVICE_ALLOC informs the kernel about the mapping of virtual device to physical device within a VIOMMU. Minimially this is used to translate VM issued cache invalidation commands from virtual to physical device IDs. - Enhancements to IOMMU_HWPT_INVALIDATE and IOMMU_HWPT_ALLOC to work with the VIOMMU - ARM SMMuv3 support for nested translation. Using the VIOMMU and VDEVICE the driver can model this HW's behavior for nested translation. This includes a shared branch from Will" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (51 commits) iommu/arm-smmu-v3: Import IOMMUFD module namespace iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest iommufd: Add IOMMU_IOAS_CHANGE_PROCESS iommufd: Lock all IOAS objects iommufd: Export do_update_pinned iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Use S2FWB for NESTED domains iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC Documentation: userspace-api: iommufd: Update vDEVICE iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command iommufd/selftest: Add mock_viommu_cache_invalidate iommufd/viommu: Add iommufd_viommu_find_dev helper iommu: Add iommu_copy_struct_from_full_user_array helper iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE iommu/viommu: Add cache_invalidate to iommufd_viommu_ops iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl ...	2024-11-21 12:40:50 -08:00
Joerg Roedel	42f0cbb2a2	Merge branches 'intel/vt-d', 'amd/amd-vi' and 'iommufd/arm-smmuv3-nested' into next	2024-11-15 09:27:43 +01:00
Joerg Roedel	ae3325f752	Merge branches 'arm/smmu', 'mediatek', 's390', 'ti/omap', 'riscv' and 'core' into next	2024-11-15 09:27:02 +01:00
Nathan Chancellor	6d026e6d48	iommu/arm-smmu-v3: Import IOMMUFD module namespace Commit `69d9b312f3` ("iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC") started using _iommufd_object_alloc() without importing the IOMMUFD module namespace, resulting in a modpost warning: WARNING: modpost: module arm_smmu_v3 uses symbol _iommufd_object_alloc from namespace IOMMUFD, but does not import it. Commit `d68beb276b` ("iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object") added another warning by using iommufd_viommu_find_dev(): WARNING: modpost: module arm_smmu_v3 uses symbol iommufd_viommu_find_dev from namespace IOMMUFD, but does not import it. Import the IOMMUFD module namespace to resolve the warnings. Fixes: `69d9b312f3` ("iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC") Link: https://patch.msgid.link/r/20241114-arm-smmu-v3-import-iommufd-module-ns-v1-1-c551e7b972e9@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-14 21:07:15 -04:00
Steve Sistare	829ed62649	iommufd: Add IOMMU_IOAS_CHANGE_PROCESS Add an ioctl that updates all DMA mappings to reflect the current process, Change the mm and transfer locked memory accounting from old to current mm. This will be used for live update, allowing an old process to hand the iommufd device descriptor to a new process. The new process calls the ioctl. IOMMU_IOAS_CHANGE_PROCESS only supports DMA mappings created with IOMMU_IOAS_MAP_FILE, because the kernel metadata for such mappings does not depend on the userland VA of the pages (which is different in the new process). IOMMU_IOAS_CHANGE_PROCESS fails if other types of mappings are present. This is a revised version of code originally provided by Jason. Link: https://patch.msgid.link/r/1731527497-16091-4-git-send-email-steven.sistare@oracle.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-14 12:57:13 -04:00
Steve Sistare	051ae5aa73	iommufd: Lock all IOAS objects Define helpers to lock and unlock all IOAS objects. This will allow DMA mappings to be updated atomically during live update. This code is substantially the same as an initial version provided by Jason, plus fixes. Link: https://patch.msgid.link/r/1731527497-16091-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-14 12:47:16 -04:00
Steve Sistare	10caa8b451	iommufd: Export do_update_pinned Export do_update_pinned. No functional change. Link: https://patch.msgid.link/r/1731527497-16091-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-14 12:47:16 -04:00
Nicolin Chen	d68beb276b	iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object Implement the vIOMMU's cache_invalidate op for user space to invalidate the IOTLB entries, Device ATS and CD entries that are cached by hardware. Add struct iommu_viommu_arm_smmuv3_invalidate defining invalidation entries that are simply in the native format of a 128-bit TLBI command. Scan those commands against the permitted command list and fix their VMID/SID fields to match what is stored in the vIOMMU. Link: https://patch.msgid.link/r/12-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Co-developed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 14:11:03 -04:00
Jason Gunthorpe	f27298a82b	iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED The EATS flag needs to flow through the vSTE and into the pSTE, and ensure physical ATS is enabled on the PCI device. The physical ATS state must match the VM's idea of EATS as we rely on the VM to issue the ATS invalidation commands. Thus ATS must remain off at the device until EATS on a nesting domain turns it on. Attaching a nesting domain is the point where the invalidation responsibility transfers to userspace. Update the ATS logic to track EATS for nesting domains and flush the ATC whenever the S2 nesting parent changes. Link: https://patch.msgid.link/r/11-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 14:11:03 -04:00
Jason Gunthorpe	67e4fe3985	iommu/arm-smmu-v3: Use S2FWB for NESTED domains Force Write Back (FWB) changes how the S2 IOPTE's MemAttr field works. When S2FWB is supported and enabled the IOPTE will force cachable access to IOMMU_CACHE memory when nesting with a S1 and deny cachable access when !IOMMU_CACHE. When using a single stage of translation, a simple S2 domain, it doesn't change things for PCI devices as it is just a different encoding for the existing mapping of the IOMMU protection flags to cachability attributes. For non-PCI it also changes the combining rules when incoming transactions have inconsistent attributes. However, when used with a nested S1, FWB has the effect of preventing the guest from choosing a MemAttr in it's S1 that would cause ordinary DMA to bypass the cache. Consistent with KVM we wish to deny the guest the ability to become incoherent with cached memory the hypervisor believes is cachable so we don't have to flush it. Allow NESTED domains to be created if the SMMU has S2FWB support and use S2FWB for NESTING_PARENTS. This is an additional option to CANWBS. Link: https://patch.msgid.link/r/10-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 14:11:03 -04:00
Jason Gunthorpe	1e8be08d1c	iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED For SMMUv3 a IOMMU_DOMAIN_NESTED is composed of a S2 iommu_domain acting as the parent and a user provided STE fragment that defines the CD table and related data with addresses translated by the S2 iommu_domain. The kernel only permits userspace to control certain allowed bits of the STE that are safe for user/guest control. IOTLB maintenance is a bit subtle here, the S1 implicitly includes the S2 translation, but there is no way of knowing which S1 entries refer to a range of S2. For the IOTLB we follow ARM's guidance and issue a CMDQ_OP_TLBI_NH_ALL to flush all ASIDs from the VMID after flushing the S2 on any change to the S2. The IOMMU_DOMAIN_NESTED can only be created from inside a VIOMMU as the invalidation path relies on the VIOMMU to translate virtual stream ID used in the invalidation commands for the CD table and ATS. Link: https://patch.msgid.link/r/9-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 14:11:03 -04:00
Nicolin Chen	69d9b312f3	iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type. Implement an arm_vsmmu_alloc(). As an initial step, copy the VMID from s2_parent. A followup series is required to give the VIOMMU object it's own VMID that will be used in all nesting configurations. Link: https://patch.msgid.link/r/8-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 14:09:44 -04:00
Jason Gunthorpe	4e6bd13aa3	Merge branch 'iommufd/arm-smmuv3-nested' of iommu/linux into iommufd for-next Common SMMUv3 patches for the following patches adding nesting, shared branch with the iommu tree. * 'iommufd/arm-smmuv3-nested' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/arm-smmu-v3: Expose the arm_smmu_attach interface iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS ACPI/IORT: Support CANWBS memory access flag ACPICA: IORT: Update for revision E.f vfio: Remove VFIO_TYPE1_NESTING_IOMMU ... Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 13:47:28 -04:00
Nicolin Chen	576ad6eb45	iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's iotlb, IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's cache. Link: https://patch.msgid.link/r/cd4082079d75427bd67ed90c3c825e15b5720a5f.1730836308.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:19 -04:00
Nicolin Chen	d6563aa2a8	iommufd/selftest: Add mock_viommu_cache_invalidate Similar to the coverage of cache_invalidate_user for iotlb invalidation, add a device cache and a viommu_cache_invalidate function to test it out. Link: https://patch.msgid.link/r/a29c7c23d7cd143fb26ab68b3618e0957f485fdb.1730836308.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:19 -04:00
Nicolin Chen	c747e67978	iommufd/viommu: Add iommufd_viommu_find_dev helper This avoids a bigger trouble of exposing struct iommufd_device and struct iommufd_vdevice in the public header. Link: https://patch.msgid.link/r/84fa7c624db4d4508067ccfdf42059533950180a.1730836308.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:19 -04:00
Nicolin Chen	54ce69e36c	iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE With a vIOMMU object, use space can flush any IOMMU related cache that can be directed via a vIOMMU object. It is similar to the IOMMU_HWPT_INVALIDATE uAPI, but can cover a wider range than IOTLB, e.g. device/desciprtor cache. Allow hwpt_id of the iommu_hwpt_invalidate structure to carry a viommu_id, and reuse the IOMMU_HWPT_INVALIDATE uAPI for vIOMMU invalidations. Drivers can define different structures for vIOMMU invalidations v.s. HWPT ones. Since both the HWPT-based and vIOMMU-based invalidation pathways check own cache invalidation op, remove the WARN_ON_ONCE in the allocator. Update the uAPI, kdoc, and selftest case accordingly. Link: https://patch.msgid.link/r/b411e2245e303b8a964f39f49453a5dff280968f.1730836308.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:19 -04:00
Nicolin Chen	0ce5c2477a	iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device (struct device) against a vIOMMU (struct iommufd_viommu) object in a VM. This vDEVICE object (and its structure) holds all the infos and attributes in the VM, regarding the device related to the vIOMMU. As an initial patch, add a per-vIOMMU virtual ID. This can be: - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table - Virtual RID on a nested Intel VT-D IOMMU, an index to a Context Table Potentially, this vDEVICE structure would hold some vData for Confidential Compute Architecture (CCA). Use this virtual ID to index an "vdevs" xarray that belongs to a vIOMMU object. Add a new ioctl for vDEVICE allocations. Since a vDEVICE is a connection of a device object and an iommufd_viommu object, take two refcounts in the ioctl handler. Link: https://patch.msgid.link/r/cda8fd2263166e61b8191a3b3207e0d2b08545bf.1730836308.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:18 -04:00
Nicolin Chen	db70827a88	iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST Implement the viommu alloc/free functions to increase/reduce refcount of its dependent mock iommu device. User space can verify this loop via the IOMMU_VIOMMU_TYPE_SELFTEST. Link: https://patch.msgid.link/r/9d755a215a3007d4d8d1c2513846830332db62aa.1730836219.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:18 -04:00
Nicolin Chen	8607056945	iommufd/selftest: Add refcount to mock_iommu_device For an iommu_dev that can unplug (so far only this selftest does so), the viommu->iommu_dev pointer has no guarantee of its life cycle after it is copied from the idev->dev->iommu->iommu_dev. Track the user count of the iommu_dev. Postpone the exit routine using a completion, if refcount is unbalanced. The refcount inc/dec will be added in the following patch. Link: https://patch.msgid.link/r/33f28d64841b497eebef11b49a571e03103c5d24.1730836219.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:18 -04:00
Nicolin Chen	18f819901d	iommufd/selftest: Prepare for mock_viommu_alloc_domain_nested() A nested domain now can be allocated for a parent domain or for a vIOMMU object. Rework the existing allocators to prepare for the latter case. Link: https://patch.msgid.link/r/f62894ad8ccae28a8a616845947fe4b76135d79b.1730836219.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:18 -04:00
Nicolin Chen	fd6b853f50	iommufd/selftest: Add container_of helpers Use these inline helpers to shorten those container_of lines. Note that one of them goes back and forth between iommu_domain and mock_iommu_domain, which isn't necessary. So drop its container_of. Link: https://patch.msgid.link/r/518ec64dae2e814eb29fd9f170f58a3aad56c81c.1730836219.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:18 -04:00
Nicolin Chen	13a750180f	iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC Now a vIOMMU holds a shareable nesting parent HWPT. So, it can act like that nesting parent HWPT to allocate a nested HWPT. Support that in the IOMMU_HWPT_ALLOC ioctl handler, and update its kdoc. Also, add an iommufd_viommu_alloc_hwpt_nested helper to allocate a nested HWPT for a vIOMMU object. Since a vIOMMU object holds the parent hwpt's refcount already, increase the refcount of the vIOMMU only. Link: https://patch.msgid.link/r/a0f24f32bfada8b448d17587adcaedeeb50a67ed.1730836219.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:18 -04:00
Nicolin Chen	4db97c21ed	iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl Add a new ioctl for user space to do a vIOMMU allocation. It must be based on a nesting parent HWPT, so take its refcount. IOMMU driver wanting to support vIOMMUs must define its IOMMU_VIOMMU_TYPE_ in the uAPI header and implement a viommu_alloc op in its iommu_ops. Link: https://patch.msgid.link/r/dc2b8ba9ac935007beff07c1761c31cd097ed780.1730836219.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:18 -04:00
Nicolin Chen	d56d1e8405	iommufd: Verify object in iommufd_object_finalize/abort() To support driver-allocated vIOMMU objects, it's required for IOMMU driver to call the provided iommufd_viommu_alloc helper to embed the core struct. However, there is no guarantee that every driver will call it and allocate objects properly. Make the iommufd_object_finalize/abort functions more robust to verify if the xarray slot indexed by the input obj->id is having an XA_ZERO_ENTRY, which is the reserved value stored by xa_alloc via iommufd_object_alloc. Link: https://patch.msgid.link/r/334bd4dde8e0a88eb30fa67eeef61827cdb546f9.1730836219.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:18 -04:00
Nicolin Chen	7d4f46c237	iommufd: Move _iommufd_object_alloc helper to a sharable file The following patch will add a new vIOMMU allocator that will require this _iommufd_object_alloc to be sharable with IOMMU drivers (and iommufd too). Add a new driver.c file that will be built with CONFIG_IOMMUFD_DRIVER_CORE selected by CONFIG_IOMMUFD, and put the CONFIG_DRIVER under that remaining to be selectable for drivers to build the existing iova_bitmap.c file. Link: https://patch.msgid.link/r/2f4f6e116dc49ffb67ff6c5e8a7a8e789ab9e98e.1730836219.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-12 11:46:18 -04:00
Nicolin Chen	a3799717b8	iommu/tegra241-cmdqv: Fix alignment failure at max_n_shift When configuring a kernel with PAGE_SIZE=4KB, depending on its setting of CONFIG_CMA_ALIGNMENT, VCMDQ_LOG2SIZE_MAX=19 could fail the alignment test and trigger a WARN_ON: WARNING: at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3646 Call trace: arm_smmu_init_one_queue+0x15c/0x210 tegra241_cmdqv_init_structures+0x114/0x338 arm_smmu_device_probe+0xb48/0x1d90 Fix it by capping max_n_shift to CMDQ_MAX_SZ_SHIFT as SMMUv3 CMDQ does. Fixes: `918eb5c856` ("iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV") Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20241111030226.1940737-1-nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-11-12 15:09:42 +00:00
Catalin Marinas	7591c127f3	kmemleak: iommu/iova: fix transient kmemleak false positive The introduction of iova_depot_pop() in `911aa1245d` ("iommu/iova: Make the rcache depot scale better") confused kmemleak by moving a struct iova_magazine object from a singly linked list to rcache->depot and resetting the 'next' pointer referencing it. Unlike doubly linked lists, the content of the object being referred is never changed on removal from a singly linked list and the kmemleak checksum heuristics do not detect such scenario. This leads to false positives like: unreferenced object 0xffff8881a5301000 (size 1024): comm "softirq", pid 0, jiffies 4306297099 (age 462.991s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 e7 7d 05 00 00 00 00 00 .........}...... 0f b4 05 00 00 00 00 00 b4 96 05 00 00 00 00 00 ................ backtrace: [<ffffffff819f5f08>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff818a239a>] kmalloc_trace+0x2a/0x60 [<ffffffff8231d31e>] free_iova_fast+0x28e/0x4e0 [<ffffffff82310860>] fq_ring_free_locked+0x1b0/0x310 [<ffffffff8231225d>] fq_flush_timeout+0x19d/0x2e0 [<ffffffff813e95ba>] call_timer_fn+0x19a/0x5c0 [<ffffffff813ea16b>] __run_timers+0x78b/0xb80 [<ffffffff813ea5bd>] run_timer_softirq+0x5d/0xd0 [<ffffffff82f1d915>] __do_softirq+0x205/0x8b5 Introduce kmemleak_transient_leak() which resets the object checksum requiring another scan pass before it is reported (if still unreferenced). Call this new API in iova_depot_pop(). Link: https://lkml.kernel.org/r/20241104111944.2207155-1-catalin.marinas@arm.com Link: https://lore.kernel.org/r/ZY1osaGLyT-sdKE8@shredder/ Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Ido Schimmel <idosch@idosch.org> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-11 17:22:26 -08:00
Nicolin Chen	d1b3dad9de	iommufd: Move struct iommufd_object to public iommufd header Prepare for an embedded structure design for driver-level iommufd_viommu objects: // include/linux/iommufd.h struct iommufd_viommu { struct iommufd_object obj; .... }; // Some IOMMU driver struct iommu_driver_viommu { struct iommufd_viommu core; .... }; It has to expose struct iommufd_object and enum iommufd_object_type from the core-level private header to the public iommufd header. Link: https://patch.msgid.link/r/54a43b0768089d690104530754f499ca05ce0074.1730836219.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-08 13:25:34 -04:00
Yi Liu	980e3016eb	iommu: Make set_dev_pasid op support domain replacement The iommu core is going to support domain replacement for pasid, it needs to make the set_dev_pasid op support replacing domain and keep the old domain config in the failure case. AMD iommu driver does not support domain replacement for pasid yet, so it would fail the set_dev_pasid op to keep the old config if the input @old is non-NULL. Till now, all the set_dev_pasid callbacks can handle the old parameter and can keep the old config when failed, so update the kdoc of set_dev_pasid op. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-14-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:58 +01:00
Jason Gunthorpe	e9f1f727e6	iommu/arm-smmu-v3: Make set_dev_pasid() op support replace set_dev_pasid() op is going to be enhanced to support domain replacement of a pasid. This prepares for this op definition. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-13-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:58 +01:00
Yi Liu	67f6f56b59	iommu/vt-d: Add set_dev_pasid callback for nested domain Add intel_nested_set_dev_pasid() to set a nested type domain to a PASID of a device. Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-12-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:57 +01:00
Yi Liu	9bc18d283d	iommu/vt-d: Make identity_domain_set_dev_pasid() to handle domain replacement Let identity_domain_set_dev_pasid() call the pasid replace helpers hence be able to do domain replacement. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-11-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:56 +01:00
Yi Liu	cfb31f194a	iommu/vt-d: Make intel_svm_set_dev_pasid() support domain replacement Make intel_svm_set_dev_pasid() support replacement. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-10-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:55 +01:00
Yi Liu	c33e20869c	iommu/vt-d: Limit intel_iommu_set_dev_pasid() for paging domain intel_iommu_set_dev_pasid() is only supposed to be used by paging domain, so limit it. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-9-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:55 +01:00
Yi Liu	c8596d65b2	iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement Let intel_iommu_set_dev_pasid() call the pasid replace helpers hence be able to do domain replacement. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-8-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:54 +01:00
Yi Liu	a1deee90a2	iommu/vt-d: Add iommu_domain_did() to get did domain_id_iommu() does not support SVA type and identity type domains. Add iommu_domain_did() to support all domain types. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-7-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:53 +01:00
Yi Liu	d93cf86cc6	iommu/vt-d: Consolidate the struct dev_pasid_info add/remove The domain_add_dev_pasid() and domain_remove_dev_pasid() are added to consolidate the adding/removing of the struct dev_pasid_info. Besides, it includes the cache tag assign/unassign as well. This also prepares for adding domain replacement for pasid. The set_dev_pasid callbacks need to deal with the dev_pasid_info for both old and new domain. These two helpers make the life easier. intel_iommu_set_dev_pasid() and intel_svm_set_dev_pasid() are updated to use the helpers. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-6-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:52 +01:00
Yi Liu	7543ee63e8	iommu/vt-d: Add pasid replace helpers pasid replacement allows converting a present pasid entry to be FS, SS, PT or nested, hence add helpers for such operations. Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-5-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:52 +01:00
Yi Liu	2cb5ff623d	iommu/vt-d: Refactor the pasid setup helpers It is clearer to have a new set of pasid replacement helpers other than extending the existing ones to cover both initial setup and replacement. Then abstract out the common code for manipulating the pasid entry as preparation. No functional change is intended. Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-4-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:51 +01:00
Yi Liu	9bd008f1a9	iommu/vt-d: Add a helper to flush cache for updating present pasid entry Generalize the logic for flushing pasid-related cache upon changes to bits other than SSADE and P which requires a different flow according to VT-d spec. No functional change is intended. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-3-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:50 +01:00
Yi Liu	b45a3777ce	iommu: Pass old domain to set_dev_pasid op To support domain replacement for pasid, the underlying iommu driver needs to know the old domain hence be able to clean up the existing attachment. It would be much convenient for iommu layer to pass down the old domain. Otherwise, iommu drivers would need to track domain for pasids by themselves, this would duplicate code among the iommu drivers. Or iommu drivers would rely group->pasid_array to get domain, which may not always the correct one. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241107122234.7424-2-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:04:49 +01:00
Gan Jie	fcdb982e93	iommu/iova: Fix typo 'adderss' Fix typo 'adderss' to 'address'. Signed-off-by: Gan Jie <ganjie182@gmail.com> Link: https://lore.kernel.org/r/20241101072709.702-1-ganjie182@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-08 14:02:11 +01:00
Zhangfei Gao	c9d6ee6699	iommufd: Allow fault reporting for non-PRI PCI devices iommufd_fault_iopf_enable has limitation to PRI on PCI/SRIOV VFs because the PRI might be a shared resource and current iommu subsystem is not ready to support enabling/disabling PRI on a VF without any impact on others. However, we have devices that appear as PCI but are actually on the AMBA bus. These fake PCI devices have PASID capability, support stall as well as SRIOV, so remove the limitation for these devices. Link: https://patch.msgid.link/r/20241107043711.116-1-zhangfei.gao@linaro.org Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-11-07 13:06:01 -04:00
Jason Gunthorpe	6ac7dffe7c	iommu: Add a kdoc to iommu_unmap() Describe the most conservative version of the driver implementations. All drivers should support this. Many drivers support extending the range if a large page is hit, but let's not make that officially approved API. The main point is to document explicitly that split is not supported. Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-b3a5b5937f56+7bb-arm_no_split_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-11-06 15:42:36 +00:00
Jason Gunthorpe	fd50651636	iommu/io-pgtable-arm-v7s: Remove split on unmap behavior A minority of page table implementations (arm_lpae, armv7) are unique in how they handle partial unmap of large IOPTEs. Other implementations will unmap the large IOPTE and return it's length. For example if a 2M IOPTE is present and the first 4K is requested to be unmapped then unmap will remove the whole 2M and report 2M as the result. armv7 instead will break up contiguous entries and replace an entry with a whole table so it can unmap the requested 4k. This seems copied from the arm_lpae implementation, which was analyzed here: https://lore.kernel.org/all/20241024134411.GA6956@nvidia.com/ Bring consistency to the implementations and remove this unused functionality. There are no uses outside iommu, this effects the ARM_V7S drivers msm_iommu, mtk_iommu, and arm-smmmu. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-b3a5b5937f56+7bb-arm_no_split_jgg@nvidia.com [will: Remove unused 'loopnr' variable] Signed-off-by: Will Deacon <will@kernel.org>	2024-11-06 15:42:36 +00:00
Jason Gunthorpe	33729a5fc0	iommu/io-pgtable-arm: Remove split on unmap behavior A minority of page table implementations (arm_lpae, armv7) are unique in how they handle partial unmap of large IOPTEs. Other implementations will unmap the large IOPTE and return it's length. For example if a 2M IOPTE is present and the first 4K is requested to be unmapped then unmap will remove the whole 2M and report 2M as the result. arm_lpae instead replaces the IOPTE with a table of smaller IOPTEs, unmaps the 4K and returns 4k. This is actually an illegal/non-hitless operation on at least SMMUv3 because of the BBM level 0 rules. Will says this was done to support VFIO, but upon deeper analysis this was never strictly necessary: https://lore.kernel.org/all/20241024134411.GA6956@nvidia.com/ In summary, historical VFIO supported the AMD behavior of unmapping the whole large IOPTE and returning the size, even if asked to unmap a portion. The driver would see this as a request to split a large IOPTE. Modern VFIO always unmaps entire large IOPTEs (except on AMD) and drivers don't see an IOPTE split. Given it doesn't work fully correctly on SMMUv3 and relying on ARM unique behavior would create portability problems across IOMMU drivers, retire this functionality. Outside the iommu users, this will potentially effect io_pgtable users of ARM_32_LPAE_S1, ARM_32_LPAE_S2, ARM_64_LPAE_S1, ARM_64_LPAE_S2, and ARM_MALI_LPAE formats. Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: dri-devel@lists.freedesktop.org Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-b3a5b5937f56+7bb-arm_no_split_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-11-06 14:58:48 +00:00
Lu Baolu	c43e1ccdeb	iommu/vt-d: Drain PRQs when domain removed from RID As this iommu driver now supports page faults for requests without PASID, page requests should be drained when a domain is removed from the RID2PASID entry. This results in the intel_iommu_drain_pasid_prq() call being moved to intel_pasid_tear_down_entry(). This indicates that when a translation is removed from any PASID entry and the PRI has been enabled on the device, page requests are drained in the domain detachment path. The intel_iommu_drain_pasid_prq() helper has been modified to support sending device TLB invalidation requests for both PASID and non-PASID cases. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241101045543.70086-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:27 +01:00
Klaus Jensen	9baed1c280	iommu/vt-d: Drop pasid requirement for prq initialization PASID support within the IOMMU is not required to enable the Page Request Queue, only the PRS capability. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joel Granados <joel.granados@kernel.org> Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-5-b696ca89ba29@kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:27 +01:00
Joel Granados	cbeb1b7eee	iommufd: Enable PRI when doing the iommufd_hwpt_alloc Add IOMMU_HWPT_FAULT_ID_VALID as part of the valid flags when doing an iommufd_hwpt_alloc allowing the use of an iommu fault allocation (iommu_fault_alloc) with the IOMMU_HWPT_ALLOC ioctl. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joel Granados <joel.granados@kernel.org> Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-4-b696ca89ba29@kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:26 +01:00
Joel Granados	140f5dedbb	iommu/vt-d: Move IOMMU_IOPF into INTEL_IOMMU Move IOMMU_IOPF from under INTEL_IOMMU_SVM into INTEL_IOMMU. This certifies that the core intel iommu utilizes the IOPF library functions, independent of the INTEL_IOMMU_SVM config. Signed-off-by: Joel Granados <joel.granados@kernel.org> Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-3-b696ca89ba29@kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:26 +01:00
Klaus Jensen	9f831c16c6	iommu/vt-d: Remove the pasid present check in prq_event_thread PASID is not strictly needed when handling a PRQ event; remove the check for the pasid present bit in the request. This change was not included in the creation of prq.c to emphasize the change in capability checks when handing PRQ events. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joel Granados <joel.granados@kernel.org> Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-2-b696ca89ba29@kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:25 +01:00
Joel Granados	4d54409576	iommu/vt-d: Separate page request queue from SVM IO page faults are no longer dependent on CONFIG_INTEL_IOMMU_SVM. Move all Page Request Queue (PRQ) functions that handle prq events to a new file in drivers/iommu/intel/prq.c. The page_req_des struct is now declared in drivers/iommu/intel/prq.c. No functional changes are intended. This is a preparation patch to enable the use of IO page faults outside the SVM/PASID use cases. Signed-off-by: Joel Granados <joel.granados@kernel.org> Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-1-b696ca89ba29@kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:24 +01:00
Zhenzhong Duan	f1645676f2	iommu/vt-d: Fix checks and print in pgtable_walk() There are some issues in pgtable_walk(): 1. Super page is dumped as non-present page 2. dma_pte_superpage() should not check against leaf page table entries 3. Pointer pte is never NULL so checking it is meaningless 4. When an entry is not present, it still makes sense to dump the entry content. Fix 1,2 by checking dma_pte_superpage()'s returned value after level check. Fix 3 by removing pte check. Fix 4 by checking present bit after printing. By this chance, change to print "page table not present" instead of "PTE not present" to be clearer. Fixes: `914ff7719e` ("iommu/vt-d: Dump DMAR translation structure when DMA fault occurs") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/r/20241024092146.715063-3-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:23 +01:00
Zhenzhong Duan	6ceb93f952	iommu/vt-d: Fix checks and print in dmar_fault_dump_ptes() There are some issues in dmar_fault_dump_ptes(): 1. return value of phys_to_virt() is used for checking if an entry is present. 2. dump is confusing, e.g., "pasid table entry is not present", confusing by unpresent pasid table vs. unpresent pasid table entry. Current code means the former. 3. pgtable_walk() is called without checking if page table is present. Fix 1 by checking present bit of an entry before dump a lower level entry. Fix 2 by removing "entry" string, e.g., "pasid table is not present". Fix 3 by checking page table present before walk. Take issue 3 for example, before fix: [ 442.240357] DMAR: pasid dir entry: 0x000000012c83e001 [ 442.246661] DMAR: pasid table entry[0]: 0x0000000000000000 [ 442.253429] DMAR: pasid table entry[1]: 0x0000000000000000 [ 442.260203] DMAR: pasid table entry[2]: 0x0000000000000000 [ 442.266969] DMAR: pasid table entry[3]: 0x0000000000000000 [ 442.273733] DMAR: pasid table entry[4]: 0x0000000000000000 [ 442.280479] DMAR: pasid table entry[5]: 0x0000000000000000 [ 442.287234] DMAR: pasid table entry[6]: 0x0000000000000000 [ 442.293989] DMAR: pasid table entry[7]: 0x0000000000000000 [ 442.300742] DMAR: PTE not present at level 2 After fix: ... [ 357.241214] DMAR: pasid table entry[6]: 0x0000000000000000 [ 357.248022] DMAR: pasid table entry[7]: 0x0000000000000000 [ 357.254824] DMAR: scalable mode page table is not present Fixes: `914ff7719e` ("iommu/vt-d: Dump DMAR translation structure when DMA fault occurs") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/r/20241024092146.715063-2-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:23 +01:00
Yi Liu	4f178e07a2	iommu/vt-d: Drop s1_pgtbl from dmar_domain dmar_domian has stored the s1_cfg which includes the s1_pgtbl info, so no need to store s1_pgtbl, hence drop it. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241025143339.2328991-1-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:22 +01:00
Dr. David Alan Gilbert	95e2eaf5b9	iommu/vt-d: Remove unused dmar_msi_read dmar_msi_read() has been unused since 2022 in commit `cf8e865810` ("arch: Remove Itanium (IA-64) architecture") Remove it. (dmar_msi_write still exists and is used once). Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20241022002702.302728-1-linux@treblig.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:21 +01:00
Andy Shevchenko	6d8bac098e	iommu/vt-d: Increase buffer size for device name GCC is not happy with the current code, e.g.: .../iommu/intel/dmar.c:1063:9: note: ‘sprintf’ output between 6 and 15 bytes into a destination of size 13 1063 \| sprintf(iommu->name, "dmar%d", iommu->seq_id); When `make W=1` is supplied, this prevents kernel building. Fix it by increasing the buffer size for device name and use sizeoF() instead of hard coded constants. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20241014104529.4025937-1-andriy.shevchenko@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:20 +01:00
Jinjie Ruan	2a32309345	iommu/vt-d: Use PCI_DEVID() macro The macro PCI_DEVID() can be used instead of compose it manually. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240829021011.4135618-1-ruanjinjie@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:19 +01:00
Lu Baolu	621838c718	iommu/vt-d: Refine intel_iommu_domain_alloc_user() The domain_alloc_user ops should always allocate a guest-compatible page table unless specific allocation flags are specified. Currently, IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_ALLOC_DIRTY_TRACKING require special handling, as both require hardware support for scalable mode and second-stage translation. In such cases, the driver should select a second-stage page table for the paging domain. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241021085125.192333-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:19 +01:00
Lu Baolu	ed56de8a9e	iommu/vt-d: Refactor first_level_by_default() The first stage page table is compatible across host and guest kernels. Therefore, this driver uses the first stage page table as the default for paging domains. The helper first_level_by_default() determines the feasibility of using the first stage page table based on a global policy. This policy requires consistency in scalable mode and first stage translation capability among all iommu units. However, this is unnecessary as domain allocation, attachment, and removal operations are performed on a per-device basis. The domain type (IOMMU_DOMAIN_DMA vs. IOMMU_DOMAIN_UNMANAGED) should not be a factor in determining the first stage page table usage. Both types are for paging domains, and there's no fundamental difference between them. The driver should not be aware of this distinction unless the core specifies allocation flags that require special handling. Convert first_level_by_default() from global to per-iommu and remove the 'type' input. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241021085125.192333-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:18 +01:00
Lu Baolu	5bdd86ec5d	iommu/vt-d: Remove domain_update_iommu_superpage() The requirement for consistent super page support across all the IOMMU hardware in the system has been removed. In the past, if a new IOMMU was hot-added and lacked consistent super page capability, the hot-add process would be aborted. However, with the updated attachment semantics, it is now permissible for the super page capability to vary among different IOMMU hardware units. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241021085125.192333-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:18 +01:00
Lu Baolu	c376a3456d	iommu/vt-d: Remove domain_update_iommu_cap() The attributes of a paging domain are initialized during the allocation process, and any attempt to attach a domain that is not compatible will result in a failure. Therefore, there is no need to update the domain attributes at the time of domain attachment. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241021085125.192333-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:17 +01:00
Lu Baolu	a98db518dd	iommu/vt-d: Enhance compatibility check for paging domain attach The driver now supports domain_alloc_paging, ensuring that a valid device pointer is provided whenever a paging domain is allocated. Additionally, the dmar_domain attributes are set up at the time of allocation. Consistent with the established semantics in the IOMMU core, if a domain is attached to a device and found to be incompatible with the IOMMU hardware capabilities, the operation will return an -EINVAL error. This implicitly advises the caller to allocate a new domain for the device and attempt the domain attachment again. Rename prepare_domain_attach_device() to a more meaningful name. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241021085125.192333-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:16 +01:00
Lu Baolu	9ecfcac1fe	iommu/vt-d: Remove unused domain_alloc callback With domain_alloc_paging callback supported, the legacy domain_alloc callback will never be used anymore. Remove it to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241021085125.192333-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:16 +01:00
Lu Baolu	7c204426b8	iommu/vt-d: Add domain_alloc_paging support Add the domain_alloc_paging callback for domain allocation using the iommu_paging_domain_alloc() interface. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241021085125.192333-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-11-05 13:32:15 +01:00
Jason Gunthorpe	f6681abd41	iommu/arm-smmu-v3: Expose the arm_smmu_attach interface The arm-smmuv3-iommufd.c file will need to call these functions too. Remove statics and put them in the header file. Remove the kunit visibility protections from arm_smmu_make_abort_ste() and arm_smmu_make_s2_domain_ste(). Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-11-05 10:24:17 +00:00
Jason Gunthorpe	874b87c753	iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT For SMMUv3 the parent must be a S2 domain, which can be composed into a IOMMU_DOMAIN_NESTED. In future the S2 parent will also need a VMID linked to the VIOMMU and even to KVM. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-11-05 10:24:17 +00:00
Nicolin Chen	6912ec9182	iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info For virtualization cases the IDR/IIDR/AIDR values of the actual SMMU instance need to be available to the VMM so it can construct an appropriate vSMMUv3 that reflects the correct HW capabilities. For userspace page tables these values are required to constrain the valid values within the CD table and the IOPTEs. The kernel does not sanitize these values. If building a VMM then userspace is required to only forward bits into a VM that it knows it can implement. Some bits will also require a VMM to detect if appropriate kernel support is available such as for ATS and BTM. Start a new file and kconfig for the advanced iommufd support. This lets it be compiled out for kernels that are not intended to support virtualization, and allows distros to leave it disabled until they are shipping a matching qemu too. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-11-05 10:24:17 +00:00
Jason Gunthorpe	e89573cf4a	iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS HW with CANWBS is always cache coherent and ignores PCI No Snoop requests as well. This meets the requirement for IOMMU_CAP_ENFORCE_CACHE_COHERENCY, so let's return it. Implement the enforce_cache_coherency() op to reject attaching devices that don't have CANWBS. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-11-05 10:24:17 +00:00
Jason Gunthorpe	35890f8557	vfio: Remove VFIO_TYPE1_NESTING_IOMMU This control causes the ARM SMMU drivers to choose a stage 2 implementation for the IO pagetable (vs the stage 1 usual default), however this choice has no significant visible impact to the VFIO user. Further qemu never implemented this and no other userspace user is known. The original description in commit `f5c9ecebaf` ("vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide SMMU translation services to the guest operating system" however the rest of the API to set the guest table pointer for the stage 1 and manage invalidation was never completed, or at least never upstreamed, rendering this part useless dead code. Upstream has now settled on iommufd as the uAPI for controlling nested translation. Choosing the stage 2 implementation should be done by through the IOMMU_HWPT_ALLOC_NEST_PARENT flag during domain allocation. Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the enable_nesting iommu_domain_op. Just in-case there is some userspace using this continue to treat requesting it as a NOP, but do not advertise support any more. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2024-11-05 10:24:16 +00:00
Vasant Hegde	18f5a6b34b	iommu/amd: Improve amd_iommu_release_device() Previous patch added ops->release_domain support. Core will attach devices to release_domain->attach_dev() before calling this function. Devices are already detached their current domain and attached to blocked domain. This is mostly dummy function now. Just throw warning if device is still attached to domain. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-13-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:48 +01:00
Vasant Hegde	a0e086b16e	iommu/amd: Add ops->release_domain In release path, remove device from existing domain and attach it to blocked domain. So that all DMAs from device is blocked. Note that soon blocked_domain will support other ops like set_dev_pasid() but release_domain supports only attach_dev ops. Hence added separate 'release_domain' variable. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-12-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:47 +01:00
Vasant Hegde	0b136493d3	iommu/amd: Reorder attach device code Ideally in attach device path, it should take dev_data lock before making changes to device data including IOPF enablement. So far dev_data was using spinlock and it was hitting lock order issue when it tries to enable IOPF. Hence Commit `526606b0a1` ("iommu/amd: Fix Invalid wait context issue") moved IOPF enablement outside dev_data->lock. Previous patch converted dev_data lock to mutex. Now its safe to call amd_iommu_iopf_add_device() with dev_data->mutex. Hence move back PCI device capability enablement (ATS, PRI, PASID) and IOPF enablement code inside the lock. Also in attach_device(), update 'dev_data->domain' at the end so that error handling becomes simple. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-11-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:46 +01:00
Vasant Hegde	e843aedbeb	iommu/amd: Convert dev_data lock from spinlock to mutex Currently in attach device path it takes dev_data->spinlock. But as per design attach device path can sleep. Also if device is PRI capable then it adds device to IOMMU fault handler queue which takes mutex. Hence currently PRI enablement is done outside dev_data lock. Covert dev_data lock from spinlock to mutex so that it follows the design and also PRI enablement can be done properly. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-10-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:45 +01:00
Vasant Hegde	4b18ef8491	iommu/amd: Rearrange attach device code attach_device() is just holding lock and calling do_attach(). There is not need to have another function. Just move do_attach() code to attach_device(). Similarly move do_detach() code to detach_device(). Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-9-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:45 +01:00
Vasant Hegde	d6b47dec36	iommu/amd: Reduce domain lock scope in attach device path Currently attach device path takes protection domain lock followed by dev_data lock. Most of the operations in this function is specific to device data except pdom_attach_iommu() where it updates protection domain structure. Hence reduce the scope of protection domain lock. Note that this changes the locking order. Now it takes device lock before taking doamin lock (group->mutex -> dev_data->lock -> pdom->lock). dev_data->lock is used only in device attachment path. So changing order is fine. It will not create any issue. Finally move numa node assignment to pdom_attach_iommu(). Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-8-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:44 +01:00
Vasant Hegde	07bbd660db	iommu/amd: Do not detach devices in domain free path All devices attached to a protection domain must be freed before calling domain free. Hence do not try to free devices in domain free path. Continue to throw warning if pdom->dev_list is not empty so that any potential issues can be fixed. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-7-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:43 +01:00
Vasant Hegde	b73c698fd5	iommu/amd: Remove unused amd_iommus variable protection_domain structure is updated to use xarray to track the IOMMUs attached to the domain. Now domain flush code is not using amd_iommus. Hence remove this unused variable. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-6-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:43 +01:00
Vasant Hegde	d16041124d	iommu/amd: xarray to track protection_domain->iommu list Use xarray to track IOMMU attached to protection domain instead of static array of MAX_IOMMUS. Also add lockdep assertion. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-5-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:42 +01:00
Vasant Hegde	743a4bae9f	iommu/amd: Remove protection_domain.dev_cnt variable protection_domain->dev_list tracks list of attached devices to domain. We can use list_* functions on dev_list to get device count. Hence remove 'dev_cnt' variable. No functional change intended. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:41 +01:00
Vasant Hegde	2fcab2deeb	iommu/amd: Use ida interface to manage protection domain ID Replace custom domain ID allocator with IDA interface. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:40 +01:00
Vasant Hegde	016991606a	iommu/amd/pgtbl_v2: Take protection domain lock before invalidating TLB Commit `c7fc12354b` ("iommu/amd/pgtbl_v2: Invalidate updated page ranges only") missed to take domain lock before calling amd_iommu_domain_flush_pages(). Fix this by taking protection domain lock before calling TLB invalidation function. Fixes: `c7fc12354b` ("iommu/amd/pgtbl_v2: Invalidate updated page ranges only") Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241030063556.6104-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 11:06:39 +01:00
Joerg Roedel	556af583d2	Merge branch 'core' into amd/amd-vi	2024-10-30 11:02:48 +01:00
Robin Murphy	95b6235e36	iommu: Make bus_iommu_probe() static With the last external caller of bus_iommu_probe() now gone, make it internal as it really should be. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: H. Nikolaus Schaller <hns@goldelico.com> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Beleswar Padhi <b-padhi@ti.com> Link: https://lore.kernel.org/r/a7511a034a27259aff4e14d80a861d3c40fbff1e.1730136799.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 10:55:08 +01:00
Robin Murphy	c9b4a3185f	iommu/omap: Add minimal fwnode support The OMAP driver uses the generic "iommus" DT binding but is the final holdout not implementing a corresponding .of_xlate method. Unfortunately this now results in __iommu_probe_device() failing to find ops due to client devices missing the expected IOMMU fwnode association. The legacy DT parsing in omap_iommu_probe_device() could probably all be delegated to generic code now, but for the sake of an immediate fix, just add a minimal .of_xlate implementation to allow client fwspecs to be created appropriately, and so the ops lookup to work again. This means we also need to register the additional instances on DRA7 so that of_iommu_xlate() doesn't defer indefinitely waiting for their ops either, but we'll continue to hide them from sysfs just in case. This also renders the bus_iommu_probe() call entirely redundant. Reported-by: Beleswar Padhi <b-padhi@ti.com> Link: https://lore.kernel.org/linux-iommu/0dbde87b-593f-4b14-8929-b78e189549ad@ti.com/ Reported-by: H. Nikolaus Schaller <hns@goldelico.com> Link: https://lore.kernel.org/linux-media/A7C284A9-33A5-4E21-9B57-9C4C213CC13F@goldelico.com/ Fixes: `17de3f5fdd` ("iommu: Retire bus ops") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: H. Nikolaus Schaller <hns@goldelico.com> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Beleswar Padhi <b-padhi@ti.com> Link: https://lore.kernel.org/r/cfd766f96bc799e32b97f4664707adbcf99097b0.1730136799.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-30 10:55:07 +01:00
Will Deacon	5492f0c408	iommu/tegra241-cmdqv: Fix unused variable warning While testing some io-pgtable changes, I ran into a compiler warning from the Tegra CMDQ driver: drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c:803:23: warning: unused variable 'cmdqv_debugfs_dir' [-Wunused-variable] 803 \| static struct dentry *cmdqv_debugfs_dir; \| ^~~~~~~~~~~~~~~~~ 1 warning generated. Guard the variable declaration with CONFIG_IOMMU_DEBUGFS to silence the warning. Signed-off-by: Will Deacon <will@kernel.org>	2024-10-29 15:58:24 +00:00
Pratyush Brahma	229e6ee43d	iommu/arm-smmu: Defer probe of clients after smmu device bound Null pointer dereference occurs due to a race between smmu driver probe and client driver probe, when of_dma_configure() for client is called after the iommu_device_register() for smmu driver probe has executed but before the driver_bound() for smmu driver has been called. Following is how the race occurs: T1:Smmu device probe T2: Client device probe really_probe() arm_smmu_device_probe() iommu_device_register() really_probe() platform_dma_configure() of_dma_configure() of_dma_configure_id() of_iommu_configure() iommu_probe_device() iommu_init_device() arm_smmu_probe_device() arm_smmu_get_by_fwnode() driver_find_device_by_fwnode() driver_find_device() next_device() klist_next() /* null ptr assigned to smmu / / null ptr dereference while smmu->streamid_mask */ driver_bound() klist_add_tail() When this null smmu pointer is dereferenced later in arm_smmu_probe_device, the device crashes. Fix this by deferring the probe of the client device until the smmu device has bound to the arm smmu driver. Fixes: `021bb8420d` ("iommu/arm-smmu: Wire up generic configuration support") Cc: stable@vger.kernel.org Co-developed-by: Prakash Gupta <quic_guptap@quicinc.com> Signed-off-by: Prakash Gupta <quic_guptap@quicinc.com> Signed-off-by: Pratyush Brahma <quic_pbrahma@quicinc.com> Link: https://lore.kernel.org/r/20241004090428.2035-1-quic_pbrahma@quicinc.com [will: Add comment] Signed-off-by: Will Deacon <will@kernel.org>	2024-10-29 15:28:06 +00:00
Mostafa Saleh	d64c805337	iommu/io-pgtable-arm: Add self test for the last page in the IAS Add a case in the selftests that can detect some bugs with concatenated page tables, where it maps the biggest supported page size at the end of the IAS, this test would fail without the previous fix. Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241024162516.2005652-3-smostafa@google.com Signed-off-by: Will Deacon <will@kernel.org>	2024-10-29 15:28:06 +00:00
Mostafa Saleh	d71fa842d3	iommu/io-pgtable-arm: Fix stage-2 map/unmap for concatenated tables ARM_LPAE_LVL_IDX() takes into account concatenated PGDs and can return an index spanning multiple page-table pages given a sufficiently large input address. However, when the resulting index is used to calculate the number of remaining entries in the page, the possibility of concatenation is ignored and we end up computing a negative upper bound: max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start; On the map path, this results in a negative 'mapped' value being returned but on the unmap path we can leak child tables if they are skipped in __arm_lpae_free_pgtable(). Introduce an arm_lpae_max_entries() helper to convert a table index into the remaining number of entries within a single page-table page. Cc: <stable@vger.kernel.org> Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241024162516.2005652-2-smostafa@google.com [will: Tweaked comment and commit message] Signed-off-by: Will Deacon <will@kernel.org>	2024-10-29 15:27:57 +00:00
Jason Gunthorpe	4490ccc45f	iommu: Create __iommu_alloc_identity_domain() Consolidate all the code to create an IDENTITY domain into one function. This removes the legacy __iommu_domain_alloc() path from all paths, and preps it for final removal. BLOCKED/IDENTITY/PAGING are now always allocated via a type specific function. [Joerg: Actually remove __iommu_domain_alloc()] Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241028093810.5901-13-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:37:03 +01:00
Jason Gunthorpe	4208849ec7	iommu: Put domain allocation in __iommu_group_alloc_blocking_domain() There is no longer a reason to call __iommu_domain_alloc() to allocate the blocking domain. All drivers that support a native blocking domain provide it via the ops, for other drivers we should call iommu_paging_domain_alloc(). __iommu_group_alloc_blocking_domain() is the only place that allocates an BLOCKED domain, so move the ops->blocked_domain logic there. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241028093810.5901-12-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:08:23 +01:00
Vasant Hegde	4402f2627d	iommu/amd: Implement global identity domain Implement global identity domain. All device groups in identity domain will share this domain. In attach device path, based on device capability it will allocate per device domain ID and GCR3 table. So that it can support SVA. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241028093810.5901-11-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:08:22 +01:00
Vasant Hegde	ce2cd17546	iommu/amd: Enhance amd_iommu_domain_alloc_user() Previous patch enhanced core layer to check device PASID capability and pass right flags to ops->domain_alloc_user(). Enhance amd_iommu_domain_alloc_user() to allocate domain with appropriate page table based on flags parameter. - If flags is empty then allocate domain with default page table type. This will eventually replace ops->domain_alloc(). For UNMANAGED domain, core will call this interface with flags=0. So AMD driver will continue to allocate V1 page table. - If IOMMU_HWPT_ALLOC_PASID flags is passed then allocate domain with v2 page table. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241028093810.5901-10-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:08:22 +01:00
Vasant Hegde	a005ef62f9	iommu/amd: Pass page table type as param to pdom_setup_pgtable() Current code forces v1 page table for UNMANAGED domain and global page table type (amd_iommu_pgtable) for rest of paging domain. Following patch series adds support for domain_alloc_paging() ops. Also enhances domain_alloc_user() to allocate page table based on 'flags. Hence pass page table type as parameter to pdomain_setup_pgtable(). So that caller can decide right page table type. Also update dma_max_address() to take pgtable as parameter. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jacob Pan <jacob.pan@linux.microsoft.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241028093810.5901-9-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:08:21 +01:00
Vasant Hegde	b3c989083d	iommu/amd: Separate page table setup from domain allocation Currently protection_domain_alloc() allocates domain and also sets up page table. Page table setup is required for PAGING domain only. Domain type like SVA doesn't need page table. Hence move page table setup code to separate function. Also SVA domain allocation path does not call pdom_setup_pgtable(). Hence remove IOMMU_DOMAIN_SVA type check. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jacob Pan <jacob.pan@linux.microsoft.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241028093810.5901-8-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:08:21 +01:00
Vasant Hegde	d15f55d645	iommu/amd: Move V2 page table support check to early_amd_iommu_init() amd_iommu_pgtable validation has to be done before calling iommu_snp_enable(). It can be done immediately after reading IOMMU features. Hence move this check to early_amd_iommu_init(). Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241028093810.5901-7-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:08:20 +01:00
Vasant Hegde	b0ffdb23e9	iommu/amd: Add helper function to check GIOSUP/GTSUP amd_iommu_gt_ppr_supported() only checks for GTSUP. To support PASID with V2 page table we need GIOSUP as well. Hence add new helper function to check GIOSUP/GTSUP. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241028093810.5901-6-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:08:20 +01:00
Vasant Hegde	60c30aa6af	iommu/arm-smmu-v3: Enhance domain_alloc_user() to allocate PASID capable domain Core layer is modified to call domain_alloc_user() to allocate PASID capable domain. Enhance arm_smmu_domain_alloc_user() to allocate PASID capable domain based on the 'flags' parameter. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241028093810.5901-5-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:08:19 +01:00
Jason Gunthorpe	b7a0855eb9	iommu: Add new flag to explictly request PASID capable domain Introduce new flag (IOMMU_HWPT_ALLOC_PASID) to domain_alloc_users() ops. If IOMMU supports PASID it will allocate domain. Otherwise return error. In error path check for -EOPNOTSUPP and try to allocate non-PASID domain so that DMA-API mode work fine for drivers which does not support PASID as well. Also modify __iommu_group_alloc_default_domain() to call iommu_paging_domain_alloc_flags() with appropriate flag when allocating paging domain. Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241028093810.5901-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:08:18 +01:00
Jason Gunthorpe	20858d4ebb	iommu: Introduce iommu_paging_domain_alloc_flags() Currently drivers calls iommu_paging_domain_alloc(dev) to get an UNMANAGED domain. This is not sufficient to support PASID with UNMANAGED domain as some HW like AMD requires certain page table type to support PASIDs. Also the domain_alloc_paging op only passes device as param for domain allocation. This is not sufficient for AMD driver to decide the right page table. Instead of extending ops->domain_alloc_paging() it was decided to enhance ops->domain_alloc_user() so that caller can pass various additional flags. Hence add iommu_paging_domain_alloc_flags() API which takes flags as parameter. Caller can pass additional parameter to indicate type of domain required, etc. iommu_paging_domain_alloc_flags() internally calls appropriate callback function to allocate a domain. Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca> [Added description - Vasant] Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241028093810.5901-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:07:37 +01:00
Jason Gunthorpe	541b967f5a	iommu: Refactor __iommu_domain_alloc() Following patch will introduce iommu_paging_domain_alloc_flags() API. Hence move domain init code to separate function so that it can be reused. Also move iommu_get_dma_cookie() setup iommu_setup_default_domain() as it is required in DMA API mode only. Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca> [Split the patch and added description - Vasant] Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241028093810.5901-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:06:17 +01:00
Lu Baolu	f6440fcc9c	iommu: Remove iommu_domain_alloc() The iommu_domain_alloc() interface is no longer used in the tree anymore. Remove it to avoid dead code. There is increasing demand for supporting multiple IOMMU drivers, and this is the last bus-based thing standing in the way of that. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241009041147.28391-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 10:04:42 +01:00
Jason Gunthorpe	e3a682eaf2	iommu/amd: Fix corruption when mapping large pages from 0 If a page is mapped starting at 0 that is equal to or larger than can fit in the current mode (number of table levels) it results in corrupting the mapping as the following logic assumes the mode is correct for the page size being requested. There are two issues here, the check if the address fits within the table uses the start address, it should use the last address to ensure that last byte of the mapping fits within the current table mode. The second is if the mapping is exactly the size of the full page table it has to add another level to instead hold a single IOPTE for the large size. Since both corner cases require a 0 IOVA to be hit and doesn't start until a page size of 2^48 it is unlikely to ever hit in a real system. Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-27ab08d646a1+29-amd_0map_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:55:48 +01:00
Jason Gunthorpe	69e5a17511	iommu: Remove useless flush from iommu_create_device_direct_mappings() These days iommu_map() does not require external flushing, it always internally handles any required flushes. Since iommu_create_device_direct_mappings() only calls iommu_map(), remove the extra call. Since this is the last call site for iommu_flush_iotlb_all() remove it too. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-bb6c694e1b07+a29e1-iommu_no_flush_all_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:52:11 +01:00
Bartosz Golaszewski	a2f528e914	iommu/sysfs: constify the class struct All functions that take the class address as argument expect a const pointer so we can make the iommu class constant. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241018121725.61128-1-brgl@bgdev.pl Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:50:24 +01:00
Konrad Adamczyk	3ab21ad3d5	iommu/mediatek: Add PGTABLE_PA_35_EN to mt8186 platform data The MT8186 chip supports 35-bit physical addresses in page table [1]. Set this platform flag. [1] MT8186G_Application Processor Functional Specification_v1.0 Signed-off-by: Konrad Adamczyk <konrada@google.com> Reviewed-by: Yong Wu <yong.wu@mediatek.com> Link: https://lore.kernel.org/r/20241017112036.368772-1-konrada@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:49:11 +01:00
Vasant Hegde	3f6eeada69	iommu/amd: Do not try copy old DTE resume path In suspend/resume path, no need to copy old DTE (early_enable_iommus()). Just need to reload IOMMU hardware. This is the side effect of commit `3ac3e5ee5e` ("iommu/amd: Copy old trans table from old kernel") which changed early_enable_iommus() but missed to fix enable_iommus(). Resume path continue to work as 'amd_iommu_pre_enabled' is set to false and copy_device_table() will fail. It will just re-loaded IOMMU. Hence I think we don't need to backport this to stable tree. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20241016084958.99727-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:47:02 +01:00
Tomasz Jeznach	488ffbf181	iommu/riscv: Paging domain support Introduce first-stage address translation support. Page table configured by the IOMMU driver will use the highest mode implemented by the hardware, unless not known at the domain allocation time falling back to the CPU’s MMU page mode. This change introduces IOTINVAL.VMA command, required to invalidate any cached IOATC entries after mapping is updated and/or removed from the paging domain. Invalidations for the non-leaf page entries use IOTINVAL for all addresses assigned to the protection domain for hardware not supporting more granular non-leaf page table cache invalidations. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/1109202d389f51c7121cb1460eb2f21429b9bd5d.1729059707.git.tjeznach@rivosinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:46:30 +01:00
Tomasz Jeznach	856c0cfe5c	iommu/riscv: Command and fault queue support Introduce device command submission and fault reporting queues, as described in Chapter 3.1 and 3.2 of the RISC-V IOMMU Architecture Specification. Command and fault queues are instantiated in contiguous system memory local to IOMMU device domain, or mapped from fixed I/O space provided by the hardware implementation. Detection of the location and maximum allowed size of the queue utilize WARL properties of queue base control register. Driver implementation will try to allocate up to 128KB of system memory, while respecting hardware supported maximum queue size. Interrupts allocation is based on interrupt vectors availability and distributed to all queues in simple round-robin fashion. For hardware Implementation with fixed event type to interrupt vector assignment IVEC WARL property is used to discover such mappings. Address translation, command and queue fault handling in this change is limited to simple fault reporting without taking any action. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/c4735fb6829053eff37ce1bcca4906192afd743c.1729059707.git.tjeznach@rivosinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:46:29 +01:00
Tomasz Jeznach	1bac10c557	iommu/riscv: Device directory management. Introduce device context allocation and device directory tree management including capabilities discovery sequence, as described in Chapter 2.1 of the RISC-V IOMMU Architecture Specification. Device directory mode will be auto detected using DDTP WARL property, using highest mode supported by the driver and hardware. If none supported can be configured, driver will fall back to global pass-through. First level DDTP page can be located in I/O (detected using DDTP WARL) and system memory. Only simple identity and blocking protection domains are supported by this implementation. Co-developed-by: Nick Kossifidis <mick@ics.forth.gr> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/e1c763aeccd2c05fd4ad3a32f6f2ff3b3148d907.1729059707.git.tjeznach@rivosinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:46:28 +01:00
Tomasz Jeznach	822e8bc685	iommu/riscv: Enable IOMMU registration and device probe. Advertise IOMMU device and its core API. Only minimal implementation for single identity domain type, without per-group domain protection. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/ba79c8eb9c7f1cd9a8961a1b048e3991ee9a2b05.1729059707.git.tjeznach@rivosinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:46:27 +01:00
Tomasz Jeznach	68682e9578	iommu/riscv: Add RISC-V IOMMU PCIe device driver Introduce device driver for PCIe implementation of RISC-V IOMMU architected hardware. IOMMU hardware and system support for MSI or MSI-X is required by this implementation. Vendor and device identifiers used in this patch matches QEMU implementation of the RISC-V IOMMU PCIe device, from Rivos VID (0x1efd) range allocated by the PCI-SIG. MAINTAINERS \| added iommu-pci.c already covered by matching pattern. Link: https://lore.kernel.org/qemu-devel/20240307160319.675044-1-dbarboza@ventanamicro.com/ Co-developed-by: Nick Kossifidis <mick@ics.forth.gr> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/12f3bdbe519ebb7ca482191e7334d38b25b8ae8f.1729059707.git.tjeznach@rivosinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:46:26 +01:00
Tomasz Jeznach	5c0ebbd3c6	iommu/riscv: Add RISC-V IOMMU platform device driver Introduce platform device driver for implementation of RISC-V IOMMU architected hardware. Hardware interface definition located in file iommu-bits.h is based on ratified RISC-V IOMMU Architecture Specification version 1.0.0. This patch implements platform device initialization, early check and configuration of the IOMMU interfaces and enables global pass-through address translation mode (iommu_mode == BARE), without registering hardware instance in the IOMMU subsystem. Link: https://github.com/riscv-non-isa/riscv-iommu Co-developed-by: Nick Kossifidis <mick@ics.forth.gr> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/2f2e4530c0ee4a81385efa90f1da932f5179f3fb.1729059707.git.tjeznach@rivosinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2024-10-29 09:46:23 +01:00
Steve Sistare	976a40c075	iommufd: File mappings for mdev Support file mappings for mediated devices, aka mdevs. Access is initiated by the vfio_pin_pages() and vfio_dma_rw() kernel interfaces. Link: https://patch.msgid.link/r/1729861919-234514-9-git-send-email-steven.sistare@oracle.com Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-10-28 13:24:24 -03:00

... 4 5 6 7 8 ...

5989 Commits