linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-08-29 02:59:13 +00:00

Author	SHA1	Message	Date
Ethan Milon	3141153816	iommu/vt-d: Fix missing PASID in dev TLB flush with cache_tag_flush_all The function cache_tag_flush_all() was originally implemented with incorrect device TLB invalidation logic that does not handle PASID, in commit `c4d27ffaa8` ("iommu/vt-d: Add cache tag invalidation helpers") This causes regressions where full address space TLB invalidations occur with a PASID attached, such as during transparent hugepage unmapping in SVA configurations or when calling iommu_flush_iotlb_all(). In these cases, the device receives a TLB invalidation that lacks PASID. This incorrect logic was later extracted into cache_tag_flush_devtlb_all(), in commit `3297d047cd` ("iommu/vt-d: Refactor IOTLB and Dev-IOTLB flush for batching") The fix replaces the call to cache_tag_flush_devtlb_all() with cache_tag_flush_devtlb_psi(), which properly handles PASID. Fixes: `4f609dbff5` ("iommu/vt-d: Use cache helpers in arch_invalidate_secondary_tlbs") Fixes: `4e589a5368` ("iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all") Signed-off-by: Ethan Milon <ethan.milon@eviden.com> Link: https://lore.kernel.org/r/20250708214821.30967-1-ethan.milon@eviden.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250714045028.958850-11-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:04 +01:00
Jason Gunthorpe	85cfaacc99	iommu/vt-d: Split paging_domain_compatible() Make First/Second stage specific functions that follow the same pattern in intel_iommu_domain_alloc_first/second_stage() for computing EOPNOTSUPP. This makes the code easier to understand as if we couldn't create a domain with the parameters for this IOMMU instance then we certainly are not compatible with it. Check superpage support directly against the per-stage cap bits and the pgsize_bitmap. Add a note that the force_snooping is read without locking. The locking needs to cover the compatible check and the add of the device to the list. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250714045028.958850-10-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:04 +01:00
Jason Gunthorpe	0fa6f08934	iommu/vt-d: Split intel_iommu_enforce_cache_coherency() First Stage and Second Stage have very different ways to deny no-snoop. The first stage uses the PGSNP bit which is global per-PASID so enabling requires loading new PASID entries for all the attached devices. Second stage uses a bit per PTE, so enabling just requires telling future maps to set the bit. Since we now have two domain ops we can have two functions that can directly code their required actions instead of a bunch of logic dancing around use_first_level. Combine domain_set_force_snooping() into the new functions since they are the only caller. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250714045028.958850-9-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:04 +01:00
Jason Gunthorpe	b33125296b	iommu/vt-d: Create unique domain ops for each stage Use the domain ops pointer to tell what kind of domain it is instead of the internal use_first_level indication. This also protects against wrongly using a SVA/nested/IDENTITY/BLOCKED domain type in places they should not be. The only remaining uses of use_first_level outside the paging domain are in paging_domain_compatible() and intel_iommu_enforce_cache_coherency(). Thus, remove the useless sets of use_first_level in intel_svm_domain_alloc() and intel_iommu_domain_alloc_nested(). None of the unique ops for these domain types ever reference it on their call chains. Add a WARN_ON() check in domain_context_mapping_one() as it only works with second stage. This is preparation for iommupt which will have different ops for each of the stages. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250714045028.958850-8-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:03 +01:00
Jason Gunthorpe	b9434ba97c	iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags() Create stage specific functions that check the stage specific conditions if each stage can be supported. Have intel_iommu_domain_alloc_paging_flags() call both stages in sequence until one does not return EOPNOTSUPP and prefer to use the first stage if available and suitable for the requested flags. Move second stage only operations like nested_parent and dirty_tracking into the second stage function for clarity. Move initialization of the iommu_domain members into paging_domain_alloc(). Drop initialization of domain->owner as the callers all do it. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250714045028.958850-7-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:03 +01:00
Jason Gunthorpe	5c3687d578	iommu/vt-d: Do not wipe out the page table NID when devices detach The NID is used to control which NUMA node memory for the page table is allocated it from. It should be a permanent property of the page table when it was allocated and not change during attach/detach of devices. Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Fixes: `7c204426b8` ("iommu/vt-d: Add domain_alloc_paging support") Link: https://lore.kernel.org/r/20250714045028.958850-6-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:03 +01:00
Jason Gunthorpe	00939bebe5	iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free() It has only one caller, no need for two functions. Correct the WARN_ON() error handling to leak the entire page table if the HW is still referencing it so we don't UAF during WARN_ON recovery. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250714045028.958850-5-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:03 +01:00
Jason Gunthorpe	cd0d0e4e48	iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Pass the phys_addr_t down through the call chain from the top instead of passing a pgd_t * KVA. This moves the __pa() into domain_setup_first_level() which is the first function to obtain the pgd from the IOMMU page table in this call chain. The SVA flow is also adjusted to get the pa of the mm->pgd. iommput will move the __pa() into iommupt code, it never shares the KVA of the page table with the driver. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250714045028.958850-4-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:03 +01:00
Lu Baolu	12724ce3fe	iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes The iotlb_sync_map iommu ops allows drivers to perform necessary cache flushes when new mappings are established. For the Intel iommu driver, this callback specifically serves two purposes: - To flush caches when a second-stage page table is attached to a device whose iommu is operating in caching mode (CAP_REG.CM==1). - To explicitly flush internal write buffers to ensure updates to memory- resident remapping structures are visible to hardware (CAP_REG.RWBF==1). However, in scenarios where neither caching mode nor the RWBF flag is active, the cache_tag_flush_range_np() helper, which is called in the iotlb_sync_map path, effectively becomes a no-op. Despite being a no-op, cache_tag_flush_range_np() involves iterating through all cache tags of the iommu's attached to the domain, protected by a spinlock. This unnecessary execution path introduces overhead, leading to a measurable I/O performance regression. On systems with NVMes under the same bridge, performance was observed to drop from approximately ~6150 MiB/s down to ~4985 MiB/s. Introduce a flag in the dmar_domain structure. This flag will only be set when iotlb_sync_map is required (i.e., when CM or RWBF is set). The cache_tag_flush_range_np() is called only for domains where this flag is set. This flag, once set, is immutable, given that there won't be mixed configurations in real-world scenarios where some IOMMUs in a system operate in caching mode while others do not. Theoretically, the immutability of this flag does not impact functionality. Reported-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2115738 Link: https://lore.kernel.org/r/20250701171154.52435-1-ioanna-maria.alifieraki@canonical.com Fixes: `129dab6e12` ("iommu/vt-d: Use cache_tag_flush_range_np() in iotlb_sync_map") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250703031545.3378602-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20250714045028.958850-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:03 +01:00
Vineeth Pillai (Google)	bd26cd9d81	iommu/vt-d: Remove the CONFIG_X86 wrapping from iommu init hook iommu init hook is wrapped in CONFI_X86 and is a remnant of dmar.c when it was a common code in "drivers/pci/dmar.c". This was added in commit (`9d5ce73a64` x86: intel-iommu: Convert detect_intel_iommu to use iommu_init hook) Now this is built only for x86. This config wrap could be removed. Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250616131740.3499289-1-vineeth@bitbyteword.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250714045028.958850-2-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2025-07-14 11:18:03 +01:00
Nicolin Chen	32b2d3a57e	iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support Add a new vEVENTQ type for VINTFs that are assigned to the user space. Simply report the two 64-bit LVCMDQ_ERR_MAPs register values. Link: https://patch.msgid.link/r/68161a980da41fa5022841209638aeff258557b5.1752126748.git.nicolinc@nvidia.com Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:36 -03:00
Nicolin Chen	4dc0d12474	iommu/tegra241-cmdqv: Add user-space use support The CMDQV HW supports a user-space use for virtualization cases. It allows the VM to issue guest-level TLBI or ATC_INV commands directly to the queue and executes them without a VMEXIT, as HW will replace the VMID field in a TLBI command and the SID field in an ATC_INV command with the preset VMID and SID. This is built upon the vIOMMU infrastructure by allowing VMM to allocate a VINTF (as a vIOMMU object) and assign VCMDQs (HW QUEUE objs) to the VINTF. So firstly, replace the standard vSMMU model with the VINTF implementation but reuse the standard cache_invalidate op (for unsupported commands) and the standard alloc_domain_nested op (for standard nested STE). Each VINTF has two 64KB MMIO pages (128B per logical VCMDQ): - Page0 (directly accessed by guest) has all the control and status bits. - Page1 (trapped by VMM) has guest-owned queue memory location/size info. VMM should trap the emulated VINTF0's page1 of the guest VM for the guest- level VCMDQ location/size info and forward that to the kernel to translate to a physical memory location to program the VCMDQ HW during an allocation call. Then, it should mmap the assigned VINTF's page0 to the VINTF0 page0 of the guest VM. This allows the guest OS to read and write the guest-own VINTF's page0 for direct control of the VCMDQ HW. For ATC invalidation commands that hold an SID, it requires all devices to register their virtual SIDs to the SID_MATCH registers and their physical SIDs to the pairing SID_REPLACE registers, so that HW can use those as a lookup table to replace those virtual SIDs with the correct physical SIDs. Thus, implement the driver-allocated vDEVICE op with a tegra241_vintf_sid structure to allocate SID_REPLACE and to program the SIDs accordingly. This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to the standard SMMUv3 operating in the nested translation mode trapping CMDQ for TLBI and ATC_INV commands, this gives a huge performance improvement: 70% to 90% reductions of invalidation time were measured by various DMA unmap tests running in a guest OS. Link: https://patch.msgid.link/r/fb0eab83f529440b6aa181798912a6f0afa21eb0.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:36 -03:00
Nicolin Chen	81f81db632	iommu/tegra241-cmdqv: Do not statically map LVCMDQs To simplify the mappings from global VCMDQs to VINTFs' LVCMDQs, the design chose to do static allocations and mappings in the global reset function. However, with the user-owned VINTF support, it exposes a security concern: if user space VM only wants one LVCMDQ for a VINTF, statically mapping two or more LVCMDQs creates a hidden VCMDQ that user space could DoS attack by writing random stuff to overwhelm the kernel with unhandleable IRQs. Thus, to support the user-owned VINTF feature, a LVCMDQ mapping has to be done dynamically. HW allows pre-assigning global VCMDQs in the CMDQ_ALLOC registers, without finalizing the mappings by keeping CMDQV_CMDQ_ALLOCATED=0. So, add a pair of map/unmap helper that simply sets/clears that bit. For kernel-owned VINTF0, move LVCMDQ mappings to tegra241_vintf_hw_init(), and the unmappings to tegra241_vintf_hw_deinit(). For user-owned VINTFs that will be added, the mappings/unmappings will be on demand upon an LVCMDQ allocation from the user space. However, the dynamic LVCMDQ mapping/unmapping can complicate the timing of calling tegra241_vcmdq_hw_init/deinit(), which write LVCMDQ address space, i.e. requiring LVCMDQ to be mapped. Highlight that with a note to the top of either of them. Link: https://patch.msgid.link/r/be115a8f75537632daf5995b3e583d8a76553fba.1752126748.git.nicolinc@nvidia.com Acked-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:36 -03:00
Nicolin Chen	589899ee29	iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() The current flow of tegra241_cmdqv_remove_vintf() is: 1. For each LVCMDQ, tegra241_vintf_remove_lvcmdq(): a. Disable the LVCMDQ HW b. Release the LVCMDQ SW resource 2. For current VINTF, tegra241_vintf_hw_deinit(): c. Disable all LVCMDQ HWs d. Disable VINTF HW Obviously, the step 1.a and the step 2.c are redundant. Since tegra241_vintf_hw_deinit() disables all of its LVCMDQ HWs, it could simplify the flow in tegra241_cmdqv_remove_vintf() by calling that first: 1. For current VINTF, tegra241_vintf_hw_deinit(): a. Disable all LVCMDQ HWs b. Disable VINTF HW 2. Release all LVCMDQ SW resources Drop tegra241_vintf_remove_lvcmdq(), and move tegra241_vintf_free_lvcmdq() as the new step 2. Link: https://patch.msgid.link/r/86c97c8c4ee9ca192e7e7fa3007c10399d792ce6.1752126748.git.nicolinc@nvidia.com Acked-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:36 -03:00
Nicolin Chen	1eb468744c	iommu/tegra241-cmdqv: Use request_threaded_irq A vEVENT can be reported only from a threaded IRQ context. Change to using request_threaded_irq to support that. Link: https://patch.msgid.link/r/f160193980e3b273afbd1d9cfc3e360084c05ba6.1752126748.git.nicolinc@nvidia.com Acked-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:35 -03:00
Nicolin Chen	9eb6a666df	iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops This will be used by Tegra241 CMDQV implementation to report a non-default HW info data. Link: https://patch.msgid.link/r/8a3bf5709358eb21aed2e8434534c30ecf83917c.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:35 -03:00
Nicolin Chen	61dd912ee0	iommu/arm-smmu-v3-iommufd: Add vsmmu_size/type and vsmmu_init impl ops An impl driver might want to allocate its own type of vIOMMU object or the standard IOMMU_VIOMMU_TYPE_ARM_SMMUV3 by setting up its own SW/HW bits, as the tegra241-cmdqv driver will add IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV. Add vsmmu_size/type and vsmmu_init to struct arm_smmu_impl_ops. Prioritize them in arm_smmu_get_viommu_size() and arm_vsmmu_init(). Link: https://patch.msgid.link/r/375ac2b056764534bb7c10ecc4f34a0bae82b108.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:35 -03:00
Nicolin Chen	a9f10bab2e	iommufd: Allow an input data_type via iommu_hw_info The iommu_hw_info can output via the out_data_type field the vendor data type from a driver, but this only allows driver to report one data type. Now, with SMMUv3 having a Tegra241 CMDQV implementation, it has two sets of types and data structs to report. One way to support that is to use the same type field bidirectionally. Reuse the same field by adding an "in_data_type", allowing user space to request for a specific type and to get the corresponding data. For backward compatibility, since the ioctl handler has never checked an input value, add an IOMMU_HW_INFO_FLAG_INPUT_TYPE to switch between the old output-only field and the new bidirectional field. Link: https://patch.msgid.link/r/887378a7167e1786d9d13cde0c36263ed61823d7.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:35 -03:00
Nicolin Chen	62622a8753	iommu: Allow an input type in hw_info op The hw_info uAPI will support a bidirectional data_type field that can be used as an input field for user space to request for a specific info data. To prepare for the uAPI update, change the iommu layer first: - Add a new IOMMU_HW_INFO_TYPE_DEFAULT as an input, for which driver can output its only (or firstly) supported type - Update the kdoc accordingly - Roll out the type validation in the existing drivers Link: https://patch.msgid.link/r/00f4a2d3d930721f61367014717b3ba2d1e82a81.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:35 -03:00
Nicolin Chen	80478a2b45	iommufd/selftest: Add coverage for the new mmap interface Extend the loopback test to a new mmap page. Link: https://patch.msgid.link/r/b02b1220c955c3cf9ea5dd9fe9349ab1b4f8e20b.1752126748.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:35 -03:00
Nicolin Chen	56e9a0d8e5	iommufd: Add mmap interface For vIOMMU passing through HW resources to user space (VMs), allowing a VM to control the passed through HW directly by accessing hardware registers, add an mmap infrastructure to map the physical MMIO pages to user space. Maintain a maple tree per ictx as a translation table managing mmappable regions, from an allocated for-user mmap offset to an iommufd_mmap struct, where it stores the real physical address range for io_remap_pfn_range(). Keep track of the lifecycle of the mmappable region by taking refcount of its owner, so as to enforce user space to unmap the region first before it can destroy its owner object. To allow an IOMMU driver to add and delete mmappable regions onto/from the maple tree, add iommufd_viommu_alloc/destroy_mmap helpers. Link: https://patch.msgid.link/r/9a888a326b12aa5fe940083eae1156304e210fe0.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:35 -03:00
Nicolin Chen	20896914da	iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC Some simple tests for IOMMUFD_CMD_HW_QUEUE_ALLOC infrastructure covering the new iommufd_hw_queue_depend/undepend() helpers. Link: https://patch.msgid.link/r/e8a194d187d7ef445f43e4a3c04fb39472050afd.1752126748.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 11:09:26 -03:00
Nicolin Chen	0b37d892d0	iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers NVIDIA Virtual Command Queue is one of the iommufd users exposing vIOMMU features to user space VMs. Its hardware has a strict rule when mapping and unmapping multiple global CMDQVs to/from a VM-owned VINTF, requiring mappings in ascending order and unmappings in descending order. The tegra241-cmdqv driver can apply the rule for a mapping in the LVCMDQ allocation handler. However, it can't do the same for an unmapping since user space could start random destroy calls breaking the rule, while the destroy op in the driver level can't reject a destroy call as it returns void. Add iommufd_hw_queue_depend/undepend for-driver helpers, allowing LVCMDQ allocator to refcount_inc() a sibling LVCMDQ object and LVCMDQ destroyer to refcount_dec(), so that iommufd core will help block a random destroy call that breaks the rule. This is a bit of compromise, because a driver might end up with abusing the API that deadlocks the objects. So restrict the API to a dependency between two driver-allocated objects of the same type, as iommufd would unlikely build any core-level dependency in this case. And encourage to use the macro version that currently supports the HW QUEUE objects only. Link: https://patch.msgid.link/r/2735c32e759c82f2e6c87cb32134eaf09b7589b5.1752126748.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 11:09:26 -03:00
Nicolin Chen	2238ddc2b0	iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl Introduce a new IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl for user space to allocate a HW QUEUE object for a vIOMMU specific HW-accelerated queue, e.g.: - NVIDIA's Virtual Command Queue - AMD vIOMMU's Command Buffer, Event Log Buffers, and PPR Log Buffers Since this is introduced with NVIDIA's VCMDQs that access the guest memory in the physical address space, add an iommufd_hw_queue_alloc_phys() helper that will create an access object to the queue memory in the IOAS, to avoid the mappings of the guest memory from being unmapped, during the life cycle of the HW queue object. AMD's HW will need an hw_queue_init op that is mutually exclusive with the hw_queue_init_phys op, and their case will bypass the access part, i.e. no iommufd_hw_queue_alloc_phys() call. Link: https://patch.msgid.link/r/dab4ace747deb46c1fe70a5c663307f46990ae56.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 11:09:26 -03:00
Nicolin Chen	ed42eee797	iommufd/viommu: Add driver-defined vDEVICE support NVIDIA VCMDQ driver will have a driver-defined vDEVICE structure and do some HW configurations with that. To allow IOMMU drivers to define their own vDEVICE structures, move the struct iommufd_vdevice to the public header and provide a pair of viommu ops, similar to get_viommu_size and viommu_init. Doing this, however, creates a new window between the vDEVICE allocation and its driver-level initialization, during which an abort could happen but it can't invoke a driver destroy function from the struct viommu_ops since the driver structure isn't initialized yet. vIOMMU object doesn't have this problem, since its destroy op is set via the viommu_ops by the driver viommu_init function. Thus, vDEVICE should do something similar: add a destroy function pointer inside the struct iommufd_vdevice instead of the struct iommufd_viommu_ops. Note that there is unlikely a use case for a type dependent vDEVICE, so a static vdevice_size is probably enough for the near term instead of a get_vdevice_size function op. Link: https://patch.msgid.link/r/1e751c01da7863c669314d8e27fdb89eabcf5605.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 11:09:26 -03:00
Nicolin Chen	27b77ea5fe	iommufd/access: Bypass access->ops->unmap for internal use The access object has been used externally by VFIO mdev devices, allowing them to pin/unpin physical pages (via needs_pin_pages). Meanwhile, a racy unmap can occur in this case, so these devices usually implement an unmap handler, invoked by iommufd_access_notify_unmap(). The new HW queue object will need the same pin/unpin feature, although it (unlike the mdev case) wants to reject any unmap attempt, during its life cycle. Instead, it would not implement an unmap handler. Thus, bypass any access->ops->unmap access call when the access is marked as internal. Also, an area being pinned by an internal access should reject any unmap request. This cannot be done inside iommufd_access_notify_unmap() as it's a per-iopt action. Add a "num_locks" counter in the struct iopt_area, set that in iopt_area_add_access() when the caller is an internal access. Link: https://patch.msgid.link/r/6df9a43febf79c0379091ec59747276ce9d2493b.1752126748.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 11:09:22 -03:00
Easwar Hariharan	c694bc8b61	iommu/amd: Enable PASID and ATS capabilities in the correct order Per the PCIe spec, behavior of the PASID capability is undefined if the value of the PASID Enable bit changes while the Enable bit of the function's ATS control register is Set. Unfortunately, pdev_enable_caps() does exactly that by ordering enabling ATS for the device before enabling PASID. Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Fixes: `eda8c2860a` ("iommu/amd: Enable device ATS/PASID/PRI capabilities independently") Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250703155433.6221-1-eahariha@linux.microsoft.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-07-11 09:13:58 +02:00
Nicolin Chen	1c26c3bbde	iommufd/access: Add internal APIs for HW queue to use The new HW queue object, as an internal iommufd object, wants to reuse the struct iommufd_access to pin some iova range in the iopt. However, an access generally takes the refcount of an ictx. So, in such an internal case, a deadlock could happen when the release of the ictx has to wait for the release of the access first when releasing a hw_queue object, which could wait for the release of the ictx that is refcounted: ictx --releases--> hw_queue --releases--> access ^ \| \|_________________releases________________v To address this, add a set of lightweight internal APIs to unlink the ictx and the access, i.e. no ictx refcounting by the access: ictx --releases--> hw_queue --releases--> access Then, there's no point in setting the access->ictx. So simply define !ictx as an flag for an internal use and add an inline helper. Link: https://patch.msgid.link/r/d8d84bf99cbebec56034b57b966a3d431385b90d.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-10 12:38:51 -03:00
Nicolin Chen	afeaf592c1	iommufd/selftest: Support user_data in mock_viommu_alloc Add a simple user_data for an input-to-output loopback test. Link: https://patch.msgid.link/r/cae4632bb3d98a1efb3b77488fbf81814f2041c6.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-10 12:38:51 -03:00
Nicolin Chen	1976cdf61c	iommufd/viommu: Allow driver-specific user data for a vIOMMU object The new type of vIOMMU for tegra241-cmdqv driver needs a driver-specific user data. So, add data_len/uptr to the iommu_viommu_alloc uAPI and pass it in via the viommu_init iommu op. Link: https://patch.msgid.link/r/2315b0e164b355746387e960745ac9154caec124.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Pranjal Shrivastava <praan@google.com> Acked-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-10 12:38:51 -03:00
Nicolin Chen	c3436d42f8	iommu: Pass in a driver-level user data structure to viommu_init op The new type of vIOMMU for tegra241-cmdqv allows user space VM to use one of its virtual command queue HW resources exclusively. This requires user space to mmap the corresponding MMIO page from kernel space for direct HW control. To forward the mmap info (offset and length), iommufd should add a driver specific data structure to the IOMMUFD_CMD_VIOMMU_ALLOC ioctl, for driver to output the info during the vIOMMU initialization back to user space. Similar to the existing ioctls and their IOMMU handlers, add a user_data to viommu_init op to bridge between iommufd and drivers. Link: https://patch.msgid.link/r/90bd5637dab7f5507c7a64d2c4826e70431e45a4.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-10 12:38:51 -03:00
Nicolin Chen	4b57c057f9	iommu: Use enum iommu_hw_info_type for type in hw_info op Replace u32 to make it clear. No functional changes. Also simplify the kdoc since the type itself is clear enough. Link: https://patch.msgid.link/r/651c50dee8ab900f691202ef0204cd5a43fdd6a2.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-10 12:38:50 -03:00
Nicolin Chen	c50a5de2c4	iommufd/viommu: Explicitly define vdev->virt_id The "id" is too general to get its meaning easily. Rename it explicitly to "virt_id" and update the kdocs for readability. No functional changes. Link: https://patch.msgid.link/r/1fac22d645e6ee555675726faf3798a68315b044.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-10 12:38:50 -03:00
Nicolin Chen	b23e09f999	iommufd: Report unmapped bytes in the error path of iopt_unmap_iova_range There are callers that read the unmapped bytes even when rc != 0. Thus, do not forget to report it in the error path too. Fixes: `8d40205f60` ("iommufd: Add kAPI toward external drivers for kernel access") Link: https://patch.msgid.link/r/e2b61303bbc008ba1a4e2d7c2a2894749b59fdac.1752126748.git.nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-10 12:38:50 -03:00
Nuno Das Neves	faab52b59b	x86/hyperv: Clean up hv_map/unmap_interrupt() return values Fix the return values of these hypercall helpers so they return a negated errno either directly or via hv_result_to_errno(). Update the callers to check for errno instead of using hv_status_success(), and remove redundant error printing. While at it, rearrange some variable declarations to adhere to style guidelines i.e. "reverse fir tree order". Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1751582677-30930-5-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1751582677-30930-5-git-send-email-nunodasneves@linux.microsoft.com>	2025-07-09 23:49:25 +00:00
Lu Baolu	25b1b75bba	iommu/vt-d: Assign devtlb cache tag on ATS enablement Commit <4f1492efb495> ("iommu/vt-d: Revert ATS timing change to fix boot failure") placed the enabling of ATS in the probe_finalize callback. This occurs after the default domain attachment, which is when the ATS cache tag is assigned. Consequently, the device TLB cache tag is missed when the domain is attached, leading to the device TLB not being invalidated in the iommu_unmap paths. Fix this by assigning the CACHE_TAG_DEVTLB cache tag when ATS is enabled. Fixes: `4f1492efb4` ("iommu/vt-d: Revert ATS timing change to fix boot failure") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250625050135.3129955-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20250628100351.3198955-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-07-04 10:33:56 +02:00
Robin Murphy	a695cad695	iommu/mediatek-v1: Tidy up probe_finalize Krzysztof points out that although the driver now supports COMPILE_TEST for other architectures, it does not build cleanly with W=1 where the stubbed-out ARM API can lead to an unused variable warning. Since this is effectively the correct intent of the code in such cases, mark it as __maybe_unused, tidying up some cruft in the process. Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Yong Wu <yong.wu@mediatek.com> Link: https://lore.kernel.org/r/7c78149504900bc6c98a9c48f4418934b72d89ac.1751036478.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-07-04 10:32:43 +02:00
Marc Zyngier	8154f3c0fd	iommu/intel: Convert to msi_create_parent_irq_domain() helper Now that we have a concise helper to create an MSI parent domain, switch the Intel IOMMU remapping over to that. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nam Cao <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241204124549.607054-10-maz@kernel.org Link: https://lore.kernel.org/r/169c793c50be8493cfd9d11affb00e9ed6341c36.1750858125.git.namcao@linutronix.de Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-07-04 10:28:11 +02:00
Marc Zyngier	0eaa67ad3a	iommu/amd: Convert to msi_create_parent_irq_domain() helper Now that we have a concise helper to create an MSI parent domain, switch the AMD IOMMU remapping over to that. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nam Cao <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241204124549.607054-9-maz@kernel.org Link: https://lore.kernel.org/r/92e5ae97a03e4ffc272349d0863cd2cc8f904c44.1750858125.git.namcao@linutronix.de Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-07-04 10:28:10 +02:00
Mikołaj Lenczewski	212c439bdd	iommu/arm: Add BBM Level 2 smmu feature For supporting BBM Level 2 for userspace mappings, we want to ensure that the smmu also supports its own version of BBM Level 2. Luckily, the smmu spec (IHI 0070G 3.21.1.3) is stricter than the aarch64 spec (DDI 0487K.a D8.16.2), so already guarantees that no aborts are raised when BBM level 2 is claimed. Add the feature and testing for it under arm_smmu_sva_supported(). Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250625113435.26849-4-miko.lenczewski@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-06-30 18:09:05 +01:00
Jason Gunthorpe	792ea7b6ca	iommu: Remove ops->pgsize_bitmap No driver uses it now, remove the core code. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/7-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 17:34:11 +02:00
Jason Gunthorpe	53b76df062	iommu/msm: Remove ops->pgsize_bitmap This driver just uses a constant, put it in domain_alloc_paging and use the domain's value instead of ops during finalise. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/0-v1-662aad101e51+45-iommu_rm_ops_pgsize_msm_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 17:34:10 +02:00
Krzysztof Kozlowski	26d1c1f9e3	iommu/omap: Use syscon_regmap_lookup_by_phandle_args Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Except simpler code this annotates within one line that given phandle has arguments, so grepping for code would be easier. There is also no real benefit in printing errors on missing syscon argument, because this is done just too late: runtime check on static/build-time data. Dtschema and Devicetree bindings offer the static/build-time check for this already. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250624-syscon-phandle-args-iommu-v3-2-1a36487d69b8@linaro.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 09:21:52 +02:00
Krzysztof Kozlowski	217d30bb8e	iommu/omap: Drop redundant check if ti,syscon-mmuconfig exists The syscon_regmap_lookup_by_phandle() will fail if property does not exist, so doing of_property_read_bool() earlier is redundant. Drop that check and move error message to syscon_regmap_lookup_by_phandle() error case while converting it to dev_err_probe(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20250624-syscon-phandle-args-iommu-v3-1-1a36487d69b8@linaro.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 09:21:42 +02:00
Simon Xue	62e062a29a	iommu/rockchip: prevent iommus dead loop when two masters share one IOMMU When two masters share an IOMMU, calling ops->of_xlate during the second master's driver init may overwrite iommu->domain set by the first. This causes the check if (iommu->domain == domain) in rk_iommu_attach_device() to fail, resulting in the same iommu->node being added twice to &rk_domain->iommus, which can lead to an infinite loop in subsequent &rk_domain->iommus operations. Cc: <stable@vger.kernel.org> Fixes: `25c2325575` ("iommu/rockchip: Add missing set_platform_dma_ops callback") Signed-off-by: Simon Xue <xxm@rock-chips.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20250623020018.584802-1-xxm@rock-chips.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 09:10:29 +02:00
Sven Peter	46a7418a3a	iommu/apple-dart: Drop default ARCH_APPLE in Kconfig When the first driver for Apple Silicon was upstreamed we accidentally included `default ARCH_APPLE` in its Kconfig which then spread to almost every subsequent driver. As soon as ARCH_APPLE is set to y this will pull in many drivers as built-ins which is not what we want. Thus, drop `default ARCH_APPLE` from Kconfig. Signed-off-by: Sven Peter <sven@kernel.org> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Janne Grunau <j@jannau.net> Link: https://lore.kernel.org/r/20250612-apple-kconfig-defconfig-v1-7-0e6f9cb512c1@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 08:59:43 +02:00
Jason Gunthorpe	db64591de4	iommu/qcom: Remove iommu_ops pgsize_bitmap This driver just uses a constant, put it in domain_alloc_paging and use the domain's value instead of ops during init_domain. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/6-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 08:56:00 +02:00
Jason Gunthorpe	b155e26df5	iommu/mtk: Remove iommu_ops pgsize_bitmap This driver just uses a constant, put it in domain_alloc_paging and use the domain's value instead of ops during finalise. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/5-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 08:56:00 +02:00
Jason Gunthorpe	cf39047e46	iommu: Remove iommu_ops pgsize_bitmap from simple drivers These drivers just have a constant value for their page size, move it into their domain_alloc_paging function before setting up the geometry. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> # for s390-iommu.c Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> # for exynos-iommu.c Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Acked-by: Thierry Reding <treding@nvidia.com> Acked-by: Chen-Yu Tsai <wens@csie.org> # sun50i-iommu.c Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/4-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 08:55:57 +02:00
Jason Gunthorpe	8901812485	iommu: Remove ops.pgsize_bitmap from drivers that don't use it These drivers all set the domain->pgsize_bitmap in their domain_alloc_paging() functions, so the ops value is never used. Delete it. Reviewed-by: Sven Peter <sven@svenpeter.dev> # for Apple DART Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com> # for RISC-V Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/3-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 08:55:56 +02:00
Jason Gunthorpe	35145e069e	iommu/arm-smmu: Remove iommu_ops pgsize_bitmap The driver never reads this value, arm_smmu_init_domain_context() always sets domain.pgsize_bitmap to smmu->pgsize_bitmap, the per-instance value. Remove the ops version entirely, the related dead code and make arm_smmu_ops const. Since this driver does not yet finalize the domain under arm_smmu_domain_alloc_paging() add a page size initialization to alloc so the page size is still setup prior to attach. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/2-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 08:55:55 +02:00
Jason Gunthorpe	b5dac93cb6	qiommu/arm-smmu-v3: Remove iommu_ops pgsize_bitmap The driver never reads this value, arm_smmu_domain_finalise() always sets domain.pgsize_bitmap to pgtbl_cfg, which comes from the per-smmu calculated value. Remove the ops version entirely, the related dead code and make arm_smmu_ops const. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/1-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 08:55:55 +02:00
Ankit Soni	025d1371cc	iommu/amd: Add efr[HATS] max v1 page table level The EFR[HATS] bits indicate maximum host translation level supported by IOMMU. Adding support to set the maximum host page table level as indicated by EFR[HATS]. If the HATS=11b (reserved), the driver will attempt to use guest page table for DMA API. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Ankit Soni <Ankit.Soni@amd.com> Link: https://lore.kernel.org/r/df0f8562c2a20895cc185c86f1a02c4d826fd597.1749016436.git.Ankit.Soni@amd.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 08:51:20 +02:00
Ankit Soni	7e5516e609	iommu/amd: Add HATDis feature support Current AMD IOMMU assumes Host Address Translation (HAT) is always supported, and Linux kernel enables this capability by default. However, in case of emulated and virtualized IOMMU, this might not be the case. For example,current QEMU-emulated AMD vIOMMU does not support host translation for VFIO pass-through device, but the interrupt remapping support is required for x2APIC (i.e. kvm-msi-ext-dest-id is also not supported by the guest OS). This would require the guest kernel to boot with guest kernel option iommu=pt to by-pass the initialization of host (v1) table. The AMD I/O Virtualization Technology (IOMMU) Specification Rev 3.10 [1] introduces a new flag 'HATDis' in the IVHD 11h IOMMU attributes to indicate that HAT is not supported on a particular IOMMU instance. Therefore, modifies the AMD IOMMU driver to detect the new HATDis attributes, and disable host translation and switch to use guest translation if it is available. Otherwise, the driver will disable DMA translation. [1] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifications/48882_IOMMU.pdf Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Ankit Soni <Ankit.Soni@amd.com> Link: https://lore.kernel.org/r/8109b208f87b80e400c2abd24a2e44fcbc0763a5.1749016436.git.Ankit.Soni@amd.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 08:51:18 +02:00
Sean Christopherson	b9e53f9ff4	iommu/amd: KVM: SVM: Allow KVM to control need for GA log interrupts Add plumbing to the AMD IOMMU driver to allow KVM to control whether or not an IRTE is configured to generate GA log interrupts. KVM only needs a notification if the target vCPU is blocking, so the vCPU can be awakened. If a vCPU is preempted or exits to userspace, KVM clears is_run, but will set the vCPU back to running when userspace does KVM_RUN and/or the vCPU task is scheduled back in, i.e. KVM doesn't need a notification. Unconditionally pass "true" in all KVM paths to isolate the IOMMU changes from the KVM changes insofar as possible. Opportunistically swap the ordering of parameters for amd_iommu_update_ga() so that the match amd_iommu_activate_guest_mode(). Note, as of this writing, the AMD IOMMU manual doesn't list GALogIntr as a non-cached field, but per AMD hardware architects, it's not cached and can be safely updated without an invalidation. Link: https://lore.kernel.org/all/b29b8c22-2fd4-4b5e-b755-9198874157c7@amd.com Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20250611224604.313496-62-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:51 -07:00
Sean Christopherson	a23480fe21	iommu/amd: WARN if KVM calls GA IRTE helpers without virtual APIC support WARN if KVM attempts to update IRTE entries when virtual APIC isn't fully supported, as KVM should guard all such calls on IRQ posting being enabled. Link: https://lore.kernel.org/r/20250611224604.313496-58-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:48 -07:00
Sean Christopherson	6df262f915	iommu/amd: KVM: SVM: Add IRTE metadata to affined vCPU's list if AVIC is inhibited If an IRQ can be posted to a vCPU, but AVIC is currently inhibited on the vCPU, go through the dance of "affining" the IRTE to the vCPU, but leave the actual IRTE in remapped mode. KVM already handles the case where AVIC is inhibited => uninhibited with posted IRQs (see avic_set_pi_irte_mode()), but doesn't handle the scenario where a postable IRQ comes along while AVIC is inhibited. Link: https://lore.kernel.org/r/20250611224604.313496-45-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:40 -07:00
Sean Christopherson	f965255dc5	iommu/amd: KVM: SVM: Set pCPU info in IRTE when setting vCPU affinity Now that setting vCPU affinity is guarded with ir_list_lock, i.e. now that avic_physical_id_entry can be safely accessed, set the pCPU info straight-away when setting vCPU affinity. Putting the IRTE into posted mode, and then immediately updating the IRTE a second time if the target vCPU is running is wasteful and confusing. This also fixes a flaw where a posted IRQ that arrives between putting the IRTE into guest_mode and setting the correct destination could cause the IOMMU to ring the doorbell on the wrong pCPU. Link: https://lore.kernel.org/r/20250611224604.313496-44-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:39 -07:00
Sean Christopherson	0b2b541fa3	iommu/amd: Factor out helper for manipulating IRTE GA/CPU info Split the guts of amd_iommu_update_ga() to a dedicated helper so that the logic can be shared with flows that put the IRTE into posted mode. Opportunistically move amd_iommu_update_ga() and its new helper above amd_iommu_activate_guest_mode() so that it's all co-located. Link: https://lore.kernel.org/r/20250611224604.313496-43-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:38 -07:00
Sean Christopherson	08d9ccdd1a	iommu/amd: KVM: SVM: Infer IsRun from validity of pCPU destination Infer whether or not a vCPU should be marked running from the validity of the pCPU on which it is running. amd_iommu_update_ga() already skips the IRTE update if the pCPU is invalid, i.e. passing %true for is_run with an invalid pCPU would be a blatant and egregrious KVM bug. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-42-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:37 -07:00
Sean Christopherson	3be405e89f	iommu/amd: Document which IRTE fields amd_iommu_update_ga() can modify Add a comment to amd_iommu_update_ga() to document what fields it can safely modify without issuing an invalidation of the IRTE, and to explain its role in keeping GA IRTEs up-to-date. Per page 93 of the IOMMU spec dated Feb 2025: When virtual interrupts are enabled by setting MMIO Offset 0018h[GAEn] and IRTE[GuestMode=1], IRTE[IsRun], IRTE[Destination], and if present IRTE[GATag], are not cached by the IOMMU. Modifications to these fields do not require an invalidation of the Interrupt Remapping Table. Link: https://lore.kernel.org/all/9b7ceea3-8c47-4383-ad9c-1a9bbdc9044a@oracle.com Cc: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20250611224604.313496-41-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:37 -07:00
Sean Christopherson	53527ea1b7	iommu: KVM: Split "struct vcpu_data" into separate AMD vs. Intel structs Split the vcpu_data structure that serves as a handoff from KVM to IOMMU drivers into vendor specific structures. Overloading a single structure makes the code hard to read and maintain, is very misleading as it suggests that mixing vendors is actually supported, and bastardizing Intel's posted interrupt descriptor address when AMD's IOMMU already has its own structure is quite unnecessary. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-33-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:31 -07:00
Sean Christopherson	95d50ebe6d	iommu/amd: KVM: SVM: Pass NULL @vcpu_info to indicate "not guest mode" Pass NULL to amd_ir_set_vcpu_affinity() to communicate "don't post to a vCPU" now that there's no need to communicate information back to KVM about the previous vCPU (KVM does its own tracking). Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:25 -07:00
Sean Christopherson	c4cdbaf9d8	iommu/amd: KVM: SVM: Use pi_desc_addr to derive ga_root_ptr Use vcpu_data.pi_desc_addr instead of amd_iommu_pi_data.base to get the GA root pointer. KVM is the only source of amd_iommu_pi_data.base, and KVM's one and only path for writing amd_iommu_pi_data.base computes the exact same value for vcpu_data.pi_desc_addr and amd_iommu_pi_data.base, and fills amd_iommu_pi_data.base if and only if vcpu_data.pi_desc_addr is valid, i.e. amd_iommu_pi_data.base is fully redundant. Cc: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-23 09:50:24 -07:00
Sean Christopherson	1da19c5ce0	iommu/amd: KVM: SVM: Delete now-unused cached/previous GA tag fields Delete the amd_ir_data.prev_ga_tag field now that all usage is superfluous. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:52:55 -07:00
Nicolin Chen	3e2a9811f6	iommufd: Apply the new iommufd_object_alloc_ucmd helper Now the new ucmd-based object allocator eases the finalize/abort routine, apply this to all existing allocators that aren't protected by any lock. Upgrade the for-driver vIOMMU alloctor too, and pass down to all existing viommu_alloc op accordingly. Note that __iommufd_object_alloc_ucmd() builds in some static tests that cover both static_asserts in the iommufd_viommu_alloc(). Thus drop them. Link: https://patch.msgid.link/r/107b24a3b791091bb09c92ffb0081c56c413b26d.1749882255.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:30 -03:00
Nicolin Chen	c0d498a1b9	iommufd: Introduce iommufd_object_alloc_ucmd helper An object allocator needs to call either iommufd_object_finalize() upon a success or iommufd_object_abort_and_destroy() upon an error code. To reduce duplication, store a new_obj in the struct iommufd_ucmd and call iommufd_object_finalize/iommufd_object_abort_and_destroy() accordingly in the main function. Similar to iommufd_object_alloc() and __iommufd_object_alloc(), add a pair of helpers: __iommufd_object_alloc_ucmd() and iommufd_object_alloc_ucmd(). Link: https://patch.msgid.link/r/e7206d4227844887cc8dbf0cc7b0242580fafd9d.1749882255.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:29 -03:00
Nicolin Chen	17a93473a5	iommufd: Move _iommufd_object_alloc out of driver.c Now, all driver structures will be allocated by the core, i.e. no longer a need of driver calling _iommufd_object_alloc. Thus, move it back. Before: text data bss dec hex filename 3024 180 0 3204 c84 drivers/iommu/iommufd/driver.o 9074 610 64 9748 2614 drivers/iommu/iommufd/main.o After: text data bss dec hex filename 2665 164 0 2829 b0d drivers/iommu/iommufd/driver.o 9410 618 64 10092 276c drivers/iommu/iommufd/main.o Link: https://patch.msgid.link/r/79e630c7b911930cf36e3c8a775a04e66c528d65.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:29 -03:00
Nicolin Chen	f842ea208e	iommu: Deprecate viommu_alloc op To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops are introduced. Now, those existing vIOMMU supported drivers implemented these two ops, replacing the viommu_alloc one. So, there is no use of it. Remove it from the headers and the viommu core. Link: https://patch.msgid.link/r/5b32d4499d7ed02a63e57a293c11b642d226ef8d.1749882255.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:29 -03:00
Nicolin Chen	3961f2f5da	iommu/arm-smmu-v3: Replace arm_vsmmu_alloc with arm_vsmmu_init To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops are introduced. Sanitize the inputs and report the size of struct arm_vsmmu on success, in arm_smmu_get_viommu_size(). Place the type sanity at the last, becase there will be soon an impl level get_viommu_size op, which will require the same sanity tests prior. It can simply insert a piece of code in front of the IOMMU_VIOMMU_TYPE_ARM_SMMUV3 sanity. The core will ensure the viommu_type is set to the core vIOMMU object, and pass in the same dev pointer, so arm_vsmmu_init() won't need to repeat the same sanity tests but to simply init the arm_vsmmu struct. Remove the arm_vsmmu_alloc, completing the replacement. Link: https://patch.msgid.link/r/64e4b4c33acd26e1bd676e077be80e00fb63f17c.1749882255.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:29 -03:00
Nicolin Chen	683cff7c3b	iommufd/selftest: Replace mock_viommu_alloc with mock_viommu_init To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops are introduced. Sanitize the inputs and report the size of struct mock_viommu on success, in mock_get_viommu_size(). The core will ensure the viommu_type is set to the core vIOMMU object, so simply init the driver part in mock_viommu_init(). Remove the mock_viommu_alloc, completing the replacement. Link: https://patch.msgid.link/r/993beabbb0bc9705d979a92801ea5ed5996a34eb.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:29 -03:00
Nicolin Chen	5983d1e7d7	iommufd/selftest: Drop parent domain from mock_iommu_domain_nested There is no use of this parent domain. Delete the dead code. Note that the s2_parent in struct mock_viommu will be a deadcode too. Yet, keep it because it will be soon used by HW queue objects, i.e. no point in adding it back and forth in such a short window. Besides, keeping it could cover the majority of vIOMMU use cases where a driver-level structure will be larger in size than the core structure. Link: https://patch.msgid.link/r/0f155a7cd71034a498448fe4828fb4aaacdabf95.1749882255.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:29 -03:00
Nicolin Chen	63141fa741	iommufd/viommu: Support get_viommu_size and viommu_init ops To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops are introduced to replace the viommu_init op. Let the new viommu_init pathway coexist with the old viommu_alloc one. Since the viommu_alloc op and its pathway will be soon deprecated, try to minimize the code difference between them by adding a tentative jump tag. Note that this fails a !viommu->ops case from now on with a WARN_ON_ONCE since a vIOMMU is expected to support an alloc_domain_nested op for now, or some sort of a viommu op in the foreseeable future. This WARN_ON_ONCE can be lifted, if some day there is a use case wanting !viommu->ops. Link: https://patch.msgid.link/r/35c5fa5926be45bda82f5fc87545cd3180ad4c9c.1749882255.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:28 -03:00
Nicolin Chen	0c6e0ae7a7	iommufd: Return EOPNOTSUPP for failures due to driver bugs It's more accurate to report EOPNOTSUPP when an ioctl failed due to driver bug, since there is nothing wrong with the user space side. Link: https://patch.msgid.link/r/623bb6f0e8fdd7b9c5745a2f99f280163f9f1f5a.1749882255.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:28 -03:00
Nicolin Chen	62b62a55bd	iommufd: Use enum iommu_veventq_type for type in struct iommufd_veventq Replace unsigned int, to make it clear. No functional changes. Link: https://patch.msgid.link/r/208a260c100a00667d3799feaad1260745f96c6b.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:28 -03:00
Nicolin Chen	6e235a7721	iommufd: Drop unused ictx in struct iommufd_vdevice The core code can always get the ictx pointer via vdev->viommu->ictx, thus drop this unused one. Link: https://patch.msgid.link/r/6cbb65e8df433de45b6c3a4bb2c5df09faca8a7c.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:27 -03:00
Nicolin Chen	ea92128fe7	iommufd: Apply obvious cosmetic fixes Run clang-format but exclude those not so obvious ones, which leaves us: - Align indentations - Add missing spaces - Remove unnecessary spaces - Remove unnecessary line wrappings Link: https://patch.msgid.link/r/9132e1ab45690ab1959c66bbb51ac5536a635388.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-06-19 15:43:27 -03:00
Jason Gunthorpe	f9705d66fa	iommu/tegra: Fix incorrect size calculation This driver uses a mixture of ways to get the size of a PTE, tegra_smmu_set_pde() did it as sizeof(pd) which became wrong when pd switched to a struct tegra_pd. Switch pd back to a u32 in tegra_smmu_set_pde() so the sizeof(*pd) returns 4. Fixes: `50568f87d1` ("iommu/terga: Do not use struct page as the handle for as->pd memory") Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt> Closes: https://lore.kernel.org/all/62e7f7fe-6200-4e4f-ad42-d58ad272baa6@tecnico.ulisboa.pt/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt> Link: https://lore.kernel.org/r/0-v1-da7b8b3d57eb+ce-iommu_terga_sizeof_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-13 17:02:31 +02:00
Ingo Molnar	41cb08555c	treewide, timers: Rename from_timer() to timer_container_of() Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com	2025-06-08 09:07:37 +02:00
Linus Torvalds	3719a04a80	pci-v6.16-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmhAa9EUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vyA3w//aX8d73z/xVxkYLMN/6XQA5fdmd4d Dv4n0Pjf0WCMKbsgRCdXEYLvcHV8VhH5iCR/b2UsFm9LjxSIRuqE5XosY3bNhrHn xVKEh2prq2XZOibWrFkJ+RZ0FF7Ogq1Uy5gUBbBHbE1q1byZzrOALaF3FWGaDIZQ 6QLLAFtd3UtqOOUu8J8P9N15uFR8gunyfuM9U7TLMcy4B8txk6T6m/9xAWtRURuJ I6WN8lO+g8Nl2mL9m27+wyWiVT3tKqoMwp8rVtym/L5JQOmHycYhn0WQAr2dPCMs Xbgmoeei0je7mZvk5btpt68NAKQ3ZnCVkxbbINBkUxAjI0dbI6h37EhW18ShYVUk CCo4fmaFtwP8qNN9tSvDN8vZdGB44fN5tIz4lmGzKk5gt+oV50RC/APrzC+PJBQ0 +2SdDVKj71Gr2H1VnI6uLB7oQ+tp7TOdhg+DGV4bdc6QFnsM+BpKWRq5f1UQcau/ XVDmorM/2t6z0DNktAv3NFwSodUjk1loWESr/pRBH1AqAWZTK98PWIg97XYsal59 zbJ3dLrnCqUNozeVgjtZo1LWD2FZaVTvhq2NY7D+QPpnMGhFUhHxNliZUXiQa1q4 boI2hEFdu3IQP/OC2a1zGJyMRLU43d5rhZ1U5xQSVtM0c3lgCY7rn/t26LymQVPA SYdg2jBcnhe6gXo= =eWJw -----END PGP SIGNATURE----- Merge tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Print the actual delay time in pci_bridge_wait_for_secondary_bus() instead of assuming it was 1000ms (Wilfred Mallawa) - Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices', which broke resume from system sleep on AMD platforms and has been fixed by other commits (Lukas Wunner) Resource management: - Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and unnecessary (Philipp Stanner) - Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and related flags since all uses have been removed (Philipp Stanner) - Rework devres 'request' functions so they are no longer 'hybrid', i.e., their behavior no longer depends on whether pcim_enable_device or pci_enable_device() was used, and remove related code (Philipp Stanner) - Warn (not BUG()) about failure to assign optional resources (Ilpo Järvinen) Error handling: - Log the DPC Error Source ID only when it's actually valid (when ERR_FATAL or ERR_NONFATAL was received from a downstream device) and decode into bus/device/function (Bjorn Helgaas) - Determine AER log level once and save it so all related messages use the same level (Karolina Stolarek) - Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable Errors (Karolina Stolarek) - Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs controls on interval and burst count, to avoid flooding logs and RCU stall warnings (Jon Pan-Doh) Power management: - Increment PM usage counter when probing reset methods so we don't try to read config space of a powered-off device (Alex Williamson) - Set all devices to D0 during enumeration to ensure ACPI opregion is connected via _REG (Mario Limonciello) Power control: - Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match the filename paths. Retain old deprecated symbols for compatibility, except for the pwrctrl slot driver (PCI_PWRCTRL_SLOT) (Johan Hovold) - When unregistering pwrctrl, cancel outstanding rescan work before cleaning up data structures to avoid use-after-free issues (Brian Norris) Bandwidth control: - Simplify link bandwidth controller by replacing the count of Link Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag (Ilpo Järvinen) - Update the Link Speed after retraining, since the Link Speed may have changed (Ilpo Järvinen) PCIe native device hotplug: - Ignore Presence Detect Changed caused by DPC. pciehp already ignores Link Down/Up events caused by DPC, but on slots using in-band presence detect, DPC causes a spurious Presence Detect Changed event (Lukas Wunner) - Ignore Link Down/Up caused by Secondary Bus Reset. On hotplug ports using in-band presence detect, the reset causes a Presence Detect Changed event, which mistakenly caused teardown and re-enumeration of the device. Drivers may need to annotate code that resets their device (Lukas Wunner) Virtualization: - Add an ACS quirk for Loongson Root Ports that don't advertise ACS but don't allow peer-to-peer transactions between Root Ports; the quirk allows each Root Port to be in a separate IOMMU group (Huacai Chen) Endpoint framework: - For fixed-size BARs, retain both the actual size and the possibly larger size allocated to accommodate iATU alignment requirements (Jerome Brunet) - Simplify ctrl/SPAD space allocation and avoid allocating more space than needed (Jerome Brunet) - Correct MSI-X PBA offset calculations for DesignWare and Cadence endpoint controllers (Niklas Cassel) - Align the return value (number of interrupts) encoding for pci_epc_get_msi()/pci_epc_ops::get_msi() and pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel) - Align the nr_irqs parameter encoding for pci_epc_set_msi()/pci_epc_ops::set_msi() and pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel) Common host controller library: - Convert pci-host-common to a library so platforms that don't need native host controller drivers don't need to include these helper functions (Manivannan Sadhasivam) Apple PCIe controller driver: - Extract ECAM bridge creation helper from pci_host_common_probe() to separate driver-specific things like MSI from PCI things (Marc Zyngier) - Dynamically allocate RID-to_SID bitmap to prepare for SoCs with varying capabilities (Marc Zyngier) - Skip ports disabled in DT when setting up ports (Janne Grunau) - Add t6020 compatible string (Alyssa Rosenzweig) - Add T602x PCIe support (Hector Martin) - Directly set/clear INTx mask bits because T602x dropped the accessors that could do this without locking (Marc Zyngier) - Move port PHY registers to their own reg items to accommodate T602x, which moves them around; retain default offsets for existing DTs that lack phy%d entries with the reg offsets (Hector Martin) - Stop polling for core refclk, which doesn't work on T602x and the bootloader has already done anyway (Hector Martin) - Use gpiod_set_value_cansleep() when asserting PERST# in probe because we're allowed to sleep there (Hector Martin) Cadence PCIe controller driver: - Drop a runtime PM 'put' to resolve a runtime atomic count underflow (Hans Zhang) - Make the cadence core buildable as a module (Kishon Vijay Abraham I) - Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by loadable drivers when they are removed (Siddharth Vadapalli) Freescale i.MX6 PCIe controller driver: - Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP (Richard Zhu) - Remove redundant dw_pcie_wait_for_link() from imx_pcie_start_link(); since the DWC core does this, imx6 only needs it when retraining for a faster link speed (Richard Zhu) - Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu) - Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in some cases, the controller can't exit 'L23 Ready' through Beacon or PERST# deassertion (Richard Zhu) - Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum: controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8 GT/s, causing timeouts in L1 (Richard Zhu) - Wait for i.MX95 PLL lock before enabling controller (Richard Zhu) - Save/restore i.MX95 LUT for suspend/resume (Richard Zhu) Mobiveil PCIe controller driver: - Return bool (not int) for link-up check in mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans Zhang) NVIDIA Tegra194 PCIe controller driver: - Create debugfs directory for 'aspm_state_cnt' only when CONFIG_PCIEASPM is enabled, since there are no other entries (Hans Zhang) Qualcomm PCIe controller driver: - Add OF support for parsing DT 'eq-presets-<N>gts' property for lane equalization presets (Krishna Chaitanya Chundru) - Read Maximum Link Width from the Link Capabilities register if DT lacks 'num-lanes' property (Krishna Chaitanya Chundru) - Add Physical Layer 64 GT/s Capability ID and register offsets for 8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya Chundru) - Add generic dwc support for configuring lane equalization presets (Krishna Chaitanya Chundru) - Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar) Renesas R-Car PCIe controller driver: - Describe endpoint BAR 4 as being fixed size (Jerome Brunet) - Document how to obtain R-Car V4H (r8a779g0) controller firmware (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Reorder rockchip_pci_core_rsts because reset_control_bulk_deassert() deasserts in reverse order, to fix a link training regression (Jensen Huang) - Mark RK3399 as being capable of raising INTx interrupts (Niklas Cassel) Rockchip DesignWare PCIe controller driver: - Check only PCIE_LINKUP, not LTSSM status, to determine whether the link is up (Shawn Lin) - Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s for Root Complex and Endpoint modes (Shawn Lin) - Hide the broken ATS Capability in rockchip_pcie_ep_init() instead of rockchip_pcie_ep_pre_init() so it stays hidden after PERST# resets non-sticky registers (Shawn Lin) - Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit() (Diederik de Haas) Synopsys DesignWare PCIe controller driver: - Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training more robust; this will not affect the intended link width if all lanes are functional (Wenbin Yao) - Return bool (not int) for link-up check in dw_pcie_ops.link_up() and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay, keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx, tegra194, uniphier, visconti (Hans Zhang) - Add debugfs support for exposing DWC device-specific PTM context (Manivannan Sadhasivam) TI J721E PCIe driver: - Make j721e buildable as a loadable and removable module (Siddharth Vadapalli) - Fix j721e host/endpoint dependencies that result in link failures in some configs (Arnd Bergmann) Device tree bindings: - Add qcom DT binding for 'global' interrupt (PCIe controller and link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p, sc7280, sc8180x sdm845, sm8150, sm8250, sm8350 (Manivannan Sadhasivam) - Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074, ipq8074-gen3, ipq6018 (Manivannan Sadhasivam) - Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang) - Correct indentation and style of examples in brcm,stb-pcie, cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie, microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm (Krzysztof Kozlowski) - Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and armada8k from text to schema DT bindings (Rob Herring) - Remove obsolete .txt DT bindings for content that has been moved to schemas (Rob Herring) - Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074 and IPQ9574 (Varadarajan Narayanan) - Convert v3,v360epc-pci from text to DT schema binding (Rob Herring) - Change microchip,pcie-host DT binding to be 'dma-noncoherent' since PolarFire may be configured that way (Conor Dooley) Miscellaneous: - Drop 'pci' suffix from intel_mid_pci.c filename to match similar files (Andy Shevchenko) - All platforms with PCI have an MMU, so add PCI Kconfig dependency on MMU to simplify build testing and avoid inadvertent build regressions (Arnd Bergmann) - Update Krzysztof Wilczyński's email address in MAINTAINERS (Krzysztof Wilczyński) - Update Manivannan Sadhasivam's email address in MAINTAINERS (Manivannan Sadhasivam)" * tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits) MAINTAINERS: Update Manivannan Sadhasivam email address PCI: j721e: Fix host/endpoint dependencies PCI: j721e: Add support to build as a loadable module PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup PCI: cadence: Add support to build pcie-cadence library as a kernel module MAINTAINERS: Update Krzysztof Wilczyński email address PCI: Remove unnecessary linesplit in __pci_setup_bridge() PCI: WARN (not BUG()) when we fail to assign optional resources PCI: Remove unused pci_printk() PCI: qcom: Replace PERST# sleep time with proper macro PCI: dw-rockchip: Replace PERST# sleep time with proper macro PCI: host-common: Convert to library for host controller drivers PCI/ERR: Remove misleading TODO regarding kernel panic PCI: cadence: Remove duplicate message code definitions PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding PCI: cadence-ep: Correct PBA offset in .set_msix() callback ...	2025-06-04 11:26:17 -07:00
Linus Torvalds	9d49da4388	Revert "iommu: make inclusion of arm/arm-smmu-v3 directory conditional" This reverts commit `e436576b02`. That commit is very broken, and seems to have missed the fact that CONFIG_ARM_SMMU_V3 is not just a yes-or-no thing, but also can be modular. So it caused build errors on arm64 allmodconfig setups: ERROR: modpost: "arm_smmu_make_cdtable_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined! ERROR: modpost: "arm_smmu_make_s2_domain_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined! ERROR: modpost: "arm_smmu_make_s1_cd" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined! ... (and six more symbols just the same). Link: https://lore.kernel.org/all/CAHk-=wh4qRwm7AQ8sBmQj7qECzgAhj4r73RtCDfmHo5SdcN0Jw@mail.gmail.com/ Cc: Joerg Roedel <joro@8bytes.org> Cc: Rolf Eike Beer <eb@emlix.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-05-31 07:43:16 -07:00
Linus Torvalds	8477ab1430	IOMMU Updates for Linux v6.16: Including: - Core: - Introduction of iommu-pages infrastructure to consolitate page-table allocation code among hardware drivers. This is ground-work for more generalization in the future. - Remove IOMMU_DEV_FEAT_SVA and IOMMU_DEV_FEAT_IOPF feature flags. - Convert virtio-iommu to domain_alloc_paging(). - KConfig cleanups. - Some small fixes for possible overflows and race conditions. - Intel VT-d driver: - Restore WO permissions on second-level paging entries. - Use ida to manage domain id. - Miscellaneous cleanups. - AMD-Vi: - Make sure notifiers finish running before module unload. - Add support for HTRangeIgnore feature. - Allow matching ACPI HID devices without matching UIDs. - ARM-SMMU: - SMMUv2: - Recognise the compatible string for SAR2130P MDSS in the Qualcomm driver, as this device requires an identity domain. - Fix Adreno stall handling so that GPU debugging is more robust and doesn't e.g. result in deadlock. - SMMUv3: - Fix ->attach_dev() error reporting for unrecognised domains. - IO-pgtable: - Allow clients (notably, drivers that process requests from userspace) to silence warnings when mapping an already-mapped IOVA. - S390: - Add support for additional table regions. - Mediatek: - Add support for MT6893 MM IOMMU. - Some smaller fixes and improvements in various other drivers. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmg5nnoACgkQK/BELZcB GuNaPg//XgI0g1WUjENFvtri5OLz7/slIYHw1TPJkmyEnwnipEcz7C607W7L4fbe yvspsc80mnN1xe986GGpfK+wyEOoVWE2k1Jg3iDhHjxSgjbmPl+fMlpTyKm9kPkk nR2v4szosCPCTgfy+km1c6VCS/aRUtyrX289fKK51hlQ38YMa1b+D7p/S87dehLf TJ/VqnE6lwSugXJZek6bVR7bQscArjcmHNK+pTEvdHUZiR2c9WLQAPNCmBpvUglJ oPRQh3JzfF/zbFshyyCpqOAKbsjJsQhrEVXHIgE1lF4ap10U5jEx1CME8XuxqkqL klQZzmAllAyRfEm43OcELpfAiUc3QZpR8+i2Xnmg5r3ZgM5bm6MOc424XCSmKg45 v4R6r976rvOWGOava2c/YrrwtEaemR29f0Q/ht/+m6CtMCvUPokuHYNvZ5/jM4Xh PjPGgXm9sNtNb78TwMgHQHBhPqI71m02lh+W8WKs+pQT1KdaRZbRuftHj72iwUvZ tglkYJmdnpxa30Mlvi5OZTVKLGwzCdUiTpvvvQNVQroH8J1pWtRtaTz2yXKbR8zy B1juOIWbQtqT1NSz+IwCVbNOyMO+Jzu1Olw24LXrX/MMSDcG1ZsoaKhrGy09gcAq s+PSETd01+Z+L2GW6p9panKePf3vWO54SEU6nwQzgVuGdDjcs/U= =RmX9 -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core: - Introduction of iommu-pages infrastructure to consolitate page-table allocation code among hardware drivers. This is ground-work for more generalization in the future - Remove IOMMU_DEV_FEAT_SVA and IOMMU_DEV_FEAT_IOPF feature flags - Convert virtio-iommu to domain_alloc_paging() - KConfig cleanups - Some small fixes for possible overflows and race conditions Intel VT-d driver: - Restore WO permissions on second-level paging entries - Use ida to manage domain id - Miscellaneous cleanups AMD-Vi: - Make sure notifiers finish running before module unload - Add support for HTRangeIgnore feature - Allow matching ACPI HID devices without matching UIDs ARM-SMMU: - SMMUv2: - Recognise the compatible string for SAR2130P MDSS in the Qualcomm driver, as this device requires an identity domain - Fix Adreno stall handling so that GPU debugging is more robust and doesn't e.g. result in deadlock - SMMUv3: - Fix ->attach_dev() error reporting for unrecognised domains - IO-pgtable: - Allow clients (notably, drivers that process requests from userspace) to silence warnings when mapping an already-mapped IOVA S390: - Add support for additional table regions Mediatek: - Add support for MT6893 MM IOMMU And some smaller fixes and improvements in various other drivers" * tag 'iommu-updates-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (75 commits) iommu/vt-d: Restore context entry setup order for aliased devices iommu/mediatek: Fix compatible typo for mediatek,mt6893-iommu-mm iommu/arm-smmu-qcom: Make set_stall work when the device is on iommu/arm-smmu: Move handing of RESUME to the context fault handler iommu/arm-smmu-qcom: Enable threaded IRQ for Adreno SMMUv2/MMU500 iommu/io-pgtable-arm: Add quirk to quiet WARN_ON() iommu: Clear the freelist after iommu_put_pages_list() iommu/vt-d: Change dmar_ats_supported() to return boolean iommu/vt-d: Eliminate pci_physfn() in dmar_find_matched_satc_unit() iommu/vt-d: Replace spin_lock with mutex to protect domain ida iommu/vt-d: Use ida to manage domain id iommu/vt-d: Restore WO permissions on second-level paging entries iommu/amd: Allow matching ACPI HID devices without matching UIDs iommu: make inclusion of arm/arm-smmu-v3 directory conditional iommu: make inclusion of riscv directory conditional iommu: make inclusion of amd directory conditional iommu: make inclusion of intel directory conditional iommu: remove duplicate selection of DMAR_TABLE iommu/fsl_pamu: remove trailing space after \n iommu/arm-smmu-qcom: Add SAR2130P MDSS compatible ...	2025-05-30 10:44:20 -07:00
Linus Torvalds	23022f5456	dma-mapping updates for Linux 6.16: - new two step DMA mapping API, which is is a first step to a long path to provide alternatives to scatterlist and to remove hacks, abuses and design mistakes related to scatterlists; this new approach optimizes some calls to DMA-IOMMU layer and cache maintenance by batching them, reduces memory usage as it is no need to store mapped DMA addresses to unmap them, and reduces some function call overhead; it is a combination effort of many people, lead and developed by Christoph Hellwig and Leon Romanovsky -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaDRXIQAKCRCJp1EFxbsS RG8tAP9kgjIwMoJqfr6DC8yYraIIUuNDyhb/fZ9vPppW6Cb7aAD/cg8udjrsUu3h iAZBIHkYuWmkx8JG7t5/lqBc4AOC1AA= =F3TU -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.16-2025-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: "New two step DMA mapping API, which is is a first step to a long path to provide alternatives to scatterlist and to remove hacks, abuses and design mistakes related to scatterlists. This new approach optimizes some calls to DMA-IOMMU layer and cache maintenance by batching them, reduces memory usage as it is no need to store mapped DMA addresses to unmap them, and reduces some function call overhead. It is a combination effort of many people, lead and developed by Christoph Hellwig and Leon Romanovsky" * tag 'dma-mapping-6.16-2025-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: docs: core-api: document the IOVA-based API dma-mapping: add a dma_need_unmap helper dma-mapping: Implement link/unlink ranges API iommu/dma: Factor out a iommu_dma_map_swiotlb helper dma-mapping: Provide an interface to allow allocate IOVA iommu: add kernel-doc for iommu_unmap_fast iommu: generalize the batched sync after map interface dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h PCI/P2PDMA: Refactor the p2pdma mapping helpers	2025-05-27 20:09:06 -07:00
Joerg Roedel	879b141b7c	Merge branches 'fixes', 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', 'fsl/pamu', 'mediatek', 'renesas/ipmmu', 's390', 'intel/vt-d', 'amd/amd-vi' and 'core' into next	2025-05-23 17:14:32 +02:00
Lu Baolu	320302baed	iommu/vt-d: Restore context entry setup order for aliased devices Commit `2031c469f8` ("iommu/vt-d: Add support for static identity domain") changed the context entry setup during domain attachment from a set-and-check policy to a clear-and-reset approach. This inadvertently introduced a regression affecting PCI aliased devices behind PCIe-to-PCI bridges. Specifically, keyboard and touchpad stopped working on several Apple Macbooks with below messages: kernel: platform pxa2xx-spi.3: Adding to iommu group 20 kernel: input: Apple SPI Keyboard as /devices/pci0000:00/0000:00:1e.3/pxa2xx-spi.3/spi_master/spi2/spi-APP000D:00/input/input0 kernel: DMAR: DRHD: handling fault status reg 3 kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set kernel: DMAR: DRHD: handling fault status reg 3 kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00 kernel: DMAR: DRHD: handling fault status reg 3 kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set kernel: DMAR: DRHD: handling fault status reg 3 kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00 Fix this by restoring the previous context setup order. Fixes: `2031c469f8` ("iommu/vt-d: Add support for static identity domain") Closes: https://lore.kernel.org/all/4dada48a-c5dd-4c30-9c85-5b03b0aa01f0@bfh.ch/ Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20250514060523.2862195-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20250520075849.755012-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-23 17:11:40 +02:00
AngeloGioacchino Del Regno	bdcea7e396	iommu/mediatek: Fix compatible typo for mediatek,mt6893-iommu-mm Fix the "mediatek.mt6893-iommu-mm" compatible string typo, as the dot was actually meant to be a comma: "mediatek,mt6893-iommu-mm". Fixes: `f6a1e89ab6` ("iommu/mediatek: Add support for Dimensity 1200 MT6893 MM IOMMU") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20250521151548.185910-1-angelogioacchino.delregno@collabora.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-22 09:13:06 +02:00
Tushar Dave	b3f6fcd840	iommu: Skip PASID validation for devices without PASID capability Generally PASID support requires ACS settings that usually create single device groups, but there are some niche cases where we can get multi-device groups and still have working PASID support. The primary issue is that PCI switches are not required to treat PASID tagged TLPs specially so appropriate ACS settings are required to route all TLPs to the host bridge if PASID is going to work properly. pci_enable_pasid() does check that each device that will use PASID has the proper ACS settings to achieve this routing. However, no-PASID devices can be combined with PASID capable devices within the same topology using non-uniform ACS settings. In this case the no-PASID devices may not have strict route to host ACS flags and end up being grouped with the PASID devices. This configuration fails to allow use of the PASID within the iommu core code which wrongly checks if the no-PASID device supports PASID. Fix this by ignoring no-PASID devices during the PASID validation. They will never issue a PASID TLP anyhow so they can be ignored. Fixes: `c404f55c26` ("iommu: Validate the PASID in iommu_attach_device_pasid()") Cc: stable@vger.kernel.org Signed-off-by: Tushar Dave <tdave@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250520011937.3230557-1-tdave@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-22 09:10:00 +02:00
Connor Abbott	70892277ca	iommu/arm-smmu-qcom: Make set_stall work when the device is on Up until now we have only called the set_stall callback during initialization when the device is off. But we will soon start calling it to temporarily disable stall-on-fault when the device is on, so handle that by checking if the device is on and writing SCTLR. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Link: https://lore.kernel.org/r/20250520-msm-gpu-fault-fixes-next-v8-3-fce6ee218787@gmail.com [will: Fix "mixed declarations and code" warning from sparse] Signed-off-by: Will Deacon <will@kernel.org>	2025-05-21 11:34:06 +01:00
Connor Abbott	3053a2c508	iommu/arm-smmu: Move handing of RESUME to the context fault handler The upper layer fault handler is now expected to handle everything required to retry the transaction or dump state related to it, since we enable threaded IRQs. This means that we can take charge of writing RESUME, making sure that we always write it after writing FSR as recommended by the specification. The iommu handler should write -EAGAIN if a transaction needs to be retried. This avoids tricky cross-tree changes in drm/msm, since it never wants to retry the transaction and it already returns 0 from its fault handler. Therefore it will continue to correctly terminate the transaction without any changes required. devcoredumps from drm/msm will temporarily be broken until it is fixed to collect devcoredumps inside its fault handler, but fixing that first would actually be worse because MMU-500 ignores writes to RESUME unless all fields of FSR (except SS of course) are clear and raises an interrupt when only SS is asserted. Right now, things happen to work most of the time if we collect a devcoredump, because RESUME is written asynchronously in the fault worker after the fault handler clears FSR and finishes, although there will be some spurious faults, but if this is changed before this commit fixes the FSR/RESUME write order then SS will never be cleared, the interrupt will never be cleared, and the whole system will hang every time a fault happens. It will therefore help bisectability if this commit goes first. I've changed the TBU path to also accept -EAGAIN and do the same thing, while keeping the old -EBUSY behavior. Although the old path was broken because you'd get a storm of interrupts due to returning IRQ_NONE that would eventually result in the interrupt being disabled, and I think it was dead code anyway, so it should eventually be deleted. Note that drm/msm never uses TBU so this is untested. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Link: https://lore.kernel.org/r/20250520-msm-gpu-fault-fixes-next-v8-2-fce6ee218787@gmail.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-21 10:43:05 +01:00
Connor Abbott	1650620774	iommu/arm-smmu-qcom: Enable threaded IRQ for Adreno SMMUv2/MMU500 The recommended flow for stall-on-fault in SMMUv2 is the following: 1. Resolve the fault. 2. Write to FSR to clear the fault bits. 3. Write RESUME to retry or fail the transaction. MMU500 is designed with this sequence in mind. For example, experimentally we have seen on MMU500 that writing RESUME does not clear FSR.SS unless the original fault is cleared in FSR, so 2 must come before 3. FSR.SS is allowed to signal a fault (and does on MMU500) so that if we try to do 2 -> 1 -> 3 (while exiting from the fault handler after 2) we can get duplicate faults without hacks to disable interrupts. However, resolving the fault typically requires lengthy operations that can stall, like bringing in pages from disk. The only current user, drm/msm, dumps GPU state before failing the transaction which indeed can stall. Therefore, from now on we will require implementations that want to use stall-on-fault to also enable threaded IRQs. Do that with the Adreno MMU implementations. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Link: https://lore.kernel.org/r/20250520-msm-gpu-fault-fixes-next-v8-1-fce6ee218787@gmail.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-21 10:43:05 +01:00
Rob Clark	3318f7b5ce	iommu/io-pgtable-arm: Add quirk to quiet WARN_ON() In situations where mapping/unmapping sequence can be controlled by userspace, attempting to map over a region that has not yet been unmapped is an error. But not something that should spam dmesg. Now that there is a quirk, we can also drop the selftest_running flag, and use the quirk instead for selftests. Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20250519175348.11924-6-robdclark@gmail.com [will: Rename quirk to IO_PGTABLE_QUIRK_NO_WARN per Robin's suggestion] Signed-off-by: Will Deacon <will@kernel.org>	2025-05-20 15:04:13 +01:00
Jason Gunthorpe	5e2ff240b3	iommu: Clear the freelist after iommu_put_pages_list() The commit below reworked how iommu_put_pages_list() worked to not do list_del() on every entry. This was done expecting all the callers to already re-init the list so doing a per-item deletion is not efficient. It was missed that fq_ring_free_locked() re-uses its list after calling iommu_put_pages_list() and so the leftover list reaches free'd struct pages and will crash or WARN/BUG/etc. Reinit the list to empty in fq_ring_free_locked() after calling iommu_put_pages_list(). Audit to see if any other callers of iommu_put_pages_list() need the list to be empty: - iommu_dma_free_fq_single() and iommu_dma_free_fq_percpu() immediately frees the memory - iommu_v1_map_pages(), v1_free_pgtable(), domain_exit(), riscv_iommu_map_pages() uses a stack variable which goes out of scope - intel_iommu_tlb_sync() uses a gather in a iotlb_sync() callback, the caller re-inits the gather Fixes: `13f43d7cf3` ("iommu/pages: Formalize the freelist API") Reported-by: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com> Closes: https://lore.kernel.org/r/SJ1PR11MB61292CE72D7BE06B8810021CB997A@SJ1PR11MB6129.namprd11.prod.outlook.com Tested-by: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-7d4dfa6140f7+11f04-iommu_freelist_init_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 14:29:16 +02:00
Wei Wang	f3fe7e1830	iommu/vt-d: Change dmar_ats_supported() to return boolean According to "Function return values and names" in coding-style.rst, the dmar_ats_supported() function should return a boolean instead of an integer. Also, rename "ret" to "supported" to be more straightforward. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20250509140021.4029303-3-wei.w.wang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:49:30 +02:00
Wei Wang	3df2ebcee9	iommu/vt-d: Eliminate pci_physfn() in dmar_find_matched_satc_unit() The function dmar_find_matched_satc_unit() contains a duplicate call to pci_physfn(). This call is unnecessary as pci_physfn() has already been invoked by the caller. Removing the redundant call simplifies the code and improves efficiency a bit. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Link: https://lore.kernel.org/r/20250509140021.4029303-2-wei.w.wang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:49:29 +02:00
Lu Baolu	868720fe15	iommu/vt-d: Replace spin_lock with mutex to protect domain ida The domain ID allocator is currently protected by a spin_lock. However, ida_alloc_range can potentially block if it needs to allocate memory to grow its internal structures. Replace the spin_lock with a mutex which allows sleep on block. Thus, the memory allocation flags can be updated from GFP_ATOMIC to GFP_KERNEL to allow blocking memory allocations if necessary. Introduce a new mutex, did_lock, specifically for protecting the domain ida. The existing spinlock will remain for protecting other intel_iommu fields. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250430021135.2370244-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:49:28 +02:00
Lu Baolu	f93b4ac592	iommu/vt-d: Use ida to manage domain id Switch the intel iommu driver to use the ida mechanism for managing domain IDs, replacing the previous fixed-size bitmap. The previous approach allocated a bitmap large enough to cover the maximum number of domain IDs supported by the hardware, regardless of the actual number of domains in use. This led to unnecessary memory consumption, especially on systems supporting a large number of iommu units but only utilizing a small number of domain IDs. The ida allocator dynamically manages the allocation and freeing of integer IDs, only consuming memory for the IDs that are currently in use. This significantly optimizes memory usage compared to the fixed-size bitmap. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250430021135.2370244-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:49:27 +02:00
Jason Gunthorpe	6f5dc76580	iommu/vt-d: Restore WO permissions on second-level paging entries VT-D HW can do WO permissions on the second-stage but not the first-stage page table formats. The commit `eea53c5816` ("iommu/vt-d: Remove WO permissions on second-level paging entries") wanted to make this uniform for VT-D by disabling the support for WO permissions in the second-stage. This isn't consistent with how other drivers are working. Instead if the underlying HW can support WO, it should. For instance AMD already supports WO on its second stage (v1) format and not its first (v2). If WO support needs to be discoverable it should be done through an iommu_domain capability flag. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-c26553717e90+65f-iommu_vtd_ss_wo_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:49:27 +02:00
Mario Limonciello	51c33f333b	iommu/amd: Allow matching ACPI HID devices without matching UIDs A BIOS upgrade has changed the IVRS DTE UID for a device that no longer matches the UID in the SSDT. In this case there is only one ACPI device on the system with that _HID but the _UID mismatch. IVRS: ``` Subtable Type : F0 [Device Entry: ACPI HID Named Device] Device ID : 0060 Data Setting (decoded below) : 40 INITPass : 0 EIntPass : 0 NMIPass : 0 Reserved : 0 System MGMT : 0 LINT0 Pass : 1 LINT1 Pass : 0 ACPI HID : "MSFT0201" ACPI CID : 0000000000000000 UID Format : 02 UID Length : 09 UID : "\_SB.MHSP" ``` SSDT: ``` Device (MHSP) { Name (_ADR, Zero) // _ADR: Address Name (_HID, "MSFT0201") // _HID: Hardware ID Name (_UID, One) // _UID: Unique ID ``` To handle this case; while enumerating ACPI devices in get_acpihid_device_id() count the number of matching ACPI devices with a matching _HID. If there is exactly one _HID match then accept it even if the UID doesn't match. Other operating systems allow this, but the current IVRS spec leaves some ambiguity whether to allow or disallow it. This should be clarified in future revisions of the spec. Output 'Firmware Bug' for this case to encourage it to be solved in the BIOS. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250512173129.1274275-1-superm1@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:48:18 +02:00
Rolf Eike Beer	e436576b02	iommu: make inclusion of arm/arm-smmu-v3 directory conditional Nothing in there is active if CONFIG_ARM_SMMU_V3 is not enabled, so the whole directory can depend on that switch as well. Fixes: `e86d1aa8b6` ("iommu/arm-smmu: Move Arm SMMU drivers into their own subdirectory") Signed-off-by: Rolf Eike Beer <eb@emlix.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/2434059.NG923GbCHz@devpool92.emlix.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:46:48 +02:00
Rolf Eike Beer	01c13a1d0e	iommu: make inclusion of riscv directory conditional Nothing in there is active if CONFIG_RISCV_IOMMU is not enabled, so the whole directory can depend on that switch as well. Fixes: `5c0ebbd3c6` ("iommu/riscv: Add RISC-V IOMMU platform device driver") Signed-off-by: Rolf Eike Beer <eb@emlix.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/2235451.Icojqenx9y@devpool92.emlix.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-16 08:46:47 +02:00

1 2 3 4 5 ...

5989 Commits