mirror_ubuntu-kernels

mirror of https://git.proxmox.com/git/mirror_ubuntu-kernels.git synced 2025-12-18 23:00:50 +00:00

Author	SHA1	Message	Date
Max Gurtovoy	1cbd70fe37	vfio/pci: Rename vfio_pci.c to vfio_pci_core.c This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-08-26 10:36:51 -06:00
Cai Huoqing	29848a034a	vfio-pci/zdev: Remove repeated verbose license text The SPDX and verbose license text are redundant, however in this case the verbose license indicates a GPL v2 only while SPDX specifies v2+. Remove the verbose license and correct SPDX to the more restricted version. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20210824003749.1039-1-caihuoqing@baidu.com [aw: commit log] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-08-24 12:04:52 -06:00
Jason Gunthorpe	db44c17458	vfio/pci: Reorganize VFIO_DEVICE_PCI_HOT_RESET to use the device set Like vfio_pci_dev_set_try_reset() this code wants to reset all of the devices in the "reset group" which is the same membership as the device set. Instead of trying to reconstruct the device set from the PCI list go directly from the device set's device list to execute the reset. The same basic structure as vfio_pci_dev_set_try_reset() is used. The 'vfio_devices' struct is replaced with the device set linked list and we simply sweep it multiple times under the lock. This eliminates a memory allocation and get/put traffic and another improperly locked test of pci_dev_driver(). Reviewed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/10-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-08-11 09:50:11 -06:00
Jason Gunthorpe	a882c16a2b	vfio/pci: Change vfio_pci_try_bus_reset() to use the dev_set vfio_pci_try_bus_reset() is triggering a reset of the entire_dev set if any device within it has accumulated a needs_reset. This reset can only be done once all of the drivers operating the PCI devices to be reset are in a known safe state. Make this clearer by directly operating on the dev_set instead of the vfio_pci_device. Rename the function to vfio_pci_dev_set_try_reset(). Use the device list inside the dev_set to check that all drivers are in a safe state instead of working backwards from the pci_device. The dev_set->lock directly prevents devices from joining/leaving the set, or changing their state, which further implies the pci_device cannot change drivers or that the vfio_device be freed, eliminating the need for get/put's. If a pci_device to be reset is not in the dev_set then the reset cannot be used as we can't know what the state of that driver is. Directly measure this by checking that every pci_device is in the dev_set - which effectively proves that VFIO drivers are attached to everything. Remove the odd interaction around vfio_pci_set_power_state() - have the only caller avoid its redundant vfio_pci_set_power_state() instead of avoiding it inside vfio_pci_dev_set_try_reset(). This restructuring corrects a call to pci_dev_driver() without holding the device_lock() and removes a hard wiring to &vfio_pci_driver. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/9-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-08-11 09:50:11 -06:00
Yishai Hadas	2cd8b14aaa	vfio/pci: Move to the device set infrastructure PCI wants to have the usual open/close_device() logic with the slight twist that the open/close_device() must be done under a singelton lock shared by all of the vfio_devices that are in the PCI "reset group". The reset group, and thus the device set, is determined by what devices pci_reset_bus() touches, which is either the entire bus or only the slot. Rely on the core code to do everything reflck was doing and delete reflck entirely. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/8-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-08-11 09:50:11 -06:00
Max Gurtovoy	ae03c3771b	vfio: Introduce a vfio_uninit_group_dev() API call This pairs with vfio_init_group_dev() and allows undoing any state that is stored in the vfio_device unrelated to registration. Add appropriately placed calls to all the drivers. The following patch will use this to add pre-registration state for the device set. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-08-11 09:50:10 -06:00
Yishai Hadas	e7500b3ede	vfio/pci: Make vfio_pci_regops->rw() return ssize_t The only implementation of this in IGD returns a -ERRNO which is implicitly cast through a size_t and then casted again and returned as a ssize_t in vfio_pci_rw(). Fix the vfio_pci_regops->rw() return type to be ssize_t so all is consistent. Fixes: `28541d41c9` ("vfio/pci: Add infrastructure for additional device specific regions") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/0-v3-5db12d1bf576+c910-vfio_rw_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-08-02 15:23:42 -06:00
Dave Airlie	8da49a33dd	drm-misc-next for v5.15-rc1: UAPI Changes: - Remove sysfs stats for dma-buf attachments, as it causes a performance regression. Previous merge is not in a rc kernel yet, so no userspace regression possible. Cross-subsystem Changes: - Sanitize user input in kyro's viewport ioctl. - Use refcount_t in fb_info->count - Assorted fixes to dma-buf. - Extend x86 efifb handling to all archs. - Fix neofb divide by 0. - Document corpro,gm7123 bridge dt bindings. Core Changes: - Slightly rework drm master handling. - Cleanup vgaarb handling. - Assorted fixes. Driver Changes: - Add support for ws2401 panel. - Assorted fixes to stm, ast, bochs. - Demidlayer ingenic irq. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmD5TGAACgkQ/lWMcqZw E8PNgxAApjTYQSfjIBbOZnNraxW6w7/bPea35E9A47EdBQsNGnYftNsFjbrn/mCJ D+0eRLjCMlg4FF1SHdh9cPJ35py+ygbDeupogboLITfU99eGBth3fM2Xdg9LPcBh dbni/JLG9R7gIvSlqdJuweN21trfVrV/9FQEilG5DvQcl27Wx5g8VMRZke1EqGKX 7Id09Uq50ky18vhDjQRCveYhRqJAxV+XozBatzHyxpDVzjLQvRhlAAYdvrSMHZ5R jreGzOfR8awc6Om+w7wx3Jn1oEGmXVZB/VqxEqGtMOr3lpARPucxrqfHsqpam3rv yIoEKPrkG+k6fsU7Tbg59jNqe/PbCUW3AlpyuBxf55EbnVGgjLDbq4sRRMkehPfA fhC31ujOXQQnAgaxyeQAaAJFKNFJzA8Cq5ZPfG+zztzuomHCiUVQBRowP65hJMzR +ZlEDnhUD3STLz39zuO1reZR1ZoPIvKbsokHAA+ZrIwUd6U3D3ia8V51pq+lL5aS TGDkyMN9jyZ+SO8Z7+2FnJAv9FAOPU/WCLU/fWW46jAvuezwMIwVcjfSqDU2XbZD e7KgHpHhx3BGxI8TThHKlY7mf6IL2Bm7X1Cv1pdZs/eEn3Udh2ax942uTQZu/YOO 0AT1XchpvYCBNRw05bVI3OlJ+w3I8uV+h+11jHOKeY6cbwdHeKE= =BUya -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2021-07-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.15-rc1: UAPI Changes: - Remove sysfs stats for dma-buf attachments, as it causes a performance regression. Previous merge is not in a rc kernel yet, so no userspace regression possible. Cross-subsystem Changes: - Sanitize user input in kyro's viewport ioctl. - Use refcount_t in fb_info->count - Assorted fixes to dma-buf. - Extend x86 efifb handling to all archs. - Fix neofb divide by 0. - Document corpro,gm7123 bridge dt bindings. Core Changes: - Slightly rework drm master handling. - Cleanup vgaarb handling. - Assorted fixes. Driver Changes: - Add support for ws2401 panel. - Assorted fixes to stm, ast, bochs. - Demidlayer ingenic irq. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/2d0d2fe8-01fc-e216-c3fd-38db9e69944e@linux.intel.com	2021-07-23 11:32:43 +10:00
Christoph Hellwig	bf44e8cecc	vgaarb: don't pass a cookie to vga_client_register The VGA arbitration is entirely based on pci_dev structures, so just pass that back to the set_vga_decode callback. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-8-hch@lst.de Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>	2021-07-21 10:29:10 +02:00
Christoph Hellwig	f6b1772b25	vgaarb: remove the unused irq_set_state argument to vga_client_register All callers pass NULL as the irq_set_state argument, so remove it and the ->irq_set_state member in struct vga_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-7-hch@lst.de Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>	2021-07-21 10:29:05 +02:00
Christoph Hellwig	b877947586	vgaarb: provide a vga_client_unregister wrapper Add a trivial wrapper for the unregister case that sets all fields to NULL. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-6-hch@lst.de Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>	2021-07-21 10:29:00 +02:00
Alex Williamson	6a45ece4c9	vfio/pci: Handle concurrent vma faults io_remap_pfn_range() will trigger a BUG_ON if it encounters a populated pte within the mapping range. This can occur because we map the entire vma on fault and multiple faults can be blocked behind the vma_lock. This leads to traces like the one reported below. We can use our vma_list to test whether a given vma is mapped to avoid this issue. [ 1591.733256] kernel BUG at mm/memory.c:2177! [ 1591.739515] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 1591.747381] Modules linked in: vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O) [ 1591.760536] CPU: 2 PID: 227 Comm: lcore-worker-2 Tainted: G O 5.11.0-rc3+ #1 [ 1591.770735] Hardware name: , BIOS HixxxxFPGA 1P B600 V121-1 [ 1591.778872] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--) [ 1591.786134] pc : remap_pfn_range+0x214/0x340 [ 1591.793564] lr : remap_pfn_range+0x1b8/0x340 [ 1591.799117] sp : ffff80001068bbd0 [ 1591.803476] x29: ffff80001068bbd0 x28: 0000042eff6f0000 [ 1591.810404] x27: 0000001100910000 x26: 0000001300910000 [ 1591.817457] x25: 0068000000000fd3 x24: ffffa92f1338e358 [ 1591.825144] x23: 0000001140000000 x22: 0000000000000041 [ 1591.832506] x21: 0000001300910000 x20: ffffa92f141a4000 [ 1591.839520] x19: 0000001100a00000 x18: 0000000000000000 [ 1591.846108] x17: 0000000000000000 x16: ffffa92f11844540 [ 1591.853570] x15: 0000000000000000 x14: 0000000000000000 [ 1591.860768] x13: fffffc0000000000 x12: 0000000000000880 [ 1591.868053] x11: ffff0821bf3d01d0 x10: ffff5ef2abd89000 [ 1591.875932] x9 : ffffa92f12ab0064 x8 : ffffa92f136471c0 [ 1591.883208] x7 : 0000001140910000 x6 : 0000000200000000 [ 1591.890177] x5 : 0000000000000001 x4 : 0000000000000001 [ 1591.896656] x3 : 0000000000000000 x2 : 0168044000000fd3 [ 1591.903215] x1 : ffff082126261880 x0 : fffffc2084989868 [ 1591.910234] Call trace: [ 1591.914837] remap_pfn_range+0x214/0x340 [ 1591.921765] vfio_pci_mmap_fault+0xac/0x130 [vfio_pci] [ 1591.931200] __do_fault+0x44/0x12c [ 1591.937031] handle_mm_fault+0xcc8/0x1230 [ 1591.942475] do_page_fault+0x16c/0x484 [ 1591.948635] do_translation_fault+0xbc/0xd8 [ 1591.954171] do_mem_abort+0x4c/0xc0 [ 1591.960316] el0_da+0x40/0x80 [ 1591.965585] el0_sync_handler+0x168/0x1b0 [ 1591.971608] el0_sync+0x174/0x180 [ 1591.978312] Code: eb1b027f 540000c0 f9400022 b4fffe02 (d4210000) Fixes: `11c4cd07ba` ("vfio-pci: Fault mmaps to enable vma tracking") Reported-by: Zeng Tao <prime.zeng@hisilicon.com> Suggested-by: Zeng Tao <prime.zeng@hisilicon.com> Link: https://lore.kernel.org/r/162497742783.3883260.3282953006487785034.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-06-30 13:55:53 -06:00
Luis Chamberlain	742b4c0d1e	vfio: use the new pci_dev_trylock() helper to simplify try lock Use the new pci_dev_trylock() helper to simplify our locking. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20210623022824.308041-3-mcgrof@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-06-24 13:32:31 -06:00
Max Gurtovoy	9dcf01d957	vfio: centralize module refcount in subsystem layer Remove code duplication and move module refcounting to the subsystem module. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20210518192133.59195-2-mgurtovoy@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-06-15 14:12:15 -06:00
Randy Dunlap	2a55ca3735	vfio/pci: zap_vma_ptes() needs MMU zap_vma_ptes() is only available when CONFIG_MMU is set/enabled. Without CONFIG_MMU, vfio_pci.o has build errors, so make VFIO_PCI depend on MMU. riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `vfio_pci_mmap_open': vfio_pci.c:(.text+0x1ec): undefined reference to `zap_vma_ptes' riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `.L0 ': vfio_pci.c:(.text+0x165c): undefined reference to `zap_vma_ptes' Fixes: `11c4cd07ba` ("vfio-pci: Fault mmaps to enable vma tracking") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: kvm@vger.kernel.org Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Eric Auger <eric.auger@redhat.com> Message-Id: <20210515190856.2130-1-rdunlap@infradead.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-05-24 13:40:13 -06:00
Zhen Lei	d1ce2c7915	vfio/pci: Fix error return code in vfio_ecap_init() The error code returned from vfio_ext_cap_len() is stored in 'len', not in 'ret'. Fixes: `89e1f7d4c6` ("vfio: Add PCI device driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20210515020458.6771-1-thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-05-24 12:14:18 -06:00
Linus Torvalds	238da4d004	VFIO updates for v5.13-rc1 - Embed struct vfio_device into vfio driver structures (Jason Gunthorpe) - Make vfio_mdev type safe (Jason Gunthorpe) - Remove vfio-pci NVLink2 extensions for POWER9 (Christoph Hellwig) - Update vfio-pci IGD extensions for OpRegion 2.1+ (Fred Gao) - Various spelling/blank line fixes (Zhen Lei, Zhou Wang, Bhaskar Chowdhury) - Simplify unpin_pages error handling (Shenming Lu) - Fix i915 mdev Kconfig dependency (Arnd Bergmann) - Remove unused structure member (Keqian Zhu) -----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmCJsEsbHGFsZXgud2ls bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsi4HEP/icVw+sKYvfYkN3D5+7+ J+IUtgdXSFaSmc9S8lj9gED68t6t5BhPMtdfSe9Tfvn/btyaYZ6uNwZGNqkaRCz8 97jLkDmTPzL2F8uiMT4OI/VY6mzcK5yKA0mYNcO+nXQyUNtpDDCCbTD9OfIu62i/ nVAu6u/Sj89zj66opZLTz1sxlYu/IQQ6olDlmgRAZ0JfUmpZLuNrSJbdv1nJZN0i uSAfNthSlgfMK/6hf9oW22GFjiJmnMVPnPd2wMceEq1N2F3Co9mNEhsMwqSZH3ra UQVJKRLA1NMhlee33Pkbsilmwk3lyTXI4GYw61y+TDi/CuOvNkym1BIW6WCQlQRi iOuMnaas3UOwi45pcqDNl9DRR+mfJc9I11auyAbznuw6omeZVxYHkWi94HRbJNzf zDJN94fcnafomHXe0NYJ33eSUrEeN7heNd5wvMkqQJEfHRpqUQf/xOGE9xBZXVAO Vl02C+qQt/fM0tds09lUOvMRjAYBRSaF2xcq6SEQYMXmfx9hcOrEUJUEWegy0NGN lq+Oa9mMxwHllDm9Wtn7vMx4KPqDm6dq3BwB45+S7bdQiDj0RiCohRLK9GC7YMzT K6dlKdYmed9dOnIzCw8R9PI/DDHRyG/mXgzFERkB5DTt1YVssYLTZnGL2lePwM9J BSqG2mMHzyYaHdaaVAWHH/vQ =y3PY -----END PGP SIGNATURE----- Merge tag 'vfio-v5.13-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Embed struct vfio_device into vfio driver structures (Jason Gunthorpe) - Make vfio_mdev type safe (Jason Gunthorpe) - Remove vfio-pci NVLink2 extensions for POWER9 (Christoph Hellwig) - Update vfio-pci IGD extensions for OpRegion 2.1+ (Fred Gao) - Various spelling/blank line fixes (Zhen Lei, Zhou Wang, Bhaskar Chowdhury) - Simplify unpin_pages error handling (Shenming Lu) - Fix i915 mdev Kconfig dependency (Arnd Bergmann) - Remove unused structure member (Keqian Zhu) * tag 'vfio-v5.13-rc1' of git://github.com/awilliam/linux-vfio: (43 commits) vfio/gvt: fix DRM_I915_GVT dependency on VFIO_MDEV vfio/iommu_type1: Remove unused pinned_page_dirty_scope in vfio_iommu vfio/mdev: Correct the function signatures for the mdev_type_attributes vfio/mdev: Remove kobj from mdev_parent_ops->create() vfio/gvt: Use mdev_get_type_group_id() vfio/gvt: Make DRM_I915_GVT depend on VFIO_MDEV vfio/mbochs: Use mdev_get_type_group_id() vfio/mdpy: Use mdev_get_type_group_id() vfio/mtty: Use mdev_get_type_group_id() vfio/mdev: Add mdev/mtype_get_type_group_id() vfio/mdev: Remove duplicate storage of parent in mdev_device vfio/mdev: Add missing error handling to dev_set_name() vfio/mdev: Reorganize mdev_device_create() vfio/mdev: Add missing reference counting to mdev_type vfio/mdev: Expose mdev_get/put_parent to mdev_private.h vfio/mdev: Use struct mdev_type in struct mdev_device vfio/mdev: Simplify driver registration vfio/mdev: Add missing typesafety around mdev_device vfio/mdev: Do not allow a mdev_type to have a NULL parent pointer vfio/mdev: Fix missing static's on MDEV_TYPE_ATTR's ...	2021-04-28 17:19:47 -07:00
Christian A. Ehrhardt	909290786e	vfio/pci: Add missing range check in vfio_pci_mmap When mmaping an extra device region verify that the region index derived from the mmap offset is valid. Fixes: `a15b1883fe` ("vfio_pci: Allow mapping extra regions") Cc: stable@vger.kernel.org Signed-off-by: Christian A. Ehrhardt <lk@c--e.de> Message-Id: <20210412214124.GA241759@lisa.in-ulm.de> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-13 08:29:16 -06:00
Alex Williamson	6a2a235aa6	Merge branches 'v5.13/vfio/embed-vfio_device', 'v5.13/vfio/misc' and 'v5.13/vfio/nvlink' into v5.13/vfio/next Spelling fixes merged with file deletion. Conflicts: drivers/vfio/pci/vfio_pci_nvlink2.c Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 12:01:51 -06:00
Jason Gunthorpe	1e04ec1420	vfio: Remove device_data from the vfio bus driver API There are no longer any users, so it can go away. Everything is using container_of now. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <14-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:55:11 -06:00
Jason Gunthorpe	07d47b4222	vfio/pci: Replace uses of vfio_device_data() with container_of This tidies a few confused places that think they can have a refcount on the vfio_device but the device_data could be NULL, that isn't possible by design. Most of the change falls out when struct vfio_devices is updated to just store the struct vfio_pci_device itself. This wasn't possible before because there was no easy way to get from the 'struct vfio_pci_device' to the 'struct vfio_device' to put back the refcount. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <13-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:55:11 -06:00
Jason Gunthorpe	6df62c5b05	vfio: Make vfio_device_ops pass a 'struct vfio_device ' instead of 'void ' This is the standard kernel pattern, the ops associated with a struct get the struct pointer in for typesafety. The expected design is to use container_of to cleanly go from the subsystem level type to the driver level type without having any type erasure in a void *. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <12-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:55:11 -06:00
Jason Gunthorpe	6b018e203d	vfio/pci: Use vfio_init/register/unregister_group_dev pci already allocates a struct vfio_pci_device with exactly the same lifetime as vfio_device, switch to the new API and embed vfio_device in vfio_pci_device. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <9-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:55:10 -06:00
Jason Gunthorpe	4aeec3984d	vfio/pci: Re-order vfio_pci_probe() vfio_add_group_dev() must be called only after all of the private data in vdev is fully setup and ready, otherwise there could be races with user space instantiating a device file descriptor and starting to call ops. For instance vfio_pci_reflck_attach() sets vdev->reflck and vfio_pci_open(), called by fops open, unconditionally derefs it, which will crash if things get out of order. Fixes: `cc20d79990` ("vfio/pci: Introduce VF token") Fixes: `e309df5b0c` ("vfio/pci: Parallelize device open and release") Fixes: `6eb7018705` ("vfio-pci: Move idle devices to D3hot power state") Fixes: `ecaa1f6a01` ("vfio-pci: Add VGA arbiter client") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <8-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:55:10 -06:00
Jason Gunthorpe	61e9081748	vfio/pci: Move VGA and VF initialization to functions vfio_pci_probe() is quite complicated, with optional VF and VGA sub components. Move these into clear init/uninit functions and have a linear flow in probe/remove. This fixes a few little buglets: - vfio_pci_remove() is in the wrong order, vga_client_register() removes a notifier and is after kfree(vdev), but the notifier refers to vdev, so it can use after free in a race. - vga_client_register() can fail but was ignored Organize things so destruction order is the reverse of creation order. Fixes: `ecaa1f6a01` ("vfio-pci: Add VGA arbiter client") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <7-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:55:10 -06:00
Christoph Hellwig	b392a19891	vfio/pci: remove vfio_pci_nvlink2 This driver never had any open userspace (which for VFIO would include VM kernel drivers) that use it, and thus should never have been added by our normal userspace ABI rules. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Message-Id: <20210326061311.1497642-2-hch@lst.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:54:13 -06:00
Zhen Lei	d0915b3291	vfio/pci: fix a couple of spelling mistakes There are several spelling mistakes, as follows: thru ==> through presense ==> presence Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210326083528.1329-4-thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:53:50 -06:00
Fred Gao	bab2c1990b	vfio/pci: Add support for opregion v2.1+ Before opregion version 2.0 VBT data is stored in opregion mailbox #4, but when VBT data exceeds 6KB size and cannot be within mailbox #4 then from opregion v2.0+, Extended VBT region, next to opregion is used to hold the VBT data, so the total size will be opregion size plus extended VBT region size. Since opregion v2.0 with physical host VBT address would not be practically available for end user and guest can not directly access host physical address, so it is not supported. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com> Signed-off-by: Fred Gao <fred.gao@intel.com> Message-Id: <20210325170953.24549-1-fred.gao@intel.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:53:50 -06:00
Zhou Wang	36f0be5a30	vfio/pci: Remove an unnecessary blank line in vfio_pci_enable This blank line is unnecessary, so remove it. Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Message-Id: <1615808073-178604-1-git-send-email-wangzhou1@hisilicon.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:53:50 -06:00
Bhaskar Chowdhury	fbc9d37161	vfio: pci: Spello fix in the file vfio_pci.c s/permision/permission/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Message-Id: <20210314052925.3560-1-unixbhaskar@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-04-06 11:53:49 -06:00
Jason Gunthorpe	e0146a108c	vfio/nvlink: Add missing SPAPR_TCE_IOMMU depends Compiling the nvlink stuff relies on the SPAPR_TCE_IOMMU otherwise there are compile errors: drivers/vfio/pci/vfio_pci_nvlink2.c:101:10: error: implicit declaration of function 'mm_iommu_put' [-Werror,-Wimplicit-function-declaration] ret = mm_iommu_put(data->mm, data->mem); As PPC only defines these functions when the config is set. Previously this wasn't a problem by chance as SPAPR_TCE_IOMMU was the only IOMMU that could have satisfied IOMMU_API on POWERNV. Fixes: `179209fa12` ("vfio: IOMMU_API should be selected") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <0-v1-83dba9768fc3+419-vfio_nvlink2_kconfig_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-03-29 14:48:00 -06:00
Max Gurtovoy	b9abef43a0	vfio/pci: remove CONFIG_VFIO_PCI_ZDEV from Kconfig In case we're running on s390 system always expose the capabilities for configuration of zPCI devices. In case we're running on different platform, continue as usual. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-02-19 10:29:56 -07:00
Max Gurtovoy	7e31d6dc2c	vfio-pci/zdev: fix possible segmentation fault issue In case allocation fails, we must behave correctly and exit with error. Fixes: `e6b817d4b8` ("vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO") Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-02-02 09:06:02 -07:00
Max Gurtovoy	46c4746660	vfio-pci/zdev: remove unused vdev argument Zdev static functions do not use vdev argument. Remove it. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-02-01 13:43:06 -07:00
Heiner Kallweit	37a682ffbe	vfio/pci: Fix handling of pci use accessor return codes The pci user accessors return negative errno's on error. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: drop Fixes tag, pcibios_err_to_errno() behaves correctly for -errno] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-02-01 13:40:52 -07:00
Alexey Kardashevskiy	d22f9a6c92	vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU We execute certain NPU2 setup code (such as mapping an LPID to a device in NPU2) unconditionally if an Nvlink bridge is detected. However this cannot succeed on POWER8NVL machines as the init helpers return an error other than ENODEV which means the device is there is and setup failed so vfio_pci_enable() fails and pass through is not possible. This changes the two NPU2 related init helpers to return -ENODEV if there is no "memory-region" device tree property as this is the distinction between NPU and NPU2. Tested on - POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm, NVIDIA GV100 10de:1db1 driver 418.39 - POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm, NVIDIA P100 10de:15f9 driver 396.47 Fixes: `7f92891778` ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Cc: stable@vger.kernel.org # 5.0 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-12-02 13:04:22 -07:00
Jason Gunthorpe	7b06a56d46	vfio-pci: Use io_remap_pfn_range() for PCI IO memory commit `f8f6ae5d07` ("mm: always have io_remap_pfn_range() set pgprot_decrypted()") allows drivers using mmap to put PCI memory mapped BAR space into userspace to work correctly on AMD SME systems that default to all memory encrypted. Since vfio_pci_mmap_fault() is working with PCI memory mapped BAR space it should be calling io_remap_pfn_range() otherwise it will not work on SME systems. Fixes: `11c4cd07ba` ("vfio-pci: Fault mmaps to enable vma tracking") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-12-02 13:03:12 -07:00
Eric Auger	16b8fe4caf	vfio/pci: Move dummy_resources_list init in vfio_pci_probe() In case an error occurs in vfio_pci_enable() before the call to vfio_pci_probe_mmaps(), vfio_pci_disable() will try to iterate on an uninitialized list and cause a kernel panic. Lets move to the initialization to vfio_pci_probe() to fix the issue. Signed-off-by: Eric Auger <eric.auger@redhat.com> Fixes: `05f0c03fba` ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive") CC: Stable <stable@vger.kernel.org> # v4.7+ Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-12-02 13:03:12 -07:00
Fred Gao	e4eccb8536	vfio/pci: Bypass IGD init in case of -ENODEV Bypass the IGD initialization when -ENODEV returns, that should be the case if opregion is not available for IGD or within discrete graphics device's option ROM, or host/lpc bridge is not found. Then use of -ENODEV here means no special device resources found which needs special care for VFIO, but we still allow other normal device resource access. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Xiong Zhang <xiong.y.zhang@intel.com> Cc: Hang Yuan <hang.yuan@linux.intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Fred Gao <fred.gao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-03 11:07:40 -07:00
Alex Williamson	38565c93c8	vfio/pci: Implement ioeventfd thread handler for contended memory lock The ioeventfd is called under spinlock with interrupts disabled, therefore if the memory lock is contended defer code that might sleep to a thread context. Fixes: `bc93b9ae01` ("vfio-pci: Avoid recursive read-lock usage") Link: https://bugzilla.kernel.org/show_bug.cgi?id=209253#c1 Reported-by: Ian Pilcher <arequipeno@gmail.com> Tested-by: Ian Pilcher <arequipeno@gmail.com> Tested-by: Justin Gatzen <justin.gatzen@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-03 11:07:40 -07:00
Linus Torvalds	fc996db970	VFIO updates for v5.10-rc1 - New fsl-mc vfio bus driver supporting userspace drivers of objects within NXP's DPAA2 architecture (Diana Craciun) - Support for exposing zPCI information on s390 (Matthew Rosato) - Fixes for "detached" VFs on s390 (Matthew Rosato) - Fixes for pin-pages and dma-rw accesses (Yan Zhao) - Cleanups and optimize vconfig regen (Zenghui Yu) - Fix duplicate irq-bypass token registration (Alex Williamson) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJfkcCjAAoJECObm247sIsi2XIP/j7NL4glPrWU37mesz9dd5nx SmZhcmxnOqZSQkOCnu+hNFZ9e+tdQjuX+jATOZaYz5l55bLAFmBlBj1Dv8HWaCVI mTbJ6xXUwdOvNSxbFH6BIUkJg8otR0iEkefVyJLNlF84FsaDknH4yZxx0vdeczjF wTkkk3+4VmH+4klvPIa9v0eL7yeKeFmgls9nQViVE5kDWUF4us/z/oHlVm9wR+mL 2r3DEjHyz4L2hwVEkhZk7ytR6szdhuhF2l7NoMmaSEXRXjBzJoO6I3P9Y2W4i+su MFgTfiQ+OpIfVuiR8GzGev+/SrjWGX0Hvb2sYriKOELjhyedkE2kmxacbqMZ/UE+ SRAhFf64C1rzJ4g1IW//Gg+9ObIPqlkqU52VDbOZdCED0AquwSyVmdwIUAK6qF+I HLOyZXhMI8EZ+w063cS+aKLJIvQTBbfIdMmPZkopVZhwWB3N3BjdvBKA+rPpPoTx 0DpeUo891+zyeEE4aunUmCB8HFnBPgUa+XZqg2juq9MxjScsqgTzA0WEZg7jV4oj tORQrqoAKJgSk9oVL3EvAnr+IJix3ScRTqYymESORkz/lRCk2hFX48qdeW+qiSP8 W1DHOnivFb1+JzhuZyaRKFWy1mK0EQQWTsE2b2ymPMKJbFhi+pVxaksmeG5x+4Q9 SAp+Qma8Aj3UtBKcj/S+ =LDPo -----END PGP SIGNATURE----- Merge tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - New fsl-mc vfio bus driver supporting userspace drivers of objects within NXP's DPAA2 architecture (Diana Craciun) - Support for exposing zPCI information on s390 (Matthew Rosato) - Fixes for "detached" VFs on s390 (Matthew Rosato) - Fixes for pin-pages and dma-rw accesses (Yan Zhao) - Cleanups and optimize vconfig regen (Zenghui Yu) - Fix duplicate irq-bypass token registration (Alex Williamson) * tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits) vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages vfio/pci: Clear token on bypass registration failure vfio/fsl-mc: fix the return of the uninitialized variable ret vfio/fsl-mc: Fix the dead code in vfio_fsl_mc_set_irq_trigger vfio/fsl-mc: Fixed vfio-fsl-mc driver compilation on 32 bit MAINTAINERS: Add entry for s390 vfio-pci vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO vfio/fsl-mc: Add support for device reset vfio/fsl-mc: Add read/write support for fsl-mc devices vfio/fsl-mc: trigger an interrupt via eventfd vfio/fsl-mc: Add irq infrastructure for fsl-mc devices vfio/fsl-mc: Added lock support in preparation for interrupt handling vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO s390/pci: track whether util_str is valid in the zpci_dev s390/pci: stash version in the zpci_dev vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices ...	2020-10-22 13:00:44 -07:00
Alex Williamson	852b1beecb	vfio/pci: Clear token on bypass registration failure The eventfd context is used as our irqbypass token, therefore if an eventfd is re-used, our token is the same. The irqbypass code will return an -EBUSY in this case, but we'll still attempt to unregister the producer, where if that duplicate token still exists, results in removing the wrong object. Clear the token of failed producers so that they harmlessly fall out when unregistered. Fixes: `6d7425f109` ("vfio: Register/unregister irq_bypass_producer") Reported-by: guomin chen <guomin_chen@sina.com> Tested-by: guomin chen <guomin_chen@sina.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-10-19 07:13:55 -06:00
Jann Horn	4d45e75a99	mm: remove the now-unnecessary mmget_still_valid() hack The preceding patches have ensured that core dumping properly takes the mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all its users. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-16 11:11:22 -07:00
Alex Williamson	2099363255	Merge branches 'v5.10/vfio/fsl-mc-v6' and 'v5.10/vfio/zpci-info-v3' into v5.10/vfio/next	2020-10-12 11:41:02 -06:00
Matthew Rosato	e6b817d4b8	vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI. When this s390-only feature is configured we add capabilities to the VFIO_DEVICE_GET_INFO ioctl that describe features of the associated zPCI device and its underlying hardware. This patch is based on work previously done by Pierre Morel. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-10-12 11:37:59 -06:00
Alex Williamson	3de066f8f8	Merge branches 'v5.10/vfio/bardirty', 'v5.10/vfio/dma_avail', 'v5.10/vfio/misc', 'v5.10/vfio/no-cmd-mem' and 'v5.10/vfio/yan_zhao_fixes' into v5.10/vfio/next	2020-09-22 10:56:51 -06:00
Matthew Rosato	515ecd5368	vfio/pci: Decouple PCI_COMMAND_MEMORY bit checks from is_virtfn While it is true that devices with is_virtfn=1 will have a Memory Space Enable bit that is hard-wired to 0, this is not the only case where we see this behavior -- For example some bare-metal hypervisors lack Memory Space Enable bit emulation for devices not setting is_virtfn (s390). Fix this by instead checking for the newly-added no_command_memory bit which directly denotes the need for PCI_COMMAND_MEMORY emulation in vfio. Fixes: `abafbc551f` ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-09-22 10:52:24 -06:00
Zenghui Yu	1c0f68252a	vfio/pci: Don't regenerate vconfig for all BARs if !bardirty Now we regenerate vconfig for all the BARs via vfio_bar_fixup(), every time any offset of any of them are read. Though BARs aren't re-read regularly, the regeneration can be avoided if no BARs had been written since they were last read, in which case vdev->bardirty is false. Let's return immediately in vfio_bar_fixup() if bardirty is false. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-09-21 14:08:12 -06:00
Zenghui Yu	eac7cc21c4	vfio/pci: Remove redundant declaration of vfio_pci_driver It was added by commit `137e553135` ("vfio/pci: Add sriov_configure support") but duplicates a forward declaration earlier in the file. Remove it. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-09-21 14:07:05 -06:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Alex Williamson	bc93b9ae01	vfio-pci: Avoid recursive read-lock usage A down_read on memory_lock is held when performing read/write accesses to MMIO BAR space, including across the copy_to/from_user() callouts which may fault. If the user buffer for these copies resides in an mmap of device MMIO space, the mmap fault handler will acquire a recursive read-lock on memory_lock. Avoid this by reducing the lock granularity. Sequential accesses requiring multiple ioread/iowrite cycles are expected to be rare, therefore typical accesses should not see additional overhead. VGA MMIO accesses are expected to be non-fatal regardless of the PCI memory enable bit to allow legacy probing, this behavior remains with a comment added. ioeventfds are now included in memory access testing, with writes dropped while memory space is disabled. Fixes: `abafbc551f` ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-08-17 11:08:18 -06:00
Giovanni Cabiddu	50173329c8	vfio/pci: Add QAT devices to denylist The current generation of Intel® QuickAssist Technology devices are not designed to run in an untrusted environment because of the following issues reported in the document "Intel® QuickAssist Technology (Intel® QAT) Software for Linux" (document number 336211-014): QATE-39220 - GEN - Intel® QAT API submissions with bad addresses that trigger DMA to invalid or unmapped addresses can cause a platform hang QATE-7495 - GEN - An incorrectly formatted request to Intel® QAT can hang the entire Intel® QAT Endpoint The document is downloadable from https://01.org/intel-quickassist-technology at the following link: https://01.org/sites/default/files/downloads/336211-014-qatforlinux-releasenotes-hwv1.7_0.pdf This patch adds the following QAT devices to the denylist: DH895XCC, C3XXX and C62X. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-07-27 13:43:40 -06:00
Giovanni Cabiddu	1f97970e6c	vfio/pci: Add device denylist Add denylist of devices that by default are not probed by vfio-pci. Devices in this list may be susceptible to untrusted application, even if the IOMMU is enabled. To be accessed via vfio-pci, the user has to explicitly disable the denylist. The denylist can be disabled via the module parameter disable_denylist. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-07-27 13:43:40 -06:00
Alex Williamson	924b51abf9	vfio/pci: Hold igate across releasing eventfd contexts No need to release and immediately re-acquire igate while clearing out the eventfd ctxs. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-07-27 13:43:38 -06:00
Alex Williamson	bf3551e150	vfio/pci: Add Intel X550 to hidden INTx devices Intel document 333717-008, "Intel® Ethernet Controller X550 Specification Update", version 2.7, dated June 2020, includes errata #22, added in version 2.1, May 2016, indicating X550 NICs suffer from the same implementation deficiency as the 700-series NICs: "The Interrupt Status bit in the Status register of the PCIe configuration space is not implemented and is not set as described in the PCIe specification." Without the interrupt status bit, vfio-pci cannot determine when these devices signal INTx. They are therefore added to the nointx quirk. Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-07-27 13:43:37 -06:00
Zeng Tao	b872d06408	vfio/pci: fix racy on error and request eventfd ctx The vfio_pci_release call will free and clear the error and request eventfd ctx while these ctx could be in use at the same time in the function like vfio_pci_request, and it's expected to protect them under the vdev->igate mutex, which is missing in vfio_pci_release. This issue is introduced since commit `1518ac272e` ("vfio/pci: fix memory leaks of eventfd ctx"),and since commit `5c5866c593` ("vfio/pci: Clear error and request eventfd ctx after releasing"), it's very easily to trigger the kernel panic like this: [ 9513.904346] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [ 9513.913091] Mem abort info: [ 9513.915871] ESR = 0x96000006 [ 9513.918912] EC = 0x25: DABT (current EL), IL = 32 bits [ 9513.924198] SET = 0, FnV = 0 [ 9513.927238] EA = 0, S1PTW = 0 [ 9513.930364] Data abort info: [ 9513.933231] ISV = 0, ISS = 0x00000006 [ 9513.937048] CM = 0, WnR = 0 [ 9513.940003] user pgtable: 4k pages, 48-bit VAs, pgdp=0000007ec7d12000 [ 9513.946414] [0000000000000008] pgd=0000007ec7d13003, p4d=0000007ec7d13003, pud=0000007ec728c003, pmd=0000000000000000 [ 9513.956975] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 9513.962521] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio hclge hns3 hnae3 [last unloaded: vfio_pci] [ 9513.972998] CPU: 4 PID: 1327 Comm: bash Tainted: G W 5.8.0-rc4+ #3 [ 9513.980443] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020 [ 9513.989274] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--) [ 9513.994827] pc : _raw_spin_lock_irqsave+0x48/0x88 [ 9513.999515] lr : eventfd_signal+0x6c/0x1b0 [ 9514.003591] sp : ffff800038a0b960 [ 9514.006889] x29: ffff800038a0b960 x28: ffff007ef7f4da10 [ 9514.012175] x27: ffff207eefbbfc80 x26: ffffbb7903457000 [ 9514.017462] x25: ffffbb7912191000 x24: ffff007ef7f4d400 [ 9514.022747] x23: ffff20be6e0e4c00 x22: 0000000000000008 [ 9514.028033] x21: 0000000000000000 x20: 0000000000000000 [ 9514.033321] x19: 0000000000000008 x18: 0000000000000000 [ 9514.038606] x17: 0000000000000000 x16: ffffbb7910029328 [ 9514.043893] x15: 0000000000000000 x14: 0000000000000001 [ 9514.049179] x13: 0000000000000000 x12: 0000000000000002 [ 9514.054466] x11: 0000000000000000 x10: 0000000000000a00 [ 9514.059752] x9 : ffff800038a0b840 x8 : ffff007ef7f4de60 [ 9514.065038] x7 : ffff007fffc96690 x6 : fffffe01faffb748 [ 9514.070324] x5 : 0000000000000000 x4 : 0000000000000000 [ 9514.075609] x3 : 0000000000000000 x2 : 0000000000000001 [ 9514.080895] x1 : ffff007ef7f4d400 x0 : 0000000000000000 [ 9514.086181] Call trace: [ 9514.088618] _raw_spin_lock_irqsave+0x48/0x88 [ 9514.092954] eventfd_signal+0x6c/0x1b0 [ 9514.096691] vfio_pci_request+0x84/0xd0 [vfio_pci] [ 9514.101464] vfio_del_group_dev+0x150/0x290 [vfio] [ 9514.106234] vfio_pci_remove+0x30/0x128 [vfio_pci] [ 9514.111007] pci_device_remove+0x48/0x108 [ 9514.115001] device_release_driver_internal+0x100/0x1b8 [ 9514.120200] device_release_driver+0x28/0x38 [ 9514.124452] pci_stop_bus_device+0x68/0xa8 [ 9514.128528] pci_stop_and_remove_bus_device+0x20/0x38 [ 9514.133557] pci_iov_remove_virtfn+0xb4/0x128 [ 9514.137893] sriov_disable+0x3c/0x108 [ 9514.141538] pci_disable_sriov+0x28/0x38 [ 9514.145445] hns3_pci_sriov_configure+0x48/0xb8 [hns3] [ 9514.150558] sriov_numvfs_store+0x110/0x198 [ 9514.154724] dev_attr_store+0x44/0x60 [ 9514.158373] sysfs_kf_write+0x5c/0x78 [ 9514.162018] kernfs_fop_write+0x104/0x210 [ 9514.166010] __vfs_write+0x48/0x90 [ 9514.169395] vfs_write+0xbc/0x1c0 [ 9514.172694] ksys_write+0x74/0x100 [ 9514.176079] __arm64_sys_write+0x24/0x30 [ 9514.179987] el0_svc_common.constprop.4+0x110/0x200 [ 9514.184842] do_el0_svc+0x34/0x98 [ 9514.188144] el0_svc+0x14/0x40 [ 9514.191185] el0_sync_handler+0xb0/0x2d0 [ 9514.195088] el0_sync+0x140/0x180 [ 9514.198389] Code: b9001020 d2800000 52800022 f9800271 (885ffe61) [ 9514.204455] ---[ end trace 648de00c8406465f ]--- [ 9514.212308] note: bash[1327] exited with preempt_count 1 Cc: Qian Cai <cai@lca.pw> Cc: Alex Williamson <alex.williamson@redhat.com> Fixes: `1518ac272e` ("vfio/pci: fix memory leaks of eventfd ctx") Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-07-17 08:28:40 -06:00
Alex Williamson	ebfa440ce3	vfio/pci: Fix SR-IOV VF handling with MMIO blocking SR-IOV VFs do not implement the memory enable bit of the command register, therefore this bit is not set in config space after pci_enable_device(). This leads to an unintended difference between PF and VF in hand-off state to the user. We can correct this by setting the initial value of the memory enable bit in our virtualized config space. There's really no need however to ever fault a user on a VF though as this would only indicate an error in the user's management of the enable bit, versus a PF where the same access could trigger hardware faults. Fixes: `abafbc551f` ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-06-25 11:04:23 -06:00
Alex Williamson	5c5866c593	vfio/pci: Clear error and request eventfd ctx after releasing The next use of the device will generate an underflow from the stale reference. Cc: Qian Cai <cai@lca.pw> Fixes: `1518ac272e` ("vfio/pci: fix memory leaks of eventfd ctx") Reported-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-06-17 15:18:42 -06:00
Michel Lespinasse	c1e8d7c6a7	mmap locking API: convert mmap_sem comments Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-09 09:39:14 -07:00
Michel Lespinasse	89154dd531	mmap locking API: convert mmap_sem call sites missed by coccinelle Convert the last few remaining mmap_sem rwsem calls to use the new mmap locking API. These were missed by coccinelle for some reason (I think coccinelle does not support some of the preprocessor constructs in these files ?) [akpm@linux-foundation.org: convert linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-09 09:39:14 -07:00
Linus Torvalds	5a36f0f3f5	VFIO updates for v5.8-rc1 - Block accesses to disabled MMIO space (Alex Williamson) - VFIO device migration API (Kirti Wankhede) - type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede) - PCI NULL capability masking (Alex Williamson) - Memory leak fixes (Qian Cai) - Reference leak fix (Qiushi Wu) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJe19RZAAoJECObm247sIsiBTEP/1DsA/mL/eDR94Hkmz3y1qoX tkrIJuM5xo95uk6DsbXDikbjM2jpUmvDUiRilara25IceuVX/tKgTjgxNOWJd+h8 KHBoPM0bveILp+Hp6r6ZAEfhZLQ3P6A5XBTB6ql0ebveadyt13TP9Cr0jCf4Jxr2 2A7wZAexWN8P4YBXBqrW8X3ZF3V9rfWYOJ4eILRvEplJ/TkqsGJyKW8LKP45waUF wutN6yh7irnWQbbUO6hb/eTIO7usrNB2L4A5CJMSZIm8JZYYmFQOrn73Ry3bA3Pg U+v3xIRU5U1ed/1DeLs8lRDDxImmkSXbVga0N+e3qnKtDIBh7/i9yNT9hQjhG9Ba C3IQoOM0Fj03hK/UahNerN7FKYoj+JvtdSKeTSNFXwdp/JUSXjj1tLouxUBSFsgD P15CMPXF/U6rT3nQ7/3z3JFKft/iSf6ShjXeNJFqhwH+lKsj13kF2+py3+vgayl8 zLOqeLdRkQwXz73UStX+hrc7MmNYZVRxJ0uf3XXq3/CZ1gwiinIQqxDF3ry+pQ6w vXB8j5XJqsOguoAU7trTdvkKpfoJd4yY0D6aNN9F3JN8xkEBbfm6xZ7Hzur2JI6r iCxUHcS4XrXA/PRlTFupg8451CefuPJa2UkAWSdk182hO2R41l458Q6b7lIMqhOQ oP04S8VYvvDIw5JVnPWU =KSAE -----END PGP SIGNATURE----- Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Block accesses to disabled MMIO space (Alex Williamson) - VFIO device migration API (Kirti Wankhede) - type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede) - PCI NULL capability masking (Alex Williamson) - Memory leak fixes (Qian Cai) - Reference leak fix (Qiushi Wu) * tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio: vfio iommu: typecast corrections vfio iommu: Use shift operation for 64-bit integer division vfio/mdev: Fix reference count leak in add_mdev_supported_type vfio: Selective dirty page tracking if IOMMU backed device pins pages vfio iommu: Add migration capability to report supported features vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap vfio iommu: Implementation of ioctl for dirty pages tracking vfio iommu: Add ioctl definition for dirty pages tracking vfio iommu: Cache pgsize_bitmap in struct vfio_iommu vfio iommu: Remove atomicity of ref_count of pinned pages vfio: UAPI for migration interface for device state vfio/pci: fix memory leaks of eventfd ctx vfio/pci: fix memory leaks in alloc_perm_bits() vfio-pci: Mask cap zero vfio-pci: Invalidate mmaps and block MMIO access on disabled memory vfio-pci: Fault mmaps to enable vma tracking vfio/type1: Support faulting PFNMAP vmas	2020-06-05 13:51:49 -07:00
Alex Williamson	cd34b82e6e	Merge branches 'v5.8/vfio/alex-block-mmio-v3', 'v5.8/vfio/alex-zero-cap-v2' and 'v5.8/vfio/qian-leak-fixes' into v5.8/vfio/next	2020-05-26 10:27:15 -06:00
Qian Cai	1518ac272e	vfio/pci: fix memory leaks of eventfd ctx Finished a qemu-kvm (-device vfio-pci,host=0001:01:00.0) triggers a few memory leaks after a while because vfio_pci_set_ctx_trigger_single() calls eventfd_ctx_fdget() without the matching eventfd_ctx_put() later. Fix it by calling eventfd_ctx_put() for those memory in vfio_pci_release() before vfio_device_release(). unreferenced object 0xebff008981cc2b00 (size 128): comm "qemu-kvm", pid 4043, jiffies 4294994816 (age 9796.310s) hex dump (first 32 bytes): 01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de ....kkkk.....N.. ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff ....kkkk........ backtrace: [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4 [<000000005fcec025>] do_eventfd+0x54/0x1ac [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44 [<00000000b819758c>] do_el0_svc+0x128/0x1dc [<00000000b244e810>] el0_sync_handler+0xd0/0x268 [<00000000d495ef94>] el0_sync+0x164/0x180 unreferenced object 0x29ff008981cc4180 (size 128): comm "qemu-kvm", pid 4043, jiffies 4294994818 (age 9796.290s) hex dump (first 32 bytes): 01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de ....kkkk.....N.. ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff ....kkkk........ backtrace: [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4 [<000000005fcec025>] do_eventfd+0x54/0x1ac [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44 [<00000000b819758c>] do_el0_svc+0x128/0x1dc [<00000000b244e810>] el0_sync_handler+0xd0/0x268 [<00000000d495ef94>] el0_sync+0x164/0x180 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-05-26 10:23:22 -06:00
Qian Cai	3e63b94b62	vfio/pci: fix memory leaks in alloc_perm_bits() vfio_pci_disable() calls vfio_config_free() but forgets to call free_perm_bits() resulting in memory leaks, unreferenced object 0xc000000c4db2dee0 (size 16): comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s) hex dump (first 16 bytes): 00 00 ff 00 ff ff ff ff ff ff ff ff ff ff 00 00 ................ backtrace: [<00000000a6a4552d>] alloc_perm_bits+0x58/0xe0 [vfio_pci] [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci] init_pci_cap_msi_perm at drivers/vfio/pci/vfio_pci_config.c:1125 (inlined by) vfio_msi_cap_len at drivers/vfio/pci/vfio_pci_config.c:1180 (inlined by) vfio_cap_len at drivers/vfio/pci/vfio_pci_config.c:1241 (inlined by) vfio_cap_init at drivers/vfio/pci/vfio_pci_config.c:1468 (inlined by) vfio_config_init at drivers/vfio/pci/vfio_pci_config.c:1707 [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci] [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio] [<000000009e34c54f>] ksys_ioctl+0xd8/0x130 [<000000006577923d>] sys_ioctl+0x28/0x40 [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0 [<0000000008ea7dd5>] system_call_common+0xf0/0x278 unreferenced object 0xc000000c4db2e330 (size 16): comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s) hex dump (first 16 bytes): 00 ff ff 00 ff ff ff ff ff ff ff ff ff ff 00 00 ................ backtrace: [<000000004c71914f>] alloc_perm_bits+0x44/0xe0 [vfio_pci] [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci] [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci] [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio] [<000000009e34c54f>] ksys_ioctl+0xd8/0x130 [<000000006577923d>] sys_ioctl+0x28/0x40 [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0 [<0000000008ea7dd5>] system_call_common+0xf0/0x278 Fixes: `89e1f7d4c6` ("vfio: Add PCI device driver") Signed-off-by: Qian Cai <cai@lca.pw> [aw: rolled in follow-up patch] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-05-18 09:58:28 -06:00
Alex Williamson	bc138db1b9	vfio-pci: Mask cap zero The PCI Code and ID Assignment Specification changed capability ID 0 from reserved to a NULL capability in the v1.1 revision. The NULL capability is defined to include only the 16-bit capability header, ie. only the ID and next pointer. Unfortunately vfio-pci creates a map of config space, where ID 0 is used to reserve the standard type 0 header. Finding an actual capability with this ID therefore results in a bogus range marked in that map and conflicts with subsequent capabilities. As this seems to be a dummy capability anyway and we already support dropping capabilities, let's hide this one rather than delving into the potentially subtle dependencies within our map. Seen on an NVIDIA Tesla T4. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-05-18 09:55:35 -06:00
Alex Williamson	abafbc551f	vfio-pci: Invalidate mmaps and block MMIO access on disabled memory Accessing the disabled memory space of a PCI device would typically result in a master abort response on conventional PCI, or an unsupported request on PCI express. The user would generally see these as a -1 response for the read return data and the write would be silently discarded, possibly with an uncorrected, non-fatal AER error triggered on the host. Some systems however take it upon themselves to bring down the entire system when they see something that might indicate a loss of data, such as this discarded write to a disabled memory space. To avoid this, we want to try to block the user from accessing memory spaces while they're disabled. We start with a semaphore around the memory enable bit, where writers modify the memory enable state and must be serialized, while readers make use of the memory region and can access in parallel. Writers include both direct manipulation via the command register, as well as any reset path where the internal mechanics of the reset may both explicitly and implicitly disable memory access, and manipulation of the MSI-X configuration, where the MSI-X vector table resides in MMIO space of the device. Readers include the read and write file ops to access the vfio device fd offsets as well as memory mapped access. In the latter case, we make use of our new vma list support to zap, or invalidate, those memory mappings in order to force them to be faulted back in on access. Our semaphore usage will stall user access to MMIO spaces across internal operations like reset, but the user might experience new behavior when trying to access the MMIO space while disabled via the PCI command register. Access via read or write while disabled will return -EIO and access via memory maps will result in a SIGBUS. This is expected to be compatible with known use cases and potentially provides better error handling capabilities than present in the hardware, while avoiding the more readily accessible and severe platform error responses that might otherwise occur. Fixes: CVE-2020-12888 Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-05-18 09:53:38 -06:00
Alex Williamson	11c4cd07ba	vfio-pci: Fault mmaps to enable vma tracking Rather than calling remap_pfn_range() when a region is mmap'd, setup a vm_ops handler to support dynamic faulting of the range on access. This allows us to manage a list of vmas actively mapping the area that we can later use to invalidate those mappings. The open callback invalidates the vma range so that all tracking is inserted in the fault handler and removed in the close handler. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-05-18 09:53:29 -06:00
Christophe Leroy	7bfc3c84cb	drivers/powerpc: Replace _ALIGN_UP() by ALIGN() _ALIGN_UP() is specific to powerpc ALIGN() is generic and does the same Replace _ALIGN_UP() by ALIGN() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/a5945463f86c984151962a475a3ee56a2893e85d.1587407777.git.christophe.leroy@c-s.fr	2020-05-11 23:15:15 +10:00
Sam Bobroff	00bc509554	vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0] Older versions of skiboot only provide a single value in the device tree property "ibm,mmio-atsd", even when multiple Address Translation Shoot Down (ATSD) registers are present. This prevents NVLink2 devices (other than the first) from being used with vfio-pci because vfio-pci expects to be able to assign a dedicated ATSD register to each NVLink2 device. However, ATSD registers can be shared among devices. This change allows vfio-pci to fall back to sharing the register at index 0 if necessary. Fixes: `7f92891778` ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-04-01 13:50:46 -06:00
Alex Williamson	b66574a3fb	vfio/pci: Cleanup .probe() exit paths The cleanup is getting a tad long. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-03-24 09:28:29 -06:00
Alex Williamson	959e1b75cc	vfio/pci: Remove dev_fmt definition It currently results in messages like: "vfio-pci 0000:03:00.0: vfio_pci: ..." Which is quite a bit redundant. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-03-24 09:28:29 -06:00
Alex Williamson	137e553135	vfio/pci: Add sriov_configure support With the VF Token interface we can now expect that a vfio userspace driver must be in collaboration with the PF driver, an unwitting userspace driver will not be able to get past the GET_DEVICE_FD step in accessing the device. We can now move on to actually allowing SR-IOV to be enabled by vfio-pci on the PF. Support for this is not enabled by default in this commit, but it does provide a module option for this to be enabled (enable_sriov=1). Enabling VFs is rather straightforward, except we don't want to risk that a VF might get autoprobed and bound to other drivers, so a bus notifier is used to "capture" VFs to vfio-pci using the driver_override support. We assume any later action to bind the device to other drivers is condoned by the system admin and allow it with a log warning. vfio-pci will disable SR-IOV on a PF before releasing the device, allowing a VF driver to be assured other drivers cannot take over the PF and that any other userspace driver must know the shared VF token. This support also does not provide a mechanism for the PF userspace driver itself to manipulate SR-IOV through the vfio API. With this patch SR-IOV can only be enabled via the host sysfs interface and the PF driver user cannot create or remove VFs. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-03-24 09:28:28 -06:00
Alex Williamson	43eeeecc8e	vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device agnostic ioctl for setting, retrieving, and probing device features. This implementation provides a 16-bit field for specifying a feature index, where the data porition of the ioctl is determined by the semantics for the given feature. Additional flag bits indicate the direction and nature of the operation; SET indicates user data is provided into the device feature, GET indicates the device feature is written out into user data. The PROBE flag augments determining whether the given feature is supported, and if provided, whether the given operation on the feature is supported. The first user of this ioctl is for setting the vfio-pci VF token, where the user provides a shared secret key (UUID) on a SR-IOV PF device, which users must provide when opening associated VF devices. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-03-24 09:28:27 -06:00
Alex Williamson	cc20d79990	vfio/pci: Introduce VF token If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not fully isolated from the PF. The PF can always cause a denial of service to the VF, even if by simply resetting itself. The degree to which a PF can access the data passed through a VF or interfere with its operation is dependent on a given SR-IOV implementation. Therefore we want to avoid a scenario where an existing vfio-pci based userspace driver might assume the PF driver is trusted, for example assigning a PF to one VM and VF to another with some expectation of isolation. IOMMU grouping could be a solution to this, but imposes an unnecessarily strong relationship between PF and VF drivers if they need to operate with the same IOMMU context. Instead we introduce a "VF token", which is essentially just a shared secret between PF and VF drivers, implemented as a UUID. The VF token can be set by a vfio-pci based PF driver and must be known by the vfio-pci based VF driver in order to gain access to the device. This allows the degree to which this VF token is considered secret to be determined by the applications and environment. For example a VM might generate a random UUID known only internally to the hypervisor while a userspace networking appliance might use a shared, or even well know, UUID among the application drivers. To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is extended to accept key=value pairs in addition to the device name. This allows us to most easily deny user access to the device without risk that existing userspace drivers assume region offsets, IRQs, and other device features, leading to more elaborate error paths. The format of these options are expected to take the form: "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2" Where the device name is always provided first for compatibility and additional options are specified in a space separated list. The relation between and requirements for the additional options will be vfio bus driver dependent, however unknown or unused option within this schema should return error. This allow for future use of unknown options as well as a positive indication to the user that an option is used. An example VF token option would take this form: "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258" When accessing a VF where the PF is making use of vfio-pci, the user MUST provide the current vf_token. When accessing a PF, the user MUST provide the current vf_token IF there are active VF users or MAY provide a vf_token in order to set the current VF token when no VF users are active. The former requirement assures VF users that an unassociated driver cannot usurp the PF device. These semantics also imply that a VF token MUST be set by a PF driver before VF drivers can access their device, the default token is random and mechanisms to read the token are not provided in order to protect the VF token of previous users. Use of the vf_token option outside of these cases will return an error, as discussed above. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-03-24 09:28:27 -06:00
Alex Williamson	467c084f9a	vfio/pci: Implement match ops This currently serves the same purpose as the default implementation but will be expanded for additional functionality. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-03-24 09:28:26 -06:00
Linus Torvalds	a6d5f9dca4	VFIO updates for v5.6-rc1 - Fix nvlink error path (Alexey Kardashevskiy) - Update nvlink and spapr to use mmgrab() (Julia Lawall) - Update static declaration (Ben Dooks) - Annotate __iomem to fix sparse warnings (Ben Dooks) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJeOFZPAAoJECObm247sIsidxIQAIxDITCEjsquBuT9AfBHMcC6 HIqJ9/JYORGxc+nbtc0858rXsvAGJTJR7yu/WE9N9UqDFLHX0yoRHrWEQdK5OFtA pj3PpweMlzdXqMZl5kgQOaLoM4I+gYNrPIdcUyxWpfVJxCCQsgz0iFBrvNqoFgEM YK1UPZSCpHnD5EsG3Lz5BfSM7lyrSfH6R+eWempZST9eBHXC7kvU9pRYVmVDP/l7 fyBsWpoN+sg3rAf+8R7fQyBbPyggvpyhpaWTKVVdARZStISxiprWXEfQl3p2vOK/ aDbyMLhdMraQkkipcu+HAG0AJXX3gdxGBcX9BXLmB6Z8EohBj9hEu8RNUrNnFCm4 rOOT32IuwxSH3oZr+vmOUrG6AQhu10vPLHf5scOYdn3h/6R0gW13aTmsDPTzdOCN oZ2iU5CDJ6w9BMLELRL4WrKTC1lG9xYXS8RmS//GAq24Re7X8kUdHszyoSW8bUFu 5RSqgAYz8Wk/3qrQsMU03yYiEgMS1AZwKHJWRgTCJLdT/p3a8/NXgLWO9Y6EdMua LETW63Bg/YITFTqCr4jTOHYg8XUqrGHFQOWmT+boAkRAHDn56FGcrQ1Bl0BAC8cP 4C1iGwaxsa3DwQkVqsLMsHlBqpyLbz4a2SUQVmaIlwEs58LUv7EBwTymFir4w6aJ Esxh+sm0NYZE0nJ7aAqY =bamw -----END PGP SIGNATURE----- Merge tag 'vfio-v5.6-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Fix nvlink error path (Alexey Kardashevskiy) - Update nvlink and spapr to use mmgrab() (Julia Lawall) - Update static declaration (Ben Dooks) - Annotate __iomem to fix sparse warnings (Ben Dooks) * tag 'vfio-v5.6-rc1' of git://github.com/awilliam/linux-vfio: vfio: platform: fix __iomem in vfio_platform_amdxgbe.c vfio/mdev: make create attribute static vfio/spapr_tce: use mmgrab vfio: vfio_pci_nvlink2: use mmgrab vfio/spapr/nvlink2: Skip unpinning pages on error exit	2020-02-03 22:22:05 +00:00
Julia Lawall	bb3d3cf928	vfio: vfio_pci_nvlink2: use mmgrab Mmgrab was introduced in commit `f1f1007644` ("mm: add new mmgrab() helper") and most of the kernel was updated to use it. Update a remaining file. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) <smpl> @@ expression e; @@ - atomic_inc(&e->mm_count); + mmgrab(e); </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-01-07 13:02:43 -07:00
Alexey Kardashevskiy	338b4e10f9	vfio/spapr/nvlink2: Skip unpinning pages on error exit The nvlink2 subdriver for IBM Witherspoon machines preregisters GPU memory in the IOMMI API so KVM TCE code can map this memory for DMA as well. This is done by mm_iommu_newdev() called from vfio_pci_nvgpu_regops::mmap. In an unlikely event of failure the data->mem remains NULL and since mm_iommu_put() (which unregisters the region and unpins memory if that was regular memory) does not expect mem=NULL, it should not be called. This adds a check to only call mm_iommu_put() for a valid data->mem. Fixes: `7f92891778` ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-01-06 10:01:05 -07:00
Christoph Hellwig	4bdc0d676a	remove ioremap_nocache and devm_ioremap_nocache ioremap has provided non-cached semantics by default since the Linux 2.6 days, so remove the additional ioremap_nocache interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de>	2020-01-06 09:45:59 +01:00
Linus Torvalds	94e89b4023	VFIO updates for v5.5-rc1 - Remove hugepage checks for reserved pfns (Ben Luo) - Fix irq-bypass unregister ordering (Jiang Yi) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJd6oWBAAoJECObm247sIsi7Q8P/37Gs4jKx+ZUfd0K8MwyAJjV 1oiq0vsXVKQPRlBF0G9YdZQubX3X7l87PbeV0P4e6GKH5/A30FUA4OUgurbQaybK siajJVmdIuEeS/xNZxkuEyyjsOXXPBo+oBIhu3yDoLp0wOCYuAMOQea25H34dhG9 A141c/mZg/29jfz8whQMthZiACzIFXMcD/pO6VzspEwVX4aGaX4js1wgOp2firRf QqbQon7BfozwM5WAwwlyEyBYmewILQgciijsifGpKhTTfONl7j34GH8+/JM7WrD7 BNayX+DZdcr+NFrNyuvdU+/HnlCt/6lZADAtnyMRv53QTiRFFmdQkYU4bTpKLqHH XoN4OoPAdxYHOPRq/MOon7QsAvXodCTZ9GTIiXtC+hjIangm5f1jW7tHPjpRrbrT luXvRv3dC94lVOxJ8wfp36H/GPJbJkAASmOasTT54AP1PBmOihAYwYCVBMF9Qihp Fbk9DCg7mTb8vSbOQPxUi732sovOVhwQS4lkPq1BgoNibElzV6UbwYxQhn/0Ixpm ZcDE8lipk/sjns5HhFWYJYFJQAVtFVM3Ets4oJK/tB8ELzCch/URoc+zwNKnCVHQ jjCBcrHU/jdnNm85BrN12vFn1gjHqHZLvTGA7HCAABgqzeJ9Ln4IDuUELg9JsO/T xq/8RRxCREs5XL6aWr4n =veGz -----END PGP SIGNATURE----- Merge tag 'vfio-v5.5-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Remove hugepage checks for reserved pfns (Ben Luo) - Fix irq-bypass unregister ordering (Jiang Yi) * tag 'vfio-v5.5-rc1' of git://github.com/awilliam/linux-vfio: vfio/pci: call irq_bypass_unregister_producer() before freeing irq vfio/type1: remove hugepage checks in is_invalid_reserved_pfn()	2019-12-07 14:51:04 -08:00
Jiang Yi	d567fb8819	vfio/pci: call irq_bypass_unregister_producer() before freeing irq Since irq_bypass_register_producer() is called after request_irq(), we should do tear-down in reverse order: irq_bypass_unregister_producer() then free_irq(). Specifically free_irq() may release resources required by the irqbypass del_producer() callback. Notably an example provided by Marc Zyngier on arm64 with GICv4 that he indicates has the potential to wedge the hardware: free_irq(irq) __free_irq(irq) irq_domain_deactivate_irq(irq) its_irq_domain_deactivate() [unmap the VLPI from the ITS] kvm_arch_irq_bypass_del_producer(cons, prod) kvm_vgic_v4_unset_forwarding(kvm, irq, ...) its_unmap_vlpi(irq) [Unmap the VLPI from the ITS (again), remap the original LPI] Signed-off-by: Jiang Yi <giangyi@amazon.com> Cc: stable@vger.kernel.org # v4.4+ Fixes: `6d7425f109` ("vfio: Register/unregister irq_bypass_producer") Link: https://lore.kernel.org/kvm/20191127164910.15888-1-giangyi@amazon.com Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> [aw: commit log] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-12-02 14:34:41 -07:00
Denis Efremov	c9c13ba428	PCI: Add PCI_STD_NUM_BARS for the number of standard BARs Code that iterates over all standard PCI BARs typically uses PCI_STD_RESOURCE_END. However, that requires the unusual test "i <= PCI_STD_RESOURCE_END" rather than something the typical "i < PCI_STD_NUM_BARS". Add a definition for PCI_STD_NUM_BARS and change loops to use the more idiomatic C style to help avoid fencepost errors. Link: https://lore.kernel.org/r/20190927234026.23342-1-efremov@linux.com Link: https://lore.kernel.org/r/20190927234308.23935-1-efremov@linux.com Link: https://lore.kernel.org/r/20190916204158.6889-3-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Sebastian Ott <sebott@linux.ibm.com> # arch/s390/ Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> # video/fbdev/ Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> # pci/controller/dwc/ Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> # scsi/pm8001/ Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi/pm8001/ Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # memstick/	2019-10-14 10:22:26 -05:00
hexin	92c8026854	vfio_pci: Restore original state on release vfio_pci_enable() saves the device's initial configuration information with the intent that it is restored in vfio_pci_disable(). However, the commit referenced in Fixes: below replaced the call to __pci_reset_function_locked(), which is not wrapped in a state save and restore, with pci_try_reset_function(), which overwrites the restored device state with the current state before applying it to the device. Reinstate use of __pci_reset_function_locked() to return to the desired behavior. Fixes: `890ed578df` ("vfio-pci: Use pci "try" reset interface") Signed-off-by: hexin <hexin15@baidu.com> Signed-off-by: Liu Qi <liuqi16@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-08-22 14:53:06 -06:00
Peng Hao	e66e02c4d9	vfio: vfio_pci_nvlink2: use a vma helper function Use a vma helper function to simply code. Signed-off-by: Peng Hao <richard.peng@oppo.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-07-02 11:44:12 -06:00
Thomas Gleixner	d2912cb15b	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-06-19 17:09:55 +02:00
Thomas Gleixner	ec8f24b7fa	treewide: Add SPDX license identifier - Makefile/Kconfig Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 10:50:46 +02:00
Greg Kurz	2c85f2bd51	vfio-pci/nvlink2: Fix potential VMA leak If vfio_pci_register_dev_region() fails then we should rollback previous changes, ie. unmap the ATSD registers. Fixes: `7f92891778` ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-05-01 13:31:41 -06:00
Bjorn Helgaas	a88a7b3eb0	vfio: Use dev_printk() when possible Use dev_printk() when possible to make messages consistent with other device-related messages. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-04-22 11:45:42 -06:00
Louis Taylor	426b046b74	vfio/pci: use correct format characters When compiling with -Wformat, clang emits the following warnings: drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Louis Taylor <louis@kragniz.eu> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-04-03 12:36:20 -06:00
Eric Auger	0cfd027be1	vfio_pci: Enable memory accesses before calling pci_map_rom pci_map_rom/pci_get_rom_size() performs memory access in the ROM. In case the Memory Space accesses were disabled, readw() is likely to trigger a synchronous external abort on some platforms. In case memory accesses were disabled, re-enable them before the call and disable them back again just after. Fixes: `89e1f7d4c6` ("vfio: Add PCI device driver") Signed-off-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-02-18 14:57:50 -07:00
Alex Williamson	51ef3a004b	vfio/pci: Restore device state on PM transition PCI core handles save and restore of device state around reset, but when using pci_set_power_state() we can unintentionally trigger a soft reset of the device, where PCI core only restores the BAR state. If we're using vfio-pci's idle D3 support to try to put devices into low power when unused, this might trigger a reset when the device is woken for use. Also power state management by the user, or within a guest, can put the device into D3 power state with potentially limited ability to restore the device if it should undergo a reset. The PCI spec does not define the extent of a soft reset and many devices reporting soft reset on D3->D0 transition do not undergo a PCI config space reset. It's therefore assumed safe to unconditionally restore the remainder of the state if the device indicates soft reset support, even on a user initiated wakeup. Implement a wrapper in vfio-pci to tag devices reporting PM reset support, save their state on transitions into D3 and restore on transitions back to D0. Reported-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-02-18 14:55:53 -07:00
Alexey Kardashevskiy	9a71ac7e15	vfio-pci/nvlink2: Fix ancient gcc warnings Using the {0} construct as a generic initializer is perfectly fine in C, however due to a bug in old gcc there is a warning: + /kisskb/src/drivers/vfio/pci/vfio_pci_nvlink2.c: warning: (near initialization for 'cap.header') [-Wmissing-braces]: => 181:9 Since for whatever reason we still want to compile the modern kernel with such an old gcc without warnings, this changes the capabilities initialization. The gcc bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119 Fixes: `7f92891778` ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-01-23 08:20:43 -07:00
Thomas Gleixner	33e5ee780e	vfio/pci: Cleanup license mess The recently added nvlink2 VFIO driver introduced a license conflict in two files. In both cases the SPDX license identifier is: SPDX-License-Identifier: GPL-2.0+ but the files contain also the following license boiler plate text: * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation The latter is GPL-2.9-only and not GPL-2.0=. Looking deeper. The nvlink source file is derived from vfio_pci_igd.c which is also licensed under GPL-2.0-only and it can be assumed that the file was copied and modified. As the original file is licensed GPL-2.0-only it's not possible to relicense derivative work to GPL-2.0-or-later. Fix the SPDX identifier and remove the boiler plate as it is redundant. Fixes: `7f92891778` ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: kvm@vger.kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-01-22 11:06:05 -07:00
Masahiro Yamada	d1fc1176c0	vfio/pci: set TRACE_INCLUDE_PATH to fix the build error drivers/vfio/pci/vfio_pci_nvlink2.c cannot be compiled for in-tree building. CC drivers/vfio/pci/vfio_pci_nvlink2.o In file included from drivers/vfio/pci/trace.h:102, from drivers/vfio/pci/vfio_pci_nvlink2.c:29: ./include/trace/define_trace.h:89:42: fatal error: ./trace.h: No such file or directory #include TRACE_INCLUDE(TRACE_INCLUDE_FILE) ^ compilation terminated. make[1]: *** [scripts/Makefile.build;277: drivers/vfio/pci/vfio_pci_nvlink2.o] Error 1 To fix the build error, let's tell include/trace/define_trace.h the location of drivers/vfio/pci/trace.h Fixes: `7f92891778` ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-01-08 09:08:30 -07:00
Linus Torvalds	1984f65c2f	VFIO updates for v4.21 - Replace global vfio-pci lock with per bus lock to allow concurrent open and release (Alex Williamson) - Declare mdev function as static (Paolo Cretaro) - Convert char to u8 in mdev/mtty sample driver (Nathan Chancellor) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJcHQpWAAoJECObm247sIsi1x0QAIZ+XICmk+cWiYJBOH64IIqS ZjjzKj0QtIBnAtecK9OjCeVSbcaPeL10DPNmg/QF0/TmO4Ch71XZZ2D9DOW1neVv 7FPrK9pXDIy/vDq/5yGXr9d/x5LvqiWFG3DmUp+lmUEbd8wFbdBep1/aSr6HY+78 LGoX8TAGuNfj1+GLhL/jXanFE1Az1+dTh63umI9TptRf+HGxQZmRY5JqfZ8OMWrl vNmiugmkS5quhIKa4PO2ZFBndodgI5hMxezkEDqXzOe0bzQ8uBzs43UVa8/V+Ais 05F4Ya5CbeQYTOkSnwN0DC4Vy6WWUtz6TJBB2x8P57ZYb8PYQg7yJluBxZVYmR4S 0c8/Dqi95fUcVSUmDmW/q46YEqFBcXQZEmHP2lap2BLI4H/It7ft/cGacfUuvIq5 g9H7tZBF8g1MsjYRm4D86kC1AcstbBgZeywmmtNwdTVYpeGbIv7DLvWMlVP0Jk/J gKKAO1RHo/tnpZv/W+ge5pXDSOxZOgHRlXnnlj8jMFjQh0J15dtSmQM13MZhaxnZ 5Q7emt2jv7kvYqsBbkTcS27ii4kmjMlnsX0YPL8OK/QlQbW8jJwIraFoPZvUklkZ FrqU9DIJlaGbtMwB04b9wEvWtfPKl/6ycFWFLu+9rFMVaOVT5oeHUQFV1PWOErw8 /l0qJqiTy9y1K4DJEU1P =HOtX -----END PGP SIGNATURE----- Merge tag 'vfio-v4.21-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Replace global vfio-pci lock with per bus lock to allow concurrent open and release (Alex Williamson) - Declare mdev function as static (Paolo Cretaro) - Convert char to u8 in mdev/mtty sample driver (Nathan Chancellor) * tag 'vfio-v4.21-rc1' of git://github.com/awilliam/linux-vfio: vfio-mdev/samples: Use u8 instead of char for handle functions vfio/mdev: add static modifier to add_mdev_supported_type vfio/pci: Parallelize device open and release	2018-12-28 19:41:58 -08:00
Alexey Kardashevskiy	7f92891778	vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not pluggable PCIe devices but still have PCIe links which are used for config space and MMIO. In addition to that the GPUs have 6 NVLinks which are connected to other GPUs and the POWER9 CPU. POWER9 chips have a special unit on a die called an NPU which is an NVLink2 host bus adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each. These systems also support ATS (address translation services) which is a part of the NVLink2 protocol. Such GPUs also share on-board RAM (16GB or 32GB) to the system via the same NVLink2 so a CPU has cache-coherent access to a GPU RAM. This exports GPU RAM to the userspace as a new VFIO device region. This preregisters the new memory as device memory as it might be used for DMA. This inserts pfns from the fault handler as the GPU memory is not onlined until the vendor driver is loaded and trained the NVLinks so doing this earlier causes low level errors which we fence in the firmware so it does not hurt the host system but still better be avoided; for the same reason this does not map GPU RAM into the host kernel (usual thing for emulated access otherwise). This exports an ATSD (Address Translation Shootdown) register of NPU which allows TLB invalidations inside GPU for an operating system. The register conveniently occupies a single 64k page. It is also presented to the userspace as a new VFIO device region. One NPU has 8 ATSD registers, each of them can be used for TLB invalidation in a GPU linked to this NPU. This allocates one ATSD register per an NVLink bridge allowing passing up to 6 registers. Due to the host firmware bug (just recently fixed), only 1 ATSD register per NPU was actually advertised to the host system so this passes that alone register via the first NVLink bridge device in the group which is still enough as QEMU collects them all back and presents to the guest via vPHB to mimic the emulated NPU PHB on the host. In order to provide the userspace with the information about GPU-to-NVLink connections, this exports an additional capability called "tgt" (which is an abbreviated host system bus address). The "tgt" property tells the GPU its own system address and allows the guest driver to conglomerate the routing information so each GPU knows how to get directly to the other GPUs. For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to know LPID (a logical partition ID or a KVM guest hardware ID in other words) and PID (a memory context ID of a userspace process, not to be confused with a linux pid). This assigns a GPU to LPID in the NPU and this is why this adds a listener for KVM on an IOMMU group. A PID comes via NVLink from a GPU and NPU uses a PID wildcard to pass it through. This requires coherent memory and ATSD to be available on the host as the GPU vendor only supports configurations with both features enabled and other configurations are known not to work. Because of this and because of the ways the features are advertised to the host system (which is a device tree with very platform specific properties), this requires enabled POWERNV platform. The V100 GPUs do not advertise any of these capabilities via the config space and there are more than just one device ID so this relies on the platform to tell whether these GPUs have special abilities such as NVLinks. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-12-21 16:20:47 +11:00
Alexey Kardashevskiy	c2c0f1cde0	vfio_pci: Allow regions to add own capabilities VFIO regions already support region capabilities with a limited set of fields. However the subdriver might have to report to the userspace additional bits. This adds an add_capability() hook to vfio_pci_regops. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-12-21 16:20:47 +11:00
Alexey Kardashevskiy	a15b1883fe	vfio_pci: Allow mapping extra regions So far we only allowed mapping of MMIO BARs to the userspace. However there are GPUs with on-board coherent RAM accessible via side channels which we also want to map to the userspace. The first client for this is NVIDIA V100 GPU with NVLink2 direct links to a POWER9 NPU-enabled CPU; such GPUs have 16GB RAM which is coherently mapped to the system address space, we are going to export these as an extra PCI region. We already support extra PCI regions and this adds support for mapping them to the userspace. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-12-21 16:20:47 +11:00
Alex Williamson	e309df5b0c	vfio/pci: Parallelize device open and release In commit `61d792562b` ("vfio-pci: Use mutex around open, release, and remove") a mutex was added to freeze the refcnt for a device so that we can handle errors and perform bus resets on final close. However, bus resets can be rather slow and a global mutex here is undesirable. Evaluating the potential locking granularity, a per-device mutex provides the best resolution but with multiple devices on a bus all released concurrently, they'll race to acquire each other's mutex, likely resulting in no reset at all if we use trylock. We therefore lock at the granularity of the bus/slot reset as we're only attempting a single reset for this group of devices anyway. This allows much greater scaling as we're bounded in the number of devices protected by a single reflck object. Reported-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Tested-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-12-12 12:51:07 -07:00
Alex Williamson	db04264fe9	vfio/pci: Mask buggy SR-IOV VF INTx support The SR-IOV spec requires that VFs must report zero for the INTx pin register as VFs are precluded from INTx support. It's much easier for the host kernel to understand whether a device is a VF and therefore whether a non-zero pin register value is bogus than it is to do the same in userspace. Override the INTx count for such devices and virtualize the pin register to provide a consistent view of the device to the user. As this is clearly a spec violation, warn about it to support hardware validation, but also provide a known whitelist as it doesn't do much good to continue complaining if the hardware vendor doesn't plan to fix it. Known devices with this issue: 8086:270c Tested-by: Gage Eads <gage.eads@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-09-25 13:01:27 -06:00

1 2 3 4 5 ...

281 Commits