linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-08-28 09:22:08 +00:00

Author	SHA1	Message	Date
Bernd Schubert	88be7aa98d	fuse: Move request bits These are needed by fuse-over-io-uring. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-01-24 11:53:51 +01:00
Bernd Schubert	867d93dcde	fuse: Move fuse_get_dev to header file Another preparation patch, as this function will be needed by fuse/dev.c and fuse/dev_uring.c. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-01-24 11:53:45 +01:00
Bernd Schubert	92270d0761	fuse: rename to fuse_dev_end_requests and make non-static This function is needed by fuse_uring.c to clean ring queues, so make it non static. Especially in non-static mode the function name 'end_requests' should be prefixed with fuse_ Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-01-24 11:53:25 +01:00
Linus Torvalds	f96a974170	lsm/stable-6.14 PR 20250121 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmeQFBoUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPvcA//XCdwMz0bGtWKv58nuyP8vkQx08n6 //olz/O8te3uWK5O3kRiarzFLwH8qsHQ6A7GYalwwix34hatR4ndJE0Y/guVRWa1 +aBmJxJ7Jm/q3fvpAEfqiSgreuE6kBoztlDOWEq+hUQGu4qfnQGm2EnvbvfFrAmN VheOfIQSU2KCL/Scc3FGnF6uru4WrqN0JJ9RbvrEpfdQgmcyTGLnQsZLljutWSIq kDWkteIr7cj3O9J45zpxZsTftvYSgVn/y1iKeXbHI4DBA1eheK12vsHB9AADKI1J GwHxOrnLpZtv+ICUKqcfFTmWTl+NmfJJurAT5KXKdBjL3xM5MoJlBvK1A5qE9CMo LaHVG/TZR2MmBaoM3EN+gvWhDgWlvT02Q/0cYaafTlVLMez3HtfctxN6OnCvTXTB Y8dqYClhhlBm/mHQwYfMoeKw4MftUpzEqBd1Nj7Qe8dbP0f/62Ca3K2B3D6Rf8QV pj3ryMlSWYV9mdTerruLNQexTGoN7l66jPwzdWpTbFeL3WmNtfCako8OZGbXgPIu Iahm3P+jnSVx8ZQro2c9zwdKXI5xiI335pCBbDZ8aX+JAsfj0OofHsFx5Q5diber M7tAEhxDqRisbpz7Ei+/LOAEGg2Z619XKg8ks4z6Y4P5PF7zEgeWTkZJk2iLbxXe 6LLOjmF7LLw+G4M= =fgyr -----END PGP SIGNATURE----- Merge tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Improved handling of LSM "secctx" strings through lsm_context struct The LSM secctx string interface is from an older time when only one LSM was supported, migrate over to the lsm_context struct to better support the different LSMs we now have and make it easier to support new LSMs in the future. These changes explain the Rust, VFS, and networking changes in the diffstat. - Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are enabled Small tweak to be a bit smarter about when we build the LSM's common audit helpers. - Check for absurdly large policies from userspace in SafeSetID SafeSetID policies rules are fairly small, basically just "UID:UID", it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which helps quiet a number of syzbot related issues. While work is being done to address the syzbot issues through other mechanisms, this is a trivial and relatively safe fix that we can do now. - Various minor improvements and cleanups A collection of improvements to the kernel selftests, constification of some function parameters, removing redundant assignments, and local variable renames to improve readability. * tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lockdown: initialize local array before use to quiet static analysis safesetid: check size of policy writes net: corrections for security_secid_to_secctx returns lsm: rename variable to avoid shadowing lsm: constify function parameters security: remove redundant assignment to return variable lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test binder: initialize lsm_context structure rust: replace lsm context+len with lsm_context lsm: secctx provider check on release lsm: lsm_context in security_dentry_init_security lsm: use lsm_context in security_inode_getsecctx lsm: replace context+len with lsm_context lsm: ensure the correct LSM context releaser	2025-01-21 20:03:04 -08:00
Christian Brauner	3ff93c5935	fuse fixes for 6.13-rc7 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ3vEzQAKCRDh3BK/laaZ PK9jAP9AvwPhO+0ySnPbQ+eupfYV4+NmUhk01SBXAhlHlA31AQD9E7tbrWapFQ4K +TqVBCKSX4b8QSUwtNCq3yw43+ts0Aw= =72uv -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ309+QAKCRCRxhvAZXjc otnUAP9kaerhNRAaEHN60FWo2OyOs0RwGZCtTheAbuFeVryt9QD/UAtlKLETKHN0 BZVIsgxE/Rl9uzjSHAjBAPw905Yqiw4= =r3Ee -----END PGP SIGNATURE----- Merge tag 'fuse-fixes-6.13-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi <mszeredi@redhat.com>: - Fix fuse_get_user_pages() allocation failure handling. - Fix direct-io folio offset and length calculation. * tag 'fuse-fixes-6.13-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Set *nbytesp=0 in fuse_get_user_pages on allocation failure fuse: fix direct io folio offset and length calculation Link: https://lore.kernel.org/r/CAJfpegu7o_X%3DSBWk_C47dUVUQ1mJZDEGe1MfD0N3wVJoUBWdmg@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-01-07 15:43:07 +01:00
Amir Goldstein	03f275adb8	fuse: respect FOPEN_KEEP_CACHE on opendir The re-factoring of fuse_dir_open() missed the need to invalidate directory inode page cache with open flag FOPEN_KEEP_CACHE. Fixes: `7de64d521b` ("fuse: break up fuse_open_common()") Reported-by: Prince Kumar <princer@google.com> Closes: https://lore.kernel.org/linux-fsdevel/CAEW=TRr7CYb4LtsvQPLj-zx5Y+EYBmGfM24SuzwyDoGVNoKm7w@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250101130037.96680-1-amir73il@gmail.com Reviewed-by: Bernd Schubert <bernd.schubert@fastmail.fm> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-01-04 09:58:14 +01:00
Bernd Schubert	78f2560fc9	fuse: Set nbytesp=0 in fuse_get_user_pages on allocation failure In fuse_get_user_pages(), set nbytesp to 0 when struct page *pages allocation fails. This prevents the caller (fuse_direct_io) from making incorrect assumptions that could lead to NULL pointer dereferences when processing the request reply. Previously, nbytesp was left unmodified on allocation failure, which could cause issues if the caller assumed pages had been added to ap->descs[] when they hadn't. Reported-by: syzbot+87b8e6ed25dbc41759f7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=87b8e6ed25dbc41759f7 Fixes: `3b97c3652d` ("fuse: convert direct io to use folios") Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Tested-by: Dmitry Antipov <dmantipov@yandex.ru> Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-12-13 16:43:36 +01:00
Joanne Koong	7a4f541873	fuse: fix direct io folio offset and length calculation For the direct io case, the pages from userspace may be part of a huge folio, even if all folios in the page cache for fuse are small. Fix the logic for calculating the offset and length of the folio for the direct io case, which currently incorrectly assumes that all folios encountered are one page size. Fixes: `3b97c3652d` ("fuse: convert direct io to use folios") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-12-12 09:27:42 +01:00
Casey Schaufler	b530104f50	lsm: lsm_context in security_dentry_init_security Replace the (secctx,seclen) pointer pair with a single lsm_context pointer to allow return of the LSM identifier along with the context and context length. This allows security_release_secctx() to know how to release the context. Callers have been modified to use or save the returned data from the new structure. Cc: ceph-devel@vger.kernel.org Cc: linux-nfs@vger.kernel.org Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>	2024-12-04 14:58:51 -05:00
Linus Torvalds	2a50b1e766	virtio: features, fixes, cleanups A small number of improvements all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmdGPb8PHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpPowH/3Fc6uWqgMRiHgBP6BMlmAYRhhovlBF70Cug SN1dQuV9aVRYC4rqUoYb3F7X4Szn9fpPiGuwDywmI5jcSNMbsQlCxwrymcVXKxuO sZRGBtIYvzHbZzYjp380WHuglCZ+cIfQxLV6fI2ly4oN8LybKwXSxrTQ1uu/CSZ5 vLiyAAJ7J9bKvrMjKg9vXTzK5/jzf7fKhB9NnQb4/JbsVcEoJdNkCxm/cV4wyVa+ RateZBDgy6YUULKKei4MuaBGHX3pHhxlyrE9aas3E74ijIz+H8tOBz6mgcI939z7 xfdqGRGUnZrC7t8ZjWs9CCCu1jR18hXNMZXcCuDMdyghQib5D7o= =GzUl -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "A small number of improvements all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_vdpa: remove redundant check on desc virtio_fs: store actual queue index in mq_map virtio_fs: add informative log for new tag discovery virtio: Make vring_new_virtqueue support packed vring virtio_pmem: Add freeze/restore callbacks vdpa/mlx5: Fix suboptimal range on iotlb iteration	2024-11-27 13:11:58 -08:00
Linus Torvalds	fb527fc1f3	fuse update for 6.13 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ0Rb/wAKCRDh3BK/laaZ PK80AQDAUgA6S5SSrbJxwRFNOhbwtZxZqJ8fomJR5xuWIEQ9pwEAkpFqhBhBW0y1 0YaREow2aDANQQtSUrfPtgva1ZXFwQU= =Cyx5 -----END PGP SIGNATURE----- Merge tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add page -> folio conversions (Joanne Koong, Josef Bacik) - Allow max size of fuse requests to be configurable with a sysctl (Joanne Koong) - Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun) - Fix large kernel reads (like a module load) in virtio_fs (Hou Tao) - Fix attribute inconsistency in case readdirplus (and plain lookup in corner cases) is racing with inode eviction (Zhang Tianci) - Fix a WARN_ON triggered by virtio_fs (Asahi Lina) * tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits) virtiofs: dax: remove ->writepages() callback fuse: check attributes staleness on fuse_iget() fuse: remove pages for requests and exclusively use folios fuse: convert direct io to use folios mm/writeback: add folio_mark_dirty_lock() fuse: convert writebacks to use folios fuse: convert retrieves to use folios fuse: convert ioctls to use folios fuse: convert writes (non-writeback) to use folios fuse: convert reads to use folios fuse: convert readdir to use folios fuse: convert readlink to use folios fuse: convert cuse to use folios fuse: add support in virtio for requests using folios fuse: support folios in struct fuse_args_pages and fuse_copy_pages() fuse: convert fuse_notify_store to use folios fuse: convert fuse_retrieve to use folios fuse: use the folio based vmstat helpers fuse: convert fuse_writepage_need_send to take a folio fuse: convert fuse_do_readpage to use folios ...	2024-11-26 12:41:27 -08:00
Linus Torvalds	e7675238b9	overlayfs updates for 6.13 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmc90jsACgkQEVvVyTe/ 1Wol0A//RhzFCG8geR7Grbptp40CUm9kVISvkr50mPBdvVk3jX9WvH9m/10qapGP tcGHSdHt+q5qabqutKLmQRiFbwpGEaBMaFOe7JH8na8xWvmSa3p7sJC5kLByS3rm D2F+cVx3Di7MTscz/Ma724bHdHOUO5RbDuMIcjp7uXRvaNWJ0uZg5xWlBKsNa3h8 DbNSYi5ICihLYpUxI9NglHZ6iqcS2jHsUHSAw52/GJ2Zon1LAAmKoSn6s7hZ27ZJ f8Rv5fFuYmkRV7nYo/gjLY1gt7KXZFcfUtMT05yd7zcnqDayKEFXEiwI/Bz5fXZL HmZpOP4RV2M9B8HzhReVR/yG8gZaaUezX+aVQp7plZSc73GhMdFFd1bUyjgJ4Lzf C2BlBMWafc/Zc7a7r0+X5577i34nED8lGuVMEdYMtjSjstpzIP+1Wlzn2cGi4+5K VAb+kEravjP9ck7YrmbruRYfVhDaE37BDs4XML4S8gzcZgdaTcEMyGw1ifEhvPjA vLbRs24a5VO7/cKlks7PWS6i9uExaz7g4re0jUPwUuc+nS+Hv+y8kLSPqLS4CtNY MxhS2IhKK5gp1Z9XGpLsak+ancTYLSV0OJ15qsAChpqoqSG5Xd9Lt4CWACnF33Ea ny8z5QpOAHWVb97k6xaEvu/r0dl+PHdG7vfb0MNhXaajNF8SKiU= =pgoX -----END PGP SIGNATURE----- Merge tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - Fix a syzbot reported NULL pointer deref with bfs lower layers - Fix a copy up failure of large file from lower fuse fs - Followup cleanup of backing_file API from Miklos - Introduction and use of revert/override_creds_light() helpers, that were suggested by Christian as a mitigation to cache line bouncing and false sharing of fields in overlayfs creator_cred long lived struct cred copy. - Store up to two backing file references (upper and lower) in an ovl_file container instead of storing a single backing file in file->private_data. This is used to avoid the practice of opening a short lived backing file for the duration of some file operations and to avoid the specialized use of FDPUT_FPUT in such occasions, that was getting in the way of Al's fd_file() conversions. * tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: Filter invalid inodes with missing lookup function ovl: convert ovl_real_fdget() callers to ovl_real_file() ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path() ovl: store upper real file in ovl_file struct ovl: allocate a container struct ovl_file for ovl private context ovl: do not open non-data lower file for fsync ovl: Optimize override/revert creds ovl: pass an explicit reference of creators creds to callers ovl: use wrapper ovl_revert_creds() fs/backing-file: Convert to revert/override_creds_light() cred: Add a light version of override/revert_creds() backing-file: clean up the API ovl: properly handle large files in ovl_security_fileattr	2024-11-22 20:55:42 -08:00
Linus Torvalds	0f25f0e4ef	the bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZzdikAAKCRBZ7Krx/gZQ 69nJAQCmbQHK3TGUbQhOw6MJXOK9ezpyEDN3FZb4jsu38vTIdgEA6OxAYDO2m2g9 CN18glYmD3wRyU6Bwl4vGODouSJvDgA= =gVH3 -----END PGP SIGNATURE----- Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ...	2024-11-18 12:24:06 -08:00
Asahi Lina	d1dfb5f52f	virtiofs: dax: remove ->writepages() callback When using FUSE DAX with virtiofs, cache coherency is managed by the host. Disk persistence is handled via fsync() and friends, which are passed directly via the FUSE layer to the host. Therefore, there's no need to do dax_writeback_mapping_range(). All that ends up doing is a cache flush operation, which is not caught by KVM and doesn't do much, since the host and guest are already cache-coherent. Since dax_writeback_mapping_range() checks that the inode block size is equal to PAGE_SIZE, this fixes a spurious WARN when virtiofs is used with a mismatched guest PAGE_SIZE and virtiofs backing FS block size (this happens, for example, when it's a tmpfs and the host and guest have a different PAGE_SIZE). FUSE DAX does not require any particular FS block size, since it always performs DAX mappings in aligned 2MiB blocks. See discussion in [1]. [1] https://lore.kernel.org/lkml/20241101-dax-page-size-v1-1-eedbd0c6b08f@asahilina.net/T/#u [SzM: remove the empty callback] Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Asahi Lina <lina@asahilina.net> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-18 12:24:38 +01:00
Zhang Tianci	69eb56f69e	fuse: check attributes staleness on fuse_iget() Function fuse_direntplus_link() might call fuse_iget() to initialize a new fuse_inode and change its attributes. If fi->attr_version is always initialized with 0, even if the attributes returned by the FUSE_READDIR request is staled, as the new fi->attr_version is 0, fuse_change_attributes will still set the staled attributes to inode. This wrong behaviour may cause file size inconsistency even when there is no changes from server-side. To reproduce the issue, consider the following 2 programs (A and B) are running concurrently, A B ---------------------------------- -------------------------------- { /fusemnt/dir/f is a file path in a fuse mount, the size of f is 0. } readdir(/fusemnt/dir) start //Daemon set size 0 to f direntry fallocate(f, 1024) stat(f) // B see size 1024 echo 2 > /proc/sys/vm/drop_caches readdir(/fusemnt/dir) reply to kernel Kernel set 0 to the I_NEW inode stat(f) // B see size 0 In the above case, only program B is modifying the file size, however, B observes file size changing between the 2 'readonly' stat() calls. To fix this issue, we should make sure readdirplus still follows the rule of attr_version staleness checking even if the fi->attr_version is lost due to inode eviction. To identify this situation, the new fc->evict_ctr is used to record whether the eviction of inodes occurs during the readdirplus request processing. If it does, the result of readdirplus may be inaccurate; otherwise, the result of readdirplus can be trusted. Although this may still lead to incorrect invalidation, considering the relatively low frequency of evict occurrences, it should be acceptable. Link: https://lore.kernel.org/lkml/20230711043405.66256-2-zhangjiachen.jaycee@bytedance.com/ Link: https://lore.kernel.org/lkml/20241114070905.48901-1-zhangtianci.1997@bytedance.com/ Reported-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-18 12:24:13 +01:00
Max Gurtovoy	df28040c7f	virtio_fs: store actual queue index in mq_map This will eliminate the need for index recalculation during the fast path. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20241006184341.9081-1-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-11-12 18:07:46 -05:00
Max Gurtovoy	22d984f1b9	virtio_fs: add informative log for new tag discovery Enhance the device probing process by adding a log message when a new virtio-fs tag is successfully discovered. This improvement provides better visibility into the initialization of virtio-fs devices. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20241006184324.8497-1-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-11-12 18:07:46 -05:00
Miklos Szeredi	48b50624ae	backing-file: clean up the API - Pass iocb to ctx->end_write() instead of file + pos - Get rid of ctx->user_file, which is redundant most of the time - Instead pass iocb to backing_file_splice_read and backing_file_splice_write Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2024-11-11 10:45:03 +01:00
Joanne Koong	68bfb7eb7f	fuse: remove pages for requests and exclusively use folios All fuse requests use folios instead of pages for transferring data. Remove pages from the requests and exclusively use folios. No functional changes. [SzM: rename back folio_descs -> descs, etc.] Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 14:08:35 +01:00
Joanne Koong	3b97c3652d	fuse: convert direct io to use folios Convert direct io requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	cbe9c115b7	fuse: convert writebacks to use folios Convert writeback requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	448895df03	fuse: convert retrieves to use folios Convert retrieve requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	ac1cf6e3bb	fuse: convert ioctls to use folios Convert ioctl requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	f2ef459bab	fuse: convert writes (non-writeback) to use folios Convert non-writeback write requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	51b0253018	fuse: convert reads to use folios Convert read requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	02b78c7a7a	fuse: convert readdir to use folios Convert readdir requests to use a folio instead of a page. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	c1e4862b13	fuse: convert readlink to use folios Convert readlink requests to use a folio instead of a page. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	ee80369a8a	fuse: convert cuse to use folios Convert cuse requests to use a folio instead of a page. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	29279e1d42	fuse: add support in virtio for requests using folios Until all requests have been converted to use folios instead of pages, virtio will need to support both types. Once all requests have been converted, then virtio will support just folios. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Joanne Koong	a669c2df36	fuse: support folios in struct fuse_args_pages and fuse_copy_pages() This adds support in struct fuse_args_pages and fuse_copy_pages() for using folios instead of pages for transferring data. Both folios and pages must be supported right now in struct fuse_args_pages and fuse_copy_pages() until all request types have been converted to use folios. Once all have been converted, then struct fuse_args_pages and fuse_copy_pages() will only support folios. Right now in fuse, all folios are one page (large folios are not yet supported). As such, copying folio->page is sufficient for copying the entire folio in fuse_copy_pages(). No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-11-05 11:14:32 +01:00
Al Viro	8152f82010	fdget(), more trivial conversions all failure exits prior to fdget() leave the scope, all matching fdput() are immediately followed by leaving the scope. [xfs_ioc_commit_range() chunk moved here as well] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-11-03 01:28:06 -05:00
Josef Bacik	8807f117be	fuse: convert fuse_notify_store to use folios This function creates pages in an inode and copies data into them, update the function to use a folio instead of a page, and use the appropriate folio helpers. [SzM: use filemap_grab_folio()] [Hau Tao: The third argument of folio_zero_range() should be the length to be zeroed, not the total length. Fix it by using folio_zero_segment() instead in fuse_notify_store()] Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:50 +02:00
Josef Bacik	71e10dc2f5	fuse: convert fuse_retrieve to use folios We're just looking for pages in a mapping, use a folio and the folio lookup function directly instead of using the page helper. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:50 +02:00
Josef Bacik	949d67ac2e	fuse: use the folio based vmstat helpers In order to make it easier to switch to folios in the fuse_args_pages update the places where we update the vmstat counters for writeback to use the folio related helpers. On the inc side this is easy as we already have the folio, on the dec side we have to page_folio() the pages for now. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:50 +02:00
Miklos Szeredi	d34a5575e6	fuse: remove stray debug line It wasn't there when the patch was posted for review, but somehow made it into the pull. Link: https://lore.kernel.org/all/20240913104703.1673180-1-mszeredi@redhat.com/ Fixes: `efad7153bf` ("fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Josef Bacik	6930b8dac1	fuse: convert fuse_writepage_need_send to take a folio fuse_writepage_need_send is called by fuse_writepages_fill() which already has a folio. Change fuse_writepage_need_send() to take a folio instead, add a helper to check if the folio range is under writeback and use this, as well as the appropriate folio helpers in the rest of the function. Update fuse_writepage_need_send() to pass in the folio directly. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Josef Bacik	65fe891d90	fuse: convert fuse_do_readpage to use folios Now that the buffered write path is using folios, convert fuse_do_readpage() to take a folio instead of a page, update it to use the appropriate folio helpers, and update the callers to pass in the folio directly instead of a page. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Josef Bacik	e6befec5e9	fuse: use kiocb_modified in buffered write path This combines the file_remove_privs() and file_update_time() call into one call. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Josef Bacik	184b6eb364	fuse: convert fuse_page_mkwrite to use folios Convert this to grab the folio directly, and update all the helpers to use the folio related functions. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Josef Bacik	9bafbe7ae0	fuse: convert fuse_fill_write_pages to use folios Convert this to grab the folio directly, and update all the helpers to use the folio related functions. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Josef Bacik	785d06afc8	fuse: convert fuse_send_write_pages to use folios Convert this to grab the folio from the fuse_args_pages and use the appropriate folio related functions. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Josef Bacik	3eab9d7bc2	fuse: convert readahead to use folios Currently we're using the __readahead_batch() helper which populates our fuse_args_pages->pages array with pages. Convert this to use the newer folio based pattern which is to call readahead_folio() to get the next folio in the read ahead batch. I've updated the code to use things like folio_size() and to take into account larger folio sizes, but this is purely to make that eventual work easier to do, we currently will not get large folios so this is more future proofing than actual support. [SzM: remove check for readahead_folio() won't return NULL (at least for now) so remove ugly assign in conditional.] Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Josef Bacik	aaa32429da	fuse: use fuse_range_is_writeback() instead of iterating pages fuse_send_readpages() waits for writeback on each page. This can be replaced by a single call to fuse_range_is_writeback(). [SzM: split this off from "fuse: convert readahead to use folios"] Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Hou Tao	86b74eb5a1	virtiofs: use GFP_NOFS when enqueuing request through kworker When invoking virtio_fs_enqueue_req() through kworker, both the allocation of the sg array and the bounce buffer still use GFP_ATOMIC. Considering the size of the sg array may be greater than PAGE_SIZE, use GFP_NOFS instead of GFP_ATOMIC to lower the possibility of memory allocation failure and to avoid unnecessarily depleting the atomic reserves. GFP_NOFS is not passed to virtio_fs_enqueue_req() directly, GFP_KERNEL and memalloc_nofs_{save\|restore} helpers are used instead. It may seem OK to pass GFP_NOFS to virtio_fs_enqueue_req() as well when queuing the request for the first time, but this is not the case. The reason is that fuse_request_queue_background() may call ->queue_request_and_unlock() while holding fc->bg_lock, which is a spin-lock. Therefore, still use GFP_ATOMIC for it. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Hou Tao	41748675c0	virtiofs: use pages instead of pointer for kernel direct IO When trying to insert a 10MB kernel module kept in a virtio-fs with cache disabled, the following warning was reported: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ...... Modules linked in: CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...... RIP: 0010:__alloc_pages+0x2bf/0x380 ...... Call Trace: <TASK> ? __warn+0x8e/0x150 ? __alloc_pages+0x2bf/0x380 __kmalloc_large_node+0x86/0x160 __kmalloc+0x33c/0x480 virtio_fs_enqueue_req+0x240/0x6d0 virtio_fs_wake_pending_and_unlock+0x7f/0x190 queue_request_and_unlock+0x55/0x60 fuse_simple_request+0x152/0x2b0 fuse_direct_io+0x5d2/0x8c0 fuse_file_read_iter+0x121/0x160 __kernel_read+0x151/0x2d0 kernel_read+0x45/0x50 kernel_read_file+0x1a9/0x2a0 init_module_from_file+0x6a/0xe0 idempotent_init_module+0x175/0x230 __x64_sys_finit_module+0x5d/0xb0 x64_sys_call+0x1c3/0x9e0 do_syscall_64+0x3d/0xc0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 ...... </TASK> ---[ end trace 0000000000000000 ]--- The warning is triggered as follows: 1) syscall finit_module() handles the module insertion and it invokes kernel_read_file() to read the content of the module first. 2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and passes it to kernel_read(). kernel_read() constructs a kvec iter by using iov_iter_kvec() and passes it to fuse_file_read_iter(). 3) virtio-fs disables the cache, so fuse_file_read_iter() invokes fuse_direct_io(). As for now, the maximal read size for kvec iter is only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so fuse_direct_io() doesn't split the 10MB buffer. It saves the address and the size of the 10MB-sized buffer in out_args[0] of a fuse request and passes the fuse request to virtio_fs_wake_pending_and_unlock(). 4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to queue the request. Because virtiofs need DMA-able address, so virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for all fuse args, copies these args into the bounce buffer and passed the physical address of the bounce buffer to virtiofsd. The total length of these fuse args for the passed fuse request is about 10MB, so copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and it triggers the warning in __alloc_pages(): if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) return NULL; 5) virtio_fs_enqueue_req() will retry the memory allocation in a kworker, but it won't help, because kmalloc() will always return NULL due to the abnormal size and finit_module() will hang forever. A feasible solution is to limit the value of max_read for virtio-fs, so the length passed to kmalloc() will be limited. However it will affect the maximal read size for normal read. And for virtio-fs write initiated from kernel, it has the similar problem but now there is no way to limit fc->max_write in kernel. So instead of limiting both the values of max_read and max_write in kernel, introducing use_pages_for_kvec_io in fuse_conn and setting it as true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use pages instead of pointer to pass the KVEC_IO data. After switching to pages for KVEC_IO data, these pages will be used for DMA through virtio-fs. If these pages are backed by vmalloc(), {flush\|invalidate}_kernel_vmap_range() are necessary to flush or invalidate the cache before the DMA operation. So add two new fields in fuse_args_pages to record the base address of vmalloc area and the condition indicating whether invalidation is needed. Perform the flush in fuse_get_user_pages() for write operations and the invalidation in fuse_release_user_pages() for read operations. It may seem necessary to introduce another field in fuse_conn to indicate that these KVEC_IO pages are used for DMA, However, considering that virtio-fs is currently the only user of use_pages_for_kvec_io, just reuse use_pages_for_kvec_io to indicate that these pages will be used for DMA. Fixes: `a62a8ef9d9` ("virtio-fs: add virtiofs filesystem") Signed-off-by: Hou Tao <houtao1@huawei.com> Tested-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
yangyun	cc23d537e5	fuse: remove useless IOCB_DIRECT in fuse_direct_read/write_iter Commit `23c94e1cdc` ("fuse: Switch to using async direct IO for FOPEN_DIRECT_IO") gave the async direct IO code path in the fuse_direct_read_iter() and fuse_direct_write_iter(). But since these two functions are only called under FOPEN_DIRECT_IO is set, it seems that we can also use the async direct IO even the flag IOCB_DIRECT is not set to enjoy the async direct IO method. Also move the definition of fuse_io_priv to where it is used in fuse_ direct_write_iter. Signed-off-by: yangyun <yangyun50@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Joanne Koong	2b3933b1e0	fuse: enable dynamic configuration of fuse max pages limit (FUSE_MAX_MAX_PAGES) Introduce the capability to dynamically configure the max pages limit (FUSE_MAX_MAX_PAGES) through a sysctl. This allows system administrators to dynamically set the maximum number of pages that can be used for servicing requests in fuse. Previously, this is gated by FUSE_MAX_MAX_PAGES which is statically set to 256 pages. One result of this is that the buffer size for a write request is limited to 1 MiB on a 4k-page system. The default value for this sysctl is the original limit (256 pages). $ sysctl -a \| grep max_pages_limit fs.fuse.max_pages_limit = 256 $ sysctl -n fs.fuse.max_pages_limit 256 $ echo 1024 \| sudo tee /proc/sys/fs/fuse/max_pages_limit 1024 $ sysctl -n fs.fuse.max_pages_limit 1024 $ echo 65536 \| sudo tee /proc/sys/fs/fuse/max_pages_limit tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument $ echo 0 \| sudo tee /proc/sys/fs/fuse/max_pages_limit tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument $ echo 65535 \| sudo tee /proc/sys/fs/fuse/max_pages_limit 65535 $ sysctl -n fs.fuse.max_pages_limit 65535 Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-25 17:05:49 +02:00
Miklos Szeredi	184429a17f	Revert "fuse: move initialization of fuse_file to fuse_writepages() instead of in callback" This reverts commit `672c3b7457`. fuse_writepages() might be called with no dirty pages after all writable opens were closed. In this case __fuse_write_file_get() will return NULL which will trigger the WARNING. The exact conditions under which this is triggered is unclear and syzbot didn't find a reproducer yet. Reported-by: syzbot+217a976dc26ef2fa8711@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/CAJnrk1aQwfvb51wQ5rUSf9N8j1hArTFeSkHqC_3T-mU6_BCD=A@mail.gmail.com/ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-21 10:02:51 +02:00
Amir Goldstein	20121d3f58	fuse: update inode size after extending passthrough write yangyun reported that libfuse test test_copy_file_range() copies zero bytes from a newly written file when fuse passthrough is enabled. The reason is that extending passthrough write is not updating the fuse inode size and when vfs_copy_file_range() observes a zero size inode, it returns without calling the filesystem copy_file_range() method. Fix this by adjusting the fuse inode size after an extending passthrough write. This does not provide cache coherency of fuse inode attributes and backing inode attributes, but it should prevent situations where fuse inode size is too small, causing read/copy to be wrongly shortened. Reported-by: yangyun <yangyun50@huawei.com> Closes: https://github.com/libfuse/libfuse/issues/1048 Fixes: `57e1176e60` ("fuse: implement read/write passthrough") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-16 13:18:21 +02:00
Amir Goldstein	f03b296e8b	fs: pass offset and result to backing_file end_write() callback This is needed for extending fuse inode size after fuse passthrough write. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/linux-fsdevel/CAJfpegs=cvZ_NYy6Q_D42XhYS=Sjj5poM1b5TzXzOVvX=R36aA@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-10-16 13:17:45 +02:00
Al Viro	cb787f4ac0	[tree-wide] finally take no_llseek out no_llseek had been defined to NULL two years ago, in commit `868941b144` ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek \| grep -v porting.rst \| while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-09-27 08:18:43 -07:00
Linus Torvalds	0181f8c809	virtio: features, fixes, cleanups Several new features here: virtio-balloon supports new stats vdpa supports setting mac address vdpa/mlx5 suspend/resume as well as MKEY ops are now faster virtio_fs supports new sysfs entries for queue info virtio/vsock performance has been improved Fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmbz7ykPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpkk8H/A3vMRYXBzne9anezZLvADKS/CpX7v0DFEVj VfSMWXvYdUariYDyyb7pZsvK5QR22pE0pIaW6Kcgv9fNwq27M/H6g6NJk5ny8a7d 216AQs1J28pXPPY+q03fhf3SzE3yHP8aeD9lyiO9QJYfs9vjtoyZeBGt3a4IUSX4 ZeNBAx8xWTBcEDIIcZLdY1DNDTbZ4+qQ12Ln9IKq7D4xkE6l7Xh+HGdgTWTnDZ8P qEUUOmJTFKTQdOiVuU4NN3wzgHKWHdwKg0uWXo7ereYr3kYe3q//jCcLMv88a1x0 XP7NRBQg/rsErwTMdLz6ffyqXJs6lGGqNXzRfZKEwAvmnh/+zs4= =gNBq -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-balloon supports new stats - vdpa supports setting mac address - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster - virtio_fs supports new sysfs entries for queue info - virtio/vsock performance has been improved And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vsock/virtio: avoid queuing packets when intermediate queue is empty vsock/virtio: refactor virtio_transport_send_pkt_work fw_cfg: Constify struct kobj_type vdpa/mlx5: Postpone MR deletion vdpa/mlx5: Introduce init/destroy for MR resources vdpa/mlx5: Rename mr_mtx -> lock vdpa/mlx5: Extract mr members in own resource struct vdpa/mlx5: Rename function vdpa/mlx5: Delete direct MKEYs in parallel vdpa/mlx5: Create direct MKEYs in parallel MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section virtio_fs: add sysfs entries for queue information virtio_fs: introduce virtio_fs_put_locked helper vdpa: Remove unused declarations vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command vdpa/mlx5: Small improvement for change_num_qps() vdpa/mlx5: Keep notifiers during suspend but ignore vdpa/mlx5: Parallelize device resume vdpa/mlx5: Parallelize device suspend vdpa/mlx5: Use async API for vq modify commands ...	2024-09-26 08:43:17 -07:00
Max Gurtovoy	87cbdc396a	virtio_fs: add sysfs entries for queue information Introduce sysfs entries to provide visibility to the multiple queues used by the Virtio FS device. This enhancement allows users to query information about these queues. Specifically, add two sysfs entries: 1. Queue name: Provides the name of each queue (e.g. hiprio/requests.8). 2. CPU list: Shows the list of CPUs that can process requests for each queue. The CPU list feature is inspired by similar functionality in the block MQ layer, which provides analogous sysfs entries for block devices. These new sysfs entries will improve observability and aid in debugging and performance tuning of Virtio FS devices. Reviewed-by: Idan Zach <izach@nvidia.com> Reviewed-by: Shai Malin <smalin@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20240825130716.9506-2-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-09-25 07:07:43 -04:00
Max Gurtovoy	4045b64298	virtio_fs: introduce virtio_fs_put_locked helper Introduce a new helper function virtio_fs_put_locked to encapsulate the common pattern of releasing a virtio_fs reference while holding a lock. The existing virtio_fs_put helper will be used to release a virtio_fs reference while not holding a lock. Also add an assertion in case the lock is not taken when it should. Reviewed-by: Idan Zach <izach@nvidia.com> Reviewed-by: Shai Malin <smalin@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20240825130716.9506-1-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2024-09-25 07:07:43 -04:00
Linus Torvalds	f7fccaa772	fuse update for 6.12 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZvKlbgAKCRDh3BK/laaZ PLliAP9q5btlhlffnRg2LWCf4rIzbJ6vkORkc+GeyAXnWkIljQEA9En1K2vyg7Tk f9FvNQK9C+pS0GxURDRI7YedJ2f9FQ0= =wuY0 -----END PGP SIGNATURE----- Merge tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add support for idmapped fuse mounts (Alexander Mikhalitsyn) - Add optimization when checking for writeback (yangyun) - Add tracepoints (Josef Bacik) - Clean up writeback code (Joanne Koong) - Clean up request queuing (me) - Misc fixes * tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (32 commits) fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set fuse: clear FR_PENDING if abort is detected when sending request fs/fuse: convert to use invalid_mnt_idmap fs/mnt_idmapping: introduce an invalid_mnt_idmap fs/fuse: introduce and use fuse_simple_idmap_request() helper fs/fuse: fix null-ptr-deref when checking SB_I_NOIDMAP flag fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN virtio_fs: allow idmapped mounts fuse: allow idmapped mounts fuse: warn if fuse_access is called when idmapped mounts are allowed fuse: handle idmappings properly in ->write_iter() fuse: support idmapped ->rename op fuse: support idmapped ->set_acl fuse: drop idmap argument from __fuse_get_acl fuse: support idmapped ->setattr op fuse: support idmapped ->permission inode op fuse: support idmapped getattr inode op fuse: support idmap for mkdir/mknod/symlink/create/tmpfile fuse: support idmapped FUSE_EXT_GROUPS fuse: add an idmap argument to fuse_simple_request ...	2024-09-24 15:29:42 -07:00
yangyun	2f3d8ff457	fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set This may be a typo. The comment has said shared locks are not allowed when this bit is set. If using shared lock, the wait in `fuse_file_cached_io_open` may be forever. Fixes: `205c1d8026` ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP") CC: stable@vger.kernel.org # v6.9 Signed-off-by: yangyun <yangyun50@huawei.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-24 13:21:33 +02:00
Miklos Szeredi	fcd2d9e1fd	fuse: clear FR_PENDING if abort is detected when sending request The (!fiq->connected) check was moved into the queuing method resulting in the following: Fixes: `5de8acb41c` ("fuse: cleanup request queuing towards virtiofs") Reported-by: Lai, Yi <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/all/ZvFEAM6JfrBKsOU0@ly-workstation/ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-24 10:56:09 +02:00
Linus Torvalds	f8ffbc365f	struct fd layout change (and conversion to accessor helpers) -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZvDNmgAKCRBZ7Krx/gZQ 63zrAP9vI0rf55v27twiabe9LnI7aSx5ckoqXxFIFxyT3dOYpQD/bPmoApnWDD3d 592+iDgLsema/H/0/CqfqlaNtDNY8Q0= =HUl5 -----END PGP SIGNATURE----- Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it.	2024-09-23 09:35:36 -07:00
Alexander Mikhalitsyn	106e4593ed	fs/fuse: convert to use invalid_mnt_idmap We should convert fs/fuse code to use a newly introduced invalid_mnt_idmap instead of passing a NULL as idmap pointer. Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-23 11:10:26 +02:00
Alexander Mikhalitsyn	0c6793823d	fs/fuse: introduce and use fuse_simple_idmap_request() helper Let's convert all existing callers properly. No functional changes intended. Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-23 11:07:55 +02:00
Alexander Mikhalitsyn	3988a60d3a	fs/fuse: fix null-ptr-deref when checking SB_I_NOIDMAP flag It was reported [1] that on linux-next/fs-next the following crash is reproducible: [ 42.659136] Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 42.660501] fbcon: Taking over console [ 42.660930] KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f] [ 42.661752] CPU: 1 UID: 0 PID: 1589 Comm: dtprobed Not tainted 6.11.0-rc6+ #1 [ 42.662565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.6.6 08/22/2023 [ 42.663472] RIP: 0010:fuse_get_req+0x36b/0x990 [fuse] [ 42.664046] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8c 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6d 08 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 05 00 00 f6 45 59 20 0f 85 06 03 00 00 48 83 [ 42.666945] RSP: 0018:ffffc900009a7730 EFLAGS: 00010212 [ 42.668837] RAX: dffffc0000000000 RBX: 1ffff92000134eed RCX: ffffffffc20dec9a [ 42.670122] RDX: 000000000000000b RSI: 0000000000000008 RDI: 0000000000000058 [ 42.672154] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1022110172 [ 42.672160] R10: ffff888110880b97 R11: ffffc900009a737a R12: 0000000000000001 [ 42.672179] R13: ffff888110880b60 R14: ffff888110880b90 R15: ffff888169973840 [ 42.672186] FS: 00007f28cd21d7c0(0000) GS:ffff8883ef280000(0000) knlGS:0000000000000000 [ 42.672191] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 42.[ CR02: ;32m00007f3237366208 CR3: 0 OK 79e001 CR4: 0000000000770ef0 [ 42.672214] PKRU: 55555554 [ 42.672218] Call Trace: [ 42.672223] <TASK> [ 42.672226] ? die_addr+0x41/0xa0 [ 42.672238] ? exc_general_protection+0x14c/0x230 [ 42.672250] ? asm_exc_general_protection+0x26/0x30 [ 42.672260] ? fuse_get_req+0x77a/0x990 [fuse] [ 42.672281] ? fuse_get_req+0x36b/0x990 [fuse] [ 42.672300] ? kasan_unpoison+0x27/0x60 [ 42.672310] ? __pfx_fuse_get_req+0x10/0x10 [fuse] [ 42.672327] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.672333] ? alloc_pages_mpol_noprof+0x195/0x440 [ 42.672340] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.672345] ? kasan_unpoison+0x27/0x60 [ 42.672350] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.672355] ? __kasan_slab_alloc+0x4d/0x90 [ 42.672362] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.672367] ? __kmalloc_cache_noprof+0x134/0x350 [ 42.672376] fuse_simple_background+0xe7/0x180 [fuse] [ 42.672406] cuse_channel_open+0x540/0x710 [cuse] [ 42.672415] misc_open+0x2a7/0x3a0 [ 42.672424] chrdev_open+0x1ef/0x5f0 [ 42.672432] ? __pfx_chrdev_open+0x10/0x10 [ 42.672439] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.672443] ? security_file_open+0x3bb/0x720 [ 42.672451] do_dentry_open+0x43d/0x1200 [ 42.672459] ? __pfx_chrdev_open+0x10/0x10 [ 42.672468] vfs_open+0x79/0x340 [ 42.672475] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.672482] do_open+0x68c/0x11e0 [ 42.672489] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.672495] ? __pfx_do_open+0x10/0x10 [ 42.672501] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.672506] ? open_last_lookups+0x2a2/0x1370 [ 42.672515] path_openat+0x24f/0x640 [ 42.672522] ? __pfx_path_openat+0x10/0x10 [ 42.723972] ? stack_depot_save_flags+0x45/0x4b0 [ 42.724787] ? __fput+0x43c/0xa70 [ 42.725100] do_filp_open+0x1b3/0x3e0 [ 42.725710] ? poison_slab_object+0x10d/0x190 [ 42.726145] ? __kasan_slab_free+0x33/0x50 [ 42.726570] ? __pfx_do_filp_open+0x10/0x10 [ 42.726981] ? do_syscall_64+0x64/0x170 [ 42.727418] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 42.728018] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.728505] ? do_raw_spin_lock+0x131/0x270 [ 42.728922] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 42.729494] ? do_raw_spin_unlock+0x14c/0x1f0 [ 42.729992] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.730889] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.732178] ? alloc_fd+0x176/0x5e0 [ 42.732585] do_sys_openat2+0x122/0x160 [ 42.732929] ? __pfx_do_sys_openat2+0x10/0x10 [ 42.733448] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.734013] ? __pfx_map_id_up+0x10/0x10 [ 42.734482] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.735529] ? __memcg_slab_free_hook+0x292/0x500 [ 42.736131] __x64_sys_openat+0x123/0x1e0 [ 42.736526] ? __pfx___x64_sys_openat+0x10/0x10 [ 42.737369] ? __x64_sys_close+0x7c/0xd0 [ 42.737717] ? srso_alias_return_thunk+0x5/0xfbef5 [ 42.738192] ? syscall_trace_enter+0x11e/0x1b0 [ 42.738739] do_syscall_64+0x64/0x170 [ 42.739113] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 42.739638] RIP: 0033:0x7f28cd13e87b [ 42.740038] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25 [ 42.741943] RSP: 002b:00007ffc992546c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 [ 42.742951] RAX: ffffffffffffffda RBX: 00007f28cd44f1ee RCX: 00007f28cd13e87b [ 42.743660] RDX: 0000000000000002 RSI: 00007f28cd44f2fa RDI: 00000000ffffff9c [ 42.744518] RBP: 00007f28cd44f2fa R08: 0000000000000000 R09: 0000000000000001 [ 42.745211] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 [ 42.745920] R13: 00007f28cd44f2fa R14: 0000000000000000 R15: 0000000000000003 [ 42.746708] </TASK> [ 42.746937] Modules linked in: cuse vfat fat ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common kvm_amd ccp bochs drm_vram_helper kvm drm_ttm_helper ttm pcspkr i2c_piix4 drm_kms_helper i2c_smbus pvpanic_mmio pvpanic joydev sch_fq_codel drm fuse xfs nvme_tcp nvme_fabrics nvme_core sd_mod sg virtio_net net_failover virtio_scsi failover crct10dif_pclmul crc32_pclmul ata_generic pata_acpi ata_piix ghash_clmulni_intel virtio_pci sha512_ssse3 virtio_pci_legacy_dev sha256_ssse3 virtio_pci_modern_dev sha1_ssse3 libata serio_raw dm_multipath btrfs blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qemu_fw_cfg aesni_intel crypto_simd cryptd [ 42.754333] ---[ end trace 0000000000000000 ]--- [ 42.756899] RIP: 0010:fuse_get_req+0x36b/0x990 [fuse] [ 42.757851] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8c 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6d 08 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 05 00 00 f6 45 59 20 0f 85 06 03 00 00 48 83 [ 42.760334] RSP: 0018:ffffc900009a7730 EFLAGS: 00010212 [ 42.760940] RAX: dffffc0000000000 RBX: 1ffff92000134eed RCX: ffffffffc20dec9a [ 42.761697] RDX: 000000000000000b RSI: 0000000000000008 RDI: 0000000000000058 [ 42.763009] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1022110172 [ 42.763920] R10: ffff888110880b97 R11: ffffc900009a737a R12: 0000000000000001 [ 42.764839] R13: ffff888110880b60 R14: ffff888110880b90 R15: ffff888169973840 [ 42.765716] FS: 00007f28cd21d7c0(0000) GS:ffff8883ef280000(0000) knlGS:0000000000000000 [ 42.766890] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 42.767828] CR2: 00007f3237366208 CR3: 000000012c79e001 CR4: 0000000000770ef0 [ 42.768730] PKRU: 55555554 [ 42.769022] Kernel panic - not syncing: Fatal exception [ 42.770758] Kernel Offset: 0x7200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 42.771947] ---[ end Kernel panic - not syncing: Fatal exception ]--- It's obviously CUSE related callstack. For CUSE case, we don't have superblock and our checks for SB_I_NOIDMAP flag does not make any sense. Let's handle this case gracefully. Fixes: `aa16880d9f` ("fuse: add basic infrastructure to support idmappings") Link: https://lore.kernel.org/linux-next/87v7z586py.fsf@debian-BULLSEYE-live-builder-AMD64/ [1] Reported-by: Chandan Babu R <chandanbabu@kernel.org> Reported-by: syzbot+20c7e20cc8f5296dca12@syzkaller.appspotmail.com Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-23 11:02:25 +02:00
Miklos Szeredi	efad7153bf	fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN Only f_path is used from backing files registered with FUSE_DEV_IOC_BACKING_OPEN, so it makes sense to allow O_PATH descriptors. O_PATH files have an empty f_op, so don't check read_iter/write_iter. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-19 09:37:13 +02:00
Alexander Mikhalitsyn	862b9a8eb9	virtio_fs: allow idmapped mounts Allow idmapped mounts for virtiofs. It's absolutely safe as for virtiofs we have the same feature negotiation mechanism as for classical fuse filesystems. This does not affect any existing setups anyhow. virtiofsd support: https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/245 Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-18 10:28:46 +02:00
Linus Torvalds	2775df6e5e	vfs-6.12.folio -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvgAKCRCRxhvAZXjc ou77AQD3U1KjbdgzbUi6kaUmiiWOPhfYTlm8mho8dBjqvTCB+AD/XTWSFCWWhHB4 KyQZTbjRD81xmVNbKjASazp0EA6Ahwc= =gIsD -----END PGP SIGNATURE----- Merge tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs folio updates from Christian Brauner: "This contains work to port write_begin and write_end to rely on folios for various filesystems. This converts ocfs2, vboxfs, orangefs, jffs2, hostfs, fuse, f2fs, ecryptfs, ntfs3, nilfs2, reiserfs, minixfs, qnx6, sysv, ufs, and squashfs. After this series lands a bunch of the filesystems in this list do not mention struct page anymore" * tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (61 commits) Squashfs: Ensure all readahead pages have been used Squashfs: Rewrite and update squashfs_readahead_fragment() to not use page->index Squashfs: Update squashfs_readpage_block() to not use page->index Squashfs: Update squashfs_readahead() to not use page->index Squashfs: Update page_actor to not use page->index jffs2: Use a folio in jffs2_garbage_collect_dnode() jffs2: Convert jffs2_do_readpage_nolock to take a folio buffer: Convert __block_write_begin() to take a folio ocfs2: Convert ocfs2_write_zero_page to use a folio fs: Convert aops->write_begin to take a folio fs: Convert aops->write_end to take a folio vboxsf: Use a folio in vboxsf_write_end() orangefs: Convert orangefs_write_begin() to use a folio orangefs: Convert orangefs_write_end() to use a folio jffs2: Convert jffs2_write_begin() to use a folio jffs2: Convert jffs2_write_end() to use a folio hostfs: Convert hostfs_write_end() to use a folio fuse: Convert fuse_write_begin() to use a folio fuse: Convert fuse_write_end() to use a folio f2fs: Convert f2fs_write_begin() to use a folio ...	2024-09-16 08:54:30 +02:00
Alexander Mikhalitsyn	16e1503eaf	fuse: allow idmapped mounts Now we have everything in place and we can allow idmapped mounts by setting the FS_ALLOW_IDMAP flag. Notice that real availability of idmapped mounts will depend on the fuse daemon. Fuse daemon have to set FUSE_ALLOW_IDMAP flag in the FUSE_INIT reply. To discuss: - we enable idmapped mounts support only if "default_permissions" mode is enabled, because otherwise we would need to deal with UID/GID mappings in the userspace side OR provide the userspace with idmapped req->in.h.uid/req->in.h.gid values which is not something that we probably want to. Idmapped mounts philosophy is not about faking caller uid/gid. Some extra links and examples: - libfuse support https://github.com/mihalicyn/libfuse/commits/idmap_support - fuse-overlayfs support: https://github.com/mihalicyn/fuse-overlayfs/commits/idmap_support - cephfs-fuse conversion example https://github.com/mihalicyn/ceph/commits/fuse_idmap - glusterfs conversion example https://github.com/mihalicyn/glusterfs/commits/fuse_idmap Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:51:11 +02:00
Alexander Mikhalitsyn	6d14b18596	fuse: warn if fuse_access is called when idmapped mounts are allowed It is not possible with the current fuse code, but let's protect ourselves from regressions in the future. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:51:11 +02:00
Alexander Mikhalitsyn	5b8ca5a54c	fuse: handle idmappings properly in ->write_iter() This is needed to properly clear suid/sgid. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:51:11 +02:00
Alexander Mikhalitsyn	4be75ffe72	fuse: support idmapped ->rename op RENAME_WHITEOUT is a special case of ->rename and we need to take idmappings into account there. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:51:11 +02:00
Alexander Mikhalitsyn	d395d0a5d2	fuse: support idmapped ->set_acl It's just a matter of adjusting a permission check condition for S_ISGID flag. All the rest is already handled in the generic VFS code. Notice that this permission check is the analog of what we have in posix_acl_update_mode() generic helper, but fuse doesn't use this helper as on the kernel side we don't care about ensuring that POSIX ACL and CHMOD permissions are in sync as it is a responsibility of a userspace daemon to handle that. For the same reason we don't have a calls to posix_acl_chmod(), while most of other filesystem do. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:51:11 +02:00
Alexander Mikhalitsyn	4d833befa2	fuse: drop idmap argument from __fuse_get_acl We don't need to have idmap in the __fuse_get_acl as we don't have any use for it. In the current POSIX ACL implementation, idmapped mounts are taken into account on the userspace/kernel border (see vfs_set_acl_idmapped_mnt() and vfs_posix_acl_to_xattr()). Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:51:11 +02:00
Alexander Mikhalitsyn	276a025699	fuse: support idmapped ->setattr op Need to translate uid and gid in case of chown(2). Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:51:06 +02:00
Alexander Mikhalitsyn	c1d82215d3	fuse: support idmapped ->permission inode op We only cover the case when "default_permissions" flag is used. A reason for that is that otherwise all the permission checks are done in the userspace and we have to deal with VFS idmapping in the userspace (which is bad), alternatively we have to provide the userspace with idmapped req->in.h.uid/req->in.h.gid which is also not align with VFS idmaps philosophy. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:49:58 +02:00
Alexander Mikhalitsyn	2a8c810d5e	fuse: support idmapped getattr inode op We have to: - pass an idmapping to the generic_fillattr() to properly handle UIG/GID mapping for the userspace. - pass -/- to fuse_fillattr() (analog of generic_fillattr() in fuse). Difference between these two is that generic_fillattr() takes all the stat() data from the inode directly, while fuse_fillattr() codepath takes a fresh data just from the userspace reply on the FUSE_GETATTR request. In some cases we can just pass &nop_mnt_idmap, because idmapping won't be used in these codepaths. For example, when 3rd argument of fuse_do_getattr() is NULL then idmap argument is not used. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:49:28 +02:00
Alexander Mikhalitsyn	556208e139	fuse: support idmap for mkdir/mknod/symlink/create/tmpfile We have all the infrastructure in place, we just need to pass an idmapping here. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:49:22 +02:00
Alexander Mikhalitsyn	d561254fb7	fuse: support idmapped FUSE_EXT_GROUPS We don't need to remap parent_gid, but have to adjust group membership checks and take idmapping into account. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:48:22 +02:00
Alexander Mikhalitsyn	10dc721836	fuse: add an idmap argument to fuse_simple_request If idmap == NULL and filesystem daemon declared idmapped mounts support, then uid/gid values in a fuse header will be -1. No functional changes intended. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:48:22 +02:00
Alexander Mikhalitsyn	aa16880d9f	fuse: add basic infrastructure to support idmappings Add some preparational changes in fuse_get_req/fuse_force_creds to handle idmappings. Miklos suggested [1], [2] to change the meaning of in.h.uid/in.h.gid fields when daemon declares support for idmapped mounts. In a new semantic, we fill uid/gid values in fuse header with a id-mapped caller uid/gid (for requests which create new inodes), for all the rest cases we just send -1 to userspace. No functional changes intended. Link: https://lore.kernel.org/all/CAJfpegsVY97_5mHSc06mSw79FehFWtoXT=hhTUK_E-Yhr7OAuQ@mail.gmail.com/ [1] Link: https://lore.kernel.org/all/CAJfpegtHQsEUuFq1k4ZbTD3E1h-GsrN3PWyv7X8cg6sfU_W2Yw@mail.gmail.com/ [2] Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-09-04 16:48:22 +02:00
Linus Torvalds	88fac17500	fuse fixes for 6.11-rc7 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZtbV4AAKCRDh3BK/laaZ PC33AP9XvLpQii0mLo12hTSP11TYpaatdhUvyFFKERle1yWkUgEAvtVutUJryTD2 sz7x5jj4GD9tCWyMlp8Xs5h1Dr4U6wc= =XdIb -----END PGP SIGNATURE----- Merge tag 'fuse-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: - Fix EIO if splice and page stealing are enabled on the fuse device - Disable problematic combination of passthrough and writeback-cache - Other bug fixes found by code review * tag 'fuse-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: disable the combination of passthrough and writeback cache fuse: update stats for pages in dropped aux writeback list fuse: clear PG_uptodate when using a stolen page fuse: fix memory leak in fuse_create_open fuse: check aborted connection before adding requests to pending list for resending fuse: use unsigned type for getxattr/listxattr size truncation	2024-09-03 12:32:00 -07:00
Aurelien Aptel	506b21c945	fuse: use correct name fuse_conn_list in docstring fuse_mount_list doesn't exist, use fuse_conn_list. Signed-off-by: Aurelien Aptel <aaptel@nvidia.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:13 +02:00
Josef Bacik	396b209e40	fuse: add simple request tracepoints I've been timing various fuse operations and it's quite annoying to do with kprobes. Add two tracepoints for sending and ending fuse requests to make it easier to debug and time various operations. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:13 +02:00
Joanne Koong	0acad9289b	fuse: refactor out shared logic in fuse_writepages_fill() and fuse_writepage_locked() This change refactors the shared logic in fuse_writepages_fill() and fuse_writepages_locked() into two separate helper functions, fuse_writepage_args_page_fill() and fuse_writepage_args_setup(). No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:12 +02:00
Joanne Koong	4046d3adcc	fuse: move fuse file initialization to wpa allocation time Before this change, wpa->ia.ff is initialized with an acquired reference on the fuse file right before it submits the writeback request. If there are auxiliary writebacks, then the initialization and reference acquisition needs to also be set before we submit the auxiliary writeback request. To make the logic simpler and to pave the way for a subsequent refactoring of fuse_writepages_fill() and fuse_writepage_locked(), this change initializes and acquires wpa->ia.ff when the wpa is allocated. No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:12 +02:00
Joanne Koong	9a8ebcf5e0	fuse: convert fuse_writepages_fill() to use a folio for its tmp page To pave the way for refactoring out the shared logic in fuse_writepages_fill() and fuse_writepage_locked(), this change converts the temporary page in fuse_writepages_fill() to use the folio API. This is similar to the change in commit `e0887e095a` ("fuse: Convert fuse_writepage_locked to take a folio"), which converted the tmp page in fuse_writepage_locked() to use the folio API. inc_node_page_state() is intentionally preserved here instead of converting to node_stat_add_folio() since it is updating the stat of the underlying page and to better maintain API symmetry with dec_node_page_stat() in fuse_writepage_finish_stat(). No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:12 +02:00
Joanne Koong	672c3b7457	fuse: move initialization of fuse_file to fuse_writepages() instead of in callback Prior to this change, data->ff is checked and if not initialized then initialized in the fuse_writepages_fill() callback, which gets called for every dirty page in the address space mapping. This logic is better placed in the main fuse_writepages() caller where data.ff is initialized before walking the dirty pages. No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:12 +02:00
Joanne Koong	c04e3b2118	fuse: refactor finished writeback stats updates into helper function Move the logic for updating the bdi and page stats for a finished writeback into a separate helper function, where it can be called from both fuse_writepage_finish() and fuse_writepage_add() (in the case where there is already an auxiliary write request for the page). No functional changes added. Suggested by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:12 +02:00
Joanne Koong	509a6458b4	fuse: drop unused fuse_mount arg in fuse_writepage_finish() Drop the unused "struct fuse_mount *fm" arg in fuse_writepage_finish(). No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:12 +02:00
yangyun	ac5cffec53	fuse: add fast path for fuse_range_is_writeback In some cases, the fi->writepages may be empty. And there is no need to check fi->writepages with spin_lock, which may have an impact on performance due to lock contention. For example, in scenarios where multiple readers read the same file without any writers, or where the page cache is not enabled. Also remove the outdated comment since commit `6b2fb79963` ("fuse: optimize writepages search") has optimize the situation by replacing list with rb-tree. Signed-off-by: yangyun <yangyun50@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:12 +02:00
Miklos Szeredi	5de8acb41c	fuse: cleanup request queuing towards virtiofs Virtiofs has its own queuing mechanism, but still requests are first queued on fiq->pending to be immediately dequeued and queued onto the virtio queue. The queuing on fiq->pending is unnecessary and might even have some performance impact due to being a contention point. Forget requests are handled similarly. Move the queuing of requests and forgets into the fiq->ops->*. fuse_iqueue_ops are renamed to reflect the new semantics. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Fixed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Tested-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Reviewed-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:12 +02:00
Bernd Schubert	3ab394b363	fuse: disable the combination of passthrough and writeback cache Current design and handling of passthrough is without fuse caching and with that FUSE_WRITEBACK_CACHE is conflicting. Fixes: `7dc4e97a4f` ("fuse: introduce FUSE_PASSTHROUGH capability") Cc: stable@kernel.org # v6.9 Signed-off-by: Bernd Schubert <bschubert@ddn.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:01 +02:00
Joanne Koong	f7790d6778	fuse: update stats for pages in dropped aux writeback list In the case where the aux writeback list is dropped (e.g. the pages have been truncated or the connection is broken), the stats for its pages and backing device info need to be updated as well. Fixes: `e2653bd53a` ("fuse: fix leaked aux requests") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Cc: <stable@vger.kernel.org> # v5.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
Miklos Szeredi	76a51ac00c	fuse: clear PG_uptodate when using a stolen page Originally when a stolen page was inserted into fuse's page cache by fuse_try_move_page(), it would be marked uptodate. Then fuse_readpages_end() would call SetPageUptodate() again on the already uptodate page. Commit `413e8f014c` ("fuse: Convert fuse_readpages_end() to use folio_end_read()") changed that by replacing the SetPageUptodate() + unlock_page() combination with folio_end_read(), which does mostly the same, except it sets the uptodate flag with an xor operation, which in the above scenario resulted in the uptodate flag being cleared, which in turn resulted in EIO being returned on the read. Fix by clearing PG_uptodate instead of setting it in fuse_try_move_page(), conforming to the expectation of folio_end_read(). Reported-by: Jürg Billeter <j@bitron.ch> Debugged-by: Matthew Wilcox <willy@infradead.org> Fixes: `413e8f014c` ("fuse: Convert fuse_readpages_end() to use folio_end_read()") Cc: <stable@vger.kernel.org> # v6.10 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
yangyun	3002240d16	fuse: fix memory leak in fuse_create_open The memory of struct fuse_file is allocated but not freed when get_create_ext return error. Fixes: `3e2b6fdbdc` ("fuse: send security context of inode on file") Cc: stable@vger.kernel.org # v5.17 Signed-off-by: yangyun <yangyun50@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
Joanne Koong	97f30876c9	fuse: check aborted connection before adding requests to pending list for resending There is a race condition where inflight requests will not be aborted if they are in the middle of being re-sent when the connection is aborted. If fuse_resend has already moved all the requests in the fpq->processing lists to its private queue ("to_queue") and then the connection starts and finishes aborting, these requests will be added to the pending queue and remain on it indefinitely. Fixes: `760eac73f9` ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Cc: <stable@vger.kernel.org> # v6.9 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
Jann Horn	b18915248a	fuse: use unsigned type for getxattr/listxattr size truncation The existing code uses min_t(ssize_t, outarg.size, XATTR_LIST_MAX) when parsing the FUSE daemon's response to a zero-length getxattr/listxattr request. On 32-bit kernels, where ssize_t and outarg.size are the same size, this is wrong: The min_t() will pass through any size values that are negative when interpreted as signed. fuse_listxattr() will then return this userspace-supplied negative value, which callers will treat as an error value. This kind of bug pattern can lead to fairly bad security bugs because of how error codes are used in the Linux kernel. If a caller were to convert the numeric error into an error pointer, like so: struct foo *func(...) { int len = fuse_getxattr(..., NULL, 0); if (len < 0) return ERR_PTR(len); ... } then it would end up returning this userspace-supplied negative value cast to a pointer - but the caller of this function wouldn't recognize it as an error pointer (IS_ERR_VALUE() only detects values in the narrow range in which legitimate errno values are), and so it would just be treated as a kernel pointer. I think there is at least one theoretical codepath where this could happen, but that path would involve virtio-fs with submounts plus some weird SELinux configuration, so I think it's probably not a concern in practice. Cc: stable@vger.kernel.org # v4.9 Fixes: `63401ccdb2` ("fuse: limit xattr returned size") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
Jann Horn	3c0da3d163	fuse: Initialize beyond-EOF page contents before setting uptodate fuse_notify_store(), unlike fuse_do_readpage(), does not enable page zeroing (because it can be used to change partial page contents). So fuse_notify_store() must be more careful to fully initialize page contents (including parts of the page that are beyond end-of-file) before marking the page uptodate. The current code can leave beyond-EOF page contents uninitialized, which makes these uninitialized page contents visible to userspace via mmap(). This is an information leak, but only affects systems which do not enable init-on-alloc (via CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y or the corresponding kernel command line parameter). Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2574 Cc: stable@kernel.org Fixes: `a1d75f2582` ("fuse: add store request") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-08-18 08:45:39 -07:00
Al Viro	1da91ea87a	introduce fd_file(), convert all accessors to it. For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-08-12 22:00:43 -04:00
Matthew Wilcox (Oracle)	1da86618bd	fs: Convert aops->write_begin to take a folio Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-07 11:33:21 +02:00
Matthew Wilcox (Oracle)	a225800f32	fs: Convert aops->write_end to take a folio Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-07 11:32:02 +02:00
Matthew Wilcox (Oracle)	a060d835cf	fuse: Convert fuse_write_begin() to use a folio Fetch a folio from the page cache instead of a page and use it throughout removing several calls to compound_head() and supporting large folios (in this function). We still have to convert back to a page for calling internal fuse functions, but hopefully they will be converted soon. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-07 11:32:01 +02:00
Matthew Wilcox (Oracle)	556d0ac068	fuse: Convert fuse_write_end() to use a folio Convert the passed page to a folio and operate on that. Replaces five calls to compound_head() with one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-07 11:32:00 +02:00
Linus Torvalds	f4f92db439	virtio: features, fixes, cleanups Several new features here: - Virtio find vqs API has been reworked (required to fix the scalability issue we have with adminq, which I hope to merge later in the cycle) - vDPA driver for Marvell OCTEON - virtio fs performance improvement - mlx5 migration speedups Fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmaXjQQPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpnIsH/jVNqAQbe/vaBQdNMdnsA+P9A9unLbYRxYCQ tN73mQRIXKtnZHBRAEbMGq52HPYg8HlN2HJSgyNo6I6t8VD+PiOco7m+3GpmqEcW aXPOPl0BAbVoDgyutxRuuodP8Z61lBx0mG6iOxpzTXOPGlpQqtPCFHO8YnodqnPf tMix/5uAqgZKV2siCbw5DtzwEc0gDHU8qsD0/nyoS5nBDF9yh/ardr5P/qiyFDQH atCNYTOhIFU83pLAaw0fpCGbkt7gxf+5RpWVx3wkYww+/MwvYhsveRvQyaGbBz3n WDtET3SOtVTta98OAGIKCq/2z8f6mYXBP7vXapBgnJG3vwS/poQ= =LYua -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "Several new features here: - Virtio find vqs API has been reworked (required to fix the scalability issue we have with adminq, which I hope to merge later in the cycle) - vDPA driver for Marvell OCTEON - virtio fs performance improvement - mlx5 migration speedups Fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits) virtio: rename virtio_find_vqs_info() to virtio_find_vqs() virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info() virtio_balloon: convert to use virtio_find_vqs_info() virtiofs: convert to use virtio_find_vqs_info() scsi: virtio_scsi: convert to use virtio_find_vqs_info() virtio_net: convert to use virtio_find_vqs_info() virtio_crypto: convert to use virtio_find_vqs_info() virtio_console: convert to use virtio_find_vqs_info() virtio_blk: convert to use virtio_find_vqs_info() virtio: rename find_vqs_info() op to find_vqs() virtio: remove the original find_vqs() op virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly virtio: convert find_vqs() op implementations to find_vqs_info() virtio_pci: convert vp_*find_vqs() ops to find_vqs_info() virtio: introduce virtio_queue_info struct and find_vqs_info() config op virtio: make virtio_find_single_vq() call virtio_find_vqs() virtio: make virtio_find_vqs() call virtio_find_vqs_ctx() caif_virtio: use virtio_find_single_vq() for single virtqueue finding vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() ...	2024-07-19 11:57:55 -07:00
Jiri Pirko	6c85d6b653	virtio: rename virtio_find_vqs_info() to virtio_find_vqs() Since the original virtio_find_vqs() is no longer present, rename virtio_find_vqs_info() back to virtio_find_vqs(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-20-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-17 05:20:58 -04:00
Jiri Pirko	fc496dcd93	virtiofs: convert to use virtio_find_vqs_info() Instead of passing separate names and callbacks arrays to virtio_find_vqs(), allocate one of virtual_queue_info structs and pass it to virtio_find_vqs_info(). Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-16-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-07-17 05:20:58 -04:00
Linus Torvalds	b8fc1bd73a	vfs-6.11.mount.api -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEGjAAKCRCRxhvAZXjc okXfAP4tFUYszUsSqYdsgy9UvXw3Dr5zOIzQmN++NdjGkbU5fgEAs2ystqEfJgr3 v7XvGbu65CvL4/slNhBZOU4yekGx5Qc= =C4QD -----END PGP SIGNATURE----- Merge tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount API updates from Christian Brauner: - Add a generic helper to parse uid and gid mount options. Currently we open-code the same logic in various filesystems which is error prone, especially since the verification of uid and gid mount options is a sensitive operation in the face of idmappings. Add a generic helper and convert all filesystems over to it. Make sure that filesystems that are mountable in unprivileged containers verify that the specified uid and gid can be represented in the owning namespace of the filesystem. - Convert hostfs to the new mount api. * tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fuse: Convert to new uid/gid option parsing helpers fuse: verify {g,u}id mount options correctly fat: Convert to new uid/gid option parsing helpers fat: Convert to new mount api fat: move debug into fat_mount_options vboxsf: Convert to new uid/gid option parsing helpers tracefs: Convert to new uid/gid option parsing helpers smb: client: Convert to new uid/gid option parsing helpers tmpfs: Convert to new uid/gid option parsing helpers ntfs3: Convert to new uid/gid option parsing helpers isofs: Convert to new uid/gid option parsing helpers hugetlbfs: Convert to new uid/gid option parsing helpers ext4: Convert to new uid/gid option parsing helpers exfat: Convert to new uid/gid option parsing helpers efivarfs: Convert to new uid/gid option parsing helpers debugfs: Convert to new uid/gid option parsing helpers autofs: Convert to new uid/gid option parsing helpers fs_parse: add uid & gid option option parsing helpers hostfs: Add const qualifier to host_root in hostfs_fill_super() hostfs: convert hostfs to use the new mount API	2024-07-15 11:31:32 -07:00
Peter-Jan Gootzen	106e4df120	virtio-fs: improved request latencies when Virtio queue is full Currently, when the Virtio queue is full, a work item is scheduled to execute in 1ms that retries adding the request to the queue. This is a large amount of time on the scale on which a virtio-fs device can operate. When using a DPU this is around 30-40us baseline without going to a remote server (4k, QD=1). This patch changes the retrying behavior to immediately filling the Virtio queue up again when a completion has been received. This reduces the 99.9th percentile latencies in our tests by 60x and slightly increases the overall throughput, when using a workload IO depth 2x the size of the Virtio queue and a DPU-powered virtio-fs device (NVIDIA BlueField DPU). Signed-off-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Yoray Zack <yorayz@nvidia.com> Message-Id: <20240517190435.152096-3-pgootzen@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2024-07-09 08:42:42 -04:00
Peter-Jan Gootzen	2106e1f444	virtio-fs: let -ENOMEM bubble up or burst gently Currently, when the enqueueing of a request or forget operation fails with -ENOMEM, the enqueueing is retried after a timeout. This patch removes this behavior and treats -ENOMEM in these scenarios like any other error. By bubbling up the error to user space in the case of a request, and by dropping the operation in case of a forget. This behavior matches that of the FUSE layer above, and also simplifies the error handling. The latter will come in handy for upcoming patches that optimize the retrying of operations in case of -ENOSPC. Signed-off-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Yoray Zack <yorayz@nvidia.com> Message-Id: <20240517190435.152096-2-pgootzen@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2024-07-09 08:42:41 -04:00
Eric Sandeen	eea6a8322e	fuse: Convert to new uid/gid option parsing helpers Convert to new uid/gid option parsing helpers Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/4e1a4efa-4ca5-4358-acee-40efd07c3c44@redhat.com Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-03 16:55:11 +02:00
Eric Sandeen	525bd65aa7	fuse: verify {g,u}id mount options correctly As was done in `0200679fc7` ("tmpfs: verify {g,u}id mount options correctly") we need to validate that the requested uid and/or gid is representable in the filesystem's idmapping. Cribbing from the above commit log, The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, fuse has been doing the correct thing. But since fuse is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock. Fixes: `c30da2e981` ("fuse: convert to use the new mount API") Cc: stable@vger.kernel.org # 5.4+ Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/8f07d45d-c806-484d-a2e3-7a2199df1cd2@redhat.com Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-03 16:55:11 +02:00
Youling Tang	153216cf7b	fuse: Use in_group_or_capable() helper Use the in_group_or_capable() helper function to simplify the code. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Link: https://lore.kernel.org/r/20240620032335.147136-3-youling.tang@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-25 11:15:48 +02:00
Linus Torvalds	2ef32ad224	virtio: features, fixes, cleanups Several new features here: - virtio-net is finally supported in vduse. - Virtio (balloon and mem) interaction with suspend is improved - vhost-scsi now handles signals better/faster. Fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmZN570PHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRp2JUH/1K3fZOHymop6Y5Z3USFS7YdlF+dniedY/vg TKyWERkXOlxq1d9DVxC0mN7tk72DweuWI0YJjLXofrEW1VuW29ecSbyFXxpeWJls b7ErffxDAFRas5jkMCngD8TuFnbEegU0mGP5kbiHpEndBydQ2hH99Gg0x7swW+cE xsvU5zonCCLwLGIP2DrVrn9qGOHtV6o8eZfVKDVXfvicn3lFBkUSxlwEYsO9RMup aKxV4FT2Pb1yBicwBK4TH1oeEXqEGy1YLEn+kAHRbgoC/5L0/LaiqrkzwzwwOIPj uPGkacf8CIbX0qZo5EzD8kvfcYL1xhU3eT9WBmpp2ZwD+4bINd4= =nax1 -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-net is finally supported in vduse - virtio (balloon and mem) interaction with suspend is improved - vhost-scsi now handles signals better/faster And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) virtio-pci: Check if is_avq is NULL virtio: delete vq in vp_find_vqs_msix() when request_irq() fails MAINTAINERS: add Eugenio Pérez as reviewer vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API vp_vdpa: don't allocate unused msix vectors sound: virtio: drop owner assignment fuse: virtio: drop owner assignment scsi: virtio: drop owner assignment rpmsg: virtio: drop owner assignment nvdimm: virtio_pmem: drop owner assignment wifi: mac80211_hwsim: drop owner assignment vsock/virtio: drop owner assignment net: 9p: virtio: drop owner assignment net: virtio: drop owner assignment net: caif: virtio: drop owner assignment misc: nsm: drop owner assignment iommu: virtio: drop owner assignment drm/virtio: drop owner assignment gpio: virtio: drop owner assignment firmware: arm_scmi: virtio: drop owner assignment ...	2024-05-23 12:04:36 -07:00
Krzysztof Kozlowski	bc21020f61	fuse: virtio: drop owner assignment virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-24-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2024-05-22 08:31:18 -04:00
Peter-Jan Gootzen	529395d2ae	virtio-fs: add multi-queue support This commit creates a multi-queue mapping at device bring-up. The driver first attempts to use the existing MSI-X interrupt affinities (previously disabled), and if not present, will distribute the request queues evenly over the CPUs. If the latter fails as well, all CPUs are mapped to request queue zero. When a request is handed from FUSE to the virtio-fs device driver, the driver will use the current CPU to index into the multi-queue mapping and determine the optimal request queue to use. We measured the performance of this patch with the fio benchmarking tool, increasing the number of queues results in a significant speedup for both read and write operations, demonstrating the effectiveness of multi-queue support. Host: - Dell PowerEdge R760 - CPU: Intel(R) Xeon(R) Gold 6438M, 128 cores - VM: KVM with 32 cores Virtio-fs device: - BlueField-3 DPU - CPU: ARM Cortex-A78AE, 16 cores - One thread per queue, each busy polling on one request queue - Each queue is 1024 descriptors deep Workload: - fio, sequential read or write, ioengine=libaio, numjobs=32, 4GiB file per job, iodepth=8, bs=256KiB, runtime=30s Performance Results: +===========================+==========+===========+ \| Number of queues \| Fio read \| Fio write \| +===========================+==========+===========+ \| 1 request queue (GiB/s) \| 6.1 \| 4.6 \| +---------------------------+----------+-----------+ \| 8 request queues (GiB/s) \| 25.8 \| 10.3 \| +---------------------------+----------+-----------+ \| 16 request queues (GiB/s) \| 30.9 \| 19.5 \| +---------------------------+----------+-----------+ \| 32 request queue (GiB/s) \| 33.2 \| 22.6 \| +---------------------------+----------+-----------+ \| Speedup \| 5.5x \| 5x \| +---------------=-----------+----------+-----------+ Signed-off-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Signed-off-by: Yoray Zack <yorayz@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-10 13:38:14 +02:00
Peter-Jan Gootzen	103c2de111	virtio-fs: limit number of request queues Virtio-fs devices might allocate significant resources to virtio queues such as CPU cores that busy poll on the queue. The device indicates how many request queues it can support and the driver should initialize the number of queues that they want to utilize. In this patch we limit the number of initialized request queues to the number of CPUs, to limit the resource consumption on the device-side and to prepare for the upcoming multi-queue patch. Signed-off-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Signed-off-by: Yoray Zack <yorayz@nvidia.com> Suggested-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-10 13:38:14 +02:00
Hou Tao	246014876d	fuse: clear FR_SENT when re-adding requests into pending list The following warning was reported by lee bruce: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 8264 at fs/fuse/dev.c:300 fuse_request_end+0x685/0x7e0 fs/fuse/dev.c:300 Modules linked in: CPU: 0 PID: 8264 Comm: ab2 Not tainted 6.9.0-rc7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:fuse_request_end+0x685/0x7e0 fs/fuse/dev.c:300 ...... Call Trace: <TASK> fuse_dev_do_read.constprop.0+0xd36/0x1dd0 fs/fuse/dev.c:1334 fuse_dev_read+0x166/0x200 fs/fuse/dev.c:1367 call_read_iter include/linux/fs.h:2104 [inline] new_sync_read fs/read_write.c:395 [inline] vfs_read+0x85b/0xba0 fs/read_write.c:476 ksys_read+0x12f/0x260 fs/read_write.c:619 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xce/0x260 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ...... </TASK> The warning is due to the FUSE_NOTIFY_RESEND notify sent by the write() syscall in the reproducer program and it happens as follows: (1) calls fuse_dev_read() to read the INIT request The read succeeds. During the read, bit FR_SENT will be set on the request. (2) calls fuse_dev_write() to send an USE_NOTIFY_RESEND notify The resend notify will resend all processing requests, so the INIT request is moved from processing list to pending list again. (3) calls fuse_dev_read() with an invalid output address fuse_dev_read() will try to copy the same INIT request to the output address, but it will fail due to the invalid address, so the INIT request is ended and triggers the warning in fuse_request_end(). Fix it by clearing FR_SENT when re-adding requests into pending list. Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/linux-fsdevel/58f13e47-4765-fce4-daf4-dffcc5ae2330@huaweicloud.com/T/#m091614e5ea2af403b259e7cea6a49e51b9ee07a7 Fixes: `760eac73f9` ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-10 11:10:39 +02:00
Hou Tao	42815f8ac5	fuse: set FR_PENDING atomically in fuse_resend() When fuse_resend() moves the requests from processing lists to pending list, it uses __set_bit() to set FR_PENDING bit in req->flags. Using __set_bit() is not safe, because other functions may update req->flags concurrently (e.g., request_wait_answer() may call set_bit(FR_INTERRUPTED, &flags)). Fix it by using set_bit() instead. Fixes: `760eac73f9` ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-10 11:10:12 +02:00
Richard Fung	9fe2a036a2	fuse: Add initial support for fs-verity This adds support for the FS_IOC_ENABLE_VERITY and FS_IOC_MEASURE_VERITY ioctls. The FS_IOC_READ_VERITY_METADATA is missing but from the documentation, "This is a fairly specialized use case, and most fs-verity users won’t need this ioctl." Signed-off-by: Richard Fung <richardfung@google.com> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-08 09:31:21 +02:00
Matthew Wilcox (Oracle)	413e8f014c	fuse: Convert fuse_readpages_end() to use folio_end_read() Nobody checks the error flag on fuse folios, so stop setting it. Optimise the (optional) setting of the uptodate flag and clearing of the lock flag by using folio_end_read(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-08 09:31:21 +02:00
Brian Foster	96d88f65ad	virtiofs: include a newline in sysfs tag The internal tag string doesn't contain a newline. Append one when emitting the tag via sysfs. [Stefan] Orthogonal to the newline issue, sysfs_emit(buf, "%s", fs->tag) is needed to prevent format string injection. Signed-off-by: Brian Foster <bfoster@redhat.com> Fixes: `a8f62f50b4` ("virtiofs: export filesystem tags through sysfs") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-08 09:31:21 +02:00
Amir Goldstein	aef8acd79f	fuse: verify zero padding in fuse_backing_map To allow us extending the interface in the future. Fixes: `44350256ab` ("fuse: implement ioctls to manage backing files") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-04-22 17:13:43 +02:00
Yang Li	09492cb451	cuse: add kernel-doc comments to cuse_process_init_reply() This commit adds kernel-doc style comments with complete parameter descriptions for the function cuse_process_init_reply. Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-04-15 11:02:10 +02:00
Danny Lin	eb4b691b91	fuse: fix leaked ENOSYS error on first statx call FUSE attempts to detect server support for statx by trying it once and setting no_statx=1 if it fails with ENOSYS, but consider the following scenario: - Userspace (e.g. sh) calls stat() on a file * succeeds - Userspace (e.g. lsd) calls statx(BTIME) on the same file - request_mask = STATX_BASIC_STATS \| STATX_BTIME - first pass: sync=true due to differing cache_mask - statx fails and returns ENOSYS - set no_statx and retry - retry sets mask = STATX_BASIC_STATS - now mask == cache_mask; sync=false (time_before: still valid) - so we take the "else if (stat)" path - "err" is still ENOSYS from the failed statx call Fix this by zeroing "err" before retrying the failed call. Fixes: `d3045530bd` ("fuse: implement statx") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Danny Lin <danny@orbstack.dev> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-04-15 10:12:44 +02:00
Amir Goldstein	7cc9112628	fuse: fix parallel dio write on file open in passthrough mode Parallel dio write takes a negative refcount of fi->iocachectr and so does open of file in passthrough mode. The refcount of passthrough mode is associated with attach/detach of a fuse_backing object to fuse inode. For parallel dio write, the backing file is irrelevant, so the call to fuse_inode_uncached_io_start() passes a NULL fuse_backing object. Passing a NULL fuse_backing will result in false -EBUSY error if the file is already open in passthrough mode. Allow taking negative fi->iocachectr refcount with NULL fuse_backing, because it does not conflict with an already attached fuse_backing object. Fixes: `4a90451bbc` ("fuse: implement open in passthrough mode") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-04-15 10:12:44 +02:00
Amir Goldstein	4864a6dd83	fuse: fix wrong ff->iomode state changes from parallel dio write There is a confusion with fuse_file_uncached_io_{start,end} interface. These helpers do two things when called from passthrough open()/release(): 1. Take/drop negative refcount of fi->iocachectr (inode uncached io mode) 2. State change ff->iomode IOM_NONE <-> IOM_UNCACHED (file uncached open) The calls from parallel dio write path need to take a reference on fi->iocachectr, but they should not be changing ff->iomode state, because in this case, the fi->iocachectr reference does not stick around until file release(). Factor out helpers fuse_inode_uncached_io_{start,end}, to be used from parallel dio write path and rename fuse_file_cached_io_{start,end} helpers to fuse_file_cached_io_{open,release} to clarify the difference. Fixes: `205c1d8026` ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-04-15 10:12:03 +02:00
Linus Torvalds	6ce8b2ce0d	fuse update for 6.9 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZfLjeQAKCRDh3BK/laaZ PBYQAQDqYZzq91Kn5jdvjaSd+6I/+x7MDLOIP5hPX0HJLuBxWAEAqENoo4Of0GTC ltW7DKrQy9E3CMp6VKSLVJPN4BYP9gk= =GvOE -----END PGP SIGNATURE----- Merge tag 'fuse-update-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add passthrough mode for regular file I/O. This allows performing read and write (also via memory maps) on a backing file without incurring the overhead of roundtrips to userspace. For now this is only allowed to privileged servers, but this limitation will go away in the future (Amir Goldstein) - Fix interaction of direct I/O mode with memory maps (Bernd Schubert) - Export filesystem tags through sysfs for virtiofs (Stefan Hajnoczi) - Allow resending queued requests for server crash recovery (Zhao Chen) - Misc fixes and cleanups * tag 'fuse-update-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (38 commits) fuse: get rid of ff->readdir.lock fuse: remove unneeded lock which protecting update of congestion_threshold fuse: Fix missing FOLL_PIN for direct-io fuse: remove an unnecessary if statement fuse: Track process write operations in both direct and writethrough modes fuse: Use the high bit of request ID for indicating resend requests fuse: Introduce a new notification type for resend pending requests fuse: add support for explicit export disabling fuse: __kuid_val/__kgid_val helpers in fuse_fill_attr_from_inode() fuse: fix typo for fuse_permission comment fuse: Convert fuse_writepage_locked to take a folio fuse: Remove fuse_writepage virtio_fs: remove duplicate check if queue is broken fuse: use FUSE_ROOT_ID in fuse_get_root_inode() fuse: don't unhash root fuse: fix root lookup with nonzero generation fuse: replace remaining make_bad_inode() with fuse_make_bad() virtiofs: drop __exit from virtio_fs_sysfs_exit() fuse: implement passthrough for mmap fuse: implement splice read/write passthrough ...	2024-03-15 09:47:14 -07:00
Linus Torvalds	902861e34c	- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx TMNhHfyiHYDTx/GAV9NXW84tasJSDgA= =TG55 -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...	2024-03-14 17:43:30 -07:00
Linus Torvalds	0c750012e8	vfs-6.9.file -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4tQAKCRCRxhvAZXjc ohnfAP4sm946PZfiC4y5Euk96WDC3hC8WCSBar+fpFmYVzeD9wEAy+NVCsjkMElz vqNxwFULUwQjFxxvsM9gvhrgGUud1AE= =UZk/ -----END PGP SIGNATURE----- Merge tag 'vfs-6.9.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull file locking updates from Christian Brauner: "A few years ago struct file_lock_context was added to allow for separate lists to track different types of file locks instead of using a singly-linked list for all of them. Now leases no longer need to be tracked using struct file_lock. However, a lot of the infrastructure is identical for leases and locks so separating them isn't trivial. This splits a group of fields used by both file locks and leases into a new struct file_lock_core. The new core struct is embedded in struct file_lock. Coccinelle was used to convert a lot of the callers to deal with the move, with the remaining 25% or so converted by hand. Afterwards several internal functions in fs/locks.c are made to work with struct file_lock_core. Ultimately this allows to split struct file_lock into struct file_lock and struct file_lease. The file lease APIs are then converted to take struct file_lease" * tag 'vfs-6.9.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (51 commits) filelock: fix deadlock detection in POSIX locking filelock: always define for_each_file_lock() smb: remove redundant check filelock: don't do security checks on nfsd setlease calls filelock: split leases out of struct file_lock filelock: remove temporary compatibility macros smb/server: adapt to breakup of struct file_lock smb/client: adapt to breakup of struct file_lock ocfs2: adapt to breakup of struct file_lock nfsd: adapt to breakup of struct file_lock nfs: adapt to breakup of struct file_lock lockd: adapt to breakup of struct file_lock fuse: adapt to breakup of struct file_lock gfs2: adapt to breakup of struct file_lock dlm: adapt to breakup of struct file_lock ceph: adapt to breakup of struct file_lock afs: adapt to breakup of struct file_lock 9p: adapt to breakup of struct file_lock filelock: convert seqfile handling to use file_lock_core filelock: convert locks_translate_pid to take file_lock_core ...	2024-03-11 10:37:45 -07:00
Miklos Szeredi	cdf6ac2a03	fuse: get rid of ff->readdir.lock The same protection is provided by file->f_pos_lock. Note, this relies on the fact that file->f_mode has FMODE_ATOMIC_POS. This flag is cleared by stream_open(), which would prevent locking of f_pos_lock. Prior to commit `7de64d521b` ("fuse: break up fuse_open_common()") FOPEN_STREAM on a directory would cause stream_open() to be called. After this commit this is not done anymore, so f_pos_lock will always be locked. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 16:20:58 +01:00
Kemeng Shi	efc4105a4c	fuse: remove unneeded lock which protecting update of congestion_threshold Commit `670d21c6e1` ("fuse: remove reliance on bdi congestion") change how congestion_threshold is used and lock in fuse_conn_congestion_threshold_write is not needed anymore. 1. Access to supe_block is removed along with removing of bdi congestion. Then down_read(&fc->killsb) which protecting access to super_block is no needed. 2. Compare num_background and congestion_threshold without holding bg_lock. Then there is no need to hold bg_lock to update congestion_threshold. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 11:07:51 +01:00
Lei Huang	738adade96	fuse: Fix missing FOLL_PIN for direct-io Our user space filesystem relies on fuse to provide POSIX interface. In our test, a known string is written into a file and the content is read back later to verify correct data returned. We observed wrong data returned in read buffer in rare cases although correct data are stored in our filesystem. Fuse kernel module calls iov_iter_get_pages2() to get the physical pages of the user-space read buffer passed in read(). The pages are not pinned to avoid page migration. When page migration occurs, the consequence are two-folds. 1) Applications do not receive correct data in read buffer. 2) fuse kernel writes data into a wrong place. Using iov_iter_extract_pages() to pin pages fixes the issue in our test. An auxiliary variable "struct page **pt_pages" is used in the patch to prepare the 2nd parameter for iov_iter_extract_pages() since iov_iter_get_pages2() uses a different type for the 2nd parameter. [SzM] add iov_iter_extract_will_pin(ii) and unpin only if true. Signed-off-by: Lei Huang <lei.huang@linux.intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 11:07:35 +01:00
Jiachen Zhang	8a5fb18643	fuse: remove an unnecessary if statement FUSE remote locking code paths never add any locking state to inode->i_flctx, so the locks_remove_posix() function called on file close will return without calling fuse_setlk(). Therefore, as the if statement to be removed in this commit will always be false, remove it for clearness. Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 09:56:54 +01:00
Zhou Jifeng	2e3f7dd08d	fuse: Track process write operations in both direct and writethrough modes Due to the fact that fuse does not count the write IO of processes in the direct and writethrough write modes, user processes cannot track write_bytes through the “/proc/[pid]/io” path. For example, the system tool iotop cannot count the write operations of the corresponding process. Signed-off-by: Zhou Jifeng <zhoujifeng@kylinos.com.cn> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 09:56:35 +01:00
Zhao Chen	9e7f5296f4	fuse: Use the high bit of request ID for indicating resend requests Some FUSE daemons want to know if the received request is a resend request. The high bit of the fuse request ID is utilized for indicating this, enabling the receiver to perform appropriate handling. The init flag "FUSE_HAS_RESEND" is added to indicate this feature. Signed-off-by: Zhao Chen <winters.zc@antgroup.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 09:56:35 +01:00
Zhao Chen	760eac73f9	fuse: Introduce a new notification type for resend pending requests When a FUSE daemon panics and failover, we aim to minimize the impact on applications by reusing the existing FUSE connection. During this process, another daemon is employed to preserve the FUSE connection's file descriptor. The new started FUSE Daemon will takeover the fd and continue to provide service. However, it is possible for some inflight requests to be lost and never returned. As a result, applications awaiting replies would become stuck forever. To address this, we can resend these pending requests to the new started FUSE daemon. This patch introduces a new notification type "FUSE_NOTIFY_RESEND", which can trigger resending of the pending requests, ensuring they are properly processed again. Signed-off-by: Zhao Chen <winters.zc@antgroup.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 09:56:34 +01:00
Jingbo Xu	e022f6a1c7	fuse: add support for explicit export disabling open_by_handle_at(2) can fail with -ESTALE with a valid handle returned by a previous name_to_handle_at(2) for evicted fuse inodes, which is especially common when entry_valid_timeout is 0, e.g. when the fuse daemon is in "cache=none" mode. The time sequence is like: name_to_handle_at(2) # succeed evict fuse inode open_by_handle_at(2) # fail The root cause is that, with 0 entry_valid_timeout, the dput() called in name_to_handle_at(2) will trigger iput -> evict(), which will send FUSE_FORGET to the daemon. The following open_by_handle_at(2) will send a new FUSE_LOOKUP request upon inode cache miss since the previous inode eviction. Then the fuse daemon may fail the FUSE_LOOKUP request with -ENOENT as the cached metadata of the requested inode has already been cleaned up during the previous FUSE_FORGET. The returned -ENOENT is treated as -ESTALE when open_by_handle_at(2) returns. This confuses the application somehow, as open_by_handle_at(2) fails when the previous name_to_handle_at(2) succeeds. The returned errno is also confusing as the requested file is not deleted and already there. It is reasonable to fail name_to_handle_at(2) early in this case, after which the application can fallback to open(2) to access files. Since this issue typically appears when entry_valid_timeout is 0 which is configured by the fuse daemon, the fuse daemon is the right person to explicitly disable the export when required. Also considering FUSE_EXPORT_SUPPORT actually indicates the support for lookups of "." and "..", and there are existing fuse daemons supporting export without FUSE_EXPORT_SUPPORT set, for compatibility, we add a new INIT flag for such purpose. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 09:56:34 +01:00
Alexander Mikhalitsyn	5a4d888e9f	fuse: __kuid_val/__kgid_val helpers in fuse_fill_attr_from_inode() For the sake of consistency, let's use these helpers to extract {u,g}id_t values from k{u,g}id_t ones. There are no functional changes, just to make code cleaner. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 09:56:25 +01:00
Alexander Mikhalitsyn	2d09ab2203	fuse: fix typo for fuse_permission comment Found by chance while working on support for idmapped mounts in fuse. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-06 09:56:11 +01:00
Matthew Wilcox (Oracle)	e0887e095a	fuse: Convert fuse_writepage_locked to take a folio The one remaining caller of fuse_writepage_locked() already has a folio, so convert this function entirely. Saves a few calls to compound_head() but no attempt is made to support large folios in this patch. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 14:07:24 +01:00
Matthew Wilcox (Oracle)	e1c420ac99	fuse: Remove fuse_writepage The writepage operation is deprecated as it leads to worse performance under high memory pressure due to folios being written out in LRU order rather than sequentially within a file. Use filemap_migrate_folio() to support dirty folio migration instead of writepage. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 14:07:24 +01:00
Li RongQing	f9c2913739	virtio_fs: remove duplicate check if queue is broken virtqueue_enable_cb() will call virtqueue_poll() which will check if queue is broken at beginning, so remove the virtqueue_is_broken() call Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:43 +01:00
Miklos Szeredi	253e524371	fuse: use FUSE_ROOT_ID in fuse_get_root_inode() ...when calling fuse_iget(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:43 +01:00
Miklos Szeredi	b1fe686a76	fuse: don't unhash root The root inode is assumed to be always hashed. Do not unhash the root inode even if it is marked BAD. Fixes: `5d069dbe8a` ("fuse: fix bad inode") Cc: <stable@vger.kernel.org> # v5.11 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:43 +01:00
Miklos Szeredi	68ca1b49e4	fuse: fix root lookup with nonzero generation The root inode has a fixed nodeid and generation (1, 0). Prior to the commit `15db16837a` ("fuse: fix illegal access to inode with reused nodeid") generation number on lookup was ignored. After this commit lookup with the wrong generation number resulted in the inode being unhashed. This is correct for non-root inodes, but replacing the root inode is wrong and results in weird behavior. Fix by reverting to the old behavior if ignoring the generation for the root inode, but issuing a warning in dmesg. Reported-by: Antonio SJ Musumeci <trapexit@spawn.link> Closes: https://lore.kernel.org/all/CAOQ4uxhek5ytdN8Yz2tNEOg5ea4NkBb4nk0FGPjPk_9nz-VG3g@mail.gmail.com/ Fixes: `15db16837a` ("fuse: fix illegal access to inode with reused nodeid") Cc: <stable@vger.kernel.org> # v5.14 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:42 +01:00
Miklos Szeredi	82e081aebe	fuse: replace remaining make_bad_inode() with fuse_make_bad() fuse_do_statx() was added with the wrong helper. Fixes: `d3045530bd` ("fuse: implement statx") Cc: <stable@vger.kernel.org> # v6.6 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:42 +01:00
Stefan Hajnoczi	d30ff89870	virtiofs: drop __exit from virtio_fs_sysfs_exit() virtio_fs_sysfs_exit() is called by: - static int __init virtio_fs_init(void) - static void __exit virtio_fs_exit(void) Remove __exit from virtio_fs_sysfs_exit() since virtio_fs_init() is not an __exit function. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402270649.GYjNX0yw-lkp@intel.com/ Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:42 +01:00
Amir Goldstein	fda0b98ef0	fuse: implement passthrough for mmap An mmap request for a file open in passthrough mode, maps the memory directly to the backing file. An mmap of a file in direct io mode, usually uses cached mmap and puts the inode in caching io mode, which denies new passthrough opens of that inode, because caching io mode is conflicting with passthrough io mode. For the same reason, trying to mmap a direct io file, while there is a passthrough file open on the same inode will fail with -ENODEV. An mmap of a file in direct io mode, also needs to wait for parallel dio writes in-progress to complete. If a passthrough file is opened, while an mmap of another direct io file is waiting for parallel dio writes to complete, the wait is aborted and mmap fails with -ENODEV. A FUSE server that uses passthrough and direct io opens on the same inode that may also be mmaped, is advised to provide a backing fd also for the files that are open in direct io mode (i.e. use the flags combination FOPEN_DIRECT_IO \| FOPEN_PASSTHROUGH), so that mmap will always use the backing file, even if read/write do not passthrough. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:42 +01:00
Amir Goldstein	5ca7346861	fuse: implement splice read/write passthrough This allows passing fstests generic/249 and generic/591. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:42 +01:00
Amir Goldstein	57e1176e60	fuse: implement read/write passthrough Use the backing file read/write helpers to implement read/write passthrough to a backing file. After read/write, we invalidate a/c/mtime/size attributes. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:42 +01:00
Amir Goldstein	4a90451bbc	fuse: implement open in passthrough mode After getting a backing file id with FUSE_DEV_IOC_BACKING_OPEN ioctl, a FUSE server can reply to an OPEN request with flag FOPEN_PASSTHROUGH and the backing file id. The FUSE server should reuse the same backing file id for all the open replies of the same FUSE inode and open will fail (with -EIO) if a the server attempts to open the same inode with conflicting io modes or to setup passthrough to two different backing files for the same FUSE inode. Using the same backing file id for several different inodes is allowed. Opening a new file with FOPEN_DIRECT_IO for an inode that is already open for passthrough is allowed, but only if the FOPEN_PASSTHROUGH flag and correct backing file id are specified as well. The read/write IO of such files will not use passthrough operations to the backing file, but mmap, which does not support direct_io, will use the backing file insead of using the page cache as it always did. Even though all FUSE passthrough files of the same inode use the same backing file as a backing inode reference, each FUSE file opens a unique instance of a backing_file object to store the FUSE path that was used to open the inode and the open flags of the specific open file. The per-file, backing_file object is released along with the FUSE file. The inode associated fuse_backing object is released when the last FUSE passthrough file of that inode is released AND when the backing file id is closed by the server using the FUSE_DEV_IOC_BACKING_CLOSE ioctl. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:42 +01:00
Amir Goldstein	fc8ff397b2	fuse: prepare for opening file in passthrough mode In preparation for opening file in passthrough mode, store the fuse_open_out argument in ff->args to be passed into fuse_file_io_open() with the optional backing_id member. This will be used for setting up passthrough to backing file on open reply with FOPEN_PASSTHROUGH flag and a valid backing_id. Opening a file in passthrough mode may fail for several reasons, such as missing capability, conflicting open flags or inode in caching mode. Return EIO from fuse_file_io_open() in those cases. The combination of FOPEN_PASSTHROUGH and FOPEN_DIRECT_IO is allowed - it mean that read/write operations will go directly to the server, but mmap will be done to the backing file. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:42 +01:00
Amir Goldstein	44350256ab	fuse: implement ioctls to manage backing files FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file descriptor. If the call succeeds, a backing file identifier is returned. A later change will be using this backing file id in a reply to OPEN request with the flag FOPEN_PASSTHROUGH to setup passthrough of file operations on the open FUSE file to the backing file. The FUSE server should call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the backing file by its id. This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH flag is still in progress, the open may fail if the backing file is closed before the fuse file was opened. Setting up backing files requires a server with CAP_SYS_ADMIN privileges. For the backing file to be successfully setup, the backing file must implement both read_iter and write_iter file operations. The limitation on the level of filesystem stacking allowed for the backing file is enforced before setting up the backing file. Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-03-05 13:40:36 +01:00

1 2 3 4 5 ...

1663 Commits