linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-08-16 10:53:34 +00:00

Author	SHA1	Message	Date
Linus Torvalds	d8441523f2	Merge tag 'f2fs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, Matthew converted most of page operations to using folio. 
Beyond the work, we've applied some performance tunings such as GC and linear lookup, in addition to enhancing fault injection and sanity checks. Enhancements: - large number of folio conversions - add a control to turn on/off the linear lookup for performance - tune GC logics for zoned block device - improve fault injection and sanity checks Bug fixes: - handle error cases of memory donation - fix to correct check conditions in f2fs_cross_rename - fix to skip f2fs_balance_fs() if checkpoint is disabled - don't over-report free space or inodes in statvfs - prevent the current section from being selected as a victim during GC - fix to calculate first_zoned_segno correctly - fix to avoid inconsistence between SIT and SSA for zoned block device As usual, there are several debugging patches and clean-ups as well" * tag 'f2fs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (195 commits) f2fs: fix to correct check conditions in f2fs_cross_rename f2fs: use d_inode(dentry) cleanup dentry->d_inode f2fs: fix to skip f2fs_balance_fs() if checkpoint is disabled f2fs: clean up to check bi_status w/ BLK_STS_OK f2fs: introduce is_{meta,node}_folio f2fs: add ckpt_valid_blocks to the section entry f2fs: add a method for calculating the remaining blocks in the current segment in LFS mode. 
f2fs: introduce FAULT_VMALLOC f2fs: use vmalloc instead of kvmalloc in .init_{,de}compress_ctx f2fs: add f2fs_bug_on() in f2fs_quota_read() f2fs: add f2fs_bug_on() to detect potential bug f2fs: remove unused sbi argument from checksum functions f2fs: fix 32-bits hexademical number in fault injection doc f2fs: don't over-report free space or inodes in statvfs f2fs: return bool from __write_node_folio f2fs: simplify return value handling in f2fs_fsync_node_pages f2fs: always unlock the page in f2fs_write_single_data_page f2fs: remove wbc->for_reclaim handling f2fs: return bool from __f2fs_write_meta_folio f2fs: fix to return correct error number in f2fs_sync_node_pages() ...	2025-05-30 08:40:25 -07:00
Linus Torvalds	9d230d500b	Merge tag 'driver-core-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Greg KH: "Here are the driver core / kernfs changes for 6.16-rc1. Not a huge number of changes this development cycle, here's the summary of what is included in here: - kernfs locking tweaks, pushing some global locks down into a per-fs image lock - rust driver core and pci device bindings added for new features. - sysfs const work for bin_attributes. The final churn of switching away from and removing the transitional struct members, "read_new", "write_new" and "bin_attrs_new" will come after the merge window to avoid unnecessary merge conflicts. - auxbus device creation helpers added - fauxbus fix for creating sysfs files after the probe completed properly - other tiny updates for driver core things. 
All of these have been in linux-next for over a week with no reported issues" * tag 'driver-core-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: kernfs: Relax constraint in draining guard Documentation: embargoed-hardware-issues.rst: Remove myself drivers: hv: fix up const issue with vmbus_chan_bin_attrs firmware_loader: use SHA-256 library API instead of crypto_shash API docs: debugfs: do not recommend debugfs_remove_recursive PM: wakeup: Do not expose 4 device wakeup source APIs kernfs: switch global kernfs_rename_lock to per-fs lock kernfs: switch global kernfs_idr_lock to per-fs lock driver core: auxiliary bus: Fix IS_ERR() vs NULL mixup in __devm_auxiliary_device_create() sysfs: constify attribute_group::bin_attrs sysfs: constify bin_attribute argument of bin_attribute::read/write() software node: Correct a OOB check in software_node_get_reference_args() devres: simplify devm_kstrdup() using devm_kmemdup() platform: replace magic number with macro PLATFORM_DEVID_NONE component: do not try to unbind unbound components driver core: auxiliary bus: add device creation helpers driver core: faux: Add sysfs groups after probing	2025-05-29 09:11:39 -07:00
Linus Torvalds	d87d73895f	Merge tag 'ext4_for_linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New ext4 features and performance improvements: - Fast commit performance improvements - Multi-fsblock atomic write support for bigalloc file systems - Large folio support for regular files This last can result in really stupendous performance for the right workloads. For example, see [1] where the Kernel Test Robot reported over 37% improvement on a large sequential I/O workload. There are also the usual bug fixes and cleanups. 
Of note are cleanups of the extent status tree to fix potential races that could result in the extent status tree getting corrupted under heavy simultaneous allocation and deallocation to a single file" Link: https://lore.kernel.org/all/202505161418.ec0d753f-lkp@intel.com/ [1] * tag 'ext4_for_linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (52 commits) ext4: Add a WARN_ON_ONCE for querying LAST_IN_LEAF instead ext4: Simplify flags in ext4_map_query_blocks() ext4: Rename and document EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER ext4: Simplify last in leaf check in ext4_map_query_blocks ext4: Unwritten to written conversion requires EXT4_EX_NOCACHE ext4: only dirty folios when data journaling regular files ext4: Add atomic block write documentation ext4: Enable support for ext4 multi-fsblock atomic write using bigalloc ext4: Add multi-fsblock atomic write support with bigalloc ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS ext4: Make ext4_meta_trans_blocks() non-static for later use ext4: Check if inode uses extents in ext4_inode_can_atomic_write() ext4: Document an edge case for overwrites jbd2: remove journal_t argument from jbd2_superblock_csum() jbd2: remove journal_t argument from jbd2_chksum() ext4: remove sb argument from ext4_superblock_csum() ext4: remove sbi argument from ext4_chksum() ext4: enable large folio for regular file ext4: make online defragmentation support large folios ext4: make the writeback path support large folios ...	2025-05-28 12:12:08 -07:00
Chao Yu	54ca9be0bc	f2fs: introduce FAULT_VMALLOC Introduce a new fault type FAULT_VMALLOC to simulate no memory error in f2fs_vmalloc(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-05-27 23:52:36 +00:00
Linus Torvalds	3e443d1673	Merge tag 'docs-6.16' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "A moderately busy cycle for documentation this time around: - The most significant change is the replacement of the old kernel-doc script (a monstrous collection of Perl regexes that predates the Git era) with a Python reimplementation. That, too, is a horrifying collection of regexes, but in a much cleaner and more maintainable structure that integrates far better with the Sphinx build system. 
This change has been in linux-next for the full 6.15 cycle; the small number of problems that turned up have been addressed, seemingly to everybody's satisfaction. The Perl kernel-doc script remains in tree (as scripts/kernel-doc.pl) and can be used with a command-line option if need be. Unless some reason to keep it around materializes, it will probably go away in 6.17. Credit goes to Mauro Carvalho Chehab for doing all this work. - Some RTLA documentation updates - A handful of Chinese translations - The usual collection of typo fixes, general updates, etc" * tag 'docs-6.16' of git://git.lwn.net/linux: (85 commits) Docs: doc-guide: update sphinx.rst Sphinx version number docs: doc-guide: clarify latest theme usage Documentation/scheduler: Fix typo in sched-stats domain field description scripts: kernel-doc: prevent a KeyError when checking output docs: kerneldoc.py: simplify exception handling logic MAINTAINERS: update linux-doc entry to cover new Python scripts docs: align with scripts/syscall.tbl migration Documentation: NTB: Fix typo Documentation: ioctl-number: Update table intro docs: conf.py: drop backward support for old Sphinx versions Docs: driver-api/basics: add kobject_event interfaces Docs: relay: editing cleanups docs: fix "incase" typo in coresight/panic.rst Fix spelling error for 'parallel' docs: admin-guide: fix typos in reporting-issues.rst docs: dmaengine: add explanation for DMA_ASYNC_TX capability Documentation: leds: improve readibility of multicolor doc docs: fix typo in firmware-related section docs: Makefile: Inherit PYTHONPYCACHEPREFIX setting as env variable Documentation: ioctl-number: Update outdated submission info ...	2025-05-27 11:22:19 -07:00
Linus Torvalds	664a231d90	Merge tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Carve out the resctrl filesystem-related code into fs/resctrl/ so that multiple architectures can share the fs API for manipulating their respective hw resource control implementation. 
This is the second step in the work towards sharing the resctrl filesystem interface, the next one being plugging ARM's MPAM into the aforementioned fs API" * tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) MAINTAINERS: Add reviewers for fs/resctrl x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl x86/resctrl: Always initialise rid field in rdt_resources_all[] x86/resctrl: Relax some asm #includes x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() x86/resctrl: Squelch whitespace anomalies in resctrl core code x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs x86/resctrl: Move enum resctrl_event_id to resctrl.h x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl fs/resctrl: Add boiler plate for external resctrl code x86/resctrl: Add 'resctrl' to the title of the resctrl documentation x86/resctrl: Split trace.h x86/resctrl: Expand the width of domid by replacing mon_data_bits x86/resctrl: Add end-marker to the resctrl_event_id enum x86/resctrl: Move is_mba_sc() out of core.c x86/resctrl: Drop __init/__exit on assorted symbols x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point x86/resctrl: Check all domains are offline in resctrl_exit() x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" ...	2025-05-27 09:53:02 -07:00
Linus Torvalds	14f19dc644	Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt update from Eric Biggers: "Add support for 'hardware-wrapped inline encryption keys' to fscrypt. When enabled on supported platforms, this feature protects file contents keys from certain attacks, such as cold boot attacks. This feature uses the block layer support for wrapped keys which was merged in 6.15. Wrapped key support has existed out-of-tree in Android for a long time, and it's finally ready for upstream now that there is a platform on which it works end-to-end with upstream. Specifically, it works on the Qualcomm SM8650 HDK, using the Qualcomm ICE (Inline Crypto Engine) and HWKM (Hardware Key Manager). The corresponding driver support is included in the SCSI tree for 6.16. 
Validation for this feature includes two new tests that were already merged into xfstests (generic/368 and generic/369)" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: add support for hardware-wrapped keys	2025-05-26 13:27:40 -07:00
Linus Torvalds	79b98edf91	Merge tag 'erofs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, Intel QAT hardware accelerators are supported to improve DEFLATE decompression performance. I've tested it with the enwik9 dataset of 1 MiB pclusters on our Intel Sapphire Rapids bare-metal server and a PL0 ESSD, and the sequential read performance even surpasses LZ4 software decompression on this setup. In addition, a `fsoffset` mount option is introduced for file-backed mounts to specify the filesystem offset in order to adapt customized container formats. And other improvements and minor cleanups. 
Summary: - Add a `fsoffset` mount option to specify the filesystem offset - Support Intel QAT accelerators to boost up the DEFLATE algorithm - Initialize per-CPU workers and CPU hotplug hooks lazily to avoid unnecessary overhead when EROFS is not mounted - Fix file handle encoding for 64-bit NIDs - Minor cleanups" * tag 'erofs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: support DEFLATE decompression by using Intel QAT erofs: clean up erofs_{init,exit}_sysfs() erofs: add 'fsoffset' mount option to specify filesystem offset erofs: lazily initialize per-CPU workers and CPU hotplug hooks erofs: refine readahead tracepoint erofs: avoid using multiple devices with different type erofs: fix file handle encoding for 64-bit NIDs	2025-05-26 12:47:41 -07:00
Linus Torvalds	522544fc71	Merge tag 'bcachefs-2025-05-24' of git://evilpiepirate.org/bcachefs Pull bcachefs updates from Kent Overstreet: - Poisoned extents can now be moved: this lets us handle bitrotted data without deleting it. For now, reading from poisoned extents only returns -EIO: in the future we'll have an API for specifying "read this data even if there were bitflips". - Incompatible features may now be enabled at runtime, via "opts/version_upgrade" in sysfs. Toggle it to incompatible, and then toggle it back - option changes via the sysfs interface are persistent. - Various changes to support deployable disk images: - RO mounts now use less memory - Images may be stripped of alloc info, particularly useful for slimming them down if they will primarily be mounted RO. 
Alloc info will be automatically regenerated on first RW mount, and this is quite fast - Filesystem images generated with 'bcachefs image' will be automatically resized the first time they're mounted on a larger device The images 'bcachefs image' generates with compression enabled have been comparable in size to those generated by squashfs and erofs - but you get a full RW capable filesystem - Major error message improvements for btree node reads, data reads, and elsewhere. We now build up a single error message that lists all the errors encountered, actions taken to repair, and success/failure of the IO. This extends to other error paths that may kick off other actions, e.g. scheduling recovery passes: actions we took because of an error are included in that error message, with grouping/indentation so we can see what caused what. - New option, 'rebalance_on_ac_only'. Does exactly what the name suggests, quite handy with background compression. - Repair/self healing: - We can now kick off recovery passes and run them in the background if we detect errors. Currently, this is just used by code that walks backpointers. We now also check for missing backpointers at runtime and run check_extents_to_backpointers if required. The messy 6.14 upgrade left missing backpointers for some users, and this will correct that automatically instead of requiring a manual fsck - some users noticed this as copygc spinning and not making progress. In the future, as more recovery passes come online, we'll be able to repair and recover from nearly anything - except for unreadable btree nodes, and that's why you're using replication, of course - without shutting down the filesystem. - There's a new recovery pass, for checking the rebalance_work btree, which tracks extents that rebalance will process later. - Hardening: - Close the last known hole in btree iterator/btree locking assertions: path->should_be_locked paths must stay locked until the end of the transaction. 
This shook out a few bugs, including a performance issue that was causing unnecessary path_upgrade transaction restarts. - Performance: - Faster snapshot deletion: this is an incompatible feature, as it requires new sentinel values, for safety. Snapshot deletion no longer has to do a full metadata scan, it now just scans the inodes btree: if an extent/dirent/xattr is present for a given snapshot ID, we already require that an inode be present with that same snapshot ID. If/when users hit scalability limits again (ridiculously huge filesystems with lots of inodes, and many sparse snapshots), let me know - the next step will be to add an index from snapshot ID -> inode number, which won't be too hard. - Faster device removal: the "scan for pointers to this device" no longer does a full metadata scan, instead it walks backpointers. Like fast snapshot deletion this is another incompat feature: it also requires a new sentinel value, because we don't want to reuse these device IDs until after a fsck. - We're now coalescing redundant accounting updates prior to transaction commit, taking some pressure off the journal. Shortly we'll also be doing multiple extent updates in a transaction in the main write path, which combined with the previous should drastically cut down on the amount of metadata updates we have to journal. - Stack usage improvements: All allocator state has been moved off the stack - Debug improvements: - enumerated refcounts: The debug code previously used for filesystem write refs is now a small library, and used for other heavily used refcounts. Different users of a refcount are enumerated, making it much easier to debug refcount issues. - Async object debugging: There's a new kconfig option that makes various async objects (different types of bios, data updates, write ops, etc.) visible in debugfs, and it should be fast enough to leave on in production. 
- Various sets of assertions no longer require CONFIG_BCACHEFS_DEBUG, instead they're controlled by module parameters and static keys, meaning users won't need to compile custom kernels as often to help debug issues. - bch2_trans_kmalloc() calls can be tracked (there's a new kconfig option). With it on you can check the btree_transaction_stats in debugfs to see the bch2_trans_kmalloc() calls a transaction did when it used the most memory. * tag 'bcachefs-2025-05-24' of git://evilpiepirate.org/bcachefs: (218 commits) bcachefs: Don't mount bs > ps without TRANSPARENT_HUGEPAGE bcachefs: Fix btree_iter_next_node() for new locking asserts bcachefs: Ensure we don't use a blacklisted journal seq bcachefs: Small check_fix_ptr fixes bcachefs: Fix opts.recovery_pass_last bcachefs: Fix allocate -> self healing path bcachefs: Fix endianness in casefold check/repair bcachefs: Path must be locked if trans->locked && should_be_locked bcachefs: Simplify bch2_path_put() bcachefs: Plumb btree_trans for more locking asserts bcachefs: Clear trans->locked before unlock bcachefs: Clear should_be_locked before unlock in key_cache_drop() bcachefs: bch2_path_get() reuses paths if upgrade_fails & !should_be_locked bcachefs: Give out new path if upgrade fails bcachefs: Fix btree_path_get_locks when not doing trans restart bcachefs: btree_node_locked_type_nowrite() bcachefs: Kill bch2_path_put_nokeep() bcachefs: bch2_journal_write_checksum() bcachefs: Reduce stack usage in data_update_index_update() bcachefs: bch2_trans_log_str() ...	2025-05-26 12:43:30 -07:00
Linus Torvalds	a2e43397e5	vfs-6.16-rc1.iomap -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPUAAKCRCRxhvAZXjc opWHAP9xpS4Z/MvxYpRMQ7G6MSECDNZq0ru8k6HXuq4BIeMgNQEA7nI0JiyVjanY ZCkRuBpoWMxR5OsiNIpL0GbhTVFwvwk= =or0/ -----END PGP SIGNATURE----- Merge tag 'vfs-6.16-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull iomap updates from Christian Brauner: - More fallout and preparatory work associated with the folio batch prototype posted a while back. Mainly this just cleans up some of the helpers and pushes some pos/len trimming further down in the write begin path. - Add missing flag descriptions to the iomap documentation * tag 'vfs-6.16-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: rework iomap_write_begin() to return folio offset and length iomap: push non-large folio check into get folio path iomap: helper to trim pos/bytes to within folio iomap: drop pos param from __iomap_[get\|put]_folio() iomap: drop unnecessary pos param from iomap_write_[begin\|end] iomap: resample iter->pos after iomap_write_begin() calls iomap: trace: Add missing flags to [IOMAP_\|IOMAP_F_]FLAGS_STRINGS Documentation: iomap: Add missing flags description	2025-05-26 11:28:42 -07:00
Linus Torvalds	181d8e399f	vfs-6.16-rc1.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTwAKCRCRxhvAZXjc om0+AQDMxKLweJXplqQQ7jxuvW2dEa60YpE2EalEKWGg9YA3KgEA3nI4kyKMKn7Y PRFXgIcKvhs62oJLKsq8SGQUqExqvAE= =atEw -----END PGP SIGNATURE----- Merge tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Use folios for symlinks in the page cache. FUSE already uses folios for its symlinks. Mirror that conversion in the generic code and the NFS code. That lets us get rid of a few folio->page->folio conversions in this path, and some of the few remaining users of read_cache_page() / read_mapping_page() - Try and make a few filesystem operations killable on the VFS inode->i_mutex level - Add sysctl vfs_cache_pressure_denom for bulk file operations. Some workloads need to preserve more dentries than we currently allow through our sysctl interface. On HDFS servers with 12 HDDs per server, HDFS datanode startup involves scanning all files and caching their metadata (including dentries and inodes) in memory. Each HDD contains approximately 2 million files, resulting in a total of ~20 million cached dentries after initialization. To minimize dentry reclamation, they set vfs_cache_pressure to 1. Despite this configuration, memory pressure conditions can still trigger reclamation of up to 50% of cached dentries, reducing the cache from 20 million to approximately 10 million entries. During the subsequent cache rebuild period, any HDFS datanode restart operation incurs substantial latency penalties until full cache recovery completes. To maintain service stability, more dentries need to be preserved during memory reclamation. The current minimum reclaim ratio (1/100 of total dentries) remains too aggressive for such a workload. 
This patch introduces vfs_cache_pressure_denom for more granular cache pressure control. The configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000] effectively maintains the full 20 million dentry cache under memory pressure, preventing datanode restart performance degradation - Avoid some jumps in inode_permission() using likely()/unlikely() - Avoid a memory access which is most likely a cache miss when descending into devcgroup_inode_permission() - Add fastpath predicts for stat() and fdput() - Anonymous inodes currently don't come with a proper mode, causing issues in the kernel when we want to add useful VFS debug asserts. Fix that by giving them a proper mode and masking it off when we report it to userspace, which relies on them not having any mode - Anonymous inodes currently allow inode attributes to be changed because the VFS falls back to simple_setattr() if i_op->setattr isn't implemented. This means the ownership and mode for every single user of anon_inode_inode can be changed. Block that as it's either useless or actively harmful. 
If specific ownership is needed the respective subsystem should allocate anonymous inodes from their own private superblock - Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock - Add proper tests for anonymous inode behavior - Make it easy to detect proper anonymous inodes and to ensure that we can detect them in codepaths such as readahead() Cleanups: - Port pidfs to the new anon_inode_{g,s}etattr() helpers - Try to remove the uselib() system call - Add unlikely branch hint return path for poll - Add unlikely branch hint on return path for core_sys_select - Don't allow signals to interrupt getdents copying for fuse - Provide a size hint to dir_context for use during readdir() - Use writeback_iter directly in mpage_writepages - Update compression and mtime descriptions in initramfs documentation - Update main netfs API document - Remove useless plus one in super_cache_scan() - Remove unnecessary NULL-check guards during setns() - Add separate {get,put}_cgroup_ns no-op cases Fixes: - Fix typo in root= kernel parameter description - Use KERN_INFO for infof()\|info_plog()\|infofc() - Correct comments of fs_validate_description() - Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep() - Delete macro fsparam_u32hex() - Remove unused and problematic validate_constant_table() - Fix potential unsigned integer underflow in fs_name() - Make file-nr output the total allocated file handles" * tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits) fs: Pass a folio to page_put_link() nfs: Use a folio in nfs_get_link() fs: Convert __page_get_link() to use a folio fs/read_write: make default_llseek() killable fs/open: make do_truncate() killable fs/open: make chmod_common() and chown_common() killable include/linux/fs.h: add inode_lock_killable() readdir: supply dir_context.count as readdir buffer size hint vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations fuse: don't allow signals 
to interrupt getdents copying Documentation: fix typo in root= kernel parameter description include/cgroup: separate {get,put}_cgroup_ns no-op case kernel/nsproxy: remove unnecessary guards fs: use writeback_iter directly in mpage_writepages fs: remove useless plus one in super_cache_scan() fs: add S_ANON_INODE fs: remove uselib() system call device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission() fs/fs_parse: Remove unused and problematic validate_constant_table() fs: touch up predicts in inode_permission() ...	2025-05-26 09:02:39 -07:00
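The effect of the new denominator described in the vfs-6.16-rc1.misc entry above can be illustrated with a little integer arithmetic. This is a minimal sketch, not the kernel's code: it assumes the number of objects considered for reclaim is scaled as `freeable * pressure / denom`, with the default denominator of 100 taken from the "1/100" figure in the message. The function name `scaled_reclaim_target` is hypothetical.

```python
def scaled_reclaim_target(freeable: int, pressure: int, denom: int = 100) -> int:
    """Scale a count of freeable objects by pressure/denom using integer math.

    Assumption: this mirrors the ratio described in the commit message;
    the real kernel logic may apply further rounding or limits.
    """
    if denom <= 0:
        raise ValueError("denom must be positive")
    if freeable < 0 or pressure < 0:
        raise ValueError("freeable and pressure must be non-negative")
    # Split the division so the intermediate product stays small; the
    # result is identical to freeable * pressure // denom.
    quot, rem = divmod(freeable, denom)
    return quot * pressure + (rem * pressure) // denom


dentries = 20_000_000  # cached dentries in the HDFS example
print(scaled_reclaim_target(dentries, pressure=1))                # 200000
print(scaled_reclaim_target(dentries, pressure=1, denom=10_000))  # 2000
```

Under these assumptions, raising the denominator from 100 to 10000 shrinks the scaled count by a factor of 100, which is the "more granular" control the message refers to.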
Linus Torvalds	dc76285144	vfs-6.16-rc1.writepage -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTgAKCRCRxhvAZXjc ovkTAP9tyN24Oo+koY/2UedYBxM54cW4BCCRsVmkzfr8NSVdwwD/dg+v6gS8+nyD 3jlR0Z/08UyMHapB7fnAuFxPXXc8oAo= =e55o -----END PGP SIGNATURE----- Merge tag 'vfs-6.16-rc1.writepage' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull final writepage conversion from Christian Brauner: "This converts vboxfs from ->writepage() to ->writepages(). This was the last user of the ->writepage() method. So remove ->writepage() completely and all references to it" * tag 'vfs-6.16-rc1.writepage' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Remove aops->writepage mm: Remove swap_writepage() and shmem_writepage() ttm: Call shmem_writeout() from ttm_backup_backup_page() i915: Use writeback_iter() shmem: Add shmem_writeout() writeback: Remove writeback_use_writepage() migrate: Remove call to ->writepage vboxsf: Convert to writepages 9p: Add a migrate_folio method	2025-05-26 08:23:09 -07:00
Linus Torvalds	6d5b940e1e	vfs-6.16-rc1.async.dir -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBN6wAKCRCRxhvAZXjc ok32AQD9DTiSCAoVg+7s+gSBuLTi8drPTN++mCaxdTqRh5WpRAD9GVyrGQT0s6LH eo9bm8d1TAYjilEWM0c0K0TxyQ7KcAA= =IW7H -----END PGP SIGNATURE----- Merge tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs directory lookup updates from Christian Brauner: "This contains cleanups for the lookup_one() family of helpers. We expose a set of functions with names containing "lookup_one_len" and others without the "_len". This difference has nothing to do with "len". It's rather a historical accident that can be confusing. The functions without "_len" take a "mnt_idmap" pointer. This is found in the "vfsmount" and that is an important question when choosing which to use: do you have a vfsmount, or are you "inside" the filesystem. A related question is "is permission checking relevant here?". nfsd and cachefiles do have a vfsmount but don't use the non-_len functions. They pass nop_mnt_idmap and refuse to work on filesystems which have any other idmap. This work changes nfsd and cachefiles to use the lookup_one family of functions and to explicitly pass &nop_mnt_idmap which is consistent with all other vfs interfaces used where &nop_mnt_idmap is explicitly passed. The remaining uses of the "_one" functions do not require permission checks so these are renamed to be "_noperm" and the permission checking is removed. This series also changes these lookup functions to take a qstr instead of separate name and len. 
In many cases this simplifies the call" * tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: change lookup_one_common and lookup_noperm_common to take a qstr Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS VFS: rename lookup_one_len family to lookup_noperm and remove permission check cachefiles: Use lookup_one() rather than lookup_one_len() nfsd: Use lookup_one() rather than lookup_one_len() VFS: improve interface for lookup_one functions	2025-05-26 08:02:43 -07:00
Sheng Yong	c36ec00d7f	erofs: add 'fsoffset' mount option to specify filesystem offset When attempting to use an archive file, such as APEX on android, as a file-backed mount source, it fails because EROFS image within the archive file does not start at offset 0. As a result, a loop or a dm device is still needed to attach the image file at an appropriate offset first. Similarly, if an EROFS image within a block device does not start at offset 0, it cannot be mounted directly either. To address this issue, this patch adds a new mount option `fsoffset=x' to accept a start offset for the primary device. The offset should be aligned to the block size. EROFS will add this offset before performing read requests. Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Signed-off-by: Wang Shuai <wangshuai12@xiaomi.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250517090544.2687651-1-shengyong1@xiaomi.com [ Gao Xiang: minor update on documentation and the error message. ] Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-05-22 11:57:57 +08:00
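The offset handling described in the erofs `fsoffset` entry above can be sketched as follows. This is an illustrative model only, assuming a 4096-byte block size by default; the function name `device_offset` and its parameters are hypothetical and do not correspond to any EROFS symbol.

```python
def device_offset(fs_pos: int, fsoffset: int, block_size: int = 4096) -> int:
    """Translate a filesystem-relative byte position to a device position.

    Models the two rules stated in the commit message: the start offset
    must be aligned to the block size, and it is added to every read.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    if fsoffset < 0 or fsoffset % block_size:
        raise ValueError("fsoffset must be a non-negative multiple of block_size")
    if fs_pos < 0:
        raise ValueError("fs_pos must be non-negative")
    return fsoffset + fs_pos


# Hypothetical archive whose embedded image starts 8 MiB into the file.
start = 8 * 1024 * 1024
print(device_offset(4096, start))  # 8392704
```

A misaligned start offset such as 100 is rejected with `ValueError`, matching the alignment requirement stated in the message.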
Kent Overstreet	c53e5c0c19	docs: bcachefs: add casefolding reference Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:14:39 -04:00
Kent Overstreet	9e260e4590	docs: bcachefs: idle work scheduling design doc People have been asking to see the plan for this, so - bcachefs has various background tasks that need to be scheduled to balance efficiency, predictability of performance, etc. The design and philosophy haven't changed too much since bcache, which was primarily designed for server usage, with sustained load in mind. These days we're seeing more desktop usage - where we really want to let the system idle effectively, to reduce total power usage - while also still balancing previous concerns, we still want to let work accumulate to a degree. This lays out all the requirements and starts to sketch out the algorithm I have in mind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-05-21 20:14:23 -04:00
Max Kellermann	c1a606cd75	fs/netfs: remove unused flag NETFS_SREQ_SEEK_DATA_READ This flag was added by commit `3d3c950467` ("netfs: Provide readahead and readpage netfs helpers") but its only user was removed by commit `86b374d061` ("netfs: Remove fs/netfs/io.c"). Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-3-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:37 +02:00
Ritesh Harjani (IBM)	0bf1f51e34	ext4: Add atomic block write documentation Add initial documentation for atomic write support in ext4. Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/d3893b9f5ad70317abae72046e81e4c180af91bf.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-05-20 10:31:12 -04:00
Randy Dunlap	14e991154d	Docs: relay: editing cleanups Cleanup some punctuation, capital letter, and a missing word in relay.rst. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20250512023233.107582-1-rdunlap@infradead.org>	2025-05-19 08:02:14 -06:00
James Morse	7168ae330e	x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl Resctrl is a filesystem interface to hardware that provides cache allocation policy and bandwidth control for groups of tasks or CPUs. To support more than one architecture, resctrl needs to live in /fs/. Move the code that is concerned with the filesystem interface to /fs/resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-25-james.morse@arm.com	2025-05-16 14:36:09 +02:00
Chao Yu	13be879576	f2fs: fix 32-bits hexademical number in fault injection doc FAULT_KMALLOC 0x000000001 There is one redundant '0' in the 32-bit hexadecimal number of this fault type; remove it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-05-13 15:37:00 +00:00
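The digit count behind the f2fs documentation fix above is easy to check: a 32-bit value needs exactly eight hexadecimal digits, so `0x000000001` carries one digit too many. A minimal sketch, assuming fault types are single-bit masks and that FAULT_KMALLOC is bit 0 (consistent with the value 1 quoted in the message); the helper name `fault_mask_hex` is hypothetical.

```python
def fault_mask_hex(bit: int) -> str:
    """Format the single-bit mask for `bit` as a 32-bit hex literal."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # 32 bits / 4 bits per hex digit = 8 digits after the "0x" prefix.
    return f"0x{1 << bit:08x}"


print(fault_mask_hex(0))  # 0x00000001
```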
Chen Linxuan	18ee43c398	docs: filesystems: add fuse-passthrough.rst Add documentation about FUSE passthrough. It's mainly about why FUSE passthrough needs CAP_SYS_ADMIN. Link: https://lore.kernel.org/all/4b64a41c-6167-4c02-8bae-3021270ca519@fastmail.fm/T/#mc73e04df56b8830b1d7b06b5d9f22e594fba423e Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxhAY1m7ubJ3p-A3rSufw_53WuDRMT1Zqe_OC0bP_Fb3Zw@mail.gmail.com/ Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-05-12 10:02:08 +02:00
Dr. David Alan Gilbert	2a1c615813	relay: remove unused relay_late_setup_files The last use of relay_late_setup_files() was removed in 2018 by commit `2b47733045` ("drm/i915/guc: Merge log relay file and channel creation") Remove it and the helper it used. relay_late_setup_files() was used for eventually registering 'buffer only' channels. With it gone, delete the docs that explain how to do that. Which suggests it should be possible to lose the 'has_base_filename' flags. (Are there any other uses??) Link: https://lkml.kernel.org/r/20250418234932.490863-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:54:09 -07:00
Chao Yu	0244c77fed	f2fs: support FAULT_TIMEOUT Support injecting a timeout fault into a function. Currently it only supports injecting a timeout into the commit_atomic_write flow to reproduce inconsistency bugs, like the bug fixed by commit `f098aeba04` ("f2fs: fix to avoid atomicity corruption of atomic file"). By default, the new fault type injects a 1000ms timeout, and the timeout can be interrupted by SIGKILL. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-05-06 15:46:55 +00:00
Al Viro	006ff7498f	saner calling conventions for ->d_automount() Currently the calling conventions for ->d_automount() instances have an odd wart - returned new mount to be attached is expected to have refcount 2. That kludge is intended to make sure that mark_mounts_for_expiry() called before we get around to attaching that new mount to the tree won't decide to take it out. finish_automount() drops the extra reference after it's done with attaching mount to the tree - or drops the reference twice in case of error. ->d_automount() instances have rather counterintuitive boilerplate in them. There's a much simpler approach: have mark_mounts_for_expiry() skip the mounts that are yet to be mounted. And to hell with grabbing/dropping those extra references. Makes for simpler correctness analysis, at that... Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Acked-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-05-05 13:42:49 -04:00
Timur Tabi	e99efa8ac8	docs: debugfs: do not recommend debugfs_remove_recursive Update the debugfs documentation to indicate that debugfs_remove() should be used to clean up debugfs entries. In commit `a3d1e7eb5a` ("simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems"), function debugfs_remove_recursive() was made into an alias for debugfs_remove(): #define debugfs_remove_recursive debugfs_remove Therefore, drivers should just use debugfs_remove() going forward. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Link: https://lore.kernel.org/r/20250429173958.3973958-1-ttabi@nvidia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-04-30 19:11:04 +02:00
Miklos Szeredi	5ef7bcdeec	ovl: relax redirect/metacopy requirements for lower -> data redirect Allow the special case of a redirect from a lower layer to a data layer without having to turn on metacopy. This makes the feature work with userxattr, which in turn allows data layers to be usable in user namespaces. Minimize the risk by only enabling redirect from a single lower layer to a data layer iff a data layer is specified. The only way to access a data layer is to enable this, so there's really no reason not to enable this. This can be used safely if the lower layer is read-only and the user.overlay.redirect xattr cannot be modified. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-04-30 10:55:27 +02:00
Zijun Hu	d1f482108a	fs/fs_parse: Remove unused and problematic validate_constant_table() Remove validate_constant_table() since: - It has no caller. - It has the 3 bugs below for a good constant table array[], which must end with an empty entry; take the invocation below as an example: validate_constant_table(array, ARRAY_SIZE(array), ...) - Always returns the wrong value due to the last empty entry. - Imprecise error message for the mis-sorted case. - Potential NULL pointer dereference since the last pr_err() may use the NULL pointer @tbl[i].name to print the last empty entry's name. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/20250415-fix_fs-v4-1-5d575124a3ff@quicinc.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:59 +02:00
Zijun Hu	296b67059e	fs/fs_parse: Delete macro fsparam_u32hex() Delete macro fsparam_u32hex() since: - it has no caller. - it uses @fs_param_is_u32_hex as its type, which is never defined, so it would cause a compile error if any caller used it. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/20250411-fix_fs-v2-1-5d3395c102e4@quicinc.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:58 +02:00
Ritesh Harjani (IBM)	336bac5e08	Documentation: iomap: Add missing flags description Let's document the use of these flags in iomap design doc where other flags are defined too - - IOMAP_F_BOUNDARY was added by XFS to prevent merging of I/O and I/O completions across RTG boundaries. - IOMAP_F_ATOMIC_BIO was added for supporting atomic I/O operations for filesystems to inform the iomap that it needs HW-offload based mechanism for torn-write protection. While we are at it, let's also fix the description of IOMAP_F_PRIVATE flag after a recent: commit `923936efeb` ("iomap: Fix conflicting values of iomap flags") Signed-off-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Link: https://lore.kernel.org/8d8534a704c4f162f347a84830710db32a927b2e.1744432270.git.ritesh.list@gmail.com Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-15 10:30:48 +02:00
Linus Torvalds	5aaaedb0cb	A few more miscellaneous ext4 bug fixes and cleanups including some syzbot failures and fixing a case where a stale file handle referencing an inode previously used as a regular file, but which has been deleted and reused as an ea_inode, would result in ext4 erroneously considering this a case of fs corruption. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmf7r3YACgkQ8vlZVpUN gaPl9QgApwE5BAQdO6miW0sDMPj5b4sMc25aG4OPlfKhFqiIJB0Ub4zC2n0OFnaf HXk8P5oVeepH9ciTnYFF30X20Ythzjwmd9j5eyq2wsfYASQUjfcvmR9WovbqZtGQ 3Zerd9QFp7SvZa+K4sADBhEb/7HAnxDGfiqSQptY6WQTwD+it1bnuhmzG0m6AH4m R1ItREDx7D2QrudDToFBd8XQ+FgRETZ8Qrs7PqIznw/dBNMdHRnAiw2eiyuoPU/S T8cmCxii3Z9sJ6LtohKYuWOmOmdxg951V5ZcekVRuaFSljSUsRsIplO7OlaMvQDs 9vGVKiiZLdU2B0Wd90IeQUdJmP4xPg== =I8qx -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A few more miscellaneous ext4 bug fixes and cleanups including some syzbot failures and fixing a case where a stale file handle referencing an inode previously used as a regular file, but which has been deleted and reused as an ea_inode, would result in ext4 erroneously considering this a case of fs corruption" * tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix off-by-one error in do_split ext4: make block validity check resistent to sb bh corruption ext4: avoid -Wflex-array-member-not-at-end warning Documentation: ext4: Add fields to ext4_super_block documentation ext4: don't treat fhandle lookup of ea_inode as FS corruption	2025-04-13 07:15:50 -07:00
Tom Vierjahn	ce7e8a65aa	Documentation: ext4: Add fields to ext4_super_block documentation Documentation and implementation of the ext4 super block have slightly diverged: Padding has been removed in order to make room for new fields that are still missing in the documentation. Add the new fields s_encryption_level, s_first_error_errorcode, s_last_error_errorcode to the documentation of the ext4 super block. Fixes: `f542fbe8d5` ("ext4 crypto: reserve codepoints used by the ext4 encryption feature") Fixes: `878520ac45` ("ext4: save the error code which triggered an ext4_error() in the superblock") Signed-off-by: Tom Vierjahn <tom.vierjahn@acm.org> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20250324221004.5268-1-tom.vierjahn@acm.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-04-12 21:50:41 -04:00
David Howells	f1745496d3	netfs: Update main API document Bring the netfs documentation up to date. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/1690127.1744208325@warthog.procyon.org.uk Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: Viacheslav Dubeyko <slava@dubeyko.com> cc: Alex Markuze <amarkuze@redhat.com> cc: Timothy Day <timday@amazon.com> cc: Jonathan Corbet <corbet@lwn.net> cc: netfs@lists.linux.dev cc: linux-doc@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-11 15:23:50 +02:00
Eric Biggers	c07d3aede2	fscrypt: add support for hardware-wrapped keys Add support for hardware-wrapped keys to fscrypt. Such keys are protected from certain attacks, such as cold boot attacks. For more information, see the "Hardware-wrapped keys" section of Documentation/block/inline-encryption.rst. To support hardware-wrapped keys in fscrypt, we allow the fscrypt master keys to be hardware-wrapped. File contents encryption is done by passing the wrapped key to the inline encryption hardware via blk-crypto. Other fscrypt operations such as filenames encryption continue to be done by the kernel, using the "software secret" which the hardware derives. For more information, see the documentation which this patch adds to Documentation/filesystems/fscrypt.rst. Note that this feature doesn't require any filesystem-specific changes. However it does depend on inline encryption support, and thus currently it is only applicable to ext4 and f2fs. The version of this feature introduced by this patch is mostly equivalent to the version that has existed downstream in the Android Common Kernels since 2020. However, a couple fixes are included. First, the flags field in struct fscrypt_add_key_arg is now placed in the proper location. Second, key identifiers for HW-wrapped keys are now derived using a distinct HKDF context byte; this fixes a bug where a raw key could have the same identifier as a HW-wrapped key. Note that as a result of these fixes, the version of this feature introduced by this patch is not UAPI or on-disk format compatible with the version in the Android Common Kernels, though the divergence is limited to just those specific fixes. This version should be used going forwards. This patch has been heavily rewritten from the original version by Gaurav Kashyap <quic_gaurkash@quicinc.com> and Barani Muthukumaran <bmuthuku@codeaurora.org>. 
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650 Link: https://lore.kernel.org/r/20250404225859.172344-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2025-04-08 19:32:11 -07:00
NeilBrown	06c567403a	Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS try_lookup_noperm() and d_hash_and_lookup() are nearly identical. The former does some validation of the name where the latter doesn't. Outside of the VFS that validation is likely valuable, and having only one exported function for this task is certainly a good idea. So make d_hash_and_lookup() local to VFS files and change all other callers to try_lookup_noperm(). Note that the arguments are swapped. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250319031545.2999807-6-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 11:24:41 +02:00
NeilBrown	fa6fe07d15	VFS: rename lookup_one_len family to lookup_noperm and remove permission check The lookup_one_len family of functions is (now) only used internally by a filesystem on itself either - in a context where permission checking is irrelevant such as by a virtual filesystem populating itself, or xfs accessing its ORPHANAGE or dquota accessing the quota file; or - in a context where a permission check (MAY_EXEC on the parent) has just been performed such as a network filesystem finding a "silly-rename" file in the same directory. This is also the context after the _parentat() functions where currently lookup_one_qstr_excl() is used. So the permission check is pointless. The name "one_len" is unhelpful in understanding the purpose of these functions and should be changed. Most of the callers pass the len as "strlen()" so using a qstr and QSTR() can simplify the code. This patch renames these functions (including lookup_positive_unlocked() which is part of the family despite the name) to have a name based on "lookup_noperm". They are changed to receive a 'struct qstr' instead of separate name and len. In a few cases the use of QSTR() results in a new call to strlen(). try_lookup_noperm() takes a pointer to a qstr instead of the whole qstr. This is consistent with d_hash_and_lookup() (which is nearly identical) and useful for lookup_noperm_unlocked(). The new lookup_noperm_common() doesn't take a qstr yet. That will be tidied up in a subsequent patch. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 11:24:36 +02:00
Matthew Wilcox (Oracle)	6b0dfabb35	fs: Remove aops->writepage All callers and implementations are now removed, so remove the operation and update the documentation to match. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250402150005.2309458-10-willy@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 09:36:50 +02:00
NeilBrown	5741909697	VFS: improve interface for lookup_one functions The family of functions: lookup_one() lookup_one_unlocked() lookup_one_positive_unlocked() appear designed to be used by external clients of the filesystem rather than by filesystems acting on themselves as the lookup_one_len family are used. They are used by: btrfs/ioctl - which is a user-space interface rather than an internal activity exportfs - i.e. from nfsd or the open_by_handle_at interface overlayfs - to access the underlying filesystems smb/server - for file service They should be used by nfsd (more than just the exportfs path) and cachefs but aren't. It would help if the documentation didn't claim they should "not be called by generic code". Also the path component name is passed as "name" and "len" which are (confusingly?) separated by the "base". In some cases the len is simply "strlen" and so passing a qstr using QSTR() would make the calling clearer. Other callers do pass separate name and len which are stored in a struct. Sometimes these are already stored in a qstr, other times it easily could be. So this patch changes these three functions to receive a 'struct qstr *', and improves the documentation. QSTR_LEN() is added to make it easy to pass a QSTR containing a known len. [brauner@kernel.org: take a struct qstr pointer] Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-2-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 09:25:32 +02:00
Linus Torvalds	bdafff62ae	9p update for 6.15-rc1 - fix handling of bogus (negative/too long) replies - fix crash on mkdir with ACLs (... looks like nobody is using ACLs with semi-recent kernels...) - ipv6 support for trans=tcp - minor concurrency fix to make syzbot happy - minor cleanup Merge tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: - fix handling of bogus (negative/too long) replies - fix crash on mkdir with ACLs (... looks like nobody is using ACLs with semi-recent kernels...) - ipv6 support for trans=tcp - minor concurrency fix to make syzbot happy - minor cleanup * tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux: docs: fs/9p: Add missing "not" in cache documentation 9p: Use hashtable.h for hash_errmap Documentation/fs/9p: fix broken link 9p/trans_fd: mark concurrent read and writes to p9_conn->err 9p/net: return error on bogus (longer than requested) replies 9p/net: fix improper handling of bogus negative read/write replies fs/9p: fix NULL pointer dereference on mkdir net/9p/fd: support ipv6 for trans=tcp	2025-04-03 15:35:46 -07:00
Tingmao Wang	4210030d8b	docs: fs/9p: Add missing "not" in cache documentation A quick fix for what I assume is a typo. Signed-off-by: Tingmao Wang <m@maowtm.org> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Message-ID: <20250330213443.98434-1-m@maowtm.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-04-03 12:31:11 +09:00
Linus Torvalds	d6b02199cd	- The 7 patch series "powerpc/crash: use generic crashkernel reservation" from Sourabh Jain changes powerpc's kexec code to use more of the generic layers. - The 2 patch series "get_maintainer: report subsystem status separately" from Vlastimil Babka makes some long-requested improvements to the get_maintainer output. - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from Sebastian Siewior cleans up and optimizes the refcounting in the ucount code. - The 12 patch series "reboot: support runtime configuration of emergency hw_protection action" from Ahmad Fatoum improves the ability for a driver to perform an emergency system shutdown or reboot. - The 16 patch series "Converge on using secs_to_jiffies() part two" from Easwar Hariharan performs further migrations from msecs_to_jiffies() to secs_to_jiffies(). - The 7 patch series "lib/interval_tree: add some test cases and cleanup" from Wei Yang permits more userspace testing of kernel library code, adds some more tests and performs some cleanups. - The 2 patch series "hung_task: Dump the blocking task stacktrace" from Masami Hiramatsu arranges for the hung_task detector to dump the stack of the blocking task and not just that of the blocked task. - The 4 patch series "resource: Split and use DEFINE_RES() macros" from Andy Shevchenko provides some cleanups to the resource definition macros. - Plus the usual shower of singleton patches - please see the individual changelogs for details.
Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "powerpc/crash: use generic crashkernel reservation" from Sourabh Jain changes powerpc's kexec code to use more of the generic layers. - The series "get_maintainer: report subsystem status separately" from Vlastimil Babka makes some long-requested improvements to the get_maintainer output. - The series "ucount: Simplify refcounting with rcuref_t" from Sebastian Siewior cleans up and optimizes the refcounting in the ucount code. - The series "reboot: support runtime configuration of emergency hw_protection action" from Ahmad Fatoum improves the ability for a driver to perform an emergency system shutdown or reboot. - The series "Converge on using secs_to_jiffies() part two" from Easwar Hariharan performs further migrations from msecs_to_jiffies() to secs_to_jiffies(). - The series "lib/interval_tree: add some test cases and cleanup" from Wei Yang permits more userspace testing of kernel library code, adds some more tests and performs some cleanups. - The series "hung_task: Dump the blocking task stacktrace" from Masami Hiramatsu arranges for the hung_task detector to dump the stack of the blocking task and not just that of the blocked task. - The series "resource: Split and use DEFINE_RES() macros" from Andy Shevchenko provides some cleanups to the resource definition macros. - Plus the usual shower of singleton patches - please see the individual changelogs for details.
* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) mailmap: consolidate email addresses of Alexander Sverdlin fs/procfs: fix the comment above proc_pid_wchan() relay: use kasprintf() instead of fixed buffer formatting resource: replace open coded variant of DEFINE_RES() resource: replace open coded variants of DEFINE_RES_*_NAMED() resource: replace open coded variant of DEFINE_RES_NAMED_DESC() resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED() samples: add hung_task detector mutex blocking sample hung_task: show the blocker task if the task is hung on mutex kexec_core: accept unaccepted kexec segments' destination addresses watchdog/perf: optimize bytes copied and remove manual NUL-termination lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap() lib/interval_tree: skip the check before go to the right subtree lib/interval_tree: add test case for span iteration lib/interval_tree: add test case for interval_tree_iter_xxx() helpers lib/rbtree: add random seed lib/rbtree: split tests lib/rbtree: enable userland test suite for rbtree related data structure checkpatch: describe --min-conf-desc-length scripts/gdb/symbols: determine KASLR offset on s390 ...	2025-04-01 10:06:52 -07:00
Linus Torvalds	eb0ece1602	- The 6 patch series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The 4 patch series "Some cleanup for memcg" from Chen Ridong implements some relatively minor cleanups for the memcontrol code. - The 17 patch series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found when using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes things better - our own HMM selftests now succeed. - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed removes the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The 5 patch series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The 4 patch series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The 12 patch series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The 2 patch series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios.
- The 3 patch series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The 4 patch series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The 4 patch series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The 4 patch series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The 18 patch series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The 27 patch series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DAMON/DAMOS to filter by the page's mapped/unmapped status. - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly.
- The 12 patch series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while running our selftests. - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatory cleanups to the GENERIC_PTDUMP Kconfig logic. - The 8 patch series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The 2 patch series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The 3 patch series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The 3 patch series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallelization of huge page initialization.
- The 3 patch series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The 9 patch series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documentation is updated accordingly. - The 5 patch series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permitting the removal of a number of special-case checks. - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatory refactoring and cleanup of the mremap() code. - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The 13 patch series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into generic code, where that is practical. - The 13 patch series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data.
- The 3 patch series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The 8 patch series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The 2 patch series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The 3 patch series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The 4 patch series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The 3 patch series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The 5 patch series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The 5 patch series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The 4 patch series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The 2 patch series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. 
- The 2 patch series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively minor cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found when using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes things better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed removes the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark.
- The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. 
- The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DAMON/DAMOS to filter by the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while running our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatory cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow.
- The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallelization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documentation is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permitting the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatory refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters.
- The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs.
- The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ...	2025-04-01 09:29:18 -07:00
Linus Torvalds	b6dde1e527	NFSD 6.15 Release Notes Neil Brown contributed more scalability improvements to NFSD's open file cache, and Jeff Layton contributed a menagerie of repairs to NFSD's NFSv4 callback / backchannel implementation. Mike Snitzer contributed a change to NFS re-export support that disables support for file locking on a re-exported NFSv4 mount. This is because NFSv4 state recovery is currently difficult if not impossible for re-exported NFS mounts. The change aims to prevent data integrity exposures after the re-export server crashes. Work continues on the evolving NFSD netlink administrative API. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.15 development cycle. Merge tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "Neil Brown contributed more scalability improvements to NFSD's open file cache, and Jeff Layton contributed a menagerie of repairs to NFSD's NFSv4 callback / backchannel implementation.
Mike Snitzer contributed a change to NFS re-export support that disables support for file locking on a re-exported NFSv4 mount. This is because NFSv4 state recovery is currently difficult if not impossible for re-exported NFS mounts. The change aims to prevent data integrity exposures after the re-export server crashes. Work continues on the evolving NFSD netlink administrative API. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.15 development cycle" * tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits) NFSD: Add a Kconfig setting to enable delegated timestamps sysctl: Fixes nsm_local_state bounds nfsd: use a long for the count in nfsd4_state_shrinker_count() nfsd: remove obsolete comment from nfs4_alloc_stid nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault() nfsd: reorganize struct nfs4_delegation for better packing nfsd: handle errors from rpc_call_async() nfsd: move cb_need_restart flag into cb_flags nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY nfsd: prevent callback tasks running concurrently nfsd: disallow file locking and delegations for NFSv4 reexport nfsd: filecache: drop the list_lru lock during lock gc scans nfsd: filecache: don't repeatedly add/remove files on the lru list nfsd: filecache: introduce NFSD_FILE_RECENT nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc() nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync() NFSD: Re-organize nfsd_file_gc_worker() nfsd: filecache: remove race handling. fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning ...	2025-03-31 17:28:17 -07:00
Linus Torvalds	5c2a430e85	Ext4 bug fixes and cleanups, including: * hardening against maliciously fuzzed file systems * backwards compatibility for the brief period when we attempted to ignore zero-width characters * avoid potentially BUG'ing if there is a file system corruption found during the file system unmount * fix free space reporting by statfs when project quotas are enabled and the free space is less than the remaining project quota Also improve performance when replaying a journal with a very large number of revoke records (applicable for Lustre volumes). Merge tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Ext4 bug fixes and cleanups, including: - hardening against maliciously fuzzed file systems - backwards compatibility for the brief period when we attempted to ignore zero-width characters - avoid potentially BUG'ing if there is a file system corruption found during the file system unmount - fix free space reporting by statfs when project quotas are enabled and the free space is less than the remaining project quota Also improve performance when replaying a journal with a very large number of revoke records (applicable for Lustre volumes)" * tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (71 commits) ext4: fix OOB read when checking dotdot dir ext4: on a remount, only log the ro or r/w state when it has changed ext4: correct the error handle in
ext4_fallocate() ext4: Make sb update interval tunable ext4: avoid journaling sb update on error if journal is destroying ext4: define ext4_journal_destroy wrapper ext4: hash: simplify kzalloc(n * 1, ...) to kzalloc(n, ...) jbd2: add a missing data flush during file and fs synchronization ext4: don't over-report free space or inodes in statvfs ext4: clear DISCARD flag if device does not support discard jbd2: remove jbd2_journal_unfile_buffer() ext4: reorder capability check last ext4: update the comment about mb_optimize_scan jbd2: fix off-by-one while erasing journal ext4: remove references to bh->b_page ext4: goto right label 'out_mmap_sem' in ext4_setattr() ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all() ext4: introduce ITAIL helper jbd2: remove redundant function jbd2_journal_has_csum_v2or3_feature ext4: remove redundant function ext4_has_metadata_csum ...	2025-03-27 13:27:08 -07:00
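The statfs item in the ext4 pull above covers the case where project quotas are enabled and the file system has less free space than the project's remaining quota. The following is a minimal Python sketch of that arithmetic, assuming the value to report is simply the smaller of the two figures. The function name, its parameters, and the "0 means no limit" convention are hypothetical illustrations, not the ext4 implementation.

```python
def reported_free_blocks(fs_free_blocks: int,
                         quota_limit_blocks: int,
                         quota_used_blocks: int) -> int:
    """Hypothetical model of the free space a statfs-style call reports.

    Assumption: a quota limit of 0 means "no limit is set".
    """
    if quota_limit_blocks == 0:
        # No project quota limit: only the file system's free space matters.
        return fs_free_blocks
    # Space the project may still consume under its quota (never negative).
    quota_remaining = max(quota_limit_blocks - quota_used_blocks, 0)
    # Never report more than the file system actually has free.
    return min(fs_free_blocks, quota_remaining)


if __name__ == "__main__":
    # File system nearly full: 100 blocks free, but 800 blocks of quota remain.
    print(reported_free_blocks(100, 1000, 200))
    # Plenty of free space: the quota is the tighter bound.
    print(reported_free_blocks(5000, 1000, 200))
```

Taking the minimum means a caller never sees more free space than either the quota or the underlying file system can provide.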
Linus Torvalds	4a4b30ea80	bcachefs updates for 6.15 On disk format is now soft frozen: no more required/automatic are anticipated before taking off the experimental label. Major changes/features since 6.14: - Scrub - Blocksize greater than page size support - A number of "rebalance spinning and doing no work" issues have been fixed; we now check if the write allocation will succeed in bch2_data_update_init(), before kicking off the read. There's still more work to do in this area. Later we may want to add another bitset btree, like rebalance_work, to track "extents that rebalance was requested to move but couldn't", e.g. due to destination target having insufficient online devices. - We can now support scaling well into the petabyte range: latest bcachefs-tools will pick an appropriate bucket size at format time to ensure fsck can run in available memory (e.g. a server with 256GB of ram and 100PB of storage would want 16MB buckets). On disk format changes: - 1.21: cached backpointers (scalability improvement) Cached replicas now get backpointers, which means we no longer rely on incrementing bucket generation numbers to invalidate cached data: this lets us get rid of the bucket generation number garbage collection, which had to periodically rescan all extents to recompute bucket oldest_gen. Bucket generation numbers are now only used as a consistency check, but they're quite useful for that. - 1.22: stripe backpointers Stripes now have backpointers: erasure coded stripes have their own checksums, separate from the checksums for the extents they contain (and stripe checksums also cover the parity blocks). This is required for implementing scrub for stripes. - 1.23: stripe lru (scalability improvement) Persistent lru for stripes, ordered by "number of empty blocks". This is used by the stripe creation path, which depending on free space may create a new stripe out of a partially empty existing stripe instead of starting a brand new stripe. 
This replaces an in-memory heap, and means we no longer have to read in the stripes btree at startup. - 1.24: casefolding Case insensitive directory support, courtesy of Valve. This is an incompatible feature, to enable mount with -o version_upgrade=incompatible - 1.25: extent_flags Another incompatible feature requiring explicit opt-in to enable. This adds a flags entry to extents, and a flag bit that marks extents as poisoned. A poisoned extent is an extent that was unreadable due to checksum errors. We can't move such extents without giving them a new checksum, and we may have to move them (for e.g. copygc or device evacuate). We also don't want to delete them: in the future we'll have an API that lets userspace ignore checksum errors and attempt to deal with simple bitrot itself. Marking them as poisoned lets us continue to return the correct error to userspace on normal read calls. Other changes/features: - BCH_IOCTL_QUERY_COUNTERS: this is used by the new 'bcachefs fs top' command, which shows a live view of all internal filesystem counters. - Improved journal pipelining: we can now have 16 journal writes in flight concurrently, up from 4. We're logging significantly more to the journal than we used to with all the recent disk accounting changes and additions, so some users should see a performance increase on some workloads. - BCH_MEMBER_STATE_failed: previously, we would do no IO at all to devices marked as failed. Now we will attempt to read from them, but only if we have no better options. - New option, write_error_timeout: devices will be kicked out of the filesystem if all writes have been failing for x number of seconds. We now also kick devices out when notified by blk_holder_ops that they've gone offline. - Device option handling improvements: the discard option should now be working as expected (additionally, in -tools, all device options that can be set at format time can now be set at device add time, i.e. data_allowed, state). 
- We now try harder to read data after a checksum error: we'll do additional retries if necessary to a device after it gave us data with a checksum error. - More self healing work: the full inode <-> dirent consistency checks that are currently run by fsck are now also run every time we do a lookup, meaning we'll be able to correct errors at runtime. Runtime self healing will be flipped on after the new changes have seen more testing; currently they're just checking for consistency. - KMSAN fixes: our KMSAN builds should be nearly clean now, which will put a massive dent in the syzbot dashboard. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmfhbnsACgkQE6szbY3K bnY6ew/9FXh3m71BvVpuqTYcUGzIC7gVrnkFy6n4W96v07OjSOoTNHOVVovajxc3 P9LvA77BHC4Xro3H7ORpsIurOZUc6yx18ZizzulVbQFuYa7LY/kNri4ZBtGHcRiV pIdQDLSNmwFjPA4x2S1qTFSF1c586lad+UNQiLam5ophBwQPEO6vG51ZEHa4wld9 +OhWTDYfrvij4D3Lt1ppvhuDP+PQBjhu/QFc0bGjHvKOjfV6sw9XU91sCYKOJIzd qzpsiQd5sepnX717Br3f5SLdxMq2lJYvRp9756vltOCaMBvJYJtHqtXCglHQEkFw yjhmPjk4r3VlKTF8K+wEJfAHwbC2kEn7csJNbt0+Nko5PPtFyrb8ok6QUbHCKscL L0VMnzaXHVqvG2VgYa31temfdz7HM/zHjQ8Al3eQPaqTHIoTXIBQxOQSea/apVMt TIlastvLoHfR8W7+LrwOmTjnBJGCJ+MrdcJzJDVk2tQmmcMA0boeZvl4aSklFuyB zNN5fxp0VMsxNyIHLJjQ3UcwVqHXC5w+f5H1ByQLUyQh+m/xaAaz7S+BTVdVbFPa 1Z1xDuvuHOTnjIOamnOD1l36afJnhq5RciPCXCNtQSB819mc+AfNGQNQTVNOTReC iTiUCcNxu0/DIPlPmeJzAlukVJUgz+/knOI/6zPs3eI7/o88ZGg= =k3cV -----END PGP SIGNATURE----- Merge tag 'bcachefs-2025-03-24' of git://evilpiepirate.org/bcachefs Pull bcachefs updates from Kent Overstreet: "On disk format is now soft frozen: no more required/automatic are anticipated before taking off the experimental label. Major changes/features since 6.14: - Scrub - Blocksize greater than page size support - A number of "rebalance spinning and doing no work" issues have been fixed; we now check if the write allocation will succeed in bch2_data_update_init(), before kicking off the read. There's still more work to do in this area. 
Later we may want to add another bitset btree, like rebalance_work, to track "extents that rebalance was requested to move but couldn't", e.g. due to destination target having insufficient online devices. - We can now support scaling well into the petabyte range: latest bcachefs-tools will pick an appropriate bucket size at format time to ensure fsck can run in available memory (e.g. a server with 256GB of ram and 100PB of storage would want 16MB buckets). On disk format changes: - 1.21: cached backpointers (scalability improvement) Cached replicas now get backpointers, which means we no longer rely on incrementing bucket generation numbers to invalidate cached data: this lets us get rid of the bucket generation number garbage collection, which had to periodically rescan all extents to recompute bucket oldest_gen. Bucket generation numbers are now only used as a consistency check, but they're quite useful for that. - 1.22: stripe backpointers Stripes now have backpointers: erasure coded stripes have their own checksums, separate from the checksums for the extents they contain (and stripe checksums also cover the parity blocks). This is required for implementing scrub for stripes. - 1.23: stripe lru (scalability improvement) Persistent lru for stripes, ordered by "number of empty blocks". This is used by the stripe creation path, which depending on free space may create a new stripe out of a partially empty existing stripe instead of starting a brand new stripe. This replaces an in-memory heap, and means we no longer have to read in the stripes btree at startup. - 1.24: casefolding Case insensitive directory support, courtesy of Valve. This is an incompatible feature, to enable mount with -o version_upgrade=incompatible - 1.25: extent_flags Another incompatible feature requiring explicit opt-in to enable. This adds a flags entry to extents, and a flag bit that marks extents as poisoned. A poisoned extent is an extent that was unreadable due to checksum errors. 
We can't move such extents without giving them a new checksum, and we may have to move them (for e.g. copygc or device evacuate). We also don't want to delete them: in the future we'll have an API that lets userspace ignore checksum errors and attempt to deal with simple bitrot itself. Marking them as poisoned lets us continue to return the correct error to userspace on normal read calls. Other changes/features: - BCH_IOCTL_QUERY_COUNTERS: this is used by the new 'bcachefs fs top' command, which shows a live view of all internal filesystem counters. - Improved journal pipelining: we can now have 16 journal writes in flight concurrently, up from 4. We're logging significantly more to the journal than we used to with all the recent disk accounting changes and additions, so some users should see a performance increase on some workloads. - BCH_MEMBER_STATE_failed: previously, we would do no IO at all to devices marked as failed. Now we will attempt to read from them, but only if we have no better options. - New option, write_error_timeout: devices will be kicked out of the filesystem if all writes have been failing for x number of seconds. We now also kick devices out when notified by blk_holder_ops that they've gone offline. - Device option handling improvements: the discard option should now be working as expected (additionally, in -tools, all device options that can be set at format time can now be set at device add time, i.e. data_allowed, state). - We now try harder to read data after a checksum error: we'll do additional retries if necessary to a device after it gave us data with a checksum error. - More self healing work: the full inode <-> dirent consistency checks that are currently run by fsck are now also run every time we do a lookup, meaning we'll be able to correct errors at runtime. Runtime self healing will be flipped on after the new changes have seen more testing; currently they're just checking for consistency. 
- KMSAN fixes: our KMSAN builds should be nearly clean now, which will put a massive dent in the syzbot dashboard" * tag 'bcachefs-2025-03-24' of git://evilpiepirate.org/bcachefs: (180 commits) bcachefs: Kill unnecessary bch2_dev_usage_read() bcachefs: btree node write errors now print btree node bcachefs: Fix race in print_chain() bcachefs: btree_trans_restart_foreign_task() bcachefs: bch2_disk_accounting_mod2() bcachefs: zero init journal bios bcachefs: Eliminate padding in move_bucket_key bcachefs: Fix a KMSAN splat in btree_update_nodes_written() bcachefs: kmsan asserts bcachefs: Fix kmsan warnings in bch2_extent_crc_pack() bcachefs: Disable asm memcpys when kmsan enabled bcachefs: Handle backpointers with unknown data types bcachefs: Count BCH_DATA_parity backpointers correctly bcachefs: Run bch2_check_dirent_target() at lookup time bcachefs: Refactor bch2_check_dirent_target() bcachefs: Move bch2_check_dirent_target() to namei.c bcachefs: fs-common.c -> namei.c bcachefs: EIO cleanup bcachefs: bch2_write_prep_encoded_data() now returns errcode bcachefs: Simplify bch2_write_op_error() ...	2025-03-27 13:20:07 -07:00
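The bcachefs message above gives one sizing example: a server with 256GB of RAM and 100PB of storage would want 16MB buckets. The Python sketch below works through that arithmetic, assuming binary units (1 PB = 2^50 bytes and so on). The per-bucket RAM figure is only the budget implied by those numbers; the message does not state how much memory fsck actually needs per bucket, and the helper name is hypothetical.

```python
PIB = 2**50  # assumption: "PB" is read as a binary petabyte
GIB = 2**30
MIB = 2**20


def bucket_budget(storage_bytes: int, bucket_bytes: int, ram_bytes: int):
    """Return (number of buckets, bytes of RAM available per bucket)."""
    buckets = storage_bytes // bucket_bytes
    return buckets, ram_bytes / buckets


if __name__ == "__main__":
    buckets, ram_per_bucket = bucket_budget(100 * PIB, 16 * MIB, 256 * GIB)
    print(f"{buckets} buckets, {ram_per_bucket:.2f} bytes of RAM per bucket")
```

With these inputs the result is about 6.7 billion buckets and roughly 41 bytes of RAM per bucket, which suggests why the bucket size has to grow with the storage size if fsck is to run in available memory.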
Linus Torvalds	81d8e5e213	f2fs-for-6.15-rc1 In this round, there are three major updates: 1) folio conversion, 2) refactor for mount API conversion, 3) some performance improvement such as direct IO, checkpoint speed, and IO priority hints. For stability, there are patches which add more sanity checks and fix some major issues like i_size in atomic write operations and write pointer recovery in zoned devices. Enhancement: - huge folio conversion work by Matthew Wilcox - clean up for mount API conversion by Eric Sandeen - improve direct IO speed in the overwrite case - add some sanity check on node consistency - set highest IO priority for checkpoint thread - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages - add ioctl to get IO priority hint - add carve_out sysfs node for fsstat Bug fix: - disable nat_bits during umount to avoid potential nat entry corruption - fix missing i_size update on atomic writes - fix missing discard for active segments - fix running out of free segments - fix out-of-bounds access in f2fs_truncate_inode_blocks() - call f2fs_recover_quota_end() correctly - fix potential deadloop in prepare_compress_overwrite() - fix the missing write pointer correction for zoned device - fix to avoid panic once fallocation fails for pinfile - don't retry IO for corrupted data scenario There are many other clean up patches and minor bug fixes as usual. 
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmfhjuYACgkQQBSofoJI UNJF+hAAip0Sf6bXwt43KR3lmbXg/YBTtPDACOs375Pn8pfRVsAPIAf5e1AnlBET rA0KpqUrEZHXenQFF9n4LIoHYqfuUw3EMempuvp4qgQcx15Kaajw+EGWVLYstNy2 9dELc9DA55f/i1uvHezGfQFy6hMfXasf+tkaYk0z0ZYDStYboLgkVAY+F869q1Dl D16Y7Lna12h4eCxDdssIPDsLjFH/2LDn7SmhXsnpZxwK0Zx8JSo83WxXHnzXHxz4 vUkukInKgqcBDvf6ufW/YF3/tqSs20XEXNK3cI1vyHx1dwij6Us+G0n0WJHL30QE zsliecR0X28JP1rJ3ldCjr+4Kxm6/u/Uwpinm2Fm1jL67UY4TZcaARBE8k20I6ND j/L1+sXrIdZ1aILM9/bwCjgXiVFdZbvlfGfpTj1duAkQgRd+/s9cjPlo5j5HTad5 XJmzJz6YOaUNarhP/E31Z9SV9M2kEcmoDOTxKBg6ZcMWXZ27B6Z0ag92prd0GWk6 rWDuVj0eP/LDY1QedbHRPbg1D84jVgcnldfPaf9ptln993skJGdgS0dkegTqMR0L H8RgpWOzWZI53gnQdePdej8diHmD8uRTrLf/oABm//GzTHC5BdwVpArOl1DCuLFF YkMtVEicgnWRmL3PgODAdTJXaDi3uGvT116i+lfMlUn9p+t93r0= =E+TE -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, there are three major updates: (1) folio conversion, (2) refactoring for mount API conversion, (3) some performance improvement such as direct IO, checkpoint speed, and IO priority hints. For stability, there are patches which add more sanity checks and fixes some major issues like i_size in atomic write operations and write pointer recovery in zoned devices. 
Enhancements: - huge folio conversion work by Matthew Wilcox - clean up for mount API conversion by Eric Sandeen - improve direct IO speed in the overwrite case - add some sanity check on node consistency - set highest IO priority for checkpoint thread - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages - add ioctl to get IO priority hint - add carve_out sysfs node for fsstat Bug fixes: - disable nat_bits during umount to avoid potential nat entry corruption - fix missing i_size update on atomic writes - fix missing discard for active segments - fix running out of free segments - fix out-of-bounds access in f2fs_truncate_inode_blocks() - call f2fs_recover_quota_end() correctly - fix potential deadloop in prepare_compress_overwrite() - fix the missing write pointer correction for zoned device - fix to avoid panic once fallocation fails for pinfile - don't retry IO for corrupted data scenario There are many other clean up patches and minor bug fixes as usual" * tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits) f2fs: fix missing discard for active segments f2fs: optimize f2fs DIO overwrites f2fs: fix to avoid atomicity corruption of atomic file f2fs: pass sbi rather than sb to parse_options() f2fs: pass sbi rather than sb to quota qf_name helpers f2fs: defer readonly check vs norecovery f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption f2fs: make LAZYTIME a mount option flag f2fs: make INLINECRYPT a mount option flag f2fs: factor out an f2fs_default_check function f2fs: consolidate unsupported option handling errors f2fs: use f2fs_sb_has_device_alias during option parsing f2fs: add carve_out sysfs node f2fs: fix to avoid running out of free segments f2fs: Remove f2fs_write_node_page() f2fs: Remove f2fs_write_meta_page() f2fs: Remove f2fs_write_data_page() f2fs: Remove check for ->writepage Revert "f2fs: rebuild nat_bits during umount" f2fs: fix to avoid accessing uninitialized curseg ...	
2025-03-27 12:55:54 -07:00
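The f2fs pull above mentions keeping POSIX_FADV_NOREUSE ranges. As a rough illustration of the userspace side only, the Python sketch below issues that advice on a scratch file through the standard `os.posix_fadvise` call. The helper names and the temporary file are hypothetical, and how f2fs (or any other filesystem) acts on the hint is not shown or assumed here.

```python
import os
import tempfile


def advise_noreuse(path: str, offset: int = 0, length: int = 0) -> None:
    """Tell the kernel that the given range will be accessed only once.

    A length of 0 means "from offset to the end of the file".
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_NOREUSE)
    finally:
        os.close(fd)


def demo() -> bool:
    """Create a scratch file, advise NOREUSE on it, and report success."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 4096)
        tmp.flush()
        advise_noreuse(tmp.name)
    return True


if __name__ == "__main__":
    print(demo())
```

The call is advisory: it raises `OSError` if the kernel rejects it, and otherwise gives no indication of whether the filesystem changed its caching behaviour.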
Linus Torvalds	a86c6d0b2a	fscrypt updates for 6.15 A fix for an issue where CONFIG_FS_ENCRYPTION could be enabled without some of its dependencies, and a small documentation update. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ+CBShQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKxj0AQC4OlE/HbIY2w3zO8Az9WEF9+4Dz9od EpxTwO/fk9PC+gD/Z7r8J3xn+ykcy9QcZW+Qucd64k+rbmvqj36gXce6hAI= =r+lX -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "A fix for an issue where CONFIG_FS_ENCRYPTION could be enabled without some of its dependencies, and a small documentation update" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: mention init_on_free instead of page poisoning fscrypt: drop obsolete recommendation to enable optimized ChaCha20 Revert "fscrypt: relax Kconfig dependencies for crypto API algorithms"	2025-03-25 18:31:38 -07:00
Linus Torvalds	bdab2977e4	fsverity updates for 6.15 A fix for an issue where CONFIG_FS_VERITY could be enabled without some of its dependencies, and a small documentation update. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ+CBZBQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK5WSAP9sihsazS5s2FwkY2ukwxuPBm448O6R w457RM/j2Dz6oQEAhQF4oKU6mWJk0dOFMoBJCEUUJinb1diYTE0BMtSbXAk= =BwSf -----END PGP SIGNATURE----- Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux Pull fsverity updates from Eric Biggers: "A fix for an issue where CONFIG_FS_VERITY could be enabled without some of its dependencies, and a small documentation update" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: Revert "fsverity: relax build time dependency on CRYPTO_SHA256" Documentation: add a usecase for FS_IOC_READ_VERITY_METADATA	2025-03-25 18:30:23 -07:00
Linus Torvalds	f81c2b8150	It has been a reasonably busy cycle for docs... - Significant changes throughout the tree to bring Python code up to current standards and raise the minimum Python required to 3.9. Much of this is preparatory to replacing the ancient Perl scripts/kernel-doc horror with a slightly less horrifying Python implementation, expected for 6.16. - Update the minimum Sphinx required to 3.4.3, allowing us to remove a bunch of older compatibility code. - Rework and improve the generation of the ABI documentation. (All of the above done by Mauro) - Lots of translation updates. Alex Shi and Yanteng Si are taking on responsibility for the Chinese translations going forward; that work will still get to you via docs-next - Try to standardize the format for indicating a developer's affiliation in commit tags. - Clarify the TAB's role in CoC enforcement actions. - Try to spell out the rules for when a commit tag can name another developer without their explicit permission. Plus lots of other typo fixes and updates. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmfccxwPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YoYcH/jL/nS8YAiJ3awF5PH5tR3m5ddt9l+fKXWJx PB3KcHtDORbWltTA+Tvo2aP1jxGY9wqsIIvl+nvjJyUcfd72g4HNfTDUDXwP3OFU wTkaEAQp3n/hqnLXtJ2AzV3Ir5cIfEL2d7F6QsN1Gnof8iu2OuMk5iMeb0iexUX6 FYjJq+jknh30VdAp2hxHy8q17R7h7PySh5OsjeAYJJroLv60n3DwQgnzHjXC/FT2 Qq1UuEzlSpRoso2o2NwVTND6OVW081umo6YrioqD7ZC2G2fhRgLFJJtJGXDNcyUl gQv9xLSaTD97V4zaWPm28ObNBpY/GnAd4hMjB17wAH5xUfVS5Aw= =Gvdp -----END PGP SIGNATURE----- Merge tag 'docs-6.15' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "It has been a reasonably busy cycle for docs... 
- Significant changes throughout the tree to bring Python code up to current standards and raise the minimum Python required to 3.9 Much of this is preparatory to replacing the ancient Perl scripts/kernel-doc horror with a slightly less horrifying Python implementation, expected for 6.16 - Update the minimum Sphinx required to 3.4.3, allowing us to remove a bunch of older compatibility code - Rework and improve the generation of the ABI documentation (All of the above done by Mauro) - Lots of translation updates. Alex Shi and Yanteng Si are taking on responsibility for the Chinese translations going forward; that work will still get to you via docs-next - Try to standardize the format for indicating a developer's affiliation in commit tags - Clarify the TAB's role in CoC enforcement actions - Try to spell out the rules for when a commit tag can name another developer without their explicit permission Plus lots of other typo fixes and updates" * tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits) docs/zh_CN: fix spelling mistake docs/Chinese: change the disclaimer words docs/zh_CN: Add snp-tdx-threat-model index Chinese translation docs: driver-api: firmware: clarify userspace requirements docs: clarify rules wrt tagging other people docs: Remove outdated highuid.rst documentation Documentation: dma-buf: heaps: Add heap name definitions docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README docs: Correct installation instruction Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst Documentation/CoC: Spell out the TAB role in enforcement decisions Documentation: ocxl.rst: Update consortium site scripts: get_feat.pl: substitute s390x with s390 scripts/kernel-doc: drop dead code for Wcontents_before_sections scripts/kernel-doc: don't add not needed new lines docs: driver-api/infiniband.rst: fix Kerneldoc markup drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup drivers: media: 
intel-ipu3.h: fix identation on a kernel-doc markup include/asm-generic/io.h: fix kerneldoc markup Docs/arch/arm64: Fix spelling in amu.rst ...	2025-03-24 18:42:27 -07:00
Linus Torvalds	aaca83f7b1	vfs-6.15-rc1.sysv -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rswAKCRCRxhvAZXjc ogkpAQD7/kR7QiOQTdDztDtAavZELOE3p4pUZfAcg8XFfH8QKQD/cgDpxgBbyvSQ VBInBrv1vPeSvlPWvFkyqT/n8eiR9AQ= =KQVR -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs sysv removal from Christian Brauner: "This removes the sysv filesystem. We've discussed this various times. It's time to try" * tag 'vfs-6.15-rc1.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: sysv: Remove the filesystem	2025-03-24 11:35:53 -07:00
Linus Torvalds	26d8e43079	vfs-6.15-rc1.async.dir -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rNwAKCRCRxhvAZXjc onBJAP9Z8Ywmlb5KQ1E3HvDmkwyY6yOSyZ9/CmbzrkCJ8ywYkQD/d9/xt0EP/O/q N8YtzXArHWt7u0YbcVpy9WK3F72BdwU= =VJgY -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs async dir updates from Christian Brauner: "This contains cleanups that fell out of the work from async directory handling: - Change kern_path_locked() and user_path_locked_at() to never return a negative dentry. This simplifies the usability of these helpers in various places - Drop d_exact_alias() from the remaining place in NFS where it is still used. This also allows us to drop the d_exact_alias() helper completely - Drop an unnecessary call to fh_update() from nfsd_create_locked() - Change i_op->mkdir() to return a struct dentry Change vfs_mkdir() to return a dentry provided by the filesystems which is hashed and positive. This allows us to reduce the number of cases where the resulting dentry is not positive to very few cases. The code in these places becomes simpler and easier to understand. - Repack DENTRY_* and LOOKUP_* flags" * tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: fix inline emphasis warning VFS: Change vfs_mkdir() to return the dentry. nfs: change mkdir inode_operation to return alternate dentry if needed. fuse: return correct dentry for ->mkdir ceph: return the correct dentry on mkdir hostfs: store inode in dentry after mkdir if possible. Change inode_operations.mkdir to return struct dentry * nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked() nfs/vfs: discard d_exact_alias() VFS: add common error checks to lookup_one_qstr_excl() VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry VFS: repack LOOKUP_ bit flags. VFS: repack DENTRY_ flags.	2025-03-24 10:47:14 -07:00
Linus Torvalds	804382d59b	vfs-6.15-rc1.overlayfs -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rUAAKCRCRxhvAZXjc opI3AP9ws4S/JXOjxNKoTYmNM2nZ8+r1v8tUxbLIiqdvzx9PygD/V1ZjXtn6lwZr OK8d5Y8UnlPZTlBF8D61op3AjnXYzws= =KV4p -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs overlayfs updates from Christian Brauner: "Currently overlayfs uses the mounter's credentials for its override_creds() calls. That provides a consistent permission model. This patch allows a caller to instruct overlayfs to use its credentials instead. The caller must be located in the same user namespace hierarchy as the user namespace the overlayfs instance will be mounted in. This provides a consistent and simple security model. With this it is possible to e.g., mount an overlayfs instance where the mounter must have CAP_SYS_ADMIN but the credentials used for override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage of custom fs{g,u}id different from the caller's and other tweaks" * tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/ovl: add third selftest for "override_creds" selftests/ovl: add second selftest for "override_creds" selftests/filesystems: add utils.{c,h} selftests/ovl: add first selftest for "override_creds" ovl: allow to specify override credentials	2025-03-24 10:37:40 -07:00
Linus Torvalds	0ec0d4ecdd	vfs-6.15-rc1.iomap -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rGwAKCRCRxhvAZXjc okVnAP9VgaYjWGzaeep/dLzWtu7C/Cg5Swl1P84Vj+SJ+hFPEAD/auzWTV0D0Ko5 5GLyUsLZehfeVDOSRqmiyt1po8iVsQo= =ANks -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iomap updates from Christian Brauner: - Allow the filesystem to submit the writeback bios. - Allow the filesystem to track completions on a per-bio basis instead of the entire I/O. - Change writeback_ops so that ->submit_bio can be done by the filesystem. - A new ANON_WRITE flag for writes that don't have a block number assigned to them at the iomap level leaving the filesystem to do that work in the submission handler. - Incremental iterator advance The folio_batch support for zero range where the filesystem provides a batch of folios to process that might not be logically contiguous requires more flexibility than the current offset based iteration offers. Update all iomap operations to advance the iterator within the operation and thus remove the need to advance from the core iomap iterator. - Make buffered writes work with RWF_DONTCACHE If RWF_DONTCACHE is set for a write, mark the folios being written as uncached. On writeback completion the pages will be dropped. - Introduce infrastructure for large atomic writes This will eventually be used by xfs and ext4. 
* tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits) iomap: rework IOMAP atomic flags iomap: comment on atomic write checks in iomap_dio_bio_iter() iomap: inline iomap_dio_bio_opflags() iomap: fix inline data on buffered read iomap: Lift blocksize restriction on atomic writes iomap: Support SW-based atomic writes iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW xfs: flag as supporting FOP_DONTCACHE iomap: make buffered writes work with RWF_DONTCACHE iomap: introduce a full map advance helper iomap: rename iomap_iter processed field to status iomap: remove unnecessary advance from iomap_iter() dax: advance the iomap_iter on pte and pmd faults dax: advance the iomap_iter on dedupe range dax: advance the iomap_iter on unshare range dax: advance the iomap_iter on zero range dax: push advance down into dax_iomap_iter() for read and write dax: advance the iomap_iter in the read/write path iomap: convert misc simple ops to incremental advance iomap: advance the iter on direct I/O ...	2025-03-24 10:19:31 -07:00
Linus Torvalds	99c21beaab	vfs-6.15-rc1.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90p4AAKCRCRxhvAZXjc ojMIAP9atkG3u7+490+NGWLdulQlaHnD51Owa9MiW87UfKpsTQEArwi/NrJqXJNT PFQ2xIa5TxG+9haChR89w3kjZ6b/hgs= =iDkx -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Add CONFIG_DEBUG_VFS infrastructure: - Catch invalid modes in open - Use the new debug macros in inode_set_cached_link() - Use debug-only asserts around fd allocation and install - Place f_ref to 3rd cache line in struct file to resolve false sharing Cleanups: - Start using anon_inode_getfile_fmode() helper in various places - Don't take f_lock during SEEK_CUR if exclusion is guaranteed by f_pos_lock - Add unlikely() to kcmp() - Remove legacy ->remount_fs method from ecryptfs after port to the new mount api - Remove invalidate_inodes() in favour of evict_inodes() - Simplify ep_busy_loop by removing unused argument - Avoid mmap sem relocks when coredumping with many missing pages - Inline getname() - Inline new_inode_pseudo() and de-staticize alloc_inode() - Dodge an atomic in putname if ref == 1 - Consistently deref the files table with rcu_dereference_raw() - Dedup handling of struct filename init and refcounts bumps - Use wq_has_sleeper() in end_dir_add() - Drop the lock trip around I_NEW wake up in evict() - Load the ->i_sb pointer once in inode_sb_list_{add,del} - Predict not reaching the limit in alloc_empty_file() - Tidy up do_sys_openat2() with likely/unlikely - Call inode_sb_list_add() outside of inode hash lock - Sort out fd allocation vs dup2 race commentary - Turn page_offset() into a wrapper around folio_pos() - Remove locking in exportfs around ->get_parent() call - try_lookup_one_len() does not need any locks in autofs - Fix return type of several functions from long to int in open - Fix return type of several functions from long to int in 
ioctls Fixes: - Fix watch queue accounting mismatch" * tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) fs: sort out fd allocation vs dup2 race commentary, take 2 fs: call inode_sb_list_add() outside of inode hash lock fs: tidy up do_sys_openat2() with likely/unlikely fs: predict not reaching the limit in alloc_empty_file() fs: load the ->i_sb pointer once in inode_sb_list_{add,del} fs: drop the lock trip around I_NEW wake up in evict() fs: use wq_has_sleeper() in end_dir_add() VFS/autofs: try_lookup_one_len() does not need any locks fs: dedup handling of struct filename init and refcounts bumps fs: consistently deref the files table with rcu_dereference_raw() exportfs: remove locking around ->get_parent() call. fs: use debug-only asserts around fd allocation and install fs: dodge an atomic in putname if ref == 1 vfs: Remove invalidate_inodes() ecryptfs: remove NULL remount_fs from super_operations watch_queue: fix pipe accounting mismatch fs: place f_ref to 3rd cache line in struct file to resolve false sharing epoll: simplify ep_busy_loop by removing always 0 argument fs: Turn page_offset() into a wrapper around folio_pos() kcmp: improve performance adding an unlikely hint to task comparisons ...	2025-03-24 09:13:50 -07:00
Tuomas Ahola	34ceb69edd	Documentation/fs/9p: fix broken link In `b529c06f9d` (Update the documentation referencing Plan 9 from User Space., 2020-04-26), another instance of the link was left unfixed. Fix that as well. Signed-off-by: Tuomas Ahola <taahol@utu.fi> Message-ID: <20250322153639.4917-1-taahol@utu.fi> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-03-23 06:20:36 +09:00
Nico Pache	0bfd458685	MM documentation: add "Unaccepted" meminfo entry Commit `dcdfdd40fa` ("mm: Add support for unaccepted memory") added an entry to meminfo but did not document it in the proc.rst file. This counter tracks the amount of "Unaccepted" guest memory for some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP. Add the missing entry in the documentation. Link: https://lkml.kernel.org/r/20250317230403.79632-1-npache@redhat.com Signed-off-by: Nico Pache <npache@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-21 22:03:14 -07:00
Nico Pache	835de37603	meminfo: add a per node counter for balloon drivers Patch series "track memory used by balloon drivers", v2. This series introduces a way to track memory used by balloon drivers. Add a NR_BALLOON_PAGES counter to track how many pages are reclaimed by the balloon drivers. First add the accounting, then update the balloon drivers (virtio, Hyper-V, VMware, Pseries-cmm, and Xen) to maintain this counter. The virtio, VMware, and pseries-cmm balloon drivers utilize the balloon_compaction interface to allocate and free balloon pages. Other balloon drivers will have to maintain this counter manually. This makes the information visible in memory reporting interfaces like /proc/meminfo, show_mem, and OOM reporting. This provides admins visibility into their VM balloon sizes without requiring different virtualization tooling. Furthermore, this information is helpful when debugging an OOM inside a VM. This patch (of 4): Add NR_BALLOON_PAGES counter to track memory used by balloon drivers and expose it through /proc/meminfo and other memory reporting interfaces. [npache@redhat.com: document Balloon Meminfo entry] Link: https://lkml.kernel.org/r/a0315ccf-f244-460e-8643-fd7388724fe5@redhat.com Link: https://lkml.kernel.org/r/20250314213757.244258-1-npache@redhat.com Link: https://lkml.kernel.org/r/20250314213757.244258-2-npache@redhat.com Signed-off-by: Nico Pache <npache@redhat.com> Cc: Alexander Atanasov <alexander.atanasov@virtuozzo.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Juegren Gross <jgross@suse.com> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-21 22:03:13 -07:00
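The balloon counter described in the commit above is exposed through /proc/meminfo, so it can be read like any other meminfo field. The following is a minimal userspace sketch, not kernel code. It assumes the field is named `Balloon` (inferred from the "document Balloon Meminfo entry" note) and that values use the usual `Key:   N kB` layout; `meminfo_kb` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value (in kB) of `key` in /proc/meminfo-style text,
 * or -1 if the key is absent or its value cannot be parsed.
 * The field name passed in (e.g. "Balloon") is an assumption here.
 */
long meminfo_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);

    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* A matching line looks like "Key:        12345 kB". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long val;

            if (sscanf(line + klen + 1, "%ld", &val) == 1)
                return val;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

To use it on a live system, read /proc/meminfo into a buffer and call `meminfo_kb(buf, "Balloon")`; a result of -1 would suggest the running kernel does not provide the entry.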
John Garry	370a6de765	iomap: rework IOMAP atomic flags Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag is that the FS ->iomap_begin callback could check if this flag is set to decide whether to do a SW (FS-based) atomic write. But the FS can set which ->iomap_begin callback it wants when deciding to do a FS-based atomic write. Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as the block driver can use SW-methods to emulate an atomic write. So change back to IOMAP_ATOMIC. The ->iomap_begin callback needs though to indicate to iomap core that REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that. These changes were suggested by Christoph Hellwig and Dave Chinner. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250320120250.4087011-4-john.g.garry@oracle.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-20 15:16:03 +01:00
David Hildenbrand	749492229e	mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT) Everything is in place to stop using the per-page mapcounts in large folios: the mapcount of tail pages will always be logically 0 (-1 value), just like it currently is for hugetlb folios already, and the page mapcount of the head page is either 0 (-1 value) or contains a page type (e.g., hugetlb). Maintaining _nr_pages_mapped without per-page mapcounts is impossible, so that one also has to go with CONFIG_NO_PAGE_MAPCOUNT. There are two remaining implications: (1) Per-node, per-cgroup and per-lruvec stats of "NR_ANON_MAPPED" ("mapped anonymous memory") and "NR_FILE_MAPPED" ("mapped file memory"): As soon as any page of the folio is mapped -- folio_mapped() -- we now account the complete folio as mapped. Once the last page is unmapped -- !folio_mapped() -- we account the complete folio as unmapped. This implies that ... * "AnonPages" and "Mapped" in /proc/meminfo and /sys/devices/system/node//meminfo * cgroup v2: "anon" and "file_mapped" in "memory.stat" and "memory.numa_stat" * cgroup v1: "rss" and "mapped_file" in "memory.stat" and "memory.numa_stat" ... can now appear higher than before. But note that these folios do consume that memory, simply not all pages are actually currently mapped. It's worth noting that other accounting in the kernel (esp. cgroup charging on allocation) is not affected by this change. [why oh why is "anon" called "rss" in cgroup v1] (2) Detecting partial mappings Detecting whether anon THPs are partially mapped gets a bit more unreliable. As long as a single MM maps such a large folio ("exclusively mapped"), we can reliably detect it. Especially before fork() / after a short-lived child process quit, we will detect partial mappings reliably, which is the common case. In essence, if the average per-page mapcount in an anon THP is < 1, we know for sure that we have a partial mapping. However, as soon as multiple MMs are involved, we might miss detecting partial mappings: this might be relevant with long-lived child processes. If we have a fully-mapped anon folio before fork(), once our child processes and our parent all unmap (zap/COW) the same pages (but not the complete folio), we might not detect the partial mapping. However, once the child processes quit we would detect the partial mapping. How relevant this case is in practice remains to be seen. Swapout/migration will likely mitigate this. In the future, RMAP walkers could check for that case (e.g., when collecting access bits during reclaim) and simply flag them for deferred-splitting. Link: https://lkml.kernel.org/r/20250303163014.1128035-21-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andy Lutomirks^H^Hski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michal Koutn <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: tejun heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 22:06:48 -07:00
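The "average per-page mapcount < 1" heuristic from the commit above can be illustrated with a small model. This is a hypothetical userspace sketch, not the kernel's implementation: `surely_partially_mapped`, `total_mapcount`, and `nr_pages` are names chosen for illustration, and the sketch assumes the only inputs available are the folio's total number of page mappings and its size in pages.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the heuristic described in the commit message.
 *
 * total_mapcount: sum of page-table mappings over all pages of the folio
 * nr_pages:       number of pages in the folio
 *
 * An average per-page mapcount below 1 means at least one page is not
 * mapped at all, so a mapped folio must be partially mapped. A result of
 * false does NOT prove the folio is fully mapped.
 */
bool surely_partially_mapped(long total_mapcount, long nr_pages)
{
    assert(nr_pages > 0 && total_mapcount >= 0);

    /* average < 1  <=>  total_mapcount < nr_pages (avoids division) */
    return total_mapcount > 0 && total_mapcount < nr_pages;
}
```

For a 512-page folio mapped by one process that has unmapped half of it, `surely_partially_mapped(256, 512)` is true. If a parent and a child each keep the same 256 pages mapped, the total is 512 and the check returns false, which matches the missed-detection case the commit message describes.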
David Hildenbrand	6dd55dd1c5	fs/proc/task_mmu: remove per-page mapcount dependency for smaps/smaps_rollup (CONFIG_NO_PAGE_MAPCOUNT) Let's implement an alternative when per-page mapcounts in large folios are no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT. When computing the output for smaps / smaps_rollups, in particular when calculating the USS (Unique Set Size) and the PSS (Proportional Set Size), we still rely on per-page mapcounts. To determine private vs. shared, we'll use folio_likely_mapped_shared(), similar to how we handle PM_MMAP_EXCLUSIVE. Similarly, we might now under-estimate the USS and count pages towards "shared" that are actually "private" ("exclusively mapped"). When calculating the PSS, we'll now also use the average per-page mapcount for large folios: this can result in both, an over-estimation and an under-estimation of the PSS. The difference is not expected to matter much in practice, but we'll have to learn as we go. We can now provide folio_precise_page_mapcount() only with CONFIG_PAGE_MAPCOUNT, and remove one of the last users of per-page mapcounts when CONFIG_NO_PAGE_MAPCOUNT is enabled. Document the new behavior. Link: https://lkml.kernel.org/r/20250303163014.1128035-20-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andy Lutomirks^H^Hski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. 
Shutemov <kirill.shutemov@linux.intel.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michal Koutn <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: tejun heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 22:06:47 -07:00
David Hildenbrand	7a34ae1449	fs/proc/task_mmu: remove per-page mapcount dependency for "mapmax" (CONFIG_NO_PAGE_MAPCOUNT) Let's implement an alternative when per-page mapcounts in large folios are no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT. For calculating "mapmax", we now use the average per-page mapcount in a large folio instead of the per-page mapcount. For hugetlb folios and folios that are not partially mapped into MMs, there is no change. Likely, this change will not matter much in practice, and an alternative might be to simply remove this stat with CONFIG_NO_PAGE_MAPCOUNT. However, there might be value to it, so let's keep it like that and document the behavior. Link: https://lkml.kernel.org/r/20250303163014.1128035-19-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andy Lutomirks^H^Hski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michal Koutn <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: tejun heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 22:06:47 -07:00
Dan Williams	653d7825c1	dcssblk: mark DAX broken, remove FS_DAX_LIMITED support The dcssblk driver has long needed special case support to enable limited dax operation, so called CONFIG_FS_DAX_LIMITED. This mode works around the incomplete support for ZONE_DEVICE on s390 by forgoing the ability of dax-mapped pages to support GUP. Now, pending cleanups to fsdax that fix its reference counting [1] depend on the ability of all dax drivers to supply ZONE_DEVICE pages. To allow that work to move forward, dax support needs to be paused for dcssblk until ZONE_DEVICE support arrives. That work has been known for a few years [2], and the removal of "pte_devmap" requirements [3] makes the conversion easier. For now, place the support behind CONFIG_BROKEN, and remove PFN_SPECIAL (dcssblk was the only user). Link: http://lore.kernel.org/cover.9f0e45d52f5cff58807831b6b867084d0b14b61c.1725941415.git-series.apopple@nvidia.com [1] Link: http://lore.kernel.org/20210820210318.187742e8@thinkpad/ [2] Link: http://lore.kernel.org/4511465a4f8429f45e2ac70d2e65dc5e1df1eb47.1725941415.git-series.apopple@nvidia.com [3] Link: https://lkml.kernel.org/r/33eef2379c0d240f40cc15453fad2df1a4ae34c8.1740713401.git-series.apopple@nvidia.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Asahi Lina <lina@asahilina.net> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: linmiaohe <linmiaohe@huawei.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 22:06:40 -07:00
Andrii Nakryiko	87ad827a27	docs,procfs: document /proc/PID/* access permission checks Add a paragraph explaining what sort of capabilities a process would need to read procfs data for some other process. Also mention that reading data for its own process doesn't require any extra permissions. Link: https://lkml.kernel.org/r/20250129001747.759990-1-andrii@kernel.org Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:30:47 -07:00
Bagas Sanjaya	a42d685ff2	Documentation: bcachefs: SubmittingPatches: Convert footnotes to reST syntax The footnote list is rendered in htmldocs as a single long-running paragraph. Use reST numbered footnote syntax instead. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:15 -04:00
Bagas Sanjaya	76d6305dca	Documentation: bcachefs: SubmittingPatches: Demote section headings SubmittingPatches.rst has 4 section headings, all at the same heading level. In the absence of a title heading, these section headings all end up as title headings in the docs output, which also affects the index toctree (increasing titles to 6 from the original 2) due to the :numbered: option. Demote second-to-last section headings, making "Submitting patches to bcachefs" the title heading. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:15 -04:00
Bagas Sanjaya	93422e0b33	Documentation: bcachefs: Split index toctree bcachefs subsystem currently has 4 docs: two are development notes and the rest are actual filesystem docs. These two groups are clearly distinct and can be organized. Split the toctree into two, one for each docs group. While at it, also reduce :maxdepth: so that only title headings are listed in the toctrees. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:15 -04:00
Bagas Sanjaya	7442ef7082	Documentation: bcachefs: Add casefolding toctree entry Sphinx reports htmldocs toctree warning: Documentation/filesystems/bcachefs/casefolding.rst: WARNING: document isn't included in any toctree Fix the warning by adding casefolding documentation entry to bcachefs toctree. Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250221161728.32739f85@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:15 -04:00
Bagas Sanjaya	47d4100b15	Documentation: bcachefs: casefolding: Use bullet list for dirent structure The doc lists the dirent structure for both regular and casefolded names, yet it is written (and rendered) as a long paragraph. Write the structure list as a bullet list. Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:15 -04:00
Bagas Sanjaya	210997859a	Documentation: bcachefs: casefolding: Fix dentry/dcache considerations section Sphinx reports htmldocs warnings on dentry/dcache section: Documentation/filesystems/bcachefs/casefolding.rst:75: WARNING: Title underline too short. dentry/dcache considerations --------- [docutils] Documentation/filesystems/bcachefs/casefolding.rst:84: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils] Fix the section by: * Extending the section underline to match the section title length; * Separating problem list from surrounding paragraphs. Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250221161911.2d16138b@canb.auug.org.au/ Closes: https://lore.kernel.org/linux-next/20250221162135.79be0147@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:15 -04:00
Bagas Sanjaya	82b5666912	Documentation: bcachefs: casefolding: Do not italicize NUL Sphinx reports htmldocs warning: Documentation/filesystems/bcachefs/casefolding.rst:36: WARNING: Inline interpreted text or phrase reference start-string without end-string. [docutils] That's because NUL word is italicized but it is written in plural form instead (`NUL`s). Sphinx, however, doesn't tip over when the italicized word in this fashion is followed by punctuation instead. Do not italicize the word to keep Sphinx happy. Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250221162135.79be0147@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:15 -04:00
Joshua Ashton	d37c14ac6f	bcachefs: bcachefs_metadata_version_casefolding This patch implements support for case-insensitive file name lookups in bcachefs. The implementation uses the same UTF-8 lowering and normalization that ext4 and f2fs are using. More information is provided in Documentation/bcachefs/casefolding.rst Compatibility notes: This uses the new versioning scheme for incompatible features where an incompatible feature is tied to a version number: the superblock says "we may use incompat features up to x" and "incompat features up to x are in use", disallowing mounting by previous versions. Additionally, an old-style incompat feature bit is used, so that kernels without utf8 casefolding support know if casefolding specifically is in use and they're allowed to mount. Signed-off-by: Joshua Ashton <joshua@froggi.es> Cc: André Almeida <andrealmeid@igalia.com> Cc: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-03-14 21:02:15 -04:00
Chao Yu	1788971e0b	f2fs: introduce FAULT_INCONSISTENT_FOOTER To simulate inconsistent node footer error. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-03-11 03:25:53 +00:00
Mike Snitzer	9254c8ae9b	nfsd: disallow file locking and delegations for NFSv4 reexport We do not and cannot support file locking with NFS reexport over NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport server reboot cannot allow clients to recover locks because the source NFS server has not rebooted, and so it is not in grace. Since the source NFS server is not in grace, it cannot offer any guarantees that the file won't have been changed between the locks getting lost and any attempt to recover/reclaim them. The same applies to delegations and any associated locks, so disallow them too. Clients are no longer allowed to get file locks or delegations from a reexport server; any attempt will fail with "operation not supported". Update the "Reboot recovery" section accordingly in Documentation/filesystems/nfs/reexport.rst Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-03-10 09:11:08 -04:00
Chao Yu	c2ecba0265	f2fs: control nat_bits feature via mount option Introduce a new mount option "nat_bits" to control the nat_bits feature. By default, the nat_bits feature is disabled. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-03-08 16:04:10 +00:00
Jan Kara	93fd0d46cb	vfs: Remove invalidate_inodes() The function can be replaced by evict_inodes. The only difference is that evict_inodes() skips the inodes with positive refcount without touching ->i_lock, but they are equivalent as evict_inodes() repeats the refcount check after having grabbed ->i_lock. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250307144318.28120-2-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-08 12:19:22 +01:00
John Garry	794ca29dcc	iomap: Support SW-based atomic writes Currently atomic write support requires dedicated HW support. This imposes a restriction on the filesystem that disk blocks need to be aligned and contiguously mapped to FS blocks to issue atomic writes. XFS has no method to guarantee FS block alignment for regular, non-RT files. As such, atomic writes are currently limited to 1x FS block there. To handle atomic writes over misaligned or discontiguous data blocks, and to raise the atomic write size limit, support a software-emulated (SW-based) atomic write mode. For XFS, SW-based atomic writes would use CoW support to issue emulated untorn writes. It is the responsibility of the FS to detect discontiguous atomic writes and switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed, SW-based atomic writes could always be used when the mounted bdev does not support HW offload, but this strategy is not initially expected to be used. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-6-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-06 11:00:12 +01:00
John Garry	b4de0e9be9	iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW In future xfs will support a SW-based atomic write, so rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW to be clear which mode is being used. Also relocate setting of IOMAP_ATOMIC_HW to the write path in __iomap_dio_rw(), to be clear that this flag is only relevant to writes. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-3-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-06 11:00:12 +01:00
Christian Brauner	1743d385e7	Merge branch 'vfs-6.15.shared.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Bring in iomap changes that xfs relies on. Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-06 10:59:18 +01:00
Aiden Ma	50dc696c3a	doc: correcting two prefix errors in idmappings.rst Add the 'k' prefix to id 21000. And id `u1000` in the third idmapping should be mapped to `k31000`, not `u31000`. Signed-off-by: Aiden Ma <jiaheng.ma@foxmail.com> Link: https://lore.kernel.org/r/tencent_4E7B1F143E8051530C21FCADF4E014DCBB06@qq.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-05 11:54:18 +01:00
Christian Brauner	be66901997	doc: fix inline emphasis warning Fix a warning spotted by linux-next build (htmldocs): Documentation/filesystems/porting.rst:1186: WARNING: Inline emphasis start-string without end-string. [docutils] Introduced by commit `88d5baf690` ("Change inode_operations.mkdir to return struct dentry ") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `88d5baf690` ("Change inode_operations.mkdir to return struct dentry ") Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-05 11:52:50 +01:00
Eric Biggers	13dc8eb900	fscrypt: mention init_on_free instead of page poisoning Page poisoning is an older debug option. The modern way to initialize memory on free for security reasons is to set init_on_free=1. Link: https://lore.kernel.org/r/20250304210156.14912-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2025-03-04 13:02:45 -08:00
Eric Biggers	eea957d8db	fscrypt: drop obsolete recommendation to enable optimized ChaCha20 Since the crypto kconfig options are being fixed to enable optimized ChaCha20 automatically (https://lore.kernel.org/r/Z8AY16EIqAYpfmRI@gondor.apana.org.au/), it is no longer necessary to give a recommendation to enable it. Link: https://lore.kernel.org/r/20250304205501.13797-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2025-03-04 12:56:06 -08:00
NeilBrown	88d5baf690	Change inode_operations.mkdir to return struct dentry * Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. 
Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-27 20:00:17 +01:00
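The three possible ->mkdir() return values listed in the commit above (NULL, ERR_PTR(), or a different non-NULL dentry) can be sketched from the caller's side. This is a userspace model under stated assumptions: `struct dentry` is a stand-in type, the `ERR_PTR`/`PTR_ERR`/`IS_ERR` helpers are simplified re-implementations of the kernel helpers of the same name, and `mkdir_result` is a hypothetical function that does not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dentry { const char *name; };    /* stand-in for the kernel type */

/* Simplified userspace versions of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

/*
 * Hypothetical helper: given the dentry passed to ->mkdir() and the value
 * it returned, pick what the caller should continue with.
 *
 *   NULL      -> directory created, the passed dentry is the one to use
 *   ERR_PTR() -> an error occurred, propagate the error pointer
 *   non-NULL  -> another dentry was spliced in, use that one instead
 */
struct dentry *mkdir_result(struct dentry *passed, struct dentry *ret)
{
    assert(passed != NULL);

    if (ret == NULL)
        return passed;
    return ret;     /* either an error pointer or the spliced-in dentry */
}
```

A caller would test the result with `IS_ERR()` before touching it, then use whichever dentry came back rather than assuming the one it passed in is still the right one.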
Jens Axboe	b2cd5ae693	iomap: make buffered writes work with RWF_DONTCACHE Add iomap buffered write support for RWF_DONTCACHE. If RWF_DONTCACHE is set for a write, mark the folios being written as uncached. Then writeback completion will drop the pages. The write_iter handler simply kicks off writeback for the pages, and writeback completion will take care of the rest. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20250204184047.356762-2-axboe@kernel.dk Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-27 11:27:54 +01:00
Christian Brauner	71628584df	Merge patch series "prep patches for my mkdir series" NeilBrown <neilb@suse.de> says: These two patches are cleanups that are dependencies for my mkdir changes and subsequent directory locking changes. * patches from https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de: (2 commits) nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked() nfs/vfs: discard d_exact_alias() Link: https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-27 09:25:34 +01:00
Jan Kara	448fa70158	sysv: Remove the filesystem Since 2002 (change "Replace BKL for chain locking with sysvfs-private rwlock") the sysv filesystem was doing IO under a rwlock in its get_block() function (yes, a non-sleepable lock held over a function used to read inode metadata for all reads and writes). Nobody noticed until syzbot in 2023 [1]. This shows nobody is using the filesystem. Just drop it. [1] https://lore.kernel.org/all/0000000000000ccf9a05ee84f5b0@google.com/ Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250220163940.10155-2-jack@suse.cz Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-21 10:32:47 +01:00
Christian Brauner	539a0879de	ovl: allow to specify override credentials Currently overlayfs uses the mounter's credentials for its override_creds() calls. That provides a consistent permission model. This patch allows a caller to instruct overlayfs to use its credentials instead. The caller must be located in the same user namespace hierarchy as the user namespace the overlayfs instance will be mounted in. This provides a consistent and simple security model. With this it is possible to e.g., mount an overlayfs instance where the mounter must have CAP_SYS_ADMIN but the credentials used for override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage of custom fs{g,u}id different from the caller's and other tweaks. Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-1-46af55e4ceda@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-19 14:32:09 +01:00
NeilBrown	204a575e91	VFS: add common error checks to lookup_one_qstr_excl() Callers of lookup_one_qstr_excl() often check if the result is negative or positive. These checks can easily be moved into lookup_one_qstr_excl() by checking the lookup flags: LOOKUP_CREATE means it is NOT an error if the name doesn't exist. LOOKUP_EXCL means it IS an error if the name DOES exist. This patch adds these checks, then removes error checks from callers, and ensures that appropriate flags are passed. This subtly changes the meaning of LOOKUP_EXCL. Previously it could only accompany LOOKUP_CREATE. Now it can accompany LOOKUP_RENAME_TARGET as well. A couple of small changes are needed to accommodate this. The NFS change is functionally a no-op but ensures nfs_is_exclusive_create() does exactly what the name says. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250217003020.3170652-3-neilb@suse.de Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-19 14:09:15 +01:00
NeilBrown	1c3cb50b58	VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry No callers of kern_path_locked() or user_path_locked_at() want a negative dentry. So change them to return -ENOENT instead. This simplifies callers. This results in a subtle change to bcachefs in that an ioctl will now return -ENOENT in preference to -EXDEV. I believe this restores the behaviour to what it was prior to Commit `bbe6a7c899` ("bch2_ioctl_subvolume_destroy(): fix locking") Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250217003020.3170652-2-neilb@suse.de Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-19 14:08:41 +01:00
Allison Karlitskaya	212df80e01	Documentation: add a usecase for FS_IOC_READ_VERITY_METADATA Mention another potential usecase for FS_IOC_READ_VERITY_METADATA: creating filesystem images which contain fs-verity-enabled files, without having to redo all of the work in userspace. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com> Link: https://lore.kernel.org/r/20241126084833.70538-1-allison.karlitskaya@redhat.com Signed-off-by: Eric Biggers <ebiggers@google.com>	2025-02-17 11:03:29 -08:00
Charles Han	07ab93f3cc	Documentation: Remove repeated word in docs Remove the repeated word "to" in docs. Signed-off-by: Charles Han <hanchunchao@inspur.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250207073433.23604-1-hanchunchao@inspur.com	2025-02-10 10:54:50 -07:00
Ritvik Gupta	7038f9f2e8	documentation/filesystems: fix spelling mistakes Corrected the following spelling mistakes, based on the suggestions by codespell: 1. Optionaly -> Optionally 2. prefereable -> preferable 3. peformance -> performance 4. ontext -> context 5. failuer -> failure 6. poiners -> pointers 7. realtively -> relatively 8. uptream -> upstream Signed-off-by: Ritvik Gupta <ritvikfoss@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250210043937.30952-1-ritvikfoss@gmail.com	2025-02-10 10:42:28 -07:00
Kemeng Shi	06b9e91425	jbd2: remove unused transaction->t_private_list After we remove ext4 journal callback, transaction->t_private_list is not used anymore. Just remove unused transaction->t_private_list. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241218145414.1422946-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-02-10 07:48:24 -05:00
Kent Overstreet	fdfd0ad828	bcachefs docs: SubmittingPatches.rst Add an (initial?) patch submission checklist, focusing mainly on testing. Yes, all patches must be tested, and that starts (but does not end) with the patch author. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-02-06 22:35:11 -05:00
Christoph Hellwig	034c29fb3e	iomap: add a IOMAP_F_ANON_WRITE flag Add a IOMAP_F_ANON_WRITE flag that indicates that the write I/O does not have a target block assigned to it yet at iomap time and the file system will do that in the bio submission handler, splitting the I/O as needed. This is used to implement Zone Append based I/O for zoned XFS, where splitting writes to the hardware limits and assigning a zone to them happens just before sending the I/O off to the block layer, but could also be useful for other things like compressed I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-4-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-06 13:02:13 +01:00
Christoph Hellwig	c50105933f	iomap: allow the file system to submit the writeback bios Change ->prepare_ioend to ->submit_ioend and require file systems that implement it to submit the bio. This is needed for file systems that do their own work on the bios before submitting them to the block layer like btrfs or zoned xfs. To make this easier also pass the writeback context to the method. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-2-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-06 13:02:13 +01:00
Linus Torvalds	d3d90cc289	Provide stable parent and name to ->d_revalidate() instances Most of the filesystem methods where we care about dentry name and parent have their stability guaranteed by the callers; ->d_revalidate() is the major exception. It's easy enough for callers to supply stable values for expected name and expected parent of the dentry being validated. That kills quite a bit of boilerplate in ->d_revalidate() instances, along with a bunch of races where they used to access ->d_name without sufficient precautions. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs d_revalidate updates from Al Viro: "Provide stable parent and name to ->d_revalidate() instances Most of the filesystem methods where we care about dentry name and parent have their stability guaranteed by the callers; ->d_revalidate() is the major exception. It's easy enough for callers to supply stable values for expected name and expected parent of the dentry being validated.
That kills quite a bit of boilerplate in ->d_revalidate() instances, along with a bunch of races where they used to access ->d_name without sufficient precautions" * tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: fix ->rename_sem exclusion orangefs_d_revalidate(): use stable parent inode and name passed by caller ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller nfs: fix ->d_revalidate() UAF on ->d_name accesses nfs{,4}_lookup_validate(): use stable parent inode passed by caller gfs2_drevalidate(): use stable parent inode and name passed by caller fuse_dentry_revalidate(): use stable parent inode and name passed by caller vfat_revalidate{,_ci}(): use stable parent inode passed by caller exfat_d_revalidate(): use stable parent inode passed by caller fscrypt_d_revalidate(): use stable parent inode passed by caller ceph_d_revalidate(): propagate stable name down into request encoding ceph_d_revalidate(): use stable parent inode passed by caller afs_d_revalidate(): use stable name and parent inode passed by caller Pass parent directory inode and expected name to ->d_revalidate() generic_ci_d_compare(): use shortname_storage ext4 fast_commit: make use of name_snapshot primitives dissolve external_name.u into separate members make take_dentry_name_snapshot() lockless dcache: back inline names with a struct-wrapped array of unsigned long make sure that DNAME_INLINE_LEN is a multiple of word size	2025-01-30 09:13:35 -08:00
Linus Torvalds	92cc9acff7	fuse update for 6.14 Merge tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "Add support for io-uring communication between kernel and userspace using IORING_OP_URING_CMD (Bernd Schubert). Following features enable gains in performance compared to the regular interface: - Allow processing multiple requests with less syscall overhead - Combine commit of old and fetch of new fuse request - CPU/NUMA affinity of queues Patches were reviewed by several people, including Pavel Begunkov, io-uring co-maintainer" * tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: prevent disabling io-uring on active connections fuse: enable fuse-over-io-uring fuse: block request allocation until io-uring init is complete fuse: {io-uring} Prevent mount point hang on fuse-server termination fuse: Allow to queue bg requests through io-uring fuse: Allow to queue fg requests through io-uring fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static fuse: {io-uring} Handle teardown of ring entries fuse: Add io-uring sqe commit and fetch support fuse: {io-uring} Make hash-list req unique finding functions non-static fuse: Add fuse-io-uring handling into fuse_copy fuse: Make fuse_copy non static fuse: {io-uring} Handle SQEs - register commands fuse: make args->in_args[0] to be always the header fuse: Add fuse-io-uring design documentation fuse: Move request bits fuse: Move fuse_get_dev to header file fuse: rename to fuse_dev_end_requests and make non-static	2025-01-29 09:40:23 -08:00
Linus Torvalds	b88fe2b5dd	NFS Client Updates for Linux 6.14 New Features: * Enable using direct IO with localio * Added localio related tracepoints Bugfixes: * Sunrpc fixes for working with a very large cl_tasks list * Fix a possible buffer overflow in nfs_sysfs_link_rpc_client() * Fixes for handling reconnections with localio * Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT * Fix COPY_NOTIFY xdr_buf size calculations * pNFS/Flexfiles fix for retrying requesting a layout segment for reads * Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired Cleanups: * Various other nfs & nfsd localio cleanups * Preparatory patches for async copy improvements that are under development * Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other xprts * Add netns inum and srcaddr to debugfs rpc_xprt info Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "New Features: - Enable using direct IO with localio - Added localio related tracepoints Bugfixes: - Sunrpc fixes for working with a very large cl_tasks list - Fix a possible buffer overflow in
nfs_sysfs_link_rpc_client() - Fixes for handling reconnections with localio - Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT - Fix COPY_NOTIFY xdr_buf size calculations - pNFS/Flexfiles fix for retrying requesting a layout segment for reads - Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired Cleanups: - Various other nfs & nfsd localio cleanups - Preparatory patches for async copy improvements that are under development - Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other xprts - Add netns inum and srcaddr to debugfs rpc_xprt info" * tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits) SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info pnfs/flexfiles: retry getting layout segment for reads NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE NFSv4.2: fix COPY_NOTIFY xdr buf size calculation NFS: Rename struct nfs4_offloadcancel_data NFS: Fix typo in OFFLOAD_CANCEL comment NFS: CB_OFFLOAD can return NFS4ERR_DELAY nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it nfs: fix incorrect error handling in LOCALIO nfs: probe for LOCALIO when v3 client reconnects to server nfs: probe for LOCALIO when v4 client reconnects to server nfs/localio: remove redundant code and simplify LOCALIO enablement nfs_common: add nfs_localio trace events nfs_common: track all open nfsd_files per LOCALIO nfs_client nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_ nfsd: update percpu_ref to manage references on nfsd_net ...	2025-01-28 14:23:46 -08:00
Linus Torvalds	2ab002c755	Driver core and debugfs updates Here is the big set of driver core and debugfs updates for 6.14-rc1. It's coming late in the merge cycle as there are a number of merge conflicts with your tree now, and I wanted to make sure they were working properly. To resolve them, look in linux-next, and I will send the "fixup" patch as a response to the pull request. Included in here is a bunch of driver core, PCI, OF, and platform rust bindings (all acked by the different subsystem maintainers), hence the merge conflict with the rust tree, and some driver core api updates to mark things as const, which will also require some fixups due to new stuff coming in through other trees in this merge window. There are also a bunch of debugfs updates from Al, and there is at least one user that does have a regression with these, but Al is working on tracking down the fix for it. In my use (and everyone else's linux-next use), it does not seem like a big issue at the moment. Here's a short list of the things in here: - driver core bindings for PCI, platform, OF, and some i/o functions. We are almost at the "write a real driver in rust" stage now, depending on what you want to do. - misc device rust bindings and a sample driver to show how to use them - debugfs cleanups in the fs as well as the users of the fs api for places where drivers got it wrong or were unnecessarily doing things in complex ways. - driver core const work, making more of the api take const * for different parameters to make the rust bindings easier overall. - other small fixes and updates All of these have been in linux-next with all of the aforementioned merge conflicts, and the one debugfs issue, which looks to be resolved "soon". 
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Merge tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs updates from Greg KH: "Here is the big set of driver core and debugfs updates for 6.14-rc1. Included in here is a bunch of driver core, PCI, OF, and platform rust bindings (all acked by the different subsystem maintainers), hence the merge conflict with the rust tree, and some driver core api updates to mark things as const, which will also require some fixups due to new stuff coming in through other trees in this merge window. There are also a bunch of debugfs updates from Al, and there is at least one user that does have a regression with these, but Al is working on tracking down the fix for it. In my use (and everyone else's linux-next use), it does not seem like a big issue at the moment. Here's a short list of the things in here: - driver core rust bindings for PCI, platform, OF, and some i/o functions. We are almost at the "write a real driver in rust" stage now, depending on what you want to do. - misc device rust bindings and a sample driver to show how to use them - debugfs cleanups in the fs as well as the users of the fs api for places where drivers got it wrong or were unnecessarily doing things in complex ways. - driver core const work, making more of the api take const * for different parameters to make the rust bindings easier overall.
- other small fixes and updates All of these have been in linux-next with all of the aforementioned merge conflicts, and the one debugfs issue, which looks to be resolved "soon"" * tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits) rust: device: Use as_char_ptr() to avoid explicit cast rust: device: Replace CString with CStr in property_present() devcoredump: Constify 'struct bin_attribute' devcoredump: Define 'struct bin_attribute' through macro rust: device: Add property_present() saner replacement for debugfs_rename() orangefs-debugfs: don't mess with ->d_name octeontx2: don't mess with ->d_parent or ->d_parent->d_name arm_scmi: don't mess with ->d_parent->d_name slub: don't mess with ->d_name sof-client-ipc-flood-test: don't mess with ->d_name qat: don't mess with ->d_name xhci: don't mess with ->d_iname mtu3: don't mess wiht ->d_iname greybus/camera - stop messing with ->d_iname mediatek: stop messing with ->d_iname netdevsim: don't embed file_operations into your structs b43legacy: make use of debugfs_get_aux() b43: stop embedding struct file_operations into their objects carl9170: stop embedding file_operations into their objects ...	2025-01-28 12:25:12 -08:00


2837 Commits